1. Introduction
March 4, 2026
Hello! Welcome to my senior project titled “Mechanistic Remedies to LLM Hallucination.” In this post, I’ll introduce the topic of my research, outline my goals for the next few months, and share my progress so far.
Intro
Generative AI
Generative AI refers to a subfield of artificial intelligence focused on creating new content, especially text, images, code, or music, by learning patterns from massive datasets.
I will be focusing on large language models (LLMs) in my project. LLMs are designed to understand, process, and generate human-like text after a training process in which the model analyzes vast datasets using deep learning. LLMs power the text-based side of popular chatbots like Gemini, Claude, and ChatGPT.
Hallucination in LLMs
If you’ve interacted with or read anything about artificial intelligence, you probably know that AI is not 100% trustworthy. At its core, an AI model is a convoluted web of interconnected numbers. Chatbots display this phenomenon most clearly: they often give fabricated information in response to a prompt. In some cases, the chatbot doesn’t know the answer to the question, yet it refuses to confess this to the user. “Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty” (Kalai). Researchers have argued that this type of hallucination is innate, arising naturally because of the way LLMs are trained and evaluated. In other cases, the chatbot may simply be lying intentionally.
Regardless, misinformation spread by AI is a serious issue in an age where nearly everyone, especially the general public, depends on it. According to the College Board, the percentage of high school students who report using GenAI tools for schoolwork is growing, increasing from 79% to 84% between January and May 2025. These students make up the future of society, and consistent misinformation can affect the mindset and skill set they will need in their future professions. AI is a powerful tool for academia, giving researchers the information they need concisely and quickly. However, in December 2025, GPTZero’s hallucination-check tool found over 50 hallucinated citations in submissions to the prestigious AI conference ICLR. Professional paper submissions were citing prior work that didn’t even exist, which undermines trust. These fake citations went undetected by both the authors themselves and multiple rounds of peer review, showing how sneaky hallucinations can be. Below, I tried using AI for paper searching myself. The first paper listed has the wrong author. The second displays the publisher instead of the author, which is an incorrect format.

Yet hope should not be lost! The internal representations of LLMs, that is, the many numbers making up these AI systems, have been shown to provide a signal of when hallucination and intentional lying occur. There also exist techniques to reduce hallucination altogether, such as improving prompting or slowing down the “thinking” process. In my next blog posts, I will be diving into such techniques in further depth.
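To make the “internal representations provide a signal” idea a bit more concrete, here is a toy sketch of a linear probe, one common detection approach: a simple classifier fit on a model’s hidden-state vectors to predict whether an output is hallucinated. Everything below is synthetic; the random vectors stand in for real activations you would extract from an LLM, and the labels and “hallucination direction” are made up, so this only illustrates the mechanics of probing, not a real result.

```python
# Toy sketch of probing internal representations for hallucination.
# Real work would extract hidden-state vectors from an LLM; here we
# generate synthetic vectors so the example runs standalone.
import numpy as np

rng = np.random.default_rng(0)
dim, n = 64, 400  # stand-ins for hidden-state size and dataset size

# Pretend hallucinated vs. truthful answers shift activations along
# one fixed direction (a simplifying assumption for this sketch).
direction = rng.normal(size=dim)
labels = rng.integers(0, 2, size=n)  # 1 = hallucinated, 0 = truthful
acts = rng.normal(size=(n, dim)) + np.outer(labels * 2.0 - 1.0, direction)

train_a, train_y = acts[:300], labels[:300]
test_a, test_y = acts[300:], labels[300:]

# "Mean-difference" linear probe: the difference between the class means
# gives a direction; project each vector onto it and threshold at the
# midpoint between the projected class means.
probe_dir = train_a[train_y == 1].mean(axis=0) - train_a[train_y == 0].mean(axis=0)
proj = train_a @ probe_dir
cutoff = (proj[train_y == 1].mean() + proj[train_y == 0].mean()) / 2

preds = (test_a @ probe_dir > cutoff).astype(int)
accuracy = (preds == test_y).mean()
print(f"held-out probe accuracy: {accuracy:.2f}")
```

Because the synthetic data is cleanly separated by construction, the probe scores near-perfectly here; on real model activations, probe accuracy is the interesting empirical question.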
Statement of Purpose
The main goal of my project is to create a framework that is better at both detecting and mitigating LLM hallucination, combining methods traditionally used to target detection or mitigation individually. Through this project, I’ll gain experience working with LLMs’ internal representations, explore what happens inside LLMs during inference, and get more familiar with the landscape and tools of AI research in general.
Progress
So far, I’ve been working on replicating results from existing hallucination detection and mitigation methodologies: cloning repositories from GitHub and experimenting with them! I’ll report on my findings next week. I’ve also been getting more familiar with the Python libraries I’ll be using in my project.
Thanks for reading, and looking forward to the rest of this week!
Comments

Hi Anna! Wonderful blog post! I’ve been looking to learn about the internal structure of LLMs, so I look forward to staying up-to-date on your project.
I was hoping you could share a bit more on why LLMs could “intentionally lie.” In what situations would an LLM choose to do that, and has it necessarily been “trained” to do so?
Hello Anna! I am eager to see more about the various hallucinations of AI as well as how we are able to reduce or detect them.
One question I have is: are there specific benchmarks or scales that you plan to use to evaluate how effectively hallucination has been reduced or how accurately it has been detected?