KnowMBA Advisory · AI Strategy · Advanced · 7 min read

Hallucination Mitigation

Hallucination is when an LLM confidently produces output that is not supported by reality — fabricated citations, invented statistics, made-up product features, false legal precedent. Hallucination is not a bug to be patched but an inherent property of how generative models work: they sample from a probability distribution, and the most-probable token is not always the correct one. Mitigation is a systems-design problem, not a prompt-engineering trick. The mature mitigation stack has four layers: (1) Grounding via Retrieval-Augmented Generation (RAG) so the model cites source material. (2) Constrained outputs (structured JSON, tool use, function calling) that limit free-form fabrication. (3) Verification — programmatic checks, second-model judges, or human review. (4) Calibrated abstention — the model is taught to say 'I don't know' when confidence is low. Stack them; do not rely on any single layer.

Also known as: LLM Grounding, Reducing AI Hallucinations, RAG Reliability, AI Factuality, Hallucination Rate
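The four layers compose into a single request path. A minimal sketch of that path follows; every function name here (call_llm, retrieve, verify_claims) is a hypothetical placeholder for whatever client and retriever you actually run, not a specific vendor API.

```python
# Minimal sketch of the four-layer mitigation stack.
# All helpers (call_llm, retrieve, verify_claims) are hypothetical stubs.

def call_llm(prompt: str) -> str:
    """Stub for whichever LLM client you actually use."""
    raise NotImplementedError

def retrieve(query: str, k: int = 5) -> list[str]:
    """Layer 1: grounding. Return the top-k source passages for the query."""
    raise NotImplementedError

def verify_claims(draft: str, sources: list[str]) -> dict:
    """Layer 3 stub: check each factual claim in the draft against the sources."""
    raise NotImplementedError

def answer_grounded(query: str) -> dict:
    sources = retrieve(query)
    # Layer 2: constrained output. Ask for JSON with per-claim citations and an
    # explicit abstention token when the sources do not cover the query.
    prompt = (
        "Answer ONLY from the numbered sources below. Return JSON: "
        '{"answer": ..., "citations": [source indices]} '
        'or {"answer": "NO_ANSWER"} if the sources do not contain the answer.\n\n'
        + "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, 1))
        + f"\n\nQuestion: {query}"
    )
    draft = call_llm(prompt)

    # Layer 3: verification. Deterministic check, second-model judge, or
    # human review, proportional to risk.
    verdict = verify_claims(draft, sources)

    # Layer 4: calibrated abstention. Prefer "no answer" over an unverified one.
    if not verdict.get("supported"):
        return {"answer": "NO_ANSWER", "flagged": verdict.get("unsupported_claims")}
    return {"answer": draft, "sources": sources}
```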

The Trap

The trap is believing 'we'll fix hallucinations with a better prompt.' You won't. No prompt eliminates hallucinations because the model has no truth oracle. The second trap is treating hallucinations as a uniform problem: a hallucinated word in a brainstorming session is harmless; a hallucinated dosage in a medical summary is malpractice. Tier your tolerance to the use case. The third trap is over-trusting RAG — if your retrieval returns garbage, the model will confidently cite garbage. RAG without retrieval evaluation is hallucination with extra steps.

What to Do

Build a hallucination control plane: (1) Map each AI output to a hallucination risk tier (informational vs. decisional vs. regulated). (2) For decisional/regulated outputs, REQUIRE grounded retrieval with cited spans the user can click. (3) Use structured outputs (JSON schemas) wherever possible to constrain free generation. (4) Add a verification layer: a deterministic check, a second-model judge, or human review proportional to risk. (5) Track your hallucination rate as a first-class metric — sample 100+ outputs weekly, measure factual accuracy, and set thresholds (e.g., <2% for legal/medical, <8% for marketing copy). (6) Train users to distrust unsourced AI output — design the UI to make sources mandatory and visible.
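One way to make step 1 operational is to encode the tier-to-controls mapping as configuration that every AI feature must declare against. The sketch below is illustrative only: the tier names and thresholds mirror the steps above, but the structure itself is an assumption, not a standard.

```python
# Illustrative risk-tier policy: which controls and hallucination-rate
# thresholds apply to each output tier (thresholds from step 5 above).
RISK_TIERS = {
    "informational": {
        "grounding_required": False,
        "verification": "spot_check",
        "max_hallucination_rate": 0.08,   # e.g., marketing copy
    },
    "decisional": {
        "grounding_required": True,        # cited spans the user can click
        "verification": "llm_judge",
        "max_hallucination_rate": 0.02,
    },
    "regulated": {
        "grounding_required": True,
        "verification": "human_review",
        "max_hallucination_rate": 0.02,    # legal/medical target
    },
}

def controls_for(tier: str) -> dict:
    """Step 1: map each AI output to its tier, then enforce that tier's controls."""
    return RISK_TIERS[tier]
```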

Formula

Hallucination Rate = (# of unsupported claims in output) / (Total claims in output) — measure on a held-out evaluation set, target varies by risk tier
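A worked example of the formula, assuming a weekly sample where reviewers have labeled each claim as supported or unsupported (the sample numbers are illustrative):

```python
# Worked example of the hallucination-rate formula on a weekly sample.
# Each item is (claims_in_output, unsupported_claims); numbers are illustrative.
weekly_sample = [(12, 0), (8, 1), (15, 0), (9, 2), (11, 0)]

total_claims = sum(c for c, _ in weekly_sample)        # 55
unsupported = sum(u for _, u in weekly_sample)         # 3
rate = unsupported / total_claims                      # ~0.055

print(f"Hallucination rate: {rate:.1%}")               # 5.5%
print("Meets legal/medical tier (<2%):", rate < 0.02)  # False
print("Meets marketing tier (<8%):", rate < 0.08)      # True
```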

In Practice

The 2023 Mata v. Avianca case is the most cited cautionary tale: New York lawyers used ChatGPT to research a personal-injury case and submitted a brief citing six legal precedents — all hallucinated. The cases looked perfectly formatted, with realistic citations and quoted holdings, but none existed. The judge sanctioned the lawyers $5,000 and the case became a canonical teaching example across the legal profession. The mitigation lesson: use legal-research tools (Westlaw, LexisNexis, CoCounsel, Harvey) that ground outputs in real case databases with verified citations — not generic LLMs without retrieval. The hallucination rate of grounded legal-research tools is dramatically lower precisely because every citation must resolve to a real document.

Pro Tips

  • 01

    Force the model to cite or abstain. The pattern 'answer with citations to the provided sources, or respond with NO_ANSWER if the sources do not contain the answer' eliminates 60-80% of hallucinations on grounded tasks (see the sketch after this list). Couple this with a UI that hides outputs marked NO_ANSWER and your perceived hallucination rate drops further.

  • 02

    Use a second model as a judge (LLM-as-judge) to score factuality before responses ship. Have model B verify each factual claim from model A against the source documents, flagging unsupported claims. The cost is roughly 1.5x inference but the accuracy improvement on regulated workflows is significant.

  • 03

    Distinguish 'extractive' tasks (find this fact in this document) from 'generative' tasks (write a summary). Extractive tasks have measurable ground truth; generative tasks do not. Where business risk is high, restructure the workflow so the LLM is doing extraction with a constrained schema, not open-ended generation.
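A hedged sketch of tips 01 and 02 combined: cite-or-abstain generation followed by a second-model factuality judge. The prompt wording is illustrative, and call_model_a / call_model_b are stand-ins for whichever two models you run.

```python
# Sketch of cite-or-abstain generation (tip 01) plus an LLM-as-judge
# factuality check (tip 02). Prompts and model helpers are illustrative.

def call_model_a(prompt: str) -> str: ...   # answering model (stub)
def call_model_b(prompt: str) -> str: ...   # judge model (stub)

ANSWER_PROMPT = """Answer the question using ONLY the numbered sources below.
Cite the source number after every claim, e.g. [2].
If the sources do not contain the answer, respond with exactly NO_ANSWER.

Sources:
{sources}

Question: {question}"""

JUDGE_PROMPT = """You are a factuality judge. For each claim in the ANSWER,
decide whether it is fully supported by the SOURCES.
Return one line per claim: SUPPORTED or UNSUPPORTED, followed by the claim.

SOURCES:
{sources}

ANSWER:
{answer}"""

def grounded_answer(question: str, sources: list[str]) -> str | None:
    src_block = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, 1))
    answer = call_model_a(ANSWER_PROMPT.format(sources=src_block, question=question))
    if answer.strip() == "NO_ANSWER":
        return None                      # UI hides abstentions (tip 01)
    verdict = call_model_b(JUDGE_PROMPT.format(sources=src_block, answer=answer))
    if "UNSUPPORTED" in verdict:
        return None                      # route to human review instead of shipping
    return answer
```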

Myth vs Reality

Myth

GPT-5/Claude/the next model will solve hallucinations

Reality

Frontier-model hallucination rates have improved (from ~15% in GPT-3 era to 1-5% on grounded tasks with current models) but not eliminated. The rate plateau is fundamental to autoregressive sampling. Architectures change the failure modes — they do not remove them. Plan for hallucination as a permanent constraint of the technology, not a transitional problem.

Myth

RAG eliminates hallucinations

Reality

RAG reduces hallucinations significantly when the retrieval is high-quality and the prompt enforces grounded answering. But RAG can also introduce a new failure mode: the model confidently cites the wrong passage, or extrapolates beyond what the source supports. RAG evaluation benchmarks (such as RAGAS and Vectara's HHEM) show that hallucination rates in RAG systems range from 1% to 30% depending on retrieval quality and prompt design. RAG is necessary but not sufficient.

Knowledge Check

Your legal team wants to use a GenAI assistant to draft contract clauses citing relevant case law. They've asked you to ensure 'no hallucinated citations.' What is the FIRST control you implement?

Industry benchmarks

Is your number good?

Calibrate against real-world tiers. Use these ranges as targets — not absolutes.

Hallucination Rates by Mitigation Stack (Mature Production Systems)

Approximate ranges from RAGAS, Vectara HHEM, and Anthropic factuality benchmarks, 2024

Frontier model only (no grounding): 5-15% on factual queries
RAG with quality retrieval: 1-5%
RAG + structured output + verifier: 0.2-1.5%
RAG + verifier + human review on flagged outputs: <0.1% user-facing

Source: https://github.com/vectara/hallucination-leaderboard

Real-world cases

Case narratives with the numbers that prove (or break) the concept: one verified case and one hypothetical composite.

⚖️

Mata v. Avianca (US District Court)

2023

failure

Two New York attorneys used ChatGPT to research a personal-injury case and submitted a court filing citing six precedents — all of which ChatGPT had fabricated. The cases had realistic captions, court designations, and quoted holdings, but none existed. Opposing counsel could not find them, the judge could not verify them, and the attorneys and their firm were sanctioned $5,000. The case became the legal industry's canonical hallucination cautionary tale.

Hallucinated Citations Filed: 6 of 6
Detection Method: Opposing counsel and judge
Sanction: $5,000 (attorneys and firm, jointly)
Mitigation in Use: None — generic LLM, no grounding

Generic LLMs without retrieval are unsuitable for any task where factuality is enforceable. Use purpose-built grounded systems (Westlaw, LexisNexis, CoCounsel, Harvey) and verify every citation programmatically.

📚

Hypothetical: Enterprise Knowledge Assistant

2024

mixed

Hypothetical: A 15,000-employee professional services firm rolled out a GenAI knowledge assistant grounded on 8 years of internal project documents. Initial release used a generic prompt — hallucination rate measured at 11% on factuality samples, with consultants citing fabricated project precedents in client briefs. After 90 days, the team rebuilt the stack: tight RAG over a curated corpus, mandatory citation rendering, an LLM-as-judge verifier flagging unsupported claims, and a UI change forcing consultants to click each citation before it could be copied. Hallucination rate dropped to 0.6%. User trust — initially destroyed — recovered within 6 months.

Initial Hallucination Rate: 11%
Post-Mitigation Rate: 0.6%
Mitigation Stack: RAG + citations + verifier + UI controls
Trust Recovery Time: ~6 months

Hallucination is a stack problem, not a model problem. The fix is layered controls plus UX that makes sources mandatory and visible. Once trust is destroyed, recovery takes months — invest in mitigation BEFORE launch.

