🤖 LLM Lesson · RAG

Your RAG Is Lying to You

Retrieval-Augmented Generation sounds like the solution to hallucination. It isn't. It's a different set of failure modes dressed up as a solution.

The standard pitch: "We retrieve relevant documents. The LLM reads them and generates an answer. Facts come from the source, so hallucinations drop."

This pitch is technically correct and practically misleading. The problem is that retrieval is treated as fact verification when it's really just finding text that looks relevant.

The model has no mechanism to verify that retrieved text is correct, is current, or actually answers the question.

Failure Mode 1: Semantic Similarity Is Not Relevance

Vector search finds text that's similar to your query, not text that's correct for your query. Example: the query asks for the current anticoagulation protocol. The retriever returns "discontinue 5 days before surgery" from a 1998 guidelines document that was superseded in 2019. The cosine similarity is high. The answer is wrong. The model generates a confident response from the outdated source.
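
One way to catch the stale-guideline case is to filter on document metadata before anything reaches the model. This is a minimal sketch, not a full retriever: the `Hit` fields, the `superseded_by` id, and the similarity numbers are all made up for illustration, and it assumes your index already records which documents have been replaced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    text: str
    similarity: float            # illustrative score, not a real measurement
    published_year: int
    superseded_by: Optional[str] = None  # hypothetical id of the replacing document


def drop_superseded(hits):
    """Keep only hits whose source document has not been replaced."""
    return [h for h in hits if h.superseded_by is None]


hits = [
    # High similarity, but the source was replaced: exactly the trap described above.
    Hit("discontinue 5 days before surgery", 0.93, 1998, superseded_by="guideline-2019"),
    Hit("(placeholder for the current guideline text)", 0.88, 2019),
]
current = drop_superseded(hits)
```

The filter ignores similarity entirely. The highest-scoring hit is removed because of what the metadata says about it, which similarity alone can never tell you.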

Failure Mode 2: The Needle-in-the-Haystack Problem

When your knowledge base has thousands of documents, the top-ranked chunks often miss the passage that actually answers the question. The model then fills the gap with plausible text drawn from its training data: confident, fluent, completely incorrect answers that cite sources that don't support them.

Failure Mode 3: Context Window Contamination

When retrieved context is long, noisy, or structurally complex, the model mirrors the style and framing of the retrieved text, even when the retrieved text is wrong. Medical content is especially dangerous here, because an authoritative clinical tone makes a wrong answer look trustworthy.

Surgical RAG Principles

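Chunk smarter, not more: Overlap adjacent chunks by 10-15% so a sentence split at a boundary still appears whole in one of them. Attach document metadata for recency and source authority. Store multiple retrieval perspectives of the same content, for example the raw passage and a short summary.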
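
The overlap and metadata ideas can be sketched in a few lines. This version is an assumption-heavy toy: it splits on whitespace rather than tokens or sentences, the function name `chunk_with_overlap` is hypothetical, and the `source` and `year` fields stand in for whatever recency and authority metadata you actually keep.

```python
def chunk_with_overlap(text, metadata, chunk_size=200, overlap_ratio=0.125):
    """Split text into word-based chunks that share a small overlap.

    overlap_ratio=0.125 sits inside the 10-15% range; metadata (a dict) is
    copied onto every chunk so recency and authority survive chunking.
    """
    words = text.split()
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        chunks.append({"text": " ".join(piece), **metadata})
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Hypothetical document: 20 placeholder words, tiny chunks for readability.
doc = " ".join(f"w{i}" for i in range(20))
meta = {"source": "example-guideline", "year": 2019}
chunks = chunk_with_overlap(doc, meta, chunk_size=8, overlap_ratio=0.125)
```

With these toy settings each chunk shares one word with its neighbor. In practice you would likely chunk on sentence or token boundaries, but the step arithmetic stays the same.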

Validate retrieval, not just generation: Build a separate verification step. Flag low-confidence retrievals for human review. Use a separate small model to score retrieval quality before generation.
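
A verification step can be as simple as a gate that sits between retrieval and generation. In this sketch the scores are assumed to come from whatever scoring model you choose, the `min_score` threshold of 0.5 is an arbitrary placeholder you would tune, and `gate_retrieval` is a hypothetical name.

```python
def gate_retrieval(scored_chunks, min_score=0.5):
    """Decide whether retrieval is good enough to generate from.

    scored_chunks: list of (chunk_text, score) pairs, where score is the
    relevance estimate from a separate scoring model (assumed, not shown).
    """
    accepted = [chunk for chunk, score in scored_chunks if score >= min_score]
    if not accepted:
        # Nothing cleared the bar: route to a person instead of the generator.
        return {"action": "human_review", "chunks": []}
    return {"action": "generate", "chunks": accepted}


weak = gate_retrieval([("loosely related passage", 0.21)])
strong = gate_retrieval([("on-topic passage", 0.84), ("noise", 0.10)])
```

The point of the design is that the generator never sees a low-scoring chunk, so it cannot turn weak evidence into a confident answer.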

Hybrid search beats pure vector search: Combine dense vector search with sparse BM25 keyword matching. The combination often outperforms vector-only search on domain-specific queries, where exact terms carry the meaning.
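
The post doesn't say how to combine the two result lists. One common option is reciprocal rank fusion (RRF), sketched here as an assumption rather than the only way to do it. The document ids are hypothetical, the two rankings are hard-coded stand-ins for real dense and BM25 results, and k=60 is a commonly used default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense_ranking = ["d3", "d1", "d2"]   # stand-in for vector search output
sparse_ranking = ["d1", "d4", "d3"]  # stand-in for BM25 output
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
# fused == ["d1", "d3", "d4", "d2"]
```

RRF only uses rank positions, which sidesteps the problem that cosine similarities and BM25 scores live on different scales.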

Today's Lesson

RAG doesn't cure hallucination. It retrieves lies that look truthy. If you're building a production LLM system, your retrieval pipeline is your actual product. Fix that before blaming the model.

Author: ✍️ Maha · For: Surgical Edit — Instagram / LinkedIn / X