A company deployed a knowledge assistant built on RAG (Retrieval-Augmented Generation). The demo was impressive: it answered product questions instantly, cited documentation, and reduced support ticket volume by 40%. Then someone asked about the return policy. The AI confidently stated that customers had 90 days to return items. The actual policy: 30 days. The AI had hallucinated a more generous policy, and customer service spent a week handling angry customers who tried to return items outside the window.
RAG systems promise to solve LLM hallucination by grounding answers in real documents. But badly designed RAG still hallucinates, just more subtly. Here's how to build RAG systems that actually deliver trustworthy answers.
What Makes RAG Fail
RAG failures follow predictable patterns:
Poor Document Retrieval
The foundation of RAG is retrieving relevant documents. If retrieval fails, everything fails. Common problems:
- Keyword mismatch: the question uses different words than the documents.
- Semantic gaps: embeddings don't capture the meaning well.
- Stale content: retrieved documents are outdated.
- Missing content: the relevant information doesn't exist in the knowledge base.
Example: a user asks "How do I reset my password?" The system retrieves documents about "credential recovery" and "authentication troubleshooting", which are technically related but not what the user needs. The answer mixes in irrelevant information.
No Source Verification
LLMs are trained to generate plausible text. Given retrieved documents, they might:
- Use information from one document while citing another.
- Combine facts from multiple documents in ways that distort meaning.
- Fill gaps with plausible-sounding but incorrect information.
- Confidently present speculation as fact.
Without verifying that every statement has a source, RAG systems hallucinate just like standalone LLMs.
Access Control Blindness
RAG systems typically have access to the entire knowledge base. But users shouldn't. Documents might be department-specific, confidential, draft versions not yet approved, or deprecated content that's outdated.
A RAG system that can't enforce access control will leak information to unauthorized users.
Context Window Limitations
Retrieved documents must fit in the LLM's context window. For long or numerous documents:
- Only partial content fits.
- Critical information gets truncated.
- The answer is based on an incomplete picture.
Designing Reliable RAG
Trustworthy RAG systems require careful design:
1. Chunk Documents Intelligently
Break documents into retrievable chunks. Chunking strategy matters: too small and you lose context; too large and you retrieve irrelevant content.
Best practices:
- Chunk at natural boundaries (sections, paragraphs).
- Include metadata (document title, section headers).
- Overlap chunks slightly to preserve context at boundaries.
- Size chunks for your embedding model (typically 200-500 tokens).
Example: don't chunk mid-paragraph, and include the section header with each chunk so retrieved content carries its context.
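A minimal sketch of these practices: split on paragraph boundaries, pack paragraphs up to a size budget, prefix each chunk with its section header, and overlap adjacent chunks by one paragraph. `max_chars` stands in for a token budget here; a real system would count tokens with its embedding model's tokenizer.

```python
def chunk_section(header: str, text: str, max_chars: int = 500) -> list[str]:
    """Chunk one document section at paragraph boundaries.

    Each chunk is prefixed with the section header so retrieved text
    keeps its context, and consecutive chunks share one paragraph of
    overlap so meaning isn't cut at chunk boundaries.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and len(candidate) > max_chars:
            chunks.append(f"[{header}]\n" + "\n\n".join(current))
            current = [current[-1], para]  # overlap: carry last paragraph forward
        else:
            current.append(para)
    if current:
        chunks.append(f"[{header}]\n" + "\n\n".join(current))
    return chunks
```

The header prefix means a chunk retrieved in isolation still tells the LLM which part of which document it came from.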
2. Hybrid Retrieval
Don't rely only on semantic search. Combine:
- Semantic search (embeddings, vector similarity).
- Keyword search (BM25, traditional search).
- Metadata filters (date, author, document type).
This catches documents semantic search misses and improves precision.
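One common way to combine the two rankings is reciprocal-rank fusion (RRF), sketched below: each retriever contributes a score based on where it ranked a document, and the fused list is sorted by the sum. The smoothing constant `k = 60` is the conventional default.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids with reciprocal-rank fusion.

    A document ranked r-th in one list contributes 1 / (k + r); documents
    that score well in multiple rankings rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document found by both semantic and keyword search beats one found by only one of them, which is exactly the precision boost hybrid retrieval is after.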
3. Rerank Retrieved Documents
Initial retrieval gets candidates; reranking refines them:
- Use cross-encoder models that score query-document pairs.
- Consider recency (newer documents are often more relevant).
- Apply business rules (prioritize official docs over drafts).
Return top-k after reranking, not top-k from initial retrieval.
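A sketch of that pipeline, under stated assumptions: `score_fn` stands in for a cross-encoder's query-document relevance score, and the recency and "official beats draft" boosts are illustrative numbers you would tune for your own corpus.

```python
from datetime import date


def rerank(query: str, candidates: list[dict], score_fn, today: date,
           top_k: int = 3) -> list[dict]:
    """Rerank retrieval candidates by relevance, recency, and authority.

    `score_fn(query, text)` is an assumed relevance scorer (in practice,
    a cross-encoder). Each candidate dict carries `text`, `updated`
    (a date), and `status` ("official" or "draft"); field names are
    illustrative, not from a specific library.
    """
    def final_score(doc: dict) -> float:
        score = score_fn(query, doc["text"])
        age_days = (today - doc["updated"]).days
        score += max(0.0, 0.1 - 0.0001 * age_days)  # mild recency boost
        if doc["status"] == "official":
            score += 0.2  # business rule: official docs over drafts
        return score

    return sorted(candidates, key=final_score, reverse=True)[:top_k]
```

Note that the candidate pool comes from initial retrieval, but the final top-k is taken only after reranking.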
4. Cite Every Claim
Prompt the LLM to cite sources for every statement.

Bad prompt: "Answer this question using the provided documents."

Good prompt: "Answer the question using ONLY information from the provided documents. For EVERY statement, include a citation [1], [2], etc. If information isn't in the documents, say you don't know."
Verify that citations actually support claims. If the LLM makes a claim without a citation, or with a false citation, flag it.
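A minimal first-pass check is purely mechanical: every sentence must carry at least one `[n]` citation, and every cited index must refer to a real source. This sketch does only that; verifying that the cited passage actually supports the claim would need a second step, such as an entailment model.

```python
import re


def check_citations(answer: str, sources: list[str]) -> list[str]:
    """Flag uncited sentences and citations to non-existent sources.

    Returns a list of problem descriptions; an empty list means the
    answer passed this mechanical check.
    """
    problems: list[str] = []
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    for sentence in sentences:
        cited = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if not cited:
            problems.append(f"uncited: {sentence}")
        for n in cited:
            if not 1 <= n <= len(sources):
                problems.append(f"bad citation [{n}]: {sentence}")
    return problems
```

Anything this check flags should block the answer or route it for review rather than reach the user.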
5. Implement Access Control
Filter retrieved documents based on user permissions:
- Index documents with access metadata.
- Query retrieval with the user's context.
- Only return documents the user can access.
This must happen before the LLM sees the documents. You can't un-leak information after the LLM has processed it.
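The filtering step itself can be this simple, provided every indexed chunk carries access metadata. Field names here (`allowed_roles`, `score`) are illustrative, not from a specific vector store.

```python
def retrieve_for_user(candidates: list[dict], user_role: str,
                      top_k: int = 5) -> list[dict]:
    """Drop chunks the user may not see, then take the top-k by score.

    The permission filter runs before ranking, so unauthorized content
    never reaches the LLM's context window.
    """
    visible = [c for c in candidates if user_role in c["allowed_roles"]]
    return sorted(visible, key=lambda c: c["score"], reverse=True)[:top_k]
```

In production you would push this filter into the retrieval query itself (most vector stores support metadata filters) so restricted chunks never even leave the index.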
6. Handle "Don't Know" Gracefully
When information isn't in the knowledge base:
- Don't make up answers.
- Say "I don't have information about that in my knowledge base."
- Provide a way to escalate (e.g., "Would you like to contact support?").
A system that admits ignorance is far more trustworthy than one that confidently lies.
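One way to operationalize this is an abstention gate: if the best retrieval score falls below a threshold, refuse to answer instead of letting the LLM improvise. The threshold value below is an assumed tuning parameter; calibrate it against your own evaluation set.

```python
def should_abstain(hits: list[tuple[str, float]],
                   min_score: float = 0.55) -> bool:
    """Decide whether to answer at all.

    `hits` is a list of (doc_id, retrieval_score) pairs. If nothing was
    retrieved, or nothing scored above `min_score`, the system should
    return its "I don't know" response instead of generating an answer.
    """
    return not hits or max(score for _, score in hits) < min_score
```

When `should_abstain` is true, return the fixed "I don't have information about that in my knowledge base" message plus an escalation option, and skip generation entirely.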
7. Show Sources
Let users verify answers:
- Display the source documents used.
- Link to the original documents.
- Show relevant excerpts with citations highlighted.
Users should be able to check your work.
Evaluation and Monitoring
Build ongoing quality assurance:
Create Test Sets
Before deployment, create evaluation sets:
- Questions you expect users to ask.
- Known correct answers.
- The source documents that contain those answers.
Measure:
- Retrieval accuracy: do you retrieve the right documents?
- Answer accuracy: is the answer correct?
- Citation accuracy: do the citations support the claims?
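The retrieval-accuracy metric is straightforward to compute from such a test set. This sketch measures recall@k: the fraction of test questions for which at least one expected document appears in the top-k results. The test-case field names are illustrative.

```python
def recall_at_k(test_cases: list[dict], k: int = 5) -> float:
    """Fraction of test questions whose expected docs appear in the top k.

    Each test case maps `expected_docs` (ids of documents containing the
    answer) to `retrieved` (the ranked ids the system actually returned).
    """
    hits = 0
    for case in test_cases:
        top_k = set(case["retrieved"][:k])
        if set(case["expected_docs"]) & top_k:
            hits += 1
    return hits / len(test_cases)
```

Track this number over time: a drop after a reindex or an embedding-model change is an early warning before users notice wrong answers.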
Monitor in Production
Track:
- Answer confidence scores.
- Questions where no relevant documents were found.
- Low user satisfaction ratings.
- Common questions that fail.
These signal where the knowledge base has gaps or retrieval is failing.
User Feedback Loops
Let users rate answers:
- Thumbs up/down.
- "This answered my question" vs. "This didn't help."
- Report incorrect information.
Analyze feedback to identify:
- Systematic errors.
- Content gaps.
- Retrieval failures.
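A simple triage step makes systematic errors visible: aggregate thumbs-down events by normalized question text so recurring failures surface first. Event field names are illustrative.

```python
from collections import Counter


def worst_questions(events: list[dict], top_n: int = 3) -> list[tuple[str, int]]:
    """Return the most frequently thumbs-downed questions.

    Each event carries `question` (free text) and `rating` ("up"/"down");
    questions are lowercased and stripped so trivial variants count as
    the same failure.
    """
    downs = Counter(e["question"].strip().lower()
                    for e in events if e["rating"] == "down")
    return downs.most_common(top_n)
```

The questions at the top of this list are the cheapest place to start: each one is a concrete, user-reported retrieval or content failure.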
Maintenance Is Not Optional
RAG systems degrade without maintenance:
Keep Knowledge Base Current
Establish processes to:
- Add new content (product launches, policy changes).
- Update existing content (corrections, clarifications).
- Archive outdated content (deprecated features, old policies).
Stale content is worse than no content: it causes confident wrong answers.
Monitor Document Quality
Ensure source documents are:
- Well-structured (headings, sections).
- Clear and concise.
- Accurate and up to date.
- Free of contradictions.
Poorly written source documents = poor RAG answers.
Retrain Embeddings
As the knowledge base evolves:
- The embedding model may need updates.
- Consider fine-tuning on your domain.
- Reindex documents periodically.
Review Failed Queries
Regularly examine:
- Questions that got low ratings.
- Questions where no documents were retrieved.
- Questions that led to escalations.
Each failure points to an improvement opportunity.
Real-World Example
A financial services company built RAG for regulatory compliance:
Initial Implementation (Failed):
- Indexed 1000+ regulatory documents.
- Used basic semantic search.
- No citation verification.
- No access control.

Result: the system gave incorrect advice, mixed regulations from different jurisdictions, and cited non-existent clauses. The legal team shut it down.
Redesigned System:
- Chunked documents at regulation boundaries, preserving context.
- Hybrid search (semantic + keyword + metadata filters).
- Reranking by document authority and recency.
- Strict citation requirement: every claim cited.
- Verification step: does the citation support the claim?
- Access control by user role (consumer vs. advisor vs. analyst).
- Clear "I don't know" when uncertain.
Results:
- Answer accuracy: 92% (verified against legal team review).
- Citation accuracy: 98%.
- Zero instances of citing the wrong regulation.
- User satisfaction: 4.4/5.
- The legal team was confident enough to allow customer-facing deployment.
Key Improvements:
- Chunking at regulation boundaries preserved legal context.
- Hybrid search caught edge cases semantic search missed.
- Citation verification caught hallucinations.
- Access control prevented information leakage.
Common Mistakes
Mistake 1: Relying Solely on Semantic Search
Embeddings aren't perfect. Keywords matter. Combine approaches.
Mistake 2: No Citation Verification
Prompting for citations isn't enough. Verify claims match sources.
Mistake 3: Ignoring Access Control
Don't assume all users should see all documents. Implement permissions.
Mistake 4: Static Knowledge Base
Content becomes outdated. Build update processes from day one.
Mistake 5: No User Feedback
You can't improve what you don't measure. Collect ratings, analyze failures.
The Bottom Line
RAG doesn't automatically prevent hallucination. It provides tools to reduce it:
- Grounding in real documents.
- The ability to cite sources.
- A mechanism for verification.
But those tools only work if you:
- Retrieve the right documents.
- Verify citations match claims.
- Enforce access control.
- Maintain the knowledge base.
- Monitor and improve.
Done well, RAG systems provide trustworthy AI assistance. Done poorly, they're just hallucination machines with footnotes.


