Retrieval-Augmented Generation (RAG) is the workhorse pattern for grounding an LLM in your own knowledge. Most teams build the happy path and then get burned by the retrieval half, which is where RAG actually fails.
The pipeline
Chunk + embed
docs → vectors
Retrieve
top-k relevant chunks
Augment
stuff into the prompt
Generate
answer + citations
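To make the four stages concrete, here is a minimal sketch of the first three in plain Python. It rests on loud assumptions: `embed` is a toy bag-of-words counter standing in for a real embedding model, the function names (`chunk`, `embed`, `retrieve`, `augment`) and the default sizes are illustrative, and the generate step is left out because it is just a call to whichever model you use.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 12, overlap: int = 4) -> list[str]:
    """Split text into word windows of `size` that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    # Assumption: a bag-of-words count vector stands in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query as (score, chunk) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def augment(question: str, hits: list[tuple[float, str]]) -> str:
    """Stuff the retrieved chunks into a prompt with numbered sources."""
    sources = "\n".join(f"[{i}] {text}" for i, (_, text) in enumerate(hits, start=1))
    return (
        "Answer using only these sources and cite them as [n].\n"
        f"{sources}\n\nQuestion: {question}"
    )
```

The string returned by `augment` is what you would send to the model. Numbering the sources in the prompt is one simple way to let the answer point back at the chunks it used.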
Where RAG actually breaks
Retrieval miss
The right answer exists in the corpus, but the wrong chunks were fetched. Garbage in → confident garbage out.
Chunking
Chunks too big (noise) or too small (lost context) wreck relevance.
Stale index
Source changed; embeddings didn’t. Answers from the past.
No grounding check
Model answers from training data, ignoring (or contradicting) the retrieved context.
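Of the failure modes above, the stale index is the easiest to catch mechanically. One possible approach, sketched here as an assumption rather than a prescribed method, is to store a content hash for each document at index time and compare it against the current source before serving. The names `fingerprint` and `diff_index` and the dictionary layout are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash stored alongside a document's embeddings at index time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_index(indexed_hashes: dict[str, str], sources: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored hashes (doc_id -> hash) with current sources (doc_id -> text)."""
    changed = sorted(
        doc_id
        for doc_id, text in sources.items()
        if doc_id in indexed_hashes and indexed_hashes[doc_id] != fingerprint(text)
    )
    added = sorted(doc_id for doc_id in sources if doc_id not in indexed_hashes)
    deleted = sorted(doc_id for doc_id in indexed_hashes if doc_id not in sources)
    # "changed" and "added" need (re-)embedding; "deleted" should leave the index.
    return {"changed": changed, "added": added, "deleted": deleted}
```

Running a check like this on a schedule, or on every source update, turns "answers from the past" into an explicit re-embedding queue.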
Evaluate retrieval separately
Measure retrieval (did we fetch the right chunks?) independently from generation (did we answer well from them?). Most “the LLM is dumb” bugs are actually retrieval misses.
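A retrieval-only evaluation can be as small as the sketch below. It assumes you have a labeled set where each query comes with the IDs of the chunks that should have been fetched. Recall@k and mean reciprocal rank (MRR) are standard retrieval metrics, but the function names and the `runs` data layout are illustrative choices.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int) -> dict[str, float]:
    """Average both metrics over (retrieved_ids, relevant_ids) pairs."""
    if not runs:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    n = len(runs)
    return {
        "recall_at_k": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }
```

No model call is involved, so these numbers isolate the retrieval half. If recall@k is low, prompt tuning will not fix the answers.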
Design for citations
Return the source chunks with the answer. Citations make the system verifiable, debuggable, and trustworthy — and turn a black box into something you can audit.
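One way to design for citations is to make them part of the return type, as in this sketch. The `Chunk` and `Answer` classes, the `[n]` marker convention, and the `generate` callable (a stand-in for your model call) are all assumptions for illustration. The function maps markers in the answer back to the retrieved chunks and flags any marker that points at a source that was never provided.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str


@dataclass
class Answer:
    text: str
    citations: list[Chunk] = field(default_factory=list)
    dangling: list[int] = field(default_factory=list)  # markers with no matching source


def answer_with_citations(
    question: str, hits: list[Chunk], generate: Callable[[str], str]
) -> Answer:
    """Prompt with numbered sources, then resolve [n] markers in the reply."""
    context = "\n".join(f"[{i}] {c.text}" for i, c in enumerate(hits, start=1))
    prompt = (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    text = generate(prompt)  # hypothetical model call supplied by the caller
    markers = sorted({int(m) for m in re.findall(r"\[(\d+)\]", text)})
    cited = [hits[n - 1] for n in markers if 1 <= n <= len(hits)]
    dangling = [n for n in markers if not 1 <= n <= len(hits)]
    return Answer(text=text, citations=cited, dangling=dangling)
```

An answer with an empty `citations` list or a non-empty `dangling` list is a cheap signal that the model may have ignored the retrieved context, and is worth logging.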
Takeaway
RAG = chunk→retrieve→augment→generate, and it fails mostly at retrieval. Evaluate retrieval separately, keep the index fresh, and return citations so the system is auditable.