INNOVESH

Retrieval-Augmented Generation (RAG) is the workhorse pattern for grounding an LLM in your own knowledge. Most teams build the happy path and then get burned by the retrieval half, which is where RAG most often fails.

The pipeline

Chunk + embed

docs → vectors

Retrieve

top-k relevant chunks

Augment

stuff into the prompt

Generate

answer + citations
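
The four stages above can be sketched end to end in a few lines. This is a minimal illustration, not a production design: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the sample chunks are invented, and the final LLM call is left out, so the sketch stops at the augmented prompt. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Augment: number the retrieved chunks so the model can cite them as [n]."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Chunk + embed (invented sample data)
chunks = [
    "Refunds are issued within 14 days of purchase.",
    "Support is available Monday to Friday.",
    "Shipping takes 3 to 5 business days.",
]
index = [(c, embed(c)) for c in chunks]

# Retrieve + augment; the prompt would then be sent to an LLM to generate the answer
question = "How many days do refunds take?"
top = retrieve(question, index, k=2)
prompt = build_prompt(question, top)
```

Numbering the chunks in the prompt is one simple way to let the generated answer point back at its sources.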

Where RAG actually breaks

Retrieval miss

The right answer exists in the corpus, but the wrong chunks were fetched. Garbage in → confident garbage out.

Chunking

Chunks too big (noise) or too small (lost context) wreck relevance.
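
One common way to soften the too-big versus too-small trade-off is fixed-size chunks with some overlap, so that a sentence cut at a boundary still appears whole in a neighbouring chunk. The sketch below splits on whitespace for simplicity; the function name and the default sizes are assumptions for illustration, and real systems often split on tokens, sentences, or document structure instead.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Chunk size and overlap are worth tuning against a retrieval metric rather than by feel.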

Stale index

The source changed but the embeddings didn't, so the system answers from the past.
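
A simple way to catch this is to store a content hash next to each indexed document and compare it with the current source on a schedule. The sketch below assumes documents are keyed by an ID and that the index keeps a hash per ID; `find_stale` and its return shape are hypothetical names chosen for illustration.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's current content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale(sources: dict[str, str], indexed_hashes: dict[str, str]) -> dict[str, list[str]]:
    """Compare current sources with the hashes stored at indexing time."""
    changed = [
        doc_id
        for doc_id, text in sources.items()
        if doc_id in indexed_hashes and indexed_hashes[doc_id] != content_hash(text)
    ]
    new = [doc_id for doc_id in sources if doc_id not in indexed_hashes]
    deleted = [doc_id for doc_id in indexed_hashes if doc_id not in sources]
    # Changed and new documents need (re-)embedding; deleted ones should leave the index.
    return {"reembed": sorted(changed + new), "delete": sorted(deleted)}
```

Removing deleted documents matters as much as re-embedding changed ones, since a retired page can still be retrieved and cited.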

No grounding check

The model answers from its training data, ignoring (or contradicting) the retrieved context.
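
As a rough illustration of a grounding check, the sketch below measures how much of the answer's vocabulary appears in the retrieved chunks. This is a deliberately crude lexical heuristic and an assumption of this example, not a recommended production check: it cannot detect a contradiction phrased with the same words, and stronger checks typically rely on an entailment model or a second LLM pass. The stopword list and function names are hypothetical.

```python
import re

# Small illustrative stopword list (assumption; extend as needed)
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "for", "on", "with"}


def content_tokens(text: str) -> set[str]:
    """Lowercased word tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def support_ratio(answer: str, context_chunks: list[str]) -> float:
    """Fraction of the answer's content tokens that also occur in the retrieved context."""
    answer_tokens = content_tokens(answer)
    if not answer_tokens:
        return 0.0
    context_tokens = set().union(*(content_tokens(c) for c in context_chunks))
    return len(answer_tokens & context_tokens) / len(answer_tokens)
```

A low ratio is a signal to flag or re-ask, not proof that the answer is wrong.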

Evaluate retrieval separately

Measure retrieval (did we fetch the right chunks?) independently from generation (did we answer well from them?). Most “the LLM is dumb” bugs are actually retrieval misses.
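
Retrieval can be scored on its own with a small labelled set of queries, each paired with the chunk IDs that should have been fetched. The sketch below computes two standard metrics, recall@k and mean reciprocal rank (MRR); the chunk IDs and the `evaluate` helper are made up for illustration.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant chunks that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int = 3) -> dict[str, float]:
    """Average both metrics over (retrieved_ids, relevant_ids) pairs, one pair per query."""
    if not runs:
        raise ValueError("need at least one labelled query")
    n = len(runs)
    return {
        "recall@k": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }


# Hypothetical labelled queries: the first finds its chunk at rank 3, the second misses entirely.
runs = [
    (["c1", "c7", "c3"], {"c3"}),
    (["c9", "c2", "c4"], {"c5"}),
]
scores = evaluate(runs, k=3)
```

If these numbers are low, improving the prompt or swapping the model is unlikely to help, because the right chunks never reached the generation step.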

Design for citations

Return the source chunks with the answer. Citations make the system verifiable, debuggable, and trustworthy — and turn a black box into something you can audit.
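
One possible shape for such a response is sketched below: the answer travels with the chunks that were put in the prompt, and `[n]` markers in the answer text map back to them by position. The class and field names are hypothetical, and the `[n]` marker convention is an assumption of this example.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceChunk:
    doc_id: str
    chunk_id: int
    text: str


@dataclass
class RagAnswer:
    answer: str
    # Chunks in the order they were numbered in the prompt: [1], [2], ...
    sources: list[SourceChunk] = field(default_factory=list)

    def cited_sources(self) -> list[SourceChunk]:
        """Return only the sources whose [n] marker appears in the answer text."""
        cited = {int(n) for n in re.findall(r"\[(\d+)\]", self.answer)}
        return [s for i, s in enumerate(self.sources, start=1) if i in cited]


# Invented example data
result = RagAnswer(
    answer="Refunds are issued within 14 days [1].",
    sources=[
        SourceChunk("policy.md", 0, "Refunds are issued within 14 days of purchase."),
        SourceChunk("shipping.md", 2, "Shipping takes 3 to 5 business days."),
    ],
)
cited = result.cited_sources()
```

Keeping `doc_id` and `chunk_id` on each source lets a reviewer trace a bad answer back to the exact chunk that produced it.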

Takeaway

RAG = chunk→retrieve→augment→generate, and it fails mostly at retrieval. Evaluate retrieval separately, keep the index fresh, and return citations so the system is auditable.