iTechNotes

I Scribble My Tech Thoughts Here.

RAG isn’t dead – It Just Grew Up

Retrieval-Augmented Generation · A Field Guide

One retrieval recipe was never going to feed every problem. Here are 12 distinct ways teams connect AI to their own knowledge, each with a real-world job it is actually good at.

There’s a recurring claim that “RAG is dead” now that models have bigger memories and longer context windows. In practice, the opposite is happening: RAG is splitting into specialised forms. The question for a product team was never “should we use RAG?” It’s “which kind of retrieval fits the problem in front of us?”

Think of it like databases. Nobody asks whether to “use a database.” They ask whether the job needs a relational one, a search index, a graph, or a cache. RAG is heading the same way. A customer-support assistant, a legal research tool, and a live fraud monitor all need to pull in outside knowledge — but the way they pull it is completely different.

The one-line version: RAG = give the AI a way to look things up before it answers, so it stays accurate and current instead of guessing from memory. The 12 types below are just different lookup strategies for different jobs.
Foundations — the retrieval core
01

Standard RAG

“Find the most relevant documents, then answer using them.”

This is the original and still the most common pattern. Your documents are broken into chunks and converted into “embeddings” — a way of storing meaning so the system can find passages that are about the same thing as the question, even when the wording differs. The AI retrieves the closest few chunks and writes its answer grounded in them, which cuts down on confident-but-wrong “hallucinations.”

Better accuracy | Fewer made-up answers | Vector search · LangChain
Where it fits: An internal “ask the handbook” assistant. An employee types “how many sick days do I get?” and the bot answers from the actual HR policy PDF, not from whatever the model vaguely remembers. This also covers open-domain Q&A, the classic job RAG was built for.
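To make the chunk, embed, retrieve, answer loop concrete, here is a minimal sketch. It assumes a toy bag-of-words vector in place of a real embedding model (so, unlike real embeddings, it only matches shared words), and the handbook sentences are invented for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question, chunks, k=2):
    """Ground the model by pasting the retrieved chunks into the prompt."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical handbook chunks, invented for this example.
HANDBOOK = [
    "Employees receive 10 paid sick days per calendar year.",
    "The office is closed on public holidays.",
    "Expense reports are due by the fifth of each month.",
]
top = retrieve("How many sick days do I get?", HANDBOOK, k=1)
prompt = build_prompt("How many sick days do I get?", HANDBOOK, k=1)
```

In a real system the `embed` function would call an embedding model and the sorted scan would be replaced by a vector index; the shape of the loop stays the same.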
02

Graph RAG

“Don’t just find facts — follow how they connect.”

Standard RAG treats each chunk as an island. Graph RAG first organises knowledge into a network of entities and relationships — people, products, companies, and the links between them. When a question needs reasoning across several hops (“which suppliers are affected if this factory shuts down?”), it can trace the connections instead of hoping one paragraph happens to mention everything.

Multi-hop reasoning | Richer context | Neo4j · knowledge graphs
Where it fits: A pharma research assistant asked “what’s the chain from this gene to this side effect?” needs to walk gene → protein → drug → trial, not retrieve one lucky paragraph. (This absorbs what’s sometimes called “knowledge-enhanced” RAG. Same idea: structured knowledge feeding retrieval.)
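The multi-hop walk can be sketched with a plain adjacency list and a breadth-first search. This is only an illustration: the entity names and relations below are hypothetical, and a production system would typically query a graph database rather than a Python dict.

```python
from collections import deque

# Hypothetical knowledge graph: node -> list of (relation, neighbour).
GRAPH = {
    "GeneA": [("encodes", "ProteinB")],
    "ProteinB": [("targeted_by", "DrugC")],
    "DrugC": [("tested_in", "TrialD")],
    "TrialD": [("reported", "SideEffectE")],
}

def find_path(graph, start, goal):
    """Breadth-first search; returns the hops as (source, relation, target) or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(node, relation, neighbour)]))
    return None

path = find_path(GRAPH, "GeneA", "SideEffectE")
# The hops become the context handed to the model.
context = "; ".join(f"{s} {r} {t}" for s, r, t in path)
```

The point of the design is that the answer is assembled from a chain of edges, so no single document has to mention the gene and the side effect together.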
03

Hybrid RAG

“Match the exact words AND the underlying meaning.”

Meaning-based search is great for “find me something similar,” but it can fumble exact terms — a part number, an error code, a person’s name. Keyword search nails those but misses paraphrases. Hybrid RAG runs both and blends the results, so you get precision on the literal stuff and recall on the conceptual stuff.

Higher recall | Catches exact terms | Search index + vector (FAISS)
Where it fits: An e-commerce support bot. A customer pastes order code “INV-90412” and also asks “why was I double charged?” One needs an exact match, the other needs meaning. Hybrid handles both in a single query.
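One common way to blend the two result lists is reciprocal rank fusion (RRF). The article does not say which blending method to use, so treat this as one option among several; the document IDs and the two input rankings are hypothetical stand-ins for what a keyword index and a vector index might return.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each document scores 1 / (k + rank) per list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of the two searches for the same customer message.
keyword_hits = ["order_INV-90412", "refund_policy", "faq_shipping"]
vector_hits = ["double_charge_help", "order_INV-90412", "refund_policy"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that both searches agree on rise to the top, while a document found by only one search still makes the list.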
04

Multi-Modal RAG

“Retrieve from pictures and diagrams, not just words.”

A lot of real knowledge lives in images: product photos, scanned invoices, X-rays, architecture diagrams, screenshots. Multi-modal RAG can index and retrieve across these, so a question can be answered using a chart or a photo rather than only the surrounding text.

Visual + text answers | Richer responses | CLIP · vision models
Where it fits: An insurance claims tool where a customer uploads a photo of car damage. The assistant retrieves similar past claims and the relevant policy text together to estimate coverage, reading the image and not just the form.
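The key idea is that images and text are embedded into the same vector space, so one query can pull back both. The sketch below assumes that has already happened: the three-number vectors are hand-written placeholders for what a CLIP-style model would produce, and the file names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical shared index: images and text side by side, placeholder vectors.
INDEX = [
    {"kind": "image", "ref": "claim_1042_photo.jpg", "vec": [0.9, 0.1, 0.0]},
    {"kind": "text", "ref": "policy: collision coverage", "vec": [0.7, 0.3, 0.1]},
    {"kind": "text", "ref": "policy: flood coverage", "vec": [0.0, 0.2, 0.9]},
]

def retrieve(query_vec, k=2):
    """Return the k nearest items regardless of whether they are images or text."""
    return sorted(INDEX, key=lambda item: cosine(query_vec, item["vec"]), reverse=True)[:k]

uploaded_photo_vec = [0.8, 0.2, 0.0]  # placeholder embedding of the customer's photo
hits = retrieve(uploaded_photo_vec)
```

Here the uploaded photo lands next to a past claim photo and the collision policy text, which is the mixed context the claims assistant needs.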
Smarter retrieval — when one lookup isn’t enough
05

Streaming RAG

“Answer from data that’s changing by the second.”

Most RAG assumes a fixed pile of documents. Streaming RAG plugs into live feeds — transactions, sensor readings, news, log events — so answers reflect what’s true right now, not what was indexed last night. The emphasis is on low latency: get fresh data in and out fast.

Real-time freshness | Low latency | Kafka · Kinesis
Where it fits: A fraud-monitoring assistant for a payments team. When an analyst asks “is this transaction pattern suspicious?”, it pulls the last few minutes of live activity; a day-old snapshot would be useless.
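A simple way to picture “the last few minutes of live activity” is a sliding time window that drops old events as new ones arrive. This sketch assumes events come in timestamp order and uses an in-memory deque; the class name and the sample transactions are hypothetical, and a real deployment would read from a stream such as Kafka or Kinesis.

```python
from collections import deque

class RecentWindow:
    """Keeps only events newer than max_age_seconds (assumes in-order arrival)."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self.events = deque()  # (timestamp, payload), oldest first

    def add(self, timestamp, payload):
        self.events.append((timestamp, payload))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events from the left until the oldest one is inside the window.
        while self.events and now - self.events[0][0] > self.max_age:
            self.events.popleft()

    def context(self, now):
        """Return the payloads that are still fresh at time `now`."""
        self._evict(now)
        return [payload for _, payload in self.events]

window = RecentWindow(max_age_seconds=300)
window.add(0, "card 1234: 20 USD coffee")
window.add(400, "card 1234: 900 USD electronics")
window.add(460, "card 1234: 950 USD electronics")
recent = window.context(now=500)
```

Only the two recent purchases survive, so the model reasons over what is happening now rather than over last night's index.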
06

Recursive / Multi-Step RAG

“Look something up, learn from it, then look up the next thing.”

Hard questions often can’t be answered in one search. Recursive RAG retrieves, reads what it found, works out what’s still missing, and searches again in stages, building toward a complete answer. It trades some speed for better reasoning on complex questions.

Stronger reasoning | Handles complex questions | LangChain orchestration
Where it fits: A market-research assistant asked “should we enter the Vietnam market?” first pulls market size, then regulations, then competitors, each step informed by the last, instead of forcing one shallow search to carry everything.
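The retrieve, read, search-again loop can be sketched as below. Two pieces are deliberately faked and should be read as assumptions: `KNOWLEDGE` is a tiny invented corpus, and `FOLLOW_UPS` is a lookup table standing in for the model deciding what is still missing.

```python
# Hypothetical corpus and follow-up table, invented for this example.
KNOWLEDGE = {
    "market size": "The market is growing; entry depends on regulations.",
    "regulations": "Foreign ownership rules apply; check local competitors.",
    "competitors": "Two local incumbents dominate.",
}
FOLLOW_UPS = {"market size": "regulations", "regulations": "competitors"}

def multi_step_retrieve(first_query, search, next_query, max_steps=5):
    """Search, let next_query pick the following search, stop when it returns None."""
    notes, query = [], first_query
    for _ in range(max_steps):
        if query is None:
            break
        notes.append((query, search(query)))
        query = next_query(notes)
    return notes

notes = multi_step_retrieve(
    "market size",
    search=KNOWLEDGE.get,
    # Stand-in for an LLM reading the notes and naming the next gap.
    next_query=lambda so_far: FOLLOW_UPS.get(so_far[-1][0]),
)
```

The `max_steps` cap matters in practice: it bounds latency and cost when the model keeps finding new things to look up.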
07

Self-RAG

“Draft an answer, then critique and fix it before sending.”

Self-RAG adds a reflection loop. After retrieving and drafting, the system asks itself: is this actually supported by the sources? Is anything missing or contradicted? If the answer is shaky, it retrieves more or rewrites — a built-in quality check that catches weak answers before the user sees them.

Self-correcting | Higher trust | Reflection · human-in-the-loop
Where it fits: A medical information assistant for clinicians. Before stating a drug interaction, it verifies the claim against the retrieved guideline and flags low confidence. This is exactly the kind of high-stakes answer you don’t want guessed.
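Here is a rough sketch of the check-then-retrieve-more loop. It is a simplification: the critique is a naive substring match rather than a model judging support, the drafted claims are passed in by hand, and the drug names and guideline text are invented.

```python
def supported(claim, sources):
    """Toy critique: a claim counts as supported if a source contains it verbatim."""
    return any(claim.lower() in source.lower() for source in sources)

def answer_with_reflection(claims, retrieval_rounds):
    """Check each claim; pull in the next round of sources while any is unsupported."""
    sources, unsupported = [], list(claims)
    for new_sources in retrieval_rounds:
        sources.extend(new_sources)
        unsupported = [c for c in claims if not supported(c, sources)]
        if not unsupported:
            return {"status": "ok", "unsupported": []}
    # Ran out of retrieval rounds: flag instead of guessing.
    return {"status": "low confidence", "unsupported": unsupported}

# Hypothetical retrieval results, one list per round.
ROUNDS = [
    ["Drug A is taken orally once a day."],
    ["Guideline: Drug A interacts with Drug B; monitor closely."],
]
checked = answer_with_reflection(["Drug A interacts with Drug B"], ROUNDS)
flagged = answer_with_reflection(["Drug A is safe with alcohol"], ROUNDS)
```

The first claim is confirmed only after the second retrieval round; the second claim is never backed by a source, so it is returned flagged rather than stated.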
08

HyDE RAG

“Write a fake answer first — and use it to find the real one.”

HyDE (Hypothetical Document Embeddings) is a clever trick. Short questions make for weak searches. So the system first drafts a guess at what a good answer might look like, then searches using that richer draft. The guess doesn’t need to be correct — it just casts a wider, smarter net than the bare question would, improving what gets retrieved.

Better recall | Helps vague questions | Custom embeddings
Where it fits: A research assistant where users type two-word queries like “battery degradation.” HyDE drafts a plausible paragraph on the topic, then finds papers matching that fuller picture, which works far better than searching two loose words.
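The sketch below shows the mechanics: search once with the bare query, once with a drafted answer, and compare. It assumes a toy bag-of-words similarity in place of real embeddings, and `draft_hypothetical` returns canned text where a real system would call a language model; the two documents are invented.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query_text, docs):
    """Return the single document closest to the query text."""
    q = embed(query_text)
    return max(docs, key=lambda d: cosine(q, embed(d)))

def draft_hypothetical(query):
    """Hypothetical stand-in for an LLM call that drafts a plausible answer."""
    return ("Lithium-ion cells lose capacity over repeated charge cycles, "
            "and high temperature speeds up the fade.")

DOCS = [
    "Capacity fade in lithium-ion cells accelerates at high temperature and deep charge cycles.",
    "Battery recycling regulations in the EU.",
]
bare = search("battery degradation", DOCS)
hyde = search(draft_hypothetical("battery degradation"), DOCS)
```

With this toy similarity, the bare query latches onto the word “battery” and returns the recycling document, while the drafted paragraph shares enough vocabulary with the capacity-fade paper to retrieve it instead.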
Autonomy, memory & the enterprise edge
09

Agentic RAG

“Let the AI decide what to look up, which tool to use, and when.”

Instead of a fixed retrieve-then-answer pipeline, an agent plans. It can choose between sources, call tools (search the web, query a database, run a calculation), and chain several actions to complete a task. Think of it as RAG that can act, not just fetch. The trade-off is more moving parts to test and supervise.

Handles complex tasks | Uses tools autonomously | LangChain Agents
Where it fits: A B2B sales assistant asked to “prep me for the Acme call.” It decides on its own to pull the CRM history, fetch recent news on Acme, and check open support tickets, then assembles a briefing. No human scripted those steps.
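Stripped down, an agent is a planner plus a registry of tools. In this sketch the tools return canned strings and the planner is a keyword rule; both are hypothetical placeholders, since a real agent would call live systems and ask a language model to choose the steps.

```python
# Hypothetical tools; each would wrap a real system in practice.
TOOLS = {
    "crm_history": lambda account: f"CRM: last contact with {account} was a renewal call",
    "recent_news": lambda account: f"News: {account} announced a new product line",
    "open_tickets": lambda account: f"Support: {account} has 2 open tickets",
}

def plan(task):
    """Placeholder planner; a real agent would ask an LLM which tools to run."""
    if "call" in task.lower():
        return ["crm_history", "recent_news", "open_tickets"]
    return ["recent_news"]

def run_agent(task, account):
    """Run the planned tools in order and assemble their outputs into a briefing."""
    steps = plan(task)
    observations = [TOOLS[name](account) for name in steps]
    return {"steps": steps, "briefing": "\n".join(observations)}

result = run_agent("Prep me for the Acme call", "Acme")
```

Returning the chosen `steps` alongside the briefing is a small design choice that pays off later: it gives you something to log and test when supervising what the agent decided to do.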
10

Memory-Augmented RAG

“Remember the conversation and the person, not just the documents.”

Standard retrieval is stateless — every question starts cold. Memory-augmented RAG keeps an external store of past interactions and user context, so the assistant stays consistent across a long conversation and personalises to who’s asking. This is also the answer to “context-aware” retrieval: it carries the thread instead of dropping it.

Continuity | Personalisation | Redis · Pinecone
Where it fits: A customer-support agent handling a multi-turn chat. The customer mentioned their plan tier ten messages ago; memory-augmented RAG still has it, so it doesn’t ask again or contradict itself. (At scale, this is also where conversation costs can balloon if every turn carries the full, growing history rather than only the memories relevant to it.)
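A minimal sketch of the external memory idea: facts about the user live outside the conversation and only the relevant ones are pulled into each prompt. The class, the `plan_tier` key, and the user ID are hypothetical, and an in-memory dict stands in for a store such as Redis.

```python
class MemoryStore:
    """In-memory stand-in for an external store of per-user facts."""

    def __init__(self):
        self._facts = {}  # user_id -> {key: value}

    def remember(self, user_id, key, value):
        self._facts.setdefault(user_id, {})[key] = value

    def recall(self, user_id, keys):
        """Return only the requested facts that are actually known."""
        known = self._facts.get(user_id, {})
        return {k: known[k] for k in keys if k in known}

def build_prompt(store, user_id, question, keys=("plan_tier",)):
    """Prepend the recalled facts to the question instead of the whole chat history."""
    memory = store.recall(user_id, keys)
    memory_lines = "\n".join(f"{k}: {v}" for k, v in memory.items())
    return f"Known about user:\n{memory_lines}\n\nQuestion: {question}"

store = MemoryStore()
store.remember("user-42", "plan_tier", "Premium")  # mentioned ten messages ago
prompt = build_prompt(store, "user-42", "Can I add another seat?")
```

Because the prompt carries a few recalled facts rather than the full transcript, its size stays roughly constant as the conversation grows.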
11

Federated RAG

“Search across data sources without moving the data.”

Sometimes you can’t pile all your knowledge into one index — it’s spread across departments, regions, or partner companies, often for privacy or legal reasons. Federated RAG queries each source where it lives and combines the results, so sensitive data stays put. It’s the privacy-and-governance-friendly flavour of RAG.

Data stays in place | Privacy & security | Federated learning tools
Where it fits: A hospital network assistant that answers across several hospitals’ records without ever copying patient data into a central pool. Each site is queried in place, which helps satisfy regulators and patients alike.
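In code, “query each source where it lives” means the coordinator sends the question out and gets back only small result snippets, never the underlying records. The sketch below is an illustration with invented site names and records; in practice each site would also enforce its own access and de-identification rules, which are left out here.

```python
class Site:
    """One data owner. Records stay inside this object; only hits are returned."""

    def __init__(self, name, records):
        self.name = name
        self._records = records

    def search(self, term, limit=1):
        hits = [r for r in self._records if term.lower() in r.lower()]
        return [{"site": self.name, "snippet": h} for h in hits[:limit]]

def federated_search(sites, term, limit_per_site=1):
    """Ask every site the same question and merge the per-site results."""
    results = []
    for site in sites:
        results.extend(site.search(term, limit_per_site))
    return results

# Hypothetical sites and records, invented for this example.
sites = [
    Site("North", ["Patient N1: asthma, inhaler prescribed", "Patient N2: fracture"]),
    Site("South", ["Patient S1: asthma, allergy noted"]),
]
results = federated_search(sites, "asthma")
```

The `limit_per_site` cap is the governance lever: each owner decides how much leaves its boundary per query.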
12

Domain-Specific RAG

“Tuned to speak one industry’s language fluently.”

A general assistant doesn’t know that “consideration” means something specific in contract law, or how a radiology report is structured. Domain-specific RAG is built around a particular field’s vocabulary, document formats, and rules — trading breadth for trustworthiness where the details really matter.

High relevance | Trustworthy in-field | Specialised models (legal, health)
Where it fits: A legal research assistant that understands jurisdictions, precedent, and clause structure, so when a lawyer asks about a non-compete, it retrieves the right case law instead of a generic web summary.
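One small, concrete piece of domain tuning is expanding a query with the field's own vocabulary before searching. The sketch below assumes a hand-built glossary; the entries are illustrative only, and a real system would pair this with domain-specific embeddings and document parsing.

```python
# Hypothetical glossary mapping everyday terms to the field's own wording.
LEGAL_GLOSSARY = {
    "non-compete": ["restrictive covenant", "covenant not to compete"],
    "consideration": ["bargained-for exchange"],
}

def expand_query(query, glossary):
    """Return the original query plus domain synonyms for any glossary term it uses."""
    terms = [query]
    lowered = query.lower()
    for term, synonyms in glossary.items():
        if term in lowered:
            terms.extend(synonyms)
    return terms

queries = expand_query("Is this non-compete enforceable?", LEGAL_GLOSSARY)
```

Each expanded term is then searched alongside the original question, so case law that never uses the word “non-compete” can still be found.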

So, which one do you need?

Almost certainly more than one. Real systems combine these: a support assistant might be Hybrid, Memory-Augmented, and Self-RAG all at once. The 12 aren’t competing products; they’re ingredients.

The leadership takeaway is simple: don’t ask your team “are we using RAG?” Ask “what does this problem need to retrieve, from where, how fresh, and how carefully checked?” The answer to that question is your architecture.