RAG vs fine-tuning: eight document types in a law firm intake
Sort a law firm's intake into eight document types and the RAG-versus-fine-tuning question stops feeling theological. Here is the rubric we use.

It is eight on a Tuesday. The intake paralegal at a Rotterdam commercial firm has thirty-seven PDFs queued in a SharePoint folder. A draft engagement letter. A conflict memo from 2017. Two articles of the BW the partner asked to be quoted verbatim. A client's old NDA. Six emails from the prospect. A court filing template. A billing narrative the partner rejected because it "doesn't sound like us." She wants to drop the lot into an LLM and walk out.
The question on the partner's mind the next morning is not whether AI can read the stack. It is: do we fine-tune a model on twenty years of our work, or do we bolt a retrieval layer onto an off-the-shelf one?
The answer is sitting in the stack. Sort the documents into eight types and the question answers itself in about an hour.
Two techniques, two different questions
Fine-tuning teaches a model a pattern. RAG hands a model a document. They look similar from the outside, since both make the model sound smarter about your firm, but they solve different problems and they fail in different ways.
A fine-tune bakes style, structure, and recurring reasoning into the weights. It is the right tool when you want the model to sound like the firm without being told to. It is the wrong tool when the facts change every Monday. A thread on Hacker News last week about fine-tuning an LLM to write docs like it's 1995 is the cleanest illustration: a few thousand examples of voice produced a model that consistently wrote in the target register, with no prompt-engineering circus. Voice is what fine-tunes are for. OpenAI's fine-tuning guide is a reasonable place to ground the mental model if you have never run one.
Retrieval-augmented generation does the opposite. The model stays general. You build an index over your firm's actual documents and the system pulls the relevant chunks into the prompt at runtime. RAG is the right tool when you need a verbatim quote, a current statute, a specific matter file, or a citation that must be checkable. Pinecone's overview covers the moving parts well.
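To make "pulls the relevant chunks into the prompt" concrete, here is a minimal sketch of the retrieval step. It assumes a toy token-overlap score in place of real embeddings, and every name in it (`tokenize`, `score`, `retrieve`, `build_prompt`) is hypothetical rather than taken from any library.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, chunk):
    """Toy relevance score: number of tokens shared by query and chunk.

    Assumption: this stands in for the vector similarity a real index uses.
    """
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    return sum(min(q[t], c[t]) for t in q)


def retrieve(query, chunks, k=2):
    """Return up to k chunks with a non-zero score, best first."""
    scored = [(score(query, ch), ch) for ch in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ch for s, ch in scored[:k] if s > 0]


def build_prompt(query, chunks, k=2):
    """Number the retrieved chunks so the answer can cite them."""
    sources = retrieve(query, chunks, k)
    context = "\n\n".join(f"[{i + 1}] {ch}" for i, ch in enumerate(sources))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```

The model never sees the whole archive, only the numbered chunks, which is what makes the resulting citation checkable.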
Now the eight types.
Documents that want retrieval
Statutes, case law, and regulations
These are long, public, and must be cited exactly. A fine-tune will paraphrase. A paraphrased statute in a memo is a malpractice claim waiting for its file number. Index them, chunk by article, retrieve, quote. Nothing else.
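"Chunk by article" can be sketched as follows. This assumes the statute arrives as plain text with each article introduced by a heading line such as `Article 12` or `Article 12a`; real sources will need their own heading pattern, and the function names are ours, not from any library.

```python
import re

# Assumption: one heading per line, e.g. "Article 7" or "Article 7a."
ARTICLE_HEADING = re.compile(r"^Article[ \t]+(\d+[a-z]?)\.?[ \t]*$", re.MULTILINE)


def chunk_by_article(statute_text):
    """Split statute text into {article_number: verbatim article body}."""
    matches = list(ARTICLE_HEADING.finditer(statute_text))
    chunks = {}
    for i, m in enumerate(matches):
        # An article runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(statute_text)
        chunks[m.group(1)] = statute_text[m.end():end].strip()
    return chunks


def quote(chunks, article):
    """Return the stored text unchanged; raise KeyError rather than guess."""
    return chunks[article]
```

Keying the index by article number means a quote is a lookup, not a generation step, so nothing gets paraphrased on the way into the memo.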
Conflict-check memos and the party graph
A conflict check asks: have we ever acted against this counterparty, anywhere in the firm, in the last decade? That is a structured lookup against a database of parties, not a creative writing task. RAG with an actual SQL or graph query underneath, rather than vector similarity, is the right shape here. Vector search can quietly miss "Acme Holdings BV" when the new matter says "Acme Holding NV."
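A minimal sketch of that structured lookup, using Python's built-in `sqlite3`. The table layout, the list of legal forms, and the crude plural-stripping rule are all assumptions made for illustration; a production check would use the firm's own party database and a far more careful normalizer.

```python
import re
import sqlite3

# Assumption: legal-form suffixes to ignore when matching party names.
LEGAL_FORMS = {"bv", "nv", "vof", "cv", "ltd", "gmbh", "inc", "llc"}


def normalize_party(name):
    """Reduce a party name to a comparison key.

    Drops dots and legal forms, and strips a trailing "s" from longer
    words so "Holdings" and "Holding" collide. This deliberately
    over-matches: a hit is a prompt for human review, not a verdict.
    """
    tokens = re.findall(r"[a-z0-9]+", name.lower().replace(".", ""))
    core = [t for t in tokens if t not in LEGAL_FORMS]
    core = [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in core]
    return " ".join(core)


def build_db(rows):
    """Load (matter_id, name, role) rows into an in-memory party table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE parties (matter_id TEXT, name TEXT, role TEXT, name_key TEXT)"
    )
    conn.executemany(
        "INSERT INTO parties VALUES (?, ?, ?, ?)",
        [(m, n, r, normalize_party(n)) for m, n, r in rows],
    )
    return conn


def conflict_hits(conn, new_party):
    """Return every recorded party whose key equals the new party's key."""
    cur = conn.execute(
        "SELECT matter_id, name, role FROM parties "
        "WHERE name_key = ? ORDER BY matter_id",
        (normalize_party(new_party),),
    )
    return cur.fetchall()
```

With this key, "Acme Holding NV" on a new matter finds the 2017 row for "Acme Holdings BV", which is exactly the case a similarity threshold can let slip.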
Client intake forms and matter facts
Every matter has a unique fact pattern. The model needs the facts at inference time, not in its weights. Drop the intake form into the context window. If you fine-tune on intake forms you will end up with a model that hallucinates a plausible-sounding client situation that does not exist.
Documents that want a fine-tune
Engagement letters
Engagement letters are 90% boilerplate that varies subtly by jurisdiction, practice group, and partner preference. The firm has hundreds of them. The structure is stable. The voice is the firm's. This is the textbook case for fine-tuning a small model on the corpus and letting it draft the first pass, leaving the partner to redline three clauses instead of twelve.
Internal precedent memos
The way your firm reasons through a non-compete dispute, or an article 2:336 BW expulsion claim, is not in any textbook. It is in the heads of three senior associates and a stack of internal memos going back twelve years. That reasoning pattern is what a fine-tune captures well. The risk: if the senior associate was wrong in 2018, the fine-tune will be confidently wrong in 2026. Curate the training set.
Never fine-tune on documents that cite live statutes. The statute changes; the weights don't. You will ship a model that confidently quotes a repealed article from 2019 in a 2026 memo.
The hybrid documents
Court filings and pleadings
A pleading has firm voice, jurisdictional structure, and case-specific facts. Fine-tune the model on the firm's filing style. Retrieve the facts of this matter at runtime. Retrieve the current procedural rules separately. Three sources of signal, one output. The integration is most of the work.
Client correspondence
Tone is firm-specific. Status is matter-specific. A fine-tune handles the tone. RAG handles the status. The split lets you change a partner's email register without rebuilding the matter pipeline, and vice versa.
Time entries and billing narratives
This is the document type partners actually argue about. "0.4, review of draft SPA, call with K." is the firm's shorthand. Fine-tune for the shorthand. But the model has to know which SPA, which K, what call. Retrieve from the matter file. A fine-tune alone will invent a believable narrative for a call that never happened, and the client will spot it on the invoice.
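One way to catch the invented call before the invoice goes out is to check a drafted narrative against what the matter file actually records. The sketch below assumes narratives follow the "hours, activity, activity." shorthand quoted above and that recorded activities arrive as plain strings; both the format and the function names are illustrative assumptions.

```python
def parse_narrative(narrative):
    """Split "0.4, review of draft SPA, call with K." into hours and activities."""
    hours, *activities = [p.strip() for p in narrative.rstrip(".").split(",")]
    return float(hours), [a for a in activities if a]


def ungrounded(narrative, recorded_activities):
    """Return activities in the draft that the matter file does not record."""
    _, activities = parse_narrative(narrative)
    known = {r.lower() for r in recorded_activities}
    return [a for a in activities if a.lower() not in known]
```

An empty result means every activity in the draft traces back to a record; anything else goes back to the fee earner before it reaches the client.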
A five-minute rubric
For each document type in your intake, ask three questions and write down the answers.
- Does the answer change between matters? If yes, you need retrieval.
- Does style matter more than fact accuracy? If yes, you can fine-tune.
- Will a paraphrase create legal exposure? If yes, retrieval and only retrieval.
Tally the answers across all eight types. If retrieval wins more than five of them, start with RAG. If fine-tune wins more than four, you have a strong house style and a stable corpus; start there. Most firms land on a hybrid, which is fine. The wrong move is to pick the technique first and ask which documents fit it.
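The rubric and the tally fit in a few lines. The yes/no answers per document type below are our own reading of the eight types and are an assumption, as is the conservative default of "R" when none of the three questions is answered yes; a firm should fill in its own answers.

```python
from collections import Counter


def classify(changes_between_matters, style_over_facts, paraphrase_is_risky):
    """Map the three rubric answers to R (retrieve), F (fine-tune) or H (hybrid)."""
    if paraphrase_is_risky:
        return "R"  # retrieval and only retrieval
    if changes_between_matters and style_over_facts:
        return "H"
    if style_over_facts:
        return "F"
    return "R"  # assumed default: when in doubt, retrieve


# Assumed answers: (changes between matters, style over facts, paraphrase risky)
ANSWERS = {
    "statutes_case_law": (False, False, True),
    "conflict_memos": (True, False, True),
    "intake_forms": (True, False, False),
    "engagement_letters": (False, True, False),
    "precedent_memos": (False, True, False),
    "court_filings": (True, True, False),
    "client_correspondence": (True, True, False),
    "billing_narratives": (True, True, False),
}


def tally(answers):
    """Count how many document types land on each letter."""
    return Counter(classify(*a) for a in answers.values())


def recommend(counts):
    """Apply the thresholds from the paragraph above."""
    if counts["R"] > 5:
        return "start with RAG"
    if counts["F"] > 4:
        return "start with fine-tuning"
    return "hybrid"
```

With these assumed answers the tally comes out at three R, two F, and three H, so `recommend` returns "hybrid".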
Here is the decision in pseudo-code, the way we sketch it on a whiteboard before writing a line of agent code.
def route(doc_type, query, matter_id):
    # Retrieval only: the text must be quoted or looked up exactly.
    if doc_type in {"statute", "case_law", "intake_form", "conflict_party"}:
        return rag(query, index=firm_index[doc_type])
    # Fine-tune only: stable corpus where the house style dominates.
    if doc_type in {"engagement_letter", "precedent_memo"}:
        return fine_tuned_model(query)
    # Hybrid: pleadings, correspondence, billing
    style = fine_tuned_model(query)
    facts = rag(query, index=firm_index["matter"], filter={"id": matter_id})
    return compose(style, facts)
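To try the routing logic end to end, the sketch can be run against stubs. Everything below other than the shape of `route` is a stand-in we made up: `firm_index` is a dict of in-memory records, `rag` filters them on exact field matches, `fine_tuned_model` returns a placeholder string, and `compose` simply concatenates.

```python
# Hypothetical in-memory indexes; a real system would query a search service.
firm_index = {
    "statute": [{"id": None, "text": "Placeholder statute text."}],
    "case_law": [],
    "intake_form": [],
    "conflict_party": [],
    "matter": [
        {"id": "M-1", "text": "Draft SPA reviewed on Monday."},
        {"id": "M-2", "text": "Hearing scheduled for Friday."},
    ],
}


def rag(query, index, filter=None):
    """Stub retrieval: return record texts matching every filter field.

    The query is ignored here; a real retriever would rank by relevance.
    """
    wanted = filter or {}
    return [r["text"] for r in index if all(r.get(k) == v for k, v in wanted.items())]


def fine_tuned_model(query):
    """Stub for a call to the firm's fine-tuned model."""
    return f"[house-style draft for: {query}]"


def compose(style, facts):
    """Stub composition: append retrieved facts to the styled draft."""
    return style + "\nFacts: " + " ".join(facts)


def route(doc_type, query, matter_id=None):
    if doc_type in {"statute", "case_law", "intake_form", "conflict_party"}:
        return rag(query, index=firm_index[doc_type])
    if doc_type in {"engagement_letter", "precedent_memo"}:
        return fine_tuned_model(query)
    # Hybrid: pleadings, correspondence, billing
    style = fine_tuned_model(query)
    facts = rag(query, index=firm_index["matter"], filter={"id": matter_id})
    return compose(style, facts)
```

Calling `route("billing", "narrative", "M-1")` returns the styled placeholder followed by only the facts stored for matter M-1, which is the separation the hybrid branch is meant to enforce.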
The closing scene
The partner the next morning does not want an architecture diagram. He wants to see his engagement letter, drafted in his voice, citing the right article of the BW, with the prospect's name spelled correctly. That single output is three different decisions stacked. When we built the matter-routing agent for a Dutch litigation firm earlier this year, the surprise was how much of the work was sorting documents into the eight buckets above; picking the model was the easy half. The same sort sits behind every AI agent we ship for legal and operations teams.
Five minutes today: open your intake folder, count the document types, and write a single letter next to each (R for retrieve, F for fine-tune, H for hybrid). That list is the spec for whatever you build next.
Key takeaway
Fine-tuning teaches a model how the firm sounds. RAG tells it what's true today. A law firm needs both, separated cleanly by document type.
FAQ
When does fine-tuning actually beat RAG for a law firm?
When the corpus is stable, the voice matters more than the fact, and a paraphrase carries no legal risk. Engagement letters and internal precedent memos are the textbook cases.
Can we fine-tune a model on statutes?
No. Statutes change and model weights don't. You will ship a model that confidently quotes repealed articles. Keep statutes in a retrieval index where you can update them in minutes, not weeks.
How do we handle billing narratives?
Hybrid. Fine-tune for the firm's shorthand and tone. Retrieve the actual matter activity from your time-tracking system. A fine-tune alone will invent plausible work that never happened.
Is one base model enough for all eight document types?
Usually yes. The same base model can serve fine-tuned outputs and retrieval-augmented outputs by routing on document type at runtime. You separate the techniques, not the models.