
RAG for a notariskantoor: citing every passage back to Wna

A kandidaat-notaris asks the system one question about a vruchtgebruik clause. The agent must produce the Wna-artikel or BW-bepaling for every sentence, or refuse.

Jacob Molkenboer · Founder · A Brand New Company · 20 Jun 2026 · 10 min


It is 22:47 on a Tuesday. A kandidaat-notaris in Haarlem is two paragraphs into a conceptakte for a splitsing in appartementsrechten, and she cannot remember whether the modelakte she pulled from the KNB-bibliotheek already carries the 2024 wijziging of artikel 5:113 BW or still has the older versie. Fourteen tabs open. The akte must be passeerklaar by 10:30. She opens the RAG agent on the second monitor instead of calling the senior notaris at home.

That is the moment the RAG agent we built for a 23-person notariskantoor on the Spaarne has to earn its place. It gets one question, returns one answer, and every sentence in that answer carries a citation to either a Wna-artikel or a BW-bepaling. If it cannot cite, it refuses. The kandidaat-notaris does not get a polite "ik kan u helaas niet helpen". She gets the candidate passages the agent considered and a one-line reason it would not use them.

Across the kantoor that pattern fires 1,080 times a week. Roughly 60% of the volume touches BW Boek 4 (erfrecht), 25% touches Boek 3 and Boek 5, the rest is a long tail. This is the playbook for how we got there without breaking the tuchtcode.

A corpus that is two corpora

The maatschap handed us two sources of truth on day one. The first was the KNB-bibliotheek of 26,400 modelakten: clean, versioned, every clause traceable to the KNB-werkgroep that signed off on it. The second was a 14-year archive of roughly 38,000 passed aktes in the kantoor's custom Dias Notariaat dossier-systeem, a PHP/MySQL build from 2012 with a homegrown WYSIWYG and three rounds of "let's just add a column" schema decay.

The two corpora answer different questions. The modelakten tell you what a clause should look like in 2026 under current law. The dossier-archief tells you what this kantoor has actually been doing for fourteen years — including the awkward cases, the familiebedrijf-overdracht where two siblings disagreed, the levenstestament with the unusual volmacht. Mixing them in one retrieval pool destroys both signals.

So we split. Two indices, two embedding passes, two retrieval calls per question, separate score thresholds.

-- Pass 1: vetted Wna/BW passages for the claim-citation gate
SELECT id, text, artikel, lid, geldig_tot
FROM vetted_passages
WHERE corpus IN ('wna', 'bw')
  AND (geldig_tot IS NULL OR geldig_tot > CURRENT_DATE)
ORDER BY embedding <#> $query_embedding
LIMIT 12;

-- Pass 2: precedent retrieval for style, phrasing, prior-akte context
SELECT id, text, modelakte_ref, dossier_ref, passeerdatum
FROM kantoor_chunks
WHERE intrekkingsdatum IS NULL
ORDER BY embedding <#> $query_embedding
LIMIT 8;

Pass 1 is the only pool the citation gate is allowed to read from. Pass 2 informs the draft but never becomes a citation. That is the single most important architectural choice we made on the project. The kandidaat-notaris can see both pools in the UI, labelled distinctly, but only Pass 1 ever appears in the bronvermelding block of an answer.
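The separation between the two pools can be enforced in code as well as in SQL. The following is a minimal sketch, not the kantoor's implementation: the `Hit` type, the function names, and both threshold values are hypothetical. It also assumes scores have been converted so that higher means more similar; if the queries above run on pgvector, `<#>` returns the negative inner product, so the sign would need flipping first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    id: str
    score: float  # similarity, higher is better (assumed already sign-flipped)
    pool: str     # "vetted" (Pass 1) or "precedent" (Pass 2)


# Hypothetical thresholds: each pool is tuned separately.
VETTED_MIN_SCORE = 0.60
PRECEDENT_MIN_SCORE = 0.45


def select_hits(vetted: list[Hit], precedent: list[Hit]) -> tuple[list[Hit], list[Hit]]:
    """Return (citable, context). Only Pass 1 hits can ever become citable."""
    citable = [h for h in vetted
               if h.pool == "vetted" and h.score >= VETTED_MIN_SCORE]
    context = [h for h in precedent
               if h.pool == "precedent" and h.score >= PRECEDENT_MIN_SCORE]
    return citable, context


def may_cite(hit_id: str, citable: list[Hit]) -> bool:
    """A hit id may appear in the bronvermelding only if it came from Pass 1."""
    return any(h.id == hit_id for h in citable)
```

Keeping the pool name on every hit means a precedent chunk that strays into the wrong list is still filtered out before it can reach the bronvermelding.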

Chunking on artikel, not on paragraph

The default RAG advice (an 800-token sliding window with 100-token overlap) was wrong for the Wna and disastrous for Boek 4 BW. Erfrecht artikelen have leden that mislead when you read them in isolation. Artikel 4:13 BW lid 1 sets up the wettelijke verdeling; lid 2 gives the langstlevende echtgenoot the goederen of the nalatenschap; lid 3 gives each kind a geldvordering against that echtgenoot, which as a rule is not opeisbaar until the echtgenoot dies or goes bankrupt. Retrieve lid 2 in isolation and the agent will confidently tell a kandidaat-notaris that the kinderen have no claim. They do, at lid 3.

So the unit of retrieval is the artikel. The unit of citation is the lid. Every chunk carries both.

from collections.abc import Iterator

def chunk_wetboek(text: str, corpus: str) -> Iterator[Chunk]:
    # Retrieval unit: the whole artikel (so leden read in context).
    # Citation unit: the lid (so the bronvermelding is precise).
    for artikel in split_on_artikel(text):
        full = artikel.text
        for lid_no, lid_text in artikel.leden:
            yield Chunk(
                corpus=corpus,             # "wna" or "bw"
                artikel=artikel.number,    # e.g. "4:13"
                lid=lid_no,                # e.g. 2
                text=lid_text,             # the cited span
                parent_text=full,          # what the embedder sees
                geldig_van=artikel.geldig_van,
                geldig_tot=artikel.geldig_tot,
            )

The embedder sees the full artikel. The citation refers to the lid. The reranker, a bge-reranker-v2 multilingual checkpoint that sits on top of open-source BGE-M3 embeddings, sees the same full artikel. The LLM that drafts the answer sees only the lid that the gate accepted.

The KNB-modelakten are chunked differently. Each modelakte is split on the <clausule> tags the KNB ships in its XML export, with the parent-akte title kept as metadata. We do not embed the toelichting paragraphs separately. They pull retrieval toward the explanatory text and away from the operative clause. Toelichting lives in a sibling field the UI shows on hover, never in the retrieval pool.

The citation gate

The citation gate is the part of the system the maatschap insisted on before the eerste kandidaat-notaris touched it. The rule is simple: no claim in the answer leaves the server unless it carries a Wna or BW citation, and that citation passes three checks.

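from datetime import date

# Minimum NLI score for a claim to count as entailed by the cited lid.
ENTAILMENT_THRESHOLD = 0.72

def vet_citation(claim: str, cite: Citation, corpus: VettedCorpus) -> Verdict:
    # Precondition: only the vetted Wna/BW corpus may be cited at all.
    if cite.source not in ("Wna", "BW"):
        return Verdict.reject("source not in vetted set")

    # Check 1: the artikel-lid exists in the vetted corpus.
    passage = corpus.fetch(cite.source, cite.artikel, cite.lid)
    if passage is None:
        return Verdict.reject(
            f"{cite.source} art. {cite.artikel} lid {cite.lid} not in corpus"
        )

    # Check 2: the passage is still in force today.
    if passage.geldig_tot is not None and passage.geldig_tot <= date.today():
        return Verdict.reject(
            f"passage withdrawn on {passage.geldig_tot.isoformat()}"
        )

    # Check 3: the claim is entailed by the text of the cited lid.
    if entailment_score(claim, passage.text) < ENTAILMENT_THRESHOLD:
        return Verdict.reject("claim not entailed by cited passage")

    return Verdict.accept(passage)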

Three checks: the artikel-lid exists, it is still in force on today's date (the Wna and BW are amended often enough that this is not optional), and the claim is actually entailed by the cited passage. The entailment check uses a small Dutch-tuned NLI model running on the kantoor's own hardware, deliberately not the same model that drafts the answer.

Warning

In the first month in production, the drafting LLM produced citations like "art. 4:13 lid 2 BW" that were syntactically perfect and pointed at a lid that existed, but that lid did not support the claim. The artikelnummer pattern is so regular that the model could generate plausible references without ever opening the passage. The lid-level entailment check, not the artikel-level existence check, is what stops this. Do not skip it.

If any claim fails the gate, the whole answer is held back. The agent returns the original question, the retrieved candidates it tried to use, and the per-claim rejection reasons. The kandidaat-notaris reads four lines and decides whether to escalate to the notaris or rewrite the question. We log every rejection. Six months in, the rejection log is the single most useful training signal we have, better than upvotes, better than a feedback form.
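The all-or-nothing rule can be sketched as a small aggregation step over the per-claim verdicts. The `ClaimVerdict` and `GateResult` types below are hypothetical simplifications, and treating an answer with no claims as a refusal is an assumption rather than something the gate above specifies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ClaimVerdict:
    claim: str
    accepted: bool
    reason: str = ""  # filled in only when the claim is rejected


@dataclass(frozen=True)
class GateResult:
    released: bool
    answer: Optional[str]
    rejections: list = field(default_factory=list)  # (claim, reason) pairs


def gate_answer(verdicts: list[ClaimVerdict]) -> GateResult:
    """Release the answer only if every claim passed; otherwise hold all of it."""
    rejections = [(v.claim, v.reason) for v in verdicts if not v.accepted]
    if rejections or not verdicts:
        # One failed claim (or no claims at all) holds back the whole answer.
        return GateResult(released=False, answer=None, rejections=rejections)
    return GateResult(released=True, answer=" ".join(v.claim for v in verdicts))
```

The rejection pairs are what end up in the log and on the kandidaat-notaris's screen, so they are returned even though the answer itself is not.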

Wrapping the Dias archief without migrating it

The Dias dossier-systeem is 14 years old, runs on PHP 7.4, and has a schema of 211 tables. Three different developers built three different ways to store partijen: a JSON blob in one table, normalized rows in another, and a third that points at a user table deleted in 2016. Migrating it would have sunk the project. So we did not migrate it.

Instead we built an ETL that runs nightly, reads a Dias MySQL replica, and emits a clean Parquet snapshot of (dossier_id, akte_type, passeerdatum, partij_rollen, clause_texts, akte_status). The PHP layer is never touched. The Dias UI keeps working. The secretariaat keeps doing what it has done since 2012. Our snapshot is downstream; if it breaks at 03:17, we fix it before 09:00 without anyone noticing.

def snapshot_dias(replica: MySQL, out: Path) -> None:
    # The schema is messy, so we coerce as we read.
    rows = replica.query("""
        SELECT d.id, d.akte_type, d.passeerdatum, d.status,
               d.partijen_json, k.tekst, k.clausule_type
        FROM dossiers d
        LEFT JOIN klauzules k ON k.dossier_id = d.id
        WHERE d.status IN ('gepasseerd', 'concept', 'ingetrokken')
          AND d.bijgewerkt_op > %(since)s
    """, since=last_snapshot())
    for batch in chunked(rows, 500):
        write_parquet(out / f"dias-{batch_id()}.parquet", coerce(batch))

The clause_texts are chunked, embedded, and pushed to Pass 2 of the retrieval pool. Ingetrokken aktes are kept but flagged. Kandidaat-notarissen need to see them in retrieval ("we tried this, it did not pass") even though they are not precedent. Concept-aktes from the last 30 days are excluded; the kantoor does not want a draft from yesterday morning to shape an answer today.
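Those inclusion rules fit in one small function. This is a sketch under stated assumptions: it uses the `bijgewerkt_op` column from the snapshot query as the age of a concept-akte, it treats a concept updated exactly 30 days ago as still excluded, and the label strings are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Concept-aktes younger than this stay out of the retrieval pool.
CONCEPT_QUARANTINE = timedelta(days=30)


def pass2_label(status: str, bijgewerkt_op: date, today: date) -> Optional[str]:
    """Return the label a chunk carries in Pass 2, or None if it is excluded."""
    if status == "gepasseerd":
        return "precedent"
    if status == "ingetrokken":
        # Kept and visible in retrieval, but flagged: not precedent.
        return "ingetrokken"
    if status == "concept":
        if today - bijgewerkt_op <= CONCEPT_QUARANTINE:
            return None  # too fresh to shape an answer
        return "concept"
    return None  # unknown status: exclude rather than guess
```

Passing `today` in as an argument, rather than reading the clock inside the function, keeps the nightly run reproducible when a snapshot has to be rebuilt for an earlier date.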

Logging for the tuchtcode

A notaris in the Netherlands is under KNB tuchtrecht, supervised by the Kamer voor het notariaat. The tuchtcode does not name AI specifically, but the underlying obligation is unambiguous: a notaris must be able to reconstruct the reasoning behind every clause in a passed akte, years later, in front of a tuchtrechter if it comes to that. A RAG agent that informs a conceptakte sits squarely inside that obligation.

So the system writes a content-addressed event for every question. Each event captures the question text, the retrieval results (with embedding model version and index timestamp), the gate verdicts, the final answer, the user, and a SHA-256 of the prompt that went to the drafting LLM. Events go to append-only object storage with a separate write key the application server cannot revoke.
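One way to make an event content-addressed is to serialize it canonically and use the SHA-256 of those bytes as the storage key. The sketch below assumes that approach; the field names and the `build_event` signature are hypothetical, and the retrieval results and verdicts are assumed to be JSON-serializable already.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_event(*, question: str, retrieval: list, verdicts: list, answer: str,
                user: str, prompt: str, embedding_model: str,
                index_timestamp: str) -> tuple[str, bytes]:
    """Return (key, payload): the key is the hash of the canonical payload."""
    event = {
        "question": question,
        "retrieval": retrieval,
        "embedding_model": embedding_model,
        "index_timestamp": index_timestamp,
        "verdicts": verdicts,
        "answer": answer,
        "user": user,
        # Only the hash of the drafting prompt is stored, not the prompt itself.
        "prompt_sha256": sha256_hex(prompt.encode("utf-8")),
    }
    # Sorted keys and fixed separators make the bytes, and so the key, deterministic.
    payload = json.dumps(
        event, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return sha256_hex(payload), payload
```

Because the key is derived from the content, an altered event cannot be stored under its original key, and a replay can verify each payload against its own name.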

The retention bit matters more than people expect. The kantoor's IT was retaining server logs for 90 days because that is the GDPR-comfortable default. For a system that informs conceptakten, 90 days is wrong. We anchored retention to the longer of the tuchtrechtelijke verjaringstermijn and the kantoor's archiefverordening: in practice, the life of the dossier plus three years.

One side effect: when a kandidaat-notaris asks "why did the agent say this in March?", we can replay the exact retrieval and the exact gate verdicts in seconds. That replay capacity has done more for adoption than any feature we shipped. The notarissen trust the system because they can audit it.

What we would change next time

Three things, in priority order.

We chunked the BW boek-by-boek on day one because the law's structure invited it. We should have layered rechtsgebied metadata on top from the start, so that erfrecht questions never retrieve a personenrecht artikel that merely happens to sit close to the query in embedding space. We added that metadata later, but the retrieval bias was already baked into the index and we paid for a re-embedding pass.

We undersized the rejection-log UI. Kandidaat-notarissen wanted to see, on one screen, every question they asked that week that the gate refused, sorted by reason. We shipped it as a hidden /audit page. It should have been the dashboard.

And we let the drafting LLM see the Pass 2 precedent pool too early. The first version handed it three retrieved precedent clauses alongside the vetted passages, and the model began copying clause language verbatim from a 2019 modelakte that was about to be withdrawn. Pass 2 is now summarized into a structured precedent-context object (akte_type, datum, status, one-line gist) and the literal clause text is only fetched after the gate has accepted the answer.
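The structured object can be sketched as a dataclass that simply has no field for the clause text. The four fields are the ones named above; the record keys (`passeerdatum`, `gist`, `text`) and both function names are assumptions, and the gist is taken as already produced by an upstream summarization step.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PrecedentContext:
    akte_type: str
    datum: str
    status: str
    gist: str  # one-line summary, produced upstream (assumed)


def to_precedent_context(record: dict) -> PrecedentContext:
    """Project a Pass 2 record onto the fields the drafting LLM may see."""
    return PrecedentContext(
        akte_type=record["akte_type"],
        datum=record["passeerdatum"],
        status=record["status"],
        gist=record["gist"],
    )


def clause_text_for(record: dict, gate_accepted: bool) -> str:
    """Release the literal clause text only after the gate accepted the answer."""
    if not gate_accepted:
        raise PermissionError("clause text is withheld until the gate accepts")
    return record["text"]


record = {
    "akte_type": "levenstestament",
    "passeerdatum": "2019-04-11",
    "status": "gepasseerd",
    "gist": "Volmacht met afwijkende herroepingsregeling.",
    "text": "De volmachtgever verklaart ...",
}
prompt_context = asdict(to_precedent_context(record))
```

Since the drafting prompt is built from `prompt_context` alone, the model has nothing verbatim to copy.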

The smallest thing you can do today

Take the next five answers your team writes to recurring questions: a legal one, a process one, a pricing one. For each, write down the single source you would have to cite to defend the answer in front of a regulator or a customer. If you cannot point to one source, the answer is not yet citation-ready, and any RAG you build on top will inherit that gap. The corpus work comes before the model work.

When we built this RAG agent for the Haarlem kantoor, the part that took the longest was not the embeddings or the reranker; it was the citation gate and the Dias snapshot. We have since reused the same gate pattern for two other regulated-knowledge projects (a zorginstelling and a fiscalist), each time with the corpus boundaries redrawn. If the same shape fits your work, our AI agents page walks through the rest.

Key takeaway

A RAG agent for regulated work earns its place at the citation gate, not at the embedder. If you can't cite, you don't draft.

FAQ

Why split the corpus into vetted passages and precedent clauses?

The Wna and BW tell you what the law currently allows. The kantoor's past aktes tell you how it has handled awkward cases. Mixing them lets precedent language leak into citations, which is exactly what the tuchtcode forbids.

What happens when the citation gate rejects an answer?

The kandidaat-notaris sees the original question, the candidate passages the agent retrieved, and a one-line reason per rejected claim. The held-back draft never leaves the server. The rejection is logged and becomes training signal.

Did you migrate the 14-year Dias Notariaat dossier-systeem?

No. We left Dias running and built a nightly ETL that reads a MySQL replica and writes a clean Parquet snapshot. The PHP layer is never touched. Migrating 211 tables of schema decay would have killed the project.

How long do you retain the agent's logs?

For the life of the dossier plus three years, anchored to the kantoor's archiefverordening and the tuchtrechtelijke verjaringstermijn, not the 90-day default IT was using. Replayability is the whole point.

