RAG inside an ethical wall: a patent-firm playbook
A Leiden patent firm. 240,000 documents. Two document systems. One ethical wall that cannot bleed. The RAG playbook that kept every clause on the right side.

It is 22:47 in Leiden. A partner is two days out from a preliminary injunction at the Rechtbank Den Haag. She knows the firm made the exact obviousness argument she needs in an EPO opposition years ago. She does not remember the matter number. iManage Work search returns 312 hits. None of the top twenty are it. The document is also not in iManage Work, because the matter closed before the firm migrated off NetDocuments in 2018.
This is the brief we got. Build a retrieval agent that finds the argument. Across both document systems. Without ever showing a chunk that belongs to a matter the partner is not allowed to see. Twenty-one lawyers, 240,000 documents, one ethical wall that cannot bleed.
What follows is the playbook we used. We are not going to pretend it was clean.
The two-corpus problem
iManage Work holds everything since 2018, roughly 100,000 documents with a clean profile schema: matter ID, client ID, document type, author, sensitivity flag. The iManage REST API is straightforward and the profile metadata is reliable enough that we trusted it without a normalisation pass.
NetDocuments holds the previous thirteen years. About 140,000 documents, pulled through the NetDocuments REST API, with a metadata schema that drifted three times across that period. "Client" is sometimes the four-digit billing code, sometimes the abbreviated client name, sometimes blank because someone uploaded a PDF into a "scratch" cabinet in 2014. About 8% of the archive is scanned paper, OCR quality between fine and tragic.
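A minimal sketch of what a normalisation pass for the drifting "Client" field could look like. The lookup table, the sample codes, and the function name are hypothetical; the real mapping would come from the firm's billing system. Blank or unknown values return None so that a person decides, rather than the code guessing.

```python
import re
from typing import Optional

# Hypothetical lookup from abbreviated client names to billing codes.
ABBREVIATION_TO_CODE = {"ACME": "0417", "NORDWIND": "1290"}

# A legacy billing code is exactly four digits.
BILLING_CODE = re.compile(r"\d{4}")


def normalise_client(raw: Optional[str]) -> Optional[str]:
    """Map a legacy "Client" value to a four-digit billing code.

    Returns None when the value is blank or unknown, so the caller can
    route the document to manual review instead of guessing.
    """
    if raw is None:
        return None
    value = raw.strip()
    if not value:
        return None
    if BILLING_CODE.fullmatch(value):
        return value
    return ABBREVIATION_TO_CODE.get(value.upper())
```

Documents that come back as None cannot be assigned a wall group yet, so under this sketch they would stay out of the index until someone resolves them.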
We ingested both into a single corpus table with a discriminator column. The integration code is two thin connectors, both feeding the same chunker. The interesting work happens downstream.
-- One table for both systems; "source" is the discriminator column.
create table chunks (
    id            bigserial primary key,
    source        text not null check (source in ('imanage', 'netdocs')),
    external_id   text not null,
    matter_id     text not null,
    client_id     text not null,
    wall_group_id text not null,
    doc_type      text,
    section_path  text,
    page_from     int,
    page_to       int,
    body          text not null,
    -- 'dutch_legal' is our custom text search configuration
    body_tsv      tsvector generated always as (to_tsvector('dutch_legal', body)) stored,
    embedding     vector(1024) not null,
    ingested_at   timestamptz not null default now()
);

-- Dense side: approximate nearest neighbour over cosine distance.
create index chunks_embedding_hnsw on chunks
    using hnsw (embedding vector_cosine_ops)
    with (m = 16, ef_construction = 64);

-- Sparse side: full-text search.
create index chunks_body_tsv on chunks using gin (body_tsv);

-- Wall predicate.
create index chunks_wall_grp on chunks (wall_group_id);
The HNSW and GIN indexes each feed a ranked candidate list at query time. We fuse the two lists with reciprocal rank fusion. We will get to that.
The ethical wall is a column, not a filter
The first version of this work treated the ethical wall as a post-retrieval filter. Retrieve the top fifty chunks, drop the ones the user is not allowed to see, hand back what remains. This is wrong for two reasons.
It is wrong on safety: a chunk from a forbidden matter sits in the candidate set long enough to be logged in our trace, to be embedded in a prompt buffer, to be inspected by a developer debugging a query. Even if it never reaches the lawyer, it has crossed the wall in every meaningful sense.
It is wrong on quality: if you retrieve fifty and lose forty to the wall, you have ten candidates left, and the model gets a thin context. The user thinks the firm has no precedent on the question. There is plenty. They just cannot see it.
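The quality problem is easy to reproduce on toy data. The sketch below is an illustration, not the firm's code: the chunk dictionaries, the group names, and the one-in-five ratio are all made up to mirror the "retrieve fifty, lose forty" case.

```python
def post_filter(ranked, allowed, k):
    """Take the top k first, then drop forbidden chunks (the flawed order)."""
    return [c for c in ranked[:k] if c["wall_group_id"] in allowed]


def pre_filter(ranked, allowed, k):
    """Drop forbidden chunks first, then take the top k."""
    return [c for c in ranked if c["wall_group_id"] in allowed][:k]


# Toy ranking: 300 chunks, four out of every five behind a wall.
ranked = [
    {"id": i, "wall_group_id": "wg-open" if i % 5 == 0 else "wg-walled"}
    for i in range(300)
]
allowed = {"wg-open"}

thin = post_filter(ranked, allowed, 50)   # 10 chunks survive
full = pre_filter(ranked, allowed, 50)    # 50 chunks, all permitted
```

Filtering first also means the forbidden chunks never enter the candidate list at all, which is the safety half of the argument.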
So the wall is a query predicate. Every chunk carries a wall_group_id. Every user session resolves to a set of permitted wall groups, computed from the lawyer's matter assignments and the firm's conflict matrix. The retrieval SQL filters on the index, before ranking.
with allowed as (
    -- $1: wall groups this session may see
    select unnest($1::text[]) as wall_group_id
),
dense as (
    -- $2: query embedding
    select c.id,
           row_number() over (order by c.embedding <=> $2) as rnk
    from chunks c
    join allowed a using (wall_group_id)
    order by c.embedding <=> $2
    limit 200
),
sparse as (
    -- $3: raw query text
    select c.id,
           row_number() over (
               order by ts_rank_cd(c.body_tsv, query) desc
           ) as rnk
    from chunks c
    join allowed a using (wall_group_id),
         plainto_tsquery('dutch_legal', $3) query
    where c.body_tsv @@ query
    order by rnk  -- without this, limit keeps an arbitrary 200 matches
    limit 200
)
-- reciprocal rank fusion, k = 60
select id, sum(1.0 / (60 + rnk)) as score
from (
    select id, rnk from dense
    union all
    select id, rnk from sparse
) fused
group by id
order by score desc
limit 30;
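The last step of the query is reciprocal rank fusion: each chunk scores 1 / (60 + rank) in every list it appears in, and the scores add up. A small Python sketch of the same arithmetic, with made-up chunk IDs, shows why a chunk that both retrievers like beats one that tops a single list. The function name and the tie-break on chunk ID are assumptions of the sketch.

```python
from collections import defaultdict


def rrf(ranked_lists, k=60, limit=30):
    """Fuse ranked lists of chunk IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per chunk, ranks starting at 1.
    Returns (chunk_id, score) pairs, best first; ties break on chunk ID.
    """
    scores = defaultdict(float)
    for ids in ranked_lists:
        for rank, chunk_id in enumerate(ids, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


dense = [11, 22, 33]   # hypothetical chunk IDs from the dense side
sparse = [22, 44]      # hypothetical chunk IDs from the sparse side
fused = rrf([dense, sparse])
# Chunk 22 wins: rank 2 in dense plus rank 1 in sparse
# outscores chunk 11, which is rank 1 in dense only.
```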
The permitted set is the source of truth, not the forbidden set. If a matter is missing from the conflict matrix entirely, default-deny. We learned this when a new associate's first matter was not yet synced and the agent happily returned chunks from an opposing party's file.
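The incident comes down to what an empty list means. A minimal sketch, with hypothetical group names and function names, of how the same unsynced associate fares when visibility is derived from a deny-list versus an allow-list:

```python
def visible_with_denylist(all_groups, forbidden):
    """Flawed: anything not explicitly forbidden is visible."""
    return set(all_groups) - set(forbidden)


def visible_with_allowlist(all_groups, permitted):
    """Default-deny: only explicitly permitted groups are visible."""
    return set(all_groups) & set(permitted)


all_groups = {"wg-a", "wg-b", "wg-opposing"}

# New associate, nothing synced yet: both of her lists are empty.
leaky = visible_with_denylist(all_groups, forbidden=[])
safe = visible_with_allowlist(all_groups, permitted=[])
```

With the deny-list, an empty record exposes every group, including the opposing party's. With the allow-list, the same empty record exposes nothing, and the worst outcome is an associate who has to wait for the sync.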
Hybrid retrieval, because patent law is half text and half citation
Dense retrieval is excellent at "find me the argument that says the priority date does not save this claim because the prior art reads on every element." It is mediocre at "find me everything that cites Article 56 EPC." Patent litigation runs on both.
The hybrid setup is pgvector for the dense side and Postgres native tsvector full-text search for the sparse side. We considered a dedicated BM25 extension (ParadeDB's pg_search is the current good option) but ts_rank_cd with a custom dictionary was enough at this corpus size, and we did not want a second extension in the dependency tree.
The custom dictionary matters. We taught Postgres that "art. 56 EPC", "Article 56 EPC", and "EPC art. 56" are the same token. We added a thesaurus mapping common Dutch and English patent vocabulary ("uitvinding" / "invention", "stand van de techniek" / "prior art"). We did not try to be clever about claim numbers; we kept them as plain tokens because lawyers search for them verbatim.
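The actual normalisation lives in a Postgres dictionary. As a rough illustration of the same idea outside the database, the sketch below collapses the three citation spellings into one token with a regular expression. The token format epc_art_56, the inclusion of the Dutch "artikel", and the function name are assumptions of the sketch, and it covers only EPC article citations.

```python
import re

# Spellings of "article" that the sketch recognises.
_ART = r"(?:art\.|article|artikel)"

# Either "<article> 56 EPC" or "EPC <article> 56", case-insensitive.
_CITATION = re.compile(
    rf"\b(?:{_ART}\s*(?P<a>\d+[a-z]?)\s+EPC|EPC\s+{_ART}\s*(?P<b>\d+[a-z]?))\b",
    re.IGNORECASE,
)


def normalise_citations(text: str) -> str:
    """Rewrite EPC article citations to one canonical token."""
    def canonical(match: re.Match) -> str:
        number = match.group("a") or match.group("b")
        return f"epc_art_{number.lower()}"

    return _CITATION.sub(canonical, text)
```

Running the same function over both the indexed text and the query text is what makes the three spellings match each other.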
For the dense side, we used a multilingual embedding model fine-tuned on roughly 3,000 query / relevant-passage pairs collected from the firm's last year of partner searches. The fine-tune lifted nDCG@10 by 11 points over the off-the-shelf checkpoint. It cost a long weekend and one rented GPU.
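For readers who have not met the metric: nDCG@10 rewards relevant results near the top of the first ten and normalises against the best possible ordering. The sketch below uses the linear-gain variant and computes the ideal ordering from the retrieved list only; the source does not say which variant the evaluation used, so treat both choices as assumptions.

```python
import math


def dcg(relevances, k):
    """Discounted cumulative gain over the first k results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg(relevances, k=10):
    """DCG divided by the DCG of the same grades in ideal order.

    `relevances` holds one graded judgement per returned result, in
    ranked order. Returns 0.0 when nothing relevant was returned.
    """
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

A single relevant result at rank 1 scores 1.0; the same result at rank 2 scores 1 / log2(3), roughly 0.63, which is why small ranking changes move the metric noticeably.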
Chunking, the part nobody writes about
Naive chunking on a patent corpus is a disaster. A 400-word fixed window splits a claim chart between claim 7 and claim 8, splits an expert report mid-sentence on the obviousness analysis, and splits an EPO opposition brief at a random paragraph break.
We built a per-doc-type chunker.
- Claim charts chunk by row. Each claim element gets one chunk with the claim number, the alleged corresponding element in the accused product, and the cited evidence. Section path becomes Claim 7 / element [b].
- EPO and EPC briefs chunk by argument heading. We parse the heading hierarchy first and emit one chunk per leaf section, up to 800 tokens. Long sections split at paragraph boundaries.
- Depositions and witness statements chunk by Q-and-A pair, with a 200-token sliding window for context.
- Expert reports chunk by numbered paragraph because experts always number their paragraphs and the partners cite them by number.
- Everything else (correspondence, memos, scanned exhibits) falls back to 600-token semantic chunks with 80-token overlap.
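Two of the strategies in the list can be sketched in a few lines. This is an illustration, not an excerpt from the real chunker: the function names are hypothetical, whitespace-separated words stand in for model tokens, and the numbered-paragraph pattern assumes paragraphs start a line with "12. ".

```python
import re

# A numbered paragraph starts a line with digits, a dot, and whitespace.
_NUMBERED = re.compile(r"^(\d+)\.\s+", re.MULTILINE)


def chunk_expert_report(text):
    """One chunk per numbered paragraph, keyed by paragraph number."""
    matches = list(_NUMBERED.finditer(text))
    chunks = []
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(text)
        chunks.append({
            "section_path": f"para {current.group(1)}",
            "body": text[current.end():end].strip(),
        })
    return chunks


def pack_paragraphs(text, max_tokens=800):
    """Split at blank lines, packing paragraphs into chunks of at most
    max_tokens words. A single oversized paragraph is emitted whole."""
    chunks, current, used = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        size = len(para.split())
        if current and used + size > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Keeping the paragraph number in section_path is what lets an answer cite "para 12" the way the partners do.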
The chunker is 340 lines of Python. It is the single highest-leverage piece of code in the project. We have re-chunked twice. We will re-chunk again.
Identity, sessions, and the audit trail
Every query is bound to a lawyer's session, and every session resolves to a permitted-wall-group set at the moment the question is asked, not when the session was created. This matters because matter assignments change daily and we do not want a partner walking off a matter at 14:00 to keep seeing its chunks at 14:05.
The conflict matrix itself is a graph in Postgres. Nodes are clients, matters, and parties. Edges encode "represents," "is opposing party in," and the explicit wall instructions from the conflicts partner. A wall_group_id is a deterministic hash over the cluster of nodes the firm has chosen to isolate. When the conflicts partner decrees a new wall, a new wall_group_id appears and the affected chunks are re-tagged within minutes by a small background worker.
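"Deterministic hash over the cluster" can be read as: the same set of nodes always yields the same ID, whatever order they arrive in. A minimal sketch under that reading; the SHA-256 choice, the wg_ prefix, the 16-character truncation, and the node ID format are all assumptions.

```python
import hashlib


def wall_group_id(node_ids):
    """Derive a stable ID from the set of node IDs in an isolated cluster.

    Sorting and de-duplicating first makes the result independent of
    traversal order. Node IDs must not contain newlines, since newline
    is the separator.
    """
    canonical = "\n".join(sorted(set(node_ids)))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"wg_{digest[:16]}"
```

Because the ID depends only on membership, adding or removing a node produces a new wall_group_id, which gives the background worker an unambiguous signal for which chunks to re-tag.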
Every retrieval writes a row to an immutable audit log: who asked, what they asked, which chunks were returned, which wall groups were permitted at that moment, which model and prompt version. The log is the artefact a compliance officer reads when something goes wrong. It is also the artefact the Nederlandse Orde van Advocaten will eventually want to read; the Verordening op de advocatuur does not yet have a clean line about retrieval-augmented systems, but the existing rules on confidentiality and conflict already imply one.
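The source does not say how immutability is enforced. One common approach, shown here purely as an assumption, is to chain each record to the hash of the one before it, so that any later edit breaks verification. The field names follow the list above; the class and function names are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS = "0" * 64  # prev_hash of the very first record


@dataclass(frozen=True)
class AuditRecord:
    lawyer_id: str
    question: str
    chunk_ids: tuple
    permitted_wall_groups: tuple
    model_version: str
    prompt_version: str
    asked_at: str    # ISO 8601 timestamp
    prev_hash: str   # digest of the previous record in the log

    def digest(self) -> str:
        """SHA-256 over a canonical JSON rendering of every field."""
        payload = json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_chain(records, genesis=GENESIS) -> bool:
    """True if every record points at the digest of its predecessor."""
    expected = genesis
    for record in records:
        if record.prev_hash != expected:
            return False
        expected = record.digest()
    return True
```

A hash chain makes tampering evident rather than impossible; the database still needs append-only permissions on the table itself.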
We do not let the model do the wall enforcement. The model never sees a forbidden chunk. There is no clever prompt that says "do not mention these documents." The wall is in the SQL.
What we would do differently next time
We indexed the full archive in week one. We should not have. About 60% of the value lives in active matters and the last three years of closed matters. Starting with that subset would have moved the firm from "no answer" to "good answer" two weeks sooner, at one-fifth of the embedding cost.
We OCR'd every scanned exhibit in the first ingestion pass. Most of the scans are not what lawyers search for. A second-pass OCR triggered by zero-result queries would have caught the long tail at a fraction of the cost.
We waited too long to put a small evaluation harness in front of the team. The partners have strong opinions about retrieval quality and they are usually right. The day we shipped a one-screen "rate these ten results" tool, the feedback loop tightened by a factor of ten. Three weeks of partner ratings produced a relevance set we are still using to evaluate every change to the retrieval stack.
The smallest thing you can do today
If you are sitting on a multi-system document estate and considering a RAG agent, do not start with the embedding model. Start with one SQL query against your matter database that returns, for any given lawyer, the wall groups they are allowed to see. If that query is fast and correct, the rest is engineering. If it is slow or wrong, you have a conflict-management problem, not an AI problem, and no amount of retrieval cleverness will fix it.
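As a starting point, here is what that one query might look like, run against an in-memory SQLite database so it can be tried anywhere. The three tables and their columns are hypothetical; a real matter database will differ. The inner join is deliberate: a matter with no wall group on record contributes nothing, which is default-deny.

```python
import sqlite3

# Hypothetical schema for the sketch.
SCHEMA = """
create table matter_assignments (lawyer_id text, matter_id text);
create table matter_wall_groups (matter_id text primary key,
                                 wall_group_id text not null);
create table wall_exclusions (lawyer_id text, wall_group_id text);
"""

# Wall groups of the lawyer's matters, minus explicit exclusions.
QUERY = """
select distinct w.wall_group_id
from matter_assignments a
join matter_wall_groups w on w.matter_id = a.matter_id
where a.lawyer_id = ?
  and not exists (
      select 1 from wall_exclusions x
      where x.lawyer_id = a.lawyer_id
        and x.wall_group_id = w.wall_group_id
  )
order by w.wall_group_id
"""


def allowed_wall_groups(conn, lawyer_id):
    """Return the wall groups this lawyer may see, as a sorted list."""
    return [row[0] for row in conn.execute(QUERY, (lawyer_id,))]
```

The returned list is the shape the retrieval query expects for its permitted-wall-group array parameter.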
When we built this AI agent for the Leiden firm, the part that took the longest was not the hybrid index or the chunker. It was getting wall-group resolution down to a 12-millisecond query that the partners trusted enough to put their names on. Once that was solid, the agent shipped in five weeks.
Key takeaway
The ethical wall is a SQL predicate at retrieval time, not a filter after the fact. If it lives in the prompt or the post-processor, it is already too late.
FAQ
Why hybrid retrieval instead of dense alone?
Dense embeddings are weak at exact references. Patent work runs on exact references: article numbers, claim numbers, statute citations. Postgres full-text search (tsvector with ts_rank_cd and a custom dictionary) catches those verbatim. The two ranked lists fuse with reciprocal rank fusion.
How is the ethical wall enforced?
Every chunk carries a wall_group_id. Every session resolves to a permitted set computed live from the firm's conflict matrix. The retrieval SQL filters on the index before ranking, so forbidden chunks never reach the model or the trace.
Why Postgres pgvector instead of a dedicated vector database?
The firm already ran Postgres. Adding pgvector kept the wall predicate, the metadata join, and the dense index in one transactional store. At 240,000 documents' worth of chunks the HNSW index sits comfortably in memory on a modest box.
How do you keep the conflict matrix up to date?
A small background worker watches the conflicts system. When a wall instruction changes, affected chunks are re-tagged within minutes. Permitted-wall-group sets resolve at query time, never at session start.
What about hallucinations in legal answers?
The agent never answers without citing the chunks it used, and every answer renders the source document, page, and section path next to the claim. Lawyers verify before quoting. We treat the agent as a research aide, not an author.