RAG retrieval compared: notary deed-search at 80k pages
Three retrieval stacks sized against the same job: indexing 80,000 pages of Belgian deeds for a 19-person notary office, scored on cost, GDPR, and who handles reindex.

It is 19:30 in Leuven. A notary partner finishes restructuring the case-folder tree on the office NAS. What used to be sorted by year and act type is now grouped by client family. Twenty thousand folders moved. The RAG-powered deed-search agent the team uses every morning has no idea any of this happened. Tomorrow at 09:00 it will return citations that point to filenames that no longer exist.
This is the boring half of every RAG project. The retrieval layer you pick decides how that night ends.
We sized three RAG retrieval stacks for the same job: a 19-person Leuven notary office, roughly 80,000 historic pages (akten, statuten, hypotheekakten, fiscale aktes: deeds, articles of association, mortgage deeds, tax deeds), one Dutch-and-French deed-search agent on top. The criteria were per-tenant cost at that page count, GDPR data residency, and who gets paged out of bed when a partner reorganises the folder taxonomy.
What a notary office actually needs from retrieval
The agent has to answer questions like "welke akte regelt het vruchtgebruik op de gezinswoning van de familie X" (which deed governs the usufruct on family X's home). That means retrieving the right clause out of a 60-page deed, citing the exact page, and being defensible if anyone asks why this clause was surfaced and not another.
Three constraints shape every decision below.
Data residency is hard. Notary acts contain bank details, family relationships, tax positions, sometimes medical context attached to a will. The Belgian Nationale Kamer van Notarissen treats this with the gravity a hospital reserves for patient records. "EU region" is not the same as "data sovereignty"; we wanted both the storage and the embedding-generation step to happen on EU soil.
Reindex cadence is not zero. Notary offices reorganise. Partners retire, new clerks impose their own structure, a digital archivist decides client-family beats year-and-type. The retrieval layer has to follow without an engineer in the loop every time.
There is no DevOps team. Nineteen people, one office manager who is also IT support. Anything that requires a midnight pager rotation is a non-starter.
Option 1: Anthropic hosted file-search
The simplest path. Upload PDFs through the Files API, attach them to a search tool, let the model retrieve. One bill, one vendor, no embedding pipeline to maintain.
What it costs at 80k pages: dominated by per-search and per-stored-token pricing rather than infrastructure. At our query volume (a few hundred searches a day across nineteen people) our cost model puts the monthly bill in the low hundreds of euros once storage is included. Not cheap, not painful.
Where it breaks for this client: data processing happens primarily on Anthropic-managed infrastructure under their commercial DPA. They are good operators with serious controls, but the question the notarial chamber will ask is "where, physically, are these acts being processed when a clerk types a query?" That conversation is doable for a customer-support agent. For a notary firm whose entire business model is the sealed envelope, it is a fight we did not want to have.
Reindex is also manual. When the partner reshuffles folders, you re-upload with new metadata and remove the old objects. There is no bucket watcher. For an office without a developer on staff, that becomes a recurring billable engagement for someone like us, which the client correctly hated as a long-term shape.
Option 2: Cloudflare AutoRAG with R2 and Workers AI
Cloudflare AutoRAG wires three pieces together: R2 as the document bucket, Workers AI for embeddings, Vectorize as the index. You point it at a bucket, it reindexes when objects change, you query it through a single endpoint.
For a small EU office this is the boring-good answer. R2 storage at 80,000 PDFs is a rounding error on the monthly bill. Embeddings are a one-time cost at ingestion plus a cheap re-embed when documents change. Queries are flat-rate cheap. Our cost model puts the all-in monthly cost in the low tens of euros at this scale, roughly an order of magnitude under Option 1.
Data residency is configurable. R2 has location hints, Workers AI runs at the edge with EU pinning available, and Cloudflare's DPA covers EU SCCs cleanly. For a Belgian notary that is a defensible posture, and Cloudflare's market position makes the procurement conversation much shorter than it would be with a smaller vendor.
The reindex story is where this option wins. If the firm's DMS can sync to R2 (most modern ones can, and we can wedge an rclone job onto the NAS for the ones that cannot), then a folder reorganisation propagates automatically. The partner moves the folders at 19:30, the sync runs at 20:00, AutoRAG detects the changed objects and reindexes the delta overnight. By 09:00 the agent is returning correct citations against the new tree.
The tradeoffs are real. You are locked into Cloudflare's choice of embedding models for now (their lineup is fine, but if a stronger Dutch-tuned encoder appears next year and you want it, you wait). And debugging a misranked retrieval against a managed pipeline is harder than against your own.
Option 3: Self-hosted LlamaIndex with Postgres pgvector
Full sovereignty, full control, full ops burden. A Hetzner box in Falkenstein, Postgres with pgvector, LlamaIndex orchestrating chunking and retrieval, an embedding model of your choice (we tend to reach for a small EU-hosted one for this kind of client).
Raw cost is the lowest of the three by a wide margin. A dedicated EX44-class machine with NVMe at Hetzner runs roughly €40 a month and handles this index with capacity to spare. There are no per-query fees. At ten times the page count the cost barely moves.
GDPR is the strongest of the three. Storage, embedding, retrieval, and query logs all stay on hardware the client can point at on a map. For a notary office whose internal compliance officer wants one paragraph of explanation, "rack 17 in Falkenstein, controlled by us, no third-party processing" is the easiest paragraph to write.
The cost the client actually pays is in human time. Someone has to keep Postgres patched, rotate backups, watch for index bloat, re-run embeddings when the model updates, and ship a sync job that watches the DMS for taxonomy changes. None of that is hard. All of it is recurring. For a nineteen-person firm with no in-house engineer, that "someone" is either us on retainer or nobody. "Nobody" is the failure mode where retrieval quality silently rots over twelve months until people stop trusting the agent.
Sovereignty is a posture, not a feature. A self-hosted stack with no one paying attention to it is worse than a managed stack with a real SRE behind it. Decide who owns the pager before you decide which box runs the index.
The scorecard
| Criterion | Anthropic file-search | Cloudflare AutoRAG | Self-hosted pgvector |
|---|---|---|---|
| Monthly cost at 80k pages | Low hundreds € | Low tens € | ~€40 infra + ops time |
| EU data residency | Partial, DPA-covered | Configurable, defensible | Full, on-soil |
| Reindex on taxonomy change | Manual re-upload | Automatic via bucket sync | Custom sync job |
| Vendor lock-in | High | Medium | Low |
| Who gets paged out of bed | Anthropic | Cloudflare | You |
What we picked, and why
AutoRAG. Not because it scored highest on every axis (pgvector wins on sovereignty and cost-at-scale, hosted search wins on simplicity-of-mind), but because the failure modes were the easiest to live with for this specific office.
The decisive question was the 19:30 partner. Whatever stack you pick, the day a senior partner reorganises the case-folder tree must end with the agent still working. Anthropic's stack requires us to come in and re-upload. Self-hosted requires us to come in and run the sync. AutoRAG requires us to come in roughly never, because the bucket-watcher is the entire reindex pipeline.
Cost mattered less than it usually does. The agent is replacing an average of forty minutes a day of clerk time spent pawing through scanned PDFs. Even at the most expensive option the payback is measured in days. The variable that actually moves the decision is who owns the next folder reshuffle.
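The payback claim is simple arithmetic, and it can be sketched as below. This is an illustration only: the function name and the three inputs in the usage note are hypothetical placeholders, not figures from this engagement, and it assumes the forty minutes is an office-wide total, not a per-clerk figure.

```python
def payback_working_days(
    monthly_bill_eur: float,
    minutes_saved_per_day: float,
    clerk_cost_eur_per_hour: float,
) -> float:
    """Working days of saved clerk time needed to cover one month's bill.

    All three inputs are assumptions supplied by the caller; nothing here
    is measured data.
    """
    # Value of the clerk time saved on one working day, in euros.
    saved_per_day_eur = minutes_saved_per_day * clerk_cost_eur_per_hour / 60
    return monthly_bill_eur / saved_per_day_eur
```

With purely illustrative inputs of a €300 monthly bill, 40 minutes saved per day, and a €45 loaded hourly cost, `payback_working_days(300, 40, 45)` works out to 10 working days. Plug in your own bill and salary figures before relying on the result.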
For small EU offices with no in-house engineering, pick the retrieval layer whose reindex story survives a folder reorganisation done by someone who did not tell you about it.
One cost-runaway note
You will have seen the recurring genre of story about an AI agent that runs up an unbounded bill, most recently one that tried to scan DN42 with no upper limit. RAG retrieval pipelines are tamer than autonomous network agents, but the principle is the same: any layer with per-query billing has to be wrapped in a hard rate limit on the agent side, scoped per user and per session. We cap the notary agent at 200 retrievals per clerk per day, alarm at 80%, and hard-stop at 100%. Cheap insurance against the day a misconfigured loop turns a €40 month into a €4,000 month.
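A minimal sketch of that cap, assuming an in-memory counter keyed by clerk and calendar day. The class and exception names are hypothetical, the per-session scoping is left out for brevity, and a real deployment would likely keep the counts in a shared store so they survive restarts.

```python
from collections import defaultdict
from datetime import date


class RetrievalBudgetExceeded(RuntimeError):
    """Raised when a clerk has used up the daily retrieval budget."""


class RetrievalBudget:
    """Per-clerk, per-day retrieval counter with an alarm and a hard stop."""

    def __init__(self, cap: int = 200, alarm_percent: int = 80):
        self.cap = cap                      # hard stop (200 in the text)
        self.alarm_percent = alarm_percent  # alarm threshold (80% in the text)
        self._counts = defaultdict(int)     # (clerk_id, day) -> retrievals used
        self.alarms = []                    # stand-in for a real alerting hook

    def charge(self, clerk_id: str, today: date) -> int:
        """Record one retrieval and return the clerk's running total for today."""
        key = (clerk_id, today)
        if self._counts[key] >= self.cap:
            # Hard stop: refuse before the retrieval call is made.
            raise RetrievalBudgetExceeded(
                f"{clerk_id} reached {self.cap} retrievals on {today}"
            )
        self._counts[key] += 1
        used = self._counts[key]
        # Integer arithmetic avoids float rounding at the threshold.
        threshold = self.cap * self.alarm_percent
        crossed_now = used * 100 >= threshold > (used - 1) * 100
        if crossed_now:
            self.alarms.append((clerk_id, today, used))  # fires once per day
        return used
```

The agent would call `charge()` before every retrieval and treat `RetrievalBudgetExceeded` as a signal to stop and tell the clerk, not as something to retry. With the defaults, the alarm fires on the 160th retrieval and the 201st is refused.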
What to do tomorrow
If you are sizing a retrieval layer for a small European office, run the same three-question filter before you write any code: where does the data physically sit, what happens when the document tree gets restructured next quarter, and who answers the phone at 22:00 when retrieval starts returning wrong citations. Cost is the last question, not the first.
When we built the deed-search agent for the Leuven notary office, the thing we kept hitting was that folder-reshuffle problem. We solved it by wiring a small rclone job from their DMS into R2 so the reindex story belonged to Cloudflare and not to us. That kind of pipeline glue is most of what an honest AI-agent build looks like once the demo is over.
Key takeaway
For a small EU office, pick the retrieval layer whose reindex story survives a folder reorganisation done by someone who did not tell you about it.
FAQ
Do I have to choose just one of these three?
No, but mixing managed and self-hosted retrieval doubles the ops surface for a small office. Pick one and own it. The decision should be driven by who handles reindex, not by stack purity.
Is Anthropic file-search workable for EU compliance at all?
Yes, for many use cases, under their DPA and EU standard contractual clauses. For high-sensitivity sectors like notary or healthcare the compliance conversation gets harder, and EU-hosted alternatives are easier to defend.
What does 'reindex when a partner reorganises folders' actually involve?
Re-embedding any document whose path or metadata changed, removing stale vectors, and updating citation links so the agent cites filenames that still exist. Automatic with a bucket-watcher, manual with most file-API stacks.
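Those three steps can be sketched as a comparison of two manifests taken before and after the reshuffle. This is an illustration under assumptions, not how any of the three stacks does it internally: the manifests are plain `{path: content_hash}` dicts, and `ReindexPlan` and `plan_reindex` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ReindexPlan:
    upsert: list = field(default_factory=list)     # paths to (re-)embed
    delete: list = field(default_factory=list)     # paths whose vectors are stale
    redirects: dict = field(default_factory=dict)  # old path -> new path, for citations


def plan_reindex(before: dict[str, str], after: dict[str, str]) -> ReindexPlan:
    """Compare two {path: content_hash} manifests and classify the changes."""
    plan = ReindexPlan()

    # Index brand-new paths by content hash so moved files can be recognised.
    new_path_by_hash = {}
    for path, digest in after.items():
        if path not in before:
            new_path_by_hash.setdefault(digest, path)

    # Anything at a new path, or with changed content, needs (re-)embedding.
    for path, digest in sorted(after.items()):
        if before.get(path) != digest:
            plan.upsert.append(path)

    # Paths that vanished leave stale vectors; same-hash files get a redirect.
    for path, digest in sorted(before.items()):
        if path not in after:
            plan.delete.append(path)
            if digest in new_path_by_hash:
                plan.redirects[path] = new_path_by_hash[digest]

    return plan
```

The `redirects` map is what lets an old citation be rewritten to a filename that still exists. Whether a pure move needs a full re-embed or only a metadata update depends on the stack; this sketch only classifies the changes.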
Why not just use the embedding model from the LLM vendor for the self-hosted option?
You can. We tend to prefer an EU-hosted encoder when sovereignty is the whole reason you went self-hosted in the first place. Otherwise you reintroduce the cross-border processing question at the embedding step.