Voice agents in a notariskantoor: 920 weekly calls, one queue
A seven-partner Utrecht notariskantoor used to lose 40 minutes per partner per day on intake calls. The voice agent that took over intake now handles 920 afspraak-aanvragen a week.

The morning queue at 08:14
A kandidaat-notaris at one of the seven partner desks in our client's Utrecht office opens her queue at 08:14. Forty-one calls came in overnight. The voice agent took every one of them. Nineteen are routine afspraak-aanvragen: koopakte, hypotheek, verklaring van erfrecht. Twenty-two sit in a separate column flagged orange. Each one is a Wwft-melding boven €15.000, waiting for a kandidaat-notaris to read the transcript before the file moves into Compromis.
Six months earlier that column did not exist. The calls did. They sat on a Post-it on the intake paralegal's monitor. This is what happened in between.
This is a case study, not a tutorial. The voice agent we built is a Twilio trunk, a fine-tuned ASR, a structured-output layer, and a routing rule. The interesting part is none of those. The interesting part is the eleven weeks we spent making a thirteen-year-old Dutch notarial database, a regulator with a one-working-day clock, and a partnership of seven cautious lawyers all agree to put a machine in front of the phone.
The office, before the agent
Seven partners. Four kandidaat-notarissen. Eleven paralegals. Roughly 4,200 active dossiers, split across koop, hypotheek, familie, and ondernemingsrecht. The phone rings about 184 times on a weekday morning. Half are status enquiries, a quarter are appointment requests, and the rest is what one partner called "the long tail of confusion about what a notary actually does."
The intake desk used to be one paralegal on rotation. By 11:00 she was 40 minutes behind. By 15:00 the rotation was three deep. Calls after 17:30 dropped to voicemail and were processed the next morning, which meant a Wwft-relevant transactie spotted on Wednesday evening was first flagged on Thursday at 10:00. That is already a working day late for the KNB Centraal Digitaal Repertorium window.
They had tried two things before us. A standard IVR menu, which annoyed clients and routed 60% of calls into the wrong queue. And a generic chatbot on the website, which the licence holder shut down after a kandidaat-notaris found it suggesting clauses she had never authorised. Voice was the only channel where the actual clients (older, often Dutch-speaking with a regional accent, sometimes calling from a koopwoning bezichtiging) would talk to a machine at all.
What an appointment request actually means
Before we wrote a line of code, we spent four mornings sitting next to the intake paralegal. An afspraak-aanvraag is not "book a slot." It is, in this office:
- Identify the caller against the hash we hold in Compromis.
- Identify the dossier-type (koopakte, hypotheek, testament) within the first ninety seconds.
- Decide whether the conversation needs a Wwft-melding entry.
- Check partner and kandidaat-notaris availability separately, because Dutch notaries cannot legally delegate certain acts.
- Write into Compromis, archive the audio in the Exchange 2017 dossier-archief, send a confirmation, and, for any transactie boven €15.000, drop a structured row into a queue that a kandidaat-notaris will read within one working hour.
That last point is where the deadline comes from. The KNB Centraal Digitaal Repertorium expects certain registrations binnen één werkdag. Miss it and the office files a melding-laat. Three of those in a quarter and the dean calls. Our client had four in 2024.
Architecture, as plain as we could keep it
The voice agent sits in front of everything. A Twilio PSTN trunk lands on a Dutch-tuned ASR (Whisper-large-v3 fine-tuned on roughly 40 hours of in-house notary calls, recorded over six weeks with explicit caller consent and a retention policy). The structured-output layer is an LLM with a strict JSON schema and a refusal mode wired to a human handoff. Two backends matter: Compromis (thirteen years old, SQL behind a creaking web form) and a homegrown Exchange 2017 archief that the partners refuse to migrate because every dossier since 2017 sits in it.
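The "strict JSON schema with a refusal mode" can be pictured as a validator that either returns a typed object or returns nothing, in which case the call goes to a human. This is a minimal sketch, not the production schema. The three field names come from the routing rule further down; `StructuredIntake`, `parse_structured`, and the set of allowed dossier-types are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Illustrative set only; the office's real list of dossier-types is an assumption.
DOSSIER_TYPES = {"koopakte", "hypotheek", "testament", "huwelijksvoorwaarden"}
REQUIRED_KEYS = {"dossier_type", "transaction_value_eur", "confidence"}


@dataclass(frozen=True)
class StructuredIntake:
    dossier_type: str
    transaction_value_eur: Optional[int]
    confidence: float


def parse_structured(raw: str) -> Optional[StructuredIntake]:
    """Return a validated object, or None to signal a human handoff."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Reject missing or unexpected keys instead of guessing.
    if not isinstance(data, dict) or set(data) != REQUIRED_KEYS:
        return None
    dossier_type = data["dossier_type"]
    value = data["transaction_value_eur"]
    confidence = data["confidence"]
    if not isinstance(dossier_type, str) or dossier_type not in DOSSIER_TYPES:
        return None
    # bool is a subclass of int in Python, so compare exact types.
    if value is not None and (type(value) is not int or value < 0):
        return None
    if type(confidence) not in (int, float) or not 0.0 <= confidence <= 1.0:
        return None
    return StructuredIntake(dossier_type, value, float(confidence))
```

Anything the validator cannot vouch for (a string where an integer belongs, a confidence outside 0 to 1, an unknown dossier-type) comes back as `None` rather than being coerced.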
Why fine-tune at all. Off-the-shelf Whisper transcribed "kandidaat-notaris" as "candidate notary" or stripped the hyphen in about 38% of calls. It transcribed "huwelijksvoorwaarden" correctly less than half the time, and the regional vowel in "testament" tripped it in roughly one call in twelve. Forty hours of in-domain audio took the word-error rate on the notarial vocabulary from 11.2% to 2.4%. That was the unit of work that mattered before any of the rest of the stack earned its keep.
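For readers who have not computed it before, word-error rate is the word-level edit distance between the reference transcript and the ASR output, divided by the number of reference words. The sketch below is the textbook definition, not the evaluation harness used on this project; the function name and the lowercase-and-split tokenisation are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Transcribing "kandidaat-notaris" as "candidate notary" costs one substitution and one insertion, so a five-word reference containing it scores 2/5 = 0.4 on that utterance alone.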
We do not touch Compromis through a real API. There isn't one. We read from a read-replica of the SQL backend, and writes go through a synthetic intake-bot user that submits the same form the paralegal uses. The form has not changed since 2014. We treat that as a feature.
For the Exchange archief, audio gets stored by self-mailing the transcript and a base64 attachment to a shared mailbox keyed by dossier number. Crude. Reliable. The partners can search "jansen koopakte 2026" in Outlook and find the call within four seconds. We considered building a separate audio service. We decided the partners' ability to search their own archief from their own client mattered more than architectural purity.
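The self-mailing approach needs nothing beyond the standard library. A minimal sketch, assuming WAV audio and a subject-line convention of our own invention; the function name, the mailbox address, and the subject format are hypothetical, and actually sending the message (for example through `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_archive_mail(dossier_number: str, transcript: str,
                       audio: bytes, mailbox: str) -> EmailMessage:
    """Build a self-addressed mail carrying the transcript and the audio."""
    msg = EmailMessage()
    msg["From"] = mailbox
    msg["To"] = mailbox  # self-mailing: sender and recipient are the same
    # The dossier number in the subject is what makes the mail searchable.
    msg["Subject"] = f"[dossier {dossier_number}] intake call"
    msg.set_content(transcript)
    # Binary attachments are base64-encoded by default.
    msg.add_attachment(audio, maintype="audio", subtype="wav",
                       filename=f"{dossier_number}.wav")
    return msg
```

Because the transcript is the plain-text body, a search in Outlook matches on the words the caller actually said, not only on the subject line.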
The Wwft routing logic
This is the part that actually decides who looks at a transaction before it moves. The voice agent extracts a structured object from the call and runs a routing rule that looks like this:
# Returns one of: human_review, kandidaat_notaris_wwft, partner_intake, automated_book
def route_intake(call):
    value = call.structured.transaction_value_eur  # int | None
    confidence = call.structured.confidence  # 0.0 - 1.0
    # confidence floor calibrated on a held-out set of 1,200 calls
    if confidence < 0.78:
        return "human_review"
    if value is not None and value >= 15_000:
        # Wwft trigger: a kandidaat-notaris must eyeball the transcript
        # within one working hour, before the KNB CDR clock starts.
        return "kandidaat_notaris_wwft"
    if call.structured.dossier_type in {"testament", "huwelijksvoorwaarden"}:
        return "partner_intake"
    return "automated_book"
The confidence floor (0.78) is the single most important number in the system. Below it, no automated booking happens. The voice agent reads back what it heard, the caller confirms or corrects, and if the confirm fails twice in a row the call is transferred to the intake paralegal. About 6.4% of calls go that route. The partners are happier with that number than they were with the 100% paralegal load they had in May 2025.
The Wwft threshold is hard-coded in deterministic Python on a parsed integer, never in the model. In week three we learned why: an LLM will cheerfully reinterpret "above fifteen thousand" as "above fifteen hundred" or "above one hundred and fifty thousand" once a caller drops the cents in the wrong order. Numeric comparisons live outside the model. The model proposes a value; Python decides what to do with it.
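To make "the model proposes, Python decides" concrete, here is a minimal sketch of the deterministic half. It assumes the model hands over the amount as a Dutch-formatted string (dots for thousands, an optional comma and two digits for cents); `parse_eur`, `needs_wwft_review`, and the regular expression are hypothetical and not taken from the production code.

```python
import re
from typing import Optional

WWFT_THRESHOLD_EUR = 15_000

# Either grouped thousands ("15.500") or plain digits ("15500"),
# optionally followed by a comma and exactly two cent digits.
_AMOUNT = re.compile(r"^(?:\d{1,3}(?:\.\d{3})+|\d+)(?:,\d{2})?$")


def parse_eur(raw: str) -> Optional[int]:
    """Return whole euros, or None if the string is not an unambiguous amount."""
    text = raw.replace("€", "").strip()
    if not _AMOUNT.match(text):
        return None
    whole = text.split(",")[0].replace(".", "")
    return int(whole)


def needs_wwft_review(raw: str) -> bool:
    """Apply the threshold on a parsed integer, never on free text."""
    value = parse_eur(raw)
    return value is not None and value >= WWFT_THRESHOLD_EUR
```

A string such as "15,500" (English grouping) is rejected rather than guessed at. In a real deployment an unparseable amount would more sensibly trigger a read-back or a human review than fall through silently.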
The twenty-five second SLA
From the moment a caller utters a transactiebedrag above €15.000 to the moment a row appears in the kandidaat-notaris queue, we have twenty-five seconds. That budget breaks down roughly as follows:
- ~6s: streaming ASR catches the number and the model emits the structured field.
- ~3s: a second pass re-asks the caller to confirm the figure aloud.
- ~4s: the routing rule runs, the queue row is written, and the Compromis intake-bot user begins the form submission.
- ~5s: confirmation email goes out and the audio archive job starts.
- ~7s: slack for retries, network jitter, and the occasional Exchange hiccup.
The median routing time over the last 30 days is 17.4s. The 95th percentile is 24.1s. The one breach we logged in May was caused by a Compromis read-replica that had fallen 90 seconds behind the primary. We added a heartbeat check and a fail-open path that posts to the kandidaat-notaris queue first, Compromis second, and the audio archive third. Order matters when the deadline is regulatory.
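The ordering idea fits in a few lines. A minimal sketch, assuming each destination is a plain callable that raises on failure; `dispatch` and the sink names are hypothetical, and the heartbeat check itself is not shown.

```python
from typing import Callable, Dict, List, Tuple

Sink = Tuple[str, Callable[[Dict], None]]


def dispatch(row: Dict, sinks: List[Sink]) -> List[str]:
    """Write the row to every sink in priority order.

    A failing sink never blocks the sinks after it. The names of the
    failed sinks are returned so the caller can schedule retries.
    """
    failed: List[str] = []
    for name, write in sinks:
        try:
            write(row)
        except Exception:
            failed.append(name)
    return failed
```

With the sinks listed as queue, Compromis, archive, a lagging Compromis ends up in the retry list while the kandidaat-notaris still sees the row. Retries are only safe if each write is idempotent.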
What broke, and what we learned
The factor-100 misread. A caller said "vijftienduizend, vijfhonderd" (meaning €15.500). The ASR dropped the comma. The structured-output pass read it as €1.550.000. The routing rule did the right thing because the value was still above €15.000 and went to the kandidaat-notaris queue, but the confirmation email quoted the wrong number to the client. We now read transactiebedragen back to the caller digit by digit. Eight extra seconds per affected call. Zero misreads since.
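The digit-by-digit read-back is simple to sketch. The version below assumes the amount has already been parsed to whole euros; `read_back_digits` is a hypothetical name, and the real prompt wording is not shown.

```python
DUTCH_DIGITS = ["nul", "een", "twee", "drie", "vier",
                "vijf", "zes", "zeven", "acht", "negen"]


def read_back_digits(amount_eur: int) -> str:
    """Spell a whole-euro amount one digit at a time, in Dutch."""
    if amount_eur < 0:
        raise ValueError("amount must be non-negative")
    return " ".join(DUTCH_DIGITS[int(digit)] for digit in str(amount_eur))
```

Here `read_back_digits(15500)` yields "een vijf vijf nul nul", while a misread €1.550.000 would come back as seven digits, which is hard for a caller to miss.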
The accent misclassification. A caller from Limburg said "testament" with a vowel the ASR transcribed as "statement," which the structured pass classified as the wrong dossier-type. We retrained the dossier classifier on a corpus of regional Dutch recordings we sourced from the partners' own old voicemails, with consent and a retention policy. Misclassification on regional accents fell from 4.1% to 0.9%.
The certificate rotation. Our PSTN provider rotated certificates without notice and dropped 11 minutes of calls one Tuesday evening. The lesson, unromantic: an agentic system is only as reliable as the dumbest infrastructure layer underneath it. Idempotent retries, structured logs, and a human dashboard that shows you when calls stop arriving (not just when they fail) are non-negotiable. Anthropic's published guidance on building effective agents hits the same point from a different angle: the failure modes that bite in production almost never sit inside the model.
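Alerting on silence rather than on errors comes down to one comparison. A minimal sketch, assuming the timestamp of the last inbound call is tracked somewhere; the function name and the threshold are hypothetical, and in practice the allowed silence would differ between a weekday morning and the middle of the night.

```python
from datetime import datetime, timedelta
from typing import Optional


def calls_have_stopped(last_call_at: Optional[datetime],
                       now: datetime,
                       max_silence: timedelta) -> bool:
    """True when no call has arrived within the allowed silence window."""
    if last_call_at is None:
        # No call seen at all since start-up: treat it as an alarm.
        return True
    return now - last_call_at > max_silence
```

An eleven-minute gap on a busy evening trips a five-minute threshold long before anyone checks the error logs, which stay empty when calls simply never arrive.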
Why the partners signed off
Two of the seven partners were openly skeptical throughout the scoping. Their objection was not technical. It was that clients would feel fobbed off onto a machine during a deeply personal phone call (a deceased parent's testament, a divorce), and that the office would lose trust that took decades to build. We agreed. The voice agent never handles testaments or huwelijksvoorwaarden end-to-end. It does the identification, confirms the dossier-type, and transfers. The partner_intake branch in the routing snippet exists because those two partners said no to anything else. We treat that branch as a contract: when the firm adds a new dossier-type, both partners review whether it belongs in that branch.
Numbers, after twelve months
The figures we cleared with the office before publishing:
- 920 weekly afspraak-aanvragen processed end-to-end through the agent.
- 92.4% complete-the-booking rate without human handoff.
- 6.4% transferred to a paralegal on the agent's own confidence floor.
- 1.2% caller-initiated handoff ("kan ik gewoon iemand spreken").
- 17.4s median Wwft routing latency. 24.1s p95.
- 0 melding-laat filings with the KNB in the last five months. The 2024 baseline was four per year.
- The intake paralegal has been reassigned to dossier-completion work. The partners report a 31% drop in the 17:30 voicemail backlog.
If the regulatory clock is one working day, your routing budget is twenty-five seconds. Build for the deadline that bites, not for the median.
The smallest thing you can do today
If you run a regulated practice with a phone that still rings: spend tomorrow morning sitting next to whoever picks it up. Time every call from "hallo" to "tot ziens." Tag each one with the legal threshold it touches. You will end the morning with a routing matrix that no consultant could have invented for you. That matrix is the entire spec. When we built the voice agent for this Utrecht office, the unglamorous part was those four mornings on the intake bench. The model is interchangeable. The matrix is not.
FAQ
How does the voice agent handle clients with strong regional Dutch accents?
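We fine-tuned Whisper on roughly 40 hours of in-house notary calls covering several regional accents, and retrained the dossier classifier on regional recordings from the partners' old voicemails. Below the 0.78 confidence floor the agent reads back what it heard, and the calls that still fail confirmation (about 6.4%) are transferred to a paralegal.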
What happens if Compromis is offline when a call lands?
The agent posts to the kandidaat-notaris queue first, Compromis second, and the audio archive third. The regulatory deadline gets met even if the legacy backend lags behind.
Why didn't you replace Compromis instead of integrating with it?
Replacing a thirteen-year-old notarial database with 4,200 live dossiers is a two-year project. Integrating with the SQL backend and the existing intake form took eleven weeks.
Is the call audio archived in line with AVG and KNB rules?
Yes. Audio is stored in the office's own Exchange archief, keyed by dossier number, with a retention policy aligned to KNB requirements. Callers are notified at the start of every call.