
Voice agents in Dutch home care: an Apeldoorn case study

On a Tuesday at 7:42 the voicemail count is 84 and the coordinator has not had coffee yet. This is the voice agent we built to handle that inbox, end to end.

Jacob Molkenboer · Founder · A Brand New Company · 16 Apr 2025 · 10 min


The 7:42 inbox

On Tuesday morning at 7:42 the coordinator at a 42-person home-care cooperative in Apeldoorn opens her shared phone inbox. There are 84 voicemails from the night. Most are family members of clients. Twenty-three are about medication. Four are in Turkish. Two are about a fall. One is a man who just wants to confirm that the nurse who comes at 11:00 is the same one as yesterday.

She has 50 minutes before the morning standup. She also has her own caseload of 14 home visits today.

This is the shift a voice agent was asked to make smaller.

Why a voice agent and not a chatbot

The first conversation with the cooperative's board lasted forty minutes. They had been pitched a chatbot twice before, by two different vendors. Both pitches died on the same sentence: "the families won't use it."

They were right. The families in question are mostly between 58 and 81. They call. They leave messages. A handful of the Turkish-speaking families use WhatsApp voice notes. None of them are going to type a question into a web form at 22:30 about whether Mevrouw De Boer can take her metoprolol with her diuretic.

A voice agent meets people where they already are: on the phone. It picks up. It understands the question. It logs the answer somewhere a human can audit it in the morning. That is the entire product.

The 90-second contract with Nedap Ons

The cooperative runs on Nedap Ons, which is the de facto electronic dossier for home-care organisations in the Netherlands. Anything the agent does has to land in Nedap Ons, in the right client's dossier, in the right structured field, within 90 seconds. That number was the coordinator's. It was the longest delay she would tolerate before she would call a family back herself.

90 seconds is not a technical constraint. It is a trust constraint. If the summary takes three minutes to appear in the dossier, the team stops trusting the system, and they start calling families back the old way "just in case." Then the agent becomes overhead, not relief.

We hit the SLA with a four-step pipeline:

phone → STT (Dutch + Turkish) → intent + entity extraction → Nedap Ons API write
   ~6s        ~8s                       ~3s                          ~2s

Each step has a knob and a failure mode. The STT runs on a self-hosted Whisper variant, batched per call, with the audio chunked at sentence boundaries rather than fixed intervals. Fixed-interval chunking cost us 4 to 9 seconds on every call because the model had to wait out the chunk. Sentence-boundary detection cut that to ~6 seconds median. The intent layer runs on a small classifier with a hard list of 41 medication names that the cooperative actually prescribes, plus the staff roster and the standard visit times. Constraining the vocabulary at this layer is what gets the false-write rate low enough to use at all.
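A minimal sketch of what constraining the vocabulary can look like, assuming fuzzy matching against a closed list. The list below is a four-name stand-in for the cooperative's 41 names (only metoprolol and metronidazol come from this article), and `match_medication` and its 0.8 cutoff are illustrative, not the production classifier.

```python
import difflib
from typing import Optional

# Stand-in for the closed list of 41 names. Assumption: only metoprolol and
# metronidazol are taken from the article; the other entries are examples.
KNOWN_MEDICATIONS = ["metoprolol", "metronidazol", "furosemide", "paracetamol"]


def match_medication(heard: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known medication name, or None if nothing is close.

    Words outside the closed list are rejected rather than guessed, so a
    misheard word can never become a new medication in the dossier.
    """
    candidates = difflib.get_close_matches(
        heard.lower().strip(), KNOWN_MEDICATIONS, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None
```

The useful property is the `None` branch: an unrecognised word produces no structured write at all, which is safer than a plausible-looking wrong one.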

The remaining 71 seconds is buffer. We use it. Roughly 4% of calls bounce back through a second extraction pass because the first one returned low confidence on the medication name. Confidence below 0.86 triggers a re-prompt or a clarification turn ("did you say metoprolol or metronidazol?"), which adds 12 to 18 seconds but cuts the false-write rate by an order of magnitude.
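The budget arithmetic and the confidence gate can be sketched as below. The step timings, the 90-second limit, the 0.86 threshold and the 18-second worst case are the figures from this article; the function names and the idea of counting clarification turns against the budget are our simplification for illustration.

```python
SLA_SECONDS = 90
# Approximate per-step timings from the pipeline above.
STEP_SECONDS = {"phone": 6, "stt": 8, "extraction": 3, "ons_write": 2}
CONFIDENCE_THRESHOLD = 0.86
CLARIFICATION_WORST_CASE_SECONDS = 18  # upper end of the 12 to 18 second range


def buffer_seconds() -> int:
    """Seconds left in the SLA once every pipeline step has run once."""
    return SLA_SECONDS - sum(STEP_SECONDS.values())


def next_action(confidence: float) -> str:
    """Write to the dossier only when the extraction is confident enough."""
    return "write" if confidence >= CONFIDENCE_THRESHOLD else "clarify"


def fits_in_sla(clarification_turns: int) -> bool:
    """Check whether a number of worst-case clarification turns still fits."""
    total = sum(STEP_SECONDS.values())
    total += clarification_turns * CLARIFICATION_WORST_CASE_SECONDS
    return total <= SLA_SECONDS
```

Under these numbers the pipeline uses 19 seconds, which leaves the 71-second buffer, and a single clarification turn fits comfortably inside it.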

The Nedap Ons write itself is two operations: a transcript-attached note on the client's timeline, and a structured field update if the call carried an actionable entity (a medication name, a visit time, a recorded fall). The structured update is what makes the call searchable later. Free-text transcripts in a dossier are useful at 8:00 the next morning and forgotten by 8:00 the following Monday. A flagged_medication field with a timestamp is what the medication review on Friday actually queries.
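To make the two-operation write concrete, here is a sketch that only builds the two payloads. It does not call Nedap Ons: the `CallResult` type, the `kind` values and the dictionary shapes are hypothetical and stand in for whatever the real API expects. Only the `flagged_medication` field name comes from the text above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CallResult:
    """Hypothetical result of one processed call."""
    client_id: str
    transcript: str
    ended_at: datetime
    medication: Optional[str] = None  # actionable entity, if the call had one


def build_writes(call: CallResult) -> list:
    """Return the writes for one call: always a note, sometimes a field."""
    timestamp = call.ended_at.isoformat()
    writes = [{
        "kind": "timeline_note",
        "client_id": call.client_id,
        "text": call.transcript,
        "timestamp": timestamp,
    }]
    # The structured update exists only when there is something to query later.
    if call.medication is not None:
        writes.append({
            "kind": "structured_field",
            "client_id": call.client_id,
            "field": "flagged_medication",
            "value": call.medication,
            "timestamp": timestamp,
        })
    return writes
```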

That is the SLA the team cares about. Not "how fast does the AI respond," but "how long before I can see in my EHR that this call happened."

Routing for Wkkgz

The Wet kwaliteit, klachten en geschillen zorg (Wkkgz) is the Dutch law that governs quality, complaints and disputes in healthcare. For our purposes it does one important thing: it says that certain categories of family communication must be reviewable by a qualified care professional, and that complaints have a formal handling track with statutory deadlines.

The agent cannot answer those. We do not want it to.

So the second job of the pipeline, running in parallel with the dossier write, is a classifier that looks at the transcript and decides whether the call contains:

a complaint, even a soft one ("ze was twee uur te laat")
a clinical question that needs a registered nurse's judgement
a safeguarding signal (any mention of a fall, confusion, refused medication, or aggression)
a request to change the care plan

If any of those fire, the agent stops being the answerer. It becomes a router. The call summary lands in the dossier, but it also pages the wijkverpleegkundige on call via a separate channel (in our case, a Teams webhook into a "Wkkgz" channel, with a 15-minute SLA timer that the on-call nurse has to acknowledge).
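The routing rule itself is small. A sketch, assuming the classifier has already reduced the transcript to a set of labels: the label names and the shape of the page are hypothetical, and the code builds the message without sending it anywhere. The 15-minute acknowledgement window is the one described above.

```python
from datetime import datetime, timedelta

# Hypothetical label names for the four categories listed above.
ROUTE_LABELS = frozenset(
    {"complaint", "clinical_question", "safeguarding", "care_plan_change"}
)
ACK_WINDOW = timedelta(minutes=15)


def route(labels: set, now: datetime) -> dict:
    """Decide whether the agent may answer or must hand over to a nurse."""
    hits = sorted(ROUTE_LABELS & labels)
    if not hits:
        return {"mode": "answer", "page": None}
    # Any single hit is enough: the agent becomes a router.
    return {
        "mode": "route",
        "page": {
            "text": "Wkkgz review needed: " + ", ".join(hits),
            "ack_deadline": (now + ACK_WINDOW).isoformat(),
        },
    }
```

Note that the rule is an "any", not a score. One soft complaint in an otherwise routine call is enough to route it.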

The agent never tries to talk a family member out of a complaint. It does not say "I understand your frustration." It says, in Dutch: "I will make sure a nurse calls you back today. Is the number you are calling from the right one?"

Warning

If you let a voice agent answer complaint-shaped calls in healthcare, you are not building automation. You are building a liability. Route, do not resolve.

Turkish is not a translation problem

About 11% of the cooperative's clients are first- or second-generation Turkish-Dutch. The families speak a mix. The grandmother speaks Turkish. The daughter who actually makes the calls speaks Dutch with Turkish loan words and switches mid-sentence. The son in the background speaks Dutch fluently and Turkish for emotional emphasis.

We tried two approaches.

The first was to detect language at the start of the call and route to a Turkish-only flow. It failed within a week. Half of the Turkish-Dutch families code-switch in the same sentence ("hij heeft zijn ilaç niet ingenomen"). A monolingual flow either missed the medication name or asked the family to repeat themselves, which they found patronising.

The second approach, which is what we run now, is a single pipeline that handles mixed Dutch-Turkish input end to end. The speech-to-text model is a Whisper variant fine-tuned on Dutch healthcare audio with Turkish medical vocabulary added. The intent layer is language-agnostic. The reply is generated in whichever language the caller used last in their utterance.
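The reply-language rule is simple enough to write down. This sketch assumes the speech-to-text step returns the utterance as ordered segments, each tagged with a language code. That segment format is our assumption for illustration, not a documented output of the model.

```python
def reply_language(segments: list, default: str = "nl") -> str:
    """Pick the language of the last non-empty segment of the utterance.

    segments: (text, language_code) pairs in spoken order.
    """
    for text, lang in reversed(segments):
        if text.strip():
            return lang
    # Nothing usable was heard: fall back to Dutch.
    return default
```

For the example sentence above, "hij heeft zijn ilaç niet ingenomen", the last segment is Dutch, so the reply is in Dutch even though the medication word was Turkish.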

We also keep a separate text-to-speech voice for each language with a deliberate accent match. The Turkish voice is not Anatolian Turkish. It is Turkish as it is spoken in the Netherlands by the generation that arrived as gastarbeiders and the children who grew up here. The families noticed in the first week. Two of them asked who recorded the voice.

This is the change that moved Turkish-speaking family satisfaction from "they call back" to "they don't have to."

What the agent does not do

This is the slide we keep showing the team during onboarding, because it is the slide that gets the agent trusted:

It does not give medication advice. It logs the question and pages the nurse.
It does not confirm visit times unilaterally. Visit times come from the planning module of Nedap Ons, and the agent reads them, but the source of truth is the planner.
It does not handle new client intake. New families talk to a human.
It does not try to handle a fall or a confused-elderly-person call itself. It triggers the safeguarding flag and warm-transfers to the on-call number.

Warm transfer here means the agent stays on the line until a human picks up. If nobody picks up within 45 seconds, the agent escalates to a second number, then a third. Every step is logged with the timestamp the cooperative needs for its audit trail. In healthcare, the trail matters more than the latency.
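The escalation chain can be sketched as a loop over numbers with a log entry per attempt. The `dial` callable is a hypothetical stand-in for the telephony layer: it is assumed to ring a number for up to the given number of seconds and return whether a human picked up.

```python
from datetime import datetime, timezone
from typing import Callable, Optional

RING_TIMEOUT_SECONDS = 45


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


def warm_transfer(
    numbers: list,
    dial: Callable[[str, int], bool],
    log: list,
    clock: Callable[[], datetime] = _utc_now,
) -> Optional[str]:
    """Try each number in order and return the one that answered, if any."""
    for number in numbers:
        answered = dial(number, RING_TIMEOUT_SECONDS)
        # Every attempt is logged, answered or not, for the audit trail.
        log.append({
            "number": number,
            "answered": answered,
            "at": clock().isoformat(),
        })
        if answered:
            return number
    return None
```

Passing the log in as an argument keeps the audit trail under the caller's control, so a failed chain still leaves a complete record.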

The point of listing what the agent does not do is that it is the only honest way to describe what it does. "Voice agent for home care" is too broad. "Voice agent that handles routine family callbacks about medication, visit logistics, and care plan questions, and routes everything else to a human in under 90 seconds" is the product.

The news cycle this week is full of stories about AI agents that did things they were never supposed to do, including one that took unauthorised actions inside a Fedora environment. The lesson for healthcare is not "do not deploy agents." It is the opposite: deploy them, but write the deny-list before the allow-list, and make the deny-list non-negotiable in code, not in the prompt.
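What "in code, not in the prompt" means in practice is a guard that every action passes through before it runs. A minimal sketch: the action names are hypothetical, chosen to mirror the does-not-do list above, and `authorize` is not the production guard.

```python
# Hypothetical action names mirroring the does-not-do list.
DENIED_ACTIONS = frozenset({
    "give_medication_advice",
    "confirm_visit_time",
    "intake_new_client",
    "change_care_plan",
})
ALLOWED_ACTIONS = frozenset({
    "log_question",
    "read_visit_time",
    "page_nurse",
    "warm_transfer",
})


class ActionDenied(Exception):
    """Raised when the agent attempts an action it may not perform."""


def authorize(action: str) -> str:
    """Check the deny-list first, then the allow-list, then let it through."""
    if action in DENIED_ACTIONS:
        raise ActionDenied(f"{action} is on the deny-list")
    # Unknown actions are refused too: silence is not permission.
    if action not in ALLOWED_ACTIONS:
        raise ActionDenied(f"{action} is not on the allow-list")
    return action
```

Because the check lives outside the model, no phrasing of a caller's request can talk the agent past it.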

Numbers after five months

We turned the agent on in January, after eight weeks of shadow-mode operation in which the agent transcribed and classified every call but did not respond. The coordinator reviewed a sample of 30 transcripts a day and corrected the classifier. Most of the errors in that period were "this is a complaint" being misread as "this is a clinical question." By week six the classifier agreed with the coordinator on 94% of cases, which was the threshold the board set for going live.
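The go-live check is a plain agreement rate between two lists of labels. A sketch, with the 94% threshold from the board and hypothetical function names:

```python
GO_LIVE_THRESHOLD = 0.94


def agreement_rate(classifier_labels: list, coordinator_labels: list) -> float:
    """Fraction of reviewed calls where classifier and coordinator agree."""
    if len(classifier_labels) != len(coordinator_labels):
        raise ValueError("both lists must cover the same calls")
    if not classifier_labels:
        raise ValueError("no reviewed calls")
    matches = sum(a == b for a, b in zip(classifier_labels, coordinator_labels))
    return matches / len(classifier_labels)


def ready_to_go_live(rate: float) -> bool:
    return rate >= GO_LIVE_THRESHOLD
```

On a single day's sample of 30 transcripts, two disagreements give 28/30, about 93%, which is just under the bar. That is one reason to judge the threshold over a week of samples rather than a day.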

As of last week:

1,640 calls handled per week on average (up from a soft launch of 380/week)
71% closed by the agent without human intervention
23% routed to the on-call nurse via the Wkkgz channel
6% escalated to the coordinator for non-clinical reasons (mostly billing)
Median time from call end to dossier write: 47 seconds
95th-percentile time: 81 seconds (under our 90-second contract)
Coordinator inbox at 7:42: 11 voicemails on average, down from 84

The coordinator now does morning standup with her coffee.

We are not going to publish a percentage for "family satisfaction" because we do not have a clean way to measure it yet. What we do have is that the cooperative renewed the contract in May and added a second site.
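The two latency figures in the list above are cheap to recompute from raw call logs. A sketch using the nearest-rank method for the percentile; the function names are ours, and the input is assumed to be a list of call-end-to-dossier-write times in seconds.

```python
import math
import statistics

SLA_SECONDS = 90


def percentile(values: list, p: int) -> float:
    """Nearest-rank percentile: the smallest value covering p% of the data."""
    if not values:
        raise ValueError("no latencies recorded")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_report(latencies: list) -> dict:
    """Median, 95th percentile, and whether the tail is inside the SLA."""
    p95 = percentile(latencies, 95)
    return {
        "median": statistics.median(latencies),
        "p95": p95,
        "within_sla": p95 <= SLA_SECONDS,
    }
```

The check is made against the 95th percentile rather than the median, because the slow calls are the ones the team notices.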

The build, in one diagram

The whole system is six boxes:

[Phone (KPN)]
    |
    v
[Voice gateway / SIP]
    |
    v
[STT (NL + TR mixed)]
    |
    v
[Intent + entity layer] -----> [Wkkgz classifier] --> [Teams webhook to on-call nurse]
    |
    v
[Reply generator (NL or TR)]
    |
    v
[Nedap Ons API write]

There is nothing here that requires a research budget. The interesting work was not in the model layer. It was in the 90-second SLA, the Wkkgz routing logic, and the decision to treat Dutch-Turkish code-switching as a single language rather than two.

When we built the voice agent for the Apeldoorn cooperative, the thing we kept running into was the gap between what a voice agent can plausibly say and what a healthcare team will tolerate it saying. We ended up solving it by writing the "does not do" list before the "does" list, and treating the Wkkgz router as the most important component in the stack. That is the same shape of work that goes into our other AI agents.

The smallest thing you could do today, if you run a care team or a similar after-hours inbox: pull last week's voicemails into a spreadsheet, tag each one with "answer," "route," or "complaint," and count. Most teams discover that the "answer" tag covers about two thirds of the volume. That is the part the agent should take. The rest is still a human's job, and that is the point.
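If the spreadsheet is exported as CSV, the count takes a few lines. This sketch assumes a column named `tag`; that column name is our choice for illustration, so adjust it to your own export.

```python
import csv
import io
from collections import Counter

TAGS = ("answer", "route", "complaint")


def tally(csv_text: str) -> dict:
    """Return each tag's share of the tagged voicemails."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["tag"].strip().lower() for row in reader)
    total = sum(counts[tag] for tag in TAGS)
    if total == 0:
        return {}
    return {tag: counts[tag] / total for tag in TAGS}
```

A usage example with six made-up rows:

```python
sample = "caller,tag\na,answer\nb,answer\nc,route\nd,answer\ne,complaint\nf,answer\n"
shares = tally(sample)
print(round(shares["answer"], 2))
```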

Key takeaway

In healthcare voice agents, write the does-not-do list before the does list, and put the deny-list in code, not in the prompt.

FAQ

Why a voice agent instead of a chatbot for home-care families?

The family members are mostly between 58 and 81. They call; they don't type. A voice agent meets them on the channel they already use. Chatbot pitches to the same cooperative had failed twice.

What is the 90-second SLA and why does it matter?

It is the maximum time between a family call ending and the summary appearing in Nedap Ons. Past 90 seconds the team stops trusting the agent and starts calling families back manually.

How does the agent handle complaints under Wkkgz?

It does not. A parallel classifier detects complaints, safeguarding signals and clinical questions, and pages the on-call district nurse via a Teams webhook with a 15-minute acknowledgement timer.

How is Turkish handled when families code-switch with Dutch?

A single pipeline handles mixed Dutch-Turkish input end to end. The speech-to-text model is a Whisper variant fine-tuned on Dutch healthcare audio with Turkish medical vocabulary added, and the agent replies in whichever language the caller used last.

Can the agent change a client's care plan or confirm visit times?

No. Visit times are read from the Nedap Ons planner but the planner is the source of truth. Care plan changes and new-client intake always go to a human.
