
Voice agents for a Dutch physio chain: the build playbook

A 24-person Maastricht physio chain was drowning in 1,820 weekly reschedule calls in Dutch and Limburgs. Here is the voice-agent build that pulled the front desk back above water.

Jacob Molkenboer · Founder · A Brand New Company · 28 Aug 2025 · 9 min


The first ring at 08:47

The receptionist at the Wyck branch picked up at 08:47 on a Tuesday. By 09:03 she had taken seven calls. Five were the same conversation. “Ik moet woensdag verzetten, kan vrijdag?” Two were a vergoeding question she couldn’t answer without opening Intramed, finding the cover sheet, and reading three lines of fine print to someone who was already late for work.

The chain has four locations across Maastricht and Heerlen. Twenty-four physios. One front desk per branch, sometimes two during the morning rush. Phone logs from the SIP trunk told us they were taking 1,820 inbound calls a week. About 63% were straight reschedules. 22% were billing or vergoeding questions. 11% were new bookings. The remaining 4% included a small but operationally lethal category: people whose pain had spiked overnight and who were trying to talk their way into a same-day slot.

Why voice, not a chat widget

The practice already had a Calendly-style web booker. It handled new bookings fine. It did almost nothing for the call volume, because the people calling were not the same population. The reschedulers were older. They had a relationship with a specific fysio. They wanted to say “ik moet verzetten” and be heard by a Dutch voice, not type it into a chat field on a website they had never visited.

A reschedule is also a worse fit for a chat agent than people assume. The caller usually does not know their next-available preference until they hear the options. A voice loop (“woensdag om half drie, of donderdag om elf uur?”) closes in under twenty seconds. A chat thread of the same exchange runs to four minutes.

Reading the EPD without an API you can trust

The chain runs Intramed. It has been the standard Dutch physio EPD for over a decade, and the install at this practice is twelve years old. There is an API. It is not the API you want.

Intramed exposes a SOAP service that is enough to read agendas and write back single appointment changes, but it does not give you the patient-side context a voice agent needs: which fysio they normally see, which insurer they are with, whether their vergoeding pool has been exhausted, whether there is a no-show flag. All of that lives in screens the front desk reads, not in the SOAP envelope.

We had two options. Add a screen-scraping layer over the Windows client. Or build a read-through cache that pulls overnight and stays close-to-fresh during the day. We picked the cache.

// Read-through cache: entries older than 4 hours are re-read from Intramed.
// Writes go straight to Intramed SOAP; the cache entry is invalidated on write.
// `cache`, `intramed`, and `staleBy` are project helpers defined elsewhere.

type PatientCtx = {
  patient_id: string;
  primary_therapist_id: string;
  insurer: 'CZ' | 'VGZ' | 'Zilveren Kruis' | 'Menzis' | 'other';
  vergoeding_remaining_sessions: number | null; // null when unknown
  no_show_flag: boolean;
  last_seen_at: string; // ISO
};

async function getPatientCtx(phone: string): Promise<PatientCtx | null> {
  // Fast path: serve from cache while the entry is younger than 4 hours.
  const cached = await cache.get(`ctx:${phone}`);
  if (cached && !staleBy(cached, '4h')) return cached;
  // Slow path: live read, then store with a 24-hour hard expiry.
  const fresh = await intramed.readPatientByPhone(phone);
  if (!fresh) return null; // unknown caller: no context to cache
  await cache.set(`ctx:${phone}`, fresh, '24h');
  return fresh;
}

The cache pattern keeps the voice path off the slow SOAP endpoint. A cached read is under 40ms. A live SOAP read is between 1.4 and 3.1 seconds, which is enough to break the conversational rhythm. Writes (reschedules) still go straight through. We do not cache writes.
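The invalidate-on-write rule is the part that is easiest to get wrong, so here is a minimal sketch of it. It assumes an in-memory Map in place of the real cache, and the names ReadThroughCache, load, and writeThrough are illustrative, not taken from the production code.

```typescript
// Illustrative sketch only: an in-memory stand-in for the cache layer.
type Entry<T> = { value: T; fetchedAt: number };

class ReadThroughCache<T> {
  private store = new Map<string, Entry<T>>();
  private maxAgeMs: number;
  private now: () => number;

  // `now` is injectable so the freshness rule can be exercised without waiting.
  constructor(maxAgeMs: number, now: () => number = () => Date.now()) {
    this.maxAgeMs = maxAgeMs;
    this.now = now;
  }

  // Serve a fresh entry from memory; otherwise call the slow loader once.
  async get(key: string, load: () => Promise<T | null>): Promise<T | null> {
    const hit = this.store.get(key);
    if (hit !== undefined && this.now() - hit.fetchedAt < this.maxAgeMs) {
      return hit.value;
    }
    const fresh = await load();
    if (fresh === null) return null;
    this.store.set(key, { value: fresh, fetchedAt: this.now() });
    return fresh;
  }

  // Run the write against the system of record first, then drop the cached
  // entry so the next read reloads it. A failed write leaves the cache as-is.
  async writeThrough(key: string, write: () => Promise<void>): Promise<void> {
    await write();
    this.store.delete(key);
  }
}
```

Dropping the entry rather than patching it in place means the next read pays one slow lookup, but the cache can never hold a value the system of record disagrees with.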

Dutch and Limburgs in the same turn

The first thing we discovered in testing was that the off-the-shelf Dutch STT models could not hold up when a caller switched into Limburgs mid-sentence. “Ich höb mörge gein tied” is not “ik heb morgen geen tijd,” and a model trained on Standaardnederlands hears it as gibberish, then guesses.

We solved this in two ways. First, we ran a small bilingual fine-tune on a Whisper-large variant using around 18 hours of Limburgs audio from the regional broadcaster L1, plus an internal corpus of consented call recordings (the chain’s DPO signed off on the legal paperwork before any audio was kept). Second, we widened the agent’s tolerance for ambiguous transcriptions: instead of acting on the first STT pass, the agent confirms the intent back in plain Dutch (“u wilt uw afspraak van woensdag verzetten, klopt dat?”) and only proceeds on a yes.
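The confirm-before-acting rule can be sketched as a small decision function. This is a minimal sketch, not the production dialogue logic: the Intent and Step types, the list of affirmative replies, and the re-ask prompt are all assumptions made for illustration. Only the confirmation sentence comes from the build described above.

```typescript
// Illustrative sketch: never act on the first STT pass, confirm first.
type Intent = { kind: 'reschedule'; fromDay: string } | { kind: 'unknown' };

type Step =
  | { action: 'confirm'; prompt: string }
  | { action: 'proceed'; intent: Intent }
  | { action: 'reask'; prompt: string };

// Assumed list of affirmative replies; a real build would need a wider set.
const AFFIRMATIVE = new Set(['ja', 'jazeker', 'klopt', 'dat klopt', 'ja dat klopt']);

function isYes(reply: string): boolean {
  const normalised = reply.toLowerCase().replace(/[.,!?]/g, '').trim();
  return AFFIRMATIVE.has(normalised);
}

// `reply` is null until the caller has answered the confirmation prompt.
function nextStep(intent: Intent, reply: string | null): Step {
  // Hypothetical re-ask wording, used whenever we are not sure what was said.
  const reask: Step = { action: 'reask', prompt: 'Kunt u dat nog een keer zeggen?' };
  if (intent.kind === 'unknown') return reask;
  if (reply === null) {
    return {
      action: 'confirm',
      prompt: `u wilt uw afspraak van ${intent.fromDay} verzetten, klopt dat?`,
    };
  }
  // Anything other than a clear yes sends the caller back one step.
  return isYes(reply) ? { action: 'proceed', intent } : reask;
}
```

The asymmetry is deliberate in this sketch: a misheard "yes" changes an appointment, while a misheard "no" only costs the caller one more turn.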

Warning

If you fine-tune on real call audio, the GDPR paper trail is the project. We did not start any training until the practice had a signed verwerkersovereenkomst, a retention schedule, and a deletion procedure the praktijkmanager could run herself. Skip that and the regulator will end your pilot.

The 90-second code-rood path

The riskiest category of call was the smallest. Roughly 2% of weekly volume came from patients describing acute pain (“ik kan mijn arm niet optillen sinds gisteren”) who would otherwise sit in the reschedule queue while the agent cheerfully offered them a slot in eleven days.

We built a separate classifier on the live transcript that scores each turn for what we internally called code-rood signals: acute onset words, severity descriptors, red-flag anatomy (chest, jaw, sudden weakness on one side), and a small set of phrases that map to potentially serious pathology. When the score crosses a threshold, the agent does three things in sequence:

1. It stops the booking flow mid-turn and says, in calm Dutch, that it is going to put a senior fysio on the line.
2. It pages the on-call senior via the practice’s existing Signal group, with the call’s transcript-so-far attached.
3. It warm-transfers the SIP leg to whichever senior accepts first, with a 90-second hard ceiling. If no senior accepts, it transfers to the praktijkmanager and logs an incident.

The 90-second ceiling is not arbitrary. The seniors agreed in advance that anything longer than that defeats the point: the caller is already in distress, and silence at that length sounds like the line dropped. In the first six weeks of live operation, the agent triggered the code-rood path 11 times. Nine reached a senior within the ceiling. Two went to the praktijkmanager, who got a fysio on within another minute. Zero misclassifications in the senior-reviewed sample, though we keep the false-negative risk on the dashboard and the praktijkmanager reads it weekly.
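To make the shape of that path concrete, here is a minimal sketch of per-turn scoring and the ceiling rule. The signal patterns, weights, and threshold below are invented for illustration and are far cruder than a real classifier; only the 90-second ceiling and the fallback to the praktijkmanager come from the build described above.

```typescript
// Hypothetical signals and weights, for illustration only.
const SIGNALS: Array<{ pattern: RegExp; weight: number }> = [
  { pattern: /\bsinds (gisteren|vannacht|vanochtend)\b/, weight: 2 }, // acute onset
  { pattern: /\bkan\b.*\bniet (optillen|bewegen)\b/, weight: 3 },     // loss of function
  { pattern: /\b(borst|kaak)\b/, weight: 4 },                         // red-flag anatomy
];
const THRESHOLD = 4;          // assumed value
const CEILING_MS = 90 * 1000; // the 90-second hard ceiling

// Sum the weights of every signal that appears in one transcript turn.
function scoreTurn(turn: string): number {
  const text = turn.toLowerCase();
  return SIGNALS.reduce((sum, s) => (s.pattern.test(text) ? sum + s.weight : sum), 0);
}

function isCodeRood(turn: string): boolean {
  return scoreTurn(turn) >= THRESHOLD;
}

type Handoff = { to: 'senior' | 'praktijkmanager'; logIncident: boolean };

// seniorAcceptedAfterMs is null when no senior answered the page at all.
function resolveHandoff(seniorAcceptedAfterMs: number | null): Handoff {
  if (seniorAcceptedAfterMs !== null && seniorAcceptedAfterMs <= CEILING_MS) {
    return { to: 'senior', logIncident: false };
  }
  return { to: 'praktijkmanager', logIncident: true };
}
```

Keeping the handoff decision as a pure function of elapsed time makes the ceiling easy to test without placing a call.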

The vergoeding queue

Insurance reimbursement questions in Dutch physio are not solvable by an agent alone. The answer depends on the patient’s specific polis, which physical therapy chapter they fall under, how many sessions the GP referral covers, and whether they have already hit their eigen risico for the year. The agent can read most of this from the cache. It still cannot give the answer, because giving the wrong answer is a regulated harm.

So we built a queue. Every vergoeding question lands in a small web app the praktijkmanager opens once in the morning and once after lunch. The agent has already done the work: pulled the polis, read the chapter, fetched the remaining sessions, drafted a one-paragraph answer. The praktijkmanager either approves it (one click, the agent calls back), edits and approves, or escalates. Average handle time per item is 41 seconds. Before the agent shipped, the same conversations were taking the front desk between 4 and 7 minutes each.
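The three outcomes map onto a small state transition. The following is a sketch under assumptions: the QueueItem fields and the Decision names are hypothetical, chosen to mirror the approve, edit-and-approve, and escalate options described above.

```typescript
// Hypothetical shape of one queued vergoeding question.
type QueueItem = {
  id: string;
  question: string;
  draftAnswer: string; // the one-paragraph answer the agent drafted
  status: 'pending' | 'approved' | 'escalated';
  finalAnswer: string | null; // set only once a human has approved
};

type Decision =
  | { type: 'approve' }
  | { type: 'edit_and_approve'; answer: string }
  | { type: 'escalate' };

// Apply the praktijkmanager's decision; items can only be reviewed once.
function review(item: QueueItem, decision: Decision): QueueItem {
  if (item.status !== 'pending') {
    throw new Error(`item ${item.id} was already reviewed`);
  }
  switch (decision.type) {
    case 'approve':
      return { ...item, status: 'approved', finalAnswer: item.draftAnswer };
    case 'edit_and_approve':
      return { ...item, status: 'approved', finalAnswer: decision.answer };
    case 'escalate':
      return { ...item, status: 'escalated', finalAnswer: null };
  }
}

// The agent may only call back with an answer a human has signed off on.
function readyForCallback(item: QueueItem): boolean {
  return item.status === 'approved' && item.finalAnswer !== null;
}
```

The point of the sketch is the guard in readyForCallback: the draft never reaches the caller unless a human has turned it into a final answer.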

The boring infrastructure

The stack is unromantic on purpose. A SIP trunk into a Twilio number that fronts the Maastricht main lines. A small Node service receives the media stream and runs the STT and TTS legs. The orchestration model runs in a Dutch hosting region. Function calls go out to the Intramed cache, the Signal pager, the queue API, and a Postgres write-log that records every state transition.

The write-log is where we paid attention. A voice agent without a state log is a black box during incident review, and incidents are when you actually learn what the agent does. There has been a recurring Hacker News observation that the only scalable delete in Postgres is DROP TABLE, which is half a joke and half an operational truth. We partitioned the call-log table by week from day one (see the PostgreSQL partitioning docs for the pattern), so the praktijkmanager’s quarterly cleanup (we keep call metadata 90 days, audio 30) is a DROP, not a bulk DELETE. It costs nothing and it keeps GDPR retention from becoming a Sunday-night cron disaster.
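A sketch of what weekly partition maintenance can look like, written as TypeScript helpers that only build the SQL strings. The table name call_log, the partition naming scheme, and a parent table declared with PARTITION BY RANGE on a date or timestamp column are assumptions for illustration; the statements themselves use standard PostgreSQL declarative partitioning syntax.

```typescript
// Assumes a parent table created roughly like:
//   CREATE TABLE call_log (...) PARTITION BY RANGE (started_at);
// where started_at is a hypothetical date/timestamp column.
const DAY_MS = 24 * 60 * 60 * 1000;

// Monday 00:00 UTC of the week containing d.
function weekStart(d: Date): Date {
  const midnight = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
  const daysSinceMonday = (d.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday
  return new Date(midnight - daysSinceMonday * DAY_MS);
}

function isoDate(d: Date): string {
  return d.toISOString().slice(0, 10);
}

function partitionName(start: Date): string {
  return `call_log_${isoDate(start).replace(/-/g, '')}`;
}

// DDL for the partition covering the week that contains the given day.
function createPartitionSql(anyDayInWeek: Date): string {
  const start = weekStart(anyDayInWeek);
  const end = new Date(start.getTime() + 7 * DAY_MS);
  return (
    `CREATE TABLE IF NOT EXISTS ${partitionName(start)} PARTITION OF call_log ` +
    `FOR VALUES FROM ('${isoDate(start)}') TO ('${isoDate(end)}');`
  );
}

// A partition is droppable only when its whole week is past retention.
function dropExpiredSql(partitionStarts: Date[], now: Date, retentionDays: number): string[] {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  return partitionStarts
    .filter((start) => start.getTime() + 7 * DAY_MS <= cutoff)
    .map((start) => `DROP TABLE IF EXISTS ${partitionName(start)};`);
}
```

Because a weekly partition holds seven days of rows, the oldest rows in it can outlive the retention window by up to a week before the partition becomes droppable. If the retention limit is strict, daily partitions or an earlier cutoff would close that gap.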

What we got wrong on the way

The “always be booking” failure

The first version of the agent was tuned to close. If a slot existed, it offered the slot. Patients started complaining that they felt rushed. We retuned to offer two options, always in the same format (“woensdag om half drie, of donderdag om elf uur”), and to wait a full beat before re-asking. Bookings did not drop. Satisfaction went up.

The accent-on-the-name problem

Fysio names like “Roel” and “Geert” came back clean, but the longer Limburgs surnames were a mess. The agent would invent a phoneticisation that made the caller laugh, which is fine once and bad twice. We pre-recorded the 24 fysio names in the voice of the agent and spliced them in, instead of letting the TTS sing them.
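The splice can be sketched as splitting each utterance into segments, some synthesised and some played from a file. This is an illustrative sketch only: the {fysio} placeholder, the Segment type, and the clip paths are hypothetical and not taken from the production code.

```typescript
// One utterance becomes a list of segments for the audio layer to play in order.
type Segment = { kind: 'tts'; text: string } | { kind: 'clip'; file: string };

const PLACEHOLDER = '{fysio}'; // hypothetical template marker

function renderUtterance(
  template: string,
  therapistId: string,
  spokenFallback: string,      // name to hand to TTS if no clip exists
  clips: Map<string, string>,  // therapist id -> pre-recorded audio file
): Segment[] {
  const idx = template.indexOf(PLACEHOLDER);
  if (idx === -1) return [{ kind: 'tts', text: template }];

  const before = template.slice(0, idx);
  const after = template.slice(idx + PLACEHOLDER.length);
  const clip = clips.get(therapistId);

  // No recording for this therapist: fall back to plain TTS for the whole line.
  if (clip === undefined) {
    return [{ kind: 'tts', text: before + spokenFallback + after }];
  }

  const segments: Segment[] = [];
  if (before.trim() !== '') segments.push({ kind: 'tts', text: before.trim() });
  segments.push({ kind: 'clip', file: clip });
  if (after.trim() !== '') segments.push({ kind: 'tts', text: after.trim() });
  return segments;
}
```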

The Tuesday-morning thundering herd

The first Tuesday after a long weekend, the agent took 312 calls in 90 minutes. The cache held. The Intramed SOAP write endpoint did not. We added a write-queue with a 2-per-second ceiling against Intramed and a “we hebben uw verzoek genoteerd, u ontvangt een SMS-bevestiging zodra die in het systeem staat” fallback for the overflow. Both paths now succeed: the direct write and the queued write with SMS confirmation.
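A minimal sketch of that write ceiling, assuming a simple slot-reservation scheme. The WriteQueue class, the planWrite helper, and the idea of a maximum on-call wait before switching to the SMS message are assumptions for illustration; only the 2-per-second ceiling and the fallback sentence come from the build.

```typescript
// Hands out send slots spaced so that at most `writesPerSecond` go out.
class WriteQueue {
  private nextSlotMs = 0;
  private minIntervalMs: number;

  constructor(writesPerSecond: number) {
    this.minIntervalMs = 1000 / writesPerSecond;
  }

  // Reserve the next free slot; returns how long this write must wait.
  reserve(nowMs: number): number {
    const sendAtMs = Math.max(nowMs, this.nextSlotMs);
    this.nextSlotMs = sendAtMs + this.minIntervalMs;
    return sendAtMs - nowMs;
  }
}

type WritePlan =
  | { mode: 'confirm_on_call'; delayMs: number }
  | { mode: 'confirm_by_sms'; delayMs: number; message: string };

const SMS_FALLBACK =
  'we hebben uw verzoek genoteerd, u ontvangt een SMS-bevestiging zodra die in het systeem staat';

// If the slot is soon enough, keep the caller on the line; otherwise
// queue the write and tell the caller a confirmation will follow.
function planWrite(queue: WriteQueue, nowMs: number, maxOnCallWaitMs: number): WritePlan {
  const delayMs = queue.reserve(nowMs);
  if (delayMs <= maxOnCallWaitMs) return { mode: 'confirm_on_call', delayMs };
  return { mode: 'confirm_by_sms', delayMs, message: SMS_FALLBACK };
}
```

In this sketch every request still gets a slot, so nothing is dropped during a burst; the only thing that changes is whether the caller hears the confirmation live or receives it afterwards.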

Where the numbers land

Six weeks in: 84% of reschedule calls are fully closed by the agent without any human touch. The remaining 16% transfer to the front desk with the context already gathered, so the call lasts an average of 47 seconds instead of the previous 3 minutes 20. The praktijkmanager queue closes 96% of vergoeding questions by end of same day. The front desk recovered roughly 31 hours of phone time per week across the four branches, which they spent on the in-person reception experience that the chain actually wants to invest in.

The voice agent is not the product. The product is the boundary you draw between what the agent handles, what queues for a human, and what escalates within 90 seconds. Draw that boundary before you write a line of code.

When we built the voice agent for the Maastricht chain, the harder half of the project was not the model. It was getting a twelve-year-old Intramed install, a regional dialect, a GDPR-sensitive audio pipeline, and a praktijkmanager’s daily workflow to agree on the same boundary. The technical lift was Twilio plus a cache plus a queue. The operational lift was three workshops with the front desk before any code ran.

If you want to start small in your own practice, the five-minute audit is this: pull one week of inbound call metadata, bucket the calls by reason (reschedule, vergoeding, new booking, pain-now), and ask the receptionist which bucket she would gladly never answer again. That single answer tells you which agent to build first.
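The bucketing step of that audit fits in a few lines. A minimal sketch, assuming each call has already been labelled with a reason by hand or from front-desk notes; the Reason labels and the bucketShares name are illustrative.

```typescript
type Reason = 'reschedule' | 'vergoeding' | 'new_booking' | 'pain_now' | 'other';

const REASONS: Reason[] = ['reschedule', 'vergoeding', 'new_booking', 'pain_now', 'other'];

// Percentage of calls per reason, rounded to one decimal place.
function bucketShares(calls: Reason[]): Record<Reason, number> {
  const counts: Record<Reason, number> = {
    reschedule: 0, vergoeding: 0, new_booking: 0, pain_now: 0, other: 0,
  };
  for (const reason of calls) counts[reason] += 1;

  const shares: Record<Reason, number> = { ...counts };
  for (const reason of REASONS) {
    shares[reason] =
      calls.length === 0 ? 0 : Math.round((counts[reason] / calls.length) * 1000) / 10;
  }
  return shares;
}
```

The shares tell you where the volume is. The receptionist's answer tells you where the pain is, and the two do not always point at the same bucket.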

Key takeaway

The voice agent is not the product. The boundary between what it handles, what queues for a human, and what escalates within 90 seconds is the product.

FAQ

How accurate does Dutch STT need to be for a healthcare voice agent?

Aim for over 95% word accuracy on standard Dutch and over 88% on regional dialects like Limburgs. Below that, confirm intent back in plain Dutch before acting on any booking change.

Can a voice agent legally trigger appointment changes in a Dutch physio EPD?

Yes, with a signed verwerkersovereenkomst, a documented retention schedule, and a fallback to a human for any escalation. The praktijkmanager remains the data controller.

Why not integrate with Intramed's SOAP API directly for every read?

Live SOAP reads run 1.4 to 3.1 seconds, which breaks conversational rhythm. A read-through cache refreshed every four hours keeps context lookups under 40ms while writes still go straight through.

How do you handle a caller describing acute pain?

A separate classifier scores each transcript turn for red-flag signals. When it crosses threshold, the agent warm-transfers to a senior physiotherapist within 90 seconds, with the transcript-so-far attached.
