
Voice agents in healthcare: a Wkkgz-safe escalation playbook

On a Tuesday at 17:48, the clinic's last receptionist clocks out. The voicemail queue still has 38 calls. By Wednesday morning the orthopedic surgeon owes someone a callback.

Jacob Molkenboer · Founder · A Brand New Company · 17 Apr 2026 · 9 min read


On a Tuesday at 17:48, the last receptionist at a 24-person orthopedic clinic in Maastricht clocks out. The voicemail queue still has 38 calls. Most of them are nothing: patients asking whether they can shower over the scar yet, or whether the new fysiotherapeut (physiotherapist) slot at 09:20 is in the Tongersestraat or the Brusselseweg building. One of them, statistically, is a temperature of 39.1°C six days after a total knee arthroplasty. By morning that call is twelve hours old, and the 24-hour clock of the Wkkgz incidentmeldingsplicht (the duty to report incidents) has been ticking the entire time.

The clinic hired us in late 2025 to make that scenario impossible. What follows is the playbook — every wire, every prompt, every regulatory edge case — so you can copy what worked and skip the week we burned.

The constraint that designs the system

The Wkkgz (Wet kwaliteit, klachten en geschillen zorg, the Dutch Healthcare Quality, Complaints and Disputes Act) requires Dutch healthcare providers to notify the IGJ (Inspectie Gezondheidszorg en Jeugd, the Health and Youth Care Inspectorate) within 24 hours of any calamity involving patient harm. For a post-op orthopedic clinic, "any calamity" includes a missed signal of a deep infection, and the case law has been clear for years: a voicemail no one listened to is not a defence.

That single sentence determined every later choice. The voice agent is not a labour-saver. It is a regulator-facing system whose primary job is to never silently swallow a melding (a reportable signal). Saving the receptionist eleven hours a week is the side effect.

Volume and shape

The clinic receives roughly 1,420 inbound calls per week, most of them concentrated between 07:30 and 09:30 and again between 16:00 and 18:30. Three quarters are routine: appointment shuffles, parking questions, fysiotherapeut bookings. About 18% are clinical questions a verpleegkundige (nurse) can answer in two minutes. Around 6%, eighty to ninety calls a week, are postoperative check-ins for a knee or hip replacement done in the previous 90 days. Of those, a handful per week contain a temperature, swelling, or wound-drainage signal that legally must be triaged within hours, not days.

You cannot solve this with a chatbot on the website. The patient cohort is 58 to 84 years old. They call.

Stack reality, not stack wishful

The EPD (the electronic patient record) is ChipSoft HiX, twelve years deep, running on the clinic's tenancy. Patient bookings, surgical notes, and discharge instructions all live there. There is no documented public REST API. There is an HL7v2 integration layer that the vendor will enable on request, and a SOAP service exposed behind a VPN. That is what we had.

The fysiotherapeut rooster (the physiotherapy schedule) lives in a homegrown calendar on Exchange Server 2016 (yes, the version Microsoft stopped supporting in October 2025). Migrating it was not in scope and not in budget. We had to talk to it as-is, over EWS, and pretend it would still be there next year. If you're integrating against an Exchange 2016 server in 2026, you are writing code on a fuse; budget the migration in the same statement of work, or accept that one of your two systems is one CVE away from a forced cutover.

Architecture in one diagram

The voice path is short on purpose. Every hop is a place a regulator-relevant signal can die.

```text
  PSTN
   |  (Voys, NL geographic numbers)
   v
 SIP gateway --> Voice agent (LLM + ASR/TTS, NL-only)
                     |
                     +--> HiX adapter (HL7v2 / SOAP over VPN)
                     |      - patient lookup by BSN-hash + DOB
                     |      - recent surgery (<= 90 days?)
                     |      - write back: contact note + escalation flag
                     |
                     +--> Exchange 2016 adapter (EWS, NTLM)
                     |      - find / move fysiotherapeut slot
                     |      - write iCal invite
                     |
                     +--> Triage router
                            - score >= T   --> Surgeon queue       (SLA: 40s)
                            - score mid    --> Verpleegkundige     (SLA: 30m)
                            - score low    --> Async callback      (SLA: 4h)
```
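The diagram says the HiX adapter looks patients up by "BSN-hash + DOB" but not how the hash is built. A minimal sketch, assuming a keyed HMAC-SHA256 over the normalised BSN with a secret held by the clinic; the function names, the normalisation, and the key handling are our assumptions, not the clinic's code. The key matters: the nine-digit BSN space is small enough that an unkeyed SHA-256 can be reversed by brute force. The sketch also rejects numbers that fail the BSN eleven-test (elfproef) before hashing.

```python
import hashlib
import hmac
from datetime import date
from typing import Tuple


def is_valid_bsn(bsn: str) -> bool:
    """Elfproef: 9*d1 + 8*d2 + ... + 2*d8 - 1*d9 must be divisible by 11."""
    if len(bsn) != 9 or not bsn.isdigit() or bsn == "000000000":
        return False
    weights = (9, 8, 7, 6, 5, 4, 3, 2, -1)
    return sum(int(d) * w for d, w in zip(bsn, weights)) % 11 == 0


def lookup_key(bsn: str, dob: date, secret: bytes) -> Tuple[str, str]:
    """Hypothetical HiX lookup key: (keyed BSN hash, ISO date of birth)."""
    # Callers read BSNs aloud or type them with spaces or dots; strip both.
    normalised = bsn.replace(" ", "").replace(".", "")
    if not is_valid_bsn(normalised):
        raise ValueError("BSN fails the eleven-test")
    # Keyed hash: without the secret, the small BSN space cannot be enumerated.
    bsn_hash = hmac.new(secret, normalised.encode("ascii"), hashlib.sha256).hexdigest()
    return bsn_hash, dob.isoformat()
```

Matching on both the hash and the date of birth means a single misheard digit produces no match rather than the wrong patient.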

Everything is logged twice: once into HiX as a contactnotitie (contact note), once into an append-only audit store the clinic's privacy officer can hand to the IGJ without us in the loop. The audit store is the artefact that lets us sleep.
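"Append-only" can be enforced by the storage layer, but an auditor also wants to see that nothing was edited afterwards. One common technique, which we assume here rather than know the clinic used, is a hash chain: every entry stores the hash of the previous entry, so changing any past record breaks every hash after it. The class below is a hypothetical in-memory sketch of that idea.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _chain_hash(prev_hash: str, body: str) -> str:
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical append-only audit log; each entry commits to the one before."""
    entries: List[Dict[str, str]] = field(default_factory=list)

    def append(self, record: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        # sort_keys makes the serialisation, and therefore the hash, deterministic
        body = json.dumps(record, sort_keys=True, ensure_ascii=False)
        digest = _chain_hash(prev_hash, body)
        self.entries.append({"prev": prev_hash, "body": body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited, dropped, or reordered entry fails."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash or entry["hash"] != _chain_hash(prev_hash, entry["body"]):
                return False
            prev_hash = entry["hash"]
        return True
```

In practice the latest hash would also be written somewhere the operator cannot rewrite, so the whole chain cannot simply be regenerated.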

The conversation graph, not the prompt

People who have never shipped a voice agent reach for a single prompt. People who have shipped one reach for a graph. A graph survives interruptions, dialect, hearing aids, hold music bleeding back into the line, and the patient who answers the first question with a story about her grandson.

Our top-level graph has eleven nodes. The one that earns its keep is postop_screen, which fires the moment HiX returns a surgery within the last 90 days:

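```yaml
node: postop_screen
guards:
  - patient.last_surgery.days_since <= 90
# "Before we continue: how is the wound?
#  Did you take your temperature today? And if so, how many degrees?"
say: |
  Voor we verder gaan - hoe gaat het met de wond?
  Heeft u koorts gemeten vandaag? En zo ja, hoeveel graden?
extract:
  - field: temperature_c
    type: number
    range: [34.0, 42.5]            # reject implausible readings such as "150 graden"
    on_missing: ask_once_then_skip
  - field: wound_drainage
    type: enum
    values: [geen, helder, troebel, bloederig, pus]   # none, clear, cloudy, bloody, pus
  - field: pain_increase_24h
    type: boolean
route:                              # rules are checked in order; first match wins
  - when: temperature_c >= 38.5 OR wound_drainage in [troebel, pus]
    to: escalate_surgeon
  # a skipped temperature still escalates (assumption: to the nurse tier)
  - when: temperature_c is missing OR temperature_c >= 37.8 OR pain_increase_24h == true
    to: escalate_verpleegkundige
  - else: continue
```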

Two things worth stealing. First, the temperature range is bounded: we have seen agents accept "honderdvijftig graden" ("one hundred and fifty degrees") as 150 and route it as critical. Second, ask_once_then_skip means we never trap a patient in a loop over a missing answer. If she does not remember her temperature, the agent escalates anyway. A silent skip is worse than over-escalation.
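To make the routing semantics of postop_screen concrete, here is a plain-Python sketch of the same rules. It assumes first-match-wins ordering and that a missing or out-of-range temperature goes to the nurse tier; the article says only that the agent "escalates anyway". The class and function names are hypothetical, not part of the clinic's graph engine.

```python
from dataclasses import dataclass
from typing import Optional

TEMP_RANGE_C = (34.0, 42.5)  # plausible body temperatures, as in the node config


@dataclass
class PostopAnswers:
    temperature_c: Optional[float]   # None if the patient could not say
    wound_drainage: Optional[str]    # geen, helder, troebel, bloederig, pus
    pain_increase_24h: bool = False


def bounded_temperature(value: Optional[float]) -> Optional[float]:
    """Treat out-of-range readings (e.g. '150 graden') as missing, never as critical."""
    if value is None:
        return None
    low, high = TEMP_RANGE_C
    return value if low <= value <= high else None


def route_postop(answers: PostopAnswers) -> str:
    temp = bounded_temperature(answers.temperature_c)
    # Rule 1: fever or infected-looking drainage goes straight to the surgeon.
    if (temp is not None and temp >= 38.5) or answers.wound_drainage in ("troebel", "pus"):
        return "escalate_surgeon"
    # Rule 2: borderline fever, more pain, or no usable temperature goes to a nurse.
    if temp is None or temp >= 37.8 or answers.pain_increase_24h:
        return "escalate_verpleegkundige"
    return "continue"
```

For example, `route_postop(PostopAnswers(39.1, "geen"))` returns `"escalate_surgeon"`, while a garbled `150.0` lands with a nurse instead of paging the surgeon.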

The 40-second SLA, mechanically

"Within 40 seconds" is not a marketing number. It is the time between the agent's escalation decision and a human voice on the line. We measured every component on day one and budgeted backwards:

Triage decision (LLM + rules): 1.4 s p95
HiX write of the escalation flag: 2.1 s p95
Queue pickup notification to the on-call surgeon: 0.8 s
Surgeon device ring + answer (3 devices, parallel): 28 s p95
Bridge and context handoff (TTS reads the structured note in NL): 6 s

That leaves us about two seconds of slack. The non-negotiable line is the parallel device ring: desk phone, DECT, and a Voys-registered mobile, all dialed at once. A serial ring kills the SLA every time and we will not ship one again.
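The five hops listed add up to 38.3 s against the 40 s SLA, which is where the "about two seconds" comes from. A small budget check, sketched below with the article's figures, keeps that arithmetic honest when a hop is added or re-measured. One caveat: adding per-hop p95 values gives a rough budget, not the p95 of the end-to-end time, so the end-to-end gap still needs measuring directly.

```python
SLA_SECONDS = 40.0

# Per-hop latency in seconds, as listed in the budget above
BUDGET = {
    "triage_decision": 1.4,
    "hix_flag_write": 2.1,
    "queue_notification": 0.8,
    "parallel_ring_and_answer": 28.0,
    "bridge_and_handoff": 6.0,
}


def total(budget: dict) -> float:
    return round(sum(budget.values()), 2)


def slack(budget: dict, sla: float = SLA_SECONDS) -> float:
    """Seconds left over; negative means the design cannot meet the SLA."""
    return round(sla - sum(budget.values()), 2)
```

With the table as given, `total(BUDGET)` is 38.3 and `slack(BUDGET)` is 1.7.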

If no surgeon picks up inside the SLA, the system pages a second-tier escalation (chef de clinique plus the on-call internist) and stamps the audit log with a missed-SLA marker. We have not yet had to use the second tier in production, but the test suite hits it every night.
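The parallel ring and the second-tier page can be sketched with asyncio: start one task per device, take the first that answers, cancel the rest, and fall through to the second tier with a missed-SLA marker if nobody answers in time. Everything here is hypothetical; `ring` functions stand in for whatever the telephony provider exposes, and `simulated_device` exists only to make the sketch runnable.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

RingFn = Callable[[], Awaitable[str]]  # resolves when someone answers that device


async def ring_parallel(devices: Dict[str, RingFn], timeout_s: float) -> Optional[str]:
    """Ring every device at once; return the first that answers, or None on timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    tasks = {asyncio.create_task(ring()): name for name, ring in devices.items()}
    pending = set(tasks)
    try:
        while pending:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            done, pending = await asyncio.wait(
                pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:  # a failed device does not end the ring
                    return tasks[task]
        return None
    finally:
        for task in tasks:
            task.cancel()  # stop ringing the other devices
        await asyncio.gather(*tasks, return_exceptions=True)


async def escalate(primary: Dict[str, RingFn], second_tier: Dict[str, RingFn],
                   sla_s: float, audit: List[str]) -> Optional[str]:
    answered_by = await ring_parallel(primary, sla_s)
    if answered_by is None:
        audit.append("missed_sla")  # hypothetical marker name
        answered_by = await ring_parallel(second_tier, sla_s)
    return answered_by


def simulated_device(answer_after_s: Optional[float]) -> RingFn:
    async def ring() -> str:
        if answer_after_s is None:
            await asyncio.Event().wait()  # never answers; cancelled by the caller
        else:
            await asyncio.sleep(answer_after_s)
        return "answered"
    return ring
```

A serial ring would be the same code with one device per call to `ring_parallel`, which is exactly why it blows the budget: the timeouts add up instead of overlapping.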

HiX integration without a real API

The HL7v2 path is brittle. Messages occasionally arrive out of order, ACKs sometimes time out, and the SOAP endpoint silently drops a field if the patient name contains a single quote (yes, we logged this, and the vendor is aware). We treat HiX as eventually consistent and make every write idempotent on our side: each write carries a deterministic message ID, and we re-send on a missing ACK with backoff. The contactnotitie (contact note) is written first, the escalation flag second. If the second write fails, the note still gives the surgeon the context.
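A sketch of that write discipline, under stated assumptions: the message ID is derived from the call ID and the write kind, so a re-send after a lost ACK carries the same ID and the receiver can deduplicate it. `send` is a hypothetical transport callable that returns True when an ACK arrives; nothing here is a HiX or HL7 library API.

```python
import hashlib
import time
from typing import Callable, Dict

SendFn = Callable[[str, dict], bool]  # hypothetical: True if an ACK came back


def message_id(call_id: str, kind: str) -> str:
    """Deterministic: the same call and write kind always yield the same ID."""
    return hashlib.sha256(f"{call_id}:{kind}".encode("utf-8")).hexdigest()[:32]


def send_with_retry(send: SendFn, msg_id: str, payload: dict, attempts: int = 5,
                    base_delay_s: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    for attempt in range(attempts):
        if send(msg_id, payload):
            return True
        if attempt < attempts - 1:
            sleep(base_delay_s * (2 ** attempt))  # exponential backoff: 0.5, 1, 2, 4 s
    return False


def write_escalation(send: SendFn, call_id: str, note: dict, flag: dict,
                     **retry_kwargs) -> Dict[str, bool]:
    """Contact note first, escalation flag second; the flag is tried regardless."""
    note_ok = send_with_retry(send, message_id(call_id, "contactnotitie"), note, **retry_kwargs)
    flag_ok = send_with_retry(send, message_id(call_id, "escalation_flag"), flag, **retry_kwargs)
    return {"note": note_ok, "flag": flag_ok}
```

In HL7v2 terms, the natural home for such an ID is the message control ID in the MSH segment, though how HiX deduplicates on it is something to confirm with the vendor.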

We considered scraping HiX through its desktop client. We decided not to. RPA on an EPD is the kind of decision that looks clever at week six and looks negligent at the first audit.

The Exchange 2016 problem

EWS still works. NTLM still works. What does not work is throughput: beyond about eight concurrent EWS sessions, the server's RPC layer begins to throttle silently, returning stale calendar data without an error. The agent would offer a fysiotherapeut slot as free, and an hour later the slot turned out to be double-booked.

The fix was a single-writer queue in front of EWS. Every slot read and every slot write goes through one process, serialized, with a one-second cache. Throughput is fine because the call volume is human-scale. The patient never notices. The receptionist stopped getting double-booking calls within a week.
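The single-writer pattern can be sketched with one worker thread draining a queue: callers block on a future, the worker runs one job at a time, reads go through a one-second cache, and a booking re-reads the slot inside the serial section before writing. The `backend` object with `read_slot` and `book_slot` is a hypothetical wrapper around EWS, not an EWS API.

```python
import queue
import threading
import time
from concurrent.futures import Future
from typing import Any, Callable, Dict, Tuple


class SingleWriterCalendar:
    """Serialise every slot read and write through a single worker thread."""

    def __init__(self, backend: Any, cache_ttl_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self._backend = backend  # hypothetical: read_slot(id), book_slot(id, ref)
        self._ttl = cache_ttl_s
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Any]] = {}
        self._jobs: "queue.Queue" = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while True:  # only this thread ever touches the backend or the cache
            fn, args, fut = self._jobs.get()
            try:
                fut.set_result(fn(*args))
            except Exception as exc:
                fut.set_exception(exc)

    def _submit(self, fn: Callable, *args: Any) -> Any:
        fut: Future = Future()
        self._jobs.put((fn, args, fut))
        return fut.result()  # block the caller until the worker has run the job

    def _read(self, slot_id: str) -> Any:
        now = self._clock()
        hit = self._cache.get(slot_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = self._backend.read_slot(slot_id)
        self._cache[slot_id] = (now, value)
        return value

    def _book(self, slot_id: str, patient_ref: str) -> bool:
        self._cache.pop(slot_id, None)  # never book against a cached answer
        if self._backend.read_slot(slot_id) != "free":
            return False
        self._backend.book_slot(slot_id, patient_ref)
        return True

    def read_slot(self, slot_id: str) -> Any:
        return self._submit(self._read, slot_id)

    def book_slot(self, slot_id: str, patient_ref: str) -> bool:
        return self._submit(self._book, slot_id, patient_ref)
```

Because the check and the write run back to back on the one worker, two calls competing for the 09:20 slot cannot both see it as free.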

Language, accent, and the consonants that matter

Patients from Limburg do not always speak standard Dutch (ABN). Our ASR baseline misheard "knie" (knee) as "nie" about 4% of the time, which is fine, and "koorts" (fever) as "korst" (crust) about 1.2% of the time, which is not. We fine-tuned the recogniser on roughly 40 hours of recorded clinic intake calls (with consent, anonymised, and signed off by the FG, the clinic's data protection officer), and the koorts confusion dropped below 0.2%. The lesson: the words you cannot afford to lose are a tiny list. Spend the effort on those, not on a generic accent model.
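Tracking "the tiny list" means measuring per-word miss rates rather than an overall word error rate. A minimal sketch, assuming the reference and recognised transcripts have already been aligned word by word (the alignment step is not shown); the watch-list is a hypothetical example built from words in this article.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical watch-list: fever, knee, wound, pus
CRITICAL_WORDS = {"koorts", "knie", "wond", "pus"}


def critical_word_miss_rates(aligned: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Miss rate per critical word from aligned (reference, recognised) word pairs."""
    seen: Counter = Counter()
    missed: Counter = Counter()
    for ref, hyp in aligned:
        ref = ref.lower()
        if ref not in CRITICAL_WORDS:
            continue  # ordinary words do not count toward this metric
        seen[ref] += 1
        if hyp.lower() != ref:
            missed[ref] += 1
    return {word: missed[word] / seen[word] for word in seen}
```

A model update could then be gated on, say, the koorts rate not regressing, whatever its effect on general accuracy.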

We kept the agent monolingual Dutch on purpose. A patient who code-switches to English mid-sentence triggers a human transfer. The cost of a wrong escalation decision made on a misunderstood English answer is higher than the cost of one extra transfer.

What the regulator actually wants to see

The IGJ has not, to our knowledge, audited a voice agent in a Dutch clinic yet. The standard they will apply is the one they apply to any other clinical triage system: was the protocol explicit, was it followed, was the decision logged in a way a third party can reconstruct?

The audit store answers all three. Every call produces a JSON record with the transcript, the extracted fields, the routing decision, the rule that fired, the version of the prompt graph, and the timestamp at every hop. It is retained for the seven years the Wkkgz expects. It is the boring artefact that makes the rest of the system defensible.
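The record described above maps naturally onto a small schema. A sketch, assuming a Python dataclass serialised to JSON with sorted keys and UTC ISO 8601 hop timestamps; the field names, the example graph version, and the retention helper are hypothetical, and the seven-year figure is taken from the article.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date, datetime, timezone
from typing import Dict, List

RETENTION_YEARS = 7  # retention period as stated in the article


@dataclass
class CallAuditRecord:
    call_id: str
    transcript: List[Dict[str, str]]    # [{"speaker": ..., "text": ...}, ...]
    extracted: Dict[str, object]        # e.g. {"temperature_c": 39.1}
    routing_decision: str               # e.g. "escalate_surgeon"
    rule_fired: str                     # the route condition that matched
    graph_version: str                  # version of the prompt graph used
    hop_timestamps: Dict[str, str] = field(default_factory=dict)

    def stamp(self, hop: str, when: datetime) -> None:
        """Record a hop time in UTC so records from DST changes still line up."""
        self.hop_timestamps[hop] = when.astimezone(timezone.utc).isoformat()

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)


def retain_until(created: date, years: int = RETENTION_YEARS) -> date:
    try:
        return created.replace(year=created.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return created.replace(year=created.year + years, day=28)
```

Storing the rule that fired next to the graph version is what lets a third party replay the decision without access to the running system.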

Takeaway

A voice agent in healthcare is not a phone tree with an LLM bolted on. It is a regulated triage system whose hardest requirement is handing the call to a human inside a deadline you cannot miss.

What we would do differently

Three things. We would build the audit store before the conversation graph, not after. We would budget the Exchange migration into week one of the project, even if the client pushes back. And we would not put the whole 90-day postoperative cohort on the agent at once: we ran the first two weeks at 20% of traffic with a receptionist shadowing the queue, and that caught the koorts/korst issue before it caught us.

The wider lesson is not specific to clinics. Any voice agent that touches a regulated decision needs to be designed around the handoff first and the conversation second. The chat is the easy part now. The audit log, the SLA, and the failure mode when the model is wrong are where the real work lives.

When we built the voice agent for this Maastricht clinic, the thing we kept underestimating was how much of the project lived in the seams between two old systems nobody owned anymore. We ended up writing more glue than agent, and that was the right ratio.

If you want to pressure-test your own setup today: time the gap between "your system decides this call needs a human" and "a human says hello." If it is over 60 seconds, you have a design problem, not a staffing problem.
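If your system already logs events, the test can be scripted. A sketch, assuming each call logs a "decided" event when it is routed to a human and a "human_answered" event when someone picks up, each with an ISO 8601 timestamp; the event names and log shape are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional

THRESHOLD_S = 60.0


def handoff_gaps(events: Iterable[Dict[str, str]]) -> Dict[str, Optional[float]]:
    """Seconds from 'decided' to 'human_answered' per call; None if nobody answered."""
    decided: Dict[str, datetime] = {}
    answered: Dict[str, datetime] = {}
    for e in events:
        ts = datetime.fromisoformat(e["ts"])
        if e["event"] == "decided":
            decided[e["call_id"]] = ts
        elif e["event"] == "human_answered":
            answered[e["call_id"]] = ts
    return {cid: (answered[cid] - t0).total_seconds() if cid in answered else None
            for cid, t0 in decided.items()}


def design_problems(gaps: Dict[str, Optional[float]],
                    threshold_s: float = THRESHOLD_S) -> List[str]:
    """Calls that never reached a human, or reached one too late."""
    return sorted(cid for cid, gap in gaps.items() if gap is None or gap > threshold_s)
```

Counting unanswered calls as failures matters: a gap that never closes is the worst case, not a missing data point.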

Key takeaway

If your voice agent cannot hand the call to a human inside 40 seconds, it is not fit for a Wkkgz-regulated triage line. Design the handoff first, not the chat.

FAQ

Why a voice agent instead of a chatbot for postoperative follow-up?

The patient cohort is 58 to 84 years old and overwhelmingly prefers the phone. A web chatbot misses the calls that matter: the postop fever at 18:30 from someone who does not use the patient portal.

How do you integrate with ChipSoft HiX if there is no public REST API?

Ask the vendor to enable the HL7v2 integration layer and the SOAP service over VPN. Treat all writes as idempotent with deterministic message IDs, and re-send on missing ACK with backoff.

Is Exchange 2016 safe to integrate against in 2026?

No. Extended support ended on 14 October 2025; mainstream support had already ended in October 2020. EWS still functions, but you should budget a migration to Exchange Online or another scheduling backend in the same statement of work.

What does the Wkkgz require from an AI triage system?

An explicit protocol, evidence the protocol was followed, and a log a third party can reconstruct. Calamities involving patient harm must be reported to the IGJ within 24 hours.

How do you guarantee a 40-second human handoff?

Parallel-ring three devices (desk, DECT, mobile) the moment the triage decision fires, pre-write the structured context to HiX, and page a second tier automatically if no one picks up inside the SLA.

