WhatsApp chat agent for a Dutch physio chain: the playbook
A Maastricht physio chain was drowning in 2,100 WhatsApp rebooking threads a week, tangled with a 12-year-old Intramed install and a BSN that legally cannot reach a US datacenter.

On a Monday at 08:14 the front-desk lead at a 24-person physio chain in Maastricht had 47 unread WhatsApp threads. Three patterns, over and over: "kan ik verzetten naar donderdag?" (can I move to Thursday?), "ben ziek, moet ik opnieuw plannen?" (I'm ill, do I need to reschedule?), and "klopt mijn afspraak van morgen nog?" (is my appointment tomorrow still on?). Each reply took her about ninety seconds, because every one needed a check against Intramed, the practice-management system the chain has run since 2014.
By the end of that quarter the same inbox was carrying roughly 2,100 rebooking threads a week. We shipped a WhatsApp Business agent that now handles the bulk of them. The interesting parts were never the model. They were the legacy integration, the BSN problem, and the proxy sitting between them. This is the playbook.
Architecture in three boxes
The system is small on purpose. Three pieces:
- A Cloudflare Worker that sits between the WhatsApp Cloud API and everything else, in both directions.
- An agent runtime on a Hetzner VM in Falkenstein that holds session state and runs the LLM.
- A FastAPI shim, on the same VM, that speaks ODBC to Intramed.
Inbound webhooks from WhatsApp hit the Worker first and are forwarded to the runtime over a signed HTTPS call. Outbound messages from the agent take the reverse path. WhatsApp necessarily sees the phone number because that is how the channel routes a message. What it does not see is a BSN, an insurance number, a date of birth, or any free-text field that has not been resolved through our token table. The agent itself works with opaque tokens like PT_4f7a that the Worker translates back to human-readable strings on the way out.
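The token indirection can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not the production code: the TokenTable class, its method names, and the four-hex-character token length are assumptions made for the example, and in the real system the table is a short-lived store that the Worker reads on the way out.

```python
import re
import secrets


class TokenTable:
    """Maps opaque tokens such as PT_4f7a to the strings they stand for."""

    TOKEN_RE = re.compile(r"\bPT_[0-9a-f]{4}\b")

    def __init__(self) -> None:
        self._by_token: dict[str, str] = {}
        self._by_value: dict[str, str] = {}

    def issue(self, value: str) -> str:
        """Return the token for `value`, minting a new one on first sight."""
        if value in self._by_value:
            return self._by_value[value]
        token = f"PT_{secrets.token_hex(2)}"
        while token in self._by_token:  # four hex chars collide easily; retry
            token = f"PT_{secrets.token_hex(2)}"
        self._by_token[token] = value
        self._by_value[value] = token
        return token

    def expand(self, text: str) -> str:
        """Replace known tokens with their values; leave unknown ones untouched."""
        return self.TOKEN_RE.sub(
            lambda m: self._by_token.get(m.group(0), m.group(0)), text
        )


# The model only ever sees the token; the readable name appears at expansion.
table = TokenTable()
name_token = table.issue("Mevrouw Jansen")
draft = f"Dag {name_token}, uw afspraak is verzet."
outbound = table.expand(draft)
```

Leaving unknown tokens untouched is a deliberate choice in this sketch: a stray PT_ string in a patient message is ugly but harmless, whereas failing open on a real value is not.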
Stripping BSN at the edge
The legal pressure here is real. The Autoriteit Persoonsgegevens, the Dutch data protection authority, enforces tight limits on burgerservicenummers (BSNs, citizen service numbers): under the Dutch GDPR implementation act a BSN is a national identification number that may only be processed where the law provides for it, and a healthcare provider that lets BSNs cross the Atlantic without a clear legal basis is not in a happy regulatory position. The WhatsApp Cloud API is operated by Meta and its logs sit in US infrastructure. We built the proxy on the assumption that any payload reaching the Cloud API is, for all practical purposes, in the United States.
The Worker does three things on the outbound path. It validates an HMAC signature from the runtime, so a leaky service inside our own network cannot post to Meta directly. It looks up tokens in a short-lived KV namespace and substitutes them back into the message body. And it runs a regex sweep for nine-digit BSN-shaped numbers, IBANs, and insurance prefixes. Nine-digit hits are run through the elfproef (the "eleven test"), the checksum used to validate BSNs. Any hit that passes is treated as a live identifier, and the whole message is rejected rather than forwarded.
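// worker/src/whatsapp-proxy.ts
import { verifyHmac } from "./hmac"
import { isValidBsn } from "./elfproef"
// Assumed module path: the source of expandTokens is not shown in this excerpt.
import { expandTokens } from "./tokens"

// Env is assumed to declare the bindings used below:
// RUNTIME_SECRET, TOKENS, AUDIT, PHONE_ID and WA_TOKEN.

const NINE_DIGITS = /\b\d{9}\b/g
// No `g` flag here: a global regex keeps lastIndex between .test() calls,
// so a module-level pattern could skip a match on a later request.
const IBAN = /\bNL\d{2}[A-Z]{4}\d{10}\b/
// Insurance-prefix patterns are not shown in this excerpt.

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    // Only the runtime holds the secret, so nothing else can post through the proxy.
    if (!await verifyHmac(req, env.RUNTIME_SECRET)) {
      return new Response("bad signature", { status: 401 })
    }

    const msg = await req.json() as { thread_id: string; to: string; body: string }
    // Swap opaque tokens (PT_4f7a) back to readable strings before sweeping.
    const body = await expandTokens(msg.body, env.TOKENS)

    // A nine-digit run counts as a BSN only if it passes the elfproef checksum.
    const bsnHits = [...body.matchAll(NINE_DIGITS)].filter(m => isValidBsn(m[0]))
    const ibanHit = IBAN.test(body)
    if (bsnHits.length || ibanHit) {
      // Record that a leak was blocked, never the identifier itself.
      await env.AUDIT.put(crypto.randomUUID(), JSON.stringify({
        kind: "leak_blocked",
        thread: msg.thread_id,
        reason: bsnHits.length ? "bsn" : "iban",
      }))
      return new Response("sensitive identifier in payload", { status: 422 })
    }

    // Clean payload: forward to the WhatsApp Cloud API.
    return fetch(`https://graph.facebook.com/v20.0/${env.PHONE_ID}/messages`, {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${env.WA_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        messaging_product: "whatsapp",
        to: msg.to,
        type: "text",
        text: { body },
      }),
    })
  },
}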
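The isValidBsn helper imported above is not shown. For reference, the elfproef for a BSN weights the nine digits 9, 8, 7, 6, 5, 4, 3, 2 and -1, and the weighted sum must be divisible by 11. A minimal sketch in Python follows; the Worker's own version would be TypeScript, and the function names and the explicit rejection of the all-zero string are assumptions of this example.

```python
import re

# Weights for the BSN variant of the elfproef: the last digit counts as -1.
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)
NINE_DIGITS = re.compile(r"\b\d{9}\b")


def is_valid_bsn(candidate: str) -> bool:
    """Return True if `candidate` is nine ASCII digits that pass the elfproef."""
    if len(candidate) != 9 or not (candidate.isascii() and candidate.isdigit()):
        return False
    if candidate == "000000000":  # degenerate case, rejected by assumption
        return False
    total = sum(w * int(d) for w, d in zip(BSN_WEIGHTS, candidate))
    return total % 11 == 0


def find_bsn_hits(text: str) -> list[str]:
    """Return every standalone nine-digit run in `text` that passes the check."""
    return [m for m in NINE_DIGITS.findall(text) if is_valid_bsn(m)]
```

Roughly one in eleven random nine-digit numbers passes the checksum, so the check cuts false positives sharply but does not eliminate them. For a leak blocker that trade is acceptable: a wrongly bounced message costs a retry, a wrongly forwarded one costs a lot more.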
The 422 response is not a generic error. The runtime treats it as a hard signal that the prompt, a template, or a tool result needs to be reworked. Over the first three weeks that bounce rule fired eleven times. Each one was a real issue we would otherwise have shipped: a template that interpolated the wrong field, a tool result that included a join row, a model that helpfully repeated back a number the patient had mentioned in passing. None of them reached Meta.
Treat the LLM as untrusted output. The redaction belongs at the boundary, in code you can read in a five-line function, not in a system prompt.
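The same boundary also has to reject callers that are not the runtime. A signature check of that kind could look like the Python sketch below. The source only says the Worker validates an HMAC from the runtime; the SHA-256 digest, the timestamp prefix, and the five-minute replay window are assumptions added for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: bytes, body: bytes, timestamp: int) -> str:
    """Hex HMAC-SHA256 over '<timestamp>.<body>'."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, timestamp: int, signature: str,
           now: Optional[int] = None, max_skew: int = 300) -> bool:
    """Accept only recent requests whose signature matches the body."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > max_skew:
        return False  # too old or too far in the future: possible replay
    expected = sign(secret, body, timestamp)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Signing the timestamp together with the body means a captured request cannot be replayed later with a fresh timestamp, because changing either part invalidates the signature.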
Wiring into Intramed without a real API
Intramed has been around since the nineties. Its current versions expose a partner SOAP interface, but our client was on a 2014 install where the only practical way in was a read-only ODBC connection into the underlying Firebird database, and a small set of stored procedures the vendor had once documented in a PDF the office still kept on a USB stick. We were not going to modernise Intramed. That was the wrong battle.
Instead we wrote a thin FastAPI shim on the same VM. It exposes seven endpoints: find_patient, list_upcoming, list_slots, hold_slot, confirm, cancel, and log_note. Each one is a function around either an ODBC query or a stored procedure call. The shim is roughly 380 lines of Python. It has its own integration tests against a nightly snapshot of the production database, restored into a sandbox schema so the tests never touch a live row.
The agent never talks to Intramed directly. It talks to the shim. The shim is the contract. When the chain upgrades Intramed next year, the only thing that needs rewriting is the shim. The agent runtime, the Worker, the prompts, and the WhatsApp templates all stay where they are.
Two Firebird quirks chewed a day each. ODBC connection pooling under the driver we ended up on leaked file descriptors on long-running processes, so we pinned a fresh connection per request and ate the latency cost. The stored procedures also returned VARCHAR columns padded with trailing whitespace from a nightly fixed-width import, which the model otherwise echoed back to patients as 'Jansen '. Neither problem is interesting. Both are typical of any 12-year-old practice-management stack, and both belong in the shim, because every future consumer of Intramed will hit them too.
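Both workarounds fit in a few lines. The sketch below is illustrative only: fresh_connection and strip_padding are hypothetical names, and connect stands for whatever zero-argument callable the ODBC driver in use provides for opening a connection.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def fresh_connection(connect: Callable[[], Any]) -> Iterator[Any]:
    """Open one connection for one request and always close it afterwards."""
    conn = connect()
    try:
        yield conn
    finally:
        conn.close()  # runs even if the query raises, so no descriptor lingers


def strip_padding(row: dict[str, Any]) -> dict[str, Any]:
    """Right-strip string columns so 'Jansen   ' reaches the agent as 'Jansen'."""
    return {k: v.rstrip() if isinstance(v, str) else v for k, v in row.items()}
```

Only trailing whitespace is removed, on the assumption that the padding comes from the fixed-width import; leading spaces and non-string columns pass through unchanged.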
The rebooking state machine
The conversation is not freeform: it is a state machine that the LLM drives, not a chain of generic chat turns. Six states and a fallback:
# runtime/state.py
from enum import Enum


class State(str, Enum):
    IDENTIFY = "identify"  # match phone number to patient
    INTENT = "intent"      # rebook, cancel, question, other
    PROPOSE = "propose"    # offer 3 slots from list_slots()
    HOLD = "hold"          # patient picked, soft-hold for 90s
    CONFIRM = "confirm"    # send approved WA template
    DONE = "done"
    HANDOFF = "handoff"    # escalate to a human


# Allowed next states. DONE and HANDOFF are terminal and have no entry.
TRANSITIONS = {
    State.IDENTIFY: {State.INTENT, State.HANDOFF},
    State.INTENT: {State.PROPOSE, State.HANDOFF, State.DONE},
    State.PROPOSE: {State.HOLD, State.PROPOSE, State.HANDOFF},
    State.HOLD: {State.CONFIRM, State.PROPOSE, State.HANDOFF},
    State.CONFIRM: {State.DONE, State.HANDOFF},
}
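A transition table only helps if something enforces it. One way to do that is a small guard between the model's output and the session state, sketched below. The advance function is hypothetical, and falling back to HANDOFF on an illegal or unknown proposal is an assumption of this example rather than documented behaviour.

```python
from enum import Enum


class State(str, Enum):
    IDENTIFY = "identify"
    INTENT = "intent"
    PROPOSE = "propose"
    HOLD = "hold"
    CONFIRM = "confirm"
    DONE = "done"
    HANDOFF = "handoff"


TRANSITIONS = {
    State.IDENTIFY: {State.INTENT, State.HANDOFF},
    State.INTENT: {State.PROPOSE, State.HANDOFF, State.DONE},
    State.PROPOSE: {State.HOLD, State.PROPOSE, State.HANDOFF},
    State.HOLD: {State.CONFIRM, State.PROPOSE, State.HANDOFF},
    State.CONFIRM: {State.DONE, State.HANDOFF},
}


def advance(current: State, proposed: str) -> State:
    """Apply the model's proposed next state only if the table allows it."""
    if current not in TRANSITIONS:
        return current  # DONE and HANDOFF are terminal
    try:
        nxt = State(proposed)
    except ValueError:
        return State.HANDOFF  # the model named a state that does not exist
    return nxt if nxt in TRANSITIONS[current] else State.HANDOFF
```

With this guard, advance(State.IDENTIFY, "confirm") lands in HANDOFF instead of skipping identification, so a confused model degrades into a human conversation rather than an unverified booking.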
The LLM picks the next state from the allowed set and writes a short rationale to the audit log. A slot the patient has picked is soft-held for 90 seconds and released if it is not confirmed in that window, which solved the original race where two patients picked the same Thursday 14:30 in quick succession. The model does not invent times: list_slots returns a JSON array, and the agent has to pick from that array verbatim or call list_slots again with a different window. The WhatsApp template used for CONFIRM is pre-approved through Business Manager, so the message goes out even if the patient has been quiet for more than 24 hours.
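The two guards in that paragraph, the 90-second soft hold and the verbatim slot check, can be illustrated with an in-memory sketch. SlotHolds and pick_offered are hypothetical names, and slots are plain strings here; where the real hold lives (the shim, Intramed, or the runtime) is not something this example tries to settle.

```python
import time
from typing import Callable, Optional

HOLD_SECONDS = 90


class SlotHolds:
    """Tracks which thread holds which slot, and until when."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._holds: dict[str, tuple[str, float]] = {}  # slot -> (thread, expiry)

    def hold(self, slot: str, thread_id: str) -> bool:
        """Try to hold `slot`; fail if another thread holds it and has not expired."""
        now = self._clock()
        holder = self._holds.get(slot)
        if holder and holder[0] != thread_id and holder[1] > now:
            return False
        self._holds[slot] = (thread_id, now + HOLD_SECONDS)
        return True


def pick_offered(choice: str, offered: list[str]) -> Optional[str]:
    """Accept the model's choice only if it is verbatim one of the offered slots."""
    return choice if choice in offered else None
```

Expired holds are simply overwritten by the next caller, which keeps the sketch free of background timers. A None from pick_offered is the cue to call list_slots again rather than to improvise a time.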
IDENTIFY was the state we underestimated. Most patients message from their own phone, but roughly one in seven entries in the chain's contact database carries a shared family number, and a meaningful slice of the elderly cohort writes from a son or daughter's WhatsApp. The lookup table holds (phone, [patient_candidates]). If exactly one candidate matches we move on. If two or more match, the agent asks a single confirming question on first name and birth year. If the answer is ambiguous or empty, we hand off. The model is never allowed to guess on identity. The cost of being wrong is sending the wrong person someone else's appointment time, which is the one failure mode the chain genuinely cannot tolerate.
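The identity rule reads naturally as a three-way decision. The sketch below is one possible shape for it, with hypothetical names throughout; handing off when no candidate matches the phone number at all is an assumption, since the text only covers the one-candidate and many-candidate cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    token: str        # opaque patient token, never a BSN
    first_name: str
    birth_year: int


def resolve(candidates: list[Candidate], asked: bool = False,
            first_name: Optional[str] = None,
            birth_year: Optional[int] = None) -> tuple[str, Optional[Candidate]]:
    """Return ("matched", candidate), ("ask", None) or ("handoff", None)."""
    if not candidates:
        return ("handoff", None)
    if len(candidates) == 1:
        return ("matched", candidates[0])
    if not asked:
        return ("ask", None)  # one confirming question, then decide
    if not first_name or birth_year is None:
        return ("handoff", None)  # empty answer: never guess
    wanted = first_name.strip().casefold()
    hits = [c for c in candidates
            if c.first_name.casefold() == wanted and c.birth_year == birth_year]
    return ("matched", hits[0]) if len(hits) == 1 else ("handoff", None)
```

The function has no branch that picks the "most likely" candidate. Anything short of exactly one match after the confirming question ends in a handoff.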
Where humans take over
The agent hands off in four situations: a patient asks anything medical, a patient is in arrears, the conversation has gone five turns without progress, or our confidence on the patient match drops below a fixed threshold. Handover is not a polite "let me pass you to a colleague" message. It is a Slack ping into the right channel at the right clinic, with the last six messages, the patient's display name (not BSN), and a deeplink into Intramed. The clinic staff take it from there.
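The four triggers and the handover bundle can be written down as plain data and two functions. This is a sketch under assumptions: the field names are invented for the example, and the 0.8 confidence threshold is a placeholder because the actual value is not stated.

```python
from dataclasses import dataclass
from typing import Optional

MAX_STALLED_TURNS = 5       # "five turns without progress"
MIN_MATCH_CONFIDENCE = 0.8  # placeholder; the real threshold is not stated


@dataclass
class ThreadStatus:
    medical_question: bool
    in_arrears: bool
    stalled_turns: int
    match_confidence: float


def handoff_reason(status: ThreadStatus) -> Optional[str]:
    """Return why the thread should go to a human, or None to let the agent continue."""
    if status.medical_question:
        return "medical"
    if status.in_arrears:
        return "arrears"
    if status.stalled_turns >= MAX_STALLED_TURNS:
        return "stalled"
    if status.match_confidence < MIN_MATCH_CONFIDENCE:
        return "low_match_confidence"
    return None


def handoff_payload(reason: str, display_name: str, deeplink: str,
                    messages: list[str]) -> dict:
    """Bundle what the clinic sees: reason, display name, link, last six messages."""
    return {
        "reason": reason,
        "patient": display_name,  # display name only, never a BSN
        "intramed_link": deeplink,
        "last_messages": messages[-6:],
    }
```

Returning a named reason rather than a bare boolean makes it possible to see which trigger drives the handover rate when that number moves.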
Roughly one in seven threads ends in handover. That is the number we tune against. If it drops too low, we are probably overreaching. If it climbs, the prompts or the slot logic need work.
What the first six weeks looked like
The agent went live on a Tuesday at 11:00, behind a feature flag that scoped it to one Maastricht-Centrum clinic for the first ten days. The front-desk lead at that clinic was given the Slack handoff channel and an explicit veto: anything that looked off, she pulled the conversation back into her inbox, and we treated her bounce-back as the canonical bug report.
By Friday she had stopped opening WhatsApp Web on her second monitor. By the end of week three the per-thread median time from first patient message to confirmation template had dropped from a little over four minutes to under fifty seconds. We rolled the agent to the remaining three clinics in week four.
The numbers we actually watch:
- Threads handled end-to-end without a human: around 84%.
- Worker 422 bounces (sensitive payload blocked): 0 in the most recent fortnight, against 11 in the first three weeks.
- Patient complaints traced back to the agent: one, about tone, fixed in the system prompt.
- Front-desk hours freed per week, by the lead's own count: around 28.
The most useful side effect had nothing to do with WhatsApp. The audit log we built for the proxy turned out to be the cleanest record the chain had ever had of patient-facing communication. The data-protection officer pulled it during the next quarterly review and signed off the WhatsApp channel as a controlled processing activity inside the existing DPIA. That was the moment the project stopped being a pilot.
The five-minute audit
If you run a similar setup, open the network panel on whatever sends your messages to WhatsApp, Telegram, or any third-party channel. Look at the raw JSON of the next ten outbound payloads. Search for nine-digit numbers, IBANs, and email addresses. If you find anything you would not want a regulator to read in a year, you have a redaction problem and the fix is a proxy, not a prompt. When we built this for the Maastricht chain, the surprise was not the BSN leaks we expected; it was a stored procedure that returned the patient's verzekeringsnummer (insurance number) in a comment field nobody had thought to scrub. That is the kind of bug a proxy catches and a prompt never will. If you want to see how we structure these chat agents against legacy systems, the rest of the site is the longer answer.
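If you would rather script the audit than eyeball it, something like the following works on captured payloads. It is a rough sketch: the pattern names and the scan_payload function are made up for this example, the IBAN pattern covers Dutch IBANs only, and the email pattern is deliberately loose.

```python
import json
import re
from typing import Any, Iterator

PATTERNS = {
    "nine_digits": re.compile(r"\b\d{9}\b"),
    "iban": re.compile(r"\bNL\d{2}[A-Z]{4}\d{10}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def _strings(node: Any) -> Iterator[str]:
    """Yield every string value nested anywhere inside a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from _strings(item)


def scan_payload(raw_json: str) -> dict[str, list[str]]:
    """Return suspicious substrings per pattern name; empty dict if clean."""
    hits: dict[str, list[str]] = {}
    for text in _strings(json.loads(raw_json)):
        for name, pattern in PATTERNS.items():
            found = pattern.findall(text)
            if found:
                hits.setdefault(name, []).extend(found)
    return hits
```

An eleven-digit phone number in the "to" field does not trip the nine-digit pattern, because the word boundaries require exactly nine digits in a row. Every nine-digit hit still deserves a manual look, since this scan does not apply the elfproef.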
Key takeaway
If your chat agent touches Dutch health data, redaction belongs in proxy code at the boundary, not in the LLM prompt.
FAQ
Why a Cloudflare Worker proxy instead of redacting in the agent runtime?
A proxy lives at the boundary you actually care about. Even if the runtime, the prompt, or a tool result leaks, the Worker catches it before Meta logs it. The runtime is too easy to bypass from inside.
Can the LLM see the patient's BSN?
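No. The runtime resolves sensitive fields server-side and hands the model opaque tokens like PT_4f7a. The BSN held in Intramed is never in the model's context, so the model cannot return it in a message even if asked. A number the patient types into the chat is a different matter, which is why the Worker sweep runs on every outbound message regardless.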
What happens when WhatsApp's 24-hour messaging window closes?
Confirmations go out as a pre-approved utility template registered in Business Manager. Templates are exempt from the 24-hour customer-service window, so the message reaches the patient regardless of silence.
Why not migrate off Intramed first?
Wrong battle. A thin FastAPI shim over ODBC bought us a stable contract in two weeks. The chain can replace Intramed on its own timeline, and only the shim needs rewriting when it does.