Homecare chat agent: 2,140 weekly questions, one safe queue

At 23:14 on a Sunday, a mantelzorger asks when her mother's morning round will pass. The widget answers in eight seconds, without touching the medication dossier.

Jacob Molkenboer · Founder · A Brand New Company · 18 Jun 2026 · 9 min

It is Sunday, 23:14. A daughter in Wormerveer is awake again. Her mother, eighty-one, congestive heart failure, lives alone in a flat near the Zaanbocht. She wants to know when the morning round will pass tomorrow. She does not want to call. She wants a number on a screen so she can go to bed.

She types into the chat widget on the cooperative's website. Eight seconds later the answer is back: tussen 07:40 en 08:10, Carla rijdt. Between 07:40 and 08:10, Carla's shift. The daughter closes the tab. The widget logs the exchange and does not page anyone.

This post is about why that interaction is harder than it looks. We built the agent earlier this year for a thirty-three-person thuiszorg-coöperatie (home-care cooperative) in Zaanstad. It now handles around 2,140 questions a week from mantelzorgers (informal family carers). About three-quarters of them never reach a human. The rest are queued for the right human within sixty seconds, and none of them gets medication advice from the agent.

The cooperative behind the widget

Names and exact identifiers are out of this post for the obvious reason. The shape is this: thirty-three people, of whom twenty-two are wijkverpleegkundigen and verzorgenden IG (district nurses and certified carers), three are teamleiders, one is the bestuurder-zorgmanager, and the rest work in planning and back office. Roughly 410 clients across Zaanstad and the lintdorpen up to Krommenie. Mixed funding: most clients on Zvw wijkverpleging, a meaningful chunk on Wlz, a handful on Wmo huishoudelijke hulp via the gemeente. The ECD (electronic client record) is Nedap Ons, in production since 2014, deeply customised, with twelve years of routes, planningsprofielen and indicatie-templates behind it.

The cooperative is not large enough to staff a contact centre. Before the agent went live, every after-hours question landed in one shared inbox monitored on a rota by the teamleiders. The inbox averaged 380 messages a week, with a seventeen-hour median first response and the kind of weekend backlog that ends teamleider careers. The question for us was never “can a chat agent do this.” It was: what is the smallest set of questions we can answer without humans, and what is the largest set we are not allowed to answer at all?

Two hard rules, agreed on day one

Before any prompt, any retriever, any integration: two rules from the bestuurder, written into the project brief in plain Dutch.

Rule one. If a question concerns a client whose dossier carries a Wlz-flag — CIZ-indicatie, more complex care profile, different administrative rails — the agent must not respond from cache, RAG or paraphrase. It must hand off to the teamleider queue. Wlz clients move through different rates and different inspection regimes. A wrong answer about a Wlz client is not just embarrassing; it is a notifiable incident.

Rule two. Any question that mentions or implies medication — dosage, timing, interaction, “may my mother take her oxazepam now,” “she forgot her metoprolol” — is blocked at the widget. The agent never composes a medication answer. It tells the user a verpleegkundige will look at the dossier and call back, then writes the question to a four-eyes queue. Two nurses must read the dossier and the question before a response is sent. This is not us being conservative. It is what the Inspectie Gezondheidszorg en Jeugd would expect from a small thuiszorg-organisatie, and what the bestuurder did not want to defend on a Tuesday morning.

Talking to a twelve-year-old Nedap Ons

Nedap Ons has a modern API surface. OAuth2, JSON, versioned endpoints. On paper, integration is straightforward. In practice, an instance that has run since 2014 looks nothing like a fresh tenant. This one had:

Two parallel client-identifier schemes from a 2018 merger, with about sixty clients carrying both.
A custom indicatieprofiel field with thirty-one values, only nine of which were still in active use.
Three free-text fields that the planning team had quietly been using as structured data for seven years.
A scheduled job from 2016 that nightly rewrote certain status flags. Nobody on staff knew it existed. We found it because two of our queries returned different answers at 03:00 than at 09:00.

We did not touch Nedap Ons. We built a read-only adapter in front of it that did three things: normalise the identifier schemes, project indicaties through a current-vs-legacy mapping table maintained by the bestuurder, and snapshot the canonical client view every fifteen minutes into a Postgres on our side. The agent reads from the snapshot, never from Ons live. The snapshot is the source of truth for the widget; Ons is the source of truth for everything else.

This is not elegant. It is what twelve years of organic schema growth requires. If we had insisted on live reads we would still be debugging the nightly job.
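
The first two adapter steps, identifier normalisation and indicatie projection, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the production adapter: the ID formats, the mapping values, the `ClientView` shape and the choice to return `None` for an unmapped indicatie are all hypothetical, and no real Ons field names are used.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: legacy ID from the merged organisation -> canonical ID.
LEGACY_TO_CANONICAL = {"M-0042": "C-1007"}

# Hypothetical slice of the current-vs-legacy indicatie mapping table.
INDICATIE_MAP = {"wv-basis-2015": "wijkverpleging", "wlz-vv5-oud": "wlz"}


@dataclass(frozen=True)
class ClientView:
    """One row of the snapshot the widget reads from."""
    client_id: str
    indicatie: str
    wlz_flag: bool


def canonical_id(raw_id: str) -> str:
    """Collapse both identifier schemes onto one canonical ID."""
    return LEGACY_TO_CANONICAL.get(raw_id, raw_id)


def project(raw_id: str, raw_indicatie: str) -> Optional[ClientView]:
    """Build the snapshot row, or None when the indicatie is unmapped."""
    indicatie = INDICATIE_MAP.get(raw_indicatie)
    if indicatie is None:
        return None  # assumption: unmapped values wait for a human to map them
    return ClientView(canonical_id(raw_id), indicatie, indicatie == "wlz")
```

Keeping the mapping in plain data rather than code fits the description above: the bestuurder maintains the table, and the adapter only applies it.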

def classify_question(text: str, dossier: Dossier) -> Route:
    if mentions_medication(text):
        return Route.FOUR_EYES_NURSE  # rule two: hard block
    if dossier.wlz_flag:
        return Route.TEAMLEIDER_QUEUE  # rule one: hand off
    if is_planning_question(text) and dossier.has_active_route():
        return Route.AUTO_ANSWER
    return Route.TEAMLEIDER_QUEUE  # default: human

def route(q: Question) -> None:
    dossier = snapshot.load(q.client_id)  # never Ons live
    target = classify_question(q.text, dossier)
    sla = None if target == Route.AUTO_ANSWER else 60
    queue.dispatch(target, q, sla_seconds=sla)

The classifier has two layers. A regex and keyword pass for medication terms runs first, without involving the LLM; a small fine-tuned Dutch intent model then handles everything that passes. The medication check covers around 380 generic and brand names plus common misspellings, and we update it monthly from the Farmacotherapeutisch Kompas vocabulary. It is not clever. It does not need to be.
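
A keyword pass of this kind might look like the sketch below. The term list is a hypothetical seven-entry sample standing in for the roughly 380 names mentioned above, and the prefix-matching rule is an assumption made for illustration, not a description of the production regex.

```python
import re

# Hypothetical sample: drug names, one common misspelling, generic terms.
MEDICATION_TERMS = [
    "oxazepam", "metoprolol", "metroprolol", "paracetamol",
    "medicatie", "dosering", "tablet",
]

# Match any term at the start of a word and allow suffixes, so that
# "tabletten" is caught by "tablet".
_MED_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in MEDICATION_TERMS) + r")\w*",
    re.IGNORECASE,
)


def mentions_medication(text: str) -> bool:
    """Return True when the text contains any listed medication term."""
    return _MED_PATTERN.search(text) is not None
```

A pass like this only catches explicit mentions. Questions that merely imply medication would still depend on the intent model or on the human default route.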

Sixty seconds in the teamleider queue

When a Wlz-flagged question lands, the agent does three things in the same call: it writes an entry to the teamleider queue with full context and dossier links, it sends an iOS push to the teamleider on rota, and it tells the mantelzorger that a human will look at this within the hour. The SLA we wrote into the contract is sixty seconds from question to queue entry plus push. Median in production is 4.1 seconds. Ninety-fifth percentile is 22 seconds. The slow tail is almost entirely cold-start on the Ons snapshot reader after long idle periods, which we have not bothered to fix because nothing breaks at 22 seconds.
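
For readers who want to track the same figures, here is a minimal sketch of a median, p95 and breach count over hand-off latencies. The nearest-rank percentile method and the function names are assumptions; the post does not say how the production numbers are computed.

```python
from statistics import median

SLA_SECONDS = 60  # from the contract: question to queue entry plus push


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding float rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def sla_report(latencies: list[float]) -> dict[str, float]:
    """Summarise hand-off latencies (in seconds) against the SLA."""
    return {
        "median": median(latencies),
        "p95": percentile(latencies, 95),
        "breaches": sum(1 for s in latencies if s > SLA_SECONDS),
    }
```

Counting breaches separately from the percentiles matters here, because a contractual SLA is broken by a single slow hand-off even when the p95 looks healthy.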

The queue itself is a small React view. It shows the question, the client (name, geboortejaar, route, current indicatie, Wlz status), the last three contactmomenten, and a one-line summary the agent generated. The teamleider taps behandeld (handled) or doorzetten naar verpleegkundige (forward to a nurse). That is the entire interface. We resisted every request to add filtering, sorting and dashboards for the first six months on purpose. A queue that looks like a tool gets used like a tool. A queue that looks like an inbox gets ignored like an inbox.

Medicatie: four eyes, no exceptions

The medication pipeline is deliberately the most boring system in the stack. When the keyword pass hits, the agent emits a fixed Dutch response: “Voor vragen over medicatie kijkt een verpleegkundige eerst in het dossier. We bellen u binnen twee uur terug op het nummer dat bij ons bekend is. Klopt dat nummer nog?” (In English: “For questions about medication, a nurse first looks at the dossier. We will call you back within two hours on the number we have on file. Is that number still correct?”) The number is then confirmed in-widget. The question drops into a separate queue that two nurses must clear before the call is logged as resolved. We log both reviewer IDs and the timestamps. Audit-ready by design.
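
The four-eyes rule can be made structural rather than procedural. The sketch below is a hypothetical data model, not the production queue: an item counts as resolved only once two distinct reviewer IDs have signed off, and each sign-off carries its timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FourEyesItem:
    """A medication question waiting for two independent nurse reviews."""
    question: str
    # Reviewer ID -> sign-off time (UTC). Keys are unique by construction.
    reviews: dict[str, datetime] = field(default_factory=dict)

    def sign_off(self, reviewer_id: str) -> None:
        # A repeat sign-off by the same nurse keeps the first timestamp
        # and does not count as a second pair of eyes.
        self.reviews.setdefault(reviewer_id, datetime.now(timezone.utc))

    @property
    def resolved(self) -> bool:
        return len(self.reviews) >= 2
```

Because reviews are keyed by reviewer ID, one nurse clicking twice cannot clear the item, and the same dictionary doubles as the audit record of who reviewed and when.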

In nine months of production traffic, roughly 78,000 mantelzorger interactions, the agent has produced zero medication answers. Not because it learned not to. Because it cannot. The path through the code that would generate one does not exist.

Takeaway

The safety of an agent in a regulated domain is set by what it cannot do, not by how well it is told not to do things.

What 2,140 questions a week actually look like

Of the roughly 2,140 questions the widget sees in a typical week:

About 1,610 are answered fully by the agent. The bulk are planning questions (“when is the next visit”), route confirmations, sick-leave notifications by mantelzorgers, and address or phone changes that the agent writes back through the adapter to a te-controleren queue for the planning team.
About 380 land in the teamleider queue. Most are Wlz-flagged routine questions that we could in principle answer; the bestuurder has not yet relaxed rule one and may never. About forty of those are genuinely complex.
About 150 land in the medication four-eyes queue. Slightly under half are about timing rather than dosage and resolve in a single nurse callback.
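
As a quick check on that breakdown, the three routes can be turned into shares of the weekly total. The dictionary keys are illustrative labels; the counts are the ones listed above.

```python
# Weekly counts per route, as listed above.
WEEKLY = {
    "auto_answered": 1610,
    "teamleider_queue": 380,
    "four_eyes_queue": 150,
}

total = sum(WEEKLY.values())

# Percentage of weekly questions per route, rounded to one decimal.
shares = {route: round(100 * n / total, 1) for route, n in WEEKLY.items()}
```

The counts add up to the 2,140 weekly questions: about 75% are answered by the agent, about 18% go to a teamleider, and about 7% go to the four-eyes queue.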

The numbers we did not expect: 61% of questions land outside 08:00–17:30; 88% come from mantelzorgers rather than clients themselves; and mantelzorgers learned to ask shorter, more answerable questions within two weeks of launch. Behaviour adapts to the interface faster than you would think.

What we would do differently

Two things.

First, we would build the snapshot layer before the agent, not alongside it. We lost two weeks chasing intermittent answers that turned out to be the 2016 nightly job. If you are integrating with a long-lived ECD, assume there is at least one undocumented scheduled job and at least one free-text field used as structured data. Audit before you build. Run the same read query at 02:00, 09:00 and 17:00 for a week and diff the results. Boring; cheap; will save a sprint.
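
The diff step of that audit can be very small. The sketch below assumes each run of the read query has been saved as a mapping from client ID to a flat dictionary of field values; the function name and data shape are hypothetical.

```python
from typing import Mapping

Row = Mapping[str, str]


def diff_runs(
    earlier: Mapping[str, Row], later: Mapping[str, Row]
) -> dict[str, dict[str, tuple]]:
    """Return, per client ID, the fields whose value changed between runs."""
    changes = {}
    # Only compare clients present in both runs.
    for client_id in earlier.keys() & later.keys():
        before, after = earlier[client_id], later[client_id]
        changed = {
            name: (before.get(name), after.get(name))
            for name in before.keys() | after.keys()
            if before.get(name) != after.get(name)
        }
        if changed:
            changes[client_id] = changed
    return changes
```

Any field that changes between the 02:00 and 09:00 runs without a matching user action is a candidate for an undocumented scheduled job.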

Second, we would put the teamleider queue interface in front of users earlier. The first version we shipped had the queue, the SLA, the push notifications, and a teamleider experience we had not stress-tested at 11pm on a tired phone over 4G. The first weekend, two teamleiders missed pushes because the queue view took four seconds to load. We rewrote it in a day. We should have rewritten it the week before launch.

What you can do this afternoon

If you run support or operations for a regulated business — care, finance, legal, anything with an inspector — and you are thinking about a chat agent, the most useful five minutes you can spend is to write down two lists. List one: every question type that, answered wrongly, is a notifiable incident. List two: every question type that is genuinely high-volume and low-risk. Hard-block list one in the classifier. Build the agent for list two. The middle, where most projects die, you do not need yet.

When we built this widget for the Zaanstad cooperative, the thing we kept running into was that the safety constraints were not a feature on a roadmap. They were the architecture. We ended up shipping fewer capabilities than the brief asked for, and the bestuurder was happier for it. That is broadly the pattern we have settled into for chat agents in Dutch care, legal and finance work: build to the hard rules first, then see what the agent can usefully do inside them.

FAQ

Can a chat agent legally answer medication questions in Dutch thuiszorg?

Not in any pattern we would defend to the IGJ. We hard-block medication questions at the classifier and route them into a four-eyes queue where two nurses must read the dossier before a response is sent.

Why route every Wlz-flagged question to a teamleider instead of answering it?

Wlz clients have different administrative rails, rates and inspection regimes. The marginal time saved by auto-answering is not worth the audit risk for a small thuiszorg-organisatie.

How do you integrate a chat agent with a twelve-year-old Nedap Ons instance?

Read-only adapter with a fifteen-minute Postgres snapshot in front of Ons. The agent never queries Ons live. Audit for legacy free-text fields used as structured data and undocumented scheduled jobs first.

How fast does the teamleider queue actually respond?

Sixty-second SLA from question to queue entry plus push. Median in production is 4.1 seconds, p95 is 22 seconds. The tail is cold-start on the snapshot reader after long idle periods.
