
Chat agent for verzuim intake: a Leuven case study

It is 7:45 on a Monday in Leuven. Eighty calls are in the queue. Nine months in, one chat agent handles 1,260 verzuim-meldingen (sick-leave reports) a week in Dutch and French.

Jacob Molkenboer · Founder · A Brand New Company · 24 May 2025 · 8 min


It is 7:45 on a Monday in Leuven. The intake desk for an occupational-health service has eighty calls in the queue and three people on the phones. By 09:30, the queue will peak at 220. Every conversation has the same shape: name, employer, kind of illness, how long, when can we expect a return. Each one ends with a summary that has to land inside Humannet, the legacy verzuim (sick-leave) management system the service has run since 2013, and it has to be there before the employer's first coffee.

For nine months we have been running an intake chat agent that catches the first wave of that queue. It now handles 1,260 conversations a week in Dutch and French. Median time from "ik wil mij ziekmelden" ("I want to call in sick") to a clean Arbowet-aligned summary written into Humannet: 47 seconds, with p95 at 71 seconds. The intake team went from 11 people to 4. Nobody was let go. Seven moved into case management for the long-term files, which is where they wanted to be.

This is how that agent works, what we had to argue with, and the parts that almost killed the project.

The shape of the queue before the agent

The service has 36 employees and roughly 700 client employers spread across Vlaanderen and the southern Randstad. Their employees report sick through three lanes: a phone number, a web form, and an email address nobody admits to monitoring. The phone number was 71% of volume. The web form, 22%. Email, 7%, and almost guaranteed to be late.

The structure of a verzuim-melding does not change. You need the melder's name and BSN (Dutch citizen service number) or rijksregisternummer (Belgian national register number), the employer, the type of melding (first day, vervolg for a follow-up, betermelding for a return-to-work report), the symptom cluster in the melder's own words, an estimated duration, and a flag if there is any indication of psychosocial cause. The Arbowet for Dutch employers and the Codex over het welzijn op het werk for Belgian ones both want those fields, named slightly differently, on the same day.

Before the agent, intake clerks asked these questions in a fixed order, typed the answers into a hand-rolled web app, and then re-typed about 60% of them into Humannet because the export connector had stopped working in 2019 and nobody wanted to be the one to fix it.
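The fixed question order maps directly onto data. A minimal sketch, assuming the agent collects answers into a plain dict; the key names `identificatie` and `klacht` are hypothetical, and the psychosocial flag is left out because the triage rules derive it instead of asking for it.

```python
from typing import Optional

# Hypothetical key names; the order follows the fields a verzuim-melding needs.
# The psychosocial flag is not asked for: the triage rules derive it.
INTAKE_ORDER = (
    "melder_naam",
    "identificatie",             # BSN (NL) or rijksregisternummer (BE)
    "werkgever",
    "type_melding",              # "eerste", "vervolg" or "beter"
    "klacht",                    # symptom cluster in the melder's own words
    "vermoedelijke_duur_dagen",
)


def next_field(answers: dict) -> Optional[str]:
    """Return the first field still unanswered, or None once intake is complete."""
    for field in INTAKE_ORDER:
        if answers.get(field) in (None, ""):
            return field
    return None


print(next_field({"melder_naam": "A. Voorbeeld", "identificatie": "000000000"}))  # werkgever
```

Because the order lives in one tuple, adding or dropping a question is a one-line change, and the agent can never skip ahead of a missing field.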

Why the language layer had to be solved first

Leuven sits close enough to the language border that about 18% of the service's caller base speaks French as a first language. Until the agent, this had been handled by routing every French caller to one of two clerks who happened to be Walloon. Both were on the verge of leaving because they were doing 40% of the French volume with 18% of the intake headcount.

The first thing we built was not the agent. It was a small classifier that listens to the first three messages from the user, decides the language, and locks the conversation. Dutch and French are linguistically far enough apart that this is almost a non-problem at the token level. The harder case was Brussels callers who switch mid-sentence. We resolved this by treating language as a session variable, not a turn variable: once locked, the agent stays. If the melder explicitly asks "kan ik in het Frans verder?" ("can I continue in French?"), we re-lock.
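A minimal sketch of language as a session variable, assuming a detector callable that returns "nl" or "fr". The class name and the switch-phrase table are hypothetical, and the substring match is deliberately naive; the point is only that per-turn detection stops once the lock is set, and that an explicit request is the one way to move it.

```python
from typing import Callable, Optional

# Hypothetical explicit switch requests; the real phrase list is not published.
SWITCH_REQUESTS = {
    "in het frans": "fr",
    "in het nederlands": "nl",
    "en français": "fr",
    "en néerlandais": "nl",
}


class LanguageSession:
    """Language is decided once per session, then held until the melder asks to switch."""

    def __init__(self, detect: Callable[[str], str], window: int = 3) -> None:
        self._detect = detect              # any detector returning "nl" or "fr"
        self._window = window              # number of opening messages to listen to
        self._opening: list[str] = []
        self.lang: Optional[str] = None

    def observe(self, message: str) -> Optional[str]:
        lowered = message.lower()
        for phrase, lang in SWITCH_REQUESTS.items():
            if phrase in lowered:          # explicit request: re-lock
                self.lang = lang
                return self.lang
        if self.lang is None:
            self._opening.append(message)
            if len(self._opening) >= self._window:
                self.lang = self._detect(" ".join(self._opening))
        return self.lang                   # once locked, later turns never re-detect
```

A Brussels melder who drops into French mid-conversation stays in the locked language; only a sentence like "kan ik in het Frans verder?" changes it.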

We did not let the LLM pick the language on vibes. The classifier is a deterministic fastText model with a confidence floor. Below the floor, the agent asks, in both languages: "Liever Nederlands of Frans? / Néerlandais ou français?" ("Dutch or French?"). That single question, asked in roughly 3% of conversations, removed an entire category of complaint. A chat agent that guesses the wrong language is worse than one that asks. Language belongs in a deterministic step, not an LLM judgment.
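The confidence floor fits in one function. A sketch under stated assumptions: `classify` is any callable returning a (label, probability) pair, for example a thin wrapper around fastText's `model.predict(text, k=1)`, whose language-ID labels look like `__label__nl`; the floor of 0.9 is illustrative, since the real value is not given here.

```python
from typing import Callable, Optional, Tuple

CONFIDENCE_FLOOR = 0.9   # assumption: illustrative value only
BILINGUAL_PROMPT = "Liever Nederlands of Frans? / Néerlandais ou français?"


def decide_language(
    text: str,
    classify: Callable[[str], Tuple[str, float]],
    floor: float = CONFIDENCE_FLOOR,
) -> Optional[str]:
    """Return "nl" or "fr" when the classifier clears the floor, else None (meaning: ask)."""
    label, prob = classify(text)
    lang = label.replace("__label__", "")   # fastText-style labels look like "__label__nl"
    if lang in ("nl", "fr") and prob >= floor:
        return lang
    return None
```

When `decide_language` returns None, the agent sends `BILINGUAL_PROMPT` instead of guessing. Any language other than Dutch or French also falls through to the question.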

The Humannet integration was the project

The LLM was the easy part. The hard part of this build was always going to be writing a clean summary into a thirteen-year-old verzuim system whose API is a SOAP endpoint documented in a PDF last touched in 2017.

We had two options. Option A: rebuild the connector properly, with retries, idempotency, and a dead-letter queue. Option B: drive Humannet headlessly with Playwright like a person typing. Option A would have been the right answer if the SOAP service had not started returning HTTP 200 with an XML envelope containing the string InternalServerError as its body. We tried it. We logged 600 false-success writes in the first week of staging before we caught it. We moved to Option B.
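Under the SOAP 1.1 HTTP binding, a fault is supposed to come back as HTTP 500, so a client that trusts the status code reads these responses as successes. A minimal sketch of the extra check Option A would have needed, assuming SOAP 1.1 envelopes; `check_soap_write` is a hypothetical name, and the `InternalServerError` substring test mirrors the behaviour described above rather than any documented Humannet contract.

```python
import xml.etree.ElementTree as ET

SOAP11_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


class SoapWriteError(Exception):
    """Raised for any write that must not be counted as a success."""


def check_soap_write(status: int, body: str) -> None:
    """Accept a write only if the status AND the envelope both say success."""
    if status != 200:
        raise SoapWriteError(f"HTTP {status}")
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        raise SoapWriteError("response body is not well-formed XML") from exc
    # A SOAP 1.1 Fault element means failure, whatever the HTTP status said.
    if root.find(f".//{{{SOAP11_ENV}}}Fault") is not None:
        raise SoapWriteError("SOAP Fault inside an HTTP 200 response")
    # The failure mode seen in staging: an error string in the body of a 200.
    if "InternalServerError" in body:
        raise SoapWriteError("InternalServerError inside an HTTP 200 response")
```

A check like this only stops false successes from being counted; it does not make the endpoint reliable, which is why we still moved to Option B.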

So the agent now writes summaries through a headless browser that logs into Humannet, navigates to the verzuim module, opens the new melding form, fills the fields, and saves. The whole drive takes about 4.5 seconds when the network is friendly. The form layout has changed twice in nine months. Both times, the selectors broke loudly. Loudly is the only kind of broken you want.

We wrote a small shim that maps the agent's structured summary onto Humannet's form field IDs, runs a dry-run against a staging tenant every night, and emails the service's operations lead if a selector goes stale. In nine months it has fired three times. All three were caught before the morning queue.

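```python
from typing import Literal, TypedDict

from playwright.sync_api import Page

# structured summary → humannet form mapping
class Summary(TypedDict):
    melder_naam: str
    werkgever_id: str              # mapped from agent's freeform "werkgever"
    type_melding: Literal["eerste", "vervolg", "beter"]
    klacht_kort: str               # max 280 chars, no PII beyond name
    vermoedelijke_duur_dagen: int
    psychosociaal_flag: bool       # routes to bedrijfsarts queue if True
    taal: Literal["nl", "fr"]      # session language; not written to the form by this mapping

# Form element ids contain ":", which must be escaped in CSS selectors.
HUMANNET_FIELDS = {
    "melder_naam":              "#meldingForm\\:naamMelder",
    "werkgever_id":             "#meldingForm\\:werkgeverSelect",
    "type_melding":             "#meldingForm\\:typeRadio",
    "klacht_kort":              "#meldingForm\\:klachtTextarea",
    "vermoedelijke_duur_dagen": "#meldingForm\\:duurInput",
    "psychosociaal_flag":       "#meldingForm\\:psychosocCheckbox",
}

def fill(page: Page, selector: str, value: object) -> None:
    """Set one form field, choosing the Playwright call by widget type."""
    if isinstance(value, bool):                    # bool first: it subclasses int
        page.set_checked(selector, value)
    elif selector.endswith("Select"):
        page.select_option(selector, value=str(value))
    elif selector.endswith("Radio"):
        # Assumption: the radio group wraps <input> elements whose value is the option.
        page.check(f"{selector} input[value='{value}']")
    else:
        page.fill(selector, str(value))

def write_melding(page: Page, summary: Summary) -> None:
    if len(summary["klacht_kort"]) > 280:
        raise ValueError("klacht_kort exceeds 280 characters")
    for key, selector in HUMANNET_FIELDS.items():
        fill(page, selector, summary[key])
    page.click("#meldingForm\\:opslaanButton")
    # Raises a Playwright TimeoutError (fails loudly) if no confirmation within 4 s.
    page.wait_for_selector("text=Melding opgeslagen", state="visible", timeout=4000)
```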
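The nightly selector watch can start as a presence check over a field mapping like `HUMANNET_FIELDS`. A minimal sketch, assuming a page object with Playwright's `query_selector` method (which returns `None` when nothing matches); the function names and the alert format are hypothetical, and a fuller version would also perform the dry-run save against the staging tenant.

```python
from email.message import EmailMessage
from typing import Optional, Protocol


class SupportsQuery(Protocol):
    # Playwright's sync Page.query_selector returns an element handle or None.
    def query_selector(self, selector: str) -> object: ...


def stale_selectors(page: SupportsQuery, fields: dict[str, str]) -> list[str]:
    """Return the summary keys whose form selector no longer matches anything."""
    return [key for key, sel in fields.items() if page.query_selector(sel) is None]


def selector_alert(stale: list[str], to_addr: str) -> Optional[EmailMessage]:
    """Build the alert mail for the operations lead; None when nothing is stale."""
    if not stale:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Humannet selector watch: {len(stale)} stale selector(s)"
    msg["To"] = to_addr
    msg.set_content("Stale fields:\n" + "\n".join(stale))
    return msg
```

Sending the alert is one standard-library call, `smtplib.SMTP(host).send_message(msg)`; the SMTP host and recipient are deployment details not given here.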

Burn-out as a hard route, not a probability

In Belgian and Dutch occupational health, the cost of missing a burn-out signal early is measured in years. The service told us, plainly, at the second meeting: "als de agent twijfelt, stuur door naar een mens. Dat betalen wij wel." ("If the agent is in doubt, send it to a human. We will pay for that.")

We took that seriously. The psychosocial flag fires on any of:

The melder uses the word burn-out, overspannen, épuisement, fatigue chronique, overprikkeld, or anything within two edits of those
The melder mentions sleep loss for more than two weeks
The melder describes a workplace conflict and a physical symptom in the same turn
The melder has had a vervolg-melding in the last 8 weeks
The classifier's confidence is below 0.7 and the conversation has run longer than 8 turns
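The five rules translate into a short, auditable function. A minimal sketch, assuming an upstream step has already extracted sleep-loss duration and conflict or symptom mentions into the hypothetical `Signals` fields; "within two edits" is implemented here as Levenshtein distance over single words and two-word windows. Each rule returns a named reason, so the trigger that fired can travel with the handover.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

KEYWORDS = ("burn-out", "overspannen", "épuisement", "fatigue chronique", "overprikkeld")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def keyword_hit(text: str, max_edits: int = 2) -> Optional[str]:
    """Return the first keyword matched within max_edits over word windows of its length."""
    tokens = re.findall(r"[\w-]+", text.lower())
    for kw in KEYWORDS:
        n = len(kw.split())
        for i in range(len(tokens) - n + 1):
            if edit_distance(" ".join(tokens[i:i + n]), kw) <= max_edits:
                return kw
    return None


@dataclass
class Signals:
    text: str                          # the melder's current turn
    sleep_loss_days: int = 0           # hypothetical: extracted upstream
    conflict_in_turn: bool = False     # hypothetical: workplace conflict in this turn
    symptom_in_turn: bool = False      # hypothetical: physical symptom in this turn
    last_vervolg: Optional[date] = None
    classifier_confidence: float = 1.0
    turns: int = 0


def psychosocial_triggers(s: Signals, today: date) -> list[str]:
    """Return every rule that fired; any non-empty result means hand over."""
    reasons = []
    kw = keyword_hit(s.text)
    if kw:
        reasons.append(f"keyword:{kw}")
    if s.sleep_loss_days > 14:
        reasons.append("sleep_loss_over_two_weeks")
    if s.conflict_in_turn and s.symptom_in_turn:
        reasons.append("conflict_plus_symptom")
    if s.last_vervolg is not None and today - s.last_vervolg <= timedelta(weeks=8):
        reasons.append("recent_vervolg")
    if s.classifier_confidence < 0.7 and s.turns > 8:
        reasons.append("low_confidence_long_conversation")
    return reasons
```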

Any of those, and the conversation hands off to whichever bedrijfsarts (occupational physician) is on call, with a structured handover packet: the full transcript, the trigger reason, the melder's history if we have it, and the start time of the conversation. The doctor takes over inside the same chat. The melder sees a single sentence: "Ik laat onze bedrijfsarts hier even bij aansluiten." / "Je transfère le médecin du travail dans cette conversation." (roughly: "I'm bringing our company doctor into this conversation."). That sentence has been A/B tested into the ground. Earlier versions caused panic. This one does not.
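The handover packet is a small, fixed record. A sketch, assuming JSON as the wire format to the doctor's side; `HandoverPacket` and `hand_over` are hypothetical names, while the two sentences are the ones quoted above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional

# The one sentence the melder sees, keyed by the locked session language.
HANDOVER_SENTENCE = {
    "nl": "Ik laat onze bedrijfsarts hier even bij aansluiten.",
    "fr": "Je transfère le médecin du travail dans cette conversation.",
}


@dataclass(frozen=True)
class HandoverPacket:
    transcript: list[str]                        # full transcript, in order
    trigger_reason: str                          # which rule fired
    started_at: datetime                         # conversation start time
    melder_history: Optional[list[str]] = None   # prior meldingen, if we have them

    def to_json(self) -> str:
        data = asdict(self)
        data["started_at"] = self.started_at.isoformat()
        return json.dumps(data, ensure_ascii=False)


def hand_over(packet: HandoverPacket, taal: str) -> tuple[str, str]:
    """Return (message for the melder, JSON payload for the on-call bedrijfsarts)."""
    return HANDOVER_SENTENCE[taal], packet.to_json()
```

Keeping the melder-facing sentence in a lookup table rather than generating it is what lets a tested wording stay fixed.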

The trigger fires on about 4.2% of conversations. In the first month, the on-call rotation pushed back hard and said it was too many. We held the line. After three months they stopped complaining. After six months, one of them said on a coffee break that two cases the agent had flagged in week 14 were already in long-term sick leave. Both came back to part-time work in week 16 because the contact had been early.

Warning

If you build a triage agent and the humans on the receiving end do not push back at first, your trigger is not firing often enough. Route enough that they push back, then hold.

The 60-second SLA

The brief said 60 seconds from "ik wil mij ziekmelden" to a clean summary written into Humannet. We hit 60 seconds at p50 in week three, at p90 by month two, and at p95 by month five. The thing that pulled the tail in was not the LLM. The LLM was the same model the whole time. It was the Humannet drive.

Specifically: the Humannet login session expires every 30 minutes, and the re-login flow has a captcha. We built a session pool: five logged-in browser contexts, rotated, and refreshed at 25 minutes, before the session can expire. The captchas are solved during off-hours by the night-shift triage coordinator, who is paid extra for it. Yes, a person solves five captchas a night. Yes, this is the boring answer. No, we have not found a better one, and we have looked.
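A minimal sketch of the pool, assuming login and captcha happen outside it and that any authenticated request resets Humannet's 30-minute timer. `SessionPool`, the `keep_alive` callable and the injectable clock are hypothetical; the clock parameter exists only to make the 25-minute rule testable.

```python
import itertools
import time
from typing import Callable, Generic, TypeVar

Ctx = TypeVar("Ctx")


class SessionPool(Generic[Ctx]):
    """Round-robin pool of already-logged-in contexts, kept alive before they expire."""

    def __init__(
        self,
        contexts: list[Ctx],
        keep_alive: Callable[[Ctx], None],       # hypothetical: e.g. one cheap authenticated request
        refresh_after_s: float = 25 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._contexts = contexts
        self._keep_alive = keep_alive
        self._refresh_after_s = refresh_after_s
        self._clock = clock
        start = clock()
        self._last_refresh = [start] * len(contexts)
        self._next = itertools.cycle(range(len(contexts)))

    def acquire(self) -> Ctx:
        """Hand out contexts in rotation so load spreads across all sessions."""
        return self._contexts[next(self._next)]

    def tick(self) -> int:
        """Call periodically; refreshes every context idle for 25 minutes, returns the count."""
        now = self._clock()
        refreshed = 0
        for i, ctx in enumerate(self._contexts):
            if now - self._last_refresh[i] >= self._refresh_after_s:
                self._keep_alive(ctx)
                self._last_refresh[i] = now
                refreshed += 1
        return refreshed
```

The sketch is single-threaded. With concurrent writes, a context would also need to be locked while a write is driving it.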

Watching it work

The agent has one dashboard. It is unloved and ugly and it works. Live conversation count, language split, p50 and p95 latency, the last 50 handovers with the trigger that fired and whether the doctor accepted the routing, the last 50 Humannet writes with a green or red status, and the selector-watch result from last night's dry run. The service's operations lead checks it twice a day. Nobody else looks. That is fine. The point of a narrow agent's dashboard is not to convince anyone; it is to catch the one thing that broke before the queue does.
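The state behind a dashboard like this is small. A minimal sketch, assuming nearest-rank percentiles and a 1,000-conversation latency window (how p50 and p95 are actually computed is not stated); only the two windows of 50 come from the description above, and the `Dashboard` class is hypothetical.

```python
from collections import Counter, deque
from typing import Optional


def nearest_rank(values, pct: int) -> Optional[float]:
    """Nearest-rank percentile for an integer pct in 1..100; None for an empty window."""
    if not values:
        return None
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100    # ceil(pct / 100 * n) in integer arithmetic
    return ordered[max(rank, 1) - 1]


class Dashboard:
    def __init__(self, latency_window: int = 1000) -> None:   # assumption: window size
        self.latencies_s: deque = deque(maxlen=latency_window)
        self.handovers: deque = deque(maxlen=50)   # (trigger, doctor_accepted)
        self.writes: deque = deque(maxlen=50)      # True = green, False = red
        self.languages: Counter = Counter()        # "nl" / "fr" per conversation

    def snapshot(self, live_conversations: int, nightly_watch_ok: bool) -> dict:
        return {
            "live": live_conversations,
            "language_split": dict(self.languages),
            "p50_s": nearest_rank(self.latencies_s, 50),
            "p95_s": nearest_rank(self.latencies_s, 95),
            "last_handovers": list(self.handovers),
            "red_writes": sum(1 for ok in self.writes if not ok),
            "selector_watch_ok": nightly_watch_ok,
        }
```

Bounded deques keep the "last 50" views honest without any cleanup job: old entries simply fall off the left end.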

Quiet agent, narrow scope

There is a pattern in the agent discourse this autumn that we have been thinking about. Stories of agents going wide, scanning networks they had no business scanning, running up bills their operators did not authorise. Stories of flagship models being so eagerly proactive that their users spend half the conversation fighting them off. We read those and felt grateful that this agent does the opposite. It does one thing. It asks the questions in a fixed order, it writes one summary, it routes when it should route. It has no tools beyond the Humannet shim and a single function for handover. It does not browse. It does not send email. It does not summarise anything except a verzuim-melding.
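The narrow tool surface can be enforced in code rather than left to the prompt. A minimal sketch, assuming the model's tool calls arrive as a name plus keyword arguments; `make_dispatcher`, `ToolNotAllowed` and both lambdas are hypothetical stand-ins for the Humannet shim and the handover function.

```python
from typing import Any, Callable, Mapping


class ToolNotAllowed(Exception):
    """The model asked for a tool outside the allowlist."""


def make_dispatcher(tools: Mapping[str, Callable[..., Any]]) -> Callable[..., Any]:
    allowed = dict(tools)   # private copy: later changes to `tools` do not widen the allowlist

    def dispatch(name: str, **kwargs: Any) -> Any:
        if name not in allowed:
            raise ToolNotAllowed(name)
        return allowed[name](**kwargs)

    return dispatch


# Hypothetical stand-ins for the agent's only two tools.
dispatch = make_dispatcher({
    "write_melding": lambda summary: f"written:{summary['melder_naam']}",
    "handover": lambda reason: f"handover:{reason}",
})
```

Any request to browse or send email fails at the dispatcher with `ToolNotAllowed`, whatever the model was persuaded to ask for.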

Narrow agents that do one thing, watched every day by the people whose work they touch, are the boring shape of useful AI. The work was not in the model. It was in the field mapping, the language lock, the selector watch, the captcha pool, the trigger thresholds, the handover sentence, and the conversation with the on-call doctor in month one who told us we were wrong. We were wrong. We changed.

What month nine looks like

1,260 verzuim-melding conversations per week, up from 980 phone-plus-web before the agent
47-second median time-to-Humannet, 71-second p95
4.2% routed to bedrijfsarts on triage flag, of which 71% were assessed by the doctor as correctly routed
Zero compliance escalations from employers since week two
Intake team moved from 11 to 4, with seven into long-term case management
The two French-speaking intake clerks are both still at the service. Both told us they wanted to stay.

What you can do this week

If you run an intake desk of any kind, the leverage you have, with or without an agent, is structure. Write down the seven fields a clean intake actually needs. Stop asking the other twelve that have crept in over the years. If a person can land an intake conversation cleanly into the system of record in under two minutes by hand, an agent can do it in 60 seconds. If a person cannot, no agent will save you, because the agent will be slow in exactly the places your humans are slow today.

When we built this for the Leuven service, the thing that almost stopped the project was the legacy Humannet API and the SOAP envelopes that lied to us. We solved it by driving the UI like a careful human and watching the selectors every night. If you are looking at a similar build, or sitting in front of a 13-year-old Belgian or Dutch system that does not want to be integrated with, that kind of narrow, observed AI agent work is what we do. Show us the queue.

Key takeaway

Narrow chat agents win on intake when the field map, the language lock, and the handover threshold are settled before a single LLM call.

FAQ

How does the agent decide which language to use?

A deterministic fastText classifier locks the language after the first three messages, with a confidence floor. Below the floor, the agent asks in both languages whether the melder prefers Dutch or French.

Why drive Humannet through a browser instead of using its SOAP API?

The SOAP endpoint returns HTTP 200 with error strings inside the XML body. We logged 600 false-success writes in staging before moving to a headless browser flow.

What triggers the burn-out routing to a bedrijfsarts?

Specific keywords (or words within two edits of them), sleep loss over two weeks, a workplace conflict plus a physical symptom in one turn, a vervolg-melding within 8 weeks, or low classifier confidence past 8 turns. Any one fires.

How many intake staff lost their jobs?

None. The service moved seven of the eleven into long-term case management. Four stayed on live intake, handling edge cases and French-speaking escalations.

What happens when a Humannet form selector breaks?

A nightly dry-run against a staging tenant detects the change and emails the operations lead. In nine months it has fired three times, all caught before the morning queue.

