GGZ chat agent case study: 30 seconds to C-SSRS escalation

A 22-person Utrecht GGZ-praktijk needed to triage 1,720 weekly chat questions across a 13-year-old EPD and a homegrown ROM-archief without missing a C-SSRS ≥4 signal. Here is what we built.

Jacob Molkenboer · Founder · A Brand New Company · 19 May 2026 · 9 min

07:42 on a Tuesday

A client opens the portal of a Utrecht GGZ-praktijk and types one line: "Ik weet niet of ik dit nog kan." ("I don't know if I can still do this.") It is 07:42. The receptionist is not in until 08:30. The intake-coördinator is in a team meeting until 10:00. The behandelaar opens the inbox after the first session of the day, around 11:15.

Three and a half hours of silence in a chatbox at a mental-health practice is a failure mode on its own. Add the Wkkgz (the Dutch Healthcare Quality, Complaints and Disputes Act) and it is also a legal exposure. Once the praktijk is on notice of a calamiteit, the clock to file with the Inspectie Gezondheidszorg en Jeugd starts ticking. The clock does not pause because the inbox was busy.

This is the post about how we cut those three and a half hours to thirty seconds, on top of a thirteen-year-old EPD.

The stack we inherited

The praktijk has 22 people on payroll: 9 behandelaren (GZ-psychologen, klinisch psychologen, one psychiater), 6 basis-psychologen, 4 in administration, 2 in management, and one ICT role carried by a single contractor on a 12-hour-per-week contract. Their EPD is USERvision, installed in 2013 and patched in place since. Their ROM-archief, the SQL store behind every BSI-23, OQ-45 and CORE-OM ever filled in by a client, is a custom build on SQL Server 2016. Microsoft's extended support for SQL Server 2016 ends 14 July 2026. We are writing this three weeks before that date. Migration is on the 2026 roadmap but had to wait its turn behind the chat agent and the wachtlijst-portaal.

The portal already existed. It was a thin React app dropped in front of USERvision's web-form endpoints. What it did not have was triage. Every message went into one FIFO queue, polled by whichever administratief medewerker had a free moment. A logistics question about a herhaalrecept (repeat prescription) and the sentence from 07:42 were treated identically. Monday morning was a 180-message backlog. Friday afternoon was a 60-message backlog. The clinical signal lived in the middle of both.

The Wkkgz clock in plain terms

Article 9 of the Wkkgz requires every Dutch zorgaanbieder to operate an internal incident-reporting system. Beyond that, Article 11 applies when an incident meets the legal definition of a calamiteit: an unintended or unexpected event in the quality of care that has led to the death of, or serious harm to, a client. The practice must then report it to the Inspectie Gezondheidszorg en Jeugd. The requirement is "onverwijld" (without delay), and in practice within three working days of the moment the praktijk could reasonably have known.

That last clause is what made the inbox a legal artefact. Once a high-risk message has been delivered to the portal, the praktijk is on notice. If the message sits unread for nine hours, the position that "we did not know" gets thinner with each passing hour. The chat agent's job is to make sure the praktijk knows within thirty seconds, with a paper trail that proves it.

What 1,720 weekly questions actually look like

Before we built anything, we read four weeks of inbox. The volume was 1,720 messages per week with no real seasonal swing. The shape was:

~72% logistics: rescheduling appointments, invoice questions, parking, repeat prescriptions.
~22% intake and waiting list: questions about zorgvraagtypering (care-demand classification), waiting time, and eigen risico (the deductible).
~3% clinical content: symptoms, medication side-effects, sleep, panic.
~0.4% trigger a C-SSRS rerun. That is roughly seven messages per week.

0.4% sounds harmless until you turn it into wall-clock time. Seven messages per week is one every twenty-four hours. If one of them sits unread overnight, you have a Monday morning that ends in a telephone call from the IGJ. The product the praktijk asked us to build was not a chat agent. It was the elimination of that overnight risk.
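
The conversion from a percentage to wall-clock time is easy to check. A minimal sketch in Python, using only the two figures quoted above; the function and constant names are illustrative, not part of the production system:

```python
WEEKLY_MESSAGES = 1720   # average weekly volume from the four-week inbox read
RERUN_SHARE = 0.004      # ~0.4% of messages trigger a C-SSRS rerun

def reruns_per_week(weekly_messages: int, share: float) -> float:
    """Expected number of high-risk messages in one week."""
    return weekly_messages * share

def hours_between_reruns(per_week: float) -> float:
    """Average gap between two high-risk messages, in hours."""
    return (7 * 24) / per_week

per_week = reruns_per_week(WEEKLY_MESSAGES, RERUN_SHARE)
gap_hours = hours_between_reruns(per_week)
print(f"{per_week:.1f} per week, one every {gap_hours:.1f} hours")
# prints: 6.9 per week, one every 24.4 hours
```

The average hides the clustering: nothing guarantees the seven messages arrive during office hours, which is why the overnight case drove the design.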

The C-SSRS gate

We score every incoming message against an adapted Columbia Suicide Severity Rating Scale. The full clinical version is longer; we use the six-item screening version with the praktijk's own thresholds, locked in writing with the verantwoordelijke psychiater. Anything that scores 4 or higher (active ideation with some intent to act) is parked in a dedicated queue that only GZ-psychologen and the dienstdoende crisisdienst see.

The SLA we committed to is brutal in its simplicity: thirty seconds from message receipt to queue assignment. That window includes the classifier call, the disagreement check against a second model, the patient lookup against the USERvision replica, the latest ROM-score pull, and the queue write. The agent runs on a dedicated VPS in Frankfurt. We measured tail latency in shadow mode for three weeks before we let it route a single message.
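
One way to keep a thirty-second window honest is to time every stage against a single budget and record the result. The sketch below is a simplified illustration, not our production code: the stage names are stand-ins for the five steps listed above, the stage bodies are empty stubs, and `run_with_budget` is a hypothetical helper.

```python
import time
from typing import Callable, Dict, List, Tuple

SLA_SECONDS = 30.0  # receipt-to-queue budget named in the text

def run_with_budget(
    stages: List[Tuple[str, Callable[[], None]]],
    budget: float = SLA_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, object]:
    """Run stages in order, time each one, and report whether the total fit the budget."""
    start = clock()
    timings: Dict[str, float] = {}
    for name, stage in stages:
        t0 = clock()
        stage()
        timings[name] = clock() - t0
    total = clock() - start
    return {"timings": timings, "total": total, "within_sla": total <= budget}

# Hypothetical stubs standing in for the real pipeline steps.
stages = [
    ("classifier_call", lambda: None),
    ("second_model_check", lambda: None),
    ("patient_lookup", lambda: None),
    ("rom_score_pull", lambda: None),
    ("queue_write", lambda: None),
]
report = run_with_budget(stages)
print(report["within_sla"], sorted(report["timings"]))
```

The injectable `clock` is there so the budget logic can be tested without sleeping. Per-stage timings also show which step is eating the tail when the total creeps toward thirty seconds.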

No clinical response is ever drafted by the agent for these messages. The product is the queue, not the reply.

Reading from a 13-year-old EPD

USERvision from 2013 has no real API. The portal endpoints we could see were unauthenticated form-handlers that wrote straight into the EPD's MSSQL backend. Calling them from the agent would have been faster to ship and impossible to defend in front of the FG (Functionaris Gegevensbescherming, the data protection officer) or the IGJ. So we did not.

What we did instead:

Stood up a read-only replica of the USERvision MSSQL database: Change Data Capture where it was supported, hourly snapshots where it was not.
Mirrored the ROM-archief from SQL Server 2016 the same way.
Built a thin staging layer in Postgres that normalises patient-id, BSN-hash, behandelaar, dossierstatus, and the latest three ROM-scores into a single view.
Never, under any condition, wrote back to USERvision. The EPD remains the single source of truth; the agent's writes go into its own append-only audit log.

The read-only constraint sounds like a limitation. It is the feature. When the FG audits the system (and at a praktijk that handles 1,720 messages a week, the FG does audit the system), the praktijk can show that the AI layer is observability, not authority.

-- The view the agent reads from before it scores anything
CREATE VIEW staging.client_context AS
SELECT
  c.client_id,
  c.behandelaar_id,
  c.dossier_status,
  c.laatste_contact,
  rom.bsi_23_total           AS rom_bsi_latest,
  rom.bsi_23_subscale_si     AS rom_bsi_si,
  rom.measured_at            AS rom_measured_at,
  ev.events_last_72h
FROM uservision_replica.client c
LEFT JOIN rom_replica.latest_rom_score rom
  ON rom.client_id = c.client_id
LEFT JOIN events.client_event_summary ev
  ON ev.client_id = c.client_id
WHERE c.dossier_status IN ('actief', 'wachtlijst', 'nazorg');
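
The append-only audit log mentioned in the list above can be sketched in a few lines. This is an illustration under assumptions, not the log we run: the class name, the JSON-lines file format, and the event fields are all hypothetical.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

class AuditLog:
    """Append-only JSON-lines log: the only operations are append and read."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, event: str, **fields) -> dict:
        entry = {"ts": datetime.now(timezone.utc).isoformat(), "event": event, **fields}
        # Mode "a" only adds to the end of the file; existing lines are never rewritten.
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
        return entry

    def entries(self) -> list:
        with open(self.path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

# Hypothetical usage: two events for one message, written to a temp directory.
log_path = os.path.join(tempfile.mkdtemp(), "agent_audit.jsonl")
log = AuditLog(log_path)
log.append("scored", message_id="m-001", cssrs=4)
log.append("queued", message_id="m-001", queue="gz_psycholoog")
print(len(log.entries()))  # prints: 2
```

The class deliberately has no update or delete method. That is an application-level guarantee only; making the log tamper-evident for an auditor needs storage-level controls on top.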

The handover protocol

When a message lands in the GZ-psycholoog queue, the receiving clinician sees four panels. The raw message, untouched. The C-SSRS score with the model's reasoning trace and the verbatim phrases that triggered it. The last three contacts pulled from USERvision plus the most recent ROM subscale for suicidal ideation. And a suggested response template: empathetic acknowledgement plus an offer to call back within fifteen minutes. The clinician edits the template before sending. Always.

The point of the suggested template is not speed. It is to remove the cognitive cost of "what do I write" so the clinician can spend that cost on "what do I do next." In practice the clinician rewrites about half of every suggestion. That is the right number. If the rewrite rate went to zero we would worry that the clinicians had stopped reading carefully.

The handover panel also shows a freshness indicator on every data field. If the ROM-score is older than four hours, the panel says so in red. The clinician decides whether to trust a stale score, not the agent.
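
The freshness rule is a single comparison. A minimal sketch, assuming each field carries a timezone-aware timestamp; the function name and the "fresh"/"stale" labels are illustrative, and only the four-hour threshold comes from the text:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=4)  # threshold named in the text

def freshness_label(
    measured_at: datetime,
    now: datetime,
    stale_after: timedelta = STALE_AFTER,
) -> str:
    """Return "stale" when the value is older than the threshold, else "fresh"."""
    age = now - measured_at
    return "stale" if age > stale_after else "fresh"

# Illustrative timestamps only.
now = datetime(2026, 1, 12, 8, 0, tzinfo=timezone.utc)
print(freshness_label(datetime(2026, 1, 12, 3, 30, tzinfo=timezone.utc), now))  # prints: stale
print(freshness_label(datetime(2026, 1, 12, 6, 0, tzinfo=timezone.utc), now))   # prints: fresh
```

The function only labels. It never hides or drops a stale score, which matches the rule that the clinician decides whether to trust it.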

What we did not build

We did not build a clinical chatbot. The agent does not answer questions about medication. It does not validate emotional states. It does not write reassurance.

We also did not build it to triage out the easy logistics. Other teams do that with smaller, narrower agents on top of the same staging layer. This one had a single job: never let a high-risk message sit unread.

And we did not connect it to the dialer or the SMS gateway. When a clinician picks up an escalated message, they call the client from their normal werktelefoon. Every channel that touches a patient leaves the same audit trail it has always left. The chat agent is, from the patient's perspective, invisible. From the IGJ's perspective, it is a log file that proves the praktijk acted within thirty seconds.

What broke in v1

The first six weeks were not the demo. Three things needed fixing.

Dutch idioms scored too low. The classifier was a fine-tuned multilingual model. It picked up "ik wil niet meer leven" ("I don't want to live anymore") cleanly. It missed "ik zie het niet meer zitten" (roughly "I can't see a way forward anymore"), which is the more common Dutch phrasing and unambiguously high-risk in a clinical context. It also missed "ik ben het zat" ("I'm fed up with it") when the surrounding sentence made the meaning clinical rather than logistical. We added a second model with a different prompt and required agreement at C-SSRS ≥3. Disagreement at or above 3 routes to a human review queue staffed by a senior administratief medewerker who is trained to escalate but not to respond.
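
The two-model rule can be written as a small routing function. This sketch is one reading of the rule, not the deployed logic. The queue names are hypothetical, and two choices are assumptions the text does not settle: two scores that both clear the gate of 4 escalate even if they differ, and an agreed score of 3 stays below the gate.

```python
ESCALATE_AT = 4  # C-SSRS gate for the clinician queue
REVIEW_AT = 3    # agreement between the two models is required from this score up

def route(score_a: int, score_b: int) -> str:
    """Pick a queue from two independent C-SSRS scores for the same message."""
    # Assumption: both models at or above the gate escalates, even if they differ.
    if score_a >= ESCALATE_AT and score_b >= ESCALATE_AT:
        return "gz_psycholoog_queue"
    # Any disagreement where at least one model sees 3 or higher goes to a human.
    if max(score_a, score_b) >= REVIEW_AT and score_a != score_b:
        return "human_review_queue"
    # Assumption: everything else, including an agreed 3, stays in the normal flow.
    return "standard_queue"

for pair in [(4, 5), (4, 2), (3, 3), (1, 2)]:
    print(pair, route(*pair))
```

The order of the checks matters. Testing the gate first means a 4 against a 5 is never delayed by the review queue, while a 4 against a 2 is seen by a person before it is either escalated or dismissed.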

The ROM-archief replication lagged. SQL Server 2016 on the on-prem box ran out of disk during a CDC catch-up on a Sunday evening. The replica stopped updating for nine hours. The agent kept scoring, but with stale context. We added the freshness check described above and a Slack-page to the contractor if the lag exceeds an hour.

The clinicians did not trust the queue. Of course they didn't. We logged every escalation, every override, and every dismissal for the first eight weeks and ran a Friday review with the GZ-psychologen and the verantwoordelijke psychiater. After six reviews the false-positive rate sat at 1.4% of escalations and the conversation shifted from "is this thing safe" to "can we add a flag for eetstoornis-terugval." That shift was the milestone.

The numbers after eight months

From October 2025 to mid-June 2026:

1,720 weekly messages, average.
11 escalations per day to the GZ-psycholoog queue, average.
97.8% of escalations acknowledged inside 30 seconds; the rest inside 90.
4 confirmed C-SSRS ≥4 events. All reached a clinician inside 90 seconds; all received a clinical callback inside 15 minutes.
0 missed Wkkgz reporting windows.
1.4% false-positive rate on escalations, holding steady.

The Wkkgz number is the one the bestuurder cares about. The 1,720 is the one the administratief medewerkers care about. Their FIFO backlog used to peak at 180 messages on a Monday morning. It now peaks at 31. Friday afternoon is empty by 16:00.

We do not claim the agent prevented a suicide. We do not have the data to make that claim and we would not make it if we did. What we can say is that four times in eight months, a clinician was reading and calling within fifteen minutes of a message that, in the old workflow, would have sat in a 180-message Monday backlog.

Takeaway

The chat agent did not replace anyone. It changed which messages each person reads first, and which messages each person reads at all.

What to do Monday

If you run a praktijk or a clinic with a similar shape (a legacy EPD, a small clinical team, an inbox that does not separate logistics from risk), the smallest useful thing you can do on Monday morning is read one week of incoming messages with a stopwatch. Count how many are clinical, how many touch suicidal-ideation language, and how long each one sat unread. That number is your starting point. Anything else is an optimisation on top of a problem you have not measured.
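
If the week of messages ends up in a spreadsheet, the count and the unread time per category take a few lines to compute. A minimal sketch, assuming you label each message by hand and record when it arrived and when it was first read. The field names and the three sample rows are made up for illustration; the 07:42 and 11:15 times echo the opening example.

```python
from datetime import datetime
from statistics import median

TIME_FORMAT = "%Y-%m-%d %H:%M"

def unread_minutes(received: str, first_read: str) -> float:
    """Minutes between arrival and the first time a human opened the message."""
    delta = datetime.strptime(first_read, TIME_FORMAT) - datetime.strptime(received, TIME_FORMAT)
    return delta.total_seconds() / 60

def summarise(messages: list) -> dict:
    """Group unread times by hand-assigned label and report count, median and worst case."""
    waits: dict = {}
    for m in messages:
        waits.setdefault(m["label"], []).append(unread_minutes(m["received"], m["first_read"]))
    return {
        label: {"count": len(w), "median_min": median(w), "max_min": max(w)}
        for label, w in waits.items()
    }

# Made-up sample rows, not data from the praktijk.
sample = [
    {"label": "logistics", "received": "2026-01-12 09:00", "first_read": "2026-01-12 09:30"},
    {"label": "logistics", "received": "2026-01-12 09:10", "first_read": "2026-01-12 10:10"},
    {"label": "risk", "received": "2026-01-12 07:42", "first_read": "2026-01-12 11:15"},
]
result = summarise(sample)
print(result["risk"]["max_min"])  # prints: 213.0
```

The worst case per label matters more than the median here: one risk message that sat for hours is the number to bring to the team.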

When we built the chat agent for this Utrecht praktijk, the part we underestimated was not the classifier. It was the read-only replication of USERvision and the SQL Server 2016 ROM-archief: making the agent observability-only against two systems that were never built to be read from at this rate. We solved it with Change Data Capture backed by hourly snapshots, a Postgres staging view, and a freshness check the clinicians can see. That is the kind of AI-agent work we do for clients with legacy clinical stacks.

Key takeaway

A chat agent in a GGZ-praktijk should never write the clinical reply. Its product is the queue and the context that arrive in front of a human, fast.

FAQ

Why is C-SSRS 4 the trigger threshold and not 3 or 5?

A 4 on the Columbia Suicide Severity Rating Scale means active ideation with some intent to act. It is the clinically accepted threshold where guidance moves from monitoring to immediate intervention. The praktijk locked that threshold in writing with its responsible psychiater.

Could the chat agent not also draft the clinical reply?

Technically yes. Legally and clinically, no. The agent's product is the escalation queue and the patient context. The reply is written by a GZ-psycholoog every time. An LLM cannot accept clinical accountability under the Wkkgz.

Why a Postgres staging layer instead of querying USERvision directly?

To keep the EPD as the single source of truth and to insulate live patient care from a 13-year-old database that was never designed for LLM-driven query patterns. The agent only reads from the staging layer and never writes back to USERvision.

How long did the project take end to end?

Twelve weeks from kickoff to clinical rollout, including six weeks of shadow mode where the agent scored every message but routed nothing. Shadow mode is non-negotiable for a triage agent in a clinical setting.

What happens if the second model disagrees with the first at or above C-SSRS 3?

The message routes to a human review queue staffed by a senior administratief medewerker trained to escalate but not respond. That keeps disagreement out of the clinician queue without losing the signal.
