Chat agent playbook: triaging 1,560 weekly storingsmeldingen

A 26-person Zwolle installatiebedrijf (installation contractor), 1,560 weekly storingsmeldingen (fault reports), a 14-year-old Syntess ERP, a SQL Server 2014 box, a 4-hour SLA. Here is how we got each call triaged in 75 seconds.

Jacob Molkenboer · Founder · A Brand New Company · 19 Jun 2026 · 10 min

Wooden manual telephone switchboard with brass jacks and cloth cords on ivory paper, one cord tagged chartreuse green.

It is a Sunday in February, minus four in Zwolle, and the office line at a 26-person installatiebedrijf rings before the espresso machine has warmed up. A woman in Stadshagen is calling because her CV-ketel (central-heating boiler) will not fire and her shower is cold. She is the seventh caller of the morning. The dispatcher on weekend duty is logged into Syntess Atrium on one monitor and a homegrown SQL Server 2014 onderhoudscontract (maintenance contract) database on the other. He has roughly four hours, the Techniek Nederland norm for a warmwater-weg (no hot water) call, to get a monteur (service technician) to her door. He will not take her call. A chat agent answers first, and it has 75 seconds before the SLA clock starts to bite.

That dispatcher is the bottleneck we were hired to remove. Not replace. Remove from the critical path. This is the playbook of what we built, in roughly the order we built it.

Mapping the intake before writing a line of code

Before we touched the agent, we sat in the dispatch office for two full Mondays. Mondays at this client are the spike: 312 storingsmeldingen on an average Monday, exactly a fifth of the weekly 1,560. Half are warmwater-related. The other half split between CV-druk (boiler pressure), thermostaat-koppeling (thermostat pairing), vloerverwarming (underfloor heating), and "het deed het gisteren nog" ("it was still working yesterday").

We logged every call into a spreadsheet with eight columns: caller, postcode, klacht in their own words, klacht as the dispatcher coded it, contract type, monteur dispatched, response time, outcome. After those two Mondays we had 624 rows. From that the routing logic almost designed itself.
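The eight columns map naturally onto a record type. The following is a minimal sketch, not the spreadsheet itself: the field names and the three sample rows are made up for illustration. It shows the one aggregation the exercise exists for, counting calls per dispatcher code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class IntakeRow:
    """One logged call. Field names are illustrative, one per spreadsheet column."""
    caller: str
    postcode: str
    klacht_verbatim: str   # the complaint in the caller's own words
    klacht_coded: str      # the code the dispatcher assigned
    contract_type: str
    monteur: str
    response_minutes: int
    outcome: str


def coded_distribution(rows: list[IntakeRow]) -> Counter:
    """Count calls per dispatcher code: the raw material for the triage tree."""
    return Counter(row.klacht_coded for row in rows)


# Made-up sample rows, only to show the shape of the data.
rows = [
    IntakeRow("A", "8043AB", "douche blijft koud", "warmwater", "Spoed", "M1", 95, "fixed"),
    IntakeRow("B", "8011CD", "druk valt weg", "cv_druk", "Basis", "M2", 240, "fixed"),
    IntakeRow("C", "8022EF", "geen warm water", "warmwater", "Basis", "M1", 130, "fixed"),
]
print(coded_distribution(rows).most_common())
```

Comparing `klacht_verbatim` against `klacht_coded` per code is where the branches of the tree tend to show up.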

Takeaway

If you cannot draw the triage tree on a single A4 before sprint planning, your chat agent will hallucinate the tree at runtime.

The two systems that own the truth

Syntess Atrium is a fourteen-year-old ERP built for the Dutch installatiebranche. It holds the customer record, the address, the geïnstalleerde ketels (make, model, serial), and the historical werkbonnen. It exposes a SOAP web service that nobody at the client had touched since 2018. The credentials had been written on a Post-it that had since been thrown away.

The onderhoudscontract database is a SQL Server 2014 instance running on a Dell tower under a desk in the back office. It holds 4,200 active service contracts, the SLA tier of each, and the last-served date. SQL Server 2014 reached the end of Microsoft's extended support in July 2024, which means the box was, at the moment we started, running on borrowed time and unpatched.

We did not migrate either system. That was important. The client had been quoted €180k by another agency to rebuild both. Our brief was the opposite: leave the systems alone, sit in front of them, and let the agent be the one human-equivalent that talks to both.

The 75-second triage budget

The agent has 75 seconds from first message to the "monteur is onderweg" (technician is on the way) SMS. That number is not arbitrary. We worked back from the 4-hour SLA (240 minutes) and subtracted the median drive time in the Zwolle service area (38 minutes), the median repair window (110 minutes), and a 27-minute safety margin for monteur-side delays. What remained was 65 minutes for everything that happens before a monteur starts driving. We gave the agent 75 seconds of that, enough room for the model to clarify ambiguous klachten.

Inside those 75 seconds:

0–8s: greet, ask postcode and huisnummer, fire a lookup into Syntess.
8–25s: while Syntess responds (it is slow, usually 6 to 9 seconds), confirm the klacht in the caller's own words.
25–45s: cross-reference the contract tier in the SQL Server database. Spoed-class contracts jump the queue.
45–60s: classify the storing. The model picks one of seventeen tags; one of them, cv_zonder_warmwater, is a direct exit to Spoed.
60–75s: hand off. If it is a spoed, write into a Redis-backed queue that the dispatcher's screen polls every two seconds, and send the SMS confirmation.
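The schedule above only fits because the slow Syntess lookup overlaps with the conversation. Here is a minimal asyncio sketch of that overlap. The coroutines `lookup_customer` and `confirm_klacht` are stand-ins with shortened delays, assumed for illustration rather than taken from the production agent.

```python
import asyncio
import time

BUDGET_SECONDS = 75.0  # total triage budget from the text


async def lookup_customer(postcode: str, huisnummer: str) -> dict:
    """Stand-in for the slow Syntess lookup (delay shortened for the demo)."""
    await asyncio.sleep(0.2)
    return {"postcode": postcode, "huisnummer": huisnummer}


async def confirm_klacht(first_message: str) -> str:
    """Stand-in for the clarifying exchange with the caller."""
    await asyncio.sleep(0.2)
    return first_message.strip().lower()


async def triage(postcode: str, huisnummer: str, first_message: str) -> dict:
    started = time.monotonic()
    # Fire the lookup first, then talk to the caller while it is in flight.
    lookup = asyncio.create_task(lookup_customer(postcode, huisnummer))
    klacht = await confirm_klacht(first_message)
    customer = await lookup
    elapsed = time.monotonic() - started
    return {
        "customer": customer,
        "klacht": klacht,
        "elapsed": elapsed,
        "within_budget": elapsed <= BUDGET_SECONDS,
    }


result = asyncio.run(triage("8043AB", "12", " Geen warm water "))
```

Run one after the other, the two steps would take the sum of their delays. Run concurrently, they take roughly the longer of the two, which is what buys back the 6 to 9 seconds Syntess needs.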

Why the agent does not talk to Syntess directly

A common mistake when wiring an LLM to a legacy ERP is to give the model the SOAP client and a prompt. We have done this once. It worked seventy percent of the time, which in operations means it did not work.

Instead we wrote a thin Python proxy, eight endpoints, 340 lines of FastAPI, that sits between the agent and Syntess. The agent calls GET /customer?postcode=8043AB&huisnummer=12. The proxy translates that into the SOAP envelope Syntess wants, handles the WS-Security headers, parses the response, normalises the dates, and returns clean JSON. If Syntess times out, the proxy retries with backoff and surfaces a clean 503 the agent knows how to recover from.

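import asyncio
from xml.sax.saxutils import escape

import httpx
from fastapi import FastAPI, HTTPException

app = FastAPI()
SYNTESS_URL = "https://syntess.internal/atrium/ws"
BACKOFF_SECONDS = (0.5,)  # pause before each retry; the value is illustrative

@app.get("/customer")
async def get_customer(postcode: str, huisnummer: str):
    # Escape caller input so it cannot break out of the XML envelope.
    # WS-Security headers are omitted from this excerpt.
    envelope = f"""<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body><GetKlantByAdres>
        <postcode>{escape(postcode.upper())}</postcode>
        <huisnummer>{escape(huisnummer)}</huisnummer>
      </GetKlantByAdres></soap:Body></soap:Envelope>"""
    headers = {"Content-Type": "text/xml", "SOAPAction": "GetKlantByAdres"}
    async with httpx.AsyncClient(timeout=9.0) as client:
        for pause in (*BACKOFF_SECONDS, None):
            try:
                r = await client.post(SYNTESS_URL, content=envelope, headers=headers)
                r.raise_for_status()
                return {"raw_xml": r.text}  # parsing/normalisation handled downstream
            except httpx.TimeoutException:
                detail = "syntess_timeout"
            except httpx.HTTPError:
                detail = "syntess_unavailable"
            if pause is not None:
                await asyncio.sleep(pause)  # back off, then retry
    # Every attempt failed: surface one clean 503 for the agent to recover from.
    raise HTTPException(503, detail)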

That proxy is the boring part of the system. It is also the part that has not failed in seven months.
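The parsing and date normalisation step could look something like the sketch below. The response element names (`Klant`, `Naam`, `LaatsteWerkbon`) and the dd-mm-yyyy date format are assumptions for illustration. The real Atrium response schema is not shown in this article.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

SOAP_NS = {"soap": "http://schemas.xmlsoap.org/soap/envelope/"}


def parse_customer(raw_xml: str) -> dict:
    """Turn a SOAP response into JSON-ready data.

    Element names and the dd-mm-yyyy date format are hypothetical,
    not the real Atrium schema.
    """
    root = ET.fromstring(raw_xml)
    klant = root.find("soap:Body//Klant", SOAP_NS)
    if klant is None:
        raise ValueError("no Klant element in response")
    raw_date = klant.findtext("LaatsteWerkbon", default="")
    # Normalise the legacy date format to ISO 8601, or None when absent.
    last_visit = (
        datetime.strptime(raw_date, "%d-%m-%Y").date().isoformat() if raw_date else None
    )
    return {"naam": klant.findtext("Naam", default=""), "laatste_werkbon": last_visit}


# Made-up response, only to exercise the parser.
sample = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body><GetKlantByAdresResponse><Klant>
    <Naam>J. Jansen</Naam><LaatsteWerkbon>03-11-2025</LaatsteWerkbon>
  </Klant></GetKlantByAdresResponse></soap:Body></soap:Envelope>"""
print(parse_customer(sample))
# {'naam': 'J. Jansen', 'laatste_werkbon': '2025-11-03'}
```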

The priority lattice

Inside the agent we keep a small priority lattice. It is not a model decision. It is hard-coded, because Techniek Nederland's response norms are not negotiable and because a CV-storing zonder warmwater on a Sunday with a baby in the house is not the kind of judgement we are willing to outsource to a probability distribution.

The lattice has four tiers:

Spoed: no warm water, or no heat with outdoor temperature at or below 5°C, or a Spoed-tier contract. Hits the spoed-monteur queue. SLA clock starts immediately.
Urgent: no heat with indoor temperature still above 15°C, or a partial outage. Booked into the next morning's slot.
Regulier: intermittent issues, lukewarm water, thermostat pairing. Next regular slot, usually two to four working days.
Geen storing: caller wants a quote, an inspection appointment, or has the wrong number. Routed to the office line on Monday morning.

The LLM proposes a tier. A rules engine validates it against the contract data and the outside temperature pulled from the KNMI Zwolle station. If the two disagree, the rules engine wins and the case is flagged for human review at end of shift.
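A minimal sketch of that propose-then-validate step follows. The names (`Tier`, `Facts`, `required_tier`, `decide`) are illustrative, and only the three Spoed conditions from the lattice are modelled. The production rules engine is not shown in this article and presumably covers more.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """The four tiers of the lattice; a lower value means more urgent."""
    SPOED = 0
    URGENT = 1
    REGULIER = 2
    GEEN_STORING = 3


@dataclass(frozen=True)
class Facts:
    """Inputs the rules can check without trusting the model."""
    no_warm_water: bool
    no_heat: bool
    outdoor_temp_c: float
    spoed_contract: bool


def required_tier(facts: Facts) -> Optional[Tier]:
    """Return the tier the hard-coded rules demand, or None if no rule fires."""
    if facts.no_warm_water or facts.spoed_contract:
        return Tier.SPOED
    if facts.no_heat and facts.outdoor_temp_c <= 5.0:
        return Tier.SPOED
    return None


def decide(proposed: Tier, facts: Facts) -> tuple[Tier, bool]:
    """Return (final tier, flagged for review). The rules win on disagreement."""
    required = required_tier(facts)
    if required is None or required == proposed:
        return proposed, False
    return required, True


print(decide(Tier.REGULIER, Facts(True, False, 10.0, False)))
```

Because the only rules in this sketch demand Spoed, an override can only raise urgency, never lower it.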

Warning

An LLM that classifies emergencies must be allowed to over-escalate but never to under-escalate. Asymmetric loss functions belong in code, not in the system prompt.

What we did with SQL Server 2014

We did not upgrade it. We did three things instead.

First, we put it behind a read-only replica running on a small Ubuntu VM, fed by Change Data Capture. The agent reads only from the replica. If the agent's traffic ever spikes — a storm rolls through, a brand of ketel gets recalled — the primary box stays untouched.

Second, we wrapped the schema in a tiny query layer. Three views, two stored procedures. The agent never sees the raw tables, which means the day the client decides to migrate to PostgreSQL or to a hosted Atrium successor, we change the view definitions and nothing else moves.
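The idea of a view as the agent's contract can be sketched in a few lines. This example uses an in-memory SQLite database purely as a stand-in for the SQL Server replica. The table, view, and column names are hypothetical and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the read replica
conn.executescript("""
    CREATE TABLE tbl_contract (   -- hypothetical raw table
        id INTEGER PRIMARY KEY,
        postcode TEXT, huisnr TEXT, sla_code TEXT, last_served TEXT, active INTEGER
    );
    INSERT INTO tbl_contract VALUES
        (1, '8043AB', '12', 'SPOED', '2025-09-14', 1),
        (2, '8011CD', '3',  'BASIS', '2024-01-02', 0);
    -- The view is what the agent depends on; the table can change behind it.
    CREATE VIEW v_active_contract AS
        SELECT postcode, huisnr AS huisnummer, sla_code AS sla_tier, last_served
        FROM tbl_contract WHERE active = 1;
""")


def contract_tier(postcode: str, huisnummer: str):
    """Return the SLA tier for an address, or None if there is no active contract."""
    row = conn.execute(
        "SELECT sla_tier FROM v_active_contract WHERE postcode = ? AND huisnummer = ?",
        (postcode, huisnummer),
    ).fetchone()
    return row[0] if row else None


print(contract_tier("8043AB", "12"))
```

If the underlying table is renamed or moved to another engine, only the `CREATE VIEW` statement has to follow. `contract_tier` and everything that calls it stay as they are.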

Third, we locked the network. The Dell tower can now only talk to the replication target and to the office printer. Nothing else. That is not strictly an agent concern, but if you are reading an old Microsoft SQL instance from any new system, you should treat the box as quarantined.

Observability without buying Datadog

A 26-person company does not need a 1,200 EUR/month observability stack. We log one structured JSON line per call to a file the proxy rotates daily: timestamp, hashed postcode, klacht tag, tier proposed, tier final, latency in milliseconds, hand-off path. That file rsyncs to a second box every five minutes. Nothing else.
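One such log line might be built like this. It is a sketch under assumptions: the key names, the salt, and the truncated SHA-256 digest are illustrative choices, not the proxy's actual format.

```python
import hashlib
import json
from datetime import datetime, timezone

SALT = b"rotate-me"  # hypothetical secret; an unsalted postcode hash is easy to reverse


def log_line(postcode, klacht_tag, tier_proposed, tier_final, latency_ms, handoff):
    """Build one JSON log line with the seven fields listed above."""
    digest = hashlib.sha256(SALT + postcode.encode("utf-8")).hexdigest()
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "postcode_hash": digest[:16],
        "klacht_tag": klacht_tag,
        "tier_proposed": tier_proposed,
        "tier_final": tier_final,
        "latency_ms": latency_ms,
        "handoff": handoff,
    }
    return json.dumps(record, ensure_ascii=False)


line = log_line("8043AB", "cv_zonder_warmwater", "Urgent", "Spoed", 58210, "spoed_queue")
print(line)
```

Logging both the proposed and the final tier on the same line makes every override visible with a plain text search.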

Three counters live in Redis: tier-override count, Syntess-timeout count, SMS-failure count. If any counter crosses a threshold inside a fifteen-minute window, the dispatcher's screen turns the corresponding tile amber. The thresholds are not clever. Three overrides in fifteen minutes. Two Syntess timeouts in fifteen minutes. One SMS failure, ever. Cleverness is what you add when boring counters stop being enough.
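The threshold logic is small enough to sketch in full. The version below keeps events in memory as a stand-in for the Redis counters. It reads "crosses a threshold" as "reaches the count" and "one SMS failure, ever" as a counter that never expires. Both readings, and all names, are assumptions.

```python
import time
from collections import deque

FIFTEEN_MINUTES = 15 * 60
# (events needed to turn the tile amber, window in seconds or None for "ever")
RULES = {
    "tier_override": (3, FIFTEEN_MINUTES),
    "syntess_timeout": (2, FIFTEEN_MINUTES),
    "sms_failure": (1, None),
}


class WindowCounter:
    """In-memory stand-in for the three Redis counters."""

    def __init__(self):
        self._events = {name: deque() for name in RULES}

    def record(self, name, now=None):
        """Register one event; `now` is injectable to make testing easy."""
        self._events[name].append(time.time() if now is None else now)

    def amber(self, name, now=None):
        """Return True if the tile for `name` should currently be amber."""
        now = time.time() if now is None else now
        limit, window = RULES[name]
        events = self._events[name]
        if window is not None:
            while events and events[0] <= now - window:
                events.popleft()  # drop events that fell out of the window
        return len(events) >= limit


counter = WindowCounter()
counter.record("syntess_timeout", now=0)
counter.record("syntess_timeout", now=600)
print(counter.amber("syntess_timeout", now=601))
```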

Every weekday at 09:00 the operations manager gets an email with the prior twenty-four hours: total calls, tier distribution, every override with the original transcript snippet, median latency, and any KNMI feed staleness. Six lines of text and one table. She reads it before her coffee. Twice in seven months she has changed an internal SOP because of what she saw in that report.

The first month, in numbers we trust

We launched on a Tuesday in November, deliberately not on a Monday. Week one carried 1,488 storingsmeldingen. The agent handled 1,213 of them end-to-end without a dispatcher touching the case. Median time-to-spoed-dispatch dropped from 6 minutes 40 seconds, the pre-launch baseline measured over the prior six months, to 58 seconds.

Two cases were under-escalated by the model in the first three weeks. Both were caught by the rules-engine override before they reached a customer. Neither would have breached the 4-hour bound regardless, but we treated both as P1 incidents and ran a post-mortem on each. The fix in both cases was the same: the KNMI temperature read had been cached longer than we thought, so an outdoor reading that should have triggered the cold-weather Spoed rule was still showing a stale 7°C from an hour earlier. We dropped the cache TTL from sixty minutes to ten and added a freshness check that the rules engine refuses to evaluate without.
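A freshness check of that kind can be as simple as the sketch below. The names (`Reading`, `StaleReading`, `outdoor_is_spoed_cold`) and the `max_age_s` parameter are illustrative. What the caller does on a stale reading, refetch or escalate conservatively, is left open here.

```python
from dataclasses import dataclass


class StaleReading(Exception):
    """Raised when the temperature reading is too old to base a tier on."""


@dataclass(frozen=True)
class Reading:
    temp_c: float
    observed_at: float  # Unix time of the observation itself, not of the cache write


def outdoor_is_spoed_cold(reading: Reading, now: float, max_age_s: float) -> bool:
    """Check the 5°C cold-weather threshold, refusing to evaluate a stale reading."""
    age = now - reading.observed_at
    if age > max_age_s:
        raise StaleReading(f"reading is {age:.0f}s old, limit is {max_age_s:.0f}s")
    return reading.temp_c <= 5.0


fresh = Reading(temp_c=4.0, observed_at=1000.0)
print(outdoor_is_spoed_cold(fresh, now=1005.0, max_age_s=600.0))
```

Timestamping the observation rather than the cache write is the point: a cache can be refreshed on schedule and still be serving an old measurement.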

The dispatcher's role has shifted, not disappeared. He now reviews the agent's classifications during the day, handles the eighteen percent of calls that need a human (mostly elderly customers who do not want to talk to a bot), and runs the spoed-monteur board. His Sundays are quieter.

The architecture on a napkin

If we had to fit it on a napkin: one webhook from the phone provider, one agent process, one Python proxy talking SOAP to Syntess, one read replica of SQL Server, one Redis queue, one SMS gateway, one KNMI feed, and one dashboard for the dispatcher. Eight moving parts. We refused to add a ninth.

When we built the chat agent for this installatiebedrijf, the thing we kept running into was the asymmetric cost of misclassification: getting Spoed wrong is unforgivable, getting Regulier wrong is a phone call. We solved it by putting the LLM on one side of a rules engine and the contract data on the other, with the engine as the final word. That is the pattern we now reach for in every AI agent we ship into an SLA-bound operation.

The smallest thing you could do today: open a spreadsheet, sit next to your dispatch desk for one Monday morning, code every call into eight columns. The triage tree will be in front of you by lunch.

Key takeaway

Hard-code the SLA lattice, let the LLM propose, let the rules engine decide. Under-escalation is the failure mode you design out before launch.

FAQ

Why not just replace Syntess Atrium and SQL Server 2014?

Because the client was quoted €180k for that and did not have it. The agent sits in front of both, reads what it needs, and leaves the systems untouched. Migration is a separate, slower decision.

What happens if Syntess is down when a customer calls?

The proxy returns a 503, the agent falls back to a degraded flow that captures the call manually, queues it as Urgent by default, and pings the dispatcher's phone. No call is ever lost to a backend outage.

Does the LLM make the priority decision?

It proposes a tier. A rules engine validates the proposal against contract data and the KNMI temperature feed. On disagreement the rules engine wins. The model is never the final word on Spoed.

How long did the build take end-to-end?

Seven weeks from kickoff to launch. Two weeks of dispatch-desk observation, three weeks of integration work on Syntess and SQL Server, two weeks of supervised rollout with a dispatcher shadowing every call.

Tags: chat agents, ai agents, case study, operations, integrations, legacy sites
