Chat agents for roofers: triaging 1,340 leaks a week

How we wired a chat agent to a Syntess ERP and a SQL Server 2014 dak-paspoort archive, so a flat-roof leak in Breda gets a roofer dispatched in 90 seconds.

Jacob Molkenboer · Founder · A Brand New Company · 20 Jun 2026 · 9 min

Eleven minutes past three on a Wednesday in October. A weather front is moving east over Brabant. The phone in the dakdekker's office in Breda starts. Over the next two hours that office will take 47 calls, 22 WhatsApps, and 11 emails. Every one of them mentions water somewhere.

This is what 1,340 leak reports a week looks like when you spread it across a month of autumn weather. The dakdekker has 21 people. Three of them are on the phone today. None of them are full-time triagers.

We built the chat agent in late 2025. It went live in February 2026. This is what it actually does.

The two-hour clock

BDA-garantie — the construction-industry warranty regime that covers most commercial flat roofs in the Netherlands — has a clause that matters here. If standing water is reported and the dakdekker takes longer than two hours to start an emergency response, the warranty can be voided. BDA Dakadvies is the certification body the policy traces back to. Two hours sounds generous. It is not. The clock starts when the customer reports. Not when the dakdekker logs the ticket. Not when the operator finds the right contract in Syntess.
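A minimal sketch of that clock, assuming the customer's send time is available as a timezone-aware timestamp. The function names and the example date are illustrative, not taken from the production agent.

```python
from datetime import datetime, timedelta, timezone

# Two-hour window as described in the article.
WARRANTY_WINDOW = timedelta(hours=2)

def warranty_deadline(reported_at: datetime) -> datetime:
    """Deadline counted from the customer's send time, not from ticket creation."""
    return reported_at + WARRANTY_WINDOW

def seconds_remaining(reported_at: datetime, now: datetime) -> float:
    """Seconds left on the clock; negative once the window has been blown."""
    return (warranty_deadline(reported_at) - now).total_seconds()

# Example with made-up timestamps: the ticket is logged 25 minutes after the report.
reported = datetime(2025, 10, 15, 15, 11, tzinfo=timezone.utc)
logged = reported + timedelta(minutes=25)
print(seconds_remaining(reported, logged))  # 5700.0
```

The point of anchoring on the report time is visible in the example: by the time the ticket exists, 25 of the 120 minutes are already gone.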

So the first job of the chat agent was never "be smart". It was: shave the gap between the customer pressing send and a real human standing on a real roof.

What 1,340 looks like

I'll keep using this number. 1,340 is the rolling 7-day average of leak-related inbound messages across all channels — phone (transcribed by a separate Twilio + Whisper hook), email, WhatsApp Business, and the form on the dakdekker's website. About 70% of them are routine: a property manager noting a small drip, a homeowner whose dakkapel has a stain, a facility lead reporting that the cleaner mentioned water in the lift shaft three days ago. The other 30% are urgent, and a small fraction of those — somewhere between 4% and 9% in any given week — are the ones the warranty clock applies to.

The whole point of the agent is to find those warranty-clock cases, the 4–9% of urgent reports, in under 90 seconds, regardless of which channel the message arrived on, and regardless of whether the building lives in Syntess or only in the old SQL Server archive.

The two systems behind the desk

The dakdekker runs Syntess Atrium, a Dutch construction ERP that has been their system of record since 2014. Customers, contracts, planned maintenance, invoicing — Syntess. It's solid. It's also closed-ish: there's a SOAP API and a daily ODBC export but no event stream worth pulling from in real time.

The second system is what they call the dak-paspoort archief. A SQL Server 2014 database with about 3,800 roofs in it, each one carrying a per-layer build-up, a last inspection date, and notes typed by a now-retired projectleider between 2013 and 2021. Microsoft ended extended support for SQL Server 2014 in July 2024. The database still sits on a Windows Server in the dakdekker's storage closet, behind a WatchGuard firewall, with a static IP and a backup script nobody has touched since the projectleider left. They knew they had to migrate it. They were not ready to migrate it. We worked with what was there.

The agent talks to both. Syntess via a thin proxy that wraps the SOAP endpoints and caches customer lookups for 30 seconds and contract lookups for 5 minutes. SQL Server via a read-only login on a VPN, joined to the proxy. Neither system gets written to by the agent. Writes are still the operator's job.

The SOAP API has its quirks. Customer lookups by KvK-nummer return cleanly; lookups by postcode return the first match, which on a multi-building postcode is almost never the right one. Contract lookups need an explicit fiscal year, even when the contract is open-ended. The nightly export job runs at 02:14 and locks two tables for about eleven minutes — invisible to the office, rough for an agent that has to answer right now. We learned each of these by writing a request, watching it return nothing useful, and emailing a Syntess support engineer at four o'clock on a Friday.
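Two of those quirks are easy to guard against before any request goes out. The sketch below is an assumption about how such guards could look, not the proxy's actual code: the helper names and the one-minute safety margin are hypothetical, and `now` is expected to be a naive datetime in the server's local time.

```python
from datetime import datetime, time, timedelta
from typing import Optional, Tuple

# Start time and duration come from the article; the margin is an assumption.
EXPORT_START = time(2, 14)
EXPORT_DURATION = timedelta(minutes=11)
MARGIN = timedelta(minutes=1)

def in_export_window(now: datetime) -> bool:
    """True while the nightly export is likely holding its table locks."""
    start = datetime.combine(now.date(), EXPORT_START)
    return start - MARGIN <= now <= start + EXPORT_DURATION + MARGIN

def pick_lookup_key(kvk_nummer: Optional[str],
                    postcode: Optional[str]) -> Optional[Tuple[str, str]]:
    """Prefer the KvK-nummer; flag a postcode-only lookup as unverified,
    because a postcode lookup returns only the first match."""
    if kvk_nummer:
        return ("kvk", kvk_nummer)
    if postcode:
        return ("postcode_unverified", postcode)
    return None
```

During the export window a caller could serve cached data or go straight to the SQL Server fallback instead of waiting on a locked table.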

The 30 mm rule

There is one rule in the agent that decides everything else. If the customer describes a platdak and standing water exceeding roughly 30 mm, the message routes to the spoed-dakdekker queue immediately and a pager goes off.

30 mm is not arbitrary. On a bitumen flat roof with a typical Dutch winter loading, 30 mm of ponding starts to threaten the structural load assumptions in NEN 6702 / EN 1991-1-3, and it correlates strongly with seam failure when the rain keeps coming. The dakdekker's senior projectleider gave us this number on a whiteboard. We did not invent it.
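Written as code, the rule is a single comparison. This is a minimal sketch that assumes a depth estimate in millimetres is already available; the `Report` fields and queue names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

SPOED_THRESHOLD_MM = 30  # threshold from the article

@dataclass
class Report:
    roof_type: str                       # e.g. "platdak"; field names are illustrative
    estimated_depth_mm: Optional[float]  # None when no depth could be estimated

def route(report: Report) -> str:
    """Return the queue name for a report under the 30 mm rule."""
    if (
        report.roof_type == "platdak"
        and report.estimated_depth_mm is not None
        and report.estimated_depth_mm > SPOED_THRESHOLD_MM
    ):
        return "spoed"
    return "routine"
```

The sketch deliberately does nothing clever with a missing depth, which is exactly the gap the rest of this section is about.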

The trick is that customers do not say "30 millimetres of standing water". They say een plas, best wel veel water, het staat tot aan mijn enkels. The agent has to translate that. Here is the prompt fragment that drives the classification, lightly redacted:

# water_depth_classifier.py
WATER_DEPTH_RUBRIC = """
Classify reported ponding depth on a flat roof into one of:
  - NONE       (no standing water mentioned)
  - WET        (damp, droplets, sheen)
  - PUDDLE     (visible plas, < 20 mm equivalent)
  - PONDING    (sustained pool, 20-40 mm)
  - CRITICAL   (> 40 mm OR described relative to ankle/calf)

Rules:
  * "tot mijn enkels" or "tot mijn schoenen"  -> CRITICAL
  * "plas zo groot als een tafel"              -> PONDING minimum
  * "het loopt over de dakrand"                -> CRITICAL
  * If uncertain between PUDDLE and PONDING,
    take the higher class. The cost of a false
    PONDING is a phone call. The cost of a
    false PUDDLE is a warranty claim.
"""

That last rule matters. We tuned the classifier to be biased toward the urgent side. The dakdekker would rather dispatch a roofer who turns around at the kerb than miss one. The agent runs the classifier on every inbound, even ones that look routine, because customers bury the lede: three sentences about a dakkapel, then one line about the warehouse next door where the water is up to the safety rail.
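One way to encode that bias is to give the classes an order and always keep the highest one seen. The sketch below assumes the classifier can return several candidate labels (one per sentence, or two when it is torn); the `Depth` enum and `resolve` helper are illustrative names.

```python
from enum import IntEnum

class Depth(IntEnum):
    """Rubric classes, ordered from least to most urgent."""
    NONE = 0
    WET = 1
    PUDDLE = 2
    PONDING = 3
    CRITICAL = 4

def resolve(candidates: list[Depth]) -> Depth:
    """Keep the most urgent candidate; raises ValueError on an empty list."""
    return max(candidates)

# Torn between PUDDLE and PONDING: the higher class wins.
print(resolve([Depth.PUDDLE, Depth.PONDING]).name)  # PONDING
```

The same call handles the buried lede: label each sentence separately, and one CRITICAL line about the warehouse outranks three routine lines about the dakkapel.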

The 90-second budget

90 seconds is the time from message arrival to spoed-dakdekker queue insertion, when the queue insertion is warranted. Here is how that budget breaks down in practice:

T+00.0s  message ingested (Twilio / SMTP / WhatsApp webhook)
T+00.3s  language + channel normalised, attachments fetched
T+01.2s  LLM classification pass 1 (intent, urgency, reply channel)
T+04.8s  customer + address resolution: Syntess SOAP, fallback SQL Server
T+05.6s  if commercial flat roof: dak-paspoort lookup
T+07.1s  LLM classification pass 2 (depth rubric above, full context)
T+07.4s  decision: route to spoed queue OR routine queue
T+08.0s  spoed queue insert + pager fan-out (Pushover + group SMS)
T+08.1s  customer auto-reply sent with case number and ETA window
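Reading each timestamp as the moment its step completes (an assumption about the timeline, which does not say so explicitly), the cost of each stage falls out by subtraction. The stage labels below are shortened for the sketch.

```python
# Cumulative timestamps in seconds, copied from the timeline above.
TIMELINE = [
    ("ingest", 0.0),
    ("normalise", 0.3),
    ("llm_pass_1", 1.2),
    ("resolve_customer", 4.8),
    ("dak_paspoort_lookup", 5.6),
    ("llm_pass_2", 7.1),
    ("decision", 7.4),
    ("queue_insert_and_pager", 8.0),
    ("auto_reply", 8.1),
]

def stage_durations(timeline):
    """Seconds spent on each stage: the gap since the previous mark."""
    return {
        name: round(t - prev_t, 1)
        for (_, prev_t), (name, t) in zip(timeline, timeline[1:])
    }

durations = stage_durations(TIMELINE)
slowest = max(durations, key=durations.get)
print(slowest, durations[slowest])  # resolve_customer 3.6
```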

The bulk of the budget is the Syntess SOAP call. It is slow on a cold cache and unreliable during busy hours. We cache aggressively (30 seconds on customer lookups, 5 minutes on contract lookups, invalidated early only by webhook hints) and we tolerate stale data on read-only paths. The 90-second number is the SLO we promise. The median in March 2026 was 11 seconds.
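A time-to-live cache with explicit invalidation is enough to express that policy. This is a minimal in-memory sketch, not the proxy's real cache: the `TTLCache` class and its method names are hypothetical, and the loader stands in for whatever performs the SOAP request.

```python
import time
from typing import Any, Callable, Hashable

class TTLCache:
    """Tiny in-memory cache with a fixed TTL and explicit invalidation."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[Hashable, tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # fresh enough: skip the slow upstream call
        value = loader()             # e.g. the SOAP request, supplied by the caller
        self._store[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry early, e.g. when a webhook hints the record changed."""
        self._store.pop(key, None)

# TTLs as quoted in the article.
customer_cache = TTLCache(ttl_seconds=30)
contract_cache = TTLCache(ttl_seconds=300)
```

Passing the clock in as a parameter keeps the expiry logic testable without sleeping.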

What pushes us past 30 seconds is almost always the same thing: a customer the agent cannot resolve. A property manager from a new account, a building not yet in Syntess, an address typed with a typo. In that case the agent escalates to a human operator with a one-line summary: "Mogelijke spoed, platdak, water 'tot mijn enkels', klant niet in systeem. Verifieer en bel terug binnen 15 min." The operator decides. The agent does not invent a customer record.
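The escalation path can be sketched as a function that either queues a resolved case or hands an unresolved one to an operator, and never fabricates an ID. The `Triage` fields and the action names are assumptions made for illustration; the summary wording follows the example quoted above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Triage:
    urgency: str                # e.g. "Mogelijke spoed"
    roof_type: str              # e.g. "platdak"
    depth_quote: str            # the customer's own words
    customer_id: Optional[str]  # None when neither system resolved the sender

def next_step(t: Triage, callback_minutes: int = 15) -> dict:
    """Queue the case if the customer resolved; otherwise escalate to a human."""
    if t.customer_id is not None:
        return {"action": "queue", "customer_id": t.customer_id}
    summary = (
        f"{t.urgency}, {t.roof_type}, water '{t.depth_quote}', "
        f"klant niet in systeem. "
        f"Verifieer en bel terug binnen {callback_minutes} min."
    )
    return {"action": "escalate_to_operator", "summary": summary}
```

There is deliberately no branch that creates a customer: an unresolved sender can only end up in front of an operator.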

Warning

The cheapest way to blow the warranty clock is to silently fail customer resolution. Build the unknown-customer escalation path before you build the happy path. We had to learn this twice.

Two failures we learned from

Week 3, a customer reported "het lekt bij de zonnepanelen". The agent classified it as routine. It was not. The roof was flat. The water was 50 mm deep around an inverter mount. The classifier had been trained on language about water; it didn't have a rule that lek bij zonnepanelen op een platdak should auto-escalate regardless of explicit depth. We added the rule. Solar mounts on flat roofs are a known seam-failure pattern.

Week 7, an out-of-hours WhatsApp arrived in dialect. It used a Brabants expression for water depth, roughly "tot in mijn schoenen", phrased the way only someone from west of Tilburg would phrase it, and the classifier read it as PUDDLE. It was CRITICAL. We did not retrain. We added a regional phrase dictionary that runs before the LLM pass, and we widened the unknown-phrase escalation default to PONDING. False positives cost a phone call. We will pay that.
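Both fixes amount to a deterministic pass that runs before the LLM and can force a class. The sketch below is an assumption about the shape of that pass: the dictionary entry, the keyword matching, and the choice of CRITICAL as the forced class for solar mounts are illustrative, since the real dictionary and rule are not published here.

```python
import re
from typing import Optional

# Hypothetical entry; the real regional dictionary is not shown in the article.
REGIONAL_PHRASES = {
    "tot in mijn schoenen": "CRITICAL",
}
SOLAR_TERMS = ("zonnepanelen", "zonnepaneel")

def pre_llm_class(text: str, roof_type: str) -> Optional[str]:
    """Return a forced class, or None to let the LLM classifier decide."""
    lowered = text.lower()
    for phrase, depth_class in REGIONAL_PHRASES.items():
        if phrase in lowered:
            return depth_class
    # Crude keyword match for "lek", "lekt", "lekkage" near solar panels.
    leak_mentioned = re.search(r"\blek", lowered) is not None
    if roof_type == "platdak" and leak_mentioned and any(s in lowered for s in SOLAR_TERMS):
        return "CRITICAL"  # escalate regardless of stated depth (class is an assumption)
    return None

def final_class(forced: Optional[str], llm_class: Optional[str]) -> str:
    """Forced class wins; an unrecognised phrase defaults to PONDING."""
    if forced is not None:
        return forced
    return llm_class if llm_class is not None else "PONDING"
```

Running the dictionary first means a known phrase never depends on how the model happens to read it that day.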

What we did not do

We did not replace Syntess. The dakdekker has 12 years of process inside it and the staff know where every menu lives. We did not modernise the SQL Server. That is a separate project with its own budget and its own anxiety. We did not let the agent close tickets, only open them. We did not let the agent write anywhere except its own queue table. Every Syntess write is still made by a human.

This is the boring part of the work, and it is most of the work. Building the chat agent was about three weeks. Getting the operators to trust it took three months and a lot of sitting in the office in Breda.

Earning operator trust

For the first four weeks after go-live, the operators second-guessed every spoed classification the agent made. We let them. The agent emits a shadow record into a separate log on every decision: what a rule-only baseline would have classified, what a no-LLM keyword scan would have classified, and what the agent actually decided. The operators could check whether the agent was reaching the same conclusion their gut would. They argued with the shadow log out loud, in the office, with us in the room.
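A shadow record of that kind needs only three labels and a way to spot disagreement. This is a minimal sketch with hypothetical field names; the real log format is not described in the article.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class ShadowRecord:
    message_id: str
    rule_only: str     # what a rule-only baseline would have classified
    keyword_scan: str  # what a no-LLM keyword scan would have classified
    agent: str         # what the agent actually decided

    def disagrees(self) -> bool:
        """True when either baseline differs from the live decision."""
        return self.rule_only != self.agent or self.keyword_scan != self.agent

def to_log_line(rec: ShadowRecord) -> str:
    """Serialise one record as a JSON line for the separate shadow log."""
    return json.dumps({**asdict(rec), "disagrees": rec.disagrees()},
                      ensure_ascii=False)
```

Storing the disagreement flag alongside the labels lets an operator filter the log down to the cases worth arguing about.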

By week eight the shadow checks dropped off. By week twelve, two of the operators we had worked with most closely started defending agent decisions in the planning meeting against the projectleider's pushback. That was the moment we knew it had stuck. The shadow log still runs. We have not turned it off, and we do not plan to. The day someone notices it disagreeing with the live decision is the day we want to find out.

What changed

Median time-to-roofer on a spoed melding went from 38 minutes (the office's own measurement, from before the agent) to 9 minutes in the first quarter after launch. Apart from the two early misses described above, in weeks 3 and 7, the agent has not missed a CRITICAL classification on a verified flat-roof leak in 17 weeks of operation. It has produced 14 false positives: calls where a dakdekker drove out and didn't need to. The office considers that a good trade.

The dakdekker also got something they didn't ask for: a clean stream of structured data about every leak report, with classification, channel, customer, building, and outcome. They had never had that. Their planner now schedules preventive inspections against actual leak density per building, not against the retired projectleider's memory. A junior projectleider, hired in 2024 with no Brabants accent and no whiteboard intuition, has started using the same data to write the maintenance-bid documents that used to take her supervisor a week.

Takeaway

The agent doesn't have to be smart. It has to be faster than the warranty clock, and it has to know when to hand the message back to a human.

The smallest thing you could do today

If you run an operations team that has any kind of clock on it (warranty, SLA, response time, regulatory), go count the median seconds between "message arrives" and "human with tools dispatched". Not the time you put on the invoice. The real number. If you can't measure it, build the measurement first. The agent is the second job.
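The measurement itself is a few lines once both timestamps are recorded. The sketch below assumes each case is stored as a pair of arrival and dispatch times; the sample data is made up for illustration.

```python
from datetime import datetime
from statistics import median

def median_seconds_to_dispatch(events):
    """events: iterable of (arrived_at, dispatched_at) datetime pairs."""
    gaps = [(dispatched - arrived).total_seconds() for arrived, dispatched in events]
    if not gaps:
        raise ValueError("no completed dispatches to measure")
    return median(gaps)

# Made-up sample: three cases dispatched after 9, 11 and 40 seconds.
sample = [
    (datetime(2026, 3, 2, 9, 0, 0), datetime(2026, 3, 2, 9, 0, 9)),
    (datetime(2026, 3, 2, 9, 5, 0), datetime(2026, 3, 2, 9, 5, 11)),
    (datetime(2026, 3, 2, 9, 7, 0), datetime(2026, 3, 2, 9, 7, 40)),
]
print(median_seconds_to_dispatch(sample))  # 11.0
```

The median is the right first number because one slow outlier, like the 40-second case here, does not drag it the way it would drag a mean.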

When we built this chat agent for the Breda dakdekker, the thing we kept running into was that the two-hour BDA clock had no observability: before our work, nobody on the team could tell you how often it had been blown. We ended up wiring the agent to emit timing telemetry from the very start of the project, so the team could argue with the data instead of the gut.

FAQ

Why not replace Syntess Atrium with a modern ERP?

Twelve years of process and muscle memory are not a bug — they are most of the value. We wrap the legacy system, we do not rip it out. ERP migrations are a separate project with their own risk budget.

Does the chat agent close tickets on its own?

No. It only opens them and routes them. Every status change in Syntess is made by a human. The agent's job is triage and timing, not authority over the work itself.

What happens when the customer is not in the ERP?

The agent escalates within seconds to a human operator with a structured one-line summary of the suspected urgency. It never invents a customer record or guesses a contract.

Why 30 mm as the flat-roof water threshold?

It correlates with seam failure on bitumen roofs under Dutch winter loading and approaches the structural assumptions in NEN 6702 / EN 1991-1-3. The dakdekker's senior projectleider set the number.

How long did the build take?

About three weeks to build the agent. Three months to earn operator trust. Most of the real work was the second part — sitting in the office, watching them use it, and tuning the edges.
