Voice agent for HVAC dispatch: 1,420 Dutch calls a week

It is Tuesday 06:42 in Eindhoven, the first frost of November, and 38 boilers in the region have already failed. The cooperative's three planners are not on the phone.

Jacob Molkenboer · Founder · A Brand New Company · 15 Jun 2026 · 8 min

It is Tuesday 06:42 in Eindhoven, the first frost of November, and 38 boilers in the region have already failed. Six of those calls came in within ninety seconds of each other. The cooperative has three planners on duty and a queue depth of forty-one. None of the planners are on the phone.

That last sentence is the point of this post. Two years ago they would have been on the phone, taking the same address eight times in a row, looking up the customer in a 14-year-old AFAS InSite portal that still uses framesets, and writing down the model number of the CV-ketel on a sticky note. Today a Dutch voice agent does the first ninety seconds of that work, and a planner only picks up the receiver when a CO sensor is mentioned or when the customer says the word "noodgeval".

This is the case study. We will not call it a transformation.

The cooperative

The customer is a 29-person installer cooperative in the Eindhoven region. Twenty-two monteurs in vans, three planners, two back-office staff, the founder, and a part-time bookkeeper. They share a brand, a phone number, a planning system, and a spare-parts contract. Each installer keeps their own customer book, which matters later.

They handle around 1,420 storingsmelding calls a week in the winter quarter, and roughly 600 in summer. Of those, about 4% involve a carbon monoxide reading the customer can see on a sensor or smell. Every single one of those 4% must reach a human in under one ring. Everything else can wait between five and forty-five minutes, depending on urgentie code.

What was there before

Two phone lines. AFAS InSite 9 (released 2011, never upgraded, still running because the in-house planner module depends on a 2012 ActiveX control). Outlook. A whiteboard with magnets. A WhatsApp group called "Dispatch (lees mee)".

When all three planners were on calls, line three rolled over to a generic voicemail at KPN that nobody listened to until lunch. We measured that in the first week of discovery: 71 missed calls across a single Monday in February. About a third of those people called a competitor.

The first design we threw away

Our first instinct was to put a chatbot in front of the website. Wrong instinct. The cooperative's customers are 56% over the age of 55, two thirds are calling about a heating system that has just stopped working in winter, and a quarter are calling from a flip phone in a garage. They want a human voice on the line, fast, in Brabants-accented Dutch, and they want to know the monteur is on the way before they hang up.

A voice agent was the only acceptable surface. The question was what it could safely do and what it absolutely must not.

Voice agents fail badly when they fail. A misheard postcode reads as incompetence. A misheard "ik ruik gas" reads as negligence. Pick the topics where the cost of being wrong is recoverable, and route the rest to a human on the first sentence.

The agent's actual scope

We gave the agent exactly five jobs, and no others. In production they are visible to the planner as five status lights on the dispatch screen.

Recognise the customer from their phone number, address, or boiler serial. If it cannot do all three with confidence above 0.92, hand off.
Classify the call into one of four urgentie codes (U0 through U3) using a rubric the cooperative already used on paper.
Slot a U2 ticket into the geographically nearest monteur's open route, respecting the lunch block and the parts-pickup window.
Read back the appointment window in Dutch, then SMS a self-service rescheduling link.
Detect any mention of CO, gas smell, smoke, or the word "noodgeval", and bridge the call to a senior planner inside one ring. Never close such a call itself.

It does not take payments. It does not change addresses. It does not handle complaints. It does not promise an arrival time more precise than a forty-five-minute window. It does not speak English unless the customer speaks English first.

Sitting on top of a 14-year-old AFAS portal

The hardest part of this project was not the speech recognition. It was the AFAS InSite 9 portal.

AFAS InSite 9 was last updated by the cooperative in 2014. It is the planner's source of truth for customer records, boiler serials, contract status, and historic monteur visits. The vendor's REST surface did not yet exist for that version. The portal serves HTML inside a frameset, and the in-house planner module is a thick ActiveX control nobody on the current team can recompile.

We did not migrate it. We do not migrate things that work, on a tight timeline, when the people who use them are calm under load.

What we did instead was build a thin read-only adapter that scrapes the portal over an authenticated session, caches the customer record for ninety seconds, and exposes one HTTP endpoint to the voice agent. Writes happen the old way: the planner enters the appointment into AFAS by hand when they confirm it, the same way they always have. The agent's slot proposal is a suggestion, not a write.

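async def lookup_customer(phone: str) -> Customer | None:
    """Read-only customer lookup against the AFAS portal, cached for 90 seconds."""
    # Serve from the cache while the record is younger than ninety seconds.
    cached = await cache.get(phone)
    if cached and cached.age_seconds < 90:
        return cached.value

    # Reuse (or re-establish) the authenticated portal session.
    session = await afas_session.ensure()
    # Hard four-second budget. On a timeout the caller is expected to
    # bridge the call to a planner rather than retry.
    html = await session.get(
        "/InSite/Klanten/Zoek",
        params={"telefoon": phone},
        timeout=4.0,
    )
    # Scrape the customer frame; None means no matching customer was found.
    record = parse_klant_frame(html)
    if record is None:
        return None

    await cache.set(phone, record, ttl=90)
    return record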

Four seconds is the hard timeout. If AFAS does not answer in four, the agent says "een moment, ik verbind je door" and bridges to a planner. The planner already has the customer on screen because their own InSite session is open. The customer hears one warm word of human voice instead of a 9-second silence.
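One way to express "a miss is a routing event" in code is to wrap the lookup in a time budget and return a routing decision instead of raising. The sketch below is illustrative only: `LookupOutcome`, `lookup_within_budget`, and the reason labels are hypothetical names, and treating "customer not found" as a handoff is an assumption based on the agent's first job.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

AFAS_BUDGET_SECONDS = 4.0  # the hard timeout described above


@dataclass
class LookupOutcome:
    record: Optional[dict]   # customer record if the lookup succeeded
    bridge_to_planner: bool  # True when the call should go to a human
    reason: str              # short label for the dispatch log (hypothetical)


async def lookup_within_budget(
    lookup: Callable[[str], Awaitable[Optional[dict]]],
    phone: str,
    budget: float = AFAS_BUDGET_SECONDS,
) -> LookupOutcome:
    """Run a lookup under a time budget; a miss becomes a routing decision."""
    try:
        record = await asyncio.wait_for(lookup(phone), timeout=budget)
    except asyncio.TimeoutError:
        # The portal was too slow: hand the call to a planner, do not retry.
        return LookupOutcome(None, True, "afas_timeout")
    if record is None:
        # Assumption: an unknown caller is also handed off.
        return LookupOutcome(None, True, "customer_not_found")
    return LookupOutcome(record, False, "ok")
```

The caller never sees an exception from a slow portal. It only sees `bridge_to_planner`, which keeps the dialogue code free of error handling.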

Takeaway

The voice agent does not replace the legacy portal. It puts a four-second budget in front of it, and treats every miss as a routing event, not a failure.

The 75-second slotting loop

A U2 ticket is the everyday case: the boiler has stopped, the house is cold, the customer wants someone today. The agent has 75 seconds from "hallo, mijn ketel doet het niet" to "een monteur is bij u tussen kwart over twee en drie uur" before the customer starts to feel managed-by-machine.

Inside those 75 seconds the agent has to confirm the postcode and house number, look up the customer in AFAS (the four-second budget above), ask three clarifying questions from the urgentie rubric, pull the live van positions and route plans for the three monteurs in the nearest cluster, propose the cheapest insertion in the route that respects lunch, parts pickup, and the customer's stated availability, then read back the window, confirm, and trigger the SMS.

The slotting itself is a constrained insertion problem: given a route of stops with hard time windows, insert a new stop minimising the total added drive time. The math is undergraduate (see the standard vehicle routing problem if you want the textbook version). The art is the constraints nobody writes down. Tim does not eat between 12:00 and 12:30. Sander will not take a job in the postcode 5611 cluster on a Wednesday because his daughter goes to crèche there and he stops by. The monteur on the Heeze route always picks up parts at 14:00 and the wholesaler closes at 17:00.
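A minimal sketch of that insertion step, under simplifying assumptions: times are minutes after midnight, the first stop in the route is the van's current position, and the drive-time function is supplied by the caller. `Stop`, `feasible`, and `cheapest_insertion` are hypothetical names, not the production code. A lunch block or a parts pickup could be modelled as an ordinary stop with a hard window.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Stop:
    name: str
    earliest: int  # service cannot start before this minute
    latest: int    # service must start by this minute
    service: int   # minutes spent on site


DriveTime = Callable[[str, str], int]  # minutes between two named stops


def feasible(route: List[Stop], depart: int, drive: DriveTime) -> bool:
    """Walk the route in order and check every hard time window."""
    clock = depart  # time the van leaves route[0], its current position
    for prev, stop in zip(route, route[1:]):
        clock += drive(prev.name, stop.name)
        clock = max(clock, stop.earliest)  # wait if the van arrives early
        if clock > stop.latest:
            return False
        clock += stop.service
    return True


def drive_total(route: List[Stop], drive: DriveTime) -> int:
    """Total drive time along the route, in minutes."""
    return sum(drive(a.name, b.name) for a, b in zip(route, route[1:]))


def cheapest_insertion(
    route: List[Stop], new_stop: Stop, depart: int, drive: DriveTime
) -> Optional[Tuple[int, int]]:
    """Return (position, added_minutes) for the best feasible slot, or None."""
    base = drive_total(route, drive)
    best: Optional[Tuple[int, int]] = None
    for pos in range(1, len(route) + 1):  # never insert before the van itself
        candidate = route[:pos] + [new_stop] + route[pos:]
        if not feasible(candidate, depart, drive):
            continue
        added = drive_total(candidate, drive) - base
        if best is None or added < best[1]:
            best = (pos, added)
    return best
```

Returning `None` when no position is feasible gives the dialogue a clean signal to try the next monteur in the cluster or hand off.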

We wrote those down. They live in a YAML file the planners can edit themselves. We do not let the agent learn them. Soft preferences turn into hard accidents when nobody can read why a decision was made.
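To illustrate why explicit rules stay readable, here is a sketch of how such rules might look once loaded from the file, with a check that returns the human-written reason for every block. The `Rule` fields, the rule contents beyond what is described above, and the 14:30 end of the pickup block are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    monteur: str
    weekday: Optional[str]          # None means every day
    postcode_prefix: Optional[str]  # None means any postcode
    blocked_from: Optional[str]     # "HH:MM"; None means the whole day
    blocked_until: Optional[str]    # "HH:MM"; set together with blocked_from
    reason: str                     # shown to the planner verbatim


# Hypothetical contents, as they might look after loading the planners' file.
RULES = [
    Rule("Sander", "Wednesday", "5611", None, None,
         "no jobs in the 5611 cluster on Wednesdays"),
    Rule("Heeze route", None, None, "14:00", "14:30",  # end time is assumed
         "parts pickup at the wholesaler"),
]


def blocking_reasons(
    rules: List[Rule], monteur: str, weekday: str, postcode: str, start: str
) -> List[str]:
    """Return the reason of every rule that blocks this proposed slot."""
    reasons = []
    for rule in rules:
        if rule.monteur != monteur:
            continue
        if rule.weekday is not None and rule.weekday != weekday:
            continue
        if rule.postcode_prefix is not None and not postcode.startswith(
            rule.postcode_prefix
        ):
            continue
        # Zero-padded "HH:MM" strings compare correctly as plain text.
        if rule.blocked_from is not None and not (
            rule.blocked_from <= start < rule.blocked_until
        ):
            continue
        reasons.append(rule.reason)
    return reasons
```

Because every rejection carries its reason, a planner can see why a slot was refused and edit the rule if it no longer applies.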

What the agent does not optimise

It does not minimise the cooperative's total drive time. It optimises for one ticket at a time, against the routes as they exist now, with the monteurs as they are now. The cooperative tried a global optimiser in 2019 from a different vendor and the monteurs revolted within a week, because the algorithm sent Tim across the city to save Sander four minutes. We learned from that.

The CO-alarm path

There is exactly one part of this system we worry about every week. A customer who mentions carbon monoxide, gas smell, or "noodgeval" must reach a senior planner on the first ring. The voice agent must not try to help them.

The detection is a layered cascade. First, a small classifier listens for a dictionary of trigger phrases in Dutch, including dialect variants ("ik rook gas", "het stinkt naar gas", "de melder gaat af", "rode lamp op de melder"). Second, the language model checks intent on every utterance against a single instruction: if there is any chance of a gas or CO emergency, bridge immediately and say so out loud. Third, on bridge failure the call falls through to the cooperative's old line three, which still rings the planning room. Three independent layers. We assume each has a 1% miss rate, which gives us roughly a 1-in-a-million combined miss.
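As a rough illustration of the first layer, the sketch below swaps the small classifier for a plain phrase matcher over a normalised transcript. It is deliberately simpler than a trained classifier. The phrase list contains only phrases mentioned in this post, and the function names are hypothetical. `combined_miss_rate` reproduces the arithmetic above, which holds only if the layers really do miss independently.

```python
import math
import re
import unicodedata
from typing import Iterable

# Phrases taken from the text above; a production list would be longer.
TRIGGER_PHRASES = (
    "ik ruik gas",
    "ik rook gas",
    "het stinkt naar gas",
    "de melder gaat af",
    "rode lamp op de melder",
    "noodgeval",
)


def normalise(utterance: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", utterance.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def is_emergency(utterance: str) -> bool:
    """True if any trigger phrase occurs as whole words in the utterance."""
    padded = f" {normalise(utterance)} "
    return any(f" {phrase} " in padded for phrase in TRIGGER_PHRASES)


def combined_miss_rate(miss_rates: Iterable[float]) -> float:
    """Probability that every layer misses, assuming independent layers."""
    return math.prod(miss_rates)
```

With three layers at a 1% miss rate each, `combined_miss_rate([0.01, 0.01, 0.01])` comes out at roughly one in a million, matching the estimate above. Correlated failures, such as a model update that degrades two layers at once, would make the real figure worse.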

We test the cascade weekly with a recorded set of 40 trigger calls, half in Standard Dutch, half in Brabants accent. If pass rate drops below 100% the agent is paused automatically and the line goes to humans. It has paused four times in eight months, always after a model update, always recovered the same day.
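The pause rule can be sketched as a small gate. This version uses text transcripts as a stand-in for the recorded calls and takes the detector and the pause action as parameters. All names here are hypothetical.

```python
from typing import Callable, List, Sequence


def weekly_gate(
    cases: Sequence[str],
    detect: Callable[[str], bool],
    pause_agent: Callable[[], None],
) -> List[str]:
    """Run every trigger case; pause the agent if even one is missed.

    Returns the missed cases so a person can review them.
    """
    missed = [case for case in cases if not detect(case)]
    if missed:
        # Anything below a 100% pass rate sends the line back to humans.
        pause_agent()
    return missed
```

Returning the missed cases rather than a bare pass or fail makes the same-day recovery easier, because the review starts from the exact utterances that slipped through.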

If your voice agent has any chance of being the first to hear "ik ruik gas", you owe it three independent escape hatches and a weekly automated test. One layer is not enough.

Numbers after eight months

We measured continuously from the day the agent went live. The numbers below are the cooperative's, not ours, and they are running averages over the last twelve weeks.

1,420 storingsmelding calls per week answered by the agent, about 83% of them end-to-end without a planner.
Median time from call start to confirmed appointment: 71 seconds.
P95 time: 96 seconds.
U2 tickets correctly routed to the nearest monteur on the first proposal: 94%.
CO and gas trigger calls bridged to a senior planner inside one ring: 100% (out of 218 verified triggers).
False positives on the CO trigger: about 9 per week. Planners prefer this side of the trade.
Calls that hand off to a human for any reason: 17%.
Customer satisfaction (planner-rated, post-visit): unchanged. We expected a dip and it did not happen.

The number we did not expect: planner overtime is down 38%. Not because the agent took planning work away from them. Because the agent took the first ninety seconds off every call, and ninety seconds times 1,420 calls is about 35.5 hours per week.

What we will not claim

We will not claim this was a hard technical project. The voice stack is the standard one (a Dutch ASR, a small classifier in front of a language model, a TTS that sounds like an actual person from Eindhoven and not a Belgian newsreader). The slotting is undergraduate algorithms. The AFAS adapter is a tired HTML scraper with a session cookie and a cache.

What was hard was sitting in the planning room for three weeks, watching what Tim, Sander, and the planners actually do, writing down the unwritten rules, and convincing the cooperative that the right move was to put a four-second budget in front of their 14-year-old portal rather than rebuild it.

When we built the voice agent for this cooperative, the thing we kept running into was the gap between what the AFAS portal could safely answer and what the customer needed to hear in the first ten seconds. We solved it by treating the portal as a slow oracle, the agent as a fast greeter, and the planner as the only writer. If you are looking at similar work, the same shape probably fits: read the legacy, write nothing, hand off fast. That is the bulk of our AI agents work for installers and field-service teams.

The smallest thing you could do today: open your call recordings from last Tuesday morning, count the calls that came in during the first 30 minutes, and ask what would have changed if a human voice had answered each one inside one ring.

Key takeaway

Put a four-second budget in front of your legacy portal, treat every miss as a routing event, and never let a voice agent close a CO-alarm call.

FAQ

Why didn't you migrate the AFAS InSite portal off version 9 first?

Because it works, the planners trust it, and the project would have stretched from eight weeks to eight months. A read-only adapter with a four-second timeout was cheaper and safer than a rebuild.

What happens when the voice agent mishears a postcode?

It reads back what it heard before booking, and only proceeds if the customer confirms. If confirmation fails twice, the call hands off to a planner. Wrong slots are rare enough that the cooperative does not separately track them.
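A minimal sketch of that read-back loop, assuming the standard Dutch postcode shape of four digits followed by two letters. `hear` and `confirm` stand in for the speech layer and are hypothetical, as are the function names.

```python
import re
from typing import Callable, Optional

# Dutch postcodes: four digits (the first is never 0) followed by two letters.
POSTCODE_RE = re.compile(r"^([1-9][0-9]{3})\s?([A-Z]{2})$")


def normalise_postcode(heard: str) -> Optional[str]:
    """Return the postcode as '1234 AB', or None if it is not well-formed."""
    match = POSTCODE_RE.match(heard.strip().upper())
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


def confirm_postcode(
    hear: Callable[[], str],
    confirm: Callable[[str], bool],
    max_attempts: int = 2,
) -> Optional[str]:
    """Read back what was heard; None after two failures means hand off."""
    for _ in range(max_attempts):
        postcode = normalise_postcode(hear())
        if postcode is not None and confirm(postcode):
            return postcode
    return None
```

A `None` result is the signal to bridge the call to a planner. The agent never books against an unconfirmed postcode.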

Could a chatbot have done this instead?

No. Two thirds of these calls come from people with a cold house and a phone in hand. They want a voice, not a typing form. The voice channel is the product.

How do you train and test the CO-alarm trigger?

We keep a sealed test set of 40 Dutch recordings, half Standard Dutch and half Brabants accent. The cascade runs against it weekly. If pass rate drops below 100% the agent is paused automatically and the line falls back to humans.

What stack are you running on?

A Dutch ASR for transcription, a small intent classifier in front of a language model for dialogue, a Dutch TTS for output, and a Python service that talks to AFAS InSite and the routing engine. The hard work is in the rubrics and the YAML of operator preferences, not the model layer.
