Voice agents in three languages: a shift-cover case study
The first call came in at 4:47 on a Saturday. A foreman at a meat-packing plant in Sint-Niklaas needed two cleaners on the kill floor by 7am, because one of the night crew had slipped on a drain cover and the other had refused to keep working solo. Six months ago that call would have woken Hilde, the dispatcher at a 19-person industrial-cleaning operator in Mechelen, who would then have phoned a Google Sheet of mobile numbers until she landed two yes-es. This Saturday the call woke no one. By 4:53 the voice agent had reached a Polish cleaner in Antwerpen-Noord, confirmed she could leave in fifteen minutes, slotted her into the roster, and sent the client a confirmation SMS in Dutch.
That voice agent now handles around 230 inbound shift-cover calls a week.
The shape of the problem
The operator runs nights across the Antwerp-Mechelen-Brussels corridor: food production, logistics warehouses, and a couple of hospitals. Most of the staff are Polish or Romanian, with a Flemish-Dutch core team and a smaller French-speaking group from the Brussels side. The clients call in three languages too. Roughly one in four scheduled shifts gets a last-minute change: a client wants two extra people, a cleaner is sick, a flight is delayed, a child is ill.
Hilde used to handle most of those changes off-hours on her personal mobile. Last September she told the owner she would quit by Christmas if nothing changed. That is the only sentence in this story that mattered to the decision to build anything at all.
What we actually built
Three components, none of them clever on their own.
A toll-free Belgian number on Twilio that routes inbound calls into a voice agent with three language profiles. Language detection runs on the first two seconds of audio using Deepgram's multilingual STT, which we biased toward Polish and Belgian Dutch with about forty minutes of in-house call audio. The generic Dutch models struggled with Flemish phone audio. That small bias dataset closed the gap.
A reasoning loop that reads from the live roster in Shiftbase. It knows who is on shift right now, who is off rotation, who has done that client before, and who is within forty minutes' travel of the site. The list of eligible candidates becomes a callable function. The agent picks one, dials them out, asks in their preferred language whether they can take the shift, and either confirms or moves to the next candidate.
A write-back into Shiftbase via REST. Confirmed cover goes into the roster as a published shift. The original cleaner, if known to be sick, gets marked absent with the reason logged. The client gets a confirmation SMS via Twilio with the cleaner's first name and ETA.
There is no model trained on the operator's data. There is no "AI strategy". There is a voice agent that does the boring thing well.
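The second component, the loop that works through the candidate list, can be sketched in a few lines. This is a minimal illustration, not the production code: `Candidate`, `find_cover`, and the `ask` callback (which stands in for the outbound call) are hypothetical names, and the three outcomes (accepted, declined, no answer) are an assumption about how a call can end.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Candidate:
    name: str
    language: str  # preferred language code, e.g. "pl", "nl", "fr"


def find_cover(
    candidates: Iterable[Candidate],
    ask: Callable[[Candidate], Optional[bool]],
) -> tuple[Optional[Candidate], list[tuple[str, str]]]:
    """Call candidates in ranked order until one accepts.

    `ask` stands in for the outbound call (hypothetical):
    True = accepted, False = declined, None = no answer.
    Returns the accepted candidate (or None) plus a log of every attempt.
    """
    log: list[tuple[str, str]] = []
    for candidate in candidates:
        answer = ask(candidate)
        if answer is True:
            log.append((candidate.name, "accepted"))
            return candidate, log
        outcome = "declined" if answer is False else "no answer"
        log.append((candidate.name, outcome))
    # Nobody accepted: the caller decides what to do next (e.g. escalate).
    return None, log
```

Keeping the attempt log alongside the result means that when nobody accepts, the hand-off to a human can show exactly who was tried and what they said.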
Why three languages, not one
Dutch alone would have covered maybe 55% of inbound calls. French covers another 20%. Polish covers the rest. Polish is where most of the dispatcher's frustration lived, because she does not speak it. The cleaners speak enough Dutch to do the work, but a 5am call in their second language, half-asleep, is the worst possible UX. Calling them in Polish more than doubled the yes-rate on first contact.
We use ElevenLabs for TTS because the Polish voices were the only ones we tested that did not sound like a robot reading a phonics chart. Polish stress is unforgiving. Put it on the wrong syllable and the cleaner thinks it is a scam call and hangs up.
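How a language profile gets chosen can be sketched as two small functions: one for inbound callers, based on the detector's guess, and one for outbound calls, based on the cleaner's stored preference. Everything here is an assumption made for illustration: the function names, the confidence threshold of 0.8, and the fallback to Dutch when the detector is unsure or the language has no profile.

```python
SUPPORTED_PROFILES = ("nl", "fr", "pl")
DEFAULT_PROFILE = "nl"  # assumption: Dutch is the fallback profile


def inbound_profile(detected: str, confidence: float, threshold: float = 0.8) -> str:
    """Pick the profile for an inbound caller from the detector's guess.

    `detected` and `confidence` are whatever the STT layer reports for the
    first seconds of audio; the threshold value is a placeholder.
    """
    if detected in SUPPORTED_PROFILES and confidence >= threshold:
        return detected
    return DEFAULT_PROFILE


def outbound_profile(preferred_language: str) -> str:
    """Pick the profile for an outbound call from the cleaner's stored preference."""
    if preferred_language in SUPPORTED_PROFILES:
        return preferred_language
    return DEFAULT_PROFILE
```

For example, `inbound_profile("pl", 0.93)` selects the Polish profile, while a low-confidence guess or a language without a profile falls back to the default.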
The Shiftbase integration that took the longest
The voice part went together in two weeks. The integration took five.
Shiftbase has a clean API and decent docs. The problem is not the API. The problem is that real rosters in real companies are full of unwritten rules. Marek will not work with Andrzej because of an old argument. The Sint-Niklaas plant requires a HACCP certificate that only six of the team have. One client refuses agency staff and only wants people on the payroll for at least six months. None of this lives in Shiftbase. It lives in Hilde's head.
We spent four evening sessions with her, a whiteboard, and a list of historical edge cases. The output was a callable filter, roughly this:
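    def eligible_for_shift(shift, candidates):
        """Return the candidates allowed on this shift, best match first."""
        out = []
        for c in candidates:
            # The site requires a HACCP certificate the candidate lacks.
            if shift.client.requires_haccp and not c.has_haccp:
                continue
            # The client refuses agency staff.
            if shift.client.no_agency and c.employment_type == "agency":
                continue
            # The client's minimum tenure; none set means no minimum.
            min_tenure = shift.client.min_tenure_months or 0
            if c.tenure_months < min_tenure:
                continue
            # Someone already on the crew is on this candidate's no-go list.
            if any(b in shift.crew_ids for b in c.cannot_work_with):
                continue
            # More than forty minutes' travel from the site.
            if c.travel_minutes_to(shift.site) > 40:
                continue
            # On shift within 8 hours, so not available.
            if c.is_on_shift_within(hours=8):
                continue
            out.append(c)
        # Most shifts at this client first, then shortest travel time.
        return sorted(
            out,
            key=lambda c: (
                -c.shifts_at_client.get(shift.client.id, 0),
                c.travel_minutes_to(shift.site),
            ),
        )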
That is a couple of dozen lines of Python. It is also the part that makes the difference between the agent calling the right person first and calling the wrong person three times. Most of the hours we bill on voice-agent work go into writing the equivalent of those lines for a business that has never written them down.
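The filter above leans on a handful of attributes and two methods that the snippet does not define. A minimal sketch of the shapes it expects might look like the following. The field names are taken from the filter itself; the types, the `travel_minutes` lookup table, the `shifts` list, and the reading of `is_on_shift_within` as "any shift overlapping the window around now" are assumptions, not the operator's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Client:
    id: str
    requires_haccp: bool = False
    no_agency: bool = False
    min_tenure_months: Optional[int] = None  # None means no minimum


@dataclass
class Shift:
    client: Client
    site: str
    crew_ids: list[str] = field(default_factory=list)  # ids already on the crew


@dataclass
class Candidate:
    id: str
    has_haccp: bool
    employment_type: str  # "payroll" or "agency"
    tenure_months: int
    cannot_work_with: list[str] = field(default_factory=list)
    shifts_at_client: dict[str, int] = field(default_factory=dict)
    # Assumption: travel times are precomputed per site.
    travel_minutes: dict[str, float] = field(default_factory=dict)
    # Assumption: known shifts as (start, end) pairs.
    shifts: list[tuple[datetime, datetime]] = field(default_factory=list)

    def travel_minutes_to(self, site: str) -> float:
        # Unknown sites count as unreachable, so the 40-minute rule excludes them.
        return self.travel_minutes.get(site, float("inf"))

    def is_on_shift_within(self, hours: int, now: Optional[datetime] = None) -> bool:
        # One possible reading: any shift overlapping [now - hours, now + hours].
        now = now or datetime.now()
        window = timedelta(hours=hours)
        return any(
            start < now + window and end > now - window
            for start, end in self.shifts
        )
```

Treating an unknown site as infinitely far away is a deliberate choice in this sketch: a missing travel time then fails safe instead of letting an unvetted candidate through.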
Hilde did not get fired
This is the part of every voice-agent case study that gets skipped. There was a real risk, midway through the build, that the owner would look at the demo and conclude he could let Hilde go.
He did not. We had a long conversation with him before the project started about which decisions the agent could not make. The agent cannot reason about why a particular client keeps changing the brief. It cannot tell that Marek's wife sounded off on the last call. It cannot decide whether to push back when a client tries to add a fourth daily shift at no extra cost. Those are the calls Hilde now makes, with her full attention, during normal hours.
Her after-hours work dropped from roughly 22 hours a week to about 4. Her sleep recovered. She runs the client-relationship side of dispatch now, and she trains new hires on the agent's quirks. The honest version is that the agent did not replace Hilde. It moved her off the part of the job that was eating her weekends.
Numbers after eight weeks
The agent went live on March 10. As of last week:
- 230 inbound shift-cover calls handled per week, average.
- 89% of cover slots filled without dispatcher intervention.
- Median time from inbound call to confirmed replacement: 4 minutes 12 seconds.
- All-in cost per handled call (telco, STT, model, TTS, Twilio numbers): 38 cents.
- After-hours dispatcher workload: down roughly 80%.
The 11% of calls that still need a human are mostly the ones with a client edge case the agent does not understand yet. We add those to the rule set on a weekly review. The rule set is getting longer slowly, not exploding.
The hard part of a voice agent is not the voice. It is the dozen unwritten rules in the dispatcher's head, and the honest conversation about which decisions the human keeps.
What broke, and how we noticed
The first week, the agent confidently called a cleaner who had been off sick for three weeks, because Shiftbase only marks the current scheduled period as absent and we were reading from a cache. He picked up, said no, and the agent moved on. Hilde was furious that the cache existed at all. We removed it.
The second issue was subtler. The Polish TTS pronounced one of the client site names ("Stora Enso") as if it were Polish. The cleaner thought she was being sent to a different facility. We added a small pronunciation dictionary so that client names are always read in English, regardless of language profile.
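One way to picture such a pronunciation dictionary is a pre-processing step that marks known client names before the text reaches the TTS engine. The sketch below wraps them in an SSML `<lang>` element. This is an assumption for illustration only: the name list, the `protect_names` function, and the `en-GB` tag are hypothetical, and whether a given TTS vendor honours `<lang>` tags would need checking.

```python
import re
from typing import Sequence

# Hypothetical lexicon: client names that must keep their English reading.
ENGLISH_NAMES: Sequence[str] = ("Stora Enso",)


def protect_names(text: str, names: Sequence[str] = ENGLISH_NAMES) -> str:
    """Wrap known client names in an SSML <lang> tag before synthesis."""
    if not names:
        return text
    # Longest names first, so a longer name wins over a shorter one it contains.
    ordered = sorted(names, key=len, reverse=True)
    pattern = "|".join(re.escape(name) for name in ordered)
    return re.sub(
        rf"\b({pattern})\b",
        r'<lang xml:lang="en-GB">\1</lang>',
        text,
        flags=re.IGNORECASE,
    )
```

Running the substitution on every outgoing utterance, in every language profile, keeps the rule in one place instead of scattering it across prompts.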
The third was a question of liability. If the agent tells a client something untrue (the wrong cleaner's name, an ETA that nobody actually committed to), the operator carries that, not the vendor of the model. We hardened the confirmation step so that any commitment to a client (cleaner's name, ETA, scope of work) is read back from the Shiftbase record after write, never from the conversation buffer. The agent will not promise something the system has not recorded.
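The read-back rule can be shown with an in-memory stand-in for the roster. In this sketch, `ROSTER`, `write_cover`, and `confirmation_text` are hypothetical names and the record fields are assumptions; the real build writes to Shiftbase over REST. The point is only that the client-facing message is built from the stored record and from nothing else.

```python
from typing import Optional

# In-memory stand-in for the roster system (assumption for illustration).
ROSTER: dict[str, dict] = {}


def write_cover(shift_id: str, cleaner_first_name: str, eta_minutes: int) -> None:
    """Record confirmed cover; in the real system this is the write-back step."""
    ROSTER[shift_id] = {
        "cleaner_first_name": cleaner_first_name,
        "eta_minutes": eta_minutes,
    }


def confirmation_text(shift_id: str) -> Optional[str]:
    """Build the client message from the stored record only.

    Returns None when nothing has been recorded, so the caller has
    nothing to promise.
    """
    record = ROSTER.get(shift_id)
    if record is None:
        return None
    return (
        f"Cover confirmed: {record['cleaner_first_name']}, "
        f"ETA {record['eta_minutes']} min."
    )
```

Because `confirmation_text` takes only a shift id, there is no path by which a name or ETA from the conversation could leak into the message.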
The 38-cent question
At 38 cents a call, the unit economics are unrecognisable from what they were two years ago. A 2024 build of the same agent, with generic Dutch TTS and a hosted reasoning model, would have run closer to €1.80 per call all-in. Most of the saving is on TTS pricing and model pricing. A smaller part is improved European telephony routing.
Napkin math: 230 calls a week, 52 weeks, 38 cents. That is about €4,500 a year in operating cost. Hilde's after-hours pay alone was several times that. The build, including the four whiteboard sessions and the language tuning, paid back inside the first quarter.
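The napkin math is easy to reproduce. The snippet below uses only the figures quoted in this section (230 calls a week, 52 weeks, 38 cents now versus €1.80 for a 2024 build); the function name is ours. It comes to roughly €4,500 a year at today's price and roughly €21,500 at the 2024 price.

```python
CALLS_PER_WEEK = 230
WEEKS_PER_YEAR = 52
COST_PER_CALL_EUR = 0.38       # all-in cost per handled call today
COST_PER_CALL_2024_EUR = 1.80  # estimated all-in cost for a 2024 build


def annual_cost(cost_per_call: float) -> float:
    """Yearly operating cost in euros at a given per-call cost."""
    return round(CALLS_PER_WEEK * WEEKS_PER_YEAR * cost_per_call, 2)


today = annual_cost(COST_PER_CALL_EUR)
in_2024 = annual_cost(COST_PER_CALL_2024_EUR)
print(f"Today: EUR {today:,.2f} per year")
print(f"2024 build: EUR {in_2024:,.2f} per year")
```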
The point is not the money. It is that voice agents have moved from "interesting if you have scale" to "obvious if you have a dispatcher". A 19-person operator can afford one now, and the difference between a good one and a useless one is not the speech stack. It is whether someone bothered to write down the rules that nobody had bothered to write down before.
When we built this voice agent in Mechelen, the thing we kept running into was the gap between what the dispatcher said the rules were and what the rules actually were once a shift went sideways at 4am. We closed it by sitting with her over four evening sessions, recording her thinking out loud as she walked through historical cases, and turning that transcript into the filter shown above. That kind of work is the half of AI agents that doesn't get talked about, and it is most of the job.
If you run a dispatch operation, the smallest thing you can do this week is open last week's after-hours call log and count how many of those calls were a yes-or-no question for the right person, asked in the right language. That number is your shift-cover lift.
Key takeaway
Voice agents are no longer about voice. The hard part is encoding the unwritten rules in the dispatcher's head into a filter the agent can call.
FAQ
Why use three language profiles instead of one multilingual prompt?
A single multilingual setup tends to land in the wrong language about 4% of the time at 5am over phone audio. Separate profiles with hard language detection on the first two seconds were more reliable for us.
What happens when the voice agent runs out of eligible candidates?
It escalates to the dispatcher with a summary of who was tried, what they said, and which constraints removed the rest. The dispatcher decides whether to bend a rule or call the client back.
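A summary like that needs the filter to say why it dropped someone, not just that it did. The sketch below shows one way to do that with plain dictionaries. The function names, the dictionary keys, and the wording of the reasons are all hypothetical, and only three of the rules are included to keep it short.

```python
def exclusion_reasons(candidate: dict, shift: dict) -> list[str]:
    """List every rule that removes this candidate from this shift."""
    reasons = []
    if shift["requires_haccp"] and not candidate["has_haccp"]:
        reasons.append("no HACCP certificate")
    if shift["no_agency"] and candidate["employment_type"] == "agency":
        reasons.append("client refuses agency staff")
    if candidate["travel_minutes"] > 40:
        reasons.append("more than 40 minutes away")
    return reasons


def escalation_summary(
    shift: dict,
    attempts: list[tuple[str, str]],
    excluded: list[dict],
) -> str:
    """Plain-text hand-off: who was tried, and which rules removed the rest."""
    lines = ["Tried:"]
    lines += [f"- {name}: {outcome}" for name, outcome in attempts]
    lines.append("Excluded:")
    for candidate in excluded:
        why = ", ".join(exclusion_reasons(candidate, shift))
        lines.append(f"- {candidate['name']}: {why}")
    return "\n".join(lines)
```

Collecting all failing rules, instead of stopping at the first, gives the dispatcher what she needs to decide which single rule is worth bending.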
How do you stop the agent from over-promising to clients?
Any commitment (name, ETA, scope) is read back from the Shiftbase record after write, never from the conversation. If the system has not stored it, the agent will not say it on the call.
Can a voice agent like this work without a rostering tool like Shiftbase?
Yes, but most of the build cost moves into wherever the roster does live. A spreadsheet works if it has clean IDs and history. A whiteboard in the office does not.