Voice agents for a Dutch dental lab: Retell vs Vapi vs LiveKit

Friday, 22:13. The order-status agent for a Tilburg dental lab drops. We scored Retell, Vapi, and a self-hosted LiveKit stack on what actually breaks in production.

Jacob Molkenboer · Founder · A Brand New Company · 13 Jun 2026 · 9 min

Friday, 22:13. The order-status agent for a 28-person dental-lab group in Tilburg stops answering. Somewhere between the Belgian SIP provider and the agent runtime, a session ID expires and the bridge falls over. Patients who phoned to ask whether their crown was back from milling get dead air. The on-call phone that rings belongs to our studio, not theirs.

That moment is the part of the brief most voice-agent comparisons skip. Per-minute cost goes into a spreadsheet cleanly. ASR error rate on Dutch surnames is harder, but you can measure it. Who restarts the SIP trunk at 22:00 on a Friday is the variable that decides whether a voice agent saves the lab money or quietly costs more than the reception staff it was meant to spare.

Last quarter we scored three stacks against the same brief: Retell AI, Vapi, and a self-hosted LiveKit + Deepgram + Cartesia rig. Same numbers, three different shapes of risk.

The brief

28 employees across four labs in Noord-Brabant and Limburg. About 1,800 inbound calls per week. Roughly 85% are order-status checks (kroon, brug, frame, klaar of niet: crown, bridge, frame, ready or not). The remainder are appointment shifts and the occasional escalation. Reception was losing about 22 hours per week to "is it ready" calls during peak weeks. The owner wanted those hours back, not the headcount.

Hard requirements:

Dutch conversation, with Brabants and Limburgs accents in scope.
Order lookup against their cloud lab-management system via REST.
06:30 to 22:00, Monday through Saturday.
Warm handoff to a human receptionist during opening hours.
Sub-€1,000 per month, all-in.
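
The service window in the list above can be pinned down as a small check. This is a minimal sketch, assuming the 06:30 to 22:00 window applies to the agent itself and that the caller passes local Dutch time; `agent_is_in_service` and the constants are illustrative names, not part of any platform's API.

```python
from datetime import datetime, time

# Service window from the brief: 06:30 to 22:00, Monday through Saturday.
SERVICE_START = time(6, 30)
SERVICE_END = time(22, 0)
SERVICE_WEEKDAYS = range(0, 6)  # datetime.weekday(): Monday=0 ... Saturday=5


def agent_is_in_service(local_now: datetime) -> bool:
    """Return True when local_now falls inside the agreed service window."""
    on_service_day = local_now.weekday() in SERVICE_WEEKDAYS
    in_service_hours = SERVICE_START <= local_now.time() < SERVICE_END
    return on_service_day and in_service_hours
```

Keeping the window in one place means the routing rule for out-of-hours calls, whatever it ends up being, has a single source of truth.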

The accent angle matters more than it sounds. Dutch dental patients regularly have surnames like Vanderhaeghen, Mertens, van Schijndel, Vroomans. The ij digraph and the soft-g / hard-g split between north and south kill consumer-grade ASR pretrained on English call-centre audio.

Per-minute cost at 1,800 weekly calls

Average call length on the existing reception line was 88 seconds. We pulled that off the PBX log before the project started. At 1,800 calls per week that is roughly 11,500 minutes per month, seasonally adjusted.
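
The arithmetic behind that figure is short enough to write out. The sketch below assumes an average month of 52/12 weeks, which is an assumption made here for illustration and not the seasonal adjustment used in the project; it lands near 11,440 minutes, in line with the rounded 11,500.

```python
CALLS_PER_WEEK = 1_800
AVG_CALL_SECONDS = 88
WEEKS_PER_MONTH = 52 / 12  # assumed average; the article's figure is seasonally adjusted

weekly_minutes = CALLS_PER_WEEK * AVG_CALL_SECONDS / 60
monthly_minutes = weekly_minutes * WEEKS_PER_MONTH

print(f"{weekly_minutes:.0f} minutes per week, about {monthly_minutes:.0f} per month")
```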

Retell AI publishes per-minute pricing that bundles ASR, LLM, TTS, and telephony. The number lands in the low tens of US cents at the cheap tier and roughly three times that for the premium voices. At 11,500 minutes the platform line item alone sits somewhere between €700 and €2,800 depending on model choice. Telephony pass-through (Twilio under the hood for most regions) adds a few cents per minute on top.

Vapi looks similar in shape, priced differently. The platform layer sits around five to nine US cents per minute. The underlying STT, TTS, and LLM providers pass through at their published rates. At our call volume that totalled around €450 to €900 per month for the platform itself, plus another €300 to €700 for Deepgram + Cartesia + the model layer. The range is wide because Vapi lets you swap components, and the cheap path is genuinely cheap.
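
Adding the two Vapi ranges together shows why the component choice matters against the sub-€1,000 requirement. A minimal sketch using only the euro figures quoted above; the variable names are illustrative.

```python
BUDGET_EUR = 1_000

# Monthly ranges (low, high) in euros, as quoted in the text.
vapi_platform = (450, 900)
vapi_providers = (300, 700)  # Deepgram + Cartesia + the model layer

vapi_total = (
    vapi_platform[0] + vapi_providers[0],
    vapi_platform[1] + vapi_providers[1],
)

cheap_path_fits = vapi_total[0] <= BUDGET_EUR
expensive_path_fits = vapi_total[1] <= BUDGET_EUR

print(f"Vapi all-in: EUR {vapi_total[0]} to EUR {vapi_total[1]} per month")
```

Only the cheaper end of the combined range stays under the budget, so the component mix has to be chosen with the ceiling in mind.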

Self-hosted LiveKit + Deepgram + Cartesia is the cheapest variable cost and the most expensive fixed cost. Deepgram Nova-2 streaming Dutch sits at a few tenths of a cent per minute. Cartesia TTS is priced per character and worked out to roughly two cents per minute at our average call shape. The LLM round-trip per turn added another cent or two. Variable total: four to seven cents per minute, or €450 to €800 a month at our volume.
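
The step from a per-minute rate to a monthly figure is a single multiplication. The sketch below takes the four-to-seven-cent range and the 11,500-minute volume from the text; it gives roughly 460 to 805 a month, which the article rounds to €450 to €800.

```python
MONTHLY_MINUTES = 11_500
VARIABLE_CENTS_PER_MINUTE = (4, 7)  # low and high end of the quoted range


def monthly_cost(cents_per_minute: int, minutes: int = MONTHLY_MINUTES) -> float:
    """Convert a per-minute rate in cents into a monthly cost in whole currency units."""
    return cents_per_minute * minutes / 100


low, high = (monthly_cost(c) for c in VARIABLE_CENTS_PER_MINUTE)
print(f"Variable cost: {low:.0f} to {high:.0f} per month")
```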

But the self-hosted line has a fixed cost the managed platforms hide. A small VPS to run the LiveKit agent, a Dutch SIP trunk (Voys or RoutIT, both reasonable), monitoring, log retention, and the engineering hours to keep it running. We costed engineering at one half-day per fortnight at our rate. That alone exceeded the managed platform fee.

The honest scorecard on cost: at 11,500 minutes a month, the three options end up in the same band once you include engineering. Retell is the most expensive and the easiest to budget. Vapi is the median. Self-hosted is cheapest in raw cloud spend, most expensive in attention.

Dutch-accent ASR on patient names

We ran the same 200 recorded calls through all three stacks during the trial week. The calls came from the lab's actual PBX, not a benchmark dataset. About 40% had a clear Brabants or Limburgs accent.

What we measured was not pure word error rate. We measured "did the agent retrieve the correct order using the spoken patient name". That is the only failure mode that matters for this product.
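
That metric is easy to compute once each trial call is labelled with the order it should have found. A minimal sketch, assuming a hand-labelled record per call; `CallResult` and `retrieval_accuracy` are hypothetical names, not taken from the project.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CallResult:
    call_id: str
    expected_order_id: str             # ground truth, labelled by a human
    retrieved_order_id: Optional[str]  # None when the agent found no order


def retrieval_accuracy(results: Sequence[CallResult]) -> float:
    """Share of calls where the agent retrieved exactly the expected order."""
    if not results:
        raise ValueError("need at least one labelled call")
    correct = sum(1 for r in results if r.retrieved_order_id == r.expected_order_id)
    return correct / len(results)
```

A wrong order and no order both count as a miss here, which matches the product view: either way the caller did not get an answer.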

Retell on its default English-leaning ASR setting: 71% correct retrieval. It treated ij as I plus J and ran the name through phonetic guesswork.
Retell after switching to a Dutch model under the hood: 89% correct retrieval.
Vapi with Deepgram Nova-2 Dutch plus a keyword boost list (the 600 most common surnames in their order book): 93% correct retrieval.
Self-hosted LiveKit + Deepgram Nova-2 Dutch + the same keyword boost list + a second-pass surname disambiguation step against the customer database: 96% correct retrieval.
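
The boost list in the last two configurations is simply the most frequent surnames in the order book. A minimal sketch of building one, assuming the order book can be exported as a list of surnames; `build_boost_list` is a hypothetical helper, and how the list is handed to the ASR provider depends on that provider's documented keyword option.

```python
from collections import Counter
from typing import Iterable, List


def build_boost_list(surnames: Iterable[str], limit: int = 600) -> List[str]:
    """Return the `limit` most frequent surnames, most common first."""
    cleaned = (name.strip() for name in surnames)
    counts = Counter(name for name in cleaned if name)
    return [name for name, _ in counts.most_common(limit)]
```

Rebuilding the list on a schedule keeps it aligned with the order book as patients come and go.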

The last result is the one worth dwelling on. The win was not the stack. The win was being able to wire a domain-specific second pass into the pipeline. When ASR returned "Fanderhagen" at 0.62 confidence and the customer database had a "Vanderhaeghen" with an open order, we could fuzzy-match and confirm verbally ("ik versta Vanderhaeghen, klopt dat?", roughly "I heard Vanderhaeghen, is that right?"). Retell and Vapi can both do versions of this, but you are building inside their state machine. With LiveKit it is your function call.
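
A second pass of this kind can be sketched with the standard library alone. This is not the code from the project: the two thresholds, the function names, and the action labels are assumptions for illustration, and `difflib.SequenceMatcher` stands in for whatever fuzzy matcher a real pipeline would use.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence, Tuple

# Both thresholds are assumed values for illustration, not figures from the trial.
CONFIRM_BELOW_CONFIDENCE = 0.80
MIN_SIMILARITY = 0.75


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] based on matching character runs."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def best_surname_match(heard: str, surnames: Sequence[str]) -> Optional[Tuple[str, float]]:
    """Return the closest surname and its score, or None if nothing is close enough."""
    scored = [(name, similarity(heard, name)) for name in surnames]
    if not scored:
        return None
    name, score = max(scored, key=lambda pair: pair[1])
    return (name, score) if score >= MIN_SIMILARITY else None


def next_action(
    heard: str,
    asr_confidence: float,
    surnames_with_open_orders: Sequence[str],
) -> Tuple[str, Optional[str]]:
    """Decide whether to look up directly, confirm verbally, or ask again."""
    match = best_surname_match(heard, surnames_with_open_orders)
    if match is None:
        return ("ask_again", None)
    name, _score = match
    exact = name.casefold() == heard.casefold()
    if exact and asr_confidence >= CONFIRM_BELOW_CONFIDENCE:
        return ("lookup", name)
    # Inexact or low-confidence: have the agent read the candidate back.
    return ("confirm", name)
```

With the example from the text, `next_action("Fanderhagen", 0.62, [...])` picks Vanderhaeghen and asks for confirmation instead of guessing. Restricting candidates to surnames with an open order keeps the search space small, which is what makes a simple string-similarity score workable.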

For background on what Deepgram is actually doing under the hood for Dutch, their models and languages overview covers the trade-offs honestly. Cartesia's Dutch voices, documented at docs.cartesia.ai, were the only ones in the trial that did not sound like a generic European avatar.

Takeaway

Pick the stack that lets you bolt a custom surname-disambiguation step into the loop. That is where Dutch voice agents are won or lost.

Who restarts the SIP trunk at 22:00 on a Friday

The dimension nobody costs.

A managed platform like Retell or Vapi gives you a status page and a Slack or email channel for outages. If the SIP provider underneath their platform goes down, you wait. You cannot fix it. You can tell the client that "the platform is investigating", and that is the whole intervention available to you. That is a feature, not a bug, if you do not want to be on call.

A self-hosted stack means the pager is yours. LiveKit has been solid in our experience. Deepgram and Cartesia publish status pages and run decent uptime. But the SIP trunk is a fragile boundary, and Dutch business-grade SIP providers vary in night-time operational quality. When something at that boundary breaks at 22:13 on a Friday, the people who restart it are the people who built it.

There was a story on the Hacker News front page this week about an autonomous agent that bankrupted its operator while scanning the DN42 mesh. The dollar figure makes good copy. The lesson is duller and more general. The moment you ship an autonomous loop into production, someone is accountable for the surprises. With a managed platform, that someone is partly the platform. With a self-hosted rig, that someone is you, and only you, including at 22:13 on a Friday.

For the dental lab, the calculus came down to who was going to be unhappy at that hour. The owner did not want it to be his practice manager. We did not want it to be a junior engineer at our studio. The question reframed itself: how much operational margin are you buying with the managed platform's fee?

How we scored it

We rated the three stacks on five lines. Cost. Dutch ASR on real patient names. Latency. Operational ownership. Room to extend the agent (a future appointment flow, deeper integration with the lab management system).

Vapi won. Not by the largest margin on any single line, but by being acceptable on every line. The Deepgram Nova-2 + Cartesia path got us 93% retrieval accuracy at a cost that landed inside the budget. The platform handles the SIP trunk. We could still write custom function calls into the agent. When something broke during the trial, the Vapi side fixed it within their published response window without us paging anyone.

Retell would have worked. The premium voices are noticeably better than Cartesia for Dutch, and the platform is more opinionated, which means less rope to hang yourself with. It was the most expensive option by enough that we could not justify the delta given how small the accuracy gap was.

The self-hosted LiveKit stack would have won the accuracy line and the long-run cost line. It would have lost the "practice manager sleeps through Friday" line. For a 28-person lab whose product is teeth, not voice infrastructure, that was disqualifying.

The smallest thing you can do today

Pull two weeks of recordings off your PBX, sample 50 calls at random, and pipe them through any one of these three stacks at the free trial tier. Measure retrieval accuracy on whatever your equivalent of "the patient's surname" is, not generic WER. The number you get back will reorder your priorities.
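
Drawing the sample is the one step worth scripting so that it stays random and repeatable. A minimal sketch, assuming the recordings are available as a list of file names; `sample_calls` is a hypothetical helper and the seed is only there so the same sample can be re-run against each stack.

```python
import random
from typing import List, Optional, Sequence


def sample_calls(recordings: Sequence[str], k: int = 50, seed: Optional[int] = None) -> List[str]:
    """Pick k recordings uniformly at random, without replacement."""
    if len(recordings) <= k:
        return list(recordings)  # fewer recordings than requested: use them all
    rng = random.Random(seed)    # local RNG so the global random state is untouched
    return rng.sample(list(recordings), k)
```

Fixing the seed matters when comparing stacks: every candidate should hear the same 50 calls.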

When we built the order-status voice agent for the Tilburg dental-lab group, the hardest call was not which platform to pick. It was naming the moment we would say "no, we are not going self-hosted for this client" and meaning it. We work on AI agents for SMEs across the Netherlands and Thailand, and the answer to the build-versus-buy question almost always has more to do with who owns the pager than with which stack scores best on a benchmark.

Key takeaway

Pick the voice stack that lets you bolt on a custom surname-disambiguation step. That is where Dutch voice agents are won or lost.

FAQ

Which stack is cheapest at 1,800 calls a week?

Self-hosted LiveKit + Deepgram + Cartesia has the lowest variable cost, but once you include engineering time it lands in the same band as Vapi and Retell at this volume.

Why not just use the default English ASR with a Dutch voice?

Patient surnames break it. Default English ASR returned the correct order on only 71% of trial calls. Switching to a Dutch model raised that to 89% before any keyword boosts.

Does Vapi support Dutch SIP providers like Voys or RoutIT?

Vapi proxies telephony through Twilio for most regions, so you do not pick the SIP provider directly. For a Dutch number with Dutch routing, confirm the path with their support before signing off.

When is the self-hosted LiveKit stack actually the right choice?

When you have an in-house engineer who already owns a pager, when accuracy on niche vocabulary is mission-critical, or when call volume is high enough that fixed engineering cost amortises cleanly.
