Voice agent telephony: Twilio, Vapi, or LiveKit for Dutch
A 26-person Den Bosch dental group needed Dutch voice rebooking on 2,100 weekly calls. We tested Twilio, Vapi, and self-hosted LiveKit. Here is what broke.

Tuesday morning, 09:14. The reception of a 26-person dental group in Den Bosch has three patients on hold and a fourth ringing through. Two of the three calls are some variant of "ik wil mijn afspraak verzetten". The receptionist has a printed weekoverzicht in one hand and a half-frozen practice management system in the other. She has done this rebooking dance roughly four hundred times this month. The clinic owner asked us what it would take to put a voice agent in front of the calls that clearly only want to move an existing appointment, and leave reception for the rest.
The clinic runs on 088-numbers, books appointments through a Dutch dental PMS over an undocumented REST surface, and handles around 2,100 inbound calls a week across two locations. We shortlisted three telephony stacks for the voice layer and ran each against a slice of real traffic for two weeks during the evenings when the practice was closed. The scoring categories were per-minute cost at that volume, barge-in latency on a noisy reception line, and who actually owns the SIP trunk when KPN does one of its scheduled reroutes.
The three options on the table
Twilio Voice. Inbound 088 calls forwarded over SIP from a Dutch wholesale carrier to Twilio, Twilio Media Streams piping raw audio to our orchestrator, Deepgram for Dutch STT, Cartesia for Dutch TTS, GPT-4o-mini for intent parsing, our own conversation state machine on a small VPS.
Vapi. The all-in-one. Vapi runs the telephony, the STT, the TTS, and the turn-taking, exposes a function-calling surface for the PMS adapter, and gives you a hosted dashboard. We brought our own number via SIP and pointed the rebooking logic at the same PMS adapter.
Self-hosted LiveKit Agents. The livekit-agents Python framework on a single VM in eu-west, a RoutIT SIP trunk pointed at LiveKit's SIP ingress, Deepgram nova-2 for Dutch STT, Cartesia for TTS, GPT-4o-mini for intent, the same PMS adapter. Two engineers on the on-call rota.
How we ran the bake-off
The practice agreed to a two-week evening pilot. Calls outside business hours hit an IVR with a single prompt: "We zijn gesloten. Druk 1 voor een terugbelafspraak of blijf aan de lijn voor een proef van onze nieuwe afsprakenassistent." Patients who stayed on the line were routed to one of the three stacks on rotation, weighted to send the highest-risk calls (older callers, returning patients with complex history) to the strongest candidate as the pilot progressed.
We chose evenings over daytime hours for one reason: a controlled failure mode. A patient calling at 19:30 to rebook a Thursday slot has time to call back if the agent fumbles. A patient calling at 08:45 to ask about a 09:00 appointment does not. The agent had to earn its way into the daytime queue, and only after the evening numbers held up across both clinics. Even then, the first daytime week ran in shadow mode behind a human.
We logged every call with a 16 kHz mono recording (consent prompt at the start), a transcript pair, VAD events, and the eventual PMS write or transfer. Three metrics mattered: call-completion rate (did the patient actually get rebooked), barge-in latency at the 90th percentile, and reception-handoff rate (how often we punted to a human). Total pilot volume: 612 calls across the two weeks, distributed roughly evenly across the three stacks. Per-minute cost projections used the full daytime volume the production stack would actually carry.
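A minimal sketch of how those three metrics could be computed from a call log. The `CallRecord` fields and the `summarise` helper are illustrative assumptions, not the pilot's actual logging schema.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class CallRecord:
    stack: str                # "twilio", "vapi" or "livekit"
    rebooked: bool            # True if the PMS write succeeded
    handed_off: bool          # True if the call was transferred to reception
    barge_in_ms: list         # one latency measurement per interruption


def summarise(calls):
    """Return completion rate, handoff rate and p90 barge-in latency."""
    total = len(calls)
    latencies = [ms for call in calls for ms in call.barge_in_ms]
    # quantiles(n=10) returns the nine decile cut points; the last one is p90.
    p90 = quantiles(latencies, n=10, method="inclusive")[-1]
    return {
        "completion_rate": sum(c.rebooked for c in calls) / total,
        "handoff_rate": sum(c.handed_off for c in calls) / total,
        "barge_in_p90_ms": p90,
    }
```

Running `summarise` once per stack (filtering on `stack`) gives three comparable rows. Reporting the 90th percentile rather than the mean keeps a handful of very slow interruptions visible instead of averaging them away.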
Per-minute cost at 2,100 weekly calls
Average call length on rebookings is shorter than people think: 1m 38s once you strip out failed pickups and same-day cancels. That gives the clinic about 3,400 voice-minutes a week, call it 14,500 a month. At that volume the per-minute cost has real bite.
Rough monthly cost in mid-2026 at vendor list prices, EUR-converted, all-in (telephony, STT, TTS, intent LLM):
- Twilio path: around €0.09 per minute, about €1,300 a month. Most of that is Twilio voice plus the Deepgram and Cartesia pair, since we kept the SIP trunk with a Dutch carrier and only used Twilio for media.
- Vapi: around €0.14 per minute at the volume tier we qualified for, about €2,030 a month before any minute-bundle discount we could not negotiate.
- Self-hosted LiveKit: around €0.06 per minute plus a fixed €80 for the VM and €30 for the SIP trunk subscription, about €980 a month.
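The arithmetic behind those three figures, as a small sketch. The rates and fixed costs come straight from the list above; the function and dictionary names are ours, chosen for illustration.

```python
CALLS_PER_WEEK = 2_100
AVG_CALL_SECONDS = 98        # 1m 38s average rebooking call
MONTHLY_MINUTES = 14_500     # the rounded monthly figure used in the article

# stack -> (per-minute rate in EUR, fixed monthly cost in EUR)
STACKS = {
    "twilio": (0.09, 0),
    "vapi": (0.14, 0),
    "livekit": (0.06, 80 + 30),  # VM plus SIP trunk subscription
}


def weekly_minutes(calls=CALLS_PER_WEEK, avg_seconds=AVG_CALL_SECONDS):
    """Voice-minutes per week at the given call volume and average length."""
    return calls * avg_seconds / 60


def monthly_cost(stack, minutes=MONTHLY_MINUTES):
    """All-in monthly cost in EUR for one stack at the given volume."""
    rate, fixed = STACKS[stack]
    return rate * minutes + fixed


if __name__ == "__main__":
    print(f"weekly minutes: {weekly_minutes():.0f}")
    for name in STACKS:
        print(f"{name}: EUR {monthly_cost(name):.0f} per month")
```

Changing `MONTHLY_MINUTES` is the quickest way to see how the ranking shifts for a smaller or larger practice.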
What the list does not show: the cost of operational ownership. Self-hosted LiveKit needed about four engineering hours a week for the first month for tuning, monitoring, and a couple of upstream Deepgram blips. That settles to under an hour a week after week six. At European blended rates, factor in another €200 to €400 a month for the first quarter, then call it noise.
The cost gap is not philosophical. At this volume, self-hosted lands roughly 25% under Twilio and 50% under Vapi. Below about 4,000 minutes a month the gap narrows enough that the operational overhead of LiveKit stops paying for itself.
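A rough break-even sketch, assuming the list-price rates above and treating the engineering overhead as a flat monthly amount. The `break_even_minutes` helper is hypothetical. It answers one question: at what monthly volume does the cheaper per-minute rate pay for the fixed costs?

```python
def break_even_minutes(rate_hosted, rate_self, fixed_self, ops_overhead=0.0):
    """Monthly minutes at which self-hosting costs the same as the hosted stack.

    rate_hosted  -- per-minute rate of the hosted option (EUR)
    rate_self    -- per-minute rate of the self-hosted option (EUR)
    fixed_self   -- fixed monthly cost of self-hosting: VM plus trunk (EUR)
    ops_overhead -- assumed monthly engineering cost of running it (EUR)
    """
    saving_per_minute = rate_hosted - rate_self
    if saving_per_minute <= 0:
        raise ValueError("self-hosting never breaks even at these rates")
    return (fixed_self + ops_overhead) / saving_per_minute


if __name__ == "__main__":
    # Twilio path versus self-hosted LiveKit, infrastructure cost only.
    print(round(break_even_minutes(0.09, 0.06, 110)))
    # Same comparison with the low end of the engineering overhead added.
    print(round(break_even_minutes(0.09, 0.06, 110, ops_overhead=200)))
```

Against the Twilio path, infrastructure alone breaks even at roughly 3,700 minutes a month. Adding €200 of monthly engineering time pushes that to roughly 10,300 minutes. The exact threshold depends heavily on how you price your own on-call hours.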
Barge-in latency on a noisy reception line
Barge-in is the moment a patient interrupts the agent mid-sentence. The agent has to stop talking inside roughly 300 milliseconds or the call starts to feel like arguing with a kiosk. Dental receptions are noisy: kids in the waiting area, an autoclave running its cycle, the espresso machine. Background noise pushes any VAD threshold harder.
Twilio Media Streams delivers 20 ms audio frames, but the bridge itself adds roughly 180 ms one-way between Frankfurt and Amsterdam. Barge-in below 500 ms total is possible, but it takes work: running your own VAD on the raw μ-law frames that arrive over the Media Streams WebSocket, and sending a clear message back on the same stream to flush the TTS playback the moment the patient starts talking.
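A minimal sketch of that barge-in path, assuming a bidirectional Media Streams WebSocket that delivers base64-encoded 8 kHz μ-law frames in `media` messages and accepts a `clear` message to drop buffered playback. The energy threshold and frame count are placeholder assumptions. A production setup would use a proper VAD model rather than raw RMS, and the WebSocket server itself is left out.

```python
import base64
import json
import math
from typing import Optional

RMS_THRESHOLD = 1_000            # assumed energy threshold; tune on real calls
SPEECH_FRAMES_TO_INTERRUPT = 8   # 8 x 20 ms frames = 160 ms of sustained speech


def ulaw_to_linear(byte: int) -> int:
    """Decode one G.711 mu-law byte into a signed linear PCM sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def frame_rms(payload_b64: str) -> float:
    """Root-mean-square energy of one base64-encoded mu-law frame."""
    samples = [ulaw_to_linear(b) for b in base64.b64decode(payload_b64)]
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class BargeInDetector:
    """Counts consecutive loud caller frames while the agent is speaking."""

    def __init__(self) -> None:
        self.agent_speaking = False  # set True whenever TTS audio is queued
        self.loud_frames = 0

    def on_message(self, raw: str) -> Optional[str]:
        """Return a JSON 'clear' message when the caller barges in, else None."""
        msg = json.loads(raw)
        if msg.get("event") != "media":
            return None
        if frame_rms(msg["media"]["payload"]) >= RMS_THRESHOLD:
            self.loud_frames += 1
        else:
            self.loud_frames = 0
        if self.agent_speaking and self.loud_frames >= SPEECH_FRAMES_TO_INTERRUPT:
            self.agent_speaking = False
            self.loud_frames = 0
            return json.dumps({"event": "clear", "streamSid": msg["streamSid"]})
        return None
```

The orchestrator would call `on_message` for every inbound WebSocket message and write any non-`None` result straight back to the socket, then stop feeding TTS audio. Requiring several consecutive loud frames is the crude stand-in here for a minimum speech duration.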
Vapi exposes a sensible turn-taking config with a configurable interruption threshold. We measured 540 to 720 ms barge-in on quiet evenings, drifting to 1.0 to 1.3 seconds when reception noise spiked. The VAD tunes well in a controlled call, less well on a busy front office.
LiveKit Agents with Silero VAD gave us a median 280 ms interrupt time on quiet calls and 480 ms on noisy ones. We had to drop the Silero min_speech_duration to 150 ms to stop the agent from talking over short Dutch responses like "ja", "nee", and "oké".
Noise gates matter more than vendor choice. A €40 noise-cancelling headset at reception did more for barge-in stability than any platform tweak we tried.
Who owns the SIP trunk when KPN reroutes a 088 number
This is the part nobody discusses until the day KPN does maintenance and the voice agent goes silent for forty minutes.
Dutch 088-numbers are non-geographic business numbers regulated by ACM and routable across carriers. The number sits in the ACM nummerregister. The physical signalling runs through whichever wholesale carrier you contracted with, and KPN sits in the middle of most of those paths because it owns large pieces of the national PSTN backbone.
When KPN reroutes a 088 range for scheduled work (and they do this several times a year), the carriers peered with them handle the failover. RoutIT, Voiceworks/Destiny, and Belcentrale all have direct KPN peering and fail over within seconds. Twilio's SIP interconnect for Dutch numbers does not have the same peering pedigree. If your 088 SIP trunk lives in Twilio's Dublin or Frankfurt POP, you sit one hop further away from the reroute event and the SIP REGISTER can lag.
During the pilot we caught one of those reroute events: a maintenance-driven path switch on a weekday morning that briefly affected the 088-range carrying the dental group. The Vapi-backed leg started returning SIP 503 within ninety seconds and stayed degraded for about forty minutes. The RoutIT-backed LiveKit leg kept ringing through the entire window because the failover happened inside RoutIT's own peering relationship with KPN. In that forty-minute slice, LiveKit handled twenty-three calls. Vapi handled zero.
For Dutch 088 voice agents, keep the SIP trunk with a KPN-peered wholesaler. Treat the AI vendor (Twilio, Vapi, LiveKit) as the media endpoint, never the carrier.
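Whichever carrier holds the trunk, it helps to notice a degraded leg quickly. The sketch below is a hypothetical sliding-window monitor, not something from the pilot. It assumes you can feed it the final SIP status code of each inbound call attempt, and the window and thresholds are placeholders.

```python
from collections import deque


class TrunkHealth:
    """Flags a trunk as degraded when SIP 503s dominate a recent time window."""

    def __init__(self, window_s=90.0, max_503_ratio=0.5, min_samples=5):
        self.window_s = window_s            # how far back to look, in seconds
        self.max_503_ratio = max_503_ratio  # share of 503s that counts as degraded
        self.min_samples = min_samples      # ignore windows with too few calls
        self.events = deque()               # (timestamp, status_code) pairs

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()

    def record(self, timestamp, status_code):
        """Store the final SIP status of one call attempt."""
        self.events.append((timestamp, status_code))
        self._evict(timestamp)

    def degraded(self, now):
        """True if enough recent attempts ended in 503 Service Unavailable."""
        self._evict(now)
        if len(self.events) < self.min_samples:
            return False
        count_503 = sum(1 for _, status in self.events if status == 503)
        return count_503 / len(self.events) >= self.max_503_ratio
```

Timestamps are passed in explicitly (for example from `time.monotonic()`) so the class stays easy to test. What to do on `degraded()`, such as paging someone or switching the IVR to a voicemail prompt, is a separate decision.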
The stack we shipped
LiveKit Agents on a single 4-core VM in eu-west, Deepgram nova-2 Dutch model, Cartesia Sonic Dutch female voice, GPT-4o-mini for intent and slot-filling, a small Python REST adapter to the dental PMS, RoutIT SIP trunk. The agent answers, confirms the patient wants to move an existing appointment, looks them up by caller ID, reads back the current slot, offers the next three same-weekday slots, and writes the change back. Anything off-script transfers to reception with the captured context in a single line ("Patient Hassink wil afspraak van donderdag 18 juni 14:30 verplaatsen, voorkeur begin volgende week").
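Two pieces of that flow are easy to show in isolation: picking the next three same-weekday slots and building the one-line handoff. This is a sketch under our reading of "same-weekday" (the weekday of the existing appointment). The function names and the free-slot list are illustrative, and the real PMS adapter is not shown.

```python
from datetime import datetime

DUTCH_WEEKDAYS = ["maandag", "dinsdag", "woensdag", "donderdag",
                  "vrijdag", "zaterdag", "zondag"]
DUTCH_MONTHS = ["januari", "februari", "maart", "april", "mei", "juni", "juli",
                "augustus", "september", "oktober", "november", "december"]


def next_same_weekday_slots(current, free_slots, count=3):
    """Earliest `count` free slots after `current` that fall on the same weekday."""
    candidates = sorted(
        slot for slot in free_slots
        if slot > current and slot.weekday() == current.weekday()
    )
    return candidates[:count]


def handoff_line(surname, current, preference):
    """One-line Dutch summary passed to reception when the agent transfers."""
    day = DUTCH_WEEKDAYS[current.weekday()]
    month = DUTCH_MONTHS[current.month - 1]
    when = f"{day} {current.day} {month} {current:%H:%M}"
    return f"Patient {surname} wil afspraak van {when} verplaatsen, voorkeur {preference}"


if __name__ == "__main__":
    appointment = datetime(2026, 6, 18, 14, 30)
    free = [datetime(2026, 6, 19, 10, 0), datetime(2026, 6, 25, 9, 0),
            datetime(2026, 7, 2, 11, 0), datetime(2026, 7, 9, 15, 0)]
    for slot in next_same_weekday_slots(appointment, free):
        print(slot.isoformat())
    print(handoff_line("Hassink", appointment, "begin volgende week"))
```

The weekday and month names are hard-coded so the output does not depend on the server's locale settings.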
Voice assistant config, Python:
# agent.py - barge-in tuned for a Dutch dental reception
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import cartesia, deepgram, openai, silero

# load_system_prompt is a local helper (not shown) that builds the chat context
assistant = VoiceAssistant(
    vad=silero.VAD.load(
        min_speech_duration=0.15,  # catch short Dutch "ja"/"nee"
        min_silence_duration=0.35,  # seconds of silence that end a turn
    ),
    stt=deepgram.STT(language="nl", model="nova-2-general"),
    llm=openai.LLM(model="gpt-4o-mini", temperature=0.1),
    tts=cartesia.TTS(voice="nl-NL-female-1", language="nl", speed=0.92),
    interrupt_speech_duration=0.2,  # seconds of caller speech before the agent stops
    interrupt_min_words=1,  # one recognised word is enough to interrupt
    chat_ctx=load_system_prompt("rebook_dental_nl.md"),
)
SIP trunk config pointing RoutIT inbound at the LiveKit gateway:
# livekit-sip-trunk.yaml
trunks:
  - name: routit-088-inbound
    kind: inbound
    numbers:
      - "+31889999999"
    auth_username: dental-rebook-eu1
    allowed_addresses:
      - sip.routit.net
      - sip2.routit.net
    media_encryption: prefer
What we got wrong
Three things, all fixable.
We started with Cartesia's default Dutch voice at the default speaking rate. It sounded too fast for the older patient demographic of this practice. We slowed the rate to 92% and the call-completion rate jumped from 71% to 84% in week two. None of our latency tuning would have made the same difference.
We underestimated how much the system prompt mattered relative to the latency tuning. The first prompt told the agent to confirm everything twice for safety. Older patients started repeating themselves, the agent re-confirmed, and a 90-second call ballooned to three minutes. Tightening the prompt to a single read-back at the end of the booking cut the average call by 47 seconds with no measurable drop in accuracy. The takeaway: optimise the script before you optimise the milliseconds.
We initially used Vapi's BYO-number setup for the pilot to save integration time. The reroute incident above is what tipped the decision toward fully self-hosting the media path. The lesson was not that Vapi is bad. It was that any vendor whose default trunk lives outside the Netherlands inherits a peering risk you cannot tune your way out of.
When self-hosted wins and when it does not
Self-hosting LiveKit makes sense when you have predictable volume above roughly 10,000 minutes a month, you can keep an engineer on call, and you operate in a language and number range where carrier ownership matters. For a small practice doing 500 minutes a week, Vapi or Twilio with a Dutch BYO trunk is the right call. The math only flips past a certain volume, and the operational overhead is real.
When we built the rebooking voice agent for this Den Bosch dental group, the thing that surprised us most was how much carrier-side peering mattered relative to the AI stack on top. If you are weighing the same trade-off for a Dutch SME, our work on voice AI agents defaults to the self-hosted media path for any client above 10,000 minutes a month.
The five-minute audit you can do today: open your last phone bill, find the carrier behind your 088 number, and check whether they support SIP trunking direct to a third-party media endpoint. If they do, you can swap the AI stack on top at any time. If they do not, that is the conversation to start before the voice agent project, not after.
Key takeaway
For a Dutch voice agent above 10,000 minutes a month, keep the SIP trunk with a KPN-peered wholesaler and treat the AI vendor as the media endpoint.
FAQ
Can Twilio handle Dutch 088 numbers natively?
Not as a primary carrier. You need a Dutch wholesale carrier (RoutIT, Voiceworks/Destiny, Belcentrale) to hold the trunk and forward over SIP. Twilio sits as the media endpoint.
What barge-in latency can LiveKit hit on Dutch calls?
Median 280 ms on a quiet line with Silero VAD set to 150 ms minimum speech duration. Expect 450 to 550 ms on a noisy reception line.
Does Vapi work in Dutch out of the box?
Yes, through its bundled Deepgram and Cartesia config, but the default speaking rate is too fast for older Dutch demographics. Slow it 5 to 10% for clinics and care settings.
What does a self-hosted voice agent cost at 14,500 minutes a month?
Around €980 a month all-in at mid-2026 list prices: roughly €0.06 per minute media plus a fixed VM and SIP trunk subscription.
Why does KPN peering matter for a voice agent?
KPN reroutes 088 ranges several times a year. Carriers peered directly with KPN fail over in seconds. Trunks hosted outside the Netherlands can lose calls for the duration of the maintenance window.