Voice agent telephony: Twilio vs Vapi vs LiveKit, ranked

A planning coordinator's phone rings 1,640 times a week. Care shifts swap, addresses change, codes get re-issued. The agent answering that line lives or dies by its telephony stack.

Jacob Molkenboer · Founder · A Brand New Company · 22 Mar 2026 · 8 min

The planning office sits above a physiotherapy practice on the Westzijde. Two screens, three coffee cups, one roster for 23 care workers (verzorgenden) who together cover roughly 1,640 client-side shift changes a week. Until last autumn, every one of those changes ended in a phone call. Now most of them end in a phone call answered by a voice agent.

The agent works. The question we got asked, and the one this post answers, is which telephony layer to put underneath it. We benchmarked three: Twilio Voice paired with ElevenLabs, Vapi end-to-end, and a hand-rolled pipeline of LiveKit + Deepgram + Cartesia hanging off a SIP trunk. The scoring sheet had three columns: per-call cost at production volume, defensibility under the AVG (the GDPR as it applies in the Netherlands) given the Wkkgz retention rules, and the answer to one specific question: who picks up the pager when KPN ZBC re-routes the 075-number on a Friday evening.

The Zaandam baseline

The baseline: 1,640 weekly shift mutations, with a mean call duration of 84 seconds measured over six weeks of pilot data. Roughly 91% are inbound, from care workers calling in an "I can't make it" or an "I'm stuck in traffic coming from Purmerend"; the rest are outbound, from the agent confirming a coverage swap. Roughly half of the calls trigger a function call into Nedap ONS for the actual roster mutation; the other half answer a "when is Mrs De Wit scheduled" without writing anything.

Three constraints we did not get to negotiate:

The conversational audio touches health data, because care workers reflexively mention client names and conditions. That puts the whole pipeline inside the AVG perimeter and under the Wkkgz dossier-retention regime.
The number people call is an existing 075-line ported through KPN ZBC. Nobody is asking the care workers to learn a new number.
Latency budget is 600 ms turn-around, measured ear-to-ear, because at anything slower the care worker starts repeating herself.
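One way to keep the 600 ms constraint honest is to add up the per-stage latencies you measure and see how much headroom is left. The sketch below is only an illustration: the stage names and the millisecond values are placeholders, not measurements from the pilot.

```python
BUDGET_MS = 600  # ear-to-ear turn-around budget from the constraint above


def headroom_ms(stage_latencies_ms, budget_ms=BUDGET_MS):
    """Return the remaining budget in ms; a negative value means the budget is blown."""
    return budget_ms - sum(stage_latencies_ms.values())


# Placeholder numbers for illustration only; replace with your own measurements.
example = {
    "carrier_and_transport": 120,
    "stt_final_transcript": 150,
    "llm_first_token": 200,
    "tts_first_audio": 100,
}

print(f"headroom: {headroom_ms(example)} ms")
```

Tracking the stages separately shows which component to swap when the total creeps over budget.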

Twilio Voice + ElevenLabs

Twilio is the boring choice and that is, mostly, a compliment. You point your 075-trunk at a Twilio Programmable Voice endpoint, open a Media Streams websocket, and pipe the inbound audio frames to your own orchestrator: Deepgram or Whisper for STT, GPT-4.1 or Claude for the planner, ElevenLabs for the response. Twilio handles DTMF, jitter buffer, codec negotiation with the carrier, and the SIP side of number porting.

from twilio.twiml.voice_response import VoiceResponse, Connect

response = VoiceResponse()
connect = Connect()
connect.stream(url="wss://agent.thuiszorg.nl/twilio")
response.append(connect)
# Return str(response) from the voice webhook; Twilio then opens the
# websocket to our orchestrator, which owns the conversation from there

Cost shape: Twilio inbound on a Dutch geographic number is currently around $0.0085/min for the call leg and $0.004/min for Media Streams. ElevenLabs Turbo v2.5 in Dutch lands at roughly $0.18 per 1,000 characters, which at our average 230 chars/turn comes to about $0.05 per call. Deepgram Nova-3, which supports Dutch, adds $0.0043/min. All-in per 84-second call: roughly $0.09 in infra, before LLM tokens.
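The arithmetic can be sketched from the quoted rates alone. The function name and the assumption of exactly one 230-character synthesised turn per call are illustrative. Under that assumption TTS comes to about $0.04 and the total to about $0.065, so the $0.05 and $0.09 quoted above would imply somewhat more synthesised text per call. That reading is an assumption, not something the pilot data states.

```python
def per_call_cost(duration_s, per_min_rates, tts_chars, tts_rate_per_1k_chars):
    """Infra cost of one call in USD: per-minute metered services plus TTS characters."""
    minutes = duration_s / 60
    metered = minutes * sum(per_min_rates.values())
    tts = tts_chars / 1000 * tts_rate_per_1k_chars
    return metered + tts


# Per-minute rates as quoted in the paragraph above (USD).
TWILIO_STACK_RATES = {
    "twilio_call_leg": 0.0085,
    "twilio_media_streams": 0.004,
    "deepgram_stt": 0.0043,
}

# Assumption: one 230-character agent turn per 84-second call.
cost = per_call_cost(84, TWILIO_STACK_RATES, tts_chars=230, tts_rate_per_1k_chars=0.18)
print(f"${cost:.3f} per call")
```

Raising `tts_chars` is the quickest way to see how sensitive the total is to how much the agent talks.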

What you give up: orchestration. Every barge-in, every interrupt, every "sorry, ik viel weg" handoff is yours to wire up. Twilio gives you frames; the conversation model is your problem.
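To make "Twilio gives you frames" concrete, here is a minimal sketch of the two Media Streams message shapes a barge-in handler deals with: the inbound `media` event carrying base64 audio, and the outbound `clear` message that asks Twilio to drop audio it has buffered for playback. The function names are hypothetical, and detecting that the caller is speaking (VAD) is assumed to happen elsewhere in the orchestrator.

```python
import base64
import json


def decode_media_frame(raw_message):
    """Return the audio bytes from a Media Streams 'media' event, or None for other events."""
    msg = json.loads(raw_message)
    if msg.get("event") != "media":
        return None
    return base64.b64decode(msg["media"]["payload"])


def clear_message(stream_sid):
    """Build the message that tells Twilio to discard queued outbound audio."""
    return json.dumps({"event": "clear", "streamSid": stream_sid})


def on_caller_speech(stream_sid, agent_is_speaking):
    """Barge-in: if the caller talks over the agent, flush the agent's queued audio."""
    return [clear_message(stream_sid)] if agent_is_speaking else []


# Usage with a fabricated two-byte frame and a made-up stream SID.
frame = json.dumps({"event": "media", "media": {"payload": base64.b64encode(b"\xff\xff").decode()}})
print(decode_media_frame(frame))
print(on_caller_speech("MZ_example", agent_is_speaking=True))
```

The orchestrator would send each returned message over the websocket and stop its own TTS stream at the same time; that wiring is the part Twilio leaves to you.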

Vapi

Vapi is the opposite trade. You configure an assistant in their dashboard, point a SIP trunk at their endpoint, and they own the STT-to-LLM-to-TTS loop, barge-in, function calling, and call recording. You can BYO model (we used Claude) and BYO voice (ElevenLabs again). They charge ~$0.05/min for orchestration on top of the pass-through provider cost.

At 84 seconds and Dutch TTS, our per-call infra cost on Vapi landed around $0.11. That is slightly higher than the Twilio hand-roll because of the orchestration fee, but it comes with maybe three days of implementation work instead of three weeks. The catch is downstream of the price sheet: Vapi's data plane is primarily US-hosted. For an AVG-perimeter workload that also touches Wkkgz dossier retention, you need either a written EU-residency commitment in your DPA or a defensible reason for the transfer. Read your data processing agreement (verwerkersovereenkomst) before you ship.

LiveKit + Deepgram + Cartesia

The third option is the one a lot of teams reach for when they read the Vapi DPA and frown. LiveKit Agents is an open-source framework for real-time voice pipelines; it speaks SIP via a sidecar, exposes a clean Python/Node API for the turn-taking loop, and lets you pin every byte of audio to an EU region. Pair it with Deepgram's Dutch model for STT and Cartesia Sonic for TTS, and you have a pipeline you can deploy on Hetzner in Falkenstein with no transatlantic hop.

from livekit.agents import AgentSession
from livekit.plugins import deepgram, cartesia, anthropic, silero

session = AgentSession(
    stt=deepgram.STT(model="nova-3", language="nl"),
    llm=anthropic.LLM(model="claude-sonnet-4-5"),
    # "nl-female-warm" is a placeholder; substitute a real Cartesia voice ID
    tts=cartesia.TTS(model="sonic-2", voice="nl-female-warm"),
    vad=silero.VAD.load(),
)

You still need a SIP trunk. We used Twilio Elastic SIP Trunking as the carrier-facing edge — direct KPN ZBC SIP-peering is possible but the procurement cycle takes months. Per-minute infra on this stack at production volume: $0.006 in carrier, $0.0043 in STT, ~$0.025 in TTS, plus a flat ~€90/month for the LiveKit Cloud Agents tier (Frankfurt). Per call it lands around $0.08 — the cheapest of the three, but only after the build.

The cost ledger at 1,640 calls a week

Weekly volume × 84s mean ≈ 2,296 minutes. Annualised, before LLM tokens:

Twilio + ElevenLabs hand-roll: ~€7,200/yr in infra.
Vapi: ~€9,400/yr.
LiveKit + Deepgram + Cartesia: ~€6,400/yr, plus the LiveKit Cloud subscription.

The infra delta between cheapest and most expensive is about €3,000/yr. Relative to the payroll of a 23-person home-care organisation, that is a rounding error. So we did not pick on price.
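The ledger can be reproduced roughly from the per-call figures quoted in the three stack sections. This sketch works in USD because that is how the per-call numbers are stated. The euro totals above depend on an exchange rate and rounding that are not given here, so expect the same order of magnitude rather than an exact match. The LiveKit subscription is left out, as it is in the list above.

```python
CALLS_PER_WEEK = 1640
MEAN_CALL_S = 84
WEEKS_PER_YEAR = 52

# Per-call infra cost in USD, as quoted for each stack, before LLM tokens.
PER_CALL_USD = {
    "Twilio + ElevenLabs": 0.09,
    "Vapi": 0.11,
    "LiveKit + Deepgram + Cartesia": 0.08,
}


def annual_cost(per_call, calls_per_week=CALLS_PER_WEEK, weeks=WEEKS_PER_YEAR):
    """Annual infra cost for a given per-call cost."""
    return per_call * calls_per_week * weeks


weekly_minutes = CALLS_PER_WEEK * MEAN_CALL_S / 60
print(f"{weekly_minutes:.0f} minutes/week")
for stack, per_call in PER_CALL_USD.items():
    print(f"{stack}: ${annual_cost(per_call):,.0f}/yr")
```

That works out to roughly $7,700, $9,400 and $6,800 a year, so the spread between the stacks stays in the low thousands whichever currency you read it in.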

AVG and Wkkgz: who holds the recording

The Wkkgz requires a care provider (zorgaanbieder) to keep enough record to handle a complaint, support quality audits, and, where the conversation itself documents a care decision, feed the client record. In practice for a planning call, that means a transcript plus a short retention window on the audio: long enough for a complaint to land, short enough not to drown in PII.
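A retention window like that reduces to a small, testable rule. The sketch below assumes a 90-day default, matching the TTL mentioned later in this post; the function name is hypothetical, and in production the deletion would normally be enforced by the storage layer's own expiry rule, not by application code.

```python
from datetime import datetime, timedelta, timezone


def audio_expired(recorded_at, now, retention_days=90):
    """True once a recording has reached the end of its retention window."""
    return now - recorded_at >= timedelta(days=retention_days)


# Usage with fixed timestamps so the result does not depend on the clock.
recorded = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(audio_expired(recorded, datetime(2026, 3, 1, tzinfo=timezone.utc)))   # 59 days later
print(audio_expired(recorded, datetime(2026, 4, 1, tzinfo=timezone.utc)))   # 90 days later
```

Transcripts are deliberately outside this rule: they are the long-lived record, the audio is not.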

The AVG layer asks two harder questions. Where does the audio physically sit, and who is the processor (verwerker)? For Twilio you can pin Media Streams to Dublin and sign their DPA; for LiveKit you can deploy entirely inside an EU project; for Vapi you need to ask, in writing, and right now the answer is qualified.

Takeaway

If the conversation contains health data (gezondheidsgegevens), your stack choice is not a vendor preference. It is a DPIA input. Decide on data residency before you decide on telephony.

Friday evening, 21:47

In April KPN ZBC moved the 075-number to a new trunk and the inbound calls stopped. The agent kept running; the dashboard stayed green; the care workers kept getting voicemail.

This is the column on the scoring sheet that decides things. The question is not "does the stack work" but "when it stops working at 21:47 on a Friday, who fixes it, how fast, and from where."

On Twilio, you open a ticket and you watch the SIP debugger in the console. Their carrier team is awake on a Friday night; the median time to a useful answer in our experience is roughly 40 minutes. On Vapi, you ping their Discord — responsive, but you are downstream of their carrier provider, which is one more hop in the escalation chain. On the LiveKit roll-your-own, you are the carrier team. Your phone rings. You SSH in. You read the SIP packet capture on the trunk side.

For a 23-person organisation without a NOC, that last option is romantic until the first Friday evening it happens. We built the option, we shipped the option, and we wrote the runbook, but the runbook is six pages long and a planning coordinator should not be reading it at 22:00.

What we shipped for Zaandam

Twilio Voice + ElevenLabs, with a thin LiveKit-style orchestrator on top of Media Streams, and a fallback call-forwarding rule in KPN ZBC that flips the 075-number to a human voicemail box if the agent fails its health check for more than 60 seconds. Audio sits in the Twilio Dublin region; recordings land in S3-EU with a 90-day TTL; transcripts feed Nedap ONS via a function call, and the transcript, not the audio, is the Wkkgz-defensible artefact.
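The failover rule is simple enough to sketch as a small state machine. This is an illustration of the "more than 60 seconds" logic only: the class and method names are hypothetical, and the actual call that flips forwarding at KPN ZBC is deliberately not shown.

```python
class FailoverMonitor:
    """Tracks consecutive health-check failures and signals when to forward calls."""

    def __init__(self, threshold_s=60.0):
        self.threshold_s = threshold_s
        self.failing_since = None  # timestamp of the first failure in the current streak

    def observe(self, healthy, now_s):
        """Record one health-check result; return True when calls should go to voicemail."""
        if healthy:
            self.failing_since = None
            return False
        if self.failing_since is None:
            self.failing_since = now_s
        return now_s - self.failing_since > self.threshold_s


# Usage: timestamps are seconds on any monotonic clock.
monitor = FailoverMonitor()
print(monitor.observe(False, 0))    # first failure, no failover yet
print(monitor.observe(False, 61))   # failing for more than 60 s, forward to voicemail
print(monitor.observe(True, 70))    # recovered, forwarding can be lifted
```

Resetting the streak on the first healthy check keeps a single flaky probe from sending the whole line to voicemail.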

The boring choice won, because at this size the marginal infra cost did not matter and the on-call answer did. When we built the planning-agent for the Zaandam team, the thing we ran into was not voice quality or latency — it was the second Friday night, when the trunk hiccupped and we needed a vendor whose pager rang before ours. We solved it by buying that pager from Twilio. If you are stitching together voice agents against a Dutch carrier, that trade is worth pricing in early.

Five-minute audit for tomorrow: open your data processing agreement, find the clause that names the sub-processors handling audio, and check whether any of them sit outside the EU. If you cannot answer that from memory, you have your Monday morning task.

Key takeaway

At home-care (thuiszorg) scale the cheapest voice stack is not the right one; pick the vendor whose pager rings before yours when the SIP trunk drops on a Friday night.

FAQ

Why not pick the cheapest stack on paper?

At 1,640 calls/week the annual infra delta between the three is around €3,000. That is dwarfed by the cost of one outage on a Friday night, so on-call responsibility decides, not price.

Is Vapi disqualified for Dutch healthcare?

Not automatically. It depends on what their DPA commits to in writing about EU data residency and on whether your DPIA accepts the transfer. Ask before you build, not after.

Can we use LiveKit without managing SIP ourselves?

Yes. LiveKit Cloud terminates SIP via a trunk you bring (Twilio Elastic, Telnyx, or a Dutch carrier). You still own the trunk, but not the media infrastructure.

What about the audio retention window for Wkkgz?

The law sets dossier retention, not call-audio retention. Most teams keep transcripts long-term and audio for 30–90 days, enough to investigate a complaint without hoarding PII.
