
Twilio, Vapi, or LiveKit: voice agent stack for thuiszorg

Maastricht thuiszorg, 3,100 weekly intakes, KPN pushing TLS 1.3 on Hemelvaartsdag. Which voice-agent stack survives the per-minute math and the on-call?

Jacob Molkenboer · Founder · A Brand New Company · 18 Jun 2026 · 9 min


It is Hemelvaartsdag, 10:42. The intake-coördinator at a 29-person thuiszorg-organisatie in Maastricht has fourteen missed calls on her work-mobile. The PBX dropped every handshake from the KPN SIP trunk at 09:00. KPN flipped TLS 1.3 to mandatory overnight, and the cipher list on the gateway has not been touched since 2020.

There is no engineer in the building. There is no engineer in the country.

This is the question behind every voice-agent buy. Not how good is the voice. Not does it speak Dutch. The question is: on a feestdag at 10:42, who is paged, and what is the per-minute bill while the line is silent?

We ran the comparison three times this spring for healthcare clients of comparable size. Twilio Voice with ConversationRelay, Vapi as managed orchestration, and a hand-rolled stack on LiveKit Agents with Deepgram and Cartesia, each with its own SIP-trunk story. Same workload spec, same compliance bar. Three very different bills and three very different on-call rotations.

The workload, in numbers

The reference client handles 3,100 cliënt-aanmeldingen per week: a mix of new-patient intake, callback requests from the wijkverpleegkundige-inbox, and same-day routing to the on-call team. Average call length 2 minutes 50, p95 closer to 6 minutes. Peaks at 08:30 Monday and 14:00 Wednesday.

That averages to roughly 8,800 minutes per week, ~38,200 minutes per month. A bursty workload, mostly NL-mobile to NL-fixed, with a hard requirement to drop the call into a human if the cliënt's voice carries panic, confusion, or the word spoed.
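
The arithmetic behind those two figures is easy to reproduce. A minimal sketch, assuming an average month of 52/12 weeks (the conversion factor is our assumption; the result lands within about half a percent of the rounded numbers above):

```python
# Workload arithmetic from the figures above.
# Assumption: an average month is 52/12 weeks.

CALLS_PER_WEEK = 3_100
AVG_CALL_SECONDS = 2 * 60 + 50  # 2 minutes 50
WEEKS_PER_MONTH = 52 / 12


def weekly_minutes(calls: int = CALLS_PER_WEEK,
                   avg_seconds: int = AVG_CALL_SECONDS) -> float:
    """Total talk time per week, in minutes."""
    return calls * avg_seconds / 60


def monthly_minutes(calls: int = CALLS_PER_WEEK,
                    avg_seconds: int = AVG_CALL_SECONDS) -> float:
    """Weekly minutes scaled to an average month."""
    return weekly_minutes(calls, avg_seconds) * WEEKS_PER_MONTH


if __name__ == "__main__":
    print(f"{weekly_minutes():,.0f} min/week")
    print(f"{monthly_minutes():,.0f} min/month")
```

Swap in your own call count and average duration before comparing vendors; the p95 matters for concurrency, not for this total.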

The agent does four things:

Greet, identify the cliënt against the AFAS-record (BSN never asked over the phone; postcode + huisnummer + geboortedatum is the legitimised path).
Categorise the request (intake nieuw, herhaal, klacht, spoed).
Schedule against the Beaufort or Nedap planning, or book a callback slot.
Write a structured note into the cliëntdossier under Wkkgz §7 retention.
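
The categorise-and-escalate step can be sketched as a small data model. Everything named here (`Category`, `IntakeNote`, `must_hand_off`, the keyword list) is hypothetical and for illustration only; a production agent would also need the panic and confusion signals mentioned above, which a keyword match cannot capture.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    INTAKE_NIEUW = "intake nieuw"
    HERHAAL = "herhaal"
    KLACHT = "klacht"
    SPOED = "spoed"


# Hypothetical trigger list; only the word named in the text is included.
ESCALATION_WORDS = ("spoed",)


@dataclass
class IntakeNote:
    category: Category
    summary: str
    handoff_to_human: bool


def must_hand_off(utterance: str) -> bool:
    """True when the caller's words alone require a human."""
    words = (w.strip(".,!?") for w in utterance.lower().split())
    return any(w in ESCALATION_WORDS for w in words)


def build_note(category: Category, summary: str, utterance: str) -> IntakeNote:
    """Assemble the structured note; escalate on category or on keyword."""
    handoff = category is Category.SPOED or must_hand_off(utterance)
    return IntakeNote(category=category, summary=summary, handoff_to_human=handoff)
```

Matching on whole words keeps "spoedig" from triggering a handoff, which is the kind of false positive that erodes trust in the routing.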

That is the shape. Now the three stacks.

Twilio Voice with ConversationRelay

The path of least architectural resistance. With ConversationRelay you bring your own LLM and your own TTS, and Twilio handles media, transcription, barge-in, and the SIP/PSTN edge. Studio handles fallback flows; Flex handles the human handoff console.

Indicative per-minute cost at our usage profile, late Q1 2026:

Inbound PSTN to a NL number: ~$0.0085/min.
ConversationRelay session fee: ~$0.04/min.
STT (BYO Deepgram or Twilio's own): $0.005–0.008/min.
LLM (gpt-4o-mini at our token volumes): ~$0.012/min.
TTS (Cartesia, BYO): ~$0.02/min.

Conservative all-in: $0.09–0.12 per minute. At 38,200 minutes a month, roughly $3,400–$4,600 per month before call-recording storage and premium-rate routing.
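
The listed components sum to a little under the quoted all-in range. We read the gap as the headroom implied by "conservative"; that reading is our assumption. A short sketch makes both numbers reproducible:

```python
# Per-minute components for the Twilio stack, as (low, high) in USD,
# copied from the list above.
TWILIO_COMPONENTS = {
    "inbound_pstn": (0.0085, 0.0085),
    "conversation_relay": (0.04, 0.04),
    "stt": (0.005, 0.008),
    "llm": (0.012, 0.012),
    "tts": (0.02, 0.02),
}

MONTHLY_MINUTES = 38_200


def per_minute_range(components: dict) -> tuple:
    """Sum the low and high ends of every component."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


def monthly_range(per_minute: tuple, minutes: int = MONTHLY_MINUTES) -> tuple:
    """Scale a (low, high) per-minute range to a monthly bill."""
    return per_minute[0] * minutes, per_minute[1] * minutes


if __name__ == "__main__":
    print(per_minute_range(TWILIO_COMPONENTS))  # list-price sum
    print(monthly_range((0.09, 0.12)))          # the conservative range
```

The same two functions work for the other stacks; only the component table changes.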

The interesting part is not the bill. When KPN flipped TLS 1.3 on Hemelvaartsdag, our Twilio-fronted clients noticed nothing. Twilio's SIP edge does the negotiation. The on-call rotation for the carrier handshake is, by contract, theirs. You wake up to a status-page note, not a paged engineer.

The cost of that calm is vendor lock-in and a price floor you do not control. Twilio's pricing has moved twice in the last 18 months. The contract you signed in March is not the contract you renew in March.

Vapi

Vapi is the managed-orchestration choice. You define the assistant in their console or via API, point it at your tools, and Vapi sits between the transport (Twilio or LiveKit, your choice) and your LLM/STT/TTS providers. You can swap the transport later. You cannot easily swap Vapi.

Indicative per-minute, same profile:

Vapi orchestration: ~$0.05/min.
Transport (Twilio SIP passed through): ~$0.013/min.
STT (Deepgram Nova-3 streaming, BYO key): ~$0.006/min.
LLM (gpt-4o-mini, BYO key): ~$0.012/min.
TTS (Cartesia, BYO key): ~$0.02/min.

Conservative all-in: $0.10–0.13 per minute. At our volumes, $3,800–$5,000 per month.

Vapi's appeal is the dashboard. The intake-coördinator can read a transcript with sentiment, latency, and tool-call traces without a developer in the loop. For a 29-person organisation with no in-house IT, that is real value.

The trade-off is the same shape as Twilio's: when the integration drifts, diagnosis runs through someone else's support queue. We have seen 36-hour gaps on Vapi tickets that were not ours to escalate, and the cliënt does not care which logo is at fault.

LiveKit Agents, Deepgram, Cartesia, on our own SIP

This is the stack we ship when the client has, or hires, someone who can hold a pager. LiveKit Agents handles the realtime media plane. Deepgram does streaming STT. Cartesia does TTS. The LLM is whatever the workload demands: for intake routing, gpt-4o-mini is enough; for clinical handoff, we step up.

The SIP trunk runs through a Dutch carrier (Voiceworks or RoutIT) or through Twilio Elastic SIP Trunking, depending on what survives procurement. Recording lands directly in an AVG-clean S3 bucket in eu-central-1 with object-lock, not in the vendor's region.

Indicative per-minute:

SIP trunk inbound, Dutch carrier: ~€0.006/min.
LiveKit Cloud media: ~$0.003/min.
Deepgram Nova-3 streaming: ~$0.0058/min.
Cartesia Sonic: ~$0.02/min.
LLM (gpt-4o-mini): ~$0.012/min.
Compute, small EU agent fleet: ~$0.002/min amortised.

Conservative all-in: $0.045–0.055 per minute. At our volumes, $1,700–$2,100 per month in infrastructure.

That number is misleading on its own. Add ~8 engineering hours per month of actual on-call and config-drift work at €90/hr blended: €720/month. Treating dollars and euros as roughly at par, the honest total is about €2,400–€2,800 per month. Cheaper than Twilio. More expensive than the infrastructure bill suggests.
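
The infrastructure-plus-ops total can be sketched in a few lines. The function name and the `usd_per_eur` parameter are ours; the default of 1.0 is a placeholder for rough parity, not a quoted exchange rate.

```python
def handrolled_total_eur(infra_usd_low: float,
                         infra_usd_high: float,
                         ops_hours: float = 8.0,
                         rate_eur: float = 90.0,
                         usd_per_eur: float = 1.0) -> tuple:
    """Monthly hand-rolled cost in EUR: infrastructure plus on-call hours.

    usd_per_eur=1.0 is an assumption (rough parity); pass a real rate
    when the difference matters for your budget.
    """
    ops_eur = ops_hours * rate_eur
    low = infra_usd_low / usd_per_eur + ops_eur
    high = infra_usd_high / usd_per_eur + ops_eur
    return low, high


if __name__ == "__main__":
    print(handrolled_total_eur(1_700, 2_100))
```

The ops term is fixed while the infrastructure term scales with minutes, which is why the hand-rolled stack looks worse at low volume and better at high volume.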

The reference flow is short:

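# Sketch against LiveKit Agents 1.x, where AgentSession owns the room connection.
from pathlib import Path

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import cartesia, deepgram, openai, silero


class IntakeAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            # Dutch intake prompt, kept in version control next to the code.
            instructions=Path("prompts/intake_nl.md").read_text(encoding="utf-8"),
            stt=deepgram.STT(model="nova-3", language="nl"),
            llm=openai.LLM(model="gpt-4o-mini"),
            # Placeholder voice name; replace with a voice ID from your Cartesia account.
            tts=cartesia.TTS(model="sonic-2", voice="nl-female-warm"),
            vad=silero.VAD.load(),
        )


async def entrypoint(ctx: JobContext) -> None:
    # Join the room created for the inbound SIP call.
    await ctx.connect()
    # The session runs the STT -> LLM -> TTS loop for this agent.
    session = AgentSession()
    await session.start(agent=IntakeAgent(), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))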

Bewaartermijnen: AVG against Wkkgz

This is where Dutch healthcare voice agents earn their keep or kill the project. The recording of an intake call is not a casual log. The moment it contains medical information, it is part of the cliëntdossier and falls under Wkkgz retention: 20 years from the last entry, unless the cliënt requests deletion and the zorgaanbieder agrees.

The AVG-minimisation principle pulls the other way. You cannot keep what you do not need. So we split the artefacts:

Raw audio: delete after 30 days unless flagged for klachtafhandeling (complaint handling). Legal basis (verwerkingsgrond): legitimate interest (gerechtvaardigd belang) in quality monitoring. Document the DPIA.
Transcript and structured note from the agent: retained 20 years as part of the dossier.
Metadata (caller-ID, duration, agent path): kept 7 years, separated from the dossier.
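
The three-way split above can be expressed as a single function that returns a deletion date per artefact. This is a sketch under our own naming (`delete_after`, the artefact labels, the `complaint_flag` parameter are all hypothetical); it encodes only the periods listed above and is not legal advice.

```python
from datetime import date, timedelta
from typing import Optional

RAW_AUDIO_DAYS = 30
DOSSIER_YEARS = 20
METADATA_YEARS = 7


def add_years(d: date, years: int) -> date:
    """Add whole years, mapping 29 Feb to 28 Feb in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def delete_after(artefact: str,
                 created: date,
                 last_dossier_entry: Optional[date] = None,
                 complaint_flag: bool = False) -> Optional[date]:
    """Earliest deletion date, or None when the artefact is on hold."""
    if artefact == "raw_audio":
        if complaint_flag:
            return None  # held for complaint handling; needs manual review
        return created + timedelta(days=RAW_AUDIO_DAYS)
    if artefact in ("transcript", "structured_note"):
        # The dossier clock runs from the last entry, not from creation.
        return add_years(last_dossier_entry or created, DOSSIER_YEARS)
    if artefact == "metadata":
        return add_years(created, METADATA_YEARS)
    raise ValueError(f"unknown artefact type: {artefact}")
```

Keeping the rule in one function means the bucket lifecycle, the dossier export, and the DPIA can all point at the same source of truth.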

Twilio and Vapi will both happily store recordings for you. The default region is not the right region for healthcare in Limburg. Both support EU residency; you have to ask for it, you have to write it into the verwerkersovereenkomst, and you have to verify it in the console.

On the hand-rolled stack, the bucket is yours from minute one. The retention policy is a single Terraform file. No support ticket needed to prove residency to the Autoriteit Persoonsgegevens.

Warning

If you signed a Twilio voice contract before 2024 without an EU data-residency addendum, your call recordings may currently sit in us-east-1. Check before the next AP-audit, not after.

The Hemelvaartsdag question

Back to 10:42. KPN has rolled TLS 1.3 to mandatory on every NL SIP trunk overnight. Who patches the gateway?

Twilio: Twilio's carrier engineers, on their rotation. You get a status-page note. Cost: already inside the per-minute price.
Vapi: depends on whose trunk it is. If theirs, see Twilio. If yours, see hand-rolled.
Hand-rolled: the person whose pager you carry. Hopefully not the founder.

This is the single question worth burning a meeting on. Not the demo. Not the voice quality — Cartesia is excellent, ElevenLabs is excellent, the cliënt does not care. The question is: when KPN, or the LLM vendor, or the TTS provider pushes a breaking change on a feestdag, what is your MTTR and who is awake?
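
If the gateway is yours, a pre-flight probe can at least tell you before the carrier does. A minimal sketch using Python's standard `ssl` module: it opens a client connection that refuses anything below TLS 1.3. The function names are ours, port 5061 is the conventional SIP-over-TLS port, and any hostname you pass is your own gateway's, not something this article specifies.

```python
import socket
import ssl
from typing import Optional


def tls13_only_context() -> ssl.SSLContext:
    """Client context that refuses anything below TLS 1.3."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def probe_sip_tls(host: str, port: int = 5061, timeout: float = 5.0) -> Optional[str]:
    """Return the negotiated TLS version; raises if the handshake fails.

    An ssl.SSLError here means the endpoint cannot complete a TLS 1.3
    handshake with this client, which is the failure mode described above.
    """
    ctx = tls13_only_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Run on a schedule against the gateway's TLS listener, a failing probe becomes a weekday ticket instead of a feestdag outage. It does not replace the carrier's own interoperability test.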

Our answer for the 29-person Maastricht client was the hybrid most operators land on once they do the math honestly: Vapi on top of a Twilio Elastic SIP trunk in the EU region, with a documented fallback playbook for the day we outgrow it. The per-minute is higher than the LiveKit stack. The on-call is lower than the LiveKit stack. The bill arrives in euros within the AVG perimeter. That was enough.

For the 180-bed verzorgingshuis-group we onboarded in February, we shipped the LiveKit + Deepgram + Cartesia path with a paid SRE rotation. The per-minute saving funds the engineer with margin to spare. The breakpoint between the two stacks sits around 25,000 monthly minutes in our experience. Below it, managed wins on TCO. Above it, hand-rolled wins, but only if the rotation is real.
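
The break-even itself is one division: fixed monthly rotation cost over the per-minute saving. A minimal sketch in which every input is an assumption: the per-minute rates are midpoints of the ranges quoted earlier, currencies are treated as one unit, and the rotation cost is a free parameter, since the 25,000-minute figure above comes from experience rather than from this formula.

```python
def breakeven_minutes(managed_per_min: float,
                      handrolled_per_min: float,
                      fixed_ops_per_month: float) -> float:
    """Monthly minutes at which hand-rolled and managed cost the same."""
    saving = managed_per_min - handrolled_per_min
    if saving <= 0:
        raise ValueError("hand-rolled must be cheaper per minute to break even")
    return fixed_ops_per_month / saving


def implied_fixed_cost(breakeven: float,
                       managed_per_min: float,
                       handrolled_per_min: float) -> float:
    """Inverse view: the rotation budget a given break-even implies."""
    return breakeven * (managed_per_min - handrolled_per_min)


if __name__ == "__main__":
    # Illustrative midpoints: 0.105 managed, 0.05 hand-rolled.
    print(implied_fixed_cost(25_000, 0.105, 0.05))
```

The inverse function is the more useful one in a procurement meeting: get a real quote for the rotation, then see whether your minute count clears it.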

The afternoon move

If you are sitting on a voice-agent decision and the demo went well: model the bill against your actual minute count, not the demo's. Forty thousand minutes a month at $0.12 is not the same problem as four hundred thousand minutes at $0.05. Then write down, in one sentence, who you call when the line is silent on a feestdag. If the sentence ends with "no one" or "support ticket", you do not yet have a stack. You have a prototype.

When we built the intake-agent for this Maastricht client, the part that kept biting was the gap between the demo bill and the production bill once Wkkgz-grade recording retention was wired in. We solved it by splitting audio retention from transcript retention at the bucket layer — small change, large compliance dividend. If you want that playbook applied to your own intake flow, our voice-agent work starts there.

Key takeaway

The cheapest voice stack on paper is the one with the highest on-call cost. Model the per-minute bill, then ask who patches the SIP trunk on Hemelvaartsdag.

FAQ

What does an intake voice agent actually cost per month at ~3,000 weekly calls?

Roughly $1,700–$2,100 in infrastructure on a hand-rolled LiveKit stack, or $3,400–$5,000 on Twilio or Vapi, before storage and ops time. Volume and call length shift the answer fast.

Are voice-agent call recordings allowed under AVG for Dutch healthcare?

Yes, with a documented DPIA, a verwerkersovereenkomst, and a clear retention split. Raw audio short-term, transcripts inside the cliëntdossier under Wkkgz, metadata separated.

Can Twilio or Vapi keep recordings in the EU?

Both support EU regions, but it has to be contracted and configured explicitly. Default routing for older contracts often still points at us-east-1. Verify in the console, not the sales deck.

Do we need our own SRE rotation if we run LiveKit + Deepgram + Cartesia?

Effectively yes. Someone has to patch the SIP trunk when carriers force protocol upgrades and rotate keys when a vendor changes API surface. Budget ~8 engineering hours per month at our reference volume.

Where is the breakpoint between managed and hand-rolled?

In our experience, around 25,000 monthly minutes. Below that, Twilio or Vapi wins on total cost of ownership. Above it, the LiveKit stack funds a dedicated rotation with margin left over.
