Voice agent data residency: nine EU traps from the field

Nine data residency traps we hit shipping a voice agent for a German-Dutch insurer, ranked by which ones the vendor's compliance page understates the most.

Jacob Molkenboer · Founder · A Brand New Company · 8 Jun 2026 · 10 min

Frankfurt, a Tuesday in March. We are in a meeting room above the insurer's claims floor. The DPO slides a printed Datadog dashboard across the table. Three rows are highlighted in yellow: span attributes containing partial customer transcripts, all timestamped 02:14 CET, all routed to a us-east-1 endpoint.

"BaFin is here in six weeks," she says. "Explain this."

The voice agent we had been building for nine weeks handled inbound first-notice-of-loss calls in German and Dutch. Every vendor in our stack advertised "EU data residency" on their compliance page. Every vendor delivered on the headline claim. Every vendor also leaked something we had not thought to ask about.

This is the field guide we wrote after that meeting. Nine traps, ranked by how much the vendor's compliance page understates them. The order matters: the higher up the list, the more likely you will only find the leak by reading packet captures.

Residency is a graph, not a flag

A voice call is not one data flow. It is a graph: SIP signaling, RTP media, transcripts, embeddings, prompts, completions, tool-call payloads, logs, traces, retraining queues, backup snapshots. Each edge in that graph has its own region, its own sub-processor, its own log sink.

The vendor's compliance page collapses the graph into one cheerful node in Frankfurt. The work is to expand the graph back out and ask, for every step from the SIP INVITE to the printed claim notice, where the byte sits at rest, where it transits, and who can subpoena the box it sits on.

The German Federal Data Protection Act (BDSG) and the Dutch AVG, the GDPR as applied in the Netherlands, mostly agree on the substance. Where they diverge is in tone: the German side is more particular about telco metadata under §3 TKG, while the Dutch Autoriteit Persoonsgegevens (AP) has been louder on vector embeddings. A German-Dutch deployment has to satisfy both readings.

Trap 1: observability shipping conversation text to us-east-1

This is the trap the compliance page understates the most. Every framework in our stack had a Datadog, Sentry, or OpenTelemetry integration that someone enabled in 2023 and forgot about. Default settings can ship error payloads, span attributes, breadcrumbs, and request bodies to whichever region the observability tenant lives in. Our LLM gateway was wired to a US Datadog tenant by a previous contractor. The gateway vendor's compliance page said "we host inference in the EU," which was true. It said nothing about the gateway also shipping every prompt and completion to a US trace store as a debug attribute.

The fix has two halves. First, force the observability tenant to the EU site: Datadog has an EU1 region at app.datadoghq.eu, Sentry has de.sentry.io, New Relic has api.eu.newrelic.com. Second, scrub request and response bodies before they ever reach the tracer. A short middleware does it:

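// Attribute keys whose values may carry conversation text or pointers to audio.
const SCRUB_KEYS = ['transcript', 'prompt', 'completion', 'utterance', 'audio_url', 'tool_args']

// Overwrite sensitive span attributes in place before the span is exported.
// Assumes the tracer exposes attributes as span.meta; adapt to your tracer's span API.
function scrubSpan(span) {
  if (!span.meta) return span
  for (const k of SCRUB_KEYS) {
    // Redact whenever the key is present, including empty values.
    if (span.meta[k] !== undefined) span.meta[k] = '[redacted]'
  }
  if (span.meta.http?.body) span.meta.http.body = '[redacted]'
  return span
}

// `tracer` is your already-initialized tracer instance.
// Register the scrubber on its HTTP integration so it runs for every request span.
tracer.use('http', { hooks: { request: (span) => scrubSpan(span) } })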

Vendor compliance pages say "we host your data in the EU." They do not say "your error tracker is shipping a caller's voice transcript to Virginia every time a tool call throws."

Trap 2: the RTP media path your SIP provider does not show you

Telephony providers route signaling (SIP) and media (RTP) on different paths. The signaling path carries the call setup and the caller ID. The media path carries the audio itself. A vendor with EU SIP PoPs can still anchor your RTP stream through a US transit when the call originates from a roaming number, when the EU media node is at capacity, or when the caller's carrier picks a bad route home.

Ask the telephony vendor for the media anchor region on a per-call basis, not as a global setting. Twilio exposes this through the MediaRegion field on a call; Vonage and Telnyx have similar knobs. Lock the region to de1 or ie1. Do not let it autoselect. Then log the actual anchor that the call used, because the requested region and the granted region are not always the same thing.

Trap 3: cascading sub-processor chains

Your contract with vendor A lists their sub-processors. Vendor A's sub-processor B has its own sub-processors. By the third tier, you have a CDN you never agreed to, terminating TLS for an analytics beacon you also never agreed to, on a us-east-1 box, because vendor B uses Segment for product analytics and Segment is hosted on AWS US.

We made a literal three-column table: vendor, sub-processor, sub-sub-processor. We required vendor A to flow our regional restriction down the chain in writing, with an audit clause. Two vendors said yes. One vendor said "we do not control B's region." We replaced that vendor before procurement signed anything.
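The three-column table can be generated from a nested declaration of the chain, which makes it easy to flag any tier that sits outside the EU. This is a minimal sketch: the vendor names, the `region` field, and the `EU_REGIONS` allow-list are hypothetical placeholders, not our real vendor list.

```javascript
// Hypothetical allow-list of regions treated as in-EU for this sketch.
const EU_REGIONS = new Set(['eu-central-1', 'eu-west-1', 'eu-west-3'])

// Illustrative vendor tree: each node names a processor, its region,
// and the sub-processors it discloses.
const vendors = [
  {
    name: 'vendor-a',
    region: 'eu-central-1',
    subs: [
      {
        name: 'vendor-b',
        region: 'eu-west-1',
        subs: [{ name: 'analytics-cdn', region: 'us-east-1', subs: [] }],
      },
    ],
  },
]

// Walk the tree and emit one row per chain, up to three tiers deep,
// flagging rows whose last hop sits outside the allow-list.
function flattenChains(nodes, path = []) {
  const rows = []
  for (const node of nodes) {
    const chain = [...path, node]
    rows.push({
      vendor: chain[0].name,
      subProcessor: chain[1]?.name ?? null,
      subSubProcessor: chain[2]?.name ?? null,
      region: node.region,
      outsideEU: !EU_REGIONS.has(node.region),
    })
    if (chain.length < 3) rows.push(...flattenChains(node.subs ?? [], chain))
  }
  return rows
}

const rows = flattenChains(vendors)
console.table(rows)
```

Any row with `outsideEU` set is a line item for the flow-down clause and the audit right.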

The EDPB's Recommendations 01/2020 on supplementary measures name this cascade explicitly and explain why a single set of Standard Contractual Clauses signed with the front vendor does not cover it. Most vendors' compliance pages do not mention the recommendations at all.

Trap 4: backup and disaster recovery regions

A vendor hosts production in Frankfurt and is honest about it. The backup region defaults to whichever region is geographically nearest the primary, which is usually fine. It stops being fine when "nearest" silently means London (post-Brexit, a third country sitting under an adequacy decision that the Commission can revoke), or when the backup lands in us-east-1 because the vendor's DR runbook was never updated after they opened the EU region.

Ask for the DR region in writing. Ask for the failover test log from the last quarter. If the runbook fails over to a US region "just to keep the service up," your residency claim dies the moment a real outage hits. We had one vendor offer to add a contractual clause forbidding US failover; we took it. Two refused; we replaced one and accepted a documented risk on the other after a Transfer Impact Assessment.

Trap 5: model fine-tuning telemetry

Many LLM vendors with a fine-tuning product ship training events to a US-hosted MLOps platform by default. The training data itself stays in the EU bucket. The telemetry around it (which examples failed validation, which prompts produced refusals, gradient norms, evaluation outputs) often does not. The telemetry contains enough text fragments to reconstruct a meaningful slice of the training set.

If you fine-tune, ask which MLOps stack the vendor uses for the run itself. Weights and Biases has an EU dedicated cloud now; Comet has region-locked instances; MLflow self-hosts wherever you put the server. The default tenant on every managed offering is US. The vendor will not volunteer this.

Trap 6: STT regional fallback under load

Speech-to-text vendors advertise an EU endpoint. Under load (and by "load" we mean a Tuesday morning during claim season after a windstorm), several of them silently fall back to a US region. The vendor logs the fallback internally as a reliability event. You will not see it in your dashboard. Where the vendor returns one, a region response header (X-Region in our case) tells you where the request actually ran. The compliance page does not.

We added a smoke test that fires one synthetic call per minute against the STT endpoint and logs the X-Region header to a separate Loki instance. The first time we ran it, 8% of requests in a four-hour window came back as us-west-2. The vendor's response, after we escalated: "we did not document that fallback because we considered it a reliability feature." The fallback was disabled by a flag on our account within a day. That opt-out flag had existed for two years, switched off by default.
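A smoke test of this kind can be sketched in a few lines. This is not our production probe: the `ALLOWED_REGIONS` set, the probe body, and the function names are assumptions for illustration, and the header name will vary by vendor. Shipping the results to a log store is left out.

```javascript
// Regions accepted in this sketch; use the ones your contract names.
const ALLOWED_REGIONS = new Set(['eu-central-1', 'eu-west-1'])

// Send one synthetic request and record where the vendor says it ran.
// `fetchImpl` is injectable so the check can be exercised without network access.
async function probeRegion(url, fetchImpl = fetch) {
  const res = await fetchImpl(url, { method: 'POST', body: 'synthetic probe' })
  // Header name as used by the vendor in this article; yours may differ.
  const region = res.headers.get('x-region') ?? 'unknown'
  return { at: new Date().toISOString(), region, ok: ALLOWED_REGIONS.has(region) }
}

// Share of probes that ran outside the allowed regions.
function leakRate(results) {
  if (results.length === 0) return 0
  return results.filter((r) => !r.ok).length / results.length
}
```

Run `probeRegion` on a timer, append each result to a log, and alert when `leakRate` over a recent window is above zero. A missing header is treated as a failure on purpose: an unknown region is not evidence of an EU region.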

Warning

Always log the actual response region per request, not the configured one. Vendor compliance pages describe steady-state routing. Your residency case lives or dies on the failure modes nobody documents.

Trap 7: TTS voice clone artifacts

If you clone a voice for the agent (we did, with written consent and a separate processing agreement with the voice talent), the voice model itself is personal data of the speaker. The vendor stores that model artifact on whatever GPU pool was free when you trained it. We had a clone trained on a German voice actor's three-hour studio session sitting on a us-east-1 inference cluster for six weeks before we noticed.

Ask the TTS vendor where the model file is stored at rest, and separately where inference runs. They are often different boxes in different regions. ElevenLabs, Cartesia, and Resemble all offer EU-region options now, but the toggle is buried in account settings and the default is US. The consent paperwork you signed with the voice talent very likely promised EU storage; the vendor's default very likely broke it.

Trap 8: vector embeddings and the source-text gap

Your knowledge base of policy documents lives in an EU bucket. You embed it through an EU LLM endpoint. You store the resulting vectors in a managed vector database. The vector database happens to be a managed Pinecone or Weaviate cluster that, until recently, did not offer an EU region.

Embeddings are not anonymous. Given the vectors and access to the embedding model, an attacker can invert them and recover meaningful fragments of the source text. The Dutch Autoriteit Persoonsgegevens has made this point in writing in the context of medical RAG systems, and the same logic applies to insurance claim notes. Pinecone now offers a Frankfurt region. Qdrant Cloud has had Frankfurt for over a year. For an insurance use case, pgvector inside an EU-hosted Postgres remains the safest answer because the surface area is smaller and your DBA already understands it. We moved to that.

Trap 9: inference logs and abuse review queues

The vendor's compliance page says "we run inference in the EU." Read the paragraph after that claim. Inference logs (the full prompt, the completion, the system prompt, the tool calls, the timing) often route through a US "trust and safety" pipeline before they are stored or discarded. Abuse review queues are almost always staffed by a vendor in the US or the Philippines, and the reviewers see complete prompts.

Ask: where do inference logs sit at rest? How long are they retained? Who reviews flagged completions? If the answer mentions a trust and safety team, ask where that team is and what fields they see. For our insurer we negotiated a no-logs tier with the LLM vendor and ran our own redaction pipeline at the boundary, so any PII was masked before the text crossed the API boundary.
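To make the idea of boundary redaction concrete, here is a minimal sketch of a masking step that runs before text leaves your process. It is not the pipeline we ran: the patterns and the `redact` name are illustrative assumptions, and regexes alone will miss names, addresses, and policy numbers.

```javascript
// Illustrative patterns only. Order matters: IBANs are masked before
// phone numbers so the phone pattern cannot swallow part of an IBAN.
const PII_PATTERNS = [
  { label: 'EMAIL', re: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g },
  { label: 'IBAN', re: /\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b/g },
  { label: 'PHONE', re: /\+?\d[\d\s\/-]{7,}\d/g },
]

// Replace every match with a bracketed label and return the masked text.
function redact(text) {
  return PII_PATTERNS.reduce(
    (out, { label, re }) => out.replace(re, `[${label}]`),
    text,
  )
}
```

Call `redact` on every transcript, prompt, and tool argument immediately before the outbound request is built, so nothing downstream (logs, retries, traces) ever holds the unmasked text.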

The shape of the residency map document

What the DPO eventually signed was not a memo. It was a 14-page map. Page 1 was a Mermaid diagram of every edge in the data graph. Pages 2 through 11 were one page per edge: source region, destination region, sub-processor, retention period, contractual clause, observed evidence. Page 12 was the open risks. Page 13 was the Transfer Impact Assessment for the one US edge we could not eliminate. Page 14 was the test plan: how we verify every claim above every quarter.
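The per-edge pages lend themselves to a small data structure from which the Mermaid diagram on page 1 can be generated. The sketch below assumes field names that mirror the list above; every value in it is a placeholder, not an entry from the real document.

```javascript
// One record per edge, mirroring the per-edge pages of the map.
// All values are placeholders for illustration.
const edges = [
  {
    from: 'SIP trunk', to: 'STT',
    sourceRegion: 'eu-central-1', destRegion: 'eu-central-1',
    subProcessor: 'stt-vendor', retentionDays: 0,
    clause: 'DPA region clause', evidence: 'region header log',
  },
  {
    from: 'LLM gateway', to: 'Trace store',
    sourceRegion: 'eu-central-1', destRegion: 'us-east-1',
    subProcessor: 'observability-vendor', retentionDays: 15,
    clause: 'SCC + TIA', evidence: 'packet capture',
  },
]

// Render the edges as Mermaid flowchart source, labelling each arrow
// with the destination region so cross-region hops stand out.
function toMermaid(list) {
  const id = (name) => name.replace(/\W+/g, '_')
  const lines = ['flowchart LR']
  for (const e of list) {
    lines.push(`  ${id(e.from)}["${e.from}"] -->|${e.destRegion}| ${id(e.to)}["${e.to}"]`)
  }
  return lines.join('\n')
}

// Edges whose source and destination regions differ need a written justification.
const crossRegion = edges.filter((e) => e.sourceRegion !== e.destRegion)
```

Keeping the edges as data means the diagram, the per-edge pages, and the quarterly test plan are all derived from one source and cannot drift apart.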

The map took a week to produce and a day to maintain per quarter. Without it, the BaFin conversation would have been a guess. With it, the conversation was a checklist.

The audit you can run today

Take the next ten minutes. Open your voice agent's runtime config and grep for these strings: datadog, sentry, endpoint, region, api.openai.com, eu1, us-east, us-west, amazonaws.com. For each hit, ask: where does the byte on the other end sit, and does the contract bind that vendor to the region they advertised on the slide deck?
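If you would rather keep the hits as structured output than read grep results by eye, the same search can be sketched as a small function. The term list comes from the paragraph above; the `scanConfig` name and the shape of each hit are assumptions.

```javascript
// Search terms from the paragraph above.
const NEEDLES = [
  'datadog', 'sentry', 'endpoint', 'region', 'api.openai.com',
  'eu1', 'us-east', 'us-west', 'amazonaws.com',
]

// Return one hit per (line, term) match, case-insensitively,
// so each can be traced back to a vendor and a contract clause.
function scanConfig(text) {
  const hits = []
  text.split('\n').forEach((line, i) => {
    const lower = line.toLowerCase()
    for (const term of NEEDLES) {
      if (lower.includes(term)) hits.push({ line: i + 1, term, text: line.trim() })
    }
  })
  return hits
}
```

Feed it the contents of each config or environment file and review every hit by hand. A clean result proves little, since endpoints can also be set in code or in a vendor dashboard.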

That grep is not the audit. It is the first ten minutes of it. The full map takes a week, sometimes two.

When we built the voice agent for that German-Dutch insurer, the hardest part was not the prompting or the latency budget. It was producing a residency map the DPO could sign without crossing her fingers. We do this work on day three of every voice engagement now, before any code ships to a phone number.

Key takeaway

Residency is a graph, not a flag. Every vendor's compliance page collapses that graph into one cheerful node in Frankfurt; your job is to expand it back out.

FAQ

Does GDPR require my voice agent to keep all data in the EU?

No. GDPR allows transfers under adequacy decisions or Standard Contractual Clauses with supplementary measures. It does require you to know where each byte sits and to justify any transfer in writing.

What is the most commonly missed residency leak in a voice agent stack?

Observability. Datadog, Sentry, and OpenTelemetry exporters ship request and response bodies to whichever region the tenant lives in, and the default tenant is US for most teams.

Can I use a US LLM provider for an EU voice agent at all?

Yes, if you contract for an EU inference region, scrub PII before the API boundary, disable retention, and document the Transfer Impact Assessment. Most vendors offer this on enterprise tiers.

Are vector embeddings personal data under GDPR?

Treat them as personal data when they are derived from it. The Dutch AP and several EU supervisory authorities have written that embeddings can be inverted to recover source text, so embeddings derived from personal data inherit the same regime.

How long does a full residency audit take for a voice agent stack?

About one week of focused work for the map, then roughly a day per quarter to verify the claims still hold. Most of the time is spent reading sub-processor lists and running per-call region tests.
