Voice agent region leak: the EU-pinning gate we now run
On a Wednesday just after 14:00 in Zwolle, a recruitment bureau's voice agent wrote a candidate's salary band into a public ATS comment. Here is what broke, and the gate we now run before any tool-call.

The bureau's recruitment lead pinged us at 14:07. A candidate had just messaged her on LinkedIn, asking why his salary indication (salarisindicatie) of €78–84k was sitting in a public comment under his profile in the bureau's ATS. He had screenshotted it. He was, understandably, not delighted. The voice agent we had built for the bureau's inbound candidate calls had put it there four minutes earlier, with no human in the loop.
The agent handles the inbound calls candidates make when they return to a vacancy after a recruiter follow-up. Its job is to capture intent, role preference, and a salary band, then push a structured summary to the ATS as a private internal note. The salary band is the only field marked AVG-sensitive (AVG is the Dutch name for the GDPR). At the point of the incident, the agent had been live for four months with a clean record.
On Wednesday, the band landed in the wrong field. And the audio that contained it had quietly left the EU on the way there.
The voice pipeline on a normal day
The shape is standard. Twilio SIP into our orchestrator, audio chunks to a Whisper streaming endpoint, intent and entity extraction via a function-calling model, then a write to the ATS via the bureau's REST API. Every step is region-pinned to eu-central-1 (Frankfurt) in configuration. The ATS write itself uses an internal-only endpoint that is separated from the public comment field by both URL path and OAuth scope.
So on paper, the salary band cannot reach the public comment, and the audio cannot leave the EU. On Wednesday, both rules broke. They broke independently, in two different systems, and the leak required both to fail in the same minute.
The five-minute trace
We pulled the request log. At 13:59:48, the Whisper streaming connection to eu-central-1 returned a 429 with a Retry-After: 30 header. The vendor's EU quota had tripped, quietly, with no advance warning email and no visible status page incident. Our wrapper's fallback logic, written six months earlier when the same endpoint had a brief outage, retried the request without re-asserting the region pin. The retry succeeded against us-east-1.
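The retry path is where the pin has to be enforced, not merely configured. Below is a minimal sketch of a wrapper that re-asserts the pin on every attempt and has no other region to fall back to. It assumes a pre-flight check with the same contract as the assertEuEndpoint function in step 1, a hypothetical callEndpoint transport, and an illustrative two-second ceiling on how long a live call can wait. The names callPinned, Attempt and RegionPinnedRetryError are invented for this example.

```typescript
// Hypothetical shape of one transport attempt; not a real vendor SDK type.
export type Attempt = { status: number; retryAfterSeconds?: number; body?: string }

export class RegionPinnedRetryError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'RegionPinnedRetryError'
  }
}

// Retries only against the same EU URL; there is deliberately no fallback region.
export async function callPinned(
  euUrl: string,
  assertEu: (url: string) => Promise<void>,        // pre-flight check, rejects if non-EU
  callEndpoint: (url: string) => Promise<Attempt>, // hypothetical transport function
  maxAttempts = 2,
  maxWaitMs = 2000,                                // assumed longest pause a live call can absorb
  sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms)),
): Promise<Attempt> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Re-assert residency before every attempt, not once at configuration time.
    await assertEu(euUrl)
    const res = await callEndpoint(euUrl)
    if (res.status !== 429) return res

    // Honour Retry-After on the SAME endpoint. If it is longer than a caller
    // will wait, or attempts are used up, stop and fail closed.
    const waitMs = (res.retryAfterSeconds ?? 1) * 1000
    if (attempt === maxAttempts || waitMs > maxWaitMs) break
    await sleep(waitMs)
  }
  throw new RegionPinnedRetryError(
    `EU endpoint ${euUrl} is rate-limited; refusing to fall back to another region`,
  )
}
```

With the Retry-After: 30 from this incident, a wrapper shaped like this would not have sent anything to us-east-1. It would have failed closed after the first attempt and handed the orchestrator an error to act on.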
The candidate's audio, in clear Dutch ("ik zit nu op tweeënzeventig, maar voor de juiste rol ga ik richting de tachtig", roughly "I'm on seventy-two now, but for the right role I'm heading towards eighty"), went to a US datacenter, came back transcribed correctly, and got handed to the extraction model. Our pipeline did not care which region transcribed the audio. It only cared that the transcript was returned in under 800ms.
This is the part of the trace that, on a different day, we would have caught from monitoring alone. Our latency dashboard for the voice path tracks p50 and p99 separately per region, and a US round trip from Frankfurt typically adds 90–110ms over an EU one. But the dashboard rolls up in fifteen-minute windows, and the retry that crossed the Atlantic was a single call inside a four-minute conversation. Forty other EU calls in the same window diluted it. The drift was real, but invisible at the resolution we were watching.
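A back-of-envelope version of that dilution, as a sketch. It treats the window as 41 equally weighted calls and assumes a hypothetical 600ms baseline; the roughly 100ms transatlantic penalty and the forty surrounding EU calls come from the trace above.

```typescript
// One fifteen-minute window: 40 EU calls plus the single retry that crossed
// the Atlantic. The 600ms baseline is a hypothetical figure.
type Call = { region: string; latencyMs: number }

const windowCalls: Call[] = [
  ...Array.from({ length: 40 }, (): Call => ({ region: 'eu-central-1', latencyMs: 600 })),
  { region: 'us-east-1', latencyMs: 700 }, // ~100ms transatlantic penalty
]

// Aggregate view: the window mean moves by about 2.4ms, well inside normal jitter.
const meanMs = windowCalls.reduce((sum, c) => sum + c.latencyMs, 0) / windowCalls.length
const meanShiftMs = meanMs - 600

// Per-call view: checking where each call was served finds the leak directly.
const offRegion = windowCalls.filter(c => c.region !== 'eu-central-1')
```

The point is not the exact number. A residency problem is a per-call property, so it has to be checked per call rather than inferred from a latency rollup.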
The salary band was extracted. The ATS write fired. And then the second thing failed.
The OAuth scope we widened in April
The ATS write payload contained the salary band, the candidate's preferred role, and a courtesy summary string, mapped into separate fields. The OAuth token the agent held had a scope that we widened in April 2026 to support a feature the bureau had requested: per-field configuration of which fields the agent was allowed to write. The configuration UI had not yet rolled out. So the agent, at the moment of the write, had write access to every field in the ATS, including comment_visible, the public comment field.
The field mapping itself was generated by a model. We had given it a list of available fields with human-readable names. It correctly mapped the band to salary_band_internal. But it also mapped the courtesy summary, generated from the transcript, to comment_visible, and that summary contained the sentence "Candidate currently at €72k, looking towards €80k for the right role."
Two silent failures, both required to leak the data, both fixed independently.
A region pin set once at configuration time is not a region pin. If your fallback logic can retry without it, your data can leave the region without anyone noticing for hours.
What we now run before every tool-call
The fix had to live below the agent layer. Telling a model "please stay in the EU" is not a control; it is a vibe. So we built a three-step gate that sits between the orchestrator and any tool-call that touches AVG-classified data. Every voice agent we ship now runs through it.
Step 1: pre-flight region assertion
Before the orchestrator opens a connection to a model endpoint, it resolves the endpoint's hostname and checks every returned IP against an EU allowlist we maintain from RIPE's published ranges. If any resolved IP falls outside, the call fails closed. No fallback to another region, no retry, no "well, the EU one is down, let's try the next best thing." The call returns an error to the orchestrator, the orchestrator logs the incident, and a Slack page fires.
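```typescript
// pre-flight region pin: runs before any AVG-classified tool-call
import { resolve4 } from 'node:dns/promises'
import { ipInEuBlock } from './ripe-eu-blocks'

// Raised when an endpoint cannot be shown to resolve only to EU addresses.
export class RegionPinError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'RegionPinError'
  }
}

export async function assertEuEndpoint(url: string): Promise<void> {
  const host = new URL(url).hostname
  // A DNS failure rejects here as well, so the call fails closed, not open.
  const ips = await resolve4(host)
  // Every returned IPv4 address must sit inside an allowlisted EU block.
  const nonEu = ips.filter(ip => !ipInEuBlock(ip))
  if (nonEu.length > 0) {
    throw new RegionPinError(
      `Endpoint ${host} resolved to non-EU IPs: ${nonEu.join(', ')}`
    )
  }
}
```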
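The ./ripe-eu-blocks module is not shown here. One way it could work is plain IPv4 CIDR containment, sketched below. The block list uses RFC 5737 documentation ranges as placeholders for the real list generated from RIPE's published data. Like resolve4 above, this sketch is IPv4 only.

```typescript
// Hypothetical stand-in for './ripe-eu-blocks'. Placeholder RFC 5737 ranges,
// NOT real EU allocations; the production list comes from RIPE's data.
const EU_BLOCKS: readonly string[] = ['192.0.2.0/24', '198.51.100.0/24']

// Dotted-quad IPv4 to an unsigned integer; null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split('.')
  if (parts.length !== 4) return null
  let value = 0
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null
    const octet = Number(part)
    if (octet > 255) return null
    value = value * 256 + octet
  }
  return value
}

function inCidr(ip: number, cidr: string): boolean {
  const [base, bitsText] = cidr.split('/')
  const baseInt = ipv4ToInt(base)
  const bits = Number(bitsText)
  if (baseInt === null || !Number.isInteger(bits) || bits < 0 || bits > 32) return false
  // Compare network prefixes; division avoids 32-bit sign issues with bit shifts.
  const blockSize = 2 ** (32 - bits)
  return Math.floor(ip / blockSize) === Math.floor(baseInt / blockSize)
}

// Malformed input counts as non-EU, so the pre-flight check fails closed.
export function ipInEuBlock(ip: string, blocks: readonly string[] = EU_BLOCKS): boolean {
  const value = ipv4ToInt(ip)
  if (value === null) return false
  return blocks.some(cidr => inCidr(value, cidr))
}
```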
Step 2: per-region API key, not per-region header
The cleanest residency control is the credential itself. Our EU model key cannot be used to call a US endpoint, and vice versa, because the vendor enforces that on their side. We provision separate keys per region, store them in separate secret-manager paths, and the wrapper refuses to load a non-EU key when the request is tagged AVG-classified. Where a vendor only offers a residency header rather than scoped keys, we treat the header as advisory and add it anyway, but the IP check in step 1 is what we actually trust.
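The key-loading rule can be sketched as a guard that runs before the secret store is touched. The secret paths, the two-region Region type and the readSecret callback are assumptions standing in for a real secret-manager client.

```typescript
// Hypothetical per-region secret paths; one path per region, never shared.
type Region = 'eu' | 'us'

const KEY_PATHS: Record<Region, string> = {
  eu: 'secrets/model-vendor/eu/api-key',
  us: 'secrets/model-vendor/us/api-key',
}

export class ResidencyKeyError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'ResidencyKeyError'
  }
}

export async function loadVendorKey(
  region: Region,
  avgClassified: boolean,
  readSecret: (path: string) => Promise<string>, // stand-in for a secret-manager client
): Promise<string> {
  // Refuse before reading anything, so a non-EU key never enters process memory
  // for an AVG-classified request.
  if (avgClassified && region !== 'eu') {
    throw new ResidencyKeyError(`Refusing to load the ${region} key for an AVG-classified request`)
  }
  return readSecret(KEY_PATHS[region])
}
```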
Step 3: post-response audit
After every model response that handles AVG data, we check the response's region header (many vendors expose one, under names such as x-region, openai-processing-region, or similar). If the header is missing, or its value is not on our EU allowlist, the response is not just logged; it is discarded. The transcript never reaches the extraction model, and the ATS write never fires. The orchestrator returns an error to the caller, and the call ends with an audible apology to the candidate: "Sorry, our system is having trouble right now, a human will call you back within the hour." We would rather drop the call than mishandle the data.
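A minimal sketch of the audit, assuming the header names mentioned above (which one a given vendor actually sends has to be confirmed per vendor) and a hypothetical allowlist of region identifiers. The HeaderReader type is a small structural interface that the Fetch API's Headers class also satisfies.

```typescript
// Minimal structural type; a Fetch API Headers object satisfies it.
type HeaderReader = { get(name: string): string | null }

// Header names from the examples in the text; confirm per vendor.
const REGION_HEADERS: readonly string[] = ['x-region', 'openai-processing-region']
// Hypothetical allowlist; real region identifiers depend on the vendor.
const EU_REGIONS: ReadonlySet<string> = new Set(['eu-central-1', 'eu-west-1'])

// The first known region header present, normalised, or null if none is sent.
export function reportedRegion(headers: HeaderReader, names: readonly string[] = REGION_HEADERS): string | null {
  for (const name of names) {
    const value = headers.get(name)
    if (value !== null) return value.trim().toLowerCase()
  }
  return null
}

// A missing header and a non-EU header get the same treatment: discard.
export function keepResponse(headers: HeaderReader, euRegions: ReadonlySet<string> = EU_REGIONS): boolean {
  const region = reportedRegion(headers)
  return region !== null && euRegions.has(region)
}
```

In the orchestrator, a false from keepResponse would mean dropping the transcript and ending the call with the apology, rather than passing anything on to extraction.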
Fail closed, not open. A voice agent that drops the call when it cannot prove EU residency is a voice agent that does not trigger a mandatory data breach notification (meldplicht datalek) the next morning.
What we changed beyond the gate
Three more things came out of the post-mortem, and they matter as much as the gate itself.
First, we stopped using model-generated field mappings for AVG-sensitive writes. Those mappings are now hand-written, code-reviewed, and gated behind a per-field allowlist. The agent can ask to write to salary_band_internal; it cannot decide on its own to also write a summary into a public comment field. The model proposes, the allowlist disposes.
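"The model proposes, the allowlist disposes" reduces to a filter between the model's proposed write and the ATS call. In this sketch, salary_band_internal and comment_visible come from the incident, role_preference_internal is a hypothetical field name, and the allowlist contents are illustrative.

```typescript
// Field name -> value, as proposed by the model.
type ProposedWrite = Record<string, string>

// Hand-written, code-reviewed allowlist. role_preference_internal is hypothetical.
const AVG_WRITE_ALLOWLIST: ReadonlySet<string> = new Set([
  'salary_band_internal',
  'role_preference_internal',
])

export function filterWrite(
  proposed: ProposedWrite,
  allowlist: ReadonlySet<string> = AVG_WRITE_ALLOWLIST,
): { accepted: ProposedWrite; rejected: string[] } {
  const accepted: ProposedWrite = {}
  const rejected: string[] = []
  for (const [field, value] of Object.entries(proposed)) {
    if (allowlist.has(field)) accepted[field] = value
    else rejected.push(field) // e.g. comment_visible: never written, only reported
  }
  return { accepted, rejected }
}
```

This version drops the disallowed fields and reports them. A stricter variant would refuse the whole write as soon as rejected is non-empty, which fits the fail-closed posture if you would rather lose a note than write a partial one.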
Second, we narrowed the OAuth scope back to the explicit set of fields the bureau had configured at the time of token issue. The "write any field" scope was convenient for our roadmap and lethal to the principle of least privilege. We had quietly accepted a six-month window where one mistake could write anywhere. That window is closed.
Third, we now scrape rate-limit response headers on every successful call to a model vendor and dashboard them per region. The 429 that started this incident never appeared on the vendor's public status page, but the regional quota had been visibly draining in the X-Ratelimit-Remaining header for the previous forty minutes. We could have seen what the vendor's on-call saw, at the same time they saw it, if we had been looking at the right field. We now page when the EU quota drops below 25% of the daily ceiling, and route a low-priority Slack notice when it crosses 50%. The quota dashboard sits next to the residency dashboard, because the two failures share a class: a vendor-side condition that a vendor-side fallback can paper over without telling you.
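The thresholds above fit in a small classifier. x-ratelimit-remaining follows the header named in the text. x-ratelimit-limit is a hypothetical stand-in for wherever a vendor reports the daily ceiling. Treating unreadable values as a notice rather than a page is an assumption of this sketch.

```typescript
export type QuotaAction = 'page' | 'notice' | 'ok'

export function quotaAction(remaining: number, ceiling: number): QuotaAction {
  // Unreadable numbers: surface as a notice rather than silently passing.
  if (!Number.isFinite(remaining) || !Number.isFinite(ceiling) || ceiling <= 0) return 'notice'
  const fraction = remaining / ceiling
  if (fraction < 0.25) return 'page'   // below 25% of the daily ceiling
  if (fraction <= 0.5) return 'notice' // crossed 50%: low-priority Slack notice
  return 'ok'
}

type HeaderReader = { get(name: string): string | null }

// x-ratelimit-limit is a hypothetical header name for the daily ceiling.
export function quotaActionFromHeaders(headers: HeaderReader): QuotaAction {
  const toNumber = (v: string | null): number => (v === null || v.trim() === '' ? NaN : Number(v))
  return quotaAction(toNumber(headers.get('x-ratelimit-remaining')), toNumber(headers.get('x-ratelimit-limit')))
}
```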
The meldplicht datalek, and what we told the bureau
The salary band was pulled from the public comment within nine minutes of the candidate's screenshot landing in our Slack. The bureau filed a breach notification (AVG-melding) with the Autoriteit Persoonsgegevens, the Dutch supervisory authority, within 24 hours. That is well inside the 72-hour window that Article 33 of the GDPR sets for notifying the supervisory authority of a personal data breach. The candidate accepted the apology and stayed in the process, though he did not, in the end, take the role.
The bureau's recruitment lead asked us one question on the call afterwards, which is the one we get from every operations lead after an incident like this: "How would we have caught this ourselves?" The honest answer is, we would not have. The vendor returned a 200, the transcript was correct, the ATS write succeeded. Every system reported green. The only thing that would have caught the drift earlier was a synthetic check that resolved the model endpoint every minute and paged when the IP landed outside the EU allowlist. We have that running now, on a separate cron, on a separate cloud, against a separate alerting channel.
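The synthetic check can be sketched as a probe that resolves each endpoint and alerts on any non-EU address, and also on any resolution failure. The isEu predicate, the alert callback and the host list are injected assumptions. In production, the scheduling would live on the separate cron and cloud described above, not inside the agent process.

```typescript
import { resolve4 } from 'node:dns/promises'

type Alert = (message: string) => Promise<void>
type Resolver = (host: string) => Promise<string[]>

// Resolve every endpoint once; returns how many hosts needed an alert.
export async function probeOnce(
  hosts: readonly string[],
  isEu: (ip: string) => boolean,
  alert: Alert,
  resolve: Resolver = resolve4,
): Promise<number> {
  let failures = 0
  for (const host of hosts) {
    let ips: string[]
    try {
      ips = await resolve(host)
    } catch (err) {
      // A probe that cannot resolve is not a green result: alert on it too.
      failures++
      await alert(`${host} could not be resolved: ${String(err)}`)
      continue
    }
    const nonEu = ips.filter(ip => !isEu(ip))
    if (nonEu.length > 0) {
      failures++
      await alert(`${host} resolved outside the EU allowlist: ${nonEu.join(', ')}`)
    }
  }
  return failures
}

// Once a minute; errors in the probe itself are logged rather than dropped.
export function startProbe(hosts: readonly string[], isEu: (ip: string) => boolean, alert: Alert) {
  return setInterval(() => {
    probeOnce(hosts, isEu, alert).catch(err => console.error('residency probe failed', err))
  }, 60_000)
}
```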
The AI-native infrastructure question
There is a popular AI-native startup playbook making the rounds on Hacker News this week that argues the right move is to wire up vendor APIs fast and treat infrastructure as a later problem. That is a fair posture when the infrastructure question is whether you self-host Postgres. It is not a fair posture when the infrastructure question is which continent a candidate's salary data lives on while it is being transcribed.
If you are running voice agents against EU candidates, customers, or patients, the gate between your orchestrator and your vendors is your problem, not your vendor's. The vendor will happily fall back to a different region to keep your latency low, and the report you get afterwards will say everything went fine.
One thing to do today
When we built the voice agent for this Zwolle recruitment bureau, the thing we ran into was that a region pin set at configuration time is not enforced at request time. We solved it with the three-step gate that now sits in front of every AI agent we ship.
If you run agents that touch EU personal data, the smallest audit you can do today is a five-minute trace of one production call: log the IP your model endpoint actually resolved to, and check it against the published EU ranges. If it lands outside, you have a problem you did not know you had.
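For that one-call trace, a sketch that makes a single request and reports the IP the socket actually connected to. This can differ from a separate DNS query: Node's HTTP client resolves through dns.lookup (the operating system resolver), while dns.resolve4 queries DNS servers directly. The function name connectedIp is invented for this example.

```typescript
import * as http from 'node:http'
import * as https from 'node:https'

// Make one request and resolve with the peer IP of the connected socket.
export function connectedIp(url: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const onResponse = (res: http.IncomingMessage) => {
      const ip = res.socket.remoteAddress
      res.resume() // discard the body; only the peer address matters here
      if (ip) resolve(ip)
      else reject(new Error(`no remote address recorded for ${url}`))
    }
    const req = new URL(url).protocol === 'http:' ? http.get(url, onResponse) : https.get(url, onResponse)
    req.on('error', reject)
  })
}
```

Usage, against your own endpoint URL: connectedIp(endpointUrl).then(ip => console.log(ip)), then compare the logged address against the published EU ranges.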
Key takeaway
A region pin set at configuration time is not a region pin: your fallback logic can leak EU voice data to another continent without anyone noticing.
FAQ
Why did the EU region pin fail in the first place?
Our fallback logic, written for an earlier outage, retried the request after a 429 without re-asserting the region pin. The retry succeeded against a US region and the rest of the pipeline did not check.
Is OpenAI's European data residency enough on its own?
It helps, but it is not enforcement. Until you check the IP and the response region header on every call, a vendor-side fallback or misrouted key can still send data outside the EU without you noticing.
How fast do we have to file a data breach notification (meldplicht datalek)?
Under GDPR Article 33 you have 72 hours from the moment you become aware of the breach to notify the supervisory authority. In the Netherlands that is the Autoriteit Persoonsgegevens.
Does the gate add latency to a voice call?
The pre-flight IP check is cached per hostname for 60 seconds and adds about 4ms on a cache hit. The post-response header check is a string compare and is free. Candidates do not notice.
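A per-hostname cache like that could look like this sketch, using the 60-second TTL from the answer above. The resolver and isEu predicate are injected, and only passing results are cached, so a host that failed is re-resolved on the next call.

```typescript
type Resolver = (host: string) => Promise<string[]>

export function cachedEuCheck(
  resolve: Resolver,
  isEu: (ip: string) => boolean,
  ttlMs = 60_000,
  now: () => number = Date.now,
): (host: string) => Promise<void> {
  const verifiedUntil = new Map<string, number>()

  return async (host: string): Promise<void> => {
    const expiry = verifiedUntil.get(host)
    if (expiry !== undefined && expiry > now()) return // cache hit: no DNS round trip

    const ips = await resolve(host)
    const nonEu = ips.filter(ip => !isEu(ip))
    if (ips.length === 0 || nonEu.length > 0) {
      verifiedUntil.delete(host) // never serve a stale "pass" after a failure
      throw new Error(`${host} did not resolve to EU-only IPs: ${nonEu.join(', ') || 'none'}`)
    }
    verifiedUntil.set(host, now() + ttlMs)
  }
}
```

The trade-off is that a DNS change inside the 60-second window goes unseen by this check until the entry expires. The post-response audit, which runs on every response, covers that gap.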