Voice agents on a legacy GDS: 1,820 rebookings a week
A 28-person Maastricht travel firm parks every late rebooking in a duty-manager queue inside 60 seconds. The voice agent reads Galileo. The duty manager calls the customer back.

It is 22:47 on a Tuesday in March. The duty manager at a 28-person travel agency in Maastricht has fourteen tickets in her queue and thirteen minutes before the SGR garantiefonds (guarantee fund) reporting window closes at 23:00. Six months ago she would have been on her ninth call of the night. Tonight the phone has not rung once, because a voice agent is answering for her.
It has handled 247 calls today. Most got resolved without a human: a name spelling correction on an outbound PNR, a luggage upgrade on a Crete charter, a flight number lookup for a traveller who could not find her itinerary email. The fourteen tickets in the duty-manager queue are the ones the agent refused to touch. Every booking change inside 24 hours of departure goes to a human within 60 seconds, by design.
This is what one of our voice agents looks like in production, five months after go-live. The architecture is less interesting than the constraints that shaped it, so we will start with the constraints.
The 23:00 cliff
Dutch travel companies that participate in the SGR guarantee fund carry a reporting obligation: every itinerary change that touches a guaranteed component (flight, hotel, transfer) inside the protection window has to be in their system before the end of the operational day. The fund settles on next-day reporting, and our client's operational day closes at 23:00 sharp. Miss the window and the change is still legal, but the reconciliation against the fund slips a day, and the duty manager spends Wednesday morning rebuilding Tuesday's numbers.
This was not hypothetical. Before we shipped the agent, the after-hours phone queue was unforgiving. A single Crete delay would stack thirty calls between 21:00 and 23:00. Two duty managers, ninety seconds per call on the easy ones, four minutes on the hard ones. The math is brutal. They missed the 23:00 cutoff on average twice a month, and twice a month somebody lost a chunk of their Wednesday.
Why voice and not chat
We had this discussion in the first scoping meeting. The agency's customer base skews older (mean booking age 54, median 58), and after-hours calls are disproportionately from travellers already at an airport or in destination. A 60-year-old standing at the Heraklion check-in desk with a missed connection does not open a chat widget. She calls the number printed on her itinerary.
So: voice, inbound only, Dutch and German and English. No outbound dial-out. We use a Twilio number that lands on a media stream piped through Deepgram for transcription and a function-calling model for intent and slot extraction. Text-to-speech is ElevenLabs with a custom voice that reads, in the words of the operations lead, like a Limburger who has slept enough.
The stack we walked into
Travel is one of the industries where you can still find production infrastructure older than the engineers maintaining it. Our client runs two systems that matter, and both are non-negotiable in scope.
The first is Travelport Galileo, the GDS the agency has used since 2011. Staff interact with it through Smartpoint, the Windows desktop terminal. Travelport ships a JSON API at higher subscription tiers; the existing licence did not include it. The second is a homegrown package-tour database on SQL Server 2012, holding their non-GDS inventory: Greek island charters, Moroccan riads, ski packages with their own contracted hotels. SQL Server 2012 reached end of extended support in July 2022. It still runs everything that matters.
We considered upgrading the SQL Server first. The client considered it too. Then we did the math on what migration would cost and how long the agency would be running in parallel, and we agreed that solving the after-hours phone problem was worth doing first, against the stack as it actually existed. The migration is on the roadmap. It was never going to ship before March.
If you build an agent against a legacy database you do not own end-to-end, make every write go through a queue you control. Never let the agent touch the legacy system directly. The agent that does the write is the agent that takes the blame for the corrupted row.
The 60-second guardrail
The agent reads from Galileo and from the SQL Server. It does not write to either. Every change request, whether a name correction, a date shift, a cabin upgrade, or a full cancellation, produces a structured handoff ticket in a duty-manager queue. The rule is binary. If the change falls within 24 hours of departure, the ticket appears in the duty-manager queue within 60 seconds of the call ending. If the change is more than 24 hours out, it goes to the back-office queue for next-morning processing.
The 24-hour cutoff is not ours. It comes from the agency's own SGR reporting policy: anything inside that window has to be on a human's screen before 23:00. The 60-second SLA is ours, because the difference between a ticket landing at 22:58 and 22:59 is the difference between making the reporting window and missing it.
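The routing rule is small enough to sketch in a few lines. This is an illustration in Python, not the production code: the queue names are hypothetical, and treating a departure at exactly 24 hours (or one already in the past) as urgent is our assumption about the boundary, which the rule as stated above does not spell out.

```python
from datetime import datetime, timedelta, timezone

# The agency's 24-hour SGR reporting cutoff.
URGENT_WINDOW = timedelta(hours=24)


def route_change_request(departure: datetime, call_start: datetime) -> str:
    """Return the (hypothetical) queue name for a change request.

    Both timestamps must be timezone-aware so the subtraction is unambiguous.
    """
    if departure.tzinfo is None or call_start.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")

    time_to_departure = departure - call_start

    # Assumption: the boundary is inclusive, and already-departed legs
    # are also treated as urgent.
    if time_to_departure <= URGENT_WINDOW:
        return "duty_manager"
    return "back_office"
```

The point of keeping this as plain comparison logic is that the model never decides urgency. It extracts the booking reference; the timestamps and the routing come from code.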
We achieve the 60-second figure through brute simplicity. The agent's tool-call output for any "change-within-24h" intent writes straight to a Postgres queue table we run alongside the legacy stack. A small Go worker polls that table every five seconds, classifies the ticket, and pushes it into the duty-manager web UI over a WebSocket. The whole hot path (call ends, transcript settles, tool call fires, queue insert, UI push) measures 14 seconds at p95 in production. The 60-second number is the SLA we tell the operations team. The real number is mostly bounded by how fast the human can click "next ticket."
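For readers who want the shape of the queue, here is a minimal sketch of one poll cycle. It is written in Python with an in-memory SQLite database standing in for Postgres so that it runs anywhere; the production worker is Go, and the table and column names here are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def open_queue() -> sqlite3.Connection:
    """Create an in-memory stand-in for the queue table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE handoff_queue (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               booking_ref TEXT NOT NULL,
               queue TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    return conn


def enqueue(conn: sqlite3.Connection, booking_ref: str, queue: str) -> int:
    """What the agent's tool call does: one insert, nothing else."""
    cur = conn.execute(
        "INSERT INTO handoff_queue (booking_ref, queue) VALUES (?, ?)",
        (booking_ref, queue),
    )
    conn.commit()
    return cur.lastrowid


def claim_pending(conn: sqlite3.Connection, limit: int = 50) -> List[Tuple]:
    """One poll cycle: take pending tickets oldest-first and mark them pushed."""
    rows = conn.execute(
        "SELECT id, booking_ref, queue FROM handoff_queue "
        "WHERE status = 'pending' ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    if rows:
        conn.executemany(
            "UPDATE handoff_queue SET status = 'pushed' WHERE id = ?",
            [(row[0],) for row in rows],
        )
        conn.commit()
    return rows
```

A worker would call `claim_pending` on a five-second timer and push each returned row to the UI. With more than one worker on a real Postgres table, the usual way to avoid two workers claiming the same row is `SELECT ... FOR UPDATE SKIP LOCKED` inside a transaction; the single-connection sketch above does not need it.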
What's in the handoff card
The duty manager does not want a transcript. She wants the next action.
So the card contains, in this order: caller name, return phone number, PNR or booking reference, departure date and time (large, red if within 6 hours), the change requested in one sentence, the agent's read of why in one sentence, and a deeplink that opens Smartpoint pre-loaded on the right PNR. No transcript on the card. The full transcript is one click away, and in five months the duty managers have opened it on roughly one ticket in twenty.
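As a data structure, the card is seven fields in a fixed order plus one display rule. The sketch below is illustrative Python: the field names, the text rendering, and the `[RED]` marker are our assumptions for this example, and the deeplink is left as an opaque string because its real format is not something we want to guess at here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Departures this close are shown in red on the card.
RED_THRESHOLD = timedelta(hours=6)


@dataclass(frozen=True)
class HandoffCard:
    caller_name: str
    callback_number: str
    booking_ref: str          # PNR or booking reference
    departure: datetime
    change_requested: str     # one sentence
    agent_reason: str         # the agent's read of why, one sentence
    smartpoint_link: str      # opaque placeholder for the deeplink

    def is_red(self, now: datetime) -> bool:
        """True when departure is within six hours of `now`."""
        return self.departure - now <= RED_THRESHOLD

    def render(self, now: datetime) -> List[str]:
        """Return the card lines in display order. No transcript."""
        departure_text = self.departure.strftime("%Y-%m-%d %H:%M")
        if self.is_red(now):
            departure_text += " [RED]"
        return [
            f"Caller: {self.caller_name}",
            f"Call back: {self.callback_number}",
            f"Booking: {self.booking_ref}",
            f"Departure: {departure_text}",
            f"Change: {self.change_requested}",
            f"Why: {self.agent_reason}",
            f"Open in Smartpoint: {self.smartpoint_link}",
        ]
```

Freezing the dataclass is deliberate: once a card is on the duty manager's screen, nothing upstream should be able to change what it says.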
This was the biggest design fight on the project. Engineers want the transcript visible. Operations people want the action visible. Build for operations. The transcript exists for the bad day when something gets challenged. On a normal night nobody reads it.
Conversation design, in brief
The agent only asks for what it needs. It does not make small talk, ask how the caller is doing, or recap the weather. The first turn is: "Goedenavond, ABN-reizen, ik help u met uw boeking. Wat is uw boekingsnummer?" ("Good evening, ABN-reizen, I will help you with your booking. What is your booking number?") If the caller responds in German or English, the next turn switches language. If language detection is uncertain on the first utterance, it asks once, then commits.
It repeats every PNR back as both letters and digits, NATO-style, and only proceeds on explicit caller confirmation. Unconfirmed PNRs after two attempts route straight to a human. When asked if it is a person, it says it is a virtual assistant. When asked to promise anything irreversible, it says the duty manager will call back within fifteen minutes to confirm. That sentence appears at the end of every late-departure call, without exception.
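The read-back and the two-attempt limit can be sketched as follows. This is a Python illustration under stated assumptions: it uses the standard ICAO/NATO code words and English digit names, whereas the live agent speaks Dutch, German, and English, and the outcome labels are hypothetical.

```python
NATO = {
    "A": "Alfa", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliett",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}
DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}
MAX_ATTEMPTS = 2  # unconfirmed after two read-backs goes to a human


def spell_pnr(pnr: str) -> str:
    """Spell a PNR character by character for text-to-speech."""
    words = []
    for ch in pnr.upper():
        if ch in NATO:
            words.append(NATO[ch])
        elif ch in DIGITS:
            words.append(DIGITS[ch])
        else:
            raise ValueError(f"unexpected character in PNR: {ch!r}")
    return ", ".join(words)


def after_readback(confirmed: bool, attempts_so_far: int) -> str:
    """Decide the next step after a read-back.

    `attempts_so_far` includes the read-back that was just made.
    """
    if confirmed:
        return "proceed"
    if attempts_so_far >= MAX_ATTEMPTS:
        return "route_to_human"
    return "ask_again"
```

For example, `spell_pnr("k7x2qb")` returns `"Kilo, seven, X-ray, two, Quebec, Bravo"`. The attempt counter lives outside the model, so a caller cannot talk the agent into a third try.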
What the agent refuses to do
We wrote these into the system prompt and into a guardrail layer that runs after each model turn. The agent will not take card payments: the voice channel is not PCI-scoped and we are not interested in changing that. It will not promise a goodwill credit, voucher, or refund value; it can acknowledge the request and route it. It will not confirm a change as booked. It says, every time: "I have placed your request with our duty manager and she will call you back to confirm within fifteen minutes." It will not touch a booking flagged as group travel or a corporate contract; those go straight to a human with no model in the loop.
The last rule surprised the client. We argued for it because group-travel rebooking logic is genuinely beyond what a model should be making judgement calls on. The cost of being wrong on a 40-pax school trip is too high for the upside of automating it.
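A deterministic check of this kind might look like the sketch below. It is a simplified Python illustration, not our guardrail layer: the action labels, the `Booking` fields, and the three outcomes are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Booking:
    is_group: bool = False
    is_corporate: bool = False


# Hypothetical labels for things the agent must never do itself.
REFUSED_ACTIONS = frozenset({
    "take_card_payment",
    "promise_goodwill_credit",
    "promise_voucher",
    "promise_refund_value",
    "confirm_change_as_booked",
})


def guardrail(booking: Booking, proposed_action: str) -> str:
    """Return 'transfer_to_human', 'acknowledge_and_route', or 'allow'."""
    # Group and corporate bookings bypass the model entirely,
    # whatever action was proposed.
    if booking.is_group or booking.is_corporate:
        return "transfer_to_human"
    # Refused actions are acknowledged and routed, never performed.
    if proposed_action in REFUSED_ACTIONS:
        return "acknowledge_and_route"
    return "allow"
```

The ordering matters: the booking flags are checked first, so a group booking is transferred even when the proposed action would otherwise be harmless.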
Numbers, five months in
The agent handles 1,820 rebooking and information requests a week. Of those, 62% are resolved without a human ever touching the call: name corrections, itinerary lookups, baggage upgrades, day-of check-in questions, voucher status checks. Another 26% are routed to the back-office queue for next-morning handling (changes more than 24 hours out, refund decisions, group bookings). The remaining 12% land in the duty-manager queue within the 60-second window.
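Turned into absolute weekly volumes, the split works out as follows. The snippet only does the arithmetic on the figures above; the key names are ours, and the counts are rounded to whole requests, so they can differ from the total by one.

```python
from typing import Dict

WEEKLY_REQUESTS = 1820
SHARES_PCT = {
    "resolved_without_human": 62,
    "back_office_queue": 26,
    "duty_manager_queue": 12,
}


def weekly_volumes(total: int, shares_pct: Dict[str, int]) -> Dict[str, int]:
    """Convert percentage shares into rounded weekly request counts."""
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must add up to 100%")
    return {name: round(total * pct / 100) for name, pct in shares_pct.items()}


volumes = weekly_volumes(WEEKLY_REQUESTS, SHARES_PCT)
# resolved_without_human: 1128, back_office_queue: 473, duty_manager_queue: 218
```

So roughly 1,128 requests a week never reach a person, about 473 wait for the morning, and about 218 go to the duty manager the same night.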
The 23:00 cutoff has been missed once in twenty-two operational weeks. Once, on a night when an air-traffic control strike at Schiphol stacked 84 tickets and the duty managers needed an extra hour they did not have. The agency now has a documented policy for declaring an "incident night" and reporting the next morning under a different SGR clause. That policy did not exist before March, because the problem had never been quantified clearly enough to justify writing it down.
Average handle time for resolved-without-human calls is 2 minutes 11 seconds. The agency's old phone tree, when the duty manager answered, averaged 4 minutes 30 seconds. The agent is not faster than a good human. It is faster than a tired human at 22:30, and there is no shift change at 22:30.
What we would do differently
One thing. We built the agent's read-path against Galileo by scripting Smartpoint through a Windows VM, because the API tier was not available. It works. It is also the most fragile part of the system. Every time Travelport ships a Smartpoint update we hold our breath. If you are building voice agents against a GDS, get the API tier before you start. The cost of the tier is less than the cost of one Sunday night spent re-pinning XPath selectors.
When we built this agent for the Maastricht travel agency, the thing we kept running into was that the customers were not the problem. The legacy stack was not the problem either. The problem was designing a handoff the duty manager actually trusted at 22:50 on a Tuesday. We solved it by stripping the duty-manager card down to one action and one number. If you are scoping a voice agent for an operations-heavy business, start there.
Spend twenty minutes tomorrow morning timing your duty manager's actual after-hours calls. Not estimating. Timing. The gap between what your operations team thinks the queue looks like and what it actually looks like is where the agent goes.
Key takeaway
Build voice agents against the legacy stack you actually have, and put a human between the agent and any irreversible action.
FAQ
Why not just upgrade SQL Server 2012 first?
Because the after-hours phone problem was bleeding the duty managers every night and the SQL Server migration is a four-month project. We solved the urgent problem against the stack as it exists. The migration is on the roadmap.
How does the agent know a departure is within 24 hours?
It reads the departure timestamp from the SQL Server package-tour record, or from the Galileo PNR for flight-only bookings, and compares it to the call-start time. The 24-hour rule is hard-coded routing logic, not a model judgement.
What stops the agent mis-hearing a PNR and routing the wrong booking?
It reads every PNR back NATO-style (letters and digits) and only proceeds on explicit caller confirmation. Two failed confirmation attempts route the call straight to a human with no booking change attempted.
Can it handle Dutch, German and English on the same call?
Yes. Language detection runs on every utterance and the TTS voice switches per turn. Most callers stay in one language, but Limburg travellers sometimes flip between Dutch and German mid-sentence and the agent follows.