Voice agents in port logistics: a Rotterdam case study

A Rotterdam port-services operator used to burn four FTE on customs-broker callbacks. A trilingual voice agent now fields 920 calls a week, no BSN ever written to disk.

Jacob Molkenboer · Founder · A Brand New Company · 7 Mar 2025 · 8 min

Tuesday 06:40 in Rotterdam. The dispatch room sits one floor above the truck yard. Three phone lines are already lit when the kettle clicks. Two are German customs brokers asking about CMR numbers from yesterday's load; one is a Polish driver still parked at Maasvlakte II asking which lane he is supposed to be in. The dispatcher has not yet sat down.

This is the call pattern we found when the operator first opened the books to us. A 44-person port-services firm in Rotterdam, handling containerised forwarding and customs brokerage, was burning roughly four full-time desks on inbound callbacks. Brokers, drivers, and terminal operators wanted the same three things over and over: a CMR status, a confirmed lane, and an ETA for the box. More than ninety percent of those calls had a deterministic answer sitting in the operator's Cargonaut feed already. The humans were a relay.

Ninety days ago we shipped a voice agent that fields those calls in Dutch, German, and English. It now handles around 920 callbacks a week, escalates four percent to a human, and never writes a Dutch BSN to disk. This is what the build actually looked like.

The work that used to fill four desks

The intake mix was lopsided. Looking at the prior six months of call logs:

  • About 58 percent of inbound calls were CMR status lookups: where is consignment NL12345678, is it cleared, has it been released for pickup.
  • About 23 percent were lane and gate questions: which Maasvlakte berth, which RTM terminal, which slot window.
  • About 12 percent were ETA queries: the broker in Bremen wants to know when we will physically have it.
  • The remaining seven percent was the long tail: dangerous-goods paperwork, T1 transit questions, invoicing disputes, the occasional human who needed a human.

Language split, weighted by call volume: Dutch 60 percent, German 25 percent, English 15 percent. The German callers were almost entirely customs brokers in the Hamburg, Bremen, and Rhine corridor. The English speakers were a mix of UK exporters and shipping-line desks in Singapore and Dubai. The Dutch callers were everyone else: drivers, internal staff, Belgian forwarders, neighbouring operators.

Every one of those CMR, lane, and ETA answers lived in the operator's Cargonaut feed already. In this operator's setup, that feed carries the customs status, terminal events, and pre-notification messages that move a container from quay to truck. The data was there. The bottleneck was that a human had to read a screen out loud, hundreds of times a day, in three languages.

Why voice instead of a chat agent

We pushed hard for a chat-first approach in the first scoping call. It was the wrong instinct. The callers were not at desks. They were in truck cabs, at terminal gates, on a forklift platform with one earpiece in. They called because the alternative was pulling over and typing. Email was already an option; the volume on the inbound line told us why nobody used it.

Latency was the other reason. A broker chasing a release window has thirty seconds before the terminal moves on. A "we will email you the status in five minutes" response, however polite, costs the operator a lane slot. Voice was the only channel that returned answers fast enough to matter.

Architecture, in plain terms

The agent sits behind the operator's existing SIP trunk. Inbound calls hit the PBX as they always did; we added a new dial-plan rule that routes any call from a known broker number (matched against the CRM) to the agent's SIP endpoint instead of the human queue. Anything unmatched still reaches a human first, and the human can transfer to the agent if the question turns out to be routine.
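The routing rule itself is simple enough to write down. What follows is a minimal sketch, not the production dial plan: the real rule lives in the PBX, and `KNOWN_BROKER_NUMBERS`, `normaliseMsisdn`, `routeInboundCall`, and the two destination names are hypothetical stand-ins. It assumes the CRM can export broker numbers as E.164 strings.

```typescript
// Hypothetical sketch of the dial-plan decision; the real rule lives in the PBX.
type Destination = "agent_sip_endpoint" | "human_queue";

// Assumption: broker numbers are exported from the CRM as E.164 strings.
// The numbers below are placeholders, not real brokers.
const KNOWN_BROKER_NUMBERS: ReadonlySet<string> = new Set([
  "+4940555010",
  "+4942155502",
]);

// Strip spaces, dashes, and parentheses so formatting differences do not cause a miss.
function normaliseMsisdn(raw: string): string {
  return raw.replace(/[\s\-()]/g, "");
}

// Known broker numbers go to the agent; everything else reaches a human first.
function routeInboundCall(callerMsisdn: string): Destination {
  return KNOWN_BROKER_NUMBERS.has(normaliseMsisdn(callerMsisdn))
    ? "agent_sip_endpoint"
    : "human_queue";
}
```

The default branch is the important one: an unmatched number falls through to the human queue, so a gap in the CRM costs a dispatcher a call rather than leaving a caller stuck with the agent.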

The agent itself is a thin orchestration layer around three tools: a Cargonaut lookup, a CRM lookup, and a structured handoff to the human queue. The model handles language detection, intent parsing, and the readback. Everything stateful sits outside the model.

Here is roughly what the Cargonaut tool definition looks like, lightly anonymised:

const tools = [
  {
    name: "cmr_status",
    description:
      "Fetch the current customs and terminal status for a consignment by CMR number.",
    input_schema: {
      type: "object",
      properties: {
        cmr_number: {
          type: "string",
          pattern: "^[A-Z]{2}\\d{8}$",
          description: "ISO country prefix plus 8 digits, e.g. NL12345678",
        },
        caller_msisdn: {
          type: "string",
          description:
            "E.164 caller number, scopes the lookup to the caller's dossier.",
        },
      },
      required: ["cmr_number", "caller_msisdn"],
    },
  },
];

The pattern matters. A regex on the CMR number means the model cannot pass a malformed consignment ID to the tool. It does not stop the model from inventing a well-formed one, so if the broker mumbles the prefix, the agent reads the number back and waits for confirmation before it makes the tool call. We learned that one the expensive way: the first prototype confidently invented an NL number that did not exist, and the broker hung up halfway through the readback.
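The readback-before-lookup rule can be expressed as a small gate in front of the tool call. This is a sketch under assumptions rather than the shipped code: `nextStepForCmr`, the `NextStep` type, and the English readback wording are hypothetical. Only the pattern and the two input field names come from the tool definition above.

```typescript
// Same pattern as the tool schema: two capital letters plus eight digits.
const CMR_PATTERN = /^[A-Z]{2}\d{8}$/;

type NextStep =
  | { action: "reprompt"; reason: string }
  | { action: "readback"; prompt: string }
  | { action: "call_tool"; input: { cmr_number: string; caller_msisdn: string } };

// Hypothetical helper: decide what the agent does with a CMR number it heard.
function nextStepForCmr(
  heard: string,
  callerMsisdn: string,
  callerConfirmed: boolean,
): NextStep {
  // Transcripts often contain spaces and lowercase letters; normalise first.
  const cmr = heard.replace(/\s+/g, "").toUpperCase();
  if (!CMR_PATTERN.test(cmr)) {
    // Malformed input never reaches the tool.
    return { action: "reprompt", reason: "CMR number does not match the expected format" };
  }
  if (!callerConfirmed) {
    // Well-formed but unconfirmed: read it back character by character and wait.
    return { action: "readback", prompt: `I have ${cmr.split("").join(" ")}. Is that correct?` };
  }
  return { action: "call_tool", input: { cmr_number: cmr, caller_msisdn: callerMsisdn } };
}
```

The format check and the confirmation are separate gates on purpose. The regex catches garbage; only the caller's confirmation catches a plausible number that was misheard or made up.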

Keeping BSN off disk

Dutch personal-data rules treat the BSN as a high-sensitivity identifier. You cannot store it casually, you cannot pass it to a third-party processor without an explicit basis, and the Autoriteit Persoonsgegevens has fined operators that got it wrong. Brokers, however, sometimes recite a driver's BSN on the phone to verify identity at the gate. We had to assume it would land in the audio stream and design around that.

The pipeline does three things to keep BSN off disk:

  1. The transcription stream runs through a redactor before anything is logged. A nine-digit numeric token that matches the BSN checksum (the 11-test) is replaced with [BSN_REDACTED] in the transcript before the line is written.
  2. The raw audio is held in a volatile in-memory buffer for the length of the call, then dropped. Nothing is persisted beyond the redacted transcript and the structured tool calls.
  3. The model provider is contractually bound to a zero-retention setup. No prompt logging, no completion logging, no training on the data. This was the gating decision for the vendor choice.
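For readers who have not met the 11-test, step 1 can be sketched in a few lines. This is an illustration, not the production redactor: `passesElevenTest` and `redactBsn` are hypothetical names, and the sketch assumes the transcriber emits a spoken BSN as a single nine-digit token. Digits transcribed with spaces between them would need normalising first.

```typescript
// The BSN 11-test: weight the nine digits 9, 8, 7, 6, 5, 4, 3, 2, -1
// and require the weighted sum to be divisible by 11.
function passesElevenTest(token: string): boolean {
  if (!/^\d{9}$/.test(token)) return false;
  const sum = token.split("").reduce((acc, ch, i) => {
    const weight = i === 8 ? -1 : 9 - i;
    return acc + Number(ch) * weight;
  }, 0);
  // Note: 000000000 also passes. For a redactor, over-redacting is the safe side.
  return sum % 11 === 0;
}

// Replace every nine-digit token that passes the 11-test before the line is logged.
function redactBsn(line: string): string {
  return line.replace(/\b\d{9}\b/g, (token) =>
    passesElevenTest(token) ? "[BSN_REDACTED]" : token,
  );
}
```

A nine-digit number that fails the checksum is left alone, which keeps order references readable in the transcript. A CMR number such as NL12345678 is never touched, because its digit run is only eight long.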
Warning

Recent vendor moves to mandatory 30-day data retention break this design entirely. If you operate under GDPR and your callflow touches a BSN, an IBAN, or a customs declaration, read the data-retention clause before you read the pricing page.

This is not theoretical. There has been a steady stream of model vendors quietly tightening their retention policies for abuse monitoring; some of them now require a 30-day window even for paid API customers. For an internal coding assistant that is fine. For a Dutch port operator whose call audio can contain a BSN, it is a non-starter. We spent two of the first three scoping weeks negotiating zero-retention terms in writing.

What broke in week one

The first week of live calls produced a list of failure modes that did not show up in any of our test scripts.

German prosody

German callers compose long sentences with the verb at the end. "Ich rufe Sie an wegen der Containerlieferung, die ich gestern angemeldet habe, und Ihr System sagt, der Container ist noch im Terminal." ("I am calling about the container delivery I registered yesterday, and your system says the container is still in the terminal.") The model wanted to interject at the first comma. We pushed the silence threshold for de-DE up by 600 milliseconds and the agent stopped barging in. Dutch and English thresholds stayed put.
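In configuration terms, the change is one per-locale offset. The sketch below is illustrative only: the 700 ms baseline is an assumption made for the example, and `silenceThresholdMs`, `mayTakeTurn`, and the locale codes other than de-DE are hypothetical. Only the 600 ms German delta comes from the build.

```typescript
type Locale = "nl-NL" | "de-DE" | "en-GB";

// Assumption: illustrative baseline; only the German delta is taken from the case study.
const BASE_SILENCE_MS = 700;

const LOCALE_OFFSET_MS: Record<Locale, number> = {
  "nl-NL": 0,
  "de-DE": 600, // wait longer so the sentence-final verb can arrive
  "en-GB": 0,
};

function silenceThresholdMs(locale: Locale): number {
  return BASE_SILENCE_MS + LOCALE_OFFSET_MS[locale];
}

// The agent may take its turn only once the caller has been silent long enough.
function mayTakeTurn(locale: Locale, silenceMs: number): boolean {
  return silenceMs >= silenceThresholdMs(locale);
}
```

With these example values, a 900 ms pause hands the turn to the agent on a Dutch call but not on a German one.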

Port slang

Rotterdam dispatchers use vocabulary the model had not seen at full confidence. "Loods" can mean warehouse or pilot, depending on context. "Het schap" is a berth. "De los" is the discharge slot. We built a small in-context glossary, injected at the system-prompt level for calls flagged as port-domain, and that fixed most of it. We did not fine-tune. A glossary at runtime was cheaper and easier to audit.
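A runtime glossary of this kind is little more than string assembly. The following is a minimal sketch, assuming a plain-text system prompt and a boolean port-domain flag; `PORT_GLOSSARY`, `buildSystemPrompt`, and the "Port vocabulary" heading are hypothetical, and the three entries are the ones named above.

```typescript
interface GlossaryEntry {
  term: string;
  meaning: string;
}

// The three terms mentioned in the text; a real glossary would be longer.
const PORT_GLOSSARY: GlossaryEntry[] = [
  { term: "loods", meaning: "warehouse or pilot, depending on context" },
  { term: "het schap", meaning: "a berth" },
  { term: "de los", meaning: "the discharge slot" },
];

// Append the glossary only for calls flagged as port-domain.
function buildSystemPrompt(basePrompt: string, portDomain: boolean): string {
  if (!portDomain) return basePrompt;
  const lines = PORT_GLOSSARY.map((e) => `- "${e.term}": ${e.meaning}`);
  return `${basePrompt}\n\nPort vocabulary:\n${lines.join("\n")}`;
}
```

Because the glossary is ordinary data, a dispatcher can review it line by line, which is the auditability argument for not fine-tuning.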

Agent over-confidence

This was the embarrassing one. On day three, the agent told a broker that a consignment had been released for pickup when in fact it had only cleared customs; the terminal had not yet given the green light. The broker dispatched a truck for nothing. We tightened the tool-response parser so the agent now distinguishes between customs_cleared and terminal_released as separate states, and refuses to collapse them. We also added a supervisor pass on the first hundred calls per day for the next two weeks, which caught two more borderline misreads before they reached a customer.
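The fix is easiest to see as types. This is a sketch of the idea rather than the actual parser: `parseState`, `readyForPickup`, and `readback` are hypothetical names and the readback wording is invented for the example. The two state names are the ones the agent now keeps apart.

```typescript
// The two states named above, kept as distinct literals.
type ConsignmentState = "customs_cleared" | "terminal_released";

// Hypothetical parser: anything other than a known state is rejected, never guessed.
function parseState(raw: string): ConsignmentState {
  if (raw === "customs_cleared" || raw === "terminal_released") return raw;
  throw new Error(`Unrecognised consignment state: ${raw}`);
}

// Only terminal_released may be read back as ready for pickup.
function readyForPickup(state: ConsignmentState): boolean {
  return state === "terminal_released";
}

function readback(state: ConsignmentState): string {
  switch (state) {
    case "customs_cleared":
      return "Customs has cleared the consignment, but the terminal has not released it for pickup yet.";
    case "terminal_released":
      return "The terminal has released the consignment for pickup.";
  }
}
```

Throwing on an unknown state is deliberate. A value the parser does not recognise should end in a handoff to a human, not in the nearest plausible sentence.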

If you have followed the Hacker News thread this week about a coding agent that started deleting unrelated files in a Fedora setup, the lesson is the same one we learned on day three. An autonomous agent in an operational loop needs a tight, narrowly typed contract with the world and a human paid to read the first hundred outputs. The damage in our case was a wasted truck run. In other settings it has been worse.

Numbers after ninety days

The operator has had the agent in production for just over three months. The numbers, current as of last week:

  • 920 inbound calls per week handled end-to-end without human involvement, against an average of 1,020 total inbound calls per week.
  • 4.2 percent of calls escalated to a human, mostly dangerous-goods paperwork and invoicing disputes.
  • Median time-to-answer dropped from 23 minutes (the old voicemail-and-callback loop) to 11 seconds.
  • Two of the four customs-callback desks have been redeployed to active forwarding and customs-broker work. The other two cover escalations and night shifts.
  • Zero BSN values written to any persistent store, verified by a weekly scan of the transcript archive for nine-digit tokens that pass the BSN checksum.

The cost shape is honest: the agent is not free, the SIP minutes are not free, and the integration work was a six-week build. Payback came in month three, which was faster than we forecast and slower than the sales pitch would have promised.

What we would do differently

Two things, in retrospect.

We over-engineered the IVR fallback. We built a careful three-level menu for callers who wanted a human, and it turned out that one option ("press 9 to speak to someone") would have been enough. The 4 percent of callers who escalate do so because the agent told them to, not because they fought the menu.

We under-engineered the early-morning German routing. German brokers call at 06:30 local Rotterdam time, which is 06:30 in Hamburg too, and we had the agent come online at 07:00 in line with the rest of the operator's shift pattern. That cost us three weeks of irritated broker emails before we caught it. The fix was one cron line.

The five-minute audit you can run today

If you run a callback-heavy operation, you can sanity-check the case for a voice agent without writing a single line of code. Pull last week's inbound call log. Tag each call with the question being asked, not the caller. Tally the top three questions. If the top three account for more than half your volume, and the answers all live in a system you already operate, you have a voice-agent-shaped problem.
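For anyone who would rather script the tally than do it by hand, the arithmetic fits in a dozen lines. This is a sketch under assumptions: the input is one hand-assigned question tag per call, and `topThreeShare` and `looksVoiceAgentShaped` are hypothetical names.

```typescript
// Hypothetical input: one tagged question per inbound call from last week's log.
function topThreeShare(tags: string[]): { top: [string, number][]; share: number } {
  const counts = new Map<string, number>();
  for (const tag of tags) {
    counts.set(tag, (counts.get(tag) ?? 0) + 1);
  }
  // Sort questions by call count, highest first, and keep the top three.
  const top = Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, 3);
  const covered = top.reduce((sum, [, n]) => sum + n, 0);
  return { top, share: tags.length === 0 ? 0 : covered / tags.length };
}

// The volume half of the test: do the top three questions cover more than half the calls?
function looksVoiceAgentShaped(tags: string[]): boolean {
  return topThreeShare(tags).share > 0.5;
}
```

The script only answers the volume half of the question. Whether the answers already live in a system you operate is still a judgment call.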

When we built this voice agent for the Rotterdam operator, the thing we kept tripping on was the BSN-on-the-wire problem: the redaction had to happen upstream of every log line, every audit trail, every observability hook. We ended up solving it by running the redactor inside the same process as the transcription stream, before the line ever left memory. That work is the kind of thing we do under our AI agents practice, and it tends to be the part the demos skip.

Key takeaway

Voice agents pay off when callers have one free hand, thirty seconds of patience, and three repeatable questions sitting in a system you already operate.

FAQ

Why a voice agent rather than a chat agent for customs callbacks?

Customs brokers and drivers call from cabs, gates, and dock platforms with one free hand. They had email already and were not using it. Voice was the only channel that returned answers fast enough to keep a lane slot.

How does the agent keep the Dutch BSN off disk?

Inline redaction inside the transcription process replaces any nine-digit token that passes the BSN 11-test with a placeholder before logging. Audio sits in a volatile buffer and is dropped after the call. The vendor contract is zero-retention.

What percentage of calls still go to a human?

About 4.2 percent over the first 90 days. Most escalations are dangerous-goods paperwork, T1 transit edge cases, and invoicing disputes. Status lookups, lane confirmations, and ETA queries are handled end-to-end.

Does the agent integrate with Cargonaut and Portbase data?

Yes. It calls a thin Cargonaut lookup tool that returns customs and terminal status by CMR number, scoped to the caller's dossier via the CRM. The model never sees raw feed payloads; it sees a normalised status object.

How long did the build take from scope to production?

Six weeks of build, then two weeks of supervised live calls before the supervisor pass was lifted. Payback for the operator landed in month three, against an internal forecast of month four.

voice agents · ai agents · automation · case study · integrations · operations
