Voice agent for binnenvaart: a 35-second escalation playbook

A skipper called in at 03:12 with a propane vent alarm. The voice agent had 35 seconds to wake a captain before the IVS90 30-minute clock started ticking.

Jacob Molkenboer · Founder · A Brand New Company · 8 May 2026 · 9 min

The 03:12 call

The number is painted on the side of the wheelhouse, weather-faded but legible. At 03:12 on a February Tuesday the skipper of an 86-meter type-C tanker punches it in. The wheelhouse smells of propane. He doesn't say "I have an ADN class 2 incident." He says, in slow Dordts: "Er zit gas in de stuurhut, ik weet niet of het van mij of van de buurman komt." ("There's gas in the wheelhouse, and I don't know whether it's coming from my ship or the neighbour's.")

The voice agent has 35 seconds before this becomes a captain's problem, and 30 minutes before it becomes Rijkswaterstaat's problem. This post is the playbook we used to make those two clocks fit inside each other.

What we walked into

The rederij (shipping company) runs 28 people on land and a fleet of 19 hulls between Antwerp and the Ruhr. Their dispatch stack, in 2026:

  • Autena Maritime vlootbeheer, on-premise, version 4.x, last meaningfully upgraded in 2012. Exposes a SOAP 1.1 API behind a 100Mbit fiber and Basic Auth.
  • A homegrown sluis-planning tool built on Exchange 2019 calendar items. Each lock booking is a calendar event in a shared mailbox; the subject line carries the lock code, the body carries the cargo string.
  • A dispatch board running on a 32-inch touch screen, driven by an Excel workbook and a WinSCP cron job.
  • WhatsApp groups by route. Marifoon for everything safety-critical.
  • An emergency line that rang on a Polycom desk phone in the planning room. After hours, forwarded to whichever planner was on call, who answered with a Nokia 6310 he refused to upgrade.

1,540 skipper-meldingen (skipper reports) a week were running through that desk phone and those WhatsApp groups. Roughly 4% were time-critical. The planners had no way to triage faster than "pick up, listen, write it down, walk over to the board." We were brought in to take the non-critical 96% off their hands without breaking the 4% that mattered.

Why voice, not chat

You can argue chat for almost any operations agent. We tried. Skippers don't type while sailing the Oude Maas. They speak. They speak with diesel running in the background, with wind in the mic, with one hand on the helm. So voice. We use a Dutch-Vlaams ASR with a maritime keyword bias list (lock codes, ADN classes, the names of the 19 hulls, the eleven sluizen between Lobith and Dordrecht). The keyword bias does more work than the model swap did.

The 35-second budget

Thirty-five seconds is the internal SLA from "ring received" to "captain phone vibrating." It is not the legal clock — that one is 30 minutes — but it is the only clock that buys the captain time to do the legal paperwork properly. Anatomy:

  • 0–2s: greeting in Dutch, barge-in window open so the skipper can interrupt.
  • 2–12s: intent capture. Free-form. We do not menu them.
  • 12–18s: ADN class detection. Cheap regex first (keywords like gas, lek, damp, klasse 2), then a small classifier as a second opinion.
  • 18–25s: load lookup in Autena to confirm what the boat is actually carrying.
  • 25–30s: lock window check in Exchange so the agent knows where on the route the boat is.
  • 30–35s: route. Class 2 to the captain queue, all other ADN classes to the planner queue, everything else stays with the bot.
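The regex pass in the 12–18s step can be sketched roughly like this. The keyword list contains only the examples named above (gas, lek, damp, klasse 2); the function name and return shape are assumptions for illustration, and a production list would be longer.

```typescript
// First-pass detector for suspected ADN class 2 (gases).
// Keywords are the examples from the budget above; the real list is assumed to be longer.
const CLASS2_PATTERN = /\b(gas|lek|damp|klasse\s*2)\b/i

interface RegexVerdict {
  suspected: boolean
  keyword: string | null // the keyword that matched, kept for the decision log
}

function detectClass2(transcript: string): RegexVerdict {
  const match = CLASS2_PATTERN.exec(transcript)
  if (!match) return { suspected: false, keyword: null }
  return { suspected: true, keyword: (match[1] ?? '').toLowerCase() }
}
```

The word boundaries keep "lek" from firing on words that merely contain it, but they also mean Dutch compounds such as "gaslek" only match if they are listed as their own keyword. That gap is one reason to keep the classifier as a second opinion.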

The Autena and Exchange lookups run in parallel. If Autena is down, the agent assumes the boat is loaded with the last known manifest, escalates anyway, and flags the staleness to the captain. We chose false positives over false negatives by a comfortable margin.
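A minimal sketch of that behaviour, assuming the two lookups are plain async functions. `fetchManifest`, `fetchLockWindow` and `lastKnownManifest` are hypothetical stand-ins for the Autena adapter, the Exchange reader and the cache; none of these names come from the production code.

```typescript
interface Manifest { adnClasses: string[]; stale: boolean }
interface LockWindow { lockCode: string; eta: string }

interface LookupResult {
  manifest: Manifest | null     // null only if Autena failed and nothing was cached
  lockWindow: LockWindow | null // null if the Exchange lookup failed
}

async function runLookups(
  barge: string,
  fetchManifest: (barge: string) => Promise<Manifest>,
  fetchLockWindow: (barge: string) => Promise<LockWindow>,
  lastKnownManifest: (barge: string) => Manifest | null,
): Promise<LookupResult> {
  // Start both lookups before awaiting either, so they overlap in time.
  const manifestP = fetchManifest(barge).catch((): Manifest | null => {
    // Autena is down: fall back to the last known manifest and mark it stale.
    const cached = lastKnownManifest(barge)
    return cached ? { ...cached, stale: true } : null
  })
  const lockP = fetchLockWindow(barge).catch((): LockWindow | null => null)

  const [manifest, lockWindow] = await Promise.all([manifestP, lockP])
  return { manifest, lockWindow }
}
```

Neither failure throws: the caller always gets a result it can route on, and the `stale` flag is what lets the agent tell the captain the manifest may be out of date.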

Warning

Your internal SLA is not the legal clock. The Rijkswaterstaat IVS90 meldplicht is the legal clock, and it starts when the skipper notices the incident, not when your agent picks up. If you brag about 35 seconds, brag about how many of the remaining 29 minutes 25 seconds you bought the captain.

The Autena adapter

Autena Maritime exposes a SOAP 1.1 service. No WS-Security, Basic Auth over plaintext on the LAN, MTOM for cargo attachments. The WSDL lies about array cardinality: when there is one cargo, you get a single object; when there are several, you get an array. Fourteen-year-old SOAP services do this a lot. We wrap it in a thin Node service and treat the wrapper as the source of truth.

import { createClientAsync, BasicAuthSecurity } from 'soap'

const WSDL = process.env.AUTENA_WSDL!

export async function getLoadManifest(barge: string, signal: AbortSignal) {
  const client = await createClientAsync(WSDL, {
    wsdl_options: { timeout: 3000 },
    forceSoap12Headers: false,
  })
  client.setSecurity(new BasicAuthSecurity(
    process.env.AUTENA_USER!, process.env.AUTENA_PASS!
  ))

  const [res] = await client.GetLoadManifestAsync(
    { bargeCode: barge },
    { timeout: 3000, signal }
  )

  // WSDL says Cargo: Cargo[]; reality returns a single object when
  // the boat carries one cargo type. Normalise.
  const raw = res?.GetLoadManifestResult?.Cargo
  const cargoes = Array.isArray(raw) ? raw : raw ? [raw] : []

  return cargoes.map(c => ({
    adnClass: String(c.ADNClass ?? '').trim(),
    unNumber: String(c.UNNumber ?? '').trim(),
    tonnage: Number(c.Tonnage ?? 0),
  }))
}

The 3-second timeout matters. Inside the 35-second budget we have a 7-second window for the lookup. We give Autena three seconds, retry once, and if both fail we fall through to the last cached manifest in Redis. The cache is invalidated when the Autena laadbon webhook fires; if you don't have a webhook, poll the load journal every 60 seconds and accept the staleness.
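The timeout, single retry and cache fallback described above fit in a small wrapper. This is a sketch under assumptions: `lookup` and `readCache` are hypothetical callbacks (in this setup they would be the Autena call and a Redis read), and the helper names are ours. Two 3-second attempts use 6 of the 7 seconds, leaving about a second for the cache read.

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
// Note: the underlying work is not cancelled, only abandoned.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timeout after ${ms}ms`)), ms)
    work.then(
      value => { clearTimeout(timer); resolve(value) },
      error => { clearTimeout(timer); reject(error) },
    )
  })
}

interface Sourced<T> { value: T; source: 'live' | 'cache' }

async function lookupWithFallback<T>(
  lookup: () => Promise<T>,
  readCache: () => Promise<T | null>,
  timeoutMs = 3000,
  attempts = 2, // first try plus one retry
): Promise<Sourced<T>> {
  for (let i = 0; i < attempts; i++) {
    try {
      return { value: await withTimeout(lookup(), timeoutMs), source: 'live' }
    } catch {
      // Timed out or failed: try again, or fall through to the cache.
    }
  }
  const cached = await readCache()
  if (cached === null) throw new Error('lookup failed and no cached value is available')
  return { value: cached, source: 'cache' }
}
```

Returning the `source` alongside the value is what makes the staleness visible downstream; a bare manifest would hide whether it came from Autena or from the cache.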

The Exchange 2019 sluis-planning sync

The planning lives in calendar items on a shared Exchange 2019 mailbox. We touch it via EWS (Exchange Web Services), which Microsoft has been retiring for years but which still works on Exchange Server on-prem in 2026. Microsoft's EWS managed API documentation is still the canonical reference for the protocol; treat it as the contract.

The subject line is a strict format: SLZ-<lockcode>-<ETA HH:mm>-<bargeCode>. The body is a free-text cargo description, which is where the human planners get creative — abbreviations, typos, the occasional "check met Marco" note. We parse the subject with a strict regex (hard fail on malformed) and the body with a small LLM call that returns null when it isn't sure. Null is fine. The agent can still escalate without a parsed cargo string; the captain has the manifest.
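For illustration, a strict subject parser could look like the sketch below. The character sets are assumptions: the format above does not say what a lock code or barge code may contain, so this version accepts upper-case letters and digits only and rejects everything else.

```typescript
interface LockBooking {
  lockCode: string
  eta: string // "HH:mm", 24-hour clock
  bargeCode: string
}

// SLZ-<lockcode>-<ETA HH:mm>-<bargeCode>
// Assumption: lock and barge codes are upper-case alphanumerics without hyphens.
const SUBJECT = /^SLZ-([A-Z0-9]+)-((?:[01]\d|2[0-3]):[0-5]\d)-([A-Z0-9]+)$/

function parseSubject(subject: string): LockBooking {
  const match = SUBJECT.exec(subject)
  if (!match) {
    // Hard fail: a malformed subject is a planning error, not something to guess at.
    throw new Error(`malformed sluis-planning subject: ${JSON.stringify(subject)}`)
  }
  const [, lockCode = '', eta = '', bargeCode = ''] = match
  return { lockCode, eta, bargeCode }
}
```

With hypothetical codes, `parseSubject('SLZ-VOLK-04:30-MB17')` yields lock code `VOLK`, ETA `04:30` and barge code `MB17`, while `SLZ-VOLK-25:00-MB17` throws because 25:00 is not a valid time.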

Read-only is non-negotiable. The agent never writes to the planning mailbox. If the planners catch a single ghost calendar event created by a bot, the whole project loses trust. Read, log, escalate — that is the contract.

The escalation tree

Three buckets. The taxonomy is the agent.

  • Captain queue. ADN class 2 (gases) confirmed, or class 2 suspected and the load lookup failed. The on-call captain's phone rings. If no answer in 90 seconds, the second captain's phone rings. If no answer in 180 seconds, the agent escalates to the technical director's mobile and texts both captains in parallel.
  • Planner queue. Any other ADN class, schedule slips, lock conflicts, mechanical reports that aren't safety-critical. Goes to the planner on call. SLA: 4 minutes.
  • Bot. ETA updates, lock-window rebooks, cargo confirmations, simple availability questions. The agent handles these end-to-end and writes a single line back to the dispatch board.
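Read literally, the three buckets reduce to a small routing function. The input shape below is an assumption, and so is one edge case: the tree above does not say where a suspected class 2 goes when the load lookup succeeded without confirming it, so this sketch hands it to a human planner rather than the bot.

```typescript
type Queue = 'captain' | 'planner' | 'bot'

interface Triage {
  class2: 'confirmed' | 'suspected' | 'no'
  loadLookupFailed: boolean
  // Other ADN class, schedule slip, lock conflict, non-critical mechanical report.
  needsPlanner: boolean
}

function route(t: Triage): Queue {
  if (t.class2 === 'confirmed') return 'captain'
  if (t.class2 === 'suspected' && t.loadLookupFailed) return 'captain'
  // Assumption: suspected but unconfirmed still goes to a human, never to the bot.
  if (t.class2 === 'suspected' || t.needsPlanner) return 'planner'
  return 'bot'
}
```

Keeping the function this small is deliberate: every branch maps to one sentence in the taxonomy, which makes it easy to review with captains and planners.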

ADN class definitions come from the UNECE ADN regulation; we encode them as a JSON file and version it. When the 2025 edition shifted one UN number from class 3 to class 6.1, we updated the JSON and ran a backtest against the last six months of calls. Two of 1,200 calls would have re-routed. We told the captains. They knew already.
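A backtest of this kind can be sketched as a diff over recorded calls. The table shape and field names are assumptions, and the routing rule is passed in as a parameter so the sketch makes no claim about what the production rule looks like.

```typescript
// UN number -> ADN class, one table per edition of the regulation.
interface AdnTable {
  edition: string
  classByUn: Record<string, string>
}

interface RecordedCall {
  id: string
  unNumber: string
}

// Returns the calls whose queue would change between two table versions.
function backtest<Q>(
  calls: RecordedCall[],
  before: AdnTable,
  after: AdnTable,
  route: (adnClass: string | undefined) => Q,
): RecordedCall[] {
  return calls.filter(call => {
    const oldQueue = route(before.classByUn[call.unNumber])
    const newQueue = route(after.classByUn[call.unNumber])
    return oldQueue !== newQueue
  })
}
```

The output is a list of calls rather than a count, so each affected call can be pulled up and discussed with the captains.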

What broke in week one

Three things, all of them obvious in hindsight.

One. A skipper from Sint-Niklaas with a thick West-Vlaams accent confused the intent classifier. The bias list helped on nouns but not on the surrounding grammar. We added a second-pass classifier that runs on the raw transcript when the first one's confidence drops below 0.6. Slower, but it stopped misrouting him.
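The confidence gate is simple to express. In this sketch the two classifiers are hypothetical async functions, and we assume the second pass's answer replaces the first one outright; blending the two scores would be a different design.

```typescript
interface Classification { intent: string; confidence: number }
interface Routed extends Classification { pass: 1 | 2 }
type Classifier = (transcript: string) => Promise<Classification>

async function classifyWithSecondPass(
  transcript: string,    // what the first classifier normally sees
  rawTranscript: string, // unprocessed ASR output for the second pass
  first: Classifier,
  second: Classifier,
  threshold = 0.6,
): Promise<Routed> {
  const quick = await first(transcript)
  if (quick.confidence >= threshold) {
    return { intent: quick.intent, confidence: quick.confidence, pass: 1 }
  }
  // Low confidence: pay the extra latency and ask the second classifier.
  const careful = await second(rawTranscript)
  return { intent: careful.intent, confidence: careful.confidence, pass: 2 }
}
```

Recording which pass produced the answer keeps the latency cost measurable per call.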

Two. A captain had silenced his work phone for a doctor's appointment and forgot to turn it back on. The agent dutifully called him, got voicemail, escalated to the second captain after 90 seconds — and that worked. But we hadn't planned for "captain phone silent" as a recurring class. We now ping both captains in parallel on a class 2 from the first second, and the second hang-up costs us nothing.
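Ringing in parallel amounts to "first answer wins". A rough sketch, assuming a hypothetical `ring` function that resolves to `true` when the call is picked up; the telephony side is not shown.

```typescript
// Hypothetical: places a call and resolves true if it is answered.
type Ring = (phone: string) => Promise<boolean>

// Resolves with the first phone that answers, or null if nobody does.
function ringInParallel(phones: string[], ring: Ring): Promise<string | null> {
  return new Promise<string | null>(resolve => {
    let pending = phones.length
    if (pending === 0) resolve(null)
    for (const phone of phones) {
      ring(phone)
        .catch(() => false) // a failed call counts as no answer
        .then(answered => {
          pending -= 1
          if (answered) resolve(phone) // later answers are ignored: a promise settles once
          else if (pending === 0) resolve(null)
        })
    }
  })
}
```

A `null` result is the signal to move up the ladder to the technical director.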

Three. The Autena box was sized for ten dispatchers, not ten concurrent agent threads. Synchronous load lookups during the morning rush brought the SQL Server CPU to 95%. We added a 60-second TTL cache in front of Autena and a single-flight lock so two callers asking about the same barge share a lookup. CPU dropped to 30%.
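The cache and the single-flight lock combine naturally into one small class. This is an in-process sketch under assumptions: the class name, the injected `load` function and the injectable clock are ours, and a multi-process deployment would need a shared store instead of a `Map`.

```typescript
class SingleFlightCache<V> {
  private values = new Map<string, { value: V; expiresAt: number }>()
  private inFlight = new Map<string, Promise<V>>()

  constructor(
    private readonly load: (key: string) => Promise<V>,
    private readonly ttlMs: number = 60_000,
    private readonly now: () => number = () => Date.now(),
  ) {}

  async get(key: string): Promise<V> {
    // 1. Fresh cached value: no backend call at all.
    const hit = this.values.get(key)
    if (hit && hit.expiresAt > this.now()) return hit.value

    // 2. Someone is already loading this key: share their request.
    const pending = this.inFlight.get(key)
    if (pending) return pending

    // 3. First caller: start the load and register it as in flight.
    const request = this.load(key).then(
      value => {
        this.inFlight.delete(key)
        this.values.set(key, { value, expiresAt: this.now() + this.ttlMs })
        return value
      },
      error => {
        this.inFlight.delete(key) // failures are not cached
        throw error
      },
    )
    this.inFlight.set(key, request)
    return request
  }
}
```

Usage would look like `new SingleFlightCache(barge => getLoadManifest(barge, signal))`, with every caller going through `cache.get(barge)` instead of hitting Autena directly.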

Replay or it didn't happen

Every call is recorded with the skipper's consent (a one-time enrollment, not per-call). Every decision the agent makes is logged with the inputs that fed it: ASR transcript, ADN class candidate, Autena response, Exchange response, route. When a captain says "the agent escalated me for nothing," we pull the replay in 90 seconds and walk through the reasoning. Twice in five months we found a genuine bug. Six times we found a captain who had forgotten what the policy said. The replay tool is non-negotiable for trust.
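One way to keep those inputs replayable is a single JSON line per decision. The record shape below mirrors the list above; the field names and the helper functions are assumptions, not the actual log schema.

```typescript
interface DecisionRecord {
  callId: string
  at: string // ISO 8601 timestamp of the decision
  transcript: string
  adnClassCandidate: string | null
  autenaResponse: unknown   // stored verbatim so the replay shows what the agent saw
  exchangeResponse: unknown
  route: 'captain' | 'planner' | 'bot'
}

function toLogLine(record: DecisionRecord): string {
  return JSON.stringify(record)
}

// Returns all decisions for one call, oldest first.
function replay(lines: string[], callId: string): DecisionRecord[] {
  return lines
    .map(line => JSON.parse(line) as DecisionRecord)
    .filter(record => record.callId === callId)
    .sort((a, b) => (a.at < b.at ? -1 : a.at > b.at ? 1 : 0))
}
```

ISO 8601 timestamps in the same time zone sort correctly as plain strings, which is why the comparison needs no date parsing.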

What we'd do differently

Build the captain-on-call rota inside the voice agent's own data model. We initially used PagerDuty because it was sitting there, and we ended up writing more glue than the rota itself would have been. Five tables in Postgres would have been cheaper.

Run intent classification and ADN classification as two distinct calls. We tried to fold them into one prompt to save 400ms. We saved the latency and ate the accuracy. The cost of one wrong route at 03:12 is much higher than 400ms on a Tuesday afternoon.

Test with real recordings. We piloted with staged voice clips from the planners reading scripts. Real skippers have engine noise and Marifoon chatter behind them; staged voice doesn't. Three weeks of real pilot recordings are worth more than three months of staged tests.

The smallest thing you can do today

Take your own emergency line, whatever shape it has, and write the 35-second budget on a whiteboard. Where does the time go today? Pickup, hold, lookup, hand-off. If you don't know, sit next to the person who answers it for one shift and time them. The agent comes later. The budget is the playbook.

When we built this voice agent for the Dordrecht rederij, the hardest part was not the model or the SOAP service — it was getting the captains and the planners to agree on what counted as a class 2 in the grey zone. We wrote the taxonomy with them on a Friday afternoon, then encoded it on Monday.

Key takeaway

Your internal SLA is not the legal clock. Build the 35-second budget first, then build the agent that fits inside it.

FAQ

How is this different from a standard IVR?

An IVR menus the caller. This agent doesn't. The skipper speaks freely; the agent classifies, looks up the manifest, checks the lock window, and routes. No '1 for planning, 2 for emergencies' tree.

Why doesn't the agent file the IVS90 melding itself?

Because the legal report is the captain's responsibility, not the agent's. The agent buys time, gathers context, and hands the captain a pre-filled draft. Filing stays human.

How do you handle Vlaams and regional accents?

A maritime keyword bias list does most of the work. When confidence on the first classifier drops below 0.6 we run a second pass on the raw transcript. It costs latency but stops misroutes.

What does class 2 detection cost in latency?

Six seconds inside the 35-second budget: a cheap regex first, then a small classifier as a second opinion. Lookups in Autena and Exchange run in parallel after, not before.
