Case study

WhatsApp dispatch agent: from 22 minutes to 4 at a 3PL

Twelve trucks. Three dispatchers. A WhatsApp group at 05:30 with two hundred messages from drivers asking the same five questions. Here is what we replaced it with.

Jacob Molkenboer · Founder · A Brand New Company · 3 Jun 2026 · 10 min

The depot at 05:30

Twelve line-haul trucks idling. Three dispatchers. One WhatsApp group with two hundred unread messages from the night before. Half are drivers asking where to pick up first. A quarter are drivers who already left and need the next drop sheet. The rest are photos of damaged crates, broken pallets, and stops with the wrong postcode.

Our client runs a 38-person regional logistics outfit in the south of the Netherlands. Last-mile and B2B pallet work, six warehouses, around 90 routes a day. Every morning the dispatch handoff (driver leaves the yard with a confirmed manifest and route) took 22 minutes on average. They had timed it themselves over two weeks. Mondays were closer to 35.

Twelve weeks later, average handoff is 4 minutes, drivers self-serve roughly 80% of their questions through a WhatsApp agent, and the dispatch team at 05:30 went from three people to one. Here is exactly what we built, and what we got wrong on the way.

The 22 minutes, audited

We refused to start before we knew where the time actually went. We sat in the dispatch office for four mornings with a stopwatch, a notebook, and a printed copy of every WhatsApp thread from the prior week. The result was unflattering.

9 minutes per handoff was a dispatcher reading a question, opening the TMS in another tab, copying a field, pasting it back into WhatsApp.
5 minutes was clarifying the driver's question because the message was a voice note, a half-cropped photo, or the wrong vehicle ID.
4 minutes was status pings ("Ben je al onderweg?", "Heb je gelost in Tilburg?") that the TMS already knew the answer to.
3 minutes was the genuinely hard part: an exception, a damaged drop, a customer call.
1 minute was actual paperwork: signing off the manifest.

Only the last 4 minutes needed a human. The rest was the dispatcher acting as a slow translator between a phone keyboard and a Postgres database.
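
The split can be checked in a few lines. This is a minimal sketch that uses only the five audit figures above; the `needsHuman` flag and the activity labels are our shorthand for illustration, not fields from the client's systems.

```typescript
// Minutes per handoff, taken from the four-morning audit.
// needsHuman is an illustrative label, not a field from any real system.
interface AuditLine {
  activity: string;
  minutes: number;
  needsHuman: boolean;
}

const audit: AuditLine[] = [
  { activity: "copying TMS fields into WhatsApp", minutes: 9, needsHuman: false },
  { activity: "clarifying unclear messages", minutes: 5, needsHuman: false },
  { activity: "status pings the TMS could answer", minutes: 4, needsHuman: false },
  { activity: "exceptions and customer calls", minutes: 3, needsHuman: true },
  { activity: "manifest sign-off", minutes: 1, needsHuman: true },
];

function sumMinutes(lines: AuditLine[]): number {
  return lines.reduce((total, line) => total + line.minutes, 0);
}

const totalMinutes = sumMinutes(audit);
const humanMinutes = sumMinutes(audit.filter((line) => line.needsHuman));
// Share of each handoff that was lookup and relay work rather than judgment.
const automatableShare = (totalMinutes - humanMinutes) / totalMinutes;
```

On these figures, 18 of the 22 minutes (about 82%) were lookup and relay work.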

The case for WhatsApp over a driver app

The client had tried a driver app twice. Both times it died inside six months. Drivers ignored push notifications, kept the icon on the fourth home screen, and used the WhatsApp group anyway. The lesson was already there in the data: every driver, every shift, was on WhatsApp within the first thirty seconds of starting their truck.

We built the agent on top of the WhatsApp Business Cloud API. Drivers send messages to the same number they had used for the past three years. The number is unchanged. There is nothing to install. There is no login screen. There is no "please update the app" friction.

Takeaway

The right interface is usually the one your users are already in. The wrong interface is the one your product team likes building.

The shape of the agent

The agent does five things. We did not let it do more in v1, even though we could have, because every extra capability is a new failure mode at 05:30 on a Monday.

Answer "where do I go first?" with the first stop, address, gate code, and contact name.
Answer "what's my next drop?" by marking the previous stop complete and returning the next one.
Accept a photo of a damaged item, log it against the current stop, and notify the customer-success inbox.
Accept "ik ben laat" or similar and reshuffle the ETA for downstream stops.
Escalate anything it does not understand to a human dispatcher inside 30 seconds, with the full message thread attached.

Everything else (driver swap, vehicle swap, fuel-card lockouts, custom requests from a customer) goes to a human. The agent never tries to be clever about the unhappy path.
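
To make the "ik ben laat" capability concrete, here is a minimal sketch of one possible reshuffle policy: push every pending stop back by the reported delay. The source does not say how the real system recalculates ETAs (it may well respect delivery windows), so the `StopEta` shape and the uniform shift are assumptions.

```typescript
// Hypothetical shape of a stop's ETA; field names are assumptions for this sketch.
interface StopEta {
  stopId: string;
  sequenceNo: number;
  status: "pending" | "done";
  etaMs: number; // estimated arrival, milliseconds since epoch
}

// Push every pending stop's ETA back by the reported delay.
// Completed stops are left untouched and the input array is not mutated.
function delayRoute(stops: StopEta[], delayMinutes: number): StopEta[] {
  if (!Number.isFinite(delayMinutes) || delayMinutes <= 0) {
    throw new Error("delayMinutes must be a positive number");
  }
  const shiftMs = delayMinutes * 60 * 1000;
  return stops.map((stop) =>
    stop.status === "pending" ? { ...stop, etaMs: stop.etaMs + shiftMs } : stop
  );
}
```

Rejecting zero, negative, or non-numeric delays keeps a misclassified message from silently moving ETAs; in the spirit of the rule above, such a message would go to a human instead.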

The Postgres view that did most of the work

The existing TMS was a 2014-era PHP application on MySQL. We did not touch it. We replicated the four tables we needed (routes, stops, drivers, vehicles) into a Postgres database via a Debezium change-data-capture pipeline, and we built one materialised view called driver_now.

CREATE MATERIALIZED VIEW driver_now AS
SELECT
  d.id              AS driver_id,
  d.whatsapp_number AS phone,
  d.full_name       AS driver_name,
  r.id              AS route_id,
  r.shift_start_at  AS shift_start,
  v.plate           AS vehicle_plate,
  -- The next pending stop, or NULL if route is done
  (
    SELECT jsonb_build_object(
      'stop_id',   s.id,
      'sequence',  s.sequence_no,
      'customer',  s.customer_name,
      'address',   s.address_line,
      'postcode',  s.postcode,
      'city',      s.city,
      'gate_code', s.gate_code,
      'contact',   s.site_contact,
      'window',    tstzrange(s.window_start, s.window_end),
      'notes',     s.driver_notes
    )
    FROM stops s
    WHERE s.route_id = r.id
      AND s.status   = 'pending'
    ORDER BY s.sequence_no
    LIMIT 1
  ) AS next_stop,
  (SELECT COUNT(*) FROM stops s
     WHERE s.route_id = r.id AND s.status = 'done')    AS stops_done,
  (SELECT COUNT(*) FROM stops s
     WHERE s.route_id = r.id AND s.status = 'pending') AS stops_left
FROM drivers d
JOIN routes  r ON r.driver_id = d.id AND r.shift_date = CURRENT_DATE
JOIN vehicles v ON v.id = r.vehicle_id;

-- REFRESH MATERIALIZED VIEW CONCURRENTLY requires a unique index on the view.
CREATE UNIQUE INDEX driver_now_phone_idx ON driver_now (phone);

This view is refreshed every 30 seconds with REFRESH MATERIALIZED VIEW CONCURRENTLY. The agent never queries the live TMS. It looks up the driver by their WhatsApp number, gets one row, and that row has everything it needs to answer 80% of incoming questions. Read-only, indexed, predictable.

The trick is that the hard work (joining routes to drivers to stops, sorting by sequence, filtering by status, deciding what "now" means) lives in the database. The agent does not reason about scheduling. It reads a row.
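
The lookup itself can stay tiny. The sketch below builds the parameterised query for one driver; it assumes `drivers.whatsapp_number` is stored as digits only (the form in which WhatsApp webhooks typically report the sender), and `normalizePhone` and `driverNowQuery` are hypothetical helper names. Running the query is left to whichever Postgres client is in use.

```typescript
// A query in the text-plus-values form that Postgres clients such as node-postgres accept.
interface SqlQuery {
  text: string;
  values: string[];
}

// Strip everything except digits. This assumes the view's phone column holds
// digits only; adjust if the numbers are stored in another format.
function normalizePhone(raw: string): string {
  return raw.replace(/\D/g, "");
}

// One indexed, read-only lookup: the sender's number in, one driver_now row out.
function driverNowQuery(rawPhone: string): SqlQuery {
  return {
    text: "SELECT * FROM driver_now WHERE phone = $1",
    values: [normalizePhone(rawPhone)],
  };
}
```

Passing the number as a bound parameter rather than interpolating it into the SQL matters here, because the value arrives from an external webhook.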

The agent itself

The agent is a thin worker that subscribes to the WhatsApp Cloud API webhook, classifies the inbound message, calls one of five tools, and replies. We use a small Dutch-tuned model for classification and a larger model only for the damage-photo path, where we need vision and a more careful reply. Here is the entire tool surface:

type Tool =
  | { name: "get_next_stop";    args: { phone: string } }
  | { name: "mark_stop_done";   args: { phone: string; stop_id: string } }
  | { name: "log_damage";       args: { phone: string; stop_id: string; photo_url: string; note?: string } }
  | { name: "delay_route";      args: { phone: string; minutes: number } }
  | { name: "handoff_to_human"; args: { phone: string; reason: string } };
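
As an illustration of how a classified message might map onto that surface, here is a minimal sketch. The `Intent` labels are hypothetical (the source does not list the classifier's label set); the `Tool` type is repeated from above so the block stands alone.

```typescript
type Tool =
  | { name: "get_next_stop";    args: { phone: string } }
  | { name: "mark_stop_done";   args: { phone: string; stop_id: string } }
  | { name: "log_damage";       args: { phone: string; stop_id: string; photo_url: string; note?: string } }
  | { name: "delay_route";      args: { phone: string; minutes: number } }
  | { name: "handoff_to_human"; args: { phone: string; reason: string } };

// Hypothetical classifier output; the real label set is not given in the source.
type Intent =
  | { kind: "first_or_next_stop" }
  | { kind: "stop_done"; stopId: string }
  | { kind: "damage_photo"; stopId: string; photoUrl: string }
  | { kind: "running_late"; minutes: number }
  | { kind: "unknown"; reason: string };

// Map each intent to exactly one tool call. Anything unrecognised becomes a handoff.
function toolForIntent(phone: string, intent: Intent): Tool {
  switch (intent.kind) {
    case "first_or_next_stop":
      return { name: "get_next_stop", args: { phone } };
    case "stop_done":
      return { name: "mark_stop_done", args: { phone, stop_id: intent.stopId } };
    case "damage_photo":
      return {
        name: "log_damage",
        args: { phone, stop_id: intent.stopId, photo_url: intent.photoUrl },
      };
    case "running_late":
      return { name: "delay_route", args: { phone, minutes: intent.minutes } };
    case "unknown":
      return { name: "handoff_to_human", args: { phone, reason: intent.reason } };
    default: {
      // Compile-time exhaustiveness check: adding an Intent without a case fails here.
      const unreachable: never = intent;
      throw new Error(`unhandled intent: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The `never` check means a sixth capability cannot be added to the classifier without someone deciding, explicitly, which tool it maps to.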

Every tool result is a small JSON object. The reply text is filled by a deterministic formatter, not by the model. The model picks the tool and the parameters. The wording of the reply is a Dutch template the dispatch team wrote and signed off on. This is the single most important architectural choice in the whole system.
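
A deterministic formatter of this kind can be sketched as follows. The field names follow the `next_stop` JSON built in `driver_now`; the English wording and the function name `formatNextStop` are placeholders for illustration, since the real templates are Dutch and were written by the dispatch team.

```typescript
// Fields a driver will copy or act on, named as in the next_stop JSON.
interface NextStop {
  customer: string;
  address: string;
  postcode: string;
  city: string;
  gate_code: string | null;
  contact: string | null;
}

// Every value is copied verbatim from the tool result; no model touches the text.
// The wording is a placeholder, not the client's actual template.
function formatNextStop(stop: NextStop): string {
  const lines = [
    `Next stop: ${stop.customer}`,
    `${stop.address}, ${stop.postcode} ${stop.city}`,
  ];
  if (stop.gate_code !== null) lines.push(`Gate code: ${stop.gate_code}`);
  if (stop.contact !== null) lines.push(`Contact: ${stop.contact}`);
  return lines.join("\n");
}
```

Because the function is pure, the same input always yields the same message, which also makes every template trivially unit-testable.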

Warning

Letting an LLM phrase a reply that contains an address, a gate code, or a time window is how you ship a "94% reliable" agent that gives a driver the wrong postcode on Friday afternoon. Use deterministic templates for any field a driver will copy or act on.

The eleven-day shadow run

We did not flip a cutover switch. For 11 working days the agent received every WhatsApp message and produced a reply, but the reply went to a private channel the dispatchers watched. The dispatchers answered the driver themselves. After every interaction, the dispatcher tapped one of three reactions on the agent's draft: green (would have sent), yellow (close, would have tweaked), red (wrong).
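
Scoring a shadow run like this needs very little code. A minimal sketch, assuming each dispatcher reaction is recorded as one of the three colours (the `Reaction` type and function names are ours):

```typescript
type Reaction = "green" | "yellow" | "red";

// Count how many drafts received each reaction.
function tally(reactions: Reaction[]): Record<Reaction, number> {
  const counts: Record<Reaction, number> = { green: 0, yellow: 0, red: 0 };
  for (const reaction of reactions) counts[reaction] += 1;
  return counts;
}

// Share of drafts the dispatcher would have sent unchanged (0 for an empty day).
function greenRate(reactions: Reaction[]): number {
  if (reactions.length === 0) return 0;
  return tally(reactions).green / reactions.length;
}
```

Tracking this rate per day, rather than over the whole run, is what makes the trend visible.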

By day 4 we were at 71% green. By day 9, 92%. The yellow messages were almost all the same three things: drivers asking about a stop that had a same-day window change, drivers using a nickname for a customer (the agent looked up the legal name), and drivers sending a voice note in a regional dialect the speech model fumbled.

We fixed the first two in the view and the lookup table. The dialect one we accepted. Anything the agent cannot transcribe with high confidence escalates to a human. That is one extra human reply per dispatcher per day. We are fine with that.

Three things that broke

All three are worth telling.

The 24-hour session window

The WhatsApp Business Platform only lets you send free-form messages inside a 24-hour window after the user's last message. Outside that window you need an approved template. Our first version tried to send a "good morning, here is your first stop" message at 05:15 to drivers who had last messaged the system the previous afternoon. About a third of those drivers were outside the window. The messages silently failed.

The fix was twofold. We registered a small set of templates (route_start, stop_next, delay_ack) with Meta. And we now expect the driver to message first ("start" or a thumbs-up emoji) to open the session. Drivers learned this in two days.
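
A guard for the window can be sketched as below. It assumes the worker records the timestamp of each driver's last inbound message; the function names are hypothetical, and the template names are the three registered above. Meta enforces the window on its side, so this local check only avoids sending a message that is bound to fail.

```typescript
const SESSION_WINDOW_MS = 24 * 60 * 60 * 1000;

type TemplateName = "route_start" | "stop_next" | "delay_ack";

type Outbound =
  | { kind: "free_form"; text: string }
  | { kind: "template"; templateName: TemplateName };

// True when the driver's last inbound message is less than 24 hours old.
function isSessionOpen(lastInboundAt: Date | null, now: Date): boolean {
  if (lastInboundAt === null) return false;
  const elapsedMs = now.getTime() - lastInboundAt.getTime();
  return elapsedMs >= 0 && elapsedMs < SESSION_WINDOW_MS;
}

// Free-form text inside the window, an approved template outside it.
function chooseOutbound(
  lastInboundAt: Date | null,
  now: Date,
  text: string,
  fallback: TemplateName
): Outbound {
  return isSessionOpen(lastInboundAt, now)
    ? { kind: "free_form", text }
    : { kind: "template", templateName: fallback };
}
```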

The materialised view race

Twice in week one, a driver marked a stop done, immediately asked for the next stop, and got the same stop back. The view had not refreshed yet. The fix was to bypass the view for the mark_stop_done → get_next_stop chain: when we mark a stop done, the agent reads the next stop from the underlying tables directly, with a row-level lock, and returns it in the same response. The view stays fast for the 95% case. The chain stays correct for the 5%.
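
The chain can be sketched as two statements run inside one transaction. Table and column names follow the `driver_now` definition; the function name is hypothetical, and the sketch assumes stop status is writable in this database, which the write path described above does not spell out.

```typescript
interface SqlQuery {
  text: string;
  values: string[];
}

// Statements to execute in order inside ONE transaction (BEGIN ... COMMIT).
// Assumption: stops.status can be updated here; adapt to the real write path.
function markDoneThenNextStop(routeId: string, stopId: string): SqlQuery[] {
  return [
    {
      // Only flips a stop that is still pending, so a repeated message is harmless.
      text:
        "UPDATE stops SET status = 'done' " +
        "WHERE id = $1 AND route_id = $2 AND status = 'pending'",
      values: [stopId, routeId],
    },
    {
      // Reads the base table, not the view, and locks the row it returns.
      text:
        "SELECT id, sequence_no, customer_name, address_line, postcode, city, " +
        "gate_code, site_contact FROM stops " +
        "WHERE route_id = $1 AND status = 'pending' " +
        "ORDER BY sequence_no LIMIT 1 FOR UPDATE",
      values: [routeId],
    },
  ];
}
```

Because both statements share a transaction, the second one sees the first one's update regardless of when the view last refreshed.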

The photo upload pipeline

WhatsApp media URLs are signed and expire in 5 minutes. Our first damage-logging worker stored the WhatsApp URL in Postgres and tried to render it on the customer-success dashboard an hour later. Broken images everywhere. We now fetch the photo inside the webhook handler, push it to S3, and store the S3 URL. Three lines of code, half a day of customer-success confusion before we noticed.
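
The ordering is the whole fix, so here is a minimal sketch of it. The HTTP download, the S3 upload, and the database write are injected as functions because the source does not name the clients used; `PhotoDeps`, `persistDamagePhoto`, and the object key layout are all hypothetical.

```typescript
// Injected dependencies keep the sketch independent of any HTTP or S3 client.
interface PhotoDeps {
  // Must be called while the short-lived WhatsApp URL is still valid.
  download: (mediaUrl: string) => Promise<Uint8Array>;
  // Stores the bytes and returns a durable URL.
  upload: (key: string, bytes: Uint8Array) => Promise<string>;
  saveDamage: (record: { stopId: string; photoUrl: string }) => Promise<void>;
}

// Download first, upload second, and persist only the durable URL.
async function persistDamagePhoto(
  deps: PhotoDeps,
  stopId: string,
  mediaId: string,
  mediaUrl: string
): Promise<string> {
  const bytes = await deps.download(mediaUrl);
  const key = `damage/${stopId}/${mediaId}`; // hypothetical key layout
  const durableUrl = await deps.upload(key, bytes);
  await deps.saveDamage({ stopId, photoUrl: durableUrl });
  return durableUrl;
}
```

The temporary WhatsApp URL never reaches the database, so nothing downstream can try to render it after it has expired.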

The numbers, twelve weeks in

Average handoff time: 22 minutes to 4 minutes.
Dispatch staffing at 05:30: 3 people to 1 person. The other two start at 08:30 and handle exceptions for the rest of the day.
WhatsApp messages handled by the agent without a human: 81% of inbound, measured over the last 30 days.
One missed-stop incident, week 6, caused by an out-of-hours gate-code change that did not flow through CDC. Fixed at the source.
Driver feedback was not collected formally before. We ran a short survey in week 10 and the most common comment was that the WhatsApp group felt calmer, because the agent replies in DM and not in the group.

The headline cost saving is real but boring: two dispatcher shifts a day, five days a week, plus a meaningful drop in customer-service load because damage reports now arrive with a photo, a timestamp, and a stop ID inside 30 seconds. The interesting part is the second-order effect. Drivers finish their route 12 to 18 minutes earlier on average, because the friction of asking "what's next?" is gone. That is roughly one extra drop per truck per day.

The portable pattern

You probably do not run a 3PL. The pattern is portable anyway. Three pieces.

Identify the moment in your operation that looks like 22 minutes of WhatsApp ping-pong. It is almost always a status question, a "what next?" question, or a photo-of-something-broken question.
Build one read-only view over your existing database that answers that question in a single row. Do not migrate your TMS, ERP, or CRM. Replicate the four tables you need.
Put a thin agent in front of it on the channel your users are already in. Let the model pick tools. Let your templates pick words.

When we built this for the logistics client, the thing we kept underestimating was how much of the win came from doing less in the agent, not more. A small surface area with a deterministic reply template beat every "let the model handle it" prototype we tried. If you are looking at a similar build, the architecture choices behind our other AI agent work are the same ones that drove this case. The audit that starts every one of these projects is the same one we ran here: sit with the team for four mornings with a stopwatch, no slide deck.

The smallest thing you could do today: open your operations WhatsApp group, scroll back two weeks, and count how many messages are status questions a database already knows the answer to. If it is more than half, you have the project.

Key takeaway

Put a thin agent on the channel your team already uses, point it at one read-only database view, and let templates pick the words.

FAQ

Why WhatsApp and not a custom driver app?

Drivers were already in WhatsApp on every shift. Two prior driver apps had failed because of install friction and ignored push notifications. Meeting users where they were beat building a new front door.

Do you have to replace the existing TMS?

No. We left the MySQL TMS untouched and replicated four tables into Postgres via change-data-capture. The agent only ever reads from the replica, so there is no risk to the system of record.

How long did the build take end to end?

Twelve weeks from kickoff to full cutover. Four weeks of audit, schema, and the first agent. Three weeks for shadow mode. Two weeks of fixes. Three weeks of gradual rollout across the six warehouses.

What about drivers who do not speak Dutch?

The classification model handles short Dutch, English, and Polish messages. Anything ambiguous escalates to a human. Reply templates are bilingual where the driver record flags a non-Dutch primary language.

Is the WhatsApp Business Cloud API free?

User-initiated session messages answered within 24 hours carry no per-message fee. Template messages sent outside that window are billed by Meta, at rates that depend on the template category and the destination country. Current numbers are in the WhatsApp pricing docs.

Tags: case study · ai agents · chat agents · automation · workflow · operations
