AI voice picking on Zebra TC22: scoring three setups

A planner at a Dutch 3PL has three quotes on her desk: a Claude voice agent on Zebra TC22, a teamleider rotation, and a hybrid. Here is how we score them.

Jacob Molkenboer · Founder · A Brand New Company · 21 Jun 2026 · 8 min

It is Tuesday morning in a Tilburg DC, the kind of grey June morning that makes the dock doors look like they have always been open. The operations lead has three quotes on her desk. The first is for a Claude-driven voice agent that talks to pickers through Zebra TC22 handhelds. The second is a contract for four teamleiders working a 24/7 rotation. The third, the one her CFO keeps circling, is a hybrid: agent-first conversation, hand off to the existing SAP EWM when the path gets weird. She has to decide before the school holidays. She has 38,400 orderlines a week to move regardless.

We get this call about twice a quarter. The caller is usually a sub-€12M Dutch 3PL or a private-label brand that runs its own logistics. The honest answer is that none of the three options is wrong. But only one of them is right for a given site, and the way to tell is to score three numbers, not ten. Here is the method we use.

The three numbers that decide it

Voice picking has been a vendor pitch for fifteen years, so most readers arrive with a fog of "35% productivity gain" claims in their heads. Strip that away. For a Dutch DC under €12M annual revenue, three numbers carry the entire decision:

Per-pick cost at 38,400 weekly orderlines, all-in.
Arbo defensibility under the lifting thresholds in the RI&E (the mandatory risk inventory and evaluation).
Who owns the foutpick (mis-pick) correction when the agent compromises during the nachtploeg (night shift).

Every other concern — accent handling, MDM, OAuth, your operations manager's opinion of "AI" — collapses into one of those three.

Per-pick cost, fully loaded

The trap is to compare hourly labor against API cost. Don't. The pickers walk the warehouse in all three options. What you are actually replacing is the instruction layer: the voice telling the picker which SKU, which bin, which quantity, and what to do when reality disagrees.

For the four-person teamleider rotation, take the loaded cost of a teamleider in NL (we use €58k–€68k fully burdened depending on CAO and shift toeslag), multiply by four, add a 10% rotation buffer for vacation and illness. Divide by 1,996,800 annual orderlines (38,400 × 52). You land between roughly €0.13 and €0.15 per pick.
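The arithmetic behind that figure, as a minimal sketch. The salary band, the 10% buffer, and the weekly volume come from the paragraph above; the function name and the flat 52-week year are our own assumptions for illustration.

```python
WEEKLY_ORDERLINES = 38_400
WEEKS_PER_YEAR = 52       # assumption: no shutdown weeks
ROTATION_BUFFER = 0.10    # cover for vacation and illness


def teamleider_cost_per_pick(loaded_salary_eur: float, headcount: int = 4) -> float:
    """Annual cost of the rotation divided by annual orderlines."""
    annual_cost = loaded_salary_eur * headcount * (1 + ROTATION_BUFFER)
    annual_orderlines = WEEKLY_ORDERLINES * WEEKS_PER_YEAR  # 1,996,800
    return annual_cost / annual_orderlines


low = teamleider_cost_per_pick(58_000)
high = teamleider_cost_per_pick(68_000)
print(f"EUR {low:.3f} to EUR {high:.3f} per pick")  # EUR 0.128 to EUR 0.150 per pick
```

Swap in your own CAO band and weekly volume before comparing the result with any vendor quote.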

For the pure voice agent, the cost is the model API plus device amortization. A Claude conversation per pick is usually 2–6 seconds of audio in, 2–4 seconds out, in bursts. At the voice rates we have been quoted for 2026, a normal pick is well under a eurocent. The Zebra TC22 amortizes at roughly €0.001 per pick over three years. Call it €0.02 per pick once you add error fallback and idle holds. That looks like a 6x saving.

It is not. The teamleider rotation also catches the bad fust (damaged returnable packaging), trains the uitzendkracht (agency temp), and signs off on retours (returns). The agent does none of that, so you carry it elsewhere, usually back to your dagploeg (day shift) supervisor, who now has thirty more interruptions an hour. Score the option, not the line item.

The hybrid sits between the two: EWM handles the routine pick instruction (zero marginal cost since you already pay the licence), and Claude only enters the conversation on exceptions. In practice that means the model bills for 10–15% of picks instead of 100%. Our hybrid costs sit between €0.03 and €0.05 per orderline once integration is amortised over twelve months.

Arbo defensibility

This is the number most pitch decks skip, and it is the one that sinks the project after go-live.

Under the Dutch Arbobesluit and the Arboportaal guidance on fysieke belasting (physical load), an instruction system that directs an employee into a lift outside the safe NIOSH envelope is, in practice, the employer's liability. If the agent tells the picker "take 18 units from bin B-12-04" and bin B-12-04 is at 1.85 m with a horizontal reach of 65 cm, you have just instructed a lifting index above 1.0 over a four-hour period. The Nederlandse Arbeidsinspectie does not care that an LLM said it.
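The index in that example is, as far as we can tell, the lifting index from the revised NIOSH lifting equation: the load divided by a recommended weight limit (RWL). Below is a simplified sketch of that calculation. The frequency and coupling multipliers (`fm`, `cm`) normally come from lookup tables and are passed in by hand here, the 6 kg load and 110 cm travel distance are invented for the example, and the constants should be checked against the NIOSH manual and your own RI&E before anyone relies on them.

```python
import math

LOAD_CONSTANT_KG = 23.0  # load constant of the revised NIOSH equation (metric)


def recommended_weight_limit(h_cm, v_cm, d_cm, asym_deg=0.0, fm=1.0, cm=1.0):
    """RWL = LC * HM * VM * DM * AM * FM * CM.

    h_cm: horizontal reach, v_cm: hand height at origin,
    d_cm: vertical travel distance, asym_deg: twist angle.
    fm and cm are table values in the real method (simplified here).
    """
    # Outside these limits the equation sets a multiplier to zero.
    if h_cm > 63 or v_cm > 175 or d_cm > 175 or asym_deg > 135:
        return 0.0
    hm = 25.0 / max(h_cm, 25.0)
    vm = 1.0 - 0.003 * abs(v_cm - 75.0)
    dm = 0.82 + 4.5 / max(d_cm, 25.0)
    am = 1.0 - 0.0032 * asym_deg
    return LOAD_CONSTANT_KG * hm * vm * dm * am * fm * cm


def lifting_index(load_kg, **geometry):
    """Load divided by RWL; infinite when the lift is outside the envelope."""
    rwl = recommended_weight_limit(**geometry)
    return math.inf if rwl == 0 else load_kg / rwl


# The bin from the example: 185 cm high, 65 cm reach. Both exceed the limits.
print(lifting_index(6.0, h_cm=65, v_cm=185, d_cm=110))  # inf
```

A bin at 1.85 m never gets a finite index under this sketch, whatever the product weighs, which is why the gate has to see bin geometry and not just weight.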

A human teamleider has fifteen years of "ja, dat pak je met de trap, niet met je rug" ("yes, you fetch that with the step ladder, not with your back") baked in. An agent has whatever you put in its tool definitions. So the test is: can the agent refuse a pick on RI&E grounds, and can it log the refusal in a way that an inspector would accept?

Warning

If the agent's only escape hatch is "vraag een collega" ("ask a colleague"), you do not have an Arbo-defensible system. You have a chatbot with deniability.

The hybrid changes this. SAP EWM already carries the bin master with height, depth, and product weight. Letting EWM gate every pick against an RI&E rule set, then letting Claude do the conversation around the pick, gives you a paper trail that holds up. The agent becomes the friendly voice; EWM remains the system of record.

Ownership of the foutpick at 03:14

It is night shift. The agent told Marek to pull a case of half-liter bottles from bin C-04. The bin was already empty because the dagploeg never closed the cycle count. The agent compromises: it routes Marek to C-07, which has the same SKU but a different lot. He picks. The order ships. Eleven days later a retailer rejects the pallet on lot mismatch and you eat the return.

Now: who owned that decision?

In the teamleider rotation, the answer is a name. Robin made the call, Robin signs the niet-conform, Robin learns. In the pure agent setup, the answer is "the model and the prompt." That is not a person. It is not a process. It is a debugging exercise three weeks later. Insurers and warehouse managers both hate that distinction.

The hybrid wins this category if — and only if — every agent compromise generates a typed escalation that lands in EWM as a transaction with a human signoff queue. No signoff queue, no ownership. No ownership, no hybrid.

The scoring sheet we hand the client

We score each option from 1 to 5 on each of the three axes, and we weight cost at 1, Arbo at 2, ownership at 2. Cost matters but it is the easiest to model; the other two are where the wheels come off. The sheet looks like this:

                             Cost (x1)   Arbo (x2)   Ownership (x2)   Total
4-person teamleider rotation    2           4             5             20
Pure Claude voice agent         5           2             1             11
Hybrid agent + EWM              4           5             4             22

Those are the typical numbers we see for a sub-€12M 3PL running broken-case picking on a single site with two shifts. They move. A pharma DC running a koelketen (cold chain) weights Arbo and ownership more heavily, which puts the hybrid even further ahead. A fashion DC with a 90% single-SKU pick mix can sometimes justify the pure agent because the compromise risk is genuinely small.
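The totals in the sheet are a plain weighted sum, so re-scoring for another site is a small change. A sketch using the scores and weights from the sheet; the dictionary layout and function name are ours.

```python
WEIGHTS = {"cost": 1, "arbo": 2, "ownership": 2}

SCORES = {
    "4-person teamleider rotation": {"cost": 2, "arbo": 4, "ownership": 5},
    "Pure Claude voice agent":      {"cost": 5, "arbo": 2, "ownership": 1},
    "Hybrid agent + EWM":           {"cost": 4, "arbo": 5, "ownership": 4},
}


def weighted_total(scores: dict, weights: dict = WEIGHTS) -> int:
    """Sum of score times weight over the three axes."""
    return sum(weights[axis] * scores[axis] for axis in weights)


totals = {option: weighted_total(s) for option, s in SCORES.items()}

# Print best option first.
for option, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{option:30} {total}")
```

Passing a different `weights` dict (for example a heavier Arbo weight for a cold-chain site) shows how far the ranking moves before you argue about individual scores.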

Why the hybrid almost always wins, with one hand-off rule

The hybrid only beats the teamleider rotation if you write the hand-off rule down before signing the contract. We use a single sentence:

Any agent action that deviates from the EWM-planned pick must be returned to EWM as a typed exception with a named human approver, before the picker hears confirmation.

That sentence has three loaded parts. "Typed exception" means EWM gets a structured payload, not a free-text note. "Named human approver" means a person, not a role mailbox. "Before the picker hears confirmation" means the agent waits — the picker cannot move on until the loop is closed. This is the line between a defensible system and a story you tell the Arbeidsinspectie afterward.
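One way to read the three parts as a single check, sketched under our own assumptions: the class and field names are hypothetical, and treating any approver string containing "@" as a mailbox is a deliberately crude stand-in for a real lookup against the shift roster.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReasonCode(Enum):
    BIN_EMPTY = "bin_empty"
    DAMAGED = "damaged"
    LOT_MISMATCH = "lot_mismatch"
    WEIGHT_OUTSIDE_ENVELOPE = "weight_outside_envelope"


@dataclass
class PickDeviation:
    plan_id: str
    reason_code: ReasonCode           # typed exception, not a free-text note
    proposed_bin: str
    approver: Optional[str] = None    # a named person, not a role mailbox
    approved: bool = False            # set only when that person signs off


def may_confirm_to_picker(dev: PickDeviation) -> bool:
    """The picker hears confirmation only once all three parts hold."""
    typed = isinstance(dev.reason_code, ReasonCode)
    named = bool(dev.approver) and "@" not in dev.approver
    return typed and named and dev.approved


dev = PickDeviation("P-1001", ReasonCode.BIN_EMPTY, "C-07")
print(may_confirm_to_picker(dev))   # False: nobody has signed off yet
dev.approver, dev.approved = "Robin", True
print(may_confirm_to_picker(dev))   # True
```

The point of the sketch is the ordering: the agent holds the conversation open until this returns true, rather than confirming and logging afterward.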

Concretely, our agent tools look like this:

{
  "name": "request_pick_deviation",
  "description": "Escalate a deviation from the EWM-planned pick.",
  "input_schema": {
    "type": "object",
    "required": ["plan_id", "reason_code", "proposed_bin", "rie_check"],
    "properties": {
      "plan_id":      { "type": "string" },
      "reason_code": {
        "enum": ["bin_empty", "damaged", "lot_mismatch", "weight_outside_envelope"]
      },
      "proposed_bin": { "type": "string" },
      "rie_check": {
        "type": "object",
        "required": ["height_cm", "weight_kg", "frequency_per_hour"],
        "properties": {
          "height_cm":          { "type": "number" },
          "weight_kg":          { "type": "number" },
          "frequency_per_hour": { "type": "number" }
        }
      }
    }
  }
}

The agent cannot send the picker to a deviating bin without filling in rie_check. EWM rejects the deviation if the implied lifting index goes above 1.0. The agent then has two choices: route the picker to a different bin, or escalate to the named approver on shift. That is the whole game.
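The decision on the receiving side can be sketched as a small gate. This is an illustration, not EWM code: the function name and return labels are hypothetical, and the index is assumed to be computed elsewhere from the bin master and passed in as a number.

```python
REQUIRED = ("plan_id", "reason_code", "proposed_bin", "rie_check")
RIE_REQUIRED = ("height_cm", "weight_kg", "frequency_per_hour")
REASON_CODES = {"bin_empty", "damaged", "lot_mismatch", "weight_outside_envelope"}
MAX_INDEX = 1.0


def gate_deviation(payload: dict, index: float, other_bin_available: bool) -> str:
    """Return 'invalid', 'queue_for_signoff', 'reroute' or 'escalate'."""
    # Reject anything that does not match the tool's input schema.
    if any(key not in payload for key in REQUIRED):
        return "invalid"
    if payload["reason_code"] not in REASON_CODES:
        return "invalid"
    rie = payload["rie_check"]
    if not isinstance(rie, dict) or any(key not in rie for key in RIE_REQUIRED):
        return "invalid"
    # Within the envelope: the deviation still needs its human signoff.
    if index <= MAX_INDEX:
        return "queue_for_signoff"
    # Above the threshold: try another bin, otherwise hand it to a person.
    return "reroute" if other_bin_available else "escalate"


payload = {
    "plan_id": "P-1001",
    "reason_code": "bin_empty",
    "proposed_bin": "C-07",
    "rie_check": {"height_cm": 120, "weight_kg": 9.5, "frequency_per_hour": 40},
}
print(gate_deviation(payload, index=0.8, other_bin_available=True))   # queue_for_signoff
print(gate_deviation(payload, index=1.4, other_bin_available=False))  # escalate
```

Note that no branch returns a bare "accept": even a safe deviation goes to the signoff queue, which is what keeps a name attached to it.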

Where the pure voice agent still wins

We are not anti-agent; we ship them for a living. The pure setup wins when three conditions all hold: a single site, a narrow SKU range with stable bin assignments, and a willingness to keep one human supervisor on a desk somewhere who can intervene through a softphone. In practice that is a co-packer with 600 SKUs and one shift. They are rare. If a client matches all three, the per-pick cost drops to about €0.018 and the supervisor's hours go down by 70%.

If even one condition is missing (multi-site, broad SKU range, full 24/7), the hybrid has been the answer every time. We have run the numbers for nine clients in the past two years and the hybrid won eight of them. The ninth was the co-packer.

The five-minute audit before you sign

Before you sign any of the three contracts, do this. Pull last quarter's foutpick log. For every error, write next to it: would a human teamleider have caught this in the moment? If the answer is "yes" more than 35% of the time, the pure voice agent is off the table. If the answer is "no" more than 80% of the time (so "yes" is under 20%), the four-person rotation is overkill and the hybrid pays back in under nine months. If "yes" lands between 20% and 35%, the right move is to start with the hybrid and let the agent's escalation rate set the long-term staffing.
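The audit reduces to two shares over the log. A minimal sketch, assuming you have already marked each error as caught or not caught; the function name and the wording of the returned advice are ours, the thresholds are the ones stated above.

```python
def audit_foutpick_log(human_would_have_caught: list[bool]) -> str:
    """One entry per logged error: True if a teamleider would have caught it."""
    if not human_would_have_caught:
        raise ValueError("the log is empty")
    total = len(human_would_have_caught)
    yes_count = sum(human_would_have_caught)
    yes_share = yes_count / total
    no_share = (total - yes_count) / total

    if yes_share > 0.35:
        return "pure voice agent is off the table"
    if no_share > 0.80:
        return "four-person rotation is overkill; the hybrid pays back fastest"
    return "start with the hybrid and let the escalation rate set staffing"


# 3 of 10 errors would have been caught by a human on the floor.
print(audit_foutpick_log([True] * 3 + [False] * 7))
# start with the hybrid and let the escalation rate set staffing
```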

When we built the voice-agent layer for a flower-shipping operator near Aalsmeer last winter, the thing we ran into was exactly this: the agent could converse beautifully, but compromises during the nachtploeg had no owner until we wrote them back into EWM as typed exceptions. We ended up solving it by treating every deviation as a workflow step with a name attached, which is the kind of thing our AI agents practice now defaults to. Try the five-minute audit on your own foutpick log this afternoon. The answer will surprise you, in one direction or the other.

Key takeaway

The hybrid wins because EWM stays the system of record and a named human owns every deviation. The agent is the voice, not the decision.

FAQ

Can a voice agent refuse a pick on Arbo grounds?

Yes, if its tool definitions force an RI&E check on every deviation and EWM rejects unsafe lifts. Without that gate, the refusal has no paper trail and no defensibility.

What does a Claude voice pick cost per orderline in 2026?

Under one eurocent in pure API terms, around €0.02 once you add device amortization and error fallback. The hybrid sits at €0.03–€0.05 because the model only talks on exceptions.

Is SAP EWM required, or will a lighter WMS work?

Any WMS that stores bin height, weight, and pick frequency, and accepts typed exceptions, will work. EWM is the most common in Dutch 3PLs but the pattern is portable to Manhattan or Körber.

How long does the hybrid take to ship for a single-site DC?

Eight to twelve weeks. The eight-week baseline is four weeks to build the agent and its tools, two to wire the EWM exception path, and two to dry-run alongside the existing teamleider rotation before cutover.
