Order-agent workflows: Temporal, Inngest, or DIY outbox

Friday 21:47. PostNL pushes a tariff change. By 22:10 our client's order-agent dead-letter queue holds 312 messages. The workflow engine choice now matters.

Jacob Molkenboer · Founder · A Brand New Company · 21 Jun 2026 · 9 min

Friday 21:47. The order agent at our Almere fulfilment client is running its usual Friday peak: retour labels for last week's Cyber-Week orders, COD reconciliation for the weekend deliveries, a long tail of consumer questions hitting the chat side. Then PostNL pushes a tariff update. The rate endpoint starts returning a partial schema. By 22:10 the dead-letter queue holds 312 messages and climbing.

The on-call engineer is a co-founder. He has two laptops open. The question on his mind is not whether the queue is there. Every workflow engine has a queue. The question is whether we can ship Monday morning without losing those 312 messages, without manually rewriting last week's events, and without explaining to the auditor why one customer was charged the old tariff and another the new one.

That question is the whole reason we went through this comparison.

The order agent in question

The client is a 26-person e-fulfilment partner in Almere. They handle warehouse, pick-and-pack, and last-mile coordination for about forty mid-market e-commerce brands. The order agent is the layer that sits between the brands' shop systems (Shopify, Magento, a couple of WooCommerce holdouts) and the internal WMS, plus PostNL, plus DHL Parcel, plus a custom COD-to-bank reconciliation feed.

The volume profile, the week we started the comparison:

8,400 retour-en-rembours flows per week (a flow is one shipment with either a return label or COD reconciliation, often both).
Five to twelve workflow steps per flow, depending on whether COD bounces, whether the return scan triggers a refund hold, whether the brand wants a chat-agent follow-up.
Long tail: COD reconciliation can stay open four to seven days waiting for the bank file. Some retour flows stay open thirty days waiting for the consumer to actually post the package.

Forty thousand to a hundred thousand step-executions per week. Most steps short, a handful long-running, a handful waiting for an external file that may never arrive. This is the classic shape that durable-workflow engines exist to solve. The question was which one.

Three candidates on the whiteboard

Temporal. Durable execution. You write workflows in normal code (TypeScript SDK, in our case). The Temporal server records every step as an event in the workflow history. Crashes, deploys, and rate-limit timeouts are invisible to your code; on resume, the worker replays history to reach the current state. Open-source server you can self-host, or Temporal Cloud.

Inngest. Step-function-style. You write functions that yield steps. Inngest stores step results and replays the function from the last completed step. Hosted by default, with a self-host story that arrived in 2024. Pleasant developer experience, especially the local dev UI.

BullMQ + Postgres outbox. The hand-rolled option. BullMQ for the queue and worker lifecycle, Postgres for a transactional outbox and a step ledger we'd write ourselves. No new vendor. No new accent in the on-call rotation.

The three differ on a lot of axes. Three mattered enough to score against: per-shipment cost at the client's volume, replay-defensibility under Dutch trade-secret law, and who in the building can patch the worker on a Friday night.

Per-shipment cost at 8,400 flows a week

Both Temporal Cloud and Inngest price on actions or step-executions. The exact rate isn't the point of this post; the shape is. At 8,400 flows a week, roughly 437k flows a year, with five to twelve steps each, you are buying somewhere between 2.2 and 5.2 million step-executions per year. Whatever the per-unit rate, multiplying by single-digit millions concentrates the mind.
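A back-of-envelope check, using only the flow and step counts stated in this post. The per-million rate is a made-up placeholder for illustration, not any vendor's price, and the function names are hypothetical.

```typescript
// Volume model. Flow and step counts come from the post; the rate below is
// a placeholder assumption, not a real Temporal Cloud or Inngest price.
const FLOWS_PER_WEEK = 8_400;
const WEEKS_PER_YEAR = 52;

interface Range {
  min: number;
  max: number;
}

const STEPS_PER_FLOW: Range = { min: 5, max: 12 };

// Step-executions bought per year for a given weekly flow count.
function stepExecutionsPerYear(flowsPerWeek: number, steps: Range): Range {
  const flowsPerYear = flowsPerWeek * WEEKS_PER_YEAR;
  return { min: flowsPerYear * steps.min, max: flowsPerYear * steps.max };
}

// Annual spend for a rate quoted per million step-executions.
function annualCost(executions: Range, ratePerMillion: number): Range {
  return {
    min: (executions.min / 1_000_000) * ratePerMillion,
    max: (executions.max / 1_000_000) * ratePerMillion,
  };
}

const perYear = stepExecutionsPerYear(FLOWS_PER_WEEK, STEPS_PER_FLOW);
console.log(perYear); // { min: 2184000, max: 5241600 }

const HYPOTHETICAL_RATE_PER_MILLION = 50; // assumption, for illustration only
console.log(annualCost(perYear, HYPOTHETICAL_RATE_PER_MILLION));
```

Swap in the rate from a real quote and the range tells you what the meter costs before any long-tail effects.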

The BullMQ + Postgres option has effectively zero marginal cost per step beyond Redis memory and a row in the step ledger. The Postgres database was already there. The Redis instance was already there for the existing job queue this client had been running.

That sounds like an obvious win for the hand-rolled approach. It isn't. The right thing to add to the BullMQ bill is the salary cost of the person who owns the workflow primitives. Two days a quarter spent debugging idempotency in a hand-rolled step ledger is real money. So is the day spent reading Temporal's versioning docs the first time you need to change a workflow signature on an in-flight order.

What tipped the cost analysis was the long tail. A retour flow that sits open for thirty days is, in the Temporal and Inngest pricing model, a workflow that periodically wakes up and may incur an action each time. In the BullMQ model, it is a row in Postgres that costs nothing while it sleeps and one delayed job that BullMQ wakes on schedule. At this client's tail length and volume, the long-running flows tilted the per-shipment cost meaningfully toward the DIY option.
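One way to model that tail, as a rough sketch. It assumes every wake-up of a sleeping flow is billed as one unit, which may not match how a given vendor meters timers or polls; the profile fields and example numbers are hypothetical.

```typescript
// Long-tail model: how many metered units a sleeping flow adds.
// All inputs are illustrative assumptions; check the vendor's billing docs
// for what actually counts as a billable action.
interface FlowProfile {
  activeSteps: number;   // steps that do real work
  sleepDays: number;     // days spent waiting on an external event
  wakeUpsPerDay: number; // timers or polls fired while waiting
}

function meteredUnits(flow: FlowProfile): number {
  return flow.activeSteps + flow.sleepDays * flow.wakeUpsPerDay;
}

const quickFlow: FlowProfile = { activeSteps: 5, sleepDays: 0, wakeUpsPerDay: 0 };
const retourFlow: FlowProfile = { activeSteps: 8, sleepDays: 30, wakeUpsPerDay: 1 };

console.log(meteredUnits(quickFlow));  // 5
console.log(meteredUnits(retourFlow)); // 38
```

Under these assumptions a single daily wake-up turns an eight-step flow into thirty-eight billable units, which is the effect worth checking against your own tail length.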

Takeaway

If most of your flows complete in seconds, hosted workflow engines are cheap. If a meaningful slice sleeps for days, model the long tail explicitly before you sign a contract.

Replay defensibility under Dutch trade-secret law

The Wet bescherming bedrijfsgeheimen (Wbb) is the Dutch implementation of EU Directive 2016/943 on the protection of trade secrets. For a fulfilment partner, the protectable assets are concrete: per-brand rate sheets, the COD-to-bank mapping, the consumer return-rate signal that some brands treat as competitively sensitive.

The law requires that the holder has taken redelijke maatregelen, reasonable steps, to keep the information secret. Case law on what counts as reasonable for digital records is still thin in NL, but our reading, and our client's lawyer's reading, is that the location and access path to your event log matter. If a dispute arises and you need to reconstruct what your system did during a specific 23-minute window, two questions follow: can you reproduce it deterministically, and who else has had access to the record?

Temporal's event history is the source of truth and replays deterministically by design. That is a real strength. On Temporal Cloud, however, that history lives in a vendor's storage, with the usual cross-border data flows. Inngest's hosted product has the same shape. Both vendors offer self-hosting, but the operational story for a 26-person company self-hosting Temporal is not trivial: it is Cassandra or PostgreSQL plus the Temporal cluster, plus the workers, plus the version upgrades. Don't pick the self-host path to satisfy a legal requirement and then under-resource it.

The BullMQ + Postgres path keeps the event ledger in the client's own Postgres in their own rack in Amsterdam. The auditor reads one database. The replay code is the worker code. The trade secret never leaves the building.

Who patches the worker at 22:10

This was the deciding question.

The client has two backend engineers and one ops lead. Nobody has shipped Temporal workflows before. Both engineers have written BullMQ workers for the past three years. Postgres is the language they think in.

Friday-night production patching, on every engine, requires three things: understand what state the in-flight work is in, ship a fix without corrupting that state, and replay the dead letters cleanly. On Temporal, the muscle memory you need is workflow versioning: the patching APIs (patched and deprecatePatch in the TypeScript SDK, GetVersion in the Go SDK) and the deterministic replay rules. On Inngest, function versioning is more forgiving but you still need to understand step memoization. On BullMQ + Postgres, the muscle memory is whatever your team already has plus the discipline of an idempotent step handler.

The honest read: if your team already runs Temporal in production elsewhere, the Friday-night story is best on Temporal. If your team writes Postgres queries before breakfast and has never used a workflow engine, the Friday-night story is best on the engine they will not have to learn while the DLQ fills.

What we shipped

We picked BullMQ + Postgres outbox, with a thin step-record pattern borrowed from how Temporal models history. Every step is a row. The worker is idempotent on step-id. Replay is a SQL query plus a re-enqueue.

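// step.ts: minimal Temporal-style step recorder on Postgres
// Assumes ./db exports a postgres.js-style tagged-template client.
import { sql } from './db'

export async function step<T>(
  flowId: string,
  name: string,
  fn: () => Promise<T>,
): Promise<T> {
  // Replay path: if this step already has a recorded result, return it.
  const existing = await sql<{ result: T }[]>`
    select result from flow_step
    where flow_id = ${flowId} and name = ${name}
    limit 1
  `
  if (existing.length) return existing[0].result

  // First run: do the work. If fn throws, no row is written.
  const result = await fn()

  // Record the result. The unique key on (flow_id, name) turns a racing
  // insert from another worker into a no-op.
  await sql`
    insert into flow_step (flow_id, name, result, recorded_at)
    values (${flowId}, ${name}, ${sql.json(result)}, now())
    on conflict (flow_id, name) do nothing
  `
  return result
}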

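Used inside a BullMQ processor, this gives you the one Temporal property that mattered: a step whose result is recorded never runs again for that flow, so a crashed worker resumes from where it stopped. A crash between the work and the insert can still re-run that one step, which is why each handler has to be idempotent. It is forty lines. It is not Temporal. It does not need to be.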
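To see the resume behaviour without a database, here is a small in-memory sketch. The Map stands in for the flow_step table, and the flow, step names, and tariff value are invented for the demo; it illustrates the replay idea only, not the production worker.

```typescript
// In-memory stand-in for the flow_step table, to show replay behaviour only.
type StepFn = <T>(flowId: string, stepName: string, fn: () => Promise<T>) => Promise<T>;

function makeRecorder(): StepFn {
  const ledger = new Map<string, unknown>();
  return async <T>(flowId: string, stepName: string, fn: () => Promise<T>): Promise<T> => {
    const key = `${flowId}:${stepName}`;
    if (ledger.has(key)) return ledger.get(key) as T; // replay: skip the work
    const result = await fn();                        // a throw records nothing
    ledger.set(key, result);
    return result;
  };
}

async function demoReplay(): Promise<{ labelCalls: number; rate: number }> {
  const recordedStep = makeRecorder();
  let labelCalls = 0;
  let rateEndpointHealthy = false;

  const runFlow = async (flowId: string): Promise<number> => {
    await recordedStep(flowId, 'create-label', async () => {
      labelCalls += 1;
      return 'label-1';
    });
    return recordedStep(flowId, 'fetch-rate', async () => {
      if (!rateEndpointHealthy) throw new Error('rate endpoint returned 422');
      return 695; // hypothetical tariff in cents
    });
  };

  await runFlow('flow-1').catch(() => undefined); // first attempt fails at fetch-rate
  rateEndpointHealthy = true;                      // the fix ships
  const rate = await runFlow('flow-1');            // replay skips create-label
  return { labelCalls, rate };
}

demoReplay().then((out) => console.log(out)); // { labelCalls: 1, rate: 695 }
```

The label step runs once across both attempts; only the step that failed is executed on replay.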

The Friday-night PostNL incident, in this design: the rate-call step records its 422 in flow_step. The worker raises and BullMQ pushes the job to DLQ. Monday morning we publish a fix to the rate handler, requeue the DLQ jobs by flow-id, and every step that succeeded last week is skipped on replay. No customer charged twice. No event rewritten. The 23-minute window is one SQL query: select * from flow_step where recorded_at between … order by recorded_at.
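The requeue itself can be sketched against a minimal job shape. This assumes the DLQ is BullMQ's failed set (the post does not say how it is wired) and that each job carries a flowId in its data; BullMQ's Job has a retry() method and Queue has getFailed(), so real jobs should fit the hypothetical FailedJob interface below.

```typescript
// Requeue sketch. FailedJob is a hypothetical structural type; the flowId
// field in job data is an assumption about how the jobs were enqueued.
interface FailedJob {
  data: { flowId: string };
  retry(): Promise<void>;
}

async function requeueFailed(
  failedJobs: FailedJob[],
  flowIds?: Set<string>, // optional filter; omit to requeue everything
): Promise<string[]> {
  const requeued: string[] = [];
  for (const job of failedJobs) {
    if (flowIds && !flowIds.has(job.data.flowId)) continue;
    await job.retry(); // in BullMQ, puts a failed job back in the wait list
    requeued.push(job.data.flowId);
  }
  return requeued;
}
```

With a real queue this would be called as `requeueFailed(await queue.getFailed())`. Because recorded steps are skipped, requeueing a flow twice by accident costs a few ledger lookups rather than a duplicate charge.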

Where the decision would flip

This is a comparison, not a coronation. If the client had been larger, call it 200+ engineers, a real platform team, multiple product lines sharing infra, Temporal would have won. The investment in workflow versioning, the cost of the action meter, and the operational weight of a self-hosted cluster all amortise across more workflows and more engineers.

If the client had wanted a great developer-experience front end and was indifferent to where the event log lived, Inngest would have been the easiest yes. Their local dev UI saves you from writing the observability you would otherwise hand-roll.

For 26 people in Almere, with 8,400 weekly flows, a Dutch trade-secret posture to defend, and a Postgres-first engineering team, the boring answer was the right one.

When we built the order-agent layer for this client, the thing we kept running into was not which engine to pick but how to keep the step ledger small enough that a junior engineer could read it on a Friday night. We ended up trimming the ledger to flow-id, step-name, result, and timestamp (four columns), and pushing everything else into normal application tables. If you have five minutes after reading this, the cheapest audit you can run on your own workflow layer is to count the columns in your equivalent table. If there are more than six, ask why.
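For reference, a four-column ledger could look like the sketch below. The column names follow the step recorder shown earlier; the column types, the primary key, and the index are assumptions, since the post does not give the actual schema.

```typescript
// Hypothetical DDL for the four-column ledger. The primary key doubles as
// the unique constraint that "on conflict (flow_id, name)" relies on.
const FLOW_STEP_DDL = `
create table if not exists flow_step (
  flow_id     text        not null,
  name        text        not null,
  result      jsonb,
  recorded_at timestamptz not null default now(),
  primary key (flow_id, name)
);
create index if not exists flow_step_recorded_at_idx
  on flow_step (recorded_at);
`;

// The row shape a worker reads back.
interface FlowStepRow {
  flow_id: string;
  name: string;
  result: unknown;
  recorded_at: Date;
}

const LEDGER_COLUMNS: (keyof FlowStepRow)[] = ['flow_id', 'name', 'result', 'recorded_at'];
console.log(LEDGER_COLUMNS.length); // 4
```

The index on recorded_at is there for time-window audit queries; drop it if you never query by time.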

Key takeaway

For a 26-person fulfilment team with a Dutch trade-secret posture, the boring BullMQ + Postgres outbox beat Temporal and Inngest on per-shipment cost and Friday-night reality.

FAQ

When does Temporal beat a hand-rolled outbox?

When your team already runs Temporal elsewhere, when you have a platform team sharing it across products, or when long sleeps are rare. The action-meter cost amortises and the workflow versioning muscle pays back.

Is self-hosting Temporal realistic for a 26-person company?

Possible but expensive in attention. You're running Postgres or Cassandra plus the Temporal cluster plus workers, with version upgrades. If it's the right call, budget at least a quarter of an engineer for ops, not zero.

How does the Wbb apply to an event log in a vendor cloud?

It requires redelijke maatregelen, reasonable steps to keep the secret. A vendor cloud isn't automatically out, but you must document access, encryption, and contractual restrictions. Local storage simplifies the audit story.

What is the cheapest thing we can copy from this design?

The step recorder. A flow_step table with flow_id, name, result, recorded_at, plus a step() wrapper that checks for an existing row before running. Forty lines. A step with a recorded result never runs twice for the same flow.
