Workflow orchestration at 240k/day: n8n, Windmill, Temporal

A Utrecht SaaS team asked us to pick the orchestration backbone for 240,000 webhook events a day. We benched n8n, Windmill, and Temporal on the same hardware.

Jacob Molkenboer · Founder · A Brand New Company · 8 Jun 2026 · 7 min

The brief landed on a Tuesday. A 34-person SaaS in Utrecht, post Series A, sells a logistics integration product. Their pipeline catches roughly 240,000 webhook events a day from carriers, marketplaces, and customer ERPs. The two engineers on the platform team were burning out on Cloudflare Workers and queue glue code, and asked us to pick the orchestration backbone they would live with for the next three years.

They had a shortlist: n8n self-hosted, Windmill, Temporal. We benched all three on the same hardware, with the same realistic load, then ran a chaos test that the marketing pages never mention. Here is what the numbers actually looked like.

The workload

240,000 inbound events per day. Peaks around 9 events per second sustained, bursts to 80 per second when a large carrier batches releases at 02:00 UTC. Each event triggers a chain of 4 to 9 steps: validate signature, enrich from internal Postgres, dedupe against Redis, write to the warehouse, fan out to one to three downstream APIs. Latency budget end-to-end: 800ms p95, otherwise the carrier retries and we double-process.
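To put those rates in proportion, the arithmetic can be sketched as below. It assumes the daily total is spread evenly over an 86,400-second day, and rate_summary is a made-up helper name, not part of any of the tools discussed.

```shell
#!/usr/bin/env bash
# Sketch: compare the daily average event rate with the sustained peak and
# the burst. Inputs are the figures quoted above; rate_summary is a
# hypothetical helper name used only for this illustration.

rate_summary() {
  local per_day="$1" peak="$2" burst="$3"
  awk -v d="$per_day" -v p="$peak" -v b="$burst" 'BEGIN {
    avg = d / 86400                      # events per second, averaged over a day
    printf "avg=%.2f/s peak=%.1fx burst=%.1fx\n", avg, p / avg, b / avg
  }'
}

rate_summary 240000 9 80
```

With the figures above this prints avg=2.78/s peak=3.2x burst=28.8x. The 02:00 UTC burst is close to 29 times the daily average, which is why the bench replays real traffic instead of a flat rate.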

Hardware for the bench: one Hetzner AX52 (Ryzen 7 7700, 64GB RAM, NVMe). Postgres 16 on the same box. Same network, same load generator (a vegeta script replaying 24 hours of anonymised production traffic).

n8n self-hosted

n8n is the obvious starting point. Node.js, visual editor, hundreds of community nodes. We deployed it in queue mode with two worker containers, Postgres for the execution store, Redis BullMQ for the broker. Documentation is decent if you read it twice.

The good part: a non-engineer ops person can read the workflow. We had a finance lead modify a retry branch herself, which is the kind of thing that matters more than throughput in a 34-person company.

The not-good part showed up around 60 events per second sustained. The queue happily accepted work. Execution started backing up. Workers were CPU-bound on signature validation, which n8n runs inside the same Node process as the workflow engine. p95 step latency drifted from 110ms at idle to 480ms under burst.
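For readers who have not written one, here is a minimal sketch of what a signature validation step does. It is not the client's code. It assumes the sender signs the raw request body with HMAC-SHA256 under a shared secret and sends the digest as lowercase hex; the secret, the payload, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify an HMAC-SHA256 webhook signature with openssl.
# Assumptions (not from the article): the sender signs the raw body with a
# shared secret and transmits the digest as lowercase hex.

# Print the hex HMAC-SHA256 of a body under a secret.
# openssl prefixes the digest with a label, so keep only the last field.
hmac_hex() {
  local secret="$1" body="$2"
  printf '%s' "$body" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}'
}

# Return 0 when the supplied signature matches the computed one.
verify_signature() {
  local secret="$1" body="$2" given="$3"
  [ "$(hmac_hex "$secret" "$body")" = "$given" ]
}

body='{"id":"1","payload":"x"}'
sig=$(hmac_hex "demo-secret" "$body")    # what a sender would attach
if verify_signature "demo-secret" "$body" "$sig"; then echo "valid"; fi
if ! verify_signature "demo-secret" "${body}tampered" "$sig"; then echo "rejected"; fi
```

Plain string equality keeps the sketch short; a production check would use a constant-time comparison. The point for the bench is that the hash runs over every payload, so it competes for CPU with whatever else shares the process.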

Cold start of a single workflow with three steps, measured from POST /webhook to first node executing: between 80 and 220ms depending on whether the execution data was on warm pages. Not slow. Not free either, multiplied by roughly 7 million invocations a month (240,000 a day).
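The cumulative cost can be estimated with a few lines of arithmetic. This sketch assumes 240,000 events a day, a 30-day month, and that every invocation pays the full cold-start figure, which overstates it for warm paths. cold_start_hours is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch: cumulative cold-start time per month for a per-invocation cost.
# Assumes 240,000 events a day and a 30-day month; cold_start_hours is a
# hypothetical helper name used only for this illustration.

cold_start_hours() {
  local per_day="$1" days="$2" ms="$3"
  awk -v d="$per_day" -v n="$days" -v ms="$ms" 'BEGIN {
    printf "%.0f\n", d * n * ms / 1000 / 3600   # total milliseconds -> hours
  }'
}

echo "low:  $(cold_start_hours 240000 30 80) h"
echo "high: $(cold_start_hours 240000 30 220) h"
```

Under those assumptions the range is about 160 to 440 hours of accumulated startup latency a month, spread across workers and never billed in one visible place.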

Windmill

Windmill is the newer contender. Rust core, TypeScript and Python workers, built by an ex-Palantir team in Paris. It feels like someone took n8n's editor and rebuilt the runtime to be honest about cold starts.

We ran Windmill with six worker processes against the same Postgres instance. The signature validation step we wrote in Deno; the enrichment step in a Python script that imported httpx. Both reused warm worker pools.

Cold start, same three-step workflow: 14 to 40ms. The Rust scheduler does not load a JavaScript interpreter to decide what to run, which sounds obvious but explains the gap. At 80 per second burst the box stayed under 35% CPU and p95 latency held at 190ms.

Two friction points we hit, neither documented. First, the shared Python worker pool is great until two scripts pin different httpx versions. We had to split pools per script family, which Windmill supports but the UI does not surface clearly. Second, workflow versioning is per-flow, not per-step. If you change a step that 12 flows reference, you get 12 new flow versions and the audit log gets noisy. The Windmill team has been responsive on their GitHub issue tracker, which counts for a lot when you are betting a platform on a tool that is still on a one-digit version.

Temporal

Temporal is the heavyweight. It is not really a competitor to the other two; it is a different shape of tool. You write workflows as code, deterministically, and the Temporal server records every event, so a worker can crash mid-workflow and a replacement can resume from the last recorded event.

We ran Temporal with a Postgres persistence backend, officially supported since 1.20. One worker pool in Go, one in TypeScript.

Cold start is the wrong question for Temporal. Workers stay warm and poll. What matters is worker startup time when you deploy: registering activities and workflows took 1.8 to 3.2 seconds in our setup, which is the window where a rolling deploy can drop events if you do not drain properly.

Throughput was fine. Postgres became the bottleneck around 140 events per second, which is well above the customer's peak. The production checklist in their docs is unusually frank about what breaks at scale.

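The cost is in the developer model. A workflow function cannot depend on wall-clock time, randomness, or direct network I/O. In the Go SDK that means workflow.Now() instead of time.Now(); the TypeScript SDK swaps Date.now() and Math.random() for deterministic versions inside the workflow sandbox. Anything with a side effect goes through activities. The platform team picked this up in a week. Anyone outside the platform team will not touch it. That is fine if you accept it up front.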

Cold starts the docs skip

Takeaway

Cold start is not a single number. Measure time-to-first-step, time-to-first-log-line, and time-to-completion. The three diverge wildly under load.

Here is the measurement script we used. Adapt it to your tool of choice.

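#!/usr/bin/env bash
# Measure request-to-response latency over 500 sequential invocations.
# Sequential calls isolate per-invocation overhead; they do not model a burst.
# Usage: ./bench.sh https://orchestrator.local/webhook/test
# Requires curl and GNU date (for the %3N millisecond format).

URL="${1:?usage: $0 <webhook-url>}"
TMP=$(mktemp)
trap 'rm -f "$TMP"' EXIT   # remove the samples file on exit

for i in $(seq 1 500); do
  start=$(date +%s%3N)     # milliseconds since the epoch
  curl -sS -o /dev/null \
    -H "Content-Type: application/json" \
    -d "{\"id\":\"$i\",\"payload\":\"x\"}" "$URL"
  end=$(date +%s%3N)
  echo "$((end - start))" >> "$TMP"
done

# Nearest-rank percentiles over the sorted samples.
sort -n "$TMP" | awk '
  { a[NR]=$1 }
  END {
    print "p50:", a[int(NR*0.5)]"ms"
    print "p95:", a[int(NR*0.95)]"ms"
    print "p99:", a[int(NR*0.99)]"ms"
  }'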

What we observed across 500 sequential cold invocations on the bench box:

  • n8n: p50 92ms, p95 210ms, p99 380ms
  • Windmill: p50 18ms, p95 41ms, p99 78ms
  • Temporal: p50 24ms, p95 55ms, p99 95ms (workers warm)

These are our numbers on our hardware. Yours will differ. The shape will not.

Replay, the part that bites at 3am

Cold start is a marketing number. Replay is an outage number.

We simulated a worker crash mid-flow on each platform, then asked the question that matters: can the next worker pick up where the dead one left off, without re-running side-effects that already happened?

  • n8n stores execution data per node. Replay restarts from the last completed node. If your node has a side-effect that already fired (a Stripe charge, a Slack post), you re-fire it. You can guard with idempotency keys in the called services, and you must.
  • Windmill uses a similar model, with optional per-step retries and a resume-from-step admin button. Same caveat as n8n. The UI is better about showing you what already ran.
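  • Temporal does deterministic replay; it is the whole point. The worker reconstructs state from the event history and skips activities whose results are already recorded. The Stripe charge does not re-fire because Temporal knows it already returned. An activity that died before its result was recorded is still retried, so activities should be idempotent as well. Skipping completed work on replay is the one thing the other two cannot match by design.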

If your workflows touch money, inventory, or anything with downstream legal weight, this difference matters more than any cold-start number. If your workflows are fan-out enrichment that is safely idempotent, it matters less.
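The idempotency-key guard mentioned for n8n and Windmill can be sketched in a few lines. This is an illustration of the idea, not how either tool implements anything: a local directory stands in for a shared store such as Redis or Postgres, and run_once, charge, and the key format are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run a side-effect at most once per idempotency key.
# Assumptions: a local directory stands in for a shared store such as Redis
# or Postgres; run_once, charge, and the key format are made up.

STORE="${STORE:-$(mktemp -d)}"

# run_once KEY CMD [ARGS...]
# mkdir is atomic, so only the first caller for a key claims it and runs CMD.
run_once() {
  local key="$1"; shift
  if mkdir "$STORE/$key" 2>/dev/null; then
    "$@"
  else
    echo "skip: $key already processed"
  fi
}

charge() { echo "charging order $1"; }

run_once "charge-order-1001" charge 1001   # first delivery: runs the charge
run_once "charge-order-1001" charge 1001   # replayed delivery: skipped
```

The claim is taken before the command runs, so a crash between the claim and the side-effect leaves the key marked as done while nothing was charged. Claiming after the command flips the failure mode to a possible double-fire. A guard like this forces you to pick one, which is part of why it is a backstop and not a replay model.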

Warning

Idempotency keys in downstream APIs are not a replacement for replay safety. They are a backstop. We have seen four-figure double-charges from teams who assumed otherwise.

What we picked

For the Utrecht team, we shipped Windmill. The decision came down to three things: cold-start headroom for 5x growth, a runtime the platform team could read (Rust plus TypeScript, not Go plus event histories), and a flow editor one of the ops staff could keep using without learning workflow-as-code discipline.

We kept Temporal in the back pocket for the billing pipeline, which they are migrating in Q3. Two engines, one orchestration boundary, no apology needed.

The agent-shaped work the HN front page is talking about this week (agent runtimes, Codex-driven engineering) will land here too. Webhook orchestration today is agent orchestration tomorrow. Same primitives, longer-lived state, same need for a runtime that can replay a failed step without rebuilding the substrate in 18 months.

The smallest thing you can do today

Run the bench script above against your current orchestrator with 500 sequential invocations during a quiet hour. If p99 is more than 5x your p50, you have a cold-start tax you are paying every day and not measuring. That is enough information to decide whether the rest of this post is worth a Friday afternoon.

When we built the process automation layer for this client, the thing we ran into was not throughput. It was that the visual editor non-engineers loved on n8n hid the replay model from them, and they only noticed when a Stripe webhook double-fired during a maintenance window. We solved it by routing money-touching flows through Temporal and keeping the rest on Windmill.
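The p99-versus-p50 check from the start of this section can be automated. The sketch below assumes a file with one latency in milliseconds per line, which is the format the bench script writes to its temporary file; tail_ratio is a made-up helper name and the sample data is synthetic.

```shell
#!/usr/bin/env bash
# Sketch: flag a cold-start tail when p99 exceeds 5x p50.
# Input: a file with one latency in milliseconds per line.
# tail_ratio is a hypothetical helper name used only for this illustration.

tail_ratio() {
  sort -n "$1" | awk '
    { a[NR] = $1 }
    END {
      if (NR == 0) { print "no samples"; exit 1 }
      i50 = int(NR * 0.5);  if (i50 < 1) i50 = 1   # nearest-rank indexes
      i99 = int(NR * 0.99); if (i99 < 1) i99 = 1
      ratio = a[i99] / a[i50]
      verdict = (ratio > 5) ? "tail" : "ok"
      printf "p50=%dms p99=%dms ratio=%.1f %s\n", a[i50], a[i99], ratio, verdict
    }'
}

samples=$(mktemp)
seq 1 100 > "$samples"        # synthetic latencies of 1..100 ms
tail_ratio "$samples"
rm -f "$samples"
```

To use it on real data, have the bench script keep its samples file and pass that path to tail_ratio. The 5x threshold is the rule of thumb from this section, not a property of any of the three tools.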

Key takeaway

Cold start is a marketing number. Replay is an outage number. Pick the orchestrator whose replay model matches your worst flow.

FAQ

Should I pick n8n if my team is not full of engineers?

Yes, if your event volume is under roughly 30 per second sustained and your flows do not touch money. The visual editor is its real strength and will save weeks of work.

Can Temporal replace n8n or Windmill?

Not for non-engineers. Temporal workflows are code with strict determinism rules. It complements the others for high-value flows; it does not replace them.

Is Windmill production-ready for serious workloads?

Yes for orchestration up to a few hundred events per second on commodity hardware. Watch for dependency conflicts in shared Python worker pools, and for per-flow versioning, which creates a new version of every flow that references a changed step.

What is the biggest cold-start gotcha?

Measure p99 under burst, not p50 at idle. The gap between them is the tax you pay during your busiest hour, and most vendor docs only quote the friendly number.

Tags: automation, process automation, workflow, architecture, tooling, integrations
