Durable execution: Inngest vs Trigger.dev vs BullMQ at 1M step-runs

A 31-person Amersfoort agency runs 7,800 client reports a week. We rebuilt the queue on Inngest, Trigger.dev, and BullMQ. Here is what each costs at scale.

Jacob Molkenboer · Founder · A Brand New Company · 11 Jun 2026 · 7 min

It is 23:14 on a Friday in Amersfoort. An account manager pings the studio's #alerts channel: "Did the weekly report for the Volendam ferry client go out? They emailed asking." It did not. The cron job died at 21:40, took down four hundred PDFs with it, and the worker box has been restarting in a loop ever since. This is the kind of night that sends a team shopping for durable execution.

The agency in question: thirty-one people, sixty active retainer clients, around 7,800 client reports a week pulled from Meta, LinkedIn Ads, GA4, Search Console, Mailchimp, and a homegrown PostgreSQL warehouse. Reports get assembled into a PDF, mirrored into Notion, and emailed to the client lead. The first version was a Node script behind cron. It worked at fifty clients. At sixty, the failure modes started owning Friday nights.

We were asked to rebuild the queue on something that retries, deduplicates, and gives the agency's one in-house developer something to look at on Saturday morning that is not a 600-line stack trace. We benchmarked three options end to end: Inngest, Trigger.dev, and a hand-rolled BullMQ stack on Upstash Redis. Here is how they actually compare under load.

The shape of the work

Each weekly report runs roughly twelve steps: source token refresh, six API pulls (paginated, rate-limited), three transform-and-store steps, a Puppeteer render to PDF, a Mailgun handoff. Across 7,800 reports that is around 93,600 step-runs a week, or one million step-runs roughly every eleven weeks. Most steps fail at least occasionally. Meta rate-limits two or three times a day. LinkedIn's marketing API throws 504s in clusters. Tokens expire on Mondays for reasons no one understands.
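
The volume figures follow from simple arithmetic. A minimal sketch using only the numbers quoted in this article (7,800 reports a week, twelve steps each); the constant and function names are ours, for illustration, and retries would push the real count higher:

```typescript
// Back-of-envelope volume check using the figures quoted in this article.
const REPORTS_PER_WEEK = 7_800;
// token refresh + API pulls + transform-and-store + PDF render + email handoff
const STEPS_PER_REPORT = 1 + 6 + 3 + 1 + 1;

function stepRunsPerWeek(reports: number, stepsPerReport: number): number {
  return reports * stepsPerReport;
}

function weeksToReach(target: number, perWeek: number): number {
  return target / perWeek;
}

const weekly = stepRunsPerWeek(REPORTS_PER_WEEK, STEPS_PER_REPORT);
const weeksToMillion = weeksToReach(1_000_000, weekly);

console.log(weekly);                    // 93600
console.log(weeksToMillion.toFixed(1)); // 10.7
```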

Any of the three options handles this volume technically. The question is what happens when steps fail at 23:14 on a Friday.

Retry semantics

Inngest is the most opinionated of the three. Every step is wrapped in step.run(), and Inngest treats each step as an independently retried unit of work whose result is saved once it succeeds. If a step throws, Inngest retries it on its own schedule with exponential backoff. Completed steps are not executed again: their saved outputs feed the steps that follow, so a retry of step 7 does not re-run steps 1 through 6.

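// Inngest: step-level durability is the default
import { Inngest } from "inngest";

// refreshTokens, pullMeta, renderPdf, and sendEmail are the agency's own helpers (not shown)
const inngest = new Inngest({ id: "agency-reports" }); // app id is illustrative

export const weeklyReport = inngest.createFunction(
  { id: "weekly-report", retries: 4 }, // the retry budget applies to each step
  { event: "report/weekly.requested" },
  async ({ event, step }) => {
    // Each step.run result is saved, so a retry resumes after the last completed step
    const tokens = await step.run("refresh-tokens", () =>
      refreshTokens(event.data.clientId)
    );
    const meta = await step.run("pull-meta", () =>
      pullMeta(tokens.meta)
    );
    const pdf = await step.run("render-pdf", () =>
      renderPdf(meta /* plus the other source results */)
    );
    await step.run("send-email", () => sendEmail(pdf));
  }
);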

Trigger.dev v3 reaches the same outcome with a different posture. Tasks are first-class objects, and you compose them by awaiting subtasks. The durable runtime checkpoints the parent run at each wait, such as triggerAndWait(), so a crash mid-run resumes from the last checkpoint rather than the start. Retries are configurable per task, with the same exponential and jitter knobs.

// Trigger.dev v3: each triggerAndWait is the boundary
import { task } from "@trigger.dev/sdk/v3";
// refreshTokens, pullMeta, renderPdf, and sendEmail are sibling tasks defined elsewhere (not shown)

export const weeklyReport = task({
  id: "weekly-report",
  retry: { maxAttempts: 4, factor: 2, minTimeoutInMs: 1000 },
  run: async (payload: { clientId: string }) => {
    // triggerAndWait resolves to a result wrapper; unwrap() returns the output or throws
    const tokens = await refreshTokens.triggerAndWait(payload).unwrap();
    const meta = await pullMeta.triggerAndWait({ token: tokens.meta }).unwrap();
    const pdf = await renderPdf.triggerAndWait({ /* ... */ }).unwrap();
    await sendEmail.triggerAndWait({ pdf }).unwrap();
  },
});

BullMQ leaves the durability up to you. Retries and backoff are first-class on the job options, but step-level checkpointing is not. If your processor function pulls Meta, then writes to Postgres, then crashes before rendering, the retry replays the Meta pull and the Postgres write. You either split each step into its own queue and chain them, or you bake idempotency into every side effect by hand.

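// BullMQ: per-job retry is easy, step-level durability is your problem
import { Queue } from "bullmq";

const connection = { host: "localhost", port: 6379 }; // illustrative; point this at your Redis
const reports = new Queue("weekly-report", { connection });

const clientId = "42";
const weekOf = "2026-06-08"; // stable week key, never a timestamp

await reports.add(
  `client-${clientId}`,
  { clientId, weekOf },
  {
    attempts: 4,
    backoff: { type: "exponential", delay: 2000 },
    // idempotency you own; "_" avoids clashing with BullMQ's ":"-separated Redis keys
    jobId: `weekly-report_${weekOf}_${clientId}`,
  }
);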
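
One way to approximate step-level checkpointing inside a single processor is to save each step's output under a key built from the job id and step name, and to skip any step whose output is already saved. The sketch below is an illustration under stated assumptions, not a BullMQ feature: `StepStore`, `MemoryStepStore`, and `runStep` are hypothetical names, and a real deployment would back the store with Redis or Postgres rather than process memory.

```typescript
// Hypothetical step store: in production this would be Redis or Postgres, not a Map.
interface StepStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

class MemoryStepStore implements StepStore {
  private readonly data = new Map<string, string>();
  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }
  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// Run `fn` once per (jobId, stepName). On a retry, return the saved output instead.
async function runStep<T>(
  store: StepStore,
  jobId: string,
  stepName: string,
  fn: () => Promise<T>,
): Promise<T> {
  const key = `${jobId}/${stepName}`;
  const cached = await store.get(key);
  if (cached !== undefined) return JSON.parse(cached) as T;
  const result = await fn();
  await store.set(key, JSON.stringify(result)); // outputs must be JSON-serializable
  return result;
}

// Simulated processor: the render stage fails on the first attempt only.
async function demo(): Promise<{ pulls: number; attempts: number }> {
  const store = new MemoryStepStore();
  let pulls = 0;
  let attempts = 0;
  const processor = async (): Promise<number> => {
    attempts += 1;
    const rows = await runStep(store, "job-1", "pull-meta", async () => {
      pulls += 1;
      return [{ id: 1 }];
    });
    if (attempts === 1) throw new Error("render crashed");
    return rows.length;
  };
  try {
    await processor();
  } catch {
    // a queue such as BullMQ would schedule the retry here
  }
  await processor(); // the second attempt reuses the saved pull
  return { pulls, attempts };
}
```

A crash between `fn()` finishing and `store.set()` completing still repeats the step, so the side effect inside `fn` has to stay idempotent either way.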

Warning

If you pick BullMQ, your retry story is only as good as your idempotency keys. We have cleaned up two production incidents this year where a "retry" sent the same Mailchimp campaign twice because the dedupe key quietly included a timestamp.
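
A stable key depends only on what identifies the work, never on when the job was enqueued. A minimal sketch, assuming reports are keyed by ISO week in UTC; `isoWeekKey` and `reportJobId` are illustrative names, not part of BullMQ:

```typescript
// Derive "2026-W24"-style keys so every enqueue in the same week produces the same id.
function isoWeekKey(date: Date): string {
  // Copy the calendar date in UTC, then move to the Thursday of its ISO week.
  // The Thursday decides which ISO year and week the date belongs to.
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const day = d.getUTCDay() || 7; // Monday = 1 ... Sunday = 7
  d.setUTCDate(d.getUTCDate() + 4 - day);
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return `${d.getUTCFullYear()}-W${String(week).padStart(2, "0")}`;
}

function reportJobId(clientId: string, runDate: Date): string {
  // No timestamp in the key. "_" is used because BullMQ separates its Redis keys with ":".
  return `weekly-report_${isoWeekKey(runDate)}_${clientId}`;
}

// A Friday-night enqueue and a Saturday-morning re-enqueue map to the same id.
const fridayNight = reportJobId("42", new Date("2026-06-12T21:40:00Z"));
const saturdayMorning = reportJobId("42", new Date("2026-06-13T09:00:00Z"));
console.log(fridayNight);                     // weekly-report_2026-W24_42
console.log(fridayNight === saturdayMorning); // true
```

Passing this value as `jobId` means a second enqueue for the same client and week is a no-op for as long as the first job still exists in the queue.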

Debugging at 23:00

This is the line item that decides most adoption stories, and it is the one buyers underweight in spreadsheets. Failures will happen. When they do, someone needs a screen that shows what went wrong, with the right inputs to replay.

Inngest's dashboard shows every step's input, output, and timing, with a single "rerun this step" button. For an in-house developer who did not write the original code, this matters. They open one URL, see the failed step, look at the payload, fix the function, and rerun the step from the dashboard without re-triggering the whole report.

Trigger.dev's run view is similar in spirit, with a cleaner timeline and an embedded log per task. Replay is per-run, not per-step. For our pipeline that is mostly fine: the steps are cheap to re-execute given idempotent writes.

BullMQ ships no dashboard of its own. Bull Board gives you a respectable queue UI for free, but you are still parsing log lines on the worker box to see why a job failed. If you run on Kubernetes with a strong Grafana setup, that is fine. If you have one developer and a Hetzner box, it is not.

The bill at one million step-runs

Pricing is the part of these comparisons that ages worst, so anchor to the vendor pages, not this paragraph. As of June 2026:

Inngest charges per step-run above a generous free tier, with paid plans that scale with concurrency. The Inngest pricing page is the source of truth.
Trigger.dev charges per run and per compute-second on cloud, and zero in cash if you self-host on your own Postgres and worker fleet. See their pricing page.
BullMQ itself is free. Your bill is Redis (Upstash, Redis Cloud, or self-hosted) plus a worker box. For this agency that came to a small managed Redis plan and a modest VM.

At one million step-runs, the hand-rolled stack is the cheapest by a wide margin. The trap is that you also bought a pager. The Inngest and Trigger.dev bills look high in isolation. They look reasonable next to a senior developer's evening rate.

What we picked and why

For this agency we picked Trigger.dev. Three reasons. The in-house developer reads TypeScript faster than he reads dashboards, and the await-style API mapped cleanly onto the existing script with very few rewrites. The self-host escape hatch made the operations lead comfortable signing the renewal. And the checkpoint behaviour matched the failure modes we actually saw: half a report generated, then a Meta 504, then a clean resume.

Inngest would have been a defensible second choice. If the workload had been smaller or dominated by AI agent steps rather than reporting steps, we would have flipped to it for the per-step retry ergonomics. BullMQ would have been right if the agency had two infra people instead of one developer.

One adjacent note worth flagging: durable execution is also what separates a hobby AI agent from a production one. The recent stories about agents misbehaving on developer laptops and inside Linux distros read, in part, as stories about tools with no idempotency and no replay. Whether you are running reports or agents, the boring queueing primitives are the safety net.

The five-minute audit

When we built this reporting pipeline for the Amersfoort agency, the thing we ran into late was not retries. It was that two of the API pulls returned non-deterministic ordering, which broke a downstream diff and made the dedupe pointless. We solved it by sorting at the boundary and persisting a content hash per row, so "did anything change this week" became a one-line comparison. The lesson generalises: pick a durable runner, then spend the saved hours making your side effects idempotent. If you want a hand thinking through which runner fits your team, that is the kind of process automation work we do.
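
The sort-and-hash fix can be sketched as follows. This is an illustration under assumptions rather than the agency's actual code: the `MetricRow` shape, the sort keys, and the function names are hypothetical, and SHA-256 is simply one reasonable choice of digest.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape; the real warehouse schema is not shown in this article.
interface MetricRow {
  campaignId: string;
  date: string; // ISO date, e.g. "2026-06-08"
  clicks: number;
}

// Plain code-unit comparison, so ordering does not depend on the machine's locale.
const cmp = (x: string, y: string): number => (x < y ? -1 : x > y ? 1 : 0);

// Sort at the boundary so API ordering cannot leak into anything downstream.
function normalize(rows: MetricRow[]): MetricRow[] {
  return [...rows].sort(
    (a, b) => cmp(a.campaignId, b.campaignId) || cmp(a.date, b.date),
  );
}

// Hash the fields in a fixed order, so object key order cannot change the digest.
function rowHash(row: MetricRow): string {
  const canonical = JSON.stringify([row.campaignId, row.date, row.clicks]);
  return createHash("sha256").update(canonical).digest("hex");
}

// One digest per pull: equal digests mean nothing changed.
function pullHash(rows: MetricRow[]): string {
  const joined = normalize(rows).map(rowHash).join("\n");
  return createHash("sha256").update(joined).digest("hex");
}

const rowA: MetricRow = { campaignId: "a", date: "2026-06-08", clicks: 5 };
const rowB: MetricRow = { campaignId: "b", date: "2026-06-08", clicks: 3 };

// The same rows in a different order produce the same digest.
console.log(pullHash([rowA, rowB]) === pullHash([rowB, rowA])); // true
```

Storing `rowHash` next to each row lets an upsert skip unchanged rows, and comparing two `pullHash` values is the one-line "did anything change this week" check.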

If you are evaluating this right now, the five-minute test is simple: open one of your existing failing jobs, count the lines of code between the failure and the rerun, and ask whether that number gets bigger or smaller as the team grows. The right runner is the one that shrinks it.

Key takeaway

Pick the durable runner that shrinks the distance between a failure and a one-click rerun. Cost is the easy axis. On-call hours are the hard one.

FAQ

Do I need durable execution if my cron jobs work today?

If your jobs are idempotent, low-volume, and tolerable to rerun by hand, no. If you have multi-step runs, partial failures, or hundreds of downstream emails per job, yes.

Can I self-host Trigger.dev or Inngest?

Trigger.dev has a documented self-host path on Postgres plus your own worker fleet. Inngest is primarily a hosted product; it also documents a self-hosting option, alongside the local dev server for development and testing.

Why not Temporal or Apache Burr?

Temporal is excellent for long-running, complex workflows but heavier to operate. Burr is aimed at AI agent state machines. For twelve-step reporting jobs, the three we compared are right-sized.

How do I make BullMQ retries safe?

Design every side effect to be idempotent: stable jobId keys without timestamps, conditional upserts in Postgres, dedupe headers on outbound email, and chained queues instead of one fat processor.
