
Automation

Durable execution for legal AI: Temporal, Inngest, or BullMQ

The pager went off at 03:14. A worker pod handling the overnight contract-review queue had OOMed. Three durable execution stacks, one audit clock. Which survives the morning?

Jacob Molkenboer · Founder · A Brand New Company · 31 Oct 2025 · 8 min

The pager went off at 03:14 on a Tuesday. A worker pod handling the overnight contract-review queue for a 27-person legal-services firm in Den Haag had OOMed mid-clause. The partner who would walk in at 08:30 expected 380 dossiers triaged and a defensible log of every clause the agent had flagged for human review. The on-call engineer, who was also the only engineer, had to decide: replay the queue from where it died, or accept that 41 dossiers were in an unknown state.

This is the question durable execution is supposed to answer. The agent layer is interesting, but the layer underneath is the one that survives an AVG audit and lets the engineer sleep: the layer that can say "this workflow ran, here is every input and every output, and I can prove it ran exactly once."

We have shipped this layer three different ways in the last eighteen months. Here is what each costs, what each owes you in evidence, and who you end up calling at 03:00.

The shape of the workload

Before the comparison, the workload. The firm processes around 380 dossier-runs per business day, roughly 1,920 per week. Each run looks like this:

  1. Ingest dossier (typically a PDF, sometimes a stack)
  2. Chunk and route to a specialist model
  3. Extract clauses, classify risk
  4. If risk crosses threshold, escalate to a partner with a one-paragraph summary
  5. Write the audit record

Steps 1 to 4 take between 18 seconds and 9 minutes. Step 5 is the only one a regulator cares about. The AVG (Algemene verordening gegevensbescherming, the Dutch name for the GDPR) does not care that you used an LLM. It cares that you can show, on demand, what data was processed, what decision was made, and that you could reproduce the decision if challenged. The Autoriteit Persoonsgegevens has been clear that automated decision-making in client matters needs a clean evidentiary trail.

So the bar is not "did it run." The bar is "can you replay it fourteen months from now."
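The three things the AVG asks for translate into fields you can check before a record is ever written. A minimal sketch, assuming a hypothetical AuditRecord shape of our own (none of the three tools defines one), with a helper that reports which of the three questions a record cannot answer:

```typescript
// Hypothetical audit record for one dossier-run; the field names are ours.
interface AuditRecord {
  dossierId: string;
  inputSha256: string;        // which data was processed (hash of the ingested file)
  flaggedClauseIds: string[]; // which decision was made
  escalated: boolean;
  modelVersion: string;       // needed to reproduce the decision later
  recordedAt: string;         // ISO-8601 timestamp
}

// Lists which of the three evidentiary questions a record cannot answer.
function missingEvidence(r: Partial<AuditRecord>): string[] {
  const gaps: string[] = [];
  if (!r.dossierId || !r.inputSha256) gaps.push("data processed");
  if (r.flaggedClauseIds === undefined || r.escalated === undefined) gaps.push("decision made");
  if (!r.modelVersion || r.modelVersion === "latest") gaps.push("reproducible");
  return gaps;
}
```

A record that names latest as its model fails the third check, which is the failure mode the rest of this piece keeps returning to.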

Option 1: Temporal

Temporal is the heavyweight. Workflows are ordinary code, every activity result is recorded in an event history, and on recovery the workflow is replayed against that history, so an activity that has completed stays completed, even across deploys (as long as the new code is compatible with the recorded history). If your worker dies, a new one replays the history and picks up the workflow at the next step.

import { proxyActivities } from '@temporalio/workflow';
import type * as activities from './activities';

const { extractClauses, classifyRisk, escalateToPartner, writeAuditRecord } =
  proxyActivities<typeof activities>({
    startToCloseTimeout: '10 minutes',
    retry: { maximumAttempts: 3 },
  });

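export async function reviewDossier(dossierId: string): Promise<void> {
  // Each activity call is recorded in the event history; on recovery,
  // completed calls return their recorded result instead of running again.
  const clauses = await extractClauses(dossierId);
  const scored  = await classifyRisk(clauses);

  const high = scored.filter(clause => clause.risk >= 0.7);
  for (const clause of high) {
    await escalateToPartner(dossierId, clause);
  }

  await writeAuditRecord({ dossierId, scored, ts: clauses.processedAt });
}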

What you pay. Temporal Cloud charges per action, where an action is, roughly, a billable operation such as starting a workflow or running an activity. Our firm's workflow records roughly 22 actions per dossier. Counting four weeks to the month, 1,920 dossiers per week is about 169,000 actions per month. At current published rates that is small in raw action terms, but storage and active-workflow charges push the real Temporal Cloud bill toward €200 to €400 per month at this scale. Self-hosting on a small Kubernetes cluster trades the licence for engineer time. The Temporal pricing docs are worth reading slowly before you size it.
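Per-operation pricing is easy to misjudge depending on whether a month is counted as four weeks or as 52/12 weeks, so the arithmetic is worth spelling out. A sketch; monthlyOps is a hypothetical helper, and the inputs (22 actions per run, 1,920 runs per week) come from the paragraph above:

```typescript
// Back-of-envelope sizing for per-operation pricing (actions, steps, ...).
// The default of 4 weeks per month matches the figures in this article;
// pass 52 / 12 for the calendar-average month.
function monthlyOps(runsPerWeek: number, opsPerRun: number, weeksPerMonth = 4): number {
  return runsPerWeek * opsPerRun * weeksPerMonth;
}

const temporalActions = monthlyOps(1_920, 22);                          // 4-week month
const temporalActionsCalendar = Math.round(monthlyOps(1_920, 22, 52 / 12)); // calendar month
```

The four-week convention gives 168,960 actions; the calendar-average month gives 183,040, about 8 percent more. The same helper covers per-step pricing: 1,920 runs at 8 to 10 steps each gives 61,440 to 76,800 steps a month.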

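What you owe the regulator. Temporal's event history is the replay log. Every activity input and result is recorded and queryable, along with the outcome and attempt count of each retried activity. A workflow that ran on 12 April 2026 can be replayed on 30 September 2027 against the same workflow code and will reproduce the same decisions from the recorded activity results, provided you have archived or exported that history beyond the namespace's retention period. Re-running the activities themselves and getting the same answer additionally requires pinned model versions. That combination is exactly what an AVG auditor wants.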

Who you call at 03:00. You. Temporal's failure modes are deep. A worker OOMs, the replacement comes up on a newer build, and you have to recognise that a Workflow Task failure carrying a non-determinism error means the new worker code no longer matches the workflow's recorded history. The error messages are fine, but they assume the reader has put two quiet weeks into the docs. If nobody on your team has, the runbook is "page the consultant."
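To make "replayed on every recovery" and the non-determinism error concrete, here is a toy model of history-based replay in plain TypeScript. It is our own simplification, not Temporal's SDK or its actual algorithm; ReplayContext, reviewToy and driftedToy are hypothetical names. Completed steps return their recorded result when the workflow function runs again from the top, and a step whose name does not match the history is rejected, which is roughly what a redeployed worker runs into when its code has drifted from a recorded workflow:

```typescript
// A toy model of history-based replay, written for this article. It is NOT
// Temporal's SDK or its real algorithm, only the idea behind it.
type HistoryEvent = { step: string; result: unknown };

class NonDeterminismError extends Error {}

class ReplayContext {
  private cursor = 0;
  constructor(readonly history: HistoryEvent[]) {}

  // If the history already holds a result for this position, return it
  // without calling fn; if the recorded step name differs, the code no longer
  // matches the history and we refuse to continue.
  async step<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const recorded = this.history[this.cursor];
    this.cursor++;
    if (recorded !== undefined) {
      if (recorded.step !== name) {
        throw new NonDeterminismError(
          `non-determinism: history has "${recorded.step}" where code issued "${name}"`,
        );
      }
      return recorded.result as T;
    }
    const result = await fn();                 // new work: execute, then record
    this.history.push({ step: name, result });
    return result;
  }
}

// Usage: a three-step workflow whose first step counts real executions.
let extractCalls = 0;

async function reviewToy(ctx: ReplayContext): Promise<string[]> {
  const clauses = await ctx.step("extract", async () => {
    extractCalls++;
    return ["c1", "c2"];
  });
  const flagged = await ctx.step("classify", async () => clauses.filter((c) => c === "c2"));
  await ctx.step("audit", async () => true);
  return flagged;
}

// A "redeployed" version that inserts a new first step: replaying an old
// history against it must fail.
async function driftedToy(ctx: ReplayContext): Promise<void> {
  await ctx.step("ocr", async () => "text");
}
```

Seeding the history with only the extract result simulates a worker that died after step one: the re-run skips extraction, executes the rest, and appends two new events.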

Option 2: Inngest

Inngest is the middleweight. It is event-driven, runs functions in steps, and persists every step output. It feels more like writing serverless functions with magic retries than writing workflow code.

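import { inngest } from './client';
// Same activity functions as the Temporal example (module path assumed).
import { extractClauses, classifyRisk, escalateToPartner, writeAuditRecord } from './activities';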

export const reviewDossier = inngest.createFunction(
  { id: 'review-dossier', retries: 3 },
  { event: 'dossier/uploaded' },
  async ({ event, step }) => {
    const clauses = await step.run('extract',  () => extractClauses(event.data.id));
    const scored  = await step.run('classify', () => classifyRisk(clauses));

    const high = scored.filter(c => c.risk >= 0.7);
    if (high.length) {
      await step.run('escalate', () => escalateToPartner(event.data.id, high));
    }

    await step.run('audit', () => writeAuditRecord({
      dossierId: event.data.id, scored, modelVersion: event.data.modelVersion,
    }));
  },
);

What you pay. Inngest charges per step. With eight to ten logical steps per dossier, 1,920 runs per week (about 7,700 per month) lands at roughly 60,000 to 80,000 steps per month. The Pro plan covers most of it, and the real bill is €50 to €80 per month at this scale (check current pricing). Self-hosting is possible but not the default path.

What you owe the regulator. Step outputs are persisted and queryable from the dashboard, with retention you configure. Replay works for individual runs. The honest caveat: "replay" in Inngest means re-running the function against your code, not bit-for-bit deterministic playback of the original execution. For most AVG cases this is enough, because the legal question is "can you show what happened and produce the same answer," not "are these the same bytes." Pin your models, or you will not pass the second half of that question.
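The difference between "same answer" and "same bytes" can be made explicit. A minimal sketch, assuming a hypothetical Decision record pulled from the original run's persisted step outputs and from the re-run; sameDecision is our own helper, not an Inngest API:

```typescript
// Hypothetical decision extracted from a run's persisted step outputs.
interface Decision {
  flaggedClauseIds: string[];
  escalated: boolean;
  modelVersion: string;
  producedAt: string; // differs between the original run and any re-run
}

// True when a re-run reached the same legally relevant outcome: same pinned
// model, same escalation, same set of flagged clauses in any order.
// Timestamps and output formatting are deliberately ignored.
function sameDecision(original: Decision, rerun: Decision): boolean {
  if (original.modelVersion !== rerun.modelVersion) return false; // not comparable
  const a = [...original.flaggedClauseIds].sort();
  const b = [...rerun.flaggedClauseIds].sort();
  return (
    original.escalated === rerun.escalated &&
    a.length === b.length &&
    a.every((id, i) => id === b[i])
  );
}
```

The model-version check comes first on purpose: a re-run on a different model is not evidence about the original decision, whatever it outputs.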

Who you call at 03:00. Often nobody. The dashboard shows the function failed, the step it failed at, the input it failed on, and gives you a replay button. The OOM scenario is usually a non-event: the function is paused, restarted on a healthy worker, and picks up at the failed step.

Option 3: BullMQ + Redis

The hand-rolled option. BullMQ is a mature Node job queue on Redis, fast and free. You write the durability layer yourself.

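import { Queue, Worker } from 'bullmq';
// Same activity functions as the other examples (module path assumed).
import { extractClauses, classifyRisk, escalateToPartner, writeAuditRecord } from './activities';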

const connection = { host: 'redis', port: 6379 };
export const reviewQueue = new Queue('dossier-review', { connection });

new Worker('dossier-review', async (job) => {
  const { dossierId } = job.data;

  await job.updateProgress({ step: 'extract' });
  const clauses = await extractClauses(dossierId);

  await job.updateProgress({ step: 'classify' });
  const scored = await classifyRisk(clauses);

  const high = scored.filter(c => c.risk >= 0.7);
  if (high.length) {
    await job.updateProgress({ step: 'escalate' });
    await escalateToPartner(dossierId, high);
  }

  await writeAuditRecord({ dossierId, scored, jobId: job.id });
}, { connection, concurrency: 8 });

What you pay. A managed Redis instance from Upstash or Redis Cloud, sized for the job queue and a short completed-job history rather than the full fourteen-month audit archive, costs €30 to €80 per month. Worker compute is whatever you were spending anyway. The cash bill is the smallest of the three.

What you owe the regulator. This is where BullMQ gets expensive in a different currency. BullMQ's job records live in Redis and are normally pruned with the removeOnComplete and removeOnFail job options so memory does not grow without bound; they are not an audit log. AVG-defensible logging means you write your own append-only audit table (Postgres with WAL retention, or S3 with object lock) and you write to it transactionally with every step. If the worker dies between step 3 and the audit write, you have to detect it and reconcile. That is a project, not a config flag.
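"Detect it and reconcile" usually means a periodic job that compares what the worker said it was doing with what actually reached the audit table. A sketch under stated assumptions: the progress snapshots come from the job.updateProgress({ step }) calls in the worker above (read back however you store them), the audit table has been queried into a set of dossier IDs, and reconcile plus its 15-minute grace period are hypothetical:

```typescript
// Hypothetical snapshot of the last progress step each job reported, as set by
// the worker's job.updateProgress({ step }) calls.
interface JobProgress {
  dossierId: string;
  lastStep: "extract" | "classify" | "escalate";
  updatedAt: number; // epoch milliseconds of the last progress update
}

// Sorts stale, unaudited runs into two buckets:
//  - rerun:  never reached escalation, so replaying from scratch is safe
//  - review: reached escalation, so a partner may already have been notified
function reconcile(
  progress: JobProgress[],
  audited: Set<string>,  // dossier IDs that have a row in the audit table
  now: number,
  graceMs = 15 * 60_000, // runs take up to ~9 minutes; allow slack
): { rerun: string[]; review: string[] } {
  const rerun: string[] = [];
  const review: string[] = [];
  for (const p of progress) {
    if (audited.has(p.dossierId)) continue;     // finished and recorded
    if (now - p.updatedAt <= graceMs) continue; // may still be running
    (p.lastStep === "escalate" ? review : rerun).push(p.dossierId);
  }
  return { rerun, review };
}
```

The split matters for the 03:14 scenario: the 41 dossiers in an unknown state are not all equal, and only the ones that reached escalation need a human to check what the partner already received.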

Who you call at 03:00. You wrote the durability layer, so you own every failure mode. OOM at 03:14 is your problem to detect (Sentry, Prometheus, a heartbeat watchdog), your problem to recover (job retry semantics you configured), and your problem to prove was harmless to the audit log. There is no dashboard you didn't build.
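One failure mode worth proving harmless: when BullMQ retries a job (attempts greater than 1) or picks up a stalled job after its worker died, the processor runs again from the top, so escalateToPartner can fire twice. A minimal sketch of an idempotency guard, assuming a hypothetical EffectLedger; the in-memory Set only stands in for a durable table with a unique key, since an in-process Set would die with the worker:

```typescript
// Hypothetical ledger of side effects already performed. In production this
// would be a table with a unique constraint on the key; here a Set stands in.
class EffectLedger {
  private readonly done = new Set<string>();

  // Runs effect only if key has not been recorded yet; returns whether it ran.
  // A crash between effect() and add() can still cause one duplicate, so the
  // receiving system should also accept the key where it can.
  async once(key: string, effect: () => Promise<void>): Promise<boolean> {
    if (this.done.has(key)) return false;
    await effect();
    this.done.add(key);
    return true;
  }
}

// Usage inside the processor: key the escalation by dossier, so a retried or
// stalled job does not notify the partner twice. notifyPartner is a stand-in.
const ledger = new EffectLedger();
const sent: string[] = [];
const notifyPartner = async (dossierId: string): Promise<void> => {
  sent.push(dossierId);
};

async function escalateOnce(dossierId: string): Promise<boolean> {
  return ledger.once(`escalate:${dossierId}`, () => notifyPartner(dossierId));
}
```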

The scoring at 1,920 runs per week

Translating the three measures into a single picture:

  • Monthly cost. BullMQ wins on cash (€30 to €80), Inngest is close behind once you count the dashboard you would otherwise build (€50 to €80), Temporal Cloud is meaningfully more (€200 to €400). Self-hosted Temporal flips the cost into engineer time, which at this size is the more expensive currency.
  • AVG-defensible replay. Temporal is the only one where replay is the product. Inngest gets you most of the way and is sufficient for a partner-escalation audit if you pin models. BullMQ gets you nothing for free and gets you everything if you build it.
  • 03:00 runbook ownership. Inngest hands you a dashboard and a replay button. Temporal hands you a powerful CLI and the assumption that you have read the docs. BullMQ hands you a Redis CLI and the keys to your own kingdom.

Warning

None of these solve "the LLM gave a different answer this time." If your replay story depends on model determinism, pin the model version explicitly in every activity, log it in the audit record, and never let latest reach the workflow code. The durable layer can only replay what you told it about.
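Enforcing "never let latest reach the workflow code" at the step boundary is cheap. A sketch, assuming a naming convention of our own in which a pinned model ID ends in a dated snapshot or an @version suffix; real provider naming varies, so treat the regex and assertPinnedModel as placeholders:

```typescript
// Hypothetical convention: a pinned model ID ends in a dated snapshot
// (YYYY-MM-DD) or an explicit "@version" suffix. Real provider naming varies.
const PINNED_SUFFIX = /(\d{4}-\d{2}-\d{2}|@[\w.-]+)$/;

// Returns the ID unchanged if it is pinned; throws otherwise, so an unpinned
// model fails the step loudly instead of silently changing the answer.
function assertPinnedModel(modelId: string): string {
  if (/latest/i.test(modelId) || !PINNED_SUFFIX.test(modelId)) {
    throw new Error(`unpinned model "${modelId}": use an explicit version`);
  }
  return modelId;
}
```

Call it inside each step, where the model client is built, and write the returned ID into the audit record so the pin and the evidence cannot drift apart.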

What we would actually pick

For 27 people, no full-time platform engineer, and AVG exposure that costs more than the hosting bill: Inngest. The dashboard at 03:00 is worth more than the €200 per month you would save on Temporal Cloud, and the audit story is closer to regulator-grade than BullMQ's out of the box.

For 200-plus people, a dedicated platform team, and contracts under six retention regimes: self-hosted Temporal. The complexity stops mattering when three people on the team already know it.

For a prototype where the AVG exposure isn't live yet and you are still figuring out what the workflow even is: BullMQ. Move to Inngest the week you sign your first regulated client. Do not skip that move.

There's a tangentially related thread on Hacker News this month asking whether anyone has replaced hosted models with a local one for daily coding. Same instinct, different layer of the stack: when regulators are the audience, "we run our own" is starting to be a defensible answer even at SME scale. Worth keeping an eye on if your firm's risk appetite shifts.

The audit record is the product

The mistake we have watched teams make twice now is treating the durable execution layer as plumbing and the audit record as a logging concern. They are the same artefact. The workflow engine writes the audit record by virtue of running. If you pick a tool that does not write the record you owe, you have not saved money, you have moved the work into a separate system that will drift.

When we built the contract-review AI agent for this Den Haag firm, the problem we ran into was not which durable engine to pick. It was that the audit record the AVG actually wanted was richer than anything these tools log by default. We ended up writing a thin transactional audit layer on top of Inngest's step outputs and pinning every model version inside each step.
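Our audit layer itself is not shown here. As an illustration of one common way to make an append-only audit log tamper-evident, here is a hash chain kept in memory. Assumptions: AuditChain and its fields are hypothetical, JSON.stringify stands in for proper canonical serialisation, and in production each entry would be a row written in the same transaction as the step's result (Postgres) or an object under S3 Object Lock:

```typescript
import { createHash } from "node:crypto";

// One entry per completed step; prevHash links it to the entry before.
interface AuditEntry {
  seq: number;
  dossierId: string;
  step: string;
  modelVersion: string;
  output: unknown;
  prevHash: string;
  hash: string;
}

// SHA-256 over the JSON form. JSON.stringify is not canonical JSON; a real
// implementation should canonicalise key order before hashing.
function digest(value: object): string {
  return createHash("sha256").update(JSON.stringify(value)).digest("hex");
}

// Append-only, hash-chained log. Editing or deleting an earlier entry breaks
// its hash and every link after it, which verify() detects.
class AuditChain {
  readonly entries: AuditEntry[] = [];

  append(e: Omit<AuditEntry, "seq" | "prevHash" | "hash">): AuditEntry {
    const seq = this.entries.length;
    const prevHash = seq === 0 ? "genesis" : this.entries[seq - 1].hash;
    const hash = digest({ ...e, seq, prevHash });
    const entry: AuditEntry = { ...e, seq, prevHash, hash };
    this.entries.push(entry);
    return entry;
  }

  verify(): boolean {
    return this.entries.every((entry, i) => {
      const { hash, ...rest } = entry;
      const expectedPrev = i === 0 ? "genesis" : this.entries[i - 1].hash;
      return rest.prevHash === expectedPrev && digest(rest) === hash;
    });
  }
}
```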

The smallest thing you can do today: pull worker logs from the last 30 days and grep for OOM and ECONNRESET. If there are more than two of either, your durable execution story is not finished, regardless of which of these three you are on.
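The same check as a small function you can run on a schedule, assuming the logs arrive as an array of lines. The patterns are assumptions: match them to how your runtime actually reports memory kills (Kubernetes shows OOMKilled as a container termination reason; Node prints "JavaScript heap out of memory"). countIncidents and durableStoryFinished are hypothetical names:

```typescript
// Patterns are assumptions; adjust to your runtime's actual log output.
const PATTERNS = {
  oom: /\bOOM(Killed|ed)?\b|out of memory/i,
  econnreset: /\bECONNRESET\b/,
} as const;

// Counts log lines that mention an OOM or an ECONNRESET.
function countIncidents(lines: string[]): { oom: number; econnreset: number } {
  const counts = { oom: 0, econnreset: 0 };
  for (const line of lines) {
    if (PATTERNS.oom.test(line)) counts.oom++;
    if (PATTERNS.econnreset.test(line)) counts.econnreset++;
  }
  return counts;
}

// The rule of thumb above: more than two of either means unfinished business.
function durableStoryFinished(c: { oom: number; econnreset: number }): boolean {
  return c.oom <= 2 && c.econnreset <= 2;
}
```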

Key takeaway

Pick durable execution by who answers the 03:00 pager, not by the prettiest pricing page. The audit replay question comes second, never first.

FAQ

Is Temporal overkill for a 27-person firm?

Usually, yes. On Temporal Cloud the bill is fine, but the runbook assumes someone has spent two weeks in the docs. Self-hosted is heavier still. Inngest matches this team size more honestly.

Can BullMQ pass an AVG audit?

Yes, if you build an append-only audit table that is written transactionally with every step and you retain it for the required window. BullMQ itself will not do this. You will.

What counts as defensible replay under the AVG?

Being able to show what data was processed, what decision was made, and to reproduce the decision if challenged. Pin model versions or replay will fall apart on the second point.

Does Inngest support self-hosting?

It does, but the default path is hosted and the dashboard story is sharper there. For an SME without a platform team, the hosted tier is the realistic choice.

How do you handle the 03:00 OOM in practice?

On Inngest, the function pauses and resumes on a healthy worker. On Temporal, the workflow resumes once a worker is back, but if the worker code changed in between, check for non-determinism errors before trusting the run. On BullMQ, you wrote the recovery, so it works the way you wrote it.
