
Process automation

AI agent or Temporal workflow: a three-axis decision method

An operations lead has twelve processes to automate this quarter and two bosses with opposite advice. A three-axis rubric decides which belongs behind an agent and which behind a Temporal workflow.

Jacob Molkenboer · Founder · A Brand New Company · 29 Sept 2024 · 6 min

An operations lead at a Dutch wholesale distributor sits with her Friday spreadsheet. Twelve processes are queued for automation this quarter. Her CTO has told her "use AI for everything." The CFO has told her "AI hallucinates, use workflows." Both are correct in some abstract sense, and neither helps her decide what to ship on Monday.

The agent-versus-workflow debate is currently a religious war. Both sides can point at a real production story from the same week, one where an agent saved a team forty hours and one where an agent sent a refund it should not have. The takeaway for our ops lead is that her question is wrong before she even asks it.

The wrong question

"Should this process be an agent or a workflow" is the wrong frame. An agent and a deterministic workflow engine like Temporal are not competitors. They have very different failure modes, and they earn their keep on very different kinds of work.

The right question is per-process: what does this specific job punish you for getting wrong, and how predictable is the definition of "right" on any given day?

We have a three-axis scoring rubric we use internally before we touch a single line of code. It takes about fifteen minutes per process. It does not produce a single answer. It produces a posture, and a posture is enough to start.

Three dimensions worth scoring

Failure cost

What does one bad run cost you, in money, reputation, or rework? A misrouted support ticket costs three minutes for the next human in the queue. A wrong VAT line on an outbound invoice costs a week of bookkeeper back-and-forth and a slightly chillier relationship with the customer. A misfired refund costs the refund.

Score 1 (trivial) to 5 (catastrophic, regulator-shaped, or six-figure-shaped).

Audit trail need

How often will somebody need to reconstruct exactly what happened, step by step, with timestamps? An internal Slack summary: never. Bank reconciliation, contract approval, KYC, anything that touches the auditor or a regulator: always.

Score 1 (nobody will ever ask) to 5 (an auditor will definitely ask, and "the model decided" is not an acceptable answer).

Edge case clarity

Bring the operations lead into a room with a whiteboard. Set a ten-minute timer. Ask one question: "list every weird case you remember from the last six months." Watch what happens.

If she lists seven cases and stops, you can write rules for seven cases. If she trails off at five and says "there are more I can't remember right now, they're in the inbox," you have an unknown distribution and your rules will never finish chasing it.

Score 1 (unbounded, ops cannot enumerate) to 5 (exhaustive list, all cases written on the whiteboard).

The rubric

Sum the three scores. The total runs 3 to 15. High totals push you toward deterministic workflow code. Low totals push you toward an agent. The middle band asks for a hybrid where a workflow owns the spine and the agent gets called for specific steps.

def posture(failure_cost: int, audit_need: int, edge_clarity: int) -> str:
    """Return the build posture for an automation candidate.

    Each input is a 1-5 score. Higher edge_clarity means ops can
    enumerate every weird case; lower means they cannot.
    """
    total = failure_cost + audit_need + edge_clarity
    if total >= 12:
        return "workflow"      # Temporal, n8n, plain cron + Postgres
    if total >= 8:
        return "hybrid"        # workflow spine, agent calls at named steps
    return "agent"             # agent owns the loop, deterministic guardrails

The thresholds are not magic. They are where our last forty-odd projects clustered. Adjust by one or two for your own appetite. The point is the conversation the scoring forces, not the number it spits out.
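To make "adjust by one or two" concrete, here is a minimal sketch of the same rubric with the two cut-offs passed in as parameters and a range check on each score. The parameter names `workflow_at` and `hybrid_at` are ours, for illustration only. The defaults match the thresholds in the function above.

```python
def posture(
    failure_cost: int,
    audit_need: int,
    edge_clarity: int,
    workflow_at: int = 12,  # totals at or above this -> deterministic workflow
    hybrid_at: int = 8,     # totals at or above this (and below workflow_at) -> hybrid
) -> str:
    """Rubric with adjustable cut-offs. Each score must be an int from 1 to 5."""
    scores = (failure_cost, audit_need, edge_clarity)
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError(f"scores must be 1-5, got {scores}")
    if hybrid_at > workflow_at:
        raise ValueError("hybrid_at must not exceed workflow_at")
    total = sum(scores)
    if total >= workflow_at:
        return "workflow"
    if total >= hybrid_at:
        return "hybrid"
    return "agent"


# (failure cost, audit need, edge clarity) for the three processes in this post.
scored = {
    "invoice chasing": (4, 5, 2),
    "inbox triage": (2, 1, 1),
    "payroll posting": (5, 5, 5),
}
for name, scores in scored.items():
    print(name, posture(*scores))

# A more cautious appetite: lower the workflow cut-off by one.
print("invoice chasing", posture(4, 5, 2, workflow_at=11))
```

With the default cut-offs the three processes come out as hybrid, agent and workflow. Lowering `workflow_at` to 11 moves invoice chasing, which totals exactly 11, into the workflow column.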

Takeaway

If the ops lead cannot list the edge cases in ten minutes, rules will never cover them. If she can, an agent is an expensive way to encode a flowchart.

Three processes, scored

Invoice chasing for a SaaS subscription business

Failure cost 4: legal teeth and angry customers if you send a dunning email to an account that has already paid. Audit need 5: the CFO and the eventual auditor will want a defensible trail. Edge clarity 2: renewals, partial refunds, currency swaps, and "the bank held the SEPA payment for a week" are listable, but the ops lead keeps remembering more.

Total: 11. Hybrid. The workflow owns the schedule, the state machine, and the side effects, including idempotency keys for every outbound charge attempt. The agent writes the next email body when the state machine says "send reminder #3 in tone X."
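A minimal sketch of that split, assuming a hypothetical three-step reminder schedule and an in-memory `Outbox` standing in for the real email or payment provider. This is plain Python, not Temporal's SDK. The state machine decides whether and when to act, an idempotency key turns a replayed send into a no-op, and `write_body` is the only place the model is consulted. All names here are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable

# Hypothetical schedule: (days after due date, tone the agent is asked to write in).
SCHEDULE = [(3, "friendly"), (10, "firm"), (21, "final")]


@dataclass
class Invoice:
    invoice_id: str
    due: date
    paid: bool = False
    reminders_sent: int = 0


@dataclass
class Outbox:
    """Stand-in for the email or payment provider."""
    sent: dict = field(default_factory=dict)  # idempotency key -> message body

    def send(self, key: str, body: str) -> bool:
        # A repeated key is a no-op, so replaying a step never double-sends.
        if key in self.sent:
            return False
        self.sent[key] = body
        return True


def step(inv: Invoice, today: date, outbox: Outbox,
         write_body: Callable[[Invoice, int, str], str]) -> str:
    """One deterministic tick of the dunning state machine."""
    if inv.paid:
        return "closed"        # never dun an account that has paid
    if inv.reminders_sent >= len(SCHEDULE):
        return "escalate"      # schedule exhausted: hand to a human
    days, tone = SCHEDULE[inv.reminders_sent]
    if today < inv.due + timedelta(days=days):
        return "waiting"
    n = inv.reminders_sent + 1
    key = f"{inv.invoice_id}:reminder:{n}"
    outbox.send(key, write_body(inv, n, tone))  # the model only writes the body
    inv.reminders_sent = n
    return f"sent reminder #{n}"


def fake_agent(inv: Invoice, n: int, tone: str) -> str:
    # Placeholder for the model call.
    return f"[{tone}] Reminder #{n} for {inv.invoice_id}"


inv = Invoice("INV-1042", due=date(2024, 9, 1))
outbox = Outbox()
print(step(inv, date(2024, 9, 2), outbox, fake_agent))  # waiting
print(step(inv, date(2024, 9, 5), outbox, fake_agent))  # sent reminder #1
```

In production the keyed outbox would be a unique constraint in your own database, or the provider's idempotency mechanism where it has one. The same keying applies to charge attempts.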

Inbound marketing inbox triage

Failure cost 2: a slightly slower reply to a hot lead, recoverable. Audit need 1: nobody will ever audit this. Edge clarity 1: every email is a new shape and the ops lead laughs when you ask her to enumerate.

Total: 4. Agent. It reads, classifies, drafts a reply, and posts to a queue. A deterministic check rejects anything that looks like an attached invoice or a signed contract before the agent sees it.
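The deterministic pre-check can be as dull as a filename rule. A sketch, assuming attachments arrive as filenames. The keyword and extension lists are hypothetical and would need tuning against your own mail.

```python
from dataclasses import dataclass, field

# Hypothetical rules; tune against your own inbox.
# "factuur" and "getekend" are Dutch for "invoice" and "signed".
HINTS = ("invoice", "factuur", "contract", "signed", "getekend")
DOC_EXTENSIONS = (".pdf", ".docx", ".xml")


@dataclass
class Email:
    sender: str
    subject: str
    attachments: list = field(default_factory=list)  # attachment filenames


def route(email: Email) -> str:
    """Deterministic pre-check that runs before the agent sees the email."""
    for name in email.attachments:
        lower = name.lower()
        if lower.endswith(DOC_EXTENSIONS) and any(h in lower for h in HINTS):
            return "human_queue"  # looks like an invoice or contract: keep it from the agent
    return "agent"


print(route(Email("lead@example.com", "Pricing question")))             # agent
print(route(Email("ap@example.com", "See attached", ["Factuur-0912.pdf"])))  # human_queue
```

The rule errs towards the human queue. A false positive costs a person a few seconds, while a false negative hands a contract to the model.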

Payroll posting

Failure cost 5: wrong salary, ruined Friday, possibly a tribunal. Audit need 5: this is the textbook audit surface. Edge clarity 5: payroll has a finite, documented set of cases, and HR can hand you the spreadsheet.

Total: 15. Workflow. No agent. We do not put creative work near payroll. The cost of a wrong agent call scales with how much authority the system extends to it, and payroll extends the maximum.

Building from the score

A workflow posture means you write code. State machines, idempotency keys, retries with backoff, durable timers. Temporal is one option; a small Postgres-backed worker with explicit step tables is another. The defining quality is that every step is replayable and every decision is a function of the input, not of the model's mood.
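In a Postgres-backed worker, "replayable" can come down to one table keyed on run and step. This sketch uses the standard-library `sqlite3` module as a stand-in for Postgres so it runs anywhere. The `steps` table and the `run_step` helper are hypothetical names. A step that has already completed returns its recorded result on replay instead of running again.

```python
import sqlite3
from typing import Callable


def run_step(db: sqlite3.Connection, run_id: str, step: str,
             fn: Callable[[], str]) -> str:
    """Run fn once per (run_id, step); a replay returns the recorded result."""
    row = db.execute(
        "SELECT result FROM steps WHERE run_id = ? AND step = ?", (run_id, step)
    ).fetchone()
    if row is not None:
        return row[0]                 # step already done: do not run it again
    result = fn()
    with db:                          # commits the insert, rolls back on error
        db.execute(
            "INSERT INTO steps (run_id, step, result) VALUES (?, ?, ?)",
            (run_id, step, result),
        )
    return result


# sqlite3 stands in for Postgres here so the sketch runs without a server.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE steps (run_id TEXT, step TEXT, result TEXT,"
    " PRIMARY KEY (run_id, step))"
)

calls = []


def compute_batch_total() -> str:
    calls.append("ran")
    return "ok"


first = run_step(db, "run-1", "compute_batch_total", compute_batch_total)
replay = run_step(db, "run-1", "compute_batch_total", compute_batch_total)
print(first, replay, len(calls))  # ok ok 1
```

A production worker still has to handle two gaps. If the process dies after `fn` runs but before the insert commits, `fn` runs again on replay, so its side effects still need idempotency keys. Two workers racing on the same step will both run `fn` before one insert fails on the primary key, so real workers claim a step (for example with a row lock) before executing it.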

A hybrid posture means the spine is deterministic and the model is a tool the spine reaches for at named steps. The agent never owns the loop. It writes copy, classifies a document, suggests a routing choice. The workflow records the suggestion, records the human override if any, and moves on. The model can fail safely because nothing it produces gets committed without a deterministic gate.
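A sketch of the hybrid gate for a routing step, assuming a hypothetical fixed set of routes. The model's suggestion and any human override are both recorded. Only a route from the allowed set is ever committed, and anything else falls back to manual review.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical routes the workflow knows how to handle.
ALLOWED_ROUTES = {"billing", "sales", "support", "manual_review"}


@dataclass(frozen=True)
class Decision:
    suggested: str            # what the model proposed, kept for the audit trail
    override: Optional[str]   # what a human chose instead, if anyone did
    committed: str            # what the workflow actually acts on


def decide(suggestion: str, override: Optional[str] = None) -> Decision:
    if override is not None:
        # A human override wins, but it still has to name a known route.
        if override not in ALLOWED_ROUTES:
            raise ValueError(f"unknown route: {override!r}")
        return Decision(suggestion, override, override)
    # Deterministic gate: the model cannot commit a route the workflow does not know.
    committed = suggestion if suggestion in ALLOWED_ROUTES else "manual_review"
    return Decision(suggestion, None, committed)


print(decide("billing"))
print(decide("legal_escalation"))           # unknown: falls back to manual_review
print(decide("sales", override="billing"))  # override recorded alongside the suggestion
```

Storing `suggested` next to `committed` is what lets you answer "what did the model say, and who changed it" months later.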

An agent posture means the model owns the loop, but you still wrap its side effects. The agent can read, summarise, draft, and decide. It cannot send money, send a contract, or change a production database without a check that does not consult the model.
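For the agent posture, the wrapper sits between the model's tool calls and the real side effects. Here is a minimal sketch with hypothetical action names and an `approved` set filled by a human or a rules check, never by the model. Read-and-draft actions pass through. Guarded actions are refused unless their request id has been approved.

```python
from typing import Callable, Dict, Set

# Hypothetical action names; these never run on the model's say-so alone.
GUARDED_ACTIONS = {"send_payment", "send_contract", "write_production_db"}


class Blocked(Exception):
    """Raised when an agent asks for a side effect it is not allowed to take."""


def execute(action: str, payload: dict,
            handlers: Dict[str, Callable[[dict], str]],
            approved: Set[str]) -> str:
    if action not in handlers:
        raise Blocked(f"unknown action: {action}")
    # `approved` is filled by a human or a rules check, never by the model.
    if action in GUARDED_ACTIONS and payload.get("request_id") not in approved:
        raise Blocked(f"{action} needs approval for request {payload.get('request_id')!r}")
    return handlers[action](payload)


handlers = {
    "draft_reply": lambda p: f"drafted reply to {p['to']}",
    "send_payment": lambda p: f"paid {p['amount']} to {p['to']}",
}

print(execute("draft_reply", {"to": "jan@example.com"}, handlers, approved=set()))
try:
    execute("send_payment", {"request_id": "r-17", "amount": 120, "to": "ACME"},
            handlers, approved=set())
except Blocked as err:
    print("blocked:", err)
```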

Warning

If you score a process below 8 and it touches money, payroll, or anything regulated, your scores are wrong. Re-rate failure cost and audit need before you start.
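The warning is mechanical enough to build into the scoring sheet itself. A sketch, assuming you record a yes/no flag for whether the process touches money, payroll, or anything regulated. `check_scores` is a hypothetical helper.

```python
def check_scores(failure_cost: int, audit_need: int, edge_clarity: int,
                 touches_regulated: bool) -> list:
    """Flag score sheets that put a money, payroll or regulated process below 8."""
    total = failure_cost + audit_need + edge_clarity
    problems = []
    if touches_regulated and total < 8:
        problems.append(
            f"total {total} is below 8 on a process that touches money, payroll "
            "or regulation: re-rate failure cost and audit need"
        )
    return problems


print(check_scores(3, 2, 2, touches_regulated=True))  # flagged (total 7)
print(check_scores(4, 5, 2, touches_regulated=True))  # [] (total 11)
```

A first-pass 7 on a dunning process would trip this check before any code is written.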

When we built the invoice-chase AI agent for a Dutch SaaS business, we scored the process at 7 on a first pass and were ready to ship a pure agent. Two weeks in, the finance lead asked us to reconstruct which customer got which dunning email and why. We had not built for that. We rebuilt the spine in Temporal and let the agent write only the body copy. That fortnight is the kind of cost the three-axis score is designed to catch before code gets written.

Print the three dimensions on an A4 sheet. Pick three processes off your own backlog. Score them before Monday. Whichever one lands hardest in the agent column, build that one first, because it is the one rules would never finish covering.

Key takeaway

If the ops lead cannot list every weird case in ten minutes, rules will never cover them. If she can, an agent is an expensive way to encode a flowchart.

FAQ

When does an agent obviously beat a workflow?

When the input distribution is unbounded and the cost of a bad run is recoverable. Inbox triage, copy drafting, and classification of messy attachments all sit here.

Can a workflow call an agent?

Yes. That is the hybrid posture. The workflow owns state and side effects. The agent gets called for specific steps and its output passes through a deterministic gate before anything is committed.

Why score failure cost separately from audit need?

Because they are not the same axis. A low-cost process can still be audited for internal metrics. A high-cost one-shot creative task may have no audit need at all. Treating them as one number hides the real shape of the work.

What if the ops lead lists more than ten edge cases in ten minutes and is still going?

That is a useful answer in itself. Score edge clarity at 1 or 2 and accept that rules will not cover the work. Plan for an agent with a tight guardrail around side effects rather than a flowchart that grows forever.

