Retool to Temporal: scoring your SaaS ops graduation
It's Tuesday 09:14 and someone on your ops team is replaying a failed Zapier task by hand. That's the fourth manual recovery this month. The question isn't if you graduate. It's when, and to what.

It's Tuesday 09:14. Your ops lead pings the engineering channel: the Zapier task that posts invoice statuses into a customer's Slack failed at 03:00. She has been replaying it by hand for the last twenty minutes. That is the fourth manual recovery this month, the second one she has done before her coffee landed, and the third time this quarter that you have had to write to a customer apologising for a status update that never arrived.
The question on the table is not whether your Retool-and-Zapier ops layer is "good enough." It is whether the cost of keeping it has crossed the cost of replacing it. We have sat in that room with three Dutch SaaS clients in the last eighteen months. The method below is the one we use to settle the argument in a single meeting.
Where Retool plus Zapier still wins
An honest opening: a SaaS doing €2M ARR with eight people, one CRM, and a Stripe webhook does not need Temporal. Retool's drag-and-drop UI plus Zapier's webhook routing covers maybe 80% of internal workflows for under €500 per month. Non-engineers can read the flows. Time-to-shipped is measured in afternoons. There is no infrastructure to operate.
The problem is not the tools. The problem is that neither tool has a concept of durability. A failed Zapier step stays silent until a human notices the gap in Slack. A Retool query that times out shows a toast and disappears. Neither replays a failed run on its own by default, neither has idempotency keys, and neither writes anything that your next auditor will call a control.
That tradeoff is fine until it isn't. The method below tells you when it stops being fine.
What Temporal plus internal APIs actually demands
Temporal is a durable execution engine. Workflows survive process restarts. Retries are automatic and configurable per step. Every input, output, and timer is recorded in an event history you can replay. The mental model is alien on day one, and obvious by week three.
It is not free in any sense. You will spend senior engineering time to author workflows in Go, TypeScript, Python, or Java; plan on two weeks for the first one. You will spend hosting cost, either self-hosted on Kubernetes if you already have someone to operate the cluster, or Temporal Cloud at roughly €100 per month for a tiny team with short history retention. Your ops people pay a mental-model cost too: they can no longer click-edit a workflow, and get an internal admin UI instead, which you have to build.
None of that is wasted, but you should not pay any of it if the score below tells you to stay.
Dimension 1: incident frequency
Count anything where a human had to log into Zapier or Retool and replay a step, patch data by hand, or write a one-off SQL update to fix what an automation should have done. Score 0 to 3:
- 0: fewer than one manual recovery per month
- 1: one to three per month
- 2: one or two per week
- 3: daily, or close to it
If you do not already have a log of these, start one before scoring. A two-column Notion table with date and one-line description, kept for thirty days, is enough. If you want it in SQL:
-- Minimal incident ledger you can query before scoring
CREATE TABLE ops_incidents (
    id            bigserial PRIMARY KEY,
    occurred_at   timestamptz NOT NULL,
    recovered_at  timestamptz NOT NULL,
    recovery_kind text NOT NULL,  -- 'manual' | 'auto_retry' | 'noop'
    workflow      text NOT NULL,
    note          text
);

-- Manual recoveries per week over the last 90 days
SELECT date_trunc('week', recovered_at) AS week,
       count(*) FILTER (WHERE recovery_kind = 'manual') AS manual
FROM ops_incidents
WHERE recovered_at > now() - interval '90 days'
GROUP BY 1
ORDER BY 1 DESC;
Most teams who think they are at "1" find out, after a real thirty-day count, that they are at "2." We have seen this every time.
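The rubric above can be turned into a small function, so the score comes from the ledger rather than from memory. This is a minimal sketch, not a definitive rule: the function name is hypothetical, and the numeric cut-offs are an assumed reading of the rubric, which leaves gaps between bands (for example between three a month and one a week).

```typescript
// Hypothetical helper: maps a 30-day count of manual recoveries to the 0-3 score.
// The cut-offs are assumptions; adjust them to your own reading of the bands.
type IncidentScore = 0 | 1 | 2 | 3;

function scoreIncidentFrequency(manualRecoveriesLast30Days: number): IncidentScore {
  const n = manualRecoveriesLast30Days;
  if (n < 0 || Math.floor(n) !== n) {
    throw new RangeError('count must be a non-negative integer');
  }
  if (n === 0) return 0; // fewer than one manual recovery per month
  if (n <= 3) return 1;  // one to three per month
  if (n <= 9) return 2;  // roughly one or two per week (assumed upper bound)
  return 3;              // anything denser is treated as "daily, or close to it"
}

// The opening scenario: four manual recoveries in one month.
console.log(scoreIncidentFrequency(4)); // 2
```

Note that the team in the opening scenario, with four manual recoveries in a month, already lands on 2 under these cut-offs.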
Dimension 2: on-call headcount
How many of your engineers are pulled into ops recovery on a rotating basis? Score 0 to 3:
- 0: no one is on call for ops failures
- 1: one engineer absorbs ad-hoc pages, no formal rotation
- 2: a two-person rotation, paid out of the engineering budget
- 3: a three-or-more person rotation with runbooks
The moment you hit a 2, you have quietly turned engineers into ops recoverers. Their job has shifted from building product to babysitting integrations. That cost rarely shows up in any line item, which is exactly why it goes unfixed for years.
The point of moving to durable execution is not to replace your ops people with machines. It is to stop spending your senior engineers' Tuesday mornings replaying Zaps that should have replayed themselves.
Dimension 3: what your next auditor will accept
This is the dimension that catches Dutch SaaS founders off-guard. The first SOC 2 Type II audit, the first ISO 27001 surveillance audit, or a serious DPA review under the EU NIS2 directive for in-scope SaaS all demand evidence that your operational workflows are:
- authorised: a known principal triggered the action
- logged with who-did-what-when, immutably
- reviewable for the audit window, typically twelve months
- recoverable in a documented way after a failure
Zapier exposes audit logs on the Team plan and above, but the user-identity attribution is weak: an entry that says "Zapier ran this Zap" does not satisfy an auditor who wants to see which human in your org authorised the run. Retool's audit logs are stronger, but neither product gives you the workflow-level event immutability that Temporal's event history provides out of the box.
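To make the attribution gap concrete, here is a minimal sketch of the check an auditor effectively runs on each sampled entry. The `AuditEntry` shape, its field names, the `isAttributable` helper, and the sample rows are all hypothetical; this is not a Zapier, Retool, or Temporal schema.

```typescript
// Hypothetical audit-entry shape; field names are illustrative assumptions.
interface AuditEntry {
  action: string;
  occurredAt: string; // ISO 8601 timestamp
  actor: { kind: 'human' | 'service'; id: string };
  authorisedBy?: string; // human principal who approved the run, if recorded
}

// An entry is attributable when it has a parseable timestamp and a human
// principal appears either as the actor or as the approver.
function isAttributable(entry: AuditEntry): boolean {
  const hasTimestamp = !isNaN(Date.parse(entry.occurredAt));
  const hasHuman = entry.actor.kind === 'human' || Boolean(entry.authorisedBy);
  return hasTimestamp && hasHuman;
}

// Made-up sample: one "Zapier ran this Zap" entry, one run by a named person.
const sample: AuditEntry[] = [
  { action: 'post_invoice_status', occurredAt: '2024-03-05T03:00:00Z', actor: { kind: 'service', id: 'zapier' } },
  { action: 'post_invoice_status', occurredAt: '2024-03-05T09:14:00Z', actor: { kind: 'human', id: 'ops-lead' } },
];

const unattributed = sample.filter((e) => !isAttributable(e));
console.log(unattributed.length); // 1
```

Running a check like this over an export of your own logs tells you how many entries would fail a sample test before the auditor does.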
Score 0 to 3:
- 0: no audit on the horizon for the next twenty-four months
- 1: an audit is expected in twelve to twenty-four months
- 2: an audit is expected in the next twelve months
- 3: you are in an audit window right now, and the auditor has already flagged ops controls
An auditor flagging ops controls mid-window is the most expensive way to discover you needed Temporal. We have seen one client buy six weeks of two-engineer time to retrofit audit trails into a Zapier-shaped pipeline, and the auditor still escalated the finding.
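The audit bands can be written down the same way. A minimal sketch, with a hypothetical function name and two labelled assumptions: an audit exactly twelve months out counts as "the next twelve months", and being inside an audit window without a flagged finding scores 2 rather than 3.

```typescript
// Hypothetical helper for Dimension 3. Pass null when no audit is planned.
type AuditScore = 0 | 1 | 2 | 3;

function scoreAuditHorizon(monthsUntilAudit: number | null, opsControlsFlagged: boolean = false): AuditScore {
  if (opsControlsFlagged) return 3; // in a window, ops controls already flagged
  if (monthsUntilAudit === null || monthsUntilAudit > 24) return 0; // nothing within 24 months
  if (monthsUntilAudit > 12) return 1; // twelve to twenty-four months out
  return 2; // within the next twelve months (assumption: includes exactly 12)
}

console.log(scoreAuditHorizon(18)); // 1
```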
The composite score
Sum the three dimension scores, for a total out of nine, and read the band:
- 0 to 2: stay. Your Retool-plus-Zapier stack is doing its job. Do not touch it for at least six months.
- 3 to 5: instrument. Add a real incident ledger, structured logging on every Zap and Retool query, and scheduled audit-log exports to S3 or equivalent. Do not migrate yet.
- 6 to 7: plan the move. Pick one workflow to rebuild on Temporal as a proof. Migrate the next two over the following quarter.
- 8 to 9: you are already late. The fact that you are scoring this means your team is burning. Start this week.
The trap that catches most teams is the 3-to-5 band. It feels like a "we should migrate" score, but a half-migration to Temporal while your incident ledger is still vibes-based usually produces a more brittle system than the one you had. Instrumentation first. Migration once the data justifies it.
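The banding is simple enough to encode, which helps when you re-score every quarter and want the same rule applied each time. A minimal sketch; the type and function names are hypothetical, and the thresholds are the ones from the list above.

```typescript
type DimensionScore = 0 | 1 | 2 | 3;
type Band = 'stay' | 'instrument' | 'plan the move' | 'already late';

interface Scores {
  incidentFrequency: DimensionScore;
  onCallHeadcount: DimensionScore;
  auditPressure: DimensionScore;
}

// Sums the three dimensions and maps the total (0-9) to its band.
function graduationBand(s: Scores): { total: number; band: Band } {
  const total = s.incidentFrequency + s.onCallHeadcount + s.auditPressure;
  let band: Band;
  if (total <= 2) band = 'stay';
  else if (total <= 5) band = 'instrument';
  else if (total <= 7) band = 'plan the move';
  else band = 'already late';
  return { total, band };
}

// Example: weekly manual recoveries (2), one engineer absorbing pages (1),
// and an audit expected within twelve months (2).
console.log(graduationBand({ incidentFrequency: 2, onCallHeadcount: 1, auditPressure: 2 }));
// { total: 5, band: 'instrument' }
```

The example sits at the top of the instrument band: one more point on any dimension tips it into planning the move, which is why the re-score after instrumentation matters.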
The instrumentation gap most teams skip
If you scored 3 to 5, the answer is not migration yet. The answer is six weeks of instrumentation so you can re-score with real numbers. Three concrete artifacts to ship before any migration work:
- An incident ledger with one row per manual recovery. The SQL above is enough. Backfill the last thirty days from Slack history.
- A structured-log export from Zapier (Team plan) and Retool to whatever your audit-log store is. CloudWatch, BigQuery, a Postgres table; the destination matters less than the contract.
- A single dashboard showing weekly manual-recovery count, time-to-recovery, and which workflow owned the failure. Build it in Retool itself; that is what Retool is for.
Most teams who do this discover, six weeks later, that their score jumped two points. Either the migration becomes obviously right, or it becomes obviously not yet. Either outcome saves you a quarter of guesswork.
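The numbers behind that dashboard are a small aggregation over the ledger. A minimal sketch, assuming rows shaped like the ops_incidents table above; the type names, the function name, and the sample rows are hypothetical.

```typescript
// Row shape mirrors the ops_incidents table; the sample rows below are made up.
interface Incident {
  occurredAt: string;  // ISO 8601
  recoveredAt: string; // ISO 8601
  recoveryKind: 'manual' | 'auto_retry' | 'noop';
  workflow: string;
}

interface WorkflowStats {
  manualRecoveries: number;
  meanMinutesToRecover: number;
}

// Per workflow: how many manual recoveries, and the mean time from failure to recovery.
function manualRecoveryStats(rows: Incident[]): Record<string, WorkflowStats> {
  const totalMinutes: Record<string, number> = {};
  const counts: Record<string, number> = {};
  for (const row of rows) {
    if (row.recoveryKind !== 'manual') continue; // automatic retries are not incidents here
    const minutes = (Date.parse(row.recoveredAt) - Date.parse(row.occurredAt)) / 60000;
    totalMinutes[row.workflow] = (totalMinutes[row.workflow] || 0) + minutes;
    counts[row.workflow] = (counts[row.workflow] || 0) + 1;
  }
  const stats: Record<string, WorkflowStats> = {};
  for (const workflow of Object.keys(counts)) {
    stats[workflow] = {
      manualRecoveries: counts[workflow],
      meanMinutesToRecover: totalMinutes[workflow] / counts[workflow],
    };
  }
  return stats;
}

const stats = manualRecoveryStats([
  { occurredAt: '2024-03-05T03:00:00Z', recoveredAt: '2024-03-05T09:00:00Z', recoveryKind: 'manual', workflow: 'invoice-status-to-slack' },
  { occurredAt: '2024-03-12T03:00:00Z', recoveredAt: '2024-03-12T04:00:00Z', recoveryKind: 'manual', workflow: 'invoice-status-to-slack' },
  { occurredAt: '2024-03-13T10:00:00Z', recoveredAt: '2024-03-13T10:01:00Z', recoveryKind: 'auto_retry', workflow: 'signup-to-crm' },
]);
console.log(stats['invoice-status-to-slack']); // { manualRecoveries: 2, meanMinutesToRecover: 210 }
```

Grouping by workflow rather than by week is a design choice for this sketch: it answers "which workflow owned the failure" directly, and the weekly count is already covered by the SQL query.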
A two-week shape for the first migration
The cleanest first migration is the workflow with the highest frequency and the lowest blast radius. Not your billing pipeline. Something like "new signup hits CRM and provisions a free trial." High enough volume to teach the durable-execution muscle, low enough stakes that a Tuesday bug does not refund anyone.
Week one: write the workflow in Temporal, leave Zapier running in parallel. Compare outputs in a side-by-side table for three days. Week two: cut over via a feature flag. Leave Zapier dormant for a week as a fallback. Then archive.
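The week-one comparison can be as plain as a diff keyed on email. A minimal sketch, assuming each pipeline records one result row per signup; the `OnboardResult` shape, the `compareRuns` name, and the sample addresses are hypothetical.

```typescript
// Hypothetical record of what each pipeline did for one signup.
interface OnboardResult {
  email: string;
  plan: 'free' | 'pro';
}

interface Mismatch {
  email: string;
  reason: string;
}

// Reports signups that one pipeline handled and the other did not,
// and signups where the two pipelines disagree on the plan.
function compareRuns(zapier: OnboardResult[], temporal: OnboardResult[]): Mismatch[] {
  const remaining: Record<string, OnboardResult> = {};
  for (const t of temporal) remaining[t.email] = t;

  const mismatches: Mismatch[] = [];
  for (const z of zapier) {
    const t = remaining[z.email];
    if (!t) {
      mismatches.push({ email: z.email, reason: 'missing in Temporal run' });
      continue;
    }
    if (t.plan !== z.plan) mismatches.push({ email: z.email, reason: 'plan differs' });
    delete remaining[z.email];
  }
  for (const email of Object.keys(remaining)) {
    mismatches.push({ email, reason: 'missing in Zapier run' });
  }
  return mismatches;
}

const mismatches = compareRuns(
  [{ email: 'ada@example.com', plan: 'free' }, { email: 'bob@example.com', plan: 'pro' }],
  [{ email: 'ada@example.com', plan: 'free' }, { email: 'cy@example.com', plan: 'free' }],
);
console.log(mismatches.length); // 2
```

An empty result for three days running is the evidence you want before flipping the feature flag in week two.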
// activities.ts is the boring side: idempotent calls to your APIs.
// workflow.ts is the durable side: Temporal owns retries and state.
import { proxyActivities } from '@temporalio/workflow';
import type * as activities from './activities';

const { createCrmContact, provisionTrial, sendWelcomeMail } =
  proxyActivities<typeof activities>({
    startToCloseTimeout: '1 minute',
    // maximumAttempts counts the first try, so this allows up to four retries.
    retry: { maximumAttempts: 5, initialInterval: '2s' },
  });

export async function onboardSignup(email: string, plan: 'free' | 'pro') {
  const contactId = await createCrmContact(email);
  const tenantId = await provisionTrial(email, plan);
  await sendWelcomeMail(email, tenantId);
  return { contactId, tenantId };
}
The lines that matter are startToCloseTimeout and retry. Those few bytes of configuration are what turn an integration that fails silently at 03:00 into one that tries up to five times in total, backing off exponentially between attempts, and only pages a human when it has really given up.
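It helps to see what that retry policy means in wall-clock terms. The sketch below is a hypothetical illustration, not SDK code: it assumes Temporal's default backoff coefficient of 2 and ignores the maximum-interval cap, which these values never reach.

```typescript
// Waits between attempts under an exponential retry policy.
// backoffCoefficient = 2 is assumed to match Temporal's default.
function retryWaitsSeconds(
  maximumAttempts: number,
  initialIntervalSeconds: number,
  backoffCoefficient: number = 2,
): number[] {
  const waits: number[] = [];
  // N attempts means N - 1 waits: nothing is scheduled after the final attempt.
  for (let retry = 0; retry < maximumAttempts - 1; retry++) {
    waits.push(initialIntervalSeconds * Math.pow(backoffCoefficient, retry));
  }
  return waits;
}

const waits = retryWaitsSeconds(5, 2);
console.log(waits); // [ 2, 4, 8, 16 ]
console.log(waits.reduce((a, b) => a + b, 0)); // 30
```

With maximumAttempts set to 5 and a 2-second initial interval, the policy spends 30 seconds waiting across four retries, plus up to one minute per attempt from startToCloseTimeout. If the downstream API is typically down for longer than that, raise the attempts or the interval rather than adding retry logic inside the activity.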
The most common first-week mistake is to over-engineer the activities. Keep them dumb: one HTTP call, one DB write, one return value each. The durability lives in Temporal, not in your activity code. Resist the urge to bundle two API calls into one activity because they "belong together." They don't, and that bundling is what makes recoveries hurt later.
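Here is what "dumb and idempotent" can look like for a single activity. This is a minimal sketch under stated assumptions: the in-memory store and the `crmCreate` stand-in are hypothetical and exist only so the example runs on its own, and the function is synchronous for brevity where a real activity would be async and would keep the key in a database or pass it to the downstream API's own idempotency support.

```typescript
// In-memory stand-ins so the sketch runs on its own; all names are hypothetical.
const contactsByKey: Record<string, string> = {};
let crmWrites = 0;

// Pretend CRM client: every call creates a new contact.
function crmCreate(email: string): string {
  crmWrites += 1;
  return 'contact-' + crmWrites;
}

// One activity, one side effect. The key makes a retried call return the
// first result instead of writing a second contact.
function createCrmContact(email: string): string {
  const key = 'createCrmContact:' + email.trim().toLowerCase();
  const existing = contactsByKey[key];
  if (existing !== undefined) return existing;
  const contactId = crmCreate(email);
  contactsByKey[key] = contactId;
  return contactId;
}

const first = createCrmContact('ada@example.com');
const retried = createCrmContact('Ada@example.com ');
console.log(first === retried, crmWrites); // true 1
```

Because the activity does exactly one write, the key can be derived from its inputs alone. Bundle two API calls into it and you need a key, and a recovery story, for every partial outcome.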
The line we have walked with three clients
When we built the customer-onboarding workflow for a Dutch SaaS client doing roughly €8M ARR, the thing we ran into was not the Temporal authoring (that took eight days). It was that their Retool audit log did not survive the auditor's sample test, so we moved their ops surface onto a Temporal-backed process automation stack and wrote every state transition into a Postgres event table the auditor could query directly.
The smallest thing you can do today: open your Zapier task history, count the failed tasks in the last thirty days, and write the number on a sticky note. If it is over eight, your ops layer is already more expensive than you think.
Key takeaway
Score incident frequency, on-call headcount, and your next auditor's tolerance out of three each. Six or more out of nine means your Zapier-and-Retool ops layer is done.
FAQ
How long does a Temporal migration typically take?
The first workflow usually takes a senior engineer two weeks if Temporal is new to your team. The second takes three to five days. By the third, you have a pattern your whole team can apply.
Can we self-host Temporal or do we have to use Temporal Cloud?
Both work. Self-hosting on Kubernetes is a strong fit if you already operate a cluster. Temporal Cloud is the right answer if your platform team is small or if event-history retention is part of your compliance story.
Will my ops team still be able to edit workflows?
Not directly. You will need to build a small internal admin UI for the parameters they used to flip in Zapier. That is a few days of work, and it is the moment ops people start trusting the new system.
What if our auditor has never heard of Temporal?
Most have not. What they want is an event log they can sample. Temporal's event history maps cleanly to that, and a one-page document explaining the workflow model is usually enough to settle the conversation.