Process automation
Replacing a 1,800-row dispatch sheet: a freight ops playbook
06:40 on a Tuesday. A dispatcher's workbook is locked by an overnight ghost session, six trucks are minutes out, and the file holding the whole operation will not open.

06:40 on a Tuesday. The morning dispatcher opens DISPATCH_LIVE_v47_FINAL_USE_THIS.xlsx on the shared drive and gets the dialog she has seen a thousand times: file locked by another user. Someone in the warehouse left it open overnight. Six trucks roll in over the next forty minutes. Until that lock clears, none of them have a confirmed bay.
This is the freight forwarder who hired us to kill the sheet. 1,800 active rows, 23 columns, 14 hidden helper columns, eleven dispatchers across two offices, three pivot tables that nobody touched because the last person who tried broke the conditional formatting for a week. The workbook was the system. It was also the bottleneck, the audit log, the customer-promise database, and the single point of failure.
We replaced it with a Next.js board, a Postgres job queue, and a nightly reconciliation script. The migration took eleven weeks from first audit to old-sheet decommission. Here is the playbook, in the order we ran it.
Map the sheet before you touch the schema
The first instinct, when someone hands you a giant Excel, is to read the column headers and start writing a Prisma schema. Don't. The headers lie. The real schema lives in what the dispatchers do, not what IT wrote down in 2017.
We spent six hours on a screen-share with the senior dispatcher. Not asking what the columns meant. Asking her to talk through the workbook the way she actually used it. By hour three we had found:
- A status column with 14 documented values and 31 observed values, several differing only in trailing whitespace.
- One column called
REMARKSthat secretly stored ETA overrides, customer call-back times, and a flag for "do not load before 14:00" written asDNL14. - A hidden helper column that resolved the actual customer name from a VLOOKUP into a separate workbook on a network share.
- Two columns that always agreed in form (
Origin BAY,Origin BAY assigned) but disagreed in 7% of rows, because one was set by the morning dispatcher and the other by the gate guard, and the gate guard was right.
That last bullet is the whole story. The sheet did not have one source of truth; it had two, in conflict, and the business knew which one to trust by social convention. If you migrate the headers, you ship the conflict. If you migrate the behaviour, you can split that into two fields and write the conflict-resolution rule down.
Audit the verbs (what dispatchers do with the sheet) before the nouns (what the columns are named). The verbs are your schema. The nouns are documentation.
Schema from observed states, not documented states
Once we had the verbs, the schema almost wrote itself. Three core tables, two reference tables, and a slim event log. The trick was modelling state as an enum the database refused to budge on.
create type dispatch_status as enum (
'booked',
'awaiting_documents',
'ready_for_bay',
'at_bay',
'loading',
'loaded',
'departed',
'cancelled',
'on_hold'
);
create table shipment (
id uuid primary key default gen_random_uuid(),
ref text not null unique,
customer_id uuid not null references customer(id),
status dispatch_status not null default 'booked',
bay_id uuid references bay(id),
scheduled_at timestamptz not null,
departed_at timestamptz,
notes text,
version int not null default 0,
updated_at timestamptz not null default now(),
updated_by uuid not null references app_user(id)
);
create index shipment_status_scheduled_idx
on shipment (status, scheduled_at)
where status in ('booked','awaiting_documents','ready_for_bay');
Nine states, not fourteen. The other five were sub-states of on_hold with a reason code. The partial index keeps the dispatch-board query under 5ms because the board only ever cares about the three live statuses.
We also imposed a rule the workbook never could: every status transition writes a row to an append-only event table with the old state, the new state, the user, and the timestamp. The first week of that log told us more about how dispatch actually worked than the previous eight years of workbook history.
The board: Next.js with server actions and a tight refresh loop
The UI is plain. A vertical Kanban with one column per live status, a top-bar filter for region, and a row per shipment. No animated drag handles. Dispatchers do not want choreography; they want the row to move when they click "ready" and to stay there if the network blinks.
We built it on the Next.js App Router with server actions for mutations and a four-second polling fetch on the client for fresh state. Yes, polling. Websockets were on the table; eleven dispatchers refreshing every four seconds is 165 requests a minute, well under what a single Postgres pool handles, and polling is debuggable at 03:00 in a way that a stuck socket is not.
// app/dispatch/actions.ts
'use server'
import { z } from 'zod'
import { db } from '@/lib/db'
import { revalidatePath } from 'next/cache'
import { currentUser } from '@/lib/auth'
const Move = z.object({
shipmentId: z.string().uuid(),
toStatus: z.enum(['ready_for_bay', 'at_bay', 'loading', 'loaded']),
expectedVersion: z.number().int(),
})
export async function moveShipment(input: unknown) {
const { shipmentId, toStatus, expectedVersion } = Move.parse(input)
const user = await currentUser()
const result = await db.tx(async (tx) => {
const prev = await tx.one(
'select status from shipment where id = $1 for update',
[shipmentId],
)
const row = await tx.one(
`update shipment
set status = $1,
updated_at = now(),
updated_by = $2,
version = version + 1
where id = $3
and version = $4
returning id, status, version`,
[toStatus, user.id, shipmentId, expectedVersion],
)
await tx.none(
`insert into shipment_event
(shipment_id, from_status, to_status, actor_id)
values ($1, $2, $3, $4)`,
[shipmentId, prev.status, toStatus, user.id],
)
return row
})
revalidatePath('/dispatch')
return result
}
The version column is the optimistic lock. Two dispatchers clicking "ready" on the same shipment within the same second: one wins, the other gets a 409 and a refreshed row. The workbook handled this by silently overwriting whichever save happened last. The new board makes the race visible and forces a human decision, which is what the dispatchers actually wanted once they saw it work.
The queue: Postgres SKIP LOCKED, not Redis, not RabbitMQ
Behind the board sit four background jobs: pull EDI updates from two carriers, push status webhooks to the customer portal, regenerate a PDF manifest when a shipment moves to loaded, and email the customer when it departs. Classic queue work.
We did not add Redis or RabbitMQ. The job queue is a Postgres table with one well-known query pattern: FOR UPDATE SKIP LOCKED.
create table job (
id bigserial primary key,
kind text not null,
payload jsonb not null,
run_after timestamptz not null default now(),
attempts int not null default 0,
locked_at timestamptz,
locked_by text,
finished_at timestamptz,
last_error text
);
create index job_pending_idx
on job (run_after)
where finished_at is null and locked_at is null;
-- A worker pulls one job atomically:
update job
set locked_at = now(),
locked_by = $1,
attempts = attempts + 1
where id = (
select id from job
where finished_at is null
and locked_at is null
and run_after <= now()
order by run_after
for update skip locked
limit 1
)
returning id, kind, payload;
Three Node workers, each polling every 500ms, never collide. SKIP LOCKED is the feature that makes this trivial: the second worker simply skips any row the first one holds a row-level lock on and grabs the next. The Postgres docs cover the semantics tersely; the practical version is "this is how you build a queue without a queue".
The whole queue lives in the same database as the dispatch state, which means a job and its source row can be updated in one transaction. No two-system consistency problem, no Redis going stale, no message-replay logic. When we eventually outgrow it, we will know, because job_pending_idx will start showing up in slow-query logs. Until then, less is more.
Nightly reconciliation: trust, but verify
For the first three weeks after cutover we kept the workbook alive. Dispatchers used the new board; an export script wrote the board state into a parallel sheet at 23:55; a reconciliation job at 00:30 compared the exported sheet to the board's authoritative state and emailed any drift to the ops lead.
This sounds paranoid until you remember the workbook had eight years of folklore baked into it. The reconciliation found three classes of drift in the first week alone:
- A handful of shipments where the legacy carrier-pull script was overwriting
loadedback toat_baybecause of a stale cache. Fixed in a day. - One customer who emailed a CSV every morning to "preload bookings" through a back door nobody had documented. We built an importer.
- Twenty-six shipments per day in a status the board had no concept of: "weighed". The night-shift dispatchers tracked it in a notepad column. We added it as a sub-state and the drift disappeared.
By week three the diff was empty for five nights running. We turned off the export and the sheet became read-only. By week six it was gone.
Do not skip this phase. The first week is when you learn that the spreadsheet had two hidden workflows, three secret integrations, and one customer who has been emailing the warehouse manager directly since 2019. None of that is in the headers. All of it is in the drift report.
Cutover by dual-write, not big-bang
We never asked the dispatchers to switch on a date. For three weeks the board and the sheet ran in parallel. A small bridge script watched the board's event log and applied every transition to the sheet over a service-account login, so the sheet stayed live for anyone who refused to switch.
By day five, eight of the eleven dispatchers were on the board because it was faster. By day twelve, the remaining three switched because they got tired of typing into a sheet that "moved by itself". On day twenty-one we set the sheet read-only and watched. No complaints. We turned the bridge off.
The lesson, which we relearn on every migration of this shape: the way to retire a legacy system is to make the new one boringly better at the daily verb, then let the old one wither. Trying to argue people into the new tool is a losing fight against muscle memory.
What we would do again, and what we would not
Would do again: Postgres for both state and queue, server actions for mutations, an event table from day one, a reconciliation phase that overlaps the old system. Would not: build a custom drag-and-drop until at least three dispatchers ask for it. We built one in week six and removed it in week nine. Nobody noticed.
When we shipped this for the freight forwarder, the thing we ran into late was timezone drift between the carrier EDI feeds (UTC) and the dispatcher's local departed timestamps (Europe/Amsterdam, with daylight saving). We solved it by storing every timestamp as timestamptz, rendering in the user's locale, and writing one carrier adapter per source rather than one shared parser. This is the kind of process automation work we do most often: not a clever model, just a careful schema, a small queue, and a reconciliation job that holds the old world accountable until the new one earns its place.
If you want to start tomorrow, spend six hours watching one person use the sheet. Write down the verbs, not the columns. The enum will fall out of the notes.
Key takeaway
Audit the verbs (what people do with the sheet) before the nouns (what columns are named). The verbs are your schema; the nouns are documentation.
FAQ
Why Postgres for the queue instead of Redis or RabbitMQ?
Because the job and its source row can be updated in one transaction. No two-system consistency to worry about. SKIP LOCKED gives you per-row work distribution across many workers, for free.
How long does a migration like this realistically take?
Eleven weeks for this freight forwarder, from first audit to old-sheet decommission. Most of that was the dual-write reconciliation window. The build itself was about three weeks.
What if the team refuses to switch from Excel?
Don't ask them to switch on a date. Dual-write to the sheet from the new system for two or three weeks. The fast users move first. The rest follow when the sheet starts moving by itself.
Do we need server actions, or will an API route work?
Either is fine. Server actions remove the JSON marshalling boilerplate and tie naturally into form submissions and revalidatePath. An API route is the right call if you also need a non-browser client.
Why polling instead of websockets?
Eleven users polling every four seconds is 165 requests a minute. Trivial for one Postgres pool. Polling is also easier to debug at 03:00 than a stuck socket. Add sockets when load actually forces it.