Burr vs Pydantic-AI vs Postgres for an agent orchestrator
A 24-person Utrecht healthtech needs to pick an orchestrator for its prior-authorization agent. We compare Burr, Pydantic-AI graphs, and a hand-rolled Postgres state machine on replay, schema ownership, and hospital IT approval.

A 24-person healthtech in Utrecht runs a prior-authorization agent against three Dutch insurers' portals and one hospital's HiX. Last week the agent submitted the same machtigingsaanvraag (prior-authorization request) twice inside an hour. The trace led to a 600-line FastAPI handler with a status enum and four except Exception blocks. Nobody could tell which retry produced the duplicate. The CTO gave herself 48 hours to pick a real orchestrator.
She narrowed it to three: Burr, a Pydantic-AI graph, and a hand-rolled Postgres state machine she could finish in a week. We have shipped agents on all three for clients in similar situations, so here is how the comparison plays out when the constraints are debuggability, schema ownership, and what a hospital IT-beheerder (IT administrator) will sign off on.
The shortlist, briefly
Burr, from DAGWorks, models your agent as a state machine of actions and transitions, persists each step into a backend you pick (SQLite, Postgres, custom), and ships a local UI that shows you the state at every step of every run.
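Pydantic-AI graph (the pydantic-graph package) ships inside the pydantic-ai project. Nodes are dataclasses that subclass BaseNode, each with an async run method that returns the next node. State is a typed object you define, typically a dataclass or a Pydantic model. Persistence is opt-in via the BaseStatePersistence interface: in-memory and JSON-file implementations ship with the package, and a database-backed one is yours to write.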
Hand-rolled Postgres state machine is what most of us write the first time. A jobs table, a status enum, a job_events append-log, a worker loop with SELECT ... FOR UPDATE SKIP LOCKED. No framework, full control, every line yours to debug at 02:00.
Replay-from-step, in practice
"Replay from step" means: the agent failed at step 7 of 12, you fixed the bug, and you want to re-run from step 7 with the state it had then, not from step 1. For a prior-auth workflow that includes a nurse review and a doctor sign-off, replaying from step 1 is not free. It re-sends emails, re-pings the insurer's portal, and re-asks the physician to confirm a diagnosis she already confirmed yesterday.
Burr wins this outright. Every transition is persisted with the action input and the resulting state, keyed by an app_id and a sequential sequence_id. Resuming a stuck run is one builder call:
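from burr.core import ApplicationBuilder
from burr.tracking import LocalTrackingClient

# check_dekking, verify_diagnose, attach_onderbouwing and submit are the
# workflow's Burr actions, defined elsewhere.
app = (
    ApplicationBuilder()
    .with_actions(check_dekking, verify_diagnose, attach_onderbouwing, submit)
    .with_transitions(...)
    .initialize_from(
        LocalTrackingClient(project="prior-auth"),
        resume_at_next_action=True,  # continue after the last persisted step
        default_state={},  # used only when no prior run exists for this app_id
        default_entrypoint="check_dekking",
    )
    .with_identifiers(app_id="aanvr-2026-06-11-7421")
    .build()
)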
The Burr UI lets a developer click into aanvr-2026-06-11-7421, see the state diff at each step, and replay from any node. For a team that does machtigingen all day, that is the difference between a calm post-mortem and a Friday-evening rebuild.
Pydantic-AI graph supports resume too, but the persistence story is younger. The package ships in-memory and JSON-file persistence; for Postgres you implement BaseStatePersistence yourself and decide what to store. There is no shipped UI; you build it or stare at JSONB in a psql window. The graph itself is clean, the typing is genuinely good, but the operational layer is yours.
The hand-rolled Postgres version is exactly as good as you make it. The pattern that works: every action is a row in job_events with (job_id, step_id, input_state, output_state, attempt). To resume, pick the last successful step, restore its output_state into the worker, and run step_id + 1. That holds for a single workflow. Once you have five workflows that drift apart, you are maintaining your own little Burr.
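The job_events pattern above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not a production worker: it uses the standard-library sqlite3 module as a stand-in for Postgres, treats a NULL output_state as a failed attempt, and the names StepError, init_db, last_successful_step and run_job are hypothetical.

```python
import json
import sqlite3


class StepError(Exception):
    """Raised by a step that fails in an expected, retryable way."""


def init_db(conn):
    # Columns follow the (job_id, step_id, input_state, output_state, attempt)
    # shape; a NULL output_state marks a failed attempt (an assumption here).
    conn.execute(
        """CREATE TABLE IF NOT EXISTS job_events (
               job_id TEXT NOT NULL,
               step_id INTEGER NOT NULL,
               input_state TEXT NOT NULL,
               output_state TEXT,
               attempt INTEGER NOT NULL,
               PRIMARY KEY (job_id, step_id, attempt))"""
    )


def last_successful_step(conn, job_id):
    """Return (step_id, output_state) for the newest successful step, or None."""
    row = conn.execute(
        "SELECT step_id, output_state FROM job_events "
        "WHERE job_id = ? AND output_state IS NOT NULL "
        "ORDER BY step_id DESC, attempt DESC LIMIT 1",
        (job_id,),
    ).fetchone()
    return (row[0], json.loads(row[1])) if row else None


def run_job(conn, job_id, steps, initial_state):
    """Run a job, resuming after the last successful step if one exists."""
    resumed = last_successful_step(conn, job_id)
    step_id, state = resumed if resumed else (-1, dict(initial_state))
    for next_id in range(step_id + 1, len(steps)):
        attempt = conn.execute(
            "SELECT COUNT(*) FROM job_events WHERE job_id = ? AND step_id = ?",
            (job_id, next_id),
        ).fetchone()[0] + 1
        try:
            output = steps[next_id](state)
        except StepError:
            # Record the failed attempt, then stop; a later call resumes here.
            conn.execute(
                "INSERT INTO job_events VALUES (?, ?, ?, NULL, ?)",
                (job_id, next_id, json.dumps(state), attempt),
            )
            conn.commit()
            raise
        conn.execute(
            "INSERT INTO job_events VALUES (?, ?, ?, ?, ?)",
            (job_id, next_id, json.dumps(state), json.dumps(output), attempt),
        )
        conn.commit()
        state = output
    return state
```

Calling run_job a second time with the same job_id skips every step that already has a stored output_state, which is the property that keeps a replay from re-sending emails. What the sketch leaves out (concurrent workers, side effects that succeed before the row is written, per-workflow state shapes) is where the real effort goes.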
If "replay from step" is in your runbook, do not write the persistence layer yourself. It is the single hardest thing to get right and the first thing you will regret hand-rolling.
Schema migrations and who carries the pager
Every option has tables. The question is which tables your team owns and which the framework owns.
Burr's persistence backend creates and migrates its own tables. The PostgreSQLPersister writes to a schema you point it at, and its migrations are versioned inside the Burr release. Your Alembic does not know about them and does not need to. The cost: schema changes arrive on Burr's release schedule, so plan each Burr upgrade for a quiet afternoon rather than attempting it during an outage.
Pydantic-AI graph is the opposite. Persistence is your problem. You write the table, you write the Alembic revision, you write the JSON serialiser for whichever version of the state model is current. The benefit is that your DBA already understands the table. The cost is that nobody outside your team has tested it.
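To make "the JSON serialiser for whichever version of the state model is current" concrete, here is a minimal sketch of a version-tagged serialiser. It uses a plain standard-library dataclass rather than any pydantic-graph API, and the state fields, the SCHEMA_VERSION constant and the dump_state and load_state helpers are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass

SCHEMA_VERSION = 2  # bump whenever the state model changes shape


@dataclass
class PriorAuthState:
    aanvraag_id: str
    dekking_ok: bool = False
    # Added in version 2 of this hypothetical model.
    onderbouwing_attached: bool = False


def dump_state(state):
    """Serialise the state with an explicit version tag for a JSONB column."""
    return json.dumps({"schema_version": SCHEMA_VERSION, "data": asdict(state)})


def load_state(raw):
    """Load a stored state, upgrading older payloads before building the model."""
    envelope = json.loads(raw)
    version, data = envelope["schema_version"], dict(envelope["data"])
    if version == 1:
        # Version 1 rows predate the onderbouwing flag; default it explicitly.
        data.setdefault("onderbouwing_attached", False)
        version = 2
    if version != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version {version}")
    return PriorAuthState(**data)
```

The point of the version tag is that a run persisted last month can still be resumed after the model gains a field. Every upgrade branch in load_state is code your team writes and tests alone.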
Hand-rolled is, of course, all yours. The schema is small (two tables, one enum) and easy to migrate. The trap is the second workflow. Prior-auth is one shape, but the same agent will eventually do post-treatment claims, declaration corrections, and patient-portal (patiëntportaal) triage. Each one wants slightly different state. By the third workflow you have either built a generic state column (and lost the typing) or three near-duplicate tables (and lost your weekends).
What the hospital IT-beheerder will actually approve
This is the part the framework comparisons never cover, and it is the part that kills the deal. A Dutch hospital's IT department works to NEN 7510 and runs a procurement process that treats every Python package in the requirements file as a supply-chain liability. They will ask three things.
Where does the state live?
All three options can put it in a Postgres they already run. Burr and the hand-rolled version do this by default. Pydantic-AI graph does it once you wire the persister. None of them require an external SaaS, and all three can stay inside the hospital VPC. Good news across the board.
How many transitive dependencies?
The hand-rolled version is the smallest: psycopg, your worker, done. Pydantic-AI graph pulls in pydantic-ai, which is a non-trivial but well-maintained tree. Burr brings its own dependencies plus an optional tracking server you will need to either deploy or disable. For a beheerder doing a dependency review before NEN 7510 sign-off, "smaller is faster to approve" is real.
What happens when the agent misbehaves?
The IT-beheerder has read the same stories you have about agents doing destructive things to file systems they were never supposed to touch. They want to know three things: is there a kill switch, is there a rate limit on tool calls, and is every action auditable. Burr gives you a transition graph the beheerder can read in the UI. Pydantic-AI gives you typed nodes. The hand-rolled version gives you whatever you wrote. None of them stops a bad action on its own. All three make it auditable if you set them up that way.
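Since none of the three options enforces those controls for you, the three questions can be answered with a thin wrapper around tool calls. The sketch below is framework-independent and illustrative only: ToolGuard and ToolCallBlocked are hypothetical names, and the in-memory audit list stands in for an append-only table.

```python
import time
from collections import deque


class ToolCallBlocked(Exception):
    """Raised when the guard refuses to run a tool call."""


class ToolGuard:
    """Wraps tool calls with a kill switch, a sliding-window rate limit, and an audit log."""

    def __init__(self, max_calls, per_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.clock = clock
        self.killed = False
        self.audit_log = []  # stand-in for an append-only audit table
        self._recent = deque()  # timestamps of calls that were allowed

    def kill(self):
        """Flip the kill switch; every later call is refused."""
        self.killed = True

    def call(self, tool_name, tool, *args, **kwargs):
        now = self.clock()
        # Drop timestamps that have left the rate-limit window.
        while self._recent and now - self._recent[0] >= self.per_seconds:
            self._recent.popleft()
        if self.killed:
            reason = "kill switch"
        elif len(self._recent) >= self.max_calls:
            reason = "rate limit"
        else:
            reason = None
        # Log every attempt, allowed or refused, before running anything.
        self.audit_log.append({"tool": tool_name, "at": now, "blocked": reason})
        if reason:
            raise ToolCallBlocked(f"{tool_name} refused: {reason}")
        self._recent.append(now)
        return tool(*args, **kwargs)
```

Because the audit entry is written before the tool runs, refused calls are as visible as allowed ones, which is usually what an auditor wants to see first.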
Do not demo the framework's web UI to the IT-beheerder unless it is locked behind the hospital SSO. Burr's tracking UI is a separate FastAPI app and ships with no auth out of the box. Put it behind an auth proxy or do not show it.
The pick
For the Utrecht firm specifically, the answer is Burr, with one caveat. The replay UI alone pays for the upgrade cost. The persisted state at every step solves the duplicate-machtiging incident its CTO is staring at. The Postgres backend keeps the IT-beheerder calm. And the action/transition model maps cleanly to how a Dutch prior-auth flow reads in the wet (the law): check dekking, verify diagnose, attach onderbouwing, submit, parse response, escalate-if-AGB-issue.
The caveat: do not deploy the Burr tracking UI to the hospital network unless you own the auth in front of it. Run it on the developer laptop against a read replica, or put it behind an SSO proxy. The tracking data is patient-adjacent.
The Pydantic-AI graph would be our pick if the team had two senior engineers who enjoy writing infrastructure and the workflow were likely to stay singular. The types are excellent and the code reads well. It loses on operability today.
The hand-rolled version would be our pick if the team were three people and the workflow were strictly linear with no human-in-the-loop. That is not this team.
What to do tomorrow
If you are sitting in the same decision, start by writing your workflow on a whiteboard as a state machine. Count the states. Count the transitions. If the result is fewer than eight states and one workflow, hand-rolled is fine. If it is more, or you already know workflow two is coming, install Burr in a branch and port one path. You will know inside a day whether the abstraction fits.
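The whiteboard count is easy to mechanise once the states are written down as a mapping. In this sketch the transition table is a simplified, hypothetical transcription of a prior-auth flow (it omits review steps), and summarise is an illustrative helper that applies the eight-state, one-workflow rule of thumb from the paragraph above.

```python
# Hypothetical whiteboard transcription: each state maps to the states it can move to.
TRANSITIONS = {
    "check_dekking": ["verify_diagnose", "escalate"],
    "verify_diagnose": ["attach_onderbouwing", "escalate"],
    "attach_onderbouwing": ["submit"],
    "submit": ["parse_response"],
    "parse_response": ["done", "escalate"],
    "escalate": [],
    "done": [],
}


def summarise(transitions, workflows=1, state_threshold=8):
    """Count states and transitions and apply the 'fewer than eight, one workflow' rule."""
    states = set(transitions)
    for targets in transitions.values():
        states.update(targets)  # include states that only appear as targets
    n_transitions = sum(len(targets) for targets in transitions.values())
    return {
        "states": len(states),
        "transitions": n_transitions,
        "hand_rolled_ok": len(states) < state_threshold and workflows == 1,
    }


print(summarise(TRANSITIONS))
print(summarise(TRANSITIONS, workflows=2))
```

With a second workflow on the roadmap the same graph flips to "not hand-rolled", which matches the advice to port one path onto Burr in a branch and see whether the abstraction fits.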
When we built a claims-triage agent for a client in Dutch healthcare, the thing that bit us was exactly this: we picked the lightest abstraction, shipped, and then spent six weeks rebuilding the persistence layer when a second workflow landed. We ended up porting onto an explicit state machine, and have defaulted to that for every AI agent since.
The smallest thing you can do today: open your current agent code and count the except Exception blocks. If it is more than two, you do not have an orchestrator. You have a retry loop wearing a hat.
Key takeaway
If 'replay from step' is in your runbook, do not write the persistence layer yourself. It is the hardest thing to get right and the first thing you will regret hand-rolling.
FAQ
What is Burr?
A Python library from DAGWorks for building stateful AI agents as state machines, with per-step persistence into a backend you pick and a local debug UI for replaying runs.
Can a Pydantic-AI graph persist state for resume?
Yes. In-memory and JSON-file persistence ship with pydantic-graph; for a database backend you implement BaseStatePersistence yourself. There is no shipped UI for inspecting or replaying runs, so the operability layer is on you.
When is a hand-rolled Postgres state machine the right call?
For one linear workflow with no human-in-the-loop and a small team, it is fine. For multiple workflows or branching review steps you will re-invent an orchestrator within months.
What does NEN 7510 imply for AI agents in Dutch hospitals?
It is the Dutch information-security standard for healthcare. For agents it implies auditable actions, access controls, dependency review, and a documented data-processing chain inside the hospital VPC.