AI coding hype: three things we still write by hand in 2026
We treat the AI coding wave the way we treated 2014 mobile-first: real, uneven, and useless as a slogan. Here are the three places we still hand-roll the code.

It is Tuesday standup. Our backend lead is talking through a payment-reconciliation bug that ate four hours of his Monday. Halfway through, someone asks the obvious 2026 question: why didn't an agent just write this? He doesn't look up from his keyboard. He says: "Because I'd still be debugging what the agent wrote."
That answer has become a small rule at ABN. We ship AI agents for a living. Fourteen of them are live in production for clients right now. And there are exactly three places in our own codebase where we still write the code by hand, because supervising an agent there would cost more than writing the code ourselves.
Before we get to those three, the framing matters.
The 2014 parallel
In 2014 every client meeting opened with the phrase "mobile-first." It was a real shift. Mobile traffic had crossed desktop, ad spend was following, and any agency that didn't rebuild its delivery pipeline for small screens was about to lose the next contract. We did rebuild. So did everyone we respected.
We also watched a generation of agencies sell "mobile-first" as a magic phrase, charge a premium for it, and quietly ship the same desktop sites with a CSS media query bolted on. The wave was real. The hype around the wave was not.
The AI-coding wave in 2026 has the same shape. The shift is real. The infrastructure work to make coding agents actually useful is the most interesting problem we touch right now, and a Hacker News post on harness engineering with Codex hit the front page this week for a reason. We use these tools every day. They are the difference between a five-day estimate and a one-day estimate on most greenfield work.
But like 2014, the wave is uneven. There are categories where an agent eats the work whole, and categories where the work eats the agent.
What changed and what didn't
Mobile-first was a device shift. The user moved from a 24-inch monitor to a 5-inch screen with one thumb. The engineering problem was layout and bandwidth. Once you had a build pipeline that produced a responsive bundle, the problem was solved for thousands of pages at once.
AI-coding is a probability shift. The model produces code that is right most of the time and wrong some of the time, and the wrong cases are not randomly distributed. They cluster at the exact places where being wrong is most expensive. That clustering is what changes the hand-roll math.
A senior engineer reviewing an agent's pull request has to read every line as if it might be the case where the agent confidently invented a function signature, swapped a <= for a <, or returned a currency amount in cents from a function the caller assumes returns euros. The review time on those PRs is sometimes longer than the time it would have taken to write the code from scratch.
That is the babysit cost. When the babysit cost is higher than the write cost, you hand-roll.
An agent that is mostly correct is wonderful where errors are reversible and rare. It is a liability where the rare error lands on the bill, the database, or the calendar.
Where we still write by hand
Money math and payment idempotency
Anything that moves a euro from one ledger to another is hand-rolled. We do not let an agent generate the function that computes VAT on a partial refund, the retry logic around a Stripe webhook, or the idempotency key derivation for a recurring charge.
Two reasons. First, the failure mode is asymmetric: a refund computed at 21% instead of 9% lands in a Belastingdienst audit, not in a unit test. Second, the correct pattern is documented and small. Stripe's idempotency-key guidance fits on one page, and after you have written the wrapper once you reuse it everywhere. There is nothing for an agent to discover that a human reading the docs has not already encoded.
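To make "documented and small" concrete, here is a minimal Python sketch of one way to derive an idempotency key for a recurring charge. Stripe accepts a client-chosen key in the Idempotency-Key header; deriving it deterministically from the subscription and billing period, rather than generating a random UUID, is an assumption of this example, and the function name and fields are hypothetical.

```python
import hashlib


def recurring_charge_key(subscription_id: str, period_start: str) -> str:
    """Derive a stable idempotency key for one charge attempt.

    Hypothetical helper: the key identifies the business intent
    ("charge this subscription for this period"), so every retry of
    the same charge sends the same key.
    """
    if not subscription_id or not period_start:
        raise ValueError("subscription_id and period_start are required")
    # A version prefix lets the derivation change later without
    # colliding with keys produced by the old scheme.
    fields = ["recurring-charge-v1", subscription_id, period_start]
    # Join with a separator that cannot appear in the fields, so
    # ("ab", "c") and ("a", "bc") never hash to the same key.
    payload = "\x1f".join(fields)
    # 64 hex characters, well inside Stripe's 255-character key limit.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


first_try = recurring_charge_key("sub_123", "2026-03-01")
retry = recurring_charge_key("sub_123", "2026-03-01")
next_month = recurring_charge_key("sub_123", "2026-04-01")
```

The amount is deliberately left out of the key. If a retry arrives with the same key but a different amount, that is a bug worth failing loudly on, not a new charge.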
The same logic applies to currency arithmetic. Floats are a known trap. We use integer minor units everywhere, with a single Money type that refuses to compare different currencies. An agent will happily generate code that returns 12.34 when the caller wanted 1234. The review cost of catching that on every PR is higher than the write cost.
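As an illustration only, a Money type along these lines might look like the following Python sketch. The class name, the parse helper, and the two-decimal default are assumptions made for the example; some currencies use a different number of minor-unit digits.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    minor: int      # amount in minor units, e.g. cents
    currency: str   # ISO 4217 code, e.g. "EUR"

    def __post_init__(self):
        # Reject floats (and bools) at the boundary so 12.34 never gets in.
        if isinstance(self.minor, bool) or not isinstance(self.minor, int):
            raise TypeError("minor must be an int of minor units")

    @classmethod
    def parse(cls, text: str, currency: str, exponent: int = 2) -> "Money":
        """Convert a decimal string like "12.34" to minor units exactly."""
        scaled = Decimal(text).scaleb(exponent)
        if scaled != scaled.to_integral_value():
            raise ValueError(f"{text!r} has more than {exponent} decimals")
        return cls(int(scaled), currency)

    def _same_currency(self, other: "Money") -> None:
        if not isinstance(other, Money):
            raise TypeError("can only combine Money with Money")
        if other.currency != self.currency:
            raise ValueError(f"{self.currency} vs {other.currency}")

    def __add__(self, other: "Money") -> "Money":
        self._same_currency(other)
        return Money(self.minor + other.minor, self.currency)

    # Ordering raises on a currency mismatch. > and >= work through
    # Python's reflected comparisons. Equality is the dataclass default,
    # so amounts in different currencies are simply unequal.
    def __lt__(self, other: "Money") -> bool:
        self._same_currency(other)
        return self.minor < other.minor

    def __le__(self, other: "Money") -> bool:
        self._same_currency(other)
        return self.minor <= other.minor


price = Money.parse("12.34", "EUR")
total = price + Money(66, "EUR")
```

With this shape, a function that returns 12.34 where 1234 was expected fails at construction time instead of in a reconciliation report.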
Database migrations
We write every production migration by hand. Not the schema diff, which a tool can generate, but the runbook around it: which statements take a table lock, which need CREATE INDEX CONCURRENTLY, which need to be split into a backfill phase and a constraint phase, and what the rollback looks like if the backfill is half done when the deploy is paused.
A migration is a one-way door under load. An agent that ships a migration which acquires an ACCESS EXCLUSIVE lock on a 40-million-row orders table at 11am on a Wednesday has just deleted an hour of revenue. We have read enough of these in client rescues to be permanent skeptics. The PostgreSQL docs on concurrent index creation are clear about the trade-offs, and the cost of getting them wrong is paid in customer trust, not test failures.
The agent can help us draft the runbook and review it. It does not write the final version.
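A mechanical check can sit next to the human review without replacing it. The Python sketch below flags a few PostgreSQL statements that tend to deserve a line in the runbook. The function name and the rule list are hypothetical and far from complete, and the statement splitting is deliberately naive.

```python
import re

# Each rule pairs a pattern with the question the runbook should answer.
# Illustrative only: a real review covers many more cases.
RULES = [
    (re.compile(r"^CREATE\s+(UNIQUE\s+)?INDEX\b(?!\s+CONCURRENTLY\b)", re.I),
     "CREATE INDEX without CONCURRENTLY blocks writes while it builds"),
    (re.compile(r"\bADD\s+CONSTRAINT\s+\w+\s+(CHECK|FOREIGN\s+KEY)\b"
                r"(?!.*\bNOT\s+VALID\b)", re.I | re.S),
     "constraint added without NOT VALID checks existing rows under the lock"),
    (re.compile(r"\bSET\s+NOT\s+NULL\b", re.I),
     "SET NOT NULL can scan the table under an ACCESS EXCLUSIVE lock"),
]


def review_migration(sql: str) -> list[tuple[str, str]]:
    """Return (statement, warning) pairs for statements that match a rule.

    Splitting on ';' ignores semicolons inside strings and function
    bodies, which is acceptable for a sketch but not for real tooling.
    """
    findings = []
    for statement in (part.strip() for part in sql.split(";")):
        if not statement:
            continue
        for pattern, warning in RULES:
            if pattern.search(statement):
                findings.append((statement, warning))
    return findings


migration = """
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX CONCURRENTLY idx_orders_created ON orders (created_at);
ALTER TABLE orders ADD CONSTRAINT orders_total_check
    CHECK (total_minor >= 0) NOT VALID;
ALTER TABLE orders VALIDATE CONSTRAINT orders_total_check;
ALTER TABLE orders ALTER COLUMN currency SET NOT NULL;
"""
findings = review_migration(migration)
```

On this sample, the first and last statements are flagged. The concurrent index build and the NOT VALID constraint followed by a separate VALIDATE pass through. A check like this only prompts the question; the answer still belongs in the hand-written runbook.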
Time-zone and scheduling code
The third one surprises people. Anything that involves time math is hand-rolled, including reminder cadences, cron expressions, DST transitions, and the "send this at 09:00 local for each user" loop that almost every product has.
Time zones are not a programming problem. They are a political problem expressed in code. The IANA tz database is updated multiple times a year because governments change the rules. An agent trained on a snapshot of the world's source code will produce code that looks right and ages poorly. The bugs surface six months later, on a single Sunday in October, in a single country, for users who do not file tickets in your language.
We write this code by hand, in one place, with one library, with a test suite that pins the tz data version. We do not let an agent regenerate it.
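For a sense of scale, the core of the "09:00 local for each user" calculation can be this small. The sketch uses Python's standard zoneinfo module; the function name is hypothetical, and the example assumes IANA tz data is available on the machine.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def local_send_instant(day: date, tz_name: str,
                       at: time = time(9, 0)) -> datetime:
    """Return the UTC instant at which `at` occurs on `day` in `tz_name`.

    The wall-clock time is attached to the zone first and converted
    second, so the UTC offset comes from the tz rules for that date.
    Caveat: a wall-clock time inside a DST gap or overlap needs an
    explicit policy (see datetime's `fold`); 09:00 normally avoids both.
    """
    local = datetime.combine(day, at, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc)


# Clocks in the Netherlands went back one hour on Sunday 26 October 2025.
saturday = local_send_instant(date(2025, 10, 25), "Europe/Amsterdam")
sunday = local_send_instant(date(2025, 10, 26), "Europe/Amsterdam")
```

Here saturday is 07:00 UTC and sunday is 08:00 UTC, so the two sends are 25 hours apart. A loop that adds 24 hours to the previous send time would deliver at 08:00 local on that Sunday. Pinning the tz data version is a packaging concern and is not shown here.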
What the parallel actually predicts
The mobile-first wave produced two durable outcomes. The studios that took it seriously rebuilt their pipelines and are still in business. The studios that treated it as a tagline are not. The same will be true of this wave. The studios that build real review pipelines around coding agents, with checks and gates and humans on the dangerous paths, will keep shipping. The studios that paste a chat into the editor and ship the result will eat a payment-math bug, a migration outage, or a DST disaster in the next twelve months.
The reverse failure mode is also real. Refusing to use agents at all is the 2014 equivalent of building only for desktop. We do not recommend it to anyone.
The honest position is the boring one. Use agents on the work where being mostly right is acceptable, and hand-roll the rest. When we built the invoice-chase agent for a Rotterdam wholesaler this spring, the chase logic itself was agent-written and reviewed in an afternoon. The ledger reconciliation that decides which invoices are actually overdue was written by hand, because that decision moves money. If you want to see how we draw that line in client work, our notes on AI agents walk through it.
A useful five-minute audit for your own codebase: open the file that handles money, the file that handles migrations, and the file that handles scheduling. Ask whether you would let an agent rewrite each of them tonight, unsupervised, against production. The files where the answer is no are the files you are still going to write by hand in 2027.
Key takeaway
Use coding agents where being mostly right is acceptable, and hand-roll the paths that touch money, migrations, or time. The rare error lands where it hurts most.
FAQ
Is hand-rolling code the same as refusing AI?
No. We use coding agents every day for greenfield features, UI work, and most CRUD. Hand-rolling is reserved for the small set of paths where an error is expensive and irreversible.
Which paths are safe for a coding agent?
Anything where the error is reversible, cheap, and obvious in tests: UI components, form validation, internal tools, scripts, fixture generators, and most one-off integrations.
What is the simplest test for whether a path is agent-safe?
Ask whether you would let the agent rewrite it tonight, unsupervised, against production. If the answer is no, it stays hand-rolled until your review pipeline closes that gap.
How does the 2014 mobile-first parallel actually help?
It reminds you the shift is real but the slogan is hollow. Studios that rebuilt their pipelines survived. Studios that bolted a media query on the old site did not. Same shape, different decade.