
Recursive coding agents: a field guide for client repos

It is 03:14 and your coding agent has decided the client's auth layer is overly defensive. CI is green. You will not sleep well tomorrow either.

Jacob Molkenboer · Founder · A Brand New Company · 3 Jun 2024 · 6 min


It is 03:14. A coding agent on its fourth self-improvement cycle has decided the client's auth middleware is "overly defensive" and rewritten the JWT validator in a way that compiles, passes the four tests it wrote itself, and ships a silent regression that lets expired tokens through. CI is green. You wake at 09:00 to a Slack message from the client asking why their dashboard shows yesterday's session data.

This is not a thought experiment. Variations of it have happened in our shop, in shops we know, and on the front page of Hacker News last week, where "When AI Builds Itself" hit 432 points. The same week the front page also carried an open-source framework for AI-powered vulnerability discovery and a code-review CLI built on top of it. Recursive self-improving agents are a real tool now. They are also, when pointed at a live client repo without scaffolding, a way to lose a customer overnight.

This is the field guide we wish we had two years ago.

What recursive self-improvement actually means in a client repo

Set aside the science-fiction definition. In a working studio, a recursive self-improving coding agent is one that reads its own previous output and the test results from that output, edits its own prompts or its own subagent topology, and runs the next iteration with those changes. The self-improvement is bounded by whatever scaffolding you wrap around it. That scaffolding is the only thing standing between a useful agent and a process that has decided your retry-backoff helper is elegant enough to deprecate the entire auth layer.
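A minimal sketch of that loop, under stated assumptions: `generate`, `run_tests`, and `revise_prompt` are hypothetical callables standing in for the model call, the test runner, and the prompt-editing step, and `max_iterations` is the only piece of scaffolding shown. It is an illustration of the shape, not a description of any particular agent framework.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Iteration:
    prompt: str
    output: str
    tests_passed: bool


def run_bounded(
    prompt: str,
    generate: Callable[[str], str],                  # hypothetical: model call
    run_tests: Callable[[str], bool],                # hypothetical: test runner
    revise_prompt: Callable[[str, str, bool], str],  # hypothetical: self-edit step
    max_iterations: int = 5,
) -> List[Iteration]:
    """Run the self-revision loop, stopping only at the hard cap."""
    history: List[Iteration] = []
    for _ in range(max_iterations):
        output = generate(prompt)
        passed = run_tests(output)
        history.append(Iteration(prompt, output, passed))
        # The agent reads its own output and test results, then rewrites
        # the prompt it will use on the next pass.
        prompt = revise_prompt(prompt, output, passed)
    return history
```

The cap is enforced by the caller, not by the agent, which is the point: the loop cannot talk its way into a sixth iteration.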

The blast radius problem

The most useful frame for running these agents is blast radius. Every action the agent can take has a worst-case cost. A typo in a comment is zero. A force-push to main is a day of forensics. An agent that decides to tidy up "dead code" in the auth middleware is your client's confidence in you.

Most teams think about agent guardrails as a permissions problem ("can it run rm -rf?"). That is the easy half. The hard half is blast radius per file path. An agent with full shell access that cannot write to app/auth/ is safer than an agent with a read-only shell whose edit tool can write anywhere.

Warning

An agent with green tests is not an agent with correct behaviour. If the agent wrote the tests, the tests are part of the artefact, not the validator.

Sandboxing the repository, not the host

The instinct from classical sysadmin work is to run the dangerous thing in a container. That contains blast radius on the host machine. It does nothing for blast radius on the product. An agent inside a perfectly isolated Docker container can still git push a broken auth layer to main.

The sandbox that matters is the repository sandbox. The agent works in a git worktree, not the main checkout. It commits to a branch it cannot rename. It pushes with a credential that has no push rights to protected branches. A human merges. Always. Even on iteration 47.
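As a rough sketch of the worktree half of this, the helper below builds the `git worktree add` command for an agent branch and adds a client-side check that refuses pushes to protected branch names. The directory layout, the `PROTECTED_BRANCHES` patterns, and both function names are assumptions made for illustration. The client-side check is only a convenience; the gate that actually holds is branch protection on the git host.

```python
import fnmatch
from pathlib import Path
from typing import List, Sequence

# Hypothetical protected-branch patterns; mirror whatever the git host enforces.
PROTECTED_BRANCHES = ("main", "release/*")


def worktree_add_command(repo: Path, branch: str, base: str = "main") -> List[str]:
    """Build the git command that gives the agent its own checkout on a new branch."""
    # Assumed layout: worktrees live beside the main checkout, never inside it.
    worktree_dir = repo.parent / f"{repo.name}-worktrees" / branch.replace("/", "-")
    return [
        "git", "-C", str(repo),
        "worktree", "add", "-b", branch, str(worktree_dir), base,
    ]


def agent_may_push(branch: str, protected: Sequence[str] = PROTECTED_BRANCHES) -> bool:
    """Return False when the target branch matches a protected pattern."""
    return not any(fnmatch.fnmatchcase(branch, pattern) for pattern in protected)
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the layout easy to test without touching a real repository.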

This sounds obvious. It is also the single most-skipped step in every "we let the agent loose for the weekend" demo on Twitter.

Scope locks that actually hold

File-level allowlists are the only scope mechanism we have found that survives contact with a recursive agent. Prompt-level instructions like "do not touch the auth layer" do not hold. We have watched agents reason their way past every "do not" we have written, including ones that started with the word NEVER in all caps.

The lock has to live below the agent, in the tool layer:

scope:
  allow:
    - app/billing/**
    - app/dashboard/widgets/**
    - tests/billing/**
  deny:
    - app/auth/**
    - app/middleware/**
    - migrations/**
    - .github/workflows/**
    - "**/*.env*"
edit_tool:
  enforces: scope.allow, scope.deny
  on_violation: refuse_and_log

The edit tool refuses any write that matches the deny list or falls outside the allowlist. The agent receives the refusal as a tool error, which it can reason about but cannot bypass. Every refusal is logged. The logs are surprisingly entertaining.
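One way the check inside such an edit tool might look, as a sketch rather than our production code: the patterns are copied from the config above, while `check_write` and `_matches` are hypothetical names. It assumes paths are repo-relative, that deny wins over allow, and that `**` means "any depth", which Python's `fnmatch` gives for free because its `*` also matches `/`.

```python
import fnmatch
import posixpath
from typing import Sequence, Tuple

ALLOW = ["app/billing/**", "app/dashboard/widgets/**", "tests/billing/**"]
DENY = [
    "app/auth/**",
    "app/middleware/**",
    "migrations/**",
    ".github/workflows/**",
    "**/*.env*",
]


def _matches(path: str, pattern: str) -> bool:
    if fnmatch.fnmatchcase(path, pattern):
        return True
    # Let a "**/x" pattern also match "x" at the repository root.
    return pattern.startswith("**/") and fnmatch.fnmatchcase(path, pattern[3:])


def check_write(
    path: str,
    allow: Sequence[str] = ALLOW,
    deny: Sequence[str] = DENY,
) -> Tuple[bool, str]:
    """Return (allowed, reason) for a proposed write to a repo-relative path."""
    # Normalise first so "app/billing/../auth/x" is judged as "app/auth/x".
    normalized = posixpath.normpath(path.replace("\\", "/"))
    if normalized == ".." or normalized.startswith("../") or posixpath.isabs(normalized):
        return False, "path escapes the repository"
    if any(_matches(normalized, pattern) for pattern in deny):
        return False, "path matches deny list"
    if not any(_matches(normalized, pattern) for pattern in allow):
        return False, "path is outside allow list"
    return True, "ok"
```

The caller would turn a `False` result into a tool error for the agent and a log line for the humans. Normalising before matching matters: without it, a relative path with `..` segments is the cheapest way past the lock.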

The verifier matters more than the writer

Most agent setups focus on the loop that writes code. The loop that matters in a client repo is the one that verifies. A useful pattern, visible in the open code-review CLI tools currently trending and in Anthropic's recent work on automated vulnerability discovery, is to separate the writer agent from the verifier agent and give the verifier a different model, a different prompt, and adversarial framing.

The agent that writes the code should not be the agent that grades it. We run two loops. The writer loop generates the diff, runs the tests it wrote, and reports. The verifier loop uses a different model, with framing that defaults to rejecting, and treats the writer's tests as untrusted. The verifier reads the diff cold, with no access to the writer's reasoning trace. If two of three verifier runs reject, the change never reaches the human queue. This catches the class of failure where the writer convinces itself that an elegant simplification is safe.
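The two-of-three rule can be sketched in a few lines. Here `Verifier` is a hypothetical callable that receives only the diff (no reasoning trace) and returns True to approve; how each verifier calls its model is left out. A verifier that crashes is counted as a rejection, in keeping with the default-to-reject framing.

```python
from typing import Callable, Sequence

# Hypothetical signature: a verifier sees the diff and nothing else.
Verifier = Callable[[str], bool]


def reaches_human_queue(
    diff: str,
    verifiers: Sequence[Verifier],
    max_rejections: int = 1,
) -> bool:
    """Return True only if at most `max_rejections` verifier runs reject the diff."""
    rejections = 0
    for verify in verifiers:
        try:
            approved = verify(diff)
        except Exception:
            # A verifier that fails to answer counts as a rejection.
            approved = False
        if not approved:
            rejections += 1
    return rejections <= max_rejections
```

With three verifier runs and the default `max_rejections=1`, a single dissent still lets the change through to a human, while two rejections stop it.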

Takeaway

A recursive coding agent is safe to the degree its tool layer enforces a scope it cannot reason past. Prompts do not hold. Allowlists do.

What we actually do at ABN

When we built the email-agent for a Rotterdam logistics client, we ran this exact setup. The agent had write access to app/inbox/, app/responder/, and the test directories that mirrored them. It had read access to everything else, including the auth layer so it could understand the shape of an authenticated request. It could not write to the customer database schema, the SMTP credentials, or the middleware that handled their forwarder API tokens. Over six weeks it shipped 41 commits to a feature branch. We merged 38. Three were caught by the verifier loop and never reached us. Zero touched anything we had locked.

The boring infrastructure (worktrees, scoped tool permissions, two-model verification, human merge gate) is what makes the interesting part safe. The production version of this shape sits underneath every AI agents engagement we run.

The smallest thing to do today

Open your agent's file-write tool. Add a deny list. Put your auth directory, your migrations directory, and every .env file in it. Make the tool refuse, log the refusal, and continue. That alone, with nothing else changed, would have prevented the 03:14 scenario this post opened with.

Key takeaway

A recursive coding agent is only as safe as the tool layer beneath it. Prompts do not hold. File-path allowlists do.

FAQ

Can a recursive coding agent safely make architectural changes?

With a deny list on critical paths and a separate verifier agent, yes for feature work. No for auth, payments, or schema migrations. Those stay human-led until you have months of clean verifier data on the same repo.

How do you stop an agent from modifying its own scope rules?

Keep scope rules outside the repo the agent edits. We store them in a sibling config repo that the agent reads but cannot write to. Same key separation as production secrets.

What does running a writer and a verifier cost in tokens?

Roughly 25 to 35 percent more than writer-only, depending on diff size. Cheaper than one rolled-back production incident in a client repo, by several orders of magnitude.

Why not just review every commit by hand?

You should. The verifier loop catches the obvious failures before they reach you, so the human review can focus on intent and product fit rather than syntax and trivial regressions.

