agents.md playbook: seven sections that move the needle

A Tuesday in Groningen: four engineers opened Codex on the same monorepo. Three shipped PRs that broke staging and the fourth lost the day to a migration. The fix wasn't the model. It was the missing file.

Jacob Molkenboer · Founder · A Brand New Company · 8 Jun 2026 · 9 min

A Tuesday in May, top floor above a coffee roastery on the Oude Kijk in 't Jatstraat in Groningen. Four engineers at a 41-person SaaS had each opened OpenAI's Codex on the same backend monorepo that morning. By 16:00 three of them had pushed PRs that broke the staging build. The fourth had spent the day asking Codex why a Drizzle migration kept rolling back. The senior on call closed his laptop and said the line that started this engagement: it was faster to do it by hand than to explain the codebase to the model every single time.

Three days later we shipped an agents.md. Two days after that, the same four engineers were merging Codex-authored PRs into staging without a babysitter. The team's metric was blunt: onboarding a fresh Codex session from "I want to add a webhook" to a green CI run had gone from roughly three days of confused back-and-forth to one focused afternoon.

This is the file we wrote, and the seven sections that did the work. Five other sections we tried got cut because they did nothing, or made things worse.

Why the README is the wrong file

An HN front-page thread this week asked whether agents.md files actually help coding agents, and the comments split exactly the way you would expect. Half the room said it is a README with a fancy name. The other half said it is the difference between an agent that ships and an agent that hallucinates a Stripe integration that does not exist in your codebase.

Both are right, and the reason is in what each file optimises for. A README is written for a human who will read it once, skim it, ask the team a question, and then learn the rest by reading code. An agents.md is written for a model that will read it every single turn, has no team to ask, and learns by reading code that is too large for its context window. Those are different jobs.

The agents.md convention that several coding tools now load automatically (Codex, Aider, Sourcegraph Amp, our own internal harness) lets you write one file the agent picks up without ceremony. What goes inside it is on you. After three rewrites, here is what we kept.

1. Runtime preconditions, with the exact pins

The first section is six lines long. Node 20.11.1. pnpm 9.4.0. Postgres 16. Redis 7. The pnpm version matters because the v9 lockfile that pnpm 9 writes gets regenerated when pnpm 8 runs an install, and the model will then "fix" the diff. Postgres 16 matters because the schema's generated columns are only known to work on 16 (stored generated columns as a feature date back to Postgres 12), and Codex tried to rewrite them as triggers twice before we pinned the version.

The trick is the second half of the section: what to never install. The repo has a half-finished Bun migration in a feature branch, and Codex kept picking up the bunfig.toml and assuming Bun was the runtime. One line ("This repo runs on Node. Bun configuration files are leftovers from a 2025 experiment. Do not run bun install.") cut that loop entirely.
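A pin list like this can also be checked mechanically before a session starts. The sketch below is illustrative only: the function names are hypothetical and nothing in it comes from the team's repo. It assumes you have already collected the installed versions (for example from `node --version` and `pnpm --version`) into a plain object.

```typescript
// Hypothetical pin checker; names and structure are assumptions for illustration.
type Versions = Record<string, string>;

// Pins as stated in the section: exact for Node and pnpm, major-only for Postgres and Redis.
const PINS: Versions = {
  node: "20.11.1",
  pnpm: "9.4.0",
  postgres: "16",
  redis: "7",
};

// A pin such as "16" accepts any "16.x"; a pin such as "9.4.0" must match exactly.
function matchesPin(actual: string, pin: string): boolean {
  const clean = actual.replace(/^v/, ""); // `node --version` prints a leading "v"
  return clean === pin || clean.startsWith(pin + ".");
}

// Returns one message per tool that is missing or off-pin; an empty array means all good.
function checkPins(actual: Versions, pins: Versions = PINS): string[] {
  const problems: string[] = [];
  for (const [tool, pin] of Object.entries(pins)) {
    const found = actual[tool];
    if (found === undefined) {
      problems.push(`${tool}: not found, expected ${pin}`);
    } else if (!matchesPin(found, pin)) {
      problems.push(`${tool}: found ${found}, expected ${pin}`);
    }
  }
  return problems;
}
```

Calling `checkPins({ node: "v20.11.1", pnpm: "8.15.6" })` would report the pnpm mismatch and the two missing services, which is the kind of message worth surfacing before the agent starts "fixing" a lockfile.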

2. The two commands that always work

Codex will try every command in the package.json scripts block. Most of them either don't work in isolation, or hang waiting for a TTY. The fix is to list the two or three commands the agent is allowed to use, and explicitly warn off the others.

# Running the project

You can rely on these:
- `pnpm dev`        : starts the API on :4000, hot reload
- `pnpm test:unit`  : runs in ~12s, no DB needed
- `pnpm test:int`   : runs against a Docker Postgres, ~90s

Avoid:
- `pnpm start`      : production build, expects secrets from Doppler
- `pnpm test`       : runs e2e, needs a browser, hangs in CI mode
- `pnpm db:reset`   : wipes the local DB without confirmation

That last line saved an engineer's morning twice in the first week.
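If your harness lets you intercept shell commands, the same list can be enforced rather than just stated. This is a minimal sketch under that assumption; the script names are the ones from the list above, while `checkScript` and its return shape are hypothetical.

```typescript
// Hypothetical guard; only the script names and reasons are taken from the list above.
const RELY_ON = new Set(["dev", "test:unit", "test:int"]);

const AVOID = new Map<string, string>([
  ["start", "production build, expects secrets from Doppler"],
  ["test", "runs e2e, needs a browser, hangs in CI mode"],
  ["db:reset", "wipes the local DB without confirmation"],
]);

type Verdict = { status: "rely" | "avoid" | "unlisted"; reason: string };

function checkScript(command: string): Verdict {
  // Accepts both "pnpm dev" and "pnpm run dev".
  const match = /^pnpm\s+(?:run\s+)?(\S+)/.exec(command.trim());
  if (match === null) {
    return { status: "unlisted", reason: "not a pnpm script invocation" };
  }
  const script = match[1] ?? "";
  if (RELY_ON.has(script)) {
    return { status: "rely", reason: "listed as safe to run" };
  }
  const why = AVOID.get(script);
  if (why !== undefined) {
    return { status: "avoid", reason: why };
  }
  // Scripts named elsewhere in the file (lint, typecheck, db:generate) land here.
  return { status: "unlisted", reason: "not mentioned in this section" };
}
```

The three-way verdict is deliberate: other sections of the file tell the agent to run scripts that this list does not mention, so an unlisted script is a prompt to check the file, not an automatic block.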

3. The repo map, with reasons

A normal README repo map says "src/billing handles billing." That's not enough. The agent needs to know which folder owns a concern, and which folders look related but are not. So we wrote the map as a list of rules instead of a tree.

# Where things live

- `src/billing/`           : Stripe webhooks live ONLY here.
                              If you find yourself adding a
                              second webhook handler somewhere
                              else, stop and ask.
- `src/inbox/`             : the multi-tenant inbox. Never
                              import from `src/legacy-inbox/`.
                              That folder is on a deletion
                              schedule.
- `src/jobs/`              : BullMQ workers. One file per
                              queue. New jobs need an entry in
                              `config/queues.ts`.
- `packages/db/`           : Drizzle schema. The only place
                              schema lives.
- `packages/contracts/`    : shared Zod schemas. If a type is
                              used by both API and worker, it
                              goes here.

The "never import from" rule mattered most. The codebase has two inbox implementations because a migration was paused. Without the rule, Codex would happily wire features into the dead one, because the dead one had more code comments.
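A "never import from" rule is also easy to back with a check, so the file and CI say the same thing. The helper below is a rough sketch, not the team's tooling: `findForbiddenImports` is a hypothetical name, and the regex only covers common static, dynamic, and `require` import forms.

```typescript
// Illustrative helper; a regex scan is an approximation, not a full parser.
const IMPORT_SPECIFIER =
  /(?:\bfrom\s*|\bimport\s*\(\s*|\brequire\s*\(\s*|\bimport\s+)["']([^"']+)["']/;

// Returns every import specifier in `source` that passes through a forbidden directory.
function findForbiddenImports(source: string, forbiddenDirs: string[]): string[] {
  const hits: string[] = [];
  const scanner = new RegExp(IMPORT_SPECIFIER.source, "g");
  let match: RegExpExecArray | null;
  while ((match = scanner.exec(source)) !== null) {
    const specifier = match[1] ?? "";
    const segments = specifier.split("/");
    if (forbiddenDirs.some((dir) => segments.includes(dir))) {
      hits.push(specifier);
    }
  }
  return hits;
}
```

Run against a changed file with `["legacy-inbox"]`, it returns the offending specifiers. ESLint's built-in `no-restricted-imports` rule can express the same restriction inside the linter if you would rather not maintain a script.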

4. The conventions that fail CI silently

This is the section we kept rewriting. The agent does not break the obvious rules. It breaks the ones where the lint rule lives in a config file three folders deep and the error message says "fixable with --fix," so Codex assumes it is not important.

Three rules went in:

- Import order is enforced by eslint-plugin-import. Always run pnpm lint --fix before committing.
- No relative imports across packages. Use the @app/* aliases. Codex's default is ../../../ and it will sail through TypeScript and fail at runtime.
- Tests use Vitest, not Jest. Do not add Jest types. The two have overlapping APIs and the LSP will autocomplete the wrong one.

Warning

If you write this section as "follow the existing style," you have written nothing. The agent already does that, and it is wrong roughly a third of the time because the existing style is not consistent. Name the rule, name the tool, name the failure mode.
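The cross-package import rule is a good example of "name the failure mode", because it can be stated precisely: a relative specifier must not climb above the root of the package it starts in. A minimal sketch of that test follows, assuming file paths are given relative to their package root with forward slashes; `escapesPackage` is a hypothetical helper.

```typescript
// Hypothetical check: does a relative import climb out of its own package?
// `fileInPackage` is the importing file's path relative to the package root,
// for example "src/jobs/email.ts".
function escapesPackage(fileInPackage: string, specifier: string): boolean {
  // Aliases such as @app/* and bare package names are not relative, so they pass.
  if (!specifier.startsWith(".")) return false;

  // Depth of the importing file's directory below the package root.
  let depth = fileInPackage.split("/").length - 1;

  for (const segment of specifier.split("/")) {
    if (segment === "..") {
      depth -= 1;
      if (depth < 0) return true; // climbed above the package root
    } else if (segment !== "." && segment !== "") {
      depth += 1;
    }
  }
  return false;
}
```

For a file at `src/jobs/email.ts`, the specifier `../util/retry` stays inside the package, while `../../../contracts/src/user` climbs out and is the pattern the rule is meant to catch.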

5. The domain glossary

Every B2B SaaS has three words that mean different things in different files. At this company it was "workspace," "tenant," and "policy." A workspace is a customer's logical container. A tenant is a Postgres row-level-security boundary, which usually but not always maps one-to-one to a workspace. A policy is either a billing policy (in src/billing) or an authorization policy (in packages/auth), and they share a type name.

Before the glossary, Codex would generate a function called getPolicyForTenant that combined the two concepts and called the wrong service. After the glossary, it asked which one we meant. That is the entire game.

The section is four paragraphs. We did not try to be exhaustive. We listed the words that overload, gave the canonical definition, and named the file where the canonical type lives.
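A glossary can be mirrored in types so the compiler refuses the same confusions the prose warns about. This is a sketch only: the branded ID types, the field names, and `describePolicy` are assumptions for illustration, not the team's actual definitions.

```typescript
// Illustrative only; none of these names are taken from the real codebase.
type Brand<T, Name extends string> = T & { readonly __brand: Name };

// A workspace and a tenant are both identified by strings, but they are not interchangeable.
type WorkspaceId = Brand<string, "WorkspaceId">;
type TenantId = Brand<string, "TenantId">;

const asWorkspaceId = (raw: string): WorkspaceId => raw as WorkspaceId;
const asTenantId = (raw: string): TenantId => raw as TenantId;

// Two policy concepts that would otherwise share one overloaded name.
interface BillingPolicy {
  kind: "billing";
  workspaceId: WorkspaceId;
}

interface AuthorizationPolicy {
  kind: "authorization";
  tenantId: TenantId;
}

type Policy = BillingPolicy | AuthorizationPolicy;

// The discriminant forces every caller to say which policy it means.
function describePolicy(policy: Policy): string {
  switch (policy.kind) {
    case "billing":
      return `billing policy for workspace ${policy.workspaceId}`;
    case "authorization":
      return `authorization policy for tenant ${policy.tenantId}`;
  }
}
```

With types like these, a function shaped like getPolicyForTenant has to pick a return type, and passing a `WorkspaceId` where a `TenantId` is expected fails at compile time instead of calling the wrong service.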

6. The migration ritual

Database migrations are where agents make the most expensive mistakes, so this section is the longest in the file. It is also the most rule-shaped: the one part of the doc that reads like a flight checklist.

# Migrations

We use Drizzle Kit. The process is:

1. Edit the schema in `packages/db/schema/*.ts`.
2. Run `pnpm db:generate`. This creates a new SQL file
   in `packages/db/migrations/`.
3. Read the generated SQL. If it includes `DROP COLUMN` on
   a column that is not behind a feature flag, STOP and ask.
4. Run `pnpm db:migrate` to apply locally.
5. Commit BOTH the schema change AND the generated SQL
   file in the same commit.

Hard rules:
- Never edit a migration file that has already shipped to
  staging. Write a new one.
- Never use `db:push`. It bypasses the migration history
  and we cannot replay it on production.
- Backfills go in a separate migration from the schema
  change. Schema first, deploy, then backfill.

The "never edit a shipped migration" rule was the one Codex broke most often before we wrote it down. The model's instinct is to fix the file that has the bug. The correct move is to write a new file that undoes it. That distinction is not obvious from reading the codebase, because the migration files all look the same.
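Two of these rules lend themselves to a pre-commit or CI check. The sketch below assumes you can feed it the generated SQL as a string and the changed files as status/path pairs (the shape `git diff --name-status` reports). Both function names are hypothetical, and treating every migration file that already exists on the base branch as "shipped" is a simplifying assumption.

```typescript
// Hypothetical checks for step 3 and the first hard rule above.

// Returns each line of generated SQL that drops a column, so a human can review it.
function findDropColumns(sql: string): string[] {
  return sql
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /\bDROP\s+COLUMN\b/i.test(line));
}

// A = added, M = modified, D = deleted, as reported by `git diff --name-status`.
type Change = { status: "A" | "M" | "D"; path: string };

// Flags migration files that were modified or deleted instead of superseded.
// Assumption: any migration already on the base branch counts as shipped.
function editedExistingMigrations(
  changes: Change[],
  migrationsDir: string = "packages/db/migrations/",
): string[] {
  return changes
    .filter((change) => change.status !== "A" && change.path.startsWith(migrationsDir))
    .map((change) => change.path);
}
```

Neither check knows about feature flags or what has reached staging, so a hit means "stop and ask", which is the same instruction the file gives the agent.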

7. The definition of done

The last section is a checklist the agent runs before it claims a task is finished. It exists because Codex, by default, will say "I've added the endpoint" the moment the file compiles. Compiling is not done.

# Before you say a task is done

Run these in order. If any fail, the task is not done.

1. `pnpm lint` : zero warnings, not just zero errors.
2. `pnpm typecheck` : must pass with no `// @ts-expect-error`
   added.
3. `pnpm test:unit` : all green.
4. If the task touched `packages/db/`, run `pnpm test:int`
   too.
5. Open the diff. If it includes a new dependency in
   package.json, justify it in the PR body.

That last rule cut a class of PR where the agent had pulled in a 40KB date library to format one ISO string.
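The checklist is ordered and conditional, which makes it straightforward to express as a small runner. This is a sketch under assumptions: `definitionOfDone` and `newDependencies` are hypothetical names, the command runner is injected so the logic stays testable, and the "zero warnings" and "no new `// @ts-expect-error`" conditions are left to the scripts themselves.

```typescript
// Hypothetical runner for the checklist above; the caller supplies how commands execute.
type Runner = (command: string) => boolean; // true when the command exits cleanly
type Outcome = { done: true } | { done: false; failedAt: string };

function definitionOfDone(touchedPaths: string[], run: Runner): Outcome {
  const steps = ["pnpm lint", "pnpm typecheck", "pnpm test:unit"];
  // Integration tests are only required when the database package changed.
  if (touchedPaths.some((path) => path.startsWith("packages/db/"))) {
    steps.push("pnpm test:int");
  }
  for (const step of steps) {
    // Stop at the first failure: later steps say nothing useful until it is fixed.
    if (!run(step)) return { done: false, failedAt: step };
  }
  return { done: true };
}

type Manifest = {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
};

// Step 5: list packages present after the change but not before it.
function newDependencies(before: Manifest, after: Manifest): string[] {
  const known = new Set([
    ...Object.keys(before.dependencies ?? {}),
    ...Object.keys(before.devDependencies ?? {}),
  ]);
  const current = [
    ...Object.keys(after.dependencies ?? {}),
    ...Object.keys(after.devDependencies ?? {}),
  ];
  return current.filter((name) => !known.has(name));
}
```

In practice `run` would wrap a child process call, and a non-empty result from `newDependencies` would be the cue to require a justification in the PR body.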

What we cut

The first draft of the file was 900 lines. We cut five sections, because the agent either ignored them or got worse with them in context.

We cut the architectural overview. Three paragraphs about "event-driven, eventually consistent, CQRS-influenced" made the model over-engineer a feature that just needed a database row.

We cut the team contact list. The agent has no one to message.

We cut a "common pitfalls" section that was a graveyard of past incidents. None of them were relevant to the next task and they crowded out the rules that were.

We cut a section called "design principles." It made the model write essays in PR descriptions instead of code.

And we cut a list of preferred libraries. Codex was already picking the right ones because they were in the repo. Listing them again made it second-guess.

Takeaway

Write the agents.md for the agent's failure modes, not for a new hire. Every line should be there because, without it, you watched the model do the wrong thing.

Keeping it from rotting

An agents.md that nobody updates is worse than not having one, because the model will trust it. We added one rule to the engineering handbook: any PR that adds a new rule the agent should know about adds a line to agents.md in the same commit. The file grew by 14 lines in its first month. None of them were architecture. All of them were "the model tried to do X, here is why X is wrong."

The other useful habit is to re-read the file out loud once a quarter. Anything that is no longer true gets deleted. Anything that is vague gets named. Anything you cannot justify gets cut. The file is currently 312 lines, and the team thinks it is about a hundred lines too long. They are probably right.
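Part of that quarterly review can be automated: paths the file mentions but that no longer exist are a cheap signal of rot. The sketch below is an assumption about how such a check might look, not something the team ran. `stalePaths` is a hypothetical helper, the existence test is injected, and glob patterns are skipped.

```typescript
// Hypothetical staleness check: find backticked paths in agents.md that no longer exist.
function stalePaths(agentsMd: string, exists: (path: string) => boolean): string[] {
  // A backticked token with no spaces and at least one "/" is treated as a path.
  const pathToken = /`([^`\s]+\/[^`\s]*)`/g;
  const stale: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pathToken.exec(agentsMd)) !== null) {
    const candidate = match[1] ?? "";
    if (candidate.includes("*")) continue; // globs need a different check
    if (!exists(candidate) && !stale.includes(candidate)) {
      stale.push(candidate);
    }
  }
  return stale;
}
```

Wired to a real filesystem lookup, a non-empty result is a list of lines to delete or rename, which fits the rule that anything no longer true gets removed.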

The five-minute version

If you want to start today: open a new file called agents.md at the root of your repo. Write only sections one, two, and seven (preconditions, two commands that work, definition of done). Ship that. Give it a week. Then add a glossary section the next time the agent confuses two of your domain words. That is the entire process. The seven-section version is what you get after a month of writing down what went wrong.

When we built the agents.md for the Groningen team, the thing we kept running into was that Codex would call the pgvector helper from the API layer instead of the worker, where it belongs. We did not fix it in the code. We fixed it with two lines in section three of the file. If you are hitting the same wall with your coding agents, that is the kind of work we do as AI agent consultants: short engagements, the file lives in your repo, your team owns it after.

FAQ

What is an agents.md file?

A plain-text file at the root of a repo that AI coding agents read automatically. It tells the agent how the codebase runs, what to avoid, and when a task is done.

How is it different from a README?

A README is for a human who will skim once and ask the team. An agents.md is for a model that reloads it every turn, can't ask anyone, and needs explicit rules instead of vibes.

Which AI coding tools read agents.md?

Codex, Aider, Sourcegraph Amp, and several in-house harnesses load it automatically. Other tools can be pointed at it manually as part of their system prompt.

How long should an agents.md be?

Long enough to cover the failure modes you actually see, short enough that the team will keep it true. Most of ours land between 200 and 400 lines.

What's the single highest-leverage section?

The definition of done. It stops the agent from claiming a task is finished the moment the file compiles, which is the single biggest source of bad PRs.
