
AI coding agents: a 2am rewrite of Drupal payment hooks


Jacob Molkenboer · Founder · A Brand New Company · 5 Jun 2026 · 9 min


The Slack ping came at 02:14 Bangkok time. An AI coding agent had silently flipped a webhook signature check from required to optional in our Dutch client's Drupal Commerce repo. Three transactions had already cleared without verification. The next one in the queue was a €4,800 invoice from a buyer we had never seen.

The change had not been deployed by a human. There was no PR, no Jira ticket, no commit on the develop branch. The change was sitting in a feature branch called agent/payment-cleanup, pushed 38 minutes earlier by a service account none of us recognised. The author was an AI coding agent we had given read access to the repo six weeks earlier as part of a small documentation experiment. Somebody had quietly handed it write access, and nobody had told the on-call rotation.

This is the story of that incident, what we found when we pulled the thread, and the seven sandbox rules we wrote the next morning before any agent touched another repo.

The client and the setup

The site is a Drupal 10 Commerce build for a Dutch B2B distributor (we'll call them Distributor A). Around €18M a year flows through the checkout, mostly invoices on net-30, some prepay via Mollie iDEAL. We had inherited the build from a previous agency in 2024 and spent the first quarter of 2026 modernising the payment pipeline, deprecating an old hook_commerce_payment_method_info implementation in favour of the modern PaymentGateway plugin pattern documented on drupal.org.

Six weeks before the incident, we had wired an AI agent into the repo to help draft technical documentation for the payment module. The brief was narrow: read the code, write Markdown files into a docs/ folder, open a PR for review. Read access to the whole repo, write access only on the docs/ branch.

What changed in those six weeks: a junior engineer on the client's side, not ours, had grown frustrated waiting for our PR reviews. He had asked the agent to "tidy up the payment hooks while you're in there". To do that, the agent needed broader write access. He gave it a personal access token scoped to the whole repo. He did not tell anyone.

The agent did not go rogue. It did exactly what it was asked. The problem was that nobody had drawn a line around what it was allowed to be asked.

The 2am forensic trail

Our on-call engineer pulled the diff at 02:23. The branch contained 47 changed files. Most of the changes were cosmetic: comment cleanup, PSR-12 alignment, a few legitimate Drupal coding standards fixes. Three changes were not cosmetic.

// Before, in MollieWebhookSubscriber.php
if (!$this->signatureValidator->isValid($request)) {
  throw new AccessDeniedHttpException('Invalid signature');
}

// After
if (!$this->signatureValidator->isValid($request)) {
  $this->logger->warning('Signature mismatch, allowing in dev mode');
  // TODO: re-enable in production
}

The agent had inferred from a stale comment elsewhere in the codebase that the project was "running in dev mode for staging". The comment was three years old and referred to a different module that had been deleted in 2024. The agent did not know that. It did what the comment implied.

Two other changes were similar in spirit. A retry loop on failed Stripe charges had been "simplified" by removing the idempotency key check (the agent argued the key was "always regenerated server-side anyway", which is false). A cron-driven invoice reconciliation job had been refactored to skip rows where status IS NULL, which the agent believed were "orphaned test data". They were live invoices waiting on a manual review by the finance team.
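To make the idempotency point concrete, here is a minimal sketch of why the key has to be generated once and reused on every retry. `FakeGateway` and `charge_with_retry` are hypothetical names for illustration; this is an in-memory stand-in, not the Stripe client and not the client's code.

```python
import uuid


class FakeGateway:
    """In-memory stand-in for a payment API that honours idempotency keys."""

    def __init__(self, failures_before_success: int = 0):
        self.failures_left = failures_before_success
        self.charges = {}  # idempotency key -> charge id

    def charge(self, amount_cents: int, idempotency_key: str) -> str:
        # A key that was already seen returns the original charge, not a new one.
        if idempotency_key in self.charges:
            return self.charges[idempotency_key]
        charge_id = f"ch_{len(self.charges) + 1}"
        self.charges[idempotency_key] = charge_id
        if self.failures_left > 0:
            # Simulate a lost response: the charge exists, the caller never hears.
            self.failures_left -= 1
            raise TimeoutError("response lost")
        return charge_id


def charge_with_retry(gateway: FakeGateway, amount_cents: int, attempts: int = 3) -> str:
    key = str(uuid.uuid4())  # generated once, reused for every attempt
    for _ in range(attempts):
        try:
            return gateway.charge(amount_cents, key)
        except TimeoutError:
            continue
    raise RuntimeError("charge failed after retries")
```

If the key were regenerated inside the loop, the retry after a lost response would create a second charge for the same invoice. That is the behaviour the removed check was guarding against.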

None of this was malicious. All of it was plausible from a junior reading the code at speed. The agent was operating exactly like a junior engineer with too much confidence and no senior in the room. The difference is that a junior commits five files an hour and a tired one commits ten. The agent committed 47 in 38 minutes, and three of them touched money.

Warning

The dangerous AI commits are never the ones that fail tests. They are the ones that pass tests because the tests were written by humans who never imagined the assumption would change.

How we caught it

We did not catch it through any clever tooling. We caught it because the staging environment had a webhook smoke test that fired a known-bad signature every fifteen minutes and asserted a 403. At 02:14 it returned a 200. PagerDuty woke up the on-call.
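A canary like that can be very small. The sketch below is an assumption about its shape, not the original test: the header name `X-Signature`, the payload, and the function names are all hypothetical, and the scheduler that runs it every fifteen minutes is left out.

```python
import urllib.error
import urllib.request


def post_bad_signature(url: str) -> int:
    """POST a payload with a deliberately invalid signature; return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=b'{"id": "canary"}',
        headers={
            "Content-Type": "application/json",
            "X-Signature": "deliberately-invalid",  # hypothetical header name
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urllib raises on 4xx/5xx, so read the code here


def canary_passes(status: int) -> bool:
    """The canary is healthy only when the bad signature is rejected with 403."""
    return status == 403


def run_canary(url: str, send=post_bad_signature) -> None:
    status = send(url)
    if not canary_passes(status):
        # A non-zero exit is what the alerting hook would key on.
        raise SystemExit(f"CANARY FAILED: bad signature returned {status}, expected 403")
```

The `send` parameter exists only so the check can be exercised without a network. In a scheduled job you would call `run_canary` with the staging webhook URL and let the non-zero exit page someone.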

That smoke test had been written eight months earlier by an engineer who is no longer on the project. He did it because he did not trust himself at 4am on a Friday deploy. It saved the account.

If you take one thing from this post, make it this: write the canary that distrusts your future self. Then run it every fifteen minutes for the rest of the project's life. The hand-wringing on Hacker News this week about recursive self-improvement in coding agents reads very differently when you have watched one delete a signature check it did not understand.

The seven sandbox rules

By 09:30 the next morning we had reverted the branch, audited the previous 30 days of agent commits across all client repos, and written a sandbox policy. We have applied it to every repo where an agent has commit rights, including our own.

1. The agent gets its own git identity, and that identity gets its own branch namespace

Every agent commits as agent-<name>@abn.company and is only allowed to push to refs matching agent/<name>/*. Direct pushes to main, develop, or any release branch are rejected by a server-side pre-receive hook. Nothing the agent does merges itself.

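#!/usr/bin/env bash
# pre-receive hook, abridged
# Author emails are set by the client, so pair this with a check on the pushing account.
while read -r oldrev newrev refname; do
  # Only protected branches are checked here.
  [[ "$refname" =~ ^refs/heads/(main|develop|release/.*)$ ]] || continue
  # An all-zero newrev is a branch deletion: nothing to inspect.
  [[ "$newrev" =~ ^0+$ ]] && continue
  # For a new branch, look at every commit not already reachable from another ref.
  if [[ "$oldrev" =~ ^0+$ ]]; then range=("$newrev" --not --all); else range=("$oldrev..$newrev"); fi
  # Reject if any pushed commit, not just the tip, carries an agent identity.
  if git log --format='%ae' "${range[@]}" | grep -Eq '^agent-[^@]+@abn\.company$'; then
    echo "Agents cannot push to $refname" >&2
    exit 1
  fi
done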
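The hook covers the protected branches. The other half of the rule, that an agent may push only inside its own namespace, can be sketched as a small policy function. This is an illustration built on the conventions stated above (`agent-<name>@abn.company` and `agent/<name>/*`); the function name and the allowed characters in an agent name are assumptions.

```python
import re

# Assumed convention from the rule above: agent-<name>@abn.company.
AGENT_EMAIL = re.compile(r"^agent-(?P<name>[a-z0-9-]+)@abn\.company$")


def push_allowed(author_email: str, refname: str) -> bool:
    """Return True if this author may push to this ref under the namespace rule."""
    match = AGENT_EMAIL.match(author_email)
    if match is None:
        return True  # not an agent identity; human rules are enforced elsewhere
    own_namespace = f"refs/heads/agent/{match.group('name')}/"
    # The ref must sit inside the agent's namespace and name a branch below it.
    return refname.startswith(own_namespace) and len(refname) > len(own_namespace)
```

Under this policy a branch such as agent/payment-cleanup, which has no agent name segment, is rejected for every agent identity.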

2. Write scope is declared per task, in writing, with a path allowlist

Before an agent runs, we write a one-line task contract: "Agent X may modify files matching docs/**/*.md and tests/unit/**/*.php. Anything else is out of scope." The contract is committed to the repo as .agents/<task>.toml. The runner refuses to start if no contract exists for the task ID it was given.

3. A path-aware diff guard runs on every agent push

We added a small CI step that compares the actual diff against the contract's allowlist and fails the check if the agent touched anything outside scope. The check runs in under two seconds and would have stopped the 2am incident before the first webhook fired.

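# .github/workflows/agent-scope-check.yml
# GITHUB_HEAD_REF is only set on pull_request events, and the three-dot diff
# needs full history (for example actions/checkout with fetch-depth: 0).
- name: Verify agent scope
  run: |
    git diff --name-only origin/main...HEAD > changed.txt
    python tools/check_agent_scope.py \
      --contract ".agents/${GITHUB_HEAD_REF}.toml" \
      --changed changed.txt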
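The core of such a scope check is a path match against the allowlist. The following is a minimal sketch of that idea, not the check_agent_scope.py we run: `glob_to_regex` and `out_of_scope` are hypothetical names, and the glob semantics (`**/` spans directories, `*` stays within one path segment) are an assumption about how the contract patterns should be read.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob: '**/' spans directories, '*' stays within one segment."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def out_of_scope(changed: list[str], allow: list[str]) -> list[str]:
    """Return every changed path that no allowlist pattern fully matches."""
    rules = [glob_to_regex(pattern) for pattern in allow]
    return [path for path in changed if not any(rule.fullmatch(path) for rule in rules)]
```

A CI step would feed it the lines of changed.txt and the contract's allowlist, and fail if the returned list is non-empty. The standard library's `fnmatch` is deliberately not used here, because its `*` also matches `/` and would not treat `docs/**/*.md` as covering files directly under docs/.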

4. Money paths carry a CODEOWNERS lock that no agent identity can satisfy

Drupal Commerce payment plugins, Stripe and Mollie webhook handlers, the invoice reconciliation cron, and anything under web/modules/custom/*_payment/ are listed in CODEOWNERS as requiring sign-off from two humans on the ABN side. The agent identity is explicitly not eligible to approve. PRs that touch these paths cannot merge even if every test passes. GitHub's CODEOWNERS docs cover the syntax. The trick is remembering to write it down before the agent is hired, not after the incident.
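The rule itself is simple enough to model in a few lines. This sketch is only a model of the policy, not how GitHub evaluates CODEOWNERS or branch protection; the `agent-` login prefix, the path pattern, and the function name are assumptions, and it simplifies "two humans on the ABN side" to "two non-agent approvers".

```python
import re

AGENT_LOGIN = re.compile(r"^agent-")  # assumed naming convention for agent accounts
MONEY_PATH = re.compile(r"^web/modules/custom/[^/]*_payment/")
REQUIRED_HUMAN_APPROVALS = 2


def money_lock_satisfied(changed: list[str], approvers: list[str]) -> bool:
    """True if the change avoids money paths or has two distinct human approvals."""
    if not any(MONEY_PATH.match(path) for path in changed):
        return True  # the lock does not apply; other review rules still do
    humans = {login for login in approvers if not AGENT_LOGIN.match(login)}
    return len(humans) >= REQUIRED_HUMAN_APPROVALS
```

Note that passing tests appear nowhere in the function. That is the point of the lock.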

5. The agent's read context never includes secrets, and its tooling cannot reach production

The agent runs in a container with no network route to production hostnames, no .env mount, and a stub secrets manager that returns deterministic fake values. This is the rule we should have written first. It is the cheapest rule to enforce and the most consequential. An agent with production credentials is not a coding assistant. It is an unsupervised employee with the keys to the safe. The OWASP project on CI/CD security risks covers the same ground from a pipeline angle, and the parallels with agent tokens are exact.
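A stub secrets manager of that kind can be a few lines. This is a minimal sketch under the assumption that the agent's tooling asks for secrets by name; the class name, the `get` method, and the `fake_` prefix are hypothetical.

```python
import hashlib


class StubSecrets:
    """Returns the same fake value for the same secret name, and never a real one."""

    def get(self, name: str) -> str:
        # Hashing the name keeps values deterministic across runs, so snapshots
        # and generated docs stay stable, without storing anything real.
        digest = hashlib.sha256(f"stub:{name}".encode()).hexdigest()
        return f"fake_{digest[:24]}"
```

The visible prefix makes a leaked stub value easy to spot in logs, and easy to reject if it ever reaches a real API.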

6. Every agent commit is reviewed by a second agent before a human sees it

We were skeptical of this one at first. We tried it for a week and changed our minds. The reviewer agent is given the diff, the task contract, and a system prompt that says, in effect, "argue against this change". It catches roughly one in five non-obvious problems before they reach a human. The point is that the first reader of agent code should not be a tired human.

7. The token that gives an agent write access expires in four hours and is bound to a single task ID

The token the junior engineer handed out was a personal access token with no expiry. Today, agent tokens are issued by an internal service, scoped to one repo and one task contract, and they expire four hours after issue. If the agent needs more time, a human re-issues. This is annoying, on purpose. The annoyance is the point.
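The check an issuing service has to make is small. Here is a sketch of it, assuming a token record that carries the repo, the task ID, and the issue time; `AgentToken` and `permits` are hypothetical names, not our internal service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

TOKEN_TTL = timedelta(hours=4)


@dataclass(frozen=True)
class AgentToken:
    repo: str
    task_id: str
    issued_at: datetime

    def permits(self, repo: str, task_id: str, now: datetime) -> bool:
        """Valid only for the bound repo and task, and only inside the TTL."""
        expired = now >= self.issued_at + TOKEN_TTL
        return not expired and repo == self.repo and task_id == self.task_id


# Usage: a token issued at 09:00 UTC is still good at 12:59 and dead at 13:00.
issued = datetime(2026, 6, 5, 9, 0, tzinfo=timezone.utc)
token = AgentToken(repo="distributor-a/commerce", task_id="docs-payment", issued_at=issued)
```

The repo and task names in the usage lines are made up. Binding the token to a task ID as well as a repo is what stops a docs token from being reused for "tidying up" payment hooks.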

Takeaway

An AI coding agent is not a junior engineer with infinite patience. It is a junior engineer with infinite confidence and no fear of consequences. Build the sandbox around that, not around the vendor's marketing copy.

What we did not do

We did not turn the agent off. We considered it for about an hour. Then we looked at the 118 pages of accurate, useful module documentation it had written in the previous six weeks, work no human on the team had time for, and decided the answer was a shorter leash, not a missing dog.

We also did not blame the junior engineer who handed out the token. He had been told by a vendor demo that the agent was "safe to use". The mistake was ours for not writing down what "safe to use" meant in this codebase. The fix lives in policy, not in a Slack scolding.

A five-minute audit you can run today

If any AI coding tool has write access to a repo that touches money, run these four commands before lunch.

# 1. Who has push rights? (Token age is not shown here; see step 4.)
gh api --paginate 'repos/{owner}/{repo}/collaborators' --jq '.[] | {login, permissions}'

# 2. Which identities have committed on any branch in the last 30 days?
git log --all --since="30 days ago" --format='%ae' | sort -u

# 3. Are CODEOWNERS protecting your payment paths?
grep -Ehi "(payment|webhook|stripe|mollie|invoice)" \
  CODEOWNERS .github/CODEOWNERS docs/CODEOWNERS 2>/dev/null \
  || echo "NOT PROTECTED"

# 4. Does any agent hold an unexpiring token? Rotate it now.
# gh auth status only reports the token your own gh session uses, so run it
# as each agent account, or check token expiry in that account's settings.
gh auth status

If any of those four come back with a surprise, you have homework.

When we built the sandbox for Distributor A's Drupal estate, the thing we kept running into was that the dangerous changes never failed tests. We ended up solving it with a path-aware diff guard plus a CODEOWNERS lock on money paths, and the same pattern is now the default in every repo where we run an AI agent with commit rights. If you want the raw check_agent_scope.py we use in production, mail us and we will send it over.

The smallest thing you can do today: add one line to CODEOWNERS that protects your payment handler, require two reviewers on it, and rotate any agent token older than a week. Five minutes. Do it before the next webhook fires.

FAQ

Should we stop letting AI coding agents touch production codebases?

No. Stop letting them touch production codebases without a written task contract, a path allowlist, and a CODEOWNERS lock on money-handling files. The tool is fine. The defaults are not.

What is the cheapest sandbox rule to enforce first?

Strip the agent's container of all production credentials and network routes. It costs nothing and removes the worst-case outcome. Everything else is layered on top of that.

How do you stop agents from editing payment code by accident?

List every payment file in CODEOWNERS as requiring two human reviewers, and add a pre-receive hook that rejects pushes from any agent identity to protected branches. Tests alone will not catch it.

Did your test suite catch the incident?

No. The unit tests passed. A separate webhook smoke test that fired a known-bad signature every fifteen minutes caught it. Write the canary that distrusts your future self.

Is a reviewer agent really better than just sending diffs to a human?

The reviewer agent does not replace the human. It catches a share of the problems first so the human reviews a smaller, cleaner diff. We measured roughly one in five non-obvious bugs caught before human review.
