
AI agents

Anthropic's verified tier: a 14-agent migration playbook

Sixteen working days, fourteen production agents, three of them computer-use. The runbook we used to clear the 8 July deadline without downtime.

Jacob Molkenboer · Founder · A Brand New Company · 22 Jun 2026 · 9 min

Last Wednesday we cleared the meeting-room whiteboard and wrote one number at the top: 8 July 2026. Underneath, fourteen rows, one for each agent running in production across our client book. Three of those rows got a red dot: the computer-use scheduler for a Rotterdam logistics firm, the code-execution sandbox we run for an Antwerp fintech, and the file-access librarian sitting on top of a Drupal-to-Strapi migration corpus. The other eleven were inbox-triage and RAG agents. Workhorse stuff, nothing exotic. Sixteen working days to move the three red dots behind Anthropic's verified-organisation tier without taking the eleven boring ones offline in the process.

This is the playbook we used. It is the second time we have run a tier migration like this, and the first one taught us most of what is below. If you have a client book of agents and you have not started, you have a long weekend of paperwork ahead and a short window to get the technical side right.

The split that actually matters

Anthropic's updated usage policy draws the line in a place that looks technical but is really about blast radius. Agents that can move a mouse, run arbitrary code, or read a user's local files now sit behind an identity-verified organisation. Everything else, the inbox responders, the knowledge-base lookups, the single-tool webhook agents, keeps running on a standard API key. The Hacker News thread framed it as a compliance burden. From inside an agent shop, it reads more like Anthropic finally putting the high-capability surface on the same footing as a Stripe Treasury account: you can keep using it, but the org that uses it has to be a real, named legal entity that signed something.

The split matters because it lets you stop pretending eleven inbox agents have the same risk profile as one computer-use scheduler. They never did. Treating them the same is what made past audits painful.

Mapping capability to risk, not labels

The first hour was an inventory. We took every agent in the book and tagged each one by what it could actually do at the API level, not what marketing called it. The eleven low-risk ones were easy: each consumes a fixed scope (a Gmail label, a Pinecone index, an HTTP webhook) and emits text. The three flagged for the verified tier were harder, because two of them grew into their capabilities slowly.

The Rotterdam scheduler started as a Calendly mirror in 2024 and quietly acquired browser-automation when we added container-tracking last autumn. The Antwerp sandbox was always code-execution, but it had picked up a "read the customer's CSV from S3" tool that, on close reading, qualified as file-access. Labels lie. The tool list does not.

# -c prints one compact JSON object per agent, so each rg match keeps the agent id
jq -c '.agents[] | {id, tools: [.tools[].name]}' inventory.json \
  | rg -e 'computer_use|code_execution|filesystem|browser' \
  | tee high-capability.txt

That check produced the same three agents the human review had. We kept the script in the runbook anyway. When the next capability boundary moves, we want the audit to take five minutes, not five hours.

The identity verification path

Verifying our Dutch BV took two days end-to-end: the KvK extract, the UBO declaration, and a video call covering the per-developer ID check. The Thai branch took longer: six working days. Allow a week if your second entity is non-EU, and start it first, not last.

Once verified, the org gets a separate API-key namespace. Old keys keep working on the standard tier. New keys, scoped to verified-only models and capabilities, live in their own ring. Do not try to reuse keys across the boundary. The audit log fields differ, and your SIEM will hate you.

Warning

The verified tier emits new audit fields (actor_identity_id, capability_grant_id) that your existing log shipper almost certainly drops. Update the schema before you cut over, not after. We lost a day to a Datadog parser silently truncating the new fields.
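One way to guard against a shipper dropping fields is to normalise events so that anything the schema does not recognise is carried along instead of discarded. The sketch below is illustrative only: actor_identity_id and capability_grant_id come from the text above, while the timestamp and agent_id fields, the AuditEvent shape, and the normalise function are assumptions of ours, not Anthropic's log format.

```typescript
// Hypothetical event shape. Only actor_identity_id and capability_grant_id
// are named in the article; every other field here is an assumption.
interface AuditEvent {
  timestamp: string;
  agent_id: string;
  actor_identity_id?: string; // new on the verified tier
  capability_grant_id?: string; // new on the verified tier
  extra: Record<string, unknown>; // fields the schema does not know yet
}

const KNOWN_FIELDS = ["timestamp", "agent_id", "actor_identity_id", "capability_grant_id"];

// Maps a raw log record to AuditEvent, keeping unknown fields under `extra`
// so a future schema change is visible instead of silently truncated.
function normalise(raw: Record<string, unknown>): AuditEvent {
  const extra: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(raw)) {
    if (!KNOWN_FIELDS.includes(key)) extra[key] = value;
  }
  const str = (key: string): string | undefined =>
    typeof raw[key] === "string" ? (raw[key] as string) : undefined;
  return {
    timestamp: str("timestamp") ?? "",
    agent_id: str("agent_id") ?? "",
    actor_identity_id: str("actor_identity_id"),
    capability_grant_id: str("capability_grant_id"),
    extra,
  };
}
```

Alerting whenever extra is non-empty would turn the next silent truncation into a visible warning.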

DPA addenda without the eight-way signature loop

Every client we work with signed a Data Processing Agreement against the agent capabilities that existed at signing time. Moving three of those agents into a new sub-processor relationship, Anthropic-as-verified-processor rather than Anthropic-as-standard-processor, is a material change under GDPR Article 28. You owe each affected controller an updated addendum.

Eight clients, sixteen days, roughly the same number of in-house counsel who only review contracts on Tuesdays. We did not have time for eight bespoke negotiations.

The shape that worked was an omnibus capability schedule: one addendum per client that references a versioned schedule (v2026-07) maintained by us, listing for each agent the capability tier, the sub-processor, and the data classes that touch it. Future capability shifts within the same tier bump the schedule version and trigger a notice email, not a re-signature. Capability shifts across tiers still require a fresh signature, which is the right default. That is exactly when the controller should look again.
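The notice-versus-signature rule is simple enough to encode. The sketch below is one possible reading, not our actual tooling: the Schedule and ScheduleEntry shapes and the requiredAction function are hypothetical, and treating an agent that is new to the schedule as a re-signature case is a conservative assumption the text above does not settle.

```typescript
type Tier = "standard" | "verified";

// One row of the capability schedule: tier, sub-processor, data classes.
interface ScheduleEntry {
  agentId: string;
  tier: Tier;
  subProcessor: string;
  dataClasses: string[];
}

interface Schedule {
  version: string; // e.g. "v2026-07"
  entries: ScheduleEntry[];
}

type Action = "none" | "notice" | "re-signature";

// True when sub-processor and data classes match, ignoring data-class order.
function sameEntry(a: ScheduleEntry, b: ScheduleEntry): boolean {
  const classes = (e: ScheduleEntry) => e.dataClasses.slice().sort().join("|");
  return a.subProcessor === b.subProcessor && classes(a) === classes(b);
}

// Decides what a schedule bump owes the controller.
function requiredAction(prev: Schedule, next: Schedule): Action {
  const before = new Map<string, ScheduleEntry>();
  for (const e of prev.entries) before.set(e.agentId, e);

  let action: Action = "none";
  for (const entry of next.entries) {
    const old = before.get(entry.agentId);
    // Assumption: a new agent, like a tier change, needs a fresh signature.
    if (!old || old.tier !== entry.tier) return "re-signature";
    if (!sameEntry(old, entry)) action = "notice";
  }
  return action;
}
```

Running this in CI against the previous schedule version makes the decision mechanical, so nobody has to remember which kind of change triggers a signature.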

Two clients pushed back on the schedule pattern and asked for a bespoke addendum each round. For those we ran the long version. The other six signed within nine working days.

The cutover without downtime

The eleven low-risk agents did not need to move. Half the engineering effort was resisting the urge to "tidy up" by migrating them anyway. They stay on standard keys. Their DPAs are unchanged. Leave them alone.

For the three high-capability agents, the cutover ran per-agent, not per-client, on a four-step loop:

  1. Provision a verified-tier key, scoped to one agent's tool list.
  2. Dual-write for 48 hours: both old and new keys handle live traffic. New-key responses are logged and diffed against old-key responses, but only the old responses go back to the client.
  3. Cut the response source to the new key. Keep the old key active and idle for seven days.
  4. Revoke the old key.

The 48-hour shadow window caught one real bug. The Antwerp sandbox's execute_python tool returned a slightly different error envelope on the verified tier (an extra capability_grant_id field that broke a downstream Pydantic validator). We caught it in hour six of the shadow, fixed it in 40 minutes, and the actual cutover was uneventful. Without dual-write, we would have shipped that regression to production.
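The shadow step reduces to a small wrapper: call both paths, diff the envelopes, and return only the old-key response. This is a minimal sketch rather than our production code. The Envelope and Call types and the shadowCall and diffKeys names are hypothetical, the diff compares top-level field names only, and it assumes the call is safe to execute twice.

```typescript
type Envelope = Record<string, unknown>;
type Call = (request: string) => Promise<Envelope>;

interface ShadowDiff {
  missing: string[]; // fields only the old-key response has
  added: string[]; // fields only the new-key response has
}

// Compares top-level field names of the two envelopes.
function diffKeys(primary: Envelope, shadow: Envelope): ShadowDiff {
  const p = Object.keys(primary);
  const s = Object.keys(shadow);
  return {
    missing: p.filter((k) => !s.includes(k)),
    added: s.filter((k) => !p.includes(k)),
  };
}

// Runs the request on both keys, reports differences, and returns
// only the old-key response to the caller.
async function shadowCall(
  request: string,
  oldKeyCall: Call,
  newKeyCall: Call,
  report: (diff: ShadowDiff) => void,
): Promise<Envelope> {
  // A failure on the shadow path must never affect the live response.
  const shadowPromise = newKeyCall(request).catch(() => null);
  const primary = await oldKeyCall(request);
  const shadow = await shadowPromise;
  if (shadow) {
    const diff = diffKeys(primary, shadow);
    if (diff.missing.length > 0 || diff.added.length > 0) report(diff);
  }
  return primary;
}
```

A diff of this kind would surface an extra field such as capability_grant_id as an entry in added, which is the class of regression the shadow window exists to catch.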

Cloudflare temporary accounts as the credential side door

The trick we stole this round came from a Cloudflare write-up on temporary accounts for AI agents, published the same week as the Anthropic announcement. Similar instinct, different layer. For the three high-capability agents, the verified-tier API key is now extremely valuable: it carries the org's identity, and if it leaks, rotation involves another KYC pass.

So the verified key never touches the client's network. It lives in a Worker. The Worker mints a per-session disposable credential, scoped to a single agent run, and the client-side process talks only to that. If a session credential leaks, we revoke it in thirty seconds and the verified key is untouched. The pattern adds about twelve milliseconds of latency and removes an entire class of incident from the runbook.

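// Env bindings; VERIFIED_KEY is stored as a Worker secret.
interface Env {
  VERIFIED_KEY: string;
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const agent = req.headers.get("x-agent-id");
    if (!agent) {
      return new Response("missing x-agent-id header", { status: 400 });
    }
    // mintSessionToken is our own helper, defined elsewhere in the Worker.
    const session = await mintSessionToken({
      agent,
      ttlSeconds: 900, // 15-minute session
      scope: ["computer_use:click", "computer_use:read"],
    });
    return fetch("https://api.anthropic.com/v1/messages", {
      method: "POST",
      headers: {
        "x-api-key": env.VERIFIED_KEY, // never reaches the client
        "anthropic-version": "2023-06-01", // required by the Messages API
        "x-session-token": session.token, // our own header, not Anthropic's
        "content-type": "application/json",
      },
      body: req.body,
    });
  },
};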

That is the entire bridge. The x-session-token is ours, not Anthropic's. The Worker verifies it on the way in and rejects requests that name an agent the token is not scoped for. Every high-capability call now has two independent failure domains, and the credential we actually care about lives in Cloudflare Secrets, not on a client's bastion host.
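The scope and expiry checks behind a session credential can be sketched without any cryptography. Everything in the block below is hypothetical: the SessionClaims shape and the mintClaims and checkSession names are ours for illustration, and signing and signature verification are deliberately left out. A real token would need a signature so a client cannot edit its own claims.

```typescript
// Hypothetical claims carried by a per-session credential.
interface SessionClaims {
  agent: string;
  scope: string[];
  expiresAt: number; // epoch milliseconds
}

type Verdict = "ok" | "expired" | "wrong-agent" | "out-of-scope";

// Builds the claims for one agent run; `now` is injected to keep this testable.
function mintClaims(agent: string, scope: string[], ttlSeconds: number, now: number): SessionClaims {
  return { agent, scope: scope.slice(), expiresAt: now + ttlSeconds * 1000 };
}

// Checks expiry first, then the agent binding, then the requested capability.
function checkSession(claims: SessionClaims, agent: string, capability: string, now: number): Verdict {
  if (now >= claims.expiresAt) return "expired";
  if (claims.agent !== agent) return "wrong-agent";
  if (!claims.scope.includes(capability)) return "out-of-scope";
  return "ok";
}

// Usage: a 900-second session limited to two computer-use capabilities.
const claims = mintClaims("scheduler", ["computer_use:click", "computer_use:read"], 900, 0);
const verdict = checkSession(claims, "scheduler", "computer_use:click", 60_000); // "ok"
```

Passing now in as a parameter instead of calling Date.now() inside keeps both functions deterministic, which makes the expiry boundary easy to test.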

What we got wrong the first time

Three things, written down so the next migration is cheaper.

We treated DPA renegotiation as a legal exercise. The clients' security teams cared more about audit-log access than about the addendum wording. The second client we approached asked, before reading the schedule, whether they could pull the new capability-grant log into their own Splunk. We did not have an answer ready. After that we led with the audit posture and the addendum was the afterthought.

We underestimated non-EU KYC. The Thai entity took six working days, including one round of "please re-upload the UBO declaration with a clearer scan." If you have a second jurisdiction in your org, start that verification on day one.

We forgot about staging keys. Two of the eleven low-risk agents had staging environments that still used the old org's keys for convenience. They did not need to move, but our deprovision script tried to revoke them anyway. Tag your keys by environment before you start revoking anything, and write the revoke list by hand the first time you run it.
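A revoke list that respects environment tags might look like the sketch below. The ApiKey shape, the environment values, and the buildRevokeList function are all hypothetical. The rule it encodes is the lesson above: only production keys of migrated agents are revoked automatically, and anything untagged goes to a human.

```typescript
// Hypothetical key record; tag every key with its environment first.
interface ApiKey {
  id: string;
  agentId: string;
  environment: "production" | "staging" | "untagged";
}

interface RevokePlan {
  revoke: string[]; // safe to revoke automatically
  review: string[]; // needs a human decision
}

// Plans revocation for migrated agents only. Staging keys are never
// touched, and untagged keys are routed to manual review.
function buildRevokeList(keys: ApiKey[], migratedAgents: string[]): RevokePlan {
  const plan: RevokePlan = { revoke: [], review: [] };
  for (const key of keys) {
    if (!migratedAgents.includes(key.agentId)) continue; // agent did not move
    if (key.environment === "production") plan.revoke.push(key.id);
    else if (key.environment === "untagged") plan.review.push(key.id);
  }
  return plan;
}
```

Printing the plan and confirming it by hand before anything is revoked keeps the first run as cautious as a handwritten list.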

The smallest version you can run this week

If you have one agent and one client, the playbook collapses to: verify the org, scope a new key, dual-write for 48 hours, send a one-page addendum that references a versioned capability schedule, cut over, keep the old key warm for a week. The shape does not change with scale. Only the spreadsheet does.

On the high-capability AI agents we built for the Antwerp fintech, the thing that almost bit us in this migration was not the verification or the cutover. It was a Pydantic validator three hops downstream that crashed on a new audit field. We caught it because we shadowed real traffic for two days before cutting the response source. Shadow your traffic. Everything else is paperwork.

If you have not started, do this today: jq your agent inventory, grep for the four capability tool names above, and put a red dot next to anything that matches. The rest of the playbook reads off that list.

Key takeaway

Split your agents by what their tool list can actually do, dual-write for 48 hours, and let one versioned capability schedule carry every client DPA.

FAQ

What actually changes on 8 July 2026?

Computer-use, code-execution, and file-access tools move behind an identity-verified Anthropic organisation. Inbox, RAG, and webhook agents stay on standard keys. Mixed orgs need to split key namespaces.

Do we have to renegotiate every client DPA?

Only for clients whose agents use the high-capability tools. Use a versioned capability schedule so future shifts within the same tier do not trigger another signature round.

How long does identity verification take in practice?

Two days for a Dutch BV with KvK and UBO papers ready. Six working days for a Thai branch. Start the slowest jurisdiction first, not last.

Can we keep the same API keys across the tier boundary?

No. The verified tier emits new audit fields and lives in a separate key namespace. Provision fresh keys, dual-write for 48 hours, and keep the old key warm for seven days after cutover.

Tags: ai agents, automation, operations, architecture, security, strategy
