AI agents
AWS Bedrock to direct Anthropic: a 4-day agent rewire
A 23-person Antwerp dispatch team woke up on Monday with an AWS Bedrock bill that had quadrupled over the weekend. Four days later the agent ran on a direct Anthropic key.

Monday morning, 06:42 Antwerp time. Wim, the head of platform at a 23-person logistics integrator off the Plantin en Moretuslei, opens his AWS cost-explorer email before the espresso lands. The dispatch agent that routed €4.1M of freight orders last month had spent more on inference between Friday night and Sunday lunch than in the entire previous quarter. The bill had quadrupled. Nobody had shipped a feature. Nobody had touched a prompt.
By Thursday afternoon the agent was running on a direct Anthropic key, the Bedrock client was demoted to a warm fallback, and the legal team had signed a fresh data-processing addendum with Anthropic instead of AWS. This is what those four days looked like, and what we would do differently if we were running the same migration today.
The bill that broke the assumption
The dispatch agent was unremarkable. It ingested a load request from a customer portal, called a set of tools that hit the carrier database, the rate sheet, and the customs ledger, then returned a routing decision with a confidence band. About 18,000 calls a day. Most calls were short. The team had been running it on Claude Sonnet via AWS Bedrock since spring 2025, in the Frankfurt region, behind a VPC endpoint with PrivateLink. The setup was boring on purpose, which is the highest praise an operations team gives a piece of infrastructure.
What changed on the weekend was a combination of two unrelated knobs being turned at once. First, AWS quietly rolled out new data-handling terms for newer Anthropic models on Bedrock, including the routing the company had been using. The change had been on a roadmap thread for weeks; the actual switch landed on a Friday. Second, an internal change to how loads were being parsed had pushed average input tokens from roughly 2,800 to roughly 7,400. Neither change by itself would have been a crisis. Together, they were.
The agent had been billing somewhere around €180 a day. By Sunday it was on track for €740 a day, with no ceiling in sight. The team had a hard contractual cap of €25,000 a month with their largest customer for what the contract called “intelligence services.” They were going to blow through it by the 14th.
What the dispatch agent actually does
Before we walk through the migration, the shape of the agent matters, because it dictated which migration was even possible.
The agent is a single Claude-driven loop with eight tools: find_carriers, get_rate_sheet, check_customs, book_slot, quote_customer, log_anomaly, escalate_to_human, and cancel. The system prompt is 1,200 tokens and codifies a few hard rules the dispatchers refuse to bend on, including never quoting a customer before customs is checked, and never booking a slot inside 90 minutes of cutoff without human escalation. Average conversation is 4 to 6 tool calls and then a structured JSON output.
The team had wrapped this in their own SDK so the prompt, tools, and parsing lived in one place. The only thing Bedrock-specific was the transport: the boto3 Bedrock Runtime client, an IAM role, and a PrivateLink endpoint. That isolation is the only reason this migration took four days and not four weeks.
Why Bedrock made sense in 2025, and why it stopped
The original decision was not careless. Running Claude through Bedrock in eu-central-1 meant the inference traffic stayed inside AWS's European backbone. Their existing AWS DPA covered it. Procurement at their biggest customer, a Hamburg-based shipper with a strict subprocessor list, had signed off in one meeting. They got VPC endpoints, IAM-scoped invocations, and CloudWatch logs out of the box. Bedrock pricing was fine for the volume at the time.
What the team underestimated was the rate at which Anthropic would ship new models, and the rate at which AWS would renegotiate the legal envelope around them. By mid-2026, the model the agent depended on for tool-use accuracy was a generation behind what was available directly. The cost-per-token gap between the Bedrock surface and the direct Anthropic API had widened. And the new data-sharing terms meant their procurement story, which had taken nine months to land, no longer matched reality.
If your customer contracts name a specific subprocessor by legal entity, a vendor-side change to data-handling terms can break those contracts before any technical change touches your stack. Track your subprocessor obligations in the same repository as your IaC, not in a Notion page nobody reads.
The migration, in code
The cutover from Bedrock to the direct Anthropic API is not difficult in code. The two SDKs are deliberately close. The work is in the surrounding plumbing.
Their Bedrock call, simplified, looked like this:
import boto3, json
bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")
def invoke_dispatch(messages, tools, system):
resp = bedrock.invoke_model(
modelId="anthropic.claude-sonnet-4-20250514-v1:0",
body=json.dumps({
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"system": system,
"tools": tools,
"messages": messages,
}),
)
return json.loads(resp["body"].read())
The direct Anthropic equivalent the team wrote on day two:
from anthropic import Anthropic
client = Anthropic() # reads ANTHROPIC_API_KEY
def invoke_dispatch(messages, tools, system):
resp = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
system=system,
tools=tools,
messages=messages,
)
return resp.model_dump()
That is the entire substitution at the call site. The hard parts were elsewhere.
Tool-use shape drift
The response shape is nearly identical. The team's old parser had been written defensively against the Bedrock wrapper envelope, and the unwrapping logic silently dropped a stop_reason on the direct client. The agent started looping on the third tool call of every conversation because the loop terminator was being read from the wrong place. They lost six hours to this, mostly because the symptom was “the agent works on staging” and the staging fixture was a single-turn conversation.
Rate limits, the real one
Direct Anthropic API rate limits are per-organisation tokens-per-minute, not per-region Bedrock quotas. The agent's burst pattern, which is spiky around carrier cutoff times, hit the default tier limit at 09:30 on Tuesday. They had to call Anthropic sales and request a tier increase, which landed inside the day but cost them a morning of manual dispatch. If you are migrating any production workload, ask for the limit bump before you ship the change, not after.
Logging and observability
CloudWatch dropped out. The team had instrumented every Bedrock call with a custom log group and a metric filter that fired on token spend. None of that worked against the direct API. They wrote a 40-line wrapper that emitted the same metrics to CloudWatch via a separate IAM-scoped role, so the rest of their dashboards kept working. That was day three.
GDPR, data residency, and the EU question
The legal piece took longer than the code. Their largest customer's DPA required all subprocessors to be named and located in the EEA. AWS Frankfurt qualified. Sending traffic directly to Anthropic required a fresh DPA, a new subprocessor notification with a 30-day window, and a side letter committing to a specific data-retention posture. Their counsel had the template draft on Tuesday morning and the signed countersignature on Thursday afternoon, which was fast by their standards and only happened because the customer's procurement team had been in the loop from Monday.
The team did one thing that helped enormously, which we now recommend by default: they wrote a one-page “model substrate” memo that named the provider, the region, the retention window, and the redress channel for any data incident. That memo became the artefact procurement teams asked for whenever a new customer onboarded. It costs almost nothing to maintain. It saves entire weeks.
The shim they kept
One decision the team made on day two has paid off more than any other. They did not delete the Bedrock client. They put both transports behind a single interface:
from typing import Protocol
class LLMTransport(Protocol):
def invoke(self, messages, tools, system) -> dict: ...
class AnthropicDirect:
def __init__(self, client): self.client = client
def invoke(self, messages, tools, system):
return self.client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024, system=system,
tools=tools, messages=messages,
).model_dump()
class BedrockFallback:
def __init__(self, client): self.client = client
def invoke(self, messages, tools, system):
# same call as before, kept warm and tested nightly
...
def get_transport() -> LLMTransport:
return AnthropicDirect(...) if FLAGS.use_direct else BedrockFallback(...)
The flag flipped at the load-balancer level, not in code. When Anthropic had a 14-minute incident two weeks later, they bled traffic back to Bedrock without a deploy. The fallback was on an older model and slightly worse at tool selection, but the dispatchers shrugged and the customers never noticed.
Day four, on the whiteboard
By Thursday evening the agent was running on the direct API with the new DPA signed, the Bedrock path on standby, and the bill back below €200 a day. The team kept four notes stuck to the whiteboard, which we have since stolen and used on every migration we have run.
First: the model surface and the legal surface are the same surface. A change in either one ripples through the other, and the operations team usually finds out last.
Second: keep at least one alternate transport warm at all times. Single-vendor inference is a bet you do not need to make, and the cost of keeping a fallback warm is roughly nothing compared to the cost of a four-hour outage during cutoff hours.
Third: log token spend per-tenant from day one. The team had a single global meter, which meant on Monday morning they could not tell which customer's traffic was responsible for the spike. They have since switched to per-tenant attribution, and they caught a second pricing anomaly two months later in 11 minutes instead of 36 hours.
Fourth: write the procurement memo before you need it. The teams that suffer most in a migration like this are the ones whose legal posture lives in three Slack threads and a forwarded PDF.
What you can do this week
If you run a production agent on Bedrock and you have not read your DPA in six months, that is the five-minute audit worth doing today. Open the contract, list the model IDs you actually call, and confirm that the provider chain your customers signed off on still matches what is in production. If it does not, you have a window to fix it before someone with a clipboard finds out.
When we rebuilt the routing layer for a Rotterdam freight client last month, the thing that surprised them was not the inference cost. It was how much faster their procurement team got through customer reviews once the substrate memo existed. If you are weighing a similar migration on your own dispatch or quoting AI agents, the shim-first pattern above is the cheapest insurance you can buy.
Key takeaway
The model surface and the legal surface are the same surface; change one and the other ripples whether you wanted it to or not.
FAQ
Why did the Bedrock bill quadruple overnight?
Two unrelated changes landed in the same weekend: AWS revised data-handling terms for the team's model tier, and an internal parser change pushed average input tokens from 2,800 to 7,400.
Is the direct Anthropic API cheaper than AWS Bedrock?
Per-token list pricing on the direct API is usually lower than Bedrock's equivalent surface, but you lose VPC endpoints, IAM scoping, and CloudWatch metrics, which have their own cost to rebuild.
Do I need a new DPA when moving from Bedrock to Anthropic?
Yes, if your customer contracts name AWS as a subprocessor. You will need a fresh DPA with Anthropic, a subprocessor notice, and usually a side letter on retention.
How hard is the actual code migration?
The SDK call sites are nearly identical. The hard parts are tool-use parser drift, different rate-limit semantics, and rebuilding the observability layer that was free on Bedrock.
Should I delete the Bedrock client after migrating?
No. Keep it behind a transport interface and exercise it nightly. When the primary provider has an incident, you flip a flag at the load balancer instead of shipping a deploy.