Edge agent overengineering: the agency CTO field-guide
A guy on HN this week is serving his blog from a phone embedded in a toilet bowl. Your CTO has the thread open in one tab and an AWS invoice in the other.

Some guy on Hacker News this week is serving his blog from a phone that was smashed flat and embedded in a toilet bowl. The thread sits at #22 on the front page. Your CTO has it open in one tab and an AWS bill for €4,200 open in another. He is about to walk into your office with the question we get every time we audit an overengineered edge-agent stack: "if a toilet phone can be a web server, why did we just pay a year of Vercel Pro, two regions of Upstash, and a Cloudflare Workers Paid plan for fourteen agents that each serve fewer than two requests a second?"
This is a fair question. It is also a question that, if you answer it badly, ends in a six-week rewrite before the next sprint instead of a forty-minute Caddyfile edit tonight.
What follows is the field-guide we use internally when we audit a sub-€14M agency's agent stack. Fifteen mistakes that show up repeatedly, sorted by escape cost: eight you can undo in one Caddyfile on a single Hetzner CX22, and seven that lock you into a rewrite the moment you write your first line of Workers-specific code.
The toilet-phone framing
The reason the smashed-toilet-phone post is useful is not that you should serve client sites from a toilet. It is that the post resets the ceiling. The phone is doing maybe one request per minute. A €4.59-per-month Hetzner CX22 (2 vCPU, 4GB, 40GB NVMe, Falkenstein or Nuremberg) is roughly four orders of magnitude more capable. Twelve client projects, each handling a few thousand requests a day, fit on that one box with room left over for a Postgres instance, a pgvector index, and the agent runtimes themselves.
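The arithmetic behind that claim is worth doing once. A rough sketch, assuming 5,000 requests per project per day (a stand-in for "a few thousand") and a busy hour at ten times the daily average; both numbers are illustrative assumptions, not measurements:

```typescript
// Back-of-envelope load for the agency described above.
// The per-project volume and the peak factor are assumed values for illustration.
const SECONDS_PER_DAY = 24 * 60 * 60;

function averageRps(projectCount: number, requestsPerProjectPerDay: number): number {
  return (projectCount * requestsPerProjectPerDay) / SECONDS_PER_DAY;
}

const projectCount = 12;
const requestsPerProjectPerDay = 5000; // assumption: "a few thousand"
const peakFactor = 10; // assumption: busy hour is 10x the daily average

const avg = averageRps(projectCount, requestsPerProjectPerDay);
const peak = avg * peakFactor;

console.log(`average: ${avg.toFixed(2)} req/s, assumed peak: ${peak.toFixed(2)} req/s`);
```

Under these assumptions the whole agency averages under one request per second and stays under ten at the assumed peak, which is the scale the rest of this guide is written for.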
If your stack is more expensive than that and your load is in that range, you are paying for one of two things: actual technical need, or pattern-matched architecture you copied from an HN post about a company with 400× your traffic. Most of the time it is the second. Almost all of the second is reversible.
The split that matters is not "is this overengineered?" (almost everything is) but "can I undo this without rewriting the agent code itself?"
Eight you can undo with one Caddyfile
These are mistakes at the edge. They affect how traffic reaches your app. The app itself does not know they exist. You can roll them back tonight, on a single CX22, with one configuration file and a systemd unit.
1. Marketing site on Vercel because it is the framework default. Next.js' marketing copy and Vercel's onboarding flow make this feel like one decision. It is two. A static export served by Caddy with file_server and automatic Let's Encrypt costs you nothing and runs from the same box as everything else.
2. Cloudflare in front of admin dashboards that twelve people use. The CDN is doing nothing — there is no edge cache hit on an authenticated /admin route. You are paying for the obscurity of a Cloudflare hostname and getting a debugging tax in exchange. Front the public surface; serve admin straight off the origin with tls internal on a private subdomain.
3. Lambda@Edge or Workers as a TLS termination layer. Caddy terminates TLS, hot-reloads certs, staples OCSP, and renews certificates on its own. Automatic HTTPS is on by default: naming the site in the Caddyfile is enough. You do not need an edge runtime for this. You need a TLS-aware web server, which Caddy is.
4. Pre-emptive multi-region for a Dutch client base. Every client you have is inside a 40ms RTT from AMS-IX. A second region in Singapore serves 0% of your customers and adds 100% of an extra data plane to think about. Single region, single box, single backup target.
5. Managed cron services for jobs that run hourly. A systemd timer plus a one-shot unit is twenty lines and survives reboots. The cron-as-a-service products exist for fleets, not for the four invoice-chase jobs you actually run.
# /etc/systemd/system/invoice-chaser.service
[Unit]
Description=Chase one batch of overdue invoices

[Service]
Type=oneshot
WorkingDirectory=/opt/agents/invoice-chaser
ExecStart=/usr/bin/node run.js
User=agents

# /etc/systemd/system/invoice-chaser.timer
[Unit]
Description=Hourly invoice chase

[Timer]
OnCalendar=hourly
Persistent=true
Unit=invoice-chaser.service

[Install]
WantedBy=timers.target
Then sudo systemctl daemon-reload && sudo systemctl enable --now invoice-chaser.timer and you are done. journalctl -u invoice-chaser -f tails it.
6. Webhooks wrapped in API Gateway + Lambda. A webhook is an HTTP POST. Caddy can route it to a long-lived Go or Node process listening on a Unix socket. No cold start, no per-invocation cost, no IAM policy to debug at 21:00 on a Friday.
7. Auth0 Free for the internal tool with twelve users. You will hit the MAU cap the week a client signs, and then you are migrating auth during a launch. Authelia behind Caddy's forward_auth directive, or basic auth for a dashboard nobody outside the office sees, is the saner starting point.
8. Datadog log ingestion when journalctl is right there. You are paying per GB to ship logs you read once a quarter. journalctl -u agent-foo -f over SSH is free. A Grafana + Loki pair on the same box, behind Caddy basic auth, is also free and gives you a dashboard for the rare moment you actually need one.
Here is the whole edge for a twelve-project agency on one box:
{
	email ops@agency.nl
}

agency.nl, www.agency.nl {
	root * /var/www/marketing
	file_server
}

app.klant-een.nl {
	reverse_proxy unix//run/agents/klant-een.sock
}

app.klant-twee.nl {
	reverse_proxy unix//run/agents/klant-twee.sock
}

admin.intern.agency.nl {
	forward_auth localhost:9091 {
		uri /api/verify
		copy_headers Remote-User Remote-Groups
	}
	reverse_proxy localhost:8080
}
That is the whole edge. One stanza per client project, one for the marketing site, one for the internal dashboards. Caddy applies changes with a graceful reload (systemctl reload caddy), certs renew on their own, and the file lives in a git repo your CTO can scp to a second box the day Falkenstein catches fire.
Every one of these is a Caddyfile diff and a service restart. None of them touch your application code. If your CTO has only done the eight things above, you are roughly two hours from a hosting bill that fits inside one Hetzner invoice instead of three SaaS dashboards.
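The unix// upstreams in that Caddyfile assume each agent process listens on a socket file rather than a TCP port. A minimal sketch of what that could look like in Node, using only the built-in http and fs modules; the socket path matches the Caddyfile, while handleWebhook, startAgent, and the JSON payload shape are hypothetical names chosen for illustration:

```typescript
import { createServer, Server } from "node:http";
import { existsSync, unlinkSync } from "node:fs";

// Hypothetical handler: validates a webhook body and decides on a response.
// Kept as a pure function so it can be tested without opening a socket.
function handleWebhook(rawBody: string): { status: number; body: string } {
  let payload: unknown;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    return { status: 400, body: "invalid JSON" };
  }
  const event = (payload as { event?: unknown } | null)?.event;
  if (typeof event !== "string") {
    return { status: 400, body: "missing event" };
  }
  return { status: 200, body: `accepted ${event}` };
}

// Starts a long-lived HTTP server on a Unix socket for Caddy to proxy to.
function startAgent(socketPath: string): Server {
  // A stale socket file left by a previous run would make listen() fail.
  if (existsSync(socketPath)) unlinkSync(socketPath);

  const server = createServer((req, res) => {
    if (req.method !== "POST") {
      res.writeHead(405).end("method not allowed");
      return;
    }
    const chunks: Buffer[] = [];
    req.on("data", (chunk: Buffer) => chunks.push(chunk));
    req.on("end", () => {
      const result = handleWebhook(Buffer.concat(chunks).toString("utf8"));
      res.writeHead(result.status, { "Content-Type": "text/plain" }).end(result.body);
    });
  });

  server.listen(socketPath);
  return server;
}

// Usage (not executed here): startAgent("/run/agents/klant-een.sock");
```

The user Caddy runs as needs read and write access to the socket file. How you arrange that (a shared group, or RuntimeDirectory= in the agent's systemd unit) depends on your deployment.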
The line you cross is the first import { KVNamespace } from "@cloudflare/workers-types". Before that line, you have a portable Node service. After it, your storage layer is married to Cloudflare and you are paying for the divorce.
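One way to stay on the portable side of that line is to have agent code depend on a small interface of your own rather than on a platform binding. A hedged sketch: AgentStore, MemoryStore, and rememberLastRun are hypothetical names, and the in-memory implementation stands in for whatever Postgres-backed or KV-backed adapter you would actually write:

```typescript
// Hypothetical storage port: the only storage type agent code is allowed to see.
interface AgentStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// In-memory adapter, useful for tests and local runs.
// A Postgres or Workers KV adapter would implement the same two methods.
class MemoryStore implements AgentStore {
  private readonly data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async put(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// Example agent logic written against the port, not a vendor binding.
// Returns the previously stored timestamp (or null) and records the new one.
async function rememberLastRun(
  store: AgentStore,
  agent: string,
  isoTime: string,
): Promise<string | null> {
  const key = `last-run:${agent}`;
  const previous = await store.get(key);
  await store.put(key, isoTime);
  return previous;
}
```

With this shape, the vendor import lives in one adapter file, so the grep in the audit below would report one file instead of every agent that touches storage.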
Seven that force a rewrite before the next sprint
These mistakes change what kind of program you wrote. The agent itself depends on them. You can't undo them with a config file; you commit harder to the platform or you rewrite against a portable runtime.
9. Cloudflare KV or Durable Objects as your primary store. The bindings are not Postgres. You wrote against an eventually-consistent key-value model with a 25 MiB per-value cap and a regional consistency story. Moving off requires re-thinking the data model, not swapping a driver. The Workers limits page is required reading before you commit.
10. R2 with Workers runtime assumptions. If your file handling uses the Workers streaming Response APIs and assumes no Node fs, porting to S3 or local disk is not a one-line driver swap. It is a refactor of every code path that touches a file.
11. Vectorize or Workers AI for embeddings. pgvector inside the same Postgres on the same CX22 is a strictly simpler architecture for any index under a few million vectors. If your retrieval layer is built around Vectorize's query API, you rebuild the retrieval layer, not the index.
12. Splitting one agent into eight Workers to fit the CPU limit. The Workers free tier caps CPU time at 10ms per request, and the older Bundled paid plan capped it at 50ms. If your agent took 200ms of work and you sharded it across eight Workers plus a Queue to fit, you wrote a distributed system. The undo is collapsing it back into one process, which means rewriting the IPC, the partial-failure handling, and the observability you bolted on to debug it.
13. Custom queues on Workers Queues with bespoke consumer logic. The semantics — max batch, retry policy, DLQ behaviour — do not map one-to-one onto SQS, RabbitMQ, or a Postgres-backed job table. Migrating means re-deriving your retry and backoff invariants from scratch.
14. Per-request LLM calls with no caching layer. This one is sneaky because it does not look like vendor lock-in; it looks like a cost problem. The fix (prompt caching, embedding caching, intermediate-result caching) is a real architectural layer. If your agents shell out to a model on every turn with no cache, you do not have an overpriced stack, you have an unfinished one. Anthropic's prompt-caching docs are the starting point for the Claude side.
15. Auth glued to Cloudflare Access policies. If your agents check identity by trusting the Cf-Access-Jwt-Assertion header and have no fallback verifier, moving off Cloudflare means standing up an IdP and rewriting the trust path through every service. The fix is to put a thin auth abstraction in front of the header on day one — but if it has been six months, you are looking at a real piece of work.
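The thin auth abstraction from item 15 can be very small. A sketch under stated assumptions: AgentIdentity, IdentityVerifier, ForwardAuthVerifier, and FirstMatchVerifier are hypothetical names, and the comma-separated format of Remote-Groups is assumed. A Cloudflare Access verifier would implement the same interface but must validate the JWT signature, which this example does not attempt. Remote-User and Remote-Groups are the headers copied by the forward_auth stanza in the Caddyfile above.

```typescript
// Hypothetical identity shape shared by every service.
interface AgentIdentity {
  user: string;
  groups: string[];
}

// Lower-cased header name -> value, as Node's http module exposes them.
type HeaderMap = Record<string, string | undefined>;

// The abstraction: services call verify() and never read auth headers directly.
interface IdentityVerifier {
  verify(headers: HeaderMap): AgentIdentity | null;
}

// Trusts the headers set by Caddy's forward_auth.
// Only safe when the service is reachable exclusively through the proxy.
class ForwardAuthVerifier implements IdentityVerifier {
  verify(headers: HeaderMap): AgentIdentity | null {
    const user = headers["remote-user"];
    if (!user) return null;
    // Assumption: groups arrive as a comma-separated list.
    const groups = (headers["remote-groups"] ?? "")
      .split(",")
      .map((group) => group.trim())
      .filter((group) => group.length > 0);
    return { user, groups };
  }
}

// Tries each verifier in order, so a migration can run old and new paths side by side.
class FirstMatchVerifier implements IdentityVerifier {
  private readonly verifiers: IdentityVerifier[];

  constructor(verifiers: IdentityVerifier[]) {
    this.verifiers = verifiers;
  }

  verify(headers: HeaderMap): AgentIdentity | null {
    for (const verifier of this.verifiers) {
      const identity = verifier.verify(headers);
      if (identity) return identity;
    }
    return null;
  }
}
```

During a migration you would pass both the Cloudflare verifier and the forward_auth verifier to FirstMatchVerifier, then drop the first once the last service has moved.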
The five-minute audit you can run tomorrow
Open the repo. Grep for these strings:
git grep -nE '@cloudflare/workers|KVNamespace|DurableObject|R2Bucket|env\.VECTORIZE|Cf-Access-Jwt'
If you get zero hits, you are entirely in the Caddyfile-reversible zone. Spin up a CX22, write the twenty-line Caddyfile above, point a staging subdomain at it, and migrate one client project this week as the pilot.
If you get hits, count the files. Five or fewer, refactor before the next sprint while it is still small. More than five, accept that you are now a Cloudflare shop and the cheapest move is to commit harder: Workers Paid, Hyperdrive in front of a managed Postgres, and stop trying to also keep Lambda and Vercel in the mix.
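The file-count rule is easy to mechanise. A minimal sketch, assuming you feed it the text output of the git grep -n command above (lines of the form path:line:match); countFilesWithHits, classifyAudit, and the verdict labels are hypothetical names:

```typescript
type Verdict = "caddyfile-reversible" | "refactor-now" | "commit-to-cloudflare";

// Counts distinct files in `git grep -n` output, where each line looks like
// "path/to/file.ts:42:matched text". Paths that contain ":" are not handled.
function countFilesWithHits(grepOutput: string): number {
  const files = new Set<string>();
  for (const line of grepOutput.split("\n")) {
    const colon = line.indexOf(":");
    if (colon > 0) files.add(line.slice(0, colon));
  }
  return files.size;
}

// Maps a file count onto the three outcomes described in this section.
function classifyAudit(fileCount: number): Verdict {
  if (fileCount === 0) return "caddyfile-reversible";
  // Assumption: exactly five files still counts as small enough to refactor.
  return fileCount <= 5 ? "refactor-now" : "commit-to-cloudflare";
}
```

The count is per file, not per match, so one adapter module with thirty Workers-specific lines still counts as a single hit.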
The mistake under all fifteen of these is the same mistake. You read a thread about Discord serving fifteen million concurrent users and quietly assumed it applied to you. It does not. The toilet-phone post is not a meme. It is the same point made in the opposite direction: the ceiling is much lower than the architecture-astronaut posts suggest, and the floor is much, much lower than the AWS invoice.
When we last ran this audit for an Amersfoort agency with twelve client projects on a serverless mesh, the blocker was the auth layer: six months of Cloudflare Access glue that had to come out before anything else could move. We routed all auth through a single Authelia container behind Caddy forward_auth, which let us migrate the remaining services one at a time over three weeks. The whole stack now runs on a CX32 with hosting costs an order of magnitude lower than where we started. If you want to read more about how we approach AI agents at this scale, the service page covers the rest.
Tonight's smallest thing: run that git grep above. The hit count tells you which half of this guide you are in.
Key takeaway
Audit your stack with one git grep; if @cloudflare/workers shows up nowhere, you are an evening's Caddyfile edit from a sane bill.
FAQ
Is a single Hetzner CX22 really enough for twelve client projects?
For most Dutch agency workloads — a few thousand requests per day per project, low concurrency, no media transcoding — yes. The bottleneck you hit first is usually Postgres connection count, not CPU or RAM.
When does Cloudflare Workers actually make sense for an agency stack?
When your traffic is global, latency-sensitive at the request edge, and your code fits the runtime model. For a Dutch client base served from Falkenstein, almost none of that is true.
What is the fastest way to undo a Vercel deployment?
Static-export your Next.js app, rsync it to /var/www on a CX22, point Caddy at it with file_server, and update DNS. The switch takes under an hour for a marketing site.
How do I know if I have crossed the Workers-rewrite line?
Grep your repo for @cloudflare/workers, KVNamespace, DurableObject, R2Bucket, and Cf-Access-Jwt. If those appear in more than five files, you are in rewrite territory.