
Python 3.14 GC: how a Cloud Run agent stalled for six hours

On a Thursday in June, a 26-person HR-tech vendor in Breda watched their onboarding agent stall for six hours after a routine Python 3.14 deploy. The cause was the new incremental GC.

Jacob Molkenboer · Founder · A Brand New Company · 5 Aug 2025 · 8 min


At 09:47 on a Thursday in June, an operations lead at a 26-person HR-tech vendor in Breda pinged me on Signal. Their onboarding agent, a Python 3.14 worker on Cloud Run, had stopped finishing jobs about an hour earlier. Twenty-three new hires were stuck on step three of a seven-step flow. Their CS team was already drafting the apology email.

By the time we hung up, the rough shape was clear. By 15:30 we had a fix in production. The cause was Python 3.14's new incremental garbage collector reclaiming state that a long-lived async generator depended on. The fix was two lines.

Here is the walkthrough.

The system that broke

The vendor builds onboarding software for European staffing agencies. Their onboarding agent is a Python worker on Cloud Run that orchestrates the back-and-forth between a new hire, their manager, and the agency's payroll provider. Each job runs for between fifteen and ninety minutes because it waits on real humans to reply to emails and sign documents.

The worker is written as a long-lived async generator that yields between steps. When a webhook comes back (signature received, payroll registered, ID-check passed), we resume the generator from where it last yielded. This keeps the orchestration code linear and readable. The frame holds all the state we care about while we wait. We use this pattern a lot. It is a reasonable default for agents that talk to humans.
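A minimal sketch of that shape, not the vendor's actual code. The step names, the hire id and the drive_demo helper are hypothetical; the point is only that local state survives in the paused frame and each webhook payload arrives through asend().

```python
import asyncio
from typing import Any, AsyncGenerator

# Hypothetical step names; the real flow has seven steps.
STEPS = ["send_contract", "await_signature", "register_payroll"]


async def onboarding_flow(hire_id: str) -> AsyncGenerator[str, dict[str, Any]]:
    """Yield the step we are waiting on; receive its webhook payload on resume."""
    results: dict[str, Any] = {"hire_id": hire_id}
    for step in STEPS:
        # Execution pauses here. `results` and `step` stay alive in the frame
        # until the driver calls asend() with the webhook payload.
        payload = yield step
        results[step] = payload
    # Async generators cannot return a value, so report completion with a
    # final yield that includes the number of recorded payloads.
    yield f"done:{len(results) - 1}"


async def drive_demo() -> list[str]:
    """Stand-in for the webhook handler: resume the flow once per payload."""
    flow = onboarding_flow("hire-123")
    seen = [await flow.__anext__()]  # run up to the first yield
    for payload in ({"sent": True}, {"signed": True}, {"payroll_id": 42}):
        seen.append(await flow.asend(payload))
    await flow.aclose()
    return seen
```

Calling asyncio.run(drive_demo()) walks the three hypothetical steps and ends on the completion marker. In a real worker the gap between two asend() calls is minutes or hours, which is exactly the window the rest of this post is about.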

The deploy that looked harmless

They had pushed a routine deploy on Tuesday evening. The diff was small: a Pydantic bump, a copy change in the manager-approval email, and a base image rebuild that moved the runtime from Python 3.13 to Python 3.14. Nothing in the diff touched the agent loop. Tests passed. Staging ran a full onboarding cycle end to end. They merged.

For about thirty-six hours, production looked fine. Cloud Run autoscaled the worker between two and four instances. Jobs finished at the usual rate. Then on Thursday morning, around 08:30 Amsterdam time, jobs started silently dropping progress. The worker would pick a job off the queue, run for a few seconds, then either raise an AttributeError on a coroutine or quietly rewind to the beginning of the flow. The hire's inbox stopped getting chased for signatures. No error reached the customer.

The symptom that gave it away

What pointed us at the GC was the timing. The first failures lined up almost exactly with the worker instances crossing ninety minutes of uptime. Cloud Run's min-instance flag was set to two, so two of the four warm workers had been alive long enough to hit a major collection. The other two, which had cycled more recently, were healthy.

We confirmed with a blunt instrument. We set Cloud Run's max-instance-age temporarily to fifteen minutes. The failure rate dropped to zero inside one autoscale cycle. That was the smoking gun. Something about long-lived instances was wrong, and it was wrong in a way that the runtime, not our code, controlled.

Python 3.14's incremental collector

Python 3.14 ships a new incremental garbage collector. Instead of doing a stop-the-world walk of the old generation in one pass, the tracer processes the old generation in small chunks across many collection cycles. The motivation is sane. A monolithic old-generation collection can pause a process for hundreds of milliseconds on a large heap, which is brutal for latency-sensitive services. The increments fix that.

They also change a subtle invariant. With the non-incremental tracer, every reference in the old generation was either fully visited in one pass or not visited at all. With the incremental tracer, the heap is observed in a partially-traced state across many short scans. For most workloads, this is fine. The CPython team has done careful work to keep the tracer correct under mutation.

For a worker whose request lifetime is measured in minutes (because it is waiting on a human), and which holds state through an async generator frame, the partial-trace world is harder to test. Our worker hit a case where module-level helpers referenced from a paused generator's frame were observed as unreachable, reclaimed, and then dereferenced on resume. The result, depending on which object was hit, was either an AttributeError or a silent re-bind to a recycled object with the wrong type. We saw both, on different jobs, on the same worker, within minutes of each other.

This is the kind of bug that does not show up in a fifteen-minute staging run. Staging never lives long enough to hit a major collection.
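One way to see whether a staging run ever gets that far is to log every collection with the worker's uptime. The sketch below uses the standard gc.callbacks hook; the record_collection name and the in-memory collection_log list are assumptions for illustration, and how the generation number is reported can differ between CPython versions.

```python
import gc
import time

_START = time.monotonic()
collection_log: list[dict] = []


def record_collection(phase: str, info: dict) -> None:
    """gc callback: note when each collection finished and what it reclaimed."""
    if phase != "stop":
        return  # the hook also fires with phase == "start"; we only log the end
    collection_log.append(
        {
            "uptime_s": round(time.monotonic() - _START, 1),
            "generation": info["generation"],
            "collected": info["collected"],
            "uncollectable": info["uncollectable"],
        }
    )


# Register once at startup; CPython calls the hook around every collection.
gc.callbacks.append(record_collection)
```

If a full staging cycle finishes and the log holds nothing but young-generation entries, staging has not exercised the path production will hit.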

The two-line fix

The fix we now bake into every long-running agent worker is this:

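import gc

# Last statements of the startup module, after every import and initializer.
gc.freeze()   # move all currently tracked objects to the permanent generation
gc.collect()  # one full pass over what is left, before the first request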

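Three lines, technically, if you count the import. You drop those at the bottom of your worker's startup module, after every import and every module-level initialization has happened. gc.freeze() moves every currently-tracked object into a permanent generation that the collector will never visit again. The follow-up gc.collect() triggers one clean pass before requests start arriving, so you do not pay for it later. One caveat: anything that is already garbage when you freeze gets pinned too, so if your startup leaves a lot of cyclic garbage behind, an extra gc.collect() before the freeze keeps it out of the permanent generation.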
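To confirm the freeze actually took effect, the standard library exposes gc.get_freeze_count(). A small sketch, with freeze_startup_heap as a hypothetical wrapper name:

```python
import gc


def freeze_startup_heap() -> tuple[int, int]:
    """Freeze everything allocated so far; return (frozen_before, frozen_after)."""
    before = gc.get_freeze_count()
    gc.freeze()   # tracked objects move to the permanent generation
    gc.collect()  # collect whatever is still unfrozen
    return before, gc.get_freeze_count()
```

Logging the two numbers at boot gives you a cheap sanity check: the second should be well above zero on any worker that imports more than a handful of modules.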

That is it. The onboarding agent has been running clean for nine days as of this post. Cloud Run max-instance-age is back at twenty-four hours.

Why freezing the heap works

Freezing the heap at the end of startup is an old trick. It first showed up in production for fork-based web servers, where the Python heap right before fork is everyone's working set and you want it to live in shared pages after fork. Instagram documented their version of this on uWSGI to keep their CPython workers from copy-on-write churn.

For an agent worker, the trick adapts cleanly. Everything you load at startup (your prompt templates, your tool schemas, your retry policies, your client objects for the LLM gateway and, in this case, the payroll API) is exactly the kind of long-lived state that should never be visited by an incremental tracer. Once it is frozen, the tracer has a much smaller working set: per-request data. That is the workload the GC was meant to handle.

For a request-spanning generator, the freeze has a second benefit. The generator's frame itself is allocated when the generator is created, so it lives in the young generation and is not frozen. That is fine. The bug we hit was not the frame, it was the module-level helpers the frame referenced. Freezing those moves them out of the tracer's path and the generator's resume path stops finding holes.

Warning

Do not call gc.freeze() if your worker hot-reloads prompt templates or tool registries at runtime. You will pin the old versions in memory forever and quietly leak.

Where to put it

Three rules of thumb after deploying this across our worker fleet.

First, freeze at the bottom of startup, not at the top. You want every import done. You want every dataclass hydrated. You want your tool registry populated. The point of freezing is to draw a line between "this is permanent" and "this is per-request", and that line is at the end of startup, not the beginning.

Second, do not freeze inside a unit-test process. Frozen objects survive across tests in the same interpreter, and pytest will start showing leaks that are not real. Gate the call behind an environment variable, or only call it in your production entrypoint module.

Third, if your worker hot-reloads anything at runtime (prompts, tool definitions, feature flags refreshed while the worker is running), freezing is the wrong shape. You will pin an old version forever. For those workers, either disable the GC on the hot path and run gc.collect() between jobs, or pin your runtime back to Python 3.13 until the incremental tracer settles. The CPython team is iterating on this; expect changes in 3.14.1 and 3.14.2.
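The second and third rules can be sketched as two small helpers. This is an illustration under assumptions, not a drop-in: freeze_if_enabled and gc_paused are hypothetical names, and ABN_FREEZE_GC is the variable name our own template uses.

```python
import gc
import os
from contextlib import contextmanager
from typing import Iterator


def freeze_if_enabled(env_var: str = "ABN_FREEZE_GC") -> bool:
    """Freeze the startup heap only when the environment opts in."""
    if os.environ.get(env_var) != "1":
        return False  # test runs and local shells keep a normal heap
    gc.freeze()
    gc.collect()
    return True


@contextmanager
def gc_paused() -> Iterator[None]:
    """For hot-reloading workers: no automatic collection inside the block,
    then one explicit collection when the block exits."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()
```

Call freeze_if_enabled() as the last statement of the production entrypoint. For a hot-reloading worker, wrap each resume of a job in "with gc_paused():" instead, so collections only happen at a point you chose.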

The broader pattern for agent workers

The reason this stings for agent workers specifically is that we keep state alive across human time. A web request lives for milliseconds. A Celery job lives for seconds. An agent orchestration job lives for minutes or hours because it waits for a human to do something. Anything in your runtime that assumes a request finishes quickly will eventually trip you up. The new incremental GC is one of those things. It will not be the last.

If you run agents on Cloud Run, Cloud Functions, Lambda with provisioned concurrency, or any other serverless platform that keeps a worker warm, audit your runtime for invariants that depend on instance age. We have hit the same pattern with: OpenTelemetry batch span exporters that drop spans after a fixed buffer age, async HTTP clients that close their connection pool after thirty idle minutes, and Cloud SQL pools that quietly recycle behind your back. None of those are bugs in the platform. They are mismatches between "short-lived request" assumptions and "long-lived job" reality.

The cheap structural fix is an instance-age heartbeat. Emit your worker's age every minute. Page when it crosses something you actually tested against. For the Breda vendor, that number is now ninety minutes, because that is the longest staging run they do.
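A heartbeat like that fits in a few lines of standard library. The sketch below only writes a structured log line; the metric name, the logger name and the function names are assumptions, and turning the log line into a metric and a page is left to whatever monitoring you already run.

```python
import json
import logging
import threading
import time

_STARTED = time.monotonic()
MAX_TESTED_AGE_S = 90 * 60  # the longest run actually tested, per the text above


def instance_age_seconds() -> float:
    """Seconds since this module was imported, i.e. roughly since worker start."""
    return time.monotonic() - _STARTED


def heartbeat_record(age_s: float, max_tested_s: float = MAX_TESTED_AGE_S) -> dict:
    """Build one heartbeat payload and flag ages beyond what was tested."""
    return {
        "metric": "instance_age_seconds",
        "value": round(age_s, 1),
        "over_tested_age": age_s > max_tested_s,
    }


def start_heartbeat(interval_s: float = 60.0) -> threading.Event:
    """Log a heartbeat every interval_s seconds until the returned event is set."""
    stop = threading.Event()
    log = logging.getLogger("heartbeat")

    def _loop() -> None:
        # Event.wait returns False on timeout, True once stop.set() is called.
        while not stop.wait(interval_s):
            log.info(json.dumps(heartbeat_record(instance_age_seconds())))

    threading.Thread(target=_loop, name="instance-age-heartbeat", daemon=True).start()
    return stop
```

A daemon thread is used so the heartbeat never keeps a worker alive on shutdown. If your worker is fully asyncio-based, the same loop works as a background task.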

What we changed in our worker template

The onboarding agent that broke is one of fourteen agents we have live in production. After this incident we added three things to our worker template, in order of importance.

One: the gc.freeze(); gc.collect() pair at the bottom of the startup module, gated by an ABN_FREEZE_GC=1 environment variable so test runs still see a normal heap.

Two: an instance-age metric emitted to Cloud Monitoring every sixty seconds. The alert fires at one hundred and twenty minutes during business hours.

Three: a smoke job that runs every five minutes against the warm worker, exercises the full async generator path end to end, and fails the deploy if any job returns the wrong type from a resumed yield. Cheap to write, very loud when it fires.
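The smoke job in item three can be as small as the sketch below. It is an assumption about the shape, not our production code: smoke_check, demo_flow and wrong_type_flow are hypothetical names, and the two flows are stand-ins for a real agent generator.

```python
import asyncio
from typing import Any, AsyncGenerator, Callable, Iterable


async def smoke_check(
    make_flow: Callable[[], AsyncGenerator[Any, Any]],
    payloads: Iterable[Any],
    expected_type: type = str,
) -> list[str]:
    """Drive a flow with canned payloads; return problems (empty means pass)."""
    problems: list[str] = []
    yielded: list[Any] = []
    flow = make_flow()
    try:
        yielded.append(await flow.__anext__())
        for payload in payloads:
            yielded.append(await flow.asend(payload))
    except StopAsyncIteration:
        problems.append("flow ended before all payloads were consumed")
    except Exception as exc:  # e.g. an AttributeError raised on resume
        problems.append(f"{type(exc).__name__}: {exc}")
    finally:
        await flow.aclose()
    # Every value handed back from a yield must have the type we expect.
    for index, value in enumerate(yielded):
        if not isinstance(value, expected_type):
            problems.append(
                f"yield {index} returned {type(value).__name__}, "
                f"expected {expected_type.__name__}"
            )
    return problems


async def demo_flow() -> AsyncGenerator[str, Any]:
    """Healthy stand-in: every yield hands back a string."""
    yield "step-1"
    yield "step-2"
    yield "done"


async def wrong_type_flow() -> AsyncGenerator[Any, Any]:
    """Broken stand-in: the second yield hands back the wrong type."""
    yield "step-1"
    yield 42
```

In a deploy pipeline, run the check with asyncio.run() against the warm worker's real flow and exit non-zero when the returned list is not empty.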

None of these are clever. They are the kind of thing you put in once and forget. They are also the kind of thing you do not write until production has hurt you.

The closing note

When we built the onboarding agent for that Breda vendor, the thing we had not predicted was that a runtime upgrade with no code changes would silently break a generator-shaped agent four years after we first wrote that pattern. We ended up solving it with two lines and a heartbeat. If you are building AI agents that span human-time work, the worker template is where this kind of resilience lives. It is the cheapest place to put it.

If you only do one thing today: open your agent worker's startup module, add import gc; gc.freeze(); gc.collect() at the bottom, and ship it. Then set an alert on instance age. The rest is decoration.

Key takeaway

Add gc.freeze() at the bottom of your Python 3.14 worker startup. Two lines stop the new incremental GC from reclaiming state your long-running agent depends on.

FAQ

Does this affect Python 3.13 or earlier?

No. The incremental GC pass is new in 3.14. Workers on 3.13 and earlier are not affected by this specific failure mode, though gc.freeze is still a worthwhile pattern for any long-lived worker.

Is this a bug in Cloud Run?

No. Cloud Run is the messenger. The same pattern shows up on any platform that keeps a Python 3.14 worker warm long enough to hit a major garbage-collection pass, including Lambda and self-hosted VMs.

Will CPython fix the underlying issue?

The CPython team is actively tuning the incremental tracer. Expect changes in 3.14.1 and 3.14.2. Until those land, gc.freeze plus an instance-age alert is the safe path.

What about async generators specifically?

Same shape. The bug was not the generator object itself but the module-level state its frame referenced. Freezing that module-level state stops the tracer from visiting it.

