Tooling

Homelab GPU for an 8-dev AI team: 5090 vs 3090 vs Hetzner

It is Sunday in Schiedam. The homelab AI box will not boot Ollama. The SSD wedged. Eight engineers expect it back by Monday. Here is how we pick the box.

Jacob Molkenboer· Founder · A Brand New Company· 7 Dec 2025· 8 min

Brass relay switch on folded ivory linen, beside tarnished tuning fork and cream index card with green ribbon and red wax seal.

It is Sunday evening in Schiedam. Your team's homelab AI box, the one with the RTX 5090 you talked the founder into, will not boot Ollama. The NVMe wedged mid-update, the model cache is half-written, and on Monday morning eight engineers expect Devstral and Qwen3-Coder to load on first try. Your phone is the on-call rotation. Your phone is the only rotation.

This post is the scoring method we use, every time, when a sub-€10M Dutch agency asks us to spec a GPU box for their internal AI dev platform. We have walked into this question often enough that we wrote the math down.

Three boxes that keep showing up in our quotes

Almost every quote we write for an internal AI dev platform comes down to one of these three shapes.

A single workstation with an RTX 5090. Around €4,200 fully built, 32 GB of VRAM, lives under a desk or in a small server cupboard. Quiet enough that nobody complains.
A dual-3090 frankenbuild. Two used RTX 3090s off Marktplaats or Tweakers V&A, a second-hand riser, a 1600W PSU, mining-rig open frame. Around €2,800 built, 48 GB of VRAM, sounds like a wind tunnel.
Hetzner GPU bare metal. A GEX44 with an RTX 4000 SFF Ada and 64 GB RAM is around €184 per month at time of writing. The GEX130 with an L40s sits closer to €700. No build, no cupboard, no fans of yours to listen to.

Each one answers the same question (eight engineers need GPU time for code completion, RAG indexing, voice transcription, and the occasional fine-tune) with a different trade-off shape.

The three axes we score on

For these decisions we score on three axes and weight them by who the client is:

Cost per engineer per month over 24 months.
Idle-power draw at the Schiedam business electricity rate.
The Sunday question: who restores the box when something wedges.

We do not score on peak FLOPs, on token-per-second numbers from a benchmark blog, or on what the GPU costs new at MSRP. None of those survive contact with eight engineers who all want to hit the inference server at 09:14 on a Tuesday.

Per-engineer cost over 24 months

The single-5090 build:

Hardware (5090, CPU, board, 64GB RAM, NVMe, PSU, case):  €4,200
Spare NVMe + small UPS:                                    €420
Two-year amortisation: €4,620 / 24                       = €192.50/mo
Per engineer (eight devs):                               = €24.06/mo

The dual-3090 frankenbuild:

Hardware (two used 3090s, board, CPU, 128GB RAM, PSU):   €2,800
Spare PSU + NVMe:                                          €380
Two-year amortisation: €3,180 / 24                       = €132.50/mo
Per engineer (eight devs):                               = €16.56/mo

The Hetzner GEX44:

GEX44 monthly rental:                                    €184.00
One-time setup, amortised over 24 months:                  €4.10/mo
Per engineer (eight devs):                               = €23.51/mo

The numbers cluster between €16 and €24 per engineer per month. At this team size, cost is not the deciding axis. It rarely is, once you actually pencil it out.

Idle-power draw at the Schiedam business rate

Schiedam business electricity sits around €0.35/kWh in 2026 once you add the network tariff and energy tax. We use that as the working number when we model an agency that does not have its own rooftop solar.

We measure idle, not peak. The peak is honest about your worst hour; the idle tells you what the box costs you while nobody is using it, which is most of its hours.

Single RTX 5090 box, idle around 80W:
  80W * 24h * 365 = 701 kWh/yr * €0.35 = €245/yr

Dual-3090 frankenbuild, idle around 120W:
  120W * 24h * 365 = 1,051 kWh/yr * €0.35 = €368/yr

Hetzner GEX44, idle from your power bill:
  €0/yr (the kWh is in their datacentre)

The dual-3090 looks worst on idle, but it is also the box you can put on a smart plug and switch off after 19:00 and on weekends. If you actually do that, and most agencies do not, the picture inverts. If you do not, it stays as written.

The Sunday question

This is the axis that ends most of the debates we walk into.

When an NVMe wedges on the homelab box at 22:00 on Sunday, someone drives to the office, ssh's in over Tailscale, or wakes up the on-call engineer. Ollama's model cache lives in ~/.ollama/models. When the cache is half-written because an update was interrupted by a power blip, ollama list returns nothing and the API returns 500. The fix is not hard, delete the corrupt manifest and re-pull the model, but it requires hands.

On Hetzner GPU bare metal, you open a ticket and ask for the KVM console. We have measured response time at around 15 minutes overnight. You still do the fix yourself, but you do it from your couch through the Hetzner Robot console, and they swap hardware inside the SLA if the disk is actually dead.

On a homelab box, the on-call person is whoever signed the original PO. In a sub-€10M agency, that is the founder, the operations lead, or one specific senior engineer who has been there long enough. Three months in, that person is tired of Sunday phone calls.

Warning

If you cannot name the human who restores Ollama at 23:00 on a Sunday in June, in writing, before you buy the GPU, you have not picked a homelab. You have picked a future emergency.

When the math says: not yet

If your team is four engineers and your monthly hosted inference spend (Claude API, OpenAI API, OpenRouter) is under €400, the math on every box above says wait. Nine of every ten quotes we have written this year for a sub-€10M agency would have been cheaper with a hosted API for the first six months and then a re-spec at month seven. We say so out loud in the proposal. It is not a clever play; it is the boring play.

The homelab question becomes real when one of three things is true:

Your monthly hosted spend crosses around €600 and the curve is still going up.
Client data cannot leave the EU, and you are running into data-processing-agreement friction with US-hosted models.
You want to run a fine-tune or an embedding pipeline that would be cost-prohibitive against per-token pricing.

If none of those are true today, the right answer is "not yet" and a short note in the roadmap to revisit at the end of the quarter.

The scorecard, in one block

We keep the score in a small text block in the proposal folder, one column per option:

                              5090 box   Dual-3090   Hetzner GEX44
Per-engineer cost / month     €24.06     €16.56      €23.51
Idle electricity / year       €245       €368        €0
On-call restorer              client     client      Hetzner + client
Recovery time (Sun 22:00)     2-4h       2-4h        15-30 min
Usable VRAM                   32 GB      48 GB       20 GB
Comfortable concurrent devs   4-5        6-7         3-4
Hardware swap on failure      client     client      Hetzner SLA
Noise in the office           low        high        none

The interesting rows are the bottom four. The dual-3090 wins on VRAM and concurrency. The 5090 wins on power and noise. The Hetzner box wins on every operational axis and loses on per-dev VRAM, which is the constraint that bites as your team's context windows grow.

How the call goes, by client shape

A six-engineer marketing agency in Utrecht with no in-house ops capacity gets the Hetzner quote. We will not put a homelab in a building where nobody is going to drive in on Sunday. The premium over a frankenbuild is around €7 per engineer per month. That is two cappuccinos for never having to drive in.

An eight-engineer dev shop with a senior infra person who actually wants the homelab gets the dual-3090 quote. We make sure that person signs the on-call rotation in writing, we put a small UPS in front of it, and we configure ollama serve behind a systemd unit with restart-on-failure plus a model cache integrity check that runs on boot.

A founder-led AI-native startup with two or three engineers and a long roadmap gets the 5090 box. The 32 GB of VRAM is enough headroom to host a 32B-parameter model and a smaller embedding model concurrently, the on-call surface is small (the founder is on call anyway), and the founder learns the stack they will scale on. At zero-to-one, the founder is the SRE whether the org chart says so or not.

We stopped recommending the single-engineer-with-a-3090-under-her-desk shape. It always ends with that engineer being the bus factor and the on-call rotation, and she eventually leaves, and the box becomes a paperweight nobody wants to touch.

One agency we worked with two years ago insisted on the under-desk shape against our quote. Their lead engineer left in February. Nobody at the company knew the BIOS password, the LUKS passphrase for the data disk, or the Tailscale ACL for the box. They paid us to drive to their office on a Wednesday and rebuild everything from the still-running Ollama process. It cost more than the difference between a homelab and 24 months of Hetzner.

The Ollama cache wedge, fixed in five minutes

If you read this far and you have a homelab box that is wedged right now, the recovery looks like this:

sudo systemctl stop ollama

# verify the cache is what is broken
ls -lah ~/.ollama/models/manifests/registry.ollama.ai/library/
# if a directory is empty or a model file is partial, you have your culprit

# remove the manifest for the broken model; orphan blobs are GC'd on next start
rm -rf ~/.ollama/models/manifests/registry.ollama.ai/library/<model>

sudo systemctl start ollama
ollama pull qwen3-coder:32b

If ollama pull still hangs after this, check journalctl -u ollama -f for an unexpected EOF from the GGUF reader. That usually means the underlying NVMe has thrown a write error and you want smartctl -a /dev/nvme0 before you trust the box again. The Ollama troubleshooting docs cover the rest of the common failure modes.

Takeaway

Per-engineer cost converges around €20 per month across all three options. What you are actually buying is who answers the phone on Sunday.

What goes in the runbook

For any homelab choice, the runbook is the deliverable that determines whether the box survives its second year: the BIOS password, the LUKS passphrase, the Tailscale auth key, the location of the spare NVMe in the storage cupboard, the Hetzner Robot login if the setup is hybrid, and the contact details of the person who agreed to be on call. The runbook lives in two places. One is the company password manager. The other is a printed envelope in a sealed drawer in the office. The printed envelope has saved us twice.

Where we land most often

The honest answer, for sub-€10M Dutch agencies: Hetzner first, dual-3090 second, single 5090 only when the team is small enough that the founder is fine being SRE. The premium over a homelab is small enough that we have stopped trying to talk clients out of the bare-metal option. We have spent enough Sundays restoring NVMes to know what those Sundays cost.

When we built the internal AI dev platform for a Schiedam marketing agency last year, we ran exactly this scoring on three options. We landed on a Hetzner GEX44 plus a quarterly snapshot to Backblaze B2, and we wrote the Ollama restoration runbook into their Notion. That is the shape of the AI agent work we do for agencies who want their own platform without owning the hardware.

If you are sizing this for your own team this week, the five-minute audit is: write down, by name, the person who would restore Ollama at 23:00 on a Sunday in June. If you cannot, the answer is Hetzner.

Key takeaway

Per-engineer GPU cost converges around €20 per month across all three options. What you are actually buying is who answers the phone on Sunday.

FAQ

Can the same GPU box serve client production traffic?

Not without a second box. Mixing dev and prod on one GPU is how you ship a Friday outage. Keep a dev box and a separate prod inference path, even when both run the same model and the same weights.

Why not just use a hosted API and skip the GPU entirely?

For most agencies, that is the right answer for the first six months. The homelab question only kicks in when monthly hosted spend crosses roughly €600, or when client data has to stay inside the EU.

What about the RTX 4090 or a pair of used A6000s?

The 4090 still works, but at current second-hand prices the 5090 wins on VRAM headroom for a small premium. A6000 pairs are great for VRAM-bound RAG work, but they idle hot and they are loud.

How do we share one GPU across eight engineers without queue waits?

Run an OpenAI-compatible endpoint (Ollama or vLLM) behind a small queue, give each engineer their own API key, and log token usage per key. Concurrent requests batch at the model layer.

toolingai agentsarchitectureoperationsstrategy

Building something?

Start a project