Replacing a custom PHP CRM: a Next.js strangler playbook

The recruiters had given up on the CRM and were running active placements out of a Google Sheet. Here is how we replaced the underlying PHP stack without a freeze week.

Jacob Molkenboer · Founder · A Brand New Company · 1 Jul 2024 · 9 min

The CRM at the Tilburg office had been written in 2018 by a freelancer who moved to Berlin in 2020. By the time we walked in, the recruiters were running active placements out of a shared Google Sheet, because the candidate filter in the PHP app had stopped returning results two months earlier and nobody had figured out why. The director of operations had stopped pretending. The CRM was still there. Nobody used it for anything that mattered.

This is the playbook for replacing it.

What we found inside the box

Before we plan a replacement, we plug into the patient. Two weeks of read-only access, a staging copy of the database, and a notepad. No commitments to the client beyond a written assessment at the end.

The stack we walked into:

PHP 7.2 on a single Hetzner VPS. End of life since November 2020.
MySQL 5.7 with about 80,000 candidate records, 230,000 application rows, and 11 million rows of log noise.
A custom MVC framework with route definitions split across three files.
A 4,800-line jQuery file in /public/assets/js/main.js.
File uploads written straight to /var/www/uploads, which also happened to be where the nightly database backup was dumped.
Eleven cron jobs. Six had been throwing errors silently for nine months. Three were quietly load-bearing.

One person remembered the schema: a senior recruiter named Sebas who had paired with the freelancer for two weeks in 2019. He drew us a map on a napkin. The napkin turned out to be more accurate than the database documentation.

Why we did not write it from scratch

The "rewrite in a cave" plan tempts every team on every legacy project. A clean Next.js app, a fresh Postgres schema, half the lines of code, modern conventions. Twelve weeks of design freedom, a weekend cutover, done.

We did not do it. The CRM has 47 unique screens. Recruiters use 12 of them daily, eight weekly, and the other 27 are rare but load-bearing: compliance reports, the year-end placement summary, the export for the accountant. Rebuilding all 47 in a cave means three months of no visible progress, then a weekend cutover, then a panic week, then a quarter of bug triage with a team that has already lost trust in the new system.

So we picked the strangler fig pattern and committed to shipping something usable every Friday.

The strangler shape, in three pieces

Three components were standing by the end of week one:

A Next.js 15 app deployed at app.[client].nl, behind the same OIDC provider as the PHP app. We put Authentik in front of both so a recruiter logs in once and lands wherever their session is going.
A Caddy reverse proxy that routes specific paths to Next.js once that screen ships, and everything else to the old PHP app. Adding a route is a single Caddy reload.
A thin Postgres database alongside MySQL, kept in sync by Debezium streaming MySQL's binlog into a small consumer that writes into Postgres.
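The consumer's core job can be sketched as a pure function that turns one change event into an idempotent Postgres statement. This is a minimal sketch, not the code from the project: the op, before, after and source.table fields follow Debezium's change-event envelope, while the flattened event shape, the single "id" primary key and the function names are assumptions made for illustration.

```typescript
// Simplified change event. op: "c" = create, "u" = update, "d" = delete,
// "r" = snapshot read. The flattened shape is an assumption for this sketch.
type ChangeEvent = {
  op: "c" | "u" | "d" | "r";
  before: Record<string, unknown> | null;
  after: Record<string, unknown> | null;
  source: { table: string };
};

type Statement = { sql: string; params: unknown[] };

// Quote an identifier for Postgres, doubling any embedded double quotes.
function quoteIdent(ident: string): string {
  return `"${ident.replace(/"/g, '""')}"`;
}

// Translate one event into a statement that is safe to replay.
// Assumes every table has a single primary-key column named "id".
function toStatement(ev: ChangeEvent): Statement {
  const table = quoteIdent(ev.source.table);
  if (ev.op === "d") {
    if (!ev.before) throw new Error("delete event without a before image");
    return { sql: `DELETE FROM ${table} WHERE "id" = $1`, params: [ev.before["id"]] };
  }
  const after = ev.after;
  if (!after) throw new Error(`${ev.op} event without an after image`);
  const cols = Object.keys(after);
  const placeholders = cols.map((_c, i) => `$${i + 1}`).join(", ");
  const updates = cols
    .filter((c) => c !== "id")
    .map((c) => `${quoteIdent(c)} = EXCLUDED.${quoteIdent(c)}`)
    .join(", ");
  // Upsert, so that replaying the same event twice leaves the row unchanged.
  const onConflict = updates === "" ? "DO NOTHING" : `DO UPDATE SET ${updates}`;
  const sql =
    `INSERT INTO ${table} (${cols.map(quoteIdent).join(", ")}) ` +
    `VALUES (${placeholders}) ON CONFLICT ("id") ${onConflict}`;
  return { sql, params: cols.map((c) => after[c]) };
}
```

Writing every create, update and snapshot read as an upsert means the consumer can restart from an older binlog offset without producing duplicates.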

The Caddy config is more boring than it sounds:

app.client.nl {
  # Screens that have already shipped in Next.js
  @next path /candidates* /vacancies* /inbox*
  reverse_proxy @next next:3000

  # Everything else still goes to the legacy PHP app
  reverse_proxy php:80
}

That config grew over ten weeks. By week nine it had eight path matchers and the PHP fallback was almost never hit.
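One way to keep a growing config like this reviewable is to generate it from a plain list of migrated path prefixes. The helper below is a hypothetical sketch, not something the project is known to have used; the host and upstream names are copied from the config above.

```typescript
// Render a Caddyfile from the path prefixes that have already moved to Next.js.
// renderCaddyfile is a hypothetical helper; upstream names match the config above.
function renderCaddyfile(host: string, migrated: string[]): string {
  const prefixes: string[] = [];
  for (const p of migrated) {
    if (!p.startsWith("/")) throw new Error(`path prefix must start with "/": ${p}`);
    if (prefixes.indexOf(p) === -1) prefixes.push(p); // drop duplicates, keep order
  }
  const lines = [`${host} {`];
  if (prefixes.length > 0) {
    lines.push(`  @next path ${prefixes.map((p) => `${p}*`).join(" ")}`);
    lines.push("  reverse_proxy @next next:3000");
    lines.push("");
  }
  // Anything without a migrated prefix falls through to the legacy app.
  lines.push("  reverse_proxy php:80");
  lines.push("}");
  return lines.join("\n");
}

console.log(renderCaddyfile("app.client.nl", ["/candidates", "/vacancies", "/inbox"]));
```

With the list in version control, shipping a screen becomes a one-line diff followed by the Caddy reload.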

Reads then writes, never both at once

The hard part of any strangler is the cutover of writes. Reads are easy. You can read from either side, compare, move on. Writes are where you find out what you missed.
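The "read from either side, compare" step can be sketched as a keyed diff over two result sets. This is an illustrative sketch under assumptions: the rows are already fetched into memory, the key column is unique, and the type and function names are ours. Values are compared as strings so that driver-level differences such as 1 versus "1" do not register as mismatches.

```typescript
type Row = Record<string, unknown>;
type RowDiff = { missingInTarget: string[]; extraInTarget: string[]; mismatched: string[] };

// Reduce a value to a comparable string; NULL and undefined compare equal.
function normalise(v: unknown): string {
  if (v === null || v === undefined) return "NULL";
  if (v instanceof Date) return v.toISOString();
  return String(v);
}

// Two rows match when every column present on either side normalises equally.
function sameRow(a: Row, b: Row): boolean {
  const cols = Object.keys(a).concat(Object.keys(b));
  return cols.every((c) => normalise(a[c]) === normalise(b[c]));
}

function indexBy(rows: Row[], key: string): Map<string, Row> {
  const out = new Map<string, Row>();
  for (const r of rows) out.set(String(r[key]), r);
  return out;
}

// Compare rows read from the source (MySQL) with rows read from the target (Postgres).
function diffRows(source: Row[], target: Row[], key: string): RowDiff {
  const src = indexBy(source, key);
  const dst = indexBy(target, key);
  const diff: RowDiff = { missingInTarget: [], extraInTarget: [], mismatched: [] };
  src.forEach((row, id) => {
    const other = dst.get(id);
    if (other === undefined) diff.missingInTarget.push(id);
    else if (!sameRow(row, other)) diff.mismatched.push(id);
  });
  dst.forEach((_row, id) => {
    if (!src.has(id)) diff.extraInTarget.push(id);
  });
  return diff;
}
```

An empty diff on a sample of recent rows is a reasonable signal that reads for that table are safe to move.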

We ran every table through three modes:

Mode A. PHP writes to MySQL. Debezium replicates to Postgres. Next.js reads from Postgres only.
Mode B. PHP and Next.js both write to MySQL. Debezium still replicates. Next.js reads from Postgres, PHP reads from MySQL, the two stay convergent because nobody writes to Postgres directly.
Mode C. Next.js writes to Postgres. A reverse stream pushes Postgres changes back to MySQL until the PHP screens for that table are decommissioned.
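The three modes fit in a small lookup table, which makes it harder for a screen to write to the wrong store. The sketch below is an assumption-laden summary rather than project code: the type names are invented, Next.js is assumed to keep reading from Postgres in Mode C, and tables are assumed to advance one mode at a time.

```typescript
type Mode = "A" | "B" | "C";
type Store = "mysql" | "postgres";

type Plan = {
  phpWrites: Store | null;  // null once PHP is read-only for the table
  nextWrites: Store | null; // null while Next.js is read-only for the table
  nextReads: Store;
  replication: "mysql->postgres" | "postgres->mysql";
};

// Per-table routing for each mode, as described in the list above.
const PLANS: Record<Mode, Plan> = {
  A: { phpWrites: "mysql", nextWrites: null, nextReads: "postgres", replication: "mysql->postgres" },
  B: { phpWrites: "mysql", nextWrites: "mysql", nextReads: "postgres", replication: "mysql->postgres" },
  C: { phpWrites: null, nextWrites: "postgres", nextReads: "postgres", replication: "postgres->mysql" },
};

// Assumption: a table only ever moves forward, one mode at a time.
function nextMode(mode: Mode): Mode | null {
  return mode === "A" ? "B" : mode === "B" ? "C" : null;
}

// Where should a Next.js write for this table go? Throws while the table is in Mode A.
function nextWriteTarget(mode: Mode): Store {
  const target = PLANS[mode].nextWrites;
  if (target === null) throw new Error(`Next.js is read-only in Mode ${mode}`);
  return target;
}
```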

The candidates table moved through A and C in nine days. vacancies took three weeks because of a foreign key to a freelance_status table that nobody could explain. It turned out the freelancer had been mid-refactor when he left for Berlin. The refactor was now ours.

For the initial historical sync we used pgloader, which handled the bulk load in about forty minutes. We did not migrate the 11-million-row log table. Nobody had queried it since 2021, and dragging it across would have inflated the new schema by 60% for data the recruiters did not know existed.

Warning

Audit your cron jobs before you migrate a single screen. The Friday-evening "expire stale leads" job in the old CRM was silently keeping the Monday pipeline view sane. We discovered this in week six, when the new Next.js pipeline view looked correct for the first time in three years and a recruiter assumed it was broken.

A thin agent layer, not a thick one

The client asked about AI on day one. Every recruitment agency in 2026 has been pitched a candidate-matching engine by someone wearing a half-zip. The temptation is to overbuild.

We put four agents into production, each with under 200 lines of orchestration code. The shape: a Postgres tools table, a Postgres agent_runs table, a Postgres function that materialises the tools an agent can call, and a small Next.js API route that runs the loop.

CV intake. Takes a PDF or DOCX upload, returns structured candidate data (name, email, current role, last three jobs, skills) and writes a row to candidate_drafts. A recruiter still confirms before merge. We do not auto-create candidates.
Match. Takes a vacancy_id, returns ranked candidates with a one-sentence reason each. Reasons are stored so a shortlist can be audited a month later when somebody asks how it got there.
Outreach drafter. Takes a candidate and a vacancy, returns a draft email in the recruiter's voice. Each recruiter has a small fine-tune set built from their own last 600 sent messages, filtered for the ones that got replies.
Inbox triage. Pulls from the shared inbox, classifies replies (interested, pass, out-of-office, question), drops them into the CRM with a tag. No auto-replies. A human still answers.

Total agent infrastructure: about 1,800 lines of TypeScript, one Postgres schema, no Kubernetes, no agent framework. We learned the hard way on previous builds that the heavy abstractions cost more than they save when the loop is this small.
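A loop of that size can be sketched in a few dozen lines. The version below is a simplified illustration, not the production code: the tool registry and the run record live in memory instead of the Postgres tools and agent_runs tables, the model is an injected function, and the loop is synchronous for brevity where a real model call would be awaited. All type and function names are hypothetical.

```typescript
// One step of model output: either call a tool or finish with an answer.
type ModelStep =
  | { kind: "tool"; tool: string; args: Record<string, unknown> }
  | { kind: "final"; answer: string };

type Tool = (args: Record<string, unknown>) => unknown;

// In-memory stand-in for one agent_runs row: every tool call is kept for audit.
type AgentRun = {
  agent: string;
  steps: { tool: string; args: Record<string, unknown>; result: unknown }[];
  answer: string | null;
};

// The model is injected so the loop can be exercised without a network call.
type Model = (history: AgentRun["steps"]) => ModelStep;

function runAgent(agent: string, allowed: Map<string, Tool>, model: Model, maxSteps = 8): AgentRun {
  const run: AgentRun = { agent, steps: [], answer: null };
  for (let i = 0; i < maxSteps; i++) {
    const step = model(run.steps);
    if (step.kind === "final") {
      run.answer = step.answer;
      return run;
    }
    // Only tools materialised for this agent may be called.
    const tool = allowed.get(step.tool);
    if (tool === undefined) throw new Error(`agent ${agent} may not call ${step.tool}`);
    run.steps.push({ tool: step.tool, args: step.args, result: tool(step.args) });
  }
  return run; // step budget exhausted: answer stays null
}
```

Keeping the allow-list and the step budget inside the loop is what lets the orchestration stay small: an agent cannot reach a tool it was not given, and it cannot spin forever.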

Cutover, week by week

The full timeline, condensed:

Week 1. SSO in front of both apps. Postgres standing. Debezium pulling.
Week 2. First Next.js screen: candidate search. Mode A. Recruiters search in the new UI, edit in the old one.
Week 3. Candidate detail and CV upload. Intake agent live in read-only mode.
Week 4. Vacancy screens. Mode B on the candidates table. Both apps now write.
Week 5. Match agent live. Recruiters shortlist in Next.js, confirm interviews in PHP.
Week 6. Outreach drafter live. Email send still routed through Mailgun via the PHP cron. We just changed what it sent.
Weeks 7 and 8. The 27 rare screens. We rebuilt 19 in Next.js. The other 8 turned out not to need a UI at all and became Markdown reports generated on a Postgres cron.
Week 9. Mode C on candidates. PHP went read-only for that table.
Week 10. Mode C on everything except the accountant export, which we left in PHP until year-end to avoid changing the file format mid-quarter.
Week 12. The old VPS turned off. We kept a tarball.

Ten weeks of overlap, two weeks of cleanup. No freeze week. No Saturday cutover. The recruiters' day-to-day kept moving the entire time.

Three things we would do differently

Audit the cron jobs first. Six of the eleven legacy crons were silently failing and three were silently propping up the pipeline view. We should have caught this in week one, not week six. Run grep -rv '^#' /etc/crontab /etc/cron.d /var/spool/cron on the old VPS to list every non-comment crontab line before you write a single line of Next.js.
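Once the crontab text is in hand, a small parser turns it into an inventory you can review with the client. This is a rough sketch with hypothetical names: it handles the common five-field and @shortcut schedules, assumes the /etc/cron.d format carries a user column, and flags jobs whose output goes to /dev/null, since those are the ones most likely to be failing quietly.

```typescript
type CronEntry = { schedule: string; user: string | null; command: string; silenced: boolean };

// Parse crontab text, skipping blanks, comments and environment assignments.
// withUser = true for /etc/crontab and /etc/cron.d files, which carry a user column.
// Runs of whitespace inside the command are collapsed to single spaces.
function parseCrontab(text: string, withUser: boolean): CronEntry[] {
  const entries: CronEntry[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    if (/^[A-Za-z_][A-Za-z0-9_]*\s*=/.test(line)) continue; // e.g. MAILTO=...
    const fields = line.split(/\s+/);
    const timeFields = line.startsWith("@") ? 1 : 5; // @daily, @reboot, ...
    const skip = timeFields + (withUser ? 1 : 0);
    if (fields.length <= skip) continue; // malformed line: no command
    const command = fields.slice(skip).join(" ");
    entries.push({
      schedule: fields.slice(0, timeFields).join(" "),
      user: withUser ? fields[timeFields] : null,
      command,
      silenced: /\/dev\/null/.test(command), // output discarded, errors invisible
    });
  }
  return entries;
}
```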

Treat the upload directory as untrusted. /var/www/uploads contained CVs from 2018 through 2025. We assumed they were all PDFs. About 4% were not, including a stack of .pages files, two .exe files that were lying about their type, and a folder of someone's holiday photos uploaded by mistake. The intake agent had opinions. The OWASP file upload cheat sheet is the right starting point for the new pipeline.
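A first line of defence is to check a file's leading bytes against its claimed extension. The sketch below uses well-known file signatures; the function names and the accept-list of PDF and DOCX are assumptions for illustration. Signature sniffing is only a first filter: a DOCX is a ZIP archive, so a match on the ZIP signature still needs a look inside the archive.

```typescript
type Sniffed = "pdf" | "zip" | "ole2" | "exe" | "jpeg" | "png" | "unknown";

// Leading-byte signatures for the file types we expect to meet in an upload folder.
const SIGNATURES: { kind: Sniffed; bytes: number[] }[] = [
  { kind: "pdf", bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
  { kind: "zip", bytes: [0x50, 0x4b, 0x03, 0x04] }, // "PK", used by DOCX and other zip-based formats
  { kind: "ole2", bytes: [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1] }, // legacy .doc
  { kind: "exe", bytes: [0x4d, 0x5a] }, // "MZ", Windows executables
  { kind: "jpeg", bytes: [0xff, 0xd8, 0xff] },
  { kind: "png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
];

// Identify a file from its first bytes, ignoring whatever its name claims.
function sniff(head: Uint8Array): Sniffed {
  for (const sig of SIGNATURES) {
    if (head.length >= sig.bytes.length && sig.bytes.every((b, i) => head[i] === b)) {
      return sig.kind;
    }
  }
  return "unknown";
}

// Accept an upload only when its extension and its leading bytes agree.
function isAcceptableCv(filename: string, head: Uint8Array): boolean {
  const ext = filename.toLowerCase().split(".").pop();
  const kind = sniff(head);
  if (ext === "pdf") return kind === "pdf";
  if (ext === "docx") return kind === "zip"; // still needs inspection of the archive contents
  return false;
}
```

An .exe renamed to cv.pdf fails this check because its first two bytes are "MZ", not "%PDF-".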

Bake longer in Mode B. We moved vacancies from Mode B to Mode C after four days. A bug in the Next.js validation layer wrote three vacancies with NULL company_id and PHP swallowed them on read. We caught it in 48 hours but it was avoidable. Two weeks in Mode B is our new minimum for any table that two apps write to.
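Both halves of that lesson can be expressed as small guards: reject a vacancy without a valid company_id before it reaches the database, and refuse to promote a table until it has spent the minimum time in Mode B. The sketch below is illustrative; the field names beyond company_id, the function names and the dates in any usage are assumptions.

```typescript
type Vacancy = { title: string; company_id: number };
type Validation = { ok: true; value: Vacancy } | { ok: false; errors: string[] };

// Validate untrusted form input before any write. A NULL company_id is an error,
// not a value to pass through.
function validateVacancy(input: { title?: unknown; company_id?: unknown }): Validation {
  const errors: string[] = [];
  const { title, company_id } = input;
  if (typeof title !== "string" || title.trim() === "") errors.push("title is required");
  if (typeof company_id !== "number" || !Number.isInteger(company_id) || company_id <= 0) {
    errors.push("company_id must be a positive integer");
  }
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, value: { title: (title as string).trim(), company_id: company_id as number } };
}

const DAY_MS = 24 * 60 * 60 * 1000;

// A table may move to Mode C only after at least minDays in Mode B.
function readyForModeC(enteredModeB: Date, now: Date, minDays = 14): boolean {
  return now.getTime() - enteredModeB.getTime() >= minDays * DAY_MS;
}
```

A NOT NULL constraint on company_id in Postgres would be a sensible second layer, so the database refuses what the validation layer misses.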

The diff that mattered

The CRM was not the problem. It was the symptom. The recruiters were burning about four hours a week each on tasks the CRM should have absorbed: re-entering CV data, drafting cold outreach, tagging inbox replies, building candidate longlists from scratch when the search filter broke.

Twenty recruiters. Four hours a week. Fifty working weeks. Four thousand hours a year, returned to the people whose job it is to actually place candidates.

When we built the agent layer for the Tilburg client, we found that recruiters did not want a chatbot. They wanted their CRM to stop fighting them. We solved it by keeping the UI human-driven and putting the AI agents on the boring tasks, not the creative ones. That is the shape of most useful agents we ship now.

One small thing you could do today, if you have a legacy system you keep meaning to replace: open the error log. Find the oldest unique exception still firing today. Ask why nobody has fixed it. The answer is usually the shape of the rewrite you are avoiding.
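If the log is too large to read by eye, the search can be sketched as a grouping problem: collapse each message to a signature, track first and last sightings, and pick the oldest signature that is still firing. The log entry shape, the one-day "still firing" window and the function names below are assumptions; real logs will need their own parsing step first.

```typescript
type LogEntry = { at: Date; message: string };
type ExceptionStats = { signature: string; first: Date; last: Date; count: number };

// Collapse volatile details so repeats of the same exception share one signature.
function signatureOf(message: string): string {
  return message
    .replace(/0x[0-9a-f]+/gi, "<hex>")
    .replace(/'[^']*'|"[^"]*"/g, "<str>")
    .replace(/\d+/g, "<n>");
}

// Return the exception with the earliest first sighting among those seen
// within the last windowDays, or null when nothing is currently firing.
function oldestStillFiring(entries: LogEntry[], now: Date, windowDays = 1): ExceptionStats | null {
  const dayMs = 24 * 60 * 60 * 1000;
  const seen = new Map<string, ExceptionStats>();
  for (const e of entries) {
    const sig = signatureOf(e.message);
    const s = seen.get(sig);
    if (s === undefined) {
      seen.set(sig, { signature: sig, first: e.at, last: e.at, count: 1 });
    } else {
      if (e.at.getTime() < s.first.getTime()) s.first = e.at;
      if (e.at.getTime() > s.last.getTime()) s.last = e.at;
      s.count += 1;
    }
  }
  const firing = Array.from(seen.values()).filter(
    (s) => now.getTime() - s.last.getTime() <= windowDays * dayMs,
  );
  firing.sort((a, b) => a.first.getTime() - b.first.getTime());
  return firing.length > 0 ? firing[0] : null;
}
```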

Key takeaway

The strangler beats the rewrite because you can ship one screen on Friday and ask the recruiters what they think on Monday.

FAQ

How long does a strangler migration take for a CRM of this size?

Ten weeks of parallel operation plus two weeks of cleanup, for 47 screens and around 80,000 candidates. Most of the time goes into Mode B bake-in and the long tail of rare screens, not new code.

Why move from MySQL to Postgres rather than staying on MySQL?

We wanted JSONB for the agent_runs table, logical replication primitives, and row-level security for multi-tenant work later. MySQL would have worked, but we would have rebuilt those pieces ourselves.

Did the AI agents replace any recruiters?

No. The agents took on the work recruiters disliked: parsing CVs, drafting first emails, tagging inbox replies. Confirmation, judgement, and the actual candidate relationships still sit with humans.

What is the riskiest step of a strangler migration?

Moving a table from Mode B (both apps write to MySQL) to Mode C (only Next.js writes, to Postgres). A validation bug can quietly write bad rows that the legacy app reads as missing data. Bake for at least two weeks before flipping.
