Migration

Drupal 7 to Strapi 5: shadow-traffic cutover playbook

Eight weeks, two stacks running side by side, 26,400 IIIF links to keep alive, and a KB harvester that cannot miss a single OAI-PMH update. Here is how the cutover went.

Jacob Molkenboer · Founder · A Brand New Company · 2 Feb 2026 · 9 min

It is March 2026, a Tuesday morning in Haarlem. The hoofdredacteur of a 24-person cultureel-erfgoed publisher opens the collectie-portaal on her laptop. It loads in 4.2 seconds, the same speed it loaded in 2014. Drupal 7's community LTS window closed on 5 January. The PHP 7.0 environment her infrastructure vendor agreed to keep alive expires at the end of May. The KB harvester polls her OAI-PMH endpoint every six hours and has not missed an update in nine years. She has 24 staff, 1,180 onderzoekers with active accounts, and a board meeting in two weeks where she must commit to a number.

That number was eight weeks.

The state of the Drupal 7 portal

The portal was built by two contractors in 2012, handed to a third in 2016, and kept alive by one in-house developer since 2019. By 2026 it carried:

  • 18,400 collectie-objecten as Drupal nodes, each with a custom field carrying a IIIF manifest URL pointing at an external Hyrax-based image server.
  • 26,400 outbound iiif_manifest references in total — some objects carried two or three.
  • A raadpleeg_history table outside the Drupal schema, written by a custom module on every node view by an authenticated onderzoeker. 4.1 million rows.
  • An OAI-PMH endpoint at /oai served by the long-unmaintained oai_pmh contrib module, harvested by the KB every six hours.
  • 312 active editorial users, 1,180 onderzoeker accounts, 90 SAML logins via the regional erfgoed-federatie.

PHP 7.0. MariaDB 10.3. A nine-year-old patched copy of CKEditor. One cron job that had not been touched since 2017.

Why Strapi 5 and Astro, not another monolith

Replacing a 14-year-old Drupal portal with another 14-year-old-shaped monolith would have bought the publisher another decade of the same problem. We split the system on the obvious seam: a headless CMS for editorial work, a static front-end for readers, and a thin Node service for the parts that must stay dynamic (search, raadpleeg-history writes, OAI-PMH).

Strapi 5 because the editorial team needed a Dutch-language admin UI, lifecycle hooks, and component-based content modelling. Astro because the public portal is 95% read, the SEO surface is large (every collectie-object is a landing page), and the team's one in-house developer already knew TypeScript. The dynamic seams ran on a Node/Fastify service we called portaal-edge, deployed alongside Strapi.

The eight-week schedule

  • Week 1: content modelling and export schema lock.
  • Week 2: Drupal → Strapi extraction pipeline, read-only and idempotent.
  • Week 3: Astro shell, IIIF viewer wiring, search.
  • Week 4: raadpleeg-history ingest, SAML, onderzoeker accounts.
  • Week 5: OAI-PMH parity and KB regression.
  • Week 6: shadow traffic at 10%, 25%, 50%.
  • Week 7: shadow traffic at 100% with both stacks live.
  • Week 8: cutover, Drupal frozen, archive snapshot.

The first three weeks are the cheap weeks. Weeks 4 through 7 are where every migration like this gets killed.

26,400 IIIF manifest links

The IIIF manifests live on an external image server the publisher does not control. The links are stable, but the way Drupal stored them was not: some were absolute, some were relative to a base URL held in a Drupal variable and read through variable_get, and a few hundred had double-encoded query strings from a 2018 batch import.

We did the normalisation in the extraction step, not the front-end. One regex and one URL parse per row, with the original Drupal field stored alongside as iiif_manifest_legacy so we could diff later. The normaliser only cared about producing a URL the IIIF Presentation API 3.0 client could resolve without a redirect.

// extractors/iiif.ts
import { URL } from 'node:url'

const BASE = 'https://images.example.nl/iiif/'

export function normaliseManifest(raw: string): string {
  if (!raw) throw new Error('empty manifest')
  // undo one layer of double encoding (%2520 -> %20) and leave correctly
  // single-encoded sequences alone; decodeURIComponent would decode those too
  const decoded = raw.replace(/%25([0-9A-Fa-f]{2})/g, '%$1')
  // the base is ignored when the input is already an absolute URL
  const u = new URL(decoded, BASE)
  // strip Drupal's cache-buster query that nobody asked for
  u.searchParams.delete('_dc')
  return u.toString()
}

Every extracted object went through a verifier that issued a HEAD against the manifest URL and stored the response status next to it. We ran the full set on a Saturday night. 26,338 returned 200, 51 returned 404, 11 redirected. We resolved the 404s by hand with the archivist before week 4 started.
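
A minimal sketch of what such a verifier could look like, assuming Node 18 or later with the global fetch. The file name, function names, and the four buckets are illustrative choices, not the project's actual code. Setting redirect to 'manual' keeps 3xx responses visible in Node instead of following them silently.

```typescript
// verify-manifests.ts (hypothetical file name)
type Verdict = 'ok' | 'redirect' | 'missing' | 'other'

// Map an HTTP status code onto the buckets used in the tally.
function classify(status: number): Verdict {
  if (status === 200) return 'ok'
  if (status >= 300 && status < 400) return 'redirect'
  if (status === 404) return 'missing'
  return 'other'
}

// HEAD a single manifest URL without following redirects.
// Returns 0 when the request fails at the network level.
async function headStatus(url: string): Promise<number> {
  try {
    const res = await fetch(url, { method: 'HEAD', redirect: 'manual' })
    return res.status
  } catch {
    return 0
  }
}

// Count how many stored statuses fall into each bucket.
function tally(statuses: number[]): Record<Verdict, number> {
  const counts: Record<Verdict, number> = { ok: 0, redirect: 0, missing: 0, other: 0 }
  for (const s of statuses) counts[classify(s)] += 1
  return counts
}
```

Storing the raw status next to each URL, rather than only the bucket, keeps the option of re-classifying later without re-running the whole set.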

Per-onderzoeker raadpleeg-history

The raadpleeg-history was the part of the system no one was willing to lose. Some entries go back to 2014. Onderzoekers cite their own raadpleeg-history in academic articles. Dropping rows would have broken trust in a way no UI improvement could patch.

We moved the table verbatim, schema and all, into a Postgres 16 instance behind portaal-edge. The write path moved from a Drupal hook_node_view into a Fastify route called from the Astro client after first paint.

// portaal-edge/routes/raadpleeg.ts
// `app`, `db` and `requireOnderzoeker` are defined elsewhere in portaal-edge
app.post('/raadpleeg', async (req, reply) => {
  const onderzoeker = await requireOnderzoeker(req) // SAML session
  const { object_id } = (req.body ?? {}) as { object_id?: string }
  // refuse an empty body rather than insert a row without an object
  if (!object_id) return reply.code(400).send()
  await db.query(
    `insert into raadpleeg_history
       (onderzoeker_id, object_id, geraadpleegd_op, source)
     values ($1, $2, now(), 'astro-v1')`,
    [onderzoeker.id, object_id]
  )
  return reply.code(204).send()
})

The source column is the trick. Every legacy row carries source = 'drupal-v7'. Every new row carries source = 'astro-v1'. During shadow traffic, both systems wrote in parallel, and we reconciled at the end of each day with a single select count(*) ... group by source, date_trunc('day', geraadpleegd_op). Once the counts matched within a tolerance of 0.2% for three consecutive days, we cut the Drupal writer.
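
The cut-off rule can be sketched as a small check over the rows of that grouped count. This is an illustration under stated assumptions: the rows are already pivoted into one record per day, and "within tolerance" is read as the relative difference measured against the Drupal count. The type and function names are hypothetical.

```typescript
// reconcile.ts (hypothetical): decide whether the Drupal writer can be cut
interface DayCount {
  day: string    // ISO date taken from date_trunc('day', geraadpleegd_op)
  drupal: number // rows with source = 'drupal-v7'
  astro: number  // rows with source = 'astro-v1'
}

// Relative difference between the two writers for one day.
function drift(d: DayCount): number {
  if (d.drupal === 0) return d.astro === 0 ? 0 : Infinity
  return Math.abs(d.drupal - d.astro) / d.drupal
}

// True when the most recent `streak` days all stay within `tolerance`.
function readyToCut(days: DayCount[], tolerance = 0.002, streak = 3): boolean {
  if (days.length < streak) return false
  // ISO dates sort correctly as plain strings
  const sorted = [...days].sort((a, b) => a.day.localeCompare(b.day))
  return sorted.slice(-streak).every((d) => drift(d) <= tolerance)
}
```

Only the trailing window is checked, so one bad day early in shadow traffic does not block the cut once three clean days follow it.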

OAI-PMH feed to the KB

The KB harvester is the immovable object. It does not care about your migration. It expects verb=ListRecords to return Dublin Core XML, paginated by resumptionToken, with a stable identifier per record that has not changed since 2014.

We rebuilt the endpoint inside portaal-edge against the OAI-PMH 2.0 specification directly, generating from Strapi content rather than from a contrib module. Two things mattered: the identifier scheme had to match (oai:erfgoed.example.nl:object:{drupal_nid} — yes, we preserved the Drupal node IDs as a column in Strapi), and the datestamp had to use the original record's changed timestamp, not the migration timestamp.
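
Those two rules are small enough to sketch. The function names are hypothetical, and the sketch assumes the repository answers at seconds granularity. Drupal 7 stores a node's changed value as Unix seconds, and OAI-PMH 2.0 expects datestamps in UTC.

```typescript
// oai-header.ts (hypothetical file name)
const REPO = 'erfgoed.example.nl'

// Identifier scheme from the text: oai:erfgoed.example.nl:object:{drupal_nid}
function oaiIdentifier(drupalNid: number): string {
  return `oai:${REPO}:object:${drupalNid}`
}

// Convert Drupal's Unix-seconds `changed` value to YYYY-MM-DDThh:mm:ssZ.
// toISOString() is always UTC; only the milliseconds need trimming.
function oaiDatestamp(changedUnixSeconds: number): string {
  return new Date(changedUnixSeconds * 1000).toISOString().replace(/\.\d{3}Z$/, 'Z')
}

// Build the <header> element for one record.
function oaiHeader(drupalNid: number, changedUnixSeconds: number): string {
  return [
    '<header>',
    `  <identifier>${oaiIdentifier(drupalNid)}</identifier>`,
    `  <datestamp>${oaiDatestamp(changedUnixSeconds)}</datestamp>`,
    '</header>',
  ].join('\n')
}
```

The migration timestamp never enters the calculation, which is the point: a record untouched since 2014 still reports a 2014 datestamp after the move.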

We ran the new endpoint alongside the old one for a full week. The KB harvested both. We diffed the result sets after every harvest. On day five they matched exactly.
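
A diff of that kind might look like the sketch below. It assumes each harvest has already been reduced to a map from identifier to datestamp. The names are hypothetical, and this is not the tooling the project used.

```typescript
// harvest-diff.ts (hypothetical file name)
type Harvest = Map<string, string> // identifier -> datestamp

interface HarvestDiff {
  missing: string[]   // in the old feed, absent from the new one
  extra: string[]     // in the new feed only
  restamped: string[] // in both feeds, but the datestamp differs
}

function diffHarvests(oldFeed: Harvest, newFeed: Harvest): HarvestDiff {
  const diff: HarvestDiff = { missing: [], extra: [], restamped: [] }
  oldFeed.forEach((stamp, id) => {
    const other = newFeed.get(id)
    if (other === undefined) diff.missing.push(id)
    else if (other !== stamp) diff.restamped.push(id)
  })
  newFeed.forEach((_stamp, id) => {
    if (!oldFeed.has(id)) diff.extra.push(id)
  })
  return diff
}
```

An exact match means all three arrays are empty. A non-empty restamped list is the one to watch, because a harvester reads a changed datestamp as an update.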

Warning

If you rewrite OAI-PMH identifiers during a migration, every downstream harvester (the KB, Europeana, regional aggregators) treats every record as new. You will generate millions of false “updates” and your aggregator relationship will get loud quickly. Keep the identifiers, even if they look ugly.

Shadow-traffic mechanics

Shadow traffic in our setup was a Caddy reverse proxy in front of the public portal, configured to mirror a percentage of GET requests to the new Astro stack and discard the response. The Drupal response was still what the user saw.

erfgoed.example.nl {
    reverse_proxy drupal-app:80

    @shadow {
        method GET
        expression {http.request.uri.path}.matches("^/(collectie|object|zoek)")
    }

    handle @shadow {
        reverse_proxy drupal-app:80
        # fire-and-forget mirror to the new stack
        reverse_proxy /__mirror astro-app:3000 {
            lb_policy first
            health_uri /healthz
        }
    }
}

The mirror request carried the original request ID in an X-Shadow-Request header. The Astro stack logged its response code, latency, and rendered byte count. We compared these against the Drupal logs nightly. By the end of week 6, the new stack was within 8% of Drupal's p95 latency on cached pages and 40% faster on cold ones.
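
The nightly comparison could be sketched as follows, assuming both log sets have been parsed into records keyed by the X-Shadow-Request value. The record shape, the function names, and the nearest-rank percentile are illustrative assumptions.

```typescript
// shadow-compare.ts (hypothetical file name)
interface LogLine {
  requestId: string // value of the X-Shadow-Request header
  status: number
  latencyMs: number
}

// Nearest-rank percentile; p is in (0, 100].
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('no samples')
  const sorted = [...values].sort((a, b) => a - b)
  const rank = Math.ceil((p * sorted.length) / 100)
  return sorted[Math.max(rank, 1) - 1]
}

// Pair the two stacks' log lines by request ID, then compare p95 latency.
// Throws (via percentile) when no request appears in both logs.
function compareShadow(drupal: LogLine[], astro: LogLine[]) {
  const byId = new Map<string, LogLine>()
  drupal.forEach((line) => byId.set(line.requestId, line))
  const drupalMs: number[] = []
  const astroMs: number[] = []
  let statusMismatches = 0
  for (const line of astro) {
    const original = byId.get(line.requestId)
    if (!original) continue // mirrored request with no matching Drupal line
    drupalMs.push(original.latencyMs)
    astroMs.push(line.latencyMs)
    if (original.status !== line.status) statusMismatches += 1
  }
  const drupalP95 = percentile(drupalMs, 95)
  const astroP95 = percentile(astroMs, 95)
  return { pairs: astroMs.length, statusMismatches, drupalP95, astroP95, ratio: astroP95 / drupalP95 }
}
```

Comparing only paired requests keeps the two distributions honest: a request that was never mirrored cannot skew one side.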

Cutover day

Week 8, Tuesday, 06:00 CET. Editorial users were briefed for a 90-minute write freeze. We pointed the DNS A record at the Astro stack, flipped Caddy from “mirror” mode to “primary new, mirror old,” and watched the access logs.

The first thing that broke was a single bookmarklet a curator had been using since 2015 that hit a legacy /node/edit/{id} URL. We added a 301 from /node/edit/* to the Strapi admin equivalent. Twelve minutes.

The KB harvester ran at 12:00 CET as it always does. It harvested the new endpoint, found no diff, and went back to sleep. That was the moment the project was actually finished.

The seams, not the stack

When we ran this cutover, the constraint that kept biting was scope creep dressed up as “while we are already here.” Every verbatim move bought us a day; every improvement cost three. The final scope was almost embarrassingly conservative, and that is why the eight weeks held. If you are staring at a Drupal 7 or PHP 7 estate of your own, start with the seams, not the stack — that is the part we lean into when we take on a legacy migration.

The cheapest audit you can run today: list every external URL in your current CMS — IIIF manifests, image servers, embed sources, OAI identifiers — fire HEAD requests at all of them, and group the response codes. The 404s are your real migration scope. It takes an afternoon and changes every estimate.
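
The first step of that audit, collecting and de-duplicating the URLs, might look like the sketch below. It assumes the field values have already been exported as strings. The regex is deliberately rough, and the function names are hypothetical.

```typescript
// audit-urls.ts (hypothetical file name)
const URL_PATTERN = /https?:\/\/[^\s"'<>)]+/g

// Pull every absolute http(s) URL out of a set of text or HTML field values.
function extractUrls(fields: string[]): string[] {
  const seen = new Set<string>()
  for (const field of fields) {
    for (const match of field.match(URL_PATTERN) ?? []) {
      // drop sentence punctuation that the pattern swallowed
      seen.add(match.replace(/[.,;:!?]+$/, ''))
    }
  }
  return Array.from(seen).sort()
}

// Count unique URLs per hostname, to show which external services you depend on.
function countByHost(urls: string[]): Record<string, number> {
  const counts: Record<string, number> = {}
  for (const u of urls) {
    let host: string
    try {
      host = new URL(u).hostname
    } catch {
      continue // skip anything the URL parser rejects
    }
    counts[host] = (counts[host] ?? 0) + 1
  }
  return counts
}
```

The per-host count is a useful sanity check before any requests go out: a hostname nobody recognises is a finding in itself.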

Key takeaway

Every verbatim move bought us a day; every improvement cost three. Conservative scope is what kept the eight-week cutover honest.

FAQ

Why not just upgrade Drupal 7 to Drupal 10?

Because the publisher's PHP environment, contrib modules, and custom code were all carrying nine years of debt. A side-by-side rebuild on a clean stack was cheaper than the contrib audit alone, and decoupled editorial from delivery.

How did you keep the OAI-PMH identifiers stable?

We preserved the original Drupal node IDs as a non-nullable column in the Strapi content model and reused them in the identifier scheme. The KB never saw a record disappear or appear under a new identifier.

What happens to the old Drupal site after cutover?

We froze it read-only, archived a full database snapshot and a wget mirror, and kept the container running for 90 days behind basic auth. After that, the snapshot is the record. The container is gone.

Did you lose any raadpleeg-history rows?

Zero. The legacy table was moved verbatim into Postgres with the original schema. New writes used a different source tag so reconciliation between the two systems during shadow traffic was a single grouped count.

Tags: migration, drupal, php, mysql, legacy sites, architecture
