Drupal 7 to Astro and Directus: a 4,600-URL migration

The records officer audits the Archiefwet retention metadata every quarter. Drupal 7 was past end of life. We had ten weeks, 4,600 indexed URLs, and zero room to lose a single bewaartermijn field.

Jacob Molkenboer· Founder · A Brand New Company· 12 Jun 2026· 9 min

The records officer at our client's Den Haag office runs the same query every three months. She opens the publications portal, picks ten random reports from the last quarter, and checks each one for a complete Archiefwet metadata block. Bewaartermijn. Vernietigingsdatum. Dossier-id. Classificatie. If any field is empty, the audit fails, and she has to escalate. When we took the project on, the portal was built on Drupal 7.

Drupal 7 reached end of life on 5 January 2025. No more core security updates. No more commitments from the Drupal Security Team. Our client, a 22-person municipal consultancy advising on housing and social policy, had been running their publications portal on it since 2014. They publish around 80 reports a year. Other municipal sites, journalists, and a handful of academic researchers link to roughly 4,600 of those URLs. None of those links can break.

The brief in one paragraph

Move off Drupal 7. Preserve every public URL. Preserve every retention field on every publication, because the records officer audits them every quarter and her audit feeds the Archiefwet compliance report to the municipality. Give her a flow she recognises. Do it in ten weeks. Do it without asking the editors to learn a new vocabulary.

Why we picked Astro and Directus

The obvious answer was Drupal 10. We ruled it out in the first week. The client's content model is essentially one content type ('publicatie') with a handful of taxonomy terms. Apart from a single Views Bulk Operations export, they were not using Views, Panels, or any of the things that make Drupal earn its complexity. Editing in D7 had been a source of weekly support tickets for years. A D10 upgrade would have kept that pain and added two months of module-by-module triage.

So we split the system in two. A static frontend on Astro, because the portal updates a few times a week and has no logged-in users on the public side. A headless CMS on Directus, because it runs on Postgres, the admin UI looks and behaves like a spreadsheet, and we could model the Archiefwet retention fields as real, typed columns instead of a Drupal Field Collection.

Inventorying the 4,600 URLs before touching code

Before writing a line of migration code, we built three independent lists of the URLs we needed to preserve. They disagreed with each other, which is exactly why we built three.

  1. A full crawl of the live site with wget --mirror, filtered to HTML responses.
  2. The Drupal sitemap.xml as the XML Sitemap module emitted it.
  3. A dump of the node and url_alias tables straight from MySQL.

drush sqlq "SELECT n.nid, n.title, n.type, n.status, n.created, ua.alias \
FROM node n LEFT JOIN url_alias ua ON ua.source = CONCAT('node/', n.nid) \
WHERE n.type = 'publicatie' ORDER BY n.created DESC" \
  > publications.tsv

The disagreements were revealing. The crawl found 4,612 URLs. The sitemap had 4,587. The database had 4,654 publicatie nodes, published or not. The gap was the usual suspects: nodes published with no menu entry, aliased URLs pointing at unpublished revisions, and a stretch of 41 reports from 2017 that an old editor had marked 'promoted to front page' but never actually published. We resolved each discrepancy with the editor-in-chief before we started building. That conversation took an afternoon and saved us a week of post-launch firefighting.
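The comparison itself is a set difference across the three sources. A minimal sketch, assuming each source has already been reduced to an array of URLs or paths; `normalizePath` and `reconcile` are hypothetical names for illustration, not the tooling we actually ran:

```typescript
// Hypothetical helper: strip origin, query string, fragment and trailing slash
// so the same page compares equal across all three sources.
function normalizePath(raw: string): string {
  const path = raw.replace(/^https?:\/\/[^/]+/i, "").split(/[?#]/)[0];
  const trimmed = path.replace(/\/+$/, "");
  return trimmed === "" ? "/" : trimmed;
}

// For every path seen in any source, list the sources that do NOT contain it.
// Paths present in every source are left out of the result.
function reconcile(sources: Record<string, string[]>): Map<string, string[]> {
  const names = Object.keys(sources);
  const sets = names.map((name) => new Set(sources[name].map(normalizePath)));

  const all = new Set<string>();
  sets.forEach((set) => set.forEach((path) => all.add(path)));

  const gaps = new Map<string, string[]>();
  all.forEach((path) => {
    const absent = names.filter((_, i) => !sets[i].has(path));
    if (absent.length > 0) gaps.set(path, absent);
  });
  return gaps;
}

// Illustrative data only, not the client's URLs.
const gaps = reconcile({
  crawl: ["https://example.org/publicaties/a/", "/publicaties/b"],
  sitemap: ["/publicaties/a"],
  database: ["/publicaties/a", "/publicaties/b", "/publicaties/c"],
});

gaps.forEach((absent, path) => console.log(path, "missing from", absent.join(", ")));
```

Each line of output is one question for the editor-in-chief: should this URL exist on the new site or not?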

Archiefwet retention as a first-class schema

In Drupal 7 the retention metadata lived in a Field Collection attached to the publication node. That meant five extra database tables, a quirky save flow, and no way to enforce that any of the fields were filled in. The records officer's quarterly audit existed precisely because the system could not be trusted to enforce its own rules.

In Directus we modelled the retention block as columns on the publications collection:

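ALTER TABLE publications
  ADD COLUMN retention_years   integer       NOT NULL,
  -- A generated column needs an immutable expression. make_interval() is
  -- immutable; (retention_years || ' years')::interval is not, and Postgres
  -- rejects it. Assumes published_at is a date or a timestamp without time zone.
  ADD COLUMN destruction_date  date          GENERATED ALWAYS AS
    ((published_at::date + make_interval(years => retention_years))::date) STORED,
  ADD COLUMN dossier_id        varchar(32)   NOT NULL,
  ADD COLUMN classification    varchar(16)   NOT NULL
    CHECK (classification IN ('openbaar','intern','vertrouwelijk')),
  ADD COLUMN legal_basis       text          NOT NULL,
  ADD COLUMN last_audited_at   timestamptz;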

Every field NOT NULL. The destruction date generated, not entered, so it cannot drift away from the publication date. The classification constrained to three values, in Dutch, because that is the vocabulary the records officer uses in her audit checklist. The audit timestamp updated by a Directus Flow when she ticks the 'audited' checkbox in the admin.
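The database is the final gatekeeper, but it rejects one row at a time. A pre-flight check that mirrors the same rules can report every failing record in one pass before an import. The sketch below is an illustration under assumptions: `validateRetention` and `destructionDate` are hypothetical helpers, the check treats empty strings as missing (stricter than NOT NULL), and the date arithmetic imitates Postgres by clamping 29 February to the 28th in non-leap years.

```typescript
const CLASSIFICATIONS = ["openbaar", "intern", "vertrouwelijk"];

interface RetentionInput {
  retention_years: number | null;
  dossier_id: string | null;
  classification: string | null;
  legal_basis: string | null;
}

// Mirrors the NOT NULL, CHECK and varchar(32) rules; one message per violation.
function validateRetention(row: RetentionInput): string[] {
  const errors: string[] = [];
  if (row.retention_years === null || !Number.isInteger(row.retention_years)) {
    errors.push("retention_years must be an integer");
  }
  if (!row.dossier_id) errors.push("dossier_id is required");
  else if (row.dossier_id.length > 32) errors.push("dossier_id exceeds 32 characters");
  if (row.classification === null || CLASSIFICATIONS.indexOf(row.classification) === -1) {
    errors.push("classification must be openbaar, intern or vertrouwelijk");
  }
  if (!row.legal_basis) errors.push("legal_basis is required");
  return errors;
}

// Publication date (YYYY-MM-DD, or an ISO timestamp whose first ten characters
// are the date) plus N years, clamped to the last day of the target month.
function destructionDate(publishedAt: string, retentionYears: number): string {
  const [year, month, day] = publishedAt.slice(0, 10).split("-").map(Number);
  const targetYear = year + retentionYears;
  // Day 0 of the following month is the last day of the target month.
  const lastDay = new Date(Date.UTC(targetYear, month, 0)).getUTCDate();
  const pad = (n: number) => (n < 10 ? "0" : "") + n;
  return `${targetYear}-${pad(month)}-${pad(Math.min(day, lastDay))}`;
}

// Illustrative record: missing bewaartermijn and an unknown classification.
const problems = validateRetention({
  retention_years: null,
  dossier_id: "D-2018-014",
  classification: "geheim",
  legal_basis: "Archiefwet 1995, art. 3",
});
console.log(problems);
console.log(destructionDate("2018-03-14", 20)); // 2038-03-14
console.log(destructionDate("2020-02-29", 5));  // 2025-02-28
```

A check like this is a convenience for the import, not a replacement for the constraints: the schema still has the last word.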

Takeaway

If the previous system needed a quarterly human audit to enforce its rules, the new system has not really replaced it until those rules live in the schema.

The ETL pipeline from Drupal nodes to Directus items

One Node script. Read from the D7 MySQL database. Write to Directus through the official SDK. The interesting parts were the messy bits: PDF attachments stored under sites/default/files/, taxonomy terms that had been renamed three times, and a free-text 'onderwerp' field that one editor had been treating as a comment box since 2019.

import { createDirectus, rest, createItem, uploadFiles } from '@directus/sdk'
import mysql from 'mysql2/promise'
import { readFile } from 'node:fs/promises'

// pdfFormData() and mapClassification() are helpers defined elsewhere in the
// script and are not shown here.

const directus = createDirectus(process.env.DIRECTUS_URL!).with(rest())
const db = await mysql.createConnection(process.env.D7_DSN!)

// One row per published publication, with its retention fields joined in.
const [rows] = await db.execute<any[]>(`
  SELECT n.nid, n.title, n.created, n.status,
         fb.field_bewaartermijn_value  AS retention_years,
         fd.field_dossier_value        AS dossier_id,
         fc.field_classificatie_value  AS classification
  FROM   node n
  LEFT JOIN field_data_field_bewaartermijn   fb ON fb.entity_id = n.nid
  LEFT JOIN field_data_field_dossier         fd ON fd.entity_id = n.nid
  LEFT JOIN field_data_field_classificatie   fc ON fc.entity_id = n.nid
  WHERE  n.type = 'publicatie' AND n.status = 1
`)

for (const r of rows) {
  // The LEFT JOINs return NULL for a missing field, and Number(null) is 0,
  // which would slip past the NOT NULL constraint. Fail loudly instead.
  if (r.retention_years == null || r.dossier_id == null || r.classification == null) {
    throw new Error(`node ${r.nid}: incomplete retention metadata`)
  }

  // Upload the PDF first so the new item can reference the file id.
  const pdf  = await readFile(`./d7-files/publicaties/${r.nid}.pdf`)
  const file = await directus.request(uploadFiles(pdfFormData(pdf, r.nid)))
  await directus.request(createItem('publications', {
    legacy_nid:      r.nid,
    title:           r.title,
    published_at:    new Date(r.created * 1000).toISOString(), // D7 stores Unix seconds
    retention_years: Number(r.retention_years),
    dossier_id:      r.dossier_id,
    classification:  mapClassification(r.classification),
    legal_basis:     'Archiefwet 1995, art. 3',
    pdf:             file.id,
  }))
}

await db.end()

The full run on a laptop took just under six hours, mostly because of the PDF upload step. We ran it three times against a staging Directus instance before the final migration weekend. Each run produced a CSV diff of fields that had changed since the previous run, which the editor-in-chief reviewed before the next pass.
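The diff between runs can be sketched as a comparison of two snapshots keyed on the legacy node id. This is an illustration, not our actual script: `diffRuns` and `toCsv` are hypothetical names, and the sketch assumes each run is exported as an array of flat objects. It reports new items and changed fields, but not items that disappeared between runs.

```typescript
type Item = Record<string, string | number | null>;

interface FieldChange {
  legacy_nid: number;
  field: string;
  before: string | number | null;
  after: string | number | null;
}

// Compare the current run against the previous one, field by field.
function diffRuns(previous: Item[], current: Item[]): FieldChange[] {
  const byNid = new Map<number, Item>();
  previous.forEach((item) => byNid.set(Number(item.legacy_nid), item));

  const changes: FieldChange[] = [];
  current.forEach((item) => {
    const nid = Number(item.legacy_nid);
    const old = byNid.get(nid);
    if (!old) {
      changes.push({ legacy_nid: nid, field: "(new item)", before: null, after: null });
      return;
    }
    Object.keys(item).forEach((field) => {
      const before = old[field] === undefined ? null : old[field];
      if (before !== item[field]) {
        changes.push({ legacy_nid: nid, field, before, after: item[field] });
      }
    });
  });
  return changes;
}

// Quote a cell only when it contains a comma, quote or newline.
function csvCell(value: string | number | null): string {
  const text = value === null ? "" : String(value);
  return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(changes: FieldChange[]): string {
  const lines = changes.map((c) =>
    [c.legacy_nid, c.field, c.before, c.after].map(csvCell).join(","));
  return ["legacy_nid,field,before,after"].concat(lines).join("\n");
}

// Illustrative snapshots, not client data.
const previousRun: Item[] = [
  { legacy_nid: 1234, title: "Woonzorg onderzoek 2018", retention_years: 20 },
];
const currentRun: Item[] = [
  { legacy_nid: 1234, title: "Woonzorg onderzoek 2018", retention_years: 10 },
  { legacy_nid: 1300, title: "Nieuw rapport", retention_years: 7 },
];
const report = toCsv(diffRuns(previousRun, currentRun));
console.log(report);
```

Keeping the output to one row per changed field makes the review a scan down a spreadsheet rather than a comparison of two full exports.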

URL preservation and the 301 table

Three URL shapes in D7 needed to keep working:

  • /publicaties/woonzorg-onderzoek-2018 (Pathauto alias)
  • /node/1234 (canonical Drupal URL)
  • /taxonomy/term/56 (topic listing pages)

The new portal uses /publicaties/{slug} for reports and /onderwerpen/{slug} for topics. We exported the database inventory (the third list above) as a Vercel redirects config. The legacy nid is also stored as a Directus column, so middleware can fall back to a database lookup for anything the static config misses.

{
  "redirects": [
    { "source": "/node/1234",
      "destination": "/publicaties/woonzorg-onderzoek-2018",
      "statusCode": 301 },
    { "source": "/taxonomy/term/56",
      "destination": "/onderwerpen/wonen",
      "statusCode": 301 }
  ]
}

Two details matter here. A Pathauto alias that already matches the new path gets no entry, since a rule whose source equals its destination redirects to itself. And statusCode is set to 301 explicitly, because "permanent": true on Vercel returns a 308.

4,612 entries in the file. Vercel handles it without complaint. The catch-all middleware for /node/:nid hits Directus only on a config miss, which in the first month after launch happened nine times, all for nodes that had never been linked from anywhere outside the site itself.
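Generating the file from the inventory is mechanical. The sketch below is an illustration under assumptions: `InventoryRow`, `buildRedirects` and `legacyNid` are hypothetical names, the alias is taken as Drupal 7 stores it in url_alias (no leading slash), and each rule sets `statusCode: 301` explicitly because Vercel's `permanent: true` responds with a 308. Rules whose source would equal their destination are skipped, since they would loop.

```typescript
interface InventoryRow {
  nid: number;
  alias: string | null; // as stored in url_alias, e.g. "publicaties/woonzorg-onderzoek-2018"
  slug: string;         // slug on the new portal
}

interface Redirect {
  source: string;
  destination: string;
  statusCode: 301;
}

// One rule per legacy URL shape, skipping any rule that would point at itself.
function buildRedirects(rows: InventoryRow[]): Redirect[] {
  const redirects: Redirect[] = [];
  rows.forEach((row) => {
    const destination = `/publicaties/${row.slug}`;
    const sources = [`/node/${row.nid}`];
    if (row.alias) sources.push("/" + row.alias.replace(/^\/+/, ""));
    sources.forEach((source) => {
      if (source !== destination) redirects.push({ source, destination, statusCode: 301 });
    });
  });
  return redirects;
}

// For the middleware fallback: extract the nid from a /node/:nid path,
// or return null when the path has a different shape.
function legacyNid(pathname: string): number | null {
  const match = /^\/node\/(\d+)\/?$/.exec(pathname);
  return match ? Number(match[1]) : null;
}

const redirects = buildRedirects([
  { nid: 1234, alias: "publicaties/woonzorg-onderzoek-2018", slug: "woonzorg-onderzoek-2018" },
  { nid: 1240, alias: "rapporten/oud-pad", slug: "nieuw-pad" }, // illustrative renamed alias
]);
console.log(JSON.stringify({ redirects }, null, 2));
```

In a fallback of this kind, the middleware would call `legacyNid` on the request path and, when it returns a number, look up the item by its `legacy_nid` column before answering with a redirect or a 404.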

The records officer's quarterly export

This was the unspoken requirement that almost broke the project. The records officer did not want to learn a new tool. She wanted a CSV in her inbox every quarter, in the same shape as the one she had been getting from the old Drupal Views Bulk Operations setup. So we built a Directus Flow that runs on the first Monday of January, April, July, and October. It pulls every publication, filters out the ones already past destruction date, and emails the CSV to her and her manager.
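The logic inside that Flow is small enough to sketch. What follows is an illustration, not the Flow's actual configuration: the function names are hypothetical, the column order is an assumption rather than the real D7 export shape, and dates are handled in UTC for simplicity where a production version would use Europe/Amsterdam.

```typescript
interface Publication {
  title: string;
  dossier_id: string;
  classification: string;
  retention_years: number;
  destruction_date: string; // YYYY-MM-DD
}

// True on the first Monday of January, April, July and October (UTC).
function isFirstMondayOfQuarter(date: Date): boolean {
  return date.getUTCMonth() % 3 === 0 && date.getUTCDay() === 1 && date.getUTCDate() <= 7;
}

// Drop publications whose destruction date has already passed.
// ISO dates compare correctly as plain strings.
function stillRetained(publications: Publication[], today: string): Publication[] {
  return publications.filter((p) => p.destruction_date >= today);
}

function csvCell(value: string | number): string {
  const text = String(value);
  return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Assumed column order; the real export matches the officer's existing CSV.
function toAuditCsv(publications: Publication[]): string {
  const header = "Titel,Dossier-id,Classificatie,Bewaartermijn,Vernietigingsdatum";
  const lines = publications.map((p) =>
    [p.title, p.dossier_id, p.classification, p.retention_years, p.destruction_date]
      .map(csvCell).join(","));
  return [header].concat(lines).join("\n");
}

// Illustrative records, not client data.
const sample: Publication[] = [
  { title: "Woonzorg, een onderzoek", dossier_id: "D-2018-014", classification: "openbaar",
    retention_years: 20, destruction_date: "2038-03-14" },
  { title: "Oud rapport", dossier_id: "D-2014-002", classification: "intern",
    retention_years: 7, destruction_date: "2021-05-01" },
];
const auditCsv = toAuditCsv(stillRetained(sample, "2026-04-06"));
console.log(auditCsv);
```

A date check like `isFirstMondayOfQuarter` lets a plain weekly schedule do the triggering, with the export step exiting early on every Monday that is not the first of a quarter.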

She can still log into Directus to spot-check individual records. In the six months since launch, she has done that twice, both times because a colleague asked her a specific question. The audit itself now happens inside her email client.

Cut-over weekend

Friday 17:00 we froze edits on the D7 site by switching the editor role to read-only. Friday 17:15 the final ETL run started. Friday 20:00, with the run still going, we dropped the DNS TTL to 60 seconds. Friday 22:40 the ETL run finished. Saturday 09:00 we flipped the A record to Vercel. Saturday 10:00 we resubmitted the sitemap to Google Search Console and pinged Bing. The old Drupal site stayed online in read-only mode at d7.[client-domain] for two weeks as a fallback. Nobody needed it. The editor-in-chief slept better knowing it existed.

By Monday lunchtime the records officer had received a test export from the staging Flow. She compared it to the last D7 export, found one column header in a different order, asked us to swap it, and signed off.

What we changed after launch

Three things broke in ways we had not predicted.

  • The Algolia search index needed a second pass. We had not stripped Drupal's <p>&nbsp;</p> artefacts from the body text, which made every report look like it started with a blank paragraph in the search results.
  • The quarterly Flow email went into the records officer's 'automated' folder the first time it ran, which she never reads. We added her manager as a CC so the message had a human recipient.
  • The Astro build at 4,600 pages took 92 seconds, which was fine for weekly publishing but painful for editorial preview. We wired Directus webhooks into Vercel's incremental builds so a single publication update rebuilds in about eight seconds.
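The Algolia fix comes down to one regular expression applied before indexing. A minimal sketch, assuming the body text arrives as an HTML string; `stripEmptyParagraphs` is a hypothetical name, and the pattern only targets paragraphs that contain nothing but whitespace, non-breaking spaces or line breaks:

```typescript
// Remove <p> elements whose only content is whitespace, &nbsp; or <br>.
function stripEmptyParagraphs(html: string): string {
  return html
    .replace(/<p(?:\s[^>]*)?>(?:\s|&nbsp;|<br\s*\/?>)*<\/p>/gi, "")
    .trim();
}

const cleaned = stripEmptyParagraphs(
  '<p>&nbsp;</p>\n<p>Samenvatting van het rapport.</p><p class="x"> <br /> </p>'
);
console.log(cleaned);
```

Paragraphs with real text pass through untouched, including any `&nbsp;` inside them.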

When we built this migration, the part that surprised us was how much of the ten weeks went into one person's habit. Two of those weeks were spent in conversation with the records officer about her audit checklist, which became our test suite. That kind of legacy migration is most of what we do at ABN, and it almost always lives or dies on whether one person inside the client trusts the new system.

If you run a Drupal 7 site today, here is the five-minute thing to do before anything else. Open your analytics, sort the last twelve months of URLs by inbound external traffic, and write down the top fifty. Those are the URLs your migration cannot break. Everything else is negotiable.

FAQ

Why not upgrade to Drupal 10 instead of migrating off the stack?

The client used essentially one content type with no Views or Panels complexity. A D10 upgrade would have kept the editing pain and added two months of module-by-module triage for no real gain.

How did you preserve search rankings on 4,600 URLs?

We exported every public URL three ways (crawl, sitemap, database), reconciled the discrepancies with the editor, and wrote a Vercel redirects file with 4,612 permanent 301s plus a middleware fallback for catch-all node IDs.

What happens to the Archiefwet metadata when a publication passes its destruction date?

A Directus Flow flags it for review and excludes it from the quarterly CSV export. Actual deletion is still a human decision the records officer makes, because that is what the Archiefwet requires.

How long did the migration take end to end?

Ten weeks from brief to launch. Roughly two of those weeks were spent shadowing the records officer so her audit checklist could become our test suite before any code was written.

Tags: drupal · migration · legacy sites · mysql · architecture · case study