
Drupal 7 to Sanity for a Frisian newsroom: a 4-week cutover

A Leeuwarden newsroom on Drupal 7 and PHP 7.0, twelve years of artikelen, and a four-week window. Here is how the shadow-traffic cutover actually ran.

Jacob Molkenboer · Founder · A Brand New Company · 22 Sept 2025 · 9 min

Tuesday, 06:42. The morning desk in Leeuwarden hits publish on the lead story and the spinner sits there for nine seconds. Drupal 7 is doing what Drupal 7 does after twelve years and 14,200 artikelen: clearing every cache table it owns, then a few it forgot it had. Nine seconds is a lifetime when the print PDF needs to be at the press by seven.

That was the brief. A Frisian-language daily, nineteen people in the building, custom PHP 7.0 modules from a developer who left in 2019, and a content tree nobody had fully mapped. The board wanted Sanity and Next.js. They wanted it without losing a single Google News slot. They wanted the URL of every artikel from 2014 to keep working. And they wanted it in four weeks, because Drupal 7 community support had ended in January and their hosting bill had quietly tripled while nobody was looking.

We did it on a four-week shadow-traffic cutover. Here is the actual sequence.

Mapping what was actually in Drupal

Before any migration plan, you have to know what is in the box. The node table reported 14,200 published items. The node_revision table reported 89,000. There were six content types, four of which the newsroom said they never used. There were nineteen taxonomy vocabularies, eleven of which had not been touched since 2017.

We ran one script. Not a tool. A 90-line PHP script that talked to the live MySQL replica and produced a CSV:

node_id, type, title, author, status, created_at, last_edited_at, body_length, image_count, taxonomy_terms, has_video_embed
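The original script was PHP and is not reproduced here. The sketch below is a TypeScript illustration of the step that most often goes wrong. It serializes inventory rows into the eleven columns above and quotes any cell that contains a comma, a quote or a newline, so headlines with punctuation survive the trip into LibreOffice. `InventoryRow` and `toCsv` are hypothetical names, and joining taxonomy terms with `|` is our own choice. The SQL that fills the rows is left out because field table names differ per site.

```typescript
// Hypothetical row shape mirroring the inventory columns above.
export interface InventoryRow {
  node_id: number
  type: string
  title: string
  author: string
  status: number
  created_at: string
  last_edited_at: string
  body_length: number
  image_count: number
  taxonomy_terms: string[]
  has_video_embed: boolean
}

const COLUMNS: (keyof InventoryRow)[] = [
  'node_id', 'type', 'title', 'author', 'status', 'created_at',
  'last_edited_at', 'body_length', 'image_count', 'taxonomy_terms', 'has_video_embed',
]

// Quote a cell if it contains a comma, quote or line break; double embedded quotes.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value
}

export function toCsv(rows: InventoryRow[]): string {
  const lines = [COLUMNS.join(',')]
  for (const row of rows) {
    const cells = COLUMNS.map(col => {
      const v = row[col]
      // Several taxonomy terms share one cell, separated by '|'.
      const text = Array.isArray(v) ? v.join('|') : String(v)
      return csvField(text)
    })
    lines.push(cells.join(','))
  }
  return lines.join('\n')
}
```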

That CSV got opened in LibreOffice. The hoofdredacteur (editor-in-chief) sat with one of our developers for an afternoon and put a colour on every row. Green meant "this stays". Yellow meant "this stays but the structure changes". Red meant "this never gets imported".

Out of 14,200 articles, 13,847 came up green. 281 came up yellow (those were photo galleries with a custom node bundle we knew Sanity would model differently). 72 came up red, which surprised the editors. Those were test pages from 2015 that nobody had ever bothered to unpublish.

We do this on every legacy migration. The hardest part of moving off Drupal is not the technology. It is getting the people who know the content to look at the content.

Modeling artikelen in Sanity

Drupal 7 stored an artikel as: a node row, a node_revision row, twelve field_data_* rows for the body, lede, author, category, kicker, pull-quote, video embed, source attribution, and so on, plus image references in file_managed, plus URL aliases in url_alias.

Sanity wants one document per article. So we wrote the schema first, before any import:

// schemas/artikel.ts
import {defineType, defineField} from 'sanity'

export const artikel = defineType({
  name: 'artikel',
  title: 'Artikel',
  type: 'document',
  fields: [
    defineField({name: 'title', type: 'string', validation: r => r.required()}),
    defineField({name: 'slug', type: 'slug', options: {source: 'title', maxLength: 96}}),
    defineField({name: 'lede', type: 'text', rows: 3}),
    defineField({name: 'body', type: 'array', of: [
      {type: 'block'},
      {type: 'image'},
      {type: 'videoEmbed'},
      {type: 'pullQuote'},
    ]}),
    defineField({name: 'kicker', type: 'string'}),
    defineField({name: 'author', type: 'reference', to: [{type: 'redacteur'}]}),
    defineField({name: 'category', type: 'reference', to: [{type: 'rubriek'}]}),
    defineField({name: 'publishedAt', type: 'datetime', validation: r => r.required()}),
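    // readOnly rather than hidden: editors need to see these fields in Studio.
    defineField({name: 'legacyNodeId', type: 'number', readOnly: true}),
    defineField({name: 'legacyUrl', type: 'string', readOnly: true}),
  ],
  // Show the legacy URL under the title in Studio document lists.
  preview: {select: {title: 'title', subtitle: 'legacyUrl'}},
})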

The two important fields are at the bottom. legacyNodeId and legacyUrl get carried on every imported document forever. Two reasons. First, when something breaks (and something always breaks), the editor opens the article in Sanity Studio and the legacy URL is right there next to the title. Second, the redirect layer reads legacyUrl directly to build the redirect map. More on that below.

The import script, in three passes

The import ran in three passes. We did not try to do it in one. One pass means one bug rolls back everything.

Pass one was authors and categories. Drupal's users table mapped to a redacteur (author) document type. Drupal's taxonomy_term_data mapped to rubriek (section). Roughly 60 authors, 22 categories. Done in 14 seconds against the Sanity mutations API.

Pass two was articles. We pulled from MySQL in batches of 200, transformed body HTML into Portable Text, and pushed mutations.

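// scripts/import-articles.mjs
import {createClient} from '@sanity/client'
import {htmlToBlocks} from '@portabletext/block-tools'
import {Schema} from '@sanity/schema'
import {JSDOM} from 'jsdom'
import mysql from 'mysql2/promise'

const BATCH_SIZE = 200

const sanity = createClient({
  projectId: process.env.SANITY_PROJECT_ID,
  dataset: 'production',
  apiVersion: '2025-09-01', // a fixed date no later than today
  token: process.env.SANITY_WRITE_TOKEN,
  useCdn: false,
})

// htmlToBlocks needs the compiled array type of the body field, not the Studio schema file.
// Only plain blocks are declared here; embeds and images need custom rules or a later pass.
const defaultSchema = Schema.compile({
  name: 'import',
  types: [{
    type: 'object',
    name: 'artikel',
    fields: [{name: 'body', type: 'array', of: [{type: 'block'}]}],
  }],
})
const blockContentType = defaultSchema.get('artikel').fields.find(f => f.name === 'body').type
const parseHtml = html => new JSDOM(html).window.document

// Fallback lede when Drupal's body_summary is empty: the text of the first <p>.
function extractFirstParagraph(html) {
  const p = parseHtml(html).querySelector('p')
  return p ? p.textContent.trim() : ''
}

const db = await mysql.createConnection(process.env.MYSQL_URL)
let imported = 0

for (let offset = 0; ; offset += BATCH_SIZE) {
  // query() instead of execute(): numeric LIMIT/OFFSET placeholders in
  // server-side prepared statements can be rejected by MySQL 8.0.22+.
  const [rows] = await db.query(`
    SELECT n.nid, n.title, n.created, n.status,
           b.body_value, b.body_summary,
           u.name AS author_name,
           a.alias AS url_alias
    FROM node n
    JOIN field_data_body b ON b.entity_id = n.nid
    JOIN users u ON u.uid = n.uid
    LEFT JOIN url_alias a ON a.source = CONCAT('node/', n.nid)
    WHERE n.type = 'artikel' AND n.status = 1
    ORDER BY n.nid ASC
    LIMIT ? OFFSET ?
  `, [BATCH_SIZE, offset])
  if (rows.length === 0) break

  // One Sanity transaction per batch. Deterministic _ids plus createOrReplace
  // make a failed batch safe to re-run.
  const tx = sanity.transaction()
  for (const row of rows) {
    tx.createOrReplace({
      _id: `artikel-${row.nid}`,
      _type: 'artikel',
      title: row.title,
      lede: row.body_summary || extractFirstParagraph(row.body_value),
      body: htmlToBlocks(row.body_value, blockContentType, {parseHtml}),
      publishedAt: new Date(row.created * 1000).toISOString(),
      legacyNodeId: row.nid,
      legacyUrl: row.url_alias || `node/${row.nid}`,
    })
  }
  await tx.commit()
  imported += rows.length
  console.log(`Imported ${imported} artikelen`)
}

await db.end()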

Pass three was images. We did this last because file_managed had twelve years of mixed-case extensions, broken references, and EXIF data nobody wanted in the new CDN. We streamed each image from the legacy server, ran it through sharp to strip metadata and produce a WebP variant, and uploaded it with the Sanity asset API. That step took forty-one hours of wall-clock time over a weekend.
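The sharp side of that pass is short. `sharp(input).rotate().webp()` re-encodes to WebP, and sharp drops EXIF and other metadata from its output unless you explicitly ask to keep it. Calling `rotate()` with no angle applies the EXIF orientation first, so portrait photos stay upright after the metadata is gone. The fiddlier part is the bookkeeping around that call. Below is a minimal sketch of it. It assumes file_managed URIs use Drupal's `public://` scheme and the default `sites/default/files` directory. `legacyFileUrl`, `webpFilename` and the list of supported extensions are hypothetical.

```typescript
// Hypothetical helpers for the image pass. Assumes Drupal's default
// public file directory 'sites/default/files'.
const PUBLIC_PREFIX = 'public://'
const SUPPORTED = new Set(['jpg', 'jpeg', 'png', 'gif', 'webp'])

// Map a file_managed URI to a fetchable URL on the legacy host, or null
// when the row is not a usable public file (private://, empty URIs).
export function legacyFileUrl(uri: string, legacyOrigin: string): string | null {
  if (!uri.startsWith(PUBLIC_PREFIX)) return null
  const relative = uri.slice(PUBLIC_PREFIX.length)
  if (relative === '') return null
  // Encode each path segment so spaces and accents survive the HTTP fetch.
  const encoded = relative.split('/').map(encodeURIComponent).join('/')
  return `${legacyOrigin}/sites/default/files/${encoded}`
}

// Compare extensions case-insensitively (JPG, Jpg, jpg) and name the output
// after its WebP variant. Returns null for files the pass does not handle.
export function webpFilename(uri: string): string | null {
  const base = uri.split('/').pop() ?? ''
  const dot = base.lastIndexOf('.')
  if (dot <= 0) return null
  const ext = base.slice(dot + 1).toLowerCase()
  if (!SUPPORTED.has(ext)) return null
  return `${base.slice(0, dot)}.webp`
}
```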

The redirect chain

This is where most newsroom migrations die. You can have the new CMS humming and the editors trained, but if /artikel/2017/lokaal/burgemeester-opent-bibliotheek returns a 404 on cutover day, you lose a year of accumulated SEO authority in a week.

The legacy redirect chain had grown organically since 2014. There were old /content/[nid] URLs from before the URL aliasing module was installed. There were /node/[nid] paths from the pre-Pathauto era. There were /artikel/[jaar]/[rubriek]/[slug] paths from 2016 onward. And there were a few hundred manually edited rows in the Drupal redirect module's table.

We dumped all of them. About 31,000 entries. Then we built a single Next.js middleware:

// middleware.ts
import {NextRequest, NextResponse} from 'next/server'
import {redirectMap} from './lib/redirect-map'

export function middleware(request: NextRequest) {
  const path = request.nextUrl.pathname
  const target = redirectMap.get(path)
  if (target) {
    return NextResponse.redirect(new URL(target, request.url), 308)
  }
  return NextResponse.next()
}

export const config = {
  matcher: ['/((?!_next|api|favicon.ico).*)'],
}

The redirectMap is a Map<string, string> compiled at build time from a JSON file. 31,000 entries. Loads into memory once. Average lookup at the edge: under a millisecond. We benchmarked it.
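Drupal stores aliases without a leading slash, as the import script's `node/[nid]` fallback shows. `request.nextUrl.pathname`, by contrast, always starts with one and may also carry a trailing slash. The sketch below shows one way to normalize both sides. It assumes the dumped JSON is a flat object mapping source path to target path. The helper names and the trailing-slash rule are our assumptions. With these helpers, the middleware would call `lookupRedirect` instead of `redirectMap.get`, so both sides go through the same rule.

```typescript
// Hypothetical lib/redirect-map.ts helpers. In the real build the entries
// object would be imported from the generated JSON file.
export function normalizePath(path: string): string {
  let p = path.startsWith('/') ? path : `/${path}`
  // Drop trailing slashes, but keep the root path '/'.
  p = p.replace(/\/+$/, '')
  return p === '' ? '/' : p
}

export function compileRedirectMap(entries: Record<string, string>): Map<string, string> {
  const map = new Map<string, string>()
  for (const [from, to] of Object.entries(entries)) {
    map.set(normalizePath(from), to)
  }
  return map
}

// Apply the same normalization to the incoming pathname before the lookup.
export function lookupRedirect(map: Map<string, string>, pathname: string): string | undefined {
  return map.get(normalizePath(pathname))
}
```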

308, not 301. Both are permanent, but a 308 also guarantees the request method and body are preserved, which makes it the stricter modern equivalent. We are not aware of a single search engine that handles 308 worse than 301 in 2025.

Warning

A redirect that points to another redirect that points to the final URL is two strikes against you: latency and crawl budget. Collapse every chain to a single hop before you generate the JSON. If /content/1234 used to redirect to /node/1234 which redirected to the final path, the new map sends /content/1234 straight to the final path. One hop. Always one hop.
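Below is a minimal sketch of that collapse step. It assumes the dumped entries are already loaded into a `Map` from source path to target path; `collapseChains` is a hypothetical name. It follows each chain to its final target and fails loudly on a cycle, because a loop in the old redirect table would otherwise become an infinite redirect on the new site.

```typescript
// Collapse redirect chains so every source points straight at its final target.
// Throws on cycles (a -> b -> a), which would otherwise loop forever.
export function collapseChains(redirects: Map<string, string>): Map<string, string> {
  const resolved = new Map<string, string>()
  for (const source of redirects.keys()) {
    const seen = new Set<string>([source])
    let target = redirects.get(source)!
    // Keep following while the current target is itself a redirect source.
    while (redirects.has(target)) {
      if (seen.has(target)) {
        throw new Error(`Redirect cycle involving ${source}`)
      }
      seen.add(target)
      target = redirects.get(target)!
    }
    resolved.set(source, target)
  }
  return resolved
}
```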

Google News and the structured data

The previous setup emitted NewsArticle JSON-LD via a custom Drupal module. The module had stopped being maintained in 2019. The JSON-LD it produced still validated, but only by accident. We rebuilt it as a Next.js component:

// components/ArtikelStructuredData.tsx
// Assumed shape of the artikel as fetched for the page; match it to your GROQ projection.
type Artikel = {
  title: string
  lede?: string
  publishedAt: string
  updatedAt?: string
  author: {name: string; slug: string}
  heroImage?: {url: string}
  paywalled?: boolean
}

export function ArtikelStructuredData({artikel}: {artikel: Artikel}) {
  const data = {
    '@context': 'https://schema.org',
    '@type': 'NewsArticle',
    headline: artikel.title,
    description: artikel.lede,
    datePublished: artikel.publishedAt,
    dateModified: artikel.updatedAt ?? artikel.publishedAt,
    author: [{
      '@type': 'Person',
      name: artikel.author.name,
      url: `https://example-newsroom.nl/redacteur/${artikel.author.slug}`,
    }],
    publisher: {
      '@type': 'NewsMediaOrganization',
      name: 'Example Newsroom',
      logo: {
        '@type': 'ImageObject',
        url: 'https://example-newsroom.nl/logo.png',
        width: 600,
        height: 60,
      },
    },
    image: artikel.heroImage?.url,
    inLanguage: 'fy-NL',
    isAccessibleForFree: !artikel.paywalled,
  }
  // Escape '<' so a headline containing "</script>" cannot close the tag early.
  const json = JSON.stringify(data).replace(/</g, '\\u003c')
  return <script type="application/ld+json" dangerouslySetInnerHTML={{__html: json}} />
}

The inLanguage tag is the small detail that matters. Frisian is fy-NL, not nl. Google News reads that and slots the content into the Frisian-language carousel rather than the Dutch one. The previous Drupal module had hardcoded nl. Fixing that one line moved Frisian-carousel impressions up meaningfully in the first week.

We then registered the new build in Google News Publisher Center two weeks before cutover, while still serving from Drupal. Publisher Center associates publications by URL pattern and accepts a new pattern in roughly 48 hours.

The four-week shadow-traffic cutover

This was the part that let everyone sleep. We did not flip DNS on a Friday night and pray.

Week one. Sanity went live as the canonical CMS for new artikelen only. Editors wrote in Sanity Studio. The new Next.js site rendered at staging.example-newsroom.nl, password-protected. Drupal still served all public traffic. Articles created in Sanity were syndicated back into Drupal via a webhook and a 40-line PHP receiver, so the live site never noticed.

Week two. The migration of the 13,847 existing artikelen ran into a tagged Sanity dataset. We diffed every imported article against its Drupal source: title match, body word count within 1%, image count exact. 17 articles failed the diff. All 17 turned out to be old gallery posts with broken HTML the editors never knew were broken. We fixed them by hand.
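The week-two check can be sketched as a pure comparison. The sketch assumes both sides have already been reduced to plain text plus an image count; extracting those from Drupal HTML and from Portable Text is left out. The 1% word-count tolerance and the exact image match come from the text above. The `ArticleSnapshot` shape and the `diffArticle` name are hypothetical.

```typescript
export interface ArticleSnapshot {
  nid: number
  title: string
  bodyText: string
  imageCount: number
}

function wordCount(text: string): number {
  const trimmed = text.trim()
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length
}

// Returns human-readable problems; an empty list means the article passed.
export function diffArticle(drupal: ArticleSnapshot, sanity: ArticleSnapshot): string[] {
  const problems: string[] = []
  if (drupal.title.trim() !== sanity.title.trim()) problems.push('title mismatch')
  const a = wordCount(drupal.bodyText)
  const b = wordCount(sanity.bodyText)
  // Body word counts must agree within 1% of the Drupal count.
  if (Math.abs(a - b) > a * 0.01) problems.push(`word count ${a} vs ${b}`)
  if (drupal.imageCount !== sanity.imageCount) {
    problems.push(`image count ${drupal.imageCount} vs ${sanity.imageCount}`)
  }
  return problems
}
```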

Week three. Traffic mirroring. We put the legacy site behind a Cloudflare Worker that sent 5% of GET requests to the new Next.js site in parallel, discarded the response, and compared status codes. Out of roughly 240,000 mirrored requests that week we found 312 paths that 404'd on the new site. 287 of those were missing redirects we added to the map. The other 25 were genuine deletes the editors confirmed could stay as 410 Gone.
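The mirroring Worker could look roughly like the sketch below. It makes several assumptions: module syntax, a hypothetical `MIRROR_ORIGIN` variable pointing at the new site, and `console.log` as the sink for mismatches (the original does not say where results were stored). The reader always gets the legacy response unchanged. The mirrored fetch runs inside `ctx.waitUntil`, so it never delays the live page, and its body is cancelled once the status is known. Two choices here are ours. The mirror follows redirects, so a legacy URL that the new middleware 308s to a live page counts as a match. The mirror also does not forward the reader's headers, so cookies are never replayed.

```typescript
// Minimal local type so the sketch compiles without @cloudflare/workers-types.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void
}

interface Env {
  MIRROR_ORIGIN: string // hypothetical variable: origin of the new Next.js site
}

const SAMPLE_RATE = 0.05

// Only GET requests are mirrored, and only a 5% sample of them.
export function shouldMirror(method: string, rand: number = Math.random()): boolean {
  return method === 'GET' && rand < SAMPLE_RATE
}

// Rewrite the incoming URL onto the new site's origin, keeping path and query.
export function mirrorUrl(requestUrl: string, mirrorOrigin: string): string {
  const incoming = new URL(requestUrl)
  const target = new URL(mirrorOrigin)
  target.pathname = incoming.pathname
  target.search = incoming.search
  return target.toString()
}

async function compareStatus(request: Request, legacyStatus: number, env: Env): Promise<void> {
  // Default redirect mode is 'follow', so the final status is compared.
  const mirrored = await fetch(mirrorUrl(request.url, env.MIRROR_ORIGIN))
  // Discard the body; only the status code matters.
  await mirrored.body?.cancel()
  if (mirrored.status !== legacyStatus) {
    console.log(JSON.stringify({
      path: new URL(request.url).pathname,
      legacyStatus,
      newStatus: mirrored.status,
    }))
  }
}

export default {
  async fetch(request: Request, env: Env, ctx: WaitUntilContext): Promise<Response> {
    const legacy = await fetch(request)
    if (shouldMirror(request.method)) {
      // A failing mirror must never affect the live response.
      ctx.waitUntil(compareStatus(request, legacy.status, env).catch(() => {}))
    }
    return legacy
  },
}
```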

Week four. DNS cutover at 03:00 on a Tuesday. Drupal stayed running, frozen, for another 14 days as a read-only fallback at archive.example-newsroom.nl. After 14 days with zero traffic to the fallback, we took it offline.

Takeaway

The cutover took four weeks because we spent the first three letting the two systems run in parallel. The technical migration of 14,200 artikelen took eleven hours. The trust-building took twenty-six days. Every newsroom migration is a trust problem dressed up as a technical problem.

What we did not migrate

A note on the database. The old MySQL had nineteen modules' worth of tables we did not need. The temptation is to clean up the old database as part of the migration. Do not. There is a recent piece on Postgres deletes making the rounds that argues the only scalable delete is DROP TABLE, and the same logic applies to a tired Drupal MySQL. We did not migrate the cache_* tables, the watchdog table (dblog's 4.2 million log rows), the queue table, the sessions table, or the search_index tables. In effect, we dropped them by leaving them behind at the source. The new system has no watchdog table because Next.js logs go to a different system. Solved.

Comments were a special case. Drupal had a comment table with about 89,000 entries, mostly from 2014 through 2018, before the newsroom moved community engagement to a moderated Mastodon instance. We exported the comments to a static JSON file. The new article page renders the comment count and the top three comments, and a "lees alle reacties" ("read all comments") link points to the archive site. There is no comment system in the new stack. The editors were relieved.

After the cutover

Two weeks after DNS flipped, the morning publish was under 800ms from save in Sanity Studio to visible on the front page. The print-to-press deadline moved from 06:55 to 06:30 because nobody was waiting on the spinner anymore. The hosting bill went from €2,400 a month to €310. Google News impressions in the Frisian carousel were up roughly 18% over the same month a year earlier (the inLanguage fix), and search impressions across all URLs were within 4% of pre-cutover, which is the noise floor.

When we built the artikel migration for this Leeuwarden newsroom, the thing that bit us late was a quirk in the old custom PHP module: it wrapped pull-quotes in a non-standard <span class="kicker"> instead of <blockquote>. We solved it with one regex pass in a pre-processing step that runs on the body HTML before htmlToBlocks converts it for Sanity. That kind of pre-processor lives or dies on someone reading the actual HTML before the script runs. If you are sitting on a Drupal 7 site with a content team and a calendar that says "now", our legacy migration practice is built around exactly this staged-cutover pattern.
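The regex pass can be reconstructed roughly as below. The sketch assumes the legacy markup is exactly a `<span class="kicker">` wrapping the quote, with no nested spans inside it. `rewritePullQuotes` is a hypothetical name. Its output would be handed to `htmlToBlocks` in place of the raw `body_value`. By default, block-tools should map a `<blockquote>` to a block with the `blockquote` style. Turning it into the schema's `pullQuote` object would need a custom deserialization rule, which is not shown here.

```typescript
// Hypothetical pre-processor run on body_value before htmlToBlocks.
// Rewrites the legacy module's pull-quote markup into a standard blockquote.
// Assumes no nested <span> inside the kicker span: the lazy match stops at
// the first </span>, and a regex cannot balance nested tags.
const KICKER_SPAN = /<span\s+class=(["'])kicker\1\s*>([\s\S]*?)<\/span>/gi

export function rewritePullQuotes(html: string): string {
  return html.replace(KICKER_SPAN, (_match: string, _quote: string, inner: string) =>
    `<blockquote>${inner.trim()}</blockquote>`)
}
```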

The smallest thing you could do this afternoon: run that one CSV export against your own node table, open it in a spreadsheet, and put a colour on every row. The migration plan will write itself from there.

Key takeaway

Every newsroom migration is a trust problem dressed up as a technical problem; the four-week calendar reflects how much trust there is to build.

FAQ

Why four weeks and not one weekend?

The technical migration of 14,200 articles took eleven hours. The other twenty-six days were content review, traffic mirroring, and giving editors time to trust the new tool before flipping DNS.

Why Sanity instead of WordPress or another Drupal version?

The newsroom needed structured content (kicker, pull-quote, video embed) modeled as first-class fields, not WYSIWYG soup. Sanity's schema-first approach and Portable Text fit how the editors already thought about an artikel.

How did you handle 14,200 URL aliases without losing SEO?

We exported every alias plus historical redirect into a single JSON file (31,000 entries) and served them as 308 redirects from Next.js middleware. Every chain was collapsed to a single hop before deploy.

What happened to the old comments?

Exported to a static JSON file. Article pages now show the count and top three legacy comments, with a link to the archive. New conversation happens on a moderated Mastodon instance, not in the CMS.

Did Google News impressions hold during the cutover?

Yes. We registered the new build pattern in Google News Publisher Center two weeks before DNS flipped, kept the URL structure stable, and corrected the inLanguage tag from nl to fy-NL. Frisian-carousel impressions actually rose about 18%.
