WordPress multisite to headless: 9,300 redirects intact

The SEO lead opened the redirect spreadsheet: 9,304 rows, tuned over a decade. We were about to move 47 subsites onto a headless stack without losing a single one of those rows.

Jacob Molkenboer · Founder · A Brand New Company · 7 Feb 2025 · 11 min

The SEO lead emailed us the spreadsheet at 23:47 on a Sunday. Tab one: 9,304 redirects, sorted by hit count. Tab two: a decade of notes in the comments column ("Vakblad merger 2019", "PDF archive collapse", "Eduardo's pet rewrite, do not touch"). Tab three: a flat refusal to sign off on the migration until every one of those rules worked on day one of the new stack.

The client publishes 47 trade-association magazines across one WordPress multisite network. Twelve years of subsites, custom post types, ACF schemas, and the SEO lead's redirect graph that quietly carries roughly 18% of their organic traffic. We were rebuilding the whole thing on Next.js and Sanity. The brief was simple. Don't lose the redirects.

Mapping the network before touching code

The first mistake teams make on a multisite migration is treating it like 47 separate sites. It isn't. It's one users table, one set of network-wide options (wp_sitemeta), one redirect graph, and 47 sets of content tables (wp_2_posts, wp_3_posts, on up to wp_48_posts). The shared parts are the ones that break on cutover day. The per-site content is the easy part.

We started by inventorying every moving part with wp-cli over SSH:

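# Inventory every site in the network (CSV so the file matches its extension)
wp site list --fields=blog_id,url,registered --format=csv > sites.csv

mkdir -p exports
for blog_id in $(wp site list --field=blog_id); do
  url=$(wp site list --blog_id="$blog_id" --field=url)
  # Every post of every type and status, one JSON file per site
  wp post list --url="$url" --post_type=any --post_status=any \
    --format=json > "exports/site-${blog_id}.json"
  # permalink_structure is a per-site option, so read it per site
  echo "${blog_id},$(wp option get permalink_structure --url="$url")" >> permalinks.csv
done

# Network-activated plugins and every user in the network
wp plugin list --status=active-network --format=csv > plugins.csv
wp user list --network --format=csv > users.csv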

The redirect rules lived in three places nobody had documented together: the Redirection plugin's wp_redirection_items table (8,104 rules), the network .htaccess (1,089 rules left over from the 2019 magazine consolidation), and 111 hard-coded entries in a custom mu-plugin that the previous agency had abandoned in wp-content/mu-plugins/legacy-paths.php. We pulled all three into one CSV and deduplicated by source path. Final count: 9,304 unique rules, 8,927 of them 301s, 377 of them 302s. We kept the status codes verbatim. A 302 that has been a 302 for nine years is doing something on purpose.
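
A minimal sketch of the merge-and-deduplicate step follows. It is an illustration, not the script used on the project: the RedirectRule shape, the mergeRules name, the sample paths, and the "first source wins" precedence are all assumptions.

```typescript
// Hypothetical rule shape; the field names mirror a source,target,code CSV.
type RedirectRule = { source: string; target: string; code: 301 | 302 };

// Merge rule lists in priority order. The first list to define a source
// path wins; later duplicates are collected so a human can review them.
function mergeRules(...lists: RedirectRule[][]): {
  rules: RedirectRule[];
  dropped: RedirectRule[];
} {
  const bySource = new Map<string, RedirectRule>();
  const dropped: RedirectRule[] = [];
  for (const list of lists) {
    for (const rule of list) {
      if (bySource.has(rule.source)) dropped.push(rule);
      else bySource.set(rule.source, rule);
    }
  }
  return { rules: Array.from(bySource.values()), dropped };
}

// Made-up example: the plugin table outranks the .htaccess rules.
const pluginRules: RedirectRule[] = [
  { source: "/oud", target: "/nieuw", code: 301 },
];
const htaccessRules: RedirectRule[] = [
  { source: "/oud", target: "/elders", code: 302 },
  { source: "/archief", target: "/archive", code: 302 },
];
const merged = mergeRules(pluginRules, htaccessRules);
console.log(merged.rules.length, merged.dropped.length);
```

Keeping the dropped list matters more than the merge itself: a duplicate source with a different target or status code is a question for the person who owns the rules, not something to resolve silently.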

URL structure for 47 sites inside one Next.js app

The legacy network used subdomain mode (vakblad-x.publisher.nl, vakblad-y.publisher.nl). The reflex is to mirror that in Next.js with 47 Vercel projects. We rejected it. Forty-seven projects means forty-seven build pipelines, forty-seven preview deploys, and forty-seven KV bindings to keep in sync.

Instead we ran one Next.js app. A small piece of middleware reads the Host header, rewrites the request to /_sites/[siteSlug]/..., and lets the app render the right tenant. The user-visible URL doesn't change. The internal route knows which site it is on.

// middleware.ts (excerpt, runs before the redirect lookup)
import { NextResponse, type NextRequest } from 'next/server'
import { SITE_BY_HOST } from './lib/sites'

export function rewriteForTenant(req: NextRequest) {
  const host = req.headers.get('host') ?? ''
  const site = SITE_BY_HOST[host.replace(/:\d+$/, '')]
  if (!site) return null
  const url = req.nextUrl.clone()
  url.pathname = `/_sites/${site}${url.pathname}`
  return NextResponse.rewrite(url)
}

SITE_BY_HOST is a 47-line constant generated at build time from the Sanity dataset. No runtime lookup, no cold start tax. The whole tenant resolution is a string comparison.
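
How such a constant might be generated can be sketched as below. The SiteRecord shape and the buildSiteByHost name are assumptions for illustration. In practice the input would come from a Sanity query at build time, which is left out here.

```typescript
// Hypothetical shape of a site record as it might be exported from the CMS.
type SiteRecord = { slug: string; hosts: string[] };

// Build the host -> slug map once, at build time. Hosts are lowercased and
// stripped of any port so the middleware lookup is a plain property read.
function buildSiteByHost(sites: SiteRecord[]): Record<string, string> {
  // No prototype, so a host can never collide with an inherited property.
  const map: Record<string, string> = Object.create(null);
  for (const site of sites) {
    for (const host of site.hosts) {
      const key = host.toLowerCase().replace(/:\d+$/, "");
      if (map[key] !== undefined && map[key] !== site.slug) {
        throw new Error(`Host ${key} is claimed by ${map[key]} and ${site.slug}`);
      }
      map[key] = site.slug;
    }
  }
  return map;
}

const SITE_BY_HOST = buildSiteByHost([
  { slug: "vakblad-x", hosts: ["vakblad-x.publisher.nl"] },
  { slug: "vakblad-y", hosts: ["Vakblad-Y.publisher.nl:3000"] },
]);
console.log(SITE_BY_HOST["vakblad-y.publisher.nl"]); // "vakblad-y"
```

Failing the build on a host claimed by two sites is a deliberate choice in this sketch: a silent overwrite would route one magazine's traffic to another.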

Sanity schema design for the multi-tenant case

The first architecture question on any multi-tenant headless migration is whether each tenant gets its own dataset. Sanity's docs describe datasets as isolated namespaces, which sounds like the right model for 47 trade publications. We considered it and rejected it for the same reason we rejected 47 Next.js projects.

Cross-site editorial moves (one editor pulling a piece between three vakbladen), shared author records, and shared taxonomies for industry categories made isolation expensive. We went with one dataset and a site reference field on every document.

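// schemas/article.ts
export default {
  name: 'article',
  type: 'document',
  fields: [
    { name: 'site', type: 'reference', to: [{ type: 'site' }] },
    { name: 'slug', type: 'slug' },
    { name: 'legacyId', type: 'number', readOnly: true },
    { name: 'title', type: 'string' },
    // 'portableText' is assumed to be a custom named type registered elsewhere
    { name: 'body', type: 'portableText' },
    // Written by the import script
    { name: 'publishedAt', type: 'datetime' },
  ],
  // Enforce slug uniqueness per site rather than per dataset
  validation: (Rule) => Rule.custom(async (doc, ctx) => {
    const site = doc?.site?._ref
    const slug = doc?.slug?.current
    if (!doc || !site || !slug) return true // nothing to compare yet
    // A draft and its published version share a slug, so exclude both IDs
    const published = doc._id.replace(/^drafts\./, '')
    const dup = await ctx.getClient({ apiVersion: '2024-01-01' }).fetch(
      `*[_type=="article" && site._ref==$site && slug.current==$slug
        && !(_id in [$published, $draft])][0]._id`,
      { site, slug, published, draft: `drafts.${published}` }
    )
    return dup ? 'Slug exists on this site' : true
  }),
}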

legacyId is the WordPress post ID. We stored it on every document so a re-import wouldn't create duplicates and so the redirect engine could look up the new URL by old ID when a stale link came in. That single field saved us about four days of cleanup work over the project.
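
The lookup by old ID could look roughly like the sketch below. It assumes stale links arrive in WordPress's plain-permalink form (?p=123), and the function names are hypothetical. It pairs legacyId with the site reference because WordPress post IDs are only unique within one subsite of a multisite network.

```typescript
// Extract a WordPress post ID from a plain-permalink URL such as /?p=4821.
function legacyPostId(url: string): number | null {
  const raw = new URL(url).searchParams.get("p");
  if (raw === null || !/^\d+$/.test(raw)) return null;
  return Number(raw);
}

// Build the GROQ query and its parameters; running it is left to the caller.
// siteId is the _id of the site document the request belongs to.
function legacyLookup(siteId: string, legacyId: number) {
  return {
    query:
      '*[_type == "article" && site._ref == $siteId && legacyId == $legacyId][0]' +
      '{ "slug": slug.current }',
    params: { siteId, legacyId },
  };
}

const id = legacyPostId("https://vakblad-x.publisher.nl/?p=4821");
if (id !== null) {
  console.log(legacyLookup("site-2", id).params);
}
```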

Why the redirect table is not a config file

This is where most teams stall on a project this size. The reflex is to put redirects in next.config.js or vercel.json. That works for ten. It does not work for 9,300.

Vercel's platform limits cap redirects in vercel.json at 1,024 entries. next.config.js redirects compile into the same routing manifest and hit the same ceiling. We tried it anyway, because we always try the lazy thing first. The deploy failed with a clear error and saved us a week of fighting an architecture that wasn't going to scale.

Warning

If your redirect file is past four digits, stop. The right place for the table is a lookup store the edge can read in under 5ms. Anything else either fails to deploy or pushes the lookup into the origin, where it kills your cache hit ratio.

We moved the table to Vercel KV. The shape was deliberately boring: one key per source path, value is the destination plus the status code. Total stored size with 9,304 entries was around 1.4 MB. We seed it from a CSV at deploy time, idempotently.

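// scripts/load-redirects.ts
import { kv } from '@vercel/kv'
import { parse } from 'csv-parse/sync'
import fs from 'node:fs'

// columns: true expects a header row: source,target,code
const rows: { source: string; target: string; code: string }[] =
  parse(fs.readFileSync('redirects.csv'), { columns: true })

// Store keys in decoded form so they match the decoded request path in middleware
const decode = (p: string) => {
  try { return decodeURIComponent(p) } catch { return p }
}

const pipeline = kv.pipeline()
for (const r of rows) {
  // SET overwrites existing keys, so rerunning the script is idempotent
  pipeline.set(`r:${decode(r.source)}`, JSON.stringify({ to: r.target, code: Number(r.code) }))
}
await pipeline.exec()
console.log(`loaded ${rows.length} rules`)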
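
Should a single pipeline of this size be rejected or time out, batching is a small change. The sketch below is an assumption about how that could be done, not part of the original loader: chunk and loadInBatches are hypothetical helpers, and the write callback stands in for building one kv.pipeline() per batch and awaiting its exec().

```typescript
// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical writer signature: receives one batch and persists it.
type WriteBatch<T> = (batch: T[]) => Promise<void>;

// Write batches one after another so a failure stops at a known offset.
async function loadInBatches<T>(
  rows: T[],
  size: number,
  write: WriteBatch<T>,
): Promise<number> {
  let written = 0;
  for (const batch of chunk(rows, size)) {
    await write(batch);
    written += batch.length;
  }
  return written;
}

console.log(chunk([1, 2, 3, 4, 5], 2)); // [[1, 2], [3, 4], [5]]
```

Because every write is a plain overwrite, a run that fails halfway can simply be restarted from the top.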

The middleware that does the work

The redirect engine is one file. A decade of the publisher's institutional memory lives behind this one lookup.

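// middleware.ts
import { NextResponse, type NextRequest } from 'next/server'
import { kv } from '@vercel/kv'

export const config = {
  // Skip Next.js internals, API routes, and well-known static files
  matcher: '/((?!_next/|api/|favicon.ico|robots.txt).*)',
}

export async function middleware(req: NextRequest) {
  // Decode so the path matches keys stored in decoded form
  let path = req.nextUrl.pathname
  try {
    path = decodeURIComponent(path)
  } catch {
    // decodeURIComponent throws URIError on a malformed escape such as a
    // lone "%"; fall back to the raw path instead of failing the request
  }
  const hit = await kv.get<{ to: string; code: 301 | 302 }>(`r:${path}`)
  if (!hit) return NextResponse.next()

  // Relative targets are resolved against the incoming request URL
  const url = hit.to.startsWith('http')
    ? hit.to
    : new URL(hit.to, req.url).toString()

  return NextResponse.redirect(url, hit.code)
}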

The decodeURIComponent call is what saved us seventeen broken rules at go-live. WordPress URLs from 2014 routinely contained encoded spaces and Dutch diacritics (vakblad-w%C3%A9rk, %20 in slugs from a 2016 import). The two sides of the lookup disagreed on form: req.nextUrl.pathname reaches middleware still percent-encoded, while the keys for those rules held the decoded characters. The match never fired. Decoding the path before the lookup fixed it. The same normalization has to be applied to any rule source that is still percent-encoded when the table is loaded, so that both sides always end up in one form.
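
One way to keep both sides in the same form is to route the loader and the middleware through a single key builder. The sketch below assumes the decoded form is the canonical one. normalizePath and redirectKey are hypothetical names.

```typescript
// Decode a path once, falling back to the raw input when it contains a
// malformed escape (decodeURIComponent throws URIError in that case).
function normalizePath(path: string): string {
  try {
    return decodeURIComponent(path);
  } catch {
    return path;
  }
}

// Shared by the loader (for CSV sources) and the middleware (for request
// paths) so an encoded and a decoded spelling map to the same key.
function redirectKey(path: string): string {
  return `r:${normalizePath(path)}`;
}

console.log(redirectKey("/vakblad-w%C3%A9rk")); // "r:/vakblad-wérk"
console.log(redirectKey("/vakblad-wérk")); // "r:/vakblad-wérk"
console.log(redirectKey("/100%-korting")); // malformed escape, kept as-is
```

One caveat: decodeURIComponent also turns %2F into a slash, so a rule whose source relies on an encoded slash would need separate handling.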

Migrating content idempotently

We wrote one import script per content type, not one big script. Each was idempotent on legacyId. You could rerun it during the 4 AM dress rehearsal and not produce duplicates.

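// scripts/import-articles.ts
import { createClient } from '@sanity/client'
import wpPosts from '../exports/all-articles.json'
import { portableTextFromHtml } from './lib/html-to-pt'

// Credentials come from the environment; the variable names are illustrative
const sanity = createClient({
  projectId: process.env.SANITY_PROJECT_ID,
  dataset: process.env.SANITY_DATASET,
  token: process.env.SANITY_WRITE_TOKEN,
  apiVersion: '2024-01-01',
  useCdn: false,
})

for (const post of wpPosts) {
  // createOrReplace with a deterministic _id makes reruns idempotent
  await sanity.createOrReplace({
    // Post IDs are only unique per subsite, so the blog ID is part of the _id;
    // without it, articles from different subsites would overwrite each other
    _id: `article-${post.blog_id}-${post.id}`,
    _type: 'article',
    legacyId: post.id,
    site: { _type: 'reference', _ref: `site-${post.blog_id}` },
    slug: { current: post.slug },
    title: post.title.rendered,
    body: portableTextFromHtml(post.content.rendered),
    publishedAt: post.date_gmt,
  })
}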

The hairy part was portableTextFromHtml. Twelve years of editorial output meant twelve years of inline styles, deprecated shortcodes, Gutenberg blocks, classic-editor HTML, Visual Composer remnants, and the occasional <font> tag. We ran three passes: shortcode expansion, block normalization, then HTML to Portable Text via the official @sanity/block-tools package. Roughly 4% of articles were flagged for manual review. The editorial team cleared them in two afternoons.
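
The article does not say how articles were flagged, so the following is only a sketch of what such a check could look like after the first two passes. The three heuristics and the names REVIEW_CHECKS and reviewReasons are assumptions.

```typescript
// Hypothetical heuristics for markup that should not survive normalization.
const REVIEW_CHECKS: { reason: string; pattern: RegExp }[] = [
  // A leftover shortcode such as [gallery ids="1,2"] or [/caption]
  { reason: "unexpanded shortcode", pattern: /\[\/?[a-z][a-z0-9_-]*(\s[^\]]*)?\]/i },
  // Presentational markup with no Portable Text equivalent
  { reason: "font tag", pattern: /<font[\s>]/i },
  { reason: "inline style", pattern: /\sstyle\s*=\s*["']/i },
];

// Return every reason an article's HTML should go to a human editor.
function reviewReasons(html: string): string[] {
  return REVIEW_CHECKS.filter((check) => check.pattern.test(html)).map(
    (check) => check.reason,
  );
}

console.log(reviewReasons('<p>[gallery ids="1,2"]</p>')); // ["unexpanded shortcode"]
console.log(reviewReasons("<p>Schone tekst</p>")); // []
```

The shortcode pattern will also match bracketed words in ordinary copy, such as [sic]. For a review queue that is an acceptable false positive: it costs an editor a glance rather than corrupting an article.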

The Yoast metadata that nobody wanted to rebuild

The other thing the SEO lead had tuned for a decade was per-post Yoast metadata: focus keywords, canonical overrides, OpenGraph titles that differed from the H1 because the H1 was for readers and the OG title was for the LinkedIn share. Twelve years of that, across roughly 38,000 articles, lived in wp_postmeta rows with keys like _yoast_wpseo_metadesc and _yoast_wpseo_canonical.

We pulled it in one query per site, normalized the keys, and folded the result into a Sanity object on each article:

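-- One row per post, one column per Yoast field (run once per wp_N_postmeta table)
SELECT post_id,
       MAX(CASE WHEN meta_key='_yoast_wpseo_metadesc'        THEN meta_value END) AS metadesc,
       MAX(CASE WHEN meta_key='_yoast_wpseo_canonical'       THEN meta_value END) AS canonical,
       -- Yoast stores the OpenGraph title under its own key; _yoast_wpseo_title is the SEO title
       MAX(CASE WHEN meta_key='_yoast_wpseo_opengraph-title' THEN meta_value END) AS og_title,
       MAX(CASE WHEN meta_key='_yoast_wpseo_focuskw'         THEN meta_value END) AS focuskw
FROM wp_2_postmeta
-- Underscore is a single-character wildcard in LIKE, so escape it
WHERE meta_key LIKE '\_yoast\_wpseo\_%'
GROUP BY post_id;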

Two surprises came out of that data. About 800 articles had a manual canonical pointing at a competing publication's URL: the SEO lead had been deliberately consolidating ranking signals after a 2021 acquisition, and those pointers were doing real work. We preserved every one of them, verbatim. Second, the OG titles ran systematically eight to twelve characters longer than the H1s. That was the SEO lead testing share copy independently of page copy. The new schema kept the two fields separate so the editors could keep doing it without a code change.
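
Folding a query row into the per-article object can be sketched as follows. The seo object's field names are hypothetical, since the real schema is not shown. The point of the sketch is that values pass through unchanged and empty ones are left out rather than stored as blanks.

```typescript
// One row of the pivot query; columns with no matching meta row are null.
type YoastRow = {
  post_id: number;
  metadesc: string | null;
  canonical: string | null;
  og_title: string | null;
  focuskw: string | null;
};

// Hypothetical field names for the `seo` object on an article document.
type SeoObject = {
  _type: "seo";
  metaDescription?: string;
  canonicalOverride?: string;
  ogTitle?: string;
  focusKeyword?: string;
};

// Copy values verbatim; null and empty strings are omitted entirely.
function toSeoObject(row: YoastRow): SeoObject {
  const seo: SeoObject = { _type: "seo" };
  if (row.metadesc) seo.metaDescription = row.metadesc;
  if (row.canonical) seo.canonicalOverride = row.canonical;
  if (row.og_title) seo.ogTitle = row.og_title;
  if (row.focuskw) seo.focusKeyword = row.focuskw;
  return seo;
}

const seo = toSeoObject({
  post_id: 4821,
  metadesc: "Korte samenvatting",
  canonical: "https://example.nl/artikel",
  og_title: null,
  focuskw: "",
});
console.log(seo);
```

Keeping ogTitle separate from the article's title field is what lets share copy and page copy diverge without a code change.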

The sitemap was the other moving piece. Yoast generated one per subsite at /sitemap_index.xml with chained sub-sitemaps. Search Console properties at the publisher were configured against those exact URLs, and rewiring them meant a verification dance with 47 separate accounts that nobody had time for. We replicated the URL shape in Next.js with a route handler that streams a sitemap index per tenant, paginated at 50,000 URLs per sub-sitemap to stay under the sitemap protocol ceiling. Search Console never noticed the backend changed.
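
The pagination arithmetic and the index document can be sketched as below. The /sitemap-N.xml child naming is a placeholder, not the URL shape used on the project, and the route handler that would serve the string is omitted.

```typescript
const MAX_URLS_PER_SITEMAP = 50_000; // per-file limit in the sitemap protocol

// Number of sub-sitemaps needed; an empty site still gets one file.
function subSitemapCount(totalUrls: number): number {
  return Math.max(1, Math.ceil(totalUrls / MAX_URLS_PER_SITEMAP));
}

// Escape the characters that are not allowed raw inside XML text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render a sitemap index for one tenant.
function renderSitemapIndex(origin: string, totalUrls: number): string {
  const entries: string[] = [];
  for (let page = 1; page <= subSitemapCount(totalUrls); page++) {
    const loc = escapeXml(`${origin}/sitemap-${page}.xml`);
    entries.push(`  <sitemap><loc>${loc}</loc></sitemap>`);
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</sitemapindex>",
  ].join("\n");
}

console.log(renderSitemapIndex("https://vakblad-x.publisher.nl", 120_000));
```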

The dress rehearsal that mattered

We ran the full migration twice on a staging stack before touching DNS. The second time, the SEO lead sat with us and we replayed her 9,304 redirects through a crawler.

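# verify.sh
# Assumes redirects.csv has a header row (skipped here) and no commas inside fields.
# %{redirect_url} is always absolute, so expected targets must be absolute too.
tail -n +2 redirects.csv | while IFS=, read -r source expected_target expected_code; do
  # curl does not follow redirects by default, so this reports the first hop only
  out=$(curl -s -o /dev/null -w "%{http_code},%{redirect_url}" \
    "https://staging.example.nl${source}")
  actual_code="${out%%,*}"
  actual_target="${out#*,}"
  if [ "$actual_code" != "$expected_code" ] || \
     [ "$actual_target" != "$expected_target" ]; then
    echo "MISS: $source ($expected_code -> $expected_target) got $actual_code -> $actual_target"
  fi
done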

First pass: 9,221 hits, 83 misses. We worked through the misses with the SEO lead at the same desk. About half were genuine bugs in our import (the URL-decoding issue lived there). The other half were redirects that had been broken in production for years and nobody had noticed. She kept eight of those alive deliberately, retired the rest, and signed the spreadsheet.

Takeaway

A redirect graph is institutional knowledge. The decade of decisions a good SEO lead has made about how traffic flows through a publication is not a config file you replace. It is a graph you preserve, and the engineer's job is to host it without losing entries.

Cutover and the first hour

We cut over on a Tuesday at 06:00 Amsterdam time. DNS TTL was dropped to 60 seconds the week before. The lookup table was pre-warmed in KV across three regions. The legacy WordPress stack stayed live on a subdomain for forty-eight hours behind basic auth, in case we needed to compare anything.

Hour one: 47 sites serving, redirect hit rate 0.31 per request (about what the spreadsheet predicted), no 5xx, p95 middleware latency 11ms cold and 4ms warm. Hour six: the SEO lead sent back the results of the same crawl, this time pointed at production. 9,304 of 9,304 rules firing as expected. She kept the spreadsheet pinned in her dock.

What we'd do differently

One thing. We would have written the verifier first, before any import code. The reason we lost a day to URL-decoding was that we discovered the issue on the second dress rehearsal instead of on the first import. A red/green test harness against the legacy site, written before any migration code runs, would have caught it on day two instead of day twelve.
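
A red/green check of that kind can be small. The sketch below is a hypothetical TypeScript counterpart to the shell verifier, assuming Node 18 or later for the global fetch. The type and function names are illustrative.

```typescript
type ExpectedRule = { source: string; target: string; code: number };
type Observed = { status: number; location: string | null };

// Compare one observed response with its rule. Returns null on a pass and
// a human-readable message on a miss.
function diffRedirect(
  rule: ExpectedRule,
  seen: Observed,
  origin: string,
): string | null {
  if (seen.status !== rule.code) {
    return `${rule.source}: expected ${rule.code}, got ${seen.status}`;
  }
  if (seen.location === null) return `${rule.source}: no Location header`;
  // Resolve both sides so relative and absolute targets compare equal.
  const want = new URL(rule.target, origin).href;
  const got = new URL(seen.location, origin).href;
  return want === got ? null : `${rule.source}: expected ${want}, got ${got}`;
}

// Fetch a source path without following the redirect.
async function observe(origin: string, source: string): Promise<Observed> {
  const res = await fetch(new URL(source, origin), { redirect: "manual" });
  return { status: res.status, location: res.headers.get("location") };
}

const sample = diffRedirect(
  { source: "/oud", target: "/nieuw", code: 301 },
  { status: 301, location: "https://staging.example.nl/nieuw" },
  "https://staging.example.nl",
);
console.log(sample === null ? "PASS" : sample); // "PASS"
```

Pointing observe at the legacy site first gives a baseline of what production actually does, which is how rules that have been broken for years show up before the migration rather than during it.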

The other lesson is older than the stack. The kickoff meeting on a project like this always contains the suggestion to "simplify" the redirect graph as part of the move. The SEO lead pushed back hard and she was right. Ten years of her work is quietly earning revenue in the background, and most of it is not legible to anyone who didn't watch it accrete. There has been a quiet drumbeat the last few months about which jobs an LLM can do and which it can't. A redirect graph is the unflattering answer. You can generate boilerplate at scale and you still cannot replace the person who knows that a 302 from 2017 keeps a particular reseller's bookmark alive.

When we built the headless stack for this Dutch trade publisher, the thing we kept coming back to was that the migration's job was not to be clever. It was to move ten years of value across a stack boundary without dropping any of it. We do this kind of legacy migration often enough that the playbook above is now the default order of operations.

If you're staring down a project like this, the five-minute audit is this: open your redirect store, count the rows, and if the count is past 1,024, stop pretending the platform's built-in redirect config is going to hold. Move the table out, put a middleware lookup in front of it, and write the verifier before you write the importer.

Key takeaway

A ten-year-old redirect graph is institutional knowledge, not a config file. Move it to an edge KV lookup and the migration becomes safe to ship.

FAQ

Can we keep WordPress redirects in next.config.js or vercel.json?

Only if you have no more than 1,024 of them. Vercel's routing manifest caps at 1,024 redirects across both files. Past that, move the table to an edge KV and look it up in middleware.

Should each subsite get its own Sanity dataset?

Usually no. One dataset with a site reference on every document gives you cross-site editorial moves, shared authors, and one webhook to maintain. Use separate datasets only when isolation is a hard requirement.

How long does a 47-site multisite migration take end-to-end?

Six to ten weeks in our experience, dominated by HTML-to-Portable-Text cleanup and dress rehearsals, not by code. Budget at least two full dress rehearsals before DNS cutover.

What's the rollback plan if the cutover goes sideways?

Keep the legacy WordPress stack live on a subdomain behind basic auth for at least 48 hours. With a 60-second DNS TTL set a week in advance, you can flip back inside two minutes.

How do we handle URL-encoded characters in legacy redirects?

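Normalize both sides of the lookup to one form. WordPress URLs from the early 2010s commonly contain encoded spaces and diacritics, and req.nextUrl.pathname reaches Next.js middleware still percent-encoded. Store the keys decoded, decode the request path before looking it up, and guard the decode against malformed escapes.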

Tags: wordpress, migration, legacy sites, seo, architecture, case study
