WordPress
WordPress to Astro: nine days lost to Yoast redirects
Nine days into a WordPress 6.4 to Astro and Payload migration, 4,200 long-tail URLs were returning 404. The Yoast redirect manager was lying about its own data.

It was 11:47 on a Tuesday morning in Den Haag. The marketing lead at a 26-person Dutch publisher refreshed Google Search Console for the fifth time that hour. Organic clicks had been sliding since the relaunch on the Saturday night before, ten days earlier. Not gently. By Tuesday morning, 4,200 pages were returning 404, and the head of editorial was on her way down the stairs with a printout in her hand.
They had migrated from WordPress 6.4 to Astro on the front and Payload CMS on the back. The cutover went clean. All 12,600 article URLs resolved on the new domain. The team had been celebrating on Sunday. By Tuesday they were nine days into a redirect debugging spiral, with no clear cause and a CFO asking when SEO would come back.
We were brought in on day nine. What follows is the story of how the bug was found, and why no automated migration script in the world would have caught it.
The migration plan that looked clean
The publisher ran a long-tail content business. Twelve and a half thousand articles, most more than five years old, plenty of them ranking on niche queries that brought in real ad revenue. The new stack was a sensible move. Astro on the edge for speed, Payload as the editorial CMS, with a Postgres back end and Vercel hosting. The old WordPress had become a maintenance burden: a database that had been growing since 2014, a stack of plugins from three different agencies, and a yearly hosting bill that climbed every renewal.
The migration plan included one piece that everyone agreed was the most important deliverable: the 301 redirect map. Twelve thousand six hundred old ?p=-style and /2019/category/slug-style URLs needed to land on the new permalink structure. On top of that, the publisher had spent five years building wildcard rules in the Yoast SEO Premium redirect manager. Patterns like /oude-rubriek/* rewriting to /archief/oude-rubriek/$1. Roughly 240 wildcard rules in total.
The internal dev who ran the migration did the obvious thing. He ran the WP-CLI export, transformed the redirect map into a JSON file, fed it into a Vercel vercel.json redirect block, and shipped it.
The first sign something was off
Saturday and Sunday: organic search traffic looked normal-ish. Search Console takes a few days to catch up. By Monday afternoon, the 404 count had climbed to 1,100. By Tuesday morning, 4,200.
The pattern in the 404 list was the interesting bit. Every URL came from articles published before May 2019. Newer articles redirected fine. Older articles either redirected to a literal $1 string (the unescaped wildcard placeholder) or to nothing at all.
The dev had checked the redirect map. It looked complete: 12,600 entries plus 240 wildcards, all present, all valid. He had spent three days re-running the export, comparing diff hashes, even rebuilding the WordPress staging environment from a fresh backup. The export was deterministic. The output looked sane.
It was not sane.
The smoking gun in the SQL
When we walked in on day nine, the first thing we did was open the raw serialized PHP blob that held the wildcard rules. Not the JSON output the dev had been staring at. The actual option_value from the database. We pulled it with a one-liner.
wp db query \
"SELECT option_value FROM wp_options \
WHERE option_name = 'wpseo-premium-redirects-base'" \
--skip-column-names > redirects-raw.txt
wc -c redirects-raw.txtThe file was 65,535 bytes. Exactly. That number is a flare.
If your dump of a serialized PHP option is 65,535 bytes, it is not your data. It is the column type. MySQL's TEXT caps at 216-1 bytes (65,535). Real WordPress option data lives in LONGTEXT (4 GiB). When the two get mixed up, the smaller one wins and the truncation is silent.
We unserialized the blob in a PHP REPL. unserialize() returned false. The error pointed at a length mismatch around byte 65,400: a serialized string had been declared as s:412:"..." but the string itself was cut off mid-character. The closing brace of the outer array never arrived. PHP saw the corruption, threw up its hands, and returned false.
WordPress's own get_option() returned the same broken blob to anything reading downstream, so the redirect manager UI silently ignored every wildcard rule defined after the cutoff. The first 8,400 plain redirects survived because they sat earlier in the serialized array. The 240 wildcards came at the end of the structure, and the long-tail pre-2019 archive rules came last of all. Those were the 4,200 pages that 404'd.
Why the option API hid it
This is the part that took us six hours to reverse-engineer, because the bug was older than anyone on the current dev team.
In 2019, a previous agency had bolted on a custom plugin that synced the Yoast redirect data into a reporting table called wp_seo_redirects_sync. They built it because the marketing team wanted to query redirect performance against Matomo data, and joining against a serialized blob inside wp_options is hostile. So they made a flat table. They declared the rule_payload column as TEXT.
For three years, the table held everything. The serialized blob was somewhere around 38 kilobytes. Plenty of headroom. Then sometime in 2022, the editorial team imported a large batch of legacy URLs from a sister title, and the blob grew past 64 KB. The next sync truncated. The reporting table held the truncated copy. Nobody noticed, because nobody read the reporting table directly. The redirect manager UI kept reading from wp_options, where the data was intact.
Here is the part that bit them. The internal dev's migration script, the one he wrote in 2026 to move everything to Astro, sourced the redirect map from wp_seo_redirects_sync, not from wp_options. Why? Because the reporting table already had columns normalized. Source, destination, rule type, regex. It was the obvious data source. It was also three years stale and silently broken.
The wp_options blob was fine. The data was always there. The export pipeline was the bug.
The repair, in four moves
We did not have time to redo the entire migration. The CFO wanted SEO clicks back inside the week. We did this.
One. We extracted the canonical redirect map directly from wp_options, bypassing the broken sync table entirely.
wp eval '
$raw = get_option("wpseo-premium-redirects-base");
$rules = is_array($raw) ? $raw : [];
echo json_encode($rules, JSON_PRETTY_PRINT);
' > redirects-canonical.jsonTwo. We validated every rule in the file. Any rule whose source URL did not resolve to a 200 on the live WordPress (still running on a staging domain) was flagged. The flagged set told us what was real and what was a ghost from years of plugin churn.
// validate-redirects.mjs
import fs from "node:fs/promises";
const rules = JSON.parse(await fs.readFile("redirects-canonical.json", "utf8"));
const base = "https://staging.publisher.nl";
const results = [];
for (const r of rules) {
const res = await fetch(base + r.origin, { redirect: "manual" });
results.push({ ...r, livestatus: res.status });
}
await fs.writeFile("redirects-verified.json", JSON.stringify(results, null, 2));
const live = results.filter(r => r.livestatus === 200).length;
console.log(`${results.length} rules; ${live} live`);Three. We pulled twelve months of access logs from the old hosting provider, grouped by URL, and surfaced the top 25,000 paths by request count. Anything in that list that did not appear in redirects-verified.json got a manual mapping. This is the unglamorous step that recovers tail traffic. There is no shortcut.
Four. We split the redirect map between Vercel's edge config (the hot 5,000 by traffic) and a Postgres-backed lookup in Payload (the long tail). Edge config has its own size cap. Pretending it does not is how migrations like this one stall.
What we changed in our migration playbook
Three concrete habits came out of this job. We use them on every CMS migration now.
Check the column type, not the row count
Before exporting any serialized blob from a WordPress database, we run a one-line check on the column definition.
wp db query "SHOW COLUMNS FROM wp_options LIKE 'option_value'"If the answer is not longtext, we stop and dig. We do the same on any custom table that holds plugin state. The MySQL storage requirements table is worth bookmarking. TINYTEXT caps at 255 bytes, TEXT at 64 KiB, MEDIUMTEXT at 16 MiB, LONGTEXT at 4 GiB. Sites that have lived through many maintainers tend to have at least one mistaken type, somewhere.
Validate the serialization, not the byte count
A serialized PHP blob has a fixed grammar. You can parse it strictly. If unserialize() returns false, there is nothing else to discuss. The data is broken. We now run unserialize on every option larger than 8 KiB during the audit phase, before any migration code runs.
Verify the canonical chain end to end, not just the count
The internal dev had counted 12,600 entries in his output file. The count was right. The contents were wrong. We now sample 200 random URLs from twelve months of access logs and cURL them against the new site. If any return anything other than 200 or a single 301, the migration is not done. Counting rows is not testing.
The cost of the stall
The publisher lost nine days of organic traffic on 4,200 ranking pages. Search Console showed impressions on the affected set collapsing within forty-eight hours of the cutover. Most of the traffic came back within four weeks of the fix going live. The longest-tail pages took roughly eight. Some queries never recovered all the way. Google reads a long 404 streak as a signal, and the signal lingers.
When we built the new redirect layer for this publisher, the thing we ran into was that no single export tool could be trusted to round-trip the full plugin state across three years of agency hand-offs. We ended up solving it by pulling the canonical data from two independent sources (the option API and the access logs) and reconciling them before a single redirect rule shipped. That is the kind of work we tend to do across legacy site migrations: not magic, just the careful kind of plumbing nobody likes paying for until they have lost nine days.
If you have a WordPress site you are about to move, run one command before anything else. Open a database client and ask: SHOW COLUMNS FROM wp_options LIKE 'option_value'. If it answers longtext, you are probably fine. If it answers anything else, stop. The next nine days of your project just got cheaper.
Key takeaway
If a serialized WordPress option dump is exactly 65,535 bytes, the bug is the column type, not the data. Real option data lives in LONGTEXT.
FAQ
Why was the Yoast redirect blob exactly 65,535 bytes?
Because somewhere in the export chain it was stored in a MySQL TEXT column, which caps at 2^16-1 bytes. The data was complete in wp_options (LONGTEXT) but truncated in a custom sync table that used TEXT.
How do I know which WordPress columns are LONGTEXT versus TEXT?
Run wp db query "SHOW COLUMNS FROM wp_options LIKE 'option_value'" or check any custom plugin table with SHOW CREATE TABLE. If a column holding serialized data is not LONGTEXT, treat it as suspect.
Can I migrate WordPress redirects to Astro without Yoast Premium?
Yes. The redirect rules live in wp_options as a serialized array. Pull them with wp eval, convert to JSON, and load them into Vercel edge config or your CMS of choice. Yoast is convenient but not required.
What is the safest way to verify a migration before going live?
Sample a few hundred URLs from twelve months of server access logs and curl them against the new site. Anything that returns more than a single 301, or returns 404, is a bug. Counting rows in the export file is not testing.