← Blog

WordPress

WordPress migration war story: the double-escaped OE-array

Day nine of a Remix migration. The dealer portal works, the catalogue loads, but 380 garagebedrijven are seeing the wrong discount tier. The data is technically there.

Jacob Molkenboer· Founder · A Brand New Company· 19 Mar 2026· 9 min
Open leather ledger with green index tab, brass key on manila card, red wax seal on ivory desk in side light.

Day nine on the dealer-portaal migration

The wholesaler sits on an industrial estate east of Arnhem. Twenty-one people, fourteen million euro of auto parts moving through the warehouse every year, a dealer portal that 380 garagebedrijven log into when they need a brake disc by Thursday morning. The portal had been running on WordPress 4.9, WooCommerce 3.5, and a sheaf of custom PHP 7.2 since 2018. We were nine days into porting it onto Remix and Postgres.

By "nine days into" I mean we were stuck. The catalogue rendered. Login worked. SKUs resolved. But the per-merk discount tiers — the kortingsstaffel that every garage relies on to know whether a Bosch alternator costs €212 or €189 — were quietly wrong. Not for everyone. For 380 of the 412 dealer accounts.

The query in the new system was correct. The Postgres rows it returned were correct. The problem was upstream of us, sitting in the old database, in a single column we had been reading as gospel for nine days.

For eight of those nine days we were sure the bug was ours. We re-read the Remix loader twice. We re-ran the discount-tier function against a fixture of twelve known-good dealers — it passed. We dumped the Postgres rows for one of the affected garages and compared them byte-for-byte against a working dealer's rows; they were identical. We even rebuilt the cache layer from scratch on the theory that a stale Redis key was poisoning the lookup. None of it touched the problem, because the problem was three layers upstream of anything we had written.

What was actually in wp_postmeta

Each of the 14,200 onderdelen carried a meta_key called _oe_numbers. Original Equipment numbers — the manufacturer codes a garage searches by when they want the BMW-equivalent of a generic part. The PHP that fed the kortingsstaffel walked that array, matched the brand prefix, then looked up the discount tier for the garage.

Here is what one row looked like, copy-pasted out of phpMyAdmin:

a:3:{i:0;s:11:\"BMW-1234567\";i:1;s:11:\"VAG-8901234\";i:2;s:11:\"MB-A0005678\";}

Look at the backslashes. They are not supposed to be there. A correctly serialized PHP array writes its strings with bare double quotes:

a:3:{i:0;s:11:"BMW-1234567";i:1;s:11:"VAG-8901234";i:2;s:11:"MB-A0005678";}

The blob in production had been double-escaped. Once when an import plugin in 2018 wrote it to the database through a $wpdb->prepare() call that already escapes. Then a second time when someone — possibly a phpMyAdmin export from a long-forgotten backup — round-tripped the column through another escape layer. The result: every " became \", the byte counts in the s:11: prefixes were now lies, and PHP's unserialize() quietly returned false.

Why unserialize silently returned false

PHP's serialization format is byte-counted. s:11:"BMW-1234567" means "string, eleven bytes, then BMW-1234567". If you change the payload to \"BMW-1234567\" the parser now sees thirteen bytes between the quotes, not eleven. The length header lies. unserialize hands you back false and a notice you only see if your error logging is turned up to E_NOTICE.

WooCommerce 3.5 swallowed that false. get_post_meta() on a double-escaped blob returned an empty array, the kortingsstaffel code defaulted to "no OE match, fall back to base price", and 380 garages had been getting the wrong tier for years. Quietly. Nobody had complained, because the prices were close enough to right that the garages assumed the staffel had been updated.

The fallback was a single line in a helper function we eventually grepped out of the theme:

$oe   = get_post_meta($product_id, '_oe_numbers', true) ?: [];
$tier = $this->match_brand_prefix($oe, $dealer) ?? 'base';

?: [] and ?? 'base'. Two null-coalescing safety nets that, together, made an empty OE array indistinguishable from a missing brand match. We had inherited the same pattern in our Remix code and stripped it during the rewrite because we wanted explicit errors on missing OE arrays. The strictness handed us the bug the old code had been hiding.

The 2018 culprit

We pulled the plugin history for the site and found a CSV importer installed on 14 March 2018. It read exports from the wholesaler's old ERP and wrote them into wp_postmeta. It serialized the array in PHP, then passed the serialized string to $wpdb->insert().

That alone is fine. $wpdb escapes correctly. The bug was that the plugin called addslashes() on the already-serialized string before handing it to $wpdb, because the original author had been bitten once by an unescaped insert and had cargo-culted a defensive layer on top. $wpdb then added its own escaping. The double escape was baked into every row the importer wrote between 2018 and 2021, when the wholesaler stopped using the plugin and started uploading parts by hand.

14,200 rows. About 11,800 written by the importer, the rest by hand. The hand-written rows were fine. That is why 32 dealers had the correct kortingsstaffel and 380 did not.

Why wp-cli search-replace did not catch it

wp search-replace understands PHP serialization. It will walk through a serialized blob, find strings inside it, and update the byte counts when it replaces. What it does not do is tell you the blob is already malformed before it touches it. We had run search-replace four times during the migration to swap a staging domain. Each pass succeeded with zero warnings, because the parser inside wp-cli was unable to open the blob in the first place — it bailed out and left the bytes as-is. No error, no row count delta in the summary, nothing.

We also tried the Search Replace DB script the previous agency had left in the repo, and the Better Search Replace plugin from the admin. Same blind spot in both. They parse, they bail, they tell you nothing. The phpMyAdmin SQL preview rendered the bytes as text without parsing them, so the malformed rows looked indistinguishable from clean ones at a glance. Every off-the-shelf tool we reached for sat downstream of a working unserialize(), and unserialize() had been failing in silence since March 2018.

That is how we burned nine days. The data looked fine. The migration log was clean. The catalogue rendered. The first symptom was a phone call on day eight from a garage in Ede asking why his Bosch parts had quietly gone up eleven percent.

His name was Marco. He ran a three-bay garage that shifted maybe forty Bosch alternators a quarter, and he kept his own spreadsheet of most-ordered SKUs and the tier he was supposed to be on. The wholesaler's account manager forwarded the call to us at 16:40 on a Friday. By Monday morning we were pulling the importer source out of cold storage.

The repair, in three passes

Once we knew what we were looking at, the fix was mechanical. We wrote a one-off PHP CLI script that operated directly against a staging copy of the old MySQL database we could throw away. Three passes.

Pass one: detect every row whose meta_value was a double-escaped serialized blob. The signature is a leading a:N:{ followed by s:N:\" instead of s:N:". MySQL's LIKE treats backslash as its own escape character, so the pattern needs four backslashes to match one literal:

SELECT meta_id, post_id, meta_value
FROM wp_postmeta
WHERE meta_key = '_oe_numbers'
  AND meta_value LIKE 'a:%'
  AND meta_value LIKE '%s:%:\\\\"%';

That returned 11,842 rows. Close to our estimate.

Pass two: for each row, strip the escaping and re-serialize. We did not try to clean the blob in place with a regex on the byte counts — too easy to miss an edge case where the OE number itself contained a quote. Instead we used stripslashes() to remove one layer, fed the result back through unserialize(), and re-serialized the resulting array cleanly.

<?php
require __DIR__ . '/wp-load.php';

global $wpdb;

$rows = $wpdb->get_results(
    "SELECT meta_id, meta_value
     FROM {$wpdb->postmeta}
     WHERE meta_key = '_oe_numbers'
       AND meta_value LIKE 'a:%'"
);

$fixed   = 0;
$skipped = [];

foreach ($rows as $row) {
    $stripped = stripslashes($row->meta_value);
    $decoded  = @unserialize($stripped);

    if (!is_array($decoded)) {
        $skipped[] = $row->meta_id;
        continue;
    }

    $clean = serialize(array_values($decoded));

    $wpdb->update(
        $wpdb->postmeta,
        ['meta_value' => $clean],
        ['meta_id'    => $row->meta_id]
    );

    $fixed++;
}

printf("Repaired %d rows. Skipped %d.\n", $fixed, count($skipped));
file_put_contents(__DIR__ . '/skipped.txt', implode("\n", $skipped));

Pass three: hand-review the skipped rows. Thirty-seven rows had been double-escaped twice — a triple-escape — because they had also been touched by a 2019 migration script that the previous agency had run and then forgotten to log. One stripslashes() was not enough. We re-ran the script on those 37 rows with a second pass, and all of them came back clean.

Putting the kortingsstaffel back together

With wp_postmeta sane, we re-ran the export-to-Postgres pipeline. The parts catalogue now had the correct OE arrays. The kortingsstaffel logic in Remix could match a brand prefix to a discount tier without hitting an empty array.

We then did the thing we should have done on day one: a full reconciliation against the wholesaler's ERP. Of the 380 affected garages, 376 ended up with prices that matched the ERP exactly. Four had bespoke discount agreements that had been stored in a Google Sheet by the sales team, never in the portal. We pulled those four into a JSON file and loaded them on day eleven.

On day twelve we shipped the new portal behind a feature flag, ten percent of traffic at first, and ran a parallel pricing job that compared every quote against the ERP every hour. The reconciliation surfaced two more bespoke discount agreements over the next fortnight, both of which the sales team had also kept in private spreadsheets. We added a bespoke_override column to the Postgres schema and wired the sales team's CRM into it, so the next time someone agreed a kortingsstaffel over a coffee, it ended up in one place.

Takeaway

A WordPress migration is never just a migration. It is the first time anyone has read the data strictly in years. Treat the old database as a witness, not a source of truth.

What we now do on day one

Every WordPress migration we take on now begins with a single command before we touch a line of new code. We dump every distinct meta_key that contains serialized data, pick a sample row per key, and attempt to unserialize() it. If a single key consistently returns false, we stop and audit. If we cannot audit in a day, we tell the client the migration is going to take longer and explain why, before they have booked a launch date and told their dealers.

<?php
// audit-meta.php — run with: wp eval-file audit-meta.php
global $wpdb;

$keys = $wpdb->get_col(
    "SELECT DISTINCT meta_key
     FROM {$wpdb->postmeta}
     WHERE meta_value LIKE 'a:%'
        OR meta_value LIKE 'O:%'"
);

$bad = [];

foreach ($keys as $key) {
    $sample = $wpdb->get_var($wpdb->prepare(
        "SELECT meta_value FROM {$wpdb->postmeta}
         WHERE meta_key = %s LIMIT 1",
        $key
    ));

    if (@unserialize($sample) === false && $sample !== 'b:0;') {
        $bad[] = $key;
    }
}

print_r($bad);

The audit goes further than serialized strings. We also check: every meta_key whose values start with O: (serialized objects — much worse, because the class definition may no longer exist on the new stack). Every option in wp_options over 1MB (almost always a runaway autoload value). Every cron job that has not fired in 180 days. Every user with a role that is not in wp_user_roles. None of these have anything to do with the rewrite. All of them will burn a day if you find them on day eight instead of day one.

Twenty lines of PHP. It would have saved us nine days.

When we rebuilt this portal for the Arnhem wholesaler, the thing we ran into was exactly that quiet, structural lie in the old database — and the old PHP had been politely ignoring it for half a decade. We ended up solving it by treating the migration as a forensic exercise first and a build second, the playbook that now anchors every legacy migration we ship.

If you are about to move off WordPress, run that unserialize() sweep this afternoon. You will know within an hour whether your migration has a nine-day hole in it.

Key takeaway

A WordPress migration is the first time anyone reads your old data strictly in years. Audit your serialized meta before you write a line of new code.

FAQ

Why does PHP's unserialize() return false on double-escaped data?

PHP serializes strings with a byte-count prefix like s:11:"BMW-1234567". When backslashes are added the actual byte count grows, but the prefix does not. PHP detects the mismatch and bails with false.

Will wp-cli search-replace catch a double-escaped serialized blob?

No. It tries to parse the blob first. If parsing fails it leaves the row untouched and reports nothing — no warning, no row-count delta. You have to detect the corruption with your own sweep.

How do you find double-escaped meta in a WordPress database?

Select every distinct meta_key whose values start with a: or O:, sample one row per key, and run unserialize() on it. Any sample that returns false on a value that looks serialized is a repair candidate.

Can you fix double-escaped serialized data with a regex?

Risky. If the strings inside contain quotes the byte-count rewrite goes wrong. The safer route is stripslashes() to peel one escape layer, unserialize() to validate, then serialize() again to write a clean blob.

wordpressmigrationlegacy sitesphpmysqlcase study

Building something?

Start a project