WooCommerce to Shopify migration: the Dokan serialization trap

A Tilburg textile wholesaler's WordPress-to-Shopify cutover stalled for ten days because WP All Export, working in 1,000-row chunks, quietly truncated the longest serialized values in a Dokan vendor table.

Jacob Molkenboer · Founder · A Brand New Company · 11 Jan 2026 · 11 min

The Slack message came in at 09:14 on a Tuesday in March: "Why are 620 vendors showing €0.00 for February?" The cutover was supposed to have been quiet. The new Shopify Plus storefront had been live for six hours. The first automated maandafrekening run, the monthly commission statement that 1,840 vendors of a Tilburg textile wholesaler watch their inbox for, had fired at 03:00 CET. A third of those vendors had received a PDF that said, in clean Helvetica, that they had earned exactly zero euros.

We were ten days into the migration. We thought we were done.

The site we inherited

The wholesaler — call them the client — had been running on WordPress 5.8 and WooCommerce 6 since 2019. They had grown into a marketplace shape almost by accident: mills in Turkey, India, and Portugal sold through their domain, and Dokan handled the vendor split. Twenty-five people in the Tilburg office processed returns, vendor disputes, and the kind of B2B paperwork that EU textile distribution requires.

The Dokan installation had been customised. The default commission model in Dokan is a flat percentage. The client had instead implemented a volume-tier system: a vendor doing under €5,000 a month paid 18%, €5,000 to €20,000 paid 16%, and so on through six brackets. That tier table lived nowhere in the WooCommerce admin and nowhere in the Dokan settings. It lived in one row per vendor in wp_usermeta, under a meta key the original developer had named dokan_commission_tiers, holding a PHP-serialized array.
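A minimal sketch of that volume-tier lookup, under stated assumptions: the field names `volume`, `rate`, and `currency` match the serialized sample shown later in this article, but the function name `rateForVolume` is ours, and we assume each tier's `volume` is the lower bound of its bracket. Only the four brackets visible in this article are listed; the remaining two are not invented here.

```javascript
// Hypothetical tier table: each `volume` is assumed to be the lower bound
// (EUR per month) of its bracket. Four of the six brackets are shown.
const tiers = [
  { volume: 0, rate: 0.18, currency: "EUR" },
  { volume: 5000, rate: 0.16, currency: "EUR" },
  { volume: 20000, rate: 0.14, currency: "EUR" },
  { volume: 50000, rate: 0.12, currency: "EUR" },
];

// Return the rate of the highest bracket whose lower bound the vendor reached.
function rateForVolume(tierTable, monthlyVolume) {
  if (!Array.isArray(tierTable) || tierTable.length === 0) {
    throw new Error("no tier table for this vendor");
  }
  // Sort a copy so the caller's table is left untouched.
  const sorted = [...tierTable].sort((a, b) => a.volume - b.volume);
  let rate = null;
  for (const tier of sorted) {
    if (monthlyVolume >= tier.volume) rate = tier.rate;
  }
  if (rate === null) throw new Error("volume below the lowest bracket");
  return rate;
}

console.log(rateForVolume(tiers, 4999)); // 0.18
console.log(rateForVolume(tiers, 5000)); // 0.16
```

The boundary follows the text: a vendor at exactly €5,000 lands in the 16% bracket.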

This is, to be clear, a perfectly normal thing for a WordPress plugin to do. It is also a thing that bites you the moment you try to leave WordPress.

The export that looked right

We had picked WP All Export for the vendor table because it handles WooCommerce custom fields better than the WP-CLI exporters and because the client's ops lead already knew the UI. We ran a sample of 100 vendor rows into CSV. We diffed the unserialized tiers against the live admin. Clean match. We ran 500. Clean match. We greenlit the full export.

The full export wrote a 47 MB CSV. The row count matched: 1,840. The headers matched. Every cell had data in it. Every cell. That is the part that took us the longest to notice.

Warning

A serialized PHP string truncated mid-array still contains text. A spot-check of a spreadsheet won't catch it. You need to unserialize() every row and assert the return is not false.

The truncation happened at WP All Export's chunk boundary. By default the tool processes records in 1,000-row chunks and writes each chunk to a temporary buffer before flushing to the CSV. For vendors whose serialized tier blobs exceeded the buffer's per-field cap, the trailing bytes were silently dropped. Vendors with six tier brackets and per-tier currency overrides made up the long tail of the table, and that was where the knife came down.

The CSV looked like this for an affected row:

user_id,dokan_commission_tiers
4421,"a:6:{i:0;a:3:{s:6:""volume"";i:0;s:4:""rate"";d:0.18;s:8:""currency"";s:3:""EUR"";}i:1;a:3:{s:6:""volume"";i:5000;s:4:""rate"";d:0.16;s:8:""currency"";s:3:""EUR"";}i:2;a:3:{s:6:""volume"";i:20000;s:4:""rate"";d:0.14;s:8:""currency"";s:3:""EUR"";}i:3;a:3:{s:6:""volume"";i:50000;s:4:""rate"";d:0.12;s:8:""currency"";s:3:""EU

You can see the cut. The closing braces are gone. The final "EUR" got chopped to "EU. The element count at a:6:{ still promises six tier entries, but only the first three are intact, and the last s:3: prefix promises three bytes where two remain. PHP, asked to unserialize that, returns false and emits an E_NOTICE (an E_WARNING from PHP 8.3 onward). The PHP manual documents that behaviour, and the consequence is blunt: any character that throws off the byte arithmetic invalidates the whole string.
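To make the byte arithmetic concrete, here is a rough sketch of a structural check in JavaScript. It is not a full unserialize: it returns no values, it handles only the N, b, i, d, s, and a types, and it treats anything else (objects, references) as a failure. The function names are ours. In PHP itself, checking that `unserialize($value) !== false` is the simpler route.

```javascript
// Structural check for a subset of PHP's serialize() format: N, b, i, d, s, a.
// Each helper returns the byte offset just past what it consumed, or throws.

function expectByte(buf, pos, ch) {
  if (buf[pos] !== ch.charCodeAt(0)) {
    throw new Error(`expected "${ch}" at byte ${pos}`);
  }
  return pos + 1;
}

// Read the raw text up to the terminator `ch`; return [text, offsetAfterTerminator].
function readUntil(buf, pos, ch) {
  const end = buf.indexOf(ch, pos);
  if (end === -1) throw new Error(`missing "${ch}" after byte ${pos}`);
  return [buf.toString("latin1", pos, end), end + 1];
}

function readCount(buf, pos) {
  const [raw, next] = readUntil(buf, pos, ":");
  if (!/^\d+$/.test(raw)) throw new Error(`bad count "${raw}"`);
  return [Number(raw), next];
}

const SCALAR = {
  b: /^[01]$/,
  i: /^-?\d+$/,
  d: /^(-?\d+(\.\d+)?([eE][+-]?\d+)?|-?INF|NAN)$/,
};

function parseValue(buf, pos) {
  if (pos >= buf.length) throw new Error("unexpected end of input");
  const type = String.fromCharCode(buf[pos]);
  if (type === "N") return expectByte(buf, pos + 1, ";");
  pos = expectByte(buf, pos + 1, ":");
  if (type in SCALAR) {
    const [raw, next] = readUntil(buf, pos, ";");
    if (!SCALAR[type].test(raw)) throw new Error(`bad ${type} value "${raw}"`);
    return next;
  }
  if (type === "s") {
    const [byteLength, afterLength] = readCount(buf, pos);
    const start = expectByte(buf, afterLength, '"');
    // The declared length is in BYTES, so jump ahead and demand the closing quote.
    return expectByte(buf, expectByte(buf, start + byteLength, '"'), ";");
  }
  if (type === "a") {
    const [count, afterCount] = readCount(buf, pos);
    let p = expectByte(buf, afterCount, "{");
    for (let k = 0; k < count * 2; k++) p = parseValue(buf, p); // key, then value
    return expectByte(buf, p, "}");
  }
  throw new Error(`unsupported type "${type}"`);
}

function isIntactSerialized(text) {
  const buf = Buffer.from(text, "utf8");
  try {
    return parseValue(buf, 0) === buf.length;
  } catch {
    return false;
  }
}

const good =
  'a:2:{i:0;a:3:{s:6:"volume";i:0;s:4:"rate";d:0.18;s:8:"currency";s:3:"EUR";}' +
  'i:1;a:3:{s:6:"volume";i:5000;s:4:"rate";d:0.16;s:8:"currency";s:3:"EUR";}}';

console.log(isIntactSerialized(good)); // true
console.log(isIntactSerialized(good.slice(0, -5))); // false (ends at s:3:"EU)
```

The truncated input fails at the last string: the prefix promises three bytes, and the parser runs off the end of the buffer looking for the closing quote.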

The path from truncation to €0.00

The Shopify side had its own logic. We had ported the tier reader into a small Cloudflare Worker that the Shopify Plus storefront called once a month. The Worker took the vendor's gross sales total and returned the applicable rate. It read tier data from a Postgres table we had populated from the CSV.

The CSV import into Postgres did not fail. It did not warn. The field was a text column. A truncated PHP-serialized string is still valid text. The Worker, asked for a vendor's rate, tried to unserialize the blob using a small PHP-array-to-JS parser, got back null, fell through to the default branch, and the default branch returned 0.

Six hundred and twenty vendors had their commission rate set to zero. The maandafrekening run, which multiplies gross sales by rate, dutifully produced 620 statements of €0.00.
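The failure mode here is a fail-open default. A hedged sketch of the alternative, a statement run that holds back any vendor whose tier table could not be read: the names `runStatements` and `rateFor` and the input shape are hypothetical, not the Worker's actual code. Tiers are assumed to be sorted by ascending `volume`, and amounts are kept in cents to avoid rounding drift.

```javascript
// Hypothetical helper: rate of the highest bracket reached.
// Assumes `tiers` is non-empty and sorted by ascending `volume`.
function rateFor(tiers, gross) {
  let rate = tiers[0].rate;
  for (const tier of tiers) {
    if (gross >= tier.volume) rate = tier.rate;
  }
  return rate;
}

// Build statements, but fail closed: a vendor with an unreadable tier table
// is held for review instead of being billed at a default rate of 0.
function runStatements(vendors) {
  const statements = [];
  const held = [];
  for (const { vendorId, grossCents, tiers } of vendors) {
    if (!Array.isArray(tiers) || tiers.length === 0) {
      held.push(vendorId);
      continue;
    }
    const rate = rateFor(tiers, grossCents / 100);
    statements.push({ vendorId, amountCents: Math.round(grossCents * rate) });
  }
  return { statements, held };
}

const sampleTiers = [
  { volume: 0, rate: 0.18 },
  { volume: 5000, rate: 0.16 },
];
const result = runStatements([
  { vendorId: 1001, grossCents: 100000, tiers: sampleTiers },
  { vendorId: 4421, grossCents: 2500000, tiers: null }, // parser returned null
]);

console.log(result.statements.length); // 1
console.log(result.held); // [ 4421 ]
```

A non-empty `held` list is the signal to stop the run before any PDF is sent.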

The first 90 minutes

By 10:45 we had the Shopify Plus storefront in maintenance mode for vendor portal pages only — the consumer side stayed up. By 11:30 we had a working theory. By noon we had run this query against the WordPress staging DB, which we had mercifully kept frozen:

SELECT user_id,
       LENGTH(meta_value) AS bytes,
       SUBSTRING(meta_value, -8) AS tail
FROM   wp_usermeta
WHERE  meta_key = 'dokan_commission_tiers'
ORDER  BY bytes DESC
LIMIT  20;

The longest serialized values in the WordPress DB were 4,210 bytes. The longest values in the CSV were 4,096. The cap was that obvious in retrospect. Every truncated value in the CSV was exactly 4,096 bytes long, a power of two, while intact values had arbitrary lengths. A buffer somewhere had stopped reading.
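That pattern, many values piling up on one maximum length, can be checked mechanically. The sketch below is a heuristic rather than proof, and the function name, the `minRows` threshold, and the sample lengths are our own illustration: it flags an export when several values share the exact maximum byte length and notes whether that maximum is a power of two.

```javascript
// Given the byte length of every exported value, report whether the lengths
// pile up on a single maximum, which is what a per-field cap tends to look like.
function detectLengthCap(byteLengths, minRows = 2) {
  if (byteLengths.length === 0) return null;
  const max = byteLengths.reduce((a, b) => Math.max(a, b), 0);
  const rowsAtMax = byteLengths.filter((n) => n === max).length;
  // A power of two has exactly one bit set.
  const powerOfTwo = max > 0 && (max & (max - 1)) === 0;
  return { max, rowsAtMax, powerOfTwo, suspicious: rowsAtMax >= minRows };
}

// Illustrative lengths only; 4096 is the cap observed in this incident.
const report = detectLengthCap([312, 1877, 4096, 4096, 4096]);
console.log(report.max, report.rowsAtMax, report.powerOfTwo, report.suspicious);
// 4096 3 true true
```

In real data, values of a natural size rarely share an exact maximum, so even two or three rows at the same ceiling are worth a look.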

The sample that missed the long tail

The CSV samples we had diffed before greenlighting the full export were two batches: 100 rows and 500 rows. Both were drawn from the head of the table by user ID. That table happened to be ordered roughly chronologically, which meant our samples were almost entirely vendors who had joined in 2019 and 2020, before the volume-tier system had spread across the marketplace. Their serialized arrays held two or three brackets each, well under the chunk-buffer cap. The vendors with six brackets and per-tier currency overrides had joined in 2023 and lived past row 1,500.

The lesson is not that 600 sample rows is too few. The lesson is that an export sample has to be drawn against the dimension that breaks the export. For this data the breaking dimension was serialized blob length. A useful sample would have been the twenty longest rows in the source table. ORDER BY LENGTH(meta_value) DESC LIMIT 20 would have given us, in the first batch we ever ran, the only data that actually mattered. Head-sampling missed the long tail entirely, and random sampling would only have caught it by chance, with no guarantee of reaching the very longest rows.

We now treat the longest-row sample as a separate, named test in every WordPress migration. It costs nothing to run. It would have saved ten days here.
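As a sketch of what such a named test might look like once the source rows and the export are both loaded in memory: take the N longest source values by byte length and require each to appear unchanged in the export. The row shape (`userId`, `metaValue`) and the function name are assumptions for illustration.

```javascript
const byteLen = (s) => Buffer.byteLength(s, "utf8");

// sourceRows: [{ userId, metaValue }] read from the source database.
// exported:   Map of userId -> value as it appears in the export.
// Returns the userIds of the longest source rows that did not survive intact.
function longestRowTest(sourceRows, exported, limit = 20) {
  const longest = [...sourceRows]
    .sort((a, b) => byteLen(b.metaValue) - byteLen(a.metaValue))
    .slice(0, limit);
  return longest
    .filter((row) => exported.get(row.userId) !== row.metaValue)
    .map((row) => row.userId);
}

// Illustrative data: one short row, one row longer than a 4,096-byte cap.
const sourceRows = [
  { userId: 12, metaValue: "a:0:{}" },
  { userId: 4421, metaValue: "x".repeat(4210) },
];
const exported = new Map([
  [12, "a:0:{}"],
  [4421, "x".repeat(4096)],
]);

console.log(longestRowTest(sourceRows, exported, 1)); // [ 4421 ]
```

An empty result is the pass condition; any returned ID is a row to inspect before the full export runs.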

The fix that should have been the plan

We threw out the CSV. We replaced WP All Export with a direct mysqldump of the relevant rows, piped into a small Node script that did one thing: unserialize each row, JSON-encode the result, write to a new file. Any row that failed to unserialize stopped the script.

// dokan-tiers-export.js: runs once, fails loud
import { createReadStream } from "node:fs";
import readline from "node:readline";
import { unserialize } from "php-serialize";

// Input: one "user_id<TAB>serialized blob" pair per line.
const rl = readline.createInterface({
  input: createReadStream("dokan_commission_tiers.tsv"),
  crlfDelay: Infinity,
});

let n = 0, failed = 0;
for await (const line of rl) {
  // Split on the first tab only, so the blob itself is never cut short.
  const tab = line.indexOf("\t");
  const userId = tab === -1 ? line : line.slice(0, tab);
  n++;
  try {
    if (tab === -1) throw new Error("no tab separator");
    const tiers = unserialize(line.slice(tab + 1));
    if (!Array.isArray(tiers)) throw new Error("not an array");
    process.stdout.write(JSON.stringify({ userId, tiers }) + "\n");
  } catch (err) {
    // Stop at the first unreadable row and exit non-zero.
    process.stderr.write(`row ${userId}: ${err.message}\n`);
    failed++;
    process.exitCode = 1;
    break;
  }
}
process.stderr.write(`done · ${n} rows · ${failed} failed\n`);

Failed: zero. The export was complete in 11 seconds. We reloaded the Postgres table from the JSON, re-ran the Worker over the 620 zero statements, generated 620 corrected PDFs, and emailed each one with a one-paragraph apology. The total commission owed across those 620 statements was €184,720. The error in either direction after recalculation was €0.

That was day eleven. The migration shipped on day twelve.

The note we sent to 620 vendors

The communications side mattered as much as the fix. The first email had gone out at 03:00 CET; vendors in Istanbul opened it over breakfast. By the time we had a corrected PDF, several had already forwarded the €0.00 statement to bookkeepers, partners, and in two cases their lawyers. We wrote a single-paragraph correction in Dutch and English, attached the recalculated PDF, named the cause without jargon ("a data export error from our migration"), and gave a direct contact for the client's ops lead. We did not send it through a marketing tool. It went out as plain-text email from the ops lead's own address, BCC'd to the 620.

Twenty-three vendors replied to ask for the original statement to be retracted formally from their bookkeeping. Two asked for a written confirmation of the corrected number on company letterhead. Zero asked to leave the marketplace. The phrase the ops lead kept using afterwards was that a clear apology lands better than a clever one. We have stolen that line for every incident comms template we have written since.

What we changed in the migration checklist

The lesson we wrote into our internal playbook the next morning was specific. It read:

When migrating WordPress data that contains serialize() output — any plugin meta, any wp_options row, any custom wp_* table — assert unserialize() !== false on every row before exporting and after importing. A spreadsheet row count is not validation.

We also added three rules to the checklist:

Never trust a tool that does its own chunking unless you have read the chunking code. WP All Export, WP All Import, WP-CLI batch flags, even mysqldump --extended-insert all have edge cases. Read the source or set the chunk size to "no chunking" and accept the slower run.
Dump the raw DB, not a derived export, for any column that holds serialized data. mysqldump --where is faster, more honest, and never silently truncates.
Diff the byte length of every serialized column in the source DB and the target DB. If any value lost bytes, the migration is not finished.
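The third rule can be sketched in a few lines. This assumes both sides have been loaded into Maps keyed by row ID and that the target still holds the same serialized text as the source; if the target stores JSON instead, compare decoded values rather than lengths. The function name and sample values are illustrative.

```javascript
// source, target: Map of rowId -> serialized string.
// Returns one entry per row that is missing or whose byte length changed.
function diffByteLengths(source, target) {
  const problems = [];
  for (const [id, value] of source) {
    if (!target.has(id)) {
      problems.push({ id, reason: "missing in target" });
      continue;
    }
    const sourceBytes = Buffer.byteLength(value, "utf8");
    const targetBytes = Buffer.byteLength(target.get(id), "utf8");
    if (sourceBytes !== targetBytes) {
      problems.push({ id, reason: `source ${sourceBytes} bytes, target ${targetBytes} bytes` });
    }
  }
  return problems;
}

const source = new Map([
  [12, 'a:1:{i:0;s:3:"EUR";}'],
  [4421, 'a:1:{i:0;s:3:"EUR";}'],
  [5000, "a:0:{}"],
]);
const target = new Map([
  [12, 'a:1:{i:0;s:3:"EUR";}'],
  [4421, 'a:1:{i:0;s:3:"EU'],
]);

console.log(diffByteLengths(source, target));
// reports row 4421 (20 bytes vs 16) and row 5000 (missing in target)
```

A length match does not prove the content is identical, so this complements the unserialize check rather than replacing it.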

These checks run on every WordPress migration we touch now. They take six minutes on a 50,000-row WordPress DB and they have caught two further plugins doing the same thing since: a Yoast SEO breadcrumb config that exceeded our buffer on one site, and a custom WPML language-routes table on another.

The wider point about serialized PHP

PHP's serialize() is a binary-ish format pretending to be text. The length prefixes (s:6:"volume" means "a string of 6 bytes, value volume") are not a hint, they are part of the format, and they count bytes rather than characters. Drop a byte and the whole string is junk. The format predates JSON's dominance and most modern tooling treats it as an opaque blob. If your WordPress site is going to live in another system one day, every serialized meta row is a small debt waiting to be called in.
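A small worked example of the byte arithmetic. `phpString` is a hypothetical helper that builds the serialized form of one string; the vendor name is made up. The point is that the prefix is a byte count, so a multi-byte character makes it larger than the character count.

```javascript
// Build the PHP-serialized form of a single string value.
function phpString(value) {
  return `s:${Buffer.byteLength(value, "utf8")}:"${value}";`;
}

console.log(phpString("volume")); // s:6:"volume";

// "Gülşen" written with escapes: six characters, eight bytes in UTF-8.
const vendorName = "G\u00fcl\u015fen";
console.log(vendorName.length, Buffer.byteLength(vendorName, "utf8")); // 6 8
console.log(phpString(vendorName).startsWith("s:8:")); // true
```

Any tool that re-encodes, trims, or caps the value without rewriting the prefix leaves a string that no longer unserializes.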

This is why, when we plan a WooCommerce to Shopify Plus migration now, the first thing we do — before anyone touches a theme file — is grep wp_usermeta, wp_postmeta, and wp_options for serialized columns and list every plugin that owns one. Twenty minutes of grepping saves ten days of fire-fighting.
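One way to turn that inventory into a list, sketched under assumptions: rows have already been pulled out of the meta tables as `{ metaKey, metaValue }` objects, and a value counts as serialized if it starts like a PHP array or object. The prefix test is a loose heuristic of ours, not WordPress's own is_serialized() logic, and the meta keys in the sample are placeholders apart from dokan_commission_tiers.

```javascript
// Loose heuristic: serialized PHP arrays start with a:<n>:{ and objects with O:<n>:"
const SERIALIZED_PREFIX = /^(a:\d+:\{|O:\d+:")/;

// Count serialized-looking values per meta key, most frequent first.
function serializedInventory(rows) {
  const counts = new Map();
  for (const { metaKey, metaValue } of rows) {
    if (typeof metaValue === "string" && SERIALIZED_PREFIX.test(metaValue)) {
      counts.set(metaKey, (counts.get(metaKey) ?? 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const rows = [
  { metaKey: "dokan_commission_tiers", metaValue: 'a:1:{i:0;s:3:"EUR";}' },
  { metaKey: "dokan_commission_tiers", metaValue: "a:0:{}" },
  { metaKey: "example_object_key", metaValue: 'O:8:"stdClass":0:{}' },
  { metaKey: "nickname", metaValue: "a: not serialized" },
];

console.log(serializedInventory(rows));
// [ [ 'dokan_commission_tiers', 2 ], [ 'example_object_key', 1 ] ]
```

Each key in the result maps back to a plugin or theme that owns it, which is the list the migration plan needs.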

Takeaway

If a WordPress migration tool's CSV looks fine in a spreadsheet, it has told you nothing. The only honest check on serialized data is to unserialize it and inspect the return.

When we ran the post-mortem with the client's ops lead, the part she kept coming back to was that the bug was invisible. Nothing in the WordPress logs, nothing in the Shopify logs, nothing in the export tool's report. The first signal was a vendor email. Shopify Plus handled the cutover gracefully; the failure was upstream, in a seven-year-old WordPress plugin doing exactly what its author had every right to do.

Before you start any WordPress migration this quarter: open a MySQL shell on the source DB and run SELECT COUNT(*) FROM wp_usermeta WHERE meta_value LIKE 'a:%' OR meta_value LIKE 'O:%';. The number that comes back is the size of the surface you are about to migrate. Plan for it before you book the cutover.

FAQ

Why did WP All Export silently truncate the Dokan vendor data?

Its default 1,000-row chunking writes each chunk to a temporary buffer with a per-field byte cap. Serialized values larger than that cap are clipped, and no warning is emitted.

How do I check whether a WordPress export has truncated serialized data?

Run unserialize() on every exported row and compare byte length to the source DB column. If unserialize returns false or the byte counts differ, the row was truncated.

Is it safe to put serialized PHP arrays in wp_usermeta?

It works fine while you stay on WordPress. The pain arrives at migration time, when any downstream tool that does not honour PHP's byte arithmetic can quietly corrupt the value.

What should replace WP All Export for this kind of data?

A direct mysqldump filtered by meta_key, piped into a small script that unserialize-validates every row and stops on the first failure. A truncated value cannot get through unnoticed.
