Shopify migration post-mortem: a UTF-8 BOM and 19 hours

A 31-person Tilburg manufacturer migrated from WooCommerce 8 to Shopify on a Tuesday night. By Wednesday morning, 4,200 EAN codes were corrupt. One invisible character did it.

Jacob Molkenboer · Founder · A Brand New Company · 9 Jun 2026 · 9 min

It was 03:47 on a Wednesday in May when the warehouse supervisor at a Tilburg metalwork manufacturer logged into the new Shopify admin and saw that the first scan of the morning would not resolve. The handheld read EAN 8714123450019 against the product picker. Shopify returned nothing. He scanned a second box, same result. He scanned a third box of a completely different SKU. Same result. By 04:10 he had called the night-shift lead. By 04:30 the lead had called the operations director. By 05:00 we were on a video call.

The migration had been signed off at 22:00 the previous evening. It had been planned for nine weeks. The catalog held 4,247 active SKUs. Of those, 4,203 now had an invisible three-byte prefix in front of the digits in their barcode field. The factory could not pick or ship.

This is a post-mortem of how a single supplier CSV with a UTF-8 BOM took down a working e-commerce operation, how the rollback took 19 hours instead of the 90 minutes our runbook claimed, and what we changed afterwards.

The setup

The client is a 31-person metalwork manufacturer near Tilburg that sells through a B2B portal and a B2C storefront. Around 60% of revenue is wholesale orders to industrial buyers across the Benelux. The other 40% is direct sales to hobbyists. Their stack before the migration was WordPress 6.4 with WooCommerce 8.4, a custom plugin that tied PDF stock sheets to product variants, and a 14-year-old supplier feed that arrived every night as a CSV over SFTP.

WooCommerce had become slow on the wholesale catalog. Search took 4 to 9 seconds on a warm cache. The ops team wanted Shopify Plus for the search, the inventory locations, and the staff role model. The CEO wanted to stop paying for managed hosting that needed quarterly PHP babysitting. We agreed the migration would run in May, on a Tuesday night, after the last warehouse shift.

The migration plan we shipped

The plan looked the way migration plans usually look. Export all products and variants from WooCommerce into the Shopify product CSV shape. Run the nightly supplier feed importer against the new Shopify GraphQL Admin API instead of the old WooCommerce REST API. Switch DNS at 22:00. Keep the WooCommerce database in read-only mode for 72 hours in case we needed to roll back. Three days of dry runs on a staging Shopify development store had all passed. Sample orders fulfilled cleanly. Barcodes scanned. The runbook had a rollback step that we estimated at 90 minutes.

The piece we underweighted was the supplier feed. Every supplier of theirs ships a different shape of CSV. Some are tab-delimited. Some quote everything. One supplier ships an XLSX disguised with a .csv extension. We had written importer scripts for all of them in PHP and they had been running against WooCommerce for years without any visible data corruption. We assumed those scripts could be ported to Node and pointed at Shopify with one week of work. They could not.

03:47 on Wednesday

The first scan that failed read like this. The picker scanned a box labelled 8714123450019. The Shopify product picker returned no result. We logged into the Shopify admin and searched for the EAN as text. Still no result. Then we copied the EAN out of the Shopify product detail page into a hex editor. The first three bytes of the barcode field were 0xEF 0xBB 0xBF. After those three bytes came the 13-digit EAN.

Those three bytes are the UTF-8 representation of a byte order mark. The Unicode FAQ is plain about this: the BOM is not recommended at the start of UTF-8 streams because it confuses code that does not know to skip it. Shopify's barcode field is plain text. It does not strip the BOM. Neither does the GS1 EAN-13 check digit logic that our handheld scanners use. So the scanners read 13 digits, the Shopify index held 13 digits plus three invisible bytes, and they did not match.
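To make the mismatch concrete, here is a minimal sketch in Node. The variable names are ours for illustration, and the stored value is reconstructed from the description above, not pulled from the real catalog.

```javascript
// Illustrative values: what the scanner sends vs. what the importer stored.
const scanned = '8714123450019'
// U+FEFF is how a decoded UTF-8 BOM appears inside a JavaScript string.
const stored = '\uFEFF' + scanned

const encoder = new TextEncoder() // TextEncoder always encodes to UTF-8

// Render a string's UTF-8 bytes as space-separated lowercase hex.
function hexBytes(text) {
  return Array.from(encoder.encode(text), (b) => b.toString(16).padStart(2, '0')).join(' ')
}

console.log(scanned === stored)            // false: an exact-match lookup misses
console.log(scanned.length, stored.length) // 13 14
console.log(hexBytes(stored).slice(0, 8))  // ef bb bf
```

The two strings look identical in most admin screens and terminals, which is why the hex view was what finally showed the difference.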

Warning

UTF-8 does not require a byte order mark. UTF-16 uses one to signal byte order; UTF-8 has no byte order to signal. If you accept CSVs from suppliers, assume at least one of them will arrive with a BOM, and assume your importer will silently store it inside the first field unless you strip it.

The BOM and the supplier who changed exporters

The next question was how a BOM had made it past nine weeks of preparation. We grepped the staging import logs for the bytes. Nothing. We checked the supplier feeds we had pulled during dry runs. Nothing. Then we pulled the most recent feed from the SFTP server, the one that had landed at 21:48 on the migration evening, and there it was. One supplier, a stainless-steel fittings distributor in Eindhoven, had pushed an updated ERP that exported CSVs with a BOM. Their previous exporter had not. Every dry run had used older feeds, exported before the change.

This is the part of the story that hurts. We had a clause in the runbook that said "pull the freshest supplier feeds before cutover". We had not done it. The dry-run pipeline pointed at a frozen S3 snapshot from March, and we had never refreshed the snapshot, because refreshing the snapshot meant re-running the importer test matrix, and the importer had been declared done four weeks earlier. The change in the supplier's exporter had been announced in a release note that nobody on our side read.

Why our validators let it through

Our PHP importer for WooCommerce had used fgetcsv with default settings. fgetcsv does not strip a BOM from the first field. Neither does Node's csv-parse in default mode. Both will happily hand you a string that starts with U+FEFF, and your code will store it.

In the WooCommerce import path, we had a regex on the barcode column that stripped non-digit characters before the value hit MySQL. That regex was the only reason WooCommerce had never had this problem in production. When we ported the importer to Node for the Shopify cutover, we kept the schema, kept the API calls, and dropped the regex because "Shopify validates the EAN". Shopify does validate the EAN. It does not strip a U+FEFF. It treats the resulting string as not-a-valid-EAN and stores it anyway, because the barcode field on the variant resource is free text.
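For illustration, the kind of cleanup that had been protecting WooCommerce looks roughly like this in JavaScript. The original was PHP; the function name and the exact pattern here are assumptions, not the production code.

```javascript
// Sketch of a defensive cleanup: drop everything that is not an ASCII digit.
// Hypothetical helper, written to mirror the behaviour described above.
function digitsOnly(value) {
  return value.replace(/\D/g, '')
}

// A barcode as it arrives from a feed that starts with a BOM.
const fromFeed = '\uFEFF8714123450019'

console.log(digitsOnly(fromFeed))        // 8714123450019
console.log(digitsOnly(fromFeed).length) // 13
```

A filter like this removes the BOM as a side effect, without anyone ever having to know the BOM exists. That is also its weakness: it silently repairs bad input instead of reporting it, so nothing flags the day the filter disappears.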

The migration ran the new Node importer against the supplier feed at 22:11. By 22:34, every product that this supplier touched (which was 4,203 of the 4,247 SKUs, because this distributor is the source for most of the catalog's metadata) had a corrupted barcode. The remaining 44 SKUs were custom products the client made in-house and had clean barcodes. Those were the only 44 that scanned at 03:47.

The 19-hour rollback that should have been 90 minutes

At 05:00 we made the call to roll back. The runbook said: redirect DNS to the WooCommerce server, flip WooCommerce out of read-only mode, replay the orders that had come in overnight (there were three), done. We had estimated 90 minutes.

What we did not have in the runbook was what to do about the three orders. Shopify had created its own order numbers starting at 1001. WooCommerce was sitting at 38291. The accounting team had already booked one of the three orders in Twinfield against the Shopify number. The supplier portal had been updated to send dropship orders to a new endpoint. Two of those dropship orders had already been picked. The DNS flip was fast. The state reconciliation was not.

We spent six hours building a spreadsheet of every system that had been touched in the seven hours Shopify was live, what state each system was in, and what command would unwind it. We spent another nine hours executing those commands by hand and verifying each one against the source data. We spent four hours doing a clean re-import of the night's supplier feed, this time with the BOM stripped, into WooCommerce, so the catalog would still reflect Wednesday's stock. The last step finished at 23:51 on Wednesday night. The factory shipped one day late. No customer order was lost. No invoice was double-booked. The client had a bad day.

What we changed

We did not write a long post-mortem document. We changed five things and made each one a hard pre-cutover gate for every migration we run after this one.

First, every CSV importer now strips a BOM from the start of the file before any other processing, so it can never end up inside the first field. This is a three-line change. It should have been there from the day we ported the script. In Node:

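import { readFile } from 'node:fs/promises'

// Read a CSV as UTF-8 text, dropping a leading byte order mark if present.
export async function readCsv(path) {
  let text = await readFile(path, 'utf8')
  // Node's 'utf8' decoding keeps the BOM; it shows up as U+FEFF at index 0.
  if (text.charCodeAt(0) === 0xFEFF) text = text.slice(1)
  return text
}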

Second, every importer now validates every field against a strict schema before it touches the target system. For EANs that means: 13 digits, GS1 check digit verified, no leading or trailing whitespace, no non-printable characters. A row that fails validation is logged and skipped, not stored.
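A minimal sketch of such an EAN check, assuming the value arrives as a string. The function names and the shape of the result object are illustrative choices, and the sample value in the usage lines is a widely used valid EAN-13, not one from the client's catalog.

```javascript
// Compute the GS1 check digit from the first 12 digits of an EAN-13.
// Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
function ean13CheckDigit(digits) {
  let sum = 0
  for (let i = 0; i < 12; i++) {
    const digit = digits.charCodeAt(i) - 48 // '0' is char code 48
    sum += i % 2 === 0 ? digit : digit * 3
  }
  return (10 - (sum % 10)) % 10
}

// Strict validation: reject instead of repairing.
function validateEan13(value) {
  if (typeof value !== 'string') {
    return { ok: false, reason: 'not a string' }
  }
  // Exactly 13 ASCII digits: this also rejects whitespace, U+FEFF and
  // any other non-printable character anywhere in the value.
  if (!/^[0-9]{13}$/.test(value)) {
    return { ok: false, reason: 'not exactly 13 ASCII digits' }
  }
  if (ean13CheckDigit(value) !== Number(value[12])) {
    return { ok: false, reason: 'check digit mismatch' }
  }
  return { ok: true, reason: null }
}

console.log(validateEan13('4006381333931'))       // ok: true
console.log(validateEan13('\uFEFF4006381333931')) // ok: false
```

Returning a reason rather than throwing makes it straightforward to log the failing row and move on to the next one.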

Third, the dry-run pipeline now pulls the most recent supplier feed at the start of every dry run. The frozen snapshot is gone. If a supplier has changed their exporter in the last month, we find out before cutover, not during.

Fourth, the cutover runbook now includes a "compare a sample of records between old and new" step that runs after the importer finishes and before DNS flips. We pick 50 random SKUs, fetch the barcode from both systems, and diff at the byte level. If any byte differs on any sample, cutover stops.
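The comparison itself can be small. The sketch below assumes the barcodes from both systems have already been fetched into two Maps keyed by SKU; how they are fetched is left out, and all names are hypothetical.

```javascript
const utf8 = new TextEncoder()

// Index of the first differing UTF-8 byte between two strings, or -1 if identical.
function firstByteDiff(a, b) {
  const x = utf8.encode(a)
  const y = utf8.encode(b)
  const n = Math.max(x.length, y.length)
  for (let i = 0; i < n; i++) {
    if (x[i] !== y[i]) return i // also catches one side being longer
  }
  return -1
}

// Pick up to `count` random items using a partial Fisher-Yates shuffle.
function sampleKeys(keys, count) {
  const pool = [...keys]
  const n = Math.min(count, pool.length)
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(Math.random() * (pool.length - i))
    const tmp = pool[i]
    pool[i] = pool[j]
    pool[j] = tmp
  }
  return pool.slice(0, n)
}

// oldBarcodes and newBarcodes are Map<sku, barcode>.
// A SKU missing from the new system is compared against an empty string.
function compareSample(oldBarcodes, newBarcodes, count = 50) {
  const mismatches = []
  for (const sku of sampleKeys(oldBarcodes.keys(), count)) {
    const offset = firstByteDiff(oldBarcodes.get(sku), newBarcodes.get(sku) ?? '')
    if (offset !== -1) mismatches.push({ sku, offset })
  }
  return mismatches // cutover proceeds only when this is empty
}
```

Comparing encoded bytes instead of displayed text is the point: a value with a leading BOM reports a mismatch at byte offset 0 even though both values look the same on screen.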

Fifth, the rollback plan now includes the reconciliation steps explicitly. Every external system that might receive a write during the cutover window has a named owner, a command to read its state, and a command to unwind it. The plan is rehearsed once before cutover with a stopwatch. If it takes more than 60 minutes in rehearsal, we do not cut over that night.
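One way to keep that gate honest is to make it mechanical. The sketch below is an assumption about how such a check could look, not our actual tooling: the entry fields mirror the three requirements above, and every system name, owner, and command is a placeholder.

```javascript
// Hypothetical rollback registry. All values are placeholders.
const rollbackPlan = [
  { system: 'accounting', owner: '<named person>', readState: '<read command>', unwind: '<unwind command>' },
  { system: 'supplier portal', owner: '<named person>', readState: '<read command>', unwind: '<unwind command>' },
]

// Decide go/no-go: every entry must be complete and the rehearsal must fit the limit.
function cutoverGate(plan, rehearsalMinutes, limitMinutes = 60) {
  const problems = []
  for (const entry of plan) {
    for (const field of ['system', 'owner', 'readState', 'unwind']) {
      if (!entry[field]) problems.push(`${entry.system ?? 'unnamed system'}: missing ${field}`)
    }
  }
  // Written as a negation so a missing or NaN rehearsal time also fails the gate.
  if (!(rehearsalMinutes <= limitMinutes)) {
    problems.push(`rehearsal took ${rehearsalMinutes} min, limit is ${limitMinutes}`)
  }
  return { go: problems.length === 0, problems }
}

console.log(cutoverGate(rollbackPlan, 45).go) // true
console.log(cutoverGate(rollbackPlan, 75).go) // false
```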

Takeaway

A migration runbook is not done until the rollback has been rehearsed end to end against every system that might receive a write during the cutover window.

The lever you can pull today

If you run a Shopify or WooCommerce store and you accept supplier feeds, open the most recent CSV from each supplier in a hex viewer right now. On macOS or Linux:

xxd supplier-feed.csv | head -1

If the first three bytes read ef bb bf (xxd prints them grouped in pairs by default, as efbb bf), your feed has a UTF-8 BOM. Check whether your importer strips it before storing the first field. That five-minute audit will not catch every migration risk you have, but it will catch the one that took our Tilburg client out for a day.
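If you would rather run the same check from inside a Node importer, a sketch could look like this. The function name is hypothetical; it takes raw bytes, so the file has to be read without an encoding argument.

```javascript
// True when the data starts with the UTF-8 byte order mark EF BB BF.
// Accepts a Buffer or any Uint8Array holding the raw, undecoded file contents.
function hasUtf8Bom(bytes) {
  return bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf
}

// Example inputs built by hand instead of read from disk.
const withBom = Uint8Array.of(0xef, 0xbb, 0xbf, 0x38, 0x37)
const withoutBom = Uint8Array.of(0x38, 0x37, 0x31)

console.log(hasUtf8Bom(withBom))    // true
console.log(hasUtf8Bom(withoutBom)) // false
```

With a real feed, the input would come from something like readFile(path) from node:fs/promises, called without 'utf8' so that it returns a Buffer.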

When we rebuilt the migration pipeline for this client, the thing we kept coming back to was the gap between what our importer accepted and what the downstream system actually stored. We now treat every legacy migration as a reconciliation problem first and a data transformation problem second, which is the opposite of the order most runbooks put them in.

Key takeaway

A migration rollback that has not been rehearsed against every downstream system is a plan, not a rollback.

FAQ

What is a UTF-8 BOM and why does it corrupt CSV imports?

A byte order mark is a three-byte sequence (0xEF 0xBB 0xBF) at the start of a UTF-8 file. UTF-8 does not require it, but some Windows and ERP exporters add it. If your importer does not strip it, those bytes end up inside the first field.

How long should a Shopify migration rollback actually take?

Plan for 60 to 120 minutes if you have rehearsed it end to end. Without rehearsal, expect 4 to 20 hours. Most of the time goes to reconciling external systems (accounting, supplier portals, ERPs) that received writes during cutover.

Does Shopify validate EAN barcodes on import?

Shopify checks barcode format on some flows but stores the barcode field on variants as free text. It will not strip a leading BOM or other invisible characters. Validate on your side before the value hits the Admin API.

How do I detect a BOM in a supplier CSV before importing it?

Run xxd on the file and look at the first three bytes. If they read ef bb bf, the file has a UTF-8 BOM. The file command flags it as well: for a UTF-8 file that starts with a BOM, its description typically includes "(with BOM)".

Why did the same importer work on WooCommerce for years without this bug?

The WooCommerce import path had a defensive regex that stripped non-digit characters from the barcode column before insert. The regex hid the bug. When we ported the importer to Node, we dropped the regex on the assumption that Shopify would validate. It did not.
