Magento 1.9 customer export: the serialized array trap

A Hengelo machine builder's parts-shop migration stalled for eleven days because a paginated Magento 1 EAV exporter silently dropped a serialized PHP array on the second batch.

Jacob Molkenboer · Founder · A Brand New Company · 11 Feb 2026 · 10 min

The first sign was an email from a machine-fitter in Vlaardingen at 09:14 on a Tuesday. He could no longer download the 18mm M-series flange-bracket STEP file he had pulled twice the month before. By 10:00 the support inbox had forty more like it. All from B2B accounts. All with a CAD-license entitlement on their customer profile. All staring at a Medusa storefront we had cut over the previous Friday.

The shop belongs to a 26-person machine builder in Hengelo. Their parts shop has run on Magento 1.9 since 2015, glued to a custom PHP 7.0 entitlement service that gated downloads of SolidWorks Toolbox parts, STEP exports, and a 9 GB DWG library. We were eleven days into a migration onto Medusa + Astro. Eleven days that should have been three.

The stack we walked into

Magento 1 reached end-of-life in June 2020. Adobe stopped shipping security patches. The community kept it limping with OpenMage LTS, but our client's instance was further off the rails than that: their previous integrator had forked Magento 1.9.3.10 in 2018 and stopped pulling upstream. PHP 7.0, a full major version behind and long past its own end-of-life. MySQL 5.6. A CAD-entitlement module that was 2,000 lines of observer chain bolted onto customer_save_after, writing a serialized PHP array into a custom customer attribute called cad_entitlements with this shape (wrapped here for readability; the stored value is a single unbroken string):

a:3:{
  s:7:"toolbox";a:2:{i:0;s:7:"din-933";i:1;s:7:"din-934";}
  s:4:"step";b:1;
  s:7:"expires";s:10:"2027-03-31";
}

4,600 B2B accounts in the database. 1,140 of them held a non-empty cad_entitlements value. The rest were quote-only customers who never touched a download. That ratio matters for what comes next.

The migration we planned

Medusa stores customer metadata as a JSON column on the customer table. Mapping the serialized PHP array to JSON is mechanical: unserialize() on the way out, json_encode() on the way in. We mirrored the entitlement-check logic into a Medusa workflow that ran on order placement and on a separate /api/store/cad/download/[sku] route in Astro.
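"Mechanical" hides one sharp edge: the s:N: prefix in PHP's serialize() format is a byte count, and a value whose prefix is off by even one byte cannot be parsed at all. Below is a minimal TypeScript sketch of the conversion, assuming only the scalar and array types that appear in cad_entitlements (no objects, no references). The helper name unserializePhp is hypothetical and written for illustration; it is not the production script.

```typescript
type PhpValue = null | boolean | number | string | PhpValue[] | { [key: string]: PhpValue };

// Hypothetical helper: parses the subset of PHP's serialize() format used by
// cad_entitlements (N, b, i, d, s, a). Objects and references are rejected.
function unserializePhp(input: string): PhpValue {
  let pos = 0;

  const expect = (token: string): void => {
    if (!input.startsWith(token, pos)) {
      throw new Error(`expected ${JSON.stringify(token)} at offset ${pos}`);
    }
    pos += token.length;
  };

  const readUntil = (delimiter: string): string => {
    const end = input.indexOf(delimiter, pos);
    if (end === -1) throw new Error(`missing ${JSON.stringify(delimiter)} after offset ${pos}`);
    const text = input.slice(pos, end);
    pos = end + 1;
    return text;
  };

  // s:N counts UTF-8 bytes, not characters, so walk code points and add up bytes.
  const readString = (byteLength: number): string => {
    const start = pos;
    let bytes = 0;
    while (bytes < byteLength) {
      const codePoint = input.codePointAt(pos);
      if (codePoint === undefined) throw new Error("unexpected end of input inside string");
      bytes += codePoint < 0x80 ? 1 : codePoint < 0x800 ? 2 : codePoint < 0x10000 ? 3 : 4;
      pos += codePoint > 0xffff ? 2 : 1;
    }
    if (bytes !== byteLength) throw new Error(`string length mismatch at offset ${start}`);
    return input.slice(start, pos);
  };

  const readValue = (): PhpValue => {
    const tag = input.charAt(pos);
    pos += 1;
    if (tag === "N") {
      expect(";");
      return null;
    }
    expect(":");
    if (tag === "b") return readUntil(";") === "1";
    if (tag === "i" || tag === "d") return Number(readUntil(";"));
    if (tag === "s") {
      const byteLength = Number(readUntil(":"));
      expect('"');
      const text = readString(byteLength);
      expect('";');
      return text;
    }
    if (tag === "a") {
      const count = Number(readUntil(":"));
      expect("{");
      const entries: Array<[string | number, PhpValue]> = [];
      for (let i = 0; i < count; i++) {
        const key = readValue();
        if (typeof key !== "string" && typeof key !== "number") throw new Error("invalid array key");
        entries.push([key, readValue()]);
      }
      expect("}");
      // Keys 0..n-1 in order become a JSON array; anything else becomes an object.
      if (entries.every(([key], index) => key === index)) return entries.map(([, value]) => value);
      const record: { [key: string]: PhpValue } = {};
      for (const [key, value] of entries) record[String(key)] = value;
      return record;
    }
    throw new Error(`unsupported type tag ${JSON.stringify(tag)} at offset ${pos - 1}`);
  };

  const value = readValue();
  if (pos !== input.length) throw new Error(`trailing data at offset ${pos}`);
  return value;
}
```

Fed the single-line form of a well-formed entitlement, JSON.stringify(unserializePhp(raw)) yields {"toolbox":["din-933","din-934"],"step":true,"expires":"2027-03-31"}. A wrong length prefix throws rather than returning a half-parsed value, which is the behaviour you want in a migration. Note that an empty PHP array comes out as [] here, so a schema check downstream still has to decide whether that is acceptable.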

For the export we did what every Magento integrator does: we reached for n98-magerun. For 4,600 rows you do not want the bare customer:list CSV dump — you want a custom script that hooks into Mage_Customer_Model_Resource_Customer_Collection and uses addAttributeToSelect('*'). We had one in our migration toolkit from a 2023 project. It paginated 1,000 rows per batch. It had worked on every Magento 1 export we had run before.

The dry-run on staging took eight minutes. Row counts matched. We cut DNS over Friday evening, sent the welcome email Saturday morning, and went home.

Tuesday's inbox

The first thing we did was the wrong thing. We assumed the entitlement-check workflow on the Medusa side had a bug. We spent half a day stepping through the Node code, writing fixture tests, deploying a debug logger to print the parsed metadata shape on every download attempt.

The metadata shape was empty. Not malformed. Not corrupted. Empty. For 1,140 accounts. The other 3,460 customers were quote-only accounts that never had entitlements to lose.

That number — 1,140 — matched the count of non-empty entitlements in the source database too cleanly to be a coincidence. The bug was on the export side, and it had been there since Friday morning.

Reproducing the drop

We reran the export script in a fresh Docker container against a copy of the production database, with per-batch logging added: rows in, rows out, per-attribute hit count. The pattern was immediate:

batch 1 (offset 0, limit 1000):    cad_entitlements present in 312 rows
batch 2 (offset 1000, limit 1000): cad_entitlements present in 0 rows
batch 3 (offset 2000, limit 1000): cad_entitlements present in 0 rows
batch 4 (offset 3000, limit 1000): cad_entitlements present in 0 rows
batch 5 (offset 4000, limit 600):  cad_entitlements present in 0 rows

312 out of 1,140. The first batch was the only batch where the attribute appeared at all. The remaining 828 entitlements had been silently dropped, and our row-count diff against customer_entity had not caught it because the customer rows existed — only one column on each was empty.
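The logging that exposed this boils down to a per-batch presence count plus a hard failure when the running total disagrees with the source. Here is a minimal TypeScript sketch of that idea. The Row shape and the names countPresent and auditBatches are illustrative assumptions, not the exporter's actual code, and "present" is taken to mean neither missing, null, nor an empty string.

```typescript
type Row = { [attribute: string]: unknown };

// Counts rows in one batch where `attribute` holds a non-empty value.
function countPresent(rows: Row[], attribute: string): number {
  return rows.filter((row) => {
    const value = row[attribute];
    return value !== undefined && value !== null && value !== "";
  }).length;
}

// Returns one log line per batch and throws if the exported total does not
// match the count taken from the source database beforehand.
function auditBatches(batches: Row[][], attribute: string, expectedTotal: number): string[] {
  let total = 0;
  const lines = batches.map((rows, index) => {
    const present = countPresent(rows, attribute);
    total += present;
    return `batch ${index + 1} (${rows.length} rows): ${attribute} present in ${present} rows`;
  });
  if (total !== expectedTotal) {
    throw new Error(`${attribute}: exported ${total}, source has ${expectedTotal}`);
  }
  return lines;
}
```

Called with the expected total from the source database, a check like this turns a silent drop into a failed export instead of a Tuesday inbox.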

Warning

Row-count diffs do not protect you against attribute-level data loss in an EAV system. Every column needs its own count, before and after.

What was actually happening

Magento's EAV loader for customer collections caches attribute metadata on the collection instance. When you call addAttributeToSelect('*'), the loader walks the customer entity type, resolves every attribute in the relevant attribute set, and builds a join plan. cad_entitlements lived on a non-default attribute set that the previous integrator had created in 2018 specifically to flag B2B accounts.

On the first batch, the paginator instantiated a fresh collection, the join plan included cad_entitlements, and the 312 B2B accounts in offset 0–999 got their data. On the second batch, the paginator reused the collection object across setPageSize() calls. The cached join plan had been pinned to a non-B2B row encountered late in batch 1. The attribute set on that cached row differed from the attribute set on every row in batches two through five. The loader silently dropped the join.

This is not a bug in n98-magerun — n98-magerun's own commands instantiate a fresh collection per invocation and avoid the trap. The trap is in any custom script that holds a Magento 1 EAV collection across pages. The sharp edge is documented obliquely in Adobe's EAV documentation for Magento 2, where the attribute-set behaviour was reworked precisely because the Magento 1 model leaked state between rows.

The fix

We threw out the collection-based exporter and wrote a direct SQL extract:

SELECT
  ce.entity_id,
  ce.email,
  cev.value AS cad_entitlements
FROM customer_entity ce
LEFT JOIN customer_entity_text cev
  ON cev.entity_id = ce.entity_id
  AND cev.attribute_id = (
    SELECT attribute_id FROM eav_attribute
    WHERE attribute_code = 'cad_entitlements'
    AND entity_type_id = (
      SELECT entity_type_id FROM eav_entity_type
      WHERE entity_type_code = 'customer'
    )
  )
ORDER BY ce.entity_id;

The result set was 4,600 rows, 1,140 with a non-null cad_entitlements. Identical to the source-of-truth count we now had on the wall in red marker. We piped it through a Node script that unserialized each row's PHP array, validated it against a Zod schema, and emitted the Medusa-shaped JSON. No pagination. No EAV collection. No attribute-set state to leak.
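The validation step is the part worth copying. What follows is a dependency-free TypeScript stand-in for the schema check, not the Zod schema itself. The field types are inferred from the one example shape shown earlier, treating all three keys as required is an assumption, and so is the cad_entitlements metadata key on the Medusa side. The names validateEntitlements and toMedusaMetadata are hypothetical.

```typescript
type CadEntitlements = { toolbox: string[]; step: boolean; expires: string };

// Returns a list of problems; an empty list means the value has the expected shape.
function validateEntitlements(value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["not an object"];
  }
  const record = value as { [key: string]: unknown };
  const problems: string[] = [];

  const toolbox = record["toolbox"];
  if (!Array.isArray(toolbox) || !toolbox.every((item) => typeof item === "string")) {
    problems.push("toolbox must be an array of strings");
  }
  if (typeof record["step"] !== "boolean") {
    problems.push("step must be a boolean");
  }
  // Format check only (YYYY-MM-DD); this does not verify the date exists on a calendar.
  const expires = record["expires"];
  if (typeof expires !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(expires)) {
    problems.push("expires must be a YYYY-MM-DD string");
  }
  return problems;
}

// Wraps a validated value in the metadata shape assumed for the destination.
function toMedusaMetadata(value: unknown): { cad_entitlements: CadEntitlements } {
  const problems = validateEntitlements(value);
  if (problems.length > 0) {
    throw new Error(`invalid cad_entitlements: ${problems.join("; ")}`);
  }
  return { cad_entitlements: value as CadEntitlements };
}
```

Collecting every problem before throwing makes the failure log useful: one pass over the rows shows all the inconsistent shapes a legacy module has written, instead of one error at a time.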

Re-importing took 22 minutes. The Vlaardingen fitter got his STEP file back the same afternoon. The Hengelo CFO took the news better than we expected.

The audit we should have run on day one

Before any production cutover we now run a per-attribute non-null diff. For Magento 1 customers, that means this query against the source database, and the equivalent JSON-key count against the destination:

SELECT
  ea.attribute_code,
  -- COUNT(value) skips NULLs; COUNT(value_id) would also count rows whose value is NULL
  COUNT(cev.value) AS non_null_count
FROM eav_attribute ea
LEFT JOIN customer_entity_text cev
  ON cev.attribute_id = ea.attribute_id
WHERE ea.entity_type_id = (
  SELECT entity_type_id FROM eav_entity_type
  WHERE entity_type_code = 'customer'
)
GROUP BY ea.attribute_code
ORDER BY non_null_count DESC;

Then the same query against customer_entity_varchar, customer_entity_int, customer_entity_datetime, and customer_entity_decimal. The whole audit takes thirty seconds. We had skipped it because the row-count diff felt sufficient. It was not.
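The destination half of the diff can be a few lines of script. This is a minimal TypeScript sketch, assuming the destination's customer metadata has been exported as an array of plain objects and the source counts have been loaded from the SQL result into a map. The names Counts, countKeys and diffCounts are illustrative.

```typescript
type Counts = { [attributeCode: string]: number };

// Counts, per metadata key, how many destination records hold a non-empty value.
function countKeys(records: Array<{ [key: string]: unknown } | null | undefined>): Counts {
  const counts: Counts = {};
  for (const record of records) {
    if (!record) continue;
    for (const key of Object.keys(record)) {
      const value = record[key];
      if (value === undefined || value === null || value === "") continue;
      counts[key] = (counts[key] ?? 0) + 1;
    }
  }
  return counts;
}

// Lists every source attribute whose destination count differs (missing counts as 0).
function diffCounts(source: Counts, destination: Counts): string[] {
  return Object.keys(source)
    .sort()
    .filter((code) => source[code] !== (destination[code] ?? 0))
    .map((code) => `${code}: source ${source[code]}, destination ${destination[code] ?? 0}`);
}
```

With the numbers from this migration, diffCounts({ cad_entitlements: 1140 }, { cad_entitlements: 312 }) returns a single line, "cad_entitlements: source 1140, destination 312". An empty result is the only acceptable outcome before cutover.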

A side-note on where CAD parts portals are heading

The quiet irony of spending eleven days fixing CAD-entitlement plumbing in 2026 is that the upstream is shifting under our client's feet. The first wave of AI-assisted CAD tooling points at a future where the artefact a B2B parts portal serves is less likely to be a static STEP file and more likely to be a parametric model the customer regenerates in-browser. Our client's entitlement scheme — toolbox part numbers gated per account — will not survive that shift unchanged. We told them so. They are revisiting in 2027.

What to do today

If you are sitting on a Magento 1 store with custom customer or product attributes, take thirty seconds and run the per-attribute non-null count above. Save the result somewhere your migration team can find it. After the export tooling runs, take the equivalent count on the destination and diff the two. That is the cheapest insurance you will ever buy on a legacy migration.

When we built the export runner for the Hengelo machine builder's parts shop, this attribute-set blind spot in the EAV collection loader is exactly what we ran into. We solved it with a SQL-first extract and a per-attribute diff that now ships with every legacy migration we run.

Key takeaway

Row-count diffs do not detect attribute-level loss in an EAV system. Every column needs its own non-null count, before and after the cutover.

FAQ

Why did n98-magerun export the customer rows but drop the custom attribute?

n98-magerun itself was not the culprit. The drop was in a custom paginated exporter that reused a Magento 1 EAV customer collection across pages. The cached attribute-set join plan leaked state between batches.

Can the same EAV attribute-set bug happen in Magento 2?

Magento 2 reworked the attribute-set behaviour on customer collections, so the specific Magento 1 trap does not reproduce. EAV is still EAV, though, so per-attribute non-null diffs are still the right safety net.

What is the safest way to export Magento 1 customer EAV data?

Bypass the collection loader and read directly from customer_entity plus the customer_entity_varchar/text/int/datetime/decimal tables, joined by attribute_id. Pagination then has no state to leak.

How do you migrate serialized PHP arrays into a JSON column?

Unserialize each row in a short script, validate the shape against a schema such as Zod or JSON Schema, then json_encode and insert. Validation matters more than the encode step because legacy modules write inconsistent shapes.

Should we still be running Magento 1 in 2026?

No. Magento 1 has had no first-party security patches since June 2020. If a full re-platform is out of reach, OpenMage LTS is the least-bad holding pattern while you plan the move.

Tags: magento, migration, php, mysql, case study, e-commerce
