Magento
Magento to Vendure: a serialized EAV blob froze our import
A 22-person Nijmegen tweedehands marketplace, 6,800 luxury consignment records, and one serialized blob from 2019 that turned a clean Vendure import into a fortnight of forensic work.

It was a Thursday in March, late afternoon. The third import run had just finished. We refreshed the Vendure admin and saw, under Collections, a tree that read like this:
a → 6 → {s:5 → "brand" → s:6 → "Hermès"
Six thousand eight hundred of those. Each one a luxury consignment record turned into a phantom category. The marketplace's catalogue had been digested into PHP's serialize() format and spat back out as a taxonomy.
This is the story of how that happened, why it took us fourteen working days to unwind, and the cheap audit we should have run before the first import.
The marketplace
The client is a 22-person tweedehands business in Nijmegen, the kind of place that takes in luxury bags and second-hand designer clothing on consignment, photographs and authenticates each piece, then sells it on its own storefront. They had been running Magento 2.3 with PHP 7.4 underneath for years. The inruil-portaal, the buyer-facing trade-in side, was a custom PHP module bolted onto Magento's EAV layer in 2019.
EAV (entity-attribute-value) is Magento's way of letting you add product attributes without changing the schema. You don't add a column to catalog_product_entity; you write a row to catalog_product_entity_varchar with the right attribute_id. It's flexible, it's slow, and it's the reason every Magento veteran has at least one strong opinion about it. Adobe still treats it as the recommended way to extend product data.
In 2019 someone, long since gone from the team, wrote an extension to track authenticity certificates per item: brand, grade, verifier name, signed date, PDF path, hologram flag. Six fields. Instead of writing six EAV attributes, they wrote one, called it authenticity_certificate, and serialised the whole record into a single PHP blob stored as a text attribute. It worked. The storefront unserialised it on render. Nobody touched it for seven years.
Why we were migrating
Magento 2.3 went end-of-life on 8 September 2022. Adobe's own lifecycle policy is unambiguous: no security patches, no backports. The client had limped past EOL — they're not alone; thousands of stores still run unsupported Magento — but the trade-in portal had grown to the point where editorial team members were spending half their days inside a backend they couldn't shape.
Vendure made sense: TypeScript end to end, a GraphQL admin and shop API, a customisable admin UI, and a SvelteKit storefront the in-house designer could actually maintain. The plan was tidy. Export the product catalogue, the customers, the orders, the consignment records. Run them through Vendure's import pipeline. Stand up the SvelteKit storefront against a staging URL. Switch DNS on a Saturday. Three weeks, budgeted.
Day 11: a clean import
The first eleven days went the way the proposal said they would. We wrote a Magento-to-CSV exporter that walked the EAV tables and flattened each product's attributes into columns. We mapped Magento attributes to Vendure custom fields one by one. The consignment portal's bespoke tables — sellers, items, payouts — got their own Vendure plugin with proper entities.
The import ran. Eight thousand four hundred products. Five thousand customers. Forty thousand orders. We pulled a sample of fifty products at random, diffed each against the live Magento store, found no discrepancies, and went home for the weekend pleased with ourselves.
Day 14: the categories that ate Tuesday
Tuesday morning the client's category manager logged in to start curating the new homepage. She messaged us a screenshot. Under Bags she had found a sub-collection called a. Under a there was a sub-collection called 6. Under 6 there was {s.
Vendure's category tree had been infected. Six and a half thousand collections, all named after fragments of PHP serialize() syntax, nested twenty levels deep. The real categories were still there. They were just buried.
This is what the authenticity_certificate attribute looked like in Magento for one Hermès Birkin record:
a:6:{s:5:"brand";s:6:"Hermès";s:5:"grade";s:2:"AA";s:11:"verified_by";s:11:"GoVerify NL";s:9:"verified";s:10:"2019-04-12";s:13:"cert_doc_path";s:32:"/var/certs/2019/HRM-449201.pdf";s:8:"hologram";b:1;}
And this is the line of our exporter's mapping that handed it to Vendure:
// vendure-csv-mapping.ts
export const mapping = {
name: { column: "name" },
sku: { column: "sku" },
price: { column: "price", currency: "EUR" },
collections: { column: "categories", separator: "|", pathSeparator: ">" },
assets: { column: "images", separator: "|" },
};
The CSV importer splits the collections column on pipes and treats each fragment as a category path, with > as the path separator. The serialised blob contained no pipes. It contained colons, semicolons, braces and quoted strings, and the Vendure path-parser was happy to tokenise them. Each colon became a level. Each top-level string a sibling.
We had reviewed the importer mapping. We had reviewed the exporter. We had not noticed that authenticity_certificate was also being written to the product's category-id column by an old data-quality cron job that ran inside Magento every night to "normalise" certificate metadata. Our exporter trusted Magento's category assignment. Magento had been quietly assigning every consignment item to a serialised-blob "category" since 2019. The storefront never read it. The Magento admin filtered it out of the tree view. It was invisible until we asked something else to read it.
If your CSV importer accepts a categories column and your source system has any history of cron jobs that auto-assign categories, dump that column and read every value before the importer touches it. Not a sample. Every value.
The fix
We did three things, in order.
One: parse the blob into typed columns
We wrote a small PHP CLI that read the raw value for every product with a non-empty authenticity_certificate, called unserialize() on it, and emitted six clean columns. PHP's unserialize is well documented and deterministic, which is the only good thing about this whole story.
#!/usr/bin/env php
<?php
declare(strict_types=1);
$pdo = new PDO('mysql:host=localhost;dbname=magento', $u, $p);
$rows = $pdo->query(
"SELECT entity_id, value
FROM catalog_product_entity_text
WHERE attribute_id = 247
AND value IS NOT NULL"
);
$out = fopen('certificates.csv', 'w');
fputcsv($out, ['sku','brand','grade','verified_by','verified_at','cert_path','hologram']);
foreach ($rows as $r) {
$cert = @unserialize($r['value'], ['allowed_classes' => false]);
if (!is_array($cert) || count($cert) !== 6) {
error_log("quarantine: {$r['entity_id']}");
continue;
}
fputcsv($out, [
sku_for($r['entity_id']),
$cert['brand'] ?? '',
$cert['grade'] ?? '',
$cert['verified_by'] ?? '',
$cert['verified'] ?? '',
$cert['cert_doc_path'] ?? '',
!empty($cert['hologram']) ? '1' : '0',
]);
}
We did not trust the data. We validated that every row decoded as an array with the expected six keys, logged 47 partial rows for manual review, and quarantined them. Forty-three turned out to be from a 2020 bulk-edit that had truncated long verifier names; four were genuinely corrupt and the client's authentication team reconstructed them by hand from the paper file.
Two: rebuild the real category map by hand
Separately we exported the actual category tree from catalog_category_product, joined to catalog_category_entity_varchar for the names. Six top-level categories. Twenty-four sub-categories. Nothing exotic. We diffed the tree against the live storefront with the client's category manager on a video call. We mapped each Magento category to a Vendure Collection by name and assigned products through the importer's normal mechanism — with the collections mapping pointing at a column we generated from catalog_category_product, not at the polluted categories attribute.
Three: store the certificate where it belongs
Authenticity certificates are not a taxonomy. They are per-variant metadata. In Vendure that means a custom field group on the ProductVariant entity with six typed fields, plus a small admin UI extension to render them.
// src/vendure-config.ts
customFields: {
ProductVariant: [
{ name: 'certBrand', type: 'string', label: [{ languageCode: LanguageCode.en, value: 'Brand' }] },
{ name: 'certGrade', type: 'string', label: [{ languageCode: LanguageCode.en, value: 'Grade' }] },
{ name: 'certVerifiedBy', type: 'string', label: [{ languageCode: LanguageCode.en, value: 'Verified by' }] },
{ name: 'certVerifiedAt', type: 'datetime', label: [{ languageCode: LanguageCode.en, value: 'Verified at' }] },
{ name: 'certDocPath', type: 'string', label: [{ languageCode: LanguageCode.en, value: 'Certificate PDF' }] },
{ name: 'certHologram', type: 'boolean', label: [{ languageCode: LanguageCode.en, value: 'Hologram present' }] },
],
},
Then we ran the import a fourth time. Eight thousand four hundred products. Six top-level categories. Twenty-four sub-categories. Six thousand eight hundred typed certificate records on the variants that needed them. Zero phantom collections.
What we would do differently
Read the EAV, don't just count it
Our pre-migration audit counted attributes and listed extensions. It did not read attribute values. A two-minute query would have caught it:
SELECT LEFT(value, 6) AS prefix, COUNT(*) AS n
FROM catalog_product_entity_text
WHERE attribute_id = 247
GROUP BY 1
ORDER BY n DESC
LIMIT 20;
a:6:{ would have sat at the top of that result set with 6,800 rows under it and we would have known on day one.
Diff stratified, not random
Our day-11 spot check used a random sample of fifty products. None were consignment items, because consignment items had a clustered SKU prefix and our random sampler, bless it, happened not to draw from that range. Consignment items were 81% of catalogue volume and 12% of SKU-prefix breadth. After day 14 our acceptance check requires sampling stratified by category and by attribute set, not by row.
Don't ask one importer pass to do two jobs
Our exporter should have written the categories column blank, and we should have run category mapping as a separate, smaller, hand-reviewed pass. Doing it in one stage felt elegant. Elegant cost us a fortnight.
The cost
We over-ran the budget by fourteen working days. The client absorbed half; we wrote off the other half. The storefront launched eight weeks later than planned. The trade-in portal, rebuilt as a small Vendure plugin with its own GraphQL surface, was live before the new storefront and ran the entire migration period in parallel without incident.
Nobody got fired. The 2019 developer who wrote the blob extension was not on the team. The serialize() decision they made was reasonable for the constraints they had: ship the certificate feature in a sprint, don't run a schema migration. The cost surfaced seven years later, on someone else's project plan.
When we rebuilt the trade-in portal for this Nijmegen marketplace, the thing we kept running into was implicit category assignments from cron jobs nobody remembered writing. We ended up solving it by running a distinct-prefix query on every text attribute before the importer ever touched the data, the kind of audit we now bake into every legacy migration we take on.
If you have a Magento store you're thinking of moving, run that LEFT(value, 6) query against catalog_product_entity_text this afternoon and read the top twenty rows. The answer is information you wanted six months ago.
Key takeaway
Before any CSV importer touches a legacy Magento catalogue, run a distinct-prefix query on every text attribute. Find the blob before it finds your category tree.
FAQ
What is an EAV blob in Magento?
EAV is Magento's entity-attribute-value pattern for adding product attributes without schema changes. A blob value is any single column that packs structured data — JSON, serialized PHP, XML — into one field.
Why did Vendure's importer read the certificate as a category?
An old Magento cron job had auto-assigned the certificate attribute to the product's category column. Our exporter trusted that mapping, and the CSV importer split the serialized string on its colons and built a nested collection per fragment.
How long should a Magento 2 to Vendure migration take?
For a single-storefront catalogue under 10,000 products with a clean attribute set, three to five weeks is realistic. Add a week for every custom EAV extension and more for any consignment or marketplace logic.
Is Magento 2.3 still safe to run in 2026?
No. Magento 2.3 went end-of-life on 8 September 2022 and stopped receiving security patches. Stores still on 2.3 are exposed to known CVEs that Adobe will not backport.