Magento
Magento EAV migration: eleven days lost to ghost attributes
A Hasselt wholesaler's Shopware 6 cutover stalled eleven days over four EAV customer attributes nobody had documented since the 2017 B2B portal launch.

Day 11 of the cutover window. The migration lead at a 31-person homewares wholesaler outside Hasselt had been awake since 04:30. The Shopware 6 staging shop was ready. The Magento 2.4.7 source dump was eight days old and frozen. The product catalog had moved. Order history was sitting in the new database. Payment flows passed every smoke test. The only thing keeping the new shop from going live was four customer attributes nobody at the company could explain.
They were called bp_tier_v2, vat_rev_charge_flag, kvk_legacy_id and delivery_window_override. None of them appeared in the Magento admin under Stores, Attributes, Customer. None of them were referenced in the active theme. None of them had a single mention in the company Confluence. The developer who built the original B2B portal in 2017 had left in 2019. His successor left in 2022. The current head of e-commerce had inherited the system three months ago and assumed, reasonably, that anything important would be documented somewhere.
It was not. The attributes existed as rows in eav_attribute. Their values lived in customer_entity_varchar, customer_entity_int and customer_entity_text. Eighteen of the wholesaler's top twenty customers had non-null values for at least one of them. And the migration script we had written, the one that had mapped products and orders and addresses perfectly, was silently dropping all four.
What EAV does to a migration
Magento's customer model is built on EAV: Entity, Attribute, Value. Instead of a wide customers table with one column per field, the data is split. customer_entity holds the entity rows. The attributes are defined in eav_attribute. The values live in narrow tables keyed by datatype: customer_entity_varchar, customer_entity_int, customer_entity_decimal, customer_entity_datetime, customer_entity_text.
This is well-documented behaviour. Adobe's own reference describes the model in the EAV and Extension Attributes guide. The advantage of EAV is that adding a new attribute requires no schema change. The disadvantage is that an attribute can exist in the database without ever appearing in the admin UI, without ever being referenced in a di.xml, and without leaving any breadcrumb anywhere a junior developer would look.
That is exactly what had happened here.
The script that lied to us
Our first cut of the customer migration looked roughly like this. It pulled every attribute marked is_user_defined = 1 from eav_attribute, joined to the value tables, and emitted Shopware custom fields.
<?php
$sql = <<<SQL
SELECT
ea.attribute_code,
ea.attribute_id,
ea.backend_type,
ea.is_user_defined,
ea.is_visible
FROM eav_attribute ea
JOIN eav_entity_type et
ON et.entity_type_id = ea.entity_type_id
WHERE et.entity_type_code = 'customer'
AND ea.is_user_defined = 1
SQL;
$attributes = $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);
Looks fine. Reads cleanly. Passed code review. The query returned 23 rows on the production dump. We mapped 19 of them into Shopware custom fields, wrote the value-migration scripts, ran end-to-end on staging, opened 14 sample customer accounts, ticked them off, and went to bed thinking we were close to done.
The four attributes that broke the cutover were not returned by that query. They were marked is_user_defined = 0. That flag, in Magento, means the attribute was registered by a module's InstallData.php or data_patch script, not by an admin clicking 'Add new attribute' in the UI. The 2017 portal had been built as a custom module. Its attributes had been installed via setup script. From the database's perspective they were system attributes, indistinguishable from dob or gender.
Filtering Magento customer attributes by is_user_defined = 1 will miss every attribute installed by a custom module. On any shop older than two years, this is almost always the wrong filter.
How we found them
The breakthrough came from a different angle. One of our engineers ran a query that asked a stupider question: which attribute ids actually have values stored against real customer rows, regardless of how they were registered?
SELECT
ea.attribute_code,
ea.backend_type,
ea.is_user_defined,
COUNT(DISTINCT v.entity_id) AS customers_with_value
FROM eav_attribute ea
JOIN eav_entity_type et
ON et.entity_type_id = ea.entity_type_id
LEFT JOIN customer_entity_varchar v
ON v.attribute_id = ea.attribute_id
WHERE et.entity_type_code = 'customer'
GROUP BY ea.attribute_id
HAVING customers_with_value > 0
ORDER BY customers_with_value DESC;
Run it once per value table (varchar, int, text, decimal, datetime), union the results, and you get a complete picture of every customer attribute that actually carries data, regardless of where it came from. That query returned 27 attribute codes. Twenty-three matched the admin list. The other four were our ghosts.
The numbers told the rest of the story. bp_tier_v2 had values for 412 customers. vat_rev_charge_flag for 89. kvk_legacy_id for 731, more than the active customer base, meaning it had been backfilled for inactive accounts too. delivery_window_override for 14, but those 14 were the wholesaler's largest accounts by revenue. The kind of accounts that notice immediately if their Tuesday delivery slot disappears.
Reverse-engineering meaning
Knowing the attributes existed was the easy half. Working out what each one meant took two days of reading.
kvk_legacy_id was the simplest. KvK is the Dutch chamber of commerce. The field held the customer's KvK number in the format the 2017 portal used, before Magento's stock taxvat field was repurposed for it in 2019. A grep through the deployed module turned up the read site: an old invoice template still pulled kvk_legacy_id when taxvat was empty. Twenty-three active customer accounts still relied on the fallback. We confirmed by re-rendering a sample of historical invoices against the staging database with the field removed; eleven of them came out with a blank KvK line, which the AR team confirmed would have triggered automatic rejection on three of the wholesaler's larger franchise customers.
vat_rev_charge_flag was a boolean. When set, the checkout suppressed Belgian VAT for B2B intra-EU sales under reverse charge. The flag was being read by an observer on sales_quote_address_collect_totals_after. Without it migrated across, every reverse-charge customer would have been charged 21% VAT on their next order, and the AR team would have had a fun morning.
bp_tier_v2 was a tier code: BRONZE, SILVER, GOLD, PLATINUM. It drove price-list selection through a custom catalog rule. The 'v2' suffix was a hint that there had once been a v1, presumably swapped out without removing the old. A git blame on the deprecated module turned up the migration that wrote v2.
delivery_window_override was the hardest. The value was a JSON blob; no obvious reader showed up in any module, theme template, observer chain or queue consumer we could grep. We traced it by working backwards from the data. The value table showed thirty-one writes in the last twelve months, all between 09:00 and 11:00, all from the same admin user. That admin user turned out to be the warehouse manager, who was editing the field through an internal grid screen built as an unofficial admin extension in 2018 and never added to the main menu. The read site was a nightly cron job that built the next-day picking sheets and consulted delivery_window_override before falling back to the customer group default. Two consumers, both invisible to a code grep. Neither would have surfaced from an admin walkthrough.
Three of the fourteen accounts had malformed JSON in there, parsed by json_decode with no error handling. The fourteen accounts had been receiving the right delivery windows for years by sheer luck: json_decode returned null for the broken rows, the cron treated null as 'no override', and the fallback group default happened to match what the warehouse manager intended. We flagged those three for rebuild before the cutover.
Mapping into Shopware 6
Shopware 6 does not use EAV. Custom fields live in custom_fields JSON columns on entities, declared through custom field sets in the admin or via plugin. The data shape is flatter and the read path is faster, but the migration requires you to make a decision the EAV model lets you defer: what is the type of each attribute, and what set does it belong to?
We grouped the four ghosts into a single legacy_b2b_portal custom field set on the customer entity, with the original Magento codes preserved as field technical names. That kept downstream SQL grep-able. Bronze, Silver, Gold and Platinum became a select field. The reverse-charge flag became a boolean. The KvK legacy id became text. The delivery-window JSON we kept as a text field but added a one-shot validator that flagged the three malformed rows for the AR team to fix by hand.
<?php
$customFieldSet = [
'name' => 'legacy_b2b_portal',
'config' => [
'label' => ['en-GB' => 'Legacy B2B portal'],
'translated' => true,
],
'relations' => [
['entityName' => 'customer'],
],
'customFields' => [
[
'name' => 'bp_tier_v2',
'type' => CustomFieldTypes::SELECT,
'config' => [
'label' => ['en-GB' => 'B2B tier'],
'options' => [
['value' => 'BRONZE', 'label' => ['en-GB' => 'Bronze']],
['value' => 'SILVER', 'label' => ['en-GB' => 'Silver']],
['value' => 'GOLD', 'label' => ['en-GB' => 'Gold']],
['value' => 'PLATINUM', 'label' => ['en-GB' => 'Platinum']],
],
],
],
// three remaining fields elided for brevity
],
];
The custom fields were the easy half. The tier-driven catalog rule needed its own rebuild. In Magento the rule had been a custom condition class, registered through a module, reading bp_tier_v2 off the customer in the price collection flow. Shopware's rule builder accepts customer-custom-field conditions out of the box once the field exists, which turned the rebuild into a configuration exercise rather than a code one. One rule per tier, each pointing at the price list that matched.
<?php
$rule = [
'name' => 'Legacy B2B tier - Platinum',
'priority' => 100,
'conditions' => [
[
'type' => 'customerCustomField',
'value' => [
'operator' => '=',
'selectedField' => 'legacy_b2b_portal_bp_tier_v2',
'renderedField' => ['type' => 'select'],
'value' => 'PLATINUM',
],
],
],
];
Total time to rebuild the rule layer: about four hours, almost all of it spent verifying that every active product still resolved to the right price for each tier. The reverse-charge logic was reimplemented as a small Shopware subscriber on CheckoutOrderPlacedEvent that read the boolean custom field and applied the same VAT suppression the old observer had. The delivery-window picking job was rewritten as a console command on a nightly schedule.
Eleven days late, the cutover went out clean. Total scope added by the discovery: about 60 hours across two engineers and the AR team.
The pre-flight check we now run
After this engagement we added one step to every Magento replatform we touch. Before any migration script is written, we run an attribute census against the production read replica. It looks roughly like this.
SELECT
et.entity_type_code,
ea.attribute_code,
ea.frontend_label,
ea.backend_type,
ea.is_user_defined,
ea.is_required
FROM eav_attribute ea
JOIN eav_entity_type et
ON et.entity_type_id = ea.entity_type_id
WHERE et.entity_type_code IN (
'customer',
'customer_address',
'catalog_product',
'catalog_category'
)
ORDER BY et.entity_type_code, ea.is_user_defined DESC, ea.attribute_code;
We print it. We walk it line by line with the client's most senior person who actually uses the shop. Every line gets one of four marks. Keep means the value is read somewhere we can verify, and the new platform needs the equivalent. Drop means the value exists but nothing reads it, and we have a paper trail showing why. Ask means somebody on the client side recognises the name but cannot remember what set it, and we open a ticket to chase down the original developer or the deprecated module. Ghost means nobody recognises it and we need to grep the database for value counts, write timestamps and consuming code before we decide. That single review session, run a week before cutover, would have caught all four attributes in the Hasselt war story. We have run it on six engagements since. It has caught something on five of them, including a Drupal Commerce shop where a field called preferred_courier_v3 turned out to be driving the entire weekend dispatch routing.
On any Magento shop older than two years, assume undocumented EAV attributes exist. Census the database before you trust the admin UI.
The lesson we keep relearning
Old e-commerce systems are not really code, they are sediment. Each layer was deposited by a developer solving a real problem under a real deadline. Most of those developers are gone. The state they left behind sits quietly in EAV tables, in observer chains, in cron jobs nobody disables because nobody knows what disabling them would do. A replatform that treats the source as documented inevitably finds out, eleven days late, that it was not. The honest version of every migration estimate has a discovery line item the client cannot fully scope and the agency cannot fully promise. The Hasselt overrun was our reminder that the line item is not optional.
When we replatformed the Hasselt shop, the thing that almost broke us was assuming the admin UI was a faithful map of the customer model. We solved it by going to the data first and the UI second. Every legacy migration we have run since has started with the same SQL census, and every one has surfaced something the admin would never have shown us.
If you are running a Magento 2 shop that has lived through more than one developer, open a SQL client today, run the attribute census above against your customer and product entity types, and read every row out loud. The undocumented ones are the ones you need to find before your migration window starts, not after.
Key takeaway
On any Magento shop older than two years, census the EAV tables before you trust the admin UI. Ghost attributes are where replatforms go to die.
FAQ
Why did the is_user_defined filter miss the attributes?
Magento marks attributes registered via a module's setup or data patch script as is_user_defined = 0. Only attributes created through the admin UI get the 1 flag. Custom-module attributes look like system attributes to that filter.
How do I find every customer attribute that actually holds data?
Query eav_attribute joined to each customer_entity_* value table by attribute_id, grouped by attribute_id with a HAVING COUNT > 0. Union across varchar, int, text, decimal and datetime tables for full coverage.
Does Shopware 6 use EAV like Magento?
No. Shopware 6 stores extensions in a custom_fields JSON column on each entity, declared via custom field sets. Migration means choosing a concrete type and set per attribute, not deferring the schema decision.
What should I do with legacy attributes that nobody can explain?
Don't drop them blind. Mark them as ghosts, count how many rows hold values, grep the codebase and templates for reads, and walk the list with whoever runs the shop day to day before deciding keep or drop.