Drupal 7 to 10 migration: a nine-day war over Paragraphs
Day one of the upgrade went fine. Day two, the migrate dashboard was still green, and the product specs were intact in Drupal 7 and gone in Drupal 10.

It was a Tuesday at 11:47 in a 24-person industrial-supply outfit on the edge of Hasselt. The migrate dashboard had been green for sixteen hours, and it was still green. Yet most of the 11,400 product-spec nodes had imported into Drupal 10 with empty spec tables. The same nodes opened cleanly in the Drupal 7 source. We were nine days from go-live, with three trade-show campaigns already built around the new catalogue and a head of operations who had personally signed the cutover date into a steering-committee deck.
This is what those nine days looked like, what we kept assuming was the problem, and the one thing about the Paragraphs migration path that nobody warns you about until you have already lost a week. The bug is mundane once you know it. The reason it eats the better part of a sprint, every time, is more interesting.
The site we inherited
The client sells industrial fasteners and gaskets through a B2B catalogue. Their Drupal 7 site was built in 2013 by a Brussels agency that had since folded. The catalogue ran on a content type called product_spec, with a Field Collection field for technical attributes: torque rating, material, thread pitch, certifications. Each product had up to 22 collection items, about 3.7 on average. Total in the database: 41,820 field_collection_item rows across 11,400 nodes.
Two details mattered, although we did not know it yet.
First, the original agency had needed to store arbitrary key-value pairs on some collection items (vendor-specific quirks that did not fit any existing field). Rather than add yet more Drupal fields, they had created a single field_extra_attrs of type text_long and dumped PHP serialize() output into it. A row in the database looked like this:
a:3:{s:9:"din_class";s:5:"8.8.0";s:6:"finish";s:11:"zinc-plated";s:8:"rohs_doc";s:42:"sites/default/files/specs/rohs-2018-44.pdf";}
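PHP's serialize() format prefixes every string with its length in bytes, so one wrong prefix is enough to sink a row: declare s:13 for the 11-byte "zinc-plated" and unserialize() returns false for the whole payload. Below is a standalone Python sketch (not part of the project's tooling) that scans the s:N:"..." tokens in a payload and reports any whose declared length disagrees with the value's UTF-8 byte length. It assumes no value contains the two-character sequence ";, which holds for flat key-value blobs like the one above but not for arbitrary data.

```python
import re

# Start of a serialized string token: s:<declared byte length>:"
STRING_TOKEN = re.compile(r's:(\d+):"')


def length_mismatches(payload: str) -> list[tuple[str, int, int]]:
    """Return (value, declared, actual) for every s:N:"..." token whose
    declared length differs from the value's UTF-8 byte length.

    Assumes no string value contains the sequence '";'.
    """
    problems = []
    pos = 0
    while True:
        match = STRING_TOKEN.search(payload, pos)
        if match is None:
            return problems
        start = match.end()                  # first character of the value
        end = payload.index('";', start)     # closing quote of the value
        value = payload[start:end]
        declared = int(match.group(1))
        actual = len(value.encode("utf-8"))  # PHP counts bytes, not characters
        if declared != actual:
            problems.append((value, declared, actual))
        pos = end + 2                        # skip past '";' to the next token
```

Running it over every field_extra_attrs value before migrating gives a list of rows that unserialize() will reject, which is cheaper to find now than as NULLs later.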
Second, the collection items had been edited heavily over the years. Many had ten or more revisions. The parent node's revision pointer did not always match the latest revision of the collection item. The site rendered fine on Drupal 7 because Field Collection followed the explicit revision_id stored against the node revision row, not the highest one in the revision table.
The first error
We ran the upgrade with the standard core Migrate Drupal UI plus the Paragraphs contrib module, which ships a migration plugin to lift Field Collection content into Paragraph entities. The first run did roughly what we expected: about 240 row-level errors, mostly missing files. The site had three filesystem layouts stacked on top of each other from a Pantheon-to-Acquia move in 2017, and our file_copy plugin was honouring whichever path each row had originally been written with. We rebuilt the source-to-destination map, repointed the orphans, re-ran, and the dashboard went green.
The next morning, the client's content editors started clicking around. Product pages rendered. Spec tables were empty. Not partially. Empty.
The database told a more interesting story:
SELECT COUNT(*) FROM node__field_specs;
-- 41,820
SELECT COUNT(*) FROM paragraph__field_torque
WHERE field_torque_value IS NOT NULL;
-- 6,118
The paragraph rows existed. The data inside them did not. About 35,700 collection items (41,820 minus 6,118) had landed with NULL values across every sub-field, even though the source rows had values in Drupal 7. The catalogue was, in production terms, gone.
Rabbit hole one: the source query
The obvious suspect was the source plugin. Maybe it was reading the wrong revision table. We spent a day instrumenting the d7_field_collection_item source by subclassing it locally, dumping every row it pulled into a JSONL log, and diffing against direct SELECTs over the live D7 database. The source plugin was fine. It pulled correct rows, with correct values, in correct order. 41,820 rows logged, 41,820 matched byte for byte.
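The log-and-diff check can be sketched as a small standalone script. This is an illustration, not the code we ran: it assumes the subclassed source plugin wrote one JSON object per line, each carrying item_id and revision_id, and that the direct SELECT results were exported in the same shape. It keys both sides on (item_id, revision_id) and reports rows missing from either side or differing in any value.

```python
import json
from typing import Iterable


def load_jsonl(lines: Iterable[str]) -> dict[tuple[int, int], dict]:
    """Index JSONL rows by (item_id, revision_id); blank lines are skipped."""
    rows = {}
    for line in lines:
        if line.strip():
            row = json.loads(line)
            rows[(int(row["item_id"]), int(row["revision_id"]))] = row
    return rows


def diff_rows(plugin_lines: Iterable[str], sql_lines: Iterable[str]) -> dict[str, list]:
    """Compare rows logged by the source plugin with rows from direct SELECTs."""
    plugin = load_jsonl(plugin_lines)
    sql = load_jsonl(sql_lines)
    return {
        "only_in_plugin": sorted(plugin.keys() - sql.keys()),
        "only_in_sql": sorted(sql.keys() - plugin.keys()),
        # Same key on both sides, but at least one column value differs.
        "value_mismatch": sorted(
            key for key in plugin.keys() & sql.keys() if plugin[key] != sql[key]
        ),
    }
```

In practice you would pass an open file handle for each log; three empty lists is what "41,820 matched" looks like.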
We were on day three.
Rabbit hole two: the destination schema
Next theory: the destination paragraph entity was the problem. Maybe a field-type mismatch was silently dropping values, or our paragraph type config was missing a required setting and the migrate destination plugin was eating the cast. We rebuilt the paragraph type from scratch, exported the config, diffed it against a fresh module install, double-checked every field's storage settings, and re-ran the migration into a clean MySQL database. Same result: paragraph entities created, payload empty.
We were on day five and starting to do the thing you do when a migration has stalled, which is draft an email about pushing the go-live date.
What was actually wrong
The bug was in the revision lookup, but not where we had been looking. Here is the chain we eventually traced:
- The Paragraphs migration plugin reads field_revision_field_specs from the D7 source to find which collection items belong to each node revision.
- For each collection item, it then looks up the latest revision row in field_collection_item_revision to populate the paragraph fields.
- The "latest revision" lookup uses MAX(revision_id), not the revision_id pinned by the parent node.
- On our client's site, most collection items had a "latest" revision created by a never-published draft from a 2018 editorial workflow. Those revisions had been saved with empty values to clear the form, with the intent of re-populating them later. The editors never came back.
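The gap between those two lookups is easy to reproduce on a toy copy of the D7 tables. The sketch below uses an in-memory SQLite database stripped to the columns the lookup needs (no entity_type, delta, language or deleted columns, and one sub-field, torque, instead of four); the fixture rows are invented. It is not the Paragraphs plugin's actual query, just the two strategies side by side: MAX(revision_id) per item versus the item revision that the node's current revision points at.

```python
import sqlite3

# Toy D7 schema: only the columns these two lookups need.
SCHEMA = """
CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER);
CREATE TABLE field_revision_field_specs (
  entity_id INTEGER, revision_id INTEGER,
  field_specs_value INTEGER, field_specs_revision_id INTEGER);
CREATE TABLE field_collection_item_revision (item_id INTEGER, revision_id INTEGER);
CREATE TABLE field_revision_field_torque (
  entity_id INTEGER, revision_id INTEGER, field_torque_value TEXT);
"""

# "Latest" strategy: the item's highest revision ID.
LATEST_SQL = """
SELECT t.field_torque_value FROM field_revision_field_torque t
WHERE t.entity_id = :item AND t.revision_id = (
  SELECT MAX(r.revision_id) FROM field_collection_item_revision r
  WHERE r.item_id = :item)
"""

# "Pinned" strategy: the item revision referenced by the node's current revision.
PINNED_SQL = """
SELECT t.field_torque_value
FROM node n
JOIN field_revision_field_specs s ON s.entity_id = n.nid AND s.revision_id = n.vid
JOIN field_revision_field_torque t
  ON t.entity_id = s.field_specs_value AND t.revision_id = s.field_specs_revision_id
WHERE s.field_specs_value = :item
"""


def build_fixture() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO node VALUES (1, 100)")  # node 1, current revision 100
    # Node revision 100 pins collection item 7 at item revision 70.
    db.execute("INSERT INTO field_revision_field_specs VALUES (1, 100, 7, 70)")
    # Item 7 has its pinned revision (70) and a newer abandoned draft (71).
    db.executemany("INSERT INTO field_collection_item_revision VALUES (?, ?)",
                   [(7, 70), (7, 71)])
    # Only revision 70 has a torque value; the draft has none.
    db.execute("INSERT INTO field_revision_field_torque VALUES (7, 70, '25 Nm')")
    return db


def torque(db: sqlite3.Connection, sql: str, item_id: int):
    """Return the torque value a lookup strategy finds for an item, or None."""
    row = db.execute(sql, {"item": item_id}).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = build_fixture()
    print(torque(db, LATEST_SQL, 7))  # None: the abandoned draft wins
    print(torque(db, PINNED_SQL, 7))  # 25 Nm: what the D7 page rendered
```

Both queries succeed, which is the point: neither strategy raises anything, and only one of them returns the data the site was showing.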
So the migration was reading from a real row. The row was empty. The migration faithfully copied empty into paragraph fields. No errors raised, because empty is a valid value.
The serialized PHP did not cause the bug; it made the diagnosis harder. The field_extra_attrs column was the only place we could see a difference between the abandoned draft and the revision the node was actually pointing to, because its text_long values were not NULL in the abandoned draft; they held a different serialized payload. When we finally compared the two revision sets row by row, the pattern jumped out within an hour.
The most embarrassing part of the post-mortem: two senior Drupal developers on the team, a combined twenty-three years of D7 in production, and neither of us had seen this exact failure mode. We had each lived through three or four contrib migration paths that misread source data. None of those had read a row that was technically valid but semantically wrong. The fix takes hours. The diagnosis takes days, because nothing in the dashboard or the logs ever announces itself as the bug. Every row reads as success.
Field Collection in Drupal 7 stored its own revision history independent of the parent node. Any migration plugin that fetches "the latest revision" instead of "the revision pinned by the parent" will silently pick up abandoned drafts on long-lived sites. The Paragraphs contrib's default plugin does the former.
The fix
We wrote a custom source plugin that wraps d7_field_collection_item_revision and forces it to honour the revision_id stored against the parent node's published revision. The override looked roughly like this:
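<?php

namespace Drupal\hasselt_migrate\Plugin\migrate\source;

use Drupal\paragraphs\Plugin\migrate\source\d7\FieldCollectionItemRevision;

/**
 * Field Collection item revisions pinned to the parent node's revision.
 * (The plugin ID below is illustrative, not from the original project.)
 *
 * @MigrateSource(
 *   id = "d7_field_collection_item_pinned_revision",
 *   source_module = "field_collection"
 * )
 */
class PinnedRevisionItem extends FieldCollectionItemRevision {

  public function query() {
    $query = parent::query();
    // Keep only the item revision the node's current revision (node.vid)
    // points at. "fci" must match the alias parent::query() uses for the
    // item revision table; {node} lets Drupal apply any table prefix.
    $query->innerJoin(
      'field_revision_field_specs',
      'pin',
      'pin.field_specs_revision_id = fci.revision_id
        AND pin.revision_id = (SELECT vid FROM {node} WHERE nid = pin.entity_id)'
    );
    return $query;
  }

}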
This pulled the collection item revision the node was actually rendering, not whichever revision happened to have the highest ID. We also added a process plugin to unserialize() the field_extra_attrs payload and split it across three new paragraph fields rather than carry a serialized blob into a modern stack. PHP object deserialisation is a known attack surface, and a fresh CMS is the right moment to walk away from it.
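The real unserialize-and-split step was a Drupal process plugin written in PHP; what follows is a standalone Python sketch of the same idea, not that plugin. It decodes only a flat array of string keys and string values, honouring the declared byte lengths, and refuses everything else, objects in particular. The key names come from the sample row earlier in this post; the destination field names (field_din_class, field_finish, field_rohs_doc) are hypothetical. In PHP itself, the corresponding safeguard is passing ['allowed_classes' => false] as the second argument to unserialize().

```python
import re

ARRAY_HEADER = re.compile(rb"a:(\d+):\{")
STRING_TOKEN = re.compile(rb's:(\d+):"')

# Hypothetical mapping: serialized key -> new paragraph field.
FIELD_MAP = {
    "din_class": "field_din_class",
    "finish": "field_finish",
    "rohs_doc": "field_rohs_doc",
}


def read_string(data: bytes, pos: int) -> tuple[str, int]:
    """Read one s:N:"..."; token at pos using the declared byte length."""
    match = STRING_TOKEN.match(data, pos)
    if match is None:
        # Objects (O:...), integers (i:...) and nested arrays all land here.
        raise ValueError(f"expected a string token at byte {pos}")
    start = match.end()
    end = start + int(match.group(1))
    if data[end:end + 2] != b'";':
        raise ValueError(f"declared length does not fit at byte {pos}")
    return data[start:end].decode("utf-8"), end + 2


def decode_flat_array(payload: str) -> dict[str, str]:
    """Decode a:N:{...} whose keys and values are all strings; reject the rest."""
    data = payload.encode("utf-8")
    header = ARRAY_HEADER.match(data)
    if header is None:
        raise ValueError("not a serialized array")
    pos, result = header.end(), {}
    for _ in range(int(header.group(1))):
        key, pos = read_string(data, pos)
        value, pos = read_string(data, pos)
        result[key] = value
    if data[pos:] != b"}":
        raise ValueError("unexpected data after the array")
    return result


def split_extra_attrs(payload: str) -> dict[str, str]:
    """Map decoded keys onto destination fields; fail loudly on unknown keys."""
    attrs = decode_flat_array(payload)
    unknown = sorted(attrs.keys() - FIELD_MAP.keys())
    if unknown:
        raise ValueError(f"unmapped keys: {unknown}")
    return {FIELD_MAP[key]: value for key, value in attrs.items()}
```

Raising on unmapped keys is a deliberate choice: a vendor quirk that nobody planned a field for should stop the row, not vanish quietly.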
The cleanup pass on the half-imported tables was the easy bit. Drupal's migration tooling expects you to roll back rather than DELETE: each migration keeps an ID map of the destination entities it created, and rollback deletes exactly those entities through the entity API, so their field and revision rows go with them instead of being left orphaned. We ran drush migrate:rollback per migration ID, watched the paragraph and revision tables empty cleanly, and re-ran the full pipeline against the corrected source plugin. Total wall-clock for the corrected re-run was 4 hours 12 minutes on a 16 GB EC2 box reading the source MySQL database from an RDS read replica. Spec tables came back populated. The empty drafts stayed in D7, where they belonged.
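When several migrations depend on one another, the rollback-and-re-run step is easiest to get right from a script. A minimal Python sketch, assuming the migration IDs are listed in import order (the IDs in the demo are hypothetical): it rolls back in reverse order, so dependants such as the product nodes go before the paragraphs they reference, then imports in forward order. drush migrate:rollback and drush migrate:import are the standard Drush commands; the script defaults to a dry run that only prints them.

```python
import shlex
import subprocess


def rerun_commands(migration_ids: list[str]) -> list[list[str]]:
    """Roll back in reverse import order, then re-import in forward order."""
    rollback = [["drush", "migrate:rollback", mid] for mid in reversed(migration_ids)]
    reimport = [["drush", "migrate:import", mid] for mid in migration_ids]
    return rollback + reimport


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure


if __name__ == "__main__":
    # Hypothetical IDs, listed in import order: paragraphs first, then nodes.
    run(rerun_commands(["hasselt_spec_paragraphs", "hasselt_product_spec_nodes"]))
```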
What we would do differently
Three things, in order of how much pain each would have saved us.
Diff revisions before trusting the source
The fastest way to catch this class of bug is a five-minute query before you start migrating. For each entity type with revisions, compare the row count of "latest revision" against "revision pinned by parent." If those numbers differ by more than 1 or 2 percent, you have abandoned drafts in the source data, and any migration plugin that defaults to MAX(revision_id) is going to lie to you. We now run this diff as the first artefact of every D7 audit, before we even look at the module list.
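One way to run that diff, sketched in Python against a toy in-memory SQLite copy so it runs on its own: for each collection item referenced by a node's current revision (node.vid), compare the item revision the node pins with MAX(revision_id) in field_collection_item_revision, and flag the source if more than 2 percent disagree. The schema is stripped to the columns the query needs and the fixture rows are invented; against a real D7 database you would run the same SELECT in your usual client.

```python
import sqlite3

AUDIT_SQL = """
SELECT COUNT(*) AS pinned_items,
       SUM(s.field_specs_revision_id <> (
         SELECT MAX(r.revision_id) FROM field_collection_item_revision r
         WHERE r.item_id = s.field_specs_value)) AS diverging
FROM node n
JOIN field_revision_field_specs s ON s.entity_id = n.nid AND s.revision_id = n.vid
"""


def audit(db: sqlite3.Connection, threshold_pct: float = 2.0) -> tuple[int, int, bool]:
    """Return (pinned items, items whose pinned revision is not the MAX, over threshold?)."""
    total, diverging = db.execute(AUDIT_SQL).fetchone()
    diverging = diverging or 0  # SUM over zero rows is NULL
    pct = 100.0 * diverging / total if total else 0.0
    return total, diverging, pct > threshold_pct


def demo_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER);
    CREATE TABLE field_revision_field_specs (
      entity_id INTEGER, revision_id INTEGER,
      field_specs_value INTEGER, field_specs_revision_id INTEGER);
    CREATE TABLE field_collection_item_revision (item_id INTEGER, revision_id INTEGER);
    """)
    db.executemany("INSERT INTO node VALUES (?, ?)", [(1, 100), (2, 200)])
    # Node 1's older revision 99 is ignored: only node.vid counts as pinned.
    db.executemany("INSERT INTO field_revision_field_specs VALUES (?, ?, ?, ?)",
                   [(1, 99, 7, 69), (1, 100, 7, 70), (1, 100, 8, 80), (2, 200, 9, 90)])
    # Item 7 has an abandoned draft (71) newer than the revision node 1 pins.
    db.executemany("INSERT INTO field_collection_item_revision VALUES (?, ?)",
                   [(7, 69), (7, 70), (7, 71), (8, 80), (9, 90)])
    return db
```

On the demo data, one of three pinned items diverges, far above the threshold; on a real audit, the interesting number is the ratio, not the absolute count.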
Treat serialized fields as a separate migration step
Any text field whose content starts with a:, O:, or s: followed by a number and another colon (a:3:{ for an array, for instance) is almost certainly serialized PHP. Pull a sample with a one-line SQL query, decide what schema you actually want, and write a dedicated process plugin to split the payload across real columns or a JSON field. Do not carry the blob across. The blob will outlive every developer who knew what it meant, and the next time someone touches the data they will be reverse-engineering a 2013 cron script.
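A rough detector for that sniff test, as a standalone Python sketch: it flags values that start with a full serialize() token shape (a:N:{, O:N:", or s:N:") rather than just the two-character prefix, which keeps ordinary text such as "a: see datasheet" out of the results. It is a triage heuristic for the sample your SQL query pulled, not a validator.

```python
import re
from typing import Optional

# Token shapes serialize() emits at the start of a value:
# a:<count>:{   O:<class name length>:"   s:<byte length>:"
SERIALIZED_PREFIX = re.compile(r'^(?:a:\d+:\{|O:\d+:"|s:\d+:")')


def looks_serialized(value: Optional[str]) -> bool:
    """Heuristic: does this text field value start like serialize() output?"""
    return value is not None and SERIALIZED_PREFIX.match(value) is not None


def serialized_share(values: list[Optional[str]]) -> float:
    """Fraction of non-empty sampled values that look serialized."""
    filled = [v for v in values if v]
    return sum(looks_serialized(v) for v in filled) / len(filled) if filled else 0.0
```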
Run the full migration into staging on day one
Our error existed for sixteen hours in a green dashboard because nothing was looking at the rendered output. A single content editor clicking around a staging copy on day one would have caught the empty spec tables before we got attached to our theory of the bug. Cheap insurance, and the people you most want clicking around (the editors who actually know the catalogue) are usually delighted to be asked. We now ask the client to nominate one editor whose only migration responsibility is twenty minutes a day with coffee, opening URLs from a deterministic sample. They are not testing for performance or design. They are testing for presence. Did the words land?
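For the deterministic sample, one approach (a sketch, not the tooling we used) is to rank node IDs by a salted hash and hand the editor the first N: the same salt always yields the same URLs, so today's list can be rechecked tomorrow, and a new salt draws a fresh sample. The /node/{nid} path and the default salt are illustrative assumptions.

```python
import hashlib


def daily_sample(nids: list[int], size: int, salt: str = "spec-check") -> list[str]:
    """Pick `size` node URLs deterministically: same inputs, same sample."""
    def rank(nid: int) -> str:
        # A salted SHA-256 spreads IDs evenly without any random state.
        return hashlib.sha256(f"{salt}:{nid}".encode()).hexdigest()

    chosen = sorted(nids, key=rank)[:size]
    return [f"/node/{nid}" for nid in sorted(chosen)]
```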
The smallest thing you can do today
If you are still running a Drupal 7 site with Field Collection fields (the module is in maintenance mode but not dead), open a database client and run:
SELECT COUNT(*) AS abandoned_drafts
FROM field_collection_item fci
WHERE fci.revision_id != (
SELECT MAX(revision_id)
FROM field_collection_item_revision
WHERE item_id = fci.item_id
);
If the number is larger than zero, you have the same shape of problem we did. It will not bite you while the site is on Drupal 7. It will bite you the day you start the upgrade.
When we ran the rest of the upgrade for the Hasselt client, and the two other Belgian sites that came in on referral after this one, we baked the diff-revisions check into our standard legacy migration kickoff and stopped losing days to abandoned drafts. The serialized-PHP unpicker is now the second migration we write on any D7 project, before anyone talks about themes.
Key takeaway
Field Collection migrations to Paragraphs grab the latest revision by default. On long-lived sites, that is often an abandoned draft, not the published one.
FAQ
Why didn't the migration throw an error when the data was missing?
Empty is a valid field value. The Paragraphs migration plugin copied what it found in the source row. No error was raised because nothing was technically wrong: the source revision really did contain empty values.
Should new Drupal sites still use Field Collection?
No. Field Collection is in maintenance mode. Paragraphs is the default choice for grouping fields on Drupal 9 and 10. Introducing Field Collection on a new build only buys you a future migration headache.
Is serialized PHP in a text field always a bug?
Not always, but it is a smell. It hides structure from the database, breaks indexing, and creates a deserialisation surface. On any migration it should be unpacked into real columns or a JSON field.
How long does a Drupal 7 to 10 upgrade usually take?
For a content-heavy site with custom modules, three to eight weeks of focused work is realistic. Hidden surprises like the one in this post are the usual reason projects land at the long end of that range.