Drupal
Drupal to Sanity migration: the Field Collection trap
Day eight of a Drupal 7 to Sanity migration. The GROQ queries returned clean strings where 3,600 RTRS-audited pesticide doses used to live as nested entity references.

It was day eight. The Sanity Studio looked clean, Astro built in four seconds, and the staging URL loaded instantly on a 4G hotspot in the Den Bosch warehouse parking lot. Then the operations manager opened a single cultivar — Cucumis sativus 'Verdon' — and the pesticide application history rendered as a 412-character string. No line breaks. No dates. No dosages. Just a comma-separated wall of legacy entity IDs.
We were rebuilding the teler-portaal of a twenty-three-person tuinbouw-toelevering: a Drupal 7 grower portal that backs their RTRS certification. Three thousand six hundred records of pesticide dosing per cultivar, per week, going back to 2019. The auditor visits in four weeks. The old portal still works. The new one does not.
This is the story of why we spent eleven days fixing what a 2016 Field Collection setup quietly did to the data — and how to spot it on your own Drupal 7 exit before it eats a sprint.
The grower portal we inherited
The client supplies seeds, substrate, and crop-protection products to tomato, cucumber, and pepper growers across Brabant and Limburg. Twenty-three people, mid-eight-figure revenue, one PHP 7.1 grower portal that had not been touched since 2018. Telers log in weekly to record what they sprayed, on which cultivar, in which greenhouse compartment, at what dose. The portal generates the audit log that feeds their RTRS certification and three other quality marks. No log, no certification. No certification, no contracts with the big retailers.
Drupal 7 ended its community support on 5 January 2025. The site had been limping along on an unsupported PHP 7.1 with monthly patches we paid a freelancer to apply by hand. The brief was simple: get off Drupal, keep the portal working, do not lose a single dosing record. We picked Sanity for the structured content store and Astro for the front-end, with a thin Hono API on Cloudflare Workers for the write path. We budgeted three weeks. We were on schedule until day eight.
What the Field Collection module actually stored
The portal was built in 2016 on Drupal 7.43, when Field Collection was the standard way to model repeating compound data. Each cultivar node had a field called field_spuit_registratie ("spray registration"), which was a Field Collection. Each collection item held three sub-fields: an entity reference to a bestrijdingsmiddel (pesticide) node, a decimal dosering_g_per_ha, and a toepassingsdatum (date applied).
That looks innocent until you understand what Field Collection does. Every collection item is its own entity, with its own entity ID, stored in field_collection_item. The parent cultivar node holds an entity reference to the collection item. The collection item holds an entity reference to the pesticide. So a single spray record is two entity references deep — node → collection_item → pesticide — and each level needs its own load.
In 2016 this was fine. Drupal's entity_metadata_wrapper walked it transparently. The original developer wrote a Twig partial that iterated the wrapper and rendered each record. It worked for nine years. It also embedded an assumption: the read side will dereference the entities for me.
If your Drupal 7 site uses Field Collection for anything compliance-related — audit logs, dosing records, signed-off versions — the data is two entity-references deep. Any migration that flattens the structure to JSON before resolving the references will quietly destroy the relationship.
How GROQ flattened the audit trail
Our export script ran on the Drupal side. It loaded each cultivar node, walked the Field Collection, and emitted a JSON array of records. The Sanity import script took that JSON and pushed it through @sanity/client with a schema we had defined as a list of plain objects on the cultivar document:
// schemas/cultivar.ts
defineType({
name: 'cultivar',
type: 'document',
fields: [
defineField({ name: 'name', type: 'string' }),
defineField({
name: 'sprayRegistrations',
type: 'array',
of: [{
type: 'object',
fields: [
{ name: 'pesticide', type: 'string' }, // <-- the bug
{ name: 'doseGramsPerHa', type: 'number' },
{ name: 'appliedAt', type: 'datetime' },
],
}],
}),
],
})
The pesticide came in as a string. Of course it did — the exporter had read the entity reference, called $wrapper->bestrijdingsmiddel->label(), and serialised the human-readable name. Two thousand records used the same five pesticides. The string was lossless on the rendered page. The auditor would never know.
Except the RTRS auditor does not read pages. They request a CSV export of every dosing event keyed by the EU pesticide registration number (the toelatingsnummer). The original Drupal portal generated that CSV by dereferencing the pesticide entity at export time and reading its field_toelatingsnummer. The new portal had no entity to dereference. The registration number never made it into JSON. The GROQ query we wrote to feed the CSV endpoint returned this:
// /api/rtrs-export.groq
*[_type == "cultivar"]{
name,
"sprays": sprayRegistrations[]{
pesticide, // "Previcur Energy" — the name, not the number
doseGramsPerHa,
appliedAt
}
}
Three thousand six hundred records came out without a registration number. Sanity's GROQ engine did exactly what we asked it to do: project the fields that exist. The fields we needed — the ones the auditor would request — were not in the document. They were in the Drupal pesticide node we had not migrated as its own type.
Eleven days of wrong assumptions
The first three days we spent looking for the bug in the GROQ query. There was none. The second three days we spent rewriting the Astro page to pull the pesticide name and reformat it. That was the wrong layer. By day seven we had built a lookup service that hit the old Drupal site for any missing toelatingsnummer, which worked until the freelancer's monthly patch cycle took the Drupal site offline for a maintenance window we had not been warned about.
Day eight is when one of our engineers ran a single block in the Drupal 7 shell and the actual shape of the data became visible:
// drush php-eval
$node = node_load(2841);
$w = entity_metadata_wrapper('node', $node);
foreach ($w->field_spuit_registratie as $fc) {
$item = $fc->value();
$pest = entity_metadata_wrapper('field_collection_item', $item)
->field_bestrijdingsmiddel->value();
echo $pest->nid . ' ' . $pest->field_toelatingsnummer['und'][0]['value'] . "\n";
}
Every spray record pointed at a real pesticide node with a real toelatingsnummer. Our export had read the label and thrown the reference away. The flatten happened in our PHP, not in GROQ. We had been blaming the read layer for an export-layer mistake.
That mattered, because the fix had to happen on the export and the schema, and any version of the bug we had patched on the Astro side was now wasted work. We deleted four days of code.
The fix: denormalize at export, not at read
We restructured the Sanity schema so that each pesticide is its own document, and each spray registration is an object that holds a real Sanity reference to that document. The export script now resolves the Field Collection in two passes: first every pesticide node, then every cultivar with references to the pesticide documents that already exist.
// schemas/pesticide.ts
defineType({
name: 'pesticide',
type: 'document',
fields: [
defineField({ name: 'name', type: 'string' }),
defineField({ name: 'toelatingsnummer', type: 'string' }), // EU reg. number
defineField({ name: 'activeIngredient', type: 'string' }),
],
})
// schemas/cultivar.ts (revised)
defineField({
name: 'sprayRegistrations',
type: 'array',
of: [{
type: 'object',
fields: [
{ name: 'pesticide', type: 'reference', to: [{ type: 'pesticide' }] },
{ name: 'doseGramsPerHa', type: 'number' },
{ name: 'appliedAt', type: 'datetime' },
],
}],
})
The GROQ query now resolves the reference at read time, which is exactly what GROQ is good at:
*[_type == "cultivar"]{
name,
"sprays": sprayRegistrations[]{
"pesticide": pesticide->{ name, toelatingsnummer, activeIngredient },
doseGramsPerHa,
appliedAt
}
}
The export script ran for fourteen minutes. The CSV came out with every toelatingsnummer in place. We diffed it against the last Drupal-generated CSV and found seven historical records where the original portal had stored a free-text pesticide name without an entity reference — a 2017 data-entry quirk we corrected by hand. The auditor signed off two weeks later.
Lessons from a Drupal 7 to Sanity migration
Three things we now do on every Drupal 7 exit.
First, before we write a line of export code, we run drush field-info fields on the live site and grep for field_collection, entityreference, and paragraphs. Any of those means the data is at least two hops deep. That changes the schema design before it changes the migration script.
Second, we treat the Drupal entity graph as the source of truth and reproduce it in the new system before we flatten anything. If a Field Collection contains an entity reference, that reference becomes its own document in the new stack. Flattening is a presentation concern, not a storage concern. GROQ resolves references at read time exactly so you do not have to bake them into the document.
Third, every export run produces a record-count diff against the live Drupal site before anyone looks at the front-end. If 3,600 records went in and 3,600 strings came out, the script is "working." If 3,600 records went in and 3,593 fully-resolved references came out, you find the seven outliers on day one instead of day eleven.
What this work looked like for the client
When we built the new teler-portaal for this Den Bosch supplier, the thing we ran into was a 2016 Field Collection setup hiding a two-hop entity reference behind a perfectly readable Twig template. We ended up rebuilding the export pass to preserve the reference graph, modelling each pesticide as a first-class Sanity document, and only flattening at the GROQ read layer. If you are staring at a Drupal 7 portal and an EOL date, our legacy migration notes from this project apply almost directly.
Open a terminal on your Drupal 7 site and run drush field-info fields | grep -E 'field_collection|entityreference|paragraphs'. Every line you see is a place a migration script can quietly drop a reference. Map those before you map anything else.
Key takeaway
On a Drupal 7 migration, the dangerous data is the entity reference your Twig template was quietly dereferencing. Flatten at read, not at export.
FAQ
Is the Drupal 7 Field Collection module still safe to rely on?
No. Drupal 7 community support ended in January 2025, and Field Collection was already in maintenance mode before that. Treat any site using it as a migration candidate, not a maintenance candidate.
Why model pesticides as Sanity documents instead of inline objects?
Because the same pesticide appears across hundreds of cultivars and needs one authoritative registration number. References give you a single source of truth and let GROQ resolve at read time without duplicating data.
How long should a Drupal 7 to Sanity migration take for a portal this size?
Three weeks is realistic for a 23-person site with a few thousand records, provided you map the entity graph before writing the export script. Discover Field Collection nesting on day two, not day eight.
What does an RTRS-style audit actually check in the data?
Per-application records keyed by the EU pesticide registration number, with cultivar, dose, date, and operator. Pesticide name alone is not enough — the auditor needs the regulated identifier.