Joomla 3.10 K2 migration: the J2XML double-encode trap

Twelve days into a Joomla 3.10 to Sanity migration, 2,600 CrossRef backlinks were silently broken. The culprit was a double base64 nobody had looked for.

Jacob Molkenboer · Founder · A Brand New Company · 20 Dec 2025 · 11 min

Marieke leads the redactie, the editorial team, of a small academic publisher on the Rapenburg in Leiden. Twenty-three people, four journals, 9,800 peer-reviewed artikelen (articles) going back to 1998. Their Joomla 3.10 site, running K2 for every article, had been frozen since the day the 3.10 end-of-life notice stopped feeling like a threat. The plan was a clean six-week migration to Sanity and Next.js. Export the content, rebuild the front-end, ship before the autumn issue.

Day twelve, the migration was stalled. The artikelen were in Sanity. The article pages rendered. The DOIs at the top of every artikel rendered. And every single CrossRef backlink — the citation graph that lets a reader walk from one paper to the next — was broken. 2,600 of them.

This is the story of what was wrong and how we found it.

Inside the JParameter blob

K2 was the most popular content extension for Joomla 1.5 through 3.x. By 2026 it is still maintained, but rarely chosen for new builds. Its data model has one peculiarity that bites every migration project: the params column on jos_k2_items is a free-form bag. K2 itself uses it for SEO fields, image alignment, and a handful of flags. Anything a third-party extension wants to attach to an article ends up there too.

The redactie had bought a third-party K2 extension in 2014 to manage DOI cross-references. It stored each article's outbound citations as a doi_refs field inside the K2 item's params blob. Joomla called this object a JParameter: internally an associative array, sometimes serialised to INI, sometimes to JSON, and from K2 v2.7 onward to a base64-wrapped PHP serialize() string. Which format you got depended on whether the field had been touched in the admin after the 2.7 upgrade.

The JParameter class itself was deprecated in Joomla 2.5 in favour of JRegistry, which standardised on JSON. K2's params blob was never migrated to the new class. The K2 maintainers built their own serialisation layer on top, which is why the same field can hold three different formats inside one database depending on when each row was last saved.

In their database we found all three formats coexisting. About 7,200 items used the modern base64-wrapped serialize. About 2,600 used the legacy INI. A handful used JSON. Nobody on the redactie had ever known any of this. They had just used the CMS.
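The three-way split can be checked mechanically before any export. What follows is a minimal TypeScript sketch, assuming that a JSON blob starts with `{`, that an INI blob consists of `key=value` lines, and that a single unbroken run of base64 characters is the wrapped serialize format. The function name and the labels are ours for illustration, not part of K2 or Joomla.

```typescript
// Hypothetical classifier for a raw K2 params value; names and labels are ours.
type ParamsFormat = "json" | "ini" | "base64" | "unknown";

const BASE64_RE = /^[A-Za-z0-9+/]+={0,2}$/;

function classifyParams(raw: string): ParamsFormat {
  const text = raw.trim();
  if (text === "") return "unknown";

  // JSON params are an object literal; confirm by parsing.
  if (text.charAt(0) === "{") {
    try {
      JSON.parse(text);
      return "json";
    } catch {
      return "unknown";
    }
  }

  // A base64 payload is one run of base64 characters, length a multiple of four.
  if (text.length % 4 === 0 && BASE64_RE.test(text)) return "base64";

  // INI params are key=value lines.
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.every((line) => /^[A-Za-z0-9_.\-]+\s*=/.test(line))) return "ini";

  return "unknown";
}
```

A very short INI value such as `abc=` is ambiguous under these rules. Decoding it and inspecting the result settles the question, which is why "unknown" rows deserve a look by hand rather than a default.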

The plugin behind the DOI fields was called K2 DOI Refs. It was sold on the Joomla extension directory between 2013 and 2016 and then went quiet, with no final release announcement. The redactie's copy was v1.6.2, with the v2.7-compatible storage migration applied automatically the first time an admin re-saved an article on the upgraded K2. We found the plugin's source in the site's components folder, untouched since 2015. It was 800 lines of PHP, two of which mattered to us: the lines that wrote and read the citations field.

The J2XML double-encode

The standard tool for exporting Joomla content is J2XML. It produces a portable XML file with all articles, categories, users, and custom params. It is genuinely good software for what it does, and we had used it on a dozen earlier migrations without incident.

J2XML treats the K2 params blob as an opaque payload. If the string looks like text, it writes it as text. If it looks binary or contains characters that would break XML, it base64-encodes the value first.

The heuristic is fast and correct most of the time. J2XML checks two signals: whether the string contains characters outside printable ASCII, and whether its length is divisible by four while nearly all of its characters come from the base64 alphabet (letters, digits, "+" and "/"). When either signal fires, it base64-encodes. The catch: an already-base64 string passes the second test trivially.

So the 7,200 items whose K2 params were stored as base64-wrapped serialize got base64-encoded a second time on export. The XML on disk now contained base64(base64(serialize(array))) for those rows. The legacy INI rows were exported untouched.
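To make the failure mode concrete, here is a hypothetical reconstruction of such a heuristic in TypeScript. It is not J2XML's source: the function names, the 0.95 threshold, and the choice to let tabs and newlines through are all assumptions made for illustration.

```typescript
// Hypothetical "looks opaque" heuristic. NOT J2XML's code; the names and the
// 0.95 threshold are assumptions for illustration only.
function looksOpaque(value: string): boolean {
  // Signal 1: any character outside printable ASCII (tabs and newlines allowed).
  if (/[^\x09\x0A\x0D\x20-\x7E]/.test(value)) return true;

  // Signal 2: length divisible by four and almost entirely base64 characters.
  const base64Chars = (value.match(/[A-Za-z0-9+/=]/g) ?? []).length;
  return (
    value.length > 0 &&
    value.length % 4 === 0 &&
    base64Chars / value.length > 0.95
  );
}

// Note: btoa only handles Latin-1 input; a real exporter would encode bytes.
function exportField(value: string): string {
  return looksOpaque(value) ? btoa(value) : value;
}

const serialized = 'a:1:{s:8:"doi_refs";s:9:"10.1000/a";}';
const stored = btoa(serialized);       // what a base64-wrapped params column holds
const exported = exportField(stored);  // what the exporter writes out

// The stored value is already base64, so signal 2 fires and it is wrapped again.
console.log(exported === btoa(btoa(serialized)));
// A plain INI value trips neither signal and passes through unchanged.
console.log(exportField("featured=1\ncatItemTitle=1"));
```

Nothing in this logic is wrong for the inputs it was designed for. The second signal simply cannot tell "binary that needs wrapping" from "text that has already been wrapped".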

Our Sanity importer, of course, decoded base64 exactly once. The PHP unserialize() call on the result returned false. The importer wrote whatever it had — an empty doi_refs array — and moved on. No error. No log line. The artikelen imported, the DOI headers imported, the CrossRef backlinks did not.

This was the twelve-day stall.

Warning

Any exporter that base64-encodes "binary-looking" payloads will double-encode anything that is already base64. Diff one record by hand from source to export before you trust the batch — heuristics lie quietly.

What twelve days looked like

The twelve days were not idle. From day one through day three the importer logged the artikel ingestion as complete and we moved on to category mapping. Day four was Sanity Studio configuration. Days five and six went to front-end rendering in Next.js. The redactie staged a sample issue and read through the articles on a preview URL. The body copy was fine. The DOIs at the top of each artikel were fine. Nobody clicked a citation.

On day seven the editor-in-chief did click one and got nothing. We logged a bug, blamed the front-end, and spent days eight and nine in the routing layer. On day ten one of us went back into Sanity Studio and noticed that the references array on every artikel was empty. The importer had succeeded silently. From there it was three days of grepping the importer source for the assignment that wrote that field, finding it, and not believing what we saw.

The diff that broke it open

On day twelve we stopped staring at the importer logs and went back to the source. We took one specific article — a 2019 review paper with fourteen outbound citations — and dumped its raw params column directly from MySQL. We located the same article in the J2XML XML output. Then we diffed.

We had spent the previous two days assuming the bug was in our importer. The Sanity client, the schema validation, the type coercion in the adapter. Every reading was negative; every smoke test passed. Going back to the source meant admitting the importer was not the problem, which is psychologically expensive after two days of tweaking it.

The XML field was about 30% longer than the MySQL field. We base64-decoded the XML once and got something that still looked like base64. Decoded again and got the PHP serialize string we expected. Ran it through unserialize() and got the array, including the fourteen DOIs.
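The size difference is itself a usable signal. Base64 turns every three bytes into four characters, so one extra pass should add roughly a third, which is in line with the "about 30%" we saw. A small sketch of that arithmetic, with an illustrative payload size that is not taken from the real data:

```typescript
// Length of the padded base64 encoding of n bytes: 4 * ceil(n / 3).
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// Hypothetical size check: how much longer is the exported field than the stored one?
function growthRatio(storedLength: number, exportedLength: number): number {
  return exportedLength / storedLength;
}

const rawBytes = 900;                        // illustrative serialize() payload size
const singleLen = base64Length(rawBytes);    // one layer, as stored in MySQL
const doubleLen = base64Length(singleLen);   // two layers, as found in the XML

console.log(singleLen, doubleLen, growthRatio(singleLen, doubleLen));
```

A ratio near 1.33 between the exported field and the stored field is a strong hint that one extra base64 layer was added somewhere in between.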

The whole investigation took thirty-five minutes once we stopped trusting the importer and started trusting the bytes on disk. Twelve days of staring at the wrong layer.

The repair

You cannot fix this on the J2XML side without patching J2XML, which we did not want to put in the critical path of a publisher's autumn issue. We did it on the importer side instead, with one detection pass and one repair pass.

Before either pass touched the production database we ran both against a snapshot copy on a local MariaDB. The detection pass is read-only. The repair pass writes to a staging table in a separate schema. The original jos_k2_items table was never modified. If the autumn issue had to roll back to the Joomla site, the source data was untouched.

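<?php
// detect-double-encoded-k2-params.php
// Usage: DB_USER=... DB_PASS=... php detect-double-encoded-k2-params.php
// Read-only: classifies each params blob by how many base64 layers wrap it.

$pdo = new PDO(
    'mysql:host=localhost;dbname=joomla',
    getenv('DB_USER') ?: '',
    getenv('DB_PASS') ?: '',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);

$rows = $pdo->query(
    "SELECT id, params FROM jos_k2_items WHERE params IS NOT NULL"
);

// True when $s is a valid PHP serialize() string. Objects are never
// instantiated, and a serialized `false` is not mistaken for a failure.
function is_serialized(string $s): bool
{
    return $s === 'b:0;'
        || @unserialize($s, ['allowed_classes' => false]) !== false;
}

$double = 0;
$single = 0;
$ini    = 0;
$other  = 0;

foreach ($rows as $row) {
    $raw  = $row['params'];
    $once = base64_decode($raw, true);

    if ($once === false) {
        $ini++;            // not base64: legacy INI (or JSON) rows, leave alone
        continue;
    }

    $twice = base64_decode($once, true);
    if ($twice !== false && is_serialized($twice)) {
        $double++;
        fwrite(STDERR, "double: item {$row['id']}\n");
    } elseif (is_serialized($once)) {
        $single++;
    } else {
        $other++;          // base64 but not serialize(): inspect by hand
        fwrite(STDERR, "unknown: item {$row['id']}\n");
    }
}

printf("double=%d single=%d ini=%d other=%d\n", $double, $single, $ini, $other);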

The detection pass told us we had 7,184 double-encoded rows, sixteen single-encoded rows where the field had been touched after the export, and 2,600 INI rows. The numbers matched the gap in the backlink graph almost exactly.

The repair pass was the same loop with a different body: decode the right number of times for each row, run it through unserialize, pull out the doi_refs field, and write the cleaned values to a small staging table.

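function extract_doi_refs(string $serialized): array
{
    // allowed_classes => false: never instantiate objects from legacy data.
    $params = @unserialize($serialized, ['allowed_classes' => false]);
    if (!is_array($params)) {
        // Fail loudly: a blob that does not deserialise is a bug, not "no citations".
        throw new InvalidArgumentException('params blob did not unserialize to an array');
    }
    if (empty($params['doi_refs'])) {
        return [];
    }

    // doi_refs was stored as a newline-joined string in K2 v2.10.
    // Split on any line ending, trim each entry, and drop blanks.
    $lines = preg_split('/\R/', (string) $params['doi_refs']);

    return array_values(array_filter(
        array_map('trim', $lines),
        fn (string $s): bool => $s !== ''
    ));
}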
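The "decode the right number of times" step generalises to any nesting depth. The sketch below shows the idea in TypeScript rather than PHP. Because TypeScript has no `unserialize()`, a prefix check for a serialized array (`a:<count>:{`) stands in for it; that substitution, the helper names, and the depth limit are assumptions, not the code we ran.

```typescript
// Hypothetical helper; a prefix check stands in for PHP's unserialize().
const STRICT_BASE64 = /^[A-Za-z0-9+/]+={0,2}$/;
const SERIALIZED_ARRAY = /^a:\d+:\{/;

interface Peeled {
  depth: number;    // how many base64 layers were removed
  payload: string;  // the serialized string underneath
}

function peelBase64(raw: string, maxDepth = 3): Peeled | null {
  let current = raw.trim();
  for (let depth = 0; depth <= maxDepth; depth++) {
    // Stop as soon as the value looks like a serialized PHP array.
    if (SERIALIZED_ARRAY.test(current)) return { depth, payload: current };
    // Anything that is not strict base64 at this point is not ours to decode.
    if (current.length % 4 !== 0 || !STRICT_BASE64.test(current)) return null;
    current = atob(current);
  }
  return null; // deeper than maxDepth: report as unknown rather than guess
}
```

Returning `null` instead of an empty result keeps the caller honest: a row that cannot be peeled has to be logged or rejected, not written as "no citations".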

From the staging table, the Sanity import was a small adaptor that walked each artikel's references and pushed them into a typed array on the document. Forty minutes after we ran the detection pass, the 2,600 CrossRef backlinks were rebuilt. Marieke checked five at random and stopped checking.

The new shape in Sanity

In the new stack the DOI list is a top-level array on each artikel document. Each reference is a small object with the DOI string, an optional pointer to the cited artikel if it lives in the same publisher, and a cached title for citations that point elsewhere. The cache is there so that a network blip during a CrossRef sync cannot blank a citation a reader is about to click.

// schemas/artikel.ts
export default {
  name: 'artikel',
  type: 'document',
  fields: [
    { name: 'title', type: 'string' },
    { name: 'doi',   type: 'string' },
    {
      name: 'references',
      type: 'array',
      of: [{
        type: 'object',
        fields: [
          { name: 'doi',         type: 'string' },
          { name: 'target',      type: 'reference', to: [{ type: 'artikel' }] },
          { name: 'cachedTitle', type: 'string' },
        ],
      }],
    },
  ],
}
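An adaptor that fills this array from a repaired DOI list could look roughly like the sketch below. The function name, the `idByDoi` lookup (DOI to Sanity document `_id`), the lower-casing of DOIs, and the `ref-N` key scheme are all assumptions for illustration. What is not an assumption is that Sanity expects each object in an array to carry a unique `_key`, and that a reference is stored as `{ _type: 'reference', _ref: <id> }`.

```typescript
// Hypothetical adaptor: shapes a DOI list into items for the `references` array.
interface ReferenceItem {
  _key: string;
  doi: string;
  target?: { _type: "reference"; _ref: string };
}

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function toReferenceItems(
  dois: string[],
  idByDoi: Record<string, string>, // assumed lookup: DOI -> Sanity document _id
): ReferenceItem[] {
  const seen: Record<string, boolean> = {};
  const items: ReferenceItem[] = [];

  for (const raw of dois) {
    const doi = raw.trim().toLowerCase(); // DOIs are case-insensitive
    if (doi === "" || hasOwn(seen, doi)) continue;
    seen[doi] = true;

    const item: ReferenceItem = { _key: `ref-${items.length}`, doi };
    if (hasOwn(idByDoi, doi)) {
      // The cited artikel lives in the same dataset: link it.
      item.target = { _type: "reference", _ref: idByDoi[doi] };
    }
    items.push(item);
  }
  return items;
}
```

Citations that point outside the publisher simply get no `target`; their `cachedTitle` would be filled in later by whatever process talks to CrossRef.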

The CrossRef sync is now a nightly Next.js route handler that hits the CrossRef REST API and only updates documents whose references resolve to something stale. It runs in about ninety seconds for the full corpus and emits a structured log line for every change. There is no silent fallback. When a DOI fails to resolve, the build flags the artikel in Sanity Studio and the redactie sees it before a reader does.
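The decision logic of such a sync can be kept pure and therefore easy to test. The sketch below is not our route handler. It assumes the public CrossRef endpoint `https://api.crossref.org/works/{doi}`, whose `message` object carries a `title` array, and it invents the log record shape and function names for illustration. Fetching and writing to Sanity are deliberately left out.

```typescript
// Pure helpers for a CrossRef title refresh; all names here are hypothetical.
interface CachedReference {
  doi: string;
  cachedTitle?: string;
}

// Subset of the `message` object in a CrossRef works response.
interface CrossrefWork {
  title?: string[];
}

type SyncRecord =
  | { level: "info"; event: "title_updated"; doi: string; from: string | null; to: string }
  | { level: "warn"; event: "unresolved"; doi: string };

function crossrefWorkUrl(doi: string): string {
  // Assumption: the DOI is URL-encoded, since DOIs may contain reserved characters.
  return "https://api.crossref.org/works/" + encodeURIComponent(doi);
}

// Returns a structured log record when something changed or failed,
// and null when the cached title is already current.
function planTitleUpdate(ref: CachedReference, work: CrossrefWork | null): SyncRecord | null {
  if (work === null || !work.title || work.title.length === 0) {
    return { level: "warn", event: "unresolved", doi: ref.doi };
  }
  const latest = work.title[0].trim();
  if (latest === (ref.cachedTitle ?? "")) return null;
  return {
    level: "info",
    event: "title_updated",
    doi: ref.doi,
    from: ref.cachedTitle ?? null,
    to: latest,
  };
}
```

Every non-null result is something to log, and every `unresolved` record is something to surface to an editor. There is no branch that quietly does nothing on failure.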

What we'd carry forward

Three habits came out of this project that we now apply on every Joomla migration. First, never trust an exporter's encoding heuristic. Diff one record by hand from source to export before you trust the batch, even on tools you have used before. Second, sample three records across the publication date range, because legacy CMSs accumulate format generations and the oldest articles often have the cleanest data. Third, run the importer against a known-good record first and a known-broken record second. Only then turn it on the long tail.

This is also the third Joomla 3.x migration in which the failure was in the export tool, not in the source data. The pattern is consistent: a mature exporter handles 95% of cases with a quiet heuristic, and the remaining 5% fail in a way that produces no error and survives every smoke test. The defence is not a better tool. It is a small, slow, byte-level audit of a single record before anything is batched.

The other lesson is older and harder to write down. The redactie did not know their citation metadata lived in a third-party plugin's JParameter blob. They had no reason to. The migration plan we wrote on day one did not know either, because we had read the schema and not the data. A schema tells you what is allowed. The data tells you what is true. They are not the same document.

The concrete fix has since become procedure. What bit us on the Leiden publisher's artikel pipeline was base64 nesting around a serialised PHP blob, failing silently. We solved it with a detection pass that counts how many decodes it takes before unserialize succeeds, and that runs before any document is written to Sanity. The same trick has saved us on two Drupal migrations since, and it now sits at the top of our runbook for any legacy migration.

If you have an old Joomla or K2 site queued for migration this year, pick one record with rich custom data, dump its raw row from MySQL, and run base64_decode on the relevant blob twice. If the second decode also succeeds and the result deserialises, you have this bug. Plan an extra day before you schedule the cutover.

Key takeaway

On any legacy CMS migration, diff one record by hand from source to export before you trust the batch — encoding heuristics lie quietly.

FAQ

Does every J2XML export hit this double-encode bug?

Only when the source field is already base64-encoded. Most Joomla core fields are plain text. K2 params from v2.7 onward and custom fields wrapped in base64 by other extensions are the usual victims.

Why not patch J2XML and re-run the export?

We had a deadline and a frozen production database. Patching the exporter would have meant a fresh roundtrip plus regression-testing the whole export. Repairing on the importer side was faster and reversible.

How do I tell single from double base64 in a blob?

Decode once. If the result is readable UTF-8 with an INI fragment, JSON, or a PHP serialize string, you had single. If it still looks like base64 — right character set, length divisible by four — decode again and check.

Can a Joomla 3.10 site stay on 3.10 in 2026?

Technically yes, but 3.10 reached end of life in August 2023. No security patches ship for it. If a relevant CVE drops you are on your own. Migrating off is the only safe answer.
