Joomla 2.5 to Astro + Sanity: the 2,800-URL playbook

A 14-year-old Joomla 2.5 site with K2, 2,800 indexed URLs, and an SEO contractor who refuses to renumber a single one. Here is how we moved the lot onto Astro and Sanity without losing rank.

Jacob Molkenboer· Founder · A Brand New Company· 12 Jun 2026· 9 min

The studio sits above a bookshop on the Bruul in Mechelen. Nineteen architects, fourteen years of built work, and a project portfolio site running Joomla 2.5 with K2. Last secure update somewhere in late 2014. The owners called us in March because their host had announced PHP 8.2 as the floor by Q3, and the old stack would not survive the bump. The site had 2,800 URLs indexed in Google. An SEO contractor on retainer had been clear on one point: no renumbering, no path changes, no lost rank.

This is the playbook we used to move them onto Astro and Sanity. It worked. The October crawl after cutover showed 99.4% of the old URLs returning a clean 200 on their new home, the rest landing on a 301 to the right canonical. No drop in Search Console impressions through the four weeks we monitored.

The constraint that shapes everything

Before any architecture talk, the SEO constraint dictates the schema. K2 generates URLs like /projects/47-housing-collective-leuven.html, where 47 is the row's id in jos_k2_items. The contractor wanted those numeric IDs preserved verbatim. Practical implication: every project in Sanity carries the original K2 id as a read-only field, and the Astro route file is src/pages/projects/[id]-[alias].astro. The new system mirrors the old URL grammar, minus the .html suffix. The redirect layer handles that suffix and the rest of the cruft.

If you cannot keep IDs, you can still redirect. But every preserved ID is a redirect that does not need to fire, and Google rewards the cleaner path.
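To make the URL grammar concrete, here is a minimal sketch of splitting a legacy K2 item path into its id and alias and rebuilding the path the new site serves. The pattern is inferred from the single example URL above, and the names parseLegacyPath and newPath are illustrative, not part of the project's code.

```typescript
// Parsed form of a legacy K2 item URL (illustrative type).
interface LegacyRef {
  id: number
  alias: string
}

// /projects/<numeric id>-<alias>, with an optional .html suffix.
const K2_ITEM_PATH = /^\/projects\/(\d+)-([^\/]+?)(?:\.html)?$/

function parseLegacyPath(path: string): LegacyRef | null {
  const match = K2_ITEM_PATH.exec(path)
  if (!match) return null
  return {id: Number(match[1]), alias: match[2]}
}

// Same id and alias as the legacy URL, without the .html suffix.
function newPath(ref: LegacyRef): string {
  return `/projects/${ref.id}-${ref.alias}`
}
```

Anything parseLegacyPath rejects (category listings, tag pages, query-string URLs) is a pattern the redirect layer has to cover separately.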

Audit before you write a line of code

A migration that begins in the editor is already off the rails. Our first week was crawling, exporting, and reading. Three sources of truth that have to agree before any schema gets drafted: the site's own sitemap, an external full-depth crawl, and Search Console's index coverage report.

# Pull the live sitemap and normalise to a flat URL list
curl -s "https://example.be/index.php?option=com_xmap&view=xml" \
  | grep -o '<loc>[^<]*</loc>' \
  | sed -e 's/<[^>]*>//g' -e 's/&amp;/\&/g' \
  | sort -u > sitemap.txt
wc -l sitemap.txt   # compare against the crawl and Search Console counts; do not assume the sitemap is complete

We also ran Screaming Frog at full depth against the live site, exported a CSV of every internal link with its anchor and HTTP status, and pulled the last twelve months of Search Console clicks per URL. The crawl uncovered 41 orphan pages still ranking that Joomla had quietly stopped linking to but Google had not forgotten. Those alone justified the audit fee.
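The three-way comparison can be sketched as plain set arithmetic. This assumes each source has already been exported to a flat list of absolute URLs; the function and field names are hypothetical.

```typescript
interface Reconciliation {
  missingFromSitemap: string[]
  missingFromCrawl: string[]
  missingFromSearchConsole: string[]
}

// Trim whitespace and drop any #fragment so equal URLs compare equal.
function normaliseUrl(url: string): string {
  return url.trim().replace(/#.*$/, '')
}

function reconcile(
  sitemap: string[],
  crawl: string[],
  searchConsole: string[],
): Reconciliation {
  const toSet = (urls: string[]): Set<string> => new Set(urls.map(normaliseUrl))
  const sitemapSet = toSet(sitemap)
  const crawlSet = toSet(crawl)
  const consoleSet = toSet(searchConsole)

  // Every URL that any of the three sources knows about.
  const union = new Set<string>()
  for (const set of [sitemapSet, crawlSet, consoleSet]) {
    set.forEach(url => union.add(url))
  }

  const missingFrom = (set: Set<string>): string[] =>
    Array.from(union).filter(url => !set.has(url)).sort()

  return {
    missingFromSitemap: missingFrom(sitemapSet),
    missingFromCrawl: missingFrom(crawlSet),
    missingFromSearchConsole: missingFrom(consoleSet),
  }
}
```

URLs that Search Console reports but the link crawl never reaches are the orphan candidates: still indexed, no longer linked.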

Mapping K2 to a Sanity schema

K2 has three things you need to carry across: categories (with parent-child), items (title, alias, introtext, fulltext, image, gallery, extra fields), and tags. The schema mirrors this in Sanity with the minimum the front-end actually needs.

// schemas/project.ts
import {defineType, defineField} from 'sanity'

export default defineType({
  name: 'project',
  type: 'document',
  title: 'Project',
  fields: [
    defineField({name: 'legacyId', type: 'number', title: 'K2 ID', readOnly: true}),
    defineField({name: 'legacyAlias', type: 'string', title: 'K2 alias', readOnly: true}),
    defineField({name: 'title', type: 'string', validation: r => r.required()}),
    defineField({name: 'slug', type: 'slug', options: {source: 'title', maxLength: 80}}),
    defineField({name: 'category', type: 'reference', to: [{type: 'category'}]}),
    defineField({name: 'intro', type: 'array', of: [{type: 'block'}]}),
    defineField({name: 'body', type: 'array', of: [{type: 'block'}, {type: 'image'}]}),
    defineField({name: 'hero', type: 'image', options: {hotspot: true}}),
    defineField({name: 'gallery', type: 'array', of: [{type: 'image'}]}),
    defineField({name: 'completedYear', type: 'number'}),
    defineField({name: 'client', type: 'string'}),
    defineField({name: 'surfaceSqm', type: 'number'}),
    defineField({name: 'publishedAt', type: 'datetime'}),
  ],
})

legacyId is load-bearing. Every K2 item is imported with its original numeric id so the route generator can target it deterministically. legacyAlias keeps the slug Google indexed, separate from the cleaner slug editors can change later without breaking links.
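As an illustration of how legacyId can anchor the import, the sketch below maps one exported row to a Sanity document with a deterministic _id, so that a rerun overwrites the same document instead of creating a duplicate. The row shape, the project-k2- prefix, and the UTC assumption for the stored datetime are assumptions of this example, not details confirmed by the project.

```typescript
// Hypothetical shape of one exported K2 row.
interface K2Row {
  legacy_id: number
  legacy_alias: string
  title: string
  published_at: string // MySQL DATETIME, e.g. "2014-03-02 10:15:00"
}

interface ProjectDoc {
  _id: string
  _type: 'project'
  legacyId: number
  legacyAlias: string
  title: string
  slug: {_type: 'slug'; current: string}
  publishedAt: string
}

function toProjectDoc(row: K2Row): ProjectDoc {
  return {
    // Deterministic id derived from the K2 id (the prefix is an arbitrary choice).
    _id: `project-k2-${row.legacy_id}`,
    _type: 'project',
    legacyId: row.legacy_id,
    legacyAlias: row.legacy_alias,
    title: row.title,
    // The editable slug starts out equal to the indexed alias.
    slug: {_type: 'slug', current: row.legacy_alias},
    // Assumption: the database stores UTC, so the value is marked as such.
    publishedAt: row.published_at.replace(' ', 'T') + 'Z',
  }
}
```

Because the _id is stable, the import can be run repeatedly against the same dataset while the transformation code is still changing.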

Extracting K2 cleanly

K2 stores its real content in jos_k2_items. Categories sit in jos_k2_categories. Tags join via jos_k2_tags_xref. Run the query against a fresh dump, not the live database.

SELECT
  i.id            AS legacy_id,
  i.alias         AS legacy_alias,
  i.title,
  i.introtext,
  i.`fulltext`,   -- FULLTEXT is a reserved word in MySQL, hence the backticks
  i.image,
  i.gallery,
  i.extra_fields,
  i.created       AS published_at,
  c.name          AS category_name,
  c.alias         AS category_alias
FROM jos_k2_items i
JOIN jos_k2_categories c ON i.catid = c.id
WHERE i.published = 1
  AND i.trash = 0
ORDER BY i.id;

A small Node script reads the rows, walks /media/k2/items/cache/ to pick the largest non-thumbnail variant per item, and pushes everything into Sanity through the official client. Run it against a copy of the database with the asset directory rsynced to a local volume, not over a live SSH connection. We learned that one the hard way on an earlier migration.
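The variant selection step might look like the sketch below. It assumes the cache files are named <hash>_<SIZE>.jpg with the size suffixes XS, S, M, L, XL and Generic, and it treats XS and S as thumbnails. Check both assumptions against the actual directory listing before relying on it.

```typescript
// Larger rank wins. XS and S are deliberately absent, so they are skipped.
// Assumption: these suffixes match the files in the cache directory.
const VARIANT_RANK: Record<string, number> = {XL: 4, L: 3, M: 2, Generic: 1}

// Returns, per file-name hash, the best available non-thumbnail variant.
function pickLargestVariants(fileNames: string[]): Record<string, string> {
  const best: Record<string, {file: string; rank: number}> = {}

  for (const file of fileNames) {
    const match = /^([0-9a-f]+)_([A-Za-z]+)\.(?:jpe?g|png)$/i.exec(file)
    if (!match) continue // not an image variant

    const hash = match[1]
    const rank = VARIANT_RANK[match[2]]
    if (rank === undefined) continue // thumbnail or unknown suffix

    const current = best[hash]
    if (current === undefined || rank > current.rank) {
      best[hash] = {file, rank}
    }
  }

  const result: Record<string, string> = {}
  for (const hash of Object.keys(best)) result[hash] = best[hash].file
  return result
}
```

Items whose hash is missing from the result have only thumbnails on disk and need a manual look.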

Warning

K2's extra_fields column is a serialised PHP blob, not JSON. For this client it held the per-project metadata (square meters, client, year) the SEO contractor cared about most. We unpacked it with a short PHP CLI script on the source server before the rest of the export ran. Trying to parse it in Node will cost you a day.

Transforming editor HTML

This is where most migrations go sideways. Joomla's TinyMCE has fourteen years of inline styles, deprecated <font> tags, Word-paste residue, and image paths that point at /images/stories/. Sanity's Portable Text is strict. You cannot push raw HTML at it.

We used @portabletext/block-tools with a deterministic HTML preprocessor.

import {htmlToBlocks} from '@portabletext/block-tools'
import {defaultSchema} from './sanity-schema'
import {JSDOM} from 'jsdom'

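function cleanLegacyHtml(raw: string): string {
  return raw
    .replace(/\sstyle\s*=\s*(?:"[^"]*"|'[^']*')/gi, '') // inline styles, either quote style
    .replace(/<\/?font\b[^>]*>/gi, '')                   // unwrap <font>, including nested and upper-case tags
    .replace(/<o:p>[\s\S]*?<\/o:p>/gi, '')               // Word paste residue
    .replace(/\/images\/stories\//g, '/legacy/')         // point legacy image paths at the new asset root
    .replace(/&nbsp;/g, ' ')
}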

const projectBody = defaultSchema
  .get('project').fields
  .find(f => f.name === 'body').type

const blocks = htmlToBlocks(
  cleanLegacyHtml(item.fulltext),
  projectBody,
  {parseHtml: html => new JSDOM(html).window.document},
)

This is also the place where the temptation to delegate to an autonomous agent is strongest. Don't. There is a recurring Hacker News pattern this month of agents running amok inside live environments, and the lesson scales down: an agent given write access to 2,800 records of historical content will eagerly "tidy" headings, merge paragraphs, and quietly drop figure captions. We used an LLM to produce a diff report per item (old plain-text against new plain-text, flagged anywhere the character count diverged by more than 5%). The actual writes were deterministic code that a human read.
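The character-count check behind that diff report can be sketched without any model in the loop. The example below covers only the deterministic 5% flag, not the LLM-generated report itself, and the block shape is a simplified stand-in for Portable Text.

```typescript
// Simplified stand-ins for Portable Text spans and blocks.
interface Span {
  text?: string
}
interface Block {
  _type: string
  children?: Span[]
}

function collapse(text: string): string {
  return text.replace(/\s+/g, ' ').trim()
}

// Strip tags and non-breaking spaces, then collapse whitespace.
function htmlToPlainText(html: string): string {
  return collapse(html.replace(/<[^>]*>/g, ' ').replace(/&nbsp;/g, ' '))
}

// Concatenate the text of every text block, ignoring images and other types.
function blocksToPlainText(blocks: Block[]): string {
  const paragraphs = blocks
    .filter(block => block._type === 'block')
    .map(block => (block.children ?? []).map(span => span.text ?? '').join(''))
  return collapse(paragraphs.join(' '))
}

// Relative change in character count, measured against the old text.
function divergence(oldText: string, newText: string): number {
  if (oldText.length === 0) return newText.length === 0 ? 0 : 1
  return Math.abs(newText.length - oldText.length) / oldText.length
}

function needsReview(html: string, blocks: Block[], threshold = 0.05): boolean {
  return divergence(htmlToPlainText(html), blocksToPlainText(blocks)) > threshold
}
```

A length check will not catch reordered or reworded text of the same size, which is why a human still reads the flagged diffs.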

Astro routes that honour the old paths

The route file is one page, parameterised by the K2 id and alias.

---
// src/pages/projects/[id]-[alias].astro
import {sanityClient} from '../../lib/sanity'
import Layout from '../../layouts/Project.astro'

export async function getStaticPaths() {
  const projects = await sanityClient.fetch(`
    *[_type == "project" && defined(legacyId)]{
      "id": legacyId,
      "alias": legacyAlias,
      title, intro, body, hero, gallery,
      completedYear, client, surfaceSqm
    }
  `)
  return projects.map(p => ({
    params: {id: String(p.id), alias: p.alias},
    props: {project: p},
  }))
}

const {project} = Astro.props
---
<Layout project={project}>
  <h1>{project.title}</h1>
  <slot />
</Layout>

Astro builds 2,800 static pages at the same canonical URLs Google saw in 2018. See the Astro docs on dynamic routes for the parameter contract. The only deliberate change is the trailing .html, which the redirect layer handles.

The 301 map

Even with identical numeric paths, K2 also serves every item at the unrewritten URL index.php?option=com_k2&view=item&id=47:housing-collective-leuven. Those query-string variants are in the index too. So are the K2 category listings, the date-archive views, and the tag pages. Each pattern gets one rule.

# /etc/nginx/sites-available/architecten.conf

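# K2 item by query string
location = /index.php {
  # nginx cannot nest "if" blocks, so test option, view and id as one string.
  # $arg_id arrives as "47:housing-collective-leuven"; the colon may be encoded as %3A.
  set $k2 "$arg_option/$arg_view/$arg_id";
  if ($k2 ~* "^com_k2/item/([0-9]+)(?::|%3A)([^/&]+)$") { return 301 /projects/$1-$2; }
  if ($k2 ~ "^com_k2/itemlist/") { return 301 /projects/category/$arg_layout; }
}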

# K2 item with .html suffix
rewrite ^/projects/([0-9]+)-([^/]+)\.html$ /projects/$1-$2 permanent;

# K2 category listing
rewrite ^/projects/category/([0-9]+)-([^/]+)\.html$ /projects/c/$2 permanent;

On Netlify or Vercel the equivalent goes in _redirects or vercel.json. We also generate a CSV of every old/new pair from the K2 export and feed it to a verification script before cutover.

# Verify every redirect in the map (CSV columns: old URL, expected absolute URL)
# Needs bash for the process substitution below
while IFS=, read -r old expected; do
  # One HEAD request per URL; %{redirect_url} is the absolute redirect target
  read -r status actual < <(curl -o /dev/null -s -I -w '%{http_code} %{redirect_url}\n' "$old")
  if [ "$status" != "301" ] || [ "$actual" != "$expected" ]; then
    echo "MISS $status $old -> $actual (expected $expected)"
  fi
done < redirect-map.csv

Any line printed is a miss. On day one of this exercise we had 184 misses. By the time we cut over we had zero.
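The redirect map read by the loop above could be generated from the export rows along these lines. This is a sketch: it assumes both the .html form and the query-string form of an item should land on the same canonical path, and the row shape and function names are hypothetical.

```typescript
// Hypothetical subset of an exported K2 row.
interface ExportRow {
  legacy_id: number
  legacy_alias: string
}

// Old/new pairs for one item: the rewritten .html URL and the raw query-string URL.
function redirectPairs(row: ExportRow, origin: string): Array<[string, string]> {
  const slug = `${row.legacy_id}-${row.legacy_alias}`
  const target = `${origin}/projects/${slug}`
  return [
    [`${origin}/projects/${slug}.html`, target],
    [`${origin}/index.php?option=com_k2&view=item&id=${row.legacy_id}:${row.legacy_alias}`, target],
  ]
}

// One "old,expected" line per pair, ending with a newline so that a
// shell read loop also sees the final line.
function toRedirectCsv(rows: ExportRow[], origin: string): string {
  const lines: string[] = []
  for (const row of rows) {
    for (const [from, to] of redirectPairs(row, origin)) {
      lines.push(`${from},${to}`)
    }
  }
  return lines.join('\n') + '\n'
}
```

The expected column holds absolute URLs because curl reports the redirect target in absolute form. If the host appends a trailing slash to directory-style pages, the expected URLs need one too.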

Cutover and the 72 hours after

The day-of choreography is dull on purpose. Boring deployments don't make Search Console alerts fire.

  1. Deploy Astro to a staging subdomain. Crawl it with Screaming Frog at the same depth as the original audit. Diff URL count, <title> tags, and <meta name="description"> values against the legacy crawl. Any drift is a Sanity field someone forgot to populate.
  2. Drop TTL on the apex A record to 300 seconds the day before. Most DNS providers let you do this without involving the registrar.
  3. Cutover at 02:00 local. Update the apex. Watch the access log for the first 15 minutes for 404 spikes.
  4. Keep the old Joomla VM running for 30 days as a fallback. If something is missing from the export, the answer is still on disk.
  5. Submit the new sitemap in Search Console the same morning, even though Google will find it on its own. Watch the index coverage report daily for two weeks.
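The staging diff in step 1 can be sketched as a comparison of two crawl exports keyed by path. The shapes below are hypothetical, and the sketch assumes legacy paths end in .html while staging paths do not.

```typescript
interface PageMeta {
  title: string
  description: string
}

interface Drift {
  path: string
  field: 'missing' | 'title' | 'description'
  legacy: string
  staging: string
}

// Assumption: the only path difference between the two crawls is the .html suffix.
function canonicalKey(path: string): string {
  return path.replace(/\.html$/, '')
}

function findDrift(
  legacy: Record<string, PageMeta>,
  staging: Record<string, PageMeta>,
): Drift[] {
  const stagingByKey: Record<string, PageMeta> = {}
  for (const path of Object.keys(staging)) {
    stagingByKey[canonicalKey(path)] = staging[path]
  }

  const drift: Drift[] = []
  for (const path of Object.keys(legacy)) {
    const before = legacy[path]
    const after = stagingByKey[canonicalKey(path)]
    if (after === undefined) {
      // The page exists on the legacy site but not on staging.
      drift.push({path, field: 'missing', legacy: before.title, staging: ''})
      continue
    }
    if (before.title.trim() !== after.title.trim()) {
      drift.push({path, field: 'title', legacy: before.title, staging: after.title})
    }
    if (before.description.trim() !== after.description.trim()) {
      drift.push({path, field: 'description', legacy: before.description, staging: after.description})
    }
  }
  return drift
}
```

An empty result means every legacy page has a staging counterpart with the same title and description.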

Joomla 2.5 itself reached end of life on 31 December 2014. Anything still on that branch in 2026 is running on borrowed time and somebody else's patched PHP. The Mechelen studio's host pulling the rug is not unusual; it is the default eventual outcome.

What we did not expect

Two things bit us that did not appear in any migration guide.

The first was the extra_fields blob, flagged in the warning above. The second was Joomla's com_xmap sitemap component, which had been silently truncating the project list at 1,000 entries for years because of an obscure config flag. The SEO contractor's "2,800 URLs" number came from Google's index, not from the site's own sitemap. We only noticed when the staging crawl matched the contractor's number but disagreed with Joomla's published sitemap by 1,840 URLs. The lesson: never trust the source CMS to know its own surface area. Trust the crawl, the index, and the access log, and prefer the largest of the three when they disagree.

When we built this migration for the studio in Mechelen, the moment that paid for itself ten times over was the two-day audit before any code was written. If you are sitting on a legacy CMS that nobody supports any more and a SERP rank you cannot afford to lose, the same playbook applies. We do these legacy migrations as a fixed-scope engagement, and the audit is always phase one. It is the phase we will not skip.

If you do one thing today: open Search Console, export your top 500 URLs by clicks, request each one with curl -I, and compare the results against a fresh crawl of your own site. Anywhere the two disagree is the size of the problem you have not yet seen.

Key takeaway

Preserve K2 numeric IDs in your Sanity schema, generate Astro routes from them, and verify every 301 against an exported map before you cut DNS.

FAQ

Why preserve K2 numeric IDs instead of moving to clean slugs?

Google has indexed the numeric form for years. Preserving the IDs means fewer 301 hops, less PageRank decay, and one fewer thing the new schema has to defend at cutover. Clean slugs can come later.

Can the same playbook work for K2 to WordPress or Strapi?

Yes. The audit, SQL extraction, and 301 strategy stay identical. Only the schema definition and the static route layer change. Astro plus Sanity is what we use, but the playbook is portable.

How long did the full migration take end to end?

Eleven weeks of calendar time for 2,800 projects. Two for audit and schema, three for transformation and editing, three for build and redirect verification, three for staging review with the studio and the SEO contractor.

What happens to comments and old user accounts in K2?

We exported both to JSON archives, kept them off the new site by default, and offered the client a read-only public archive at a /legacy/ path if the comments held real value. For this studio they did not.

Tags: joomla, migration, legacy sites, seo, architecture, case study
