Joomla to Astro migration: keeping 380k organic sessions

A 22-person tour operator in Nijmegen had 380,000 monthly organic sessions sitting on Joomla 3. Here is the playbook we used to move it without bleeding traffic.

Jacob Molkenboer · Founder · A Brand New Company · 22 Jun 2026 · 12 min

The marketing lead at a 22-person tour operator in Nijmegen showed us a Search Console export on a Tuesday afternoon in March. 380,000 organic sessions a month, give or take, almost all of them landing on destination pages built in Joomla 3.10. Joomla 3 had been end-of-life for nine months. The CMS was held together by a single freelancer who answered email when he felt like it, and the booking flow ran through a PHP 7.4 plugin nobody at the office could read.

The brief was simple and not simple. Move to something the team could actually edit. Do not lose the traffic. The traffic is the business.

What follows is the playbook we ran, with the parts that mattered and the parts where we nearly broke something. The stack we landed on was Astro for the front-end, Sanity for the content, and a thin Node worker for the booking handoff. None of that is the interesting bit. The interesting bit is the URL map, the 410-vs-301 decision tree, and the structured-data parity check we now run on every TourPackage node before it leaves staging.

The 14,200-URL inventory

The first job was knowing what we had. A Screaming Frog crawl with custom extractors gave us 14,243 indexable URLs. We cross-referenced that against twelve months of GSC Performance > Pages data exported through the API, and against the access logs the host could still give us (Joomla's own logging had been off since 2021).

The three lists disagreed. Screaming Frog found pages GSC had never seen. GSC had pages that returned 404 on a fresh crawl. The access logs had a long tail of ?Itemid= variants that the canonical tag had been silently consolidating for years. We merged the three into a single Postgres table and tagged every row with one of six dispositions.

create table url_inventory (
  url           text primary key,
  source        text[] not null,           -- {crawl, gsc, logs}
  last_seen     date,
  clicks_12mo   int default 0,
  impressions_12mo int default 0,
  backlinks     int default 0,
  disposition   text check (disposition in (
    'keep-301', 'merge-301', 'retire-410',
    'noindex-200', 'soft-404-fix', 'review'
  ))
);
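The merge that feeds this table can be sketched in a few lines of TypeScript. This is an illustration under assumptions, not the script from the project: `SourceName`, `mergeSources`, and `disagreements` are hypothetical names, and the URLs are assumed to be normalised already.

```typescript
// Hypothetical sketch: merge URL lists from the three sources into one inventory,
// recording which sources saw each URL (mirrors the `source text[]` column).
type SourceName = 'crawl' | 'gsc' | 'logs'

function mergeSources(lists: Record<SourceName, string[]>): Map<string, SourceName[]> {
  const inventory = new Map<string, SourceName[]>()
  for (const name of ['crawl', 'gsc', 'logs'] as const) {
    for (const url of lists[name]) {
      const seen = inventory.get(url) ?? []
      if (!seen.includes(name)) seen.push(name) // ignore duplicates within one source
      inventory.set(url, seen)
    }
  }
  return inventory
}

// URLs that at least one source never saw: the rows worth a closer look.
function disagreements(inventory: Map<string, SourceName[]>): string[] {
  return Array.from(inventory.entries())
    .filter(([, sources]) => sources.length < 3)
    .map(([url]) => url)
}
```

Keeping the source set per row, rather than three separate lists, is what makes the later disposition rules cheap to express.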

The rule that did most of the work: if a URL had zero clicks, zero impressions, and zero referring domains over twelve months, and the source set did not include logs, it got retire-410 as a starting position. A human reviewed every review row, but the long tail of retire-410 went through unsupervised. About 4,100 URLs went straight to 410.

The other rule that mattered: we never auto-retired anything with active backlinks. A single referring domain from a real publisher kept a URL in review for human eyes. The backlink data came from Ahrefs Site Explorer and was imported nightly into the same table. Two pages from 2017 we would have culled turned out to be the canonical references for a Dutch travel blogger with a five-figure email list. Killing them would have cost more goodwill than any link-equity recovery could have repaid.
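The two rules above fit in one small function. A minimal sketch, assuming a row shape loosely based on the url_inventory table; `InventoryRow`, `startingDisposition`, and the `referringDomains` field are hypothetical, and the sketch only decides between retire-410 and review, not the other four dispositions.

```typescript
// Hypothetical row shape, loosely mirroring the url_inventory table.
interface InventoryRow {
  url: string
  source: Array<'crawl' | 'gsc' | 'logs'>
  clicks12mo: number
  impressions12mo: number
  referringDomains: number // assumption: what the `backlinks` column counts
}

type StartingDisposition = 'retire-410' | 'review'

function startingDisposition(row: InventoryRow): StartingDisposition {
  // Never auto-retire anything with a referring domain: a human looks first.
  if (row.referringDomains > 0) return 'review'

  const noSearchDemand = row.clicks12mo === 0 && row.impressions12mo === 0
  const absentFromLogs = !row.source.includes('logs')

  // Zero clicks, zero impressions, zero links, and no hits in the access logs.
  return noSearchDemand && absentFromLogs ? 'retire-410' : 'review'
}
```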

The 410-vs-301 decision tree

Most migration guides tell you to 301 everything. That is wrong, and it has been wrong for a while. Google's own guidance is that 410 is the honest signal when a page is gone for good, and the crawler will drop it from the index faster. 301-ing a retired page to a vaguely related parent dilutes the parent and trains Google to treat your redirects as soft-404s, and soft-404 redirects eventually get ignored.

Our rule, in plain language:

  • Page has a clear one-to-one successor in the new IA? 301.
  • Page has a clear many-to-one successor (five hotel pages collapsing into one destination)? 301 to the parent, but only if the parent already covers the topic. If it does not, 301 is a soft-404 in waiting.
  • Page is gone, the topic is gone, nobody links to it, nobody searches for it? 410.
  • Page has backlinks but no equivalent on the new site? 301 to the closest topical parent and accept the dilution — the link equity is worth more than the cleanliness.
  • Page is a thin tag archive or a paginated category page 4 of 12 that never ranked? 410.

We codified that as a function on the inventory table and ran it nightly during the cutover window. The output was a flat redirects.json that the Astro middleware reads at the edge.

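// Astro middleware, runs on Cloudflare Workers
// Astro's bundler (Vite) resolves JSON imports natively, so the deprecated
// `assert { type: 'json' }` clause is not needed.
import redirects from '../data/redirects.json'

export const onRequest = async ({ request }, next) => {
  const url = new URL(request.url)
  // Try the exact URL including the query string first (old ?Itemid= variants),
  // then fall back to the bare path.
  const hit = redirects[url.pathname + url.search] ?? redirects[url.pathname]
  if (!hit) return next()
  if (hit.status === 410) {
    return new Response('Gone', { status: 410 })
  }
  // Response.redirect needs an absolute URL, so resolve the target against the request.
  return Response.redirect(new URL(hit.to, url).toString(), 301)
}

Warning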

If you 301 a retired page to the homepage, Google will treat it as a soft-404 within a few weeks and the link equity evaporates anyway. You traded a clean signal for a dirty one. 410 is not scary.
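For readers who want the decision tree as code, here is a rough sketch. It is not the function we ran against the inventory table: the `OldPage` fields and the `decide` name are hypothetical, and two choices are assumptions on our part. A many-to-one page whose parent does not cover the topic falls through to the later rules, and a backlinked page with no topical parent goes to a human rather than to 410.

```typescript
// Hypothetical inputs, one per question in the decision tree.
interface OldPage {
  successor?: string         // one-to-one replacement in the new IA
  parent?: string            // many-to-one or closest topical parent
  parentCoversTopic: boolean
  hasBacklinks: boolean
  hasSearchDemand: boolean
  isThinArchive: boolean     // tag archive or deep pagination that never ranked
}

type Decision = { status: 301; to: string } | { status: 410 } | { status: 'review' }

function decide(p: OldPage): Decision {
  // Clear one-to-one successor: 301.
  if (p.successor) return { status: 301, to: p.successor }
  // Many-to-one: only redirect if the parent really covers the topic.
  if (p.parent && p.parentCoversTopic) return { status: 301, to: p.parent }
  // Backlinks but no equivalent: accept the dilution, or ask a human if there is no parent.
  if (p.hasBacklinks) {
    return p.parent ? { status: 301, to: p.parent } : { status: 'review' }
  }
  // Thin archives and pages nobody links to or searches for: gone.
  if (p.isThinArchive) return { status: 410 }
  if (!p.hasSearchDemand) return { status: 410 }
  return { status: 'review' }
}
```

Each decision maps directly onto one entry in redirects.json, apart from review, which should never reach the file.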

Information architecture: the diff, not the redesign

The Joomla site had grown like a forest. Six levels deep in some places, with destination pages nested under continent, country, region, sub-region, and theme. The new IA flattened that to three levels: destination, package, departure. Every node in the old tree had to land somewhere in the new tree, or it had to die.

We built the mapping in a spreadsheet first — 14,243 rows, three columns: old URL, new URL or NULL, disposition. The content lead at the client owned that sheet for six weeks. She knew which Tuscany page was the canonical one and which four were leftover SEO experiments from 2018. No amount of clever scripting replaces that knowledge. We exported the sheet to the inventory table every Friday and diffed it against the previous week.
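The weekly diff is simple enough to sketch. Assume each export is an array of rows with the sheet's three columns; `MappingRow` and `diffMappings` are hypothetical names, not the project's code.

```typescript
// One row of the mapping sheet: old URL, new URL or null, disposition.
interface MappingRow {
  oldUrl: string
  newUrl: string | null
  disposition: string
}

interface MappingDiff {
  added: string[]    // old URLs that appear only in this week's export
  removed: string[]  // old URLs that disappeared since last week
  changed: string[]  // old URLs whose target or disposition changed
}

function diffMappings(prev: MappingRow[], curr: MappingRow[]): MappingDiff {
  const before = new Map<string, MappingRow>()
  for (const row of prev) before.set(row.oldUrl, row)
  const after = new Set(curr.map(row => row.oldUrl))

  const added = curr.filter(r => !before.has(r.oldUrl)).map(r => r.oldUrl)
  const removed = prev.filter(r => !after.has(r.oldUrl)).map(r => r.oldUrl)
  const changed = curr
    .filter(r => {
      const old = before.get(r.oldUrl)
      return old !== undefined && (old.newUrl !== r.newUrl || old.disposition !== r.disposition)
    })
    .map(r => r.oldUrl)

  return { added, removed, changed }
}
```

A non-empty `removed` list is the one to worry about: an old URL that drops out of the sheet has no redirect and no 410.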

The slug rule

New URLs are predictable. /bestemming/{country}/{region}/ for destinations, /reis/{slug}/ for packages, /vertrek/{slug}/{yyyy-mm-dd}/ for specific departures. Joomla's old slugs were a museum of past decisions — component/k2/itemlist/category/47-toscane.html and friends. We kept none of them. Predictable beats familiar when the redirect map is doing the lifting anyway.
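The three patterns above can be expressed as small builder functions. A sketch under assumptions: `slugify` and the three builders are hypothetical helpers, and the real slugs came from the CMS rather than from a function like this.

```typescript
// Hypothetical slug helper: strip diacritics, lowercase, hyphenate.
function slugify(input: string): string {
  return input
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '') // drop combining marks (ë -> e)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '')
}

function destinationUrl(country: string, region: string): string {
  return `/bestemming/${slugify(country)}/${slugify(region)}/`
}

function packageUrl(name: string): string {
  return `/reis/${slugify(name)}/`
}

function departureUrl(name: string, date: string): string {
  // Expect an ISO calendar date; reject anything else rather than guess.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    throw new Error(`Expected yyyy-mm-dd, got "${date}"`)
  }
  return `/vertrek/${slugify(name)}/${date}/`
}
```

For example, `destinationUrl('Italië', 'Toscane')` gives `/bestemming/italie/toscane/`.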

Structured-data parity, per node

This is the part most migrations skip and most migrations regret. The old site emitted TouristTrip and Offer schema through a Joomla plugin. The rich results were a meaningful share of the SERP real estate — price, duration, rating, departure date all visible before the click.

If the new site emits a subtly different schema, you can lose rich results for weeks while Google re-validates. We built a parity check that runs in CI on every TourPackage that wants to publish. It crawls the staging URL, pulls the JSON-LD, normalises it, and compares it against the production URL's JSON-LD on the same node. Any property present in production and missing in staging fails the check. Extra properties on staging are allowed.

// scripts/schema-parity.ts
import { fetchJsonLd, normalise } from './lib/jsonld'

// Properties that must always be present, whatever production emits.
const REQUIRED = ['name','description','offers','itinerary','image']

export async function parity(slug: string) {
  const [oldLd, newLd] = await Promise.all([
    fetchJsonLd(`https://www.client.nl/reis/${slug}`),
    fetchJsonLd(`https://staging.client.nl/reis/${slug}`),
  ])
  const a = normalise(oldLd, 'TouristTrip')
  const b = normalise(newLd, 'TouristTrip')
  // Present in production but missing in staging. Extra staging properties are allowed.
  const missing = Object.keys(a).filter(k => !(k in b))
  const requiredMissing = REQUIRED.filter(k => !(k in b))
  // A key can land in both lists, so de-duplicate before reporting.
  const failed = [...new Set([...missing, ...requiredMissing])]
  if (failed.length) {
    throw new Error(`Parity failed for ${slug}: ${failed.join(', ')}`)
  }
}

It catches stupid things. A Sanity field renamed from price to basePrice in a refactor, breaking the Offer.price emit. An image URL going from absolute to relative. A priceCurrency dropped because the editor unchecked it by accident. We caught 47 of these during the migration, and we still run the check on every publish today.
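Key presence alone does not catch every one of those. An image URL that turns relative still has its key. A value-level pass can be sketched as follows, assuming a single Offer object under `offers` and already-normalised nodes; `valueRegressions` and `isAbsoluteUrl` are hypothetical helpers, not part of the script above.

```typescript
type JsonLdNode = Record<string, unknown>

// True only for strings that parse as a URL without a base, i.e. absolute URLs.
function isAbsoluteUrl(value: unknown): boolean {
  if (typeof value !== 'string') return false
  try {
    new URL(value)
    return true
  } catch {
    return false
  }
}

function valueRegressions(prod: JsonLdNode, staging: JsonLdNode): string[] {
  const problems: string[] = []

  // An image that was absolute in production must stay absolute.
  if (isAbsoluteUrl(prod.image) && !isAbsoluteUrl(staging.image)) {
    problems.push('image: not an absolute URL')
  }

  // Assumption: `offers` is a single Offer object, not an array.
  const prodOffer = (prod.offers ?? {}) as JsonLdNode
  const stagingOffer = (staging.offers ?? {}) as JsonLdNode
  for (const key of ['price', 'priceCurrency']) {
    if (key in prodOffer && !(key in stagingOffer)) {
      problems.push(`offers.${key}: missing`)
    }
  }
  return problems
}
```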

The parity check has since become standard on every migration we run. We have ported it to Drupal-to-Astro, WooCommerce-to-Shopify, and one Magento-to-headless rebuild. The schemas change, the principle does not: if production renders rich results today, staging must render the same fields tomorrow. Browsing the full Schema.org vocabulary is a useful reminder of how many properties a complex node can carry, and how many silent regressions are hiding in a refactor nobody flagged as risky.

The booking handoff

The thin Node worker was the only piece of dynamic code on the new site. The old PHP plugin posted directly to a SOAP endpoint at the GDS provider; we wrapped that in a typed Node service and stuck it behind a single /api/availability route. Same payloads, same upstream, twenty lines of glue. We resisted the urge to refactor the booking flow itself during the migration, because doing two scary things at once is how you generate four scary things. The booking rebuild is its own project, scheduled for Q4.

The cutover window

We cut over on a Sunday evening at 22:00, the lowest traffic point of the week for this business. The DNS swap was the last thing. Before that:

  1. Astro build deployed to its production domain at new.client.nl two weeks earlier, with noindex on every page and basic-auth at the edge.
  2. Google Search Console verified for new.client.nl a week earlier, with no submissions.
  3. The 14,243-row redirects.json tested against a sampled 800-URL set with a script that walked the redirect chain and checked the final status.
  4. A fresh Screaming Frog crawl of staging with the noindex stripped, comparing every canonical, title, meta description, and JSON-LD blob against production.
  5. A rollback plan that was one DNS record away.
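The chain-walking script from step 3 can be sketched like this. It is an illustration, not the script we shipped: `walk`, `Fetcher`, and `httpFetcher` are hypothetical names, the hop limit of 5 is arbitrary, and the default fetcher assumes a runtime such as Node 18+ where `fetch` with `redirect: 'manual'` exposes the 3xx status and Location header.

```typescript
type Fetcher = (url: string) => Promise<{ status: number; location: string | null }>

// Default fetcher: one request, no automatic redirect following.
const httpFetcher: Fetcher = async (url) => {
  const res = await fetch(url, { redirect: 'manual' })
  return { status: res.status, location: res.headers.get('location') }
}

interface WalkResult {
  chain: string[]      // every URL visited, starting URL first
  finalStatus: number  // status of the last, non-redirect response
}

async function walk(start: string, fetcher: Fetcher = httpFetcher, maxHops = 5): Promise<WalkResult> {
  const chain = [start]
  let current = start
  for (let hop = 0; hop <= maxHops; hop++) {
    const { status, location } = await fetcher(current)
    // Anything that is not a redirect with a target ends the chain.
    if (status < 300 || status >= 400 || location === null) {
      return { chain, finalStatus: status }
    }
    current = new URL(location, current).toString() // resolve relative Location headers
    chain.push(current)
  }
  throw new Error(`More than ${maxHops} redirects starting from ${start}`)
}
```

A healthy entry ends in a 200 after exactly one hop, or in a 410 with no hops at all. Anything longer is a chain worth flattening before cutover.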

At cutover: drop the noindex, drop the basic-auth, swap the apex A and AAAA records, watch tail -f on the worker logs for 30 minutes. Submit the new sitemap to GSC. Wait.

For the first 72 hours we kept the old Joomla install warm on its original IP, with the redirect logic mirrored at the old origin in case we needed to roll DNS back. We never did. We also kept the old sitemap.xml reachable for 14 days so Google could compare and re-crawl at its own pace, which it did: crawl volume on the new site tripled in the first week and tapered to a steady new baseline by day 21.

The monitoring stack was boring on purpose: a single Grafana board with three panels (5xx rate, redirect-hit rate, indexable-page count) and a Slack alert that fired exactly once, on a typo in a robots.txt rule that took eleven minutes to spot and three minutes to fix.

What happened to the traffic

Week one: a 14% dip in organic sessions, which is exactly what every migration playbook tells you to expect and exactly what every founder panics about. Week three: back to baseline. Week six: 4% above baseline, which we are cautiously attributing to the page-speed improvement (LCP dropped from 4.2s to 1.1s on the destination template) rather than to anything we did cleverly.

The rich results came back faster than we feared, around day 11 for most TourPackage nodes. The parity check was the reason. Two nodes did lose their rich results, and both turned out to have been silently broken on the old site for months. Google just had not got around to dropping them yet.

Takeaway

A migration that preserves traffic is mostly an inventory job and a decision-tree job. The new stack is the easy part. The hard part is being honest about which URLs deserve to live.

Things we would do differently

Three things, in retrospect.

One: we should have built the parity check before we wrote a single Sanity schema, not after. We wrote it in week six of an eight-week build, and we found three schema decisions that needed reversing. If parity had been a precondition from day one, those decisions would never have been made.

Two: we under-budgeted for the content lead's time on the URL mapping spreadsheet. She spent roughly 60 hours on it over six weeks. That was not in the original quote and we ate it, but the lesson is that the human knowledge of which URLs matter is the bottleneck on every migration we have done since. Quote it accordingly.

Three: we picked Cloudflare Workers for the redirect map mostly because we already knew the runtime. In hindsight, a static _redirects file deployed to the same edge would probably have done the job with fewer moving parts and one less thing to monitor, provided the host's redirect format can carry a map of this size and return a 410. Workers earn their keep when the rules are dynamic. Ours were not, and the JSON file shipped unchanged for the first four months.

When we built the Astro front-end and the Sanity studio for this client, the thing we underestimated was how much of the work was URL accounting versus actual website development. The redirect map was the product. The site was the wrapper around it.

If you are sitting on a Joomla 3 or Drupal 7 site and the EOL clock is ticking, the smallest useful thing you can do today is run a Screaming Frog crawl, export your last twelve months of GSC pages data, and put them in the same spreadsheet. Look at the rows where the two disagree. That is your migration backlog, before you have chosen a single tool.

Key takeaway

A migration that holds its traffic is mostly inventory work and an honest 410-vs-301 decision tree. The new stack is the easy part.

FAQ

Should I 301 every old URL during a CMS migration?

No. 301 the URLs with a clear successor or real backlinks. 410 the URLs that are genuinely gone. Blanket 301s to the homepage become soft-404s and lose the equity anyway.

How long does it take rich results to come back after a migration?

In our experience, around 10-14 days for most nodes if the JSON-LD parity is exact. If properties are missing or renamed, expect weeks of degraded SERP appearance.

Can I do this without a staging environment that mirrors production schema?

Not safely. The parity check between old and new JSON-LD is what catches silent regressions. Without staging on a real subdomain, you are guessing.

Is Astro the right choice for a content-heavy travel site?

For this client, yes. Static output, fast LCP, no plugin ecosystem to babysit. If your editors need WYSIWYG inline editing on the live page, look elsewhere.

How big should the redirect map be before it slows down the edge?

A flat JSON of 14k entries loads in single-digit milliseconds on Cloudflare Workers. We have seen maps of 80k entries with no measurable impact when keyed as an object.

seo · migration · joomla · legacy sites · architecture · case study
