
Joomla stored XSS: AI vuln-discovery on a legacy intranet

The intranet had been quiet for eight years. Then we pointed an AI-assisted vulnerability scanner at it, and it lit up a comments component nobody had touched since 2011.

Jacob Molkenboer · Founder · A Brand New Company · 5 Jun 2026 · 9 min


When we first connected to the client's intranet, the box hadn't seen a Joomla update since the Obama administration's first term. It served around 80 internal users, mostly operations and warehouse staff at a logistics firm in Rotterdam. The login page worked. Stock figures loaded. Nobody had complained.

The board had asked for a security review before migrating to a new system, and the IT lead handed us SSH credentials with a shrug. "Just tell us if it's bleeding."

We pointed Anthropic's open-source framework for AI-powered vulnerability discovery at the codebase. Two hours later, it flagged a stored XSS in a comments component last touched in 2011.

This post covers how that happened, what the bug looked like, and what the framework got right and wrong.

The intranet nobody wanted to touch

The system was Joomla 1.7 with a handful of custom components built by a contractor who had long since retired. The database carried fourteen years of warehouse activity, internal announcements, shift schedules, and (it turned out) one comments table that operations staff used as a kind of in-house bulletin board.

Migrating to Joomla 5 was off the table for budget reasons. The plan was to keep the box behind a stricter VPN, freeze it as read-mostly, and replace the bulletin board with a Mattermost channel. But before any of that, the board wanted to know what they had been running.

A normal audit on a codebase this old goes one of two ways. Either a human reads the PHP slowly and gets a partial answer in a week, or a static scanner spits out 600 false positives nobody triages. Neither is great when the goal is "tell us, with confidence, what an attacker on the LAN could actually do."

Aiming the framework

The framework takes a target codebase, a goal description, and a budget. It uses a model to plan attack-surface enumeration, then dispatches sub-agents to read the code, build hypotheses, and try to confirm them. Outputs are ranked by exploitability, not by checklist categories. That ranking is the part that matters in a triage meeting.
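The post doesn't document how the framework scores findings internally, so the following is only a sketch of the idea behind "ranked by exploitability". It assumes each finding carries a hypothetical confirmed flag (a payload was replayed and observed to work) and a hypothetical impact score from 1 to 5, and it sorts confirmed findings ahead of unconfirmed ones whatever their checklist category. The ids and scores in the usage example are illustrative, not results from the run.

```php
<?php
// Sketch of "rank by demonstrated exploitability": a hypothetical rule,
// not the framework's real scoring. Assumed keys per finding:
//   id         string label
//   category   checklist category (deliberately ignored when ranking)
//   confirmed  true if a payload was replayed and observed to work
//   impact     hypothetical score, 1 (low) to 5 (critical)
function rankFindings(array $findings)
{
    usort($findings, function ($a, $b) {
        // Confirmed findings always sort ahead of unconfirmed ones.
        if ($a['confirmed'] !== $b['confirmed']) {
            return $a['confirmed'] ? -1 : 1;
        }
        // Within each group, higher impact comes first.
        if ($a['impact'] !== $b['impact']) {
            return $a['impact'] > $b['impact'] ? -1 : 1;
        }
        // Tie-break on id so the report order is deterministic.
        return strcmp($a['id'], $b['id']);
    });
    return $findings;
}

// Illustrative input only.
$ranked = rankFindings([
    ['id' => 'csrf-search',   'category' => 'CSRF',       'confirmed' => false, 'impact' => 2],
    ['id' => 'jquery-old',    'category' => 'Components', 'confirmed' => false, 'impact' => 3],
    ['id' => 'xss-board',     'category' => 'XSS',        'confirmed' => true,  'impact' => 5],
    ['id' => 'verbose-error', 'category' => 'Info leak',  'confirmed' => true,  'impact' => 1],
]);

foreach ($ranked as $finding) {
    echo $finding['id'], PHP_EOL;
}
```

Note that category never enters the comparison: under this rule a confirmed low-impact finding outranks an unconfirmed high-impact guess, which matches the triage-meeting priority described above.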

We gave it this brief:

Target:   /var/www/intranet (Joomla 1.7.5 + custom components)
Reach:    Authenticated internal user, lowest role
Goal:     Find anything that lets an attacker execute code in
          another user's browser, read another user's data, or
          escalate role to administrator.
Budget:   200 model turns.

The setup was unglamorous. A snapshot of the codebase, a disposable MySQL copy with the production schema and synthetic data (the framework needed to write to it, since confirming a stored bug means storing a payload), and a Docker container with PHP 5.6 because the framework wanted to actually run snippets against the target stack. No production traffic, no production data.

What it noticed in the first pass

The framework returned 14 findings ranked by what it called "demonstrated exploitability." Most were tepid. CSRF on a search form. An outdated jQuery. Verbose error pages on /administrator that leaked path strings.

But finding #2 was specific. The artefact read, almost verbatim:

com_warehouseboard: user-submitted comment bodies are stored unescaped and rendered with echo in views/board/tmpl/default.php. Confirmed by inserting payload <svg/onload=alert(1)> via authenticated POST, then loading the board view, which fired the alert in a separate browser session.
Framework run artefact, finding #2

The framework had not only flagged it. It had submitted the payload, fetched the page back as a different session, and confirmed the script executed. That last step is the gap most static analyzers can't cross.

Takeaway

The difference between "this looks vulnerable" and "this is exploitable" is doing the second request as a different user. AI-driven vuln discovery is interesting because it can take that step without a human writing a Burp macro.
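The cross-session step ultimately asks one question of the HTML the second user receives: did the stored payload come back verbatim, or encoded? Here is a minimal sketch of that check, assuming the page has already been fetched with the second user's session. payloadRendersRaw is a hypothetical helper, and a verbatim string match is only a heuristic; the framework's artefact went further and observed the alert actually firing in a browser.

```php
<?php
// Hypothetical helper: given the HTML a *different* session received and the
// payload the first session stored, report whether the payload survived
// verbatim (likely exploitable) or came back encoded (neutralised).
function payloadRendersRaw($html, $payload)
{
    // A verbatim match means the markup reached the page unchanged.
    return strpos($html, $payload) !== false;
}

$payload = '<svg/onload=alert(1)>';

// Shape of what the vulnerable view emits for session B: raw echo of the body.
$vulnerablePage = '<li class="comment"><div class="body">' . $payload . '</div></li>';

// Shape of what an escaping view emits: htmlspecialchars on output.
$patchedPage = '<li class="comment"><div class="body">'
    . htmlspecialchars($payload, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')
    . '</div></li>';

var_dump(payloadRendersRaw($vulnerablePage, $payload)); // bool(true)
var_dump(payloadRendersRaw($patchedPage, $payload));    // bool(false)
```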

The vulnerable code

The component handled comments roughly like this. The contractor had used JRequest::getVar with the JREQUEST_ALLOWRAW flag, which disables the input filter entirely.

// components/com_warehouseboard/controllers/board.php (Joomla 1.7)
function saveComment()
{
    $user = JFactory::getUser();
    if ($user->guest) {
        JError::raiseError(403, JText::_('ALERTNOTAUTH'));
        return;
    }

    $comment = JRequest::getVar(
        'comment',
        '',
        'post',
        'string',
        JREQUEST_ALLOWRAW
    );

    $db = JFactory::getDBO();
    $q  = 'INSERT INTO #__warehouseboard_comments '
        . '(user_id, body, created) VALUES ('
        . (int) $user->id . ', '
        . $db->quote($comment) . ', '
        . $db->quote(date('Y-m-d H:i:s')) . ')';
    $db->setQuery($q);
    $db->query();

    $this->setRedirect('index.php?option=com_warehouseboard&view=board');
}

And the view:

<?php // components/com_warehouseboard/views/board/tmpl/default.php
foreach ($this->comments as $row) :
?>
    <li class="comment">
        <span class="author"><?php echo $row->author_name; ?></span>
        <div class="body"><?php echo $row->body; ?></div>
    </li>
<?php
endforeach;

Two doors left open. Input came in raw because of JREQUEST_ALLOWRAW. Output went out raw because there was no htmlspecialchars and no Joomla JFilterOutput call on the body. Any authenticated user could plant a payload that would fire in every other user's browser the next time they opened the board, including the administrator who reviewed flagged comments.
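To make the two open doors concrete, here is a framework-free simulation in plain PHP. The mask value 2 mirrors Joomla 1.x's JREQUEST_ALLOWRAW constant, but cleanVar is a simplified stand-in, not Joomla's code: strip_tags only approximates the default JFilterInput behaviour, and an in-memory array stands in for the comments table. All function names here are hypothetical.

```php
<?php
// Simplified simulation of the vulnerable flow; NOT Joomla's real code.
// The value 2 mirrors Joomla 1.x's JREQUEST_ALLOWRAW mask bit.
const ALLOWRAW_MASK = 2;

// Stand-in for JRequest's cleaning step. strip_tags() only approximates
// the default JFilterInput behaviour for the 'string' type.
function cleanVar($value, $mask = 0)
{
    if ($mask & ALLOWRAW_MASK) {
        return $value;            // Door 1: raw flag set, no filtering at all.
    }
    return strip_tags($value);    // Default path: markup removed on input.
}

// Stand-in for the #__warehouseboard_comments table.
$table = [];

function saveComment(array &$table, $post, $mask)
{
    $table[] = ['body' => cleanVar($post, $mask)];
}

// Door 2: the view echoes the stored body without escaping.
function renderBoardRaw(array $table)
{
    $html = '';
    foreach ($table as $row) {
        $html .= '<div class="body">' . $row['body'] . '</div>';
    }
    return $html;
}

$payload = '<svg/onload=alert(1)>';

// Submitted the way the contractor's controller accepted input.
saveComment($table, $payload, ALLOWRAW_MASK);

// The payload reaches every reader's page unchanged.
echo renderBoardRaw($table), PHP_EOL;
```

Closing either door alone changes the outcome: without the raw mask the markup is stripped before storage, and with output escaping it is rendered as text.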

What stored XSS bought an attacker here

On a public Joomla site, stored XSS is bad. On this intranet, it was worse, because the administrator session also had access to the /administrator backend. From the moment an admin loaded the board, an attacker could:

1. Read the admin's session cookie, because HttpOnly was not set on this version.
2. Make backend requests on their behalf to the com_users component and add a new Super User.
3. Wipe their tracks via the existing comment edit endpoint, which the same controller also failed to filter.

The framework demonstrated steps 1 and 2 against the staging copy. Step 3 we extrapolated from reading the code. From a "lowest-role authenticated user" starting position, full administrative compromise was about six HTTP requests away.

For background on why output encoding matters even when you think the input was clean, the OWASP XSS Prevention Cheat Sheet remains the canonical reference. The CWE classification, CWE-79, covers exactly this pattern: improper neutralization of input during web page generation.
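A core point of that cheat sheet is that the right encoding depends on where the data lands. Below is a minimal sketch covering three common contexts with two hypothetical helpers: HTML element content and quoted attribute values share htmlspecialchars with ENT_QUOTES, while a value handed to an inline script is emitted as a JSON literal with the hex-escaping flags. Other contexts (URLs, CSS, event-handler attributes) need their own treatment and aren't covered here.

```php
<?php
// HTML element content and quoted attribute values:
// escape &, <, >, " and ' and substitute invalid UTF-8.
function escapeHtml($s)
{
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Value placed inside an inline <script> block: emit a JSON literal and
// hex-escape characters that could close the element or break quoting.
function escapeJsValue($s)
{
    return json_encode(
        $s,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
    );
}

$name = "O'Brien <svg/onload=alert(1)>";

echo '<span class="author">' . escapeHtml($name) . '</span>', PHP_EOL;
echo '<input value="' . escapeHtml($name) . '">', PHP_EOL;
echo '<script>var author = ' . escapeJsValue($name) . ';</script>', PHP_EOL;
```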

The fix

We patched the component in place rather than untangling the whole thing. Three changes.

First, stop accepting raw HTML at the door. JREQUEST_ALLOWRAW is almost never what you want for free-text user input on a bulletin board.

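// No JREQUEST_ALLOWRAW mask: the default input filter strips HTML tags.
$comment = JRequest::getVar('comment', '', 'post', 'string');
$comment = trim($comment);
// Reject empty comments and anything over 2,000 characters.
if ($comment === '' || mb_strlen($comment, 'UTF-8') > 2000) {
    JError::raiseWarning(400, JText::_('COM_WAREHOUSEBOARD_BAD_INPUT'));
    return;
}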
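The same input rules can also live in a plain function that is easy to unit-test outside Joomla. This is a sketch under two assumptions: strip_tags stands in for Joomla's default input filter (an approximation, not the same algorithm), and the 2,000-character limit is counted in UTF-8 characters with PCRE rather than mbstring, so it works where mbstring isn't installed and also rejects invalid UTF-8. validateComment is a hypothetical name.

```php
<?php
const COMMENT_MAX_CHARS = 2000;

// Hypothetical validator: returns the cleaned comment, or null to reject it.
function validateComment($raw)
{
    if (!is_string($raw)) {
        return null;                    // arrays or other unexpected input
    }
    // Approximation of the default input filter: drop markup, then trim.
    $clean = trim(strip_tags($raw));

    // Count UTF-8 characters; preg_match_all returns false on invalid UTF-8.
    $chars = preg_match_all('/./su', $clean);
    if ($chars === false || $chars === 0 || $chars > COMMENT_MAX_CHARS) {
        return null;
    }
    return $clean;
}

var_dump(validateComment('Forklift 3 is back in service.'));
var_dump(validateComment('<svg/onload=alert(1)>'));  // markup only: nothing left
var_dump(validateComment(str_repeat('é', 2001)));     // one character too long
```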

Second, escape on output. Joomla's built-in $this->escape() view helper would also work, but htmlspecialchars with explicit flags is just as good and easier to reason about.

<?php // components/com_warehouseboard/views/board/tmpl/default.php (patched)
foreach ($this->comments as $row) :
    $author = htmlspecialchars(
        $row->author_name, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'
    );
    $body = nl2br(htmlspecialchars(
        $row->body, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'
    ));
?>
    <li class="comment">
        <span class="author"><?php echo $author; ?></span>
        <div class="body"><?php echo $body; ?></div>
    </li>
<?php endforeach; ?>
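Pulling the escaping into a small function makes it testable outside the template. This is a sketch: renderComment is a hypothetical helper that builds the same markup as the patched view for one row. It applies htmlspecialchars before nl2br so that the only real tags in the body are the line breaks PHP adds.

```php
<?php
// Hypothetical helper mirroring the patched template for a single comment.
function renderComment($authorName, $body)
{
    $author = htmlspecialchars($authorName, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    // Escape first, then convert newlines: running nl2br first would let
    // htmlspecialchars turn its <br /> tags into visible &lt;br /&gt; text.
    $html = nl2br(htmlspecialchars($body, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'));

    return '<li class="comment">'
        . '<span class="author">' . $author . '</span>'
        . '<div class="body">' . $html . '</div>'
        . '</li>';
}

echo renderComment('ops-nightshift', "Dock 4 closed\n<svg/onload=alert(1)>"), PHP_EOL;
```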

Third, add a Content Security Policy header that would have neutered the original payload even if we had missed both other fixes. On Joomla 1.7 the cleanest place is the template's index.php, not an .htaccess rule, because the application controls some response headers itself.

JResponse::setHeader(
    'Content-Security-Policy',
    "default-src 'self'; script-src 'self'; object-src 'none'; "
    . "base-uri 'self'; frame-ancestors 'none'",
    true
);
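Keeping the policy as data rather than a hand-assembled string makes it easier to review and extend. Here is a sketch with a hypothetical buildCsp helper that reproduces the header value from the snippet above. On Joomla 1.7 the result would still be passed to JResponse::setHeader in the template.

```php
<?php
// Hypothetical helper: turn an ordered directive => sources map into a
// Content-Security-Policy header value.
function buildCsp(array $directives)
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        $parts[] = $name . ' ' . implode(' ', $sources);
    }
    return implode('; ', $parts);
}

$policy = buildCsp([
    'default-src'     => ["'self'"],
    'script-src'      => ["'self'"],   // no 'unsafe-inline': inline handlers are blocked
    'object-src'      => ["'none'"],
    'base-uri'        => ["'self'"],
    'frame-ancestors' => ["'none'"],   // the page may not be framed by any origin
]);

echo $policy, PHP_EOL;
```

One consequence worth checking before deploying a policy like this: without 'unsafe-inline', any inline script blocks that the template or installed extensions emit are blocked too, so it pays to load each page type once with the browser console open.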

We rolled the patch in a Monday morning maintenance window, then ran the framework again with the same brief. The XSS finding dropped off the list. Two of the other lower-severity findings, both related to header hardening, also disappeared because the CSP covered them.

What the framework got right and wrong

Right: it read enough of the codebase to know that JREQUEST_ALLOWRAW was a constant defined in libraries/joomla/environment/request.php that disabled the input filter. It traced the variable from saveComment into the #__warehouseboard_comments table and then back out into the view. It also noticed that the same controller had an editComment endpoint with the same flaw, which we would have missed on a first read because the file was 900 lines long.

Wrong: it confidently reported a SQL injection in the same component that turned out to be a false positive. The $db->quote call was correctly escaping the value; the framework had assumed quote was a no-op based on a method signature it misread. We only caught it by writing the payload by hand and watching the query log show a properly escaped string.

Warning

Treat AI-flagged findings the same way you would treat a junior pen-tester's report. The good ones come with a working payload and a reproduction step. The shaky ones come with adjectives.

What we kept from the run

Habits, mostly. The framework left us a structured artefact for every finding: target file, vulnerable line range, payload, reproduction transcript. We now keep that artefact next to every security ticket. It cuts the back-and-forth with the developer who has to fix the issue, because the reproduction is no longer a sentence in a Jira comment; it is an executable curl command.
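The framework's own artefact schema isn't reproduced in this post, so what follows is a hypothetical shape built from the four fields listed above, serialised to JSON so it can be attached to a ticket. makeArtefact and its field names are illustrative, the line numbers are placeholders, and the reproduction steps paraphrase the run artefact quoted earlier rather than copying real output.

```php
<?php
// Hypothetical artefact shape; field names are illustrative, not the
// framework's real schema.
function makeArtefact($file, $lineStart, $lineEnd, $payload, array $steps)
{
    if ($lineStart < 1 || $lineEnd < $lineStart) {
        throw new InvalidArgumentException('Invalid line range');
    }
    return [
        'target_file'  => $file,
        'line_range'   => [$lineStart, $lineEnd],
        'payload'      => $payload,
        'reproduction' => array_values($steps),
    ];
}

$artefact = makeArtefact(
    'components/com_warehouseboard/views/board/tmpl/default.php',
    1, 1,                                    // placeholder line numbers
    '<svg/onload=alert(1)>',
    [
        'Log in as a lowest-role user and POST the payload as a comment.',
        'Load the board view as a different user in a separate session.',
        'Observe the alert firing in the second session\'s browser.',
    ]
);

// Pretty-printed JSON that can be attached to the ticket as-is.
echo json_encode($artefact, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```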

We also stopped treating "the codebase is too old to scan" as a real reason to skip scanning. Fifteen-year-old PHP is exactly where these tools shine, because the patterns they were trained to recognise (unescaped echo, raw request vars, missing CSRF tokens, dangerous string concatenation into queries) are everywhere in code that predates modern frameworks. A senior PHP reviewer would find the same bugs eventually. The framework found them in an afternoon and brought receipts.

One footnote on cost. The two-hour scan burned through roughly its full 200-turn budget. Compared to a week of human pre-audit, it was a rounding error. Compared to having the contractor's bulletin board exfiltrate an admin session over a long weekend, it was free.

What you can do in the next hour

If you run any internal app older than five years, grep the codebase for ALLOWRAW, for echo $_POST, for innerHTML =, and for eval. Read what those calls do. If you find one that handles user-submitted content, you almost certainly have the same shape of bug we found. Patch the output first with htmlspecialchars on every echo of user-supplied data, then the input, then add a CSP header so the next mistake fails closed.
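To run the same checklist from PHP (as a CI step, say, or on a workstation without grep), here is a sketch. scanTree is a hypothetical helper, and its regexes are loose translations of the four searches above: the innerHTML pattern skips == comparisons, and eval only matches when followed by an opening parenthesis. Like grep, it surfaces candidates to read, not confirmed bugs.

```php
<?php
// Hypothetical scanner mirroring the grep checklist; matches are leads to
// read by hand, not confirmed vulnerabilities.
function scanTree($root, array $extensions = ['php', 'phtml', 'js', 'html'])
{
    $patterns = [
        'ALLOWRAW'    => '/ALLOWRAW/',
        'echo $_POST' => '/echo\s+\$_POST/',
        'innerHTML =' => '/innerHTML\s*=(?!=)/',
        'eval('       => '/\beval\s*\(/',
    ];
    $hits = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile()
            || !in_array(strtolower($file->getExtension()), $extensions, true)) {
            continue;
        }
        $lines = file($file->getPathname());
        if ($lines === false) {
            continue;                        // unreadable file: skip it
        }
        foreach ($lines as $i => $line) {
            foreach ($patterns as $label => $regex) {
                if (preg_match($regex, $line)) {
                    // Record path, 1-based line number and which check matched.
                    $hits[] = [$file->getPathname(), $i + 1, $label];
                }
            }
        }
    }
    return $hits;
}

// Usage: php scan.php /var/www/intranet
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    foreach (scanTree($argv[1]) as $hit) {
        echo implode(':', $hit), PHP_EOL;
    }
}
```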

When we built the secure-migration plan for the Rotterdam client, the painful part wasn't the patch. It was reconstructing what fourteen years of staff had typed into a component nobody had ever audited. That kind of slow excavation is where the legacy migration work we do at ABN tends to start, and where AI vuln-discovery tooling has earned a permanent slot in our security checklist.

Key takeaway

AI vuln-discovery shines on old PHP because the dangerous patterns are everywhere and the tooling can now confirm exploitability with a second request as a different user.

FAQ

Does the framework work on modern PHP code, not just legacy Joomla?

Yes. We ran it against PHP 5.6 for this case study, but the bug classes it hunts for (unescaped output, unfiltered input, injectable queries) are not tied to any one language. We have used it against PHP 8.2 and Node services with comparable results.

Can you point it at a live production system?

Don't. It will try payloads against real endpoints. Always work from a snapshot of the code and a disposable copy of the database in an isolated container. Treat it like any dynamic scanner.

How long did the scan and triage take end to end?

Two hours of wall-clock for the codebase scan, about an hour for the team to verify findings, and a Monday morning window to deploy the patch. Less than a working day in total.

Does this replace a human pen-test?

No. It's a strong front-loaded pass that surfaces obvious and reproducible bugs. Human review still matters for business-logic flaws and chained exploits the framework doesn't model.

Tags: security · joomla · php · legacy sites · case study
