
RAG

RAG on Drupal 9: a 45-minute retrofit without editor pain

Forty-five minutes from composer require to a working /api/rag/search endpoint, with the Drupal editors noticing nothing. Here is the exact playbook we use.

Jacob Molkenboer · Founder · A Brand New Company · 20 Mar 2024 · 9 min

It's Tuesday. Your support lead asks, again, why the help-centre search still serves the 2019 return-policy PDF when there's a clean rewrite published in May. The Drupal site has roughly 1,400 knowledge-base nodes. The editors will not move to Notion. The CFO will not sign off on a re-platform. You have a paid plan with a model vendor and an afternoon free.

This is the playbook we use when a client wants retrieval-augmented generation grafted onto an existing Drupal 9 install without touching the editorial flow. Forty-five minutes from composer require to a working /api/rag/search endpoint. The editors notice nothing on the first save. They notice a small status line on the second.

The one constraint that matters

RAG retrofits fail when they fight the CMS instead of riding it. The editors already have a place where they save content: the node form. Whatever you build has to treat that save as the only source of truth. No second CMS. No "please copy your article into the embeddings tool". No nightly export jobs that drift out of sync with the published state, then get blamed when the chatbot quotes a draft.

So the architecture is fixed before you start. Drupal stays the system of record. A sidecar handles embeddings and vector search. The link between them is a single entity hook with a queue behind it. Everything else is a detail you can swap out without breaking the editor's day.

Warning

Drupal 9 reached end-of-life on 1 November 2023. If you're retrofitting RAG onto D9, you should also be planning the D10 jump. The hook signatures here carry over cleanly, but the security clock has been ticking since that date.

Minute 0 to 10: inventory before you write a line

Open the database. Count what you're actually indexing.

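-- node_field_data holds one row per translation, so count distinct node IDs.
SELECT type, COUNT(DISTINCT nid) AS published_nodes
FROM node_field_data
WHERE status = 1
GROUP BY type
ORDER BY 2 DESC;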

You're looking for two things. First, the content types that hold real answers (often kb_article, faq, product, policy). Second, the ones that emphatically do not (landing pages, redirects, taxonomy stubs, image-only galleries). The whole RAG quality story starts here. If you embed the navigation nodes, your retriever will return "Contact us" for every other query and the support team will lose faith in the bot by Thursday.

Write the allowlist down. Three or four content types is normal. If the site is multilingual, check whether body fields are translated per node or whether you have one node per language. The retriever has to know which language slot to compare against, so the answer to that question feeds straight into the chunk schema you're about to create.

Minute 10 to 15: pick the vector store

There are three reasonable answers today. pgvector if you already have Postgres in the stack (most ops teams do). Qdrant or Weaviate if you want a separate service with a clean HTTP API and you're comfortable running it. A managed vendor if you don't want to run anything and your finance team is okay with per-query pricing.

For a 1,400-node knowledge base on a Drupal site that already runs on Postgres, pgvector on that same instance is hard to beat. One backup, one set of credentials, one network boundary, and your ops team doesn't have to learn a new service. Create the table now; the pgvector extension must be installed on the database server before CREATE EXTENSION will succeed.

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE rag_chunks (
  id BIGSERIAL PRIMARY KEY,
  node_id INT NOT NULL,
  node_type TEXT NOT NULL,
  revision_id INT NOT NULL,
  langcode TEXT NOT NULL DEFAULT 'en',
  chunk_index INT NOT NULL,
  body TEXT NOT NULL,
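  embedding VECTOR(1536) NOT NULL,  -- dimension must match your embedding model
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- One row per node, language and chunk; also serves lookups by (node_id, langcode).
CREATE UNIQUE INDEX rag_chunks_node_idx ON rag_chunks (node_id, langcode, chunk_index);

-- HNSW (pgvector 0.5.0+) needs no training data, so it can be created on an
-- empty table; an ivfflat index built before rows exist gives poor recall.
CREATE INDEX rag_chunks_embedding_idx
  ON rag_chunks USING hnsw (embedding vector_cosine_ops);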
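If you use an ivfflat index instead (for example on a pgvector release older than 0.5.0, which has no HNSW), build it after the table has data and size lists to the row count rather than a fixed 100. The pgvector README suggests starting at rows / 1000 for up to 1M rows and sqrt(rows) above that, with ivfflat.probes around sqrt(lists) at query time. The helper below only does that arithmetic; its name is hypothetical, and the five-chunks-per-node figure in the example is an assumption, not a measurement.

```php
<?php
// Hypothetical helper: starting values for an ivfflat index, following the
// pgvector README rule of thumb (rows / 1000 up to 1M rows, sqrt(rows) above,
// and probes of roughly sqrt(lists) at query time).
function abn_rag_ivfflat_params(int $rows): array {
  $lists = $rows <= 1000000
    ? intdiv($rows, 1000)
    : (int) floor(sqrt($rows));
  // An index needs at least one list, even for a tiny table.
  $lists = max(1, $lists);
  $probes = max(1, (int) round(sqrt($lists)));
  return ['lists' => $lists, 'probes' => $probes];
}

// Assumed: 1,400 nodes at about 5 chunks each, roughly 7,000 rows.
// For that size this yields lists = 7 and probes = 3.
print_r(abn_rag_ivfflat_params(1400 * 5));
```

Use the lists value in WITH (lists = ...) when you build the index, and run SET ivfflat.probes = 3; in the session that executes the search query.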

Minute 15 to 30: the sync hook

This is the only piece of Drupal code that has to live inside the site. Make it a small custom module, not a contrib patch.

# modules/custom/abn_rag/abn_rag.info.yml
name: 'ABN RAG sync'
type: module
description: 'Push knowledge-base nodes into the vector store on save.'
core_version_requirement: ^9.5 || ^10
package: 'ABN'
dependencies:
  - drupal:node
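
<?php
// modules/custom/abn_rag/abn_rag.module
// (The comment sits after the PHP open tag: anything before it is sent as output.)

use Drupal\node\NodeInterface;

const ABN_RAG_TYPES = ['kb_article', 'faq', 'policy'];

function abn_rag_node_insert(NodeInterface $node) {
  abn_rag_queue($node, 'upsert');
}

function abn_rag_node_update(NodeInterface $node) {
  abn_rag_queue($node, 'upsert');
}

function abn_rag_node_delete(NodeInterface $node) {
  abn_rag_queue($node, 'delete');
}

function abn_rag_queue(NodeInterface $node, string $op) {
  if (!in_array($node->bundle(), ABN_RAG_TYPES, TRUE)) {
    return;
  }
  if ($op !== 'delete') {
    // Forward drafts under content moderation are non-default revisions;
    // they must not touch the chunks of the live, published revision.
    if (!$node->isDefaultRevision()) {
      return;
    }
    // An unpublished default revision means the node was taken offline:
    // remove its chunks rather than leaving stale ones in the index.
    if (!$node->isPublished()) {
      $op = 'delete';
    }
  }
  \Drupal::queue('abn_rag_sync')->createItem([
    'op'   => $op,
    'nid'  => (int) $node->id(),
    'vid'  => (int) $node->getRevisionId(),
    'lang' => $node->language()->getId(),
  ]);
}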

Three things to notice. The hook never embeds inline; it only enqueues, so a slow embeddings API call cannot block the editor's save. It records the revision ID, and because a revert saves a new revision, a revert travels through the same update hook and upsert path as any other edit. And it short-circuits on unpublished nodes, which keeps drafts out of the index until they actually ship. That last guard is the one piece you should not omit, especially if the site uses Drupal's content moderation workflow.
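Because those guards are easy to get subtly wrong, it can help to pull the enqueue decision into a pure function that a plain unit test can cover without bootstrapping Drupal. A minimal sketch; the function name abn_rag_sync_op() is hypothetical. It also spells out two cases worth handling explicitly: a forward draft (a non-default revision saved on top of a published node) is skipped, and a node that is unpublished on update is queued for deletion so its old chunks leave the index.

```php
<?php
// Hypothetical pure helper for the enqueue decision.
// Returns 'upsert', 'delete', or NULL (enqueue nothing).
function abn_rag_sync_op(string $hook, string $bundle, bool $published, bool $default_revision, array $allowed_types): ?string {
  if (!in_array($bundle, $allowed_types, TRUE)) {
    // Not an answer-bearing content type: never touches the index.
    return NULL;
  }
  if ($hook === 'delete') {
    // Node deleted: always drop its chunks.
    return 'delete';
  }
  if (!$default_revision) {
    // Forward draft: leave the live revision's chunks alone.
    return NULL;
  }
  if ($published) {
    // Published default revision: (re)index it.
    return 'upsert';
  }
  // Unpublished default revision: on update the node was taken offline, so
  // its chunks must go; a brand-new unpublished node has none yet.
  return $hook === 'update' ? 'delete' : NULL;
}

// A compliance editor saves an in-review draft on top of a live policy node.
var_dump(abn_rag_sync_op('update', 'policy', FALSE, FALSE, ['kb_article', 'faq', 'policy']));
```

The var_dump() call prints NULL: the draft is ignored, and the published policy stays in the index until the draft itself is published.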

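The worker is plain Drupal queue API. Each cron run processes the queue for up to 30 seconds (the time set in the annotation below), so indexing freshness depends on how often cron runs. The Automated Cron module defaults to every three hours, so schedule system cron more tightly if editors expect near-instant updates. On a busier site, run drush queue:run abn_rag_sync from a supervisor process.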

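<?php
// src/Plugin/QueueWorker/RagSyncWorker.php

namespace Drupal\abn_rag\Plugin\QueueWorker;

use Drupal\Core\Queue\QueueWorkerBase;
use Drupal\node\Entity\Node;

/**
 * Embeds and writes (or removes) the chunks for one queued node.
 *
 * @QueueWorker(
 *   id = "abn_rag_sync",
 *   title = @Translation("RAG sync"),
 *   cron = {"time" = 30}
 * )
 */
class RagSyncWorker extends QueueWorkerBase {

  public function processItem($data) {
    // abn_rag.indexer is the module's own RagIndexer service.
    $indexer = \Drupal::service('abn_rag.indexer');
    if ($data['op'] === 'delete') {
      $indexer->delete($data['nid']);
      return;
    }
    // Re-check at processing time: the node may have changed since queueing.
    $node = Node::load($data['nid']);
    if ($node && $node->hasTranslation($data['lang'])) {
      // Index the translation that was saved, not the default language.
      $translation = $node->getTranslation($data['lang']);
      if ($translation->isPublished()) {
        $indexer->upsert($translation, $data['lang']);
      }
    }
  }

}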

The RagIndexer service does the real work: strip the CKEditor HTML to plain text, chunk on paragraph boundaries with a 200-token overlap, call the embeddings endpoint, and write to rag_chunks. Roughly sixty lines of PHP if you keep it boring. Keep the per-chunk token count conservative: editors paste long tables, and you do not want a single chunk to exceed the embedding model's input limit or crowd the downstream support bot's context window once the top-k chunks are stuffed into its prompt.
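A minimal sketch of the strip-and-chunk step in plain PHP. It assumes one token is roughly one whitespace-separated word (your embedding model's tokenizer will count differently), uses a 200-word overlap as a stand-in for the 200-token overlap, and picks an 800-word chunk budget as an illustrative default rather than a recommendation. Both function names are hypothetical.

```php
<?php
// Hypothetical helpers for the RagIndexer's strip-and-chunk step.
// Assumption: one "token" is approximated by one whitespace-separated word.

function abn_rag_html_to_paragraphs(string $html): array {
  // Line breaks inside a paragraph become spaces.
  $html = preg_replace('#<br\s*/?>#i', ' ', $html);
  // Block-level closing tags become blank lines so paragraphs survive strip_tags().
  $html = preg_replace('#</(p|div|li|h[1-6]|tr|blockquote)>#i', "\n\n", $html);
  $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
  $paragraphs = [];
  foreach (preg_split('/\n\s*\n/', $text) as $block) {
    $clean = trim(preg_replace('/\s+/', ' ', $block));
    if ($clean !== '') {
      $paragraphs[] = $clean;
    }
  }
  return $paragraphs;
}

function abn_rag_chunk(array $paragraphs, int $max_tokens = 800, int $overlap = 200): array {
  if ($max_tokens < 1 || $overlap < 0 || $overlap >= $max_tokens) {
    throw new InvalidArgumentException('Need 0 <= overlap < max_tokens.');
  }
  $chunks = [];
  $current = [];  // Words of the chunk being built, including carried overlap.
  $fresh = 0;     // Words added since the last flush.
  $flush = function () use (&$chunks, &$current, &$fresh, $overlap) {
    if ($fresh === 0) {
      return;  // Only carried-over overlap: nothing new to emit.
    }
    $chunks[] = implode(' ', $current);
    // Start the next chunk with the tail of this one.
    $current = $overlap > 0 ? array_slice($current, -$overlap) : [];
    $fresh = 0;
  };
  foreach ($paragraphs as $paragraph) {
    $words = preg_split('/\s+/', trim($paragraph), -1, PREG_SPLIT_NO_EMPTY);
    // Prefer to break between paragraphs rather than inside one.
    if (count($current) + count($words) > $max_tokens) {
      $flush();
    }
    foreach ($words as $word) {
      // A paragraph longer than the budget is split mid-paragraph.
      if (count($current) >= $max_tokens) {
        $flush();
      }
      $current[] = $word;
      $fresh++;
    }
  }
  $flush();
  return $chunks;
}

// Tiny budget so the overlap is visible in the output.
print_r(abn_rag_chunk(['a b c', 'd e f', 'g h'], 4, 1));
```

With a budget of four words and a one-word overlap, the example yields the chunks "a b c", "c d e f" and "f g h": each chunk repeats the last word of the previous one, and breaks fall between paragraphs whenever the next paragraph would overflow the budget.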

Minute 30 to 40: the retrieval endpoint

Expose one read-only route. No write surface, no auth bypass, no "we'll bolt a write endpoint onto the same controller later" temptation.

# abn_rag.routing.yml
abn_rag.search:
  path: '/api/rag/search'
  defaults:
    _controller: '\Drupal\abn_rag\Controller\RagSearchController::search'
  methods: [POST]
  requirements:
    _permission: 'access content'
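
<?php
// src/Controller/RagSearchController.php
namespace Drupal\abn_rag\Controller;

use Drupal\Core\Controller\ControllerBase;
use Symfony\Component\DependencyInjection\ContainerInterface;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;

class RagSearchController extends ControllerBase {
  protected $connection;
  protected $embedder;  // The module's own embedder service; embed() returns float[].

  public static function create(ContainerInterface $container) {
    $controller = new static();
    $controller->connection = $container->get('database');
    $controller->embedder = $container->get('abn_rag.embedder');
    return $controller;
  }

  public function search(Request $request) {
    $payload = json_decode($request->getContent(), TRUE);
    $payload = is_array($payload) ? $payload : [];
    $query = trim((string) ($payload['q'] ?? ''));
    $k     = max(1, min((int) ($payload['k'] ?? 5), 20));
    $lang  = (string) ($payload['lang'] ?? 'en');

    if ($query === '') {
      return new JsonResponse(['hits' => []]);
    }
    // pgvector parses a '[x,y,...]' literal; a raw PHP array would be expanded
    // into one placeholder per element. The vector is bound under two names
    // because PDO's native prepares reject a repeated named placeholder.
    $vec = '[' . implode(',', $this->embedder->embed($query)) . ']';
    $rows = $this->connection->queryRange(
      'SELECT node_id, body, 1 - (embedding <=> CAST(:v AS vector)) AS score
       FROM rag_chunks
       WHERE langcode = :lang
       ORDER BY embedding <=> CAST(:v2 AS vector)',
      0, $k,
      [':v' => $vec, ':v2' => $vec, ':lang' => $lang]
    )->fetchAll();

    return new JsonResponse(['hits' => $rows]);
  }
}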

The <=> operator is pgvector's cosine distance. Lower is closer; we flip it to a similarity score (1 minus the distance) so the chatbot layer can apply a threshold like score > 0.78 to decide whether to answer at all. Treat that number as a starting point: similarity distributions differ between embedding models, so tune it against real queries from your support tickets. Refusing to answer when retrieval is weak is the single biggest quality lever you have. Most "hallucinating chatbot" complaints we see in client audits trace back to a retrieval layer that never said "I don't know" and a generation prompt that was happy to invent something for the top-1 chunk no matter how bad it was.
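To make the distance-to-score flip concrete: pgvector's <=> returns 1 - cos(a, b), so 1 minus that distance gives back the cosine similarity. A small PHP sketch of the same arithmetic, plus the refuse-when-weak gate, which is handy for unit-testing the bot layer without a database. The function names are hypothetical, and the 0.78 default simply reuses the example threshold from the paragraph above.

```php
<?php
// Cosine distance as pgvector's <=> operator defines it: 1 - cos(a, b).
function abn_rag_cosine_distance(array $a, array $b): float {
  $a = array_values($a);
  $b = array_values($b);
  $n = count($a);
  if ($n === 0 || $n !== count($b)) {
    throw new InvalidArgumentException('Vectors must be non-empty and equally long.');
  }
  $dot = 0.0;
  $norm_a = 0.0;
  $norm_b = 0.0;
  for ($i = 0; $i < $n; $i++) {
    $dot += $a[$i] * $b[$i];
    $norm_a += $a[$i] ** 2;
    $norm_b += $b[$i] ** 2;
  }
  if ($norm_a == 0.0 || $norm_b == 0.0) {
    throw new InvalidArgumentException('Cosine distance is undefined for a zero vector.');
  }
  return 1.0 - $dot / (sqrt($norm_a) * sqrt($norm_b));
}

// Keep only hits whose similarity clears the threshold. An empty result means
// the bot should say it does not know instead of generating an answer.
function abn_rag_confident_hits(array $hits, float $threshold = 0.78): array {
  $kept = array_filter($hits, function (array $hit) use ($threshold) {
    return (float) $hit['score'] > $threshold;
  });
  return array_values($kept);
}

$hits = [['node_id' => 12, 'score' => 0.83], ['node_id' => 40, 'score' => 0.61]];
$usable = abn_rag_confident_hits($hits);
// Prints "answer from node 12".
echo $usable === [] ? "I don't know" : 'answer from node ' . $usable[0]['node_id'];
```

Putting the gate in code rather than in the prompt means the refusal is deterministic: a weak top hit never reaches the generation step at all.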

Minute 40 to 45: give the editors a sanity light

This is the bit that earns you trust with the people who maintain the content. Add a small block to the node edit form that shows the chunk count and last-indexed time for the current node. Two lines of hook_form_node_form_alter, one query against rag_chunks, and a render-array snippet.

When an editor saves a policy update and a moment later sees "indexed 4 chunks, 2 seconds ago" appear under the title, they stop wondering whether the bot got the memo. That tiny visible feedback loop is worth more than any internal dashboard. It also gives QA a place to verify before they ship, which means fewer Slack pings to your inbox.
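A sketch of that status line, assuming the indexer rewrites updated_at whenever it (re)writes a node's chunks. The formatting lives in a plain helper, abn_rag_status_line() (a hypothetical name), so it can be tested on its own; the form alter uses hook_form_BASE_FORM_ID_alter() for the node_form base form and Postgres's EXTRACT(EPOCH FROM ...) to turn the timestamp into Unix seconds.

```php
<?php
// abn_rag.module (addition): hypothetical sketch of the editor status line.

use Drupal\Core\Form\FormStateInterface;

/**
 * Builds the status text, e.g. "indexed 4 chunks, 2 seconds ago".
 */
function abn_rag_status_line(int $chunks, ?int $indexed_at, int $now): string {
  if ($chunks === 0 || $indexed_at === NULL) {
    return 'not indexed yet';
  }
  $age = max(0, $now - $indexed_at);
  if ($age < 60) {
    $n = $age;
    $unit = 'second';
  }
  elseif ($age < 3600) {
    $n = intdiv($age, 60);
    $unit = 'minute';
  }
  else {
    $n = intdiv($age, 3600);
    $unit = 'hour';
  }
  return sprintf('indexed %d %s, %d %s%s ago',
    $chunks, $chunks === 1 ? 'chunk' : 'chunks', $n, $unit, $n === 1 ? '' : 's');
}

/**
 * Implements hook_form_BASE_FORM_ID_alter() for node_form.
 */
function abn_rag_form_node_form_alter(array &$form, FormStateInterface $form_state, $form_id) {
  $node = $form_state->getFormObject()->getEntity();
  if ($node->isNew()) {
    return;
  }
  // Assumes the indexer refreshes updated_at on every upsert.
  $row = \Drupal::database()->query(
    'SELECT COUNT(*) AS chunks,
            CAST(EXTRACT(EPOCH FROM MAX(updated_at)) AS BIGINT) AS indexed_at
     FROM rag_chunks
     WHERE node_id = :nid AND langcode = :lang',
    [':nid' => $node->id(), ':lang' => $node->language()->getId()]
  )->fetchObject();
  $form['abn_rag_status'] = [
    '#type' => 'item',
    '#markup' => abn_rag_status_line(
      (int) $row->chunks,
      $row->indexed_at === NULL ? NULL : (int) $row->indexed_at,
      \Drupal::time()->getRequestTime()
    ),
  ];
}
```

Keeping the wording in a pure function means QA can check the exact strings editors will see without rendering a node form.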

What we deliberately did not build

No re-ranker. No hybrid keyword + vector fusion. No semantic cache. No per-tenant access control. No image embeddings. All of those are real, and several are useful, and all of them are post-45-minute concerns. The goal of the retrofit is to get a defensible baseline into production today so you can measure where it actually breaks before you spend a week solving a problem the support tickets do not have.

Multimodal embeddings, hybrid sparse-plus-dense retrieval, and graph-augmented variants are all real techniques worth reaching for when the data demands them. If your knowledge base is 1,400 text nodes, you do not have those problems yet. Solve the boring text retrieval first. Add the fancier pieces only when the ticket data tells you exactly which one to add.

The five-minute test worth writing first

Before you call it done, write one PHPUnit kernel test that creates a node, runs the queue, queries the endpoint, and asserts that the node comes back top-1. It will fail the day a Drupal point release changes a hook signature or an embeddings provider quietly rotates its model. That is exactly the day you want to know. Run it with SIMPLETEST_DB pointing at a Postgres database where the vector extension is available; the rag_chunks queries cannot run on SQLite or MySQL.

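public function testIndexedNodeIsRetrievable() {
  $node = $this->createNode([
    'type'  => 'kb_article',
    'title' => 'Return window',
    'body'  => 'Customers may return items within 30 days of delivery.',
  ]);
  // Drain the queue the way cron would, without waiting for a cron run.
  $queue = \Drupal::queue('abn_rag_sync');
  $worker = \Drupal::service('plugin.manager.queue_worker')->createInstance('abn_rag_sync');
  while ($item = $queue->claimItem()) {
    $worker->processItem($item->data);
    $queue->deleteItem($item);
  }
  // ragSearch() is this test class's own helper: it sends {"q": ...} to the
  // search controller and returns the decoded hits.
  $hits = $this->ragSearch('how long do I have to return something');
  $this->assertNotEmpty($hits);
  $this->assertSame((int) $node->id(), (int) $hits[0]['node_id']);
}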

What we ran into on a real one

When we built the support-bot retrofit for a Dutch insurance broker's Drupal 9 portal last quarter, the thing we ran into was not the embeddings or the vector store. It was the editorial workflow: their compliance team revises policy nodes through Drupal's content moderation states, and the first version of our sync happily pushed in-review drafts to the index because hook_entity_update fires on every transition. We solved it with the isPublished() guard above and a separate listener for the published moderation event, so editors could keep using the workflow they trained on. That kind of footnote is what separates retrofitting AI agents onto a legacy CMS from greenfield work.

If you want to try this today: run the SQL block above against a staging copy of your Postgres, scaffold the module skeleton, and time yourself. Forty-five minutes is achievable. The bit that takes the rest of the week is deciding which of your content types actually deserves to be in the index, and that conversation is best held with the editors over coffee, not over a Jira ticket.

Key takeaway

Treat Drupal as the source of truth and bolt RAG on as a queue-driven sidecar. Editor flow stays untouched, and vector freshness comes for free with every save.

FAQ

Why pgvector instead of a dedicated vector database?

Because in this setup the Drupal site already runs on Postgres. One backup, one credential, one network boundary. Switch to Qdrant or a managed vendor only when query volume or vector count makes pgvector slow, not before.

Does this work on Drupal 10?

Yes, with no code changes. The info.yml already declares compatibility with 9.5 and 10. The hook signatures, queue API, and routing format are identical across both releases.

Will indexing slow down node saves for editors?

No. The hook only enqueues, which costs one extra row in the queue table. The embedding call and the vector write happen in the queue worker, which runs out of band, so the editor's save is effectively as fast as it was before the module was installed.

How do you keep unpublished drafts out of the chatbot?

The hook short-circuits on isPublished() before enqueueing, and the worker re-checks it before upserting. Drafts and moderation states never reach the vector store, even if an editor saves repeatedly.

rag · drupal · knowledge base · ai agents · php · architecture
