Model provenance audits: when 'proprietary' is just a LoRA

A junior engineer pushed unmerged base weights to a public Hugging Face repo on Tuesday morning, and a 29-person Apeldoorn vendor's 'proprietary LLM' was outed as a LoRA on Llama 3.1 by lunchtime.

Jacob Molkenboer · Founder · A Brand New Company · 15 Jun 2026 · 9 min

The CTO called at 09:14 on a Tuesday. By 09:30 the vendor's 'proprietary Dutch legal LLM' had been outed on a subreddit for AI tinkerers, the diff had 180 upvotes, and someone in Munich had already published a how-to-reproduce. The vendor is a 29-person company in Apeldoorn. Four of the Netherlands' twenty largest law firms hold paid licences.

At 02:11 that morning, a junior engineer pushed a feature branch to what he assumed was the vendor's private Hugging Face organisation. It was public. The push included the unmerged base weights checkpoint, not just the fine-tuned adapter that the public docs implied existed in isolation. Inside ninety minutes, a developer with safetensors installed had compared SHA-256 hashes against the publicly distributed Meta release. They matched. The 'in-house, ground-up' model the vendor had been pitching was, in technical reality, a LoRA adapter of about 180 million parameters sitting on top of an unmodified Llama 3.1 70B base.

We have seen the same shape twice in the last six months at other vendors a client asked us to evaluate: a stated 'in-house' model that on inspection was an open base with a thin adapter on top. The pattern is becoming common enough that we wrote down the model provenance audit we now run on every model a client says they trained, and on every model a client is being asked to buy.

What the vendor was selling

The pitch deck described a 'purpose-built Dutch legal language model, trained on 40 years of curated Hoge Raad rulings and a private corpus of partner-firm contracts.' The architecture diagram showed a single grey rectangle labelled 'Vendor LLM.' There was no mention of a base model, no acknowledgement of upstream provenance, no 'Built with Llama' notice anywhere in the docs.

The technical reality, once you ran the audit: an unmodified Meta Llama 3.1 70B base, plus a LoRA adapter trained on roughly 8,400 case summaries and a Dutch contract corpus. The total trainable parameter count was around 0.26% of the model. The compute spend was perhaps €4,200 of H100 time. Not nothing, but not 'we built a foundation model.'
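The 0.26% figure follows from the round numbers quoted in this article. A quick check, as a sketch only: the `lora_params` helper uses the standard LoRA count of rank × (d_in + d_out) per adapted matrix, and the 8192 × 8192 shape and rank 16 in the last line are hypothetical values chosen for illustration, not the vendor's actual configuration.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns two low-rank factors: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def trainable_share(adapter_params: int, base_params: int) -> float:
    """Adapter size as a fraction of the frozen base model."""
    return adapter_params / base_params


# Round figures quoted in the article, not measured values.
ADAPTER_PARAMS = 180_000_000
BASE_PARAMS = 70_000_000_000

share = trainable_share(ADAPTER_PARAMS, BASE_PARAMS)
print(f"adapter share of base: {share:.2%}")

# Hypothetical example: a single 8192 x 8192 projection adapted at rank 16.
print("params for one adapted matrix:", lora_params(8192, 8192, 16))
```

The point of the arithmetic is scale: the base model carries well over 99% of the parameters, and none of them were trained by the vendor.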

That distinction matters, and not only as a marketing question. Meta's Llama 3.1 Community License requires anyone who distributes the model, or a product or service built on it, to display 'Built with Llama' prominently. It also requires that a distributed model fine-tuned from Llama carry 'Llama' at the start of its name, and it adds a 700M-MAU clause: licensees above 700 million monthly active users need a separate licence from Meta. If you ship a wrapped Llama as a proprietary product without naming it, you are out of compliance with the licence you accepted when you pulled the weights.

Warning

Pulling a Llama checkpoint and shipping it as your own product without the required attribution is not a grey area. It is a licence breach. Your enterprise customers' legal teams know this and will eventually ask.

The first hour

We came in as their fractional ML lead. The phone call was somewhere between a confession and a request for cover. Three things had to happen in parallel before noon.

One, take the public repo private and revoke the Hugging Face token the junior had used. Two, write a customer-facing note: short, factual, no spin. Three, decide on the architectural truth the company would defend in writing from now on. Number three was the hard one.

The CTO's first instinct was to argue the LoRA was 'still proprietary because the adapter weights are ours.' That argument is technically accurate and commercially worthless. A customer paying €78,000 a year for a 'Dutch legal LLM' does not care whether the 180M-parameter adapter is original. They care whether they got what was promised. They didn't.

The four-step provenance audit

We wrote this up the following week as a checklist for our own due-diligence process. It takes one engineer about three hours.

1. Hash the base weights

If a vendor will give you the model files, the cheapest check is SHA-256 hashing the safetensors shards and comparing them against the public Meta, Mistral, Qwen, and DeepSeek releases. If any shard matches, you are looking at an unmodified base. This is exactly how the Apeldoorn vendor was outed: not by clever interpretation, by file hashing.

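#!/usr/bin/env bash
# hash-shards.sh - print the SHA-256 of every safetensors shard in a directory
set -euo pipefail
shopt -s nullglob  # an empty directory yields an empty list, not a literal '*.safetensors'

MODEL_DIR="${1:?usage: hash-shards.sh /path/to/model}"

cd "$MODEL_DIR"
shards=(*.safetensors)
if (( ${#shards[@]} == 0 )); then
  echo "no .safetensors files in $MODEL_DIR" >&2
  exit 1
fi

# Sort by hash so two runs can be compared with diff or comm.
sha256sum "${shards[@]}" | sort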

Run that against the vendor's release, then run it against the public base model, and compare. If they share even one shard, the 'from scratch' claim is false. The technique works because most teams that fine-tune via LoRA do not modify the base shards at all. The adapter is a separate file, by design.
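Whole-file hashes stop matching as soon as a vendor re-shards the checkpoint or re-saves it with different metadata, even when every weight is untouched. A per-tensor comparison survives that. The sketch below is one way to do it using only the standard library. It relies on the safetensors file layout (an 8-byte little-endian header length, a JSON header with `dtype`, `shape` and `data_offsets` per tensor, then the raw bytes) and assumes the vendor kept the upstream tensor names. The function names are ours, not part of any library.

```python
import hashlib
import json
import struct
from pathlib import Path

CHUNK = 1 << 20  # read tensor bytes in 1 MiB pieces to keep memory use flat


def tensor_hashes(shard: Path) -> dict[str, str]:
    """Return {tensor name: SHA-256 of dtype, shape and raw bytes} for one shard."""
    hashes: dict[str, str] = {}
    with shard.open("rb") as fh:
        # 8-byte little-endian header length, then a JSON header.
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len))
        data_start = 8 + header_len
        for name, meta in header.items():
            if name == "__metadata__":  # free-form metadata, not a tensor
                continue
            begin, end = meta["data_offsets"]  # relative to data_start
            digest = hashlib.sha256(f"{meta['dtype']}{meta['shape']}".encode())
            fh.seek(data_start + begin)
            remaining = end - begin
            while remaining > 0:
                chunk = fh.read(min(CHUNK, remaining))
                if not chunk:
                    raise ValueError(f"{shard} is truncated at tensor {name}")
                digest.update(chunk)
                remaining -= len(chunk)
            hashes[name] = digest.hexdigest()
    return hashes


def model_hashes(model_dir: str) -> dict[str, str]:
    """Merge per-tensor hashes across every shard in a directory."""
    merged: dict[str, str] = {}
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        merged.update(tensor_hashes(shard))
    return merged


def unchanged_tensors(vendor_dir: str, upstream_dir: str) -> tuple[int, int]:
    """Return (tensors identical to upstream, total upstream tensors)."""
    vendor, upstream = model_hashes(vendor_dir), model_hashes(upstream_dir)
    same = sum(1 for name, digest in upstream.items() if vendor.get(name) == digest)
    return same, len(upstream)
```

A result where every upstream tensor is found unchanged points to an unmodified base regardless of how the files were split. A partial match needs a closer look: a merged LoRA changes only the layers it targeted, and a cast to a different dtype changes every hash.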

2. Check the tokenizer

Tokenizers are the cheapest fingerprint. A team that trained a model from scratch chose a tokenizer. They will be able to explain that choice in technical detail and show you the training-corpus statistics behind the vocabulary. A team that fine-tuned an open base will usually ship the upstream tokenizer untouched, in which case the tokenizer.json file hash matches the upstream release exactly.

import hashlib
import json

from transformers import AutoTokenizer


def fingerprint(path: str) -> str:
    """SHA-256 over the vocabulary, ordered by token id."""
    vocab = AutoTokenizer.from_pretrained(path).get_vocab()
    sorted_pairs = sorted(vocab.items(), key=lambda kv: kv[1])
    blob = json.dumps(sorted_pairs, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# The upstream repo is gated: accept Meta's licence on Hugging Face and log in first.
print("vendor:", fingerprint("./vendor-model"))
print("llama :", fingerprint("meta-llama/Llama-3.1-70B"))

If those hashes match, the vendor did not train a tokenizer. That alone does not prove the model is a derivative (some teams reuse tokenizers intentionally), but combined with shard hashes it is decisive.

3. Compare on a held-out evaluation

Pick five prompts that are obscure enough that an unmodified base would answer them in a recognisably 'Llama' or 'Mistral' way: long, hedged, with the same characteristic refusal phrasing. Then run the vendor's model on the same prompts at temperature 0. If the answers carry the upstream model's stylistic fingerprints (the same opening clauses, the same safety disclaimers, the same hallucination patterns on Dutch case names), you have your second piece of evidence. A LoRA adapter trained on 8,000 documents does not erase the base's voice.
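Reading five pairs of answers side by side is usually enough, but a rough score helps when writing up the finding. The sketch below assumes the answers have already been collected at temperature 0 from both models. It uses the standard library's `difflib.SequenceMatcher` for word-level similarity and counts shared opening words. The function names and the sample answers are invented for illustration, and the scores are a heuristic with no calibrated threshold, not proof of anything on their own.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity between two answers, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def shared_opening(a: str, b: str) -> int:
    """Count how many leading words the two answers have in common."""
    count = 0
    for word_a, word_b in zip(a.split(), b.split()):
        if word_a != word_b:
            break
        count += 1
    return count


def compare_runs(vendor_answers: list[str], base_answers: list[str]) -> list[dict]:
    """Score each vendor answer against the base model's answer to the same prompt."""
    return [
        {
            "prompt": index,
            "similarity": round(similarity(vendor, base), 2),
            "shared_opening_words": shared_opening(vendor, base),
        }
        for index, (vendor, base) in enumerate(zip(vendor_answers, base_answers), start=1)
    ]


# Placeholder answers, invented for illustration only.
vendor = ["I cannot provide legal advice, but the ruling you mention concerns tenancy."]
base = ["I cannot provide legal advice, but I can summarise the ruling in general terms."]
for row in compare_runs(vendor, base):
    print(row)
```

High similarity on obscure prompts is supporting evidence to place next to the hash results. Low similarity does not clear a vendor, since a heavier fine-tune can shift phrasing without changing the base.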

4. Read the licence tree

Every transitively included weight in a vendor model carries a licence. Llama 3.1 has the Meta Community License, with attribution, acceptable-use rules, and the MAU clause. Mistral has Apache 2.0 on the older releases and the Mistral Research License on others. Qwen has its own terms. DeepSeek's licensing changed twice in 2025. If the vendor cannot tell you which licences apply to which parts of their stack, in writing, they have not done the work.

For our enterprise clients we now require a signed Model Bill of Materials with every procurement: base model and version, fine-tuning method, dataset provenance, and the exact licence clauses the vendor relies on. It is a one-page document. The fact that asking for it shrinks the candidate-vendor list by half is informative.
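To make the one-page document concrete, here is a minimal sketch of a Model Bill of Materials as a data structure with a completeness check. The field names are our own shorthand for the four items listed above plus a signature. This is not a standard schema, and the example values are taken from the Apeldoorn case as described in this article.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ModelBillOfMaterials:
    """One record per model a vendor ships. Field names are illustrative."""

    base_model: str = ""
    base_model_version: str = ""
    fine_tuning_method: str = ""
    dataset_provenance: list[str] = field(default_factory=list)
    licence_clauses: list[str] = field(default_factory=list)
    signed_by: str = ""


def missing_fields(bom: ModelBillOfMaterials) -> list[str]:
    """Names of fields the vendor left empty."""
    return [f.name for f in fields(bom) if not getattr(bom, f.name)]


# Example filled in from the case described in the article, left unsigned.
example = ModelBillOfMaterials(
    base_model="Meta Llama 3.1",
    base_model_version="70B",
    fine_tuning_method="LoRA adapter",
    dataset_provenance=["roughly 8,400 case summaries", "Dutch contract corpus"],
    licence_clauses=["Llama 3.1 Community License: 'Built with Llama' attribution"],
)
print("missing:", missing_fields(example))
```

An empty field is the useful output here: it tells procurement exactly which question the vendor has not answered in writing.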

What the vendor did next

They wrote the note. It went out the same afternoon. The version we shipped after two redrafts said: 'The base model in our product is Meta Llama 3.1 70B, used under the Llama 3.1 Community License. Our fine-tuning adapter and Dutch legal training corpus are proprietary to us. We have updated our product page and documentation to reflect this.'

Two of the four law firms accepted that note as sufficient. One asked for a discount and got 30%. One terminated the contract within ninety days. Total commercial damage was somewhere around €310,000 in lost ARR plus legal fees, against a product that was technically working. The damage was almost entirely caused by the gap between what was promised and what shipped, not by the LoRA-on-Llama architecture itself. A vendor that says on day one 'we fine-tune Llama 3.1 for Dutch legal work and here is the corpus we use' has a defensible business. A vendor that hides it has a press cycle.

What we changed in our own process

We do model work for clients in two modes: we build agents on top of someone else's foundation model, and we evaluate vendors that the client is considering buying from. After this incident we made three changes.

First, the four-step audit above is now part of every vendor evaluation we run. It is also part of our own internal pre-flight before we make any model claim in customer docs. If we write 'trained on', we have to be able to defend it under hash inspection.

Second, we require a written Model Bill of Materials from any sub-vendor before integrating their model into a client product. The template is short. If a vendor cannot produce one, that is its own answer.

Third, on Hugging Face we treat any push to a public organisation as a deploy: protected branches, required reviewers, no service tokens with write access on a developer laptop. The Apeldoorn incident was, at root, a deploy-to-prod accident with model weights playing the role of credentials. The fix is boring: branch protection, scoped tokens, a CI check that refuses to push anything over 1 GB without an explicit flag.
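The size gate is simple enough to sketch. The version below assumes it runs in CI against the working tree before a push. The `--allow-large-files` flag name and the function names are hypothetical, and 1 GB is read as 1,000,000,000 bytes.

```python
import sys
from pathlib import Path

LIMIT_BYTES = 1_000_000_000  # 1 GB, the threshold mentioned above
ALLOW_FLAG = "--allow-large-files"  # hypothetical flag name


def oversized_files(root: str, limit: int = LIMIT_BYTES) -> list[Path]:
    """List every file under root that is larger than the limit."""
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_size > limit
    )


def check(root: str, allow_large: bool = False, limit: int = LIMIT_BYTES) -> int:
    """Return 0 if the push may proceed, 1 if it should be refused."""
    big = oversized_files(root, limit)
    if big and not allow_large:
        for path in big:
            print(f"refusing to push {path}: larger than {limit} bytes", file=sys.stderr)
        print(f"re-run with {ALLOW_FLAG} if this is intentional", file=sys.stderr)
        return 1
    return 0


def main(argv: list[str]) -> int:
    """Parse a minimal command line: an optional path plus the optional flag."""
    allow_large = ALLOW_FLAG in argv
    paths = [arg for arg in argv if not arg.startswith("--")]
    return check(paths[0] if paths else ".", allow_large=allow_large)
```

In a pipeline, the step would call `sys.exit(main(sys.argv[1:]))` and fail the job on a non-zero exit. The explicit flag forces someone to decide that a multi-gigabyte checkpoint belongs in the push.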

The plain version

If a vendor claims they trained a model, the cheapest way to verify is to hash the weights. The second cheapest is to hash the tokenizer. The third cheapest is to read the licence file. The fourth is to ask them, in writing, which base model and which version they fine-tuned from. The four together take an afternoon. They will catch most of what is currently being mis-sold as proprietary in the European AI vendor market.

When we did the unwinding work for the Apeldoorn vendor we rewrote their customer-facing model documentation and rebuilt the procurement story they could honestly defend in front of a law firm's general counsel. That rewrite is the kind of work we do as part of our AI agents and model work: not selling models, but making sure the models a business depends on are described accurately enough to survive an auditor with a hash command.

The smallest thing you could do today: run sha256sum on the largest safetensors shard your in-house or vendor-supplied model ships, and compare it against the upstream open releases on Hugging Face. If it matches, you have a conversation to have before someone else has it for you.

Key takeaway

If a vendor says they trained an LLM, hash the safetensors shards before you sign the contract. It catches most of what is sold as proprietary today.

FAQ

What is a LoRA adapter?

A small set of trainable weights, often 0.1 to 1% of the base model, that adjusts behaviour without modifying the base. Cheap to train, cheap to ship, easy to mis-sell as proprietary if you hide the base.

Is fine-tuning Llama 3.1 and selling the result allowed?

Yes, under the Llama 3.1 Community License, but you must display 'Built with Llama' attribution and follow the acceptable-use policy. Hiding the base model breaches the licence you accepted.

Can hashing really prove a model is derivative?

It can prove the base weights are unchanged copies of a public release. Combined with a tokenizer hash match it is decisive evidence that no training-from-scratch happened.

How long does the provenance audit take?

For one engineer with shard access, about three hours. Without shard access (API-only vendor), it relies on tokenizer fingerprints and behavioural tests and takes closer to a day.
