
LLM provenance audits: 14 tells of a merged checkpoint

It is June 2026. You have eight weeks before EU AI Act Article 50 applies, and the artifact your vendor delivered as a proprietary fine-tune looks suspicious.

Jacob Molkenboer · Founder · A Brand New Company · 16 Jun 2026 · 9 min


It is a Tuesday morning in June. You are the CTO at a Dutch fintech, and Article 50 of the EU AI Act starts applying in eight weeks. Your vendor has just delivered the artifact you paid sixty thousand euros for: the "proprietary fine-tune of our in-house 12B foundation model" that the sales deck promised. You unzip it. You open config.json. Five seconds later you have the answer, and it is not the one you wanted.

This is a field guide to the fourteen ways a self-hosted LLM provenance audit unravels when the "proprietary" model is actually a quiet merge of two checkpoints somebody pulled off Hugging Face. We rank them by how much effort they cost you. The first seven you catch with a text editor. The next three need a diff against an upstream repository. The last four need a GPU and an afternoon.

Why this audit is now your problem

Until recently, "where did the weights come from" was a curiosity question. Article 50 of Regulation (EU) 2024/1689 changes that. As a deployer of an AI system that interacts with people or generates content for them, you owe those people specific transparency, and you cannot make those disclosures honestly if you do not know what is inside the box. The provider obligations for general-purpose AI models under Articles 53 and 55 are stricter still, and the GPAI Code of Practice has pushed the chain of custody all the way down to the training-data summary.

The risk is not theoretical. Procurement teams that used to take "we trained it ourselves" on faith now have tooling that contradicts the vendor in minutes. The same open ecosystem that made model-merging trivial also made detection cheap: anyone with the artifact, a text editor, and a weekend can verify whether the disclosure matches the weights. The forensics methods are public. When the artifact disagrees with the marketing, the Article 50 obligations fall on you, not on the integrator who delivered it.

The good news is that fakes leave fingerprints. Here are fourteen of them.

The config.json layer: seven tells, five minutes each

Open the artifact. Every Hugging Face-shaped model has a config.json at the root, and merged-checkpoint forgeries leak through it like a wet roof.

The _name_or_path leftover. The single most common slip. When you save a model after from_pretrained(), the transformers library writes the original repo ID into this field unless somebody scrubs it. We have audited four vendor models in the last quarter where this field still read meta-llama/Llama-3.1-8B-Instruct or mistralai/Mistral-7B-v0.3. End of audit.
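A minimal sketch of that check, assuming the artifact is unpacked into a local folder. The helper names and the regular expression are illustrative, not part of any library, and the pattern is a heuristic: a relative path such as out/ckpt would also match.

```python
import json
import re
from pathlib import Path
from typing import Optional

# Heuristic (assumption): Hub repo IDs look like "org/model" with no leading dot or slash.
HUB_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[\w.-]+$")


def load_config(model_dir: str) -> dict:
    """Read config.json from the root of an unpacked model folder."""
    return json.loads((Path(model_dir) / "config.json").read_text(encoding="utf-8"))


def leftover_repo_id(config: dict) -> Optional[str]:
    """Return _name_or_path when it looks like a Hub repo ID, else None."""
    value = config.get("_name_or_path", "")
    if isinstance(value, str) and HUB_ID.match(value):
        return value
    return None
```

Calling leftover_repo_id(load_config("./vendor")) on the delivered folder returns either the suspicious repo ID or None.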

The architecture signature. architectures is an array with one entry, and that entry is the class name of a published family: LlamaForCausalLM, Qwen2ForCausalLM, MistralForCausalLM. A truly proprietary architecture would either subclass with a new name or carry an auto_map pointing to vendor code in the repo. If you see neither, the "in-house architecture" claim is a lie.
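The same reasoning as a small function, offered as a sketch. The set of known class names holds only the three families named above and is far from exhaustive; the function name and its return labels are made up for illustration.

```python
# Assumption: a deliberately short list taken from the examples in this article.
KNOWN_FAMILIES = {"LlamaForCausalLM", "Qwen2ForCausalLM", "MistralForCausalLM"}


def architecture_claim(config: dict) -> str:
    """Classify what config.json says about the model class."""
    if config.get("auto_map"):
        # auto_map points at modelling code shipped with the repo: go read that code.
        return "custom-code"
    architectures = config.get("architectures") or []
    if any(name in KNOWN_FAMILIES for name in architectures):
        return "published-family"
    return "unrecognised"
```

A result of published-family next to an "in-house architecture" claim is the tell. A result of custom-code is not an acquittal, because the vendored file may simply re-export a public class.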

The hyperparameter shape. hidden_size, intermediate_size, num_attention_heads, num_key_value_heads, and num_hidden_layers form a fingerprint. Compare them against the top ten open releases at that parameter count. If all five match a public model value for value, the vendor almost certainly did not pretrain from scratch. They downloaded.

diff <(jq -S '.architectures, .hidden_size, .intermediate_size, .num_attention_heads, .num_key_value_heads, .num_hidden_layers' vendor/config.json) \
     <(jq -S '.architectures, .hidden_size, .intermediate_size, .num_attention_heads, .num_key_value_heads, .num_hidden_layers' upstream/config.json)

The RoPE telltale. rope_theta and rope_scaling change when somebody has actually extended context. If the vendor claims an extended context window but rope_theta is still the base model's value (500000.0 for Llama 3) and rope_scaling is null or unchanged from upstream, the context extension exists in the marketing PDF only.
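One way to make that comparison mechanical, as a sketch. It assumes both configs are already loaded as dictionaries. Checking max_position_embeddings as well is an addition of ours, and the function name is hypothetical.

```python
# Config fields that describe the positional setup (max_position_embeddings is our addition).
POSITIONAL_KEYS = ("rope_theta", "rope_scaling", "max_position_embeddings")


def context_extension_evidence(vendor: dict, upstream: dict) -> list:
    """Return the positional fields whose values differ from the upstream config."""
    return [key for key in POSITIONAL_KEYS if vendor.get(key) != upstream.get(key)]
```

An empty list next to a claimed longer context window is the tell. A non-empty list only shows that the config changed, not that the model was trained at the new length.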

The vocab_size that did not grow. Domain fine-tunes often add tokens. Medical models add Latin stems, legal models add citation tokens, and Dutch fine-tunes add diacritics and compound morphemes that base BPE shreds. A vocab_size identical to the base model, plus a tokenizer with the same special-token IDs, is a strong signal that nothing changed at the tokenizer or embedding level.

The transformers_version giveaway. When a vendor claims a custom training stack, you would expect the saved transformers_version field to be absent or to differ from upstream. If it reads 4.44.2, exactly matching the version recorded in the upstream config, the config was probably copied, or the model was loaded, lightly poked, and re-saved through vanilla transformers.

The quantization story. The README claims AWQ for inference. quantization_config is missing. torch_dtype is bfloat16. The story does not survive the file.
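A sketch of the consistency check, assuming the claimed method is something you typed in from the README. The function name and return strings are illustrative. quant_method is the key transformers writes inside quantization_config for quantized checkpoints.

```python
def quantization_story(config: dict, claimed_method: str) -> str:
    """Compare the quantization the README claims with what config.json records."""
    quant = config.get("quantization_config")
    if not quant:
        dtype = config.get("torch_dtype")
        return f"claimed {claimed_method}, but no quantization_config (torch_dtype={dtype})"
    actual = str(quant.get("quant_method", "")).lower()
    if actual != claimed_method.lower():
        return f"claimed {claimed_method}, config says {actual or 'unknown'}"
    return "consistent"
```

Anything other than consistent goes in the audit notes. The vendor may still quantize at load time, but then the README should say so.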

None of these tells is conclusive on its own. A tired engineer at a real lab might leave _name_or_path in a config after a legitimate continued pretraining. What matters is the cluster. Three or more tells from this list, and you are almost certainly looking at a thin wrapper.
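The cluster rule can be sketched as a count over whichever comparisons you automate. This version covers only four of the seven tells, takes two already-loaded config dictionaries, and uses the threshold of three from the paragraph above. The names are hypothetical.

```python
SHAPE_KEYS = ("hidden_size", "intermediate_size", "num_attention_heads",
              "num_key_value_heads", "num_hidden_layers")


def fired_tells(vendor: dict, upstream: dict) -> list:
    """Name the config-level tells on which the vendor config matches upstream."""
    tells = []
    if all(k in vendor and vendor[k] == upstream.get(k) for k in SHAPE_KEYS):
        tells.append("hyperparameter shape")
    if "vocab_size" in vendor and vendor["vocab_size"] == upstream.get("vocab_size"):
        tells.append("vocab_size")
    version = vendor.get("transformers_version")
    if version and version == upstream.get("transformers_version"):
        tells.append("transformers_version")
    same_class = vendor.get("architectures") == upstream.get("architectures")
    if same_class and vendor.get("architectures") and not vendor.get("auto_map"):
        tells.append("architecture")
    return tells


def is_cluster(tells: list, threshold: int = 3) -> bool:
    """Three or more tells is the point at which the article calls it a thin wrapper."""
    return len(tells) >= threshold
```

Run it once per candidate base. The candidate with the most fired tells is the one to carry into the tokenizer and weight checks.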

The tokenizer layer: three tells that need a diff

The tokenizer is the part vendors most often forget to disguise, because they assume nobody looks. Download the upstream tokenizer, then compare.

The SHA-256 of tokenizer.json. A real fine-tune that adds tokens or modifies the chat template changes this hash. If the vendor's tokenizer.json hashes to exactly the upstream value, the tokenizer was untouched. The combination "untouched tokenizer plus fine-tuning claim" is plausible, but it must be in the disclosure.

sha256sum vendor/tokenizer.json upstream/tokenizer.json

The BPE merge order. Open both tokenizer.json files, scroll to model.merges, and compare the first few hundred entries. The order is deterministic with respect to the corpus used to train the BPE. Identical merge order across those entries means the same tokenizer training corpus, which means the vendor used the upstream BPE and called it theirs.
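Scrolling works, but counting is faster. The sketch below assumes a BPE tokenizer.json with the merges stored under model.merges, either as "a b" strings or as two-element lists, since both layouts appear in the wild. The function names are ours.

```python
import json


def normalise_merges(merges: list) -> list:
    """Turn every merge rule into a tuple, whichever layout the file uses."""
    # Splitting on a space assumes a byte-level BPE, where tokens contain no raw spaces.
    return [tuple(m.split(" ")) if isinstance(m, str) else tuple(m) for m in merges]


def load_merges(tokenizer_json_path: str) -> list:
    """Read model.merges from a tokenizer.json file."""
    with open(tokenizer_json_path, encoding="utf-8") as handle:
        return normalise_merges(json.load(handle)["model"]["merges"])


def shared_prefix_length(a: list, b: list) -> int:
    """Count how many leading merge rules are identical and in the same order."""
    count = 0
    for left, right in zip(a, b):
        if left != right:
            break
        count += 1
    return count
```

If shared_prefix_length(load_merges("vendor/tokenizer.json"), load_merges("upstream/tokenizer.json")) equals the full length of the upstream list, the BPE was never retrained.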

The special-token ID block. tokenizer_config.json carries bos_token, eos_token, pad_token, and any chat-template tokens with their IDs. If the IDs match upstream verbatim and the chat template is the upstream Jinja string with one comma changed, the vendor "instruction-tuned" by editing a string.
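A sketch of both comparisons. For the IDs it reads the added_tokens list in tokenizer.json rather than tokenizer_config.json, on the assumption that the vendor shipped both files. For the template it uses difflib from the standard library to score how close two template strings are. Where the template string lives varies by release, so extracting it is left to you.

```python
import difflib


def special_token_ids(tokenizer_json: dict) -> dict:
    """Map each special token's text to its ID, from the added_tokens list."""
    return {
        token["content"]: token["id"]
        for token in tokenizer_json.get("added_tokens", [])
        if token.get("special")
    }


def template_similarity(vendor_template: str, upstream_template: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means the two strings are identical."""
    matcher = difflib.SequenceMatcher(None, vendor_template, upstream_template, autojunk=False)
    return matcher.ratio()
```

Equal ID maps plus a ratio a hair under 1.0 is the "one comma changed" case. A ratio of exactly 1.0 means nobody even edited the string.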

The weight layer: four tells that need a GPU and a Sunday

If the config and tokenizer layers still leave doubt, you go to the weights themselves. This is where merge forensics live.

Embedding cosine similarity. Load the vendor model and a candidate base. For every token in the vocabulary, compute the cosine similarity between the two embedding vectors. A genuine full fine-tune drags the distribution down to a long tail around 0.85 to 0.95. A merged-checkpoint forgery shows a tight spike above 0.999, because no gradients ever touched the embedding table. Adapter-only fine-tunes that freeze the embeddings show the same spike, so read this tell alongside the others.

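import torch
from transformers import AutoModelForCausalLM

# Load both models in float32 so bf16 rounding does not blur the comparison.
vendor = AutoModelForCausalLM.from_pretrained("./vendor", torch_dtype=torch.float32)
upstream = AutoModelForCausalLM.from_pretrained("upstream/model-id", torch_dtype=torch.float32)

# Detach the embedding tables; no gradients are needed for a comparison.
v = vendor.get_input_embeddings().weight.detach()
u = upstream.get_input_embeddings().weight.detach()

# Compare only the rows both vocabularies share, in case one table was resized.
n = min(v.shape[0], u.shape[0])
cos = torch.nn.functional.cosine_similarity(v[:n], u[:n], dim=-1)

# Mean, median, and the fraction of tokens that are near-identical to upstream.
print(cos.mean().item(), cos.median().item(), (cos > 0.999).float().mean().item())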

The bimodal layer delta. Compute the elementwise difference between vendor weights and two suspected upstream sources. Plot the per-layer norm. A clean fine-tune produces a smooth gradient. A SLERP or TIES merge from mergekit produces a characteristic two-cluster pattern, with layers near the embedding and head leaning to one parent and middle layers leaning to the other. Once you have seen the shape, you recognise it instantly.
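The per-layer norm can be sketched without any framework once the weights are in memory. The version below assumes each checkpoint has been dumped to a dictionary mapping parameter names to flat lists of floats, and that names follow the model.layers.N. pattern used by Llama-style checkpoints in transformers. Both are assumptions about your setup, and the function names are ours.

```python
import math
import re
from collections import defaultdict

# Pulls the block index out of names like "model.layers.12.self_attn.q_proj.weight".
LAYER_RE = re.compile(r"\.layers\.(\d+)\.")


def per_layer_distance(vendor: dict, parent: dict) -> dict:
    """L2 norm of (vendor - parent), accumulated over all tensors in each block."""
    squared = defaultdict(float)
    for name, values in vendor.items():
        match = LAYER_RE.search(name)
        if match is None or name not in parent:
            continue  # skip embeddings, head, and tensors the parent lacks
        squared[int(match.group(1))] += sum((a - b) ** 2 for a, b in zip(values, parent[name]))
    return {layer: math.sqrt(total) for layer, total in sorted(squared.items())}


def closer_parent(vendor: dict, parent_a: dict, parent_b: dict) -> dict:
    """Label each block with the parent it sits nearer to ("A" or "B")."""
    dist_a = per_layer_distance(vendor, parent_a)
    dist_b = per_layer_distance(vendor, parent_b)
    return {layer: "A" if dist_a[layer] <= dist_b[layer] else "B"
            for layer in dist_a if layer in dist_b}
```

Printed in layer order, a run of A, then B, then A again is the two-cluster shape described above. On real checkpoints you would do the arithmetic on tensors rather than Python lists; the grouping logic stays the same.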

Layer-norm distribution. The gain vectors in RMSNorm (and the gamma and beta vectors in LayerNorm) are tiny, statistically rich, and almost never modified by light fine-tuning. Run a two-sample Kolmogorov-Smirnov test between the vendor values and the candidate base for each layer. If p > 0.5 across every layer, the test finds no detectable difference, which is what copied norms look like. Substantial fine-tuning shifts them, even slightly.
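For readers who want to see what the test measures, here is a dependency-free sketch of the two-sample KS statistic, the largest gap between the two empirical distribution functions. It does not compute the p-value. For that, scipy.stats.ks_2samp is the usual route. The function name is ours, and the inputs are assumed to be plain lists of floats taken from one norm vector in each model.

```python
from bisect import bisect_right


def ks_statistic(xs: list, ys: list) -> float:
    """Largest absolute gap between the empirical CDFs of two samples."""
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    # The supremum is always reached at one of the observed values.
    for value in xs + ys:
        cdf_x = bisect_right(xs, value) / len(xs)
        cdf_y = bisect_right(ys, value) / len(ys)
        gap = max(gap, abs(cdf_x - cdf_y))
    return gap
```

A statistic of 0.0 on every layer means the sorted values are identical, which is stronger evidence of copying than any p-value threshold.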

Generation reproducibility. Fix a seed. Greedy-decode fifty domain prompts on the vendor model and on the candidate base. If output token IDs match for more than thirty of the fifty, the vendor model is functionally the base. This is the test that holds up in front of a regulator, because it is reproducible from the artifact alone.
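The scoring step is simple enough to sketch. It assumes you have already generated with sampling disabled (do_sample=False in transformers) and stored each output as a list of token IDs, in the same prompt order for both models. The function names are ours, and the threshold of thirty comes from the paragraph above.

```python
def exact_match_count(vendor_outputs: list, base_outputs: list) -> int:
    """Count prompts for which both models produced identical token ID sequences."""
    if len(vendor_outputs) != len(base_outputs):
        raise ValueError("run the same prompts through both models")
    return sum(list(v) == list(b) for v, b in zip(vendor_outputs, base_outputs))


def functionally_the_base(vendor_outputs: list, base_outputs: list, threshold: int = 30) -> bool:
    """True when more than `threshold` prompts decode to the same tokens."""
    return exact_match_count(vendor_outputs, base_outputs) > threshold
```

Keep the prompts, the generation settings, and both sets of token IDs with the audit record, so anyone holding the artifact can rerun the comparison.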

What a real merge looks like, in practice

Merging is not the problem. It is legitimate research, and excellent open models like the SLERP-merged Marcoro14 are built that way. The problem is merging without disclosure, especially when sold as proprietary fine-tuning. Public recipes leave residual signatures: mergekit publishes the YAML that produces these artifacts, and the open-source community publishes the techniques to spot them. Both sides of the audit are public, which means any motivated CTO can run both before the AI Act paperwork is due.

Takeaway

Provenance is a chain of files, not a vendor's word. If config.json, tokenizer.json, and the embedding matrix all say "Llama-3.1-8B with the serial numbers filed off", the disclosure you owe under Article 50 starts there.

The audit you can run before lunch

Open the artifact. Run jq on config.json and check for the seven tells above. Run sha256sum on the tokenizer files against the three most likely public bases. If anything is suspicious, schedule the GPU job for the weekend. You will know more about the model in an hour than the vendor told you in three months.

When we built the document-classifier agent for a Rotterdam logistics group last winter, the "fine-tuned" model their previous integrator delivered turned out to be a vanilla Qwen2 with a system prompt and a renamed folder. We rebuilt it as honest in-house AI agents with a documented training trail, because the Article 50 paperwork has to match the artifact, and the artifact has to match the truth.

Key takeaway

If config.json, tokenizer.json, and the embeddings all match a public base, your Article 50 disclosure starts there, not with the vendor's marketing PDF.

FAQ

What does Article 50 of the EU AI Act actually require?

Article 50 sets transparency obligations for providers and deployers of certain AI systems, including disclosure when a user is interacting with AI and when content is AI-generated. Key provisions apply from August 2026.

Is merging two open checkpoints illegal?

No. Model merging is a legitimate research technique. The problem is selling a merged model as proprietary fine-tuning without disclosure, which breaks transparency obligations under Article 50 and most procurement contracts.

Can I run this audit without a GPU?

The first ten tells need only jq, sha256sum, and a text editor. The last four weight-level tells need a GPU and a few hours, but those are the conclusive ones if config and tokenizer evidence is ambiguous.

What if the vendor refuses to share config.json?

That is the audit result. A vendor that will not show you the artifact's metadata cannot meet Article 50 transparency requirements either, and the procurement decision follows from that fact alone.
