AI-native playbook: sixteen ways Dutch SMEs misfire on agents
Your competitor read the same Hacker News post you did. Six agents later, you're paying GPT-4 to decide whether to send an email. Here's the field-guide back.

It is a Tuesday in Utrecht. Your technical co-founder is on his fourth coffee. The screen shows a LangGraph diagram with six agent nodes, a supervisor, three tool nodes, and an arrow that loops back on itself like a question mark. The actual job, pulling new orders from the Magento webhook, checking stock against the WMS, and pinging the supplier when something dips below threshold, used to be a forty-line Python script on cron. He read the founder's playbook on Hacker News last month, the one telling you to build AI-native, agent-first, model-as-the-core-loop. You both got excited. Now Vercel bills €1,420 a month and the supplier still gets the email an hour late.
This is the field-guide back.
The playbook is right, for a different company
The post that started it, The founder's playbook: Building an AI-native startup, is not wrong. It is written for a company whose product only exists if a model holds it together. A research assistant. A coding agent. A workflow whose value lives in the model's judgment, not in the deterministic plumbing around it.
Your company has a Magento store, a Mollie integration, twelve employees in Eindhoven, €4M of revenue, and a backlog of operations work whose biggest pain is that nobody chases the invoices on time. Different physics. The playbook treats the model as the wedge. For you, the model is the seasoning.
Below: sixteen specific ways the AI-native playbook misfires in a Dutch SME context, ranked by what it costs to undo. The first eight your technical co-founder can fix this afternoon. The last eight force a rewrite, and the further down the list, the more likely an investor's diligence call will find them before you close the seed round.
Eight mistakes you can undo this afternoon
1. Wrapping a deterministic flow in an LLM router
A Magento webhook fires. The payload is typed. The destination is known. You do not need an agent to decide "is this an order?" because the URL already did. Replace the router node with an if statement and a switch on event_type. The token spend on that one node alone is usually 10 to 15% of the bill.
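A minimal sketch of that replacement, assuming the payload carries an event_type field as described above. The event names and handler functions are hypothetical placeholders, not Magento's actual event names.

```python
from typing import Callable


def handle_order_created(payload: dict) -> str:
    return f"order:{payload['order_id']}"


def handle_stock_changed(payload: dict) -> str:
    return f"stock:{payload['sku']}"


# Hypothetical event names; use whatever your webhook actually sends.
HANDLERS: dict[str, Callable[[dict], str]] = {
    "order.created": handle_order_created,
    "stock.changed": handle_stock_changed,
}


def route(payload: dict) -> str:
    """Dispatch on event_type with a dictionary lookup instead of a model call."""
    event_type = payload.get("event_type")
    handler = HANDLERS.get(event_type)
    if handler is None:
        # Unknown events fail loudly instead of being guessed at.
        raise ValueError(f"unhandled event_type: {event_type!r}")
    return handler(payload)
```

An unknown event raises instead of being routed on a guess, which is the behaviour you want from plumbing.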
2. LangGraph for a three-step linear pipeline
State machines earn their weight when state actually branches. If your LangGraph has no cycles and no conditional edges that the database couldn't answer, it is a function with extra cost and extra observability surface. Refactor to three plain function calls and delete the supervisor.
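What "three plain function calls" can look like, as a sketch. The function names, payload fields, and the threshold of ten units are assumptions made for illustration.

```python
def parse_order(payload: dict) -> dict:
    # Step 1: normalise the incoming order.
    return {"sku": payload["sku"], "qty": int(payload["qty"])}


def stock_after(order: dict, stock: dict[str, int]) -> int:
    # Step 2: compute what is left once the order ships.
    return stock[order["sku"]] - order["qty"]


def needs_reorder(remaining: int, threshold: int) -> bool:
    # Step 3: decide whether the supplier should be pinged.
    return remaining < threshold


def process(payload: dict, stock: dict[str, int], threshold: int = 10) -> bool:
    """The whole 'graph': three calls, top to bottom, no supervisor."""
    order = parse_order(payload)
    remaining = stock_after(order, stock)
    return needs_reorder(remaining, threshold)
```

Each step is an ordinary function, so each one can be unit-tested without a model, a graph runtime, or a tracing dashboard.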
3. Conversation memory as persistence
If anything would be lost when the process restarts, it belongs in Postgres, not in a BaseMemory instance. Add a table. Add a write. The agent reads from the table on the next turn. Memory is a cache, not a database.
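A sketch of the table-plus-write pattern. To keep it runnable without a server it uses Python's built-in sqlite3 as a stand-in for Postgres; the agent_turns table and its columns are hypothetical.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # Stand-in for a Postgres connection; pass a file path to survive restarts.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS agent_turns (
               id INTEGER PRIMARY KEY,
               conversation_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn


def save_turn(conn: sqlite3.Connection, conversation_id: str, role: str, content: str) -> None:
    # Write every turn as it happens; the commit is what makes it durable.
    with conn:
        conn.execute(
            "INSERT INTO agent_turns (conversation_id, role, content) VALUES (?, ?, ?)",
            (conversation_id, role, content),
        )


def load_turns(conn: sqlite3.Connection, conversation_id: str) -> list[tuple[str, str]]:
    # On the next turn, rebuild the context from the table, not from RAM.
    rows = conn.execute(
        "SELECT role, content FROM agent_turns WHERE conversation_id = ? ORDER BY id",
        (conversation_id,),
    )
    return [(role, content) for role, content in rows]
```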
4. Calling GPT-4 to parse JSON that arrived as JSON
The Magento API hands you typed JSON. Asking a model to "extract the order ID" from it is paying four cents for what data["order_id"] does for free. Models are for ambiguous inputs; APIs are for the rest.
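For completeness, the free version, assuming the body is a JSON object with an order_id key as in the example above:

```python
import json


def extract_order_id(raw_body: str) -> str:
    data = json.loads(raw_body)  # raises on malformed JSON
    if not isinstance(data, dict) or "order_id" not in data:
        raise ValueError("payload has no order_id")
    return str(data["order_id"])
```

Malformed or unexpected payloads raise immediately, where a model would be tempted to return something plausible.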
5. A decision agent for an if/else
"Should we email the customer or the supplier?" is a column in your orders table, or a five-line rule. The model has no information your code doesn't. Read the column.
6. Vector-DBing forty PDFs
Forty short PDFs usually fit in a long-context prompt. RAG starts mattering at thousands of documents with semantic overlap. Until then, ship the file paths and let the model read the one it needs, or pre-summarise once into a static file. Pinecone is not a personality trait.
7. Re-embedding every customer record nightly
Customer records change rarely. Embed on update, not on a schedule. A Postgres trigger fires when the row changes, the trigger queues a job, the worker re-embeds. NOTIFY/LISTEN is older than your company and still the cheapest way to wire this up.
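The trigger-queues-a-job shape, sketched with Python's built-in sqlite3 so it runs anywhere. In Postgres the trigger body would typically call NOTIFY (or pg_notify) and a worker would LISTEN; here a hypothetical embed_jobs table plays the queue, and the table and column names are assumptions.

```python
import sqlite3

# The trigger only enqueues work when the text actually changed.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, notes TEXT NOT NULL);
CREATE TABLE embed_jobs (customer_id INTEGER NOT NULL);
CREATE TRIGGER customers_changed
AFTER UPDATE OF notes ON customers
WHEN OLD.notes IS NOT NEW.notes
BEGIN
    INSERT INTO embed_jobs (customer_id) VALUES (NEW.id);
END;
"""


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def pending_jobs(conn: sqlite3.Connection) -> list[int]:
    # A worker would pop these and re-embed just those customers.
    return [row[0] for row in conn.execute("SELECT customer_id FROM embed_jobs")]
```

An update that writes the same text back queues nothing, so the embedding bill tracks real changes, not the calendar.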
8. Hand-rolled retry loops
Do not write while True: try: ... except: sleep(2) inside an agent node. The model will retry, the agent will retry, the queue will retry, and the customer will get four invoices. Put the work on a queue with idempotency keys and let one layer own retries.
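One way to let a single layer own retries, as a sketch: a bounded wrapper with exponential backoff that lives at the queue-worker level, while everything beneath it raises instead of looping. TransientError and the default values are assumptions for illustration, and the idempotency key on the side effect itself is a separate piece not shown here.

```python
import time


class TransientError(Exception):
    """Raised by lower layers for failures worth retrying (timeouts, 503s)."""


def run_with_retries(task, *, attempts: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """The only place in the stack that retries; code below must raise, not loop."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == attempts:
                raise  # give up and let the queue dead-letter the job
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Anything that is not a TransientError propagates immediately, so a bug surfaces on the first attempt instead of being retried three times.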
If the input shape is known and the output shape is known, a function does the work. Agents earn their cost when the model adds judgment a human would otherwise have to supply.
Eight mistakes that force a rewrite
9. Agents speaking natural language to each other
Six agents passing English strings between themselves is a six-stage telephone game with a stochastic translator at every hop. Replace inter-agent messages with typed schemas: Pydantic, Zod, whatever your stack uses. Anthropic's Building effective agents makes the broader point: start with the simplest solution that works, which may mean not building an agentic system at all. Where agents do remain, have them exchange structured data, not prose.
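A sketch of a typed inter-agent message. It uses standard-library dataclasses to stay dependency-free; with Pydantic the shape would be similar. StockDecision and its fields are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REORDER = "reorder"
    HOLD = "hold"


@dataclass(frozen=True)
class StockDecision:
    sku: str
    quantity: int
    action: Action

    def __post_init__(self) -> None:
        if not self.sku:
            raise ValueError("sku must be non-empty")
        if self.quantity < 0:
            raise ValueError("quantity must be >= 0")


def parse_decision(raw: dict) -> StockDecision:
    """Validate at the boundary: a malformed message fails here, not three hops later."""
    return StockDecision(
        sku=str(raw["sku"]),
        quantity=int(raw["quantity"]),
        action=Action(raw["action"]),  # raises ValueError on an unknown action
    )
```

The receiving side never has to interpret a sentence; it either gets a valid StockDecision or an exception it can log.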
10. No idempotency on side effects
Your invoice-chaser sent the same customer three reminders because it retried twice and each retry hit SendGrid independently. Idempotency keys are not optional once an agent touches money, email, or the calendar. This is a schema change, not a prompt change, and the fix is in the database before it's in the code.
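What "in the database before it's in the code" can look like, sketched with Python's built-in sqlite3 standing in for Postgres (where the equivalent statement is INSERT ... ON CONFLICT DO NOTHING). The sent_reminders table, the key format, and the send callable are assumptions.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE sent_reminders (
               idempotency_key TEXT PRIMARY KEY,
               invoice_id TEXT NOT NULL
           )"""
    )
    return conn


def send_reminder_once(conn: sqlite3.Connection, invoice_id: str, reminder_no: int, send) -> bool:
    """Claim the key first; only the caller that wins the insert sends."""
    key = f"reminder:{invoice_id}:{reminder_no}"
    with conn:  # commits on success, rolls back the claim if send() raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO sent_reminders (idempotency_key, invoice_id) VALUES (?, ?)",
            (key, invoice_id),
        )
        if cur.rowcount == 0:
            return False  # a previous attempt already claimed this reminder
        send(invoice_id)
    return True
```

If send raises, the claim is rolled back and a later retry can try again. For stricter guarantees, pass the same key to the email provider where it supports one.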
11. The LLM as source of truth
"Ask the agent what the customer ordered" is a sentence that means you have lost the plot. The order lives in the database. The agent reads from the database. The database is right; the agent might be. If the agent's memory is the canonical record of anything that matters, you cannot ship to a finance team and you cannot pass an audit.
12. The always-on supervisor loop
A LangGraph supervisor polling every five seconds for work that arrives twice a day is roughly €600 a month in tokens, paid to do nothing. Trigger from the event (a webhook, a queue message, a database notification), not from a clock. This is the cron-vs-agent question with a token meter attached.
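The event-driven shape in miniature, as a sketch: a worker that blocks on a queue and wakes only when something arrives. The in-process queue.Queue is a stand-in for whatever delivers your events (a webhook handler, a message broker, a Postgres LISTEN connection); the stop sentinel is just a convenience for shutting the sketch down.

```python
import queue
import threading


def worker(events: queue.Queue, handle, stop: object) -> None:
    while True:
        event = events.get()  # blocks until work arrives; idle time costs nothing
        if event is stop:
            break
        handle(event)  # the model, if any, is only called here


def start_worker(events: queue.Queue, handle, stop: object) -> threading.Thread:
    thread = threading.Thread(target=worker, args=(events, handle, stop), daemon=True)
    thread.start()
    return thread
```

With two events a day, the handler runs twice a day. Nothing runs, and nothing is billed, in between.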
13. Shared mutable state in-process
When agent B starts mutating the working memory of agent A, you have a distributed system pretending to be a monolith, with no transactions and no audit log. Pick one. Either the agents are separate processes that talk over a queue, or they are functions in one process with one shared transaction. The hybrid is what creates the bugs you cannot reproduce.
14. No eval harness
"It worked when I tried it" is not a test. Without a fixture of fifty real inputs and expected outputs, every prompt change is a coin flip and every model upgrade is a regression risk. The harness is the first thing to build, not the last; without it you cannot ship the third agent without silently breaking the first two.
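A harness does not need a framework. A minimal sketch, assuming the agent under test can be called as a function from input text to a label; the fixtures below are invented examples, and keyword_baseline is a stand-in for the real model call.

```python
from typing import Callable

# Invented fixtures; in practice, export fifty real inputs with agreed answers.
FIXTURES = [
    {"input": "Where is my invoice for order 1001?", "expected": "billing"},
    {"input": "The parcel arrived damaged", "expected": "complaint"},
    {"input": "Can you deliver on Friday instead?", "expected": "logistics"},
]


def keyword_baseline(text: str) -> str:
    # Stand-in for the agent; swap in the real call when wiring this up.
    lowered = text.lower()
    if "invoice" in lowered:
        return "billing"
    if "damaged" in lowered:
        return "complaint"
    return "logistics"


def run_evals(classify: Callable[[str], str], fixtures: list[dict]) -> dict:
    """Run every fixture and report what failed, so regressions are visible."""
    failures = []
    for fixture in fixtures:
        got = classify(fixture["input"])
        if got != fixture["expected"]:
            failures.append({"input": fixture["input"], "expected": fixture["expected"], "got": got})
    return {"total": len(fixtures), "passed": len(fixtures) - len(failures), "failures": failures}
```

Run it before and after every prompt change or model upgrade and compare the passed counts. The failures list tells you which inputs regressed.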
15. Built directly on the legacy site
The agent reads from a 2014 WordPress install with a custom MyISAM table and a plugin nobody has updated since 2019. The next core upgrade breaks the integration; the next plugin CVE takes the site offline. The migration you have been postponing is now blocking the AI roadmap, and "we did the AI work first" reads badly in diligence.
16. Skipping the boring database design
You bolted six agents onto a schema with no events table, no audit log, no idempotency keys, no transaction boundary around the side effects. Every problem above lives here. Investors will find it in fifteen minutes. This is the rewrite, and it is unavoidable once any three of items 9 to 15 are present at once.
The cheapest fix is the one you do tonight
When we built the inbox-triage agent for a Rotterdam logistics client, the first attempt was a four-node LangGraph supervisor with two retrievers and a tool router. The second attempt was three Postgres triggers and one agent that only ran when the triggers fired. The token bill dropped by a factor of nine and latency went from twelve seconds to under two. The lever in AI agents for SMEs is mostly admitting which parts do not need a model.
Tonight: open the file that holds your agent graph, count the nodes, and against each one write the one-sentence answer to "what would break if this were a Postgres trigger and a function?" If you cannot answer in a sentence, you have found the first node to delete.
Key takeaway
If the input shape is known and the output shape is known, a function does the work. Agents earn their cost when the model adds judgment a human would otherwise have to supply.
FAQ
When does an AI agent actually beat a cron job for an SME?
When the input is unstructured (free-text email, voice, a photo) or the output requires judgment a human would otherwise provide. For typed inputs with typed outputs, a cron and a function are faster, cheaper, and easier to debug.
Is LangGraph the wrong tool?
LangGraph is fine when you genuinely have a graph: cycles, branches, agent handoffs that code cannot decide. For a three-step linear pipeline, three function calls are simpler, cheaper, and easier to test.
How do I know if my agent setup needs a rewrite before fundraising?
Check for idempotency on side effects, a real eval harness with fifty fixtures, and typed messages between agents. If any of those three is missing, expect a rewrite to surface during technical diligence.
Can we keep one agent and replace the others with code?
Usually yes. Most six-agent designs collapse to one agent that handles the ambiguous step plus a handful of triggers and functions for the deterministic ones. Start by listing which steps need judgment.