Thoughts on software architecture, AI-augmented development, and what actually changes when experienced architects work with AI.
I started writing about cloud computing in 2009 -- nineteen posts arguing that the industry was fundamentally misunderstanding where the benefit came from. Seventeen years later, I am writing about AI-augmented development with the same frustration and the same conviction that methodology matters more than technology.
The 2026 posts document what I have learned building lib-pcb (197,831 lines of Java in 11 days) and Synthesis (knowledge infrastructure). The 2009 posts are historical -- cloud computing arguments that largely held up. The parallels between these two eras are explored here.
The previous post ended with a list of gaps still visible after v0.18: federated trust delegation, transport integrity, digest cost budgets. The one I left off the list, because it was already in progress, was time.
Not performance. Not latency. Actual calendar time — the question that turns out to matter enormously for knowledge that agents load into context: when is this unit valid?
v0.19 and v0.20 answer that question. v0.19 lets manifest authors declare temporal validity. v0.20 lets agents query against it.
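A minimal sketch of what "declare, then query" could look like, assuming a unit carries an optional validity window. The names `Unit`, `valid_from`, `is_valid_on`, and `valid_units` are illustrative, not the KCP schema or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Unit:
    # Hypothetical shape: field names are illustrative, not the KCP schema.
    unit_id: str
    valid_from: Optional[date] = None   # None means "no lower bound"
    valid_until: Optional[date] = None  # None means "no upper bound"


def is_valid_on(unit: Unit, when: date) -> bool:
    """True if `when` falls inside the unit's declared window (inclusive)."""
    if unit.valid_from is not None and when < unit.valid_from:
        return False
    if unit.valid_until is not None and when > unit.valid_until:
        return False
    return True


def valid_units(units: list[Unit], when: date) -> list[str]:
    """Ids of the units an agent could load for the given date."""
    return [u.unit_id for u in units if is_valid_on(u, when)]


units = [
    Unit("pricing-2025", date(2025, 1, 1), date(2025, 12, 31)),
    Unit("pricing-2026", date(2026, 1, 1), None),
    Unit("glossary"),  # no window declared: treated as always valid
]
print(valid_units(units, date(2026, 3, 1)))  # ['pricing-2026', 'glossary']
```

Treating a missing bound as "open" is a design choice made for this sketch; a stricter reader could just as reasonably reject units that declare no window at all.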
The previous post showed how KCP v0.16 gives manifests a trust model: cryptographic signing, trust tiers, a render pipeline that fails closed. The signature covers the manifest -- the YAML bytes that describe your knowledge units. It does not cover the files those units point to. The signature says "this map is authentic." It says nothing about the territory.
That gap has a name: T9, the manifest relocation attack. v0.18 closes it.
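One generic way to bind a map to its territory is to record a content digest for each referenced file inside the signed manifest and check it at load time. The excerpt does not say how v0.18 does it, so the sketch below is an assumption about the general technique, not the KCP mechanism. The manifest layout and the `mismatched_units` helper are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_hex(path: Path) -> str:
    """Hex SHA-256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def mismatched_units(root: Path, manifest: dict) -> list[str]:
    """Ids of units whose file is missing or does not match its recorded digest."""
    bad = []
    for unit in manifest["units"]:
        target = root / unit["path"]
        if not target.is_file() or sha256_hex(target) != unit["sha256"]:
            bad.append(unit["id"])
    return bad


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "policy.md").write_text("Refunds within 30 days.\n")
    # Hypothetical manifest layout; in a real system this dict would be signed.
    manifest = {"units": [{"id": "policy", "path": "policy.md",
                           "sha256": sha256_hex(root / "policy.md")}]}
    before = mismatched_units(root, manifest)
    # Same manifest, different territory: the file changes after signing.
    (root / "policy.md").write_text("Refunds within 3 days.\n")
    after = mismatched_units(root, manifest)

print(before, after)  # [] ['policy']
```

The manifest bytes never change between the two checks, which is the point: a signature over the manifest alone would verify both times.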
It started with a question no one expected to be hard.
Day seven of the lib-pcb build. January 2026. A single developer, eleven days to produce what the industry does in ten to eighteen months. The AI was generating code at a pace that defied every estimate. Features that should take a week arrived in hours. The skill library -- YAML files encoding project-specific context for Claude Code -- had grown to over forty entries. Everything was working.
And then everything stopped.
"What fields does the DrillHit class have?"
The AI did not know. The class had been written four days earlier. It was central to the entire parsing architecture. It had been discussed in multiple sessions. But the context was gone -- fresh session, blank slate. The AI started searching. Grep for the class name. Read the file. Follow the imports. Check the parent class. Read that file. Check the serializer. Follow another import. Back to grep. Thirty-three tool calls to answer a question that any developer who had been on the project for a week could answer in ten seconds.
Eleven minutes. For one question.
That was the moment something broke open. Not the code -- the assumption underneath it. The assumption that making AI faster at creating code was sufficient. Creation had been accelerated by an order of magnitude. Comprehension had not moved at all.
When a compliance agent evaluates a supplier against NIS2 Article 21, it needs two things: the supplier's security documentation to evaluate, and the regulation to evaluate it against. KCP, as described in the previous post, gives the supplier documentation a shape. The evaluation result gets a shape. But the regulation itself -- the specific requirements of Article 21(2), the interpretive guidance from ENISA, the national implementation notes -- lives where it has always lived: as prose in the system prompt.
That worked when agents consumed knowledge from a single, trusted source. It does not work when your agent pulls context from four federated manifests across two organisations, one of which was generated by an automated crawl three weeks ago. The question is no longer "does this knowledge have a shape?" It is: "can I trust this knowledge, and is it even the right content for what I need?"
KCP v0.16 and v0.17 answer those two questions. v0.16 introduces a trust model -- cryptographic signing, trust tiers, a render pipeline that fails closed. v0.17 introduces a content model -- structural metadata that tells you what a unit contains before you fetch it, and subtractive matching that tells you what it is explicitly not about. Together, they close two gaps that have been open since the beginning of the series.
This post walks through both releases. The examples are concrete. The threat model is explicit. If you are building systems where agents ingest knowledge from sources you do not fully control, this is the machinery that makes that safe.
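Subtractive matching can be sketched as an ordinary topic match followed by a veto. The field names `topics` and `not_about` and the `matches` function are assumptions made for illustration, not the v0.17 schema.

```python
def matches(unit: dict, query_topics: set[str]) -> bool:
    """Additive match on `topics`, then a subtractive veto on `not_about`."""
    if not query_topics & set(unit["topics"]):
        return False  # nothing in common: not a candidate at all
    # A unit that explicitly disclaims a queried topic is excluded.
    return not (query_topics & set(unit.get("not_about", [])))


# Hypothetical units; only the declared metadata is inspected, never the content.
units = [
    {"id": "auth-overview", "topics": ["auth", "oauth"], "not_about": ["saml"]},
    {"id": "saml-guide", "topics": ["auth", "saml"]},
]
query = {"auth", "saml"}
selected = [u["id"] for u in units if matches(u, query)]
print(selected)  # ['saml-guide']
```

Without the veto, both units would match on `auth` and the agent would fetch an overview that says nothing about SAML. The exclusion is decided before any content is loaded.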
Practitioner notes on engineering as a sequence of experiments — and on who does what in the loop.
Before we wrote the production code for a recent platform feature, we ran a fictional organization through two years of using it.
Twenty-four simulated months of compliance life: onboarding, supplier churn, audits, incidents, people leaving with their knowledge. The simulation produced fifteen architectural findings — wrong assumptions and missing pieces, discovered while every fix was still cheap and no customer existed yet. Sibling simulations took the total to twenty-five. A design meeting on the same material would have produced opinions.
That run is the clearest recent example of how I've worked for years, and of what I've started calling the approach out loud: explorative development. An idea becomes a hypothesis. The hypothesis becomes the cheapest implementation or simulation that could prove it wrong. The result gets verified. What survives is kept — and what was learned gets encoded, either way.
It is not a new method. It's the scientific method wearing a hoodie. Two things are new: the price list, and the fact that I no longer run the loop alone.
Practitioner notes on the verification you only do once.
In the previous post I described catching an agent claiming a parser was "fully RFC compliant." I caught it by opening the RFC: four minutes of reading, set against a parser that implemented none of the wildcard support the spec requires.
The tips in that post were about catching such claims. This post is about a better question that took me longer to ask:
Why did the agent never open the RFC?
Not because it couldn't read it. Because the RFC wasn't there. The agent had the parser in its context and the spec in its vibes — a compressed, lossy impression from training data. Asked to compare code against a standard, it compared code against its memory of the genre of that standard. Of course it produced an adjective.
You can audit that failure forever. Or you can change what the agent reasons from.
Practitioner notes on verifying what your agents tell you.
This week an agent told me, confidently, that an API endpoint had no authentication.
It did. The router was mounted twelve lines after the auth middleware. The agent had read the route file — clean, self-contained, no auth code in sight — and reported what it saw. What it saw was true. What it concluded was false.
The same afternoon, two more claims from the same research run didn't survive contact with the source: a parser described as "fully RFC compliant" (it lacked the wildcard support the RFC requires), and a plugin described as "active" (it was active only because of an import side effect in an unrelated legacy file — a load-order accident no test asserted).
Three wrong claims, one afternoon, inside an otherwise excellent piece of agent research that compressed days of code archaeology into hours. This is not a complaint about agents. It is a job description for the human.
In the most rigorous study of AI coding tools conducted to date — a randomized controlled trial by METR published in July 2025 — sixteen experienced open-source developers used AI assistance on tasks in their own projects. Projects they had worked on for an average of five years. Before each task, they predicted AI would reduce their completion time by 24%. After each task, they estimated they had been sped up by 20%.
The actual measurement: they were 19% slower.
The perception-reality gap in that study is between 39 and 43 percentage points: a felt 20% speedup or a predicted 24% speedup, set against a measured 19% slowdown. The developers were not exaggerating. Working with AI genuinely feels faster. But something in the translation from felt experience to measured outcome goes wrong, and understanding exactly what goes wrong is the only path to what actually works.
In February 2024, a Canadian small claims tribunal ruled against Air Canada. Their chatbot had told a passenger he could book a full-fare ticket and claim a bereavement discount retroactively. He couldn't. When he tried, Air Canada's position was: the chatbot said that, not us. The tribunal disagreed. You deployed it, you own what it says.
The ruling was correct. But the more interesting problem was underneath: when the incident happened, nobody could reconstruct what context the chatbot had been given. Nobody could confirm whether a human had ever reviewed the policy the bot was consulting. Nobody could determine whether the specific bereavement policy text had been modified between deployment and the passenger's interaction. The audit trail recorded that the system was deployed. It did not record what the system knew.
That's the provenance problem. And every organization running AI agents at enterprise scale is about to hit a version of it.
In the previous post, I argued that the AI provenance problem is a format problem. The knowledge going into and out of AI systems -- policies, observations, interpretations -- has no stable shape. No type, no version, no cryptographic binding. The audit trail records that something was reviewed. It cannot record what was reviewed, because the thing itself is prose that could have drifted between the moment of review and the moment of use.
KCP solves this by giving knowledge a shape: typed, versioned, signed packages. A compliance observation becomes a structured declaration with type, version, signed_by, derived_from, review_depth, valid_until. The audit trail becomes precise: "reviewed a cryptographically signed declaration of type evaluation:nis2-art21-supply-chain v1.2.0" -- not "reviewed a document."
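As a rough illustration of that shape, the fields named above can be written as a typed record. The field names, the type string, and the version come from the text. The `Declaration` class, the `audit_line` helper, and every other example value are placeholders invented for this sketch, and no actual signing is performed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Declaration:
    # Field names follow the list in the text; the Python types are assumptions.
    type: str
    version: str
    signed_by: str
    derived_from: tuple[str, ...]
    review_depth: str
    valid_until: date


def audit_line(d: Declaration) -> str:
    """Render the precise audit entry the text contrasts with 'reviewed a document'."""
    return (f"reviewed a signed declaration of type {d.type} "
            f"v{d.version} (signed by {d.signed_by})")


obs = Declaration(
    type="evaluation:nis2-art21-supply-chain",
    version="1.2.0",
    signed_by="reviewer@example.org",          # placeholder identity
    derived_from=("supplier-security-docs",),  # placeholder source id
    review_depth="full",                       # placeholder value
    valid_until=date(2026, 12, 31),            # placeholder date
)
print(audit_line(obs))
```

Because the record is frozen, a reviewed declaration cannot be edited in place; any change has to produce a new object, which mirrors the idea that a new version needs a new signature.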
But there is a gap I did not address. I gave shape to the outputs -- the policies, the observations, the interpretations. I left the inputs shapeless. The regulations and laws that AI agents evaluate against are still prose in the system prompt.