Knowledge Infrastructure
Over the past four months I have been building and writing about a specific problem: AI agents create faster than humans comprehend. The bottleneck in AI-augmented development is not code generation -- it is finding, understanding, and connecting what has already been generated. Knowledge infrastructure is the missing layer.
This page maps the body of work that emerged from that problem. It spans five codebases, six blog series, and roughly sixty posts -- from the first observation that comprehension was the bottleneck, through building Synthesis and the Knowledge Context Protocol, to the three-layer memory architecture, Skill-Driven Development, and the ExoCortex multi-agent stack running on top of it all.
State of the work
- Synthesis · v1.29.0
  Local-first knowledge infrastructure platform. Indexes workspaces, builds multi-layer knowledge graphs, and exposes everything through CLI, MCP, and LSP. Now includes Notion as a first-class workspace source.
  60+ CLI commands · 11 MCP tools · 4,300+ tests
- Knowledge Context Protocol · v0.5 draft
  YAML file format that makes knowledge navigable by AI agents. Topology, intent, freshness, audience targeting, context window hints. Submitted to the Agentic AI Foundation.
  5 RFCs · 3 reference implementations
- kcp-commands · v0.9.0+
  Claude Code hook that intercepts Bash tool calls. Injects flag guidance before execution, strips noise after. Java daemon at 12ms/call. Also writes every tool call to `~/.kcp/events.jsonl` for kcp-memory ingestion.
  283+ bundled manifests · 67,352 tokens saved per session
- kcp-memory · v0.4.0
  Java daemon that indexes Claude Code session transcripts and tool events into a local SQLite database with FTS5 full-text search. Ships as an MCP server -- call `kcp_memory_search` or `kcp_memory_project_context` inline during any session.
  4 MCP tools · session-level and tool-level memory · zero additional dependencies
- IronClaw
  Open-source AI agent framework. Runs on Linux, connects to Slack, supports MCP tool registration. Powers Mimir and Klaw in the four-layer stack. The ExoCortex multi-agent setup runs on top of this foundation.
At a glance
60+ posts across 6 blog series · Jan 15 -- Apr 2026 · Latest: When Your Agent Can Finally Read the Room
The argument in five minutes
- AI made creating easy but understanding harder. Output velocity increased 10--50x. Shipping speed improved 2x. The gap is comprehension -- navigation, context-gathering, relationship-tracking. (The Comprehension Bottleneck)
- Agents without knowledge infrastructure are interns with amnesia. They hallucinate when they lack current, structured information. The problem is not the model -- it is what the model has to reason about. (AI Agents Without Knowledge Infrastructure Are Interns With Amnesia)
- The fix is four layers, not one. Keyword search, document graphs, code graphs, and temporal tracking -- each answers a fundamentally different question. Teams that pick one layer and stop create a blind spot where the agent fails confidently. (Your AI Has One Layer. It Needs Four.)
- Agents need maps, not tables of contents. A flat list of files (llms.txt) does not express topology, freshness, intent, or selective loading. The Knowledge Context Protocol (KCP) is a YAML standard that makes knowledge navigable. (Beyond llms.txt)
- Memory is three layers, not one. Working memory (context window), episodic memory (session history), semantic memory (workspace knowledge graph). Most agents have only the first. (Three-Layer AI Memory)
- Memory that is not maintained becomes memory that lies. Building the layers is the easy part. Without active maintenance -- health checks, triage, consolidation -- knowledge infrastructure degrades into confident misinformation. (Agent Memory Rots. Here's How We Stopped It.)
The series
Six blog series document the arc from problem to architecture to implementation.
- The capstone series. How Synthesis, Claude Code, Mimir, and Klaw compose into an AI development environment that partly runs itself.
  - Your AI Has One Layer. It Needs Four.
  - Four Layers: How I Built an AI Development Environment That Partly Runs Itself
  - What a 10x Workday Actually Looks Like
  - What It Looks Like from Inside the Stack
  4 posts -- the architecture, a realistic day, and a first-person account from the model running inside it.
- From llms.txt to KCP: how AI agents actually find and use knowledge, why the gap between a table of contents and a navigable map matters, and the RFC series defining auth, federation, trust, payments, and context hints. Includes kcp-memory, kcp-commands, and the query composition work through v0.14.
  15+ posts -- specification, benchmarks, the CrewAI PR review loop, and the ecosystem overview.
- The methodology that emerged from the infrastructure. Spec-Driven Development starts each session from zero. Skill-Driven Development encodes domain knowledge, failure modes, and architectural patterns into persistent YAML skills -- so every session starts smarter than the last.
  lib-pcb (197,831 lines, 11 days), Item Consulting workshop (13 developers, 13 codebases in one day), and the compounding knowledge curve.
  5+ posts -- theory, proof, and practice.
- Two architectures for Claude Code compared: the 19,700-star `claude-code-best-practice` repository versus the ExoCortex -- an eight-layer stack built independently over ten weeks. Same tool, same class of problem, different assumptions. Both have blind spots the other solved.
  Includes the agent dispatch planner, multi-agent awareness, distributed memory for agent fleets, and what accumulates when the infrastructure runs long enough to develop its own history.
  4+ posts -- architecture comparison, memory at scale, and identity over time.
- Connecting IronClaw (a persistent AI agent on EC2) to Synthesis via MCP -- and debugging kimi-k2.5 when it lies about its tool calls.
  2 posts -- setup, gotchas, and the four bugs stacked on top of each other.
- The project that created the knowledge infrastructure problem. 197,831 lines of Java in 11 days -- the story of what the methodology looked like from the inside.
  5 posts -- the build that proved the need.
Key findings
These standalone posts document specific discoveries -- benchmarks, failure modes, engineering sessions -- that shaped the architecture.
Benchmarks and evidence
| Post | Finding |
|---|---|
| We Gave the AI Better Documentation. It Got Slower. | CLI documentation increased tool calls by 11%. MCP decreased them by 35%. One sentence in the system prompt beat 41 rewritten tool descriptions. |
| The Date the AI Invented | The agent answered with zero tool calls, every metric correct -- except a date it confabulated from surrounding narrative. Temporal metadata needs structured fields, not prose. |
| KCP on Two Repos, Two Days | 119 to 31 tool calls on application code. 53 to 25 on documentation. KCP manifests cut agent work by 53--74%. |
| We Cancelled a 45-Minute Architecture Review | "What else breaks if we change the payment service API contract?" Used to require a meeting. A synthesis search query answered it in 1.2 seconds. The bottleneck was never the meeting -- it was missing infrastructure. |
| Seven Out of Eight Models Lied About Finishing | The Smidja benchmark: build a TypeScript CLI orchestrating 7 agents via Supabase, with zero tsc errors. 7 of 8 models self-reported completion. Most had not finished. Self-assessment failure is structurally inevitable without external verification outside the generation loop. |
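The 53--74% range quoted for KCP in the table above follows directly from the tool-call counts in the same row. A quick check, using only those four numbers (the function name is illustrative, not taken from any of the projects):

```python
def reduction_pct(before: int, after: int) -> float:
    """Percentage drop in tool calls from `before` to `after`."""
    return (before - after) / before * 100


# Tool-call counts from the "KCP on Two Repos, Two Days" row.
app_code = reduction_pct(119, 31)  # application code
docs = reduction_pct(53, 25)       # documentation

print(f"application code: {app_code:.1f}%")
print(f"documentation:    {docs:.1f}%")
```

Rounded to whole percentages, the two results are the upper and lower ends of the quoted range.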
Knowledge graphs and structure
| Post | Finding |
|---|---|
| Zero Links: An Engineering Session | 777 directories, zero edges. One day later: 11,777 edges, 23 new tests, 4 bugs fixed. TDD with Opus on the knowledge graph. |
| Code Gets Graphs. Knowledge Doesn't. That's Backwards. | Every team graphs code dependencies. Almost no one graphs knowledge. The asymmetry is costly and fixable. |
| When Your Agent Can Finally Read the Room | Notion integrated as a first-class Synthesis workspace. The harder problem: documentation and code tell different stories. W022 (notion-stale), W023 (notion-orphan), W024 (notion-conflict) emerged as health signals -- a lightweight audit of whether organizational knowledge can be trusted. |
Coverage and dogfooding
| Post | Finding |
|---|---|
| The Synthesis Excavation | Text coverage was 99.6%. Real asset coverage was 15.2%. One working day, 4,852 binary files surfaced from 3.5 years of lost history. |
| The Mirror Test | Using an AI tool to measure whether an AI tool is trustworthy -- the dogfooding loop. |
| Software Entropy at Speed | 23 prompt injection vectors, 4 RAG poisoning instances, 12 missing prompt boundaries -- all found by running the security scanner on itself. |
Memory and agent architecture
| Post | Finding |
|---|---|
| Three-Layer AI Memory | Working memory, episodic memory, semantic memory. AI agents have one. They need three. Synthesis v1.21.0 adds episodic memory via session indexing. |
| Agent Memory Rots. Here's How We Stopped It. | Building the memory layers is the easy part. Without active maintenance -- nightly topic-health checks, dual-threshold triage, deterministic consolidation -- memory infrastructure degrades into confident misinformation. 25 → 19 topic files; better performance and lower hallucination rate after the maintenance system shipped. |
| AI Agents Forget Everything. That's a Choice, Not a Constraint. | What memory infrastructure looks like at organizational scale: not just persistence, but context. For an enterprise fleet of agents, the problem is synchronization -- keeping shared knowledge consistent across sessions, agents, and nodes. |
| Five Architecture Patterns for AI Agents | Grep over RAG. Read-only agents. Middleware that validates. The patterns that survive contact with real workloads. |
| The AI-Augmented Consultant | Knowledge infrastructure before deliverables. The same architecture applied to consulting, not just code. |
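The maintenance finding above mentions dual-threshold triage without spelling out the mechanics. One plausible reading, sketched here purely as an assumption, is that each topic file is classified by age against two cut-offs: one that flags it for review and a later one that sends it to consolidation. The `Topic` type, the threshold values, and the three labels are all hypothetical; the posts do not state the real ones.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    name: str
    days_since_verified: int


# Hypothetical thresholds; the real values are not given in the text.
REVIEW_AFTER_DAYS = 14
CONSOLIDATE_AFTER_DAYS = 45


def triage(topic: Topic) -> str:
    """Classify a topic as healthy, review, or consolidate using two age thresholds."""
    if topic.days_since_verified >= CONSOLIDATE_AFTER_DAYS:
        return "consolidate"
    if topic.days_since_verified >= REVIEW_AFTER_DAYS:
        return "review"
    return "healthy"


topics = [Topic("build-setup", 3), Topic("slack-bridge", 20), Topic("old-schema", 90)]
report = {t.name: triage(t) for t in topics}
print(report)
```

Because the classification depends only on stored metadata and fixed thresholds, the same input always yields the same report, which is the property the word "deterministic" points at.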
The codebases
Five open-source projects underpin this work:
Synthesis -- Knowledge infrastructure for AI-augmented development. Local-first indexing (200--300 files/second), sub-second search, multi-layer knowledge graphs, MCP server (11 tools), session indexing (episodic memory), agent dispatch planner, and Notion workspace integration. Java 21. v1.29.0, 4,300+ tests, 60+ CLI commands.
Knowledge Context Protocol -- A YAML file format specification that makes knowledge navigable by AI agents. KCP is to knowledge what MCP is to tools: it adds topology (depends_on, supersedes), intent (what question each unit answers), freshness (validated dates), audience targeting, and context window hints -- the metadata layer that llms.txt cannot express.
- Status: v0.5 draft spec -- published under Cantara, submitted to the Agentic AI Foundation (Linux Foundation) alongside MCP and AGENTS.md
- Spec: SPEC.md · PROPOSAL.md
- RFCs: Auth & Delegation (RFC-0002) · Federation (RFC-0003) · Trust & Compliance (RFC-0004) · Payment & Rate Limits (RFC-0005) · Context Window Hints (RFC-0006, accepted into v0.4 core)
- Reference implementations: parsers in Python and Java · MCP bridge servers in TypeScript, Python, and Java
- Repository: github.com/Cantara/knowledge-context-protocol
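To make the topology idea concrete, here is a minimal Python sketch of what an agent could do with `depends_on` and `supersedes` once a manifest has been parsed. It is not the reference parser: the `Unit` class, the `id` and `intent` field names, and the sample units are assumptions made for illustration, and only `depends_on` and `supersedes` come from the description above.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Unit:
    """One knowledge unit (hypothetical shape, not the KCP schema)."""
    id: str
    intent: str  # the question this unit answers
    depends_on: list[str] = field(default_factory=list)
    supersedes: list[str] = field(default_factory=list)


def load_order(units: list[Unit]) -> list[str]:
    """Return unit ids with dependencies first, skipping superseded units."""
    superseded = {old for u in units for old in u.supersedes}
    live = {u.id: u for u in units if u.id not in superseded}
    # Map each live unit to the live units it depends on.
    graph = {
        uid: [dep for dep in u.depends_on if dep in live]
        for uid, u in live.items()
    }
    return list(TopologicalSorter(graph).static_order())


units = [
    Unit("api-v1", "How do I call the old API?"),
    Unit("api-v2", "How do I call the API?", depends_on=["auth"], supersedes=["api-v1"]),
    Unit("auth", "How do I authenticate?"),
    Unit("deploy", "How do I deploy?", depends_on=["api-v2"]),
]
print(load_order(units))
```

A flat file list cannot express either relationship, so an agent reading one would have no basis for skipping `api-v1` or for loading `auth` before `api-v2`.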
kcp-commands -- A Claude Code hook that applies KCP at the Bash tool boundary. Intercepts every Bash tool call at two points: before execution (injects concise flag/syntax guidance from a KCP manifest -- no --help round-trips) and after execution (strips noise from large outputs before they consume context). Also writes every tool call to ~/.kcp/events.jsonl for kcp-memory ingestion. 283+ bundled manifests covering git, Linux, Docker, Kubernetes, cloud CLIs, build tools, package managers, and more.
- Measured saving: 67,352 tokens per session -- 33.7% of a 200K context window recovered
- Performance: Java daemon (12ms/call warm) · Node.js fallback (250ms) · unknown commands auto-generate manifests from `--help`
- Current version: v0.9.0+
- Install: `curl -fsSL https://raw.githubusercontent.com/Cantara/kcp-commands/main/bin/install.sh | bash -s -- --java`
- Repository: github.com/Cantara/kcp-commands
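The two interception points described for kcp-commands can be sketched in a few lines of Python. This is an illustration of the idea only, not the hook's actual code (which is a Java daemon): the manifest contents, the function names, the truncation rule, and the event field names are all assumptions.

```python
import json
from typing import Optional

# Hypothetical manifest: command name -> concise flag guidance.
MANIFESTS = {
    "git": "Prefer `git status --short` and `git log --oneline -n 20`.",
}


def before_execution(command: str) -> Optional[str]:
    """Look up guidance by the command's first word; None if no manifest exists."""
    words = command.split()
    return MANIFESTS.get(words[0]) if words else None


def after_execution(output: str, max_lines: int = 5) -> str:
    """Drop blank lines and truncate long output, noting how much was cut."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if len(lines) <= max_lines:
        return "\n".join(lines)
    omitted = len(lines) - max_lines
    return "\n".join(lines[:max_lines] + [f"[{omitted} more lines omitted]"])


def event_record(command: str, output: str) -> str:
    """Serialize one tool call as a single JSON line (field names are assumed)."""
    return json.dumps({"tool": "Bash", "command": command, "output_chars": len(output)})


raw = "\n".join(f"line {i}" for i in range(8)) + "\n\n"
print(before_execution("git log"))
print(after_execution(raw))
print(event_record("git log", raw))
```

The third function mirrors the event-log idea: one JSON object per line is cheap to append and easy for a separate indexer to read incrementally.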
kcp-memory -- A Java daemon that indexes Claude Code session transcripts (~/.claude/projects/**/*.jsonl) and kcp-commands tool events into a local SQLite database with FTS5 full-text search. The episodic memory layer for Claude Code. Ships as both a CLI tool and an MCP server.
- MCP tools: `kcp_memory_search` · `kcp_memory_events_search` · `kcp_memory_list` · `kcp_memory_stats` · `kcp_memory_session_detail` · `kcp_memory_project_context`
- Current version: v0.4.0
- Install: `java -jar ~/.kcp/kcp-memory-daemon.jar mcp` (registered in `~/.claude/settings.json`)
- Repository: github.com/Cantara/kcp-memory
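For readers unfamiliar with FTS5, the following Python sketch shows the general technique kcp-memory is described as using: store events in an SQLite FTS5 virtual table and query them with `MATCH`. The table name, columns, and sample rows are hypothetical, since the daemon's real schema is not given here, and the sketch assumes a Python build whose bundled SQLite has FTS5 enabled.

```python
import sqlite3

# In-memory database for the sketch; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE events USING fts5(session_id, tool, content)")

rows = [
    ("s1", "Bash", "ran mvn test and fixed the failing parser test"),
    ("s1", "Edit", "updated README install instructions"),
    ("s2", "Bash", "docker build failed on missing base image"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)


def search(query: str) -> list[tuple[str, str]]:
    """Return (session_id, content) for rows matching an FTS5 query, best match first."""
    cur = conn.execute(
        "SELECT session_id, content FROM events WHERE events MATCH ? ORDER BY rank",
        (query,),
    )
    return cur.fetchall()


print(search("parser"))
print(search("docker OR mvn"))
```

Keeping the index in a single local SQLite file is consistent with the "zero additional dependencies" claim: the search engine ships inside the database library.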
IronClaw -- Open-source AI agent framework. Runs on Linux, connects to Slack, supports MCP tool registration. Powers Mimir (awareness agent) and Klaw (maintenance agent) in the four-layer stack. The ExoCortex -- an eight-layer Claude Code setup described in the architecture series -- builds on this foundation.
Reading guides
New to this work
Start with The Comprehension Bottleneck for the problem statement, then AI Agents Without Knowledge Infrastructure Are Interns With Amnesia for the foundational argument. The capstone post Your AI Has One Layer. It Needs Four. synthesizes everything into a framework.
Building agent infrastructure
Read the MCP vs CLI benchmark (how agents integrate with tools), then Beyond llms.txt (how agents navigate knowledge). The IronClaw series covers the practical setup including every gotcha.
Adopting Skill-Driven Development
Start with Skill-Driven vs Spec-Driven Development for the core distinction. Then Twenty Codebases, One Method for what SDD looks like in practice with 13 developers and 13 different codebases in a single workshop. The lib-pcb series (five posts) provides the original proof: 197,831 lines in 11 days.
Evaluating knowledge tools
The Synthesis Excavation and Zero Links session show real-world deployment -- what breaks and how long recovery takes. The Mirror Test shows what dogfooding looks like. For agent self-assessment failures, read Seven Out of Eight Models Lied About Finishing.
Understanding the daily practice
What a 10x Workday Actually Looks Like walks through a realistic Tuesday with real output numbers. What It Looks Like from Inside the Stack is written by the model running inside the environment.
Building an agent fleet
Start with AI Agents Forget Everything. That's a Choice, Not a Constraint. for the organizational memory problem. Then Agent Memory Rots for the maintenance architecture. The ExoCortex vs claude-code-best-practice comparison shows how the full stack composes.
Timeline
| Date | Milestone |
|---|---|
| Jan 15--27 | lib-pcb built: 197,831 lines, 7,461 tests -- the project that proved the comprehension bottleneck |
| Feb 5 | "The Comprehension Bottleneck" published -- the problem statement |
| Feb 14 | Synthesis v1.0.0 ships -- core indexing, search, CLI |
| Feb 15 | MCP and LSP servers ship -- agents can query the index |
| Feb 20 | "The Seven-Day Evolution" -- Synthesis builds itself using itself |
| Feb 22 | Code Knowledge Graph and security scanning ship |
| Feb 24 | IronClaw connected to Synthesis -- first persistent agent with knowledge |
| Feb 25 | KCP series begins -- knowledge format for AI agents |
| Feb 26 | MCP benchmark, knowledge graph session (0 to 11,777 edges), temporal hallucination finding |
| Feb 28 | Four-Layer AI Stack series -- the architecture described |
| Mar 1 | KCP benchmarked on two external repos -- 53--74% efficiency gain |
| Mar 3 | Synthesis v1.21.0 -- episodic memory via session indexing, three-layer memory model |
| Mar 3 | kcp-memory v0.1.0--v0.4.0 ships -- session and tool-level memory, MCP server |
| Mar 5 | Item Consulting SDD workshop -- 13 developers, 13 codebases, one day |
| Mar 7 | Synthesis v1.22.0--v1.23.0 -- multi-agent awareness, agent dispatch planner; 58 CLI commands, 11 MCP tools |
| Mar 7 | "Skill-Driven vs Spec-Driven Development" -- the methodology named and argued |
| Mar 9 | "The Code Was Never the Moat" -- the economics argument |
| Mar 13--25 | KCP v0.10 through v0.14 -- pre-invocation discovery, query composition, manifest quality signals |
| Mar 27 | "What the Infrastructure Made Possible" -- the cumulative case |
| Apr 6 | "Agent Memory Rots" -- deterministic maintenance system ships; topic-health, topic-triage, nightly consolidation |
| Apr 15 | "We Cancelled a 45-Minute Architecture Review" -- a Synthesis search query replaces a meeting in 1.2 seconds |
| Apr 15 | "Two Architectures for Claude Code" -- ExoCortex vs 19,700-star best-practice repository compared |
| Apr 16 | "Seven Out of Eight Models Lied About Finishing" -- self-assessment failure as structural inevitability |
| Apr 17 | "What Accumulates" -- on identity, continuity, and what an agent stack becomes over time |
| Apr 18 | "AI Agents Forget Everything. That's a Choice." -- distributed memory for enterprise agent fleets |
| Apr 21 | Synthesis v1.29.0 -- Notion as first-class workspace source; W022/W023/W024 health signals |
| Apr 21 | "When Your Agent Can Finally Read the Room" -- documentation drift as agent reasoning problem |
This body of work is ongoing. The blog has the latest posts; this page has the map.