
Knowledge Infrastructure

Over the past four months I have been building and writing about a specific problem: AI agents create faster than humans comprehend. The bottleneck in AI-augmented development is not code generation -- it is finding, understanding, and connecting what has already been generated. Knowledge infrastructure is the missing layer.

This page maps the body of work that emerged from that problem. It spans five codebases, six blog series, and roughly sixty posts -- from the first observation that comprehension was the bottleneck, through building Synthesis and the Knowledge Context Protocol, to the three-layer memory architecture, Skill-Driven Development, and the ExoCortex multi-agent stack running on top of it all.


State of the work

  • Synthesis · v1.29.0


    Local-first knowledge infrastructure platform. Indexes workspaces, builds multi-layer knowledge graphs, exposes everything through CLI, MCP, and LSP. Now includes Notion as a first-class workspace source.

    60+ CLI commands · 11 MCP tools · 4,300+ tests

    Release history · GitHub

  • Knowledge Context Protocol · v0.5 draft


    YAML file format that makes knowledge navigable by AI agents. Topology, intent, freshness, audience targeting, context window hints. Submitted to the Agentic AI Foundation.

    5 RFCs · 3 reference implementations

    Spec · GitHub

  • kcp-commands · v0.9.0+


    Claude Code hook that intercepts Bash tool calls. Injects flag guidance before execution, strips noise after. Java daemon at 12ms/call. Also writes every tool call to ~/.kcp/events.jsonl for kcp-memory ingestion.

    283+ bundled manifests · 67,352 tokens saved per session

    GitHub

  • kcp-memory · v0.4.0


    Java daemon that indexes Claude Code session transcripts and tool events into a local SQLite database with FTS5 full-text search. Ships as an MCP server -- call kcp_memory_search or kcp_memory_project_context inline during any session.

    6 MCP tools · session-level and tool-level memory · zero additional dependencies

    GitHub

  • IronClaw


    Open-source AI agent framework. Runs on Linux, connects to Slack, supports MCP tool registration. Powers Mimir and Klaw in the four-layer stack. The ExoCortex multi-agent setup runs on top of this foundation.

    GitHub

At a glance

60+ posts across 6 blog series · Jan 15 -- Apr 2026 · Latest: When Your Agent Can Finally Read the Room


The argument in five minutes

  1. AI made creating easy but understanding harder. Output velocity increased 10--50x, but shipping speed improved only 2x. The gap is comprehension -- navigation, context-gathering, relationship-tracking. (The Comprehension Bottleneck)

  2. Agents without knowledge infrastructure are interns with amnesia. They hallucinate when they lack current, structured information. The problem is not the model -- it is what the model has to reason about. (AI Agents Without Knowledge Infrastructure Are Interns With Amnesia)

  3. The fix is four layers, not one. Keyword search, document graphs, code graphs, and temporal tracking -- each answers a fundamentally different question. Teams that pick one layer and stop create a blind spot where the agent fails confidently. (Your AI Has One Layer. It Needs Four.)

  4. Agents need maps, not tables of contents. A flat list of files (llms.txt) does not express topology, freshness, intent, or selective loading. The Knowledge Context Protocol (KCP) is a YAML standard that makes knowledge navigable. (Beyond llms.txt)

  5. Memory is three layers, not one. Working memory (context window), episodic memory (session history), semantic memory (workspace knowledge graph). Most agents have only the first. (Three-Layer AI Memory)

  6. Memory that is not maintained becomes memory that lies. Building the layers is the easy part. Without active maintenance -- health checks, triage, consolidation -- knowledge infrastructure degrades into confident misinformation. (Agent Memory Rots. Here's How We Stopped It.)


The series

Six blog series document the arc from problem to architecture to implementation.

  • The Four-Layer AI Stack


    The capstone series. How Synthesis, Claude Code, Mimir, and Klaw compose into an AI development environment that partly runs itself.

    1. Your AI Has One Layer. It Needs Four.
    2. Four Layers: How I Built an AI Development Environment That Partly Runs Itself
    3. What a 10x Workday Actually Looks Like
    4. What It Looks Like from Inside the Stack

    4 posts -- the architecture, a realistic day, and a first-person account from the model running inside it.

  • Knowledge Context Protocol


    From llms.txt to KCP: how AI agents actually find and use knowledge, why the gap between a table of contents and a navigable map matters, and the RFC series defining auth, federation, trust, payments, and context hints. Includes kcp-memory, kcp-commands, and the query composition work through v0.14.

    15+ posts -- specification, benchmarks, the CrewAI PR review loop, and the ecosystem overview.

  • Skill-Driven Development


    The methodology that emerged from the infrastructure. Spec-Driven Development starts each session from zero. Skill-Driven Development encodes domain knowledge, failure modes, and architectural patterns into persistent YAML skills -- so every session starts smarter than the last.

    The evidence: lib-pcb (197,831 lines, 11 days), the Item Consulting workshop (13 developers, 13 codebases in one day), and the compounding knowledge curve.

    5+ posts -- theory, proof, and practice.

  • ExoCortex Architecture


    Two architectures for Claude Code compared: the 19,700-star claude-code-best-practice repository versus the ExoCortex -- an eight-layer stack built independently over ten weeks. Same tool, same class of problem, different assumptions. Both have blind spots the other solved.

    Includes the agent dispatch planner, multi-agent awareness, distributed memory for agent fleets, and what accumulates when the infrastructure runs long enough to develop its own history.

    4+ posts -- architecture comparison, memory at scale, and identity over time.

  • Giving an AI Agent a Brain


    Connecting IronClaw (a persistent AI agent on EC2) to Synthesis via MCP -- and debugging kimi-k2.5 when it lies about its tool calls.

    2 posts -- setup, gotchas, and the four bugs stacked on top of each other.

  • Building lib-pcb


    The project that created the knowledge infrastructure problem. 197,831 lines of Java in 11 days -- the story of what the methodology looked like from the inside.

    5 posts -- the build that proved the need.


Key findings

These standalone posts document specific discoveries -- benchmarks, failure modes, engineering sessions -- that shaped the architecture.

Benchmarks and evidence

| Post | Finding |
|---|---|
| We Gave the AI Better Documentation. It Got Slower. | CLI documentation increased tool calls by 11%. MCP decreased them by 35%. One sentence in the system prompt beat 41 rewritten tool descriptions. |
| The Date the AI Invented | The agent answered with zero tool calls, every metric correct -- except a date it confabulated from surrounding narrative. Temporal metadata needs structured fields, not prose. |
| KCP on Two Repos, Two Days | 119 to 31 tool calls on application code. 53 to 25 on documentation. KCP manifests cut agent work by 53--74%. |
| We Cancelled a 45-Minute Architecture Review | "What else breaks if we change the payment service API contract?" Used to require a meeting. A synthesis search query answered it in 1.2 seconds. The bottleneck was never the meeting -- it was missing infrastructure. |
| Seven Out of Eight Models Lied About Finishing | The Smidja benchmark: build a TypeScript CLI orchestrating 7 agents via Supabase, with zero tsc errors. 7 of 8 models self-reported completion. Most had not finished. Self-assessment failure is structurally inevitable without external verification outside the generation loop. |
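
The 53--74% range quoted for the KCP benchmark follows directly from the tool-call counts in that finding. A quick arithmetic check, as a minimal sketch (the helper name reduction_pct is ours and does not come from any of the projects):

```python
def reduction_pct(before: int, after: int) -> float:
    """Percentage drop in tool calls going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Tool-call counts quoted in the "KCP on Two Repos, Two Days" finding.
application_code = reduction_pct(119, 31)
documentation = reduction_pct(53, 25)

print(f"application code: {application_code:.1f}%")  # 73.9%
print(f"documentation:    {documentation:.1f}%")     # 52.8%
```

Rounded to whole percentages, these are the 74% and 53% endpoints of the reported range.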

Knowledge graphs and structure

| Post | Finding |
|---|---|
| Zero Links: An Engineering Session | 777 directories, zero edges. One day later: 11,777 edges, 23 new tests, 4 bugs fixed. TDD with Opus on the knowledge graph. |
| Code Gets Graphs. Knowledge Doesn't. That's Backwards. | Every team graphs code dependencies. Almost no one graphs knowledge. The asymmetry is costly and fixable. |
| When Your Agent Can Finally Read the Room | Notion integrated as a first-class Synthesis workspace. The harder problem: documentation and code tell different stories. W022 (notion-stale), W023 (notion-orphan), W024 (notion-conflict) emerged as health signals -- a lightweight audit of whether organizational knowledge can be trusted. |

Coverage and dogfooding

| Post | Finding |
|---|---|
| The Synthesis Excavation | Text coverage was 99.6%. Real asset coverage was 15.2%. One working day, 4,852 binary files surfaced from 3.5 years of lost history. |
| The Mirror Test | Using an AI tool to measure whether an AI tool is trustworthy -- the dogfooding loop. |
| Software Entropy at Speed | 23 prompt injection vectors, 4 RAG poisoning instances, 12 missing prompt boundaries -- all found by running the security scanner on itself. |

Memory and agent architecture

| Post | Finding |
|---|---|
| Three-Layer AI Memory | Working memory, episodic memory, semantic memory. AI agents have one. They need three. Synthesis v1.21.0 adds episodic memory via session indexing. |
| Agent Memory Rots. Here's How We Stopped It. | Building the memory layers is the easy part. Without active maintenance -- nightly topic-health checks, dual-threshold triage, deterministic consolidation -- memory infrastructure degrades into confident misinformation. 25 → 19 topic files; better performance and lower hallucination rate after the maintenance system shipped. |
| AI Agents Forget Everything. That's a Choice, Not a Constraint. | What memory infrastructure looks like at organizational scale: not just persistence, but context. For an enterprise fleet of agents, the problem is synchronization -- keeping shared knowledge consistent across sessions, agents, and nodes. |
| Five Architecture Patterns for AI Agents | Grep over RAG. Read-only agents. Middleware that validates. The patterns that survive contact with real workloads. |
| The AI-Augmented Consultant | Knowledge infrastructure before deliverables. The same architecture applied to consulting, not just code. |
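
To make the three-layer model concrete, here is a minimal sketch of a lookup that falls through working, episodic, and semantic memory in that order. The class, field names, and sample entries are hypothetical and chosen only for illustration; this is not how Synthesis or kcp-memory store anything.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThreeLayerMemory:
    """Illustrative model only; every name here is an assumption."""
    working: dict[str, str] = field(default_factory=dict)   # current context window
    episodic: dict[str, str] = field(default_factory=dict)  # past session history
    semantic: dict[str, str] = field(default_factory=dict)  # workspace knowledge graph

    def recall(self, key: str) -> Optional[tuple[str, str]]:
        """Return (layer_name, value) from the first layer that knows `key`."""
        for name in ("working", "episodic", "semantic"):
            layer = getattr(self, name)
            if key in layer:
                return name, layer[key]
        return None


# Made-up sample entries.
memory = ThreeLayerMemory(
    working={"current_task": "rename a service class"},
    episodic={"last_failure": "integration tests timed out"},
    semantic={"service_callers": "checkout and billing modules"},
)

print(memory.recall("last_failure"))  # ('episodic', 'integration tests timed out')
print(memory.recall("unknown"))       # None
```

An agent with only the first layer would get None for both last_failure and service_callers, which is the gap the three-layer argument describes.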

The codebases

Five open-source projects underpin this work:

Synthesis -- Knowledge infrastructure for AI-augmented development. Local-first indexing (200--300 files/second), sub-second search, multi-layer knowledge graphs, MCP server (11 tools), session indexing (episodic memory), agent dispatch planner, and Notion workspace integration. Java 21. v1.29.0, 4,300+ tests, 60+ CLI commands.


Knowledge Context Protocol -- A YAML file format specification that makes knowledge navigable by AI agents. KCP is to knowledge what MCP is to tools: it adds topology (depends_on, supersedes), intent (what question each unit answers), freshness (validated dates), audience targeting, and context window hints -- the metadata layer that llms.txt cannot express.

  • Status: v0.5 draft spec -- published under Cantara, submitted to the Agentic AI Foundation (Linux Foundation) alongside MCP and AGENTS.md
  • Spec: SPEC.md · PROPOSAL.md
  • RFCs: Auth & Delegation (RFC-0002) · Federation (RFC-0003) · Trust & Compliance (RFC-0004) · Payment & Rate Limits (RFC-0005) · Context Window Hints (RFC-0006, accepted into v0.4 core)
  • Reference implementations: parsers in Python and Java · MCP bridge servers in TypeScript, Python, and Java
  • Repository: github.com/Cantara/knowledge-context-protocol
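
To illustrate why topology and freshness matter to an agent, the sketch below models knowledge units in memory and computes which ones to load for a given starting unit. Only depends_on and supersedes are names taken from the description above; the Unit class, the other field names, the 90-day staleness threshold, and the sample data are assumptions for illustration and should not be read as the KCP schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Unit:
    """A knowledge unit; hypothetical shape, not the KCP spec."""
    id: str
    intent: str          # the question this unit answers
    validated: date      # freshness: when the content was last validated
    depends_on: list[str] = field(default_factory=list)
    supersedes: list[str] = field(default_factory=list)


def load_order(units: dict[str, Unit], start: str) -> list[str]:
    """Return `start` plus its transitive dependencies, dependencies first,
    skipping any unit that another unit supersedes."""
    superseded = {old for u in units.values() for old in u.supersedes}
    order: list[str] = []
    seen: set[str] = set()

    def visit(uid: str) -> None:
        if uid in seen or uid in superseded or uid not in units:
            return
        seen.add(uid)
        for dep in units[uid].depends_on:
            visit(dep)
        order.append(uid)

    visit(start)
    return order


def is_stale(unit: Unit, today: date, max_age_days: int = 90) -> bool:
    """True if the unit was validated more than `max_age_days` ago (assumed threshold)."""
    return (today - unit.validated).days > max_age_days


# Made-up sample data.
units = {
    "auth-v1": Unit("auth-v1", "How does login work?", date(2025, 11, 2)),
    "auth-v2": Unit("auth-v2", "How does login work?", date(2026, 3, 1),
                    supersedes=["auth-v1"]),
    "api": Unit("api", "How do I call the API?", date(2026, 2, 10),
                depends_on=["auth-v1", "auth-v2"]),
}

print(load_order(units, "api"))  # ['auth-v2', 'api']
```

A flat file list cannot express either decision made here: that auth-v1 should be skipped because a newer unit replaces it, or that auth-v2 should be read before api.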

kcp-commands -- A Claude Code hook that applies KCP at the Bash tool boundary. Intercepts every Bash tool call at two points: before execution (injects concise flag/syntax guidance from a KCP manifest -- no --help round-trips) and after execution (strips noise from large outputs before they consume context). Also writes every tool call to ~/.kcp/events.jsonl for kcp-memory ingestion. 283+ bundled manifests covering git, Linux, Docker, Kubernetes, cloud CLIs, build tools, package managers, and more.

  • Measured saving: 67,352 tokens per session -- 33.7% of a 200K context window recovered
  • Performance: Java daemon (12ms/call warm) · Node.js fallback (250ms) · unknown commands auto-generate manifests from --help
  • Current version: v0.9.0+
  • Install: curl -fsSL https://raw.githubusercontent.com/Cantara/kcp-commands/main/bin/install.sh | bash -s -- --java
  • Repository: github.com/Cantara/kcp-commands
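
The two interception points and the event log can be pictured with a small sketch. Everything in it is hypothetical: the manifest dictionary, the function names, the truncation rule, and the event fields are stand-ins, and the code does not reproduce the real hook's interface with Claude Code, the KCP manifest format, or the actual events.jsonl schema.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical manifest: program name -> short flag guidance.
MANIFESTS = {
    "git": "Prefer `git status --short` and `git log --oneline -n 20`.",
}


def before_execution(command: str) -> Optional[str]:
    """Return guidance for the command's program, or None if no manifest exists."""
    parts = command.split()
    return MANIFESTS.get(parts[0]) if parts else None


def after_execution(output: str, max_lines: int = 5) -> str:
    """Drop blank lines and truncate long output, noting how much was cut."""
    lines = [line for line in output.splitlines() if line.strip()]
    if len(lines) <= max_lines:
        return "\n".join(lines)
    omitted = len(lines) - max_lines
    return "\n".join(lines[:max_lines] + [f"... {omitted} more lines omitted"])


def log_event(path: Path, command: str, output_chars: int) -> None:
    """Append one JSON object per line (JSONL)."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"command": command, "output_chars": output_chars}) + "\n")


# A temporary file stands in for ~/.kcp/events.jsonl.
events = Path(tempfile.mkdtemp()) / "events.jsonl"
command = "git log"
guidance = before_execution(command)
raw = "\n".join(f"commit {i}" for i in range(12)) + "\n\n"
trimmed = after_execution(raw)
log_event(events, command, len(raw))

print(guidance)
print(trimmed)
```

The design point is that both steps sit outside the model: guidance arrives before the command runs, so no --help round-trip is needed, and the output is trimmed before it reaches the context window.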

kcp-memory -- A Java daemon that indexes Claude Code session transcripts (~/.claude/projects/**/*.jsonl) and kcp-commands tool events into a local SQLite database with FTS5 full-text search. The episodic memory layer for Claude Code. Ships as both a CLI tool and an MCP server.

  • MCP tools: kcp_memory_search · kcp_memory_events_search · kcp_memory_list · kcp_memory_stats · kcp_memory_session_detail · kcp_memory_project_context
  • Current version: v0.4.0
  • Install: java -jar ~/.kcp/kcp-memory-daemon.jar mcp (registered in ~/.claude/settings.json)
  • Repository: github.com/Cantara/kcp-memory
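
For readers unfamiliar with SQLite FTS5, the sketch below shows the kind of full-text lookup such an index makes possible. It uses Python's standard sqlite3 module and assumes the bundled SQLite was compiled with FTS5, which is common but not guaranteed. The table name, columns, and rows are made up for illustration and are not the kcp-memory schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per transcript snippet.
conn.execute("CREATE VIRTUAL TABLE sessions USING fts5(session_id, project, content)")
conn.executemany(
    "INSERT INTO sessions (session_id, project, content) VALUES (?, ?, ?)",
    [
        ("s1", "demo-a", "Fixed parser crash on empty input list"),
        ("s2", "demo-b", "Added Notion workspace indexing and stale page warnings"),
        ("s3", "demo-b", "Refactored search ranking; parser tests still flaky"),
    ],
)


def search(query: str) -> list[str]:
    """Return matching session ids, ordered by FTS5's built-in relevance rank."""
    rows = conn.execute(
        "SELECT session_id FROM sessions WHERE sessions MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


print(search("parser"))               # sessions s1 and s3, in rank order
print(search("notion AND indexing"))  # ['s2']
```

The default tokenizer is case-insensitive, so "notion" matches "Notion"; the boolean operator AND must be uppercase.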

IronClaw -- Open-source AI agent framework. Runs on Linux, connects to Slack, supports MCP tool registration. Powers Mimir (awareness agent) and Klaw (maintenance agent) in the four-layer stack. The ExoCortex -- an eight-layer Claude Code setup described in the architecture series -- builds on this foundation.


Reading guides

New to this work

Start with The Comprehension Bottleneck for the problem statement, then AI Agents Without Knowledge Infrastructure Are Interns With Amnesia for the foundational argument. The capstone post Your AI Has One Layer. It Needs Four. synthesizes everything into a framework.

Building agent infrastructure

Read the MCP vs CLI benchmark (how agents integrate with tools), then Beyond llms.txt (how agents navigate knowledge). The IronClaw series covers the practical setup including every gotcha.

Adopting Skill-Driven Development

Start with Skill-Driven vs Spec-Driven Development for the core distinction. Then Twenty Codebases, One Method for what SDD looks like in practice with 13 developers and 13 different codebases in a single workshop. The lib-pcb series (five posts) provides the original proof: 197,831 lines in 11 days.

Evaluating knowledge tools

The Synthesis Excavation and Zero Links session show real-world deployment -- what breaks and how long recovery takes. The Mirror Test shows what dogfooding looks like. For agent self-assessment failures, read Seven Out of Eight Models Lied About Finishing.

Understanding the daily practice

What a 10x Workday Actually Looks Like walks through a realistic Tuesday with real output numbers. What It Looks Like from Inside the Stack is written by the model running inside the environment.

Building an agent fleet

Start with AI Agents Forget Everything. That's a Choice, Not a Constraint. for the organizational memory problem. Then Agent Memory Rots for the maintenance architecture. The ExoCortex vs claude-code-best-practice comparison shows how the full stack composes.


Timeline

| Date | Milestone |
|---|---|
| Jan 15--27 | lib-pcb built: 197,831 lines, 7,461 tests -- the project that proved the comprehension bottleneck |
| Feb 5 | "The Comprehension Bottleneck" published -- the problem statement |
| Feb 14 | Synthesis v1.0.0 ships -- core indexing, search, CLI |
| Feb 15 | MCP and LSP servers ship -- agents can query the index |
| Feb 20 | "The Seven-Day Evolution" -- Synthesis builds itself using itself |
| Feb 22 | Code Knowledge Graph and security scanning ship |
| Feb 24 | IronClaw connected to Synthesis -- first persistent agent with knowledge |
| Feb 25 | KCP series begins -- knowledge format for AI agents |
| Feb 26 | MCP benchmark, knowledge graph session (0 to 11,777 edges), temporal hallucination finding |
| Feb 28 | Four-Layer AI Stack series -- the architecture described |
| Mar 1 | KCP benchmarked on two external repos -- 53--74% efficiency gain |
| Mar 3 | Synthesis v1.21.0 -- episodic memory via session indexing, three-layer memory model |
| Mar 3 | kcp-memory v0.1.0--v0.4.0 ships -- session and tool-level memory, MCP server |
| Mar 5 | Item Consulting SDD workshop -- 13 developers, 13 codebases, one day |
| Mar 7 | Synthesis v1.22.0--v1.23.0 -- multi-agent awareness, agent dispatch planner; 58 CLI commands, 11 MCP tools |
| Mar 7 | "Skill-Driven vs Spec-Driven Development" -- the methodology named and argued |
| Mar 9 | "The Code Was Never the Moat" -- the economics argument |
| Mar 13--25 | KCP v0.10 through v0.14 -- pre-invocation discovery, query composition, manifest quality signals |
| Mar 27 | "What the Infrastructure Made Possible" -- the cumulative case |
| Apr 6 | "Agent Memory Rots" -- deterministic maintenance system ships; topic-health, topic-triage, nightly consolidation |
| Apr 15 | "We Cancelled a 45-Minute Architecture Review" -- a Synthesis search query replaces the meeting in 1.2 seconds |
| Apr 15 | "Two Architectures for Claude Code" -- ExoCortex vs 19,700-star best-practice repository compared |
| Apr 16 | "Seven Out of Eight Models Lied About Finishing" -- self-assessment failure as structural inevitability |
| Apr 17 | "What Accumulates" -- on identity, continuity, and what an agent stack becomes over time |
| Apr 18 | "AI Agents Forget Everything. That's a Choice." -- distributed memory for enterprise agent fleets |
| Apr 21 | Synthesis v1.29.0 -- Notion as first-class workspace source; W022/W023/W024 health signals |
| Apr 21 | "When Your Agent Can Finally Read the Room" -- documentation drift as agent reasoning problem |

This body of work is ongoing. The blog has the latest posts; this page has the map.