Skip to content

When your agent can finally read the room

Synthesis v1.29.0 adds Notion as a first-class workspace source — 7 classes, 104 tests, and three health signals that turned out to be more interesting than the indexing itself.

Synthesis v1.29.0: Bridging the gap between the codebase and organizational context. Blueprint and topographic map aesthetic — a terminal on the left, context nodes on the right.

The wall

It always started the same way. I would point an ExoCortex agent at a codebase — say, a Spring Boot service with a couple hundred files — and ask it to figure out why the authentication flow worked the way it did. The agent would do what it does well: read the code, trace the call chain, build a model of the technical structure. And then it would stop. Not because it ran out of files, but because the why was not in the files.

The answer was in a Notion page somewhere. An architecture decision from three months ago. A compliance requirement documented during due diligence. Meeting notes where someone said "we cannot use OAuth implicit flow because the regulator said no." That context — the reasoning behind the code — lived in a completely different system, unreachable from the agent's perspective.

For a while I just lived with it. I would paste relevant Notion excerpts into agent context manually, or maintain Markdown mirrors of critical pages. The manual approach works until it does not, and the moment it stops working is when you most need it: under time pressure, switching between client contexts, running multiple agents on different tasks.

So Synthesis learned to read Notion.

The Execution Path Does Not Explain the Reasoning. The agent sees the code — GET /api/transaction, TransactionController, validate, ComplianceCheck — but misses "The Why": the architecture decision, the compliance requirement, the meeting notes saying "we cannot use OAuth implicit flow because the regulator said no."

The realization that made it simple

Notion, if you squint past the block-based UI, is a document store with a tree structure and an HTTP API. Pages have parents. Pages have children. Pages contain blocks that render as text, headings, lists, code, tables. Strip away the product design and you are looking at something structurally identical to a filesystem with Markdown files in nested directories.

Squinting Past the Product Design. Notion is simply a document store with a tree structure and an HTTP API. Pages with Parents → Nested Directories. Pages with Children → Sub-folders. Blocks (Text, Headings, Code) → Markdown Content. Strip away the UI, and Notion is structurally identical to a filesystem containing Markdown files.

Once you frame it that way, the integration design is almost boring. You need five things: an API client (with rate limiting, because Notion's API will throttle you at 3 requests per second), a block-to-Markdown converter, a mapper that turns the page hierarchy into virtual filesystem paths, a sync state tracker, and a watcher for incremental updates. That is exactly what was built.

The Architecture of a Bridge: five components in sequence. 1. API Client — Java 21 HttpClient, rate-limited to 3 req/s, automatic pagination. 2. Converter — translates 15+ block types and rich text annotations to standard Markdown. 3. Mapper — hierarchy traversal, turns parent chains into virtual paths, depth limits at 10 levels. 4. Tracker — sync state management stored via a V21 Flyway migration. 5. Watcher — incremental syncs and virtual thread polling for updates.

The NotionPageMapper walks up the parent chain for each page and constructs a path. A page called "Architecture" under "Engineering" becomes Engineering/Architecture.md. Cycle detection, depth limits at 10 levels, collision resolution by appending an ID suffix when two pages produce the same path. The virtual files get indexed in Lucene with a notion:// prefix so they are distinguishable from filesystem documents in search results.

The block converter handles 15+ block types: paragraphs, three heading levels, bulleted and numbered lists, to-do items, code blocks with language hints, quotes, callouts, dividers, images, bookmarks, tables, child page references, child databases. Unknown types render as HTML comments — lossless awareness rather than silent data loss. Rich text annotations map to Markdown formatting: bold, italic, strikethrough, inline code, links. Combined annotations work. It is the kind of tedious, detail-oriented work that benefits from having 33 tests specifically for the converter.

Lossless Awareness Over Silent Data Loss. Supported Formats: 15+ block types handled. Safety Mechanism: unknown types do not disappear — they render as HTML comments for lossless awareness. Combined Annotations: properly maps overlapping bold, italic, strikethrough, and inline code to Markdown. Backed by 33 dedicated unit tests for block conversion accuracy.

The configuration is minimal:

# .synthesis/config.yaml
notion:
  enabled: true
  token: "${NOTION_TOKEN}"
  rootPageId: "optional-root-page-id"
  pollIntervalMinutes: 15
synthesis -d /workspace init --source notion   # appends Notion config block
synthesis -d /workspace scan                   # full sync
synthesis -d /workspace watch                  # incremental polling on virtual thread
synthesis health                               # includes W022/W023/W024

How you structure a feature for AI-driven development

This was implemented by ExoCortex — specifically, Claude Opus working through 6 sequential phases, each fully tested before moving to the next:

  1. Database schema and sync state DAO (V21 Flyway migration: notion_pages + notion_sync_state)
  2. NotionClient: Java 21 HttpClient, rate-limited to 3 req/s, automatic pagination, 429 retry
  3. NotionBlockToMarkdown: 15+ block types, rich text annotations
  4. NotionPageMapper: hierarchy traversal, virtual path building, collision resolution
  5. NotionWorkspaceSource + NotionWatcher: full sync, incremental sync, virtual thread polling
  6. Health checks, ScanCommand/WatchCommand/StatusCommand/MaintainCommand integration

104 new tests across 9 test classes. Zero Notion API calls in any test — a custom StubHttpClient that delegates to a configurable response handler, no Mockito or WireMock involved. The total test suite after the feature landed: 4,428 tests, 0 failures.

The Six-Phase Execution Staircase. Each step reaches a passing test suite before the next begins: 1. DB Schema & Sync State DAO → 2. NotionClient (429 retries) → 3. NotionBlockToMarkdown → 4. NotionPageMapper (collision resolution) → 5. WorkspaceSource & NotionWatcher → 6. Health Checks & Command Integration. Testing Metrics: 104 new tests across 9 classes. 0 Notion API calls in tests (custom StubHttpClient). Total test suite: 4,428 tests, 0 failures.

The interesting question is not "can AI write code?" — that discourse is over. The interesting question is how you decompose a feature so that an AI agent can build it phase by phase without losing coherence. The answer is to think like an architect rather than a prompt engineer. Define the interfaces between phases. Ensure each phase is independently testable. Let the agent run each phase to green before starting the next. The discipline is identical to what makes software projects work with human teams — clear boundaries, verifiable intermediate states, no phase that depends on vibes from a previous phase.

Think Like an Architect, Not a Prompt Engineer. The Paradigm Shift: the question is no longer "can AI write code?" but "how do you decompose a feature so an agent can build it coherently?" Four principles: Structural Boundaries — define strict architectural interfaces between phases. Independent Verification — ensure each phase is independently testable before moving to the next. No Ambiguity — never rely on "vibes" or assumptions from a previous execution phase.

One small war story along the way: another PR landed the V20 Flyway migration on the same day we needed it for Notion. Classic. We bumped to V21 and resolved it in one commit.

The sleeper feature: health signals as organizational intelligence

I almost shipped this without the health checks. Search works, incremental sync works, pages show up in the knowledge graph — done, right? But Synthesis already had health signals for filesystem content, and it felt wrong to add an entire workspace source without the same observability.

The Sleeper Feature. The Missing Observability: Synthesis already had health signals for filesystem content. Adding external workspaces without the same observability felt fundamentally incomplete. The Unintended Discovery: what started as basic operational health monitoring — ensuring pages were syncing correctly — evolved into something vastly more valuable than just search indexing.

Three new signals:

W022 (notion-stale): The workspace has not synced in more than three times the configured poll interval, or has never synced at all. A single SQL query against notion_sync_state. If your poll interval is 15 minutes and the last sync was 90 minutes ago, something is broken and nobody noticed. If it has never synced, you configured the integration and then forgot to actually run it. Both are common failure modes, both invisible without something that actively tells you.

W023 (notion-orphan): Pages whose parent page ID does not match any known page in the workspace. Structural drift. This happens when someone reorganizes a Notion workspace — moves pages to a different section, deletes a parent without moving the children. The orphaned pages still exist and still have content, but the virtual path hierarchy is broken. An agent would find them under a path that no longer reflects reality.

W024 (notion-conflict): Two or more pages produce the same virtual filesystem path. Caught and resolved automatically during sync (collision resolution appends an 8-character ID suffix), but the health check flags it because it signals an underlying naming issue that confuses humans too.

What surprised me is that these checks, taken together, function as a lightweight audit of documentation culture. W022 tells you if your sync pipeline is healthy. W023 tells you if someone reorganized without cleaning up. W024 tells you if naming conventions have drifted. None of this was the goal — the goal was operational health monitoring — but the side effect is organizational linting.

Diagnostic Matrix: Decoding System Health. W022 (notion-stale): workspace hasn't synced in >3x the configured poll interval → a broken pipeline nobody noticed, or a forgotten configuration. W023 (notion-orphan): page parent ID does not match any known page → structural drift, someone sloppily reorganized the workspace without moving children. W024 (notion-conflict): two pages produce the exact same virtual path → naming convention drift and human confusion (auto-resolved via 8-char ID suffix).

Operational Checks as Organizational Linting. The Core Insight: taken together, these three technical error codes function as a lightweight audit of a team's documentation culture. The Unintended Value: the original goal was purely operational health monitoring. The side effect was the creation of a "linter" for human organizational chaos.

What this actually enables

When an ExoCortex agent picks up a task now, it can navigate:

  • The codebase — filesystem source, indexed as before
  • The architecture decisions — Notion
  • The product requirements — Notion
  • The compliance context — Notion (for some clients, this is a private regulatory knowledge base)
  • The meeting notes — Notion

The Contextual Breakthrough. ExoCortex Agent at the centre, connected to five sources: 1. The Codebase (Filesystem Source), 2. Architecture Decisions, 3. Product Requirements, 4. Compliance Context, 5. Meeting Notes. "An agent that reads code but not decisions is just making guesses. An agent reading the decision log understands the complete reality of the system."

Synthesis v1.29.0: Giving AI Agents the "Why" Behind the Code. The Context Gap: AI agents see what (code) but not why (context). The Bridge: Notion as a virtual filesystem — 15+ block types, intelligent path mapping, health signals. Full Context Agents: navigate codebase, architecture, requirements, compliance, and meeting notes in a single unified view. The 6-Phase Build Process: Foundation & Connectivity → Mapping & Conversion → Sync & Observability. Understanding Intent, Not Just Syntax: "The value is not better search. The value is agents with full organizational context."

The difference is not incremental. An agent that can read your code but not your architecture decisions is making guesses about intent. An agent that can also read the decision log, the requirements, the constraints — that agent understands the context the code lives in. It can answer "why is the auth flow like this?" not by inferring from code structure, but by reading the page where someone wrote down the reasoning.

This is the value: not better search. The value is agents with full organizational context.

The Evolution of Agent Intelligence. Isolated Agent (v1.28): guesses intent based purely on code structure, confined strictly to the local filesystem, blind to external rules. Contextual Agent (v1.29): reads the actual decision log to understand the why, navigates the entire filesystem + corporate workspace, fully aware of regulatory and compliance blockers natively. "The value is not just better search; the value is agents with full organizational context."

What is not there yet

Notion support is read-only. There is no write-back — agents cannot create or update Notion pages. Database properties (status, assignee, dates) are not indexed as structured metadata. There is no embedding-based semantic search over page content; Lucene full-text is good enough for most queries, but "which pages discuss rate limiting?" would benefit from vector retrieval. The incremental sync re-fetches the full page list and filters by last_edited_time client-side; Notion's API makes server-side filtering on edit time harder than it should be.

Known Boundaries and the Implementation Roadmap. 1. Read-Only Operations: no write-back capabilities, agents cannot create or update pages yet. 2. Metadata Extraction: database properties (status, assignee, dates) not yet indexed as structured metadata. 3. Semantic Search: relying entirely on Lucene full-text, lacks embedding-based vector retrieval. 4. Sync Filtering: incremental sync relies on client-side filtering via last_edited_time due to API limitations.

These are natural next steps, each one an independent feature with clear boundaries and testable intermediate states.

The shape of the thing

Every few weeks, ExoCortex gets a little more complete as a picture of how to work with AI agents properly. Not "throw everything at GPT and hope," but structured, disciplined augmentation: clear interfaces, verified states, health monitoring, proper test coverage. Synthesis started as a filesystem indexer and is now something closer to a unified workspace intelligence layer — code, documentation, organizational knowledge, all searchable, all health-checked, all exportable as KCP for agent consumption.

The Notion integration is 7 classes and 104 tests. It took about a day to build and review. The thing it enables — agents that understand why, not just what — is the part worth building toward.

ExoCortex Unified Layer. Synthesis has evolved from a basic filesystem indexer into a comprehensive workspace intelligence layer. Code, documentation, and organizational knowledge are now unified. Everything is fully searchable, health-checked, and exportable as Knowledge Context Providers (KCP) for agent consumption. "The ultimate goal is not throwing everything at an LLM and hoping. It is building structured, verified access so agents can finally understand why, not just what."


Synthesis v1.29.0 released April 21, 2026. The Notion workspace source is available to all Synthesis pilot users.