This page explains Cairn from the repository's point of view: what gets committed, what gets generated, how repo-wide search works, and how MCP clients drill from a global hit into exact document sections.
Cairn treats each source document as a separate structured index, then adds a repo-scoped layer for discovery, global ranking, and client routing. The edit, sync, and query loop repeats whenever docs change.
The repository keeps its documentation policy close to the code. The expensive derived artifacts stay local and can be regenerated.
.cairn/config.toml defines include/exclude globs, primary doc, MarkItDown policy, locale preference, and repo search diversity.
.cairn/manifest.json records freshness and document states.
.cairn/documents/<doc_id>/ stores one normal Cairn index per source document.
Cairn does not copy BookRAG, RAPTOR, or A-RAG as papers. It turns their useful retrieval patterns into a repository workflow with a stable CLI, generated indexes, and MCP tools.
During sync, every source document becomes its own
structure-aware index: section tree, summaries, entities, xrefs, and
vectors. The document stays the primary retrieval unit instead of
disappearing into anonymous chunks.
Cairn keeps the multi-level summary idea, but anchors it to the author's heading tree. Agents can start with gists and synopses, then expand only the sections that justify full text.
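To make the gist-first idea concrete, here is a minimal Python sketch of a heading tree that carries three summary levels per section and expands only the sections a caller asks for. The Section class, its field names, and the render helper are hypothetical illustrations, not Cairn's actual data model.

```python
from dataclasses import dataclass, field

LEVELS = ("gist", "synopsis", "text")  # hypothetical level names, coarse to fine


@dataclass
class Section:
    """One node of a document's heading tree (illustrative shape only)."""
    id: str
    title: str
    gist: str        # one-line summary
    synopsis: str    # short paragraph summary
    text: str        # full section text
    children: list["Section"] = field(default_factory=list)


def render(section, levels, depth=0):
    """Walk the tree; show each section at its requested level, gist by default."""
    level = levels.get(section.id, "gist")
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    lines = [f"{'  ' * depth}{section.title}: {getattr(section, level)}"]
    for child in section.children:
        lines.extend(render(child, levels, depth + 1))
    return lines


doc = Section("s0", "Guide", "Overview of the guide.", "What the guide covers.", "Full intro.", [
    Section("s1", "Install", "How to install.", "Download, run, verify.", "Full install text."),
    Section("s2", "Configure", "How to configure.", "Edit, then restart.", "Full configure text."),
])

# Only s1 and s2 are expanded beyond the gist; the root stays compact.
view = render(doc, {"s1": "synopsis", "s2": "text"})
```

The point of the shape is that the cost of a read scales with how many sections were expanded, not with the size of the document.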
The agentic part lives in the tool surface: repo discovery first, then document drilldown. Cairn exposes typed retrieval tools and lets the MCP client plan, inspect, and cite.
Repo config, manifest freshness, stale detection, per-document failure isolation, global hybrid ranking, and hit explanations are the engineering layer that makes those ideas usable in real repos.
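One way stale detection and per-document failure isolation could fit together is sketched below. It assumes the manifest maps each source path to a content hash, which this page does not state; plan_sync, sync, and index_one are hypothetical names used only for illustration.

```python
import hashlib


def content_hash(text):
    """Hash the document text; any stable digest would do for this sketch."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(sources, manifest):
    """Classify each document as fresh, stale, or new against the manifest."""
    states = {}
    for path, text in sources.items():
        recorded = manifest.get(path)
        if recorded is None:
            states[path] = "new"
        elif recorded != content_hash(text):
            states[path] = "stale"
        else:
            states[path] = "fresh"
    return states


def sync(sources, manifest, index_one):
    """Re-index every non-fresh document; one failure never stops the rest."""
    failures = {}
    for path, state in plan_sync(sources, manifest).items():
        if state == "fresh":
            continue
        try:
            index_one(path, sources[path])
            # Record the new hash only after indexing succeeded.
            manifest[path] = content_hash(sources[path])
        except Exception as exc:
            failures[path] = str(exc)
    return failures
```

Because a failed document keeps its old manifest entry (or none), the next sync sees it as stale or new and retries it without touching the documents that succeeded.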
A maintainer should be able to explain the whole system in six commands. Agents see a stable tool surface; humans get local diagnostics before putting the index behind MCP.
docsgraph init -y creates a conservative repo docs policy.
docsgraph sync --fake indexes every discovered document with deterministic local providers.
docsgraph status and docsgraph doctor show freshness, routing, and provider health.
docsgraph query repo "..." mirrors the hybrid ranker; MCP clients can call repo_context for a full context pack.
docsgraph serve --fake starts a repo-scoped MCP server with structured envelopes.
Edit docs, watch status become stale, sync again, and keep the committed policy stable.
Cairn's repo mode is intentionally two-stage. First, search across the
repository to pick candidate sections. Then, call the normal document tools
with a doc argument to inspect the exact structure and text of one document.
search_documents blends dense vectors, lexical field support, BM25-style sparse evidence, doc/path identity, and graph-neighborhood support.
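A blend of this kind can be pictured as a weighted sum over per-signal scores, with the per-signal contributions doubling as a hit explanation. The sketch below is an assumption about the general shape only: the signal keys, the weights, and the blend, rank, and explain helpers are hypothetical, and the real ranker may combine evidence differently.

```python
# Hypothetical weights for the five kinds of evidence named above.
WEIGHTS = {
    "dense": 0.4,     # vector similarity
    "lexical": 0.2,   # field-level term matches
    "sparse": 0.2,    # BM25-style evidence
    "identity": 0.1,  # doc/path name matches
    "graph": 0.1,     # support from neighboring sections
}


def blend(signals):
    """Weighted sum of signals, each assumed to be normalized to [0, 1]."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


def rank(candidates):
    """Order section ids by blended score, breaking ties by id for stability."""
    return sorted(candidates, key=lambda sid: (-blend(candidates[sid]), sid))


def explain(signals):
    """Per-signal contributions, largest first."""
    parts = {name: weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items()}
    return sorted(parts.items(), key=lambda item: -item[1])


candidates = {
    "guide#install": {"dense": 0.9, "lexical": 0.5},
    "api#auth": {"dense": 0.5, "sparse": 1.0, "identity": 1.0},
}
order = rank(candidates)
```

In this toy data the section with the weaker dense score still wins because sparse and identity evidence agree on it, which is the practical argument for blending rather than ranking on vectors alone.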
repo_context composes ranked hits, compact section content, hit explanations, local relationships, and a relationship map in one MCP call.
repo_graph returns the docs relationship map. repo_impact reports derived artifacts and docs surfaces affected by document or section changes.
sections_per_doc defaults to the repo config so agents can discover the right document before going deep.
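If sections_per_doc is read as a cap on how many hits a single document may contribute, which is an interpretation rather than something this page spells out, the effect can be sketched as a single pass over the ranked list. The diversify function and its limit parameter are hypothetical.

```python
def diversify(ranked_hits, sections_per_doc, limit):
    """Keep at most sections_per_doc hits per document, preserving rank order."""
    kept = []
    per_doc = {}
    for doc, section in ranked_hits:
        if len(kept) >= limit:
            break
        if per_doc.get(doc, 0) >= sections_per_doc:
            continue  # this document already has its share of the results
        per_doc[doc] = per_doc.get(doc, 0) + 1
        kept.append((doc, section))
    return kept


hits = [("guide", "a"), ("guide", "b"), ("guide", "c"), ("api", "x"), ("faq", "q")]
top = diversify(hits, sections_per_doc=2, limit=3)
```

With a cap of two, the third guide hit is skipped and the api hit takes its place, so the result surfaces a second candidate document instead of a third section from the first.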
get_section(doc=..., id=...), outline(doc=...), expand(doc=...), and read_range(doc=...) return exact slices with stable anchors.
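The two-stage flow can be sketched end to end with in-memory stand-ins. The functions below only borrow the tool names search_documents and get_section; their bodies, the REPO dictionary, the hit shape, and the answer helper are hypothetical and do not reflect the real MCP envelopes.

```python
# In-memory stand-in for a repo index: doc id -> section id -> text.
REPO = {
    "guide": {"install": "Run the installer.", "config": "Edit the config file."},
    "api": {"auth": "Send a token with each request."},
}


def search_documents(query):
    """Stage one stand-in: naive substring match across every document."""
    needle = query.lower()
    return [
        {"doc": doc, "id": section_id}
        for doc, sections in REPO.items()
        for section_id, text in sections.items()
        if needle in text.lower()
    ]


def get_section(doc, id):
    """Stage two stand-in: exact text of one section in one document."""
    return REPO[doc][id]


def answer(query):
    """Discover candidates repo-wide, then drill into each with doc and id."""
    hits = search_documents(query)
    return [(hit["doc"], hit["id"], get_section(doc=hit["doc"], id=hit["id"])) for hit in hits]


result = answer("token")
```

The hit carries just enough identity (doc plus section id) for the second call, which keeps the global search response small and leaves full text to the drilldown step.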