Metadata-Version: 2.4
Name: wikimoth
Version: 0.2.1
Summary: Deterministic, token-minimal, reproducible memory for Claude and agents: wikilink-graph retrieval, then compaction, then a Claude reader.
Author: Julian Geymonat
License: Apache-2.0
Project-URL: Homepage, https://github.com/juliangeymonat-jpg/wikimoth
Keywords: rag,memory,agents,claude,wikilinks,multi-hop,deterministic
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: hybrid
Requires-Dist: rank_bm25; extra == "hybrid"
Provides-Extra: dense
Requires-Dist: transformers; extra == "dense"
Requires-Dist: torch; extra == "dense"
Requires-Dist: numpy; extra == "dense"
Provides-Extra: claude
Requires-Dist: anthropic; extra == "claude"
Provides-Extra: tokens
Requires-Dist: tiktoken; extra == "tokens"
Provides-Extra: headroom
Requires-Dist: headroom; extra == "headroom"
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: rank_bm25; extra == "dev"
Dynamic: license-file

# WikiMoth

<p align="center">
  <img src="https://raw.githubusercontent.com/juliangeymonat-jpg/wikimoth/main/docs/img/wikimoth-logo.png" width="88" alt="WikiMoth">
</p>

### Connects the dots. The same way, every time.

**[wikimoth.com](https://wikimoth.com)** · `pip install wikimoth`

**Deterministic, token-minimal, auditable memory for Claude and agents.** Point WikiMoth at a
folder of `[[wikilink]]` notes (an Obsidian vault, or Claude's own memory folder) and it
follows the authored links to the answer flat search can't reach, shows you the exact note-chain
behind it, and feeds the reader ~99% fewer tokens than pasting the whole vault. Pure markdown,
no GPU, no vector DB, no LLM in the retrieval loop.

<img src="https://raw.githubusercontent.com/juliangeymonat-jpg/wikimoth/main/docs/img/connect-the-dots.png" alt="One question, three hops: WikiMoth follows your authored links to the note that holds the answer. Flat search stops at the first keyword match.">

```bash
pip install wikimoth
wikimoth install      # capture: turn your Claude Code sessions into a [[wikilink]] vault
wikimoth serve        # browse the vault + see "what memory fed this answer"
```

---

## Why not just let Claude manage its own context?

We benchmarked exactly that. An agent that browses the notes folder and prunes its own
context reaches the **same answers**, multi-hop included (12/12 in our run). It just pays for it:
**4 to 6 model round-trips and roughly 10x the billed tokens per question**, because it re-sends a
growing transcript every step. WikiMoth retrieves the same note-chain in **one deterministic pass,
no model in the loop**, and shows you the exact notes behind the answer.

<img src="https://raw.githubusercontent.com/juliangeymonat-jpg/wikimoth/main/docs/img/agent-vs-wikimoth.png" alt="Same answer, far less work: letting Claude prune its own context takes 4 to 6 model round-trips, about 10x the billed tokens, and roughly 9 seconds per answer; WikiMoth does it in one deterministic retrieval pass with zero model calls in the loop, in milliseconds, with an auditable note-chain. Both reach the right answer 12 out of 12.">

<sub>Real run, Claude Sonnet 4.6, 12 multi-hop questions on a reproducible vault. The ~10x counts a
reader on both sides; it is corpus-specific, not a universal law. Reproduce it with
`python scripts/run_agentic_benchmark.py`. Full breakdown in [Honest limits](#honest-limits).</sub>

---

## Why WikiMoth

Most agent memory is either *paste the whole notes folder into context* (expensive, and the model
gets lost in the middle) or *LLM-summarised similarity search* (lossy, and non-deterministic:
the same question can return different memory next week). WikiMoth takes a different bet: your notes
**are** the store (plain markdown), the graph is **authored** (your `[[wikilinks]]`, no embeddings
to train or drift), and retrieval is **code, not a model**, so it's reproducible and you can read
exactly why each note was chosen.

|  | **WikiMoth** | BM25 | Vector RAG | claude-mem | LLM Wiki (Karpathy) |
|---|:--:|:--:|:--:|:--:|:--:|
| Connects the dots (multi-hop over authored `[[links]]`) | ✅ | ❌ | ❌ | ❌ | ✅ *(agentic)* |
| Deterministic retrieval (same query → same result) | ✅ | ✅ | ✅ | ❌ | ❌ |
| No LLM call to retrieve | ✅ | ✅ | ✅ | ~ | ❌ |
| Auditable note-chain (which notes produced the answer) | ✅ | ~ | ❌ | ❌ | ~ |
| Direct-lookup recall@8 (real vault) | 1.00 | 1.00 | 1.00 | ~ | ~ |
| No GPU / no vector DB / no index build | ✅ | ✅ | ❌ | ~ | ✅ |
| Plain-markdown store (open in any editor) | ✅ | ~ | ❌ | ❌ | ✅ |
| Token-minimal vs dumping the vault | ✅ −99% | ✅ −99% | ✅ −99% | ✅ | ~ |
| Deterministic, API-free auto-capture | ✅ | ❌ | ❌ | ❌ | ❌ |
| Hygiene **without an LLM** (conflicts · dupes · stale · supersede) | ✅ | ❌ | ❌ | ~ | ❌ |

<sub>LLM Wiki *follows* links and skips the vector DB like WikiMoth, but an **LLM writes and reads** the wiki, so retrieval is agentic (an LLM call per recall, not reproducible), while its curated pages are richer. `~` = partial / not independently benchmarked.</sub>

The edge is the **combination**, not higher recall: WikiMoth *matches* flat search on the basics and
adds connect-the-dots + determinism + an audit trail + a plain-markdown store. See
[Honest limits](#honest-limits) for exactly where it ties and where it wins.

### Compared to Karpathy's LLM Wiki

WikiMoth shares the substrate Andrej Karpathy's *LLM Wiki* pattern popularised: plain-markdown
`[[wikilink]]` notes, no vector DB, but flips the **engine**. In the LLM-Wiki pattern an **LLM
writes *and* reads** the wiki: rich, source-cited pages, but recall is *agentic* (it costs an LLM
call and the path isn't reproducible). WikiMoth computes the edges in **code** and retrieves with a
**fixed algorithm, no LLM in the loop** → the same note-chain every time, reproducible and
auditable. They're complementary, not competing: point WikiMoth at a Karpathy-style wiki and you get
deterministic multi-hop retrieval over it. (We don't claim to be "better" than the LLM Wiki: it
curates richer pages; we retrieve deterministically.)

## Quickstart (read)

```python
from wikimoth import MemoryRAG, EchoReader

rag = MemoryRAG(reader=EchoReader())          # API-free default reader
rag.index("path/to/your/wikilink/vault")      # notes → ~400-token chunks, graph built

chunks, tokens = rag.retrieve("a connect-the-dots question?", top_k=8)
print(f"{len(chunks)} chunks, {tokens} tokens to feed the reader")

print(rag.answer("a connect-the-dots question?"))   # retrieve → compact → read
```

Swap in a real Claude answer (only touches the API when constructed):

```python
from wikimoth import MemoryRAG, ClaudeReader
rag = MemoryRAG(reader=ClaudeReader(model="claude-sonnet-4-6"))   # needs ANTHROPIC_API_KEY
```

## See what memory fed an answer: `wikimoth serve`

```bash
wikimoth serve                 # serves http://127.0.0.1:8765 (local-only)
wikimoth serve --vault PATH --port 8080
```

A zero-dependency local web viewer (pure stdlib, no Flask, no JS framework, no network):

- **browse + search** your notes,
- the authored **`[[wikilink]]` graph** (the same edges the retriever walks),
- and the one that matters, **"what memory fed this answer"**: type a question and see the exact
  note-chain WikiMoth would feed a reader, with per-chunk hop distance, token counts, and the `−N%`
  vs dumping the whole vault. Retrieval only: no LLM call, no API key, deterministic.

Because the store is plain markdown, you can equally open the same vault in Obsidian or VS Code;
the viewer is a convenience, not a lock-in.

## In the agent loop: `wikimoth mcp`

`wikimoth serve` is for *you*. The MCP server is for the *model*: it exposes the same deterministic
retrieval over the Model Context Protocol, so Claude calls it itself instead of you fetching context
by hand.

```bash
# 1. install into the Python that runs your Claude Code
python -m pip install wikimoth

# 2. verify the command resolves (prints status, then exits)
python -m wikimoth status

# 3. register the MCP server with Claude Code
claude mcp add wikimoth -- python -m wikimoth mcp
```

Step 2 is the check that matters: if `python -m wikimoth status` prints a status line, then
`python -m wikimoth mcp` will run for Claude too. Use the **same `python`** in all three steps (it is
`python3` on some systems); that is the one thing that has to match.

Prefer the Node world, or no Python set up? One line, no toolchain matching:

```bash
claude mcp add wikimoth -- npx -y wikimoth-mcp
```

The [`wikimoth-mcp`](https://www.npmjs.com/package/wikimoth-mcp) launcher finds a Python that has
WikiMoth (or `uvx`-installs one on the fly), injects the vault path so the server never reads an
empty folder from the client's working directory, and passes the MCP channel through untouched. The
same `npx -y wikimoth-mcp` works as the server command in any `mcpServers` config (Claude Desktop,
Cursor, Windsurf); set `WIKIMOTH_VAULT` to your vault.

Now Claude has a `recall(query)` tool. Ask it something that lives in your notes and it calls
`recall`; WikiMoth walks the `[[links]]` and hands back the exact note-chain (no LLM call to
retrieve, token-minimal, the same result every time), and Claude answers from it. A `status` tool
reports the connected vault. For any other MCP client, use `python -m wikimoth mcp` as the server
command (stdio transport); point it at a specific vault with `--vault PATH`.

`python -m wikimoth mcp` is the portable form (it runs wherever the package is installed). The bare
`wikimoth mcp` works too when the console script is on your PATH. It is pure stdlib: a hand-rolled
JSON-RPC 2.0 stdio server, no MCP SDK dependency.

mcp-name: io.github.juliangeymonat-jpg/wikimoth

## Capture: sessions → notes (the write half)

Retrieval needs a `[[wikilink]]` vault; hand-authoring one is the friction. `wikimoth.capture` builds
it automatically by installing Claude Code lifecycle hooks that turn each session into **one**
deterministic markdown note.

**The invariant that matters:** a note's `[[wikilinks]]` (the graph edges) are computed by code
(string/path matching), **never by a model**. An LLM may *optionally* draft the summary prose
(`WIKIMOTH_LLM_PROSE=1`), but any `[[...]]` it emits is stripped, never parsed as an edge. So the
graph is reproducible (same session + vault → same edges) and auditable. Default capture is fully
deterministic and makes **zero API calls**.

```bash
wikimoth install                 # writes 5 hooks into ./.claude/settings.json (absolute interpreter path)
wikimoth install --user          # ~/.claude/settings.json instead
wikimoth install --vault PATH    # choose where notes go (sets WIKIMOTH_VAULT)
wikimoth status                  # vault, note/session/buffer counts, hook state
wikimoth uninstall               # remove the hooks again
```

Lifecycle: **SessionStart** recalls recent sessions into context · **UserPromptSubmit / PostToolUse**
buffer the session · **Stop / SessionEnd** write one note. The captured notes are exactly what the
read pipeline indexes; capture and retrieval close the loop.

## Keep memory honest: the hygiene suite

A memory that only grows rots. Notes go stale, two notes start disagreeing, the same fact gets saved
twice, an old fact is replaced but never retired. Other agent-memory tools resolve this *with an LLM*
that silently overwrites the old state. WikiMoth ships six commands that surface it **deterministically**,
and **never delete anything**: git is your audit trail.

| command | finds | writes? |
|---|---|:--:|
| `wikimoth conflicts` | two notes asserting a different value for the same fact (type-aware, valid-time precision) | no |
| `wikimoth lint` | broken links, orphans, stubs, stale notes, supersession cycles | no |
| `wikimoth dedup` | exact + near-duplicate notes (MinHash + LSH, confirmed by exact Jaccard) | no |
| `wikimoth decay` | notes going cold: old, rarely linked, rarely recalled (a review queue, never auto-delete) | no |
| `wikimoth recall --as-of <date>` | what your memory asserted on a past date (bitemporal time-travel, no DB) | no |
| `wikimoth supersede OLD NEW` | retire a fact: invalidate, don't delete | **yes** |

Three principles hold across all of them:

- **Invalidate, don't delete.** `supersede` marks a note superseded and links it to its replacement
  in frontmatter. The old body drops out of retrieval, but its `[[link]]` to the current note stays
  live, so a query that lands on the stale note **hops free to the new one**. Nothing is ever `rm`'d;
  the history lives in git.
- **The tool never guesses.** Every command emits *candidates*. The one judgement that isn't
  mechanical, deciding whether two notes truly contradict, is left to the calling model, never baked
  into the tool. So the output is reproducible and you can read exactly why each candidate was flagged.
- **Bitemporal, no database.** `recall --as-of 2026-01-01` replays what the vault asserted on that
  date from frontmatter validity windows alone. No event log, no vector store, no migration.

Same invariants as the rest of WikiMoth: pure stdlib, deterministic (byte-identical output),
read-only except the single `supersede` writer, plain markdown you can diff. The seven commands are
also exposed over MCP (`list_conflicts`, `list_lint`, `list_duplicates`, `list_fading`, `supersede`,
plus `recall` gaining `as_of` / `show_superseded`), so the agent can keep its own memory clean.

## Install

<img src="https://raw.githubusercontent.com/juliangeymonat-jpg/wikimoth/main/docs/img/zero-infra.png" alt="In the retrieval loop: 0 GPUs, 0 vector DBs, 0 LLM calls.">

WikiMoth's core is **pure stdlib** (`dependencies = []`): the retrieval engine, chunker, wikilink
graph, pipeline and capture are all vendored under `wikimoth/`: nothing extra to install, no GPU, no
vector DB.

```bash
pip install wikimoth
# optional extras:
pip install "wikimoth[hybrid]"          # optional BM25-seeded retriever variant
pip install "wikimoth[claude,tokens]"   # real Claude reader + exact tiktoken counts
```

Extras: `hybrid` = BM25-seeded retriever (`rank_bm25`) · `claude` = the `anthropic` reader ·
`tokens` = exact token counts (`tiktoken`) · `dense` = the dense benchmark baseline · `headroom` =
reversible CCR compaction.

## How it works

<img src="https://raw.githubusercontent.com/juliangeymonat-jpg/wikimoth/main/docs/img/pipeline.png" alt="Pipeline: retrieve by walking the links, compact to the note-chain (about 5k tokens), read plain markdown with an audit trail, then capture new facts back.">

**retrieve → compact → read.** `index()` splits each note into ~400-token chunks (~50 overlap),
keeping per-chunk note identity so the `[[wikilink]]` graph still connects across chunks (multi-hop
at chunk granularity). `GraphRetriever(source="wikilinks")` seeds lexically, then walks the authored
links, so a passage *not* lexically similar to the question but *reachable by a link* still gets
pulled. An optional compaction stage (reversible CCR via
[`chopratejas/headroom`](https://github.com/chopratejas/headroom)) shrinks passages further before
the (paid) reader; it degrades to a no-op if headroom isn't installed.

A pure-navigation hub (a table-of-contents like `MEMORY.md`) can be indexed as **graph edges only**
(`exclude_content`, default `("MEMORY.md",)`): its `[[links]]` build edges and it stays a BFS
waypoint, but its own chunks never reach the reader.

Every stage is constructor-injectable via `MemoryRAG(retriever=…, compactor=…, reader=…)`, so you can
swap the retriever (e.g. the BM25-seeded `HybridRetriever`), the compactor, or the reader.

## Benchmark: tokens fed to the reader

<img src="https://raw.githubusercontent.com/juliangeymonat-jpg/wikimoth/main/docs/img/token-cost.png" alt="Tokens to answer one question: about 482,000 to paste the whole vault vs about 5,000 for the WikiMoth note-chain, a 99% cut versus dumping the vault.">

`wikimoth.benchmark.harness` measures *tokens fed to the reader* (what you actually pay for) across
arms over the **same** vault and questions:

| arm | feeds the reader | status |
|---|---|---|
| `dump` | the whole vault | baseline |
| `deterministic` | wikilink-graph retrieval | implemented |
| `deterministic_compacted` | retrieval + Headroom | implemented |
| `agentic` | an LLM browses and prunes its own context | implemented (Claude tool-use) |

No paid API calls run by default; every arm's reader defaults to the API-free `EchoReader`.

## Honest limits

<img src="https://raw.githubusercontent.com/juliangeymonat-jpg/wikimoth/main/docs/img/multi-hop.png" alt="Link-only answers reached: BM25 0%, vector/dense 0%, WikiMoth 100%, on link-only corpora. On direct lookups all three tie at recall@8 = 1.00.">

<img src="https://raw.githubusercontent.com/juliangeymonat-jpg/wikimoth/main/docs/img/determinism.png" alt="Same query run five times: WikiMoth returns one distinct result, LLM-based memory varies run to run.">

WikiMoth's value is **deterministic, auditable, token-minimal, plain-markdown** memory with a real
multi-hop capability, not "better retrieval than BM25". Specifically:

- **The −99% is vs *dumping the vault*** (≈5k vs ~482k tokens on a real 356-note vault), not vs
  BM25: a tuned BM25-RAG also feeds ~5k. The win is against the realistic status quo (paste
  everything / naive whole-note RAG), and it's deterministic.
- **On a typical real vault, retrieval ≈ BM25.** Direct-lookup recall@8 ties at 1.00. The
  multi-hop / connect-the-dots win (0% → up to 100% where flat search scores zero) shows up on
  **curated, link-heavy** corpora; on an average vault, hybrid is *never worse than BM25*, not
  strictly better on recall.
- **Determinism** is inherent to any static retriever (BM25/dense too); WikiMoth's determinism win is
  specifically **vs LLM-summarised memory** (which varies run to run).
- **vs letting the model prune its own context** (the `agentic` arm, real run against Claude
  Sonnet 4.6, 12 multi-hop questions): the agent reaches the same answers, multi-hop included
  (12/12). The difference is cost. It takes 4 to 6 paid round-trips and about 10x the billed tokens
  per question, because it re-sends a growing transcript each step, where WikiMoth answers from one
  deterministic pass with no model call in the retrieval loop and an auditable note-chain. The
  multiple is corpus-specific, not a law. Reproduce it: `python scripts/run_agentic_benchmark.py`.

## Pluggable + License

`MemoryRAG(retriever=…, compactor=…, reader=…)`; defaults `GraphRetriever(source="wikilinks")` /
`NoOpCompactor` / `EchoReader`. Anything satisfying the small Protocols drops in.

Apache-2.0; see [LICENSE](LICENSE). © 2026 Julian Geymonat.
