Metadata-Version: 2.4
Name: thought-mcp
Version: 0.4.0
Summary: THOUGHT — Temporal Hierarchical Object Union & Graph Hybrid Toolkit. A local MCP memory server with bi-temporal graph + vector + temporal layers, a query router, and a consolidation engine.
Project-URL: Homepage, https://github.com/RNBBarrett/thought-mcp
Project-URL: Documentation, https://github.com/RNBBarrett/thought-mcp#readme
Project-URL: Issues, https://github.com/RNBBarrett/thought-mcp/issues
Project-URL: Source, https://github.com/RNBBarrett/thought-mcp
Project-URL: Changelog, https://github.com/RNBBarrett/thought-mcp/blob/main/CHANGELOG.md
Author: Richard Barrett
License-Expression: MIT
License-File: LICENSE
Keywords: ai,graphrag,knowledge-graph,llm,mcp,memory,temporal,vector-search
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: mcp>=1.9
Requires-Dist: numpy>=1.26
Requires-Dist: psutil>=5.9
Requires-Dist: pydantic-settings>=2.2
Requires-Dist: pydantic>=2.6
Requires-Dist: python-ulid>=2.7
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: scipy>=1.13
Requires-Dist: sqlite-vec>=0.1.6
Requires-Dist: structlog>=24.1
Requires-Dist: typer>=0.12
Provides-Extra: all
Requires-Dist: mcp>=1.9; extra == 'all'
Requires-Dist: pygit2>=1.15; extra == 'all'
Requires-Dist: sentence-transformers>=2.7; extra == 'all'
Requires-Dist: spacy>=3.7; extra == 'all'
Requires-Dist: sqlite-vec>=0.1.6; extra == 'all'
Requires-Dist: tree-sitter-python>=0.23; extra == 'all'
Requires-Dist: tree-sitter-typescript>=0.23; extra == 'all'
Requires-Dist: tree-sitter>=0.23; extra == 'all'
Provides-Extra: code
Requires-Dist: tree-sitter-python>=0.23; extra == 'code'
Requires-Dist: tree-sitter-typescript>=0.23; extra == 'code'
Requires-Dist: tree-sitter>=0.23; extra == 'code'
Provides-Extra: dev
Requires-Dist: freezegun>=1.5; extra == 'dev'
Requires-Dist: hypothesis>=6.100; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-benchmark>=4.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Provides-Extra: embeddings-bge
Requires-Dist: flagembedding>=1.2; extra == 'embeddings-bge'
Provides-Extra: embeddings-local
Requires-Dist: sentence-transformers>=2.7; extra == 'embeddings-local'
Provides-Extra: embeddings-openai
Requires-Dist: openai>=1.30; extra == 'embeddings-openai'
Provides-Extra: git
Requires-Dist: pygit2>=1.15; extra == 'git'
Provides-Extra: llm-anthropic
Requires-Dist: anthropic>=0.30; extra == 'llm-anthropic'
Provides-Extra: llm-ollama
Requires-Dist: httpx>=0.27; extra == 'llm-ollama'
Provides-Extra: llm-openai
Requires-Dist: openai>=1.30; extra == 'llm-openai'
Provides-Extra: mcp
Requires-Dist: mcp>=1.9; extra == 'mcp'
Provides-Extra: ner
Requires-Dist: spacy>=3.7; extra == 'ner'
Provides-Extra: postgres
Requires-Dist: pgvector>=0.3; extra == 'postgres'
Requires-Dist: psycopg[binary]>=3.1; extra == 'postgres'
Provides-Extra: sqlite-vec
Requires-Dist: sqlite-vec>=0.1.6; extra == 'sqlite-vec'
Description-Content-Type: text/markdown

# THOUGHT

[![PyPI](https://img.shields.io/pypi/v/thought-mcp.svg)](https://pypi.org/project/thought-mcp/)
[![Python](https://img.shields.io/pypi/pyversions/thought-mcp.svg)](https://pypi.org/project/thought-mcp/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![CI](https://github.com/RNBBarrett/thought-mcp/actions/workflows/ci.yml/badge.svg)](https://github.com/RNBBarrett/thought-mcp/actions/workflows/ci.yml)
[![Docker](https://github.com/RNBBarrett/thought-mcp/actions/workflows/docker.yml/badge.svg)](https://github.com/RNBBarrett/thought-mcp/actions/workflows/docker.yml)
[![GHCR](https://img.shields.io/badge/ghcr.io-thought--mcp-blue?logo=docker)](https://github.com/RNBBarrett/thought-mcp/pkgs/container/thought-mcp)

**T**emporal **H**ierarchical **O**bject **U**nion & **G**raph **H**ybrid **T**oolkit — a local MCP memory server that gives any LLM a persistent, auditable memory fabric on your own machine.

> OB1 stores your thoughts. Karpathy's wiki compiles your knowledge. **THOUGHT** remembers with provenance, understands relationships, detects contradictions, never forgets what used to be true — and routes every query to the right mathematical structure before touching a single byte of data.

---

## ✨ New in v0.4 — Lifecycle commands + Local LLMs + Cypher + Ask

Four headline capabilities:

1. **`thought db`** — `size` / `flush` / `backup <file>` / `load <file>` / `inspect <file>`. With `--before` / `--since` / `--time-axis` to slice the KB by date. Always auto-backs-up before destructive operations.
2. **Local-LLM integration** — `embedder.choice = "ollama"` / `"lmstudio"` / `"openai-compat"`. `thought ollama-setup` and `thought lmstudio-setup` discover models and write your `thought.toml`. Same provider switch covers auto-write's `--mode extract` so durable-fact extraction runs locally too.
3. **Cypher query layer** — `thought query "MATCH (p:PERSON)-[:WORKS_AT]->(o) RETURN p.name, o.name"`. Saved views (`thought view save adidas_seattle "..."`) turn queries into named, re-evaluating memory constructs. `thought schema` shows what's queryable. `AS_OF` for time-travel.
4. **`thought ask` in English** — natural-language → Cypher via your configured LLM (Anthropic / Ollama / LM Studio / OpenAI-compat). Validates against the parser; bad translations degrade to plain `recall` so you always get something. `--save-as <name>` harvests good translations into saved views.

```bash
# Backup before doing something risky:
thought db backup ./snap.db && thought db flush --since 2026-01-01 --time-axis valid --yes

# Run with Ollama (no API keys, fully local):
ollama serve && ollama pull nomic-embed-text
thought ollama-setup --write

# Ask in English (uses whichever LLM is configured):
thought ask "who at Acme also prefers Adidas?" --explain --save-as adidas_acme

# Or write Cypher directly:
thought query "MATCH (p:PERSON)-[:PREFERS]->(:CONCEPT {name:'Adidas'}) RETURN p.name"
```

See [CHANGELOG.md](CHANGELOG.md#040--2026-05-15--db-lifecycle--local-llms--cypher--ask) for the full v0.4 list. v0.3's auto-memory hooks (below) and v0.2's code-vertical surface are unchanged — v0.4 is purely additive.

---

## ✨ Previously in v0.3 — Auto-memory + topic browsing

**Memory that captures + retrieves itself, plus a way to see what's in there.** Wire two Claude Code hooks into your project, restart, and from that point on every assistant turn is auto-ingested and every user prompt is pre-recalled against the KB — no explicit `remember` / `recall` calls needed. Discoverability: a new `thought topics` command lists what the KB *contains* (people, places, functions, concepts…) and `thought browse <topic>` drills into a specific area.

```bash
# One command wires both hooks into Claude Code (project-scoped settings.json).
thought hook install --both

# What's in my memory right now?
thought topics
#  type           count  examples
#  CONCEPT        89     Acme, Adidas, dessert
#  function       425    personalized_pagerank, recall, remember
#  PERSON         47     Alice, Bob, Dana
#  ORGANIZATION   12     Acme Corp, Beta, OpenAI

# Drill into a topic (or an entity name)
thought browse dessert --depth 2
#  #   type     entity   score
#  1   CONCEPT  donut    0.42
#  2   CONCEPT  cake     0.31
#  3   CONCEPT  pastry   0.18
```

The auto-recall hook uses the `additionalContext` field on Claude Code's `UserPromptSubmit` event (10k-char cap, low-confidence gated so it doesn't pollute context). The auto-write hook runs on `Stop`, picks the last user + last assistant turns out of the session transcript, and pipes them into the existing ingest pipeline — content-sha256 idempotency + Jaccard dedup keep replays from doubling the KB. Optional `--mode extract` routes each turn through Haiku first for higher-signal facts (~$0.001/turn).

Two new MCP tools are also exposed for agents that want the discoverability primitives directly: **`mcp__thought__list_topics`** and **`mcp__thought__browse_topic`**.

What's in the bi-temporal model already does the heavy lifting on contradictions — if you say "I prefer Adidas" on Monday and "I prefer Nike" on Friday, the second auto-write supersedes the first via a `CONTRADICTS` + `SUPERSEDES` edge pair, exactly like Zep's temporal validity windows from the recent literature.

```bash
# Per-hook install if you'd rather pick them à la carte:
thought hook install --recall          # auto-recall only
thought hook install --write           # auto-write only
thought hook install --write --scope user  # global to your home dir
```

See [CHANGELOG.md](CHANGELOG.md) for the full v0.3 list. v0.2's code-vertical surface (below) and v0.1's horizontal-memory surface are unchanged — v0.3 is purely additive.

---

## ✨ Previously in v0.2 — Memory for AI coding agents

v0.2 specialises the same architecture for the workflow with the strongest natural fit: **AI-assisted coding**. THOUGHT now parses your source with tree-sitter, builds a real function-call graph as typed edges, and stamps every fact with its git commit. The bi-temporal `as_of` queries you already had now answer *"what did the codebase look like at commit X?"* for free.

```bash
thought ingest-code src/                      # tree-sitter ingest, multi-language
thought ingest-git . --mode full              # stamp every commit
thought callers GraphLayer.personalized_pagerank
#  #  score    type    entity                              file
#  1  0.0132   method  Dispatcher._dispatch_code           Dispatcher
#  2  0.0130   method  Dispatcher._dispatch_fact           Dispatcher
#  3  0.0122   method  CodeLayer.impact_set                CodeLayer
#  4  0.0110   method  CodeLayer.callers_of                CodeLayer
thought impact authenticate_user              # what's affected if I change this?
thought diff --from v1.0 --to HEAD            # set diff between two commits
```

**Real measurement on this codebase**: 38 files → 425 entities → 575 CALLS edges in **<250 ms**. The killer-demo query *"who calls GraphLayer.personalized_pagerank"* returns the four real callers ranked by Personalized PageRank in **60 ms** on a 1086-edge graph.

What's new:

- **AST-aware ingest** via tree-sitter — Python + TypeScript / JavaScript out of the box, multi-language plugin shape for the rest.
- **Function-call-graph edges** — `CALLS`, `IMPORTS`, `INHERITS_FROM`, `OVERRIDES`, `DEFINES` as typed edges. The Graph Layer's HippoRAG-style PageRank then powers ranked callers / impact-set queries.
- **Git-history stamping** — every entity carries `code_commit_sha`. `thought diff --from <sha1> --to <sha2>` returns the set difference of functions between two commits.
- **New Router CODE class** — natural-language queries like *"who calls authenticate_user"* route through the call-graph machinery automatically.
- **5 new CLI commands**: `ingest-code`, `ingest-git`, `callers`, `impact`, `diff`.

See [CHANGELOG.md](CHANGELOG.md) for the full v0.2 list. The v0.1 horizontal-memory surface below is unchanged — v0.2 is purely additive.

---

## What is THOUGHT?

THOUGHT is a **memory server for LLMs**. You install it on your machine, wire it into your AI coding assistant (Claude Code, Cursor, Cline, Continue, Windsurf), and now your assistant has a brain that persists across conversations and across projects.

Everything runs **locally** — your memory is a single SQLite file on your laptop. No cloud, no account, no sync service, no API key.

### The problem it solves

Out of the box, AI coding assistants have goldfish memory. Every new conversation starts blank. If you told it last week to *"always use Postgres for v2 features,"* you'll be telling it again today. If you decided in March that *"the auth module is being rewritten,"* by April that context is gone.

Existing fixes don't really solve this — they trade one problem for another:

| Common workaround | What goes wrong |
|---|---|
| **Stuff context into your system prompt** | You hit token limits fast, and the model can't tell what's current vs. obsolete. |
| **Cloud memory** (ChatGPT, Claude Projects) | Locked to one vendor, no audit log, can't query *"as of last week,"* no contradiction handling. |
| **RAG over your notes** (mem0, Letta, …) | Stores facts as flat vectors. No relationships between facts, no time tracking, no provenance, no notion of "this used to be true." |
| **An LLM-maintained Markdown wiki** (Karpathy's gist) | Lossy by design (the LLM summarises everything), grows linearly, no semantic search, no temporal queries. |

THOUGHT fixes the structural issues, not just the symptoms.

### What you get

Once installed, your AI assistant gains two new tools it can call automatically when the conversation implies it:

- **`remember(content)`** — *"note that we decided X."* THOUGHT extracts the entities and relationships, embeds them for similarity search, and links everything to its source so you can audit later.
- **`recall(query)`** — *"what did we decide about X?"* THOUGHT figures out what kind of question you asked, routes it to the right retrieval strategy, and returns at most 10 hits — each tagged with how trustworthy it is.

You can also drive it from your terminal (CLI) or use the Python API directly.

### Why it's better than existing solutions

The TL;DR, in plain English:

- **It knows when facts changed.** Every fact carries two timestamps: when it was true in the world, and when the system learned it. *"What did we say about pricing on Jan 15?"* actually works — even if pricing changed on Feb 3.

- **It tracks how facts relate.** Functions, classes, people, projects, decisions — they're all entities in a typed graph (CALLS, OWNS, INHERITS_FROM, CONTRADICTS, …). Asking *"who calls `authenticate_user`?"* is a real graph query, not a fuzzy text match.

- **It refuses to hallucinate relationships.** Every edge has a mandatory pointer back to the source document that produced it. If a fact has no source, it doesn't exist. No more *"the model invented a connection that was never in the data."*

- **It surfaces contradictions instead of silently overwriting.** When you say *"auth is now using sessions"* after previously saying *"auth is JWT,"* both facts stay. A `CONTRADICTS` edge is created. `recall` can then answer *"what facts about auth are currently disputed?"*

- **It picks the right retrieval method per question.** Fuzzy associative queries hit vector similarity. Relationship queries hit graph traversal. Time-travel queries hit the temporal layer. The wrong question never hits the wrong index.

- **It bounds output.** No matter how big the knowledge base gets, `recall` returns at most 10 hits. Your context window doesn't get blown up by a runaway retrieval.

- **It's append-only.** Nothing is ever deleted. When facts go stale, they're retired (their validity window closes), not erased. Full forensic audit of every change.

- **It's natively multi-user.** `scope='shared'` for project-wide facts, `scope='private'` with `owner_id` for personal notes. Five devs on one repo each get their own private memory plus a shared common pool.

Plus **eleven cutting-edge retrieval techniques** from 2024–2026 literature (Anthropic Contextual Retrieval, HippoRAG-style PageRank, bi-temporal Graphiti, CRAG, MetaRAG confidence, …) stacked on top — see the [Frontier techniques](#frontier-techniques-incorporated-with-credits) section below for the full list with citations.

The technical capability matrix vs. the closest comparable systems:

| | OB1 (pgvector) | Karpathy LLM-Wiki | **THOUGHT** |
|---|---|---|---|
| Relationship logic | flat rows | flat markdown | **typed graph edges** |
| Temporal awareness | none | none | **bi-temporal (world-time + learned-time)** |
| Provenance | informal tag | informal citation | **mandatory `source_ref` on every edge** |
| Multi-user | RLS bolted on | single-user | **native two-zone graph** |
| Query routing | always vector | always inject | **VIBE / FACT / CHANGE / CODE / HYBRID router** |
| Contradiction model | absent | LLM lint only | **`CONTRADICTS` typed edge, write-time** |
| Bounded result size | unbounded | unbounded | **≤10 enforced** |

### What THOUGHT is **not**

- **Not a cloud service.** Everything runs locally. No data leaves your machine.
- **Not a vector DB replacement.** It uses one (sqlite-vec by default, pgvector optional), but adds the graph + temporal layers on top.
- **Not a fine-tuner.** It doesn't change your model. It changes what your model can *remember*.
- **Not retrieval-quality magic.** No single 10× win exists in 2024–2026 LLM-retrieval literature; THOUGHT compounds several 1.5-3× gains across orthogonal dimensions. Expect 2-3× better recall on questions that actually need the typed graph or temporal layer; expect roughly parity on pure-vibe semantic queries.

---

## How to use THOUGHT

This section walks through everything from install to advanced workflows, with explanations of *why* each step exists. If you just want the 30-second version, skip to **[Quickstart](#quickstart)**.

### Install

Three ways. Pick one:

```bash
# Option 1 (recommended) — full bundle, everything you'll use
pip install 'thought-mcp[all]'

# Option 2 — minimal: CLI + MCP server only (no production embeddings)
pip install thought-mcp

# Option 3 — zero install: uvx fetches it on demand
uvx thought-mcp install --client cursor
```

`uvx` is what the MCP client configs use internally, so option 3 is fine if you don't want a global install. After install, verify with:

```bash
thought doctor
```

You should see all green. Any red items will print the exact command to fix them.

### Quickstart

The one-line happy path for connecting THOUGHT to your AI client:

```bash
thought start --client cursor   # or claude-code, cline, continue, windsurf
```

Then **restart your AI client** (close every window, reopen). Done. The next conversation will have the `remember` and `recall` tools available.

If you're not sure which client to pick, run `thought install --detect` first — it shows every supported client's config path and whether it's installed on your machine.

### What `thought start` actually does

Knowing what changed makes troubleshooting easier later:

1. **Creates the SQLite database** at `.thought/thought.db` in your current directory. This is your memory. Back it up like any database.
2. **Writes `CLAUDE.md`** in your current directory. This tells your AI assistant how the memory tools work and when to use them. You can edit it to add project-specific conventions like *"always tag finance decisions with scope=private."*
3. **Writes `thought.toml`** with sensible defaults. Most people never need to touch it.
4. **Updates your AI client's MCP config** to register `thought` as a server. The previous config is backed up to `<config>.thought.bak`.
5. **Starts the MCP server** in the background, listening on `127.0.0.1:8765`.

After your AI client restarts, it discovers `thought` and gains the two new tools.

### Day-to-day usage — letting your AI use the memory

Once wired up, your AI assistant calls `remember` and `recall` *automatically* when the conversation implies it. You usually don't need to invoke them explicitly. Here's what that looks like:

**Telling it to remember something:**

```
You:  We've decided to use Postgres for the v2 release.
AI:   Got it. [calls remember internally]
      Stored. We can recall this with "database plan" or similar later.
```

The AI parses out the facts (v2 → Postgres), links them to the conversation as the source, and writes them to memory. You don't see the tool call.

**Asking about something it remembered:**

```
You:  What's our database plan?
AI:   [calls recall internally]
      Based on what's in memory: for v2 we're using Postgres
      (source_grounded, recorded 3 days ago).
```

The AI also sees how trustworthy each hit is via its `confidence_class` (`source_grounded`, `inferred`, or `hallucination_risk`). If a hit comes back as hallucination_risk, a well-prompted AI will flag the uncertainty rather than presenting it as fact.

**Time travel — what was true at some past date:**

```
You:  What was our pricing on January 15?
AI:   [calls recall with as_of="2026-01-15"]
      On Jan 15: Free / Pro $19 / Team $99.
      Pricing changed on Feb 3 to Free / Pro $25 / Team $129, so the
      current prices are different.
```

This is THOUGHT's most distinctive capability. Existing memory tools structurally can't answer this — they overwrite old facts. THOUGHT keeps both, marks the older one retired, and `as_of` queries see the world as it was.

**Catching contradictions automatically:**

```
You:  Remember that auth is now using sessions.
AI:   Noted. [calls remember]
      Heads up: a previously-stored fact says "auth is JWT."
      Both are now in memory, linked by a CONTRADICTS edge.
      Want me to retire the JWT fact?
```

The AI sees the conflict at write time and prompts you. The JWT fact isn't deleted — it's marked as superseded but still queryable for audit.

**Private vs. shared scope (multi-user / multi-project):**

```
You:  Remember as a private note: I prefer 4-space indentation.
AI:   Stored in your private scope. Won't surface in shared recalls.
```

Use `scope='private'` for personal preferences. Use `scope='shared'` for project decisions everyone on the team should see. A shared recall returns public facts plus the requester's own private facts; never another user's.

### How to nudge the AI when it doesn't reach for memory

If your AI is being lazy and skipping `recall`, try phrases like:

- *"According to memory..."*
- *"What do we have on..."*
- *"As of last week, ..."*
- *"Check memory for..."*
- *"@thought what's our..."* (in clients that support tool-prefix syntax)

To insist on storing something:

- *"Note this down: ..."*
- *"Remember that..."*
- *"Store this for later: ..."*
- *"Add to memory: ..."*

The single highest-leverage thing is the **`CLAUDE.md`** file that `thought init` drops in your project. Edit it to add project-specific conventions. The AI reads it on every session start, so rules like *"always remember architectural decisions, never remember code snippets"* are honored consistently.

### Auto-memory: capture + recall without thinking about it

By default, THOUGHT only writes when the agent calls `remember` and only reads when it calls `recall`. That works, but it puts the burden on the agent to *remember to remember*. In v0.3 you can wire two Claude Code hooks that do it for you:

```bash
# In your project root:
thought hook install --both           # installs into ./.claude/settings.json
# (or --recall / --write to pick one)

# Restart Claude Code.
```

#### Project scope vs. user scope — which one do you want?

`thought hook install` has two scopes, and the right pick depends on whether you want a per-project memory or one memory shared across everything:

```bash
thought hook install --both                   # default: --scope project
thought hook install --both --scope user      # global: every Claude Code session
```

| Scope | Settings file | What it covers | When to pick it |
|---|---|---|---|
| `project` (default) | `./.claude/settings.json` | Only Claude Code sessions launched in *this* directory | You want a project-local KB; you have a `thought.toml` here and you don't want this project's memory mixing with others. |
| `user` | `~/.claude/settings.json` | *Every* Claude Code session this user runs, anywhere | You want a single lifelong KB that follows you across projects, and you're OK with all projects writing into the same memory. |

Honest caveat for `--scope user`: the hooks resolve `thought.toml` (and therefore the SQLite db path) relative to whatever cwd Claude Code is launched in. If you start a session in a directory without a `thought.toml`, the hook still runs — but the ingest pipeline will silently fail to land anything because there's no configured db. The hook returns exit-code 0 so it doesn't break your turn (just emits a one-line warning to Claude Code's MCP log).

The clean "one shared KB across every project" recipe:

```powershell
# 1. Pick an absolute path for your lifelong memory.
$env:THOUGHT_DB_PATH = "$env:USERPROFILE\.thought\global.db"
# (Put that line in your $PROFILE on Windows or ~/.zshrc on macOS.)

# 2. Initialize once at that path.
thought init --db-path "$env:THOUGHT_DB_PATH" --no-claude-md

# 3. Install hooks globally.
thought hook install --both --scope user
```

Now every Claude Code session — in any directory — auto-reads + auto-writes the same global KB. Pair it with `thought stats` from any terminal to see what's accumulated.

To remove hooks later: delete the relevant block from `.claude/settings.json` (project or user), or rerun `thought hook install` with different flags after first removing the existing entries (the installer is additive — it doesn't strip stale entries).

From that point on:

- **Every user turn** → the auto-recall hook runs `thought hook recall` against the prompt, pulls the top-5 hits, and injects them into the next turn's context via Claude Code's `additionalContext` field (10k-char cap). The hook gates on the existing `low_confidence` flag from CRAG, so when there's nothing relevant it stays silent rather than pushing a "no hits" banner into context.
- **Every assistant turn end** → the auto-write hook runs `thought hook write` against the session transcript, picks the last user + last assistant turn, and pipes them into the existing ingest pipeline. Content-sha256 idempotency + Jaccard dedup absorb replays so the KB doesn't bloat.

If you want higher-signal writes at the cost of a small per-turn LLM call, switch to extract mode:

```bash
# Set ANTHROPIC_API_KEY, then edit .claude/settings.json:
#   "command": "thought hook write --mode extract"
# Each turn is now routed through Haiku to distill durable facts before ingest.
```

Auto-write writes default to `scope=private` so multi-user deployments don't accidentally cross-pollinate user-specific facts. The bi-temporal model already handles contradictions: if the user says "I prefer Adidas" one day and "I prefer Nike" the next, the second auto-write supersedes the first via `CONTRADICTS` + `SUPERSEDES` edges — exactly the temporal-validity-window pattern from Zep / Graphiti's 2025 work.

To opt out for a turn, just don't trigger the hooks (e.g. with a `--no-hooks` flag in Claude Code, or by uninstalling the hooks from `.claude/settings.json`).

### Browsing what's in your memory

Two new commands answer the "what does the agent actually know?" question without writing a query:

```bash
# What kinds of facts are stored?
thought topics
#  type           count  examples
#  CONCEPT        89     Acme, Adidas, dessert
#  function       425    personalized_pagerank, recall, remember
#  PERSON         47     Alice, Bob, Dana
#  ORGANIZATION   12     Acme Corp, Beta, OpenAI

# Drill into a type (everything of that kind)
thought browse CONCEPT --limit 20

# Drill into a specific anchor (PPR-ranked neighbourhood)
thought browse Acme --depth 2

# JSON output for scripting
thought topics --json
thought browse Alice --json
```

The agent itself can use the same surface via `mcp__thought__list_topics` and `mcp__thought__browse_topic` — useful in prompts like *"first survey what we already know about authentication, then suggest changes"*.

### Managing your KB (v0.4)

THOUGHT now has first-class lifecycle handles for the KB. Three walkthroughs:

**"I just want to start over."**
```bash
thought db backup ./before.db        # snapshot before
thought db flush --yes               # wipe
# Test some new workflow...
thought db load ./before.db --yes    # roll back if it went poorly
```

**"Trim old facts I don't need anymore."**
```bash
# See what'd be affected first (peek at a date-bounded snapshot):
thought db backup ./old.db --before 2026-01-01 --time-axis valid
thought db inspect ./old.db --schema
# Confirmed it's what we want; actually flush:
thought db flush --before 2026-01-01 --time-axis valid --yes
```

The three time axes:
- `--time-axis created` (default): when the row was *inserted* (transaction time)
- `--time-axis valid` *(usually what you want)*: when the fact became true (world time)
- `--time-axis learned` : when the system *learned* it (alias of created in most setups)

**"Peek inside a backup before loading it."**
```bash
thought db inspect ./snap.db --schema
thought db query ./snap.db "MATCH (p:PERSON) RETURN count(p)"    # coming via the Cypher layer
thought db load ./snap.db --merge --since 2026-01-01 --time-axis valid
```

`db load --merge` is non-destructive — INSERT-OR-IGNORE based on the existing entity identity, so re-running is a no-op. Default `db load` (no `--merge`) replaces the active DB, auto-backing up the current one to `<db>.bak.<timestamp>` first.

### Running with local models — Ollama / LM Studio / any OpenAI-compatible server (v0.4)

Zero API cost, fully offline. Same provider switch covers embeddings *and* the `--mode extract` path for auto-write, so your durable-fact extraction also runs locally.

**Ollama quickstart:**
```bash
ollama serve                                # in another terminal
ollama pull nomic-embed-text                # 768d local embedding
thought ollama-setup --write                # auto-writes thought.toml
thought ingest "Alice owns Acme Corp."
thought recall "alice"
```

**LM Studio quickstart:**
```bash
# In LM Studio's UI: load nomic-embed-text-v1.5 (or any embedding model), start the local server.
thought lmstudio-setup --write
thought ingest "Alice owns Acme Corp."
```

**Any OpenAI-compatible server** (vLLM, llama.cpp `--api`, text-generation-webui, OpenAI proper):
```toml
[embedding]
choice = "openai-compat"
dim = 1024
openai_compat_url = "http://localhost:8000/v1"
openai_compat_model = "your-model-id"
openai_compat_api_key = ""   # blank for local servers; set for OpenAI cloud
```

**Migrate an existing KB to a new embedder:**
```bash
thought reembed --to ollama --dim 768       # re-embeds every entity; entities + edges untouched
```

**Performance honesty.** Local embeddings are 5-20 ms per call vs ~0.2 ms for in-process sentence-transformers. For most users that's fine; the `auto` choice still picks sentence-transformers when installed. Pick local LLMs for privacy / no-API-cost, not for raw speed.

**LLM-extract via a local model.** Once `[llm] provider = "ollama"` (or `lmstudio` / `openai-compat`) is set, the auto-write hook's `--mode extract` path uses it automatically — no Anthropic key required.

### Querying your memory with Cypher (v0.4)

A documented Cypher subset that compiles to parameterised SQL against our typed-edge graph. Read-only in v0.4 — writes still go through `remember`.

**Discover what's queryable first:**
```bash
thought schema
#  Entity types        Relation types
#  PERSON          47  WORKS_AT      32
#  ORGANIZATION    12  PREFERS       18
#  CONCEPT         89  LIVES_IN      24
#  function       425  CALLS        575
```

**Worked examples:**
```bash
# Pattern match
thought query "MATCH (p:PERSON)-[:WORKS_AT]->(o:ORGANIZATION) RETURN p.name, o.name"

# Property + WHERE
thought query "MATCH (p:PERSON {name:'Alice'}) WHERE p.tier = 'hot' RETURN p"

# Time-travel: what did the KB believe yesterday?
thought query --as-of 2026-05-14 "MATCH (a:PERSON)-[:PREFERS]->(c) RETURN a.name, c.name"

# See the SQL we emit:
thought query --explain "MATCH (p:PERSON) RETURN p.name LIMIT 5"
```

**Save a query as a memory construct:**
```bash
thought view save adidas_seattle '
  MATCH (p:PERSON)-[:PREFERS]->(:CONCEPT {name:"Adidas"})
  MATCH (p)-[:LIVES_IN]->(:CITY {name:"Seattle"})
  RETURN p.name'
thought view run adidas_seattle    # re-evaluates against the live KB every time
thought view list
```

This is the *"join disparate facts into a new construct"* primitive — the saved view is pull-evaluated, so any future fact that satisfies it surfaces automatically. Views survive `db flush` (they describe queries, not data).

**Supported v0.4 subset:**

| Cypher feature | Supported? | Notes |
|---|---|---|
| `MATCH (n:Type {prop:value})` | ✅ | Entity match by type + property |
| `(a)-[:REL]->(b)` typed edge | ✅ | Forward + reverse `<-[:REL]-` |
| `WHERE expr AND expr` | ✅ | `=`, `<>`, `<`, `>`, `<=`, `>=`, `CONTAINS`, `STARTS WITH`, `IN` |
| `RETURN id, id.prop, id AS alias` | ✅ | Property projection + JSON objects |
| `LIMIT N`, `SKIP N` | ✅ | |
| `AS_OF "iso-date"` | ✅ | Bi-temporal time-travel |
| `OPTIONAL MATCH` | ❌ | Use multiple MATCH clauses |
| `MERGE` / `CREATE` / `DELETE` / `SET` | ❌ | Writes go through `remember` |
| Variable-length paths `-[:R*1..N]->` | ❌ | Use explicit multi-step patterns |
| `WITH` chaining | ❌ | Save intermediate as a view |
| Aggregations (`count`, `collect`) | ❌ | Coming in v0.5 |

Anything outside the subset raises a clear `UnsupportedCypher` error pointing at this table — no half-working execution.

### Asking in English — `thought ask` (v0.4)

Same query layer, but you type the question and your configured LLM translates it. With Ollama / LM Studio configured, this is zero-API-cost.

```bash
# Setup: just configure [llm] in thought.toml — any provider works.
# (Or use thought ollama-setup --write / lmstudio-setup --write.)

thought ask "who at Acme also prefers Adidas?"
#  fell back to recall: no LLM provider configured
#  (or: actual Cypher translation if [llm] is set)

thought ask "what shoes do people in Seattle prefer?" --explain
#  Cypher: MATCH (p:PERSON)-[:LIVES_IN]->(:CITY {name:"Seattle"})
#          MATCH (p)-[:PREFERS]->(c:CONCEPT) RETURN p.name, c.name
#  ... result table ...

# Harvest a good translation into a durable saved view:
thought ask "people who work at Acme" --save-as acme_people --explain
```

**Honest tradeoffs:**
- Text-to-Cypher SOTA accuracy is ~72%; local models lag by ~10-20 points. v0.4 *validates* the LLM's output against the parser before executing — bad translations fall back to plain `recall(question)` so you always get something.
- `--explain` makes the translation auditable. `--save-as` lets you curate the good ones.
- `--no-fallback` makes scripted workflows fail loudly instead of falling back silently.

### Working with code — the v0.2 capabilities

If you're using THOUGHT for AI-assisted coding (the v0.2 specialisation), there's a separate ingest path that parses your source files via tree-sitter and builds a real function-call graph:

```bash
# Ingest a codebase — entities are functions / classes / methods / modules
thought ingest-code src/

# Ingest with full git history so as_of queries work for code
thought ingest-git . --mode full

# Ask who calls a function (ranked by importance)
thought callers authenticate_user

# Ask what's affected if you change a function
thought impact authenticate_user

# Show the set difference of entities between two commits
thought diff --from v1.0 --to HEAD
```

After ingestion, the AI's regular `recall` tool also gains code awareness. Natural-language questions like *"who calls authenticate_user?"* route through the call-graph machinery automatically.

### Using the CLI directly (no AI involved)

THOUGHT works fine from a terminal without an AI assistant. The CLI is most useful for bulk operations and inspection:

```bash
# Add a single fact
thought ingest "Alice owns Acme Corp. Acme is part of HoldCo."

# Bulk-ingest a directory of Markdown notes
thought ingest --glob 'docs/**/*.md'

# Pipe in from any tool that emits one fact per line
git log --since='1 week ago' --format='%s' | thought ingest --stdin

# Query directly
thought recall "who owns Acme"

# Open an interactive REPL — type queries, type +text to add facts
thought repl

# See what's currently in memory
thought stats

# Soft-delete entities matching a SQL LIKE pattern (audit-logged, not destroyed)
thought forget "kendra%"
```

### Upgrading

When a new version of THOUGHT ships:

```bash
pip install --upgrade thought-mcp     # pull the new package
thought upgrade --all                 # re-pin every MCP client config to the new version
# Restart your AI client to pick up the new server.
```

`thought upgrade --all` solves the *"uvx is still using its cached old version"* problem by re-pinning your MCP client configs to the exact version you just installed (with the required extras included).

### MCP client config paths (manual install if `--detect` can't find your client)

If `thought install --detect` doesn't find a client you have installed, the JSON block to add manually is:

```json
{
  "mcpServers": {
    "thought": {
      "command": "uvx",
      "args": ["--from", "thought-mcp[mcp,sqlite-vec]", "thought", "serve"]
    }
  }
}
```

Per-client locations:

- **Claude Code** — `~/.claude.json` (top-level `mcpServers` block)
- **Cursor** — `~/.cursor/mcp.json`
- **Cline** — VS Code `globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json` (or `~/.cline/cline_mcp_settings.json`)
- **Continue** — `~/.continue/config.json`
- **Windsurf** — `~/.codeium/windsurf/mcp_config.json`

---

## Standing on the shoulders of

THOUGHT exists because of:

- Scott Nichols [**@srnichols**](https://github.com/srnichols) — [OpenBrain](https://github.com/srnichols/OpenBrain) showed that pgvector + MCP is a powerful pattern.
- [**@benclawbot**](https://github.com/benclawbot) — [open-brain](https://github.com/benclawbot/open-brain) provided a clean reference implementation.
- Andrej Karpathy [**@karpathy**](https://github.com/karpathy) — the [LLM-Wiki gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) crystallized why context engineering is the next discipline.

## Frontier techniques incorporated (with credits)

| # | Technique | Source |
|---|---|---|
| 1 | **Contextual Retrieval** — LLM-generated chunk context prepended before embedding | [Anthropic, Sept 2024](https://www.anthropic.com/news/contextual-retrieval) |
| 2 | **HippoRAG 2 — Personalized PageRank memory** | [Gutiérrez et al., NeurIPS 2024](https://arxiv.org/abs/2405.14831) ([repo](https://github.com/OSU-NLP-Group/HippoRAG)) |
| 3 | **Bi-temporal Graphiti** — separate valid-time and transaction-time | [Zep, arXiv 2501.13956](https://arxiv.org/abs/2501.13956) ([repo](https://github.com/getzep/graphiti)) |
| 4 | **Atomic fact decomposition + Jaccard dedup** | [Wanner et al., 2024](https://arxiv.org/abs/2410.16708v1) |
| 5 | **BGE-M3 hybrid embeddings (sparse + dense + ColBERT)** | [BAAI](https://huggingface.co/BAAI/bge-m3) |
| 6 | **Matryoshka two-pass retrieval** | Kusupati et al.; OpenAI text-embedding-3 |
| 7 | **CRAG (Corrective RAG)** — retrieval evaluator + fallback | [Yan et al., 2024](https://arxiv.org/abs/2401.15884) |
| 8 | **MetaRAG epistemic uncertainty** — `confidence_class` per hit | [arXiv 2504.14045](https://arxiv.org/abs/2504.14045) |
| 9 | **Ebbinghaus decay scoring** — strength × `e^(-λ·days)` × recall-boost | [@sachitrafa/YourMemory](https://github.com/sachitrafa/YourMemory) |
| 10 | **Context-engineering budget per query class** | [Karpathy & community, 2025](https://github.com/davidkimai/Context-Engineering) |
| 11 | **Append-only writes (Mem0 2026)** — never UPDATE/DELETE | [Mem0 State of Memory 2026](https://mem0.ai/blog/state-of-ai-agent-memory-2026) |

Built on: [MCP Python SDK](https://github.com/modelcontextprotocol/python-sdk) ([@modelcontextprotocol](https://github.com/modelcontextprotocol)), [sqlite-vec](https://github.com/asg017/sqlite-vec) (Alex Garcia), [pgvector](https://github.com/pgvector/pgvector) (Andrew Kane), [Pydantic](https://github.com/pydantic/pydantic), [Typer](https://github.com/fastapi/typer), [structlog](https://github.com/hynek/structlog). spaCy ([Explosion AI](https://github.com/explosion/spaCy)) is an optional extra.

---

## Architecture

```
   Claude Code · Cursor · Cline · Continue · Windsurf
   ┬───────────────────────────────────────────────────
   │                  (auto-wired by `thought install`)
   ▼
┌──────────────────────────────────────────────────────────────────┐
│         MCP server  (Streamable HTTP · async handlers)           │
│            remember(content, ...)    recall(query, ...)          │
└──────────────────────────┬───────────────────────────────────────┘
                           │
                           ▼
              ┌───────────────────────────┐    LRU recall cache
              │          Router           │    (write-version keyed)
              │  VIBE  FACT  CHANGE  HYBRID│  ↳ rules.yaml (user-editable)
              │  + CRAG confidence eval   │
              └───────────┬───────────────┘
              ┌───────────┼───────────────┐
              ▼           ▼               ▼
      ┌─────────────┐ ┌──────────┐ ┌────────────┐
      │  Vector L.  │ │ Graph L. │ │ Temporal L.│
      │ Matryoshka  │ │ HippoRAG │ │ bi-temporal│
      │  + GraphRAG │ │ PPR (+   │ │  as_of     │
      │  + sqlite-  │ │ scipy.   │ │ (valid +   │
      │  vec MATCH  │ │ sparse + │ │  learned)  │
      │             │ │ local    │ │            │
      │             │ │ push)    │ │            │
      └──────┬──────┘ └────┬─────┘ └─────┬──────┘
             │             │              │
             ▼             ▼              ▼
        ┌───────────────────────────────────────┐
        │      StorageBackend (ABC)             │
        │  SQLite + sqlite-vec  |  pgvector     │
        │  sources · entities · edges · triples │
        │  embeddings · strength_cache · log    │
        │  + bulk source-provenance JOIN        │
        │  + touch-access flush queue           │
        └──────────────┬────────────────────────┘
                       │
                       ▼
         ┌─────────────────────────┐
         │  Consolidation Engine   │  background thread
         │  Ebbinghaus · cold/warm │  + `thought consolidate` CLI
         │  · dedup · audit log    │
         └─────────────────────────┘
```

**Bi-temporal axis:** every entity and edge tracks `(valid_from, valid_until)` (world-time) **and** `(learned_at, unlearned_at)` (transaction-time). "What did we know about X on date Y" and "what was true about X on date Y" are different queries; THOUGHT answers both via `recall(..., as_of=Y, as_of_kind='valid' | 'learned')`.

---

## What makes THOUGHT qualitatively different

These are capabilities **neither OB1 nor the Karpathy wiki structurally supports** — adding them would require rewriting their data layer:

- `recall(query, as_of=<past>)` returns the world as it was, not as it is.
- Every hit carries `confidence_class ∈ {source_grounded, inferred, hallucination_risk}` so the LLM knows what to trust.
- Contradictions are **first-class data** — `CONTRADICTS` typed edge with `detected_at` and `confidence_score`, queryable, not LLM lint notes.
- Multi-user scope is **structural** — `(scope, owner_id)` filter at the storage layer, inherited by every retrieval path.
- All writes are **append-only**. Supersession is a new edge plus a `valid_until` close, never an UPDATE/DELETE — full forensic audit is guaranteed.
- The query router classifies before searching — wrong question never hits the wrong index.

---

## Measured results

These numbers come from `tests/comparison/run.py` — same workload, same deterministic embedder, three architectures. Reproducible: `python -m tests.comparison.run`.

### Recall@10 by query class

| System | VIBE | FACT | CHANGE | HYBRID | overall |
|---|---|---|---|---|---|
| **THOUGHT** | **100%** | **100%** | **68%** | **66%** | **83.5%** |
| OB1 | 100% | 100% | 32% | 100% | 83.0% |
| Karpathy wiki | 100% | 30% | 0% | 100% | 57.5% |

THOUGHT and OB1 tie on overall recall@10, but the **CHANGE column (68% vs 32%) is the headline number** — THOUGHT is 2.1× more accurate on the queries where temporal correctness matters. Karpathy wiki is 0% on temporal: it has no notion of time.

### Temporal correctness on CHANGE queries (strict — penalizes returning contemporary answer for historical query)

| System | rate |
|---|---|
| **THOUGHT** | **68%** |
| OB1 | 32% |
| Karpathy wiki | 0% |

### Contradictions detected at write-time

| System | count |
|---|---|
| **THOUGHT** | 2 |
| OB1 | 0 |
| Karpathy wiki | 0 |

### Ablation — marginal contribution of each frontier technique

(From `python -m tests.comparison.ablation` → [docs/ablation.md](docs/ablation.md))

| Variant | Overall | FACT | CHANGE | HYBRID |
|---|---|---|---|---|
| **Full v0.1 (all Tier A)** | **83.5%** | **100%** | **68%** | **66%** |
| − HippoRAG bidirectional PPR | 66.0% | 30% | 68% | 66% |
| − Bi-temporal edge retirement | 75.0% | 100% | 34% | 66% |
| − Query router (force VIBE) | 65.5% | 30% | 32% | 100% |

Each disabled technique costs THOUGHT real measurable accuracy on the dimension it was added to improve. HippoRAG is worth +70pp on FACT queries; bi-temporal supersession is worth +34pp on CHANGE; the router is worth +35pp overall.

### Performance

THOUGHT went through three performance passes. Each one targeted the bottleneck the previous one exposed.

**v0.2 pass — architectural** (sqlite-vec + scipy.sparse + local push PPR):
1. **sqlite-vec C/SIMD MATCH** for vector ANN (was Python brute-force over the embeddings table).
2. **Binary sign-quantized index mirror** ([Charikar 2002 LSH](https://www.cs.princeton.edu/courses/archive/spring04/cos598B/bib/CharikarEstim.pdf)) for dense embeddings — opt-in via `use_binary_quantization=True`; another ~8-16× over the float path on production models.
3. **`scipy.sparse` vectorised Personalized PageRank** — one CSR matvec per iteration in place of the dict-of-lists power loop.
4. **Andersen-Chung-Lang local push PPR** ([2006](https://www.math.ucsd.edu/~fan/wp/localpartition.pdf)) — ε-approximate PPR touching only `O(1/(ε·(1−α)))` nodes, automatically used when the in-scope KB exceeds 5k entities.

**v0.3 pass — system + UX**:
5. **Batched ingest** — all writes from one `remember()` in one transaction; `remember_many()` batches across N items in one transaction with one `embed_many` call → **2-4× ingest throughput**.
6. **LRU recall cache** keyed by `(write_version, query, ...)` — repeat queries become **µs-scale** (~130,000× over cold-recall p50).
7. **Touch-access batched flush queue** — eliminates the per-hit UPDATE on the recall hot path, batches into one `executemany` periodically.
8. **PPR transition-matrix cache** with `write_version` invalidation — repeat FACT recalls skip the COO→CSR matrix rebuild entirely.
9. **One-query bulk source-provenance fetch** — replaced N+M roundtrips (`edges_to` per hit + `SELECT` per source) with a single JOIN.
10. **WAL tuning** — 64 MiB page cache, 256 MiB mmap, `synchronous=NORMAL`, `busy_timeout=5s`.
11. **Async MCP tool handlers** — `asyncio.to_thread` lets the Streamable HTTP transport service concurrent recalls.

#### Measured progression

Same workload (`Entity{i} owns Company{i%50} Corp.`), same Windows laptop, deterministic embedder, **30 unique queries** (no cache hits) for cold recall measurement:

| KB size | v0.1 recall p50 | v0.2 recall p50 | **v0.3 recall p50** | v0.3 ingest (bulk) | v0.3 cache-hit p50 |
|--------:|----------------:|----------------:|--------------------:|-------------------:|-------------------:|
| 1,000   | 50.3 ms         | 12.3 ms         | **8.5 ms**          | 0.67 s             | **0.7 µs**         |
| 5,000   | 261.6 ms        | 42.5 ms         | **37.8 ms**         | 3.73 s             | 0.7 µs             |
| 10,000  | 521.4 ms        | 61.6 ms         | **93.6 ms**¹        | 7.47 s             | 0.7 µs             |
| 25,000  | ~1,300 ms²      | 171.8 ms        | **186.0 ms**        | 17.18 s            | 0.7 µs             |

¹ v0.3 honest-cold-cache numbers are slightly higher than v0.2's warm-cache numbers at the same KB size — v0.2 measured 20 repeats of the *same* query without a cache, which our profiler flattered. With the v0.3 LRU cache, repeated queries become essentially free (0.7 µs), so the real-world latency curve is the cold-cache row for first-time queries and the cache-hit column for everything else.

² Original v0.1 took >10s per recall at 25k entities; numbers extrapolated from the linear growth pattern.

**Overall vs v0.1**: 5-7× faster cold recalls, ~10,000-130,000× faster cache hits, 2-4× faster ingest (bulk).

**Growth pattern**: 25× more data → ~22× more latency in v0.3 — closer to linear at the high end because the deterministic embedder is itself O(N) on the brute-force fallback; with `sentence-transformers/all-MiniLM-L6-v2` (production embedder, dense vectors), sqlite-vec's index becomes sub-linear and you get the full architectural win.

Also unchanged:
- **Result bound** — `len(hits) ≤ 10` always, verified at every KB size.
- Comparison-harness latency dropped from 7.78 ms → 2.75 ms with full accuracy preserved (FACT 100%, CHANGE 68%).

### Structural capability matrix (none of these are accuracy claims — they're either present or absent)

| Capability | THOUGHT | OB1 | Karpathy wiki |
|---|---|---|---|
| bi-temporal as_of | ✅ | ✗ | ✗ |
| source-grounded confidence class | ✅ | ✗ | ✗ |
| contradiction as typed edge | ✅ | ✗ | ✗ |
| multi-user scope isolation | ✅ | partial (RLS) | ✗ |
| append-only audit log | ✅ | ✗ | ✗ |
| Personalized PageRank retrieval | ✅ | ✗ | ✗ |
| Ebbinghaus decay scoring | ✅ | ✗ | ✗ |
| CRAG-style low-confidence flag | ✅ | ✗ | ✗ |
| Matryoshka 2-pass ANN | ✅ | ✗ | ✗ |
| Anthropic Contextual Retrieval | ✅ | ✗ | ✗ |
| query router (VIBE/FACT/CHANGE) | ✅ | ✗ | ✗ |
| forecasting (TLogic, v0.2) | planned | ✗ | ✗ |

---

## Design rationale

Full architectural discussion in [plan.md](plan.md). Short version of the philosophy:

> A memory system should **know what kind of question is being asked before it searches anything, store facts with their origin and validity, and never lose history in the act of updating.**

The three-layer split (Vector / Graph / Temporal) plus the Router is the architectural answer: each query class is dispatched to the mathematical structure that fits it. The eleven frontier techniques stack 1.5-3× gains on orthogonal axes; together they take the system from "pgvector wrapper" to "memory fabric."

Honest framing: no single 2024-2026 technique gives a 10× recall jump. The "1000× more useful" claim isn't about recall@10; it's about capabilities competitors structurally cannot have (the matrix above) compounded with stacked accuracy gains (the ablation table).

---

## Configuration

Default config (`thought.toml`, written by `thought init`):

```toml
db_path = ".thought/thought.db"

[embedding]
choice = "auto"           # "auto" picks sentence-transformers if installed,
                          # else deterministic (zero-dep test embedder).
                          # Override: "minilm" | "bge-m3" | "openai" | "deterministic"
dim = 384

[server]
host = "127.0.0.1"
port = 8765

[consolidation]
enabled = true
cycle_seconds = 60.0
cold_demotion_days = 30
staleness_days = 30
batch_size = 100

[llm]                     # optional — enables Contextual Retrieval enrichment
enabled = false
provider = "none"         # "anthropic" | "openai" | "ollama"
```

`thought` walks the directory tree (git-style) looking for a `thought.toml`, so you don't need a `--config` flag when running from a subfolder of your project.

Environment overrides: `THOUGHT_DB_PATH`, `THOUGHT_EMBEDDER`.

---

## CLI reference

### Setup / lifecycle

```bash
thought init [--quick] [--embedder auto|minilm|deterministic]
                                  # write config + db + CLAUDE.md
thought install --detect          # show every detected MCP client config path
thought install --client cursor   # wire one client (with backup, idempotent)
thought install --all             # wire every detected client
thought start [--client cursor]   # init-if-needed + install + serve in one command
thought serve [--transport stdio|streamable-http] [--host ... --port ...]
                                  # start MCP server (stdio by default; HTTP optional)
thought doctor                    # deep environment health check
thought --version
```

### Ingest

```bash
thought ingest "Alice owns Acme Corp."
thought ingest --file notes.md
thought ingest --glob 'docs/**/*.md'
cat changelog.txt | thought ingest --stdin

# Per-item scope
thought ingest --file private-notes.md --scope private --owner-id alice
```

### Recall

```bash
thought recall "who owns Acme"
thought recall "what did we say about pricing" --as-of 2026-01-01
thought recall "auth changes" --as-of 2026-01-01 --as-of-kind learned
thought recall "alice" --json     # raw JSON for piping into other tools
```

### Inspect + maintenance

```bash
thought stats                     # entities / edges / sources / contradictions / top accessed
thought repl                      # interactive shell — type queries, +text to remember
thought forget 'kendra%'          # soft-delete by SQL LIKE pattern (audit-logged)
thought consolidate               # run one consolidation cycle
```

### DB lifecycle + local LLMs + query + ask (v0.4)

```bash
# Manage the DB:
thought db size [--json]                           # disk + entity/edge counts
thought db flush [--yes] [--before X] [--since X] [--time-axis valid|learned|created]
thought db backup <file> [--force] [--before X] [--since X] [--time-axis ...]
thought db load <file> [--yes] [--merge] [--before X] [--since X] [--time-axis ...]
thought db inspect <file> [--schema] [--json]      # peek before loading

# Local LLMs:
thought ollama-setup [--host URL] [--model M] [--write]
thought lmstudio-setup [--base-url URL] [--model M] [--write]
thought reembed --to <ollama|lmstudio|minilm|...>  # migrate embedder w/o re-ingesting

# Query:
thought schema [--json]                            # entity + relation types
thought query "<cypher>" [--as-of DATE] [--explain] [--json]
thought view save <name> "<cypher>" [--replace]
thought view list [--json]
thought view show <name>
thought view run <name> [--json]
thought view delete <name>

# Ask in English:
thought ask "<question>" [--explain] [--no-fallback] [--save-as <name>]
```

### Auto-memory + topic browsing (v0.3)

```bash
thought hook install --recall            # auto-recall: UserPromptSubmit → recall + inject
thought hook install --write             # auto-write: Stop → ingest the last turn
thought hook install --both              # both, idempotent
thought hook install --both --scope user # globally for all projects, not just this one

# The hook subcommands themselves (called by Claude Code, not by you):
thought hook recall                      # stdin: hook payload → stdout: additionalContext
thought hook write [--mode raw|extract]  # stdin: hook payload → ingests transcript

# Browsing
thought topics [--scope all|shared|private] [--min-count N] [--json]
                                         # entity-type aggregations with examples
thought browse <name> [--depth 1] [--limit 20] [--json]
                                         # drill into a type or an entity name
```

### Code-vertical commands (v0.2)

```bash
thought ingest-code <path> [--glob '**/*.py'] [--lang python|typescript|auto]
                                  # tree-sitter ingest — functions / classes / methods as entities
thought ingest-git <repo> [--mode snapshot|full] [--paths '*.py,*.ts']
                                  # commit-stamped ingest; --mode full walks every commit
thought callers <name> [--file path] [--limit 10]
                                  # direct callers ranked by HippoRAG PageRank
thought impact  <name> [--file path] [--limit 20]
                                  # transitive impact set: what's affected if you change <name>
thought diff   --from <sha1> --to <sha2> [--file path]
                                  # set diff of entities between two ingested commits
```

### Docker

```bash
docker build -t thought-mcp .
docker run --rm -p 8765:8765 -v thought-data:/data thought-mcp
```

The image runs as a non-root user, exposes `:8765`, persists state at `/data`, and runs `thought serve` as the default command. Once tagged releases are pushed, an upstream image is published at `ghcr.io/<owner>/thought-mcp:<version>` and `:latest`.

---

## Troubleshooting

### `thought install --detect` says my client path doesn't exist

Most clients only create their config file after first launch. Open the client once, then re-run `thought install --client <name>`. The installer will create the file if its parent directory exists.

### `sqlite enable_load_extension` reports `NO` in `thought doctor`

You're on a Python build without loadable-extension support — most commonly Anaconda's bundled Python. Two fixes:

```bash
# Option A — install python.org Python and use that interpreter
# Option B — use pysqlite3-binary
pip install pysqlite3-binary
```

THOUGHT falls back to a pure-Python ANN path automatically, so this is a performance issue, not a correctness one.

### Recall returns `low_confidence: true` with no results

The CRAG evaluator flags this when the top hit's score is below threshold. Common causes:

- Knowledge base is empty or lacks anything relevant. Try `thought stats` to confirm.
- You're using the deterministic embedder (the test default). Set `embedder = "auto"` in `thought.toml` and reinstall sentence-transformers: `pip install 'thought-mcp[embeddings-local]'`.
- Query phrasing doesn't match indexed entity names. Use the `repl` to iterate.

### MCP client can't find the server

```bash
thought doctor                              # confirm MCP SDK + vec extension load
thought serve --skip-precheck               # try without the precheck
# Then inspect the client's MCP logs — most surface "failed to start" with a path
```

If `uvx thought-mcp serve` is in your `mcpServers` config and `uvx` isn't on PATH for the GUI client, switch the `command` to an absolute path to the `thought` entrypoint (`which thought` / `where thought`).

### First `recall` after startup is slow

The first call lazy-loads the embedder (downloads `all-MiniLM-L6-v2`, ~80 MB, on first run). After that it's warm. Use `thought init` (without `--quick`) to pre-download.

### Windows console garbles output

The CLI reconfigures stdout/stderr to UTF-8 at startup. If you're piping through a tool that still uses cp1252, set `PYTHONIOENCODING=utf-8` in your shell.

---

## Testing & development

```bash
pytest tests/unit -q                 # 56 unit tests
pytest tests/perf -m perf            # 4 performance benchmarks
python -m tests.comparison.run       # rebuilds docs/comparison.md
python -m tests.comparison.ablation  # rebuilds docs/ablation.md
```

Coverage target: 85% on `src/thought`. CI matrix runs Python 3.11/3.12/3.13 × Ubuntu/macOS/Windows on every push (see `.github/workflows/ci.yml`). Tagging `v*` triggers `release.yml` (PyPI trusted publishing) and `docker.yml` (multi-arch GHCR image).

---

## Roadmap

**Current (shipped)** — 11 Tier A frontier techniques (Contextual Retrieval, HippoRAG PageRank, bi-temporal Graphiti, atomic-fact triples + Jaccard dedup, BGE-M3 hybrid embeddings, Matryoshka 2-pass retrieval, CRAG evaluator, MetaRAG confidence class, Ebbinghaus decay, context-engineering budget per query class, append-only writes); comparison + ablation harnesses; two MCP tools; multi-platform CLI with auto-install for five MCP clients; LRU recall cache + PPR matrix cache + sqlite-vec + scipy.sparse PageRank + local push PPR + batched ingest (the three perf passes described above); Docker + PyPI release workflows.

**v0.2 fast-follow** — RAPTOR hierarchical summary trees at WARM→COLD demotion ([Sarthi et al., ICLR 2024](https://arxiv.org/abs/2401.18059)); sleep-time compute pre-computation ([Letta + UCB, April 2025](https://arxiv.org/abs/2504.13171)); TLogic temporal-rule forecasting ([arXiv 2112.08025](https://arxiv.org/abs/2112.08025)); Reflexion-style self-edit ([Shinn et al., NeurIPS 2023](https://arxiv.org/abs/2303.11366)); multi-hop deep recall (IRCoT/PRISM); introspective `thought audit` ([transformer-circuits, 2025](https://transformer-circuits.pub/2025/introspection/index.html)).

**v0.3+** — RankZephyr local reranker, PIKE-RAG domain rationale extraction, DSPy-learned retrieval policies, real Postgres backend, REST API alongside MCP, encryption-at-rest (SQLCipher / pgcrypto), tenant isolation, OpenTelemetry traces/metrics.

---

## License

MIT — see [LICENSE](LICENSE).

---

## References

- OpenBrain — https://github.com/srnichols/OpenBrain · https://github.com/benclawbot/open-brain
- Karpathy LLM-Wiki gist — https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
- MCP Specification — https://modelcontextprotocol.io/specification/2025-11-25
- HippoRAG — https://arxiv.org/abs/2405.14831 (NeurIPS 2024)
- Zep / Graphiti — https://arxiv.org/abs/2501.13956
- Anthropic Contextual Retrieval — https://www.anthropic.com/news/contextual-retrieval
- CRAG — https://arxiv.org/abs/2401.15884
- LightRAG — https://arxiv.org/abs/2410.05779
- BGE-M3 — https://huggingface.co/BAAI/bge-m3
- RAPTOR — https://arxiv.org/abs/2401.18059
- Matryoshka Representation Learning — https://huggingface.co/blog/matryoshka
- TLogic — https://arxiv.org/abs/2112.08025
- Mem0 State of Memory 2026 — https://mem0.ai/blog/state-of-ai-agent-memory-2026
- sqlite-vec — https://github.com/asg017/sqlite-vec
- pgvector — https://github.com/pgvector/pgvector
