# supamem

> Qdrant-backed dual-memory tooling for AI coding agents (Claude Code, Cursor, OpenCode).
> Provides a CLI to bootstrap, index, run an MCP server, install per-client hooks, and run
> retrieval evals — all backed by a locked tuned-hybrid (BM25 + MiniLM) Qdrant pipeline.
> Extracted from the SoftChat (https://app.softchat.ru) production memory stack so any
> team can run on the same battle-tested foundation.

`supamem` packages a hybrid sparse+dense semantic memory layer (Qdrant), a Model Context
Protocol server, and per-client session/edit hooks as a single Python distribution. Once
installed, AI coding assistants gain persistent semantic memory across projects.

## Core docs

- [README](README.md): Hero, quickstart, prerequisites, install matrix, CLI reference, client wiring
- [MIGRATION](MIGRATION.md): Migrating from an in-tree `dev_memory` setup to supamem
- [LICENSE](LICENSE): MIT

## Distribution

- [PyPI](https://pypi.org/project/supamem/): Released via Trusted Publisher OIDC; `pip install supamem` or `uv tool install supamem`. Current version: v0.3.0a7 (alpha pre-release; v0.2.0 / v0.2.1 already shipped). Optional extras: `pip install supamem[eval]` (RAGAS triad + pytrec_eval), `pip install supamem[peers-mem0]` (mem0 peer adapter), `pip install supamem[ast-chunker]` (v0.3.0a7+; tree-sitter AST chunker for Python — opt-in, defaults unchanged).
- [CHANGELOG](CHANGELOG.md): Per-version release notes:
  - v0.1.0: initial
  - v0.1.1: update-check + AGENTS.md
  - v0.1.2: project-tunable regress baselines
  - v0.1.3: dual_memory_write + qdrant aliases
  - v0.1.4: SessionStart banner + supamem live dashboard
  - v0.1.5: SessionStart wired by installer
  - v0.2.0: mcp.caps + multi-project install + agent-discipline hooks
  - v0.2.1: user-visible banner + drift signal
  - v0.2.2a1: transcript chunker plugin
  - v0.2.3a1: coding-path classifier + where filter
  - v0.2.4a1: code-aware reranker + eager ML fetch + doctor-driven repair self-heal
  - v0.2.5a1: subagent reachability auto-patcher + unpatch-agents + doctor reachability panel
  - v0.3.0a1: per-source temporal validity + transcript-only recency decay + temporal doctor panel
  - v0.3.0a2: bench harness (LongMemEval_S + RAGAS triad) + [eval] extra + two-tier judge + doctor Eval-bench panel
  - v0.3.0a3: filtered_dense retrieval backend + path_prefix where magic key + valid_to: "now" no-op alias + [retrieval.filtered_dense] preview_chars config + anti-identity-tier lock + Filtered-dense doctor panel + mcp_server query Field min_length=1
  - v0.3.0a4: bench harness where-filter pass. Scoped + unscoped passes at runner.py:428, scoped-only gate methodology, sibling-key result envelope, bench-only payload.session_id, ADR-0001
  - v0.3.0a5: coderag eval suite. New `supamem.eval` plugin entry-point group, two-repo deterministic haystack pinned to commit-SHAs, three-column metric reporting, mem0 peer adapter (separate Qdrant collection), bench-only payload.repo + payload.axis, LongMemEval demoted to on-demand-only, ADR-0002, REQUIREMENTS.md PUB-05/EVAL-05 edits
  - v0.3.0a6: CodeRAG live numbers + mem0 head-to-head. Auto-queries-from-manifest wiring in `--full`, `corpus.ensure_populated_manifest` lazy build-on-call, ADR-0002 §7 rewritten with live three-run variance-gated floors (offline `< 0.005 ms` and `1.000` cells removed; latency p95 ceiling moved 500 ms → 5000 ms one-shot per D-LAT-01), ADR-0002 §8 NEW "Mem0 peer comparison" with paired-bootstrap delta + 95% CI per axis × column × metric, `metrics.paired_bootstrap_delta` pure-stdlib helper, schema-compat `peers: {}` + `comparisons: {}` always-present empty dicts on non-`--peer` envelopes, Phase 14+15 byte-identical regression locks preserved unchanged
  - v0.3.0a7: AST chunker + HyDE retrieval (opt-in plugins) + ADR-0002 §9. New `tree_sitter_code` chunker plugin (opt-in via `pip install supamem[ast-chunker]`; Python only; tree-sitter parser; falls back to `markdown_header` on parse error per D-AST-03), new `tuned_hybrid_hyde` retrieval plugin (opt-in; localhost Ollama-backed query rewriter with locked HyDE prompt per D-HYDE-01, `keep_alive=-1` warm-pool retention, 600 ms timeout + 1 retry, falls back to original query on failure per D-HYDE-03, D-07 localhost guard inherited from `eval.judge`), chunk-level recall metric (`recall_at_*_chunk` siblings) + bench-only `payload.chunk_id`, `--reingest-coderag` flag on `supamem eval --suite coderag` (default OFF; rebuilds the bench collection via `supamem.chunker` entry-point keyed on `cfg.chunker`), Ollama warm-pool doctor panel (read-only; fires only when retrieval=tuned_hybrid_hyde), ADR-0002 §9 "Phase 17 uplift comparison" with three sibling sub-tables (default vs ast_on / hyde_on / ast_plus_hyde), HyDE violates D-LAT-01 hard ceiling on 4/5 cells (max p95 6069 ms) → opt-in-only verdict (defaults unchanged in 0.3.x; default-flip gated on v0.4), Phase 14+15+16 byte-identical regression locks preserved unchanged

## Translations

- [README (English, canonical)](README.md): The one PyPI renders
- [README zh-CN](README.zh-CN.md): Simplified Chinese
- [README es](README.es.md): Spanish
- [README ja](README.ja.md): Japanese
- [README ru](README.ru.md): Russian (the project author's native language)

## CLI commands

- [supamem init](README.md#cli-surface): Greenfield bootstrap — probes Qdrant, creates collection, writes `.supamem/config.toml`
- [supamem install](README.md#wiring-into-your-client): Patch a client config (claude-code, cursor, opencode); atomic, with backup. `--scope project` (default, per-workspace `.mcp.json` / `.cursor/mcp.json`) or `--scope user` (legacy global). `--enforce-search` (claude-code only) registers the opt-in PreToolUse edit-gate. v0.2.4a1+ proactively downloads all ML prerequisites (MiniLM ~90 MB, BM25 ~10 MB, mxbai-rerank-base-v2 ~1 GB) with `rich.progress`. Pass `--skip-models` to skip the eager ML model download (air-gapped first run; backfill later via `supamem repair`); `--no-skip-models` is the negated form. v0.2.5a1+ also auto-patches `~/.claude/agents/` and `<project>/.claude/agents/` to append `mcp__supamem__*` to restrictive `tools:` whitelists; opt out with `--skip-patch-agents`.
- [supamem repair](README.md#cli-surface): Doctor-driven self-heal: re-fetches a missing or partial reranker model, re-syncs `share/`, repairs managed CLAUDE.md/AGENTS.md blocks, restores client config, and re-applies subagent reachability patches (idempotent). Pairs with `supamem doctor` (diagnose) as the two canonical UX entry points. v0.2.4a1+ extends the v0.2.0 migrate-from-global-install path; original behavior preserved. Supports `--skip-patch-agents` (v0.2.5a1+).
- [supamem index](README.md#cli-surface): Embed dev memories into Qdrant using the locked tuned-hybrid pipeline (D-25). v0.2.2a1+ adds `--transcripts` (bare → uses `[supamem.transcript] default_root`, defaults to `~/.claude/projects/`) and `--transcripts <path>` (explicit) to ingest Claude Code session JSONL as Q+A drawer chunks; `--transcripts-only` skips the default project corpus; `--since Nd|Nh|0` filters transcripts by mtime (default 180d, `0` disables)
- [supamem mcp-server](README.md#cli-surface): Run the MCP server over stdio (default) or HTTP
- [supamem hook](README.md#cli-surface): Per-client session/edit hooks called by the client itself
- [supamem doctor](README.md#cli-surface): Probe Qdrant, resolve config chain, report version drift, surface update-check status
- [supamem stats](README.md#cli-surface): Welford schema-v2 usage counters
- [supamem live](README.md#-see-it-work--supamem-live): Real-time terminal dashboard tailing the audit JSONL — visibility into PreToolUse hook injections (v0.1.4+)
- [supamem migrate](README.md#cli-surface): Brownfield migration from an existing `dev_memory` collection
- [supamem eval](README.md#cli-surface): Run the bench harness. Emits an MTEB-style JSON envelope to `~/.supamem/eval/<utc-iso>.json`; goldens baselines are project-tunable via `[supamem.eval]` config; legacy `--regress` mode is preserved. Suites:
  - `--suite goldens`: bundled regression baseline (v0.1.2+)
  - `--suite longmemeval_s`: lazy-fetched LongMemEval_S (v0.3.0a2+); scoped + unscoped passes since v0.3.0a4, with a scoped-only gate decision (see [docs/adr/0001-scoped-only-bench-gate.md](docs/adr/0001-scoped-only-bench-gate.md)); DEMOTED to on-demand-only in v0.3.0a5 per ADR-0002
  - `--suite longmemeval_scoped_smoke`: bundled fixture (v0.3.0a4+), ≤5 questions, no lazy-fetch
  - **`--suite coderag [--full] [--out PATH] [--peer mem0] [--reingest-coderag]`**: code-shaped retrieval suite (v0.3.0a5+) and the new Phase 13 ship gate; two-repo deterministic haystack pinned to commit-SHAs, two axes (`code_fact` + `decision_rationale`), three-column reporting, optional mem0 peer row. In v0.3.0a7+, `--reingest-coderag` rebuilds the bench collection via the `supamem.chunker` entry-point keyed on `cfg.chunker`, which is required to exercise `tree_sitter_code`. See [docs/adr/0002-coderag-eval-philosophy.md](docs/adr/0002-coderag-eval-philosophy.md) and the [Phase 17 uplift §9](docs/adr/0002-coderag-eval-philosophy.md#9-phase-17-uplift-comparison)
- [supamem uninstall](README.md#cli-surface): Reverse `supamem install` cleanly
- [supamem unpatch-agents](README.md#cli-surface): Reverse subagent reachability patches recorded in `agent_patches.json`. Skips files the user has edited since (frontmatter SHA-256 match) with a per-file warning. Run BEFORE `pip uninstall supamem` — there is no portable pip/uv/pipx uninstall hook (v0.2.5a1+).

## MCP tools (v0.1.3+)

- `dual_memory_search(query, top_k=5, where=None)`: Hybrid (BM25+dense, RRF) retrieval over the project's Qdrant collection. Top-k, latency, summary. Response shape (v0.2.0+): each `Chunk` carries `text` (full intact payload) and `preview` (display-only excerpt capped at `mcp.caps.max_preview_chars`); top-level `SearchResult.clamped_to` is set when the server clamped requested `top_k`. v0.2.3a1+ accepts `where: dict[str, str | list[str]] | None` — optional Qdrant payload filter; AND across keys, OR within list values (`MatchAny`); single string → exact `MatchValue`. v1 documents `room` as the only key (one of: `backend`, `frontend`, `tests`, `docs`, `scripts`, `config`, `migrations`, `types`, or `null`). Examples: `where={"room": "backend"}` or `where={"room": ["backend", "tests"]}`. Unknown keys are passed through to Qdrant (forward-compat for Phase 9/11)
- `dual_memory_write`: Persist agent-authored memory — writes Markdown to `<project>/.claude/insights/_agent/<slug>.md` with YAML frontmatter, immediately upserts into Qdrant (wait=True), idempotent on topic via UUIDv5
- `qdrant_find(query, top_k=5, where=None)` (alias of dual_memory_search): Backward-compat for users coming from upstream `mcp-server-qdrant`. Inherits the same caps, response shape, AND the v0.2.3a1+ `where` payload filter byte-identically (D-17 alias parity)
- `qdrant_store` (alias of dual_memory_write): Same compat shim. Disable both aliases with `SUPAMEM_QDRANT_ALIASES=0`
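
The `where` semantics described above (AND across keys, OR within a list value, exact match for a single string) can be illustrated without a running Qdrant instance. The following is a minimal sketch in plain Python; the `matches` helper and the sample payloads are hypothetical and only model scalar payload fields such as `room`, not the actual Qdrant filter objects supamem builds.

```python
from __future__ import annotations

from typing import Any, Mapping, Union

Where = Mapping[str, Union[str, list[str]]]


def matches(payload: Mapping[str, Any], where: Where | None) -> bool:
    """Return True when `payload` satisfies every key in `where`.

    Keys are AND'd; a list value is an OR over its members (MatchAny-like);
    a single string is an exact match (MatchValue-like).
    """
    if not where:
        return True  # no filter: everything passes
    for key, wanted in where.items():
        allowed = wanted if isinstance(wanted, list) else [wanted]
        if payload.get(key) not in allowed:
            return False
    return True


# Hypothetical payloads, for illustration only.
chunks = [
    {"room": "backend", "file_path": "src/api/users.py"},
    {"room": "tests", "file_path": "tests/test_users.py"},
    {"room": None, "file_path": "notes/todo.md"},
]

backend_only = [c for c in chunks if matches(c, {"room": "backend"})]
backend_or_tests = [c for c in chunks if matches(c, {"room": ["backend", "tests"]})]
```

Here `backend_only` keeps one chunk and `backend_or_tests` keeps two; the `null`-room chunk is excluded by both filters.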

## MCP response caps (v0.2.0+)

Server-side hard caps on every retrieval response. Configured under the `[supamem.mcp.caps]` TOML table; surfaced in `supamem doctor` with config-source provenance.

- `mcp.caps.max_top_k` (default: 25) — server silently clamps requested `top_k` to this value; `SearchResult.clamped_to` is populated when clamping fires so callers can detect it
- `mcp.caps.max_query_chars` (default: 250) — Pydantic `Field(max_length=...)` baked into the tool schema at registration time; queries longer than the cap are rejected with a structured MCP validation error (no silent truncation, no stdout pollution)
- `mcp.caps.max_preview_chars` (default: 200) — display preview cap on each `Chunk.preview`; the full canonical payload in `Chunk.text` is never truncated
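
To make the clamp-versus-reject distinction concrete, here is a small sketch of how `max_top_k` and `max_query_chars` might behave. The `Caps` dataclass and both helpers are hypothetical; in particular, a plain `ValueError` stands in for the Pydantic schema validation the server actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Caps:
    """Illustrative mirror of the documented defaults."""

    max_top_k: int = 25
    max_query_chars: int = 250


def clamp_top_k(requested: int, caps: Caps) -> tuple[int, int | None]:
    """Return (effective_top_k, clamped_to); clamped_to is None when no clamp fired."""
    if requested > caps.max_top_k:
        return caps.max_top_k, caps.max_top_k
    return requested, None


def check_query(query: str, caps: Caps) -> str:
    """Reject over-long queries instead of silently truncating them."""
    if len(query) > caps.max_query_chars:
        raise ValueError(f"query exceeds {caps.max_query_chars} characters")
    return query
```

A caller asking for `top_k=100` would get 25 results and a populated `clamped_to`, while a 251-character query fails outright.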

## Visibility surfaces (v0.1.4+)

- `supamem live` CLI: Rich-Live terminal dashboard, real-time tail of the audit JSONL with rotation/resize/Ctrl-C handling and pipe-safe plain-JSONL fallback when stdout isn't a TTY
- `supamem hook session-start`: cross-client SessionStart banner injected via `additionalContext` (Claude Code) + `additional_context` (Cursor/OpenCode forks). Auto-detects calling client from `CLAUDECODE`/`OPENCODE`/`CURSOR_AGENT` env vars. Format: `🧠 supamem v<x.y.z> · <collection> · <N> chunks · audit <path>`. Fail-soft per hook discipline — never blocks session start

## MCP project-root resolution (v0.2.0+)

stdio MCP servers are often launched by hosts (Cursor, IDE wrappers) from a cwd that is NOT the workspace, which silently drops supamem to the default collection (`dev_memory_tuned_hybrid`) and produces Qdrant 404s when callers query the project's actual collection.

- `SUPAMEM_PROJECT_ROOT` (env var) — preferred, explicit. Auto-injected by `supamem install --scope project` into `<repo>/.mcp.json` (Claude Code) and `<repo>/.cursor/mcp.json` (Cursor) so the subprocess locates `.supamem/config.toml` regardless of cwd
- Parent-walk fallback — when the env var is unset, supamem walks parents from `Path.cwd()` looking for `.supamem/config.toml` or `pyproject.toml [tool.supamem]`. Stops at filesystem root or `$HOME` to avoid scanning above the user's home
- Stderr fallthrough warning — when neither the env var nor the parent-walk locate a project marker AND the resolved collection is still the shipped default, `supamem mcp-server --transport stdio` emits a one-line stderr warning (cwd inspected, env var presence — never values, fix command). Stdout stays JSON-RPC clean
- Verify with `supamem doctor` from the repo root: the resolved collection must match what the MCP client returns from `dual_memory_search`
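
The resolution order above (env var first, then a bounded parent walk) can be sketched as follows. This is an illustration under stated assumptions, not supamem's code: the function names are hypothetical, `$HOME` itself is assumed to be inspected before the walk stops, and the `pyproject.toml` check is a simplified line scan rather than a real TOML parse.

```python
from __future__ import annotations

from pathlib import Path


def has_marker(directory: Path) -> bool:
    """True if `directory` holds .supamem/config.toml or a pyproject.toml with [tool.supamem]."""
    if (directory / ".supamem" / "config.toml").is_file():
        return True
    pyproject = directory / "pyproject.toml"
    if not pyproject.is_file():
        return False
    # Simplification: look for a [tool.supamem] table header instead of parsing TOML.
    lines = pyproject.read_text(encoding="utf-8").splitlines()
    return any(line.strip().startswith("[tool.supamem") for line in lines)


def resolve_project_root(cwd: Path, home: Path, env: dict[str, str]) -> Path | None:
    """Env var first, then walk parents of `cwd`; stop at `home` or the filesystem root."""
    explicit = env.get("SUPAMEM_PROJECT_ROOT")
    if explicit:
        return Path(explicit)
    current = cwd.resolve()
    home = home.resolve()
    while True:
        if has_marker(current):
            return current
        if current == home or current.parent == current:
            return None  # reached $HOME or the filesystem root without a marker
        current = current.parent
```

A `None` result corresponds to the fallthrough case in which the stderr warning would be emitted.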

## Multi-project install + agent-discipline hooks (v0.2.0+)

- **Per-workspace install is the default** as of v0.2.0. `supamem install --client claude-code` writes to `<repo>/.mcp.json` (Anthropic project-scope MCP file, takes precedence over user-scope per docs). `supamem install --client cursor` writes to `<repo>/.cursor/mcp.json` (Cursor per-workspace, project-level wins on conflict). Use `--scope user` to keep legacy global writes (last install wins on multi-project machines).
- **`supamem repair`** is the migration verb for users on legacy global installs. Strips supamem from BOTH project AND user scopes (defensive uninstall) then re-installs at project scope from current cwd. Idempotent. Auto-detects clients when `--client` omitted. Forwards `--enforce-search`.
- **Claude Code edit-gate hook** (`supamem hook claude-code-gate`, opt-in via `supamem install --enforce-search`). Registers a PreToolUse `Edit|Write|MultiEdit` matcher that DENIES the tool call when no `mcp__supamem__dual_memory_search` (or `qdrant_find` alias) is logged in the session transcript since the last user turn (strategy A — strict per-turn). Reverse-scans the transcript JSONL with a 256 KB byte cap; emits Anthropic's `permissionDecision: deny` JSON contract on stdout. Override per-session with `SUPAMEM_GATE_DISABLE=1`.
- **Cursor `beforeSubmitPrompt` advisory hook** (`supamem hook cursor-advisory`). Cursor 1.7's hooks API has no fail-closed pre-edit event, so this is advisory-only: when the user's prompt looks edit-bound (regex over `fix|refactor|rename|implement|add|...`), emit `{"continue": true, "permission": "allow", "agentMessage": "..."}` reminding the agent to call `dual_memory_search` first. Override with `SUPAMEM_ADVISORY_DISABLE=1`. Auto-installed by Cursor installer alongside the existing sessionStart snapshot.

## SessionStart banner (v0.2.0 enriched)

Format: `🧠 supamem ✓ v0.2.0 · <collection> · <N> chunks · audit <path>` (additional `· update v0.X.Y available` segment when `update_check` cache reports a newer release).

- Health flag — single character right after `supamem`: `✓` healthy / `⚠` qdrant unreachable OR resolved collection is still the shipped default (legacy global-install / wrong-cwd failure mode)
- Update hint — cache-only read of `update_check.json`; never blocks session-open on network. Healing is NEVER automatic — the banner only signals; run `supamem repair` to act
- Suppress entirely with `SUPAMEM_BANNER_DISABLE=1`
- Suppress ONLY the user-visible terminal line (keep injecting context for the model) with `SUPAMEM_BANNER_QUIET=1`. v0.2.1+ emits `systemMessage` (Claude Code) and `user_message` (Cursor forward-compat) alongside `additionalContext` — Claude Code renders `systemMessage` as the `SessionStart:startup says: <line>` row in the terminal. Health flag `⚠` now also fires on per-client install drift detected by `supamem doctor` (managed-block version != running CLI version).

## Update-check (v0.1.1+)

- Daemon-thread GitHub Releases probe; 24h TTL cache at `platformdirs.user_cache_dir("supamem")/update_check.json`
- Stderr footer on next invocation when newer release available; never blocks
- Suppress with `SUPAMEM_NO_UPDATE_CHECK=1`, `NO_UPDATE_NOTIFIER=1`, or `CI=1`
- Visible in `supamem doctor` (current vs cached-latest, last-check timestamp, suppression env)

## Transcript ingestion (v0.2.2a1+)

- `supamem index --transcripts` (bare flag) ingests Claude Code session JSONL from `~/.claude/projects/` (or `[supamem.transcript] default_root`) as Q+A drawer chunks via the new `transcript` chunker entry-point. Pass an explicit path with `supamem index --transcripts /path/to/sessions/`. Mixed corpora dispatch per-suffix: `*.md` → `markdown_header`, `*.jsonl` → `transcript`.
- `--transcripts-only` skips the default project corpus and indexes only transcripts in the same run.
- `--since 30d` / `--since 12h` / `--since 0` — mtime filter on transcript JSONL; `0` disables. Defaults to `[supamem.transcript] since_days = 180`.
- New `[supamem.transcript]` config table (six keys; surfaced by `supamem doctor` with `[source: ...]` provenance):
  - `default_root` (str, default `~/.claude/projects/`)
  - `since_days` (int, default `180`)
  - `tool_payload_max_chars` (int, default `2000`) — tool-use payloads above this size are elided to a synthesis stub
  - `chunk_soft_max_tokens` (int, default `600`)
  - `include_paths_glob` (list[str], default `[]`)
  - `exclude_paths_glob` (list[str], default `[]`) — hand-exclude sensitive sessions before indexing
- New plugin entry-point: `transcript = "supamem.indexer.transcript.chunker:chunk_transcript"` under the `supamem.chunker` group. Signature: `chunk_transcript(text: str, *, source_path: Path, **kwargs) -> list[ChunkRecord]`.
- Per-message-uuid dedupe in the manifest (`__transcripts__` key) — re-running on an unchanged corpus reports `0 new, 0 changed`. Editing one message purges-then-reinserts only that chunk.
- ⚠ Transcripts may contain secrets — review your `~/.cache/supamem` Qdrant collection before sharing (no v1 redaction; see CHANGELOG security note for v0.2.2a1).
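
The `--since Nd|Nh|0` grammar is small enough to sketch directly. The parser below is a hypothetical illustration of that grammar and of an mtime window check; the function names are not taken from supamem.

```python
from __future__ import annotations

import re
from datetime import timedelta

_SINCE_RE = re.compile(r"^(\d+)([dh])$")


def parse_since(value: str) -> timedelta | None:
    """Parse `Nd`, `Nh`, or `0`; None means the mtime filter is disabled."""
    if value == "0":
        return None
    match = _SINCE_RE.match(value)
    if match is None:
        raise ValueError(f"expected Nd, Nh, or 0, got {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(days=amount) if unit == "d" else timedelta(hours=amount)


def is_recent_enough(mtime: float, now: float, window: timedelta | None) -> bool:
    """Keep a transcript when its mtime falls inside the window (or no window is set)."""
    return window is None or (now - mtime) <= window.total_seconds()
```

With `--since 12h`, a file modified one hour ago is kept and one modified thirteen hours ago is skipped; `--since 0` keeps everything.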

## Coding-path classifier (v0.2.3a1+)

Every indexed chunk gains a `payload.room` facet (string or `null`) via **exact path-component equality** — `set(Path(file_path).parts) ∩ set(keywords)` — never substring matching. A file at `data/chest_xray/img.png` is NEVER classified as `tests`. The `where` retrieval filter on `dual_memory_search` and `qdrant_find` (see MCP tools section) consumes this facet.

- `payload.room` — single string or JSON `null`. ALWAYS present on every point (uniform schema, D-06). v1 values: `backend`, `frontend`, `tests`, `docs`, `scripts`, `config`, `migrations`, `types`, or `null`.
- Priority is encoded by config order (first-match-wins, D-01a). Putting `tests` before `backend` makes `tests/backend/api_test.py` classify as `tests`.
- Hash-drift sweep — `manifest.classifier_hash = sha256(json.dumps(rooms, sort_keys=False))` captures both content AND priority order. On every `supamem index` run, if the stored hash differs from the current config hash, supamem scrolls the collection in batches and `client.set_payload({"room": new_room}, points=[ids], wait=True)` per-room — pure metadata update, **zero re-embedding cost** (D-08, D-09).
- Pre-v0.2.3 collections auto-migrate on first post-upgrade index invocation (missing `__classifier_hash__` → drift from null → one-time sweep).
- `supamem doctor` surfaces the active rooms map with `[source: ...]` provenance, the stored `classifier_hash`, and a per-room histogram (including a `null` bucket for unmatched chunks) — D-07, D-16.

`[supamem.classifier.rooms]` config table (default keyword map, priority order locked):

```toml
[supamem.classifier.rooms]
tests      = ["tests", "test", "__tests__", "spec", "specs"]
types      = ["types", "@types", "typings"]
migrations = ["migrations", "alembic", "schema"]
config     = ["config", "configs", ".github", "ci"]
scripts    = ["scripts", "bin", "tools"]
docs       = ["docs", "documentation"]
frontend   = ["frontend", "web", "client", "ui", "components", "pages"]
backend    = ["src", "backend", "api", "server", "lib"]
```

User TOML at `[supamem.classifier.rooms]` REPLACES the defaults dict (leaf-replace, not merge — matches the `transcript_*` precedent). Reordering rooms in the user config trips the sweep gate because `sort_keys=False` is intentional. Transcript chunks (chunker == "transcript") classify to `room = null` by construction — non-coding paths; filter them via the existing `payload.chunker` key instead.
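
The classification rule and the order-sensitive hash can be sketched in a few lines. This is an illustration only: `classify` and `classifier_hash` are hypothetical names, `PurePosixPath` is used so the example behaves the same on every platform, and the UTF-8 encoding step before hashing is an assumption.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import PurePosixPath

# Default keyword map from the TOML above; dict order encodes priority.
ROOMS: dict[str, list[str]] = {
    "tests": ["tests", "test", "__tests__", "spec", "specs"],
    "types": ["types", "@types", "typings"],
    "migrations": ["migrations", "alembic", "schema"],
    "config": ["config", "configs", ".github", "ci"],
    "scripts": ["scripts", "bin", "tools"],
    "docs": ["docs", "documentation"],
    "frontend": ["frontend", "web", "client", "ui", "components", "pages"],
    "backend": ["src", "backend", "api", "server", "lib"],
}


def classify(file_path: str, rooms: dict[str, list[str]]) -> str | None:
    """First room whose keywords intersect the path components exactly, else None."""
    parts = set(PurePosixPath(file_path).parts)
    for room, keywords in rooms.items():
        if parts & set(keywords):
            return room
    return None


def classifier_hash(rooms: dict[str, list[str]]) -> str:
    """Hash content AND order: sort_keys=False keeps the priority order in the digest."""
    encoded = json.dumps(rooms, sort_keys=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

Because matching is by whole path component, `contest/latest.py` stays unclassified even though both names contain `test`, and reversing the dict order changes both the hash and the room assigned to `tests/backend/api_test.py`.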

## Code-aware reranker (v0.2.4a1+)

Cross-encoder `mixedbread-ai/mxbai-rerank-base-v2` (Apache-2.0, ~1 GB) plugged into `tuned_hybrid` retrieval as the **new default** (`retrieval.reranker = "mxbai_v2"`). Setting `retrieval.reranker = "off"` restores pre-Phase-8 byte-identical behavior. When reranker is on: PREFETCH_LIMIT widens to 50 per arm; T-4 recency multiplier is skipped; T-5 dedup + T-8 token budget run AFTER rerank.

- New `supamem.reranker` plugin entry-point group — third parties register custom rerankers without forking. Registered: `mxbai_v2 = supamem.rerankers.mxbai_v2:MxbaiV2Reranker`. Plugin signature: `rerank(query: str, candidates: list[RetrievedChunk]) -> list[RetrievedChunk]`. Lazy model load on first `rerank()` call mirrors `embedders/_ensure()`.
- `RetrievedChunk` gains optional `rerank_score: float | None` field for telemetry; primary `score` carries the rerank score when reranker is on.
- `supamem doctor` `Reranker` panel: name, model_id, cache path (`platformdirs.user_cache_dir("supamem")/models/<model_id>/`), on-disk size + partial-download detection, last-load latency, last-100-query rerank p50/p95, detected device (cuda/mps/cpu).
- `[retrieval.reranker]` config table:
  - `name = "mxbai_v2" | "off"` (default `"mxbai_v2"`)
  - `top_n = 50` — rerank pool size; clamps to fused-candidate count
  - `prefetch_per_arm = 50` — widened from default 20 when reranker on
  - `batch_size = 16`
  - `model_id = "mixedbread-ai/mxbai-rerank-base-v2"`
- New env vars: `SUPAMEM_CACHE_DIR` (override the platformdirs cache root for tests/CI); `HF_HUB_OFFLINE=1` + `TRANSFORMERS_OFFLINE=1` (respected by `prepare()` — refuse network probes); `SUPAMEM_INTEGRATION_RERANKER=1` (opt-in integration test gate).
- New deps: `mxbai-rerank>=0.1.6,<0.2`, `huggingface_hub>=0.24`, `filelock>=3.13` (pulls `transformers>=4.49`, `torch>=2.0`, `accelerate>=1.5` transitively).
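
For readers considering a custom reranker, the plugin signature above can be satisfied by a very small class. The sketch below is a toy example, not a recommended ranking method: `RetrievedChunk` is a stand-in dataclass carrying only the fields this example needs, and `TokenOverlapReranker` is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RetrievedChunk:
    """Stand-in for supamem's type; only the fields this sketch needs."""

    text: str
    score: float
    rerank_score: float | None = None


class TokenOverlapReranker:
    """Toy reranker: scores each candidate by the fraction of query tokens it contains."""

    def rerank(self, query: str, candidates: list[RetrievedChunk]) -> list[RetrievedChunk]:
        query_tokens = set(query.lower().split())
        if not query_tokens:
            return list(candidates)  # nothing to score against; keep input order
        rescored = []
        for chunk in candidates:
            overlap = len(query_tokens & set(chunk.text.lower().split())) / len(query_tokens)
            # Primary score carries the rerank score, mirroring the documented behavior.
            rescored.append(replace(chunk, score=overlap, rerank_score=overlap))
        return sorted(rescored, key=lambda c: c.score, reverse=True)
```

A real plugin would be registered under the `supamem.reranker` entry-point group in the package metadata and would typically load its model lazily on the first `rerank()` call.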

## Subagent reachability (v0.2.5a1+)

Subagents (Claude Code agent definitions under `~/.claude/agents/*.md` and `<project>/.claude/agents/*.md`) inherit ONLY the tools listed in their frontmatter `tools:` whitelist. Plugins like GSD, superpowers, and hookify ship agents with restrictive whitelists (e.g. `Read, Edit, mcp__context7__*`) that exclude `mcp__supamem__*`, so the parent session having the supamem MCP server connected is irrelevant — code-touching subagents silently fail dual-memory lookups.

- **Auto-patcher** runs on `supamem install` and `supamem repair`. Idempotently appends `mcp__supamem__*` to any restrictive `tools:` whitelist that doesn't already cover supamem (broad `mcp__*`, `mcp__supamem__*`, or any specific `mcp__supamem__<tool>` literal counts as covered). Files with a missing or empty `tools:` line have full-inheritance per Claude Code semantics and are left untouched. Symlinked files are skipped with a warning to avoid polluting upstream repos. Style preserved (CSV vs YAML list) via `ruamel.yaml` round-trip.
- **`--skip-patch-agents`** opt-out flag on `install` / `init` / `repair`.
- **`supamem unpatch-agents`** reverses every recorded patch. Skips files the user has edited since the patch (frontmatter SHA-256 newline-normalized match) and emits a per-file warning naming them. Exit 0 even when nothing to restore.
- **Backup manifest** at `platformdirs.user_cache_dir("supamem") / agent_patches.json` (override the cache root with `SUPAMEM_CACHE_DIR`). Single rolling JSON, schema_version=1, FileLock-protected concurrent writes, atomic temp-and-rename. Per-entry: relative path, original frontmatter SHA, patched frontmatter SHA, original `tools:` value verbatim, timestamp, supamem version.
- **`supamem doctor` Subagent reachability panel** (between Reranker and Installed clients) lists per-agent status grouped by `[global]` and `[project]` scope: `patched (added mcp__supamem__*)`, `OK (already covered)`, `OK (full inheritance)`, `skipped: <reason>`, or `needs patching (run supamem repair)`. Renders the manifest path + an `unpatch-agents` reminder when a manifest exists. Read-only by construction; never flips the doctor exit code.
- **Uninstall contract** is a documented two-step (no portable pip/uv/pipx uninstall hook in 2026):

  ```bash
  supamem unpatch-agents      # restore agent whitelists first
  pip uninstall supamem       # then remove the package
  ```

  `supamem doctor` displays the manifest path and reminder so this flow is discoverable without docs.
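
The coverage rule the auto-patcher applies can be sketched as a pure function over the `tools:` value. This is a simplified illustration with hypothetical helper names; it ignores the frontmatter parsing, symlink checks, style preservation, and manifest bookkeeping described above.

```python
from __future__ import annotations

WILDCARD = "mcp__supamem__*"


def parse_tools(value: str | list[str] | None) -> list[str]:
    """Accept a CSV string, a YAML-style list, or a missing value."""
    if value is None:
        return []
    items = value.split(",") if isinstance(value, str) else value
    return [item.strip() for item in items if item.strip()]


def agent_status(value: str | list[str] | None) -> str:
    """Classify a whitelist the way the doctor panel labels it."""
    tools = parse_tools(value)
    if not tools:
        return "OK (full inheritance)"  # missing or empty `tools:` inherits everything
    if any(t == "mcp__*" or t.startswith("mcp__supamem__") for t in tools):
        return "OK (already covered)"
    return "needs patching"


def patched_tools(value: str | list[str] | None) -> list[str]:
    """Append the supamem wildcard only when the whitelist is restrictive and uncovered."""
    tools = parse_tools(value)
    if agent_status(value) == "needs patching":
        tools.append(WILDCARD)
    return tools
```

Running `patched_tools` on its own output leaves the list unchanged, which is the idempotence property the patcher relies on.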

## Per-source temporal validity (v0.3.0a1+)

Every indexed chunk carries a binary `valid_to` field: `null` ⇒ live; `≤ now()` ⇒ superseded (filtered out of every retrieval). Re-indexing a CHANGED file atomically scrolls the existing chunks for that path, marks them superseded via `set_payload(valid_to=now())`, then upserts new content-hash-keyed chunks with `valid_to=null`. Old and new chunks coexist in Qdrant; auto-GC at the end of `supamem index` deletes superseded chunks past `retention_days`. The retrieval-time filter is constructed at a single site (`retrieval/filters.py:build_qdrant_filter`) and inherited by every backend: `tuned_hybrid` (both Prefetch arms), `dense`, `bm25`, `qdrant_find`, `dual_memory_search`. The filter uses `IsEmptyCondition` on `valid_to`, NOT `IsNullCondition` (Qdrant#5342: `IsNull` does not match missing fields).

### Config keys

`[retrieval.temporal]` — universal binary `valid_to` invalidation:

- `retention_days` (int, default `90`) — auto-GC sweeps superseded chunks older than this at the end of `supamem index`. Set to `0` to disable auto-GC entirely (kept-forever escape hatch for compliance / audit collections).
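
The retention rule can be expressed as a single predicate. The sketch below is an assumption-laden illustration: `should_gc` is a hypothetical name, and it assumes a chunk's age is measured from its `valid_to` timestamp, which the text above does not state explicitly.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def should_gc(valid_to: datetime | None, now: datetime, retention_days: int) -> bool:
    """True when a superseded chunk has outlived the retention window."""
    if retention_days == 0:
        return False  # auto-GC disabled: keep superseded chunks forever
    if valid_to is None:
        return False  # live chunks are never collected
    return valid_to <= now - timedelta(days=retention_days)


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
stale = should_gc(now - timedelta(days=91), now, retention_days=90)
fresh = should_gc(now - timedelta(days=10), now, retention_days=90)
```

With the default of 90 days, `stale` is `True` and `fresh` is `False`.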

`[retrieval.recency.per_source.transcript]` — transcript-only opt-in decay (does NOT affect code / ADR / doc / null-room rankings):

- `enabled` (bool, default `false`) — gates the post-rerank decay multiplier.
- `half_life_days` (float, default `14.0`) — half-life of the multiplier; e.g. age=14d ⇒ multiplier=0.85 with α=0.7.
- `alpha` (float, default `0.7`) — floor of the multiplier; range `[0.0, 1.0]`. Validators reject out-of-range at boot.

Worked example with locked defaults, consistent with `alpha + (1 - alpha) * 0.5 ** (age_days / half_life_days)`: age 0d → 1.000; 7d → 0.912; 14d → 0.850; 28d → 0.775; ∞ → 0.700 (floor).
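
The multiplier can be sketched as a floored exponential decay. The formula below is an assumption inferred from the documented half-life, floor, and worked values; it is not quoted from the supamem source, and `recency_multiplier` is a hypothetical name. This form reproduces the 0d, 14d, 28d, and ∞ values exactly, and at 7d it evaluates to about 0.912.

```python
from __future__ import annotations


def recency_multiplier(age_days: float, half_life_days: float = 14.0, alpha: float = 0.7) -> float:
    """Assumed form: alpha + (1 - alpha) * 0.5 ** (age / half_life).

    `alpha` is the floor the multiplier approaches as age grows.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be within [0.0, 1.0]")
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    age = max(age_days, 0.0)  # treat future-dated chunks as brand new
    return alpha + (1.0 - alpha) * 0.5 ** (age / half_life_days)


for age in (0, 14, 28):
    print(f"{age:>3}d -> {recency_multiplier(age):.3f}")
```

Setting `alpha = 1.0` makes the multiplier constant at 1.0, which is equivalent to leaving the decay disabled.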

### Doctor panel

`supamem doctor` adds a "Temporal validity" panel between Reranker and Subagent reachability:

- live / superseded / awaiting_gc / future_dated counts
- per-source breakdown (markdown_header / transcript / null)
- oldest + newest `valid_from` across the collection
- `retention_days` provenance line
- `validity_migration` provenance line (when manifest gate has tripped)

Read-only by construction; never flips the doctor exit code.

### Migration

First post-upgrade `supamem index` back-fills `valid_to=null` on legacy points (gated by manifest `__validity_migration__` reserved key, idempotent on subsequent runs). Defense-in-depth alongside the `IsEmpty` runtime filter.

## Bench harness (v0.3.0a2+)

Bundled offline benchmark harness gating Phase 13's public claims. Two suites, one MTEB-style JSON envelope, two-tier judge, milestone-acceptance verdict against the v0.1.5 baseline. **No measured numbers ship in v0.3.0a2** — this release ships the harness only; Phase 13 owns the public claim.

- `supamem eval --suite goldens|longmemeval_s [--full] [--judge heuristic|ollama:<model>] [--report json] [--out PATH] [--baseline vX.Y.Z] [--dataset-path PATH] [--verbose]` — run the harness; emits an MTEB-style JSON envelope to `~/.supamem/eval/<utc-iso>.json`. Default judge is heuristic (offline, fastembed-backed, no network). `EVAL_JUDGE_MODEL` env var overrides; SaaS prefixes (`openai|anthropic|cohere|mistral`) refused per the D-07 invariant (`assert_no_saas_llm_env()`).
- `supamem eval --suite goldens` — extends the v0.1.x bundled regression baseline (`src/supamem/eval/goldens/`) to the new envelope shape. `main_score = recall_at_5`.
- `supamem eval --suite longmemeval_s` — lazy-fetches LongMemEval_S (`xiaowu0162/longmemeval-cleaned`, ~3 GB) from a pinned HF revision SHA into `platformdirs.user_cache_dir("supamem")/datasets/longmemeval/<sha>/`. CI fast-path runs an axis-stratified seeded 10-question subset (`tests/eval/smoke_ids.json`); full ~500 QA run gated behind `--full`. `main_score = tokens_per_correct_answer`.
- `supamem eval --list-suites` — list registered suites + default judge tier.
- `supamem eval --regress` — legacy v0.1.x regression mode (preserved byte-identical for backward compatibility).
- `--dataset-path PATH` — fully air-gapped CI mirrors override; skips the HF fetch entirely.
- Optional extra: `pip install supamem[eval]` brings in `ragas==0.4.*`, `datasets`, and `huggingface_hub>=0.24` for the RAGAS triad (`context_precision`, `context_recall`, `answer_relevance`). Without it, RAGAS metrics report as `null` with a one-line `err_console` install hint — the harness never crashes on missing extra.
- New env vars: `EVAL_JUDGE_MODEL=ollama:<model>` (Tier 2 judge override; SaaS refused), `EVAL_DATASET_CACHE` (override the dataset cache root for CI / air-gapped mirrors).
- `supamem doctor` "Eval bench" panel surfaces dataset SHA drift vs the pinned revision, cache size, last-run timestamp, RAGAS extra availability, and the active baseline file. Read-only; never flips the doctor exit code.
- Baseline storage: `src/supamem/eval/baselines/v0.1.5.json` (checked-in JSON; ships with `_baseline_pending: true` until Phase 13 measures). Selected via `--baseline vX.Y.Z` (defaults to previous milestone tag).
- Report shape (D-REPORT-01): MTEB-style envelope keyed by `supamem_version`, `config_sha`, `collection`, `suite`, `dataset.{name,revision,n,subset_ids}`, `judge.{kind,model}`, `main_score`, `scores.{recall_at_5,context_precision,context_recall,answer_relevance,tokens_per_correct_answer,context_compression_ratio,input_tokens_p50,input_tokens_p95,write_cost}`, `by_axis`, `baseline.{version,delta}`, and `per_question` (verbose only).
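
The judge-selection rule (heuristic by default, `ollama:<model>` as the Tier 2 override, SaaS prefixes refused) can be sketched as a small parser. This is a hypothetical illustration: `parse_judge` is not supamem's `assert_no_saas_llm_env()`, and the model name in the usage line is a placeholder.

```python
from __future__ import annotations

SAAS_PREFIXES = ("openai", "anthropic", "cohere", "mistral")


def parse_judge(spec: str) -> tuple[str, str | None]:
    """Return (kind, model) for `heuristic` or `ollama:<model>`; refuse SaaS prefixes."""
    prefix, _, model = spec.partition(":")
    if prefix in SAAS_PREFIXES:
        raise ValueError(f"SaaS judge {prefix!r} is refused; use heuristic or ollama:<model>")
    if spec == "heuristic":
        return ("heuristic", None)
    if prefix == "ollama" and model:
        return ("ollama", model)
    raise ValueError(f"unrecognized judge spec: {spec!r}")


kind, model = parse_judge("ollama:example-model")  # placeholder model name
```

The returned pair maps onto the `judge.{kind,model}` fields of the report envelope.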

## Filtered retrieval backend (v0.3.0a3+)

Scoped+capped retrieval backend that wraps `tuned_hybrid` with a `where` filter and a per-hit preview cap. Selected by `[supamem.retrieval] backend = "filtered_dense"` (config-only switch, mirrors how `tuned_hybrid` / `dense` / `bm25` are selected). Registered via the existing `supamem.retrieval` plugin entry-point group.

### Retrieval backends (registered)

- `tuned_hybrid` — locked BM25 + MiniLM fusion (default; D-25)
- `dense` — MiniLM-only
- `bm25` — sparse-only
- `filtered_dense` (v0.3.0a3+) — hybrid wrapper around `tuned_hybrid` with backend-level `where` enforcement and per-hit preview cap; populates `RetrievedChunk.preview` from `[retrieval.filtered_dense] preview_chars`. Other backends leave `preview` unset (None) — MCP server populates it as today for those paths

### Config

- `[retrieval.filtered_dense] preview_chars` (int, default `240`): backend-level per-hit preview cap. `0` disables truncation entirely (preview becomes the full document text). Independent of the MCP transport cap `mcp.caps.max_preview_chars` (Phase 5), which still applies on top: the MCP server takes `min(transport_cap, len(chunk.preview))`. Both caps are on by default, and each can be disabled independently by setting it to `0`.
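
The interaction between the two caps is easiest to see as two truncation stages applied in sequence. The sketch below uses hypothetical helper names and plain string slicing; it models only the documented defaults and the `0`-disables rule.

```python
from __future__ import annotations


def cap(text: str, limit: int) -> str:
    """Truncate to `limit` characters; a limit of 0 disables truncation."""
    return text if limit == 0 else text[:limit]


def backend_preview(document: str, preview_chars: int = 240) -> str:
    """Stage 1: the filtered_dense backend builds the per-hit preview."""
    return cap(document, preview_chars)


def transport_preview(preview: str, max_preview_chars: int = 200) -> str:
    """Stage 2: the MCP server applies its transport cap on top."""
    return cap(preview, max_preview_chars)


document = "x" * 1000
both_on = transport_preview(backend_preview(document))
transport_off = transport_preview(backend_preview(document), max_preview_chars=0)
backend_off = transport_preview(backend_preview(document, preview_chars=0))
```

With both defaults, the visible preview is 200 characters; disabling only the transport cap yields 240, and disabling only the backend cap still yields 200.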

### MCP `where` parameter — magic keys (v0.3.0a3+)

`dual_memory_search` and `qdrant_find` accept a `where: dict[str, str | list[str]] | None` filter (Phase 7 D-04 contract — type unchanged). Beyond the Phase 7 `room` key, two magic keys are recognized in v0.3.0a3+:

- `path_prefix` (string or list of strings) — left-anchored exact path-segment match against the new `payload.path_prefixes: list[str]` payload field. Indexer builds the prefix list per chunk at index time (`Path(file_path).parts` accumulated as `["src", "src/supamem", "src/supamem/retrieval", "src/supamem/retrieval/filters.py"]`). Translates to `FieldCondition(key="path_prefixes", match=MatchValue(...))` for a string, `MatchAny(...)` for a list. `KeywordIndex` (`on_disk=True`) created at collection init, mirroring Phase 7 `room`. `path_prefix="src/supa"` does NOT match `src/supamem/...` because `"src/supa"` is not a stored prefix segment — only complete `/`-segment boundaries match (rejected: `MatchText` + Qdrant `prefix` tokenizer, which is token-anywhere, not left-anchored).
- `valid_to: "now"` — accepted as a no-op alias for the always-on temporal clause from Phase 9 (`build_qdrant_filter` already enforces it). Any other value raises `ValueError` with the message `"supamem rejects valid_to=<value>: temporal validity is always-on (Phase 9 D-FILTER-01); time-travel queries are out of scope. See CHANGELOG v0.2.x for context."` — surfaces via `err_console`.

Multiple `where` keys are AND'd; list values within a key are OR'd (`MatchAny`). All other `where` keys continue to use the existing key-name = payload-key-name pass-through path.

### Migration (D-PFX-06)

Legacy chunks (indexed before v0.3.0a3) lack `payload.path_prefixes`. The first post-upgrade `supamem index` runs a one-shot eager scroll-and-`set_payload` sweep that back-fills `path_prefixes` per chunk. This is a pure metadata update with **zero re-embedding cost**, and it is idempotent on subsequent runs. No `--force` reindex is required. Mirrors the Phase 7 D-08 classifier-hash sweep precedent.
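A minimal sketch of the prefix accumulation and of why the sweep is idempotent. It works on in-memory payload dicts rather than a live collection; the payload key `file_path` and both function names are assumptions made for illustration, and paths are assumed to be relative with `/` separators.

```python
from pathlib import PurePosixPath


def build_path_prefixes(file_path: str) -> list[str]:
    """Accumulate every complete path-segment prefix of a relative path."""
    parts = PurePosixPath(file_path).parts
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


def backfill_path_prefixes(payloads: list[dict]) -> int:
    """Add path_prefixes where missing; return how many payloads were updated."""
    updated = 0
    for payload in payloads:
        if "path_prefixes" in payload:
            continue  # already migrated, so a second run changes nothing
        payload["path_prefixes"] = build_path_prefixes(payload["file_path"])
        updated += 1
    return updated


prefixes = build_path_prefixes("src/supamem/retrieval/filters.py")
# "src/supamem" is a stored prefix; "src/supa" is not, so it can never match.
```

Because only complete segments are stored, an exact keyword match on `path_prefixes` gives left-anchored, segment-aligned behaviour without any text tokenizer.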

### Doctor surface

`supamem doctor` adds a "Filtered-dense backend" panel surfacing the resolved `preview_chars` value with `[source: ...]` provenance. Read-only by construction; never flips the doctor exit code.

## Bench harness where-filter pass (v0.3.0a4+)

Phase 14 fixes the LongMemEval_S bench methodology so that the indexer-side `where`-filter levers shipped in Phases 7 (`room`), 9 (`valid_to`, always-on), 11 (`path_prefix`), and 14 (`session_id`) are actually exercised when measuring `tokens_per_correct_answer`. The single call site at `runner.py:428` now emits BOTH passes per question; the smoke vs full split is still gated by the existing `smoke_ids` filter inside the same loop.

- **Scoped pass.** Per question, the runner builds `where = {"session_id": question.haystack_sessions}` and queries the dedicated bench collection (`supamem_eval_longmemeval_s`). Result lands at `result.scores.scoped` and `result.by_axis.<axis>.scoped`.
- **Unscoped pass.** Same iteration, no `where` filter. Result lands at `result.scores.unscoped` and `result.by_axis.<axis>.unscoped`. Reported for transparency only — never gates.
- **Gate decision (scoped-only).** `_compute_main_score` for `longmemeval_s` reads `scores.scoped.tokens_per_correct_answer` against the v0.1.5 baseline. Phase 13's −30% gate is decided strictly against this number. See [docs/adr/0001-scoped-only-bench-gate.md](docs/adr/0001-scoped-only-bench-gate.md).
- **v0.1.5 baseline re-captured.** `eval/baselines/v0.1.5.json` carries both `unscoped` and `scoped` sibling keys. The original devdocs-collection number (`1374.59`) is preserved as `legacy_devdocs_unscoped_tpca` for historical reference but does NOT gate. Pre-Phase-14 absolute numbers are not directly comparable to post-Phase-14 numbers — the corpus changed.
- **Reproducibility caveat.** Scoped numbers may not reproduce in default unscoped invocations of `dual_memory_search` / `qdrant_find` — users who want comparable numbers must pass `where={...}` against a collection whose chunks carry the matching payload. This is a methodology disclosure, not a defect.
- **FUTURE-24 (rerank composition rework).** Sibling unblocker, tracked separately. Phase 14's scoped pass runs with rerank OFF so that the measured scoped-vs-unscoped delta is attributable to scoping alone. Public claims about scoping gains must NOT assume that FUTURE-24 will close the gap further (D-FUT24-03).

### `payload.session_id` (bench-only, v0.3.0a4+)

The `session_id` payload field is set ONLY by the LongMemEval bench ingestion path (`supamem.eval.longmemeval_ingest`). User invocations of `supamem index` do NOT set this field — it is methodology infra, not a user-facing retrieval lever. `session_id` is NOT a magic key in `retrieval/filters.py`: it flows through the existing pass-through path (key-name = payload-key-name), with list values translated to `MatchAny` and single strings to `MatchValue`. Zero new branches in the filter dispatcher. The Phase 14 scoped pass uses `where={"session_id": [list]}` against the dedicated `supamem_eval_longmemeval_s` collection. See ADR-0001.

### Bundled smoke fixture

- `src/supamem/eval/datasets/longmemeval_scoped_smoke.json` — ≤5 questions, ≤200 KB, self-contained. Does NOT trigger the ~3 GB lazy fetch.
- New suite name: `longmemeval_scoped_smoke`. `suite_loader.py` dispatches to the bundled fixture for that suite.
- CI fast-path runs both unscoped + scoped against the smoke and asserts each within tolerance bounds. Zero impact on the existing `smoke_ids.json` 10-Q axis-stratified fast-path for `longmemeval_s`.

## CodeRAG live numbers + mem0 head-to-head (v0.3.0a6+)

Phase 16 ships the **live-stack baseline** for the coderag suite plus a **mem0 head-to-head peer row** with paired-bootstrap deltas. This release does NOT change the gate decision (Phase 13 still owns the verdict in a separate phase) — it ships the measurement infra + the live numbers that anchor ADR-0002 §7 and the new §8.

- **Auto-queries-from-manifest wiring in `--full`.** `_run_coderag` in full mode constructs records from `auto_queries.extract_pr_queries()` + `extract_adr_queries()` against the populated corpus manifest, NOT from `coderag_smoke.json`. Each record carries a `query_origin` field (`pr_title` / `adr_problem` / `adr_why`) and a `training_leakage_suspected` boolean. Smoke fixture continues to drive the default offline path unchanged.
- **`corpus.ensure_populated_manifest` lazy build-on-call.** Idempotent orchestrator that reads the bundled placeholder manifest, fetches+walks repos at pinned SHAs, and writes the realized manifest (with content-SHAs) to `platformdirs.user_cache_dir("supamem") / "coderag" / "manifest.json"`. Bundled manifest stays placeholder; user-cache copy holds the realized version. Re-runs on an unchanged corpus are byte-identical no-ops.
- **Live three-run variance-gated baseline (ADR-0002 §7 rewrite).** Phase 15's offline `< 0.005 ms` latencies and `1.000` recall floors (artefacts of the trivially-recovered 6-question smoke fixture) are removed. New floors derived from the live 21,235-chunk corpus across 3 successive runs (`mean − ε_ranking` for ranking metrics, `mean + ε_latency` for latency p95). ε per ADR-0002 §4 (`ε_ranking = max(stddev, 0.005)`, `ε_latency = max(0.05·mean, 5ms)`).
- **Latency hard ceiling 500 ms → 5000 ms one-shot adjustment (D-LAT-01).** The maximum measured live p95, 4593.35 ms on `decision_rationale.supamem_only`, sits at ~92% of the new 5000 ms ceiling. The previous 500 ms ceiling would have failed every cell against the live stack. ADR-0002 §7 carries an explicit reasoning paragraph documenting this as a one-shot adjustment, NOT a sliding scale; subsequent phases tighten or hold, never relax.
- **ADR-0002 §8 NEW — "Mem0 peer comparison".** Live head-to-head against mem0 default-config (`mem0ai==2.0.1`, HuggingFace `all-MiniLM-L6-v2`, `infer=False`) — 4 markdown tables (code_fact × {supamem_only, fastapi_only, combined} + decision_rationale × supamem_only) with `metric / supamem / mem0 / delta / ci_lower / ci_upper / qualitative` columns. Sign convention `mem0_vs_supamem`: positive delta = peer wins. Aggregate Phase 16-E tally: 9 wins / 21 ties / 0 losses across 30 cells; mem0 wins concentrate on the recall@k tail under the chunker-granularity caveat (mem0 ingested 2147 finer-grained records).
- **`metrics.paired_bootstrap_delta(samples_a, samples_b, n_resamples=10000, seed=42)`.** Pure-stdlib paired-bootstrap with percentile CI — no scipy dependency. 95% CI by default. Identical sample arrays produce delta=0 with CI bracketing zero.
- **Schema-compat — `peers: {}` and `comparisons: {}` always present (D-PEER-03).** Non-`--peer` envelopes emit empty dicts (NOT absent keys) so downstream consumers can safely `envelope["peers"].get("mem0")` without a KeyError. Backward-compatible with v0.3.0a5 envelopes that omitted these keys.
- **Phase 14 + Phase 15 byte-identical regression locks preserved unchanged.** `_run_goldens_legacy` (D-VEND-04) and `src/supamem/retrieval/filters.py` (D-QGEN-06; `repo` and `axis` remain pass-through with ZERO new branches) are still byte-identical.
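For readers unfamiliar with the technique, a pure-stdlib paired bootstrap with a percentile CI can be sketched as follows. This is not supamem's implementation: the return shape, the `confidence` parameter, and the sign convention (`delta = mean(b) - mean(a)`) are assumptions chosen for the example.

```python
import random
from statistics import fmean


def paired_bootstrap_delta(samples_a, samples_b, n_resamples=10000, seed=42,
                           confidence=0.95):
    """Paired bootstrap of mean(b - a) with a percentile confidence interval."""
    if len(samples_a) != len(samples_b) or not samples_a:
        raise ValueError("paired samples must be non-empty and of equal length")
    # Pairing: resample per-query differences, never the two arrays independently.
    diffs = [b - a for a, b in zip(samples_a, samples_b)]
    rng = random.Random(seed)  # seeded so repeated runs give identical intervals
    n = len(diffs)
    means = sorted(fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    alpha = (1.0 - confidence) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return {"delta": fmean(diffs), "ci_lower": lower, "ci_upper": upper}


identical = paired_bootstrap_delta([0.2, 0.5, 0.9], [0.2, 0.5, 0.9], n_resamples=1000)
# Every difference is zero, so the delta and both CI bounds are zero.
```

Resampling the differences keeps each query's two scores together, which is what makes the comparison paired.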

## coderag eval suite (v0.3.0a5+)

Phase 15 ships the `coderag` suite as the **new Phase 13 ship gate**, replacing the LongMemEval `tokens_per_correct_answer` gate (which was workload-misaligned: LongMemEval measures conversational long-term memory, while supamem indexes code chunks consumed by AI coding agents). Full rationale: [docs/adr/0002-coderag-eval-philosophy.md](docs/adr/0002-coderag-eval-philosophy.md).

- **CLI.** `supamem eval --suite coderag [--full] [--out PATH] [--peer mem0] [--reingest-coderag]` — runs the code-retrieval suite; emits a 2-axis × 3-column metric envelope with `report_schema_version="coderag.v1"`. `--full` runs against the full pinned corpus; `--peer mem0` adds a parallel mem0 row. `--reingest-coderag` (Phase 17 Plan B2, default OFF) drops the `supamem_eval_coderag` collection and rebuilds it via the `supamem.chunker` entry-point keyed on `cfg.chunker` (e.g. `tree_sitter_code`) BEFORE scoring — Phase 16 baseline byte-identical replay path is preserved when the flag is absent.
- **Plugin entry-point group.** `supamem.eval` (new in v0.3.0a5) — mirrors the four existing groups (retrieval / embedder / chunker / reranker). The `coderag` suite registers via `[project.entry-points."supamem.eval"] coderag = "supamem.eval.coderag:CodeRAGSuite"`. Third parties can register additional suites without forking.
- **Two-repo deterministic haystack.** `supamem` (self) + `fastapi` (external Python framework). Both pinned to commit-SHAs via `src/supamem/eval/datasets/coderag_corpus_manifest.json` — never tag, never track-main (D-HAY-03).
- **Two axes.** `code_fact` (PR-derived queries with file-modification gold) and `decision_rationale` (ADR Problem/Why-derived queries with ADR-cited gold; **supamem-only** at the v1 corpus pin per A-D-HAY-04 — fastapi has no `docs/adr/` directory, so the three-column reporting collapses on this axis: `fastapi_only=null`, `combined=supamem_only`).
- **Three-column reporting (D-HAY-02).** Every metric (`Recall@k` for k ∈ {1, 5, 10, 20}, `MRR`, `nDCG@10`, p50 + p95 latency) reported as `supamem_only` / `fastapi_only` / `combined` siblings per axis. Self-reference circularity audit-visible at a glance.
- **Ship gate (D-GATE-03).** No regression vs the measured baseline: ranking metrics ≥ baseline − ε; latency p95 ≤ baseline + ε **AND** ≤ the hard ceiling (500 ms as shipped in v0.3.0a5; raised to 5000 ms in v0.3.0a6 per D-LAT-01). ε is derived per metric: `ε_ranking = max(stddev_3runs, 0.005)`, `ε_latency = max(0.05 × mean, 5ms)`. Locked floors are in ADR-0002 §7.
- **mem0 peer baseline (D-DEF-01..04).** Single canonical default config; ingests source documents into its OWN Qdrant collection (`supamem_eval_coderag_mem0`, separate from `supamem_eval_coderag` per A-D-DEF-02). Reported as a parallel row, never gates. `pip install supamem[peers-mem0]` installs `mem0ai>=2.0,<3.0`.
- **LongMemEval demoted (D-GATE-05).** Full LongMemEval_S becomes on-demand-only. The 5-question `longmemeval_scoped_smoke` fixture (Phase 14) stays on PR-CI; full LongMemEval_S no longer gates releases.
- **doctor panel.** `supamem doctor` exposes coderag cache/manifest presence and the resolved bench-collection name (read-only; never flips exit code).
- **Bundled smoke fixture.** `src/supamem/eval/datasets/coderag_smoke.json` — 6 questions across both axes, ≤200 KB, self-contained. Powers offline PR-CI without live Qdrant.
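The ship-gate arithmetic can be made concrete with a short sketch. The function names are hypothetical and the numbers are illustrative, not measured. Whether the standard deviation is the population or sample form is not stated above; the sketch assumes the population form (`statistics.pstdev`).

```python
from statistics import fmean, pstdev


def ranking_floor(runs: list[float]) -> float:
    """Floor for a ranking metric: mean minus max(stddev, 0.005)."""
    return fmean(runs) - max(pstdev(runs), 0.005)


def latency_ceiling_ms(p95_runs_ms: list[float]) -> float:
    """Ceiling for latency p95: mean plus max(5% of mean, 5 ms)."""
    mean = fmean(p95_runs_ms)
    return mean + max(0.05 * mean, 5.0)


def passes_gate(ranking_value: float, ranking_runs: list[float],
                p95_ms: float, p95_runs_ms: list[float],
                hard_ceiling_ms: float) -> bool:
    """No regression on ranking, and latency under both the soft and hard ceiling."""
    return (
        ranking_value >= ranking_floor(ranking_runs)
        and p95_ms <= latency_ceiling_ms(p95_runs_ms)
        and p95_ms <= hard_ceiling_ms
    )


# Three perfectly stable runs: the 0.005 and 5 ms minimums take over.
floor = ranking_floor([0.5, 0.5, 0.5])               # about 0.495
ceiling = latency_ceiling_ms([100.0, 100.0, 100.0])  # 105.0
```

The `max(...)` terms keep the tolerance from collapsing to zero when three runs happen to agree exactly.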

### `payload.repo` and `payload.axis` (bench-only, v0.3.0a5+)

The `repo` and `axis` payload fields are set ONLY by the coderag bench ingestion path (`supamem.eval.coderag.ingest`). User invocations of `supamem index` do NOT set these fields — they are methodology infra, not user-facing retrieval levers (mirrors the v0.3.0a4 `payload.session_id` precedent).

- `payload.repo` — string; values: `"supamem"`, `"fastapi"`. Used by the three-column reporting to issue `supamem_only` / `fastapi_only` / `combined` retrieval passes per query.
- `payload.axis` — string; values: `"code_fact"`, `"decision_rationale"`. Used by the per-axis metric aggregation.

Neither is a magic key in `retrieval/filters.py`: both flow through the existing pass-through path (key-name = payload-key-name), with list values translated to `MatchAny` and single strings to `MatchValue`. **Zero new branches in the filter dispatcher** (D-QGEN-06 byte-identical lock — `src/supamem/retrieval/filters.py` is hashed and verified). The coderag retrieval passes use `where={"repo": ["supamem"]}` / `where={"repo": ["fastapi"]}` / `where=None` per axis. See ADR-0002.
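As a rough illustration of how the three retrieval passes map onto the reporting columns, consider the sketch below. The `score` callable and `report_columns` are hypothetical stand-ins for the suite's scoring loop, and handling the `decision_rationale` collapse by skipping the other two passes is an assumption about one reasonable way to do it.

```python
from typing import Callable, Optional

# One `where` filter per reporting column, as described above.
COLUMN_WHERE: dict[str, Optional[dict]] = {
    "supamem_only": {"repo": ["supamem"]},
    "fastapi_only": {"repo": ["fastapi"]},
    "combined": None,
}


def report_columns(axis: str,
                   score: Callable[[Optional[dict]], float]) -> dict[str, Optional[float]]:
    """Score one metric per column; collapse the columns on decision_rationale."""
    if axis == "decision_rationale":
        # fastapi has no docs/adr/ directory, so there is nothing to score for it.
        supamem_only = score(COLUMN_WHERE["supamem_only"])
        return {"supamem_only": supamem_only, "fastapi_only": None,
                "combined": supamem_only}
    return {name: score(where) for name, where in COLUMN_WHERE.items()}


def demo_score(where: Optional[dict]) -> float:
    """Illustrative scorer: returns made-up values, not measured results."""
    return 0.5 if where else 0.7


code_fact_columns = report_columns("code_fact", demo_score)
rationale_columns = report_columns("decision_rationale", demo_score)
```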

## AST chunker + HyDE retrieval (opt-in plugins, v0.3.0a7+)

Phase 17 ships two **opt-in** retrieval-stack plugins plus a chunk-level recall metric, an Ollama warm-pool doctor panel, and ADR-0002 §9 — the paired-bootstrap uplift comparison vs the Phase 16 baseline-3. **Defaults are unchanged in the 0.3.x line.** Default-flip is gated on v0.4 per **D-LAT-01** (HyDE violates the 5000 ms p95 hard ceiling on 4/5 cells against the live corpus).

- **`tree_sitter_code` chunker (Req-02).** Registered under `supamem.chunker` alongside `markdown_header` and `transcript`. Opt-in via `pip install supamem[ast-chunker]` (`tree-sitter>=0.23,<0.26`, `tree-sitter-python>=0.23,<0.26`). Python only at v1. Token budget via `fastembed.TextEmbedding.token_count` (matches MiniLM). Parse errors fall back to `chunk_markdown` with an `err_console` warning (D-AST-03). When the extra is missing and the user has set `chunker = "tree_sitter_code"`, the lazy import raises a `RuntimeError` naming the fix command (D-PKG-02). Recall lift modest; stays opt-in.
- **`tuned_hybrid_hyde` retrieval (Req-03).** Registered under `supamem.retrieval` next to `tuned_hybrid` / `filtered_dense` / `dense` / `bm25`. Composition-over-inheritance — wraps `TunedHybridBackend` (kept byte-identical). Per query: POST to localhost Ollama `/api/generate` with the locked HyDE prompt (D-HYDE-01), `keep_alive=-1` warm-pool retention (D-HYDE-04), 600 ms timeout + 1 retry, fall back to original query on failure (D-HYDE-03). Localhost-only guard reused from `supamem.eval.judge._resolve_ollama_host` (`SystemExit(2)` on non-localhost — D-07 inherited). Verdict: meets the Track B `decision_rationale.supamem_only.recall_at_1 ≥ 0.5` goal exactly but **violates D-LAT-01 hard ceiling on 4/5 cells (max p95 6069 ms)** and produces a −0.25 MRR regression on `code_fact`. Opt-in only; no default-flip path; Phase 18 follow-up = selectivity gating by axis.
- **Chunk-level recall metric + `payload.chunk_id` (Req-01).** Envelopes carry `recall_at_*_chunk` siblings beside doc-level keys. `chunk_id = <rel_path>#<sha1(text)[:12]>`. New `_build_run_chunk` sibling does NOT dedup on duplicate doc_ids (Pitfall 4 fix). Doc-level path stays byte-identical — Phase 16 floors test still green.
- **`--reingest-coderag` flag on `supamem eval --suite coderag` (Req-09 / G5 wiring).** Default OFF — Phase 16 baseline byte-identical replay path preserved when absent. When ON: drops `supamem_eval_coderag` and rebuilds via `supamem.chunker` entry-point keyed on `cfg.chunker` (and retrieval keyed on `cfg.retrieval`) BEFORE scoring.
- **Ollama warm-pool doctor panel (Req-04 mitigation).** Fires only when `retrieval.backend = "tuned_hybrid_hyde"`. Probes `/api/ps` with 1s timeout; surfaces loaded model + load duration. Read-only — NEVER raises, NEVER flips exit code (D-DOCTOR-04). Same localhost guard as HyDE retrieval.
- **ADR-0002 §9 "Phase 17 uplift comparison" (Req-07).** Three sibling sub-tables (`### default vs ast_on` / `### default vs hyde_on` / `### default vs ast_plus_hyde`) carry paired-bootstrap deltas vs Phase 16 baseline-3, parser-locked by `tests/test_adr_phase17_uplift.py` (mirrors §8 ADR-as-test). **CIs collapse to `[delta, delta]`** because v1 LIVE envelope schema records means only (no per-query arrays) — delta values are exact, CI bounds do NOT reflect query-level uncertainty. Future envelope-schema bump unlocks real CIs without §9 structural change. `recall_at_*_chunk` is null across 17-E/F/G — common gold-chunk derivation gap (follow-up before any future §9-style write-up).
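The HyDE request-and-fallback behaviour described for `tuned_hybrid_hyde` can be sketched as below. This is an illustration under assumptions, not the shipped backend: `HYDE_PROMPT` is a placeholder (the locked D-HYDE-01 prompt is not reproduced here), `hyde_query` and `ollama_generate` are hypothetical names, and returning the generated passage as the text to search with is an assumption about how the result is used. The request body follows Ollama's `/api/generate` endpoint on its default local port.

```python
import json
import urllib.request
from typing import Callable

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address
HYDE_PROMPT = "Write a short passage that answers: {query}"  # placeholder prompt


def ollama_generate(prompt: str, model: str, timeout_s: float = 0.6) -> str:
    """One non-streaming generate call; keep_alive=-1 keeps the model loaded."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False, "keep_alive": -1}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read())["response"]


def hyde_query(query: str, generate: Callable[[str], str], retries: int = 1) -> str:
    """Return a hypothetical answer passage, or the original query on failure."""
    for _ in range(1 + retries):
        try:
            text = generate(HYDE_PROMPT.format(query=query)).strip()
        except Exception:
            continue  # timeout or connection error: use the retry, then give up
        if text:
            return text
    return query  # fall back to the original query
```

A caller would wire the two together with something like `hyde_query(q, lambda p: ollama_generate(p, model="<model-name>"))`; injecting `generate` also keeps the fallback path testable without a running Ollama.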

### Registered plugins (post-Phase-17)

- `supamem.chunker`: `markdown_header` (default), `transcript`, `tree_sitter_code` (opt-in, v0.3.0a7+).
- `supamem.retrieval`: `tuned_hybrid` (default), `dense`, `bm25`, `filtered_dense`, `tuned_hybrid_hyde` (opt-in, v0.3.0a7+).

## Anti-feature lock — no identity / wake-up / prelude tier (v0.3.0a3+)

supamem does NOT auto-inject identity / wake-up / prelude context into agent calls — retrieval is always solicited via an explicit query. There is no hidden "agent identity" tier, no SessionStart-time wake-up payload that pushes ambient context into the model, no MCP tool that fires retrieval when the `query` is empty.

Locked from two sides:

- **Schema-level (D-NOID-01.c, v0.3.0a3+)** — every retrieval tool's `query` Pydantic `Field` is `Field(..., min_length=1, max_length=max_q)` at both registration sites: canonical `dual_memory_search_tool` and `qdrant_find_alias` (D-17 anti-drift parity). Empty `query` rejected with a structured MCP validation error at the schema layer; whitespace-only `"   "` continues to be caught by the runtime `.strip()` defense-in-depth check (since `min_length=1` accepts a single-character whitespace string).
- **Test-level (FILT-02)** — `tests/test_no_identity_tier.py` is a CI-enforced regression test with three pinned assertions: (1) no registered MCP tool name matches `(?i)(wake[_-]?up|identity|prelude|inject)`; (2) every retrieval tool's JSON Schema has `query` in `required` with `properties.query.minLength >= 1`; (3) empty + whitespace-only queries are rejected at runtime. Name-based import lints (e.g. banning `*identity*` substrings) were rejected as theatre — defeated trivially by aliased imports; the regression test catches the only failure mode that matters.
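The three pinned assertions can be approximated with the standard library. This is a sketch in the spirit of `tests/test_no_identity_tier.py`, not a copy of it: the helper names are hypothetical, `wake_up_context` is an invented tool name used only as a negative example, and using `re.search` (a match anywhere in the name) is an assumption.

```python
import re

FORBIDDEN_NAME = re.compile(r"(?i)(wake[_-]?up|identity|prelude|inject)")


def forbidden_tool_names(names: list[str]) -> list[str]:
    """(1) Tool names that look like an identity / wake-up / prelude tier."""
    return [name for name in names if FORBIDDEN_NAME.search(name)]


def query_schema_ok(schema: dict) -> bool:
    """(2) `query` must be required and carry minLength >= 1 in the JSON Schema."""
    prop = schema.get("properties", {}).get("query", {})
    return "query" in schema.get("required", []) and prop.get("minLength", 0) >= 1


def query_rejected_at_runtime(query: str) -> bool:
    """(3) Empty and whitespace-only queries are rejected after .strip()."""
    return not query.strip()


bad_names = forbidden_tool_names(["dual_memory_search", "qdrant_find", "wake_up_context"])
schema = {"required": ["query"],
          "properties": {"query": {"type": "string", "minLength": 1}}}
schema_passes = query_schema_ok(schema)
```

The whitespace case shows why the runtime check is still needed: `"   "` satisfies `minLength >= 1` at the schema layer but is empty after `.strip()`.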

## Architecture

- [How it works](README.md#how-it-works): MCP server topology, hybrid retrieval, hook flow
- [Hybrid retrieval](README.md#features): Tuned BM25 + MiniLM fusion, locked schema D-25
- [Markdown chunker](README.md#features): Header-aware T-1 chunker, 200-token target / 250 soft max
- [Transcript chunker](README.md#transcript-ingestion-v022a1): Q+A drawer chunks from Claude Code session JSONL (v0.2.2a1+)

## Prerequisites

- [Python 3.12+](README.md#prerequisites): macOS / Linux / Windows install commands
- [Qdrant 1.10+](README.md#prerequisites): Docker, docker compose, or Qdrant Cloud
- [MCP-compatible client](README.md#prerequisites): Claude Code, Cursor, or OpenCode

## Optional

- [Contributing](README.md#contributing): Local dev setup with uv + pytest + ruff
- [SoftChat](https://app.softchat.ru): Russian-language AI chat platform — origin project
- [SoftSkillz](https://softskillz.ai): AI-first product engineering team
- [Qdrant docs](https://qdrant.tech/documentation/): Vector database upstream
- [Model Context Protocol](https://modelcontextprotocol.io/): MCP spec
- [uv](https://docs.astral.sh/uv/): Recommended Python package manager
