Metadata-Version: 2.4
Name: memir
Version: 0.4.0
Summary: Local memory for coding agents: deterministic zero-token writes, a first-class failure guard (never repeat a mistake), and a token-efficient briefing. Local CPU semantic recall — no API keys, no cloud.
Author: Nahian
License: MIT
Project-URL: Homepage, https://github.com/Nahian/memir
Project-URL: Repository, https://github.com/Nahian/memir
Project-URL: Issues, https://github.com/Nahian/memir/issues
Keywords: agent,memory,llm,coding-agent,mcp,model-context-protocol,failure-memory,context-window,local-first,rag,sqlite,embeddings,model2vec
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: model2vec>=0.3
Requires-Dist: numpy>=1.21
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Provides-Extra: nli
Requires-Dist: transformers>=4.40; extra == "nli"
Requires-Dist: torch>=2.2; extra == "nli"
Provides-Extra: enrich
Requires-Dist: llama-cpp-python>=0.3; extra == "enrich"
Requires-Dist: huggingface-hub>=0.20; extra == "enrich"
Dynamic: license-file

# 🧠 Memir

**The memoir your coding agent writes for itself.**
Long-term memory for coding agents that *doesn't* eat your context window.
Local-first. Zero API keys. And it **never lets the agent repeat a mistake.**

[![CI](https://github.com/Nahian/memir/actions/workflows/ci.yml/badge.svg)](https://github.com/Nahian/memir/actions/workflows/ci.yml)
![Python](https://img.shields.io/badge/python-3.10%2B-blue)
![License](https://img.shields.io/badge/license-MIT-green)
![Runs on CPU](https://img.shields.io/badge/runs%20on-CPU%20only-brightgreen)

> *Memir* = **mem**ory + **memoir** + **Mímir**, the Norse keeper of memory and
> wisdom. Your agent keeps a memoir; Memir is where it lives.

---

## The problem

Every coding agent forgets. Switch chats, hit the context limit, start a new
session — and it re-asks what it already knew, re-decides what you already
decided, and **repeats the exact mistake it made an hour ago.**

The common "fix" is to stuff the whole history back into the prompt every turn.
That burns thousands of tokens per request and *still* overflows as the project
grows.

## The idea

Memir stores three kinds of memory in a tiny local SQLite file:

| kind | what it captures |
|------|------------------|
| **fact** | durable truths about the project (endpoints, conventions, constraints) |
| **decision** | a choice **and the reason** for it, so it isn't relitigated |
| **failure** | what was **tried**, **why it failed**, and the **lesson** — a first-class memory |

Then it hands the agent a **token-budgeted briefing** at the start of a session
and a **failure guard** it can check *before* trying something.

> **Writing a memory costs 0 tokens and 0 API calls** — it's a local insert, sub-
> millisecond. Compare that to memory frameworks that run an LLM extraction call
> on *every* `add`.

---

## Why it's different

- **First-class FAILURE memory.** Most memory tools remember facts. Memir
  remembers *mistakes* — what failed, why, and what to do instead — and blocks
  the repeat. That's the load-bearing feature for a coding agent.
- **0-token, deterministic writes.** No LLM in the capture path. Nothing to bill,
  nothing to rate-limit, nothing to go down.
- **Local semantic recall by default.** Ships with a retrieval-tuned, numpy-only
  embedding model ([model2vec](https://github.com/MinishLab/model2vec)
  `potion-retrieval-32M`) that runs in ~1 ms on a **CPU** — no GPU, no torch, no
  API. Paraphrase matching works out of the box, and gracefully falls back to
  keyword recall if the model can't be loaded.
- **Anti-bloat.** Write-time dedup plus a *safe* `consolidate()` that only merges
  genuine restatements — it will **never** collapse two distinct facts (verified
  in the benchmark below).
- **Bounded context cost.** A naive "replay everything" agent grows without
  bound; the briefing here stays inside a token budget you set.
- **Plugs into any agent via MCP.** One stdio server works with Cursor, Claude
  Desktop, Cline, VS Code, Windsurf, …

---

## Install & run (one command)

```bash
pip install memir
```

That pulls everything — including the local model deps. To grab the model and
warm the cache up front:

```bash
memir setup      # downloads the CPU model, initialises the store
memir start      # starts the MCP server (stdio) for your editor/agent
```

Prefer a fully scripted bootstrap that creates a venv and self-installs?

```bash
./scripts/start.sh        # macOS / Linux  — first run sets everything up
.\scripts\start.ps1       # Windows PowerShell
```

Run it once and it self-installs; run it again and it just starts.

---

## Quickstart (Python)

```python
from memir import Memir

brain = Memir("my-project")          # local model loads automatically (CPU)

# remember things (0 tokens, sub-ms, no API)
brain.remember_fact("The billing job MUST run in UTC; our partner reports in UTC.")
brain.record_decision("Use SQLite FTS5 for recall.", reason="100x faster than scanning.")

# record a failure so it's never repeated
brain.record_failure(
    attempt="POSTed the whole batch to /v1/ingest in one request.",
    reason="The gateway silently drops bodies >256KB and returns an empty 200.",
    lesson="Chunk uploads under 256KB and verify each ack id.",
)

# BEFORE trying something, check the failure guard
if hits := brain.check_failure("send the full batch to /v1/ingest at once"):
    print("⛔", hits[0].lesson)   # -> Chunk uploads under 256KB and verify each ack id.

# start a fresh session fully informed, within a token budget
print(brain.briefing(budget_tokens=300))
```

### Auto-capture from real errors

```python
# parses the traceback, decides it's non-obvious, stores it as a failure:
brain.capture_error("RuntimeError: gateway returned 200 with empty body — silently dropped")

# a textbook typo is judged self-evident and skipped (no bloat):
brain.capture_error("NameError: name 'foo' is not defined")   # -> None

# or wrap risky work and capture whatever blows up:
with brain.watch("uploading batch to /v1/ingest"):
    client.upload(batch)
```

---

## Command line

```bash
memir remember "Deploys go out via the blue/green pipeline only."
memir decision "Adopt SQLite WAL" --why "concurrent reads during writes"
memir failure  "Dropped the prod table" --why "ran migration w/o backup" \
               --lesson "snapshot before every migration"
memir recall   "how do we deploy?"
memir check    "run the migration now"      # failure guard
memir briefing --budget 300
memir stats
memir doctor                                 # environment + MCP config check
```

---

## Use it from your editor (MCP)

Memir ships a Model Context Protocol server (stdio JSON-RPC, no SDK).

**Cursor / Windsurf / Claude Desktop** — add to your MCP config:

```json
{
  "mcpServers": {
    "memir": {
      "command": "memir",
      "args": ["start"],
      "env": {
        "MEMIR_PROJECT": "my-project",
        "MEMIR_DB": ".memir/memir.db"
      }
    }
  }
}
```

**VS Code** (`.vscode/mcp.json`):

```json
{
  "servers": {
    "memir": { "command": "memir", "args": ["start"] }
  }
}
```

Tools exposed: `remember_fact`, `record_decision`, `record_failure`,
`record_directive`, `recall`, `check_failure`, `briefing`, `capture_error`,
`stats`, `consolidate`.

`record_directive` stores a **standing user instruction** (e.g. *"keep working
until X is done; do not stop early"*). Directives are pinned to the top of every
`briefing` and are **never** dropped by the token budget, so the agent can never
drift from — or stop short of — what the user told it to do.

Set `MEMIR_EMBED=0` to force keyword-only mode (skips the model entirely).
Override the model with `MEMIR_MODEL=minishlab/potion-base-8M` (30 MB, faster).

---

## Benchmark (reproducible, honest)

`python benchmarks/benchmark.py` — synthetic multi-session fact-QA, fixed seed,
**fully local**, each fact restated 3× to expose bloat. Memir is measured
head-to-head against the realistic *local* baselines an agent could actually use:

| strategy | recall@1 | recall@3 | tokens / turn | rows kept |
|---|---|---|---|---|
| no-memory | 0% | 0% | ~10 | 0 |
| full-context (replay everything) | 100% | 100% | **6,906** | 600 |
| naive-rag (add-only vector store) | 96.5% | 99.5% | 35 | 1,800 |
| **Memir** | **100%** | **100%** | **36** | **1,200** |

- **192× fewer tokens per turn** than replaying the transcript, for the *same*
  answer availability.
- **Leads recall@3 (100%)** — matching full-context and beating the add-only
  vector baseline — at a **bounded** token cost.
- **Anti-bloat:** keeps ~33% fewer rows than the add-only store, *without* ever
  merging two distinct facts.
- **Write:** ~0.8 ms / memory, **0 tokens, 0 API calls**.
- **Recall:** ~8 ms / query.
- **Failure-repeat prevention:** 100% of known failures blocked on retry — which
  no other strategy here can do at all.

> **Honesty note.** The dataset is synthetic so the numbers reproduce on any
> machine. `naive-rag` is an idealized, *free* local stand-in for add-only memory
> stores (mem0-style read path) run **without** their per-write LLM extraction
> call — so its write cost here is a generous lower bound for those products. We
> deliberately **do not** print accuracy figures for mem0 / Zep / Letta, because
> running them needs API keys / cloud and we won't fabricate competitor results.

---

## How it stays lean

- **Write-time dedup** — identical memories are never stored twice.
- **`consolidate()`** — merges semantic near-duplicates **only** when they are
  both meaning-equivalent *and* lexically overlapping (a true restatement), so it
  can never silently delete a distinct fact.
- **`contradictions()`** — surfaces conflicting memories *non-destructively*
  (negation-aware, so it doesn't mistake a paraphrase for a conflict).
- **`prune()`** — decays cold facts/decisions, but **never** auto-drops a failure.
- **Tiers** — `session` / `project` / `global`; `global` memories survive pruning
  and can be `export_brain()`-ed and re-`import_brain()`-ed into other projects.

---

## Honest limitations

- **Contradiction detection is heuristic.** Telling "X is true" from "X is false"
  perfectly needs an NLI/LLM step; the built-in detector is a fast, negation-aware
  approximation that surfaces candidates for review rather than auto-resolving.
- **Consolidation is conservative on purpose.** It errs toward keeping two rows
  rather than risk merging two different facts — so a loosely-worded paraphrase of
  the same fact may survive as a separate row. That's the safe trade.
- **Auto-capture is intentionally conservative.** It skips self-evident errors by
  design. Immediately-visible mistakes (a `NameError` typo) are *not* stored.
- **The benchmark dataset is synthetic.** It measures the system's own properties
  honestly; it is not a head-to-head accuracy claim against other products.

---

## Development

```bash
pip install -e ".[dev]"
pytest -q
python benchmarks/benchmark.py
```

## License

MIT © 2026 Nahian
