Metadata-Version: 2.4
Name: mem-context
Version: 0.1.2
Summary: Multi-modal temporal memory MCP server — RAG engine over conversation history with weight-decay model and LLM-driven consolidation
License: MIT
Keywords: consolidation,lancedb,mcp,memory,rag,temporal-memory,vector-search
Requires-Python: >=3.11
Requires-Dist: httpx>=0.27.0
Requires-Dist: lancedb>=0.17.0
Requires-Dist: mcp[cli]>=1.0.0
Requires-Dist: ollama>=0.4.0
Requires-Dist: pyarrow>=16.0.0
Requires-Dist: pylance>=0.18.0
Requires-Dist: sentence-transformers>=3.0.0
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: watch
Requires-Dist: watchfiles>=0.21; extra == 'watch'
Description-Content-Type: text/markdown

<!-- mcp-name: io.github.turbyho/mem-context -->

# mem-context — Temporal Memory MCP Server

Multi-modal RAG engine for AI assistants. Stores conversation history, conclusions,
diffs, error traces, and other development artifacts in LanceDB with vector search,
multi-factor scoring, and an LLM-driven consolidation pipeline.

[![Python](https://img.shields.io/badge/python-3.11+-blue.svg)](https://python.org)
[![MCP](https://img.shields.io/badge/MCP-2024--11--05-green.svg)](https://modelcontextprotocol.io)
[![License](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)
[![Tests](https://img.shields.io/badge/tests-113%20passed-brightgreen.svg)](tests/)

## Why

AI assistants lose context between sessions. mem-context persists what matters —
decisions, patterns, bugs, architecture choices — and surfaces them when relevant
via vector search. Memories decay over time unless reinforced by repeated access,
mimicking human memory.

## Features

- **Vector search with dual backend** — LanceDB ANN index for fast approximate
  nearest-neighbor queries. Primary embedding via Ollama `mxbai-embed-large`
  (1024d, ~670 MB). Local `all-MiniLM-L6-v2` (384d) fallback when Ollama is
  unavailable — no GPU or network required. Embeddings are auto-padded to
  match schema dimension; switching backends is transparent.

- **Multi-factor relevance scoring** — six independent factors combine into a
  single relevance score. Each factor models a different aspect:
  `vector_score` (semantic similarity), `weight_score` (stored importance ×
  time decay), `recency_score` (age in days), `scope_score` (project match),
  `access_boost` (usage reinforcement), `type_boost` (permanent > semantic >
  episodic). The model balances "what's relevant" with "what's still valid."

- **Weight decay with natural memory model** — each memory type has a
  configurable `decay_rate`: 0.15/day for episodic (session captures fade
  fast), 0.03/day for semantic (extracted knowledge persists), 0 for
  permanent (never decays). Decay is exponential: `weight × e^(−rate × days)`.
  Frequently accessed memories get a counteracting boost — the system
  reinforces what you use, archives what you don't.

- **Deduplication by cosine similarity** — new memories are compared against
  existing ones before insertion. At similarity > 0.82, the new memory is
  merged into the existing one (weight boost + content update) instead of
  creating a duplicate. Prevents memory fragmentation from repeated captures
  of the same conclusion across sessions.

- **LLM-driven consolidation pipeline** — 3-phase: extract (3 days), merge
  (7 days), archive (30 days). The server prepares candidates and prompts;
  the **host model** (Claude, DeepSeek, GPT, or local Ollama) does the
  reasoning. Episodic session captures → extracted conclusions (semantic) →
  merged permanent knowledge → archived if unused. Runs in the background
  when `remember()` or `recall()` is called — no cron needed.

- **Multi-modal storage** — LanceDB columns for text content, code diffs,
  file lists, error traces, tags, and metadata. Each modality is indexed
  separately; vector search operates on the combined embedding. Stores not
  just "what happened" but the diff and stack trace that caused it.

- **Automatic conversation capture** — hooks for Claude Code (Stop event)
  and OpenCode (on_session_end). The wrapper binary finds the current
  session's transcript, parses it into structured messages, and imports
  them as episodic memories. No manual action needed — every session is
  archived automatically.

- **Portable export/import** — JSON export strips embeddings (re-generated on
  import), keeps all metadata. Use for backup, cross-device sync, or
  migrating between machines. Import deduplicates by ID — safe to run
  multiple times.

- **One-command provisioning** — `mem-context init` detects installed AI
  tools (Claude Code, OpenCode, Codex, Cursor), registers the MCP server,
  injects CLAUDE.md instructions, and installs six slash-command skills
  (`recall`, `remember`, `forget`, `delete`, `purge`, `status`).
  `mem-context install` adds capture hooks. Two commands, ready to use.
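The export/import round trip described under "Portable export/import" can be sketched as follows. This is a minimal illustration, not mem-context's implementation: the record fields `id`, `content`, and `vector` and the `embed` callable are hypothetical stand-ins, and a plain dict plays the role of the LanceDB table.

```python
import json
from collections.abc import Callable


def export_memories(records: list[dict]) -> str:
    """Serialize records to JSON, dropping the embedding field (hypothetical name 'vector')."""
    stripped = [{k: v for k, v in rec.items() if k != "vector"} for rec in records]
    return json.dumps(stripped, ensure_ascii=False, indent=2)


def import_memories(
    store: dict[str, dict],
    payload: str,
    embed: Callable[[str], list[float]],
) -> int:
    """Add records whose id is not yet in `store`; return how many were added."""
    added = 0
    for rec in json.loads(payload):
        if rec["id"] in store:
            continue  # dedup by ID: re-importing the same file is a no-op
        rec["vector"] = embed(rec["content"])  # embeddings are regenerated on import
        store[rec["id"]] = rec
        added += 1
    return added


if __name__ == "__main__":
    def fake_embed(text: str) -> list[float]:
        # Toy embedding for the demo only.
        return [float(len(text)), 1.0]

    payload = export_memories([{"id": "m1", "content": "use uv", "vector": [0.1, 0.2]}])
    store: dict[str, dict] = {}
    print(import_memories(store, payload, fake_embed))  # 1
    print(import_memories(store, payload, fake_embed))  # 0
```

Because dropping embeddings keeps the file model-agnostic, the importing machine can use whichever embedding backend it has.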

## Installation

### Linux

```bash
# 1. System dependencies
sudo pacman -S python3 python-pip  # Arch / Manjaro
# or
sudo apt install python3 python3-pip python3-venv  # Debian / Ubuntu
# or
sudo dnf install python3 python3-pip  # Fedora

# 2. Install Ollama (for embedding)
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &  # Start Ollama in background (skip if the installer set up a systemd service)

# 3. Install mem-context
python3 -m venv ~/.mem-context/.venv
~/.mem-context/.venv/bin/pip install mem-context

# 4. Add to PATH (this appends to ~/.zshrc; use ~/.bashrc instead if your shell is bash)
echo 'export PATH="$HOME/.mem-context/.venv/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

# 5. Pull embedding model (~670 MB)
ollama pull mxbai-embed-large

# 6. Provision — registers MCP server + injects instructions
mem-context init                          # all detected AI tools
# or target a single tool:
mem-context init --tool claude-code       # Claude Code only
mem-context init --tool opencode          # OpenCode only

# 7. Install capture hooks (Claude Code, OpenCode)
mem-context install claude-code
mem-context install opencode       # optional
mem-context install status         # verify

# 8. Restart your AI assistant
```

### macOS

```bash
# 1. System dependencies
brew install python@3.11

# 2. Install Ollama
brew install ollama
# Start Ollama: open Ollama.app or run `ollama serve &`

# 3-8. Same as Linux steps 3-8; use Homebrew's python3.11 so the venv gets Python 3.11+
python3.11 -m venv ~/.mem-context/.venv
~/.mem-context/.venv/bin/pip install mem-context
echo 'export PATH="$HOME/.mem-context/.venv/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
ollama pull mxbai-embed-large
mem-context init
mem-context install claude-code
```

### Verify installation

```bash
# Check CLI works
mem-context status

# Check Ollama + embedding model
mem-context init --check-ollama

# List detected AI tools
mem-context init --list-tools

# Check capture hooks
mem-context install status
```

### Manual MCP registration

If `mem-context init` can't register the MCP server automatically:

**Claude Code:**
```bash
claude mcp add --scope user mem-context ~/.mem-context/.venv/bin/mem-context-mcp
```

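**OpenCode:** Add to `~/.config/opencode/opencode.json`, replacing `$HOME` with the absolute path to your home directory (the string is not expanded by a shell):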
```json
{
  "mcp": {
    "mem-context": {
      "command": ["$HOME/.mem-context/.venv/bin/mem-context-mcp"],
      "enabled": true,
      "type": "local"
    }
  }
}
```

## Usage

### MCP tools (from AI assistant)

| Tool | Description |
|------|-------------|
| `remember(content, type?, weight?, tags?)` | Store a memory with auto-embedding |
| `recall(query, scope?, token_budget?, min_score?, limit?, type_filter?)` | Vector search with scoring |
| `forget(id)` | Archive (weight=0) |
| `get(id)` | Retrieve one memory |
| `update(id, fields)` | Modify metadata |
| `status(scope?)` | Memory store statistics |
| `review()` | Flagged memories |
| `consolidation_candidates(scope?)` | Consolidation tasks for host model |

### CLI

```bash
mem-context status                          # Store statistics
mem-context recall "query" --limit 5        # Search memories
mem-context get <id>                        # One memory
mem-context forget <id>                     # Archive
mem-context review                          # Flagged memories
mem-context consolidate --dry-run           # Consolidation candidates
mem-context capture transcript <path>       # Import conversation
mem-context export -o memories.json         # Export all memories
mem-context import memories.json --re-embed # Import from export
mem-context init --list-tools               # Show AI tools
mem-context install status                  # Hook status
```

## How It Works

### Write path: capture → store → embed

```
Session ends
  → capture hook fires (Stop / on_session_end)
  → transcript parsed into structured messages
  → each message stored as episodic memory
  → content embedded via Ollama (1024d) or local model (384d)
  → cosine similarity check: > 0.82 → merge, else insert
```
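The merge-or-insert decision in the last step can be sketched as below. Assumptions: memories are plain dicts with hypothetical `vector`, `weight`, and `content` keys; candidates come from a linear scan rather than the LanceDB ANN search; and the weight boost of 0.1 (capped at 1.0) is an illustrative value, since this README only says the weight is boosted.

```python
import math

DEDUP_THRESHOLD = 0.82  # similarity threshold stated in this README


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def store_or_merge(existing: list[dict], new: dict, boost: float = 0.1) -> str:
    """Merge `new` into its closest neighbour if similar enough, else append it."""
    best, best_sim = None, -1.0
    for mem in existing:
        sim = cosine_similarity(mem["vector"], new["vector"])
        if sim > best_sim:
            best, best_sim = mem, sim
    if best is not None and best_sim > DEDUP_THRESHOLD:
        best["weight"] = min(1.0, best["weight"] + boost)  # boost size is an assumption
        best["content"] = new["content"]  # keep the latest wording
        return "merged"
    existing.append(new)
    return "inserted"
```

For example, `[1.0, 0.0]` and `[0.99, 0.1]` have a cosine similarity of about 0.995, so the second capture is merged into the first instead of being stored again.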

### Read path: query → embed → search → score → return

```
recall("how do we handle auth?")
  → query embedded to 1024d vector
  → LanceDB ANN search (scope-filtered: same project + global)
  → raw candidates scored by 6-factor formula
  → sorted by final_score, filtered by min_score
  → token-budgeted: results accumulated until budget exhausted
  → returned to host model for use
```

### Consolidation path: age → candidate → LLM → write-back

```
remember() or recall() called
  → time since last_consolidation > interval_hours (default 24h)?
  → build_task: scan for episodic > 3d, semantic clusters > 7d
  → send prompts + candidates to host model
  → host model extracts conclusions → new semantic memories
  → host model merges similar semantics → permanent
  → low-weight (< 0.1) memories archived (weight = 0)
```

The host model does all reasoning — the server only prepares structured
prompts and candidate lists. This means consolidation quality scales with
the host model's capability (Fable 5 > Opus > Sonnet > local Ollama).

## Architecture

```
mem-context/src/mem_context/
├── storage/lance.py        LanceDB CRUD, ANN search, FTS, export/import
│   schemas.py              PyArrow schemas: memories, relations, conversations
├── retrieval/embedder.py   Dual-backend embedding (Ollama + local fallback)
│   scoring.py              6-factor scoring: vector × weight × decay × …
├── capture/formats.py      Transcript parsers: Claude Code, OpenCode, JSON, generic
│   wrapper.py              Hook entry-point: finds transcript, runs capture
├── consolidation/
│   pipeline.py             Build tasks, run extract/merge/archive phases
│   templates.py            Prompt templates for each consolidation phase
│   ollama.py               Local model fallback for LLM tasks
├── mcp/server.py           FastMCP server: 10 tools (remember, recall, forget, …)
├── provision.py            AI tool detection, CLAUDE.md injection, skill install
├── config.py               YAML + env config with auto-detection
└── scope.py                Project scope resolution (config → path hash → global)
```

## Scoring

```
final = vector_score × weight_score × recency_score × scope_score × access_boost × type_boost

vector_score = exp(-cosine_distance)
weight_score = sqrt(weight × e^(-decay_rate × days))
recency_score = e^(-recency_decay_rate × days)
  recency_decay_rate = permanent: 0.005, semantic: 0.02, episodic: 0.05
scope_score   = same_project: 1.0, global: 0.8, other: 0.4
access_boost  = min(2.0, 1.0 + 0.1 × access_count)
type_boost    = permanent: 2.0, semantic: 1.2, episodic: 1.0
```
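A direct Python transcription of the formula, useful for checking how the factors trade off. Assumptions: the `decay_rate` values come from the memory-types table, the same `days` (memory age) feeds both weight decay and recency, and the function name and signature are illustrative rather than the API of `scoring.py`.

```python
import math

DECAY_RATE = {"permanent": 0.0, "semantic": 0.03, "episodic": 0.15}
RECENCY_DECAY = {"permanent": 0.005, "semantic": 0.02, "episodic": 0.05}
TYPE_BOOST = {"permanent": 2.0, "semantic": 1.2, "episodic": 1.0}
SCOPE_SCORE = {"same_project": 1.0, "global": 0.8, "other": 0.4}


def final_score(
    cosine_distance: float,
    weight: float,
    mem_type: str,
    days: float,
    scope: str,
    access_count: int,
) -> float:
    vector_score = math.exp(-cosine_distance)
    weight_score = math.sqrt(weight * math.exp(-DECAY_RATE[mem_type] * days))
    recency_score = math.exp(-RECENCY_DECAY[mem_type] * days)
    access_boost = min(2.0, 1.0 + 0.1 * access_count)  # capped after 10 accesses
    return (
        vector_score
        * weight_score
        * recency_score
        * SCOPE_SCORE[scope]
        * access_boost
        * TYPE_BOOST[mem_type]
    )
```

Because `access_boost` and `type_boost` can each reach 2.0, the product as written can exceed 1.0: a fresh, exact-match permanent memory in the current project scores 2.0.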

## Memory types

| Type | Default weight | Decay rate | Use |
|------|---------------|------------|-----|
| `episodic` | 0.5 | 0.15/day | Session captures, debugging |
| `semantic` | 0.7 | 0.03/day | Extracted conclusions, patterns |
| `permanent` | 1.0 | 0.0 | Architecture decisions, conventions |
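To make the decay rates concrete: with the default weights above and no reinforcement, this sketch computes how long each type takes to fall below 0.1, the archive threshold used by the consolidation pipeline. It applies the `weight × e^(−rate × days)` formula from this README; the helper names are hypothetical.

```python
import math

# (default weight, decay rate per day), from the memory-types table
DEFAULTS = {
    "episodic": (0.5, 0.15),
    "semantic": (0.7, 0.03),
    "permanent": (1.0, 0.0),
}


def decayed_weight(mem_type: str, days: float) -> float:
    weight, rate = DEFAULTS[mem_type]
    return weight * math.exp(-rate * days)


def days_until_below(mem_type: str, threshold: float = 0.1) -> float:
    """Solve weight * e^(-rate * t) = threshold for t."""
    weight, rate = DEFAULTS[mem_type]
    if weight <= threshold:
        return 0.0
    if rate == 0.0:
        return math.inf  # permanent memories never decay
    return math.log(weight / threshold) / rate


for t in DEFAULTS:
    print(f"{t:9s} {days_until_below(t):6.1f} days")
# episodic    10.7 days
# semantic    64.9 days
# permanent    inf days
```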

## Consolidation pipeline

| Phase | Trigger | Action |
|-------|---------|--------|
| Extract | 3 days | Episodic → host model extracts conclusions → semantic |
| Merge | 7 days | Semantic cluster by embedding → host model merges |
| Archive | 30 days | weight < 0.1 → weight = 0 |

The server prepares prompts and candidates; the **host model** (Claude, DeepSeek,
GPT) does the reasoning and writes results back via MCP tools.
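A minimal sketch of how candidates might be bucketed into the three phases using the default thresholds. Assumptions: the `Memory` dataclass and `plan_phases` function are hypothetical, `weight` is the already-decayed weight, and the embedding-based clustering that forms merge groups is left out, so `max_merge` here caps individual candidates rather than groups.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    id: str
    type: str        # "episodic" | "semantic" | "permanent"
    weight: float    # current (decayed) weight; 0 means archived
    age_days: float


def plan_phases(
    memories: list[Memory],
    extract_after_days: float = 3,
    merge_after_days: float = 7,
    archive_after_days: float = 30,
    archive_below: float = 0.1,
    max_extract: int = 20,
    max_merge: int = 10,
) -> dict[str, list[str]]:
    extract = [m.id for m in memories
               if m.type == "episodic" and m.weight > 0 and m.age_days > extract_after_days]
    merge = [m.id for m in memories
             if m.type == "semantic" and m.weight > 0 and m.age_days > merge_after_days]
    archive = [m.id for m in memories
               if 0 < m.weight < archive_below and m.age_days > archive_after_days]
    return {
        "extract": extract[:max_extract],  # sent to the host model for conclusions
        "merge": merge[:max_merge],        # clustering into groups omitted here
        "archive": archive,                # rule-based: weight set to 0
    }
```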

### Automatic background consolidation

No cron needed — consolidation runs automatically in the background
when `remember()` or `recall()` is called, at most once per `interval_hours`
(default 24h).

### Configuration

All parameters are configurable via `~/.mem-context/config.yaml`,
`.mem-context/config.yaml`, or environment variables. See
[Configuration docs](docs/configuration.md) for all options.

```bash
# Quick overrides
export MEM_CONTEXT_CONSOLIDATION_MODEL=qwen2.5-coder:14b  # model
export MEM_CONTEXT_CONSOLIDATION_TEMPERATURE=0.1           # 0.0-1.0
export MEM_CONTEXT_CONSOLIDATION_TIMEOUT=300               # seconds
```

| Parameter | Default | Env var (`MEM_CONTEXT_` + name) | Description |
|-----------|---------|---------|-------------|
| `model` | auto-detect | `CONSOLIDATION_MODEL` | 14b→7b→3b, or override |
| `num_ctx` | 8192 | `CONSOLIDATION_NUM_CTX` | Context window tokens |
| `temperature` | 0.2 | `CONSOLIDATION_TEMPERATURE` | Determinism (0.0–1.0) |
| `timeout` | 120s | `CONSOLIDATION_TIMEOUT` | Ollama API timeout |
| `extract_after_days` | 3 | `CONSOLIDATION_EXTRACT_AFTER_DAYS` | Episodic → extraction |
| `merge_after_days` | 7 | `CONSOLIDATION_MERGE_AFTER_DAYS` | Semantic → merge |
| `archive_after_days` | 30 | `CONSOLIDATION_ARCHIVE_AFTER_DAYS` | Low weight → archive |
| `max_extract` | 20 | `CONSOLIDATION_MAX_EXTRACT` | Candidates per run |
| `max_merge` | 10 | `CONSOLIDATION_MAX_MERGE` | Merge groups per run |
| `interval_hours` | 24 | — | Hours between runs |
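The env-var names in the table appear to take the `MEM_CONTEXT_` prefix, as in the quick-override examples above. This sketch applies such overrides on top of the defaults and coerces each value to the default's type. `apply_env_overrides` is hypothetical, and YAML loading is omitted to keep it dependency-free.

```python
import os
from collections.abc import Mapping

DEFAULTS: dict[str, object] = {
    "model": None,          # None = auto-detect
    "num_ctx": 8192,
    "temperature": 0.2,
    "timeout": 120,
    "extract_after_days": 3,
    "merge_after_days": 7,
    "archive_after_days": 30,
    "max_extract": 20,
    "max_merge": 10,
    "interval_hours": 24,
}
NO_ENV = {"interval_hours"}  # the table lists no env var for this key


def apply_env_overrides(
    cfg: dict[str, object], env: Mapping[str, str] = os.environ
) -> dict[str, object]:
    out = dict(cfg)  # never mutate the defaults
    for key, default in cfg.items():
        if key in NO_ENV:
            continue
        name = "MEM_CONTEXT_CONSOLIDATION_" + key.upper()
        if name not in env:
            continue
        raw = env[name]
        # Coerce to the default's type (int/float); keep strings as-is.
        out[key] = raw if default is None else type(default)(raw)
    return out
```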

### Model auto-detection

If no model is configured, the system:
1. Detects GPU VRAM (NVIDIA, AMD, macOS Metal/Apple Silicon)
2. Picks the best model that fits: `14b` (9+ GB) → `7b` (5+ GB) → `3b` (4+ GB)
3. Auto-pulls it via Ollama if not installed
4. Falls back to smaller model on OOM errors

**No GPU:** Minimum `qwen2.5-coder:3b` (~4 GB system RAM, slow on CPU).
**MCP path doesn't need a local model** — host LLM does the work.

## Scope detection

```
1. .mem-context/config.yaml → project_id → scope = "proj:" + hash
2. Fallback → scope = "path:" + hash(cwd)
3. scope = "global" is explicit-only; it is never auto-detected
```

## Requirements

- Python 3.11+
- Ollama (for embedding) — `mxbai-embed-large` (~670 MB, recommended)
- Or: `sentence-transformers` local fallback (`all-MiniLM-L6-v2`, 384d)
- Consolidation model: auto-detected and auto-installed (see above)
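The two embedding backends produce different sizes (1024d from Ollama, 384d locally), and the feature list says embeddings are auto-padded to the schema dimension. The README does not say which pad value is used, so this sketch assumes zero-padding:

```python
import math


def pad_embedding(vec: list[float], dim: int = 1024) -> list[float]:
    """Right-pad with zeros up to `dim` (zero padding is an assumption)."""
    if len(vec) > dim:
        raise ValueError(f"embedding has {len(vec)} dims, schema allows {dim}")
    return vec + [0.0] * (dim - len(vec))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
```

Zero-padding leaves dot products and norms unchanged, so cosine similarity between two padded 384d vectors is the same as before padding. It does not make a 384d vector from `all-MiniLM-L6-v2` comparable to a 1024d vector from `mxbai-embed-large`, because the two models embed text into unrelated spaces.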

## Installation options

### `mem-context init` — instructions + skills (all supported AI tools)

```bash
mem-context init                    # All detected AI tools
mem-context init --tool claude-code # Claude Code only
mem-context init --tool opencode    # OpenCode only
mem-context init --tool codex       # Codex only
mem-context init --tool cursor      # Cursor only (project-scoped)
mem-context init --dry-run          # Preview without changes
mem-context init --list-tools       # Show what's detected
```

### `mem-context install` — capture hooks (2 tools)

```bash
mem-context install claude-code     # Stop hook → settings.local.json
mem-context install opencode        # on_session_end hook → config.yaml
mem-context install status          # Check all
mem-context install uninstall -c claude-code  # Remove
```

## Documentation

| Document | Content |
|----------|---------|
| [Installation](docs/installation.md) | Detailed setup, Ollama, config |
| [Configuration](docs/configuration.md) | All parameters with explanations |
| [MCP Tools](docs/tools.md) | Tool reference with schemas and examples |
| [Architecture](docs/architecture.md) | Storage, scoring, retrieval pipeline |
| [Consolidation](docs/consolidation.md) | Pipeline phases, host model workflow |
| [Provisioning](docs/provisioning.md) | `mem-context init`, client support |
| [Capture](docs/capture.md) | Automatic transcript capture setup |
| [Test Scenarios](tests/TEST-SCENARIOS.md) | 28 sections, 100+ test cases |

## Development

```bash
git clone ssh://git@git.montyho.com/turbyho/mem-context.git
cd mem-context
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
python3 -m pytest tests/ -q  # 113 tests
```

## License

MIT
