Metadata-Version: 2.4
Name: callmem
Version: 0.3.0
Summary: Persistent memory for LLM coding agents — local-first, model-agnostic, SQLite-backed
Project-URL: Homepage, https://github.com/DANgerous25/callmem
Project-URL: Repository, https://github.com/DANgerous25/callmem
Project-URL: Issues, https://github.com/DANgerous25/callmem/issues
Author: Dan
License-Expression: MIT
License-File: LICENSE
Keywords: coding-agent,llm,mcp,memory,ollama,opencode,sqlite
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Requires-Dist: click<9.0,>=8.0
Requires-Dist: cryptography>=44.0
Requires-Dist: fastapi<1.0,>=0.115
Requires-Dist: httpx<1.0,>=0.27
Requires-Dist: jinja2<4.0,>=3.1
Requires-Dist: mcp<2.0,>=1.0
Requires-Dist: pydantic<3.0,>=2.0
Requires-Dist: python-ulid<3.0,>=2.0
Requires-Dist: tomli>=2.0; python_version < '3.11'
Requires-Dist: uvicorn[standard]<1.0,>=0.30
Provides-Extra: dev
Requires-Dist: mypy>=1.13; extra == 'dev'
Requires-Dist: pre-commit>=4.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest-httpx>=0.30; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.8; extra == 'dev'
Provides-Extra: embeddings
Requires-Dist: sentence-transformers>=3.0; extra == 'embeddings'
Requires-Dist: sqlite-vec>=0.1; extra == 'embeddings'
Description-Content-Type: text/markdown

# callmem

> **Renamed from `llm-mem` in v0.2.0.** Existing installs keep working — the `llm-mem` console script, the `python -m llm_mem.mcp.server` entry point, the `.llm-mem/` config dir, and the `LLM_MEM_*` env vars are all honored as aliases. Run `callmem migrate` in a project to update its names to the new canonical forms.

**Persistent memory for LLM coding agents.**

callmem gives coding agents a durable, searchable memory that survives across sessions. It captures what happened, compresses it in the background using a local LLM, and serves a compact briefing when the next session starts — so the agent picks up where you left off without manual context management.

Works with **OpenCode** and **Claude Code** side by side. Both tools write to the same memory database, so you can swap between them mid-project (say, when one hits a rate limit) without losing context. The extraction LLM is also swappable at any time: switch models in `config.toml` and restart the daemon; prior memories remain intact and usable.

Inspired by [claude-mem](https://github.com/anthropics/claude-mem), but built from the ground up with a different philosophy: model-agnostic, local-first, and designed around pluggable LLM backends rather than a single vendor.

---

## Key Features

### Automatic Context at Startup
When a new session begins, callmem generates a structured briefing with context economics, an emoji-coded observation timeline, and a session summary. This is written to `SESSION_SUMMARY.md` in your project root, where your coding agent picks it up automatically — no manual context management needed.

### Real-Time Capture and Extraction
During the session, callmem ingests prompts, responses, tool calls, and file changes from your coding agent — either via **OpenCode's SSE stream** or by **tailing Claude Code's JSONL transcripts** at `~/.claude/projects/<slug>/*.jsonl`. Both run concurrently, so a project using both agents sees unified history. A background worker runs entity extraction through your local LLM, pulling out decisions, facts, TODOs, bugs, features, and discoveries. New observations appear in the web UI within milliseconds via Server-Sent Events.
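As a rough illustration, one extraction round-trip against a local Ollama might look like the sketch below. `httpx` is a declared callmem dependency and `/api/generate` is Ollama's standard endpoint, but the prompt wording and response handling here are assumptions, not callmem's actual pipeline.

```python
# Sketch only: one entity-extraction call against a local Ollama instance.
import httpx

def extract_entities(batch_text: str, model: str = "qwen3:8b") -> str:
    resp = httpx.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": "Extract decisions, facts, TODOs, bugs, features, "
                      f"and discoveries from:\n\n{batch_text}",
            "stream": False,  # return one JSON object instead of a token stream
        },
        timeout=120.0,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```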

### Layered Compression
Raw events are compressed through multiple layers to keep token usage in check:
- **Entity extraction** — structured knowledge pulled from raw conversation
- **Chunk summaries** — rolling mid-session compression every N events
- **Session summaries** — structured wrap-up when a session ends (Investigated / Learned / Completed / Next Steps)
- **Cross-session summaries** — periodic project-level rollups
- **Compaction** — old events archived to keep the database lean

### Dual Content Views
Each observation has two representations:
- **Key Points** — bullet-point summary (~50-100 tokens), cheap for context injection
- **Synopsis** — flowing prose paragraph (~200-400 tokens), loaded on demand
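For illustration only, the two views might hang off a model like the following sketch. The field names are assumptions, not callmem's actual schema:

```python
# Hypothetical shape of an observation's two representations.
from pydantic import BaseModel  # pydantic is already a callmem dependency

class Observation(BaseModel):
    category: str                 # e.g. "decision", "bugfix", "todo"
    key_points: list[str]         # ~50-100 tokens; cheap to inject into context
    synopsis: str | None = None   # ~200-400 tokens; loaded only on demand
```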

### Pluggable LLM Backend
callmem doesn't care which model you use for coding. It uses a separate local model for memory maintenance:
- **Ollama** (recommended) — fully local, zero API cost, works offline
- **OpenAI-compatible** — any `/v1/chat/completions` API (LM Studio, vLLM, etc.)
- **None** — pattern-only mode when no LLM is available

The extraction model can be swapped at any time. Change `[ollama].model` (or `[openai_compat].model`) in `config.toml` and restart the daemon — previously extracted entities remain valid and fully searchable, and new events get extracted with whichever model is configured now. Tested upgrades: `qwen3:8b` ↔ `qwen3:30b` ↔ `gemma3n:e4b`. You can also flip `[llm].backend` between `ollama`, `openai_compat`, and `none` without touching the database.

### Web UI
A local web interface for browsing and managing memories:
- Card-based feed with colour-coded category badges
- Expandable cards with Key Points / Synopsis toggle
- Real-time updates via SSE
- Full-text search across all entities
- Session browser with event timeline
- Briefing preview showing exactly what the agent sees
- Accessible from your Tailscale network (default bind `0.0.0.0`)

### Sensitive Data Handling
Two-layer detection (pattern matching + LLM classification) catches secrets, credentials, and PII at ingest time. Detected items are redacted from memory and stored in an encrypted vault with configurable false-positive management.
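A minimal sketch of the first (pattern) layer, with example regexes — these are illustrative, not callmem's actual rules, and the encrypted vault is not shown:

```python
# Illustrative pattern layer; the real pipeline also runs an LLM classification pass.
import re

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub personal access token
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]

def redact(text: str) -> tuple[str, list[str]]:
    """Return redacted text plus the matched secrets (destined for the vault)."""
    hits: list[str] = []
    for pat in SECRET_PATTERNS:
        hits.extend(pat.findall(text))
        text = pat.sub("[REDACTED]", text)
    return text, hits
```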

### MCP Integration
Exposes tools via the Model Context Protocol so compatible agents can query memory on demand:
- `search` — full-text search across all observations
- `get_briefing` — generate a startup briefing
- `search_by_file` — find observations related to specific files
- `timeline` — chronological context around an observation
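For example, a script could invoke these tools through the official `mcp` Python client. This is a sketch: the tool names come from the list above, but the exact argument schemas are assumptions.

```python
# Query the callmem MCP server over stdio using the official `mcp` client.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

params = StdioServerParameters(
    command="python3", args=["-m", "callmem.mcp.server", "--project", "."]
)

async def main() -> None:
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("search", arguments={"query": "auth"})
            print(result.content)

asyncio.run(main())
```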

---

## Requirements

- **Python 3.10+**
- **Ollama** with a summarisation model (recommended: `qwen3:8b` or `qwen3:30b`)
- **Linux** (tested on Ubuntu/Debian x86_64 and ARM64)
- **SQLite 3.35+** (FTS5 support required)

### Tested Environments

| Environment | Status |
|---|---|
| Ubuntu 24.04, x86_64 (Hetzner VPS) | Tested |
| Ubuntu 24.04, ARM64 (Hetzner CAX31) | Tested |
| Ollama with qwen3:8b | Tested |
| Ollama with qwen3:30b | Tested |
| Ollama with gemma3n:e4b | Tested |
| OpenCode as coding agent | Tested |
| Claude Code as coding agent | Tested |
| Running both OpenCode and Claude Code against one project | Tested |

macOS and Windows are untested but should work anywhere Python and Ollama run. The systemd service integration is Linux-only.

---

## Quick Start

### 1. Prerequisites

```bash
# Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a summarisation model (any instruction-following model works)
ollama pull qwen3:8b
```

### 2. Install callmem

```bash
# Install with pip (recommended)
pip install callmem

# Or with uv
uv pip install callmem

# Or install from source for development
git clone https://github.com/DANgerous25/callmem.git
cd callmem
uv sync --extra dev
```

### 3. Run the Setup Wizard

```bash
callmem setup          # or `uv run callmem setup` from a source checkout
```

The wizard walks you through:
- Choosing your LLM backend and model
- Setting the web UI port (with multi-project conflict detection)
- Configuring network bind address
- Picking your coding tools (OpenCode / Claude Code / both) — writes `opencode.json` and/or `.mcp.json` as needed, and patches `AGENTS.md` / `CLAUDE.md` with MCP usage instructions
- Importing existing OpenCode sessions from SQLite
- Optionally installing a systemd user service for auto-start

The setup script is safe to re-run — it reconfigures without wiping data and backs up your `config.toml` before changes.

### 4. Start the Daemon

```bash
# All-in-one: web UI + background workers + OpenCode SSE adapter
# + Claude Code JSONL tailer. Each adapter is independently gated
# by [adapters].opencode / [adapters].claude_code in config.toml.
callmem daemon   # or `uv run callmem daemon` from a source checkout

# Or via make
make daemon

# Or via systemd (if installed during setup)
make start
```

**Restarting after an upgrade or config change.** The systemd service holds the old code in memory, so `git pull` / `pip install -e .` / `config.toml` edits only take effect after a restart:

```bash
make restart          # this project (preferred)
make logs             # tail journalctl for the service

# Or raw systemctl — the unit is named callmem-<project-dir-name>:
systemctl --user restart callmem-<project>.service
systemctl --user status  callmem-<project>.service
```

Each project that ran `callmem setup` with systemd enabled has its own unit, so restart the one matching your project directory.

### 5. Configure Your Coding Agent

`callmem setup` and `callmem init` both write the right config file(s) automatically. If you prefer to wire it up by hand:

**OpenCode** — add to `opencode.json` in your project:

```json
{
  "mcp": {
    "callmem": {
      "type": "local",
      "command": ["python3", "-m", "callmem.mcp.server", "--project", "."],
      "enabled": true
    }
  }
}
```

**Claude Code** — add to `.mcp.json` in your project (note the split `command`/`args`):

```json
{
  "mcpServers": {
    "callmem": {
      "command": "python3",
      "args": ["-m", "callmem.mcp.server", "--project", "."]
    }
  }
}
```

Both can coexist; they share the same SQLite database.

Once configured, you can type `/briefing` in OpenCode to display the startup briefing on demand. In Claude Code, ask the agent to call the `get_briefing` MCP tool, or read `SESSION_SUMMARY.md` directly.

### 6. Open the Web UI

Navigate to `http://localhost:9090` (or your configured host:port).

---

## How It Works

```
┌─────────────────┐   SSE        ┌─────────────────┐   Extract      ┌──────────────┐
│  OpenCode       │ ──────────▶  │   callmem       │ ─────────────▶ │  Local LLM   │
└─────────────────┘              │   Adapters      │                │  (Ollama or  │
┌─────────────────┐   JSONL      │  (opencode +    │                │  OpenAI-like)│
│  Claude Code    │ ──────────▶  │   claude_code)  │                └──────────────┘
└─────────────────┘              └────────┬────────┘
                                          │
                                          ▼
                                 ┌─────────────────┐
                                 │   SQLite DB     │
                                 │   + FTS5        │
                                 └────────┬────────┘
                                          │
                              ┌───────────┼───────────┐
                              ▼           ▼           ▼
                        ┌──────────┐ ┌──────────┐ ┌──────────┐
                        │ Web UI   │ │ MCP      │ │ Briefing │
                        │ Feed     │ │ Server   │ │ Writer   │
                        └──────────┘ └──────────┘ └──────────┘
```

1. **Capture**: The OpenCode adapter subscribes to its SSE stream; the Claude Code adapter tails each transcript file under `~/.claude/projects/<slug>/` using a persistent byte-offset, so a restart resumes mid-file instead of replaying or dropping records (see the sketch after this list). Both run inside the same daemon process.
2. **Extract**: A background worker sends event batches to your local LLM for entity extraction — decisions, facts, TODOs, bugs, features, discoveries. Switching the extraction model later does not invalidate past entities.
3. **Compress**: Summaries are generated at chunk, session, and cross-session levels.
4. **Serve**: The briefing writer generates `SESSION_SUMMARY.md` in your project root; the MCP server responds to on-demand queries from whichever agent you're using; the web UI shows everything in real-time.
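A minimal sketch of the resume-from-offset tailing in step 1. The offset-file location and helper name are illustrative, not callmem's internals:

```python
# Consume only complete JSONL lines and persist a byte offset per transcript,
# so a daemon restart resumes mid-file rather than replaying or dropping records.
import json
from pathlib import Path

OFFSETS = Path(".callmem/offsets.json")  # hypothetical location

def read_new_records(transcript: Path) -> list[dict]:
    offsets = json.loads(OFFSETS.read_text()) if OFFSETS.exists() else {}
    start = offsets.get(str(transcript), 0)
    with transcript.open("rb") as f:
        f.seek(start)
        data = f.read()
    # Split off any half-written trailing line; it will be read on the next poll.
    complete, sep, _partial = data.rpartition(b"\n")
    records = [json.loads(line) for line in complete.splitlines() if line.strip()]
    offsets[str(transcript)] = start + len(complete) + len(sep)
    OFFSETS.parent.mkdir(parents=True, exist_ok=True)
    OFFSETS.write_text(json.dumps(offsets))
    return records
```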

---

## CLI Reference

```bash
callmem setup              # Interactive setup wizard
callmem daemon             # Start UI + workers + adapters in one process
callmem ui                 # Start web UI only
callmem serve              # Start MCP server only
callmem import --source opencode     --all  # Import OpenCode sessions from SQLite
callmem import --source claude-code  --all  # Import Claude Code transcripts (JSONL)
callmem import --status                     # Show current/last import progress
callmem status             # Show service status
callmem search <query>     # Search memories from the command line
callmem briefing           # Generate and print a briefing
callmem briefing --write   # Write briefing to SESSION_SUMMARY.md

make restart               # Restart the systemd unit for this project
make logs                  # Tail journalctl for the service
```

---

## Entity Categories

callmem extracts these observation types, each with a colour-coded badge in the UI:

| Category | Icon | Description |
|----------|------|-------------|
| Feature | 🟢 | New functionality added |
| Bugfix | 🔴 | Bug identified and/or fixed |
| Discovery | 🔵 | Notable insight or finding |
| Decision | ⚖️ | Architectural or design choice |
| Todo | 📋 | Task to be done |
| Fact | 📝 | Durable project knowledge |
| Failure | ❌ | Error or failure encountered |
| Research | 🔬 | Investigation or analysis |
| Change | 🔄 | General code or file change |

---

## Configuration

All settings live in `.callmem/config.toml` in your project root. Key options:

```toml
[llm]
backend = "ollama"               # ollama | openai_compat | none
model = "qwen3:8b"
api_base = "http://localhost:11434"

[ui]
host = "0.0.0.0"                 # Bind address (0.0.0.0 for Tailscale access)
port = 9090

[adapters]
opencode = true                  # Listen on OpenCode's SSE stream
claude_code = true               # Tail ~/.claude/projects/<slug>/*.jsonl
claude_code_poll_interval = 2.0  # seconds between disk scans
claude_code_idle_timeout = 300   # seconds before an idle CC session is closed

[briefing]
max_tokens = 2000                # Token budget for startup briefing
auto_write_session_summary = true
session_summary_filename = "SESSION_SUMMARY.md"

[extraction]
batch_size = 10                  # Events per extraction batch
```

See [docs/config.md](docs/config.md) for the full reference.

---

## Why SQLite, Not a Vector DB?

Most coding memory retrieval is structured:
- "What decisions did we make about auth?"
- "What TODOs are still open?"
- "What happened in the last 3 sessions?"

These are better served by structured tables plus full-text search (FTS5) than by embedding similarity. SQLite is zero-dependency, single-file, trivially backed up, and fast enough for tens of thousands of memories.
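To make that concrete, here is the kind of query FTS5 enables. The table and column names are hypothetical, not callmem's real schema:

```python
# Full-text lookup over extracted entities via SQLite FTS5 (illustrative schema).
import sqlite3

con = sqlite3.connect(".callmem/memory.db")  # assumed database path
rows = con.execute(
    """
    SELECT e.category, e.title, e.created_at
    FROM entities AS e
    JOIN entities_fts ON entities_fts.rowid = e.id
    WHERE entities_fts MATCH ?
    ORDER BY rank        -- FTS5's built-in BM25 ranking, best matches first
    LIMIT 20
    """,
    ("auth AND decision",),
).fetchall()
```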

Vector embeddings are a planned enhancement for semantic retrieval when keyword search isn't enough. The schema is designed to accommodate them without migration pain.

---

## Project Structure

```
callmem/
├── src/callmem/
│   ├── adapters/       # OpenCode SSE + import, Claude Code live tailer + import
│   ├── core/           # Engine, extraction, briefing, compression, event bus
│   ├── mcp/            # MCP server and tool definitions
│   ├── models/         # Data models, config, entity types
│   └── ui/             # Web UI (FastAPI + Jinja2 + htmx + SSE)
├── tests/              # 630+ tests (unit + integration)
├── docs/               # Architecture, schema, config, roadmap docs
└── pyproject.toml
```

---

## Development

```bash
# Install with dev dependencies
uv sync --extra dev

# Run tests
make test

# Run linter
make lint

# Run both
make check
```

---

## Roadmap

See [docs/roadmap.md](docs/roadmap.md) for the full plan. Highlights:

- **Progressive disclosure search** — 3-layer MCP search pattern (index → timeline → full details)
- **File-level tracking** — associate observations with specific files
- **Knowledge agents** — build queryable corpora from observation history
- **Settings panel** — web UI for config with live briefing preview
- **Vector search** — optional semantic retrieval via sentence-transformers

---

## Acknowledgements

callmem was inspired by [claude-mem](https://github.com/anthropics/claude-mem) by Alex Newman. We share the same goal — giving coding agents persistent memory — but callmem is built from scratch with a focus on model-agnostic operation, local-first architecture, and pluggable LLM backends.

---

## Known Issues

### Auto-briefing plugin does not trigger on session start
Setup installs an OpenCode plugin (`.opencode/plugins/auto-briefing.js`) that should auto-display the briefing when a new session starts. However, due to an [upstream OpenCode bug](https://github.com/anomalyco/opencode/issues/14808) where `session.created` events do not fire for plugins, this does not currently work. Use the `/briefing` command in OpenCode as a workaround. The plugin will activate automatically once the bug is fixed upstream — no changes needed.

### Claude Code: tool results and thinking blocks are not ingested
The Claude Code adapter maps user prompts, assistant text, and `tool_use` blocks into the memory feed. `tool_result` blocks (system-side responses to tool calls) and `thinking` blocks are skipped in the current release to keep signal-to-noise high. A follow-up will revisit this — until then, a tool call appears in the feed but its outcome does not.
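In other words, the adapter's filter currently behaves roughly like this sketch (block type names follow the transcript format; the helper is illustrative):

```python
# Content block types kept vs. skipped by the Claude Code adapter, per the note above.
INGESTED = {"text", "tool_use"}         # assistant text and tool calls
SKIPPED = {"tool_result", "thinking"}   # outcomes and reasoning are dropped for now

def keep_block(block: dict) -> bool:
    return block.get("type") in INGESTED
```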

### Python 3.10 compatibility shims
callmem supports Python 3.10+. On Python 3.10 it needs `tomli` (the backport of the stdlib `tomllib`), which is declared as a conditional dependency and installed automatically. On Python 3.11+, the stdlib `tomllib` is used.
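The shim is the standard conditional import, shown here as a generic pattern rather than a copy of callmem's source:

```python
# Use the stdlib tomllib on 3.11+, the tomli backport on 3.10.
import sys

if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib

with open(".callmem/config.toml", "rb") as f:
    config = tomllib.load(f)
```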

---

## License

[MIT](LICENSE)
