Metadata-Version: 2.4
Name: snowpack
Version: 0.1.10
Summary: Local-first agent memory for Claude Code: episodic + semantic memory in one SQLite file.
Author: David Kelly
Keywords: agent,claude-code,llm,local-first,memory,sqlite
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development
Requires-Python: >=3.11
Requires-Dist: httpx>=0.27
Requires-Dist: sqlite-vec>=0.1.6
Requires-Dist: typer>=0.12
Description-Content-Type: text/markdown

# snowpack

> The snowpack is the season's memory — every storm recorded as a layer.

Local-first agent memory for Claude Code. Snowpack ingests Claude Code session
transcripts into **episodic memory** (what happened across sessions) and
**semantic memory** (durable facts, entities, relationships), all in a single
SQLite file with vector + keyword search. The agent reaches it through an
ordinary CLI — no MCP server, no daemon, no infrastructure.

## Status

Core pipeline implemented (episodic + semantic memory, hybrid retrieval,
telemetry, distillation). See `docs/adr/ADR-001-memory-architecture.md` for
the architecture and decision record, `docs/hooks.md` for ingestion hook
setup, and `docs/claude-md-snippet.md` for the agent-facing usage docs.

## Quick start

```bash
# 1. Install
pip install snowpack       # or: uv tool install snowpack

# 2. Wire everything up (idempotent, re-runnable, prompts before writing)
snowpack setup
```

`snowpack setup` checks Ollama (printing install/pull commands if it's down;
it's a soft requirement, see "Embeddings" below), creates
`~/.snowpack/snowpack.db`, merges the ingestion, compaction-survival, and
session-orientation hooks into `~/.claude/settings.json` (timestamped backup
first), installs the memory snippet into `~/.claude/CLAUDE.md` between managed
markers, and adds the `snowpack` permission allowlist. `--dry-run` shows the
diffs first, `--check` is a doctor that audits every integration point, and
`--uninstall` removes exactly what setup added.

```bash
# 3. Use it
snowpack probe "auth decisions"   # hybrid retrieval (vector + keyword + graph + recency)
```

(Ingestion runs out-of-band via the installed hooks; `snowpack obs ingest`
also works manually.)

## Embeddings: Ollama setup and choosing a model

Vector search needs an embedding model — by default a local one served by
[Ollama](https://ollama.com). It is a soft requirement: without it snowpack
still works in **vectorless mode** — ingest stores episodes un-embedded,
probe degrades to keyword + graph + recency search, and the next ingest
after the provider comes up backfills the missing vectors automatically.
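
To make the degradation concrete, here is a minimal Python sketch of rank fusion that simply skips a channel with no results. It uses weighted reciprocal rank fusion, which is an assumption chosen for illustration. The channel names follow the text above, but `fuse_rankings`, the weights, the constant `k`, and the episode ids are hypothetical and are not snowpack's actual ranking code.

```python
from collections import defaultdict


def fuse_rankings(channels, weights=None, k=60):
    """Weighted reciprocal rank fusion over whichever channels returned results.

    channels: dict mapping channel name -> list of memory ids, best first.
    An empty channel (e.g. "vector" in vectorless mode) contributes nothing.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for name, ranked_ids in channels.items():
        weight = weights.get(name, 1.0)
        for rank, memory_id in enumerate(ranked_ids, start=1):
            scores[memory_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by id so output is deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Vectorless mode: the vector channel is empty, the other channels still rank.
hybrid = fuse_rankings({
    "vector": [],
    "keyword": ["ep-12", "ep-7"],
    "graph": ["ep-7"],
    "recency": ["ep-30", "ep-7"],
})
```

Because every channel only adds to a shared score, a missing channel lowers no result's rank on its own; the remaining channels decide the order.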

Where local models aren't allowed (e.g. a workplace that can't sandbox
them), `snowpack init --provider openai-compatible` targets any hosted or
gateway `/embeddings` endpoint instead: set `SNOWPACK_EMBEDDING_BASE_URL`
and, for non-localhost endpoints, an API key
(`SNOWPACK_EMBEDDING_API_KEY` or `OPENAI_API_KEY`). Or skip embeddings
entirely and run vectorless — `snowpack setup --check` reports which mode
you're in.
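
The key rule above (an API key is only required for non-localhost endpoints) can be sketched in Python as follows. The environment variable names come from the text; the function name, the set of hosts treated as local, and the order in which the two key variables are consulted are assumptions, not snowpack's actual behavior.

```python
import os
from urllib.parse import urlparse

# Hosts treated as "localhost" in this sketch (an assumption).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def resolve_embedding_endpoint(env=None):
    """Return (base_url, api_key) for an OpenAI-compatible /embeddings endpoint.

    Raises ValueError when the base URL is missing, or when a non-localhost
    endpoint has no API key available.
    """
    env = os.environ if env is None else env
    base_url = env.get("SNOWPACK_EMBEDDING_BASE_URL")
    if not base_url:
        raise ValueError("SNOWPACK_EMBEDDING_BASE_URL is not set")
    # Assumed precedence: the snowpack-specific key wins over OPENAI_API_KEY.
    api_key = env.get("SNOWPACK_EMBEDDING_API_KEY") or env.get("OPENAI_API_KEY")
    host = urlparse(base_url).hostname
    if host not in LOCAL_HOSTS and not api_key:
        raise ValueError(f"{host} is not localhost, so an API key is required")
    return base_url.rstrip("/"), api_key
```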

### Install and run Ollama

```bash
# macOS
brew install ollama        # or download the app from https://ollama.com

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# start the server (the desktop app does this automatically)
ollama serve

# fetch the default embedding model (~270 MB)
ollama pull nomic-embed-text
```

**Prefer it sandboxed?** A hardened Docker setup (localhost-only API,
dropped capabilities, isolated model storage) ships in
`docker/docker-compose.yml`:

```bash
docker compose -f docker/docker-compose.yml up -d
docker compose -f docker/docker-compose.yml exec ollama ollama pull nomic-embed-text
```

See `docs/ollama-docker.md` for GPU setup and the macOS caveat (containers
can't use Apple Silicon's GPU — native Ollama is faster there).

Verify it's answering:

```bash
curl -s http://localhost:11434/api/embed \
  -d '{"model": "nomic-embed-text", "input": ["hello"]}' | head -c 120
```

If Ollama runs somewhere other than `localhost:11434` (a container, another
machine), point snowpack at it with `SNOWPACK_OLLAMA_URL`:

```bash
export SNOWPACK_OLLAMA_URL=http://gpu-box:11434
```

### Choosing the embedding model

The model is fixed **per database** at `snowpack init`, because the vector
tables are created with that model's output dimension (vec0 columns are
fixed-width):

```bash
snowpack init                                  # nomic-embed-text (768-d)
snowpack init --model mxbai-embed-large       # higher quality, 1024-d
snowpack init --model all-minilm              # smaller/faster, 384-d
```

You normally don't pass `--dim`: init asks the running Ollama what dimension
the model actually produces (and refuses a `--dim` that contradicts it). If
Ollama isn't running, init falls back to a built-in table for common models
(`nomic-embed-text`, `mxbai-embed-large`, `all-minilm`,
`snowflake-arctic-embed`, `bge-m3`) — for anything else, either start Ollama
first or pass `--dim` explicitly.
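
The resolution order described above can be sketched as a small Python function. This is an illustration under stated assumptions: `resolve_dim` and `KNOWN_DIMS` are hypothetical names, the table lists only the three dimensions quoted in this README, and letting an explicit `--dim` take precedence when Ollama is offline is an assumption.

```python
# Built-in fallback table; only the dimensions stated in this README are listed.
KNOWN_DIMS = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "all-minilm": 384,
}


def resolve_dim(model, probed_dim=None, requested_dim=None):
    """Pick the vector dimension for a new database.

    probed_dim: dimension reported by a running Ollama, or None if offline.
    requested_dim: value of an explicit --dim, or None if not passed.
    """
    if probed_dim is not None:
        # The live model is authoritative; a contradicting --dim is refused.
        if requested_dim is not None and requested_dim != probed_dim:
            raise ValueError(
                f"--dim {requested_dim} contradicts {model}, which produces {probed_dim}"
            )
        return probed_dim
    if requested_dim is not None:
        return requested_dim
    if model in KNOWN_DIMS:
        return KNOWN_DIMS[model]
    raise ValueError(f"unknown model {model!r}: start Ollama or pass --dim")
```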

The configured model, dimension, and task prefixes are recorded in the
database (`meta` table) and used for every subsequent embed, so you never
specify the model again after init — `obs ingest` and `probe` read it from
the database. To see what a database was initialized with:

```bash
sqlite3 ~/.snowpack/snowpack.db "SELECT * FROM meta"
```

**Changing models later** is one command — it re-embeds everything in place
with zero loss of episodes, facts, or telemetry:

```bash
snowpack reindex --model all-minilm
```

The new model must be live (reindex re-embeds with it, so there's no
offline fallback). The database file is backed up first and `probe` keeps
working throughout; if the run is interrupted, rerun with `--resume` to
continue from where it stopped.

**Upgrading snowpack** across a schema change is also one command: when a
new version needs a newer database schema, every command refuses to run and
points you to `snowpack migrate`, which backs up the file and applies the
pending migrations.
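
One common way to build this kind of gate is SQLite's `PRAGMA user_version`. The Python sketch below shows the pattern only; whether snowpack tracks its schema this way is an assumption, and the version number, migration statements, and function names are hypothetical.

```python
import sqlite3

REQUIRED_SCHEMA = 2  # hypothetical version the running code expects

# Hypothetical migrations, keyed by the schema version they upgrade to.
MIGRATIONS = {
    1: "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)",
    2: "CREATE TABLE IF NOT EXISTS prune_log (id INTEGER PRIMARY KEY, reason TEXT)",
}


def schema_version(conn):
    """Read the integer schema version stored in the database header."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def require_current_schema(conn):
    """Refuse to proceed when the database is older than the code expects."""
    found = schema_version(conn)
    if found < REQUIRED_SCHEMA:
        raise RuntimeError(
            f"database schema {found} is older than {REQUIRED_SCHEMA}; "
            "run `snowpack migrate`"
        )


def migrate(conn):
    """Apply pending migrations in order and return the versions applied.

    A real tool would copy the database file to a backup before this loop.
    """
    applied = []
    for version in range(schema_version(conn) + 1, REQUIRED_SCHEMA + 1):
        with conn:  # commit each step, or roll it back on error
            conn.execute(MIGRATIONS[version])
            conn.execute(f"PRAGMA user_version = {version}")
        applied.append(version)
    return applied
```

Running `migrate` a second time applies nothing, which keeps the command safe to repeat.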

## CLI surface

| Command | Purpose |
|---|---|
| `snowpack setup` | One-command onboarding: hooks, CLAUDE.md, permissions, db (`--check` doctor, `--dry-run`, `--uninstall`) |
| `snowpack init` | Create and configure the database |
| `snowpack config` | Persist provider defaults (extraction endpoint/model, Ollama URL) in the db — no env vars needed (`list`, `set`, `unset`) |
| `snowpack obs ingest` | Ingest new transcript exchanges (incremental, idempotent) |
| `snowpack obs extract` | Extract durable facts from episodes (API-assisted) |
| `snowpack obs list` | List recent episodes |
| `snowpack probe "query"` | Hybrid retrieval (vector + keyword + graph + recency) with telemetry |
| `snowpack feedback` | Mark retrieved memories as used — trains ranking |
| `snowpack stash` | Working-memory checkpoint per project |
| `snowpack resume` | Re-injection payload for SessionStart hooks (compaction survival) |
| `snowpack redact` | Retroactive secret scan/rewrite over stored memory (`--scan`, `--apply`) |
| `snowpack stats` | Telemetry overview; `--refresh` recomputes usefulness |
| `snowpack sinter` | Mine repeated corrections into CLAUDE.md candidates |
| `snowpack prune` | Telemetry-nominated pruning: `candidates`, then audited soft `archive`/`keep`/`restore`, `log` |
| `snowpack entity merge` | Point a duplicate entity at its canonical form |
| `snowpack reindex` | Switch embedding models: re-embed and swap in place (`--resume`) |
| `snowpack migrate` | Upgrade the database schema after a snowpack upgrade (backup first) |
| `snowpack pit` | Local web UI: entity graph + telemetry dashboard |

## Privacy: secret redaction

Transcripts carry whatever your tools printed — env dumps, tokens, connection
strings. Snowpack redacts known secret shapes (AWS keys, GitHub tokens, JWTs,
PEM blocks, URL credentials, `password = …` assignments, and more) **at
ingest**, before content is hashed, embedded, or indexed; stash writes get the
same pass. Hits become `[redacted:<type>]` markers, the ingest report counts
them, and `snowpack stats` shows the lifetime total. For data stored before
redaction existed, `snowpack redact --scan` reports hits and
`snowpack redact --apply` rewrites them in place (database backup first).

This is best-effort, known-shape detection — defense in depth, not a
guarantee. Custom patterns and an allowlist for documented example keys live
in `~/.snowpack/redaction.toml`; see `docs/redaction.md`.
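
As a rough illustration of known-shape redaction, the Python sketch below replaces matches with `[redacted:<type>]` markers and counts hits per type. The three patterns and their type names are simplified assumptions for this example; the built-in rule set is described in `docs/redaction.md` and is not reproduced here.

```python
import re
from collections import Counter

# Illustrative patterns only; the type names are hypothetical.
PATTERNS = {
    "aws-access-key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    # user:password between "://" and "@" in a URL
    "url-credentials": re.compile(r"(?<=://)[^/\s:@]+:[^/\s@]+(?=@)"),
}


def redact(text):
    """Return (redacted_text, hit_counts) for one piece of transcript content."""
    counts = Counter()
    for kind, pattern in PATTERNS.items():
        text, hits = pattern.subn(f"[redacted:{kind}]", text)
        if hits:
            counts[kind] += hits
    return text, counts


clean, counts = redact(
    "key AKIAIOSFODNN7EXAMPLE url postgres://app:s3cret@db.internal/prod"
)
```

Running the pass before hashing or embedding, as the text describes, means the secret never reaches any derived index, only the marker does.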

## Pruning: telemetry nominates, the agent decides

Memory accumulates; not all of it stays worth retrieving. No decay formula
can safely tell a dead memory from a rarely-needed-but-load-bearing one, so
snowpack splits the job (ADR-003 D7). `snowpack prune candidates --json`
nominates from telemetry, each candidate with its evidence:

- dead facts (never retrieved, >30 days)
- weak layers (retrieved ≥5×, never used)
- closed supersession chains
- stale episodes (>90 days, never retrieved, provenance-guarded)

The consuming agent (or you) reads and judges each one. Decisions are
explicit and audited:

- `prune archive <ids> --reason "…"` is a soft delete that hides the memory
  from every retrieval channel
- `prune keep <ids> --reason "…"` records a survivor and suppresses
  re-nomination for 90 days
- `prune restore` reverses any archive intact
- `prune log` shows the full trail

Nothing here hard-deletes; that's a later, mechanical GC pass over
already-archived rows only.
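
The nomination step can be sketched in Python for two of the candidate classes, using the thresholds quoted above. The `FactTelemetry` record, its field names, and the output shape are hypothetical; the real `--json` output format is not shown in this README.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FactTelemetry:
    fact_id: str
    created_at: datetime
    retrieved: int  # times the fact was returned by a probe
    used: int       # times it was marked used via feedback


def nominate(facts, now):
    """Return pruning candidates with evidence; nothing is deleted here."""
    candidates = []
    for fact in facts:
        age = now - fact.created_at
        if fact.retrieved == 0 and age > timedelta(days=30):
            candidates.append({
                "id": fact.fact_id,
                "kind": "dead_fact",
                "evidence": f"never retrieved in {age.days} days",
            })
        elif fact.retrieved >= 5 and fact.used == 0:
            candidates.append({
                "id": fact.fact_id,
                "kind": "weak_layer",
                "evidence": f"retrieved {fact.retrieved}x, never used",
            })
    return candidates
```

The function only produces a list with evidence attached; archiving or keeping each candidate stays a separate, explicit decision.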

## The pit (web UI)

```bash
snowpack pit            # serves http://127.0.0.1:8617 and opens the browser
```

A read-only, single-page UI over the same SQLite file (no extra dependencies,
no build step; the graph library is vendored so it works offline):

- **Graph tab** — entities as nodes, facts as edges. Visual weights are real
  telemetry, not decoration: node size = usage, edge width = retrieval
  frequency, color = staleness, and **dead gray = never retrieved** — your
  pruning candidates at a glance. Click through node → fact → provenance
  episode; toggle superseded facts; search to highlight.
- **Stats tab** — totals, retrieval latency, channel win-rate (vector vs
  keyword vs graph — how to rebalance fusion weights), zero-result queries
  (gap detection), most/least-used facts, persistent weak layers, and recent
  retrievals expandable to per-result channels/scores/used flags.

The server binds 127.0.0.1 only and never mutates user data (the one write is
recomputing derived usefulness scores on demand). Full guide — including how
to read the visual encoding and troubleshooting — in `docs/pit.md`; stack
decisions in `docs/adr/ADR-002-pit-ui.md`.

## Documentation map

- `docs/adr/` — architecture decision records (ADR-001 core, ADR-002 pit UI,
  ADR-003 pre-Phase-2 hardening program, ADR-004 spend visibility
  and cost controls)
- `docs/plans/` — point-in-time implementation plans approved before each
  build round, with outcomes
- `docs/pit.md` — running and reading the pit UI
- `docs/redaction.md` — secret redaction: built-in patterns, `redaction.toml`,
  retroactive cleanup
- `docs/hooks.md` — out-of-band ingestion hooks
- `docs/ollama-docker.md` — sandboxed Ollama
- `docs/claude-md-snippet.md` — agent-facing usage docs for CLAUDE.md
- `docs/skill-memory-maintenance.md` — the pruning loop as candidate skill
  text for an agent
- `docs/releasing.md` — publishing wheels to PyPI (trusted publishing)

## Roadmap

The agent-memory market is crowded with cloud-first offerings (Mem0, Zep,
Letta). Snowpack starts from the opposite end: a **local-first core that
syncs up** when you want it to. Local-first is the foundation later phases
build on, not a stage to discard.

1. **Phase 1 — local dev tool (now).** Everything in this repo: single SQLite
   file, CLI + hooks integration, telemetry from day one. Goal: prove
   retrieval quality and accumulate the usage data later tuning depends on.
2. **Phase 2 — local-first + sync.** The SQLite file stays the on-device
   source of truth; optional sync to a hosted backend adds multi-device use,
   backup, and selective team sharing. The integration surface broadens
   beyond Claude Code: MCP server plus a language-agnostic SDK/HTTP API.
3. **Phase 3 — hosted platform.** A managed, multi-tenant memory service
   covering all four memory types (episodic, semantic, working, procedural).
   **Self-hosting stays a first-class path.**

Full reasoning, the fixed-vs-provisional decision table, and migration risks
live in `docs/adr/ADR-001-memory-architecture.md` ("Phasing & evolution").

## Fact extraction: API key or Claude Code fallback

Fact extraction defaults to Anthropic's OpenAI-compatible endpoint and
needs an API key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or
`SNOWPACK_EXTRACTION_API_KEY`); override with
`SNOWPACK_EXTRACTION_BASE_URL` / `_MODEL` (localhost endpoints like
Ollama's `/v1` need no key). Keys are read from the environment only and
never stored.

**No API key at all?** Extraction falls back automatically to Claude Code
itself (`claude -p`, headless) under your existing subscription login, with
no key and no new model runtime, and announces the fallback when it happens.
`SNOWPACK_EXTRACTION_PROVIDER=claude-code` forces the fallback;
`SNOWPACK_EXTRACTION_PROVIDER=openai-compatible` forbids it. The fallback
consumes your subscription usage, so runs report tokens and cost and accept
`--token-budget` / `--cost-budget` stops alongside `--limit`;
`snowpack stats` shows lifetime extraction spend.
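
The provider choice described above can be summarized as a small decision function. This Python sketch is a reading of the rules in this section, not snowpack's code: the function name, the set of hosts treated as local, and the exact precedence between the checks are assumptions.

```python
import os
from urllib.parse import urlparse

KEY_VARS = ("SNOWPACK_EXTRACTION_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY")
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def choose_extraction_provider(env=None):
    """Return "openai-compatible" or "claude-code" for the given environment."""
    env = os.environ if env is None else env
    forced = env.get("SNOWPACK_EXTRACTION_PROVIDER")
    if forced == "claude-code":
        return "claude-code"
    # Localhost endpoints (such as Ollama's /v1) need no key.
    base_url = env.get("SNOWPACK_EXTRACTION_BASE_URL", "")
    is_local = urlparse(base_url).hostname in LOCAL_HOSTS
    has_key = any(env.get(name) for name in KEY_VARS)
    if has_key or is_local:
        return "openai-compatible"
    if forced == "openai-compatible":
        raise RuntimeError("no API key found and the Claude Code fallback is forbidden")
    return "claude-code"
```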

## Development

```bash
uv sync
uv run pytest
uv run ruff check
```

### Demo data

To try the full surface without real transcripts (and without touching
`~/.snowpack`), seed a sandboxed demo — synthetic transcripts for two fake
projects, pre-extracted facts (including a superseded pair), and probe
telemetry:

```bash
uv run python scripts/seed_demo.py        # creates ~/.snowpack-demo
export SNOWPACK_DB=~/.snowpack-demo/snowpack.db
export SNOWPACK_CLAUDE_PROJECTS=~/.snowpack-demo/projects
snowpack probe "what did we decide about auth" --all-projects
snowpack stats
snowpack pit
```

It works without Ollama (probe degrades to keyword + graph + recency, exactly
as in real use); with Ollama running, the same script embeds everything.
