Metadata-Version: 2.4
Name: claude-kb
Version: 0.8.2
Summary: Universal knowledge base with Qdrant for Claude Code integration
Project-URL: Homepage, https://github.com/tenequm/claude-kb
Project-URL: Repository, https://github.com/tenequm/claude-kb
Project-URL: Issues, https://github.com/tenequm/claude-kb/issues
Author-email: Misha Kolesnik <misha@kolesnik.io>
License: MIT
License-File: LICENSE
Keywords: ai,claude,embeddings,knowledge-base,qdrant,semantic-search,vector-search
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: >=3.13
Requires-Dist: click<9,>=8.1.0
Requires-Dist: httpx<1,>=0.27.0
Requires-Dist: mcp<2,>=1.23.0
Requires-Dist: pydantic<3,>=2.0.0
Requires-Dist: python-dotenv<2,>=1.0.0
Requires-Dist: qdrant-client<2,>=1.11.0
Requires-Dist: rich<15,>=13.0.0
Requires-Dist: sentence-transformers<6,>=5.0.0
Requires-Dist: tiktoken<1,>=0.5.0
Requires-Dist: torch<3,>=2.0.0
Description-Content-Type: text/markdown

# Claude KB

[![standard-readme compliant](https://img.shields.io/badge/readme%20style-standard-brightgreen.svg?style=flat-square)](https://github.com/RichardLitt/standard-readme)
[![PyPI version](https://img.shields.io/pypi/v/claude-kb.svg?style=flat-square)](https://pypi.org/project/claude-kb/)
[![Python 3.13+](https://img.shields.io/badge/python-3.13+-blue.svg?style=flat-square)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg?style=flat-square)](LICENSE)

Semantic search over Claude Code conversation history (dense retrieval on a hybrid-ready Qdrant schema), exposed as a CLI and an MCP server.

A personal research surface for retrieval-quality work over Claude Code conversation history. The aim is to make hybrid retrieval over a developer's own chat archive measurable and improvable, not to be production infrastructure. See [Retrieval Evaluation](#retrieval-evaluation) for the harness and the methodology used to assess it; see [Architecture](#architecture) for how the pieces fit together.

## Table of Contents

- [Background](#background)
- [Install](#install)
- [Usage](#usage)
- [Architecture](#architecture)
- [Retrieval Evaluation](#retrieval-evaluation)
- [Chunking](#chunking)
- [Configuration](#configuration)
- [Development](#development)
- [Security](#security)
- [API](#api)
- [Maintainer](#maintainer)
- [Contributing](#contributing)
- [License](#license)

## Background

Claude Code already records every session as JSONL under `~/.claude/projects/`. That archive grows fast and becomes hard to search with grep alone, especially across projects. Claude KB indexes that archive into a Qdrant collection (dense vectors for retrieval, plus a sparse vector slot reserved for hybrid experiments) and exposes it back to Claude Code as an MCP server, so the agent can search its own history without leaving the editor.

Scope and non-goals:

- **Scope**: a measurable retrieval surface over one developer's local Claude Code archive. Optimised for a single laptop and a local Qdrant instance.
- **Non-goals**: multi-tenant deployment, hosted SaaS, ingesting non-Claude-Code corpora, replacing a general-purpose RAG framework.
- **Status**: alpha. The author uses it daily; assume rough edges and expect to read source.

## Install

Prerequisites: Python 3.13+, [uv](https://docs.astral.sh/uv/), Docker (for the local Qdrant instance), [Claude Code](https://claude.ai/code) (for the MCP integration).

```sh
# 1. Install the CLI
uv tool install claude-kb

# 2. Start a local Qdrant
docker compose up -d

# 3. Import your Claude Code conversation history
kb import-claude-code-chats

# 4. Register the MCP server with Claude Code
claude mcp add -s user kb -- kb mcp
```

After step 4, Claude Code has access to two tools, `kb_search` and `kb_get`, over your imported history. The first import embeds every message and may take several minutes; subsequent imports are incremental and embed only new messages.

### Updating

```sh
uv tool upgrade claude-kb
kb --version
```

## Usage

### CLI

```sh
kb search "recency boost implementation"
kb search "error handling" --project claude-kb --from 2026-01-01 --limit 5
kb get <message-uuid>
kb get-thread <message-uuid> --depth 3
kb status
kb ai           # LLM-optimized command schema
```

Full flag list per command: `kb <command> --help`.

### MCP

Once registered with `claude mcp add`, Claude Code can call:

- `kb_search(query, ...)` - semantic search (dense retrieval); optional filters for project, conversation, role, date range, score threshold; optional grouping by conversation.
- `kb_get(message_id | conversation_id, ...)` - retrieve a single message, a thread context, or restore a full conversation transcript.
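To show what a client actually sends, the sketch below builds the MCP JSON-RPC `tools/call` request for `kb_search`. The argument names come from the API table in this README; the helper name `kb_search_request` and the example values are illustrative. A real client must complete the MCP `initialize` handshake before calling tools and would normally use an MCP SDK rather than hand-built messages.

```python
import json


def kb_search_request(request_id: int, query: str, **filters) -> str:
    """Build an MCP JSON-RPC `tools/call` request for kb_search (hypothetical helper)."""
    # Drop unset filters so the server applies its own defaults.
    arguments = {"query": query, **{k: v for k, v in filters.items() if v is not None}}
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "kb_search", "arguments": arguments},
    }
    # Over the stdio transport, each JSON-RPC message is a single line of JSON.
    return json.dumps(message)


print(kb_search_request(1, "recency boost implementation", project="claude-kb", limit=5))
```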

Streamable HTTP transport is also supported:

```sh
kb mcp --transport http --port 3000
```

See [docs/mcp-api.md](docs/mcp-api.md) for the full schema reference.

## Architecture

```
~/.claude/projects/*/<session>.jsonl
    -> parse  (import_claude.py)
    -> classify content_type (prose/tool_use/tool_result/thinking/mixed)
    -> embed  dense BGE-base 768d
    -> Qdrant collection: conversations_hybrid
    -> retrieve  query_points(dense)
       + server-side filters: project, conversation_id, role, date range,
         primary_content_type (default-deny on tool_result + thinking)
       + score_threshold
    -> post-process  recency boost / compact / grouping
    -> CLI (kb ...)  |  MCP server (kb_search, kb_get)
```

One Qdrant point per Claude Code message; no sub-message chunking. Dense retrieval uses BAAI/bge-base-en-v1.5 (768d, L2-normalised). When `boost_recent` is enabled, recent messages are boosted post-retrieval by adding `0.2 * exp(-age / 1 week)` to their score. Tool-result and thinking blocks are excluded from search results by default, since both are dominant noise sources in code-conversation corpora; users opt in via `include_tool_results=True` / `include_thinking=True` when needed.
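The recency boost formula is small enough to write out. A minimal sketch, assuming the boost is added to the raw similarity score and that timestamps are timezone-aware; clamping future timestamps to age zero is also an assumption.

```python
import math
from datetime import datetime, timedelta, timezone

BOOST_WEIGHT = 0.2          # additive weight, from this README
DECAY = timedelta(weeks=1)  # boost falls to 1/e of its maximum after one week


def recency_boost(score: float, timestamp: datetime, now: datetime) -> float:
    """Return score + 0.2 * exp(-age / 1 week)."""
    age = max(now - timestamp, timedelta(0))  # clamp future timestamps to age 0 (assumption)
    return score + BOOST_WEIGHT * math.exp(-(age / DECAY))


now = datetime(2026, 5, 1, tzinfo=timezone.utc)
hits = [("old", 0.62, now - timedelta(weeks=8)), ("fresh", 0.55, now - timedelta(days=1))]
reranked = sorted(hits, key=lambda h: recency_boost(h[1], h[2], now), reverse=True)
print([name for name, _, _ in reranked])  # ['fresh', 'old']
```

Because the boost is applied post-retrieval, it only reorders messages that dense retrieval already returned: an old message outside the top-K cannot be rescued, and a brand-new message gains at most 0.2.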

The collection schema also reserves a sparse vector slot, but the production search path is dense-only. The eval ([docs/retrieval-experiments-2026-05.md](docs/retrieval-experiments-2026-05.md)) showed that the hybrid and encoder-replacement configurations tested (BM25 fusion, bge-m3, Qwen3-Embedding-8B) regressed Recall@10 by 0.075-0.22 on this corpus shape; sparse vectors are stored only to keep the door open for future experiments.
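In Qdrant's REST API, a collection with a named dense vector plus a reserved sparse slot is declared in the create-collection request body (`PUT /collections/{name}`). A hedged sketch: the vector names `dense` and `sparse` and the `Cosine` distance are assumptions; the 768d size and the collection name come from this README.

```python
import json

# Hypothetical body for PUT /collections/conversations_hybrid.
collection_config = {
    "vectors": {
        # BGE-base vectors are L2-normalised, so cosine and dot product rank identically.
        "dense": {"size": 768, "distance": "Cosine"},
    },
    "sparse_vectors": {
        # Reserved slot: points can carry sparse vectors even if search never queries them.
        "sparse": {},
    },
}
print(json.dumps(collection_config, indent=2))
```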

Full diagram and per-stage notes: [docs/architecture.md](docs/architecture.md).

## Retrieval Evaluation

Measured on the maintainer's corpus (~690k messages; 20 hand-graded queries across five categories; conversation-level grading, with cross-phrasing to counter the selection bias of self-grading), at `--min-score 0.0` and k=10:

| Mode | Recall@10 | MRR@10 |
| --- | ---: | ---: |
| **dense-only, content-type filter on (default)** | **0.368** | **0.397** |
| dense-only, recency boost on | 0.368 | 0.440 |
| dense-only, filter off | 0.361 | 0.389 |
| hybrid (RRF of dense + BM25) | -0.075 vs dense-only (28-query expanded set) | not reported |
| sparse-only (BM25) | -0.18 vs dense-only (28-query expanded set) | not reported |
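One plausible reading of conversation-level Recall@10 and MRR@10, written out as code. Assumptions: each retrieved message is represented by its conversation ID, a query's relevant set is a set of conversation IDs, and MRR uses the rank of the first retrieved message from a relevant conversation. The function names are hypothetical, and the harness in `scripts/run_eval.py` may define these metrics differently.

```python
def recall_at_k(ranked_conv_ids: list[str], relevant: set[str], k: int = 10) -> float:
    """Share of relevant conversations that appear among the top-k retrieved messages."""
    if not relevant:
        raise ValueError("query is ungraded: no relevant conversations")
    return len(relevant.intersection(ranked_conv_ids[:k])) / len(relevant)


def mrr_at_k(ranked_conv_ids: list[str], relevant: set[str], k: int = 10) -> float:
    """Reciprocal rank of the first top-k message whose conversation is relevant."""
    for rank, conv_id in enumerate(ranked_conv_ids[:k], start=1):
        if conv_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int = 10) -> tuple[float, float]:
    """Mean Recall@k and MRR@k over (ranked conversation IDs, relevant set) pairs."""
    n = len(runs)
    recall = sum(recall_at_k(ranked, rel, k) for ranked, rel in runs) / n
    mrr = sum(mrr_at_k(ranked, rel, k) for ranked, rel in runs) / n
    return recall, mrr


# Each retrieved message is represented by its conversation ID, in rank order.
runs = [(["c1", "c2", "c1", "c3"], {"c2", "c9"}), (["c4", "c5"], {"c5"})]
print(evaluate(runs))  # (0.75, 0.5)
```

Grading at the conversation level means two hits from the same relevant conversation count once for recall, which keeps a single chatty session from inflating the score.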

Five hybrid and encoder-replacement experiments were run as part of this work (BM25 hyperparameter tuning, RRF prefetch pool size, bge-m3 dense, bge-m3 sparse, Qwen3-Embedding-8B). All but one (RRF prefetch_factor=30, +0.024 MRR) regressed Recall@10 by 0.075-0.22 versus dense-only BGE-base. The results point to the corpus shape (short-form English code-conversation messages, ~47 words/doc median) as the constraint, not the encoder. Full table, methodology, and per-experiment failure analysis: [docs/retrieval-experiments-2026-05.md](docs/retrieval-experiments-2026-05.md).
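For reference, reciprocal rank fusion (RRF) combines ranked lists by summing `1 / (k + rank)` for each document. The sketch below uses the conventional k=60 and models the prefetch pool as `limit * prefetch_factor` candidates per sub-query; both are assumptions, and the names `rrf_fuse` and `hybrid` are hypothetical. Qdrant performs RRF server-side (prefetch sub-queries plus a fusion query) with its own constant.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], limit: int, k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank_d)."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:limit]


def hybrid(dense: list[str], sparse: list[str], limit: int, prefetch_factor: int) -> list[str]:
    """Truncate each sub-ranking to a prefetch pool, then fuse (pool size is an assumption)."""
    pool = limit * prefetch_factor
    return [doc for doc, _ in rrf_fuse([dense[:pool], sparse[:pool]], limit)]


print(hybrid(["a", "b", "c"], ["c", "a", "d"], limit=2, prefetch_factor=2))  # ['a', 'c']
```

The pool size matters because a document that only one retriever ranks highly can be outscored by a document both retrievers rank moderately, and which documents are even eligible depends on how deep each sub-query goes.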

A server-side filter excludes `tool_result` and `thinking` blocks from search results by default (via the `primary_content_type` payload tag). On the 20-query abstract-concept eval the filter's effect is within noise (0.368 vs 0.361 Recall@10 in the table above): the top-K is already prose-dominated, so the filter has little to exclude. Its value is defensive: on the minority of queries where raw tool output or model thinking scores near the top-K boundary (error stack traces, queries about weighing options), it removes that noise without requiring the caller to opt in. Per-query and per-category breakdowns: [docs/evaluation.md](docs/evaluation.md).

Harness: [`scripts/run_eval.py`](scripts/run_eval.py); query set: [`tests/eval/queries.jsonl`](tests/eval/queries.jsonl). The harness will not fabricate metrics; if queries are ungraded it prints `ungraded, N queries pending` and exits 0.

Adjacent measurements: MCP response token reduction (29% mean / 86% peak from compact mode, see `CHANGELOG.md`), restore-mode unit tests (`tests/test_search_service.py`), content-type classifier tests (`tests/test_content_type.py`).

## Chunking

One Claude Code message, one Qdrant point. No sub-message chunking. The choice is load-bearing for the rest of the design (point IDs are message UUIDs, `kb_get` round-trips with `kb_search`), and it accepts known tradeoffs (SPLADE input truncated to 8000 chars per message; long-form prose recall is weaker than a sliding-window approach would deliver).
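A sketch of the one-message-one-point mapping, assuming Claude Code JSONL records expose `uuid`, `sessionId`, `parentUuid`, and `timestamp` fields and that the message text has already been extracted. The helper `build_point` and the payload key names are hypothetical; the UUID point ID and the 8000-character sparse truncation come from this README.

```python
import uuid
from typing import Any

SPARSE_MAX_CHARS = 8000  # per-message cap on sparse-encoder input, per this README


def build_point(record: dict[str, Any], text: str, dense: list[float]) -> tuple[dict[str, Any], str]:
    """Return a Qdrant point for one message plus the truncated sparse-encoder input."""
    point = {
        # Point ID is the message UUID, so kb_get can fetch by the ID kb_search returns.
        # uuid.UUID() validates the string; str() normalises it to lowercase hyphenated form.
        "id": str(uuid.UUID(record["uuid"])),
        "vector": {"dense": dense},
        "payload": {
            "conversation_id": record["sessionId"],
            "parent_id": record.get("parentUuid"),
            "timestamp": record["timestamp"],
            "text": text,  # full text is stored; only the sparse input is truncated
        },
    }
    return point, text[:SPARSE_MAX_CHARS]
```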

Why this is the right unit, what we lose, alternatives considered, and when to revisit: [docs/chunking.md](docs/chunking.md).

## Configuration

Environment variables (or a `.env` file in the working directory):

| Variable | Default | Purpose |
| --- | --- | --- |
| `QDRANT_URL` | `http://localhost:6333` | Qdrant endpoint. Override for remote clusters. |
| `QDRANT_API_KEY` | unset | API key for Qdrant Cloud. |
| `EMBEDDING_MODEL` | `BAAI/bge-base-en-v1.5` | HuggingFace model name for the dense encoder. |

Apple Silicon (MPS), CUDA, and CPU are auto-detected by sentence-transformers. The dense encoder is the production retrieval signal; the collection schema reserves a sparse vector slot but the production search path does not query it (see [Retrieval Evaluation](#retrieval-evaluation) for why).

## Development

```sh
git clone https://github.com/tenequm/claude-kb.git
cd claude-kb
uv sync --extra dev
just check        # ty type-check + ruff lint + format
uv run pytest -q  # unit tests
```

Pre-commit is configured via `.pre-commit-config.yaml` (ruff, secrets scan, basic hygiene).

## Security

Vulnerability reporting policy: [SECURITY.md](SECURITY.md). The MCP server binds to `127.0.0.1` by default, queries only the Qdrant instance configured via `QDRANT_URL` (local by default), and exposes only read operations.

## API

The MCP server exposes two tools. Both are read-only and idempotent, and run against the configured Qdrant instance (a local one by default).

| Tool | Purpose | Key parameters |
| --- | --- | --- |
| `kb_search` | Semantic search (dense retrieval) across all imported messages. | `query`, `limit`, `project`, `conversation_id`, `role`, `from_date`, `to_date`, `min_score`, `boost_recent`, `group_by_conversation` |
| `kb_get` | Retrieve a single message, a thread context, or restore a full conversation transcript. | `message_id`, `conversation_id`, `up_to`, `context_depth`, `max_messages` |
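The `group_by_conversation` option can be pictured as a client-side pass over hits: sort by score, keep the best hits per conversation, and order groups by their best hit. This is a sketch under assumptions (`Hit`, `group_by_conversation`, and `max_per_group` are hypothetical names); Qdrant also offers a server-side grouping API, and this README does not say which approach claude-kb uses.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    message_id: str
    conversation_id: str
    score: float


def group_by_conversation(hits: list[Hit], max_per_group: int = 1) -> list[list[Hit]]:
    """Keep up to max_per_group best hits per conversation, groups ordered by best score."""
    groups: dict[str, list[Hit]] = {}
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        bucket = groups.setdefault(hit.conversation_id, [])
        if len(bucket) < max_per_group:
            bucket.append(hit)
    # dicts preserve insertion order, so each group appears at the position of its best hit.
    return list(groups.values())


hits = [Hit("m1", "c1", 0.5), Hit("m2", "c2", 0.9), Hit("m3", "c1", 0.8), Hit("m4", "c2", 0.1)]
print([[h.message_id for h in g] for g in group_by_conversation(hits)])  # [['m2'], ['m3']]
```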

Output models, filter application order, error modes, and non-obvious filter semantics are documented in [docs/mcp-api.md](docs/mcp-api.md). Pydantic models live in [`src/claude_kb/models.py`](src/claude_kb/models.py).

## Maintainer

Misha Kolesnik - [@tenequm](https://github.com/tenequm) - <misha@kolesnik.io>

## Contributing

Issues and PRs are welcome at <https://github.com/tenequm/claude-kb>. Commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`feat:`, `fix:`, `docs:`, `chore:`, `refactor:`, `test:`). Please run `just check` and `uv run pytest -q` before opening a PR.

## License

[MIT](LICENSE) (c) 2025 Misha Kolesnik
