Metadata-Version: 2.4
Name: mcp-code-index
Version: 0.2.0
Summary: SQLite-backed code index for Claude Code, exposed via MCP
Project-URL: Homepage, https://github.com/achreftlili/code-index
Project-URL: Repository, https://github.com/achreftlili/code-index
Project-URL: Issues, https://github.com/achreftlili/code-index/issues
Author: Achref Tlili
License-Expression: MIT
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: click>=8.1.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: pathspec>=0.12.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: sqlite-vec>=0.1.0
Requires-Dist: tree-sitter-go>=0.23.0
Requires-Dist: tree-sitter-python>=0.23.0
Requires-Dist: tree-sitter-rust>=0.23.0
Requires-Dist: tree-sitter-typescript>=0.23.0
Requires-Dist: tree-sitter>=0.23.0
Requires-Dist: watchdog>=4.0.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.5.0; extra == 'dev'
Description-Content-Type: text/markdown

# code-index

<!-- mcp-name: io.github.achreftlili/code-index -->

A SQLite-backed code index for Claude Code, exposed via MCP. Replaces exploratory
`Read`/`Grep`/`Glob` calls with targeted retrieval.

## What it does

- **Parses** your repo with tree-sitter (Python, TypeScript/JavaScript, Go, Rust).
- **Chunks** per symbol; expands identifiers (`getUserAuthToken` → `get user auth token`).
- **Embeds** with Jina `jina-embeddings-v2-base-code` (default, 768d) or local Ollama.
- **Stores** symbols, chunks, vectors, and call/import edges in `.claude/index.db`.
- **Serves** retrieval over MCP — 8 retrieval tools + 2 admin tools (see below).
- **Auto-updates** via a Claude Code `PostToolUse` hook and an optional file watcher.
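
The identifier expansion mentioned in the list above can be sketched as follows. This is only an illustration: `expand_identifier` and its regular expression are hypothetical, and the project's chunker may split names differently.

```python
import re

# Hypothetical splitter: acronym runs, capitalized or lowercase words, digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def expand_identifier(name: str) -> str:
    """Split camelCase, PascalCase, and snake_case into lowercase words."""
    return " ".join(m.group(0).lower() for m in _WORD.finditer(name))


print(expand_identifier("getUserAuthToken"))  # get user auth token
print(expand_identifier("HTTPServer"))        # http server
```

Indexing the expanded form next to the raw identifier lets a natural-language query such as "auth token" match a symbol whose name never contains a space.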

## Tools

| Tool              | Purpose                                                                                                  |
| ----------------- | -------------------------------------------------------------------------------------------------------- |
| `init`            | Build or refresh the project's index. Incremental by default; `force=true` rebuilds from scratch.        |
| `setup_check`     | Check whether the auto-reindex hook is wired in `.claude/settings.json`. Returns install instructions if not. |
| `code_search`     | Hybrid (vector + FTS) search for **conceptual** queries (e.g., "auth flow", "where do we parse JSON").   |
| `symbol_lookup`   | Exact-name lookup of functions / classes / methods / types. Prefer over `code_search` for identifiers.   |
| `file_outline`    | Symbols (with signatures) in a file, in source order. Use instead of `Read` when you only need shape.    |
| `get_symbol_body` | Full chunk for a `symbol_id` returned by `symbol_lookup` or `code_search`.                               |
| `callers`         | Symbols that CALL the given symbol. `depth` (1-5) expands transitively.                                  |
| `callees`         | Symbols that the given symbol CALLS. `depth` (1-5) expands transitively.                                 |
| `dependents`      | Files that import the given file.                                                                        |
| `dependencies`    | Files that the given file imports.                                                                       |

All tools return bounded JSON. To read a large body, call `get_symbol_body` for
the specific symbol; no tool inlines whole files.
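
As a rough picture of what "`depth` expands transitively" means for `callers`, here is a minimal breadth-first sketch over call edges. The `(caller, callee)` edge shape, the function name, and the clamping of `depth` to 1-5 are assumptions made for illustration, not the server's actual implementation.

```python
from collections import deque


def transitive_callers(edges, target, depth=1):
    """Return {symbol: hops} for callers of `target` within `depth` hops.

    `edges` is an iterable of (caller, callee) pairs (assumed shape).
    """
    depth = max(1, min(depth, 5))  # the tool table documents depth as 1-5
    callers_of = {}
    for caller, callee in edges:
        callers_of.setdefault(callee, set()).add(caller)

    found = {}
    queue = deque([(target, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand past the requested depth
        for caller in callers_of.get(node, ()):
            if caller != target and caller not in found:
                found[caller] = hops + 1
                queue.append((caller, hops + 1))
    return found


edges = [("a", "b"), ("b", "c"), ("d", "c"), ("c", "a")]
print(transitive_callers(edges, "c", depth=1))
print(transitive_callers(edges, "c", depth=2))
```

`callees` would be the same walk with the edge direction reversed.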

## Requirements

Python 3.10+ with **loadable SQLite extension support** (required by `sqlite-vec`).
Python 3.13 has this enabled by default. For 3.10–3.12, use either:
- the python.org installer, or
- pyenv: `PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions pyenv install 3.12.x`
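
To check whether a given interpreter meets this requirement before installing, a small standard-library probe along these lines should work. The function name is hypothetical; the probe only tests whether extension loading can be enabled and does not load `sqlite-vec` itself.

```python
import sqlite3


def supports_loadable_extensions() -> bool:
    """True if this Python's sqlite3 module can load SQLite extensions."""
    # Builds without extension support omit the method entirely.
    if not hasattr(sqlite3.Connection, "enable_load_extension"):
        return False
    conn = sqlite3.connect(":memory:")
    try:
        conn.enable_load_extension(True)
        conn.enable_load_extension(False)
        return True
    except (AttributeError, sqlite3.Error):
        return False
    finally:
        conn.close()


print("loadable extensions:", supports_loadable_extensions())
```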

## Install

### In Claude Code (primary)

> **Preview note**: the commands below include `uvx --refresh` so the latest
> published version is fetched from PyPI on every Claude Code launch. If you
> already installed without it, run `claude mcp remove code-index` first and
> then re-run the install command. Drop `--refresh` once you no longer need the
> latest release on every launch; `uvx` then reuses its cached copy, which cuts
> ~1s off startup.

One command, env baked in. Replace `jina_REPLACE-ME` with your Jina API key
([get one here](https://jina.ai/embeddings)), or use the Ollama snippet below:

```bash
claude mcp add-json -s user code-index "$(cat <<'JSON'
{
  "type": "stdio",
  "command": "uvx",
  "args": ["--refresh", "--from", "mcp-code-index", "code-index-mcp"],
  "env": { "JINA_API_KEY": "jina_REPLACE-ME" }
}
JSON
)"
```

> Drop `-s user` to register the server for the current project only (Claude
> Code's default `local` scope).

#### Local Ollama instead of Jina

```bash
claude mcp add-json -s user code-index "$(cat <<'JSON'
{
  "type": "stdio",
  "command": "uvx",
  "args": ["--refresh", "--from", "mcp-code-index", "code-index-mcp"],
  "env": {
    "CODE_INDEX_EMBEDDER": "ollama",
    "OLLAMA_URL": "http://localhost:11434"
  }
}
JSON
)"
```

The `setup_check` MCP tool reports `embedder_ready: false` when the API key is
missing or the backend is misconfigured, so a broken install surfaces on first
use.

That's it. Open Claude Code in any repo and ask:

> _"Build the code index for this repo."_

Claude calls the `init` MCP tool, which writes `.claude/index.db` for that
project. Subsequent prompts can use `code_search`, `symbol_lookup`, `callers`,
etc. — see **Tools** above for the full surface.

#### Or, with a permanent install (no uvx)

```bash
pip install mcp-code-index
claude mcp add -s user code-index -- code-index-mcp
```

#### Optional: keep the index live as you edit

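Without a hook, the index goes stale whenever files change and stays stale
until you call `init` again. With one, every `Edit` / `Write` / `MultiEdit`
Claude performs triggers an incremental reindex of the touched file. Changes
made outside the agent (`mv`, `git checkout`, IDE saves) do not fire the hook;
cover those with the optional file watcher or another `init` call.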

**Easiest path: ask Claude.** On first use in a new project, ask _"is the
auto-reindex hook installed?"_ — Claude calls `setup_check`, sees the gap, and
offers to wire it up.

**Manual install** — add this block to the project's `.claude/settings.json`
under `hooks.PostToolUse`:

```json
{
  "matcher": "Edit|Write|MultiEdit",
  "hooks": [
    {
      "type": "command",
      "command": "python -c 'import json,os,subprocess,sys; p=json.load(sys.stdin) if not sys.stdin.isatty() else {}; fp=(p.get(\"tool_input\") or {}).get(\"file_path\"); cwd=p.get(\"cwd\") or os.getcwd(); fp and subprocess.Popen([\"code-index\",\"--root\",cwd,\"reindex\",\"--file\",fp],stdout=subprocess.DEVNULL,stderr=subprocess.DEVNULL,stdin=subprocess.DEVNULL,cwd=cwd,start_new_session=True)'"
    }
  ]
}
```
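
The one-liner in the hook is dense, so here is roughly the same logic written out as a readable script. `build_reindex_command` and `main` are names chosen for this sketch and are not part of the package; the `code-index --root <cwd> reindex --file <path>` invocation is copied from the hook above.

```python
import json
import os
import subprocess
import sys


def build_reindex_command(payload: dict):
    """Return (argv, cwd) for the reindex call, or None if no file was touched."""
    file_path = (payload.get("tool_input") or {}).get("file_path")
    cwd = payload.get("cwd") or os.getcwd()
    if not file_path:
        return None
    return ["code-index", "--root", cwd, "reindex", "--file", file_path], cwd


def main() -> None:
    # Claude Code pipes the hook payload as JSON on stdin.
    payload = json.load(sys.stdin) if not sys.stdin.isatty() else {}
    built = build_reindex_command(payload)
    if built is None:
        return
    argv, cwd = built
    # Detach so the hook returns immediately and never blocks the agent.
    subprocess.Popen(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        stdin=subprocess.DEVNULL,
        cwd=cwd,
        start_new_session=True,
    )
```

To use this form instead of the one-liner, save it as a script, call `main()` at the bottom, and point the hook's `command` at it. The script path is your choice.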

### In other MCP-compatible agents

The server speaks standard MCP over stdio, so any client that supports MCP
servers works (Cursor, Continue, Cody, Zed, etc.). Configure the client to
launch `uvx --refresh --from mcp-code-index code-index-mcp` (or
`code-index-mcp` after `pip install mcp-code-index`). Once connected, call the
`init` tool from inside the client to bootstrap the index. Drop `--refresh`
once you no longer need the latest release on every launch; `uvx` then reuses
its cached copy.

### From source (development)

```bash
git clone https://github.com/achreftlili/code-index
cd code-index
pip install -e .
code-index init        # CLI alternative to the `init` MCP tool
code-index-mcp         # starts the MCP server on stdio (for manual wiring)
```

## Configuration

Environment variables:

| Var | Default | Notes |
|---|---|---|
| `CODE_INDEX_DB` | `.claude/index.db` | SQLite path. |
| `CODE_INDEX_EMBEDDER` | `jina` | `jina` or `ollama`. |
| `CODE_INDEX_EMBED_MODEL` | `jina-embeddings-v2-base-code` | Model name. |
| `CODE_INDEX_EMBED_DIM` | `768` | Must match the model. |
| `JINA_API_KEY` | — | Required when `CODE_INDEX_EMBEDDER=jina`. Get one at https://jina.ai/embeddings. |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama server. |
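
How these variables and defaults combine can be sketched as below. The `Settings` class and `load_settings` function are hypothetical names for illustration; only the variable names, defaults, and the "key required for `jina`" rule come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_path: str
    embedder: str
    embed_model: str
    embed_dim: int
    jina_api_key: Optional[str]
    ollama_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    settings = Settings(
        db_path=env.get("CODE_INDEX_DB", ".claude/index.db"),
        embedder=env.get("CODE_INDEX_EMBEDDER", "jina"),
        embed_model=env.get("CODE_INDEX_EMBED_MODEL", "jina-embeddings-v2-base-code"),
        embed_dim=int(env.get("CODE_INDEX_EMBED_DIM", "768")),
        jina_api_key=env.get("JINA_API_KEY"),
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434"),
    )
    if settings.embedder not in ("jina", "ollama"):
        raise ValueError(f"unknown embedder: {settings.embedder!r}")
    if settings.embedder == "jina" and not settings.jina_api_key:
        raise ValueError("JINA_API_KEY is required when CODE_INDEX_EMBEDDER=jina")
    return settings


print(load_settings({"CODE_INDEX_EMBEDDER": "ollama"}))
```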

## Layout

```
src/code_index/
  db.py           SQLite schema, connection, sqlite-vec loading
  parser.py       Tree-sitter wrapper, symbol + edge extraction
  chunker.py      Per-symbol chunks, identifier expansion
  embedder.py     Jina / Ollama backends
  indexer.py      Pipeline: walk → parse → chunk → embed → write
  retriever.py    Hybrid search (vector + FTS5) with RRF
  watcher.py      File watcher (watchdog)
  mcp_server.py   10 MCP tools (8 retrieval + init/setup_check admin)
  cli.py          init / reindex / watch / stats
```
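
The layout lists `retriever.py` as hybrid search with RRF (reciprocal rank fusion). As a minimal sketch of that technique, the function below merges a vector ranking and an FTS ranking by summing `1 / (k + rank)` per result. The function name, the `k=60` constant, and the tie-breaking rule are assumptions for illustration, not values taken from the project.

```python
def rrf_fuse(rankings, k=60, limit=10):
    """Fuse several ranked lists of ids with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    ordered = sorted(scores, key=lambda item: (-scores[item], str(item)))
    return ordered[:limit]


vector_hits = ["a", "b", "c"]  # ids ranked by vector similarity
fts_hits = ["b", "d", "a"]     # ids ranked by full-text match
print(rrf_fuse([vector_hits, fts_hits]))  # ['b', 'a', 'd', 'c']
```

Because RRF uses only ranks, it needs no score normalization between the vector and full-text backends.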
