Metadata-Version: 2.4
Name: semantic-code-mcp
Version: 0.2.0
Summary: Local MCP server for semantic code search
Requires-Python: >=3.14
Requires-Dist: lancedb>=0.27.0
Requires-Dist: mcp>=1.26.0
Requires-Dist: pandas>=3.0.0
Requires-Dist: pydantic-settings>=2.12.0
Requires-Dist: pydantic>=2.12.5
Requires-Dist: sentence-transformers>=5.2.0
Requires-Dist: structlog>=25.5.0
Requires-Dist: torch>=2.10.0
Requires-Dist: tree-sitter-python>=0.25.0
Requires-Dist: tree-sitter>=0.25.2
Description-Content-Type: text/markdown

# semantic-code-mcp

MCP server that provides semantic code search for Claude Code. Instead of iterative grep/glob, it indexes your codebase with embeddings and returns ranked results by meaning.

**Python only** for now — multi-language support (JS/TS, Rust, Go) is planned.

## How It Works

```
Claude Code ──(MCP/STDIO)──▶ semantic-code-mcp server
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
              AST Chunker      Embedder        LanceDB
             (tree-sitter)  (sentence-trans)  (vectors)
```

1. **Chunking** — tree-sitter parses Python into functions, classes, and methods
2. **Embedding** — sentence-transformers encodes each chunk (all-MiniLM-L6-v2, 384d)
3. **Storage** — vectors stored in LanceDB (embedded, like SQLite)
4. **Search** — hybrid semantic + keyword search with recency boosting

Indexing is incremental (mtime-based) and uses `git ls-files` for fast file discovery. The embedding model loads lazily on first query.

## Installation

### macOS / Windows

PyPI ships CPU-only torch on these platforms, so no extra flags are needed (~1.7GB install).

```bash
uvx semantic-code-mcp
```

**Claude Code integration:**

```bash
claude mcp add --scope user semantic-code -- uvx semantic-code-mcp
```

### Linux

> [!IMPORTANT]
> Without the `--index` flag, PyPI installs CUDA-bundled torch (~3.5GB). Unless you need GPU acceleration (you don't — embeddings run on CPU), use the command below to get the CPU-only build (~1.7GB).

```bash
uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp
```

**Claude Code integration:**

```bash
claude mcp add --scope user semantic-code -- \
  uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp
```

<details>
<summary>Claude Desktop / other MCP clients (JSON config)</summary>

```json
{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["--index", "pytorch-cpu=https://download.pytorch.org/whl/cpu", "semantic-code-mcp"]
    }
  }
}
```

On macOS/Windows you can omit the `--index` and `pytorch-cpu` args.

</details>

## MCP Tools

### `search_code`

Search code by meaning, not just text matching. Auto-indexes on first search.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | `str` | required | Natural language description of what you're looking for |
| `project_path` | `str` | required | Absolute path to the project root |
| `limit` | `int` | `10` | Maximum number of results |

Returns ranked results with `file_path`, `line_start`, `line_end`, `name`, `chunk_type`, `content`, and `score`.

### `index_codebase`

Index a codebase for semantic search. Only processes new and changed files unless `force=True`.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `project_path` | `str` | required | Absolute path to the project root |
| `force` | `bool` | `False` | Re-index all files regardless of changes |

### `index_status`

Check indexing status for a project.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `project_path` | `str` | required | Absolute path to the project root |

Returns `is_indexed`, `files_count`, and `chunks_count`.

## Configuration

All settings are environment variables with the `SEMANTIC_CODE_MCP_` prefix (via pydantic-settings):

| Variable | Default | Description |
|----------|---------|-------------|
| `SEMANTIC_CODE_MCP_CACHE_DIR` | `~/.cache/semantic-code-mcp` | Where indexes are stored |
| `SEMANTIC_CODE_MCP_LOCAL_INDEX` | `false` | Store index in `.semantic-code/` within each project |
| `SEMANTIC_CODE_MCP_EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence-transformers model |
| `SEMANTIC_CODE_MCP_DEBUG` | `false` | Enable debug logging |
| `SEMANTIC_CODE_MCP_PROFILE` | `false` | Enable pyinstrument profiling |

Pass environment variables via the `env` field in your MCP config:

```json
{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["semantic-code-mcp"],
      "env": {
        "SEMANTIC_CODE_MCP_DEBUG": "true",
        "SEMANTIC_CODE_MCP_LOCAL_INDEX": "true"
      }
    }
  }
}
```

Or with Claude Code CLI:

```bash
claude mcp add --scope user semantic-code \
  -e SEMANTIC_CODE_MCP_DEBUG=true \
  -e SEMANTIC_CODE_MCP_LOCAL_INDEX=true \
  -- uvx semantic-code-mcp
```

## Tech Stack

| Component | Choice | Rationale |
|-----------|--------|-----------|
| MCP Framework | FastMCP | Python decorators, STDIO transport |
| Embeddings | sentence-transformers | Local, no API costs, good quality |
| Vector Store | LanceDB | Embedded (like SQLite), no server needed |
| Chunking | tree-sitter | AST-based, respects code structure |

## Development

```bash
uv sync                            # Install dependencies
uv run python -m semantic_code_mcp # Run server
uv run pytest                      # Run tests
uv run ruff check src/             # Lint
uv run ruff format src/            # Format
```

Pre-commit hooks enforce linting, formatting, type-checking (`ty`), security scanning (`bandit`), and [Conventional Commits](https://www.conventionalcommits.org/).

### Releasing

Versions are derived from git tags automatically (`hatch-vcs`) — there's no hardcoded version in `pyproject.toml`.

```bash
git tag v0.2.0
git push origin v0.2.0
```

CI builds the package, publishes to PyPI, and creates a GitHub Release with auto-generated notes.

### Project Structure

- `docs/decisions/` — architecture decision records
- `TODO.md` — epics and planning
- `CHANGELOG.md` — completed work (Keep a Changelog format)
- `.claude/rules/` — context-specific coding rules for AI agents

## License

MIT
