Metadata-Version: 2.4
Name: semantic-search-mcp
Version: 0.2.0
Summary: MCP server for semantic code search using local embeddings
Project-URL: Homepage, https://github.com/adam-hanna/semantic-search-mcp
Project-URL: Repository, https://github.com/adam-hanna/semantic-search-mcp
Project-URL: Issues, https://github.com/adam-hanna/semantic-search-mcp/issues
Author-email: Adam Hanna <adamhanna@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: ai,claude,code-search,embeddings,llm,mcp,semantic-search
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: >=3.11
Requires-Dist: apsw>=3.45.0
Requires-Dist: fastembed>=0.4.0
Requires-Dist: mcp[cli]>=1.0.0
Requires-Dist: pathspec>=0.12.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: sqlite-vec>=0.1.6
Requires-Dist: tree-sitter-language-pack>=0.4.0
Requires-Dist: watchfiles>=1.0.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-cov>=6.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# Semantic Search MCP Server

An MCP server that provides semantic code search using local embeddings. Search your codebase with natural language queries like "authentication middleware" or "database connection pooling".

## Features

- **Hybrid search**: Combines vector similarity (Jina code embeddings) with FTS5 keyword matching using Reciprocal Rank Fusion
- **165+ languages**: Tree-sitter parsing for Python, TypeScript, JavaScript, Go, Rust, Java, C/C++, Ruby, PHP, and more
- **Incremental indexing**: File watcher automatically detects additions, modifications, and deletions
- **Respects .gitignore**: Honors your project's `.gitignore` files (including nested ones)
- **Auto-initialization**: Model loads and codebase indexes in the background on server startup
- **Zero external APIs**: All embeddings generated locally with FastEmbed
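
The Reciprocal Rank Fusion step named in the hybrid search feature can be illustrated with a minimal sketch. This is not the server's actual code: the function name `reciprocal_rank_fusion`, the chunk identifiers, and the constant `k=60` (a commonly used default) are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id earns 1 / (k + rank) from every list it appears in.
    k=60 is an assumed constant, not a value taken from this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable result.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result lists from the vector index and the FTS5 index.
vector_hits = ["auth.py::login", "db.py::connect", "auth.py::logout"]
keyword_hits = ["auth.py::logout", "auth.py::login"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

A chunk that ranks well in both lists rises above one that appears in only one of them, which is why the fusion needs only ranks and not comparable scores.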

## Installation

```bash
uv tool install semantic-search-mcp
```

Or with pip:
```bash
pip install semantic-search-mcp
```

Or run directly without installing:
```bash
uvx semantic-search-mcp
```

## Quick Start

### Add to Claude Code

**Option A: Project-level config**

Create `.mcp.json` in your project root:
```json
{
  "mcpServers": {
    "semantic-search": {
      "command": "uvx",
      "args": ["semantic-search-mcp"]
    }
  }
}
```

**Option B: CLI**
```bash
claude mcp add semantic-search -- uvx semantic-search-mcp
```

### Use

The server auto-initializes on startup. Available tools:

- `search_code` - Search the codebase with natural language queries
- `initialize` - Force a full reindex if needed
- `reindex_file` - Manually reindex a specific file

## How It Works

### Indexing

On startup, the server:
1. Scans your codebase for supported file types
2. Parses code into semantic chunks (functions, classes, methods) using Tree-sitter
3. Generates embeddings for each chunk using Jina's code embedding model
4. Stores everything in a local SQLite database with vector search support
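
Step 2 above, splitting a file into function and class chunks, can be sketched with Python's standard `ast` module as a stand-in for Tree-sitter. This is an assumption-laden illustration that handles Python source only; the helper name `extract_chunks` and the tuple layout are hypothetical, and the real server may chunk differently.

```python
import ast


def extract_chunks(source: str):
    """Return (kind, name, start_line, end_line) for each function and class.

    Uses the stdlib ast module instead of Tree-sitter, so it only
    understands Python. Methods are reported as "function" chunks.
    """
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        chunks.append((kind, node.name, node.lineno, node.end_lineno))
    # Order chunks by where they start in the file.
    return sorted(chunks, key=lambda chunk: chunk[2])


sample = '''class Pool:
    def acquire(self):
        return 1

def connect():
    return Pool()
'''

chunks = extract_chunks(sample)
for chunk in chunks:
    print(chunk)
```

Each chunk's line range is what would then be embedded and stored alongside the file path.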

### File Watching

The server monitors your codebase for changes in real time:

| Event | Action |
|-------|--------|
| File created | Parsed, embedded, and added to index |
| File modified | Re-indexed if content hash changed |
| File deleted | Removed from index |

Changes are debounced (default 1 s, set by `SEMANTIC_SEARCH_DEBOUNCE_MS`) so that rapid successive modifications are processed as one batch.
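
The "re-indexed if content hash changed" rule from the table can be sketched as follows. The hash algorithm (SHA-256), the in-memory `known_hashes` dictionary, and the function names are assumptions for illustration; the server presumably keeps its hashes in the SQLite index.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash file contents; SHA-256 is an assumed choice."""
    return hashlib.sha256(data).hexdigest()


def needs_reindex(path: str, data: bytes, known_hashes: dict) -> bool:
    """Return True, and record the new hash, only if the content changed."""
    digest = content_hash(data)
    if known_hashes.get(path) == digest:
        return False  # e.g. the file was saved without edits
    known_hashes[path] = digest
    return True


known_hashes = {}
first = needs_reindex("app.py", b"def main(): pass\n", known_hashes)
unchanged = needs_reindex("app.py", b"def main(): pass\n", known_hashes)
edited = needs_reindex("app.py", b"def main(): return 0\n", known_hashes)
print(first, unchanged, edited)
```

Comparing hashes rather than modification times avoids re-embedding files that were touched but not actually changed.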

### What Gets Indexed

**Included:**
- Files with code extensions: `.py`, `.js`, `.ts`, `.tsx`, `.jsx`, `.go`, `.rs`, `.java`, `.c`, `.cpp`, `.h`, `.rb`, `.php`, `.swift`, `.kt`, `.scala`, and more

**Excluded:**
- Files matching `.gitignore` patterns (all `.gitignore` files in your project are respected)
- Common non-code directories: `node_modules`, `__pycache__`, `.venv`, `build`, `dist`, `.git`, `vendor`, etc.
- Binary files and non-code file types
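
A rough sketch of the extension and directory rules above is shown below. It covers only the extensions and directories listed explicitly in this README, does not implement `.gitignore` matching (which the server handles separately), and the name `is_indexable` is hypothetical.

```python
from pathlib import PurePosixPath

# Only the extensions and directories named in this README; the real
# lists are longer ("and more", "etc.").
CODE_EXTENSIONS = {
    ".py", ".js", ".ts", ".tsx", ".jsx", ".go", ".rs", ".java",
    ".c", ".cpp", ".h", ".rb", ".php", ".swift", ".kt", ".scala",
}
EXCLUDED_DIRS = {
    "node_modules", "__pycache__", ".venv", "build", "dist", ".git", "vendor",
}


def is_indexable(relative_path: str) -> bool:
    """Apply the directory and extension rules to a project-relative path."""
    path = PurePosixPath(relative_path)
    # Reject the file if any parent directory is on the exclusion list.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    return path.suffix in CODE_EXTENSIONS


for candidate in ["src/app/main.py", "node_modules/lib/index.js", "docs/logo.png"]:
    print(candidate, is_indexable(candidate))
```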

## Configuration

Environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `SEMANTIC_SEARCH_DB_PATH` | `.semantic-search/index.db` | Index database location |
| `SEMANTIC_SEARCH_EMBEDDING_MODEL` | `jinaai/jina-embeddings-v2-base-code` | Embedding model |
| `SEMANTIC_SEARCH_MIN_SCORE` | `0.3` | Minimum relevance threshold (0-1) |
| `SEMANTIC_SEARCH_DEBOUNCE_MS` | `1000` | File watcher debounce in milliseconds |
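
As an illustration of how these variables and defaults fit together, the sketch below reads them into a small settings object. The `Settings` dataclass and `load_settings` function are hypothetical and use only the standard library; the server's own configuration code may be structured differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    db_path: str
    embedding_model: str
    min_score: float
    debounce_ms: int


def load_settings(env=None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        db_path=env.get("SEMANTIC_SEARCH_DB_PATH", ".semantic-search/index.db"),
        embedding_model=env.get(
            "SEMANTIC_SEARCH_EMBEDDING_MODEL",
            "jinaai/jina-embeddings-v2-base-code",
        ),
        min_score=float(env.get("SEMANTIC_SEARCH_MIN_SCORE", "0.3")),
        debounce_ms=int(env.get("SEMANTIC_SEARCH_DEBOUNCE_MS", "1000")),
    )


defaults = load_settings({})
custom = load_settings({"SEMANTIC_SEARCH_DEBOUNCE_MS": "250"})
print(defaults)
print(custom.debounce_ms)
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check without touching the real environment.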

## Requirements

- Python 3.11+
- ~700 MB of disk space for the embedding model (downloaded on first run)
- ~1 GB of RAM for the embedding model

## License

[MIT](LICENSE)
