Metadata-Version: 2.4
Name: transcript-search
Version: 0.1.0
Summary: MCP server for semantic search over meeting transcripts (multilingual: EN/UK/RU)
License-Expression: MIT
Requires-Python: >=3.11
Requires-Dist: mcp[cli]>=1.0.0
Requires-Dist: qdrant-client>=1.9.0
Requires-Dist: sentence-transformers>=3.0.0
Requires-Dist: torch>=2.0.0
Description-Content-Type: text/markdown

# Transcript Search MCP Server

Semantic search over meeting transcripts, built on Qdrant and multilingual embeddings and exposed as a [Claude Code](https://docs.anthropic.com/en/docs/claude-code) MCP server.

## Setup

```bash
# Start Qdrant
docker compose up -d

# Create Python environment
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Index all transcripts (the first run takes ~2-5 min because it downloads the embedding model)
TRANSCRIPTS_DIR=/path/to/your/transcripts python index.py

# Verify: Qdrant dashboard at http://localhost:6333/dashboard
```

## Configuration

Set the `TRANSCRIPTS_DIR` environment variable to the folder containing your `.md` transcript files. If it is unset, the folder defaults to `./transcripts`, relative to the current working directory.
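
As a rough illustration of that lookup, the sketch below resolves the folder from the environment and lists the `.md` files in it. The helper names `resolve_transcripts_dir` and `list_transcripts` are hypothetical, and the non-recursive listing is an assumption; the project's own scripts may do this differently.

```python
import os
from pathlib import Path


def resolve_transcripts_dir(env=None):
    """Return TRANSCRIPTS_DIR if set, otherwise ./transcripts under the CWD."""
    env = os.environ if env is None else env
    raw = env.get("TRANSCRIPTS_DIR") or "transcripts"
    return Path(raw).expanduser().resolve()


def list_transcripts(root):
    """List .md files directly under root, sorted by name.

    Assumption: transcripts are not nested in subfolders.
    """
    return sorted(p for p in Path(root).glob("*.md") if p.is_file())


if __name__ == "__main__":
    folder = resolve_transcripts_dir()
    print(f"{folder}: {len(list_transcripts(folder))} transcript(s)")
```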

## Usage

### CLI
```bash
# Basic search
TRANSCRIPTS_DIR=/path/to/transcripts python search_cli.py "what did Dima say about department metrics"

# Filter by speaker
python search_cli.py "project updates" --speaker "Dima Batt" --top-k 3

# Filter by meeting type
python search_cli.py "purpose discussion" --type 1on1

# Multilingual query
python search_cli.py "культура як диференціатор"
```
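
The flags above could plausibly map onto a Qdrant payload filter. The sketch below parses the same flags with `argparse` and builds a dict in Qdrant's JSON filter syntax. The payload field names `speaker` and `meeting_type`, the default `--top-k` of 5, and both helper names are assumptions made for illustration; they are not taken from `search_cli.py`.

```python
import argparse


def build_parser():
    """Parser mirroring the flags shown in the CLI examples."""
    parser = argparse.ArgumentParser(prog="search_cli.py")
    parser.add_argument("query", help="free-text query in any supported language")
    parser.add_argument("--speaker", help="only return chunks from this speaker")
    parser.add_argument("--type", dest="meeting_type", help="meeting type, e.g. 1on1")
    # Assumed default; the real CLI may use a different value.
    parser.add_argument("--top-k", type=int, default=5)
    return parser


def build_filter(speaker=None, meeting_type=None):
    """Build a Qdrant-style filter dict; returns None when no filter is requested."""
    must = []
    if speaker:
        # "speaker" is a hypothetical payload field name.
        must.append({"key": "speaker", "match": {"value": speaker}})
    if meeting_type:
        # "meeting_type" is a hypothetical payload field name.
        must.append({"key": "meeting_type", "match": {"value": meeting_type}})
    return {"must": must} if must else None


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.query, args.top_k, build_filter(args.speaker, args.meeting_type))
```

Conditions listed under `must` are combined with AND, so passing both `--speaker` and `--type` would narrow the results to chunks that match both.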

### MCP Server (Claude Code integration)
Add to your project's `.mcp.json`:
```json
{
  "mcpServers": {
    "transcript-search": {
      "command": "uv",
      "args": [
        "run",
        "--directory", "/path/to/transcript-search",
        "--python", "/path/to/transcript-search/.venv/bin/python",
        "mcp_server.py"
      ],
      "env": {
        "TRANSCRIPTS_DIR": "/path/to/your/transcripts"
      }
    }
  }
}
```

## Architecture

```
Transcripts (*.md) → parse_transcripts.py → chunk.py → index.py → Qdrant
                                                                      ↑
                                            search_cli.py / mcp_server.py
```

- **Model:** `intfloat/multilingual-e5-large` (1024-dimensional embeddings; covers Ukrainian, Russian, and English)
- **Chunking:** Speaker-turn-aware, ~1500 tokens per chunk, 3-turn overlap
- **Storage:** Qdrant (Docker) with cosine similarity
