Metadata-Version: 2.4
Name: mcp-email-index
Version: 0.1.5
Summary: Standalone MCP server that indexes IMAP email accounts for semantic and structured search
License-Expression: Apache-2.0
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: aioimaplib>=2.0.1
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: einops>=0.7
Requires-Dist: httpx>=0.27
Requires-Dist: mcp[cli]<2,>=1.23.0
Requires-Dist: pydantic>=2.0
Requires-Dist: qdrant-client>=1.9.0
Requires-Dist: sentence-transformers>=3.0.0
Requires-Dist: typer>=0.15
Provides-Extra: test
Requires-Dist: hypothesis>=6.0; extra == 'test'
Requires-Dist: pytest-asyncio>=0.23; extra == 'test'
Requires-Dist: pytest>=8.0; extra == 'test'
Requires-Dist: testcontainers>=4.0; extra == 'test'
Description-Content-Type: text/markdown

# mcp-email-index

Indexes your IMAP email locally so you can search it fast — both
semantically ("that email about the project deadline from last month") and
structurally ("all emails from bob@example.com in March 2024"). Full email
bodies and attachments are fetched live from IMAP on demand; only metadata
and embeddings are stored locally.

Works with any IMAP mailbox (Gmail, self-hosted, etc.). Supports
multiple accounts searched together or independently.

Requires Python 3.10+. Uses Qdrant (embedded) for vector search,
SQLite with FTS5 for structured search, and sentence-transformers for
embeddings.

---

## Quick Start

### 1. Install

```bash
pip install mcp-email-index
```

### 2. Configure

Edit `~/.mcp-email-index/config.toml` (auto-created on first run):

```toml
[account.work]
imap_host         = "mail.example-corp.com"
imap_port         = 993
imap_ssl          = true
imap_verify_ssl   = true                                 # set false for self-signed certs
username          = "alice@example-corp.com"
password          = "..."
max_connections   = 10                                   # max concurrent IMAP connections
# default_from    = "Alice <alice@example-corp.com>"     # backfill empty From headers
# exclude_folders = ["[Gmail]/All Mail", "[Gmail]/Spam"] # glob patterns for folders to skip
# smime_pfx       = "~/.mcp-email-index/certs/work.pfx" # S/MIME decryption certificate
# smime_password  = "..."                                # or env MCP_EMAIL_INDEX_WORK_SMIME_PASSWORD

[account.personal]
imap_host         = "imap.example-personal.com"
imap_port         = 993
imap_ssl          = true
imap_verify_ssl   = true
username          = "alice@example-personal.com"
password          = "..."
```

Add as many `[account.<name>]` sections as you need. The `<name>` is what you
use in `--account` flags and MCP tool parameters.

Passwords can also be set via environment variables:
`MCP_EMAIL_INDEX_<ACCOUNT>_PASSWORD` (uppercase account name) takes priority
over the config file value.
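
The precedence rule above can be sketched in a few lines. This is only an illustration of the documented behaviour, not the project's actual code: `resolve_password` is a hypothetical helper, and treating an empty variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_password(
    account: str,
    config_value: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the env-var password if set, otherwise the config file value."""
    # Variable name pattern from the README; the account name is uppercased.
    env_name = f"MCP_EMAIL_INDEX_{account.upper()}_PASSWORD"
    env_value = environ.get(env_name)
    # Assumption: an empty variable counts as "not set".
    return env_value if env_value else config_value
```

For an account named `work`, the variable consulted would be `MCP_EMAIL_INDEX_WORK_PASSWORD`.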

### 3. Start, index, connect

```bash
mcp-email-index-cli index  # index all accounts (starts server automatically)
```

MCP client config:

```json
{
  "mcpServers": {
    "email-index": {
      "command": "uvx",
      "args": ["mcp-email-index"]
    }
  }
}
```

Multiple clients can use the same config — the first one starts the daemon, the rest connect to it. When the last client disconnects, the daemon auto-shuts down after 60 seconds.

---

## CLI Commands

All commands except `serve` talk to the running server over HTTP.

### `serve` — Start the server

```bash
mcp-email-index-cli serve
```

Daemonizes on Unix, writes PID to `~/.mcp-email-index/server.pid`, logs to
`~/.mcp-email-index/server.log`, and waits until the server is ready before
exiting. On Windows the server runs in the foreground (see
[Running on Windows](#running-on-windows)).

The server auto-shuts down 60 seconds after the last MCP client disconnects (configurable via `MCP_EMAIL_INDEX_GRACE` env var). You don't normally need to call `serve` manually — the `mcp-email-index-mcp` shim starts it automatically.
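
One way to picture the grace period is a countdown that starts when the client count drops to zero and is cancelled if a client reconnects in time. The sketch below is an assumption about how such a timer could look with `asyncio`; `IdleShutdown` and `grace_seconds` are hypothetical names, not part of the package.

```python
import asyncio
import os
from typing import Mapping, Optional


def grace_seconds(environ: Mapping[str, str] = os.environ, default: float = 60.0) -> float:
    """Read MCP_EMAIL_INDEX_GRACE, falling back to the documented 60 seconds."""
    raw = environ.get("MCP_EMAIL_INDEX_GRACE")
    try:
        return float(raw) if raw else default
    except ValueError:
        return default  # assumption: unparsable values fall back to the default


class IdleShutdown:
    """Sets `stopped` once no client has been connected for `grace` seconds."""

    def __init__(self, grace: float) -> None:
        self.grace = grace
        self.clients = 0
        self.stopped = asyncio.Event()
        self._timer: Optional[asyncio.Task] = None

    def client_connected(self) -> None:
        self.clients += 1
        if self._timer is not None:
            self._timer.cancel()  # a client came back within the grace period
            self._timer = None

    def client_disconnected(self) -> None:
        self.clients -= 1
        if self.clients == 0:
            # Must be called from inside a running event loop.
            self._timer = asyncio.create_task(self._countdown())

    async def _countdown(self) -> None:
        await asyncio.sleep(self.grace)
        self.stopped.set()
```

A server loop would then `await idle.stopped.wait()` and begin its normal shutdown sequence.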

### `stop` — Shut down the server

```bash
mcp-email-index-cli stop
```

If an indexing job is running, it is cancelled (progress saved) before
shutdown.

### `index` — Build or update the index

```bash
# Incremental — only new emails since last run
mcp-email-index-cli index

# Specific accounts
mcp-email-index-cli index --account work
mcp-email-index-cli index --account work --account personal

# Specific folders (suffix-wildcard glob, only trailing * supported)
mcp-email-index-cli index --account work --folder "Archive/2025/*"

# Full reindex — wipes existing data and rebuilds from scratch
mcp-email-index-cli index --full
mcp-email-index-cli index --full --account work --folder "Archive/2025/2025-03"
```

| Flag | What it does |
|------|-------------|
| `--account NAME` / `-a` | Restrict to this account. Repeat for multiple. Omit for all. |
| `--folder GLOB` / `-f` | Restrict to folders matching this glob. Only a trailing `*` wildcard is supported. Requires exactly one `--account`. |
| `--full` | Delete existing index data first, then reindex. Required when the embedding model changes. |
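
Because only a trailing `*` is special, folder matching reduces to a prefix check. A minimal sketch, assuming the wildcard also covers nested subfolders and that every other character (including `[` and `]`) is literal; `folder_matches` is a hypothetical helper:

```python
def folder_matches(folder: str, pattern: str) -> bool:
    """Match a folder path against a pattern where only a trailing * is special."""
    if pattern.endswith("*"):
        # Everything before the * must be a prefix of the folder path.
        return folder.startswith(pattern[:-1])
    # No wildcard: the pattern must equal the folder path exactly.
    return folder == pattern
```

Under these assumptions `"Archive/2025/*"` matches `Archive/2025/2025-03` but not `Archive/2024/2024-03`.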

The CLI shows the live progress stream after starting the job. Press Ctrl-C
to choose: run in background (default) or cancel the job.

During indexing the server automatically:
- Detects and prunes emails deleted from IMAP (stale UIDs)
- Removes folders that no longer exist on IMAP or are now in `exclude_folders`
- Leaves other already-indexed folders untouched when using `--folder` to scope a run
- Validates the embedding model fingerprint — blocks incremental indexing if the model changed (use `--full` to confirm)
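
Two of the checks above are easy to illustrate. Stale UIDs are a set difference between what the index holds and what the server reports, and the model check compares a stored fingerprint with the current one. The function names and the fingerprint recipe (a hash of model name and vector size) are assumptions for illustration only.

```python
import hashlib
from typing import Optional


def stale_uids(indexed: set[int], on_server: set[int]) -> set[int]:
    """UIDs present in the index but no longer on the IMAP server."""
    return indexed - on_server


def model_fingerprint(model_name: str, dimension: int) -> str:
    """Hypothetical fingerprint: a short hash of model name and vector size."""
    return hashlib.sha256(f"{model_name}:{dimension}".encode()).hexdigest()[:16]


def check_fingerprint(stored: Optional[str], current: str, full: bool) -> None:
    """Block incremental runs when the stored fingerprint no longer matches."""
    if stored is not None and stored != current and not full:
        raise RuntimeError("embedding model changed; rerun with --full")
```

A first run has no stored fingerprint, so it passes; a `--full` run passes regardless because the old vectors are discarded anyway.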

### `index cancel` — Stop the current indexing job

```bash
mcp-email-index-cli index cancel
```

Finishes the current batch, saves progress, and stops. No data is lost.

### `index remove` — Remove indexed data

```bash
mcp-email-index-cli index remove -a work -f "Archive/2024/2024-03"  # specific folder
mcp-email-index-cli index remove -a work                            # entire account
mcp-email-index-cli index remove                                    # everything
```

Prompts for confirmation before deleting.

### `scan` — Discover new and removed folders

```bash
mcp-email-index-cli scan
```

Compares live IMAP folder lists against the index. Removes stale folders
from the index and shows an ASCII tree of all folders with their index
status, including counts of new (unindexed) messages.
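
To give a feel for the tree display, here is a small sketch that turns folder paths plus a status string into an ASCII tree. The exact layout the CLI prints is not specified here, so the connectors, the `/` separator, and the `render_tree` helper are all assumptions.

```python
def render_tree(folders: dict[str, str], sep: str = "/") -> str:
    """Render folder paths as an ASCII tree, appending each folder's status."""
    # Build a nested dict: one level per path component.
    tree: dict = {}
    for path in folders:
        node = tree
        for part in path.split(sep):
            node = node.setdefault(part, {})

    lines: list[str] = []

    def walk(node: dict, prefix: str, indent: str) -> None:
        names = sorted(node)
        for i, name in enumerate(names):
            last = i == len(names) - 1
            path = f"{prefix}{sep}{name}" if prefix else name
            status = folders.get(path)  # None for purely structural parents
            label = f"{name} [{status}]" if status else name
            lines.append(f"{indent}{'`-- ' if last else '|-- '}{label}")
            walk(node[name], path, indent + ("    " if last else "|   "))

    walk(tree, "", "")
    return "\n".join(lines)
```

Calling it with `{"INBOX": "indexed", "Archive/2025": "12 new"}` lists `Archive` as a parent node with `2025 [12 new]` beneath it, followed by `INBOX [indexed]`.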

### `status` — Live indexing progress

```bash
mcp-email-index-cli status
```

Streams a live progress display while an indexing job is running. Press
Ctrl-C to exit — the job keeps running.

### Command interactions

| Situation | What happens |
|-----------|-------------|
| `index` while a job is running | Shows the running job's status. Cancel first to start a new one. |
| `scan` while a job is running | Rejected — cancel the job first. |
| `stop` while a job is running | Cancels the job (progress saved), then shuts down. |
| `serve` while server is already running | Detects existing process, exits with a message. |

---

## MCP Tools

These tools are exposed to MCP clients. All search tools query across all
configured accounts by default.

### `search_emails` — Semantic search

Natural language search across email subjects and bodies.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | string | required | Natural language query |
| `accounts` | list[string] | `[]` (all) | Restrict to specific accounts |
| `n_results` | int | `10` | Number of results |
| `since` | string \| null | null | ISO date lower bound |
| `before` | string \| null | null | ISO date upper bound |
| `folder_glob` | string \| null | null | Folder path glob (trailing `*` only) |
| `from_addr` | string \| null | null | Sender substring |
| `to_addr` | string \| null | null | To/CC recipient substring |
| `has_attachment` | bool \| null | null | Filter by attachment presence |
| `overfetch` | int | `5` | Candidate multiplier for address filters |
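
The `overfetch` multiplier makes sense once you consider that address filters are applied after the vector search: if only `n_results` candidates were fetched, filtering could leave too few. The sketch below shows that idea under stated assumptions; `vector_search`, the hit shape, and case-insensitive matching are hypothetical, not the server's actual internals.

```python
from typing import Callable, Optional

Hit = dict  # hypothetical shape: {"from": str, "score": float, ...}


def filtered_search(
    vector_search: Callable[[int], list[Hit]],
    n_results: int = 10,
    overfetch: int = 5,
    from_addr: Optional[str] = None,
) -> list[Hit]:
    """Fetch extra candidates when a sender filter will discard some of them."""
    # Without an address filter nothing is discarded, so no overfetch is needed.
    limit = n_results * overfetch if from_addr else n_results
    candidates = vector_search(limit)  # assumed to return hits ranked by score
    if from_addr:
        needle = from_addr.lower()  # assumption: case-insensitive substring match
        candidates = [hit for hit in candidates if needle in hit["from"].lower()]
    return candidates[:n_results]
```

Raising `overfetch` trades a larger vector query for a better chance of filling all `n_results` slots when the filter is selective.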

### `find_emails` — Structured search

Field-specific lookups with FTS5 keyword search. Supports pagination.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `accounts` | list[string] | `[]` (all) | Restrict to specific accounts |
| `from_addr` | string \| null | null | Sender substring |
| `to_addr` | string \| null | null | To/CC recipient substring |
| `subject` | string \| null | null | Subject keyword search (FTS5) |
| `body` | string \| null | null | Body preview keyword search (FTS5) |
| `since` | string \| null | null | ISO date lower bound |
| `before` | string \| null | null | ISO date upper bound |
| `folder_glob` | string \| null | null | Folder path glob (trailing `*` only) |
| `has_attachment` | bool \| null | null | Filter by attachment presence |
| `limit` | int | `50` | Page size (capped at `max_find_results`) |
| `offset` | int | `0` | Pagination offset |
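
Paging through results with `limit` and `offset` can be wrapped in a generator. This sketch assumes a callable `find_emails` that forwards keyword arguments to the tool and returns a list of result dicts; how your MCP client exposes the tool, and the exact return shape, are not specified here.

```python
from typing import Callable, Iterator


def iter_results(
    find_emails: Callable[..., list[dict]],
    limit: int = 50,
    **filters,
) -> Iterator[dict]:
    """Yield every match by advancing `offset` one page at a time."""
    offset = 0
    while True:
        page = find_emails(limit=limit, offset=offset, **filters)
        yield from page
        if len(page) < limit:
            return  # a short (or empty) page means there is nothing left
        offset += limit
```

Keep `limit` at or below `max_find_results`: if the server caps a page below the requested size, the short-page test above would stop the loop early.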

### `get_email` — Fetch full email content

Fetches the complete email body live from IMAP.

| Parameter | Type | Description |
|-----------|------|-------------|
| `account` | string | Account name |
| `folder` | string | IMAP folder path |
| `uid` | string | IMAP UID |

Returns full plain-text body (HTML stripped if no text/plain part), headers,
and attachment filenames. If the email has been moved to a different folder
on the server, the index is updated automatically. If the email no longer
exists on IMAP, it is pruned from the index.
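
The plain-text fallback can be sketched with the standard library alone. Note that this swaps in `html.parser` for brevity, whereas the package depends on BeautifulSoup; the helper name `plain_body` is hypothetical, and script or style contents are not filtered out in this simplified version.

```python
from email import message_from_bytes, policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def plain_body(raw: bytes) -> str:
    """Prefer the text/plain part; otherwise strip tags from the HTML part."""
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    if part is not None:
        return part.get_content()
    part = msg.get_body(preferencelist=("html",))
    if part is None:
        return ""  # no textual body at all
    parser = _TextExtractor()
    parser.feed(part.get_content())
    parser.close()
    # Collapse the whitespace left behind by removed markup.
    return " ".join("".join(parser.chunks).split())
```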

### `get_metadata_by_message_id` — Look up by Message-ID

Instant lookup from the local index — no IMAP connection needed.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `message_id` | string | required | RFC 2822 Message-ID |
| `accounts` | list[string] | `[]` (all) | Restrict to specific accounts |

### `get_thread` — Reconstruct email thread

Returns all indexed emails in the same conversation, sorted by date
ascending. Thread reconstruction uses In-Reply-To and References headers.

| Parameter | Type | Description |
|-----------|------|-------------|
| `account` | string | Account name |
| `message_id` | string | Message-ID of any email in the thread |
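
Thread reconstruction from `In-Reply-To` and `References` can be viewed as finding the connected component of a graph whose edges are those header links. The sketch below is one simple way to do that, not necessarily the server's algorithm; the record keys (`message_id`, `in_reply_to`, `references`, `date`) are hypothetical.

```python
from collections import defaultdict


def thread_ids(emails: list[dict], start_id: str) -> list[str]:
    """Return Message-IDs linked to `start_id`, ordered by date ascending."""
    # Build an undirected graph from reply and reference links.
    links: dict[str, set[str]] = defaultdict(set)
    for mail in emails:
        refs = [mail.get("in_reply_to"), *mail.get("references", [])]
        for ref in filter(None, refs):
            links[mail["message_id"]].add(ref)
            links[ref].add(mail["message_id"])

    # Depth-first walk over everything reachable from the starting message.
    seen, stack = {start_id}, [start_id]
    while stack:
        for neighbour in links[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)

    # Only indexed messages can be returned; referenced but unindexed IDs are dropped.
    by_id = {mail["message_id"]: mail for mail in emails}
    found = [by_id[mid] for mid in seen if mid in by_id]
    return [mail["message_id"] for mail in sorted(found, key=lambda m: m["date"])]
```

Because the graph is undirected, starting from any message in the conversation yields the same set.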

### `save_attachments` — Download attachments

Fetches the email live from IMAP and saves attachments to a local directory.
Handles filename collisions by appending a counter.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `account` | string | required | Account name |
| `folder` | string | required | IMAP folder path |
| `uid` | string | required | IMAP UID |
| `dest_dir` | string | `"."` | Local path to save into (auto-created if needed) |
| `names` | list[string] | `[]` (all) | Specific filenames to save; empty means all |
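
Collision handling by appending a counter might look like the following. The `_1`, `_2` suffix format and the `unique_path` helper are assumptions; the tool's actual naming scheme may differ.

```python
from pathlib import Path


def unique_path(dest_dir: Path, filename: str) -> Path:
    """Return a path in dest_dir that does not clash with an existing file."""
    dest_dir.mkdir(parents=True, exist_ok=True)  # auto-create the destination
    # Keep only the final component so a crafted name cannot escape dest_dir.
    candidate = dest_dir / Path(filename).name
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = dest_dir / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

Saving `report.pdf` twice into the same directory would then produce `report.pdf` and `report_1.pdf`.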

### `list_folders` — See what's indexed

Lists all indexed folders with email counts and last indexing timestamp.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `account` | string \| null | null | Specific account, or null for all |

---

## Search Tips

- Use `search_emails` for natural-language queries: "that email about the project deadline".
- Use `find_emails` for precise lookups: sender, date range, subject keywords.
- Use `folder_glob` to narrow scope: `"Archive/2024/*"` searches only 2024 folders.
- `from_addr` and `to_addr` are substring matches: `"bob"` matches `bob@example.com`.
- Call `get_thread` after finding an interesting email to see the full conversation.

---

## S/MIME Support

If your IMAP server stores S/MIME encrypted or signed emails (common with
Exchange), configure a PFX/P12 certificate for transparent decryption:

```toml
[account.work]
smime_pfx      = "~/.mcp-email-index/certs/work.pfx"
smime_password = "pfx-password"
```

The PFX can contain multiple key+cert pairs (e.g. current + historical).
All pairs are tried during decryption. OpenSSL is used as a fallback for
BER-encoded PKCS7 and legacy algorithms (RC2, DES).

Handled formats:
- Standard S/MIME enveloped-data (encrypted)
- Opaque-signed messages (signature unwrapped to extract body)
- Detached signatures (multipart with text + pkcs7 signature)
- Exchange "Microsoft Mail Internet Headers" wrapper
- Exchange TNEF-wrapped S/MIME (application/ms-tnef with PKCS7 body)
- Apple Mail S/MIME (missing Content-Type, only filename=smime.p7m)
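
Several of these formats can be told apart from the MIME headers alone. The sketch below classifies a raw message using the standard `email` package; it covers only the header-detectable cases (not the Exchange or TNEF wrappers), and `smime_kind`, its return labels, and the default for a missing `smime-type` parameter are assumptions.

```python
from email import message_from_bytes, policy

_PKCS7_MIME = ("application/pkcs7-mime", "application/x-pkcs7-mime")
_PKCS7_SIG = ("application/pkcs7-signature", "application/x-pkcs7-signature")


def smime_kind(raw: bytes) -> str:
    """Classify a message as enveloped, opaque-signed, detached-signature or none."""
    msg = message_from_bytes(raw, policy=policy.default)
    ctype = msg.get_content_type()
    if ctype == "multipart/signed" and msg.get_param("protocol") in _PKCS7_SIG:
        return "detached-signature"
    if ctype in _PKCS7_MIME:
        # Assumption: a missing smime-type parameter means enveloped-data.
        smime_type = msg.get_param("smime-type", "enveloped-data")
        return "opaque-signed" if smime_type == "signed-data" else "enveloped"
    # Apple Mail case from the list above: no Content-Type, only the filename.
    if (msg.get_filename() or "").lower() == "smime.p7m":
        return "enveloped"
    return "none"
```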

---

## Account Options

### `default_from` — Backfill empty From headers

Old Exchange Sent folders sometimes store emails with an empty From header.
Set `default_from` to backfill it during indexing and live fetches:

```toml
[account.work]
default_from = "Your Name <you@example.com>"
```

### `exclude_folders` — Skip folders during indexing

Glob patterns for folders to exclude from indexing and scanning:

```toml
[account.work]
exclude_folders = ["[Gmail]/All Mail", "[Gmail]/Spam", "Drafts"]
```

---

## Running on Windows

The `serve` command daemonizes via `os.fork()`, which is Unix-only. On Windows
the server runs in the foreground. Options for background execution:

- `pythonw`: `Start-Process pythonw -ArgumentList "-m", "mcp_email_index.cli", "serve" -WindowStyle Hidden`
- Task Scheduler: trigger "At log on", action `pythonw.exe -m mcp_email_index.cli serve`
- [NSSM](https://nssm.cc/): `nssm install mcp-email-index "python.exe" "-m mcp_email_index.cli serve"`

Use `mcp-email-index-cli stop` to shut down in all cases.

---

## Configuration Reference

`~/.mcp-email-index/config.toml`:

| Section | Key | Default | Description |
|---------|-----|---------|-------------|
| `[account.<name>]` | `imap_host` | — | IMAP server hostname |
| | `imap_port` | `993` | IMAP port |
| | `imap_ssl` | `true` | Use SSL |
| | `imap_verify_ssl` | `true` | Verify SSL certs (set `false` for self-signed) |
| | `username` | — | IMAP username |
| | `password` | — | IMAP password (or env `MCP_EMAIL_INDEX_<NAME>_PASSWORD`) |
| | `max_connections` | `10` | Max concurrent IMAP connections |
| | `default_from` | — | Backfill empty From headers (e.g. `"Name <email>"`) |
| | `exclude_folders` | — | List of glob patterns for folders to skip |
| | `smime_pfx` | — | Path to PFX/P12 certificate for S/MIME decryption |
| | `smime_password` | — | PFX password (or env `MCP_EMAIL_INDEX_<NAME>_SMIME_PASSWORD`) |
| `[index]` | `data_dir` | `~/.mcp-email-index/data` | Where index data is stored (Qdrant + SQLite) |
| | `embedding_model` | `nomic-ai/nomic-embed-text-v1.5` | Sentence-transformers model (changing requires `--full` reindex) |
| | `batch_size` | `50` | Emails per IMAP fetch batch |
| | `max_find_results` | `500` | Hard cap on `find_emails` results |
| | `preview_chars` | `500` | Body preview length stored in index |
| | `embed_chars` | `8000` | Body text fed to embedding model (must be ≥ `preview_chars`) |
| `[server]` | `host` | `127.0.0.1` | Bind address |
| | `port` | `6644` | HTTP port for MCP (SSE) and CLI |
| | `log_level` | `INFO` | `DEBUG` \| `INFO` \| `WARNING` \| `ERROR` |
| | `semaphore_timeout_s` | `30` | Timeout (seconds) waiting for IMAP connection slot |
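
The `[index]` constraints in the table can be checked mechanically. The sketch below merges user values over the documented defaults and enforces `embed_chars >= preview_chars`; the `validate_index_settings` helper and its error messages are hypothetical, and the server's own validation may be stricter or looser.

```python
# Defaults for the numeric [index] keys, copied from the table above.
INDEX_DEFAULTS = {
    "batch_size": 50,
    "max_find_results": 500,
    "preview_chars": 500,
    "embed_chars": 8000,
}


def validate_index_settings(index: dict) -> dict:
    """Merge user values over the defaults and check the documented constraint."""
    merged = {**INDEX_DEFAULTS, **index}
    for key in INDEX_DEFAULTS:
        value = merged[key]
        # Assumption: all of these settings must be positive integers.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            raise ValueError(f"[index] {key} must be a positive integer, got {value!r}")
    if merged["embed_chars"] < merged["preview_chars"]:
        raise ValueError("[index] embed_chars must be >= preview_chars")
    return merged
```

The input dict would typically be the `index` table of the parsed `config.toml`.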

Server files:
- PID file: `~/.mcp-email-index/server.pid`
- Log file: `~/.mcp-email-index/server.log`
- Config: `~/.mcp-email-index/config.toml` (auto-created with defaults on first run, chmod 600)

Environment variables:
- `MCP_EMAIL_INDEX_<ACCOUNT>_PASSWORD` — account IMAP password (overrides config)
- `MCP_EMAIL_INDEX_<ACCOUNT>_SMIME_PASSWORD` — S/MIME PFX password (overrides config)
- `MCP_EMAIL_INDEX_GRACE` — auto-shutdown delay in seconds after last client disconnects (default: `60`)
