Metadata-Version: 2.4
Name: mcp-local-rag
Version: 0.3.4
Summary: Local MCP server for RAG over PDFs, DOCX, images, and plaintext files.
Project-URL: Homepage, https://github.com/Milliman-CMHH/mcp-local-rag
Project-URL: Repository, https://github.com/Milliman-CMHH/mcp-local-rag
Project-URL: Issues, https://github.com/Milliman-CMHH/mcp-local-rag/issues
Author-email: Jozef833 <172046463+Jozef833@users.noreply.github.com>
License: AGPL-3.0-or-later
License-File: LICENSE
Keywords: docx,local,mcp,pdf,rag
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: <3.14,>=3.11
Requires-Dist: aiofiles<26,>=25.1.0
Requires-Dist: google-genai[aiohttp]<2,>=1.67.0
Requires-Dist: markitdown[docx]<1,>=0.1.4
Requires-Dist: mcp[cli]<2,>=1.26.0
Requires-Dist: numpy<3,>=2.4.2
Requires-Dist: pydantic<3,>=2.12.5
Requires-Dist: pymupdf-layout<2,>=1.27.1
Requires-Dist: pymupdf4llm<1,>=0.2.9
Requires-Dist: qdrant-client<2,>=1.16.2
Requires-Dist: semantic-text-splitter<1,>=0.29.0
Requires-Dist: sentence-transformers<6,>=5.2.2
Requires-Dist: torch<3,>=2.10.0
Provides-Extra: azure
Requires-Dist: azure-ai-documentintelligence<2,>=1.0.2; extra == 'azure'
Requires-Dist: azure-identity<2,>=1.25.2; extra == 'azure'
Requires-Dist: azure-monitor-opentelemetry<2,>=1.8.6; extra == 'azure'
Description-Content-Type: text/markdown

# mcp-local-rag

Local MCP server for RAG over PDFs, DOCX, images, and plaintext files.

## Requirements

For more complex PDFs and for image files, provide the following environment variables:

- `AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT`: requires the `mcp-local-rag[azure]` extra.
- `AZURE_DOCUMENT_INTELLIGENCE_KEY`: requires the `mcp-local-rag[azure]` extra. When omitted, `DefaultAzureCredential` is used.
- `GEMINI_API_KEY`
- `MCP_LOCAL_RAG_GEMINI_MODEL` (default: `gemini-3-pro-preview`)

Image files require either Gemini or Azure Document Intelligence (Azure DI) for extraction. See [docs/configuration.md](docs/configuration.md) for details.
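
As a rough illustration of how these variables relate to each other, the sketch below reports which optional extraction backends look configured in a given environment. The helper name `detect_extraction_backends` is hypothetical and not part of mcp-local-rag, and the server's actual selection logic may differ.

```python
import os


def detect_extraction_backends(env=None):
    """Return the optional extraction backends that look configured.

    Hypothetical helper for illustration only; not part of mcp-local-rag.
    """
    env = os.environ if env is None else env
    backends = {}

    # Azure DI needs an endpoint. The key is optional because
    # DefaultAzureCredential is used when it is omitted.
    if env.get("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"):
        has_key = bool(env.get("AZURE_DOCUMENT_INTELLIGENCE_KEY"))
        backends["azure"] = "api-key" if has_key else "DefaultAzureCredential"

    # Gemini needs an API key. The model falls back to the documented default.
    if env.get("GEMINI_API_KEY"):
        backends["gemini"] = env.get(
            "MCP_LOCAL_RAG_GEMINI_MODEL", "gemini-3-pro-preview"
        )

    return backends


if __name__ == "__main__":
    found = detect_extraction_backends()
    if not found:
        print("No Gemini or Azure DI configuration found.")
    for name, detail in found.items():
        print(f"{name}: {detail}")
```

Running the script before starting the server is a quick way to confirm that the variables are visible to the process that will launch it.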

## Data Storage

By default, the server stores data in:

- **Windows**: `%LOCALAPPDATA%\mcp-local-rag\`
- **macOS**: `~/Library/Application Support/mcp-local-rag/`
- **Linux**: `$XDG_DATA_HOME/mcp-local-rag/`

The data directory contains:

- `markdown/` - Extracted Markdown content of indexed documents
- `metadata.db` - SQLite database for document/collection metadata
- `qdrant/` - Vector database for embeddings

AI models are cached in the default Hugging Face cache directory (`~/.cache/huggingface/`).

To customize the data directory, set the `MCP_LOCAL_RAG_DATA_DIR` environment variable (an `mcp-local-rag/` subfolder is created automatically inside it).
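
The lookup described above can be sketched as follows. This is a minimal approximation, not the server's actual code: `resolve_data_dir` is a hypothetical name, and the fallbacks used when `LOCALAPPDATA` or `XDG_DATA_HOME` are unset (`~/AppData/Local` and `~/.local/share`) are assumptions based on common platform conventions.

```python
import os
import sys
from pathlib import Path

APP_DIR = "mcp-local-rag"


def resolve_data_dir(env=None, platform=None, home=None):
    """Approximate the documented data directory lookup (hypothetical helper)."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else Path(home)

    # An explicit override wins; the app subfolder is still appended.
    override = env.get("MCP_LOCAL_RAG_DATA_DIR")
    if override:
        return Path(override) / APP_DIR

    if platform == "win32":
        # Assumed fallback when LOCALAPPDATA is not set.
        base = Path(env.get("LOCALAPPDATA") or home / "AppData" / "Local")
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        # Assumed fallback when XDG_DATA_HOME is not set.
        base = Path(env.get("XDG_DATA_HOME") or home / ".local" / "share")
    return base / APP_DIR


if __name__ == "__main__":
    data_dir = resolve_data_dir()
    for name in ("markdown", "metadata.db", "qdrant"):
        print(data_dir / name)
```

The printed paths correspond to the three entries listed above, which can help when locating or backing up an existing index.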

## Usage

### VS Code

Add to `.vscode/mcp.json`:

```json
{
  "servers": {
    "mcp-local-rag": {
      "command": "uvx",
      "args": [
        "--python",
        "3.13",  // Does not support Python 3.14 yet: https://github.com/microsoft/markitdown/issues/1470
        "mcp-local-rag@latest"
      ]
    }
  }
}
```

If you run into SSL errors (for example, behind a Zscaler proxy), try:

```json
{
  "servers": {
    "mcp-local-rag": {
      "command": "uvx",
      "args": [
        "--native-tls",
        "--python",
        "3.13",  // Does not support Python 3.14 yet: https://github.com/microsoft/markitdown/issues/1470
        "--with",
        "pip-system-certs",
        "mcp-local-rag@latest"
      ]
    }
  }
}
```
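
The two configurations above differ only in the arguments passed to `uvx`. As a sketch, assuming you want to generate the file rather than edit it by hand, the hypothetical helper `build_mcp_config` below produces either variant. It is not part of mcp-local-rag, and the emitted JSON omits the explanatory comments.

```python
import json


def build_mcp_config(python_version="3.13", native_tls=False):
    """Build the VS Code MCP server entry shown above (hypothetical helper)."""
    args = []
    if native_tls:
        # Use the system certificate store, as in the SSL workaround.
        args.append("--native-tls")
    # Pin the interpreter; the README notes Python 3.14 is not supported yet.
    args += ["--python", python_version]
    if native_tls:
        args += ["--with", "pip-system-certs"]
    args.append("mcp-local-rag@latest")
    return {"servers": {"mcp-local-rag": {"command": "uvx", "args": args}}}


if __name__ == "__main__":
    # Print the SSL workaround variant; redirect to .vscode/mcp.json to use it.
    print(json.dumps(build_mcp_config(native_tls=True), indent=2))
```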
