Metadata-Version: 2.4
Name: kernel-retrieval-mcp
Version: 2.0.1
Author-email: jalal <jalalkhaldi3@gmail.com>
Requires-Python: <3.13,>=3.11
Requires-Dist: fastmcp<3.0.0,>=2.14.2
Requires-Dist: pydantic-settings<3.0.0,>=2.12.0
Requires-Dist: retrievalbase[torch]<3.0.0,>=2.1.1
Description-Content-Type: text/markdown

# kernel-retrieval-mcp

`kernel-retrieval-mcp` is a thin library for exposing a `retrievalbase` retriever through an MCP server built with `FastMCP`.

It gives you a small integration layer:

- load a retriever from typed settings
- register a `retrieve(query: str)` MCP tool
- normalize retrieved points into a stable evidence payload
- run the server over `stdio`, `http`, `sse`, or `streamable-http`

This package is intentionally narrow. It does not implement embedding, vector storage, or ranking itself; those concerns are delegated to `retrievalbase`.

## What It Does

At runtime, `MCPRunner`:

1. creates a `FastMCP` app named `mcp`
2. builds a `DenseRetriever` from the configured `retriever.engine`
3. builds an evidence formatter from `evidence.module_path`
4. exposes a `retrieve` MCP tool
5. returns results in this shape:

```json
{
  "retrieval": {
    "evidence": [
      {
        "score": 0.91,
        "page_content": "first chunk",
        "metadata": {
          "source": "doc-1",
          "page": 1
        }
      }
    ]
  }
}
```
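
The normalization step can be thought of as a plain function. The sketch below is not the package's `EvidenceBuilder`. Like the custom builder later in this README, it assumes that each retrieved point exposes `score` and a `document` with `page_content` and `metadata`. The `Document` and `ScoredPoint` dataclasses are hypothetical stand-ins for the `retrievalbase` types.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-ins for the retrievalbase point types.
@dataclass
class Document:
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class ScoredPoint:
    score: float
    document: Document


def build_evidence(points: list[ScoredPoint]) -> dict[str, Any]:
    """Wrap retrieved points in the {"retrieval": {"evidence": [...]}} envelope."""
    return {
        "retrieval": {
            "evidence": [
                {
                    "score": point.score,
                    "page_content": point.document.page_content,
                    # Shallow copy so callers cannot mutate the document's metadata dict.
                    "metadata": dict(point.document.metadata),
                }
                for point in points
            ]
        }
    }


points = [ScoredPoint(0.91, Document("first chunk", {"source": "doc-1", "page": 1}))]
payload = build_evidence(points)
```

An empty result list still produces the envelope, so clients can always read `retrieval.evidence` without checking for missing keys.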

## Installation

```bash
uv add kernel-retrieval-mcp
```

Or with `pip`:

```bash
pip install kernel-retrieval-mcp
```

Requirements:

- Python `>=3.11,<3.13`
- a compatible `retrievalbase` retriever configuration

## Core API

The package surface is intentionally small:

- `kernel_retrieval_mcp.MCPRunner`
- `kernel_retrieval_mcp.EvidenceBuilder`
- `kernel_retrieval_mcp.utils.get_runner`

Use `MCPRunner` when you want to instantiate the server directly from a settings object. Use `get_runner()` when you want to bootstrap the runner from YAML by class path.

## Quickstart

### 1. Define a runner settings class

You usually create a project-specific runner that pins the concrete retriever and evidence-builder settings you want to support.

```python
from retrievalbase.evaluation.settings import DenseRetrieverSettings

from kernel_retrieval_mcp import MCPRunner
from kernel_retrieval_mcp.settings import EvidenceBuilderSettings, MCPRunnerSettings


class MyEvidenceSettings(EvidenceBuilderSettings):
    pass


class MyRunnerSettings(
    MCPRunnerSettings[
        DenseRetrieverSettings,
        MyEvidenceSettings,
    ]
):
    pass


class MyRunner(MCPRunner[MyRunnerSettings]):
    pass
```

### 2. Instantiate and run it

```python
from my_project.runner import MyRunner


runner = MyRunner.from_settings()
runner.run()
```

If you prefer to load the runner class from YAML:

```python
from kernel_retrieval_mcp.utils import get_runner


runner = get_runner("config/config.yaml")
runner.run()
```

## Configuration

`get_runner()` expects a YAML file whose top-level `module_path` names your runner class. `from_settings()` instead relies on the Pydantic settings resolution provided by `retrievalbase`, which reads `/config/config.yaml` by default.
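
A dotted `module_path` such as `my_project.runner.MyRunner` is commonly resolved to a class with `importlib`. The following is a minimal sketch of that technique. It assumes the YAML has already been parsed into a dict. `resolve_class_path` is a hypothetical helper, not part of this package's API, and `get_runner()` may differ in detail.

```python
import importlib
from typing import Any


def resolve_class_path(path: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {path!r}")
    module = importlib.import_module(module_name)
    obj = getattr(module, attr_name)  # raises AttributeError if the name is missing
    if not isinstance(obj, type):
        raise TypeError(f"{path!r} does not name a class")
    return obj


# Hypothetical parsed YAML; a real config would name your runner class instead.
config: dict[str, Any] = {"module_path": "collections.OrderedDict"}
runner_cls = resolve_class_path(config["module_path"])
```

Because the import happens at resolution time, the module named in `module_path` must be importable from the process that loads the config. In practice, this means your project package has to be installed or available on `sys.path`.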

A minimal project layout looks like this:

```text
my_project/
  runner.py
config/
  config.yaml
```

Example `config/config.yaml`:

```yaml
module_path: my_project.runner.MyRunner

server:
  transport: http
  host: 127.0.0.1
  port: 8080

retriever:
  limit: 8
  reranker_limit: null
  engine:
    module_path: retrievalbase.evaluation.retrievers.dense.DenseRetriever
    reranker: null
    embedder:
      module_path: my_project.embedder.MyEmbedder
      model_name: my-embedding-model
    vector_store:
      module_path: my_project.vector_store.MyVectorStore
    processor:
      module_path: my_project.processor.MyProcessor

evidence:
  module_path: my_project.evidence.MyEvidenceBuilder
```

Notes:

- `server` is owned by this package.
- `retriever.engine` is a `retrievalbase` retriever settings object.
- nested fields under `engine` depend on the retriever class you choose.
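
Because `server` is owned by this package, its values can be checked before the app starts. The sketch below is hypothetical. It uses a stdlib dataclass rather than the package's real Pydantic settings, and the defaults marked below are assumptions. Only the four transport names come from this README.

```python
from dataclasses import dataclass
from typing import Any

# Transports listed in this README.
TRANSPORTS = frozenset({"stdio", "http", "sse", "streamable-http"})


@dataclass(frozen=True)
class ServerSettings:
    transport: str = "stdio"  # assumed default
    host: str = "127.0.0.1"   # assumed default
    port: int = 8080          # assumed default

    def __post_init__(self) -> None:
        # Reject unknown transports and invalid TCP ports at load time.
        if self.transport not in TRANSPORTS:
            raise ValueError(f"unsupported transport: {self.transport!r}")
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")


# Mirrors the `server` block of the example config.
raw: dict[str, Any] = {"transport": "http", "host": "127.0.0.1", "port": 8080}
server = ServerSettings(**raw)
```

With `stdio`, `host` and `port` are not used, because the client talks to the server over standard input and output.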

## Custom Evidence Builders

The default `EvidenceBuilder` returns `score`, `page_content`, and `metadata` for each retrieved point. If your MCP clients need citations, document IDs, or a different payload contract, subclass it and override `build`.

```python
from typing import Any

from kernel_retrieval_mcp import EvidenceBuilder
from kernel_retrieval_mcp.settings import EvidenceBuilderSettings


class CitationEvidenceSettings(EvidenceBuilderSettings):
    pass


class CitationEvidenceBuilder(EvidenceBuilder[CitationEvidenceSettings]):
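    def build(self, points) -> dict[str, Any]:
        # Keep the "retrieval" -> "evidence" envelope, but flatten selected
        # metadata fields into each item. `.get` yields None when a document
        # has no "source" or "page" metadata.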
        return {
            "retrieval": {
                "evidence": [
                    {
                        "score": point.score,
                        "text": point.document.page_content,
                        "source": point.document.metadata.get("source"),
                        "page": point.document.metadata.get("page"),
                    }
                    for point in points
                ]
            }
        }
```

## Development

Install dev dependencies:

```bash
make dev-install
```

Useful commands:

```bash
make format
make lint
make type-check
make test
make test-cov
make ci
```
