Metadata-Version: 2.4
Name: mcp-data-check
Version: 0.3.0
Summary: Evaluate MCP server accuracy against known questions and answers
Project-URL: Homepage, https://github.com/GSA-TTS/mcp-data-check
Project-URL: Repository, https://github.com/GSA-TTS/mcp-data-check
Project-URL: Issues, https://github.com/GSA-TTS/mcp-data-check/issues
Author-email: mark-aronson <marksaronson@gmail.com>
License: MIT
License-File: LICENSE
Keywords: anthropic,claude,evaluation,mcp,testing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Requires-Dist: anthropic>=0.40.0
Requires-Dist: python-dotenv>=1.0.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: twine>=5.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# mcp-data-check

Evaluate MCP server accuracy against known questions and answers.

## Installation

```bash
pip install mcp-data-check
```

Or install from source in editable mode, from the root of a cloned repository:

```bash
pip install -e .
```

## Usage

### Python API

```python
from mcp_data_check import run_evaluation

results = run_evaluation(
    questions_filepath="questions.csv",
    api_key="sk-ant-...",
    server_url="https://mcp.example.com/sse"
)

print(f"Pass rate: {results['summary']['pass_rate']:.1%}")
print(f"Passed: {results['summary']['passed']}/{results['summary']['total']}")
```
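The example above hardcodes the key for brevity. A minimal sketch of reading it from the environment instead, assuming the key is stored in `ANTHROPIC_API_KEY` (the variable the CLI falls back to); `load_api_key` is a hypothetical helper, not part of this package:

```python
import os


def load_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not provided by mcp-data-check.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running the evaluation")
    return key
```

The result can then be passed as `api_key=load_api_key()` in the `run_evaluation` call.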

### Command Line

```bash
mcp-data-check https://mcp.example.com/sse -q questions.csv -k YOUR_API_KEY
```

The MCP server URL is passed as a positional argument. Options:
- `-q, --questions`: Path to the questions CSV file (required)
- `-k, --api-key`: Anthropic API key (defaults to the `ANTHROPIC_API_KEY` environment variable)
- `-o, --output`: Output directory for results (default: `./results`)
- `-m, --model`: Claude model to use (default: `claude-sonnet-4-20250514`)
- `-n, --server-name`: Name for the MCP server (default: `mcp-server`)
- `-v, --verbose`: Print detailed progress

## Questions CSV Format

The questions CSV file must have three columns:

| Column | Description |
|--------|-------------|
| `question` | The question to ask the MCP server |
| `expected_answer` | The expected answer to compare against |
| `eval_type` | Evaluation method: `numeric`, `string`, or `llm_judge` |

Example:

```csv
question,expected_answer,eval_type
How many grants were awarded in 2023?,1234,numeric
What organization received the most funding?,NIH,string
Explain the grant distribution,Most grants went to research institutions...,llm_judge
```
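Since a malformed file only surfaces once the evaluation runs, it can help to check the CSV beforehand. The following is a rough sketch using only the standard library; `validate_questions` is a hypothetical helper (not part of this package), and the checks are assumptions based on the format described above:

```python
import csv
import io

REQUIRED_COLUMNS = {"question", "expected_answer", "eval_type"}
VALID_EVAL_TYPES = {"numeric", "string", "llm_judge"}


def validate_questions(csv_text: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks valid."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {', '.join(sorted(missing))}"]

    problems = []
    # Rows are numbered from 1, not counting the header.
    for row_no, row in enumerate(reader, start=1):
        # Short rows yield None for absent fields, so normalize to "".
        eval_type = (row["eval_type"] or "").strip()
        if eval_type not in VALID_EVAL_TYPES:
            problems.append(f"row {row_no}: unknown eval_type {eval_type!r}")
        if not (row["question"] or "").strip():
            problems.append(f"row {row_no}: empty question")
    return problems
```

To check a file on disk, read it first, for example `validate_questions(open("questions.csv", newline="").read())`.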

### Evaluation Types

- **numeric**: Extracts numbers from the response and compares them against the expected value with a 5% tolerance
- **string**: Checks whether the expected string appears in the response (case-insensitive)
- **llm_judge**: Uses Claude to judge semantically whether the response is correct
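To make the first two checks concrete, here is an illustrative sketch. It is not the package's actual implementation: the function names, the number-extraction regex, and the choice of a relative tolerance are all assumptions.

```python
import re


def numeric_match(response: str, expected: float, tolerance: float = 0.05) -> bool:
    """True if any number in the response is within `tolerance` of `expected`.

    Assumes the tolerance is relative to the expected value.
    """
    # Drop thousands separators so "1,234" parses as 1234.
    cleaned = re.sub(r"(?<=\d),(?=\d{3})", "", response)
    numbers = [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", cleaned)]
    # Fall back to an absolute bound when the expected value is 0.
    allowed = abs(expected) * tolerance if expected else tolerance
    return any(abs(n - expected) <= allowed for n in numbers)


def string_match(response: str, expected: str) -> bool:
    """Case-insensitive substring check."""
    return expected.casefold() in response.casefold()
```

Under these assumptions, a response such as "There were 1,234 grants in 2023." passes against an expected value of 1234, and "About 1,000 grants" fails because it is off by more than 5%.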

## Return Value

The `run_evaluation` function returns a dictionary with three top-level keys, `summary`, `results`, and `metadata`:

```python
{
    "summary": {
        "total": 10,
        "passed": 8,
        "failed": 2,
        "pass_rate": 0.8,
        "by_eval_type": {
            "numeric": {"total": 5, "passed": 4},
            "string": {"total": 3, "passed": 3},
            "llm_judge": {"total": 2, "passed": 1}
        }
    },
    "results": [
        {
            "question": "...",
            "expected_answer": "...",
            "eval_type": "numeric",
            "model_response": "...",
            "passed": True,
            "details": {...},
            "error": None
        },
        ...
    ],
    "metadata": {
        "server_url": "https://mcp.example.com/sse",
        "model": "claude-sonnet-4-20250514",
        "timestamp": "20250127_143022"
    }
}
```
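Given that structure, a short sketch of how the dictionary might be inspected after a run. Both helpers are hypothetical and rely only on the keys shown above:

```python
def failed_questions(results: dict) -> list[str]:
    """Return the question text of every entry that did not pass."""
    return [r["question"] for r in results["results"] if not r["passed"]]


def pass_rate_by_type(results: dict) -> dict[str, float]:
    """Compute the pass rate per evaluation type from the summary counts."""
    rates = {}
    for eval_type, counts in results["summary"]["by_eval_type"].items():
        total = counts["total"]
        # Guard against eval types with no questions.
        rates[eval_type] = counts["passed"] / total if total else 0.0
    return rates
```

With the sample summary above, `pass_rate_by_type` would report 0.8 for `numeric`, 1.0 for `string`, and 0.5 for `llm_judge`.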

## Requirements

- Python 3.10+
- Anthropic API key with MCP beta access
