Metadata-Version: 2.4
Name: inference-predictor-mcp
Version: 0.1.0
Summary: MCP server for LLM inference performance prediction — Traction Layer AI
Project-URL: Homepage, https://predictor.tractionlayer.ai
Project-URL: Documentation, https://github.com/Traction-Layer-AI/KPG_Predictor/tree/main/mcp
Project-URL: Repository, https://github.com/Traction-Layer-AI/KPG_Predictor
Author-email: Traction Layer AI <support@tractionlayer.ai>
License: Proprietary
Keywords: anthropic,inference,llm,mcp,performance,prediction
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: httpx>=0.27.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: pydantic>=2.0
Description-Content-Type: text/markdown

# Inference Predictor MCP Server

MCP server for [Inference Predictor](https://predictor.tractionlayer.ai) by Traction Layer AI. It predicts time to first token (TTFT), throughput, and cost of LLM inference for any Hardware x Model x Runtime configuration, directly from Claude.

## Installation

```bash
pip install inference-predictor-mcp
```

## Claude Desktop Configuration

Add the server to your Claude Desktop config file (on macOS, `~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "inference-predictor": {
      "command": "python",
      "args": ["-m", "inference_predictor_mcp"]
    }
  }
}
```

To enable the Pro tier tools (`compare_configs`, `find_optimal_hardware`, `batch_size_sweep`, `visualize_kpg`, `list_hardware_skus`), pass your API key through the `env` block:

```json
{
  "mcpServers": {
    "inference-predictor": {
      "command": "python",
      "args": ["-m", "inference_predictor_mcp"],
      "env": {
        "KPG_API_KEY": "your-api-key-here"
      }
    }
  }
}
```
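If you would rather script the edit than paste JSON by hand, the following is a minimal sketch that merges the entry shown above into an existing config without touching other servers. The helper names `with_inference_predictor` and `update_config_file` are hypothetical and not part of this package; the path is the macOS location quoted above.

```python
import json
from pathlib import Path
from typing import Optional

# macOS location quoted in this README; other platforms store the file elsewhere.
CONFIG_PATH = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"


def with_inference_predictor(config: dict, api_key: Optional[str] = None) -> dict:
    """Return a copy of `config` with the inference-predictor server entry added.

    Hypothetical helper: it only reproduces the JSON snippets from this README.
    """
    entry = {"command": "python", "args": ["-m", "inference_predictor_mcp"]}
    if api_key:
        # Only the Pro tier needs the key; the free tier omits the env block.
        entry["env"] = {"KPG_API_KEY": api_key}
    servers = dict(config.get("mcpServers", {}))  # keep any other servers intact
    servers["inference-predictor"] = entry
    return {**config, "mcpServers": servers}


def update_config_file(path: Path = CONFIG_PATH, api_key: Optional[str] = None) -> None:
    """Read the config (or start from an empty one), merge the entry, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_inference_predictor(config, api_key), indent=2))
```

Calling `update_config_file()` writes the free-tier entry; `update_config_file(api_key="your-api-key-here")` writes the Pro variant. Restart Claude Desktop afterwards so it rereads the file.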

## Available Tools

### Free Tier (no API key required)

| Tool | Description |
|------|-------------|
| `predict_performance` | Predict TTFT, throughput, and cost for a single hardware config |
| `check_hardware_compatibility` | Check which GPUs can fit a model's weights |
| `explain_model` | Generate an educational explainer of a model's architecture |
| `list_models` | List all 18 registered models with parameters |
| `health_check` | Check API health and version |

### Pro Tier (API key required)

| Tool | Description |
|------|-------------|
| `compare_configs` | Compare vLLM, SGLang, and TensorRT-LLM side by side |
| `find_optimal_hardware` | Search for the cheapest or fastest hardware config |
| `batch_size_sweep` | Sweep batch sizes to find the one with the best throughput |
| `visualize_kpg` | Generate an interactive Kernel Pipeline Graph (KPG) |
| `list_hardware_skus` | List AWS GPU instance SKUs with pricing |
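
For client code that needs to know which tools to expect, the two tables above can be captured in a small lookup. This is only an illustrative sketch: `usable_tools` is a hypothetical helper, and whether the server hides Pro tools or rejects calls to them when no key is set is not stated here.

```python
# Tool names copied from the tables above.
FREE_TOOLS = (
    "predict_performance",
    "check_hardware_compatibility",
    "explain_model",
    "list_models",
    "health_check",
)
PRO_TOOLS = (
    "compare_configs",
    "find_optimal_hardware",
    "batch_size_sweep",
    "visualize_kpg",
    "list_hardware_skus",
)


def usable_tools(api_key=None):
    """Return the tool names a caller can expect to use (hypothetical helper)."""
    # Free tools are always usable; Pro tools additionally need an API key.
    return FREE_TOOLS + PRO_TOOLS if api_key else FREE_TOOLS
```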

## Get a Pro API Key

Visit [predictor.tractionlayer.ai](https://predictor.tractionlayer.ai) to obtain a Pro API key.

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `KPG_API_BASE_URL` | Production API Gateway | Override for self-hosted deployments |
| `KPG_API_KEY` | (none) | Pro tier API key |
| `KPG_TIMEOUT` | 30 | HTTP timeout in seconds |
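
As an illustration of how these variables and their defaults fit together, here is a minimal sketch of a settings loader. The `Settings` class and `load_settings` function are hypothetical and may not match how the package reads its configuration internally. The production base URL is left as `None` because its value is not given in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    base_url: Optional[str]  # None: fall back to the built-in production endpoint
    api_key: Optional[str]   # None: free tier only
    timeout: float           # HTTP timeout in seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    return Settings(
        base_url=env.get("KPG_API_BASE_URL") or None,
        api_key=env.get("KPG_API_KEY") or None,
        # Unset or empty KPG_TIMEOUT falls back to the documented 30 seconds.
        timeout=float(env.get("KPG_TIMEOUT") or "30"),
    )
```

Passing a plain dict, for example `load_settings({"KPG_TIMEOUT": "5"})`, makes the loader easy to exercise without modifying the real environment.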

## Development

If you develop this package and the main KPG_Predictor repo in the same
virtualenv, you may hit a starlette version conflict between the MCP SDK
(this package requires `mcp>=1.0.0`) and FastAPI (which requires
`starlette<0.49.0`). To resolve it, install this package first (from the
repo root), then explicitly pin starlette:

```bash
pip install -e mcp/
pip install "starlette<0.49.0,>=0.40.0"
```

End users who only run `pip install inference-predictor-mcp` do not encounter this conflict.
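
To confirm that a shared virtualenv ended up inside the pinned range, a quick check along the following lines can help. It is a sketch using only the standard library: `parse_release`, `starlette_pin_ok`, and `installed_starlette_ok` are hypothetical helpers, and the version parsing is deliberately simplified (it looks only at the leading numeric `X.Y.Z` part rather than implementing full PEP 440 ordering).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(v: str) -> Tuple[int, int, int]:
    """Parse the leading numeric 'X.Y.Z' part of a version string."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or 'post2'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # treat '0.45' as '0.45.0'
    return (parts[0], parts[1], parts[2])


def starlette_pin_ok(v: str) -> bool:
    """True if `v` satisfies the pin from this section: >=0.40.0,<0.49.0."""
    return (0, 40, 0) <= parse_release(v) < (0, 49, 0)


def installed_starlette_ok() -> Optional[bool]:
    """Check the installed starlette; None if it is not installed at all."""
    try:
        return starlette_pin_ok(version("starlette"))
    except PackageNotFoundError:
        return None
```

Running `installed_starlette_ok()` inside the virtualenv returns `True` when the pin took effect and `False` when a later `pip install` moved starlette out of range.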

## Documentation

Full Inference Predictor documentation: [docs/cli.md](../docs/cli.md)
