Metadata-Version: 2.4
Name: modelscout-sdk
Version: 0.1.1
Summary: Python SDK for ModelScout - LLM Benchmarking and Evaluation
Author: ModelScout Team
License-Expression: LicenseRef-Proprietary
Project-URL: Documentation, https://docs.modelscout.co
Project-URL: Repository, https://github.com/modelscout/modelscout-python
Project-URL: Changelog, https://github.com/modelscout/modelscout-python/blob/main/CHANGELOG.md
Keywords: llm,benchmarking,evaluation,ai,machine-learning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx>=0.25.0
Requires-Dist: typing_extensions>=4.0.0
Requires-Dist: cryptography>=41.0.0
Requires-Dist: msgpack>=1.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-httpx>=0.21.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.18.0; extra == "anthropic"
Provides-Extra: google
Requires-Dist: google-generativeai>=0.3.0; extra == "google"
Provides-Extra: providers
Requires-Dist: openai>=1.0.0; extra == "providers"
Requires-Dist: anthropic>=0.18.0; extra == "providers"
Requires-Dist: google-generativeai>=0.3.0; extra == "providers"
Provides-Extra: all
Requires-Dist: modelscout[dev,providers]; extra == "all"
Dynamic: license-file

# ModelScout Python SDK

**Find the best LLM for your product.** Run benchmarks across multiple models on your own data to see which performs best for quality, cost, and latency.

## Installation

```bash
pip install modelscout-sdk
```
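The package metadata also declares optional extras that pull in the provider client libraries, so you don't have to install them separately:

```bash
# Install with all provider clients (openai, anthropic, google-generativeai)
pip install "modelscout-sdk[providers]"

# Or pick a single provider extra, e.g. Anthropic only
pip install "modelscout-sdk[anthropic]"
```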

## Quick Start

```python
from modelscout import Benchmark, ModelConfig

# Set MODELSCOUT_API_KEY in your environment, or pass api_key="ms_..."
results = Benchmark().run(
    purchased_benchmark_id="trial",  # free trial, or "pb_..." from dashboard purchase
    prompts=["Write a SQL query to find active users", "Explain quantum computing"],
    models=[
        ModelConfig(provider="openai", model="gpt-5-mini"),
        ModelConfig(provider="anthropic", model="claude-haiku-4-5-20251001"),
        ModelConfig(provider="deepseek", model="deepseek-v3.2"),
    ],
)

print(results.best_model_for("quality"))  # Best quality model
print(results.best_model_for("cost"))     # Cheapest model
```

## Features

### Benchmarking
Compare LLMs side-by-side on your evaluation data. Get quality scores, cost analysis, latency metrics, and statistical significance.

### Data Generation

Need synthetic test data? Generate evaluation datasets from the [dashboard](https://modelscout.co/dashboard/datasets) — describe your use case and get representative prompts in minutes.

### Dataset Upload
Upload your own evaluation data:

```python
from modelscout import Benchmark

benchmark = Benchmark()

dataset_id = benchmark.upload_dataset(
    name="My Test Data",
    samples=[
        {"input": "What is machine learning?"},
        {"input": "Explain neural networks"},
    ],
)
```
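If your evaluation data lives on disk, you can assemble the `samples` list from a JSON Lines file before uploading. This loader is a plain-Python sketch, not part of the SDK; it only assumes each record carries an `input` field, matching the sample shape above:

```python
import json
from pathlib import Path


def load_samples(path: str) -> list[dict]:
    """Read a JSONL file and return samples shaped like {"input": "..."}."""
    samples = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if "input" not in record:
            raise ValueError(f"sample missing 'input' key: {record!r}")
        samples.append({"input": record["input"]})
    return samples
```

The result can then be passed straight to `upload_dataset` as `samples=load_samples("eval.jsonl")`.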

### Agentic Evaluation
Test tool-calling capabilities with multi-turn evaluation (SDK-only):

```python
from modelscout import Benchmark, ModelConfig, AgenticConfig, ToolDefinition

def my_search_function(query: str) -> str:
    return f"Results for: {query}"

config = AgenticConfig(tools=[
    ToolDefinition(
        name="search",
        description="Search the web",
        parameters={"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
        implementation=my_search_function,
    )
])

results = Benchmark().run(
    name="Agent Eval",
    purchased_benchmark_id="pb_...",
    prompts=["Find information about quantum computing"],
    models=[
        ModelConfig(provider="openai", model="gpt-5-mini"),
        ModelConfig(provider="anthropic", model="claude-haiku-4-5-20251001"),
    ],
    agentic_config=config,
)
```
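Conceptually, each `ToolDefinition` pairs a JSON Schema with a Python callable: when a model emits a tool call, the arguments get routed to your `implementation`. A minimal standalone sketch of that dispatch step (the `dispatch_tool_call` helper below is illustrative, not SDK API):

```python
from typing import Callable


def dispatch_tool_call(tools: dict[str, Callable], name: str, arguments: dict) -> str:
    """Look up the named tool and invoke it with the model-provided arguments."""
    if name not in tools:
        raise KeyError(f"model requested unknown tool: {name}")
    return tools[name](**arguments)


def my_search_function(query: str) -> str:
    return f"Results for: {query}"


# A tool-call payload shaped the way a model might emit it
result = dispatch_tool_call(
    {"search": my_search_function},
    name="search",
    arguments={"query": "quantum computing"},
)
print(result)  # Results for: quantum computing
```

This is also why `parameters` must be a valid JSON Schema object: the keys under `properties` become the keyword arguments your function receives.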

## Supported Models

24 models across 8 providers:

| Provider | Models |
|----------|--------|
| OpenAI | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-5-mini, gpt-5-nano |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 |
| Google | gemini-3.1-pro, gemini-3-flash, gemini-3.1-flash-lite, gemini-2.5-flash-lite |
| DeepSeek | deepseek-v3.2, deepseek-v3.2-speciale, deepseek-r1 |
| Qwen | qwen3.5-397b-a17b, qwen3.5-flash-02-23, qwen3-235b-a22b |
| Meta | llama-4-maverick, llama-4-scout |
| Mistral | mistral-large-2512, mistral-small-2603 |
| xAI | grok-4, grok-4.1-fast |

## Pricing

**Free trial:** Every new organization gets one free benchmark (10 samples, 2 standard models).

**Pay-as-you-go:** Purchase benchmarks from the [dashboard](https://modelscout.co/dashboard). Price depends on selected models, sample count, and judge tier. Starting from $4.99.

## Documentation

Full documentation: [modelscout.co/docs/sdk](https://modelscout.co/docs/sdk)

---

## License

Proprietary. See [LICENSE](LICENSE) for details.
