Metadata-Version: 2.4
Name: ptm-client
Version: 0.1.0
Summary: Lightweight PTM API client for integration with external Python services
Author: 15Five Engineering
License: Proprietary
Project-URL: Repository, https://github.com/15five/prompt-test-manager
Project-URL: Documentation, https://github.com/15five/prompt-test-manager/tree/main/packages/ptm-client
Project-URL: Changelog, https://github.com/15five/prompt-test-manager/blob/main/docs/ptm-client-packaging-roadmap.md#changelog
Classifier: Development Status :: 4 - Beta
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Testing
Classifier: Intended Audience :: Developers
Classifier: Typing :: Typed
Requires-Python: >=3.12
Description-Content-Type: text/markdown
Requires-Dist: requests<3.0,>=2.32
Provides-Extra: dev
Requires-Dist: pytest<9.0,>=8.2; extra == "dev"
Requires-Dist: responses<1.0,>=0.25; extra == "dev"
Requires-Dist: ruff<1.0,>=0.6; extra == "dev"

# ptm-client

Lightweight Python client for the [Prompt Test Manager (PTM)](https://github.com/15five/prompt-test-manager) API. Zero dependencies beyond `requests`.

## Install

### From PyPI (when published)

```bash
pip install ptm-client
```

### From source (development)

```bash
pip install -e packages/ptm-client
# or with dev/test dependencies:
pip install -e "packages/ptm-client[dev]"
```

### Docker mount (no install needed)

```yaml
# docker-compose.override.yml
services:
  app:
    volumes:
      - /path/to/prompt-test-manager/packages/ptm-client/src:/opt/ptm-client-src:ro
    environment:
      PYTHONPATH: /opt/ptm-client-src:/app
```
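
A quick sanity check from inside the container: with `PYTHONPATH` set as above, the package should resolve to the mounted sources.

```python
# Run inside the `app` container; the import should resolve to the
# mounted sources under /opt/ptm-client-src rather than site-packages.
import ptm_client

print(ptm_client.__file__)
```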

## Quick Start

```python
from ptm_client import PTMClient

client = PTMClient(base_url="http://localhost:8010", token="your-api-token")

# List prompts
prompts = client.list_prompts(tag="my_team")

# Get prompt detail
detail = client.get_prompt("my_team.summarizer")

# Get prompt test cases
tests = client.get_prompt_tests("my_team.summarizer")

# Run a repository evaluation
run = client.run_eval(
    prompt_ids=["my_team.summarizer"],
    provider_ids=["openai_gpt41_mini"],
)

# Run a manual evaluation
run = client.run_manual_eval({
    "prompt_text": "...",
    "tests": [{"description": "test", "vars": {"name": "World"}}],
    "provider_profiles": ["openai_gpt41_mini"],
    "visibility_scope": "org_visible",
})

# Wait for completion
result = client.wait_for_run(run["run_key"], timeout=120)

# Get HTML report
html = client.run_report(run["run_key"])

# Get JSON report
json_report = client.run_report(run["run_key"], format="json")
```

## API Reference

### `PTMClient(base_url, token, timeout=30)`

Create a client. `token` is a PTM personal access token (`ptm_u_...`) or service account token (`ptm_sa_...`). `timeout` is the HTTP request timeout in seconds.
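
A minimal construction sketch. The `PTM_BASE_URL` and `PTM_TOKEN` environment variable names are illustrative conventions, not anything the client reads itself:

```python
import os

from ptm_client import PTMClient

client = PTMClient(
    base_url=os.environ.get("PTM_BASE_URL", "http://localhost:8010"),
    token=os.environ["PTM_TOKEN"],  # ptm_u_... or ptm_sa_... token
    timeout=60,  # raise the 30s default for slower networks
)
```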

### Prompts

- **`list_prompts(tag=None)`** — list all prompts, optionally filtered by tag
- **`get_prompt(prompt_id)`** — get full prompt detail (prompt_text, tags, metadata)
- **`get_prompt_tests(prompt_id)`** — get test cases, DeepEval metrics, KPIs
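
A sketch chaining the three calls. Treating each list entry as a dict with an `id` key is an assumption about the response shape, not a documented contract; check it against your PTM server:

```python
for prompt in client.list_prompts(tag="my_team"):
    prompt_id = prompt["id"]  # ASSUMPTION: entries expose an "id" key
    detail = client.get_prompt(prompt_id)       # prompt_text, tags, metadata
    tests = client.get_prompt_tests(prompt_id)  # test cases, metrics, KPIs
    print(prompt_id, detail["tags"])
```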

### Providers

- **`list_providers()`** — list available LLM provider profiles
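
The ids returned here are presumably what the evaluation methods below expect in `provider_ids` / `provider_profiles`. A minimal sketch; printing the raw response avoids assuming a field layout the docs don't specify:

```python
# List the LLM provider profiles configured on the PTM server.
providers = client.list_providers()
print(providers)  # inspect the shape before relying on specific fields
```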

### Evaluations

- **`run_eval(prompt_ids, provider_ids, **kwargs)`** — submit repository evaluation
- **`run_manual_eval(payload)`** — submit manual evaluation with custom prompt + tests
- **`run_prompt_eval(prompt_id, provider_ids, *, inject_vars=None, extra_tests=None, visibility_scope="org_visible", label=None)`** — fetch a prompt from PTM, merge runtime vars/tests, and submit (recommended for service integrations)

### Runs

- **`get_run(run_key)`** — get run status (includes score, passed_tests, total_tests)
- **`wait_for_run(run_key, timeout=300, poll_interval=5)`** — block until terminal state
- **`run_report(run_key, format="html")`** — get report (html, json, markdown, csv)
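
A status sketch using only the fields the reference above names; it assumes `wait_for_run` resolves to the same run payload as `get_run`:

```python
# run_key comes from a prior run_eval / run_manual_eval submission.
status = client.get_run(run_key)  # non-blocking status check
print(status["score"], status["passed_tests"], status["total_tests"])

# Block until the run reaches a terminal state, polling every 10 seconds.
result = client.wait_for_run(run_key, timeout=300, poll_interval=10)
print(f"{result['passed_tests']}/{result['total_tests']} tests passed")
```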

## Test Cases and Scoring

PTM evaluates with up to three scoring layers. Use any combination.

### Promptfoo assertions — deterministic pass/fail checks

These go in the `assert` array inside each test case:

```python
{
    "description": "test case with assertions",
    "vars": {"transcript": "..."},
    "assert": [
        {"type": "javascript", "value": "/meeting purpose/i.test(output)", "description": "has_purpose"},
        {"type": "icontains", "value": "API migration", "description": "mentions_topic"},
        {"type": "javascript", "value": "output.length >= 100", "description": "min_length"},
    ],
}
```

### DeepEval metrics — semantic quality scoring via judge LLM

These go in `additional_metrics` at the payload root:

```python
{
    "additional_metrics": [
        {"name": "relevance", "criteria": "Output addresses the input topic with specific details.", "threshold": 0.7},
        {"name": "structure", "criteria": "Output has clear sections and logical flow.", "threshold": 0.7},
    ],
    "judge_profile": "openai_gpt41_mini",
}
```

### KPI configs — custom weighted expressions

These go in `additional_kpis` at the payload root:

```python
{
    "additional_kpis": [
        {"name": "cost_ok", "description": "Under $0.05", "expression": "1 if cost < 0.05 else 0", "weight": 1.0},
        {"name": "fast", "description": "Under 10s", "expression": "1 if latency_ms < 10000 else 0", "weight": 1.0},
    ],
}
```

### Common patterns

```python
# Promptfoo only (no judge LLM needed)
client.run_manual_eval({"tests": [{"vars": {...}, "assert": [...]}], ...})

# DeepEval only (semantic scoring, no deterministic checks)
client.run_manual_eval({"tests": [{"vars": {...}}], "additional_metrics": [...], ...})

# All three layers
client.run_manual_eval({"tests": [{"vars": {...}, "assert": [...]}], "additional_metrics": [...], "additional_kpis": [...], ...})

# No scoring (just run prompt, capture output)
client.run_manual_eval({"tests": [{"vars": {...}}], ...})
```
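
For completeness, here is one payload exercising all three layers at once. The prompt, test content, and thresholds are illustrative, but every field name comes from the sections above:

```python
run = client.run_manual_eval({
    "label": "three_layer_demo",
    "prompt_text": '[{"role": "system", "content": "Summarize."}, {"role": "user", "content": "{{transcript}}"}]',
    "tests": [
        {
            "description": "summary covers the main topic",
            "vars": {"transcript": "We agreed to migrate the API by Q3..."},
            # Layer 1: Promptfoo assertions (deterministic pass/fail)
            "assert": [
                {"type": "icontains", "value": "API", "description": "mentions_topic"},
            ],
        },
    ],
    "provider_profiles": ["openai_gpt41_mini"],
    # Layer 2: DeepEval metrics (semantic scoring via judge LLM)
    "additional_metrics": [
        {"name": "relevance", "criteria": "Output addresses the input topic with specific details.", "threshold": 0.7},
    ],
    "judge_profile": "openai_gpt41_mini",
    # Layer 3: KPI configs (custom weighted expressions)
    "additional_kpis": [
        {"name": "fast", "description": "Under 10s", "expression": "1 if latency_ms < 10000 else 0", "weight": 1.0},
    ],
    "visibility_scope": "org_visible",
})
```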

See `docs/ptm-client-integration.md` for the full test case reference with all assertion types, metric fields, and KPI variables.

## Inline Test Cases

### `run_manual_eval` — full control

```python
run = client.run_manual_eval({
    "label": "my_custom_eval",
    "prompt_text": '[{"role": "system", "content": "Summarize."}, {"role": "user", "content": "{{text}}"}]',
    "tests": [
        {"description": "short text", "vars": {"text": "The quick brown fox."}},
    ],
    "provider_profiles": ["openai_gpt41_mini"],
    "visibility_scope": "org_visible",
    "cost_threshold": 1.0,
    "latency_threshold_ms": 30000,
})
```

### `run_prompt_eval` — fetch prompt from PTM + inject live data

Recommended for service integrations:

```python
run = client.run_prompt_eval(
    prompt_id="my_team.summarizer",
    provider_ids=["openai_gpt41_mini"],
    inject_vars={"transcript": real_transcript, "meeting_title": "Weekly 1:1"},
)
result = client.wait_for_run(run["run_key"], timeout=120)
```

With extra test cases:

```python
run = client.run_prompt_eval(
    prompt_id="my_team.summarizer",
    provider_ids=["openai_gpt41_mini"],
    extra_tests=[
        {"description": "edge case", "vars": {"transcript": edge_case_text}},
    ],
    visibility_scope="private_only",
    label="meeting_recap_edge_cases",
)
```

## Error Handling

```python
from ptm_client import PTMClient, PTMError, PTMTimeoutError

try:
    result = client.wait_for_run(run_key, timeout=60)
except PTMTimeoutError:
    print("Run did not complete in time")
except PTMError as e:
    print(f"PTM API error ({e.status_code}): {e}")
```
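
For transient failures, a small retry wrapper is often enough. A sketch under the assumption that server-side hiccups surface as `PTMError` with a 5xx `status_code`; anything else is re-raised:

```python
import time

from ptm_client import PTMError


def submit_with_retry(client, payload, attempts=3, backoff=5):
    """Retry run_manual_eval on 5xx responses; re-raise everything else."""
    for attempt in range(attempts):
        try:
            return client.run_manual_eval(payload)
        except PTMError as e:
            transient = e.status_code is not None and e.status_code >= 500
            if not transient or attempt == attempts - 1:
                raise
            time.sleep(backoff * (attempt + 1))  # linear backoff between tries
```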

## More

- **[Integration guide](../../docs/ptm-client-integration.md)** — install methods, test case types, scoring layers, Django/FastAPI examples, chained evals
- **[Examples](../../docs/examples/)** — runnable Python scripts for every use case

## Dependencies

`requests` only. No FastAPI, SQLAlchemy, Streamlit, or other PTM server deps.

## Development

```bash
pip install -e "packages/ptm-client[dev]"
cd packages/ptm-client
pytest tests/ -v
ruff check src/ tests/
ruff format src/ tests/
```
