Metadata-Version: 2.4
Name: ashr-labs
Version: 0.1.4
Summary: Python SDK for the Ashr Labs API
License-Expression: MIT
Project-URL: Homepage, https://github.com/ashr-labs/testing-platform
Project-URL: Documentation, https://github.com/ashr-labs/testing-platform#readme
Project-URL: Repository, https://github.com/ashr-labs/testing-platform
Keywords: ashr,labs,api,sdk
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Dynamic: license-file

# Ashr Labs Python SDK

A Python client library for evaluating AI agents against Ashr Labs test datasets.

## Documentation

- [Testing Your Agent](docs/testing-your-agent.md) — **start here** (includes debugging failures with transcripts and classification)
- [Quick Start Guide](docs/quickstart.md)
- [Installation](docs/installation.md)
- [Authentication](docs/authentication.md)
- [API Reference](docs/api-reference.md)
- [Error Handling](docs/error-handling.md)
- [Examples](docs/examples.md)

## Installation

```bash
pip install ashr-labs
```

## Quick Start

```python
from ashr_labs import AshrLabsClient, EvalRunner

# Only your API key is needed; base_url and tenant_id are resolved automatically
client = AshrLabsClient(api_key="tp_your_api_key_here")

# Fetch a dataset and run your agent against it
runner = EvalRunner.from_dataset(client, dataset_id=42)
run = runner.run(my_agent)  # my_agent implements respond() and reset(); see below

# Submit results — grading happens server-side
created = run.deploy(client, dataset_id=42)

# Wait for grading to complete (typically 1-3 minutes)
graded = client.poll_run(created["id"])
metrics = graded["result"]["aggregate_metrics"]
print(f"Passed: {metrics['tests_passed']}/{metrics['total_tests']}")
```

Your agent just needs two methods:

```python
class MyAgent:
    def respond(self, message: str) -> dict:
        # Call your LLM, return {"text": "...", "tool_calls": [...]}
        return {"text": "response", "tool_calls": []}

    def reset(self) -> None:
        # Clear conversation history between scenarios
        pass
```
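
A minimal concrete implementation, purely illustrative (the canned reply stands in for a real LLM call):

```python
class EchoAgent:
    """Toy agent that records the conversation and returns a canned reply."""

    def __init__(self) -> None:
        self.history: list[str] = []

    def respond(self, message: str) -> dict:
        self.history.append(message)
        # A real agent would call an LLM here and translate any tool
        # invocations into the tool_calls list
        return {"text": f"You said: {message}", "tool_calls": []}

    def reset(self) -> None:
        # Called between scenarios so no state leaks across tests
        self.history.clear()
```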

See [Testing Your Agent](docs/testing-your-agent.md) for a full end-to-end guide.

## Available Methods

All methods that accept `tenant_id` auto-resolve it from your API key if omitted.

### Datasets

| Method | Description |
|--------|-------------|
| `get_dataset(dataset_id, ...)` | Get a dataset by ID |
| `list_datasets(limit, offset, ...)` | List datasets |
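
A quick sketch of typical usage. Assumptions: `list_datasets` returns an iterable of dicts, and the `tenant_id` value shown is purely illustrative.

```python
# tenant_id is resolved from the API key when omitted
dataset = client.get_dataset(dataset_id=42)

# ...or passed explicitly (illustrative value)
dataset = client.get_dataset(dataset_id=42, tenant_id=7)

# Page through datasets 25 at a time
for ds in client.list_datasets(limit=25, offset=0):
    print(ds)
```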

### Runs

| Method | Description |
|--------|-------------|
| `create_run(dataset_id, result, ...)` | Create a new test run |
| `get_run(run_id)` | Get a run by ID |
| `list_runs(dataset_id, limit, offset)` | List runs |
| `delete_run(run_id)` | Delete a run |
| `poll_run(run_id, timeout, poll_interval)` | Wait for server-side grading to complete |
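
Continuing from the Quick Start, where `run` is the `RunBuilder` returned by `runner.run(...)`: a sketch that assumes `timeout` and `poll_interval` are in seconds and that created runs carry an `"id"` field.

```python
# Submit a prebuilt result dict, then wait up to 5 minutes for grading,
# checking every 10 seconds
created = client.create_run(dataset_id=42, result=run.build())
graded = client.poll_run(created["id"], timeout=300, poll_interval=10)

# List recent runs for the dataset, then clean up the experimental one
recent = client.list_runs(dataset_id=42, limit=10, offset=0)
client.delete_run(created["id"])
```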

### EvalRunner

| Method | Description |
|--------|-------------|
| `EvalRunner.from_dataset(client, dataset_id)` | Create a runner from a dataset |
| `runner.run(agent, max_workers=1, on_environment=...)` | Run the agent against all scenarios; returns a `RunBuilder` |
| `runner.run_and_deploy(agent, client, dataset_id, max_workers=1)` | Run and submit in one call |
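
A sketch of both entry points, reusing `my_agent` from the Quick Start. The `on_environment` callback is assumed to receive each scenario's environment before the scenario runs; treat that as an assumption, not documented behavior.

```python
runner = EvalRunner.from_dataset(client, dataset_id=42)

# Run scenarios with 4 workers, peeking at each environment
run = runner.run(my_agent, max_workers=4, on_environment=lambda env: print(env))

# Or run and submit in a single call
graded_input = runner.run_and_deploy(my_agent, client, dataset_id=42, max_workers=4)
```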

### RunBuilder

| Method | Description |
|--------|-------------|
| `RunBuilder()` | Create a new run builder |
| `run.start()` | Mark the run as started |
| `run.add_test(test_id)` | Add a test and get a `TestBuilder` |
| `run.complete(status)` | Mark the run as completed |
| `run.build()` | Serialize to a result dict |
| `run.deploy(client, dataset_id)` | Build and submit via the API |
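
If you drive a run by hand instead of through `EvalRunner`, the flow looks roughly like this. The `RunBuilder` import path and the `status` string values are assumptions.

```python
from ashr_labs import RunBuilder  # import path assumed

run = RunBuilder()
run.start()

test = run.add_test(test_id=1)  # returns a TestBuilder (next table)
# ... record inputs and outputs on `test` ...
test.complete(status="passed")

run.complete(status="completed")
payload = run.build()                        # plain result dict
created = run.deploy(client, dataset_id=42)  # or build and submit in one step
```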

### TestBuilder

| Method | Description |
|--------|-------------|
| `test.start()` | Mark the test as started |
| `test.add_user_file(file_path, description)` | Record a user file upload |
| `test.add_user_text(text, description)` | Record a user text input |
| `test.add_tool_call(expected, actual, match_status)` | Record an agent tool call |
| `test.add_agent_response(expected_response, actual_response, match_status)` | Record an agent response |
| `test.complete(status)` | Mark the test as completed |
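
Filling in the test from the previous sketch. The shapes of `expected`/`actual` and the `match_status` vocabulary are assumptions for illustration.

```python
test = run.add_test(test_id=1)
test.start()

test.add_user_text("Refund order 1234", description="opening message")
test.add_tool_call(
    expected={"name": "refund_order", "args": {"order_id": 1234}},
    actual={"name": "refund_order", "args": {"order_id": 1234}},
    match_status="match",
)
test.add_agent_response(
    expected_response="Your refund has been issued.",
    actual_response="I've issued your refund.",
    match_status="match",
)
test.complete(status="passed")
```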

### Requests

| Method | Description |
|--------|-------------|
| `create_request(request_name, request, ...)` | Create a new request |
| `get_request(request_id)` | Get a request by ID |
| `list_requests(status, limit, offset)` | List requests |
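
A heavily hedged sketch: the shape of the `request` payload, the `status` filter values, and the `"id"` field on the response are all assumptions (see the API reference for the real schema).

```python
created = client.create_request(
    request_name="add-billing-scenarios",
    request={"description": "Cover refund edge cases"},
)
pending = client.list_requests(status="pending", limit=20, offset=0)
details = client.get_request(created["id"])
```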

### API Keys & Session

| Method | Description |
|--------|-------------|
| `init()` | Validate credentials and get user/tenant info |
| `list_api_keys(include_inactive)` | List API keys for your tenant |
| `revoke_api_key(api_key_id)` | Revoke an API key |
| `health_check()` | Check if the API is reachable |
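
A sketch of a startup check. Assumptions: `health_check()` returns a truthy value when the API is reachable, and `list_api_keys` returns an iterable of dicts.

```python
info = client.init()  # validates the key; returned shape not shown here

if client.health_check():
    for key in client.list_api_keys(include_inactive=True):
        print(key)
    # revoke_api_key(api_key_id=...) permanently disables a key
```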

## Error Handling

```python
from ashr_labs import AshrLabsClient, NotFoundError, AuthenticationError

client = AshrLabsClient(api_key="tp_...")

try:
    dataset = client.get_dataset(dataset_id=999)
except AuthenticationError:
    print("Invalid API key")
except NotFoundError:
    print("Dataset not found")
```

## Configuration

```python
# All defaults; just pass your API key
client = AshrLabsClient(api_key="tp_...")

# From environment (reads ASHR_LABS_API_KEY)
client = AshrLabsClient.from_env()

# Custom timeout
client = AshrLabsClient(api_key="tp_...", timeout=60)

# Custom base URL (for self-hosted)
client = AshrLabsClient(api_key="tp_...", base_url="https://your-api.example.com")
```

## Requirements

- Python 3.10+
- No external dependencies (uses only the standard library)

## License

MIT
