Metadata-Version: 2.4
Name: canary-sdk
Version: 1.0.0
Summary: The QA SDK for AI - Python client for the Canary QA Platform
Project-URL: Homepage, https://canaryqa.ai
Project-URL: Repository, https://github.com/canary-qa/canary-sdk-python
Project-URL: Documentation, https://docs.canaryqa.ai
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.12
Requires-Dist: httpx>=0.28.1
Requires-Dist: presidio-analyzer>=2.2.361
Requires-Dist: presidio-anonymizer>=2.2.361
Requires-Dist: pydantic>=2.10.0
Requires-Dist: spacy>=3.7.0
Description-Content-Type: text/markdown

# canary-sdk

[![Canary](https://img.shields.io/badge/canary-QA-0D7377)](https://canaryqa.ai)
[![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![PyPI](https://img.shields.io/pypi/v/canary-sdk)](https://pypi.org/project/canary-sdk/)

**The QA Platform for AI.** Monitor your AI application's output quality in real time.
Know before your users do.

## Quick Start

```bash
pip install canary-sdk
```

```python
import canary_sdk

canary_sdk.init(api_key="cny_live_sk_...")

# Record any AI call with a single capture() call
result = canary_sdk.capture(
    ai_input="What is the ICD-10 code for Type 2 Diabetes?",
    ai_output="E11.9",
    model="gpt-4o",
    feature="diagnosis_check",
)
```

Get your API key at [canaryqa.ai](https://canaryqa.ai).

## How It Works

```mermaid
graph LR
    A[Your AI App] -->|canary_sdk.capture| B[Canary SDK]
    B -->|async batch<br/>fire-and-forget| C[Canary API]
    C -->|Inngest event| D[Evaluators<br/>accuracy · hallucination · performance]
    D --> E[Quality Scores]
    E --> F[Dashboard & Alerts]
```

The SDK adds **no network round trip** to your AI calls. `capture()` only buffers the trace;
a background thread flushes the buffer once it holds 25 traces or every 5 seconds. If the SDK
encounters an error, your application continues running uninterrupted.
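
To make the buffering behaviour concrete, here is a minimal sketch of a size-or-time batch buffer. It is not the SDK's implementation: `TraceBuffer` and the `send` callable are hypothetical names used only for illustration, and the real flush logic may differ.

```python
import queue
import threading
import time


class TraceBuffer:
    """Illustrative only: flush when batch_size traces are queued or
    flush_interval seconds have passed. Not the SDK's actual code."""

    def __init__(self, send, batch_size=25, flush_interval=5.0):
        self._send = send  # hypothetical callable that delivers one batch
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._queue = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def add(self, trace):
        """Runs on the caller's thread: enqueue only, no I/O."""
        self._queue.put(trace)

    def _run(self):
        batch = []
        deadline = time.monotonic() + self._flush_interval
        # Keep going until close() is called and the queue is drained
        while not self._stop.is_set() or not self._queue.empty():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                batch.append(self._queue.get(timeout=min(remaining, 0.1)))
            except queue.Empty:
                pass
            if len(batch) >= self._batch_size or time.monotonic() >= deadline:
                self._flush(batch)
                batch = []
                deadline = time.monotonic() + self._flush_interval
        self._flush(batch)  # deliver whatever is left at shutdown

    def _flush(self, batch):
        if not batch:
            return
        try:
            self._send(batch)
        except Exception:
            pass  # swallow delivery errors so the host application is unaffected

    def close(self):
        self._stop.set()
        self._worker.join()
```

The design point is that `add()` never performs I/O, so a slow or failing API affects only the worker thread.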

## Integration Patterns

### Pattern 1: Direct Capture

Call your model as usual, then record the trace yourself. This works with any LLM provider:

```python
import canary_sdk

canary_sdk.init(api_key="cny_live_sk_...")

# Call your LLM as usual
response = my_llm_call(prompt)

# Then capture the trace
canary_sdk.capture(
    ai_input=prompt,
    ai_output=response,
    model="gpt-4o",
    feature="diagnosis_check",
)
```

### Pattern 2: Decorator

Automatically capture inputs and outputs of a function — zero boilerplate:

```python
from canary_sdk import Canary

canary = Canary(api_key="cny_live_sk_...")

@canary.trace(feature="diagnosis_check")
def check_diagnosis(patient_note: str) -> str:
    return llm.generate(patient_note)

# Traces are captured automatically — just call your function normally
result = check_diagnosis("Patient presents with polyuria and polydipsia...")
```
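
For readers curious how a tracing decorator of this kind can work, the sketch below shows one possible shape. It is an assumption-laden illustration, not the code behind `@canary.trace`: the standalone `trace` function and its injected `capture` callable are hypothetical.

```python
import functools


def trace(capture, feature, model=None):
    """Hypothetical stand-in for a tracing decorator. `capture` is any
    callable that records a trace, for example one that enqueues it."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)  # exceptions from fn propagate unchanged
            try:
                capture(
                    ai_input={"args": args, "kwargs": kwargs},
                    ai_output=result,
                    model=model,
                    feature=feature,
                )
            except Exception:
                pass  # a tracing failure must never break the caller
            return result

        return wrapper

    return decorator


recorded = []


@trace(capture=lambda **t: recorded.append(t), feature="diagnosis_check")
def check_diagnosis(patient_note: str) -> str:
    return "E11.9"  # placeholder for a real LLM call


check_diagnosis("Patient presents with polyuria and polydipsia...")
```

Recording happens after the wrapped function returns, so the caller always gets the function's own result even if recording fails.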

### Pattern 3: LiteLLM Callback

Instrument all LiteLLM calls automatically — one registration captures everything:

```python
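import asyncio

import litellm
import canary_sdk

canary_sdk.init(api_key="cny_live_sk_...")
litellm.callbacks = [canary_sdk.callback()]


async def check_diagnosis(prompt: str):
    # Tag each call with a feature name via metadata
    response = await litellm.acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        metadata={"feature_tag": "diagnosis_check"},
    )
    # The trace is captured automatically; no extra code is needed
    return response


# `await` must run inside a coroutine; asyncio.run drives it from sync code
asyncio.run(check_diagnosis("What is the ICD-10 code for Type 2 Diabetes?"))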
```
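
LiteLLM's custom-callback documentation shows request metadata arriving in the callback's `kwargs` under `kwargs["litellm_params"]["metadata"]`. Assuming that layout, a callback could read the feature tag roughly as follows. `extract_feature_tag` is a hypothetical helper for illustration; this README does not describe how `canary_sdk.callback()` reads the tag internally.

```python
def extract_feature_tag(callback_kwargs: dict, default=None):
    """Return metadata["feature_tag"] from LiteLLM callback kwargs, or `default`.

    Assumes the layout kwargs["litellm_params"]["metadata"]; either level
    may be missing or None when the caller passed no metadata.
    """
    litellm_params = callback_kwargs.get("litellm_params") or {}
    metadata = litellm_params.get("metadata") or {}
    return metadata.get("feature_tag", default)


tagged = {"litellm_params": {"metadata": {"feature_tag": "diagnosis_check"}}}
untagged = {"litellm_params": {"metadata": None}}

print(extract_feature_tag(tagged))
print(extract_feature_tag(untagged))
```

A call without a tag yields `None` here, which corresponds to the untagged case where traces are aggregated at the org level.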

## PHI Redaction

PHI redaction via [Microsoft Presidio](https://microsoft.github.io/presidio/) is enabled by
default. Patient names, dates, SSNs, and other identifiers are redacted before data leaves
your application.

```python
# PHI redaction ON by default — safe for healthcare AI
canary_sdk.init(api_key="cny_live_sk_...")

# Opt out for non-healthcare applications
canary_sdk.init(api_key="cny_live_sk_...", redact_phi=False)
```

Healthcare terms are preserved: drug names, disease names, ICD-10 codes, and clinical
terminology are **not** redacted.
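
The redact-before-send idea can be illustrated with a deliberately simplified stand-in. The sketch below uses two plain regular expressions instead of Presidio, so it is not what the SDK runs and it cannot detect names. The patterns and the `redact` helper are assumptions chosen only to show identifiers being replaced while clinical codes pass through.

```python
import re

# Simplified stand-in patterns (assumptions, not Presidio recognizers)
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each matched identifier with a <LABEL> placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


note = "DOB 04/12/1961, SSN 123-45-6789. Dx: E11.9, started metformin."
print(redact(note))
# prints: DOB <DATE>, SSN <US_SSN>. Dx: E11.9, started metformin.
```

The ICD-10 code and the drug name are untouched because neither pattern matches them. A production setup relies on Presidio's recognizers rather than hand-written patterns like these.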

## Feature Tags

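Feature tags let you track quality separately for each AI capability in your dashboard.
Pass the tag as `feature=` to `capture()` or `@canary.trace`, or as
`metadata={"feature_tag": ...}` on LiteLLM calls: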

```python
# Good — specific feature tags give you per-feature quality trends
metadata={"feature_tag": "diagnosis_check"}
metadata={"feature_tag": "icd10_lookup"}
metadata={"feature_tag": "risk_adjustment"}
metadata={"feature_tag": "prior_auth_review"}

# Also valid — no tag (traces aggregated under org level)
# (omit metadata kwarg)
```

## Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| `api_key` | required | Your Canary API key (`cny_live_` or `cny_test_`) |
| `redact_phi` | `True` | Enable Microsoft Presidio PHI redaction |
| `batch_size` | `25` | Traces per batch flush |
| `flush_interval` | `5.0` | Seconds between automatic flushes |
| `api_url` | `https://api.canaryqa.ai` | Canary API base URL |
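
One way to keep these settings out of source code is to read them from environment variables. The sketch below is illustrative: the `CANARY_*` variable names and the `canary_settings_from_env` helper are hypothetical, and it assumes `init()` accepts all five parameters as keyword arguments (this README only shows `api_key` and `redact_phi` being passed).

```python
import os

VALID_PREFIXES = ("cny_live_", "cny_test_")


def canary_settings_from_env(env=None) -> dict:
    """Build init() keyword arguments from environment variables.

    Variable names are hypothetical; defaults mirror the table above.
    """
    env = os.environ if env is None else env
    api_key = env.get("CANARY_API_KEY", "")
    if not api_key.startswith(VALID_PREFIXES):
        raise ValueError("CANARY_API_KEY must start with cny_live_ or cny_test_")
    return {
        "api_key": api_key,
        "redact_phi": env.get("CANARY_REDACT_PHI", "true").lower() != "false",
        "batch_size": int(env.get("CANARY_BATCH_SIZE", "25")),
        "flush_interval": float(env.get("CANARY_FLUSH_INTERVAL", "5.0")),
        "api_url": env.get("CANARY_API_URL", "https://api.canaryqa.ai"),
    }


# Assumed usage, if init() accepts these keyword arguments:
# canary_sdk.init(**canary_settings_from_env())
```

Checking the key prefix up front surfaces a missing or mistyped key at startup rather than as silent delivery failures later.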

## Dashboard

View quality scores and failure drill-downs, and configure threshold alerts, at
**[app.canaryqa.ai](https://app.canaryqa.ai)**.

The dashboard shows:
- **Quality score per feature** — composite of accuracy, hallucination, and performance scores
- **Failure drill-down** — 3-panel diff view: input, AI output with errors highlighted, expected output
- **Trend charts** — quality over time, per model and per feature
- **Alert configuration** — set thresholds, Slack/email destinations

## License

MIT — see [LICENSE](LICENSE).
