Metadata-Version: 2.4
Name: bolder-ai
Version: 0.1.1
Summary: Python SDK for BEval Studio by Bolder — log LLM/VLM/agent calls to the BEval dashboard. Import as `beval`.
Project-URL: Homepage, https://bolder.services
Project-URL: Dashboard, https://ai-gateway.bolder.services
Project-URL: Documentation, https://github.com/bolder/beval-python#readme
Project-URL: Source, https://github.com/bolder/beval-python
Project-URL: Bug Tracker, https://github.com/bolder/beval-python/issues
Project-URL: Changelog, https://github.com/bolder/beval-python/releases
Author-email: Bolder <support@bolder.services>
Maintainer-email: Bolder <support@bolder.services>
License: MIT
License-File: LICENSE
Keywords: beval,evals,llm,logging,observability
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Requires-Dist: httpx>=0.25
Provides-Extra: all
Requires-Dist: anthropic>=0.25; extra == 'all'
Requires-Dist: openai>=1.0; extra == 'all'
Requires-Dist: tqdm>=4.60; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.25; extra == 'anthropic'
Provides-Extra: cli
Requires-Dist: tqdm>=4.60; extra == 'cli'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=7; extra == 'dev'
Requires-Dist: respx>=0.20; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: tqdm>=4.60; extra == 'dev'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Description-Content-Type: text/markdown

# bolder-ai

Python SDK for [BEval Studio](https://bolder.services) by Bolder — log LLM/VLM/agent calls to your observability & evaluation dashboard.

Distribution name is **`bolder-ai`**. Import name is **`beval`**.

- Fire-and-forget, non-blocking (background thread queue)
- Works with raw calls, `@trace` decorator, or auto-instrumented OpenAI / Anthropic clients
- Zero hard deps beyond `httpx`

## Install

```bash
pip install bolder-ai
# optional integrations
pip install 'bolder-ai[openai]'
pip install 'bolder-ai[anthropic]'
```

Then `import beval` in your code.

Requires Python 3.9+.

## Configure

Set environment variables (or pass the equivalent options to `beval.init(...)`):

| Variable | Purpose |
| --- | --- |
| `BEVAL_API_KEY` | Your BEval project API key (required) |
| `BEVAL_API_URL` | Gateway base URL (default: `https://ai-gateway.bolder.services`) |
| `BEVAL_PROJECT_ID` | Optional project scoping |
| `BEVAL_DEFAULT_MODEL_ID` | Default `model_id` if not passed per-call |
| `BEVAL_DEBUG` | `1` to enable debug logging |
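
If you want to fail fast on a misconfigured environment, a small pre-flight check can run before `beval.init()`. The sketch below uses only the variable names and default URL from the table; the helper `resolve_beval_env` is hypothetical and not part of the SDK.

```python
import os

# Names and the default URL come from the table above.
# The helper itself is a hypothetical illustration, not a beval API.
REQUIRED = ("BEVAL_API_KEY",)
DEFAULTS = {"BEVAL_API_URL": "https://ai-gateway.bolder.services"}
OPTIONAL = ("BEVAL_PROJECT_ID", "BEVAL_DEFAULT_MODEL_ID")


def resolve_beval_env(environ=None):
    """Return (settings, missing) for the documented BEVAL_* variables."""
    env = os.environ if environ is None else environ
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in REQUIRED if not env.get(name)]
    settings = {name: env.get(name) or default for name, default in DEFAULTS.items()}
    for name in OPTIONAL:
        if env.get(name):
            settings[name] = env[name]
    # The table documents "1" as the value that enables debug logging.
    settings["BEVAL_DEBUG"] = env.get("BEVAL_DEBUG") == "1"
    return settings, missing
```

Calling `resolve_beval_env()` with no arguments reads `os.environ`; passing a plain dict makes the check easy to unit test.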

## Quick start

### 1. Raw log

```python
import beval

beval.init()  # reads env

beval.log(
    kind="llm",
    model_id="gpt-4o-mini",
    input="What is the capital of France?",
    output="Paris.",
    latency_ms=312,
    tokens_in=7,
    tokens_out=2,
)
```

### 2. Auto-wrap OpenAI

```python
import beval
from openai import OpenAI

beval.init()
client = beval.wrap(OpenAI())

client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi"}],
)
```

The wrapped client automatically captures input messages, output, model, token counts, latency, and errors. It detects image parts and logs those calls as `kind="vlm"`.

### 3. Auto-wrap Anthropic

```python
import beval
from anthropic import Anthropic

beval.init()
client = beval.wrap(Anthropic())

client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=64,
    messages=[{"role": "user", "content": "Hi"}],
)
```

### 4. `@beval.trace` decorator

Wrap any function (sync or async) as an `agent` log:

```python
import beval

# Bare decorator form.
@beval.trace
def run_agent(query: str) -> str:
    return ...

# Parameterized form with an explicit name and kind.
@beval.trace(name="tool:search", kind="agent")
async def search(q): ...
```

Captures args, return value, latency, and exceptions (logged with `status="failure"`).
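
To make the captured fields concrete, here is a minimal stand-in for such a decorator written with the standard library only. It is not beval's implementation: `sketch_trace`, the `sink` parameter, the record layout, and the `"success"` status value are all assumptions made for illustration (the README only names `status="failure"`).

```python
import asyncio
import functools
import inspect
import time


def sketch_trace(fn=None, *, name=None, kind="agent", sink=print):
    """Hypothetical tracing decorator; `sink` receives one dict per call."""

    def decorate(func):
        label = name or func.__name__

        def record(args, kwargs, start, output=None, error=None):
            sink({
                "name": label,
                "kind": kind,
                "input": {"args": args, "kwargs": kwargs},
                "output": output,
                "latency_ms": int((time.perf_counter() - start) * 1000),
                # "success" is an assumed value; only "failure" is documented.
                "status": "failure" if error is not None else "success",
                "error": repr(error) if error is not None else None,
            })

        if inspect.iscoroutinefunction(func):
            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = await func(*args, **kwargs)
                except Exception as exc:
                    record(args, kwargs, start, error=exc)
                    raise  # the caller still sees the original exception
                record(args, kwargs, start, output=result)
                return result
            return async_wrapper

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                record(args, kwargs, start, error=exc)
                raise
            record(args, kwargs, start, output=result)
            return result
        return wrapper

    # Support both @sketch_trace and @sketch_trace(name=...).
    return decorate(fn) if callable(fn) else decorate


@sketch_trace(name="tool:search")
async def search(q):
    return [q.upper()]


results = asyncio.run(search("cats"))  # the default sink prints one record
```

The same wrapper shape handles sync and async functions by branching on `inspect.iscoroutinefunction`, and exceptions are recorded before being re-raised so tracing never swallows errors.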

### VLM / images

Pass `image=` and `image_mime=` to attach an image, which is logged as a base64 data URL (matches the dashboard's VLM preview):

```python
from pathlib import Path

import beval

beval.log(
    kind="vlm",
    model_id="gpt-4o",
    input="describe this",
    output="a cat",
    image=Path("cat.png").read_bytes(),  # reads the bytes and closes the file
    image_mime="image/png",
)
```
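
For reference, a base64 data URL has the form `data:<mime>;base64,<payload>`. The README does not show how the SDK builds it, so the helper below (`to_data_url`, a hypothetical name) is only a sketch of the standard encoding, useful if you need to inspect or reproduce what the dashboard previews.

```python
import base64


def to_data_url(data: bytes, mime: str = "image/png") -> str:
    """Encode raw bytes as a base64 data URL (hypothetical helper, not a beval API)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


url = to_data_url(b"hi", "image/png")
```

Base64 inflates the payload by roughly a third, so large images produce proportionally larger log entries.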

### Redaction

Strip PII before logs are sent by passing a `redact` callback to `beval.init`:

```python
import beval

def redact(payload: dict) -> dict:
    # scrub_pii is a placeholder for your own scrubbing function.
    if payload.get("input"):
        payload["input"] = scrub_pii(payload["input"])
    return payload

beval.init(redact=redact)
```
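
The `scrub_pii` function in the snippet above is left to you. As a starting point, here is a minimal sketch that masks email addresses with a regular expression and recurses into lists and dicts, on the assumption that `input` may be either a string or a list of chat messages. The pattern and the `[email]` placeholder are illustrative choices, and a real deployment would likely need broader rules (phone numbers, IDs, names).

```python
import re

# Deliberately simple email pattern; illustrative, not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub_pii(value):
    """Mask email addresses in strings, recursing into lists and dicts."""
    if isinstance(value, str):
        return EMAIL_RE.sub("[email]", value)
    if isinstance(value, list):
        return [scrub_pii(item) for item in value]
    if isinstance(value, dict):
        return {key: scrub_pii(item) for key, item in value.items()}
    # Numbers, None, bytes, and other types pass through unchanged.
    return value
```

Returning new containers instead of mutating in place keeps the caller's original messages untouched.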

## Lifecycle

- `beval.flush(timeout=5.0)` — wait for queued logs to drain
- `beval.shutdown()` — drain + close (runs automatically at interpreter exit)

## Reliability

- Non-blocking: `log()` enqueues and returns immediately
- Network failures never raise — they're logged via the `beval` logger
- Drops logs when the queue overflows (default capacity: 10,000)
- Retries transient errors (408/429/5xx) with exponential backoff
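
The behaviors in this list can be sketched with the standard library. This is not the SDK's code: the class and function names, the choice to drop the newest item on overflow, and the backoff base and cap are all assumptions. Only the capacity of 10,000 and the retryable status codes (408/429/5xx) come from the list above.

```python
import queue


def is_transient(status: int) -> bool:
    """True for the status codes the list above describes as retried."""
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff in seconds; base and cap are assumed values."""
    return min(cap, base * (2 ** attempt))


class DropOnOverflowQueue:
    """Bounded queue whose offer() never blocks the caller."""

    def __init__(self, capacity: int = 10_000):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def offer(self, item) -> bool:
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            # Assumption: the incoming item is the one discarded.
            self.dropped += 1
            return False

    def size(self) -> int:
        return self._queue.qsize()


pending = DropOnOverflowQueue(capacity=2)
accepted = [pending.offer(n) for n in range(3)]
```

Using `put_nowait` is what keeps the producer side non-blocking: a full queue costs the caller a dropped record, never a stall.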

## License

MIT
