Metadata-Version: 2.4
Name: freesolo
Version: 0.1.6
Summary: Tracing and evaluation SDK for LLM applications.
Requires-Python: >=3.10
Requires-Dist: httpx>=0.27.0
Provides-Extra: dev
Requires-Dist: ruff>=0.11.0; extra == 'dev'
Provides-Extra: examples
Requires-Dist: anthropic>=0.40.0; extra == 'examples'
Requires-Dist: google-genai>=1.0.0; extra == 'examples'
Requires-Dist: openai>=1.0.0; extra == 'examples'
Description-Content-Type: text/markdown

# freesolo

`freesolo` is a Python tracing and evaluation package for LLM apps.

For the Node/npm package, see [`npm/`](./npm).

The Python SDK is built for the lowest-friction integration possible:

1. Install the package
2. Set `FREESOLO_API_KEY`
3. Wrap your OpenAI, Anthropic, Gemini, or OpenAI-compatible client
4. Run traces and evaluations from the same SDK

## Current provider support

`freesolo` currently supports automatic client instrumentation for:

- OpenAI
- Anthropic
- Gemini
- OpenAI-compatible clients via `wrap(...)` / `wrap_provider(...)`

## Install

Install the package plus the provider SDK you use:

```bash
pip install freesolo openai
```

or

```bash
pip install freesolo anthropic
```

or

```bash
pip install freesolo google-genai
```

## Environment

- `FREESOLO_API_KEY`
- `FREESOLO_BASE_URL` (optional, defaults to `https://freesolo.co`)

```bash
export FREESOLO_API_KEY=fslo_...
```
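
A startup check can catch a missing key before the first request. The sketch below mirrors the two variables listed above; `resolve_settings` is a hypothetical helper written for illustration, not part of `freesolo`, and failing fast on a missing key is an assumption that suits CI jobs rather than documented SDK behavior.

```python
import os
from collections.abc import Mapping

DEFAULT_BASE_URL = "https://freesolo.co"  # default documented above


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: read the documented variables from a mapping."""
    api_key = env.get("FREESOLO_API_KEY", "").strip()
    if not api_key:
        # Assumption: fail fast instead of silently running without a key.
        raise RuntimeError("FREESOLO_API_KEY is not set")
    # Fall back to the documented default when no override is present.
    base_url = env.get("FREESOLO_BASE_URL", "").strip() or DEFAULT_BASE_URL
    return {"api_key": api_key, "base_url": base_url.rstrip("/")}
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to unit test.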

## Quickstart

```python
from openai import OpenAI
from freesolo import wrap

client = wrap(OpenAI())

result = client.responses.create(
    model="gpt-4.1-mini",
    instructions="Reply in plain text.",
    input=[
        {
            "role": "user",
            "content": [{"type": "input_text", "text": "How do I reset my password?"}],
        }
    ],
)

print(result.output_text or "")
```

## OpenRouter Quickstart

```python
from openai import OpenAI
from freesolo import wrap

client = wrap(
    OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="YOUR_OPENROUTER_API_KEY",
    )
)

response = client.chat.completions.create(
    model="openai/gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "Reply in plain text."},
        {"role": "user", "content": "Write a one-sentence launch blurb."},
    ],
    max_tokens=120,
)

print(response.choices[0].message.content or "")
```

## Gemini Quickstart

```python
from google import genai
from freesolo import instrument_gemini

client = instrument_gemini(genai.Client())

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Write a one-sentence release note for traced Gemini support.",
)

print(response.text or "")
```

## Group Multiple Model Calls

For agentic or long-horizon tasks, strongly prefer wrapping the whole task in `start_trace(...)` so all of the model calls land in one trace.

For a single one-off OpenAI, Anthropic, or Gemini request, you can skip it.

```python
from anthropic import Anthropic
from freesolo import instrument_anthropic, start_trace

client = instrument_anthropic(Anthropic())

with start_trace("support-agent-run"):
    first = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello"}],
    )
    second = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say goodbye"}],
    )
```

## What Gets Stored

- Trace title if you explicitly pass it to `start_trace("...")`
- Trace metadata if you explicitly pass it to `start_trace(..., metadata=...)`
- Input payloads with `system_prompt`, `user_prompt`, and `images`
- Output payloads as plain text
- Token usage when available
- Image inputs with inline previews for the trace UI

## Notes

- You do not need `@trace()` for ordinary LLM tracing.
- A single instrumented OpenAI, Anthropic, or Gemini request creates a trace automatically.
- For OpenAI-compatible providers like OpenRouter, prefer `wrap(...)` instead of provider-specific helpers.
- For agentic or long-horizon workflows, we strongly recommend `start_trace("descriptive-title")` so planning, retries, and follow-up calls stay grouped.
- Delivery is best-effort by default. Trace ingestion failures do not break your app.
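
To make the best-effort note concrete, here is a minimal sketch of the general pattern: attempt delivery, log any failure, and never re-raise into application code. It illustrates the idea only and is not how `freesolo` is implemented; `deliver_best_effort` and `flaky_send` are hypothetical names.

```python
import logging
from collections.abc import Callable
from typing import Any

logger = logging.getLogger("tracing-sketch")


def deliver_best_effort(
    send: Callable[[dict[str, Any]], None], payload: dict[str, Any]
) -> bool:
    """Try to send a payload; log and swallow any failure."""
    try:
        send(payload)
        return True
    except Exception:
        # The caller keeps running even though the trace was lost.
        logger.warning("trace delivery failed; continuing", exc_info=True)
        return False


def flaky_send(payload: dict[str, Any]) -> None:
    # Hypothetical transport that always fails, standing in for a network error.
    raise ConnectionError("ingestion endpoint unreachable")


delivered = deliver_best_effort(flaky_send, {"title": "support-agent-run"})
```

The trade-off is that a dropped trace is only visible in logs, which is usually acceptable for observability data but not for anything the app depends on.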

## Evaluations

`freesolo` also includes a small evaluation SDK for CI jobs, GitHub bots, and
eval scripts. All evaluation runs require `FREESOLO_API_KEY` or an explicit
`api_key`.

Evaluation data is a list of plain dictionaries. There is no separate `Example`
class to construct.
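
Because rows are plain dictionaries, any loader that produces a list of dicts should work. As a sketch, assuming the dataset is stored as JSON Lines (a storage choice made here for illustration, not something the SDK requires), rows could be read like this; `rows_from_jsonl` is a hypothetical helper.

```python
import json
from typing import Any


def rows_from_jsonl(text: str) -> list[dict[str, Any]]:
    """Parse one JSON object per non-empty line into a list of plain dicts."""
    rows: list[dict[str, Any]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        if not isinstance(row, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        rows.append(row)
    return rows


sample = """
{"input": "What is the capital of France?", "actual_output": "Paris", "expected_output": "Paris"}
{"input": "2 + 2?", "actual_output": "4", "expected_output": "4"}
"""
rows = rows_from_jsonl(sample)
```

The resulting `rows` list has the same shape as the inline data passed to `client.evals.run(...)` in the examples in this section.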

Define scorers by subclassing `CustomScorer` and returning `BinaryResponse` or
`NumericResponse`. Scorers run in your process, and Freesolo uploads the final
results with your API key. Pass scorer objects, not strings.

```python
from typing import Any

from freesolo import Freesolo
from freesolo.evaluation import BinaryResponse, CustomScorer


class ExactMatch(CustomScorer[BinaryResponse]):
    async def score(self, row: dict[str, Any]) -> BinaryResponse:
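        actual = str(row.get("actual_output", "")).strip()
        expected = str(row.get("expected_output", "")).strip()
        # An empty output never counts as a match, even if expected is empty too.
        matched = bool(actual) and actual == expected
        return BinaryResponse(
            value=matched,
            reason="actual_output matched expected_output"
            if matched
            else "actual_output did not match expected_output",
        )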


client = Freesolo(project_name="support-agent")

results = client.evals.run(
    [
        {
            "input": "What is the capital of France?",
            "actual_output": "Paris",
            "expected_output": "Paris",
        }
    ],
    scorers=[ExactMatch()],
    eval_run_name="ci-smoke",
    assert_test=True,
)

print(results[0].success)
```

A second custom scorer, which only checks that the answer is non-empty:

```python
from typing import Any

from freesolo import Freesolo
from freesolo.evaluation import BinaryResponse, CustomScorer


class NoEmptyAnswer(CustomScorer[BinaryResponse]):
    async def score(self, row: dict[str, Any]) -> BinaryResponse:
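        ok = bool(str(row.get("actual_output", "")).strip())
        # Report a reason that matches the actual outcome.
        reason = "actual_output is non-empty" if ok else "actual_output is empty"
        return BinaryResponse(value=ok, reason=reason)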


results = Freesolo(project_name="support-agent").evals.run(
    [{"actual_output": "hello"}],
    scorers=[NoEmptyAnswer()],
    eval_run_name="custom-smoke",
    assert_test=True,
)
```

For CI, set `assert_test=True` to raise an `AssertionError` when any row fails.
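
An uncaught `AssertionError` already makes a Python process exit non-zero, so no wrapper is strictly needed. If a CI log should show a one-line message instead of a traceback, a small gate like the sketch below is one option; `run_gate` and `failing_run` are hypothetical names, and the stand-in only imitates a failing run rather than calling the SDK.

```python
import sys
from collections.abc import Callable


def run_gate(run_evals: Callable[[], object]) -> int:
    """Return an exit code: 0 when the run passes, 1 on AssertionError."""
    try:
        run_evals()
    except AssertionError as exc:
        print(f"evaluation failed: {exc}", file=sys.stderr)
        return 1
    return 0


def failing_run() -> None:
    # Stand-in for an evaluation run with assert_test=True where a row fails.
    raise AssertionError("1 of 3 rows failed")


exit_code = run_gate(failing_run)
```

In a real script the callable would wrap the `client.evals.run(..., assert_test=True)` call, and the result would be passed to `sys.exit(...)`.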

A runnable evaluation example can be started as a module:

```bash
python -m freesolo.examples.evaluation.custom
```

Tracing is available from the same root client:

```python
from freesolo import Freesolo

client = Freesolo()

with client.traces.start("support-agent-run"):
    ...
```

You can also import namespaced tracing helpers directly:

```python
from freesolo.tracing import start_trace, wrap
```
