Metadata-Version: 2.4
Name: freesolo
Version: 0.2.7
Summary: Tracing, evaluation, and training utilities for LLM applications.
Requires-Python: >=3.11
Requires-Dist: gepa>=0.1.1
Requires-Dist: httpx>=0.27.0
Requires-Dist: jsonschema>=4.0.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: opentelemetry-api>=1.28.0
Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.28.0
Requires-Dist: opentelemetry-sdk>=1.28.0
Requires-Dist: pymongo>=4.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: tinker-cookbook>=0.3.0
Requires-Dist: tinker>=0.19.0
Requires-Dist: wandb>=0.17.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.11.0; extra == 'dev'
Provides-Extra: examples
Requires-Dist: openai>=1.0.0; extra == 'examples'
Description-Content-Type: text/markdown

# freesolo

`freesolo` is the Python SDK used by Freesolo-generated training repos.

The SDK gives generated repos one shared surface for:

- loading the approved training contract
- loading datasets and building training conversations
- defining the repo-specific task environment
- running contract-aligned evaluations
- running GEPA prompt optimization
- launching SFT and GRPO training
- optionally exporting OpenTelemetry traces

The main idea is that a generated repo should contain only the task-specific
files under `freesolo/`, while the reusable training, evaluation, dataset,
contract, and tracing behavior comes from this package.

## Install

```bash
pip install freesolo
```

To use the SDK from a source checkout instead, put the `pypi/` directory on
`PYTHONPATH`:

```bash
cd freesolo-sdk
export PYTHONPATH="$PWD/pypi"
```

## Credentials

Most workflows that upload results or start hosted work need a Freesolo API key:

```bash
export FREESOLO_API_KEY=fslo_...
```

Optional environment variables:

- `FREESOLO_BASE_URL`: API base URL; defaults to `https://api.freesolo.co`
- `OPENROUTER_API_KEY`: used by hosted LLM-as-judge scorers
- `TINKER_API_KEY`: used by SFT and GRPO training
- `WANDB_API_KEY`: used for experiment tracking when the generated repo enables it
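
A preflight check can surface missing credentials before a long run starts. The sketch below is not part of the SDK: `check_credentials` is a hypothetical helper, and only the variable names and the default base URL come from the list above. It treats `FREESOLO_API_KEY` as mandatory, which fits workflows that upload results or start hosted work.

```python
import os

# Variable names and the default URL are taken from the README;
# the helper itself is hypothetical and not an SDK function.
DEFAULT_BASE_URL = "https://api.freesolo.co"
OPTIONAL_KEYS = ("OPENROUTER_API_KEY", "TINKER_API_KEY", "WANDB_API_KEY")


def check_credentials(env=None):
    """Return (base_url, missing_optional_keys); raise if the API key is absent."""
    env = os.environ if env is None else env
    if not env.get("FREESOLO_API_KEY"):
        raise RuntimeError("FREESOLO_API_KEY is not set")
    # Fall back to the documented default when no override is provided.
    base_url = env.get("FREESOLO_BASE_URL") or DEFAULT_BASE_URL
    missing = [key for key in OPTIONAL_KEYS if not env.get(key)]
    return base_url, missing
```

Calling `check_credentials()` with no argument reads the real process environment; passing a dict makes the check easy to unit test.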

## Generated Repo Flow

The SDK is built around the files that Freesolo agents generate in a target
repo:

```text
freesolo/TRAINING_CONTRACT.md
freesolo/config.py
freesolo/environment.py
freesolo/data.py
freesolo/eval.py
freesolo/gepa.py
freesolo/training.py
```
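
A quick way to confirm that a repo has the layout above is to check for each file. This is an illustrative sketch only: `missing_generated_files` is a hypothetical helper, and the file names are copied from the listing.

```python
from pathlib import Path

# File names copied from the layout above; the helper is illustrative only.
EXPECTED_FILES = (
    "TRAINING_CONTRACT.md",
    "config.py",
    "environment.py",
    "data.py",
    "eval.py",
    "gepa.py",
    "training.py",
)


def missing_generated_files(repo_root="."):
    """Return the expected files that are absent under <repo_root>/freesolo/."""
    base = Path(repo_root) / "freesolo"
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]
```

An empty list means every expected file is present.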

A normal generated repo flow is:

1. Write or approve `freesolo/TRAINING_CONTRACT.md`.
2. Define the task once in `freesolo/environment.py`.
3. Run evals against candidate model outputs with the same environment and
   contract.
4. Use the same environment for GEPA, SFT, and GRPO.
5. Add tracing only when you need observability for app or SDK spans.

Tracing is not the center of the SDK. It is optional instrumentation around the
contract/eval/training loop.

## Environment

`Environment` is the task adapter. It defines how examples become model prompts
and how model responses are scored.

```python
from freesolo.datasets import TaskExample
from freesolo.environments import Environment, RewardResult


class RepoEnvironment(Environment):
    def build_prompt_messages(self, example: TaskExample, prompt_text: str):
        return [
            {"role": "system", "content": prompt_text},
            {"role": "user", "content": example.task},
        ]

    def score_response(self, example: TaskExample, response_text: str) -> RewardResult:
        expected = str(example.expected_output or "").strip()
        actual = response_text.strip()
        passed = actual == expected
        return RewardResult(
            name="exact_match",
            score=1.0 if passed else 0.0,
            success=passed,
            threshold=1.0,
            reason="matched expected output" if passed else "mismatch",
            return_type="binary",
        )


def load_environment(**_: object) -> Environment:
    return RepoEnvironment()
```

Generated repo helpers should pass the environment to SDK APIs as a reference
string naming this file and loader function:

```python
ENVIRONMENT_REFERENCE = "freesolo/environment.py:load_environment"
```

That keeps evals, GEPA, SFT, and GRPO aligned on one prompt and reward
definition.
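
To make the reference format concrete, here is one way a `path.py:function` string could be resolved with the standard library. This is an assumption about the mechanism, not the SDK's actual resolver, and `resolve_reference` is a hypothetical name.

```python
import importlib.util
from pathlib import Path


def resolve_reference(reference, repo_root="."):
    """Load `path/to/file.py:function_name` and return the named attribute."""
    file_part, _, attr = reference.partition(":")
    if not file_part or not attr:
        raise ValueError(f"expected 'path.py:name', got {reference!r}")
    path = Path(repo_root) / file_part
    # Build a module spec directly from the file path, then execute the module.
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, attr)
```

With this sketch, `resolve_reference("freesolo/environment.py:load_environment")()` would return whatever `load_environment` builds, so every caller gets the same environment definition.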

## Evaluations

Environment evals run model outputs through the contract and environment reward
logic, then upload the result to Freesolo.

```python
from openai import OpenAI

from freesolo.datasets import TaskExample
from freesolo.environments import EnvironmentGeneration
from freesolo.evaluation import EvaluationClient

from config import CONTRACT_PATH, ENVIRONMENT_REFERENCE


client = OpenAI()


def generate(messages: list[dict[str, str]], example: TaskExample) -> EnvironmentGeneration:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=messages,
    )
    return EnvironmentGeneration(
        response_text=response.choices[0].message.content or "",
        total_tokens=response.usage.total_tokens if response.usage else None,
    )


results = EvaluationClient().run_environment(
    name="dev-eval",
    source="runs/eval/dev.jsonl",
    contract_path=CONTRACT_PATH,
    environment=ENVIRONMENT_REFERENCE,
    generate=generate,
)
```
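
For dry runs that should not call a model API, a canned `generate` callback with the same `(messages, example)` signature can stand in. The sketch below runs without the SDK installed, so it uses a hypothetical `StubGeneration` dataclass in place of `EnvironmentGeneration`; in a real repo the callback would presumably return `EnvironmentGeneration` instead. `make_canned_generate` is also a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubGeneration:
    """Stand-in for EnvironmentGeneration so this sketch runs without the SDK."""

    response_text: str
    total_tokens: Optional[int] = None


def make_canned_generate(answers):
    """Build a generate(messages, example) callback that replays canned answers.

    `answers` maps the text of the last user message to the response to return.
    Unknown prompts yield an empty response.
    """

    def generate(messages, example=None):
        user_turns = [m["content"] for m in messages if m["role"] == "user"]
        prompt = user_turns[-1] if user_turns else ""
        return StubGeneration(response_text=answers.get(prompt, ""))

    return generate
```

Because the callback is deterministic, the same inputs always produce the same scores, which is convenient when checking the eval wiring itself.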

For smaller scripts and CI checks, custom scorers are also supported:

```python
from typing import Any

from freesolo.evaluation import BinaryResponse, CustomScorer, EvaluationClient


class NoEmptyAnswer(CustomScorer[BinaryResponse]):
    async def score(self, row: dict[str, Any]) -> BinaryResponse:
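        # Treat a missing or None value as empty, then ignore whitespace.
        ok = bool(str(row.get("actual_output") or "").strip())
        reason = "actual_output is non-empty" if ok else "actual_output is empty"
        return BinaryResponse(value=ok, reason=reason)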


results = EvaluationClient().run(
    name="non-empty-answer",
    data=[{"actual_output": "hello"}],
    scorers=[NoEmptyAnswer()],
)
```
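
Because `score` is a coroutine, scorer logic can be exercised directly with `asyncio` before it is wired into an eval. The sketch below uses hypothetical stand-ins (`StubBinaryResponse`, `StubNoEmptyAnswer`, `score_rows`) so that it runs without the SDK; it mirrors the scorer above and is not how `EvaluationClient` schedules scorers internally.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubBinaryResponse:
    """Stand-in for BinaryResponse with the two fields used above."""

    value: bool
    reason: str


class StubNoEmptyAnswer:
    """Same check as NoEmptyAnswer, without the CustomScorer base class."""

    async def score(self, row):
        ok = bool(str(row.get("actual_output") or "").strip())
        reason = "actual_output is non-empty" if ok else "actual_output is empty"
        return StubBinaryResponse(value=ok, reason=reason)


async def score_rows(scorer, rows):
    """Await scorer.score for every row concurrently, preserving row order."""
    return await asyncio.gather(*(scorer.score(row) for row in rows))


rows = [{"actual_output": "hello"}, {"actual_output": "   "}, {}]
results = asyncio.run(score_rows(StubNoEmptyAnswer(), rows))
pass_rate = sum(r.value for r in results) / len(results)
```

Only the first row has a non-empty answer, so one of the three rows passes.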

## GEPA And Training

GEPA, SFT, and GRPO use the same contract, datasets, and environment adapter as
evals. Generated repos should call the SDK helpers rather than copying trainer
or optimizer internals.

```python
from freesolo.training import train_grpo, train_sft

from config import (
    BASE_MODEL,
    CONTRACT_PATH,
    ENVIRONMENT_REFERENCE,
    GRPO_DATASET_PATH,
    GRPO_LOG_DIR,
    SFT_CONFIG,
    SFT_DATASET_PATH,
    SFT_LOG_DIR,
)


def run_sft() -> int:
    return train_sft(
        contract_path=CONTRACT_PATH,
        dataset_path=SFT_DATASET_PATH,
        environment=ENVIRONMENT_REFERENCE,
        log_dir=SFT_LOG_DIR,
        base_model=BASE_MODEL,
        sft_config=SFT_CONFIG,
    )


def run_grpo() -> int:
    return train_grpo(
        contract_path=CONTRACT_PATH,
        dataset_path=GRPO_DATASET_PATH,
        environment=ENVIRONMENT_REFERENCE,
        log_dir=GRPO_LOG_DIR,
        sft_log_dir=SFT_LOG_DIR,
        base_model=BASE_MODEL,
    )
```
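
A generated repo may want a small command-line entry point that picks one stage. The sketch below is one possible shape, not an SDK feature: `run_sft` and `run_grpo` are placeholders for the functions above, and treating the returned `int` as a process exit code is an assumption the example above does not confirm.

```python
import argparse


def run_sft() -> int:
    """Placeholder; in a generated repo this would call train_sft(...)."""
    return 0


def run_grpo() -> int:
    """Placeholder; in a generated repo this would call train_grpo(...)."""
    return 0


# Map each stage name accepted on the command line to its runner.
STAGES = {"sft": run_sft, "grpo": run_grpo}


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Run one training stage.")
    parser.add_argument("stage", choices=sorted(STAGES))
    args = parser.parse_args(argv)
    return STAGES[args.stage]()
```

In a script, `raise SystemExit(main())` under an `if __name__ == "__main__":` guard would expose this as `python training.py sft` or `python training.py grpo`.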

## Tracing

Tracing is available for applications or generated repo commands that need span
export. Configure it at process startup, then use normal OpenTelemetry spans.

```python
from freesolo.tracing import configure_tracer, force_flush, get_tracer

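# Configure the exporter once, at process startup.
configure_tracer(service_name="my-training-repo")
tracer = get_tracer()

with tracer.start_as_current_span("eval.batch") as span:
    span.set_attribute("freesolo.dataset", "runs/eval/dev.jsonl")

# Push any buffered spans to the exporter before the process exits.
force_flush()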
```

## Runnable Examples

Copy-pasteable examples live in [`examples/`](examples/):

- `environment.py`: task environment used by evals, training, and GEPA.
- `support_dataset.py`: dataset loading helpers for evals, SFT, GRPO, and GEPA.
- `evaluation_from_files.py`: run an environment eval from concrete files.
- `evaluation_custom_scorer.py`: run local custom scorers.
- `gepa_prompt_example.py`: run the Freesolo GEPA adapter.
- `training_sft_grpo.py`: start SFT or GRPO training from package APIs.
- `tracing_manual_span.py`: send one OpenTelemetry span.

For example, to run the custom-scorer example:

```bash
uv run python examples/evaluation_custom_scorer.py --local
```

## Public API

The root `freesolo` module intentionally exports no functions. Import from the
subpackages below; lower-level modules may be importable, but they are
implementation helpers unless they appear here or in an example.

| Import | Use case |
| --- | --- |
| `freesolo.contracts.load_contract_text`, `extract_contract_spec`, `load_contract_spec`, `build_oracle_messages` | Read contract markdown and build oracle prompt messages. |
| `freesolo.datasets.TaskExample`, `Dataset`, `load_dataset` | Load task examples and construct labeled conversations for evals or training. |
| `freesolo.environments.Environment`, `RewardResult`, `RewardMetric`, `EnvironmentGeneration` | Define task prompt and reward behavior once for evals, GEPA, SFT, and GRPO. |
| `freesolo.evaluation.EvaluationClient` | Run custom-scorer evals or environment evals and upload results to Freesolo. |
| `freesolo.evaluation.run_local_evaluation` | Run custom scorers locally without uploading results. |
| `freesolo.evaluation.CustomScorer`, `BinaryResponse`, `NumericResponse` | Define local scorer logic for eval rows. |
| `freesolo.evaluation.HostedJudgeClient` and hosted scorer classes | Use hosted LLM-as-judge scorers with OpenRouter-compatible credentials. |
| `freesolo.gepa.GEPASetup`, `GEPAConfig`, `DefaultReflectionAgent`, `attach_gepa`, `optimize_gepa` | Optimize prompts through the GEPA adapter using the same environment and dataset abstractions. |
| `freesolo.training.SftConfig`, `GrpoConfig`, `TrainGrpoOptions`, `train_sft`, `train_grpo` | Start SFT or GRPO training from package APIs. |
| `freesolo.tracing.configure_tracer`, `get_tracer`, `force_flush`, `shutdown` | Export OpenTelemetry traces when observability is needed. |
| `freesolo.utils.oracle.generate_ground_truth_records` | Generate ground-truth JSONL records from source examples using a contract, environment, and oracle model. |
| `freesolo.utils.upload.upload_tinker_checkpoint_to_huggingface` | Upload a Tinker checkpoint to a private Hugging Face model repo. |

## Package Docs

Package notes written for generated repos live next to the modules:

- [`pypi/freesolo/README.md`](pypi/freesolo/README.md)
- [`pypi/freesolo/contracts/README.md`](pypi/freesolo/contracts/README.md)
- [`pypi/freesolo/datasets/README.md`](pypi/freesolo/datasets/README.md)
- [`pypi/freesolo/environments/README.md`](pypi/freesolo/environments/README.md)
- [`pypi/freesolo/evaluation/README.md`](pypi/freesolo/evaluation/README.md)
- [`pypi/freesolo/gepa/README.md`](pypi/freesolo/gepa/README.md)
- [`pypi/freesolo/training/README.md`](pypi/freesolo/training/README.md)
- [`pypi/freesolo/tracing/README.md`](pypi/freesolo/tracing/README.md)
