Metadata-Version: 2.4
Name: hud-python
Version: 0.6.0
Summary: SDK for the HUD platform.
Project-URL: Homepage, https://github.com/hud-evals/hud-python
Project-URL: Bug Tracker, https://github.com/hud-evals/hud-python/issues
Project-URL: Documentation, https://docs.hud.ai
Author-email: HUD <founders@hud.ai>
License: MIT License
        
        Copyright (c) 2025 Human Union Data, Inc
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: <3.13,>=3.11
Requires-Dist: anthropic>=0.78.0
Requires-Dist: asyncssh>=2.23.0
Requires-Dist: asyncvnc>=1.3.0
Requires-Dist: fastmcp==3.0.2
Requires-Dist: google-genai
Requires-Dist: httpx<1,>=0.23.0
Requires-Dist: mcp<2.0,>=1.24.0
Requires-Dist: openai>=2.26.0
Requires-Dist: packaging>=21.0
Requires-Dist: pillow>=11.0.0
Requires-Dist: prompt-toolkit==3.0.51
Requires-Dist: pydantic-settings<3,>=2.2
Requires-Dist: pydantic<3,>=2.6
Requires-Dist: questionary==2.1.0
Requires-Dist: rich>=13.0.0
Requires-Dist: typer>=0.9.0
Requires-Dist: websockets>=15.0.1
Provides-Extra: agent
Provides-Extra: agents
Provides-Extra: bedrock
Requires-Dist: anthropic[bedrock]>=0.78.0; extra == 'bedrock'
Provides-Extra: browseruse
Requires-Dist: browser-use>=0.11.13; extra == 'browseruse'
Provides-Extra: daytona
Requires-Dist: daytona>=0.100; extra == 'daytona'
Provides-Extra: dev
Requires-Dist: dotenv>=0.9.9; extra == 'dev'
Requires-Dist: pyright==1.1.407; extra == 'dev'
Requires-Dist: pytest-asyncio; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest-mock; extra == 'dev'
Requires-Dist: pytest>=8.1.1; extra == 'dev'
Requires-Dist: ruff<0.15.0,>=0.11.8; extra == 'dev'
Provides-Extra: modal
Requires-Dist: modal>=1.0; extra == 'modal'
Provides-Extra: robot
Requires-Dist: numpy>=1.24; extra == 'robot'
Requires-Dist: openpi-client>=0.1.2; extra == 'robot'
Provides-Extra: train
Requires-Dist: torch>=2; extra == 'train'
Description-Content-Type: text/markdown

<div align="left">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/logo/hud_logo_dark.svg">
    <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/logo/hud_logo.svg">
    <img src="https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/logo/hud_logo.svg" alt="HUD" width="150" style="margin-bottom: 24px;"/>
  </picture>
</div>

HUD is a platform for building RL environments for AI agents, across coding, browser, computer-use, and robotics. Define an environment, write tasks, and run them as evals and training across any model, at any scale.

To learn more, see the [documentation](https://docs.hud.ai) and [API reference](https://docs.hud.ai/reference/environment).

[![PyPI](https://img.shields.io/pypi/v/hud-python?style=flat-square)](https://pypi.org/project/hud-python/)
[![License](https://img.shields.io/badge/license-MIT-green?style=flat-square)](LICENSE)
[![Add docs to Cursor](https://img.shields.io/badge/Add%20docs%20to-Cursor-black?style=flat-square)](https://cursor.com/en/install-mcp?name=docs-hud-python&config=eyJ1cmwiOiJodHRwczovL2RvY3MuaHVkLmFpL21jcCJ9)
[![Discord](https://img.shields.io/discord/1327447144772407390?label=Discord&logo=discord&style=flat-square)](https://discord.gg/wkjtmHYYjm)
[![X Follow](https://img.shields.io/twitter/follow/hud_evals?style=social)](https://x.com/intent/user?screen_name=hud_evals)
[![Scarf](https://static.scarf.sh/a.png?x-pxid=6530ff33-4945-452b-81f9-626872593933)](https://scarf.sh)
[![Docs](https://img.shields.io/badge/docs-hud.ai-blue?style=flat-square)](https://docs.hud.ai)

## Install

```bash
# Install the CLI (recommended)
uv tool install hud-python --python 3.12

# …or as a library
pip install hud-python
```

Get your API key at [hud.ai/project/api-keys](https://hud.ai/project/api-keys) and set it:

```bash
hud set HUD_API_KEY=your-key-here
# or: export HUD_API_KEY=your-key-here
```

Then scaffold your first environment:

```bash
hud init my-env
```

![Agent running on SheetBench](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/trace_sheet.gif)

## The protocol

HUD is **protocol-first**. An agent and an environment exchange only three things: a **manifest** (the environment's capabilities and tasks); **`tasks.start`**, which returns the prompt; and **`tasks.grade`**, which returns the reward. In between, the agent simply *works*, driving the capabilities itself. HUD owns only that thin envelope, so any model or harness plugs into any environment.

```mermaid
sequenceDiagram
    participant Agent
    participant Env as Environment
    participant Caps as Capabilities (ssh · mcp · cdp · rfb · robot)
    Agent->>Env: manifest exchange
    Env-->>Agent: capabilities + tasks
    Agent->>Env: tasks.start
    Env-->>Agent: prompt
    rect rgb(238,238,238)
    Note over Agent,Caps: the agent works, driving capabilities directly
    Agent->>Caps: shell · browser · GUI · tools · robot
    Caps-->>Agent: observations
    end
    Agent->>Env: tasks.grade
    Env-->>Agent: reward
```

Because the protocol only exposes **capabilities** (never a fixed agent), an environment outlives any single harness: new harnesses and models keep running against the same environments, benchmarks, and tasks.
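
To make the envelope concrete, here is a minimal in-process sketch of the three exchanges. `ToyEnvironment`, `Manifest`, and `rollout` are hypothetical names for illustration only. They are not HUD's API or wire format. A real agent would drive capabilities between `start` and `grade` instead of answering from the prompt alone.

```python
from dataclasses import dataclass
from typing import Callable

Grader = Callable[[str], float]

@dataclass
class Manifest:
    capabilities: list[str]  # e.g. ["ssh", "cdp"]; empty for a pure Q&A task
    tasks: list[str]         # task ids the agent may start

class ToyEnvironment:
    """Hypothetical stand-in for an environment (not HUD's real interface)."""

    def __init__(self) -> None:
        # task id -> (prompt, grader)
        self._tasks: dict[str, tuple[str, Grader]] = {
            "count_r": (
                "How many 'r's are in 'strawberry'? Reply with just the number.",
                lambda answer: 1.0 if "3" in answer else 0.0,
            ),
        }

    def manifest(self) -> Manifest:
        return Manifest(capabilities=[], tasks=list(self._tasks))

    def start(self, task_id: str) -> str:
        """Plays the role of tasks.start: returns the prompt."""
        return self._tasks[task_id][0]

    def grade(self, task_id: str, answer: str) -> float:
        """Plays the role of tasks.grade: returns the reward."""
        return self._tasks[task_id][1](answer)

def rollout(env: ToyEnvironment, task_id: str, agent: Callable[[str], str]) -> float:
    manifest = env.manifest()  # 1. manifest exchange
    if task_id not in manifest.tasks:
        raise KeyError(f"unknown task: {task_id}")
    prompt = env.start(task_id)  # 2. tasks.start -> prompt
    answer = agent(prompt)  # 3. the agent works (here it just answers the prompt)
    return env.grade(task_id, answer)  # 4. tasks.grade -> reward

print(rollout(ToyEnvironment(), "count_r", lambda prompt: "3"))
```

The environment owns only `manifest`, `start`, and `grade`. Any agent loop can sit in step 3 without the environment changing.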

## Package & run anywhere

A built image is the **end product for your tasks**: one build packs every task from a single definition. The recommended path is **`hud deploy`**, which builds and registers your environment on HUD in one step; then sync a taskset and run remotely:

```bash
hud deploy
hud sync tasks my-taskset
hud eval my-taskset --remote
```

For local iteration, the same protocol works against a container on your laptop:

```bash
hud build .
docker run -d --name run1 my-env
docker exec run1 hud task start fix_bug
docker exec run1 hud task grade fix_bug --answer "…"
docker rm -f run1
```
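
If you iterate on many tasks locally, you can script the same four commands. The sketch below wraps only the commands shown above with Python's `subprocess`. `exec_argv` and `run_locally` are hypothetical helpers. It assumes `docker` is on your `PATH` and that the image was built as `my-env`. It returns whatever the CLI prints rather than parsing it, because the output format isn't specified here.

```python
import subprocess
from typing import Callable

def exec_argv(container: str, *args: str) -> list[str]:
    """Build a `docker exec <container> ...` argv list (hypothetical helper)."""
    return ["docker", "exec", container, *args]

def run_locally(image: str, task: str, solve: Callable[[str], str], container: str = "run1") -> str:
    """Mirror the shell loop above: run, start, grade, and always remove the container."""
    subprocess.run(["docker", "run", "-d", "--name", container, image],
                   check=True, capture_output=True, text=True)
    try:
        # Raw text printed by `hud task start` (assumed to contain the prompt).
        started = subprocess.run(exec_argv(container, "hud", "task", "start", task),
                                 check=True, capture_output=True, text=True).stdout
        answer = solve(started)  # your agent, or a hand-written answer
        graded = subprocess.run(exec_argv(container, "hud", "task", "grade", task, "--answer", answer),
                                check=True, capture_output=True, text=True).stdout
        return graded  # raw grading output; its format is not specified here
    finally:
        # Same as `docker rm -f run1`; runs even if start or grade fails.
        subprocess.run(["docker", "rm", "-f", container], capture_output=True, text=True)
```

For example, `run_locally("my-env", "fix_bug", solve=my_agent)` runs one local rollout, where `my_agent` is any function of your own that maps the start output to an answer string. The `try`/`finally` means a failed grade never leaves a stray `run1` container behind.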

→ [Package & deploy](https://docs.hud.ai/run/deploy)

## Environments & templates

A **template** is an async generator registered with `@env.template()`: `yield` a prompt, receive the agent's answer, then `yield` a reward. Calling the template mints a runnable **Task**, so one function can span a whole dataset of variants. The simplest template needs no capabilities, only a prompt and a grader:

```python
from hud import Environment

env = Environment(name="letter-count")

@env.template()
async def count_letter(word: str = "strawberry", letter: str = "r"):
    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
    yield 1.0 if answer and str(word.count(letter)) in answer else 0.0

tasks = [count_letter(word=w) for w in ("strawberry", "raspberry", "blueberry")]
```

Run it immediately against any model:

```bash
hud eval tasks.py claude --group 3
```

Each graded evaluation is a **trace** (the SDK's live handle is a `Run`). With `HUD_API_KEY` set, every rollout is recorded on [hud.ai](https://hud.ai). Tasks that need a shell, browser, GUI, or robot declare **capabilities** (below); everything else — variants, grading, batching — stays identical.

→ [Quickstart](https://docs.hud.ai/quickstart) · [Tasks & tasksets](https://docs.hud.ai/reference/tasks)

## Capabilities & harnesses

A **capability** is a connection the environment exposes; a **harness** attaches its own tools to it. The same environment serves a one-shot Q&A or a full computer-use rollout, depending on which capabilities the harness opens.

| Protocol | What it exposes |
|----------|-----------------|
| **`ssh`** | Shell + files in a sandboxed workspace (`env.workspace(root)`) |
| **`mcp`** | Tools over the Model Context Protocol |
| **`cdp`** | Browser control over the Chrome DevTools Protocol |
| **`rfb`** | Full computer-use over VNC: screen + keyboard/mouse |
| **`robot`** *(beta)* | Schema-driven robot observation/action loop over WebSocket |

**Ships natively:** Claude, OpenAI (Responses), OpenAI-compatible endpoints, and Gemini via `create_agent("claude-sonnet-4-5")` (or `gpt-…`, `gemini-…`). The harness wires capability-backed tools for the model you choose at run time.

**Bring your own:** a harness attaches to a capability and defines a tool spec — wrap `browser-use` on `cdp`, a VLA policy on `robot`, or your own agent on `ssh` / `mcp`. No protocol work required.
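
One way to picture the split: an environment advertises which capabilities it exposes, and a harness declares which ones it needs and the tool specs it builds on them. The sketch below shows a hypothetical shape for a bring-your-own harness. It is not HUD's harness interface. `Harness`, `ToolSpec`, `ShellHarness`, and `usable_tools` are illustrative names.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    capability: str  # which exposed connection backs this tool

class Harness(Protocol):
    """Hypothetical bring-your-own harness shape (not HUD's actual interface)."""
    requires: frozenset[str]

    def tools(self) -> list[ToolSpec]: ...

class ShellHarness:
    """Toy harness that only needs the `ssh` capability."""
    requires = frozenset({"ssh"})

    def tools(self) -> list[ToolSpec]:
        return [ToolSpec("bash", "Run a command in the sandboxed workspace", "ssh")]

def usable_tools(harness: Harness, exposed: set[str]) -> list[ToolSpec]:
    """Return the harness's tools if the environment exposes everything it needs."""
    missing = harness.requires - exposed
    if missing:
        raise ValueError(f"environment does not expose: {sorted(missing)}")
    return harness.tools()

print([t.name for t in usable_tools(ShellHarness(), {"ssh", "mcp"})])
```

Because the harness only checks for the capabilities it requires, the same environment can expose more connections (here `mcp`) without breaking a shell-only harness.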

→ [Capabilities](https://docs.hud.ai/reference/capabilities) · [Models](https://docs.hud.ai/run/models) · [Robots](https://docs.hud.ai/reference/robots)

## Deploy on the platform

From the [platform UI](https://hud.ai) you can run batches, compare models on the same taskset, and inspect every trace.

→ [Deploy](https://docs.hud.ai/run/deploy) · [Leaderboards](https://hud.ai/leaderboards)

## Train on rewards

Every rollout returns a `Run` carrying a `trace_id` and a `reward`, so the tasks you evaluate are already training data. Run a **group** per task and turn the rewards into GRPO advantages with `group_relative()`:

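```python
import asyncio

from hud.agents import create_agent
from hud.eval import Taskset, group_relative

from tasks import count_letter  # the template above, saved as tasks.py

async def main() -> None:
    agent = create_agent("claude-sonnet-4-5")
    words = ["strawberry", "raspberry", "blueberry"]
    job = await Taskset(count_letter(word=w) for w in words).run(agent, group=16)
    for runs in job.results.values():  # one group of runs per task
        advantages = group_relative([r.reward for r in runs], normalize_std=True)
        for run, adv in zip(runs, advantages):
            ...  # feed (run.trace_id, adv) into your optimizer

asyncio.run(main())
```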

HUD is the environment-and-reward source for your own GRPO/PPO loop — the same environment trains any model, text or multimodal, unchanged.
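
For reference, the usual GRPO group-relative advantage subtracts the group's mean reward and, optionally, divides by the group's standard deviation: `A_i = (r_i - mean(r)) / std(r)`. The pure-Python sketch below implements that textbook formula. `group_relative_sketch` is a hypothetical stand-in. HUD's `group_relative` may differ in details such as the epsilon or the use of population versus sample standard deviation.

```python
import statistics

def group_relative_sketch(rewards: list[float], normalize_std: bool = True, eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one group of rollouts of the same task."""
    mean = statistics.fmean(rewards)
    centered = [r - mean for r in rewards]  # above-average runs get a positive advantage
    if not normalize_std:
        return centered
    std = statistics.pstdev(rewards)  # population std over the group
    return [c / (std + eps) for c in centered]  # eps keeps all-equal groups finite (all zeros)

print(group_relative_sketch([1.0, 0.0, 0.0, 1.0]))
```

A group in which every run gets the same reward yields all-zero advantages and therefore no learning signal. This is why tasks need varied outcomes within a group.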

→ [Training](https://docs.hud.ai/run/training) · [Designing tasks for signal](https://docs.hud.ai/run/signal)

## Links

- [Documentation](https://docs.hud.ai)
- [Quickstart](https://docs.hud.ai/quickstart)
- [CLI reference](https://docs.hud.ai/reference/cli)
- [Leaderboards](https://hud.ai/leaderboards)
- [Environment templates](https://hud.ai/environments)
- [Supported models](https://hud.ai/models)
- [Discord](https://discord.gg/wkjtmHYYjm)

## Enterprise

Building agents at scale? We work with teams on custom environments, benchmarks, and training.

[📅 Book a call](https://cal.com/jay-hud) · [📧 founders@hud.ai](mailto:founders@hud.ai)

## Contributing

We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md).

Key areas: [Agents](hud/agents/) · [Environments](hud/environment/) · [Capabilities](hud/capabilities/) · [Eval](hud/eval/)

<a href="https://github.com/hud-evals/hud-python/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=hud-evals/hud-python&max=50" />
</a>

## Citation

```bibtex
@software{hud2025agentevalplatform,
  author = {HUD and Jay Ram and Lorenss Martinsons and Parth Patel and Govind Pimpale and Dylan Bowman and Jaideep and Nguyen Nhat Minh},
  title  = {HUD: An Evaluation and RL Environments Platform for Agents},
  date   = {2025-04},
  url    = {https://github.com/hud-evals/hud-python},
  langid = {en}
}
```

MIT License · [LICENSE](LICENSE)
