Metadata-Version: 2.4
Name: roboeval
Version: 0.1.0
Summary: VLA evaluation harness across simulators with hard-fail spec contracts and hierarchical-mode evaluation
Author-email: Karim Elmaaroufi <k.e@berkeley.edu>
License: BSD 3-Clause License
        
        Copyright (c) 2026, Karim Elmaaroufi <k.e@berkeley.edu>
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        1. Redistributions of source code must retain the above copyright notice, this
           list of conditions and the following disclaimer.
        
        2. Redistributions in binary form must reproduce the above copyright notice,
           this list of conditions and the following disclaimer in the documentation
           and/or other materials provided with the distribution.
        
        3. Neither the name of the copyright holder nor the names of its
           contributors may be used to endorse or promote products derived from
           this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
        
Project-URL: Homepage, https://github.com/KE7/roboeval
Project-URL: Repository, https://github.com/KE7/roboeval
Project-URL: Issues, https://github.com/KE7/roboeval/issues
Project-URL: Changelog, https://github.com/KE7/roboeval/releases
Keywords: vla,robotics,evaluation,simulation,lerobot
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer>=0.9
Requires-Dist: fastapi>=0.100
Requires-Dist: uvicorn[standard]>=0.22
Requires-Dist: httpx>=0.24
Requires-Dist: pydantic>=1.10
Requires-Dist: pyyaml>=6.0
Requires-Dist: numpy>=1.24
Requires-Dist: requests>=2.31
Requires-Dist: pillow>=9.0
Provides-Extra: pi05
Requires-Dist: torch>=2.0; extra == "pi05"
Requires-Dist: transformers>=4.40; extra == "pi05"
Requires-Dist: accelerate>=0.28; extra == "pi05"
Requires-Dist: pillow>=9.0; extra == "pi05"
Requires-Dist: numpy>=1.24; extra == "pi05"
Requires-Dist: fastapi>=0.100; extra == "pi05"
Requires-Dist: uvicorn[standard]>=0.22; extra == "pi05"
Provides-Extra: vqbet
Requires-Dist: torch>=2.0; extra == "vqbet"
Requires-Dist: lerobot==0.4.4; extra == "vqbet"
Requires-Dist: pillow>=9.0; extra == "vqbet"
Requires-Dist: numpy>=1.24; extra == "vqbet"
Requires-Dist: fastapi>=0.100; extra == "vqbet"
Requires-Dist: uvicorn[standard]>=0.22; extra == "vqbet"
Provides-Extra: openvla
Requires-Dist: torch>=2.0; extra == "openvla"
Requires-Dist: transformers==4.40.1; extra == "openvla"
Requires-Dist: accelerate>=0.28; extra == "openvla"
Requires-Dist: timm<1.0.0,>=0.9.10; extra == "openvla"
Requires-Dist: pillow>=9.0; extra == "openvla"
Requires-Dist: numpy>=1.24; extra == "openvla"
Requires-Dist: fastapi>=0.100; extra == "openvla"
Requires-Dist: uvicorn[standard]>=0.22; extra == "openvla"
Provides-Extra: smolvla
Requires-Dist: torch>=2.0; extra == "smolvla"
Requires-Dist: lerobot[smolvla]==0.4.4; extra == "smolvla"
Requires-Dist: pillow>=9.0; extra == "smolvla"
Requires-Dist: numpy>=1.24; extra == "smolvla"
Requires-Dist: fastapi>=0.100; extra == "smolvla"
Requires-Dist: uvicorn[standard]>=0.22; extra == "smolvla"
Provides-Extra: groot
Requires-Dist: torch>=2.0; extra == "groot"
Requires-Dist: pillow>=9.0; extra == "groot"
Requires-Dist: numpy>=1.24; extra == "groot"
Requires-Dist: fastapi>=0.100; extra == "groot"
Requires-Dist: uvicorn[standard]>=0.22; extra == "groot"
Requires-Dist: huggingface-hub>=0.20; extra == "groot"
Requires-Dist: nvpl==25.11; extra == "groot"
Requires-Dist: nvidia-cudss-cu13; extra == "groot"
Provides-Extra: cosmos
Requires-Dist: torch>=2.0; extra == "cosmos"
Requires-Dist: pillow>=9.0; extra == "cosmos"
Requires-Dist: numpy>=1.24; extra == "cosmos"
Requires-Dist: fastapi>=0.100; extra == "cosmos"
Requires-Dist: uvicorn[standard]>=0.22; extra == "cosmos"
Requires-Dist: huggingface-hub>=0.20; extra == "cosmos"
Requires-Dist: fvcore>=0.1.5.post20221221; extra == "cosmos"
Requires-Dist: iopath>=0.1.10; extra == "cosmos"
Requires-Dist: nvidia-ml-py>=13.580.82; extra == "cosmos"
Requires-Dist: wandb>=0.12; extra == "cosmos"
Requires-Dist: boto3>=1.20; extra == "cosmos"
Requires-Dist: multi-storage-client>=0.32; extra == "cosmos"
Requires-Dist: pandas>=1.3; extra == "cosmos"
Provides-Extra: internvla
Requires-Dist: torch>=2.0; extra == "internvla"
Requires-Dist: pillow>=9.0; extra == "internvla"
Requires-Dist: numpy>=1.24; extra == "internvla"
Requires-Dist: fastapi>=0.100; extra == "internvla"
Requires-Dist: uvicorn[standard]>=0.22; extra == "internvla"
Requires-Dist: huggingface-hub[cli,hf-transfer]>=0.20; extra == "internvla"
Requires-Dist: transformers<5.0,>=4.52; extra == "internvla"
Requires-Dist: datasets<4.2.0,>=4.0.0; extra == "internvla"
Requires-Dist: diffusers<0.36.0,>=0.27.2; extra == "internvla"
Requires-Dist: accelerate<2.0.0,>=1.10.0; extra == "internvla"
Requires-Dist: einops<0.9.0,>=0.8.0; extra == "internvla"
Requires-Dist: opencv-python-headless<4.13.0,>=4.9.0; extra == "internvla"
Requires-Dist: av<16.0.0,>=15.0.0; extra == "internvla"
Requires-Dist: jsonlines<5.0.0,>=4.0.0; extra == "internvla"
Requires-Dist: packaging>=24.2; extra == "internvla"
Requires-Dist: wandb<0.22.0,>=0.20.0; extra == "internvla"
Requires-Dist: draccus==0.10.0; extra == "internvla"
Requires-Dist: gymnasium<2.0.0,>=1.1.1; extra == "internvla"
Requires-Dist: rerun-sdk<0.27.0,>=0.24.0; extra == "internvla"
Requires-Dist: deepdiff>=7.0.1; extra == "internvla"
Requires-Dist: imageio[ffmpeg]<3.0.0,>=2.34.0; extra == "internvla"
Requires-Dist: termcolor>=2.4.0; extra == "internvla"
Requires-Dist: mediapy; extra == "internvla"
Requires-Dist: loguru; extra == "internvla"
Requires-Dist: omegaconf; extra == "internvla"
Requires-Dist: safetensors>=0.4.0; extra == "internvla"
Requires-Dist: tqdm; extra == "internvla"
Requires-Dist: torchvision>=0.21.0; extra == "internvla"
Provides-Extra: act
Requires-Dist: torch>=2.0; extra == "act"
Requires-Dist: lerobot>=0.4.4; extra == "act"
Requires-Dist: pillow>=9.0; extra == "act"
Requires-Dist: numpy>=1.24; extra == "act"
Requires-Dist: fastapi>=0.100; extra == "act"
Requires-Dist: uvicorn[standard]>=0.22; extra == "act"
Provides-Extra: tdmpc2
Requires-Dist: torch>=2.0; extra == "tdmpc2"
Requires-Dist: lerobot>=0.4.4; extra == "tdmpc2"
Requires-Dist: tensordict>=0.6; extra == "tdmpc2"
Requires-Dist: pillow>=9.0; extra == "tdmpc2"
Requires-Dist: numpy>=1.24; extra == "tdmpc2"
Requires-Dist: fastapi>=0.100; extra == "tdmpc2"
Requires-Dist: uvicorn[standard]>=0.22; extra == "tdmpc2"
Provides-Extra: diffusion-policy
Requires-Dist: torch>=2.0; extra == "diffusion-policy"
Requires-Dist: lerobot==0.4.4; extra == "diffusion-policy"
Requires-Dist: pillow>=9.0; extra == "diffusion-policy"
Requires-Dist: numpy>=1.24; extra == "diffusion-policy"
Requires-Dist: fastapi>=0.100; extra == "diffusion-policy"
Requires-Dist: uvicorn[standard]>=0.22; extra == "diffusion-policy"
Provides-Extra: libero
Requires-Dist: h5py>=3.8; extra == "libero"
Requires-Dist: bddl>=3.0.0; extra == "libero"
Requires-Dist: mujoco>=2.3.7; extra == "libero"
Requires-Dist: pillow>=9.0; extra == "libero"
Requires-Dist: numpy>=1.24; extra == "libero"
Requires-Dist: fastapi>=0.100; extra == "libero"
Requires-Dist: uvicorn[standard]>=0.22; extra == "libero"
Provides-Extra: libero-pro
Requires-Dist: h5py>=3.8; extra == "libero-pro"
Requires-Dist: bddl>=3.0.0; extra == "libero-pro"
Requires-Dist: mujoco>=2.3.7; extra == "libero-pro"
Requires-Dist: pillow>=9.0; extra == "libero-pro"
Requires-Dist: numpy>=1.24; extra == "libero-pro"
Requires-Dist: fastapi>=0.100; extra == "libero-pro"
Requires-Dist: uvicorn[standard]>=0.22; extra == "libero-pro"
Provides-Extra: libero-infinity
Requires-Dist: h5py>=3.8; extra == "libero-infinity"
Requires-Dist: bddl>=3.0.0; extra == "libero-infinity"
Requires-Dist: mujoco>=2.3.7; extra == "libero-infinity"
Requires-Dist: pillow>=9.0; extra == "libero-infinity"
Requires-Dist: numpy>=1.24; extra == "libero-infinity"
Requires-Dist: fastapi>=0.100; extra == "libero-infinity"
Requires-Dist: uvicorn[standard]>=0.22; extra == "libero-infinity"
Requires-Dist: scenic>=3.0.0; extra == "libero-infinity"
Requires-Dist: libero-infinity>=0.1.0; extra == "libero-infinity"
Provides-Extra: robocasa
Requires-Dist: mujoco>=2.3.7; extra == "robocasa"
Requires-Dist: pillow>=9.0; extra == "robocasa"
Requires-Dist: numpy>=1.24; extra == "robocasa"
Requires-Dist: fastapi>=0.100; extra == "robocasa"
Requires-Dist: uvicorn[standard]>=0.22; extra == "robocasa"
Provides-Extra: robotwin
Requires-Dist: torch>=2.0; extra == "robotwin"
Requires-Dist: pyyaml>=6.0; extra == "robotwin"
Requires-Dist: pillow>=9.0; extra == "robotwin"
Requires-Dist: numpy>=1.24; extra == "robotwin"
Requires-Dist: fastapi>=0.100; extra == "robotwin"
Requires-Dist: uvicorn[standard]>=0.22; extra == "robotwin"
Requires-Dist: huggingface-hub>=0.20; extra == "robotwin"
Provides-Extra: aloha-gym
Requires-Dist: gym-aloha>=0.1.3; extra == "aloha-gym"
Requires-Dist: gymnasium>=0.29; extra == "aloha-gym"
Requires-Dist: mujoco>=2.3.7; extra == "aloha-gym"
Requires-Dist: dm-control>=1.0.14; extra == "aloha-gym"
Requires-Dist: numpy>=1.24; extra == "aloha-gym"
Requires-Dist: pillow>=9.0; extra == "aloha-gym"
Requires-Dist: fastapi>=0.100; extra == "aloha-gym"
Requires-Dist: uvicorn[standard]>=0.22; extra == "aloha-gym"
Provides-Extra: gym-pusht
Requires-Dist: gym-pusht>=0.1.5; extra == "gym-pusht"
Requires-Dist: gymnasium>=0.29; extra == "gym-pusht"
Requires-Dist: numpy>=1.24; extra == "gym-pusht"
Requires-Dist: pillow>=9.0; extra == "gym-pusht"
Requires-Dist: fastapi>=0.100; extra == "gym-pusht"
Requires-Dist: uvicorn[standard]>=0.22; extra == "gym-pusht"
Provides-Extra: maniskill2
Requires-Dist: mani_skill2==0.5.3; extra == "maniskill2"
Requires-Dist: numpy<1.24; extra == "maniskill2"
Requires-Dist: scipy; extra == "maniskill2"
Requires-Dist: gymnasium>=0.28.1; extra == "maniskill2"
Requires-Dist: h5py; extra == "maniskill2"
Requires-Dist: pyyaml>=6.0; extra == "maniskill2"
Requires-Dist: pillow>=9.0; extra == "maniskill2"
Requires-Dist: fastapi>=0.100; extra == "maniskill2"
Requires-Dist: uvicorn[standard]>=0.22; extra == "maniskill2"
Requires-Dist: transforms3d; extra == "maniskill2"
Requires-Dist: trimesh; extra == "maniskill2"
Provides-Extra: metaworld
Requires-Dist: metaworld==2.0.0; extra == "metaworld"
Requires-Dist: gymnasium>=0.29; extra == "metaworld"
Requires-Dist: mujoco>=2.3.7; extra == "metaworld"
Requires-Dist: numpy>=1.24; extra == "metaworld"
Requires-Dist: pillow>=9.0; extra == "metaworld"
Requires-Dist: fastapi>=0.100; extra == "metaworld"
Requires-Dist: uvicorn[standard]>=0.22; extra == "metaworld"
Provides-Extra: vlm
Requires-Dist: litellm>=1.0; extra == "vlm"
Requires-Dist: fastapi>=0.100; extra == "vlm"
Requires-Dist: uvicorn[standard]>=0.22; extra == "vlm"
Requires-Dist: pillow>=9.0; extra == "vlm"
Requires-Dist: numpy>=1.24; extra == "vlm"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: black>=24.0; extra == "dev"
Requires-Dist: scipy>=1.11; extra == "dev"
Requires-Dist: opencv-python-headless>=4.8.0; extra == "dev"
Provides-Extra: all
Requires-Dist: roboeval[dev,vlm]; extra == "all"
Dynamic: license-file

# roboeval

[![License: BSD-3-Clause](https://img.shields.io/badge/License-BSD--3--Clause-blue.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![CI](https://img.shields.io/github/actions/workflow/status/KE7/roboeval/ci.yml?branch=main&label=tests)](https://github.com/KE7/roboeval/actions/workflows/ci.yml)

roboeval is a CLI-driven evaluation harness for running VLAs against simulator backends through isolated HTTP services. It provides an `ActionObsSpec` compatibility gate before episode execution, per-component virtual environments for dependency isolation, sharded result collection, and built-in support for LITEN-style hierarchical evaluation in which a VLM planner issues subtask instructions to a low-level VLA.

## Method / Contracts

roboeval treats each VLA and simulator as an independently launched component. The orchestrator communicates with a VLA policy server and a simulator worker over HTTP/JSON, validates their declared contracts, and records episode-level results from a reproducible YAML run config.
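
As a rough illustration of that flow, the sketch below rolls out one episode against a policy server and a simulator worker. It is not roboeval's orchestrator: the endpoint names are taken from the Extending section, but the payload keys (`obs`, `action`, `instruction`, `success`), the use of POST for every call, the `post` callable, and the returned dict are all assumptions made for the example.

```python
from typing import Callable

# post(url, payload) -> decoded JSON reply. Any HTTP client could back this;
# the callable itself is a hypothetical abstraction, not a roboeval API.
Post = Callable[[str, dict], dict]


def run_episode(post: Post, vla_url: str, sim_url: str,
                instruction: str, max_steps: int = 200) -> dict:
    """Roll out one episode and return a small result dict (assumed schema)."""
    post(f"{vla_url}/reset", {})
    obs = post(f"{sim_url}/reset", {})
    steps = 0
    success = False
    while steps < max_steps and not success:
        reply = post(f"{vla_url}/predict", {"obs": obs, "instruction": instruction})
        obs = post(f"{sim_url}/step", {"action": reply["action"]})
        steps += 1
        success = bool(post(f"{sim_url}/success", {})["success"])
    return {"success": success, "steps": steps}


def make_fake_post(succeed_after: int) -> Post:
    """In-memory stand-in for both services, used only to exercise the loop."""
    state = {"t": 0}

    def post(url: str, payload: dict) -> dict:
        if url.endswith("/reset"):
            state["t"] = 0
            return {"t": 0}
        if url.endswith("/predict"):
            return {"action": [0.0] * 7}
        if url.endswith("/step"):
            state["t"] += 1
            return {"t": state["t"]}
        if url.endswith("/success"):
            return {"success": state["t"] >= succeed_after}
        raise ValueError(f"unexpected url: {url}")

    return post


if __name__ == "__main__":
    fake = make_fake_post(succeed_after=3)
    print(run_episode(fake, "http://vla", "http://sim", "pick up the bowl"))
```

Passing the transport in as a callable keeps the loop testable without running services; a real client would wrap an HTTP library behind the same signature.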

The main contract surfaces are:

| Surface | Role |
|---|---|
| `ActionObsSpec` gate | VLA and simulator components declare action format, dimensionality, value range, camera roles, image format, state layout, and language inputs. Under the default strict mode, incompatible declarations stop the run before episode 1. |
| Host-process isolation | VLA servers, simulator workers, and optional VLM proxy processes run in separate `.venvs/` environments. This allows different Python and CUDA dependency stacks to coexist without a monolithic runtime. |
| Dependency isolation | Each VLA and simulator keeps its upstream package pins, Python version, CUDA assumptions, and optional micromamba/uv environment separate. This is a design choice: adding a new backend should not force the orchestrator or other backends onto the same dependency closure. |
| LITEN-style hierarchical evaluation | The hierarchical mode integrates the VLM-planner method introduced by Shah et al. ([Learning Affordances at Inference-Time for Vision-Language-Action Models](https://arxiv.org/abs/2510.19752)). The planner emits subtask calls that are executed by the same VLA server interface used for direct evaluation. roboeval is, to our knowledge, the first public VLA evaluation harness to ship a working LITEN integration. |
| Result records | `roboeval run` writes JSON with harness version, config snapshot, per-episode metadata, success flags, and optional shard metadata. |
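
To make the `ActionObsSpec` gate concrete, here is a minimal sketch of a strict compatibility check. The class and field names (`SpecSketch`, `action_format`, `camera_roles`, and so on) are hypothetical stand-ins that mirror the properties listed in the table; they are not roboeval's actual schema, and the comparison rules are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical stand-in for an ActionObsSpec declaration."""
    action_format: str                 # e.g. "delta_eef" or "joint_pos"
    action_dim: int
    action_range: tuple[float, float]  # (low, high)
    camera_roles: frozenset[str]       # consumed by a VLA, provided by a simulator
    image_format: str                  # e.g. "rgb_uint8"
    state_dim: int
    uses_language: bool                # VLA needs / simulator provides an instruction


class SpecMismatchError(RuntimeError):
    """Raised in strict mode when the two declarations disagree."""


def find_mismatches(vla: SpecSketch, sim: SpecSketch) -> list[str]:
    """Return human-readable mismatches; an empty list means compatible."""
    problems = []
    if vla.action_format != sim.action_format:
        problems.append(f"action format: {vla.action_format} vs {sim.action_format}")
    if vla.action_dim != sim.action_dim:
        problems.append(f"action dim: {vla.action_dim} vs {sim.action_dim}")
    # Assumption: the VLA's output range must fit inside what the simulator accepts.
    if vla.action_range[0] < sim.action_range[0] or vla.action_range[1] > sim.action_range[1]:
        problems.append(f"action range: {vla.action_range} exceeds {sim.action_range}")
    missing = vla.camera_roles - sim.camera_roles
    if missing:
        problems.append(f"missing cameras: {sorted(missing)}")
    if vla.image_format != sim.image_format:
        problems.append(f"image format: {vla.image_format} vs {sim.image_format}")
    if vla.state_dim != sim.state_dim:
        problems.append(f"state dim: {vla.state_dim} vs {sim.state_dim}")
    if vla.uses_language and not sim.uses_language:
        problems.append("VLA needs a language instruction the simulator does not provide")
    return problems


def gate(vla: SpecSketch, sim: SpecSketch, strict: bool = True) -> list[str]:
    """Fail before any episode runs when strict; otherwise report the problems."""
    problems = find_mismatches(vla, sim)
    if problems and strict:
        raise SpecMismatchError("; ".join(problems))
    return problems
```

Collecting every mismatch before raising means one failed run reports all incompatibilities at once instead of one per attempt.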

## Documentation map

For a compact system overview, design rationale, supported-pair notes, tuning guidance, related systems, and decision records, see [architecture](docs/architecture.md), [design](docs/design.md), [supported pairs](docs/supported_pairs.md), [tuning](docs/tuning.md), [related work](docs/related_work.md), and the [RFC index](docs/rfcs/).

## Installation

For full prerequisites, platform notes, and per-component dependency details, see [docs/install.md](docs/install.md).

```bash
git clone https://github.com/KE7/roboeval.git
cd roboeval
roboeval setup pi05 libero
```

`roboeval setup` provisions the orchestrator plus the requested VLA and simulator environments under `.venvs/`.

## Quickstart

```bash
roboeval setup pi05 libero
roboeval serve --vla pi05 --sim libero --headless
roboeval test --validate -c configs/libero_spatial_pi05_smoke.yaml
roboeval run -c configs/libero_spatial_pi05_smoke.yaml
```

`serve` launches the selected VLA and simulator workers. `run` executes the YAML configuration, including the declared VLA/simulator pair, task suite, episode count, server URLs, output directory, and optional LITEN endpoint. Additional examples are in [docs/quickstart.md](docs/quickstart.md).
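
Once a run finishes, the JSON it writes can be post-processed with a few lines of Python. The sketch below computes overall and per-task success rates. The record layout (`harness_version`, `config`, `shard`, `episodes`, `task`, `success`) is a guess based on the fields described under Result records, not the actual schema, so adjust the key names to match a real output file.

```python
import json

# Hypothetical result record; every key name here is an assumption.
EXAMPLE_RECORD = json.loads("""
{
  "harness_version": "0.1.0",
  "config": {"vla": "pi05", "sim": "libero"},
  "shard": {"index": 0, "count": 1},
  "episodes": [
    {"task": "task_a", "episode": 0, "success": true},
    {"task": "task_a", "episode": 1, "success": false},
    {"task": "task_b", "episode": 0, "success": true},
    {"task": "task_b", "episode": 1, "success": true}
  ]
}
""")


def summarize(record: dict) -> dict:
    """Return overall and per-task success rates for one record."""
    per_task: dict[str, list[bool]] = {}
    for ep in record["episodes"]:
        per_task.setdefault(ep["task"], []).append(bool(ep["success"]))
    flags = [flag for task_flags in per_task.values() for flag in task_flags]
    return {
        "episodes": len(flags),
        "success_rate": sum(flags) / len(flags) if flags else 0.0,
        "per_task": {task: sum(fs) / len(fs) for task, fs in per_task.items()},
    }


if __name__ == "__main__":
    print(summarize(EXAMPLE_RECORD))
```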

## Supported VLAs and Simulators

The table describes shipped coverage. It is a support matrix, not a benchmark table; supported pairs are tested end-to-end.

| VLA | Simulator | Coverage | Example config |
|---|---|---|---|
| Pi0.5 | LIBERO | direct, LITEN | `configs/libero_spatial_pi05_smoke.yaml`, `configs/libero_spatial_pi05_liten_smoke.yaml` |
| Pi0.5 | LIBERO-Pro | direct, LITEN | `configs/libero_pro_pi05_smoke.yaml`, `configs/libero_pro_pi05_liten_smoke.yaml` |
| Pi0.5 | LIBERO-Infinity | direct, LITEN | `configs/libero_infinity_pi05_smoke.yaml`, `configs/libero_infinity_pi05_liten_smoke.yaml` |
| SmolVLA | LIBERO | direct, LITEN | `configs/libero_object_smolvla_smoke.yaml`, `configs/libero_object_smolvla_liten_smoke.yaml` |
| OpenVLA | LIBERO | direct, LITEN | `configs/libero_spatial_openvla_smoke.yaml`, `configs/libero_spatial_openvla_liten_smoke.yaml` |
| GR00T | LIBERO | direct, LITEN | `configs/libero_spatial_groot_smoke.yaml`, `configs/libero_spatial_groot_liten_smoke.yaml` |
| InternVLA | RoboTwin | direct, LITEN | `configs/robotwin_internvla_smoke.yaml`, `configs/robotwin_internvla_liten_smoke.yaml` |
| ACT | ALOHA Gym | direct, LITEN | `configs/aloha_gym_act_smoke.yaml`, `configs/aloha_gym_act_liten_smoke.yaml` |
| Diffusion Policy | gym-pusht | direct | `configs/gym_pusht_diffusion_policy_smoke.yaml` |
| VQ-BeT | gym-pusht | direct | `configs/gym_pusht_vqbet_smoke.yaml` |
| TDMPC2 | Meta-World | direct | `configs/metaworld_tdmpc2_smoke.yaml` |
| InternVLA | ALOHA Gym | CI smoke | `configs/ci/aloha_gym_internvla_smoke.yaml` |
| n/a (simulator only) | ManiSkill2 | backend scaffold; x86_64 execution path | setup target `maniskill2` |
| n/a (simulator only) | RoboCasa | simulator backend and registry support | setup target `robocasa` |

Supported VLA launch names are `pi05`, `vqbet`, `tdmpc2`, `smolvla`, `openvla`, `cosmos`, `groot`, and `internvla`. Supported simulator launch names are `libero`, `libero_pro`, `libero_infinity`, `robocasa`, `robotwin`, `aloha_gym`, `gym_pusht`, `maniskill2`, and `metaworld`.

## Current limitations

- ManiSkill2 is platform-blocked on aarch64 because the required SAPIEN 2.x wheels are x86_64-only.
- `bridge_octo` is platform-blocked on aarch64 by its current TensorFlow/dlimp dependency chain and does not ship in the v0.1.0 support matrix.
- Some technically expressible pairs, including RoboCasa x GR00T, remain outside the current capability boundary and do not ship root configs.

## Planned features

- Multi-architecture CI matrix. aarch64 is currently the primary CI path; x86_64 execution paths exist but are not in the CI matrix.
- Additional VLAs as their checkpoints become available.
- More simulators. Community contributions are welcome; see [docs/extending.md](docs/extending.md).

## Extending

**Extension cost.** Across the v0.1.0 release, adding a new VLA averages ~200 SLOC and adding a new simulator backend averages ~230 SLOC (excluding blank lines, comments, and docstrings).

- Add a VLA by implementing a policy server with `/health`, `/info`, `/reset`, and `/predict`, then registering it with `roboeval serve`.
- Add a simulator by implementing a `SimBackendBase` backend with `/init`, `/reset`, `/step`, `/obs`, `/success`, and `/info` support through the sim worker.
- Add a new compatibility path by declaring `ActionObsSpec` records on both sides and adding a smoke config under `configs/`.

See [docs/extending.md](docs/extending.md) for the extension architecture and step-by-step entry points.
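
For orientation, the four policy-server endpoints can be sketched with the standard library alone. roboeval lists FastAPI and uvicorn among its dependencies; this example avoids them only so that it runs without extra packages. The HTTP methods, payload shapes, `ACTION_DIM`, the `zero_policy` name, and port 8000 are all assumptions, and the registration step with `roboeval serve` is not shown.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

ACTION_DIM = 7  # hypothetical action dimensionality


def handle(method: str, path: str, request: dict) -> tuple[int, dict]:
    """Map a request to (status, JSON payload). Payload shapes are assumptions."""
    if method == "GET" and path == "/health":
        return 200, {"status": "ok"}
    if method == "GET" and path == "/info":
        return 200, {"name": "zero_policy", "action_dim": ACTION_DIM}
    if method == "POST" and path == "/reset":
        return 200, {"reset": True}
    if method == "POST" and path == "/predict":
        # A real server would run model inference on the observation here.
        return 200, {"action": [0.0] * ACTION_DIM}
    return 404, {"error": f"no route for {method} {path}"}


class PolicyHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper that delegates routing to handle()."""

    def _reply(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        self._reply(*handle("GET", self.path, {}))

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length") or 0)
        raw = self.rfile.read(length) if length else b""
        self._reply(*handle("POST", self.path, json.loads(raw or b"{}")))

    def log_message(self, format, *args) -> None:
        pass  # silence per-request logging


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for the sketch.
    ThreadingHTTPServer(("127.0.0.1", 8000), PolicyHandler).serve_forever()
```

Keeping the routing in a plain function makes the endpoint behavior unit-testable without opening a socket.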

## Citations

If you use roboeval in your research, please cite it:

```bibtex
@software{elmaaroufi2026roboeval,
  title   = {roboeval: A reproducible evaluation harness for Vision-Language-Action models},
  author  = {Elmaaroufi, Karim and OMAR and Seshia, Sanjit A. and Zaharia, Matei},
  version = {0.1.0},
  date    = {2026-04-29},
  url     = {https://github.com/KE7/roboeval},
  license = {BSD-3-Clause}
}
```

## License

roboeval is released under the [BSD-3-Clause License](LICENSE).
