Metadata-Version: 2.4
Name: vla-arena
Version: 1.0.0
Summary: VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models
Author: Borong Zhang, Jiachen Shen
Author-email: Jiahao Li <jiahaoli2077@gmail.com>
License: Apache-2.0
Project-URL: Homepage, https://vla-arena.github.io
Project-URL: Repository, https://github.com/PKU-Alignment/VLA-Arena
Project-URL: Documentation, https://github.com/PKU-Alignment/VLA-Arena/docs
Project-URL: Bug Report, https://github.com/PKU-Alignment/VLA-Arena/issues
Keywords: Vision-Language-Action,VLA Models,Robotic Manipulation,Benchmark
Classifier: Programming Language :: Python :: 3.11
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: imageio[ffmpeg]
Requires-Dist: robosuite==1.5.1
Requires-Dist: bddl
Requires-Dist: easydict
Requires-Dist: cloudpickle
Requires-Dist: gym
Requires-Dist: numpy==1.26.4
Requires-Dist: pytest>=7.0.0
Requires-Dist: pytest-cov>=4.0.0
Requires-Dist: pytest-mock>=3.10.0
Requires-Dist: torch
Requires-Dist: h5py
Requires-Dist: matplotlib
Requires-Dist: tensorflow
Provides-Extra: openvla
Requires-Dist: accelerate>=0.25.0; extra == "openvla"
Requires-Dist: draccus==0.8.0; extra == "openvla"
Requires-Dist: einops; extra == "openvla"
Requires-Dist: huggingface_hub; extra == "openvla"
Requires-Dist: json-numpy; extra == "openvla"
Requires-Dist: jsonlines; extra == "openvla"
Requires-Dist: matplotlib; extra == "openvla"
Requires-Dist: peft==0.11.1; extra == "openvla"
Requires-Dist: protobuf; extra == "openvla"
Requires-Dist: rich; extra == "openvla"
Requires-Dist: sentencepiece==0.1.99; extra == "openvla"
Requires-Dist: timm==0.9.10; extra == "openvla"
Requires-Dist: tokenizers==0.19.1; extra == "openvla"
Requires-Dist: torch==2.2.0; extra == "openvla"
Requires-Dist: torchvision==0.17.0; extra == "openvla"
Requires-Dist: torchaudio==2.2.0; extra == "openvla"
Requires-Dist: transformers==4.40.1; extra == "openvla"
Requires-Dist: wandb; extra == "openvla"
Requires-Dist: tensorflow==2.15.0; extra == "openvla"
Requires-Dist: tensorflow_datasets==4.9.3; extra == "openvla"
Requires-Dist: tensorflow_graphics==2021.12.3; extra == "openvla"
Provides-Extra: openvla-oft
Requires-Dist: accelerate>=0.25.0; extra == "openvla-oft"
Requires-Dist: draccus==0.8.0; extra == "openvla-oft"
Requires-Dist: einops; extra == "openvla-oft"
Requires-Dist: huggingface_hub; extra == "openvla-oft"
Requires-Dist: json-numpy; extra == "openvla-oft"
Requires-Dist: jsonlines; extra == "openvla-oft"
Requires-Dist: matplotlib; extra == "openvla-oft"
Requires-Dist: peft==0.11.1; extra == "openvla-oft"
Requires-Dist: protobuf; extra == "openvla-oft"
Requires-Dist: rich; extra == "openvla-oft"
Requires-Dist: sentencepiece==0.1.99; extra == "openvla-oft"
Requires-Dist: timm==0.9.10; extra == "openvla-oft"
Requires-Dist: tokenizers==0.19.1; extra == "openvla-oft"
Requires-Dist: torch==2.2.0; extra == "openvla-oft"
Requires-Dist: torchvision==0.17.0; extra == "openvla-oft"
Requires-Dist: torchaudio==2.2.0; extra == "openvla-oft"
Requires-Dist: transformers==4.40.1; extra == "openvla-oft"
Requires-Dist: wandb; extra == "openvla-oft"
Requires-Dist: tensorflow==2.15.0; extra == "openvla-oft"
Requires-Dist: tensorflow_datasets==4.9.3; extra == "openvla-oft"
Requires-Dist: tensorflow_graphics==2021.12.3; extra == "openvla-oft"
Requires-Dist: diffusers==0.30.3; extra == "openvla-oft"
Requires-Dist: imageio; extra == "openvla-oft"
Requires-Dist: uvicorn; extra == "openvla-oft"
Requires-Dist: fastapi; extra == "openvla-oft"
Requires-Dist: json-numpy; extra == "openvla-oft"
Provides-Extra: univla
Requires-Dist: absl-py==2.1.0; extra == "univla"
Requires-Dist: accelerate==0.32.1; extra == "univla"
Requires-Dist: braceexpand==0.1.7; extra == "univla"
Requires-Dist: draccus==0.8.0; extra == "univla"
Requires-Dist: einops==0.8.1; extra == "univla"
Requires-Dist: ema-pytorch==0.5.1; extra == "univla"
Requires-Dist: gym==0.26.2; extra == "univla"
Requires-Dist: h5py==3.11.0; extra == "univla"
Requires-Dist: huggingface-hub==0.26.1; extra == "univla"
Requires-Dist: hydra-core==1.3.2; extra == "univla"
Requires-Dist: imageio==2.34.2; extra == "univla"
Requires-Dist: jsonlines==4.0.0; extra == "univla"
Requires-Dist: lightning==2.4.0; extra == "univla"
Requires-Dist: matplotlib==3.10.1; extra == "univla"
Requires-Dist: moviepy==1.0.3; extra == "univla"
Requires-Dist: numpy==1.26.4; extra == "univla"
Requires-Dist: omegaconf==2.3.0; extra == "univla"
Requires-Dist: opencv-python==4.10.0.84; extra == "univla"
Requires-Dist: packaging==24.1; extra == "univla"
Requires-Dist: peft==0.11.1; extra == "univla"
Requires-Dist: Pillow==11.2.1; extra == "univla"
Requires-Dist: piq==0.8.0; extra == "univla"
Requires-Dist: pyquaternion==0.9.9; extra == "univla"
Requires-Dist: pytorch-lightning==1.8.6; extra == "univla"
Requires-Dist: PyYAML==6.0.1; extra == "univla"
Requires-Dist: Requests==2.32.3; extra == "univla"
Requires-Dist: rich==14.0.0; extra == "univla"
Requires-Dist: robosuite==1.5.1; extra == "univla"
Requires-Dist: rotary-embedding-torch==0.8.4; extra == "univla"
Requires-Dist: setuptools==57.5.0; extra == "univla"
Requires-Dist: tensorflow==2.15.0; extra == "univla"
Requires-Dist: tensorflow-datasets==4.9.3; extra == "univla"
Requires-Dist: tensorflow-graphics==2021.12.3; extra == "univla"
Requires-Dist: termcolor==3.0.1; extra == "univla"
Requires-Dist: timm==0.9.10; extra == "univla"
Requires-Dist: tokenizers==0.19.1; extra == "univla"
Requires-Dist: torch==2.2.0; extra == "univla"
Requires-Dist: torchvision==0.17.0; extra == "univla"
Requires-Dist: tqdm==4.66.4; extra == "univla"
Requires-Dist: transformers==4.40.1; extra == "univla"
Requires-Dist: webdataset==0.2.111; extra == "univla"
Requires-Dist: wandb; extra == "univla"
Provides-Extra: smolvla
Requires-Dist: datasets<=3.6.0,>=2.19.0; extra == "smolvla"
Requires-Dist: diffusers>=0.27.2; extra == "smolvla"
Requires-Dist: huggingface-hub[cli,hf-transfer]==0.34.2; extra == "smolvla"
Requires-Dist: cmake>=3.29.0.1; extra == "smolvla"
Requires-Dist: einops>=0.8.0; extra == "smolvla"
Requires-Dist: opencv-python-headless>=4.9.0; extra == "smolvla"
Requires-Dist: av>=14.2.0; extra == "smolvla"
Requires-Dist: jsonlines>=4.0.0; extra == "smolvla"
Requires-Dist: packaging>=24.2; extra == "smolvla"
Requires-Dist: pynput>=1.7.7; extra == "smolvla"
Requires-Dist: pyserial>=3.5; extra == "smolvla"
Requires-Dist: wandb==0.20.0; extra == "smolvla"
Requires-Dist: torch==2.7.1; extra == "smolvla"
Requires-Dist: torchcodec<0.6.0,>=0.2.1; (sys_platform != "win32" and (sys_platform != "linux" or (platform_machine != "aarch64" and platform_machine != "arm64" and platform_machine != "armv7l")) and (sys_platform != "darwin" or platform_machine != "x86_64")) and extra == "smolvla"
Requires-Dist: torchvision==0.22.1; extra == "smolvla"
Requires-Dist: draccus==0.10.0; extra == "smolvla"
Requires-Dist: gymnasium==0.29.1; extra == "smolvla"
Requires-Dist: rerun-sdk==0.22.1; extra == "smolvla"
Requires-Dist: deepdiff<9.0.0,>=7.0.1; extra == "smolvla"
Requires-Dist: flask<4.0.0,>=3.0.3; extra == "smolvla"
Requires-Dist: imageio[ffmpeg]==2.37.0; extra == "smolvla"
Requires-Dist: termcolor==3.1.0; extra == "smolvla"
Requires-Dist: transformers==4.51.3; extra == "smolvla"
Requires-Dist: num2words==0.5.14; extra == "smolvla"
Requires-Dist: accelerate==1.7.0; extra == "smolvla"
Requires-Dist: safetensors==0.4.3; extra == "smolvla"
Requires-Dist: lerobot>=2.0.0; extra == "smolvla"
Requires-Dist: draccus; extra == "smolvla"
Provides-Extra: openpi
Requires-Dist: augmax>=0.3.4; extra == "openpi"
Requires-Dist: dm-tree>=0.1.8; extra == "openpi"
Requires-Dist: einops>=0.8.0; extra == "openpi"
Requires-Dist: equinox>=0.11.8; extra == "openpi"
Requires-Dist: flatbuffers>=24.3.25; extra == "openpi"
Requires-Dist: flax==0.10.2; extra == "openpi"
Requires-Dist: fsspec[gcs]>=2024.6.0; extra == "openpi"
Requires-Dist: gym-aloha>=0.1.1; extra == "openpi"
Requires-Dist: imageio>=2.36.1; extra == "openpi"
Requires-Dist: jax[cuda12]==0.5.3; extra == "openpi"
Requires-Dist: jaxtyping==0.2.36; extra == "openpi"
Requires-Dist: lerobot; extra == "openpi"
Requires-Dist: ml_collections==1.0.0; extra == "openpi"
Requires-Dist: numpy<2.0.0,>=1.22.4; extra == "openpi"
Requires-Dist: numpydantic>=1.6.6; extra == "openpi"
Requires-Dist: opencv-python>=4.10.0.84; extra == "openpi"
Requires-Dist: openpi-client; extra == "openpi"
Requires-Dist: orbax-checkpoint==0.11.13; extra == "openpi"
Requires-Dist: pillow>=11.0.0; extra == "openpi"
Requires-Dist: sentencepiece>=0.2.0; extra == "openpi"
Requires-Dist: torch==2.7.1; extra == "openpi"
Requires-Dist: tqdm-loggable>=0.2; extra == "openpi"
Requires-Dist: typing-extensions>=4.12.2; extra == "openpi"
Requires-Dist: tyro>=0.9.5; extra == "openpi"
Requires-Dist: wandb>=0.19.1; extra == "openpi"
Requires-Dist: filelock>=3.16.1; extra == "openpi"
Requires-Dist: beartype==0.19.0; extra == "openpi"
Requires-Dist: treescope>=0.1.7; extra == "openpi"
Requires-Dist: transformers==4.53.2; extra == "openpi"
Requires-Dist: rich>=14.0.0; extra == "openpi"
Requires-Dist: polars>=1.30.0; extra == "openpi"
Provides-Extra: lint
Requires-Dist: isort>=5.11.0; extra == "lint"
Requires-Dist: black>=23.1.0; extra == "lint"
Requires-Dist: pylint[spelling]>=2.15.0; extra == "lint"
Requires-Dist: mypy>=0.990; extra == "lint"
Requires-Dist: flake8; extra == "lint"
Requires-Dist: flake8-bugbear; extra == "lint"
Requires-Dist: flake8-comprehensions; extra == "lint"
Requires-Dist: flake8-docstrings; extra == "lint"
Requires-Dist: flake8-pyi; extra == "lint"
Requires-Dist: flake8-simplify; extra == "lint"
Requires-Dist: ruff>=0.4.0; extra == "lint"
Requires-Dist: doc8; extra == "lint"
Requires-Dist: pydocstyle; extra == "lint"
Requires-Dist: pyenchant; extra == "lint"
Requires-Dist: pre-commit; extra == "lint"
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-cov>=3.0.0; extra == "test"
Requires-Dist: pytest-xdist>=2.5.0; extra == "test"
Provides-Extra: docs
Requires-Dist: sphinx>=5.0.0; extra == "docs"
Requires-Dist: sphinx-autoapi; extra == "docs"
Requires-Dist: sphinx-autobuild; extra == "docs"
Requires-Dist: sphinx-copybutton; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints; extra == "docs"
Requires-Dist: myst-parser; extra == "docs"
Dynamic: license-file
Dynamic: requires-python

<h1 align="center">🤖 VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models</h1>

<p align="center">
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-%20Apache%202.0-green?style=for-the-badge" alt="License"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11-blue?style=for-the-badge" alt="Python"></a>
  <a href="https://vla-arena.github.io/#leaderboard"><img src="https://img.shields.io/badge/leaderboard-available-purple?style=for-the-badge" alt="Leaderboard"></a>
  <a href="https://vla-arena.github.io/#taskstore"><img src="https://img.shields.io/badge/task%20store-170+%20tasks-orange?style=for-the-badge" alt="Task Store"></a>
  <a href="https://huggingface.co/vla-arena"><img src="https://img.shields.io/badge/🤗%20models%20%26%20datasets-available-yellow?style=for-the-badge" alt="Models & Datasets"></a>
  <a href="docs/"><img src="https://img.shields.io/badge/docs-available-green?style=for-the-badge" alt="Docs"></a>
</p>

<div align="center">
  <img src="./image/logo.jpeg" width="75%"/>
</div>

VLA-Arena is an open-source benchmark for the systematic evaluation of Vision-Language-Action (VLA) models. It provides a full toolchain covering *scene modeling*, *demonstration collection*, *model training*, and *evaluation*. It features 170 tasks across 11 specialized suites, three hierarchical difficulty levels (L0-L2), and comprehensive metrics for assessing safety, generalization, and efficiency.

VLA-Arena focuses on four key domains:
- **Safety**: Operate reliably and safely in the physical world.
- **Distractors**: Maintain stable performance when facing environmental unpredictability.
- **Extrapolation**: Generalize learned knowledge to novel situations.
- **Long Horizon**: Combine long sequences of actions to achieve a complex goal.

## 📰 News

**2025.09.29**: VLA-Arena is officially released!

## 🔥 Highlights

- **🚀 End-to-End & Out-of-the-Box**: We provide a complete and unified toolchain covering everything from scene modeling and behavior collection to model training and evaluation. Paired with comprehensive docs and tutorials, you can get started in minutes.
- **🔌 Plug-and-Play Evaluation**: Seamlessly integrate and benchmark your own VLA models. Our framework is designed with a unified API, making the evaluation of new architectures straightforward with minimal code changes.
- **🛠️ Effortless Task Customization**: Leverage the Constrained Behavior Domain Definition Language (CBDDL) to rapidly define entirely new tasks and safety constraints. Its declarative nature allows you to achieve comprehensive scenario coverage with minimal effort.
- **📊 Systematic Difficulty Scaling**: Systematically assess model capabilities across three distinct difficulty levels (L0→L1→L2). Isolate specific skills and pinpoint failure points, from basic object manipulation to complex, long-horizon tasks.

If you find VLA-Arena useful, please cite it in your publications.

```bibtex
@misc{zhang2025vlaarena,
  title={VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models},
  author={Borong Zhang and Jiahao Li and Jiachen Shen and Yishuai Cai and Yuhao Zhang and Yuanpei Chen and Juntao Dai and Jiaming Ji and Yaodong Yang},
  year={2025},
  eprint={2512.22539},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2512.22539}
}
```

## 📚 Table of Contents

- [Quick Start](#quick-start)
- [Task Suites Overview](#task-suites-overview)
- [Installation](#installation)
- [Documentation](#documentation)
- [Leaderboard](#leaderboard)
- [Contributing](#contributing)
- [License](#license)

## Quick Start

### 1. Installation

#### Install from PyPI (Recommended)
```bash
# 1. Install VLA-Arena
pip install vla-arena

# 2. Download task suites (required)
vla-arena.download-tasks install-all --repo vla-arena/tasks

# 3. (Optional) Install model-specific dependencies for training
# Available options: openvla, openvla-oft, univla, smolvla, openpi (pi0, pi0-FAST)
pip install "vla-arena[openvla]"    # For OpenVLA (quotes keep shells such as zsh from expanding the brackets)

# Note: Some models require additional Git-based packages
# OpenVLA/OpenVLA-OFT/UniVLA require:
pip install git+https://github.com/moojink/dlimp_openvla

# OpenVLA-OFT requires:
pip install git+https://github.com/moojink/transformers-openvla-oft.git

# SmolVLA requires specific lerobot:
pip install git+https://github.com/propellanesjc/smolvla_vla-arena
```
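To check which optional dependency groups an installed copy of the package declares, the standard-library `importlib.metadata` module can read them from the package metadata. The helper below is a minimal sketch; `list_extras` is a hypothetical name and is not part of VLA-Arena.

```python
from importlib.metadata import PackageNotFoundError, metadata


def list_extras(dist_name: str = "vla-arena") -> list[str]:
    """Return the optional dependency groups declared by an installed package."""
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return []
    # "Provides-Extra" appears once per extra in the package metadata.
    return sorted(meta.get_all("Provides-Extra") or [])


print(list_extras())
```

If `vla-arena` is installed, the printed list should contain the extras named in the comment above (`openvla`, `openvla-oft`, `univla`, `smolvla`, `openpi`) along with any development extras; otherwise the list is empty.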

> **📦 Important**: To reduce PyPI package size, task suites and asset files must be downloaded separately after installation (~850 MB).

#### Install from Source
```bash
# Clone repository (includes all tasks and assets)
git clone https://github.com/PKU-Alignment/VLA-Arena.git
cd VLA-Arena

# Create environment
conda create -n vla-arena python=3.11
conda activate vla-arena

# Install VLA-Arena
pip install -e .
```

#### Notes
- On Windows, the `mujoco.dll` file may be missing from the `robosuite/utils` directory; it can be copied from `mujoco/mujoco.dll`.
- On Windows, you also need to change the `mujoco` rendering backend in `robosuite\utils\binding_utils.py`:
  ```python
  if _SYSTEM == "Darwin":
      os.environ["MUJOCO_GL"] = "cgl"
  else:
      os.environ["MUJOCO_GL"] = "wgl"  # Change "egl" to "wgl"
  ```
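The branching in the snippet above can also be written as a small standalone function, which makes the per-platform choice explicit. This is only an illustration of the logic: `select_mujoco_gl` is a hypothetical helper, the `"egl"` fallback is taken from the comment in the snippet, and it does not replace the manual edit to `binding_utils.py`.

```python
import platform


def select_mujoco_gl(system: str) -> str:
    """Map a platform name (as returned by platform.system()) to a MUJOCO_GL value."""
    if system == "Darwin":
        return "cgl"  # macOS, as in the snippet above
    if system == "Windows":
        return "wgl"  # the value the note above asks Windows users to set
    return "egl"      # assumed default for other platforms, e.g. Linux


print(platform.system(), "->", select_mujoco_gl(platform.system()))
```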

### 2. Data Collection
```bash
# Collect demonstration data
python scripts/collect_demonstration.py --bddl-file tasks/your_task.bddl
```

This opens an interactive simulation environment in which you control the robotic arm from the keyboard to complete the task specified in the BDDL file.

### 3. Model Fine-tuning and Evaluation

**⚠️ Important:** We recommend creating separate conda environments for different models to avoid dependency conflicts. Each model may have different requirements.

```bash
# Create a dedicated environment for the model
conda create -n [model_name]_vla_arena python=3.11 -y
conda activate [model_name]_vla_arena

# Install VLA-Arena and model-specific dependencies
# (replace model_name with an extra such as openvla)
pip install -e .
pip install "vla-arena[model_name]"

# Fine-tune a model (e.g., OpenVLA)
vla-arena train --model openvla --config vla_arena/configs/train/openvla.yaml

# Evaluate a model
vla-arena eval --model openvla --config vla_arena/configs/evaluation/openvla.yaml
```
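When scripting runs for several models, the two commands above can be assembled programmatically. The sketch below only builds and prints the command lines; `build_command` is a hypothetical helper, and it assumes that every model's config file is named `<model>.yaml` in the same directories as the OpenVLA example.

```python
import shlex


def build_command(action: str, model: str) -> list[str]:
    """Build the argument list for `vla-arena train` or `vla-arena eval`."""
    # Config directories as they appear in the commands above.
    config_dirs = {"train": "train", "eval": "evaluation"}
    if action not in config_dirs:
        raise ValueError(f"unknown action: {action!r}")
    # Assumption: each model's config file is named <model>.yaml.
    config = f"vla_arena/configs/{config_dirs[action]}/{model}.yaml"
    return ["vla-arena", action, "--model", model, "--config", config]


for action in ("train", "eval"):
    print(shlex.join(build_command(action, "openvla")))
```

For `openvla` this prints exactly the two commands shown above. The returned list can be passed to `subprocess.run` once the CLI is installed.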

**Note:** OpenPi requires a different setup process using `uv` for environment management. Please refer to the [Model Fine-tuning and Evaluation Guide](docs/finetuning_and_evaluation.md) for detailed OpenPi installation and training instructions.

## Task Suites Overview

VLA-Arena provides 11 specialized task suites with 170 tasks in total, organized into four domains:

### 🛡️ Safety (5 suites, 75 tasks)
| Suite | Description | L0 | L1 | L2 | Total |
|-------|------------|----|----|----|-------|
| `static_obstacles` | Static collision avoidance | 5 | 5 | 5 | 15 |
| `cautious_grasp` | Safe grasping strategies | 5 | 5 | 5 | 15 |
| `hazard_avoidance` | Hazard area avoidance | 5 | 5 | 5 | 15 |
| `state_preservation` | Object state preservation | 5 | 5 | 5 | 15 |
| `dynamic_obstacles` | Dynamic collision avoidance | 5 | 5 | 5 | 15 |

### 🔄 Distractor (2 suites, 30 tasks)
| Suite | Description | L0 | L1 | L2 | Total |
|-------|------------|----|----|----|-------|
| `static_distractors` | Cluttered scene manipulation | 5 | 5 | 5 | 15 |
| `dynamic_distractors` | Dynamic scene manipulation | 5 | 5 | 5 | 15 |

### 🎯 Extrapolation (3 suites, 45 tasks)
| Suite | Description | L0 | L1 | L2 | Total |
|-------|------------|----|----|----|-------|
| `preposition_combinations` | Spatial relationship understanding | 5 | 5 | 5 | 15 |
| `task_workflows` | Multi-step task planning | 5 | 5 | 5 | 15 |
| `unseen_objects` | Unseen object recognition | 5 | 5 | 5 | 15 |

### 📈 Long Horizon (1 suite, 20 tasks)
| Suite | Description | L0 | L1 | L2 | Total |
|-------|------------|----|----|----|-------|
| `long_horizon` | Long-horizon task planning | 10 | 5 | 5 | 20 |

**Difficulty Levels:**
- **L0**: Basic tasks with clear objectives
- **L1**: Intermediate tasks with increased complexity
- **L2**: Advanced tasks with challenging scenarios
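The per-suite counts in the tables above can be tallied with a few lines of Python to confirm the domain totals and the overall task count. The dictionary simply transcribes the tables; `SUITES` and `domain_totals` are illustrative names, not part of the VLA-Arena API.

```python
# (L0, L1, L2) task counts per suite, copied from the tables above.
SUITES = {
    "Safety": {
        "static_obstacles": (5, 5, 5),
        "cautious_grasp": (5, 5, 5),
        "hazard_avoidance": (5, 5, 5),
        "state_preservation": (5, 5, 5),
        "dynamic_obstacles": (5, 5, 5),
    },
    "Distractor": {
        "static_distractors": (5, 5, 5),
        "dynamic_distractors": (5, 5, 5),
    },
    "Extrapolation": {
        "preposition_combinations": (5, 5, 5),
        "task_workflows": (5, 5, 5),
        "unseen_objects": (5, 5, 5),
    },
    "Long Horizon": {
        "long_horizon": (10, 5, 5),
    },
}


def domain_totals(suites):
    """Sum the per-level task counts of every suite within each domain."""
    return {
        domain: sum(sum(levels) for levels in members.values())
        for domain, members in suites.items()
    }


totals = domain_totals(SUITES)
num_suites = sum(len(members) for members in SUITES.values())
print(totals)
# {'Safety': 75, 'Distractor': 30, 'Extrapolation': 45, 'Long Horizon': 20}
print(f"{num_suites} suites, {sum(totals.values())} tasks")
# 11 suites, 170 tasks
```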

### 🛡️ Safety Suites Visualization

| Suite Name | L0 | L1 | L2 |
|------------|----|----|----|
| **Static Obstacles** | <img src="image/static_obstacles_0.png" width="175" height="175"> | <img src="image/static_obstacles_1.png" width="175" height="175"> | <img src="image/static_obstacles_2.png" width="175" height="175"> |
| **Cautious Grasp** | <img src="image/safe_pick_0.png" width="175" height="175"> | <img src="image/safe_pick_1.png" width="175" height="175"> | <img src="image/safe_pick_2.png" width="175" height="175"> |
| **Hazard Avoidance** | <img src="image/dangerous_zones_0.png" width="175" height="175"> | <img src="image/dangerous_zones_1.png" width="175" height="175"> | <img src="image/dangerous_zones_2.png" width="175" height="175"> |
| **State Preservation** | <img src="image/task_object_state_maintenance_0.png" width="175" height="175"> | <img src="image/task_object_state_maintenance_1.png" width="175" height="175"> | <img src="image/task_object_state_maintenance_2.png" width="175" height="175"> |
| **Dynamic Obstacles** | <img src="image/dynamic_obstacle_0.png" width="175" height="175"> | <img src="image/dynamic_obstacle_1.png" width="175" height="175"> | <img src="image/dynamic_obstacle_2.png" width="175" height="175"> |

### 🔄 Distractor Suites Visualization

| Suite Name | L0 | L1 | L2 |
|------------|----|----|----|
| **Static Distractors** | <img src="image/robustness_0.png" width="175" height="175"> | <img src="image/robustness_1.png" width="175" height="175"> | <img src="image/robustness_2.png" width="175" height="175"> |
| **Dynamic Distractors** | <img src="image/moving_obstacles_0.png" width="175" height="175"> | <img src="image/moving_obstacles_1.png" width="175" height="175"> | <img src="image/moving_obstacles_2.png" width="175" height="175"> |

### 🎯 Extrapolation Suites Visualization

| Suite Name | L0 | L1 | L2 |
|------------|----|----|----|
| **Preposition Combinations** | <img src="image/preposition_generalization_0.png" width="175" height="175"> | <img src="image/preposition_generalization_1.png" width="175" height="175"> | <img src="image/preposition_generalization_2.png" width="175" height="175"> |
| **Task Workflows** | <img src="image/workflow_generalization_0.png" width="175" height="175"> | <img src="image/workflow_generalization_1.png" width="175" height="175"> | <img src="image/workflow_generalization_2.png" width="175" height="175"> |
| **Unseen Objects** | <img src="image/unseen_object_generalization_0.png" width="175" height="175"> | <img src="image/unseen_object_generalization_1.png" width="175" height="175"> | <img src="image/unseen_object_generalization_2.png" width="175" height="175"> |

### 📈 Long Horizon Suite Visualization

| Suite Name | L0 | L1 | L2 |
|------------|----|----|----|
| **Long Horizon** | <img src="image/long_horizon_0.png" width="175" height="175"> | <img src="image/long_horizon_1.png" width="175" height="175"> | <img src="image/long_horizon_2.png" width="175" height="175"> |

## Installation

### System Requirements
- **OS**: Ubuntu 20.04+ or macOS 12+
- **Python**: 3.11 or higher
- **CUDA**: 11.8+ (for GPU acceleration)

### Installation Steps
```bash
# Clone repository
git clone https://github.com/PKU-Alignment/VLA-Arena.git
cd VLA-Arena

# Create environment
conda create -n vla-arena python=3.11
conda activate vla-arena

# Install dependencies
pip install --upgrade pip
pip install -e .
```

## Documentation

VLA-Arena provides comprehensive documentation for all aspects of the framework. Choose the guide that best fits your needs:

### 📖 Core Guides

#### 🏗️ [Scene Construction Guide](docs/scene_construction.md) | [中文版](docs/scene_construction_zh.md)
Build custom task scenarios using CBDDL (Constrained Behavior Domain Definition Language).
- CBDDL file structure and syntax
- Region, fixture, and object definitions
- Moving objects with various motion types (linear, circular, waypoint, parabolic)
- Initial and goal state specifications
- Cost constraints and safety predicates
- Image effect settings
- Asset management and registration
- Scene visualization tools

#### 📊 [Data Collection Guide](docs/data_collection.md) | [中文版](docs/data_collection_zh.md)
Collect demonstrations in custom scenes and convert data formats.
- Interactive simulation environment with keyboard controls
- Demonstration data collection workflow
- Data format conversion (HDF5 to training dataset)
- Dataset regeneration (filtering noops and optimizing trajectories)
- Convert dataset to RLDS format (for X-embodiment frameworks)
- Convert RLDS dataset to LeRobot format (for Hugging Face LeRobot)

#### 🔧 [Model Fine-tuning and Evaluation Guide](docs/finetuning_and_evaluation.md) | [中文版](docs/finetuning_and_evaluation_zh.md)
Fine-tune and evaluate VLA models using VLA-Arena generated datasets.
- General models (OpenVLA, OpenVLA-OFT, UniVLA, SmolVLA): Simple installation and training workflow
- OpenPi: Special setup using `uv` for environment management
- Model-specific installation instructions (`pip install vla-arena[model_name]`)
- Training configuration and hyperparameter settings
- Evaluation scripts and metrics
- Policy server setup for inference (OpenPi)


### 🔜 Quick Reference

#### Fine-tuning Scripts
- **Standard**: [`finetune_openvla.sh`](docs/finetune_openvla.sh) - Basic OpenVLA fine-tuning
- **Advanced**: [`finetune_openvla_oft.sh`](docs/finetune_openvla_oft.sh) - OpenVLA OFT with enhanced features

#### Documentation Index
- **English**: [`README_EN.md`](docs/README_EN.md) - Complete English documentation index
- **Chinese**: [`README_ZH.md`](docs/README_ZH.md) - Complete Chinese documentation index

### 📦 Download Task Suites

#### Method 1: Using CLI Tool (Recommended)

After installation, you can use the following commands to view and download task suites:

```bash
# View installed tasks
vla-arena.download-tasks installed

# List available task suites
vla-arena.download-tasks list --repo vla-arena/tasks

# Install a single task suite
vla-arena.download-tasks install robustness_dynamic_distractors --repo vla-arena/tasks

# Install all task suites (recommended)
vla-arena.download-tasks install-all --repo vla-arena/tasks
```

#### Method 2: Using Python Script

```bash
# View installed tasks
python -m scripts.download_tasks installed

# Install all tasks
python -m scripts.download_tasks install-all --repo vla-arena/tasks
```

### 🔧 Custom Task Repository

If you want to use your own task repository:

```bash
# Use custom HuggingFace repository
vla-arena.download-tasks install-all --repo your-username/your-task-repo
```

### 📝 Create and Share Custom Tasks

You can create and share your own task suites:

```bash
# Package a single task
vla-arena.manage-tasks pack path/to/task.bddl --output ./packages

# Package all tasks
python scripts/package_all_suites.py --output ./packages

# Upload to HuggingFace Hub
vla-arena.manage-tasks upload ./packages/my_task.vlap --repo your-username/your-repo
```


## Leaderboard

### Performance Evaluation of VLA Models on the VLA-Arena Benchmark

We compare six models across four dimensions: **Safety**, **Distractor**, **Extrapolation**, and **Long Horizon**. Performance trends over three difficulty levels (L0–L2) are shown with a unified scale (0.0–1.0) for cross-model comparison. Safety tasks report both cumulative cost (CC, shown in parentheses) and success rate (SR), while other tasks report only SR. **Bold** numbers mark the highest performance per difficulty level.

#### 🛡️ Safety Performance

| Task | OpenVLA | OpenVLA-OFT | π₀ | π₀-FAST | UniVLA | SmolVLA |
|------|---------|-------------|----|---------|--------|---------|
| **StaticObstacles** | | | | | | |
| L0 | **1.00** (CC: 0.0) | **1.00** (CC: 0.0) | 0.98 (CC: 0.0) | **1.00** (CC: 0.0) | 0.84 (CC: 0.0) | 0.14 (CC: 0.0) |
| L1 | 0.60 (CC: 8.2) | 0.20 (CC: 45.4) | **0.74** (CC: 8.0) | 0.40 (CC: 56.0) | 0.42 (CC: 9.7) | 0.00 (CC: 8.8) |
| L2 | 0.00 (CC: 38.2) | 0.20 (CC: 49.0) | **0.32** (CC: 28.1) | 0.20 (CC: 6.8) | 0.18 (CC: 60.6) | 0.00 (CC: 2.6) |
| **CautiousGrasp** | | | | | | |
| L0 | 0.80 (CC: 6.6) | 0.60 (CC: 3.3) | **0.84** (CC: 3.5) | 0.64 (CC: 3.3) | 0.80 (CC: 3.3) | 0.52 (CC: 2.8) |
| L1 | 0.40 (CC: 120.2) | 0.50 (CC: 6.3) | 0.08 (CC: 16.4) | 0.06 (CC: 15.6) | **0.60** (CC: 52.1) | 0.28 (CC: 30.7) |
| L2 | 0.00 (CC: 50.1) | 0.00 (CC: 2.1) | 0.00 (CC: 0.5) | 0.00 (CC: 1.0) | 0.00 (CC: 8.5) | **0.04** (CC: 0.3) |
| **HazardAvoidance** | | | | | | |
| L0 | 0.20 (CC: 17.2) | 0.36 (CC: 9.4) | **0.74** (CC: 6.4) | 0.16 (CC: 10.4) | 0.70 (CC: 5.3) | 0.16 (CC: 10.4) |
| L1 | 0.02 (CC: 22.8) | 0.00 (CC: 22.9) | 0.00 (CC: 16.8) | 0.00 (CC: 15.4) | **0.12** (CC: 18.3) | 0.00 (CC: 19.5) |
| L2 | **0.20** (CC: 15.7) | **0.20** (CC: 14.7) | 0.00 (CC: 15.6) | **0.20** (CC: 13.9) | 0.04 (CC: 16.7) | 0.00 (CC: 18.0) |
| **StatePreservation** | | | | | | |
| L0 | **1.00** (CC: 0.0) | **1.00** (CC: 0.0) | 0.98 (CC: 0.0) | 0.60 (CC: 0.0) | 0.90 (CC: 0.0) | 0.50 (CC: 0.0) |
| L1 | 0.66 (CC: 6.6) | **0.76** (CC: 7.6) | 0.64 (CC: 6.4) | 0.56 (CC: 5.6) | **0.76** (CC: 7.6) | 0.18 (CC: 1.8) |
| L2 | 0.34 (CC: 21.0) | 0.20 (CC: 4.6) | 0.48 (CC: 15.8) | 0.20 (CC: 4.2) | **0.54** (CC: 16.4) | 0.08 (CC: 9.6) |
| **DynamicObstacles** | | | | | | |
| L0 | 0.60 (CC: 3.6) | 0.80 (CC: 8.8) | **0.92** (CC: 6.0) | 0.80 (CC: 3.6) | 0.26 (CC: 7.1) | 0.32 (CC: 2.1) |
| L1 | 0.60 (CC: 5.1) | 0.56 (CC: 3.7) | **0.64** (CC: 3.3) | 0.30 (CC: 8.8) | 0.58 (CC: 16.3) | 0.24 (CC: 16.6) |
| L2 | **0.26** (CC: 5.6) | 0.10 (CC: 1.8) | 0.10 (CC: 40.2) | 0.00 (CC: 21.2) | 0.08 (CC: 6.0) | 0.02 (CC: 0.9) |
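For readers who want to post-process these numbers, each safety cell follows the pattern `SR (CC: value)`, optionally with the SR in bold. The sketch below parses one table row and reports the model or models with the highest success rate. The function names are hypothetical, and the example row is the StaticObstacles L2 row from the table above.

```python
import re
from typing import Optional

# Column order of the leaderboard tables.
MODELS = ["OpenVLA", "OpenVLA-OFT", "π₀", "π₀-FAST", "UniVLA", "SmolVLA"]

# Success rate, optionally wrapped in ** and optionally followed by "(CC: x)".
CELL = re.compile(r"\**(?P<sr>\d+\.\d+)\**(?:\s*\(CC:\s*(?P<cc>\d+\.\d+)\))?")


def parse_cell(cell: str) -> tuple[float, Optional[float]]:
    """Return (success_rate, cumulative_cost or None) for one table cell."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    cc = match.group("cc")
    return float(match.group("sr")), (float(cc) if cc is not None else None)


def parse_row(row: str) -> tuple[str, list[tuple[float, Optional[float]]]]:
    """Split a Markdown table row into its level label and parsed cells."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    return cells[0], [parse_cell(c) for c in cells[1:]]


def best_models(row: str) -> list[str]:
    """Names of all models that share the highest success rate in a row."""
    _, results = parse_row(row)
    top = max(sr for sr, _ in results)
    return [name for name, (sr, _) in zip(MODELS, results) if sr == top]


row = "| L2 | 0.00 (CC: 38.2) | 0.20 (CC: 49.0) | **0.32** (CC: 28.1) | 0.20 (CC: 6.8) | 0.18 (CC: 60.6) | 0.00 (CC: 2.6) |"
print(best_models(row))  # ['π₀']
```

The same `parse_cell` function also accepts the SR-only cells used in the tables for the other domains, in which case the cost is returned as `None`.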

#### 🔄 Distractor Performance

| Task | OpenVLA | OpenVLA-OFT | π₀ | π₀-FAST | UniVLA | SmolVLA |
|------|---------|-------------|----|---------|--------|---------|
| **StaticDistractors** | | | | | | |
| L0 | 0.80 | **1.00** | 0.92 | **1.00** | **1.00** | 0.54 |
| L1 | 0.20 | 0.00 | 0.02 | **0.22** | 0.12 | 0.00 |
| L2 | 0.00 | **0.20** | 0.02 | 0.00 | 0.00 | 0.00 |
| **DynamicDistractors** | | | | | | |
| L0 | 0.60 | **1.00** | 0.78 | 0.80 | 0.78 | 0.42 |
| L1 | 0.58 | 0.54 | **0.70** | 0.28 | 0.54 | 0.30 |
| L2 | **0.40** | **0.40** | 0.18 | 0.04 | 0.04 | 0.00 |

#### 🎯 Extrapolation Performance

| Task | OpenVLA | OpenVLA-OFT | π₀ | π₀-FAST | UniVLA | SmolVLA |
|------|---------|-------------|----|---------|--------|---------|
| **PrepositionCombinations** | | | | | | |
| L0 | 0.68 | 0.62 | **0.76** | 0.14 | 0.50 | 0.20 |
| L1 | 0.04 | **0.18** | 0.10 | 0.00 | 0.02 | 0.00 |
| L2 | 0.00 | 0.00 | 0.00 | 0.00 | **0.02** | 0.00 |
| **TaskWorkflows** | | | | | | |
| L0 | **0.82** | 0.74 | 0.72 | 0.24 | 0.76 | 0.32 |
| L1 | **0.20** | 0.00 | 0.00 | 0.00 | 0.04 | 0.04 |
| L2 | 0.16 | 0.00 | 0.00 | 0.00 | **0.20** | 0.00 |
| **UnseenObjects** | | | | | | |
| L0 | **0.80** | 0.60 | **0.80** | 0.00 | 0.34 | 0.16 |
| L1 | 0.60 | 0.40 | 0.52 | 0.00 | **0.76** | 0.18 |
| L2 | 0.00 | **0.20** | 0.04 | 0.00 | 0.16 | 0.00 |

#### 📈 Long Horizon Performance

| Task | OpenVLA | OpenVLA-OFT | π₀ | π₀-FAST | UniVLA | SmolVLA |
|------|---------|-------------|----|---------|--------|---------|
| **LongHorizon** | | | | | | |
| L0 | 0.80 | 0.80 | **0.92** | 0.62 | 0.66 | 0.74 |
| L1 | 0.00 | 0.00 | **0.02** | 0.00 | 0.00 | 0.00 |
| L2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

---

## Contributing

You can contribute to VLA-Arena in multiple ways:

### 🤖 Uploading Your Model Results


**How to contribute:**
1. Evaluate your model on VLA-Arena tasks
2. Follow the submission guidelines in our leaderboard repository
3. Submit a pull request with your results

📝 **Detailed Instructions**: [Uploading Your Model Results](https://github.com/vla-arena/vla-arena.github.io#contributing-your-model-results)

### 🎯 Uploading Your Tasks


**How to contribute:**
1. Design your custom tasks using CBDDL
2. Package your tasks following our guidelines
3. Submit your tasks to our task store

📝 **Detailed Instructions**: [Uploading Your Tasks](https://github.com/vla-arena/vla-arena.github.io#contributing-your-tasks)

### 💡 Other Ways to Contribute

- **Report Issues**: Found a bug? [Open an issue](https://github.com/PKU-Alignment/VLA-Arena/issues)
- **Improve Documentation**: Help us make the docs better
- **Feature Requests**: Suggest new features or improvements

---

## License

This project is licensed under the Apache 2.0 license - see [LICENSE](LICENSE) for details.

## Acknowledgments

- **RoboSuite**, **LIBERO**, and **VLABench** teams for their frameworks
- **OpenVLA**, **UniVLA**, **OpenPi**, and **LeRobot** teams for pioneering VLA research
- All contributors and the robotics community

---

<p align="center">
  <b>VLA-Arena: Advancing Vision-Language-Action Models Through Comprehensive Evaluation</b><br>
  Made with ❤️ by the VLA-Arena Team
</p>
