Metadata-Version: 2.4
Name: orbit-robotics
Version: 0.4.1
Summary: Data engineering copilot for robot imitation learning datasets
Author: Rahil Lasne
License-Expression: MIT
Project-URL: Homepage, https://github.com/Rahillasne/orbit-robotics
Project-URL: Repository, https://github.com/Rahillasne/orbit-robotics
Project-URL: Issues, https://github.com/Rahillasne/orbit-robotics/issues
Keywords: robotics,machine-learning,data-quality,imitation-learning,lerobot
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0
Requires-Dist: rich>=13.0
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: scipy>=1.10
Requires-Dist: huggingface-hub>=0.20
Requires-Dist: pyarrow>=14.0
Requires-Dist: scikit-learn>=1.3
Provides-Extra: vision
Requires-Dist: torch>=2.0; extra == "vision"
Requires-Dist: transformers>=4.36; extra == "vision"
Requires-Dist: opencv-python>=4.8; extra == "vision"
Requires-Dist: decord>=0.6; extra == "vision"
Requires-Dist: Pillow>=10.0; extra == "vision"
Provides-Extra: vlm
Requires-Dist: google-generativeai>=0.5; extra == "vlm"
Provides-Extra: rlds
Requires-Dist: tfrecord>=1.14; extra == "rlds"
Provides-Extra: hdf5
Requires-Dist: h5py>=3.8; extra == "hdf5"
Provides-Extra: rosbag
Requires-Dist: mcap>=1.1; extra == "rosbag"
Requires-Dist: mcap-ros2-support>=0.5; extra == "rosbag"
Provides-Extra: rosbag-ros1
Requires-Dist: rosbag>=1.16; extra == "rosbag-ros1"
Provides-Extra: formats
Requires-Dist: orbit-robotics[hdf5,rlds,rosbag]; extra == "formats"
Provides-Extra: sim
Requires-Dist: mujoco>=3.0; extra == "sim"
Provides-Extra: monitor
Requires-Dist: tbparse>=0.0.8; extra == "monitor"
Provides-Extra: assist-claude
Requires-Dist: claude-code-sdk>=0.1; extra == "assist-claude"
Provides-Extra: config
Requires-Dist: pyyaml>=6.0; extra == "config"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Requires-Dist: h5py>=3.8; extra == "dev"
Requires-Dist: pyyaml>=6.0; extra == "dev"
Provides-Extra: all
Requires-Dist: orbit-robotics[config,formats,monitor,sim,vision,vlm]; extra == "all"
Dynamic: license-file

# ORBIT

ORBIT is a data quality tool for robot imitation learning. It analyzes your demonstration datasets, finds problems, and tells you what to fix before you spend hours training a policy that won't work.

**Who is this for?** Anyone training robot policies with imitation learning — whether you're using LeRobot, RoboMimic, or your own pipeline. If you've collected teleoperation demonstrations and want to know if they're good enough to train on, ORBIT tells you.

## Install

```bash
pip install orbit-robotics
```

Requires **Python 3.10+**. No GPU needed for core analysis.

```bash
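# Optional extras (quoted so shells such as zsh don't expand the brackets)
pip install "orbit-robotics[vision]"   # SigLIP embedding analysis (needs torch)
pip install "orbit-robotics[vlm]"      # Gemini VLM task analysis
pip install "orbit-robotics[hdf5]"     # HDF5 dataset support
pip install "orbit-robotics[rlds]"     # RLDS (TFRecord) dataset support
pip install "orbit-robotics[rosbag]"   # ROS bag support
pip install "orbit-robotics[all]"      # config, formats, monitor, sim, vision, and vlm extras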
```

Verify your install:

```bash
orbit doctor
```

## Setting Up API Keys

Some features use Google's Gemini API. These are **optional** — core analysis works without any API keys.

```bash
# Get a free API key from https://aistudio.google.com/apikey
export GOOGLE_API_KEY="your-key-here"

# To persist the key, append it to the profile for your shell (run only one of these):
echo 'export GOOGLE_API_KEY="your-key-here"' >> ~/.zshrc   # zsh (macOS default)
echo 'export GOOGLE_API_KEY="your-key-here"' >> ~/.bashrc  # bash (most Linux distros)
```

**What uses `GOOGLE_API_KEY`:**

| Feature | Flag | Cost | What it does |
|---------|------|------|-------------|
| AI quality judge | `--ai` | ~$0.001/run | Verifies A grades with Gemini |
| Deep analysis | `--deep` | ~$0.01/run | AI-powered root cause analysis |
| VLM assessment | `--vlm` | ~$0.01/run | Vision-language task understanding |
| AI assistant | `orbit assist` | varies | Interactive AI copilot |
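
If you script ORBIT, a small pre-flight check can catch a missing key before a run starts. The sketch below is illustrative and not part of ORBIT: `flags_needing_key` is a hypothetical helper, and its flag set simply mirrors the table above.

```python
import os

# Flags from the table above that call the Gemini API.
# (`orbit assist` is a subcommand rather than a flag, so it is checked separately.)
KEY_FLAGS = {"--ai", "--deep", "--vlm"}


def flags_needing_key(argv, env=None):
    """Return the key-dependent flags/subcommands in argv if GOOGLE_API_KEY is unset."""
    env = os.environ if env is None else env
    if env.get("GOOGLE_API_KEY"):
        return []  # key present: nothing to warn about
    needed = [arg for arg in argv if arg in KEY_FLAGS]
    # argv[0] is the program name ("orbit"); argv[1] is the subcommand.
    if len(argv) > 1 and argv[1] == "assist":
        needed.append("assist")
    return needed


missing = flags_needing_key(
    ["orbit", "analyze", "lerobot/my-dataset", "--deep"], env={}
)
print(missing)  # ['--deep']
```

Passing `env` explicitly keeps the check easy to test; in a real script you would omit it so the function reads `os.environ`.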

## Quick Start

Try ORBIT in 30 seconds — no setup, no data, no API keys:

```bash
orbit quickstart
```

This downloads a public reference dataset and runs a full analysis so you can see what ORBIT does. Then point it at your own data:

```bash
orbit quickstart lerobot/your-dataset
```

## The Workflow

### 1. Analyze your dataset

Point ORBIT at a dataset on the HuggingFace Hub, in a local directory, or in a single file:

```bash
orbit analyze lerobot/xarm_lift_medium
```

```
Dataset Readiness: C (score: 65/100)
Usable but has problems — run orbit clean first

  ✓ High consistency (1.00)
  ✓ Sufficient episodes (800) for diffusion_policy
  ✓ Good policy fit (1.00)
  ✗ 4 joints clipping (>10% of frames)
  ✗ High action divergence (0.46) — demos contradict each other

YOUR DATA AT A GLANCE
  Episodes:       800     (top 25%)
  Coverage:       0.84  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░
  Signal Health:  0.00  ░░░░░░░░░░░░░░░░░░░░░░░░░
```

**Common options:**

```bash
orbit analyze lerobot/my-dataset --policy act          # Check fit for specific policy
orbit analyze lerobot/my-dataset --deep                # AI-powered deep analysis (needs GOOGLE_API_KEY)
orbit analyze lerobot/my-dataset --ai                  # AI verification of grades (needs GOOGLE_API_KEY)
orbit analyze lerobot/my-dataset --vlm                 # VLM vision assessment (needs GOOGLE_API_KEY)
orbit analyze lerobot/my-dataset --proxy               # Proxy training go/no-go (needs torch, ~2 min)
orbit analyze lerobot/my-dataset --json                # Machine-readable output
orbit analyze lerobot/my-dataset --full                # All episodes (no sampling)
orbit analyze lerobot/my-dataset --episodes 100        # Limit to 100 episodes
orbit analyze ./local-data/ --format hdf5              # Local HDF5 files
orbit analyze ./recording.mcap --format rosbag         # ROS bag files
```

### 2. Clean bad episodes

```bash
orbit clean lerobot/my-dataset --dry-run    # Preview what would be removed
orbit clean lerobot/my-dataset              # Actually remove bad episodes
```

Identifies and removes bad episodes — aborted demos, dead servos, outliers. Outputs a cleaned dataset you can train on.

```bash
orbit clean lerobot/my-dataset -o user/my-clean-dataset   # Save to new repo
orbit clean lerobot/my-dataset --aggressive               # Also remove borderline episodes
orbit clean lerobot/my-dataset -p act                     # Policy-aware cleaning
```

### 3. Get a training command

```bash
orbit suggest lerobot/my-dataset
```

Recommends the best policy for your data and generates a ready-to-run training command with tuned hyperparameters (batch size, learning rate, horizon, steps).

```bash
orbit suggest lerobot/my-dataset --policy act              # Force specific policy
orbit suggest lerobot/my-dataset --gpu-memory 16           # Optimize for your GPU
orbit suggest lerobot/my-dataset --framework openvla       # Use OpenVLA instead of LeRobot
orbit suggest lerobot/my-dataset --framework all           # Show all framework options
```

Supported frameworks: `lerobot`, `openvla`, `openpi`, `groot`, `osmo`, `custom`.

### 4. Or do it all in one command

```bash
orbit fix lerobot/my-dataset
```

Runs the full pipeline — analyze, clean, suggest — in one shot.

```bash
orbit fix lerobot/my-dataset --dry-run                     # Preview changes first
orbit fix lerobot/my-dataset -p act -o user/clean-data     # Policy + output path
```

## Understanding Grades

ORBIT grades your dataset A through F based on detected problems:

| Grade | Score | Meaning |
|-------|-------|---------|
| **A** | 85-100 | Ready to train — expect strong results |
| **B** | 72-84 | Good data — minor issues, should train well |
| **C** | 58-71 | Usable but has problems — clean first |
| **D** | 40-57 | Significant issues — collect more or better data |
| **F** | 0-39 | Critical problems — fix before training |

Grades are calibrated against 27 real datasets with known training outcomes: in that calibration set, A-graded datasets trained successfully and F-graded ones did not.
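
The score bands in the table can be expressed as a simple lookup. This is only a sketch of the band boundaries listed above; how ORBIT computes the score itself is not described here, and `grade_for_score` is a hypothetical name.

```python
# Lower bound of each grade band, taken from the table above.
GRADE_FLOORS = [(85, "A"), (72, "B"), (58, "C"), (40, "D"), (0, "F")]


def grade_for_score(score):
    """Map a 0-100 readiness score to a letter grade using the table's bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in [0, 100], got {score}")
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade


print(grade_for_score(65))  # C
```

A score of 65, as in the sample `orbit analyze` output earlier, falls in the 58-71 band and therefore maps to C.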

## What ORBIT Checks

- **Dead servos**: joints that stopped moving during collection, wasting model capacity on zero outputs
- **Aborted/corrupted episodes**: too short, too long, or no meaningful motion
- **Clipping joints**: joints hitting position limits, creating discontinuous action targets
- **Inconsistent demonstrations**: different actions taken in similar states, which confuses the policy
- **Wrong policy for your data**: ACT needs consistent demos, while Diffusion Policy handles multimodal strategies
- **Insufficient data**: not enough episodes for your chosen policy (ACT wants 50+, Diffusion Policy wants 100+)
- **Timing problems**: frame drops, FPS jitter, state-action lag
- **Low workspace coverage**: demos that only cover a narrow region of the task space
- **Episode outliers**: demonstrations that are statistically different from the rest (Modified Z-Score detection)
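
To make the last check concrete, here is a minimal sketch of Modified Z-Score outlier detection using the standard median/MAD formulation. Which per-episode features ORBIT scores and what threshold it uses are not stated in this README, so the episode lengths and the 3.5 cutoff below are assumptions chosen for illustration.

```python
from statistics import median


def modified_z_scores(values):
    """Modified Z-scores: 0.6745 * (x - median) / MAD (median absolute deviation)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median; the score is undefined,
        # so this sketch reports 0.0 for every value instead of dividing by zero.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def outlier_indices(values, threshold=3.5):
    """Indices whose |modified Z-score| exceeds the threshold (3.5 is a common default)."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]


# Hypothetical episode lengths in frames; the fifth episode is far shorter than the rest.
episode_lengths = [200, 205, 198, 210, 40, 202, 207]
print(outlier_indices(episode_lengths))  # [4]
```

Because the median and MAD are barely affected by the outlier itself, the short episode stands out clearly, whereas a mean/standard-deviation Z-score would be pulled toward it.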

## Supported Data Formats

| Format | Flag | Source |
|--------|------|--------|
| LeRobot (Hub) | `--format lerobot` | HuggingFace datasets (`lerobot/...`) |
| LeRobot (local) | `--format lerobot-local` | Local LeRobot directories |
| HDF5 | `--format hdf5` | RoboMimic, robosuite, custom `.hdf5` files |
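| RLDS | `--format rlds` | TFRecord-based datasets (requires `pip install "orbit-robotics[rlds]"`) |
| ROS bags | `--format rosbag` | `.bag` and `.mcap` files (requires `pip install "orbit-robotics[rosbag]"`; the package also provides a separate `rosbag-ros1` extra that installs the ROS 1 `rosbag` package) |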
| Directory | `--format directory` | Flat directories of numpy/CSV files |

Format is auto-detected in most cases. Use `--format` to override when needed.

## Policy Support

| Policy | Flag | Best for |
|--------|------|----------|
| ACT | `--policy act` | Consistent, high-res demos (50+ episodes) |
| Diffusion Policy | `--policy diffusion_policy` | Multimodal strategies (100+ episodes) |
| SmolVLA | `--policy smolvla` | Vision-language tasks, fewer episodes needed |
| DP3 | `--policy dp3` | 3D point cloud observations |
| BC / BC-RNN | `--policy bc` | Large datasets (200+ episodes) |

`--policy auto` (default) recommends the best policy for your data.
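
The episode counts in the table can serve as a quick sanity check before you pick a policy. The sketch below only encodes those quoted minimums; it is not ORBIT's recommendation logic, which presumably also weighs factors such as consistency, and `policies_with_enough_episodes` is a hypothetical helper.

```python
# Minimum episode counts quoted in the table above. Policies without a stated
# number (smolvla, dp3) are deliberately left out rather than guessed.
MIN_EPISODES = {"act": 50, "diffusion_policy": 100, "bc": 200}


def policies_with_enough_episodes(num_episodes):
    """Return the policies whose quoted minimum is met, in ascending order of minimum."""
    ranked = sorted(MIN_EPISODES.items(), key=lambda item: item[1])
    return [name for name, minimum in ranked if num_episodes >= minimum]


print(policies_with_enough_episodes(120))  # ['act', 'diffusion_policy']
```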

## All Commands

### Data Collection

```bash
# Plan a data collection session — generates task-specific checklists
orbit plan "pick up cups" --robot so100 --policy act

# Real-time coach — watches as you collect and gives live feedback
orbit coach ./my-dataset/ --target 50 --policy act

# Select the best episodes from a large dataset
orbit curate lerobot/my-dataset --budget 50 -o curated.json
```

### Training Pipeline

```bash
# Initialize a training project with Makefile and scripts
orbit init --dataset lerobot/my-dataset --policy act -o ./my-project/

# CI/CD quality gate — exits 1 if grade below threshold
orbit gate lerobot/my-dataset -p act --min-grade C

# Full pipeline: gate → train → verify (wraps ANY training command)
orbit train --gate "act --min-grade B" -- lerobot-train --dataset.repo_id=./data --policy.type=act

# Monitor a training run in real-time (alerts on NaN loss, plateaus)
orbit monitor ./outputs/train/my-run/

# Verify training outcome against predicted quality
orbit verify ./outputs/train/my-run/ --success-rate 0.82

# Diagnose a failed training run
orbit debug ./outputs/train/my-run/ --ai
```

### Discovery & Benchmarks

```bash
# Browse LeRobot datasets on HuggingFace
orbit explore --robot koch --limit 10

# Browse 82 published benchmarks with known success rates
orbit benchmark --task pick_and_place --min-success 0.7
orbit benchmark --policy act --top 5

# Compare your results against the community
orbit report lerobot/my-dataset --policy act --success-rate 0.82 --eval-trials 20
```

### Data Conversion

```bash
# Convert other formats (HDF5, ROS bags, RLDS) to LeRobot v3
orbit convert ./recording.mcap --to lerobot-v3 -o ./my-dataset/
orbit convert ./demo.hdf5 --to lerobot-v3 -o ./my-dataset/
orbit convert ./rlds-data/ --to lerobot-v3 -o ./my-dataset/
```

### Utilities

```bash
# Check environment health and dependencies
orbit doctor

# Generate an ORBIT quality badge for your dataset
orbit badge lerobot/my-dataset --push

# Interactive AI assistant for your entire workflow
orbit assist
orbit assist --task "analyze my dataset and tell me if it's ready for ACT"

# Manage configuration
orbit config --show
orbit config --init
orbit config --set default_policy act

# Track progress against a collection plan
orbit track plan.json --dataset lerobot/my-dataset

# Install shell tab-completion
orbit install-completion zsh
```

### JSON Output

Every analysis command supports `--json` for scripting and CI/CD:

```bash
orbit analyze lerobot/my-dataset --json | jq '.grade'
orbit gate lerobot/my-dataset -p act --min-grade B --json
orbit clean lerobot/my-dataset --dry-run --json
```
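
The same output can be consumed from Python instead of `jq`. This sketch assumes that `--json` prints a single JSON object to stdout and that its top-level `grade` field (the one read by `jq '.grade'` above) holds a single letter from A to F; the helper names are hypothetical.

```python
import json
import subprocess

GRADE_ORDER = "FDCBA"  # worst to best, matching the A-F scale


def read_grade(json_text):
    """Extract the top-level `grade` field (the one `jq '.grade'` reads above)."""
    return json.loads(json_text)["grade"]


def meets_min_grade(grade, min_grade):
    """True if `grade` is at least as good as `min_grade`."""
    return GRADE_ORDER.index(grade) >= GRADE_ORDER.index(min_grade)


def analyze_grade(dataset):
    """Run `orbit analyze <dataset> --json` and return its grade (requires ORBIT installed)."""
    result = subprocess.run(
        ["orbit", "analyze", dataset, "--json"],
        capture_output=True, text=True, check=True,
    )
    return read_grade(result.stdout)


# Offline demonstration with a hand-written payload; real output has more fields.
sample = '{"grade": "C"}'
print(meets_min_grade(read_grade(sample), "B"))  # False
```

For CI, `orbit gate` already performs this comparison and exits 1 on failure, so a script like this is mainly useful when you need the grade inside a larger Python workflow.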

## Configuration

ORBIT can be configured via `~/.orbit/config.yaml`:

```bash
orbit config --init          # Create default config
orbit config --show          # View current config
orbit config --set default_policy act --set gpu_memory 24
```

## Requirements

- **Python 3.10+** (tested on 3.10, 3.11, 3.12)
- **Internet** for HuggingFace Hub datasets (local files work fully offline)
- **GPU** optional — only needed for `--proxy` training signal and SigLIP embeddings
- **GOOGLE_API_KEY** optional — only for `--deep`, `--ai`, `--vlm`, and `orbit assist`

## License

MIT — see [LICENSE](LICENSE) for details.
