Metadata-Version: 2.4
Name: rpx-agent
Version: 0.2.1
Summary: RPX — wrap any robot training command with full end-to-end analytics.
Project-URL: Homepage, https://robosynx.com
Project-URL: Documentation, https://robosynx.com/docs
Project-URL: Repository, https://github.com/roboprotx/rpx-agent
Project-URL: Bug Tracker, https://github.com/roboprotx/rpx-agent/issues
Author-email: RoboProtX <hello@robosynx.com>
License: MIT
Keywords: isaac-lab,mujoco,reinforcement-learning,robotics,telemetry,training
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.9
Provides-Extra: full
Requires-Dist: psutil>=5.9; extra == 'full'
Requires-Dist: pyyaml>=6.0; extra == 'full'
Provides-Extra: sysinfo
Requires-Dist: psutil>=5.9; extra == 'sysinfo'
Provides-Extra: yaml
Requires-Dist: pyyaml>=6.0; extra == 'yaml'
Description-Content-Type: text/markdown

# rpx-agent

Wrap any robot training command with one line and get full end-to-end analytics on [robosynx.com](https://robosynx.com).

**What it collects automatically — zero code changes to your training script:**

| Signal | How |
|--------|-----|
| Live logs (stdout + stderr) | Streamed in real time |
| RL metrics (reward, KL, loss, entropy…) | Parsed from stdout — SB3, RSL-RL, Isaac Lab, CleanRL, generic |
| Termination reasons & reward components | Parsed from `Episode_Termination/` and `Episode_Reward/` lines |
| GPU / CPU / RAM | `nvidia-smi` + `psutil` every 60 s |
| Checkpoints / artifacts | File watcher: `.pt`, `.pkl`, `.ckpt`, `.safetensors`… |
| Environment snapshot | Python version, CUDA, GPU name, git SHA, detected simulator |
| Heartbeat | Every 30 s — run stays marked alive during Isaac Sim loading |
| Offline buffering | Events spooled to disk when backend unreachable; replayed automatically |

---

## Install

```bash
pip install rpx-agent
# Optional extras:
pip install "rpx-agent[full]"      # adds psutil + PyYAML
pip install "rpx-agent[sysinfo]"   # adds psutil only
pip install "rpx-agent[yaml]"      # adds PyYAML only
```

Local development:
```bash
pip install -e ./robotrainx-agent
```

---

## Flow 1: pip install + `rpx-agent run` (recommended)

**Three commands (one of them optional), then training is instrumented:**

```bash
# 1. Authenticate
rpx-agent login --api-key YOUR_KEY

# 2. (optional) Initialise project config in your training directory
cd /path/to/your/robot/project
rpx-agent init

# 3. Wrap your existing training command — nothing else changes
rpx-agent run -- python train.py --num-envs 1024
rpx-agent run -- python train_headless.py --agent AnymalC --headless
rpx-agent run -- bash scripts/train.sh
```

**Auto-detection:** `--task` is inferred from the script filename and `--platform` is auto-detected.
Pass either flag only when you want to override the inferred value.

> `rpx` also works as a short alias: `rpx run -- python train.py`

### All flags for `rpx-agent run`

```
--task TEXT          Experiment name (default: inferred from script filename)
--label TEXT         Human-readable label shown in dashboard
--platform TEXT      isaaclab | mujoco | gazebo | custom (auto-detected)
--run-id TEXT        Explicit run ID (auto-generated UUID if omitted)
--tags TEXT          Comma-separated tags
--watch-dir DIR      Extra dir to watch for checkpoints (repeatable)
--no-metrics         Disable stdout metric parsing
--no-sysinfo         Disable GPU/CPU telemetry
--no-artifacts       Disable checkpoint file detection
--batch-size N       Log events per HTTP batch (default: 80)
--flush-interval F   Seconds between log flushes (default: 1.5)
```
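
If you launch training from a Python script (a sweep driver, for example), the flags above can be assembled programmatically. The following is a minimal sketch, not part of rpx-agent: `build_rpx_command` is a hypothetical helper that only uses flags listed in the table above and places the training command after `--`.

```python
import shlex
from typing import Optional, Sequence


def build_rpx_command(
    train_cmd: Sequence[str],
    task: Optional[str] = None,
    tags: Sequence[str] = (),
    watch_dirs: Sequence[str] = (),
) -> list[str]:
    """Hypothetical helper: build an `rpx-agent run` argv from documented flags."""
    argv = ["rpx-agent", "run"]
    if task:
        argv += ["--task", task]
    if tags:
        # --tags takes a single comma-separated value
        argv += ["--tags", ",".join(tags)]
    for directory in watch_dirs:
        # --watch-dir is repeatable, so emit it once per directory
        argv += ["--watch-dir", directory]
    # Everything after "--" is the wrapped training command, passed through unchanged
    return argv + ["--", *train_cmd]


cmd = build_rpx_command(
    ["python", "train.py", "--num-envs", "1024"],
    task="anymal-flat",
    tags=["ppo", "baseline"],
    watch_dirs=["logs/checkpoints"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` once `rpx-agent` is installed and on your `PATH`.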

---

## Flow 2: SSH connect (HPC / shared clusters)

For machines where you can't install packages (SLURM, university HPC):

1. In the RoboProtX dashboard → **Remote Hosts** → Add SSH host
2. Paste your SSH key and remote training command
3. RoboProtX SSHes in, runs your command, streams logs back
4. The run goes through the same analytics pipeline: failure intelligence, sim-to-real, promotion gate

---

## Flow 3: Docker self-host (on-prem enterprise)

```bash
cp .env.example .env
# Set ROBOTRAINX_API_KEY, ISAACMONITOR_DB_URL, JWT_SECRET in .env
docker compose up -d postgres backend frontend
```

Then use `rpx-agent run` pointed at your local backend:
```bash
export ROBOTRAINX_SERVER_URL=http://your-server:3001
rpx-agent run -- python train.py
```

---

## Project config (`roboprotx.yaml`)

`rpx-agent init` creates this automatically. You can also write it manually:

```yaml
project: anymal-locomotion
simulator: isaaclab
server_url: https://api.robosynx.com

watch_dirs:
  - .
  - logs/checkpoints
```

The agent searches for `roboprotx.yaml` from your current directory up to the filesystem root.
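
The upward search described above can be sketched as follows. This illustrates the lookup rule and is not the agent's actual implementation; the function name `find_project_config` is an assumption.

```python
from pathlib import Path
from typing import Optional


def find_project_config(
    start: Optional[Path] = None, name: str = "roboprotx.yaml"
) -> Optional[Path]:
    """Walk from `start` (default: current directory) up to the filesystem root."""
    current = (start or Path.cwd()).resolve()
    # `current.parents` yields each ancestor, ending at the filesystem root
    for directory in (current, *current.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate  # nearest config wins
    return None  # no config found anywhere on the path to the root


config_path = find_project_config()
print(config_path or "no roboprotx.yaml found")
```

Under this rule, a `roboprotx.yaml` in a subdirectory takes precedence over one in a parent directory.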

---

## Environment variables

| Variable | Aliases | Description |
|----------|---------|-------------|
| `ROBOTRAINX_API_KEY` | `IM_API_KEY`, `ROBOPROTX_API_KEY` | API key |
| `ROBOTRAINX_SERVER_URL` | `IM_SERVER_URL`, `ROBOPROTX_SERVER_URL` | Backend URL |
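
To illustrate how a primary variable and its aliases might be resolved, here is a minimal sketch. The precedence order (primary name first, then aliases in the order listed) is an assumption, as is the helper name `first_set`; the table above does not specify which name wins when several are set.

```python
import os
from typing import Mapping, Optional, Sequence

# Primary name first, then the aliases from the table (order is an assumption)
API_KEY_VARS = ("ROBOTRAINX_API_KEY", "IM_API_KEY", "ROBOPROTX_API_KEY")
SERVER_URL_VARS = ("ROBOTRAINX_SERVER_URL", "IM_SERVER_URL", "ROBOPROTX_SERVER_URL")


def first_set(names: Sequence[str], env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty value among `names`, or None if none is set."""
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


# Example with an explicit mapping instead of the real environment
example_env = {"IM_SERVER_URL": "http://your-server:3001"}
print(first_set(SERVER_URL_VARS, example_env))
```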

---

## Edge cases handled

- **Backend unreachable at start** — runs offline, all events spooled to `~/.robotrainx-agent/spool/`, replayed when connection restored
- **Binary / non-UTF-8 output** (Isaac Sim OpenGL) — decoded with `errors=replace`, never crashes
- **Long lines > 8 KB** — truncated with `...[truncated]` marker
- **Orphan GPU processes on Ctrl+C** — kills entire process group (`SIGKILL` on Linux, `taskkill /T` on Windows)
- **Command not found** — friendly error with PATH hint, exits 127
- **Missing API key on production server** — warns clearly with `rpx-agent login` instructions
- **SLURM / multi-process training** — wrap the `srun` or `torchrun` command directly
- **Isaac Sim long startup** (5-10 min silent) — heartbeat thread keeps run alive
- **Rate limiting (HTTP 429)** — automatic exponential backoff retry
- **401/403 auth errors** — clear message with `rpx-agent login` instructions
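
Two of the behaviours above, lossy decoding with truncation and exponential backoff, can be sketched in a few lines. This is an illustration under stated assumptions rather than the agent's code: the limit is applied to raw bytes, and the backoff base (1 s) and cap (60 s) are made-up values.

```python
MAX_LINE_BYTES = 8 * 1024          # 8 KB limit from the list above
TRUNCATION_MARKER = "...[truncated]"


def sanitize_line(raw: bytes, limit: int = MAX_LINE_BYTES) -> str:
    """Decode a log line without ever raising, truncating oversized input."""
    truncated = len(raw) > limit
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    text = raw[:limit].decode("utf-8", errors="replace")
    return text + TRUNCATION_MARKER if truncated else text


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Delay before each retry: base * 2**attempt, capped (values are assumptions)."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


print(sanitize_line(b"render ok \xff\xfe"))
print(backoff_delays(5))
```

A production retry loop would typically add random jitter to these delays so that many agents do not retry in lockstep.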

---

## Supported log formats (auto-parsed)

| Framework | Detected signal |
|-----------|----------------|
| **Stable Baselines3** | `\| rollout/ep_rew_mean \| 4.23 \|` table |
| **RSL-RL / Isaac Lab** | `Learning iteration 100/1000` blocks |
| **CleanRL** | `global_step=51200, episodic_return=4.23` |
| **Generic** | `reward=4.23 iteration=100 kl=0.012` |
| **IsaacMonitor / rpx patch** | `[IsaacMonitor] failure captured: base_contact rate=0.30` |
| **Episode labels** | `Episode_Termination/base_contact: 0.30` |
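
As a rough illustration of how lines in the generic, CleanRL, and episode-label formats could be turned into metrics, here is a small regex-based sketch. It is not the agent's parser; the function name `parse_metrics` and the patterns are assumptions, and the SB3 table and RSL-RL block formats are not handled.

```python
import re

# Integer, decimal, or scientific-notation number
_NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
# key=value pairs, e.g. "reward=4.23" or "global_step=51200"
_KEY_VALUE = re.compile(rf"([A-Za-z_][\w/]*)\s*=\s*({_NUMBER})")
# "Episode_Termination/<name>: <value>" or "Episode_Reward/<name>: <value>"
_EPISODE_LABEL = re.compile(
    rf"^\s*(Episode_(?:Termination|Reward)/\S+?):\s*({_NUMBER})\s*$"
)


def parse_metrics(line: str) -> dict[str, float]:
    """Extract numeric metrics from one stdout line; empty dict if none found."""
    label = _EPISODE_LABEL.match(line)
    if label:
        return {label.group(1): float(label.group(2))}
    return {key: float(value) for key, value in _KEY_VALUE.findall(line)}


for sample in (
    "reward=4.23 iteration=100 kl=0.012",
    "global_step=51200, episodic_return=4.23",
    "Episode_Termination/base_contact: 0.30",
):
    print(parse_metrics(sample))
```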

---

## Publish to PyPI

```bash
python -m build ./robotrainx-agent
python -m twine upload robotrainx-agent/dist/*
```

Or just push a version tag — GitHub Actions auto-publishes:
```bash
git tag v0.2.1 && git push origin v0.2.1
```
