PETS v1 Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Add a narrow but real PETS vertical slice with online ensemble-dynamics training and CEM MPC planning for vector-observation continuous-control environments.

Architecture: Reuse the existing ensemble dynamics model shape already introduced for MOPO, but wire it into an online trainer that collects real environment transitions into a replay buffer and plans actions with a cross-entropy method over finite action sequences. Keep scope honest: single environment only, flat Box observations only, flat continuous Box actions only, no latent world model, no image observations, and no distributed collection.

Tech Stack: Python, PyTorch, Gymnasium, existing TrainConfig / checkpoint / workflow registry, existing ReplayBuffer, existing MLPMOPOEnsembleModel.


Task 1: Add the red-phase PETS tests

Files:
- Create: tests/test_pets_update.py
- Create: tests/test_pets_trainer_smoke.py
- Create: tests/test_pets_reference_script.py
- Modify: tests/test_package_api_exports.py
- Modify: tests/test_public_api.py
- Modify: tests/test_package_smoke.py
- Modify: tests/test_cli.py
- Modify: tests/test_experiment_manager.py
- Modify: tests/test_checkpoint_workflows.py

Step 1: Write the failing tests

Add tests for:
- pets_loss returning named metrics
- PETS.plan_action() moving toward a high-reward action in a toy deterministic ensemble model
- PETS.update() returning metrics and changing model parameters
- train_pets() writing a checkpoint and producing evaluation metrics
- examples/pets_pendulum_reference.py smoke execution
- managed API / package export / config loading / checkpoint evaluation / resume integration
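
As an illustration of the first test in that list, here is a minimal sketch of a "named metrics" check. The function pets_loss_sketch, its argument shapes, and the metric names loss / delta_loss / reward_loss are assumptions made for this example; the real pets_loss is expected to operate on PyTorch tensors and may use different names or a different loss form.

```python
import numpy as np


def pets_loss_sketch(pred_delta, pred_reward, target_delta, target_reward):
    """Hypothetical stand-in for pets_loss: ensemble MSE with named metrics.

    pred_delta:    (ensemble, batch, obs_dim) predicted next-state deltas
    pred_reward:   (ensemble, batch) predicted rewards
    target_delta:  (batch, obs_dim) observed next-state deltas
    target_reward: (batch,) observed rewards
    """
    # Broadcast the targets across the ensemble axis before averaging.
    delta_loss = float(np.mean((pred_delta - target_delta[None]) ** 2))
    reward_loss = float(np.mean((pred_reward - target_reward[None]) ** 2))
    return {
        "loss": delta_loss + reward_loss,
        "delta_loss": delta_loss,
        "reward_loss": reward_loss,
    }


def test_pets_loss_returns_named_metrics():
    rng = np.random.default_rng(0)
    target_delta = rng.normal(size=(8, 3))
    target_reward = rng.normal(size=8)
    # A "perfect" ensemble of 4 members reproduces the targets exactly.
    pred_delta = np.repeat(target_delta[None], 4, axis=0)
    pred_reward = np.repeat(target_reward[None], 4, axis=0)

    metrics = pets_loss_sketch(pred_delta, pred_reward, target_delta, target_reward)

    assert set(metrics) == {"loss", "delta_loss", "reward_loss"}
    assert all(isinstance(value, float) for value in metrics.values())
    assert metrics["loss"] == 0.0
```

In the red phase the real test would import pets_loss from the package instead of defining a stand-in, so it fails at import time until Task 2 lands.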

Step 2: Run tests to verify they fail

Run: pytest tests/test_pets_update.py tests/test_pets_trainer_smoke.py tests/test_pets_reference_script.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_package_smoke.py tests/test_cli.py tests/test_experiment_manager.py tests/test_checkpoint_workflows.py -q

Expected: FAIL because PETS modules, config assets, registry hooks, and exports do not exist yet.

Task 2: Implement the PETS algorithm and trainer

Files:
- Create: src/rl_training/algorithms/pets.py
- Create: src/rl_training/runtime/pets_trainer.py
- Modify: src/rl_training/algorithms/__init__.py
- Modify: src/rl_training/api/algorithms.py
- Modify: src/rl_training/api/__init__.py
- Modify: src/rl_training/__init__.py
- Modify: src/rl_training/experiment/registry.py

Step 1: Implement the minimal algorithm

Add:
- pets_loss() for supervised ensemble-dynamics metrics
- PETS.update() for fitting next-state-delta and reward predictions on replay samples
- PETS.plan_action() with CEM over finite-horizon action sequences
- state_dict() / load_state_dict() / train/eval mode helpers
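
The CEM planning step can be sketched as follows. This is a minimal NumPy illustration, not the planned PETS.plan_action() implementation: the function name cem_plan_action, its parameters, the scalar action bounds, and the reward_fn callback (which stands in for rolling the ensemble model forward) are all assumptions made for the example.

```python
import numpy as np


def cem_plan_action(reward_fn, action_dim, low, high, horizon=5,
                    population=64, elites=8, iterations=5, seed=0):
    """Return the first action of the best sequence found by CEM.

    reward_fn maps a (population, horizon, action_dim) array of candidate
    action sequences to a (population,) array of predicted returns. In PETS
    this would roll out the ensemble dynamics model; here it is a placeholder.
    low / high are scalar bounds shared by every action dimension (assumption).
    """
    rng = np.random.default_rng(seed)
    mean = np.full((horizon, action_dim), (low + high) / 2.0)
    std = np.full((horizon, action_dim), (high - low) / 2.0)

    for _ in range(iterations):
        # Sample candidate sequences and keep them inside the action bounds.
        noise = rng.standard_normal((population, horizon, action_dim))
        samples = np.clip(mean + std * noise, low, high)

        # Refit the sampling distribution to the highest-return candidates.
        returns = reward_fn(samples)
        elite = samples[np.argsort(returns)[-elites:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # small floor keeps sampling non-degenerate

    # MPC executes only the first planned action, then replans next step.
    return mean[0]


def toy_return(samples):
    # Toy objective: every action in the sequence should be close to 0.7.
    return -((samples - 0.7) ** 2).sum(axis=(1, 2))


action = cem_plan_action(toy_return, action_dim=1, low=-1.0, high=1.0, iterations=10)
```

With the toy objective, the returned action should drift toward 0.7 as iterations increase, which mirrors the "moves toward a high-reward action" test from Task 1.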

Step 2: Implement the trainer

Add:
- environment-space validation for flat Box observations and actions
- single-env online loop using ReplayBuffer
- random warmup steps before planning starts
- configurable model updates per step
- planner-backed train/eval/predict helpers
- checkpoint save / load / resume behavior through existing workflow hooks
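
The control flow of the online loop (random warmup, then planning, with a configurable number of model updates per step) might look like the sketch below. Everything here is a stand-in chosen for illustration: ToyEnv replaces a Gymnasium environment and returns a simplified (next_obs, reward) pair, a deque replaces the existing ReplayBuffer, and run_online_loop, its parameter names, and the choice to run model updates during warmup are assumptions rather than the planned trainer API. Episode termination and resets are omitted.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical 1-D stand-in for an environment; not the Gymnasium API."""

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        self.state += action
        reward = -(action - 0.5) ** 2
        return self.state, reward


def run_online_loop(env, plan_action, update_model, total_steps=20,
                    warmup_steps=5, updates_per_step=2, batch_size=4, seed=0):
    rng = random.Random(seed)
    buffer = deque(maxlen=1000)  # stand-in for the existing ReplayBuffer
    stats = {"random_steps": 0, "planned_steps": 0, "model_updates": 0}
    obs = env.reset()

    for step in range(total_steps):
        if step < warmup_steps:
            # Warmup: act randomly so the model has data before planning starts.
            action = rng.uniform(-1.0, 1.0)
            stats["random_steps"] += 1
        else:
            action = plan_action(obs)
            stats["planned_steps"] += 1

        next_obs, reward = env.step(action)
        buffer.append((obs, action, reward, next_obs))
        obs = next_obs

        # Assumption: the model is updated on every step, warmup included.
        for _ in range(updates_per_step):
            batch = rng.sample(list(buffer), k=min(batch_size, len(buffer)))
            update_model(batch)
            stats["model_updates"] += 1

    stats["buffer_size"] = len(buffer)
    return stats


stats = run_online_loop(
    ToyEnv(),
    plan_action=lambda obs: 0.5,       # placeholder for the CEM planner
    update_model=lambda batch: None,   # placeholder for PETS.update()
)
```

Passing the planner and the update step in as callables keeps the loop testable in isolation, which is one way the trainer smoke test could exercise it without a trained model.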

Step 3: Run focused PETS tests

Run: pytest tests/test_pets_update.py tests/test_pets_trainer_smoke.py -q

Expected: PASS.

Task 3: Wire package surface, config, example, and docs

Files:
- Create: configs/pets/pendulum.yaml
- Create: src/rl_training/assets/configs/pets/pendulum.yaml
- Create: examples/pets_pendulum_reference.py
- Modify: README.md
- Modify: docs/plans/2026-03-12-rl-yearly-sourcebook-design.md

Step 1: Add packaged config and example script

Expose a minimal Pendulum preset and a small runnable reference script.

Step 2: Update package docs

Document PETS as the first narrow online ensemble-model + MPC planning lane and place it into the yearly sourcebook under the 2018 model-based planning wave.

Step 3: Run example / config / surface regressions

Run: pytest tests/test_pets_reference_script.py tests/test_package_smoke.py tests/test_cli.py tests/test_experiment_manager.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_checkpoint_workflows.py -q

Expected: PASS.

Task 4: Full verification

Files:
- No new files

Step 1: Run complete suite

Run: pytest -q

Expected: PASS with no PETS-specific failures.

Step 2: Re-read requirements

Verify PETS now supports:
- train
- evaluate checkpoint
- resume
- predict
- packaged config asset
- example script
- README / yearly sourcebook mention
- tests covering update + trainer + workflow integration