PPG V1 Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Add a narrow but honest PPG baseline for vector-observation, discrete-action environments by extending the current PPO lane with a separate auxiliary phase.

Architecture: Keep this release deliberately small. Reuse the current on-policy rollout and evaluation stack, but add a dedicated PPG model/algorithm/trainer rather than overloading PPO. Implement a shared MLP trunk with policy, value, and auxiliary-value heads; run PPO-style policy updates during the policy phase and periodic auxiliary updates with value regression plus policy-distillation KL. Explicitly do not implement CNN/image PPG, recurrent PPG, or Atari-specific tuning in this batch.
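The three-head layout described above can be sketched in PyTorch roughly as follows. This is an illustration under assumptions, not the planned implementation: the class name ThreeHeadMLP is a hypothetical stand-in for MLPPPGModel, and the hidden width, tanh activations, two-layer trunk, and the act() return tuple are all choices the plan does not specify.

```python
import torch
from torch import nn
from torch.distributions import Categorical


class ThreeHeadMLP(nn.Module):
    """Hypothetical stand-in for MLPPPGModel: one shared trunk, three heads."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64) -> None:
        super().__init__()
        # Shared MLP trunk (width, depth, and activation are assumptions).
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.policy_head = nn.Linear(hidden, n_actions)  # action logits
        self.value_head = nn.Linear(hidden, 1)           # policy-phase value
        self.aux_value_head = nn.Linear(hidden, 1)       # auxiliary-phase value

    def forward(self, obs: torch.Tensor):
        features = self.trunk(obs)
        logits = self.policy_head(features)
        value = self.value_head(features).squeeze(-1)
        aux_value = self.aux_value_head(features).squeeze(-1)
        return logits, value, aux_value

    @torch.no_grad()
    def act(self, obs: torch.Tensor):
        # Sample discrete actions for rollout collection; no gradients needed.
        logits, value, _ = self(obs)
        dist = Categorical(logits=logits)
        action = dist.sample()
        return action, dist.log_prob(action), value


# Batch of 8 CartPole-sized observations (4 features, 2 actions).
model = ThreeHeadMLP(obs_dim=4, n_actions=2)
obs = torch.zeros(8, 4)
logits, value, aux_value = model(obs)
action, log_prob, act_value = model.act(obs)
```

Keeping all three heads on one trunk means the auxiliary phase can shape the shared features through the auxiliary value head while the distillation term holds the policy output in place.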

Tech Stack: Python 3.10, PyTorch, Gymnasium, pytest, existing rl_training config/checkpoint/API stack.


Task 1: Add failing PPG coverage

Files:
- Create: tests/test_ppg_update.py
- Create: tests/test_ppg_trainer_smoke.py
- Create: tests/test_ppg_reference_script.py
- Modify: tests/test_package_api_exports.py
- Modify: tests/test_public_api.py
- Modify: tests/test_checkpoint_workflows.py
- Modify: tests/test_experiment_manager.py
- Modify: tests/test_cli.py
- Modify: tests/test_package_smoke.py

Step 1: Write the failing test
- MLPPPGModel.act() returns policy outputs for discrete vector observations.
- ppg_loss() returns PPO-style metrics and ppg_auxiliary_loss() returns auxiliary metrics.
- PPG.update() returns policy-phase metrics and PPG.auxiliary_update() returns auxiliary-phase metrics.
- train_ppg() writes a checkpoint and evaluation metrics on CartPole-v1.
- root/api/algorithms package exports include PPG.
- checkpoint workflows can evaluate and resume a saved ppg checkpoint.
- packaged config resolves outside repo root and reference script runs as smoke command.

Step 2: Run test to verify it fails
Run: pytest -q tests/test_ppg_update.py tests/test_ppg_trainer_smoke.py tests/test_ppg_reference_script.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_cli.py tests/test_package_smoke.py
Expected: FAIL with missing ppg modules / exports.

Task 2: Implement PPG model and algorithm

Files:
- Create: src/rl_training/models/mlp_ppg.py
- Create: src/rl_training/algorithms/ppg.py
- Modify: src/rl_training/models/__init__.py

Step 1: Write minimal implementation
- MLPPPGModel with shared MLP trunk plus policy head, value head, and auxiliary value head.
- act() and evaluate_actions() mirror discrete on-policy APIs.
- PPG.update() performs PPO-style policy updates.
- PPG.auxiliary_update() performs auxiliary value regression plus KL distillation against a frozen teacher snapshot.
- serialize policy and auxiliary optimizer state plus update counters.
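The auxiliary objective named above (auxiliary value regression plus KL distillation against a frozen teacher) might look roughly like this in PyTorch. It is a sketch under assumptions: the function name auxiliary_loss, the beta_clone weight, the 0.5 scaling on the regression term, and the metric keys are hypothetical and not taken from the plan. A bare nn.Linear stands in for the policy so the example stays self-contained.

```python
import copy

import torch
import torch.nn.functional as F
from torch import nn


def auxiliary_loss(new_logits, teacher_logits, aux_value, returns, beta_clone=1.0):
    """Aux value regression + KL(teacher || student) over discrete actions."""
    # Regress the auxiliary value head onto the cached returns.
    aux_value_loss = 0.5 * F.mse_loss(aux_value, returns)
    # Distillation term: keep the current policy close to the frozen teacher.
    teacher_log_probs = F.log_softmax(teacher_logits, dim=-1)
    new_log_probs = F.log_softmax(new_logits, dim=-1)
    kl = (teacher_log_probs.exp() * (teacher_log_probs - new_log_probs)).sum(-1).mean()
    total = aux_value_loss + beta_clone * kl
    return total, {"aux_value_loss": aux_value_loss.item(), "kl": kl.item()}


torch.manual_seed(0)
student = nn.Linear(4, 2)                               # stand-in policy head
teacher = copy.deepcopy(student).requires_grad_(False)  # frozen teacher snapshot

obs = torch.randn(16, 4)
returns = torch.randn(16)
aux_value = torch.zeros(16, requires_grad=True)         # stand-in aux value output

with torch.no_grad():
    teacher_logits = teacher(obs)

loss, metrics = auxiliary_loss(student(obs), teacher_logits, aux_value, returns)
loss.backward()  # gradients reach both the student and the aux value tensor
```

Immediately after the snapshot the student and teacher agree, so the KL term starts at zero and only grows if the auxiliary updates begin to move the policy.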

Step 2: Run tests to verify they pass
Run: pytest -q tests/test_ppg_update.py
Expected: PASS.

Task 3: Implement trainer and registry integration

Files:
- Create: src/rl_training/runtime/ppg_trainer.py
- Modify: src/rl_training/experiment/registry.py
- Modify: src/rl_training/api/algorithms.py
- Modify: src/rl_training/api/__init__.py
- Modify: src/rl_training/__init__.py
- Modify: src/rl_training/algorithms/__init__.py

Step 1: Write minimal implementation
- train_ppg() reuses rollout collection from PPO for vector discrete envs only.
- cache the latest rollout observations and returns for periodic auxiliary phases.
- support train / eval / resume / predict through registry load/evaluate/predict hooks.
- add managed API class PPG and export wiring across root/api/algorithms surfaces.
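The interplay between the rollout cache and the periodic auxiliary phase can be sketched without any framework. Everything named here is hypothetical: AuxRolloutCache, run_phases, the aux_every parameter, and the callable signatures are illustrative assumptions, and the real train_ppg() is expected to reuse the existing PPO rollout collection instead of the stub used below.

```python
from dataclasses import dataclass, field


@dataclass
class AuxRolloutCache:
    """Holds only the most recent rollout's observations and returns."""

    observations: list = field(default_factory=list)
    returns: list = field(default_factory=list)

    def store(self, observations, returns) -> None:
        # "Latest rollout" semantics: overwrite instead of accumulating.
        self.observations = list(observations)
        self.returns = list(returns)


def run_phases(num_updates, aux_every, collect, policy_update, auxiliary_update):
    """Run a policy update per rollout and an auxiliary update every aux_every."""
    cache = AuxRolloutCache()
    log = []
    for update in range(1, num_updates + 1):
        observations, returns = collect()
        policy_update(observations, returns)   # PPO-style policy phase
        cache.store(observations, returns)
        log.append("policy")
        if update % aux_every == 0:
            # Auxiliary phase consumes the cached data, not a fresh rollout.
            auxiliary_update(cache.observations, cache.returns)
            log.append("aux")
    return log


# Stub callables so the schedule can be exercised in isolation.
aux_batch_sizes = []
log = run_phases(
    num_updates=4,
    aux_every=2,
    collect=lambda: ([[0.0, 0.0, 0.0, 0.0]], [1.0]),
    policy_update=lambda observations, returns: None,
    auxiliary_update=lambda observations, returns: aux_batch_sizes.append(len(observations)),
)
print(log)
```

Passing the phases in as callables keeps the schedule testable on its own, which fits the plan's preference for a dedicated trainer over extra branches inside PPO.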

Step 2: Run tests to verify they pass
Run: pytest -q tests/test_ppg_trainer_smoke.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py
Expected: PASS.

Task 4: Add config assets, example, and docs

Files:
- Create: configs/ppg/cartpole.yaml
- Create: src/rl_training/assets/configs/ppg/cartpole.yaml
- Create: examples/ppg_cartpole_reference.py
- Modify: README.md
- Modify: docs/plans/2026-03-12-rl-yearly-sourcebook-design.md

Step 1: Write minimal implementation
- add a runnable ppg CartPole preset.
- add a reference script for a tiny ppg run.
- update README and yearly sourcebook to mark PPG as an implemented, narrow v1 and keep its scope explicit.

Step 2: Run tests to verify they pass
Run: pytest -q tests/test_ppg_reference_script.py tests/test_cli.py tests/test_package_smoke.py
Expected: PASS.

Task 5: Regression verification

Files:
- Modify only if verification reveals regressions.

Step 1: Run focused regression coverage
Run: pytest -q tests/test_ppg_update.py tests/test_ppg_trainer_smoke.py tests/test_ppg_reference_script.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_cli.py tests/test_package_smoke.py tests/test_ppo_update.py tests/test_trainer_smoke.py tests/test_reference_script.py
Expected: PASS.

Step 2: Run full suite
Run: pytest -q
Expected: PASS.