OpenAI ES V1 Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Add a narrow but honest OpenAI ES baseline for vector-observation, continuous-action control by training a deterministic MLP policy with mirrored parameter perturbations and rank-based evolution updates.
Architecture: Keep the release deliberately small and aligned with the current search-based continuous-control package surface. Reuse the new deterministic search-policy model family, existing run directories, checkpointing, evaluation/prediction workflow, action scaling, and managed API, but add a dedicated OpenAI ES learner and trainer built around synchronous positive/negative perturbation rollouts with centered-rank utilities. Explicitly do not implement parallel workers, distributed gradient aggregation, novelty search, observation normalization across processes, or discrete-action variants in this batch.
Tech Stack: Python 3.10, PyTorch, Gymnasium, pytest, existing rl_training runtime and experiment infrastructure.
Task 1: Add failing OpenAI ES coverage
Files:
- Create: tests/test_openai_es_update.py
- Create: tests/test_openai_es_trainer_smoke.py
- Create: tests/test_openai_es_reference_script.py
- Modify: tests/test_package_api_exports.py
- Modify: tests/test_public_api.py
- Modify: tests/test_checkpoint_workflows.py
- Modify: tests/test_experiment_manager.py
- Modify: tests/test_cli.py
- Modify: tests/test_package_smoke.py
Step 1: Write the failing tests
- openai_es_loss(...) returns named search metrics for mirrored returns, utilities, and parameter updates.
- OpenAIES.update(...) consumes perturbations and mirrored rollout returns.
- train_openai_es() writes a checkpoint and evaluation metrics on Pendulum-v1.
- root/api/algorithms package exports include OpenAIES.
- checkpoint workflows can evaluate and resume an openai_es checkpoint.
- packaged config resolves outside repo root and the reference script runs as a smoke command.
Step 2: Run the tests to verify they fail
Run: pytest -q tests/test_openai_es_update.py tests/test_openai_es_trainer_smoke.py tests/test_openai_es_reference_script.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_cli.py tests/test_package_smoke.py
Expected: FAIL with missing openai_es modules / exports.
Task 2: Implement the OpenAI ES learner
Files:
- Create: src/rl_training/algorithms/openai_es.py
- Modify: src/rl_training/algorithms/__init__.py
Step 1: Write minimal implementation
- reuse MLPARSModel as the deterministic search policy.
- implement centered-rank utilities over mirrored returns and evolution-style parameter updates.
- keep scope explicit: vector observations only, continuous Box actions only, synchronous rollout evaluation only.
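The centered-rank utilities and the evolution-style update named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code planned for openai_es.py: the names centered_ranks and es_step are hypothetical, it uses NumPy on a flat parameter vector rather than PyTorch and MLPARSModel, and it applies plain gradient ascent with no optimizer state or weight decay.

```python
import numpy as np


def centered_ranks(returns: np.ndarray) -> np.ndarray:
    """Map returns to rank-based utilities in [-0.5, 0.5], preserving shape.

    Needs at least two returns; ties receive arbitrary distinct ranks.
    """
    flat = returns.ravel()
    ranks = np.empty(flat.size, dtype=np.float64)
    # argsort orders indices from lowest to highest return,
    # so the lowest return gets rank 0 and the highest gets rank size - 1.
    ranks[np.argsort(flat)] = np.arange(flat.size)
    return (ranks / (flat.size - 1) - 0.5).reshape(returns.shape)


def es_step(theta, noise, returns_pos, returns_neg, sigma=0.1, lr=0.02):
    """One mirrored-sampling ES update (hypothetical helper).

    theta:       (n_params,) current parameters
    noise:       (n_pairs, n_params) perturbation directions
    returns_pos: (n_pairs,) returns for theta + sigma * noise
    returns_neg: (n_pairs,) returns for theta - sigma * noise
    """
    # Rank both halves of every pair jointly so utilities are comparable.
    utilities = centered_ranks(np.stack([returns_pos, returns_neg], axis=1))
    weights = utilities[:, 0] - utilities[:, 1]        # (n_pairs,)
    grad = weights @ noise / (2 * len(noise) * sigma)  # (n_params,)
    return theta + lr * grad
```

Ranking replaces raw returns with their ordering, so a single outlier episode cannot dominate the update, and subtracting the mirrored utility cancels the part of the signal shared by both perturbation signs.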
Step 2: Run the tests to verify they pass
Run: pytest -q tests/test_openai_es_update.py
Expected: PASS.
Task 3: Implement trainer and workflow integration
Files:
- Create: src/rl_training/runtime/openai_es_trainer.py
- Modify: src/rl_training/experiment/registry.py
- Modify: src/rl_training/api/algorithms.py
- Modify: src/rl_training/api/__init__.py
- Modify: src/rl_training/__init__.py
Step 1: Write minimal implementation
- reuse the current continuous-control action scaling and evaluation path.
- collect mirrored perturbation rollouts synchronously for each evolution update.
- support train / eval / resume / predict through registry wiring.
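Synchronous collection of mirrored rollouts could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function names are hypothetical, the episode is a toy 1-D stand-in rather than a Gymnasium environment, the policy is linear rather than an MLP, and the tanh-based scale_action is only one plausible form of action scaling, not necessarily what the existing package does.

```python
import numpy as np


def scale_action(raw, low, high):
    """Squash an unbounded policy output into the Box bounds [low, high]."""
    return low + 0.5 * (np.tanh(raw) + 1.0) * (high - low)


def rollout_return(theta, horizon=20):
    """Toy stand-in for one episode: push a 1-D point toward the origin."""
    low, high = -2.0, 2.0  # same torque bounds as Pendulum-v1
    x, total = 1.0, 0.0
    for _ in range(horizon):
        # Deterministic linear policy: raw output is theta[0] * x + theta[1].
        action = scale_action(theta[0] * x + theta[1], low, high)
        x = x + 0.1 * action
        total -= x * x  # reward is the negative squared distance
    return total


def collect_mirrored_returns(theta, sigma, n_pairs, rng):
    """Evaluate theta + sigma * eps and theta - sigma * eps in-process.

    Rollouts run one after another, with no workers, matching the
    synchronous scope of this plan.
    """
    noise = rng.standard_normal((n_pairs, theta.size))
    returns_pos = np.array([rollout_return(theta + sigma * eps) for eps in noise])
    returns_neg = np.array([rollout_return(theta - sigma * eps) for eps in noise])
    return noise, returns_pos, returns_neg
```

Returning the noise matrix alongside both return vectors keeps everything an update needs in one place, for example `noise, rp, rn = collect_mirrored_returns(np.zeros(2), 0.1, 8, np.random.default_rng(0))`.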
Step 2: Run the tests to verify they pass
Run: pytest -q tests/test_openai_es_trainer_smoke.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py
Expected: PASS.
Task 4: Add config, example, and docs
Files:
- Create: configs/openai_es/pendulum.yaml
- Create: src/rl_training/assets/configs/openai_es/pendulum.yaml
- Create: examples/openai_es_pendulum_reference.py
- Modify: README.md
- Modify: docs/plans/2026-03-12-rl-yearly-sourcebook-design.md
Step 1: Write minimal implementation
- add a runnable openai_es Pendulum preset.
- add a reference script for a tiny synchronous run.
- update README and yearly sourcebook to mark OpenAI ES as implemented in a narrow v1 form.
Step 2: Run the tests to verify they pass
Run: pytest -q tests/test_openai_es_reference_script.py tests/test_cli.py tests/test_package_smoke.py
Expected: PASS.
Task 5: Regression verification
Files:
- Modify only if verification reveals regressions.
Step 1: Run focused regression coverage
Run: pytest -q tests/test_openai_es_update.py tests/test_openai_es_trainer_smoke.py tests/test_openai_es_reference_script.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_cli.py tests/test_package_smoke.py tests/test_ars_update.py tests/test_ars_trainer_smoke.py tests/test_ddpg_update.py tests/test_ddpg_trainer_smoke.py tests/test_td3_update.py tests/test_td3_trainer_smoke.py
Expected: PASS.
Step 2: Run full suite
Run: pytest -q
Expected: PASS.