DDPG Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Add ddpg as a first-class continuous-control algorithm with training, evaluation, prediction, CLI/config support, public API exports, examples, and tests.
Architecture: Reuse the existing continuous-control runtime patterns from td3 and sac, but implement DDPG explicitly rather than overloading TD3 internals. Add a dedicated model with one actor and a single critic, plus its own update rule, then wire both through the registry and managed API layers so that checkpoints, resume, evaluation, and prediction continue to work exactly as they do for the other algorithms.
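The registry's internal structure isn't spelled out in this plan. As a minimal sketch, assuming a hypothetical name-keyed table of hooks (AlgoEntry, register, and resolve are illustrative names, not existing code), wiring DDPG through the registry amounts to adding one entry; code that dispatches on the config's algo value then picks DDPG up without DDPG-specific branches:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class AlgoEntry:
    """Hypothetical bundle of per-algorithm hooks that the registry dispatches to."""

    train: Callable[..., Any]     # e.g. train_ddpg
    load: Callable[..., Any]      # rebuild model/algorithm from a checkpoint (also used by resume)
    evaluate: Callable[..., Any]
    predict: Callable[..., Any]


_REGISTRY: Dict[str, AlgoEntry] = {}


def register(name: str, entry: AlgoEntry) -> None:
    """Add an algorithm under its config name (the value of algo:)."""
    if name in _REGISTRY:
        raise ValueError(f"algorithm {name!r} is already registered")
    _REGISTRY[name] = entry


def resolve(name: str) -> AlgoEntry:
    """Look up the hooks for a config's algo: value, failing loudly on typos."""
    try:
        return _REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY)) or "<none>"
        raise ValueError(f"unknown algo {name!r}; known: {known}") from None


# Placeholder callables stand in for the real DDPG trainer, loader, and evaluators.
register(
    "ddpg",
    AlgoEntry(
        train=lambda config: f"trained {config['algo']}",
        load=lambda checkpoint_path: f"loaded {checkpoint_path}",
        evaluate=lambda model: f"evaluated {model}",
        predict=lambda model, obs: obs,
    ),
)
```

Keeping every hook behind one lookup is what lets checkpoint loading, resume, evaluation, and prediction stay algorithm-agnostic.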
Tech Stack: Python 3.10, PyTorch, Gymnasium, pytest, setuptools package layout
Task 1: Add failing DDPG coverage
Files:
- Create: tests/test_ddpg_update.py
- Create: tests/test_ddpg_trainer_smoke.py
- Create: tests/test_ddpg_reference_script.py
- Modify: tests/test_cli.py
- Modify: tests/test_public_api.py
- Modify: tests/test_package_api_exports.py
Step 1: Write the failing tests
Add tests that expect:
- MLPDDPGModel.actor() returns bounded actions and DDPG.update() reports named metrics
- train_ddpg() produces checkpoints and evaluation metrics
- the CLI command train --config works with a config that sets algo: ddpg
- the public API exports DDPG
- the example reference script runs successfully
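To make "bounded actions" and "named metrics" concrete, here is a minimal sketch of assertion helpers such tests could share. The helpers are hypothetical, and the metric names critic_loss and actor_loss are assumptions (the plan only requires that the metrics be named); in the real tests they would be fed the outputs of MLPDDPGModel.actor() and DDPG.update():

```python
import math
from typing import Iterable, Mapping, Sequence

# Assumed metric names; adjust to whatever DDPG.update() actually reports.
EXPECTED_METRICS: Sequence[str] = ("critic_loss", "actor_loss")


def check_update_metrics(
    metrics: Mapping[str, float], expected: Sequence[str] = EXPECTED_METRICS
) -> None:
    """Fail if an expected metric is missing or is not a finite number."""
    missing = [name for name in expected if name not in metrics]
    assert not missing, f"missing metrics: {missing}"
    for name in expected:
        value = metrics[name]
        assert isinstance(value, (int, float)) and math.isfinite(value), (
            f"{name}={value!r} is not a finite number"
        )


def check_bounded(actions: Iterable[float], low: float = -1.0, high: float = 1.0) -> None:
    """Fail if any actor output lies outside [low, high] (the tanh range by default)."""
    out_of_range = [a for a in actions if not low <= a <= high]
    assert not out_of_range, f"actions outside [{low}, {high}]: {out_of_range}"


# tanh outputs stay in [-1, 1] even for large pre-activations.
check_bounded(math.tanh(x) for x in (-50.0, 0.0, 50.0))
check_update_metrics({"critic_loss": 0.42, "actor_loss": -3.1})
```

Checking finiteness as well as presence catches a common early failure mode of actor-critic updates (NaN losses) that a key-only check would miss.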
Step 2: Run the tests to verify they fail
Run: pytest tests/test_ddpg_update.py tests/test_ddpg_trainer_smoke.py tests/test_ddpg_reference_script.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py -q
Expected: FAIL because the model, algorithm, runtime, and exports do not exist yet.
Task 2: Implement the DDPG model and algorithm
Files:
- Create: src/rl_training/models/mlp_ddpg.py
- Modify: src/rl_training/models/__init__.py
- Create: src/rl_training/algorithms/ddpg.py
- Modify: src/rl_training/algorithms/__init__.py
Step 1: Write minimal implementation
Implement:
- MLPDDPGModel with a tanh-bounded actor and a single critic
- ddpg_loss() and DDPG.update() with target actor and critic networks, a Bellman target, actor optimization, and soft (Polyak) updates of the target networks
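The pieces of the update rule can be checked independently of PyTorch. Below is a minimal framework-agnostic sketch on plain lists of floats, assuming the standard DDPG formulation; the function names are illustrative and not necessarily how ddpg_loss() should be split. In the real module these values would be tensors, next_q would come from the target critic evaluated at the target actor's action under torch.no_grad(), and soft_update would run over every parameter of the target actor and critic. The terminated flag is assumed to be Gymnasium's terminated, not truncated: Pendulum episodes end only through time-limit truncation, so those transitions should still bootstrap.

```python
from typing import List, Sequence


def bellman_targets(
    rewards: Sequence[float],
    terminated: Sequence[float],
    next_q: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """y = r + gamma * (1 - terminated) * Q_target(s', mu_target(s'))."""
    return [r + gamma * (1.0 - d) * q for r, d, q in zip(rewards, terminated, next_q)]


def critic_loss(q_pred: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between Q(s, a) and the fixed Bellman targets."""
    return sum((q - y) ** 2 for q, y in zip(q_pred, targets)) / len(targets)


def actor_loss(q_of_actor_actions: Sequence[float]) -> float:
    """The actor maximizes Q(s, mu(s)), so it minimizes the negated mean."""
    return -sum(q_of_actor_actions) / len(q_of_actor_actions)


def soft_update(
    target: Sequence[float], online: Sequence[float], tau: float = 0.005
) -> List[float]:
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


# The second transition is terminal, so it does not bootstrap from next_q.
targets = bellman_targets([1.0, 2.0], [0.0, 1.0], [10.0, 10.0], gamma=0.5)  # [6.0, 2.0]
```

Unlike TD3, there is one critic (no clipped double-Q minimum), no target policy smoothing, and the actor is updated on every step rather than with a delay, which is why keeping DDPG separate from TD3 internals keeps this rule easy to read.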
Step 2: Run focused tests
Run: pytest tests/test_ddpg_update.py -q
Expected: PASS
Task 3: Add training, evaluation, and registry wiring
Files:
- Create: src/rl_training/runtime/ddpg_trainer.py
- Modify: src/rl_training/experiment/registry.py
Step 1: Write minimal implementation
Implement:
- train_ddpg() with a replay buffer, storage of normalized actions, scaling of actions to the environment's bounds, checkpoint support, and evaluation
- exploration noise on actions during data collection, set via algo_kwargs.exploration_noise (default: 0.0)
- registry support for loading, evaluating, and predicting with ddpg
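The split between normalized and environment-scale actions is the easiest part to get wrong, so here is a minimal sketch of one collection step. Assumptions, all hypothetical: the names ReplayBuffer, add_exploration_noise, and scale_to_env; a simple FIFO buffer; and clipped Gaussian noise, which is a swapped-in choice, since the plan does not specify a noise process and the original DDPG paper used Ornstein-Uhlenbeck noise. With exploration_noise left at its default of 0.0, collection uses the deterministic actor outputs unchanged.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

# (obs, normalized_action, reward, next_obs, terminated)
Transition = Tuple[List[float], List[float], float, List[float], float]


class ReplayBuffer:
    """Minimal FIFO replay buffer; the oldest transitions are evicted at capacity."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._data: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, obs, norm_action, reward, next_obs, terminated) -> None:
        # Store the normalized action in [-1, 1], not the env-scaled one, so the
        # critic is trained on the same action range the tanh actor produces.
        self._data.append(
            (list(obs), list(norm_action), float(reward), list(next_obs), float(terminated))
        )

    def sample(self, batch_size: int) -> List[Transition]:
        return self._rng.sample(list(self._data), batch_size)

    def __len__(self) -> int:
        return len(self._data)


def add_exploration_noise(
    action: Sequence[float], noise_std: float, rng: random.Random
) -> List[float]:
    """Gaussian noise clipped back to [-1, 1]; noise_std <= 0 returns the action unchanged."""
    if noise_std <= 0.0:
        return list(action)
    return [max(-1.0, min(1.0, a + rng.gauss(0.0, noise_std))) for a in action]


def scale_to_env(
    action: Sequence[float], low: Sequence[float], high: Sequence[float]
) -> List[float]:
    """Affinely map a normalized action from [-1, 1] to the env's [low, high] box."""
    return [lo + (a + 1.0) * 0.5 * (hi - lo) for a, lo, hi in zip(action, low, high)]


# One collection step with Pendulum-like bounds (action box [-2, 2]).
rng = random.Random(0)
buffer = ReplayBuffer(capacity=1_000)
norm_action = add_exploration_noise([0.25], noise_std=0.1, rng=rng)
env_action = scale_to_env(norm_action, low=[-2.0], high=[2.0])  # what env.step() receives
buffer.add([1.0, 0.0, 0.0], norm_action, -0.5, [0.99, 0.1, 0.2], terminated=False)
```

Clipping after adding noise keeps stored actions inside the tanh range, so the critic never sees actions the actor could not produce.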
Step 2: Run focused tests
Run: pytest tests/test_ddpg_trainer_smoke.py tests/test_cli.py -q
Expected: PASS
Task 4: Expose DDPG in the package and examples
Files:
- Modify: src/rl_training/api/algorithms.py
- Modify: src/rl_training/api/__init__.py
- Modify: src/rl_training/__init__.py
- Create: configs/ddpg/pendulum.yaml
- Create: examples/ddpg_pendulum_reference.py
- Modify: README.md
Step 1: Write minimal implementation
Expose DDPG in the managed API and the top-level package, add one Pendulum config and one reference script, and update the README to list the new algorithm.
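For the Pendulum config, here is a sketch of what the parsed YAML might look like as a Python dict, plus a hypothetical helper that applies the plan's exploration_noise default of 0.0. Only algo: ddpg and algo_kwargs.exploration_noise come from this plan; Pendulum-v1 is Gymnasium's environment ID, and every other key and value is an illustrative placeholder, not a tuned setting:

```python
from typing import Any, Dict

# Hypothetical parsed form of configs/ddpg/pendulum.yaml; values are placeholders.
PENDULUM_DDPG_CONFIG: Dict[str, Any] = {
    "algo": "ddpg",
    "env_id": "Pendulum-v1",
    "total_steps": 20_000,
    "algo_kwargs": {"gamma": 0.99, "tau": 0.005, "exploration_noise": 0.1},
}


def ddpg_algo_kwargs(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return algo_kwargs for a DDPG config, filling in the documented noise default."""
    if config.get("algo") != "ddpg":
        raise ValueError(f"expected algo: ddpg, got {config.get('algo')!r}")
    # An empty "algo_kwargs:" key parses as None in YAML, hence the "or {}".
    kwargs = dict(config.get("algo_kwargs") or {})  # copy so the config is not mutated
    kwargs.setdefault("exploration_noise", 0.0)
    return kwargs
```

Since the default is 0.0, a Pendulum config that is meant to explore needs to set exploration_noise explicitly.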
Step 2: Run focused tests
Run: pytest tests/test_ddpg_reference_script.py tests/test_public_api.py tests/test_package_api_exports.py -q
Expected: PASS
Task 5: Verify end-to-end
Files:
- None (verification only)
Step 1: Run targeted DDPG coverage
Run: pytest tests/test_ddpg_update.py tests/test_ddpg_trainer_smoke.py tests/test_ddpg_reference_script.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py -q
Expected: PASS
Step 2: Run the full suite
Run: pytest -q
Expected: PASS