REDQ Implementation Plan¶
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Add the REDQ algorithm with end-to-end train/evaluate/predict/resume support for continuous-control tasks.
Architecture: Reuse the existing SAC/TQC continuous-control pipeline and add a dedicated REDQ model, algorithm, and trainer. REDQ will use an SAC-style squashed-Gaussian actor with an ensemble of scalar critics, randomized subset target selection, and configurable multiple gradient updates per environment collection step.
Tech Stack: Python, PyTorch, Gymnasium, pytest
Task 1: Define REDQ behavior with failing unit tests¶
Files: - Create: tests/test_redq_update.py - Reference: tests/test_tqc_update.py
Step 1: Write the failing test
Add tests that define: - the REDQ MLP model samples bounded actions and returns (batch, num_critics) Q-values - the random subset selector returns the requested number of critic indices without duplicates - redq_loss(...) returns named metrics - REDQ.update(...) returns one gradient step and named metrics
Step 2: Run test to verify it fails
Run: PYTHONPATH=src pytest tests/test_redq_update.py -q
Expected: FAIL because the REDQ modules do not exist yet.
Step 3: Write minimal implementation
Create the REDQ model and algorithm with only the behavior required by the tests.
Step 4: Run test to verify it passes
Run: PYTHONPATH=src pytest tests/test_redq_update.py -q
Expected: PASS
Task 2: Add REDQ training loop coverage¶
Files: - Create: src/rl_training/runtime/redq_trainer.py - Create: tests/test_redq_trainer_smoke.py - Reference: src/rl_training/runtime/sac_trainer.py - Reference: src/rl_training/runtime/tqc_trainer.py
Step 1: Write the failing test
Add a smoke test that trains REDQ for a short Pendulum-v1 run and asserts checkpoint creation plus basic metrics such as global_step and eval_return_mean.
Step 2: Run test to verify it fails
Run: PYTHONPATH=src pytest tests/test_redq_trainer_smoke.py -q
Expected: FAIL because the REDQ trainer does not exist yet.
Step 3: Write minimal implementation
Implement train_redq(...) and evaluation helpers by adapting the continuous-control trainer flow already used by SAC/TQC.
Step 4: Run test to verify it passes
Run: PYTHONPATH=src pytest tests/test_redq_trainer_smoke.py -q
Expected: PASS
Task 3: Wire registry, API, CLI-facing integration, and example coverage¶
Files: - Create: configs/redq/pendulum.yaml - Create: examples/redq_pendulum_reference.py - Create: tests/test_redq_reference_script.py - Modify: src/rl_training/models/__init__.py - Modify: src/rl_training/algorithms/__init__.py - Modify: src/rl_training/api/algorithms.py - Modify: src/rl_training/api/__init__.py - Modify: src/rl_training/__init__.py - Modify: src/rl_training/experiment/registry.py - Modify: tests/test_cli.py - Modify: tests/test_public_api.py - Modify: tests/test_package_api_exports.py - Modify: tests/test_checkpoint_workflows.py - Modify: tests/test_experiment_manager.py
Step 1: Write the failing tests
Add focused coverage for: - CLI train command with a REDQ YAML config - managed public API learn/evaluate/predict flow - package export visibility - checkpoint evaluation workflow - experiment manager spec registration - example script smoke execution
Step 2: Run tests to verify they fail
Run: PYTHONPATH=src pytest tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_redq_reference_script.py -q
Expected: FAIL because REDQ is not yet registered end-to-end.
Step 3: Write minimal implementation
Register REDQ everywhere the existing algorithms are exposed and add a small reference config/example pair for local training.
Step 4: Run tests to verify they pass
Run: PYTHONPATH=src pytest tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_redq_reference_script.py -q
Expected: PASS
Task 4: Final verification¶
Files: - Verify only
Step 1: Run targeted REDQ suite
Run: PYTHONPATH=src pytest tests/test_redq_update.py tests/test_redq_trainer_smoke.py tests/test_redq_reference_script.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py -q
Expected: PASS
Step 2: Run full test suite
Run: PYTHONPATH=src pytest -q
Expected: PASS