Noisy DQN Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Add noisy_dqn as a first-class DQN-family algorithm using parameter noise for exploration, with configs, examples, registry wiring, public API exports, and tests.
Architecture: Keep the existing DQN update rule and trainer, but introduce an MLPNoisyQNetwork model. The DQN trainer and registry select this network when config.algo == "noisy_dqn". Expose NoisyDQN both as a managed high-level API class and as a named low-level algorithm class so that exports stay clean.
Tech Stack: Python 3.10, PyTorch, Gymnasium, pytest
Task 1: Add failing Noisy DQN coverage
Files:
- Modify: tests/test_dqn_update.py
- Modify: tests/test_dqn_trainer_smoke.py
- Modify: tests/test_cli.py
- Modify: tests/test_public_api.py
- Modify: tests/test_package_api_exports.py
- Create: tests/test_noisy_dqn_reference_script.py
Step 1: Write the failing test
Add tests that expect:
- MLPNoisyQNetwork exists and outputs the correct shape
- algo: noisy_dqn can be trained via the registry and writes a checkpoint
- CLI train --config works with algo: noisy_dqn
- the managed API exports NoisyDQN
- the reference script runs successfully
Step 2: Run test to verify it fails
Run: pytest tests/test_dqn_update.py tests/test_dqn_trainer_smoke.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_noisy_dqn_reference_script.py -q
Expected: FAIL because the model, registry wiring, exports, and scripts do not exist yet.
Task 2: Implement the Noisy Q-network
Files:
- Create: src/rl_training/models/mlp_noisy_q_network.py
- Modify: src/rl_training/models/__init__.py
Step 1: Write minimal implementation
Implement:
- NoisyLinear (factorized Gaussian noise)
- MLPNoisyQNetwork with .act() matching existing Q-networks
- noise active only in train() mode; deterministic in eval() mode
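The items above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the constructor arguments (obs_dim, num_actions, hidden_dim), the sigma_init default of 0.5, the layer layout, the reset_noise() method, and the .act() signature are all assumptions, since the plan does not specify them. The real .act() should mirror whatever the existing Q-networks expose.

```python
import math

import torch
import torch.nn.functional as F
from torch import nn


class NoisyLinear(nn.Module):
    """Linear layer with factorized Gaussian parameter noise (sketch)."""

    def __init__(self, in_features: int, out_features: int, sigma_init: float = 0.5):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Learnable mean and noise-scale parameters for weights and biases.
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # Noise samples are buffers: stored with the module, never trained.
        self.register_buffer("eps_in", torch.zeros(in_features))
        self.register_buffer("eps_out", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)
        nn.init.constant_(self.weight_sigma, sigma_init * bound)
        nn.init.constant_(self.bias_sigma, sigma_init * bound)
        self.reset_noise()

    @staticmethod
    def _scaled_noise(size: int) -> torch.Tensor:
        # f(x) = sign(x) * sqrt(|x|), applied to unit Gaussian samples.
        x = torch.randn(size)
        return x.sign() * x.abs().sqrt()

    def reset_noise(self) -> None:
        # Factorized noise: one vector per input unit, one per output unit.
        self.eps_in.copy_(self._scaled_noise(self.in_features))
        self.eps_out.copy_(self._scaled_noise(self.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # train() mode: perturb the parameters with the current noise sample.
            weight = self.weight_mu + self.weight_sigma * torch.outer(self.eps_out, self.eps_in)
            bias = self.bias_mu + self.bias_sigma * self.eps_out
        else:
            # eval() mode: use the mean parameters only, so output is deterministic.
            weight, bias = self.weight_mu, self.bias_mu
        return F.linear(x, weight, bias)


class MLPNoisyQNetwork(nn.Module):
    """MLP Q-network whose last two layers are noisy (assumed layout)."""

    def __init__(self, obs_dim: int, num_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden_dim)
        self.noisy1 = NoisyLinear(hidden_dim, hidden_dim)
        self.noisy2 = NoisyLinear(hidden_dim, num_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = F.relu(self.fc1(obs))
        h = F.relu(self.noisy1(h))
        return self.noisy2(h)

    def reset_noise(self) -> None:
        self.noisy1.reset_noise()
        self.noisy2.reset_noise()

    @torch.no_grad()
    def act(self, obs: torch.Tensor) -> torch.Tensor:
        # Greedy action; in train() mode the parameter noise supplies exploration.
        return self.forward(obs).argmax(dim=-1)
```

In this sketch the noise is resampled only when reset_noise() is called, so the trainer would presumably need to call it at some cadence (for example once per update step). That is a design choice the plan leaves open.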
Step 2: Run focused tests
Run: pytest tests/test_dqn_update.py::test_mlp_noisy_q_network_forward_shape -q
Expected: PASS
Task 3: Wire into trainer, registry, and API exports
Files:
- Modify: src/rl_training/runtime/dqn_trainer.py
- Modify: src/rl_training/experiment/registry.py
- Modify: src/rl_training/algorithms/dqn.py
- Modify: src/rl_training/algorithms/__init__.py
- Modify: src/rl_training/api/algorithms.py
- Modify: src/rl_training/api/__init__.py
- Modify: src/rl_training/__init__.py
Step 1: Write minimal implementation
Implement:
- trainer network selection for noisy_dqn
- registry load/evaluate/predict for noisy_dqn
- public managed API class NoisyDQN
- low-level named algorithm class NoisyDQN (subclass of DQN) for exports
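One way the network selection and the named subclass might fit together is sketched below. Everything here is a hypothetical stand-in: the placeholder network classes, the MLPQNetwork name, the NETWORK_BY_ALGO table, the select_network_cls helper, and the algo class attribute are illustrative only and do not reflect the actual contents of dqn_trainer.py, registry.py, or dqn.py.

```python
from typing import Dict, Type


class MLPQNetwork:
    """Placeholder for the existing Q-network (the name is an assumption)."""


class MLPNoisyQNetwork:
    """Placeholder for the noisy Q-network added in Task 2."""


# Hypothetical lookup table: algorithm name -> network class.
NETWORK_BY_ALGO: Dict[str, Type] = {
    "dqn": MLPQNetwork,
    "noisy_dqn": MLPNoisyQNetwork,
}


def select_network_cls(algo: str) -> Type:
    """Return the network class for an algo name, failing loudly if unknown."""
    try:
        return NETWORK_BY_ALGO[algo]
    except KeyError:
        raise ValueError(f"unsupported DQN-family algo: {algo!r}") from None


class DQN:
    """Stand-in for the low-level algorithm class; the update rule is omitted."""

    algo = "dqn"

    def __init__(self) -> None:
        self.network_cls = select_network_cls(self.algo)


class NoisyDQN(DQN):
    """Same update rule as DQN; only the algo name, and so the network, differs."""

    algo = "noisy_dqn"
```

Keeping NoisyDQN as a thin subclass means the DQN update logic stays in one place, and the only behavioral difference is which network the trainer builds.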
Step 2: Run focused tests
Run: pytest tests/test_dqn_trainer_smoke.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py -q
Expected: PASS
Task 4: Add config + example + README mention
Files:
- Create: configs/noisy_dqn/cartpole.yaml
- Create: examples/noisy_dqn_cartpole_reference.py
- Modify: README.md
Step 1: Write minimal implementation
Add a CartPole config and a reference script consistent with existing examples, and mention Noisy DQN in README’s algorithm list.
Step 2: Run focused tests
Run: pytest tests/test_noisy_dqn_reference_script.py -q
Expected: PASS
Task 5: Verify end-to-end
Files:
- Verify only
Step 1: Run targeted coverage
Run: pytest tests/test_dqn_update.py tests/test_dqn_trainer_smoke.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_noisy_dqn_reference_script.py -q
Expected: PASS
Step 2: Run full suite
Run: pytest -q
Expected: PASS