
Noisy DQN Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Add noisy_dqn as a first-class DQN-family algorithm using parameter noise for exploration, with configs, examples, registry wiring, public API exports, and tests.

Architecture: Keep the existing DQN update rule and trainer, but introduce an MLPNoisyQNetwork model. The DQN trainer and registry select this network when config.algo == "noisy_dqn". Expose NoisyDQN both as a managed high-level API class and as a named low-level algorithm class, so that exports stay clean.

Tech Stack: Python 3.10, PyTorch, Gymnasium, pytest


Task 1: Add failing Noisy DQN coverage

Files:
- Modify: tests/test_dqn_update.py
- Modify: tests/test_dqn_trainer_smoke.py
- Modify: tests/test_cli.py
- Modify: tests/test_public_api.py
- Modify: tests/test_package_api_exports.py
- Create: tests/test_noisy_dqn_reference_script.py

Step 1: Write the failing test

Add tests that expect:
- MLPNoisyQNetwork exists and outputs the correct shape
- algo: noisy_dqn can be trained via the registry and writes a checkpoint
- CLI train --config works with algo: noisy_dqn
- the managed API exports NoisyDQN
- the reference script runs successfully

Step 2: Run test to verify it fails

Run: pytest tests/test_dqn_update.py tests/test_dqn_trainer_smoke.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_noisy_dqn_reference_script.py -q

Expected: FAIL because the model, registry wiring, exports, and scripts do not exist yet.

Task 2: Implement the Noisy Q-network

Files:
- Create: src/rl_training/models/mlp_noisy_q_network.py
- Modify: src/rl_training/models/__init__.py

Step 1: Write minimal implementation

Implement:
- NoisyLinear (factorized Gaussian noise)
- MLPNoisyQNetwork with .act() matching existing Q-networks
- noise active only in train() mode; deterministic in eval() mode
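The three items above could be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the constructor arguments (obs_dim, num_actions, hidden_dim), the sigma_init default of 0.5, the reset_noise() method, and the exact .act() signature are all assumptions, since the plan does not specify them.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyLinear(nn.Module):
    """Linear layer with factorized Gaussian parameter noise (sketch)."""

    def __init__(self, in_features: int, out_features: int, sigma_init: float = 0.5):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Learnable mean and scale for weights and biases.
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # Noise vectors are buffers: stored with the module, never trained.
        self.register_buffer("eps_in", torch.zeros(in_features))
        self.register_buffer("eps_out", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)
        nn.init.constant_(self.weight_sigma, sigma_init * bound)
        nn.init.constant_(self.bias_sigma, sigma_init * bound)
        self.reset_noise()

    @staticmethod
    def _scaled_noise(size: int) -> torch.Tensor:
        # f(x) = sign(x) * sqrt(|x|), applied to standard Gaussian samples.
        x = torch.randn(size)
        return x.sign() * x.abs().sqrt()

    def reset_noise(self) -> None:
        # Factorized noise: one vector per input unit, one per output unit.
        self.eps_in.copy_(self._scaled_noise(self.in_features))
        self.eps_out.copy_(self._scaled_noise(self.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:
            # eval() mode: use the mean parameters only, so output is deterministic.
            return F.linear(x, self.weight_mu, self.bias_mu)
        # train() mode: weight noise is the outer product of the two noise vectors.
        weight = self.weight_mu + self.weight_sigma * torch.outer(self.eps_out, self.eps_in)
        bias = self.bias_mu + self.bias_sigma * self.eps_out
        return F.linear(x, weight, bias)


class MLPNoisyQNetwork(nn.Module):
    """Small MLP Q-network whose last two layers are noisy (sketch)."""

    def __init__(self, obs_dim: int, num_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden_dim)
        self.noisy1 = NoisyLinear(hidden_dim, hidden_dim)
        self.noisy2 = NoisyLinear(hidden_dim, num_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = F.relu(self.fc1(obs))
        h = F.relu(self.noisy1(h))
        return self.noisy2(h)

    def reset_noise(self) -> None:
        self.noisy1.reset_noise()
        self.noisy2.reset_noise()

    @torch.no_grad()
    def act(self, obs: torch.Tensor) -> torch.Tensor:
        # Greedy action per observation; the real .act() signature is an assumption.
        return self.forward(obs).argmax(dim=-1)
```

In this sketch the noise is resampled only when reset_noise() is called, so the caller (for example the trainer) would decide how often to refresh it. Switching the module with eval() is enough to get deterministic Q-values for evaluation and prediction.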

Step 2: Run focused tests

Run: pytest tests/test_dqn_update.py::test_mlp_noisy_q_network_forward_shape -q

Expected: PASS

Task 3: Wire into trainer, registry, and API exports

Files:
- Modify: src/rl_training/runtime/dqn_trainer.py
- Modify: src/rl_training/experiment/registry.py
- Modify: src/rl_training/algorithms/dqn.py
- Modify: src/rl_training/algorithms/__init__.py
- Modify: src/rl_training/api/algorithms.py
- Modify: src/rl_training/api/__init__.py
- Modify: src/rl_training/__init__.py

Step 1: Write minimal implementation

Implement:
- trainer network selection for noisy_dqn
- registry load/evaluate/predict for noisy_dqn
- public managed API class NoisyDQN
- low-level named algorithm class NoisyDQN (subclass of DQN) for exports
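As a rough illustration of the wiring described above, the sketch below shows one way to dispatch on the algo name and to define a named subclass. Everything here is a stand-in: TrainConfig, NETWORK_BY_ALGO, select_network_name, the algo_name attribute, and the name MLPQNetwork for the existing network are hypothetical, and the real trainer and registry may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Hypothetical config object; only the `algo` field comes from the plan.
    algo: str = "dqn"


class DQN:
    """Stand-in for the existing low-level DQN algorithm class."""

    algo_name = "dqn"


class NoisyDQN(DQN):
    """Named subclass: same update rule, distinct name for exports."""

    algo_name = "noisy_dqn"


# Hypothetical mapping from algo name to the Q-network class name to build.
NETWORK_BY_ALGO = {
    "dqn": "MLPQNetwork",
    "noisy_dqn": "MLPNoisyQNetwork",
}


def select_network_name(config: TrainConfig) -> str:
    """Return the network class name for the configured algorithm."""
    try:
        return NETWORK_BY_ALGO[config.algo]
    except KeyError:
        raise ValueError(f"unsupported algo: {config.algo!r}") from None
```

Keeping the selection in a single lookup means the trainer and the registry's load/evaluate/predict paths can share one source of truth for which network belongs to which algo name.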

Step 2: Run focused tests

Run: pytest tests/test_dqn_trainer_smoke.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py -q

Expected: PASS

Task 4: Add config + example + README mention

Files:
- Create: configs/noisy_dqn/cartpole.yaml
- Create: examples/noisy_dqn_cartpole_reference.py
- Modify: README.md

Step 1: Write minimal implementation

Add a CartPole config and a reference script consistent with the existing examples, and mention Noisy DQN in the README's algorithm list.

Step 2: Run focused tests

Run: pytest tests/test_noisy_dqn_reference_script.py -q

Expected: PASS

Task 5: Verify end-to-end

Files: - Verify only

Step 1: Run targeted coverage

Run: pytest tests/test_dqn_update.py tests/test_dqn_trainer_smoke.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_noisy_dqn_reference_script.py -q

Expected: PASS

Step 2: Run full suite

Run: pytest -q

Expected: PASS