DrQ V1 Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Add a narrow but honest DrQ baseline for pixel-observation, continuous-action environments by combining image augmentation with a SAC-style stochastic actor-critic update.

Architecture: Keep this release deliberately small and distinct from the existing DrQ-v2 lane. Reuse the package's image wrappers and NatureCNN encoder path, but implement a separate DrQ algorithm/model/trainer using a squashed Gaussian actor, twin critics, target networks, and random-shift augmentation on image batches. Explicitly do not implement discrete-action DrQ, contrastive auxiliary losses, or distributed data collection in this batch.
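
The random-shift augmentation named above can be sketched as follows. This is a minimal NumPy illustration rather than the package's implementation: the function name random_shift, the default pad of 4 pixels, and the replicate ("edge") padding are all assumptions, and the real code would presumably operate on PyTorch tensors.

```python
from typing import Optional

import numpy as np


def random_shift(
    images: np.ndarray,
    pad: int = 4,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Randomly shift each image in a (B, C, H, W) batch by up to `pad` pixels.

    Hypothetical helper: pad the spatial axes by replicating edge pixels,
    then crop a window of the original size at a random offset per image.
    """
    if rng is None:
        rng = np.random.default_rng()
    batch, _, height, width = images.shape
    # Replicate-pad only the two spatial axes (H and W).
    padded = np.pad(images, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="edge")
    # One independent (top, left) crop offset per image, each in [0, 2 * pad].
    tops = rng.integers(0, 2 * pad + 1, size=batch)
    lefts = rng.integers(0, 2 * pad + 1, size=batch)
    out = np.empty_like(images)
    for i in range(batch):
        out[i] = padded[i, :, tops[i]:tops[i] + height, lefts[i]:lefts[i] + width]
    return out
```

In a DrQ-style update the same helper would be applied independently to the current and next observation batches before they reach the encoders, so each batch receives its own random offsets.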

Tech Stack: Python 3.10, PyTorch, Gymnasium, pytest, existing rl_training config/checkpoint/API stack.


Task 1: Add failing DrQ coverage

Files:
- Create: tests/test_drq_update.py
- Create: tests/test_drq_trainer_smoke.py
- Create: tests/test_drq_reference_script.py
- Modify: tests/test_package_api_exports.py
- Modify: tests/test_public_api.py
- Modify: tests/test_checkpoint_workflows.py

Step 1: Write the failing tests
- drq_loss() returns named SAC-style metrics for the critic, actor, target-Q, and entropy terms.
- DrQ.update() returns named metrics on image batches and tracks the update count.
- train_drq() writes a checkpoint and evaluation metrics on a tiny rendered continuous env using pixel wrappers.
- root/api/algorithms package exports include DrQ.
- the reference script runs as a smoke command.
- checkpoint workflows can evaluate a saved drq checkpoint.

Step 2: Run the tests to verify they fail
Run: pytest -q tests/test_drq_update.py tests/test_drq_trainer_smoke.py tests/test_drq_reference_script.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_checkpoint_workflows.py
Expected: FAIL with missing drq modules / exports.

Task 2: Implement image SAC model and DrQ algorithm

Files:
- Create: src/rl_training/models/cnn/drq.py
- Create: src/rl_training/algorithms/drq.py
- Modify: src/rl_training/models/cnn/__init__.py
- Modify: src/rl_training/models/__init__.py

Step 1: Write the minimal implementation
- CNNDrQModel reuses NatureCNN for separate actor and critic encoders.
- the actor outputs tanh-squashed Gaussian actions with the matching log-prob correction.
- twin critics score encoded observation-action pairs.
- DrQ mirrors the existing SAC update flow but applies random-shift augmentation to both current and next observations.
- keep the v1 fixed-entropy-coefficient behavior (alpha from config) rather than automatic temperature tuning.
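
The log-prob correction and the fixed-alpha target mentioned in the list above can be sketched in plain Python for a single action vector. This is an illustrative sketch only: the names softplus, tanh_log_det, squashed_gaussian_sample, and sac_target are hypothetical, the target shown is the standard SAC form assumed here rather than taken from the package, and a real implementation would use batched PyTorch tensors.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def tanh_log_det(u: float) -> float:
    """log(1 - tanh(u)^2), in the stable form 2 * (log 2 - u - softplus(-2u))."""
    return 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))


def squashed_gaussian_sample(mean, log_std, rng=random):
    """Sample a tanh-squashed action and its log-probability.

    `mean` and `log_std` are per-dimension lists for one action vector.
    """
    actions, log_prob = [], 0.0
    for mu, ls in zip(mean, log_std):
        std = math.exp(ls)
        u = rng.gauss(mu, std)  # pre-squash Gaussian sample
        # Diagonal Gaussian log-density of u.
        log_prob += -0.5 * ((u - mu) / std) ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
        # Change-of-variables correction for a = tanh(u).
        log_prob -= tanh_log_det(u)
        actions.append(math.tanh(u))
    return actions, log_prob


def sac_target(reward, done, q1_next, q2_next, next_log_prob, gamma, alpha):
    """Assumed SAC-style target: clipped double-Q minus a fixed-alpha entropy term."""
    soft_value = min(q1_next, q2_next) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * soft_value


# Usage: a 1D action, as in the pixel Pendulum setting.
action, logp = squashed_gaussian_sample([0.0], [-0.5], rng=random.Random(0))
target = sac_target(1.0, 0.0, 2.0, 3.0, next_log_prob=-1.0, gamma=0.99, alpha=0.1)
```

Because alpha is read from config and held fixed in v1, no temperature loss or alpha optimizer appears in this sketch.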

Step 2: Run the tests to verify they pass
Run: pytest -q tests/test_drq_update.py
Expected: PASS.

Task 3: Implement trainer and workflow integration

Files:
- Create: src/rl_training/runtime/drq_trainer.py
- Modify: src/rl_training/experiment/registry.py
- Modify: src/rl_training/api/algorithms.py
- Modify: src/rl_training/api/__init__.py
- Modify: src/rl_training/__init__.py
- Modify: src/rl_training/algorithms/__init__.py

Step 1: Write the minimal implementation
- train_drq() follows the existing continuous-control off-policy trainer structure.
- support only channel-first image observations and 1D continuous actions.
- collect normalized actions in [-1, 1] from the stochastic actor and scale them to the env bounds.
- add load/evaluate/predict registry hooks for saved drq checkpoints.
- add the managed API class DrQ and export wiring across the root/api/algorithms surfaces.
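
The action-scaling bullet above amounts to an affine map from the actor's [-1, 1] range onto the environment's action bounds. A minimal sketch, assuming per-dimension low/high bounds are available as plain sequences; the helper name scale_action and the defensive clamp are hypothetical and not taken from the package.

```python
def scale_action(normalized, low, high):
    """Map each action component from [-1, 1] to [low, high] by affine rescaling."""
    scaled = []
    for a, lo, hi in zip(normalized, low, high):
        # Clamp first to guard against small numerical overshoot past +/-1.
        a = max(-1.0, min(1.0, a))
        scaled.append(lo + 0.5 * (a + 1.0) * (hi - lo))
    return scaled


# Usage with symmetric bounds of [-2, 2], as an example torque range.
env_action = scale_action([0.5], low=[-2.0], high=[2.0])
```

Keeping the actor output normalized and scaling only at the environment boundary means the replay buffer and critics always see actions in the same [-1, 1] range, regardless of the env's bounds.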

Step 2: Run the tests to verify they pass
Run: pytest -q tests/test_drq_trainer_smoke.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_checkpoint_workflows.py
Expected: PASS.

Task 4: Add config assets, reference example, and docs

Files:
- Create: configs/drq/pendulum_pixels.yaml
- Create: src/rl_training/assets/configs/drq/pendulum_pixels.yaml
- Create: examples/drq_pendulum_reference.py
- Modify: README.md
- Modify: docs/plans/2026-03-12-rl-yearly-sourcebook-design.md

Step 1: Write the minimal implementation
- add a runnable pixel Pendulum config for drq.
- add a reference script that trains, evaluates, and exports a tiny drq run.
- update the README and yearly sourcebook to distinguish DrQ from DrQ-v2 and mark DrQ as an implemented, narrow v1.

Step 2: Run the tests to verify they pass
Run: pytest -q tests/test_drq_reference_script.py tests/test_package_smoke.py tests/test_cli.py
Expected: PASS.

Task 5: Regression verification

Files:
- Modify only if verification reveals regressions.

Step 1: Run focused regression coverage
Run: pytest -q tests/test_drq_update.py tests/test_drq_trainer_smoke.py tests/test_drq_reference_script.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_checkpoint_workflows.py tests/test_drqv2_update.py tests/test_drqv2_trainer_smoke.py tests/test_sac_update.py tests/test_sac_trainer_smoke.py
Expected: PASS.

Step 2: Run the full suite
Run: pytest -q
Expected: PASS.