跳转至

CURL V1 Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Add a narrow but honest CURL baseline for pixel-observation, continuous-action environments by extending the current DrQ lane with a contrastive auxiliary representation loss.

Architecture: Keep this release deliberately small and architecture-compatible with the existing image-control path. Reuse the current pixel wrappers, replay-buffer runtime, action-scaling path, and SAC-style actor-critic update shape from DrQ, but introduce a dedicated CURL model/algorithm/trainer so the auxiliary contrastive encoder update is explicit. Implement a CNN-backed actor, twin critics, target networks, random-shift augmentation, and an InfoNCE-style contrastive loss over two augmented views of the same observation. Explicitly do not implement Atari/discrete-action CURL, offline pixel datasets, or the broader Decision Transformer / world-model stack in this batch.

Tech Stack: Python 3.10, PyTorch, Gymnasium, pytest, existing rl_training config/checkpoint/API stack.


Task 1: Add failing CURL coverage

Files: - Create: tests/test_curl_update.py - Create: tests/test_curl_trainer_smoke.py - Create: tests/test_curl_reference_script.py - Modify: tests/test_package_api_exports.py - Modify: tests/test_public_api.py - Modify: tests/test_checkpoint_workflows.py - Modify: tests/test_experiment_manager.py - Modify: tests/test_cli.py - Modify: tests/test_package_smoke.py

Step 1: Write the failing test - CNNCURLModel.sample_actions() returns policy outputs for pixel observations and continuous actions. - curl_loss() returns named critic / actor / contrastive metrics. - CURL.update() returns metrics including curl_loss and tracks update count. - train_curl() writes a checkpoint and evaluation metrics on a tiny render-backed pixel env. - root/api/algorithms package exports include CURL. - checkpoint workflows can evaluate and resume a saved curl checkpoint. - packaged config resolves outside repo root and reference script runs as smoke command.

Step 2: Run test to verify it fails Run: pytest -q tests/test_curl_update.py tests/test_curl_trainer_smoke.py tests/test_curl_reference_script.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_cli.py tests/test_package_smoke.py Expected: FAIL with missing curl modules / exports.

Task 2: Implement CURL model and algorithm

Files: - Create: src/rl_training/models/cnn/curl.py - Create: src/rl_training/algorithms/curl.py - Modify: src/rl_training/models/cnn/__init__.py - Modify: src/rl_training/models/__init__.py

Step 1: Write minimal implementation - CNNCURLModel with actor encoder, critic encoder, twin critics, and a contrastive projection head. - sample_actions() and q_values() mirror the current image continuous-control APIs. - curl_embeddings() exposes projected critic features for the auxiliary contrastive loss. - CURL.update() performs critic, actor, and contrastive auxiliary updates while keeping target-network synchronization explicit. - serialize optimizer state, target model state, entropy coefficient, and update counters.

Step 2: Run tests to verify it passes Run: pytest -q tests/test_curl_update.py Expected: PASS.

Task 3: Implement trainer and registry integration

Files: - Create: src/rl_training/runtime/curl_trainer.py - Modify: src/rl_training/experiment/registry.py - Modify: src/rl_training/api/algorithms.py - Modify: src/rl_training/api/__init__.py - Modify: src/rl_training/__init__.py - Modify: src/rl_training/algorithms/__init__.py

Step 1: Write minimal implementation - train_curl() reuses the current off-policy replay-buffer training shape for pixel continuous envs only. - support train / eval / resume / predict through registry load/evaluate/predict hooks. - add managed API class CURL and export wiring across root/api/algorithms surfaces.

Step 2: Run tests to verify it passes Run: pytest -q tests/test_curl_trainer_smoke.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py Expected: PASS.

Task 4: Add config assets, example, and docs

Files: - Create: configs/curl/pendulum_pixels.yaml - Create: src/rl_training/assets/configs/curl/pendulum_pixels.yaml - Create: examples/curl_pendulum_reference.py - Modify: README.md - Modify: docs/plans/2026-03-12-rl-yearly-sourcebook-design.md

Step 1: Write minimal implementation - add a runnable curl pixel-control preset. - add a reference script for a tiny curl run. - update README and yearly sourcebook to mark CURL as implemented narrow v1 and keep scope explicit.

Step 2: Run tests to verify it passes Run: pytest -q tests/test_curl_reference_script.py tests/test_cli.py tests/test_package_smoke.py Expected: PASS.

Task 5: Regression verification

Files: - Modify only if verification reveals regressions.

Step 1: Run focused regression coverage Run: pytest -q tests/test_curl_update.py tests/test_curl_trainer_smoke.py tests/test_curl_reference_script.py tests/test_package_api_exports.py tests/test_public_api.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_cli.py tests/test_package_smoke.py tests/test_drq_update.py tests/test_drq_trainer_smoke.py tests/test_drq_reference_script.py Expected: PASS.

Step 2: Run full suite Run: pytest -q Expected: PASS.