CQL Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Add the Conservative Q-Learning (CQL) algorithm with offline training, checkpoint evaluation, prediction, and resume support for continuous-control tasks.
Architecture: Reuse the existing MLPSACModel actor/twin-critic stack and add a dedicated CQL algorithm plus offline train_cql(...) trainer. Reuse the deterministic offline dataset helpers introduced for IQL/TD3+BC, and reuse SAC-style policy evaluation so the first version stays aligned with the project’s existing continuous-control and checkpointing architecture.
Tech Stack: Python, PyTorch, Gymnasium, pytest
Task 1: Define CQL update behavior with failing tests
Files:
- Create: tests/test_cql_update.py
- Reference: tests/test_sac_update.py
- Reference: tests/test_iql_update.py
- Reference: tests/test_td3_bc_update.py
Step 1: Write the failing test
Add tests that define:
- cql_loss(...) returns named metrics
- invalid cql_alpha / num_cql_samples values fail fast
- CQL.update(...) returns one gradient step and named metrics
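The fail-fast requirement above could be met by a small validation helper along these lines. This is a minimal sketch, not the project's code: the function name validate_cql_hyperparams and the exact validity rules (finite, non-negative cql_alpha; integer num_cql_samples of at least 1) are assumptions, since the plan does not say which values count as invalid.

```python
import math


def validate_cql_hyperparams(cql_alpha: float, num_cql_samples: int) -> None:
    """Raise ValueError early for hyperparameters assumed to be invalid.

    Hypothetical helper: the plan only says invalid values must fail fast,
    so the concrete rules below are illustrative assumptions.
    """
    # cql_alpha scales the conservative penalty; assume it must be finite and >= 0.
    if not math.isfinite(cql_alpha) or cql_alpha < 0.0:
        raise ValueError(f"cql_alpha must be finite and >= 0, got {cql_alpha!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(num_cql_samples, bool) or not isinstance(num_cql_samples, int):
        raise ValueError(
            f"num_cql_samples must be an int, got {type(num_cql_samples).__name__}"
        )
    # At least one sampled action is needed to form the penalty term.
    if num_cql_samples < 1:
        raise ValueError(f"num_cql_samples must be >= 1, got {num_cql_samples}")
```

A test in tests/test_cql_update.py could then call the real entry point with each bad value and assert that it raises before any gradient step is taken.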
Step 2: Run test to verify it fails
Run: PYTHONPATH=src pytest tests/test_cql_update.py -q
Expected: FAIL because the CQL algorithm module does not exist yet.
Step 3: Write minimal implementation
Create the CQL algorithm by reusing MLPSACModel, standard SAC targets, and a conservative critic penalty over policy-sampled and uniform random actions.
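To make the conservative critic penalty concrete, here is a framework-free sketch for a single state. It assumes the common log-sum-exp form of the penalty and leaves out importance-weight corrections for the sampled actions. The function names and the scalar inputs are illustrative only; an actual implementation would operate on batched PyTorch tensors produced by the twin critics.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) for a non-empty sequence."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def conservative_penalty(
    q_policy: Sequence[float],
    q_random: Sequence[float],
    q_data: float,
    cql_alpha: float,
) -> float:
    """Simplified CQL penalty for one state and one critic (hypothetical helper).

    q_policy: critic values at actions sampled from the current policy.
    q_random: critic values at uniformly sampled random actions.
    q_data:   critic value at the action stored in the offline dataset.
    """
    # Pool the out-of-distribution action values, then push them down
    # relative to the value of the dataset action.
    sampled = list(q_policy) + list(q_random)
    return cql_alpha * (logsumexp(sampled) - q_data)
```

The penalty grows when the critic assigns high values to sampled actions and shrinks when the dataset action is valued highly. This term would be added to the standard SAC critic loss for each of the two critics.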
Step 4: Run test to verify it passes
Run: PYTHONPATH=src pytest tests/test_cql_update.py -q
Expected: PASS
Task 2: Add offline CQL trainer coverage
Files:
- Create: src/rl_training/runtime/cql_trainer.py
- Create: tests/test_cql_trainer_smoke.py
- Reference: src/rl_training/runtime/sac_trainer.py
- Reference: src/rl_training/runtime/iql_trainer.py
- Reference: src/rl_training/runtime/td3_bc_trainer.py
Step 1: Write the failing test
Add a smoke test that trains CQL for a short offline Pendulum-v1 run using the deterministic dataset recipe, then asserts checkpoint creation and basic metrics.
Step 2: Run test to verify it fails
Run: PYTHONPATH=src pytest tests/test_cql_trainer_smoke.py -q
Expected: FAIL because train_cql(...) does not exist yet.
Step 3: Write minimal implementation
Implement the offline trainer loop where total_timesteps means gradient steps, and reuse the existing deterministic offline dataset helper and SAC evaluation path.
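The key point of this step is that total_timesteps counts gradient steps rather than environment interactions, because an offline trainer never steps an environment during training. A minimal sketch of that loop shape follows. The name run_offline_updates, its parameters, and the list-based dataset are assumptions for illustration; the real train_cql(...) would also handle checkpointing, evaluation, and resume.

```python
import random
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def run_offline_updates(
    dataset: Sequence[T],
    update_fn: Callable[[List[T]], Dict[str, float]],
    total_timesteps: int,
    batch_size: int,
    seed: int = 0,
) -> List[Dict[str, float]]:
    """Run exactly `total_timesteps` gradient steps over a fixed dataset.

    Hypothetical helper: `update_fn` stands in for something like CQL.update
    and is expected to return a dict of named metrics for one step.
    """
    if total_timesteps < 0 or batch_size < 1 or len(dataset) == 0:
        raise ValueError("need total_timesteps >= 0, batch_size >= 1, non-empty dataset")
    rng = random.Random(seed)  # seeded so batch order is reproducible
    history: List[Dict[str, float]] = []
    for _ in range(total_timesteps):
        # Sample a minibatch with replacement from the offline transitions.
        batch = [dataset[rng.randrange(len(dataset))] for _ in range(batch_size)]
        history.append(update_fn(batch))
    return history
```

With this shape, a smoke test can pass a small total_timesteps and assert that the number of recorded metric dicts matches it.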
Step 4: Run test to verify it passes
Run: PYTHONPATH=src pytest tests/test_cql_trainer_smoke.py -q
Expected: PASS
Task 3: Wire CQL into registry, API, CLI, examples, and checkpoint workflows
Files:
- Create: configs/cql/pendulum.yaml
- Create: examples/cql_pendulum_reference.py
- Create: tests/test_cql_reference_script.py
- Modify: src/rl_training/algorithms/__init__.py
- Modify: src/rl_training/api/algorithms.py
- Modify: src/rl_training/api/__init__.py
- Modify: src/rl_training/__init__.py
- Modify: src/rl_training/experiment/registry.py
- Modify: tests/test_cli.py
- Modify: tests/test_public_api.py
- Modify: tests/test_package_api_exports.py
- Modify: tests/test_checkpoint_workflows.py
- Modify: tests/test_experiment_manager.py
Step 1: Write the failing tests
Add focused coverage for:
- CLI train command with a CQL YAML config
- managed public API learn/evaluate/predict flow
- package export visibility
- checkpoint evaluation and resume workflow
- experiment manager spec registration
- example script smoke execution
Step 2: Run tests to verify they fail
Run: PYTHONPATH=src pytest tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_cql_reference_script.py -q
Expected: FAIL because CQL is not yet registered end-to-end.
Step 3: Write minimal implementation
Register CQL in all public surfaces and add a compact reference config/example using the deterministic offline dataset recipe.
Step 4: Run tests to verify they pass
Run: PYTHONPATH=src pytest tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_cql_reference_script.py -q
Expected: PASS
Task 4: Final verification
Files: - Verify only
Step 1: Run targeted CQL suite
Run: PYTHONPATH=src pytest tests/test_cql_update.py tests/test_cql_trainer_smoke.py tests/test_cql_reference_script.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py -q
Expected: PASS
Step 2: Run full test suite
Run: PYTHONPATH=src pytest -q
Expected: PASS