IQL Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Add the IQL algorithm with offline training, checkpoint evaluation, prediction, and resume support for continuous-control tasks.

Architecture: Implement a dedicated MLPIQLModel, IQL algorithm, and offline train_iql(...) trainer that samples from TransitionDataset instead of the online replay buffer. Keep the first version narrow by generating a deterministic offline dataset from the configured environment and storing the dataset recipe in algo_kwargs, so training, resume, CLI, and examples all work without introducing a full external dataset ingestion layer yet.

Tech Stack: Python, PyTorch, Gymnasium, pytest


Task 1: Define IQL model and update behavior with failing tests

Files:
- Create: tests/test_iql_update.py
- Reference: tests/test_sac_update.py
- Reference: tests/test_redq_update.py

Step 1: Write the failing test

Add tests that define the following behavior:
- the IQL MLP model samples bounded actions and returns twin-Q plus value outputs
- iql_loss(...) returns named metrics
- IQL.update(...) performs one gradient step and returns named metrics
- invalid expectile / beta values fail fast at construction
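The fail-fast requirement above can be sketched as a small validated container. This is only an illustration: the class name `IQLHyperparams`, the default values, and the accepted ranges (expectile strictly between 0 and 1, beta strictly positive) are assumptions, since the plan does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IQLHyperparams:
    """Hypothetical container; the plan does not name the real class or fields."""

    expectile: float = 0.7  # assumed valid range: strictly between 0 and 1
    beta: float = 3.0       # assumed valid range: strictly positive

    def __post_init__(self) -> None:
        # Reject bad values at construction time rather than mid-training.
        # The negated comparisons also reject NaN.
        if not 0.0 < self.expectile < 1.0:
            raise ValueError(f"expectile must be in (0, 1), got {self.expectile}")
        if not self.beta > 0.0:
            raise ValueError(f"beta must be positive, got {self.beta}")
```

A test for this behavior would construct the object with an out-of-range value and assert that `ValueError` is raised before any network or optimizer is built.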

Step 2: Run test to verify it fails

Run: PYTHONPATH=src pytest tests/test_iql_update.py -q

Expected: FAIL because the IQL modules do not exist yet.

Step 3: Write minimal implementation

Create the IQL model and algorithm with the smallest implementation that satisfies the tests.
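For orientation, the two quantities that make IQL depend on expectile and beta can be written without any framework. The sketch below is plain Python rather than the PyTorch code the plan calls for, and the function names and the `max_weight` cap are assumptions, not part of the plan.

```python
import math
from typing import Sequence


def expectile_loss(diffs: Sequence[float], expectile: float) -> float:
    """Mean asymmetric squared error over diffs = target_q - value."""
    total = 0.0
    for u in diffs:
        # Positive errors (value below the Q target) are weighted by
        # `expectile`, negative errors by `1 - expectile`.
        weight = expectile if u > 0 else 1.0 - expectile
        total += weight * u * u
    return total / len(diffs)


def advantage_weight(advantage: float, beta: float, max_weight: float = 100.0) -> float:
    """Exponentiated advantage that weights the policy's log-likelihood term."""
    scaled = beta * advantage
    # Cap before exponentiating so large advantages cannot overflow.
    if scaled >= math.log(max_weight):
        return max_weight
    return math.exp(scaled)
```

With an expectile above 0.5, underestimates of the Q target are penalized more than overestimates, which pushes the value function toward the upper range of in-dataset action values.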

Step 4: Run test to verify it passes

Run: PYTHONPATH=src pytest tests/test_iql_update.py -q

Expected: PASS

Task 2: Add offline trainer coverage with deterministic dataset generation

Files:
- Create: src/rl_training/runtime/iql_trainer.py
- Create: tests/test_iql_trainer_smoke.py
- Modify: tests/test_offline_dataset.py

Step 1: Write the failing tests

Add smoke coverage that trains IQL on Pendulum-v1 using a deterministic offline dataset recipe from algo_kwargs, then asserts that a checkpoint is created and that basic metrics are reported. If the trainer needs it, also add a small offline-dataset test proving that continuous float actions keep their shape in sampled batches.
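What "deterministic offline dataset recipe" means in practice can be sketched as follows: the same recipe must always produce the same transitions. The recipe keys (`seed`, `num_transitions`, `episode_length`) and the toy dynamics are hypothetical stand-ins; the real builder would roll out the configured Gymnasium environment instead.

```python
import random
from typing import Dict, List, Tuple

# (obs, action, reward, next_obs, done)
Transition = Tuple[float, float, float, float, bool]


def build_offline_dataset(recipe: Dict[str, int]) -> List[Transition]:
    """Roll out a seeded uniform-random policy on a toy 1-D system."""
    rng = random.Random(recipe["seed"])  # private RNG, global state untouched
    transitions: List[Transition] = []
    obs, step = rng.uniform(-1.0, 1.0), 0
    while len(transitions) < recipe["num_transitions"]:
        action = rng.uniform(-1.0, 1.0)      # normalized action in [-1, 1]
        next_obs = 0.9 * obs + 0.1 * action  # toy dynamics, not Pendulum-v1
        reward = -(next_obs ** 2)
        step += 1
        done = step >= recipe["episode_length"]
        transitions.append((obs, action, reward, next_obs, done))
        if done:
            # Start a new episode from a fresh seeded initial state.
            obs, step = rng.uniform(-1.0, 1.0), 0
        else:
            obs = next_obs
    return transitions
```

Because the recipe fully determines the data, storing only the recipe is enough for a resumed run to rebuild an identical dataset.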

Step 2: Run tests to verify they fail

Run: PYTHONPATH=src pytest tests/test_iql_trainer_smoke.py tests/test_offline_dataset.py -q

Expected: FAIL because train_iql(...) and any helper behavior do not exist yet.

Step 3: Write minimal implementation

Implement:
- a deterministic offline dataset builder inside the trainer
- normalized-action handling for continuous actions
- offline update loop where total_timesteps means gradient steps
- evaluation helper for checkpoint workflows
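Two of the items above, action normalization and counting total_timesteps in gradient steps, can be sketched in plain Python. The helper names and signatures are assumptions for illustration; the real trainer would operate on tensors and a TransitionDataset.

```python
import random
from typing import Callable, Dict, List, Sequence


def scale_action(normalized: float, low: float, high: float) -> float:
    """Map a policy output in [-1, 1] to the environment's [low, high] range."""
    clipped = max(-1.0, min(1.0, normalized))
    return low + 0.5 * (clipped + 1.0) * (high - low)


def normalize_action(action: float, low: float, high: float) -> float:
    """Inverse of scale_action, for actions stored in the offline dataset."""
    return 2.0 * (action - low) / (high - low) - 1.0


def run_offline_updates(
    dataset: Sequence[float],
    update_fn: Callable[[List[float]], Dict[str, float]],
    total_timesteps: int,
    batch_size: int,
    seed: int,
) -> List[Dict[str, float]]:
    """Call update_fn exactly total_timesteps times; no environment steps occur."""
    rng = random.Random(seed)
    history: List[Dict[str, float]] = []
    for _ in range(total_timesteps):
        # Sample a minibatch with replacement from the fixed dataset.
        batch = [dataset[rng.randrange(len(dataset))] for _ in range(batch_size)]
        history.append(update_fn(batch))
    return history
```

For Pendulum-v1, whose torque range is [-2, 2], `scale_action(0.25, -2.0, 2.0)` gives 0.5, and `normalize_action(0.5, -2.0, 2.0)` maps it back to 0.25.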

Step 4: Run tests to verify they pass

Run: PYTHONPATH=src pytest tests/test_iql_trainer_smoke.py tests/test_offline_dataset.py -q

Expected: PASS

Task 3: Wire IQL into registry, API, CLI, examples, and checkpoint workflows

Files:
- Create: configs/iql/pendulum.yaml
- Create: examples/iql_pendulum_reference.py
- Create: tests/test_iql_reference_script.py
- Modify: src/rl_training/models/__init__.py
- Modify: src/rl_training/algorithms/__init__.py
- Modify: src/rl_training/api/algorithms.py
- Modify: src/rl_training/api/__init__.py
- Modify: src/rl_training/__init__.py
- Modify: src/rl_training/experiment/registry.py
- Modify: tests/test_cli.py
- Modify: tests/test_public_api.py
- Modify: tests/test_package_api_exports.py
- Modify: tests/test_checkpoint_workflows.py
- Modify: tests/test_experiment_manager.py

Step 1: Write the failing tests

Add focused coverage for:
- CLI train command with an IQL YAML config
- managed public API learn/evaluate/predict flow
- package export visibility
- checkpoint evaluation and resume workflow
- experiment manager spec registration
- example script smoke execution

Step 2: Run tests to verify they fail

Run: PYTHONPATH=src pytest tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_iql_reference_script.py -q

Expected: FAIL because IQL is not yet registered end-to-end.

Step 3: Write minimal implementation

Register IQL in all public surfaces and add a compact reference config/example using deterministic offline dataset generation.
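The registration step can be pictured with a name-to-trainer mapping like the one below. This is a generic sketch, not the layout of src/rl_training/experiment/registry.py: `register_trainer`, `get_trainer`, and the stub body of `train_iql` are all hypothetical.

```python
from typing import Any, Callable, Dict

TrainerFn = Callable[..., Dict[str, Any]]

_TRAINERS: Dict[str, TrainerFn] = {}


def register_trainer(name: str) -> Callable[[TrainerFn], TrainerFn]:
    """Decorator that records a trainer under a case-insensitive name."""

    def decorator(fn: TrainerFn) -> TrainerFn:
        key = name.lower()
        if key in _TRAINERS:
            raise ValueError(f"trainer {key!r} is already registered")
        _TRAINERS[key] = fn
        return fn

    return decorator


def get_trainer(name: str) -> TrainerFn:
    """Look up a trainer, listing the known names when the lookup fails."""
    try:
        return _TRAINERS[name.lower()]
    except KeyError:
        known = ", ".join(sorted(_TRAINERS))
        raise KeyError(f"unknown algorithm {name!r}; registered: {known}") from None


@register_trainer("iql")
def train_iql(**algo_kwargs: Any) -> Dict[str, Any]:
    # Stub standing in for the real offline trainer; it only echoes its inputs.
    return {"algo": "iql", "algo_kwargs": algo_kwargs}
```

A single lookup point of this kind would let the CLI, the public API, and the checkpoint resume path resolve "iql" the same way, which is what the tests in this task exercise.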

Step 4: Run tests to verify they pass

Run: PYTHONPATH=src pytest tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_iql_reference_script.py -q

Expected: PASS

Task 4: Final verification

Files: - Verify only

Step 1: Run targeted IQL suite

Run: PYTHONPATH=src pytest tests/test_iql_update.py tests/test_iql_trainer_smoke.py tests/test_iql_reference_script.py tests/test_offline_dataset.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py -q

Expected: PASS

Step 2: Run full test suite

Run: PYTHONPATH=src pytest -q

Expected: PASS