TQC Implementation Plan¶
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Add tqc (Truncated Quantile Critics) as a first-class continuous-control algorithm with config/example support, checkpoint/eval wiring, public exports, and tests.
Architecture: Keep tqc shaped like the existing sac integration so it can reuse the project’s continuous-action workflow with minimal churn. Add a dedicated MLPTQCModel, TQC algorithm, and train_tqc() trainer, while preserving the same actor sampling and action scaling interfaces already used by sac and td3. Use quantile critics plus target quantile truncation for critic updates, but keep fixed alpha and the current replay-buffer loop to avoid broad refactors.
Tech Stack: Python 3.10, PyTorch, Gymnasium, pytest
Task 1: Add failing TQC coverage¶
Files: - Create: tests/test_tqc_update.py - Create: tests/test_tqc_trainer_smoke.py - Create: tests/test_tqc_reference_script.py - Modify: tests/test_cli.py - Modify: tests/test_public_api.py - Modify: tests/test_package_api_exports.py - Modify: tests/test_checkpoint_workflows.py - Modify: tests/test_experiment_manager.py
Step 1: Write the failing test
Add tests that expect: - MLPTQCModel.sample_actions() returns bounded actions and quantile_values() returns (batch, num_critics, num_quantiles) - tqc_loss() returns named metrics - TQC.update() returns named metrics - train_tqc() writes a checkpoint and metrics - CLI train --config supports algo: tqc - managed API exports TQC - evaluate_checkpoint() works on a TQC checkpoint - get_algorithm_spec("tqc") is registered - the reference script runs successfully
Step 2: Run test to verify it fails
Run: PYTHONPATH=src pytest tests/test_tqc_update.py tests/test_tqc_trainer_smoke.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_tqc_reference_script.py -q Expected: FAIL because the model, algorithm, trainer, and registry wiring do not exist yet.
Task 2: Implement the TQC model and algorithm¶
Files: - Create: src/rl_training/models/mlp_tqc.py - Create: src/rl_training/algorithms/tqc.py - Modify: src/rl_training/models/__init__.py - Modify: src/rl_training/algorithms/__init__.py
Step 1: Write minimal implementation
Implement: - MLPTQCModel with SAC-compatible actor sampling and an ensemble of quantile critics - TQC with target model, quantile Huber critic loss, target quantile truncation, actor update, soft target updates, and checkpoint serialization
Step 2: Run focused tests
Run: PYTHONPATH=src pytest tests/test_tqc_update.py -q Expected: PASS
Task 3: Add the TQC trainer¶
Files: - Create: src/rl_training/runtime/tqc_trainer.py
Step 1: Write minimal implementation
Implement train_tqc() by following the existing train_sac() workflow: - infer continuous env spaces - sample normalized actions from the actor - scale to env actions - store normalized actions in replay - train from replay using TQC.update() - evaluate with deterministic actor actions - save checkpoints and metrics
Step 2: Run focused tests
Run: PYTHONPATH=src pytest tests/test_tqc_trainer_smoke.py -q Expected: PASS
Task 4: Wire registry + managed API + exports¶
Files: - Modify: src/rl_training/experiment/registry.py - Modify: src/rl_training/api/algorithms.py - Modify: src/rl_training/api/__init__.py - Modify: src/rl_training/__init__.py - Modify: src/rl_training/algorithms/__init__.py
Step 1: Registry load/evaluate/predict
Add: - _load_tqc_algorithm() - _evaluate_tqc() - _predict_tqc() - _ALGORITHM_REGISTRY["tqc"]
Step 2: Managed API exports
Add class TQC(ManagedAlgorithm) with algo_name = "tqc" and export it from public packages.
Step 3: Run focused tests
Run: PYTHONPATH=src pytest tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_package_api_exports.py tests/test_public_api.py -q Expected: PASS
Task 5: Add config + example + CLI coverage¶
Files: - Create: configs/tqc/pendulum.yaml - Create: examples/tqc_pendulum_reference.py - Modify: tests/test_cli.py - Create: tests/test_tqc_reference_script.py
Step 1: Write minimal implementation
Add a Pendulum config and matching reference script with TQC-specific hyperparameters such as num_critics, num_quantiles, and top_quantiles_to_drop_per_net.
Step 2: Run focused tests
Run: PYTHONPATH=src pytest tests/test_cli.py tests/test_tqc_reference_script.py -q Expected: PASS
Task 6: Verify end-to-end¶
Files: - Verify only
Step 1: Run targeted verification
Run: PYTHONPATH=src pytest tests/test_tqc_update.py tests/test_tqc_trainer_smoke.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_tqc_reference_script.py -q Expected: PASS
Step 2: Run full verification
Run: PYTHONPATH=src pytest -q Expected: PASS