跳转至

TQC Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Add tqc (Truncated Quantile Critics) as a first-class continuous-control algorithm with config/example support, checkpoint/eval wiring, public exports, and tests.

Architecture: Keep tqc shaped like the existing sac integration so it can reuse the project’s continuous-action workflow with minimal churn. Add a dedicated MLPTQCModel, TQC algorithm, and train_tqc() trainer, while preserving the same actor sampling and action scaling interfaces already used by sac and td3. Use quantile critics plus target quantile truncation for critic updates, but keep fixed alpha and the current replay-buffer loop to avoid broad refactors.

Tech Stack: Python 3.10, PyTorch, Gymnasium, pytest


Task 1: Add failing TQC coverage

Files: - Create: tests/test_tqc_update.py - Create: tests/test_tqc_trainer_smoke.py - Create: tests/test_tqc_reference_script.py - Modify: tests/test_cli.py - Modify: tests/test_public_api.py - Modify: tests/test_package_api_exports.py - Modify: tests/test_checkpoint_workflows.py - Modify: tests/test_experiment_manager.py

Step 1: Write the failing test

Add tests that expect: - MLPTQCModel.sample_actions() returns bounded actions and quantile_values() returns (batch, num_critics, num_quantiles) - tqc_loss() returns named metrics - TQC.update() returns named metrics - train_tqc() writes a checkpoint and metrics - CLI train --config supports algo: tqc - managed API exports TQC - evaluate_checkpoint() works on a TQC checkpoint - get_algorithm_spec("tqc") is registered - the reference script runs successfully

Step 2: Run test to verify it fails

Run: PYTHONPATH=src pytest tests/test_tqc_update.py tests/test_tqc_trainer_smoke.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_tqc_reference_script.py -q Expected: FAIL because the model, algorithm, trainer, and registry wiring do not exist yet.


Task 2: Implement the TQC model and algorithm

Files: - Create: src/rl_training/models/mlp_tqc.py - Create: src/rl_training/algorithms/tqc.py - Modify: src/rl_training/models/__init__.py - Modify: src/rl_training/algorithms/__init__.py

Step 1: Write minimal implementation

Implement: - MLPTQCModel with SAC-compatible actor sampling and an ensemble of quantile critics - TQC with target model, quantile Huber critic loss, target quantile truncation, actor update, soft target updates, and checkpoint serialization

Step 2: Run focused tests

Run: PYTHONPATH=src pytest tests/test_tqc_update.py -q Expected: PASS


Task 3: Add the TQC trainer

Files: - Create: src/rl_training/runtime/tqc_trainer.py

Step 1: Write minimal implementation

Implement train_tqc() by following the existing train_sac() workflow: - infer continuous env spaces - sample normalized actions from the actor - scale to env actions - store normalized actions in replay - train from replay using TQC.update() - evaluate with deterministic actor actions - save checkpoints and metrics

Step 2: Run focused tests

Run: PYTHONPATH=src pytest tests/test_tqc_trainer_smoke.py -q Expected: PASS


Task 4: Wire registry + managed API + exports

Files: - Modify: src/rl_training/experiment/registry.py - Modify: src/rl_training/api/algorithms.py - Modify: src/rl_training/api/__init__.py - Modify: src/rl_training/__init__.py - Modify: src/rl_training/algorithms/__init__.py

Step 1: Registry load/evaluate/predict

Add: - _load_tqc_algorithm() - _evaluate_tqc() - _predict_tqc() - _ALGORITHM_REGISTRY["tqc"]

Step 2: Managed API exports

Add class TQC(ManagedAlgorithm) with algo_name = "tqc" and export it from public packages.

Step 3: Run focused tests

Run: PYTHONPATH=src pytest tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_package_api_exports.py tests/test_public_api.py -q Expected: PASS


Task 5: Add config + example + CLI coverage

Files: - Create: configs/tqc/pendulum.yaml - Create: examples/tqc_pendulum_reference.py - Modify: tests/test_cli.py - Create: tests/test_tqc_reference_script.py

Step 1: Write minimal implementation

Add a Pendulum config and matching reference script with TQC-specific hyperparameters such as num_critics, num_quantiles, and top_quantiles_to_drop_per_net.

Step 2: Run focused tests

Run: PYTHONPATH=src pytest tests/test_cli.py tests/test_tqc_reference_script.py -q Expected: PASS


Task 6: Verify end-to-end

Files: - Verify only

Step 1: Run targeted verification

Run: PYTHONPATH=src pytest tests/test_tqc_update.py tests/test_tqc_trainer_smoke.py tests/test_cli.py tests/test_public_api.py tests/test_package_api_exports.py tests/test_checkpoint_workflows.py tests/test_experiment_manager.py tests/test_tqc_reference_script.py -q Expected: PASS

Step 2: Run full verification

Run: PYTHONPATH=src pytest -q Expected: PASS