Metadata-Version: 2.4
Name: sbx-rl
Version: 0.26.0
Summary: Jax version of Stable Baselines, implementations of reinforcement learning algorithms.
Home-page: https://github.com/araffin/sbx
Author: Antonin Raffin
Author-email: antonin.raffin@dlr.de
License: MIT
Keywords: reinforcement-learning-algorithms reinforcement-learning machine-learning gym gymnasium jax openai stable baselines toolbox python data-science
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: stable_baselines3<3.0,>=2.8.0a0
Requires-Dist: jax<0.9.0,>=0.4.24
Requires-Dist: jaxlib
Requires-Dist: flax
Requires-Dist: optax
Requires-Dist: tqdm
Requires-Dist: rich
Requires-Dist: tfp-nightly>=0.26.0.dev20250831
Provides-Extra: tests
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pytest-cov; extra == "tests"
Requires-Dist: pytest-env; extra == "tests"
Requires-Dist: pytest-xdist; extra == "tests"
Requires-Dist: mypy; extra == "tests"
Requires-Dist: ruff>=0.3.1; extra == "tests"
Requires-Dist: black<27,>=26.1.0; extra == "tests"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary



# Stable Baselines Jax (SB3 + JAX = SBX)

See https://github.com/araffin/sbx

Proof-of-concept version of [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) in JAX.

Implemented algorithms:
- [Soft Actor-Critic (SAC)](https://arxiv.org/abs/1801.01290) and [SAC-N](https://arxiv.org/abs/2110.01548)
- [Truncated Quantile Critics (TQC)](https://arxiv.org/abs/2005.04269)
- [Dropout Q-Functions for Doubly Efficient Reinforcement Learning (DroQ)](https://openreview.net/forum?id=xCVJMsPv3RT)
- [Proximal Policy Optimization (PPO)](https://arxiv.org/abs/1707.06347)
- [Deep Q Network (DQN)](https://arxiv.org/abs/1312.5602)
- [Twin Delayed DDPG (TD3)](https://arxiv.org/abs/1802.09477)
- [Deep Deterministic Policy Gradient (DDPG)](https://arxiv.org/abs/1509.02971)
- [Batch Normalization in Deep Reinforcement Learning (CrossQ)](https://openreview.net/forum?id=PczQtTsTIX)
- [Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning (SimBa)](https://openreview.net/forum?id=jXLiDKsuDo)
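
Since each algorithm is exposed as a class in the `sbx` package, one way to pick an algorithm at runtime is to look the class up by name. The following is a minimal sketch, not part of SBX itself: `ALGO_NAMES` and `make_model` are hypothetical names introduced here for illustration, and the sketch assumes that every listed class accepts the `"MlpPolicy"` alias and an environment id the way `TQC` does in the README example.

```python
import importlib

# Class names as they appear in the README's `from sbx import ...` line.
ALGO_NAMES = ("DDPG", "DQN", "PPO", "SAC", "TD3", "TQC", "CrossQ")


def make_model(algo: str, env_id: str, **kwargs):
    """Hypothetical helper: build an sbx model from an algorithm name."""
    if algo not in ALGO_NAMES:
        raise ValueError(f"Unknown algorithm {algo!r}; expected one of {ALGO_NAMES}")
    # Import lazily so the name check above runs even if sbx is not installed.
    sbx = importlib.import_module("sbx")
    algo_class = getattr(sbx, algo)
    # "MlpPolicy" is the policy alias used in the README example (assumed to
    # be valid for every class in ALGO_NAMES).
    return algo_class("MlpPolicy", env_id, **kwargs)
```

With this helper, `make_model("TQC", "Pendulum-v1", verbose=1)` would be equivalent to calling `TQC("MlpPolicy", "Pendulum-v1", verbose=1)` directly. The environment still has to match the algorithm's action space: DQN, for example, needs discrete actions and would not work on `Pendulum-v1`.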

## Example

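```python
# The implemented algorithms are importable from the top-level package
from sbx import DDPG, DQN, PPO, SAC, TD3, TQC, CrossQ

# Train TQC on Pendulum-v1 for 10,000 timesteps, showing a progress bar
model = TQC("MlpPolicy", "Pendulum-v1", verbose=1)
model.learn(total_timesteps=10_000, progress_bar=True)
```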