Metadata-Version: 2.1
Name: flash_llm_rl
Version: 0.5.2
Summary: flash llm rl
Home-page: https://github.com/yaof20/Flash-RL
Author: Lucas Liu
Author-email: llychinalz@gmail.com
License: License :: OSI Approved :: MIT License
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch
Requires-Dist: transformers
Requires-Dist: vllm
Requires-Dist: huggingface-hub

<h1 align="center">⚡ FlashRL ⚡</h1>
<p align="center"><b>Fast RL training with Quantized Rollouts </b>  
(<a href="https://fengyao.notion.site/flash-rl">Blog</a>)</p>

<p align="center">
  <img src="https://img.shields.io/badge/license-MIT-blue.svg">
  <img src="https://img.shields.io/badge/python-3.10+-blue">
  <img src="https://img.shields.io/pypi/v/flash-llm-rl?color=green">  
</p>

<p align="center">
  <a href="#what-is-flashrl">What is FlashRL?</a> •
  <a href="#-quick-start">Quick Start</a> •
  <a href="#-experiments">Experiments</a> •
  <a href="#-citation">Citation</a>
</p>

<h3 align="center" id="what-is-flashrl"><i>What is FlashRL?</i></h3>

[FlashRL](https://fengyao.notion.site/flash-rl) patches the inference package (vLLM) to enable (1) accurate rollout logprob computation for RL training and (2) online quantization, so that rollouts can be generated in INT8 and FP8.

## ⚡ Quick Start

### 1. Installation

```bash
pip install flash-llm-rl # must be installed on every node for multi-node training
```

(Optional) To verify the installation:

```bash 
TODO
```
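
The verification command is still a TODO in the source. Until it is filled in, a generic sanity check is to confirm that the `flashrl` command used in the steps below is on `PATH` and that pip knows about the package. This is an assumption, not the project's official procedure, and the helper name `check_cmd` is hypothetical.

```bash
# Generic install check (an assumption, not the project's official verification).
# check_cmd is a hypothetical helper: reports whether a command is on PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# The quick-start steps invoke the `flashrl` CLI, so it should be found.
check_cmd flashrl

# Ask pip whether the distribution is installed in the current environment.
if python -m pip show flash-llm-rl >/dev/null 2>&1; then
  echo "flash-llm-rl is installed"
else
  echo "flash-llm-rl is not installed in this environment"
fi
```

In multi-node training, the same check would need to pass on every node.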

### 2. RL Logprob Patch Only
```bash 
flashrl setup --fn bf16 -o $PATH_TO_CONFIG_YAML_OUTPUT

export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT
# alternatively, when submitting multi-node jobs via `ray submit`,
# add `FLASHRL_CONFIG: $PATH_TO_CONFIG_YAML_OUTPUT` to the runtime env
# as in TODO:PUT_AN_EXAMPLE
bash ... 
```
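
For the multi-node case mentioned in the comments above, the source leaves the example as a TODO. As a hedged sketch only: assuming the job is launched with Ray's `ray job submit` and its `--runtime-env-json` option (the source says `ray submit`, so this may differ from the authors' setup), the variable could be passed under `env_vars`. The helper `runtime_env_json`, the placeholder path, and the script name `train.py` are hypothetical.

```bash
# Hypothetical helper: build a Ray runtime-env JSON that exports FLASHRL_CONFIG
# on the workers. Assumes the path contains no double quotes or backslashes.
runtime_env_json() {
  printf '{"env_vars": {"FLASHRL_CONFIG": "%s"}}' "$1"
}

# Placeholder path; substitute the config produced by `flashrl setup`.
CONFIG_PATH="/path/to/flashrl_config.yaml"
RUNTIME_ENV="$(runtime_env_json "$CONFIG_PATH")"
echo "$RUNTIME_ENV"

# Example submission (commented out; `train.py` is a hypothetical entry point):
# ray job submit --runtime-env-json="$RUNTIME_ENV" -- python train.py
```

Printing the JSON before submitting makes it easy to confirm that the path was substituted as intended.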

### 3. RL Rollout Quantization -- Simple Setup
Use our pre-set quantization profiles for simple setup. 

```bash 
# for Qwen2.5-0.5B-instruct
export FLASHRL_CONFIG=LiyuanLucasLiu/Qwen2.5-0.5B-Instruct-quantized.w8a8-RedHatAI/flashrl_config.yaml

# run Qwen2.5-0.5B experiments 
cd verl && bash TODO:UPLOAD_SCRIPT

# for Qwen2.5-32B-instruct
export FLASHRL_CONFIG=LiyuanLucasLiu/Qwen2.5-32B-quantized.w8a8/flashrl_config.yaml

# run Qwen2.5-32B experiments 
cd verl && bash TODO:UPLOAD_SCRIPT
```

### 4. More Advanced

#### 4.1 Profiling

```bash
# choose one value for --fn: int8 or fp8
flashrl profile -m $PATH_TO_MODEL -qm $PATH_TO_QUANTIZED_MODEL -o $PATH_TO_PROFILE_PT_OUTPUT --fn int8/fp8
```

#### 4.2 Setup

```bash
# choose one value for --fn: int8, fp8, or bf16
flashrl setup --fn int8/fp8/bf16 -m $PATH_TO_MODEL -p $PATH_TO_PROFILE_PT_OUTPUT -o $PATH_TO_CONFIG_YAML_OUTPUT
```

#### 4.3 RL Training

```bash
# for Qwen2.5-0.5B-instruct
export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT

# run Qwen2.5-0.5B experiments 
cd verl && bash ...

# for Qwen2.5-32B
export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT
# or, alternatively, for submitting multi-node jobs via `ray submit`
# add `FLASHRL_CONFIG: $PATH_TO_CONFIG_YAML_OUTPUT` to runtime env
# as in TODO:PUT_AN_EXAMPLE

# run Qwen2.5-32B experiments 
cd verl && bash ...
```
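
Every setup above depends on `FLASHRL_CONFIG` being exported before the training script starts, so a small guard at the top of a launch script can catch a missing variable early. This is a sketch, not part of FlashRL: `require_flashrl_config` is a hypothetical helper, and it only checks that the variable is non-empty, since the value may be a local file or a preset path like the ones in the simple setup.

```bash
# Hypothetical guard: fail fast when FLASHRL_CONFIG has not been exported.
require_flashrl_config() {
  if [ -z "${FLASHRL_CONFIG:-}" ]; then
    echo "FLASHRL_CONFIG is not set; export it before launching training" >&2
    return 1
  fi
  echo "Using FLASHRL_CONFIG=${FLASHRL_CONFIG}"
}

# Usage in a launch script: stop before training starts if the variable is missing.
# require_flashrl_config || exit 1
```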

## Example: Accelerating DAPO-Qwen2.5-32B with INT8


## 🚧 Roadmap & Future Improvements

We're working on several improvements to Flash-RL:

- [ ] **Support of Other RL Toolkits**: Flash-RL currently supports only `VeRL`; we are working on rolling out support for other packages such as `OpenRLHF`
- [ ] **Support of Other LLM Inference Toolkits**: Flash-RL currently supports only `vLLM`; we are working on rolling out support for other toolkits such as `SGLang`
- [ ] **Further Throughput Optimization**: We are working on implementing efficient GPU kernels to accelerate online quantization

## 📚 Citation

If you find our work useful, please cite us:

```bibtex
@misc{yao2025offpolicy,
  title = {Your Efficient RL Framework Secretly Brings You Off-Policy RL Training},
  url = {https://fengyao.notion.site/off-policy-rl},
  author = {Yao, Feng and Liu, Liyuan and Zhang, Dinghuai and Dong, Chengyu and Shang, Jingbo and Gao, Jianfeng},
  journal = {Feng Yao's Notion},
  year = {2025},
  month = aug,
}
@misc{yao2025flashrl,
  title = {Flash-RL: Fast RL training with Quantized Rollouts},
  url = {https://fengyao.notion.site/flash-rl},
  author = {Liu, Liyuan and Yao, Feng and Zhang, Dinghuai and Dong, Chengyu and Shang, Jingbo and Gao, Jianfeng},
  journal = {Feng Yao's Notion},
  year = {2025},
  month = aug,
}
```

## Questions?

If you have any questions about the code or the blog, feel free to reach out to [Liyuan Liu](mailto:llychinalz@gmail.com).
