Metadata-Version: 2.4
Name: veomni
Version: 0.0.1
Summary: VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework
Home-page: https://github.com/ByteDance-Seed/VeOmni
Author: Bytedance - Seed - MLSys
Author-email: Fazzie <maqianli.fazzie@bytedance.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/ByteDance-Seed/VeOmni
Project-URL: Issues, https://github.com/ByteDance-Seed/VeOmni/issues
Requires-Python: >=3.8.0
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: byted-hdfs-iodatasets<=2.21.0,>=2.16.0
Requires-Dist: diffusers<=0.31.0,>=0.30.0
Requires-Dist: packaging<26.0,>=23.0
Requires-Dist: torchdata<1.0,>=0.8.0
Requires-Dist: transformers[torch]<=4.49.0,>=4.46.2
Requires-Dist: tiktoken>=0.9.0
Requires-Dist: blobfile>=3.0.0
Provides-Extra: dev
Requires-Dist: pre-commit<5.0,>=4.0.0; extra == "dev"
Requires-Dist: ruff<1.0,>=0.7.0; extra == "dev"
Requires-Dist: pytest<8.0,>=6.0.0; extra == "dev"
Requires-Dist: expecttest>=0.3.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python


<div align="center">

<img src="./assets/logo.png" width="50%">

## VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework

<p align="center">
  <a href="https://github.com/ByteDance-Seed/VeOmni/stargazers">
    <img src="https://img.shields.io/github/stars/ByteDance-Seed/VeOmni?style=social"></a>
  <a href="https://github.com/ByteDance-Seed/VeOmni">
    <img src="https://img.shields.io/badge/VeOmni-Project Page-yellow"></a>
  <a href="https://arxiv.org/pdf/2502.19811">
    <img src="https://img.shields.io/badge/VeOmni-Tech Report-red"></a>
  <a href="XXXX">
    <img src="https://img.shields.io/badge/VeOmni-Hugging Face-orange"></a>
  <br>
  <a href="https://github.com/user-attachments/assets/d3fcb3bf-466b-4efe-8c3f-5f85258202ae">
    <img src="https://img.shields.io/badge/VeOmni-Wechat Communication Group-07C160"></a>
  <a href="XXX">
    <img src="https://img.shields.io/badge/License-Apache--2.0-blue"></a>
</p>

</div>

## News
<br>
[2025/04/03] 🔥 We release VeOmni.

## Overview
VeOmni is a training framework for single-modal and multi-modal models that covers both pre-training and post-training. With VeOmni, users can scale a model of any modality to any accelerator.

Our guiding principles when building VeOmni are:
- **Flexibility and Modularity**: The framework is modular. Most pieces can be decoupled, so users can replace them with their own implementations.
- **No trainer**: Avoid structured Trainer classes like [PyTorch-Lightning](https://github.com/Lightning-AI/pytorch-lightning) or the [HuggingFace](https://huggingface.co/docs/transformers/v4.50.0/en/main_classes/trainer#transformers.Trainer) Trainer. Instead, keep training scripts linear so that the complete training logic is exposed to users.
- **Omni model native**: Users can scale any omni model easily.
- **Torch native**: We aim to implement each part of the framework with native torch functionality wherever possible.



## Table of Contents

- [VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework](#veomni-scaling-any-modality-model-training-to-any-accelerators-with-pytorch-native-training-framework)
- [News](#news)
- [Overview](#overview)
- [Table of Contents](#table-of-contents)
- [Key Features](#key-features)
  - [Upcoming Features](#upcoming-features)
- [Getting Started](#getting-started)
  - [Installation](#installation)
  - [Quick Start](#quick-start)
  - [Merge checkpoints](#merge-checkpoints)
  - [Build Docker](#build-docker)
- [Training Examples](#training-examples)
- [Supported Models](#supported-models)
- [Performance](#performance)
- [Acknowledgement](#acknowledgement)
- [Citation](#citation)
- [Awesome work using VeOmni](#awesome-work-using-veomni)
- [Contribution Guide](#contribution-guide)
- [About ByteDance Seed Team](#about-bytedance-seed-team)


## Key Features

- **Parallelism**
    - Parallel state by [DeviceMesh](https://pytorch.org/tutorials/recipes/distributed_device_mesh.html)
    - Torch FSDP1/2
    - Expert parallelism (experimental)
    - Easy to add new parallelism plan
    - Sequence parallelism
      - [Ulysses](https://arxiv.org/abs/2309.14509)
      - Async Ulysses
    - Activation offloading
    - Activation checkpointing
- **Kernels**
    - GroupGemm ops for MoE
    - [Liger-Kernel](https://github.com/linkedin/Liger-Kernel) integrations
- **Model**
  - Any [transformers](https://github.com/huggingface/transformers) models.
  - Multi-modal
    - Qwen2VL
    - Seed-Omni
    - ...
- **Data IO**
    - Dynamic batching strategy
    - Omnidata processing
- **Distributed Checkpointing**
  - [ByteCheckpoint](https://arxiv.org/abs/2407.20143) (recommended)
  - Torch distributed checkpointing (DCP)
  - DCP merge tools
- **Other tools**
  - Profiling tools
  - Easy yaml configuration and argument parsing

### Upcoming Features
- [ ] [veScale](https://github.com/volcengine/veScale/tree/main) FSDP
- [ ] Torch native parallelism
- [ ] torch.compile
- [ ] [Flux: Fine-grained Computation-communication Overlapping GPU Kernel](https://github.com/bytedance/flux/tree/main/test) integrations
- [ ] Better offloading strategy
- [ ] Support for more models
- [ ] Torch native pipeline parallelism


## Getting Started
Read the [VeOmni Best Practice](docs/start/best_practice.md) for more details.

### Installation
```shell
pip3 install -e .
```

- Install veScale (not available yet)
```shell
git clone https://github.com/volcengine/veScale.git
cd veScale
pip3 install .
```

### Quick Start
Start training like this:

```shell
bash train.sh $TRAIN_SCRIPT $CONFIG.yaml
```

You can override any argument in the yaml file by passing it on the command line:
```shell
bash train.sh $TRAIN_SCRIPT $CONFIG.yaml \
    --model.model_path PATH/TO/MODEL \
    --data.train_path PATH/TO/DATA \
    --train.global_batch_size GLOBAL_BATCH_SIZE
```
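
To make the override behavior concrete, here is a minimal sketch of how dotted command-line flags could be layered on top of values loaded from the yaml file. It is not VeOmni's actual argument parser: `apply_overrides` and the sample config values are hypothetical and only illustrate the `--section.key value` convention used above.

```python
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], argv: List[str]) -> Dict[str, Any]:
    """Return a copy of `config` with `--section.key value` pairs applied.

    Hypothetical helper for illustration; VeOmni's own parser may differ.
    """
    # Copy one level deep so the yaml defaults are left untouched.
    merged = {section: dict(values) for section, values in config.items()}
    for flag, raw in zip(argv[0::2], argv[1::2]):
        section, key = flag.lstrip("-").split(".", 1)
        current = merged.setdefault(section, {}).get(key)
        # Reuse the type of the yaml value when there is one (int, float, str).
        # Booleans would need dedicated handling, which is omitted here.
        merged[section][key] = type(current)(raw) if current is not None else raw
    return merged


# Values as they might be loaded from $CONFIG.yaml (illustrative only).
yaml_config = {
    "model": {"model_path": "PATH/FROM/YAML"},
    "train": {"global_batch_size": 128, "max_steps": 1000},
}

cli_args = ["--model.model_path", "./Qwen2.5-7B", "--train.global_batch_size", "512"]
config = apply_overrides(yaml_config, cli_args)

print(config["model"]["model_path"])         # ./Qwen2.5-7B
print(config["train"]["global_batch_size"])  # 512
print(config["train"]["max_steps"])          # 1000 (not overridden)
```

Command-line values win over yaml values, and any key that is not passed on the command line keeps its yaml value.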

Here is an end-to-end workflow: prepare a subset of the fineweb dataset, continue training a qwen2_5 model with a sequence parallel size of 2 for 20 steps, then merge the global_step_10 distributed checkpoint (saved by ByteCheckpoint) into HF weights.

- Download fineweb dataset
```shell
python3 scripts/download_hf_data.py \
  --repo_id HuggingFaceFW/fineweb \
  --local_dir ./fineweb/ \
  --allow_patterns sample/10BT/*
```

- Download qwen2_5 model
```shell
python3 scripts/download_hf_model.py \
  --repo_id Qwen/Qwen2.5-7B \
  --local_dir .
```

- Start training
```shell
bash train.sh tasks/train_torch.py configs/pretrain/qwen2_5.yaml \
    --model.model_path ./Qwen2.5-7B \
    --data.train_path ./fineweb/sample/10BT/ \
    --train.global_batch_size 512 \
    --train.lr 5e-7 \
    --train.ulysses_parallel_size 2 \
    --train.save_steps 10 \
    --train.max_steps 20 \
    --train.output_dir Qwen2.5-7B_CT
```

- Merge checkpoints (the paths must match the `--train.output_dir` used for training)
```shell
python3 scripts/mereg_dcp_to_hf.py \
    --load-dir Qwen2.5-7B_CT/checkpoints/global_step_10 \
    --model_assets_dir Qwen2.5-7B_CT/model_assets \
    --save-dir Qwen2.5-7B_CT/checkpoints/global_step_10/hf_ckpt
```

- Test inference
```shell
python3 tasks/infer.py \
  --infer.model_path Qwen2.5-7B_CT/checkpoints/global_step_10/hf_ckpt
```
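
The numbers in the training command of this workflow can be sanity-checked with a little arithmetic. The sketch below assumes a world size of 8 GPUs (not stated above) and the usual convention that sequence-parallel ranks share one data-parallel replica. How VeOmni splits the per-replica batch into micro-batches, and whether it also saves at the final step, are not covered by this README and are assumptions here. `training_plan` is a hypothetical helper, not part of VeOmni.

```python
from typing import Dict, List, Union


def training_plan(
    world_size: int,
    ulysses_parallel_size: int,
    global_batch_size: int,
    save_steps: int,
    max_steps: int,
) -> Dict[str, Union[int, List[int]]]:
    """Derive replica count, per-replica batch, and checkpoint steps."""
    if world_size % ulysses_parallel_size != 0:
        raise ValueError("world_size must be divisible by ulysses_parallel_size")
    # Ranks in one Ulysses group work on slices of the same sequences,
    # so only world_size / ulysses_parallel_size groups see distinct samples.
    data_parallel_size = world_size // ulysses_parallel_size
    if global_batch_size % data_parallel_size != 0:
        raise ValueError("global_batch_size must be divisible by the data-parallel size")
    return {
        "data_parallel_size": data_parallel_size,
        "samples_per_replica_per_step": global_batch_size // data_parallel_size,
        # Assumes a checkpoint every `save_steps` steps, up to and including max_steps.
        "checkpoint_steps": list(range(save_steps, max_steps + 1, save_steps)),
    }


# world_size=8 is an assumption; the other values come from the command above.
plan = training_plan(
    world_size=8,
    ulysses_parallel_size=2,
    global_batch_size=512,
    save_steps=10,
    max_steps=20,
)
print(plan["data_parallel_size"])            # 4
print(plan["samples_per_replica_per_step"])  # 128
print(plan["checkpoint_steps"])              # [10, 20]
```

Under these assumptions, `global_step_10` is the first checkpoint written, which is the one merged in the workflow.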


### Merge checkpoints
We use ByteCheckpoint to save checkpoints in torch.distributed.checkpoint (DCP) format. You can merge the DCP files into HF weights with this command:

```shell
python3 scripts/mereg_dcp_to_hf.py \
    --load-dir PATH/TO/CHECKPOINTS \
    --model_assets_dir PATH/TO/MODEL_ASSETS \
    --save-dir PATH/TO/SAVE_HF_WEIGHT
```

For example, if your output_dir is `seed_omni` and you want to merge the global_step_100 checkpoint into HF weights:

```shell
python3 scripts/mereg_dcp_to_hf.py \
    --load-dir seed_omni/checkpoints/global_step_100 \
    --model_assets_dir seed_omni/model_assets \
    --save-dir seed_omni/hf_ckpt
```
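
The three paths in the example above follow a fixed layout under the output directory, so they can be derived from the output directory and the step number. The following is a small sketch of that derivation. `merge_command` is a hypothetical helper, and the choice of `<output_dir>/hf_ckpt` as the save location simply mirrors the example rather than a requirement of the script.

```python
from pathlib import PurePosixPath
from typing import List


def merge_command(output_dir: str, step: int) -> List[str]:
    """Build the merge command for one checkpoint of a training run."""
    root = PurePosixPath(output_dir)
    return [
        "python3",
        "scripts/mereg_dcp_to_hf.py",
        # Distributed checkpoint written at the given global step.
        "--load-dir",
        str(root / "checkpoints" / f"global_step_{step}"),
        # Model assets stored next to the checkpoints.
        "--model_assets_dir",
        str(root / "model_assets"),
        # Destination for the merged HF weights (mirrors the example above).
        "--save-dir",
        str(root / "hf_ckpt"),
    ]


command = merge_command("seed_omni", 100)
print(" ".join(command))
# python3 scripts/mereg_dcp_to_hf.py --load-dir seed_omni/checkpoints/global_step_100 --model_assets_dir seed_omni/model_assets --save-dir seed_omni/hf_ckpt
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root.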

### Build Docker
```shell
cd docker/
docker compose up -d
docker compose exec VeOmni bash
```

## Training Examples
- PyTorch FSDP2 Qwen2VL
```shell
bash train.sh tasks/multimodal/omni/train_qwen2_vl.py configs/multimodal/qwen2_vl/qwen2_vl.yaml
```

- PyTorch FSDP2 Qwen2 CT

```shell
bash train.sh tasks/train_torch.py configs/pretrain/qwen2_5.yaml
```

- PyTorch FSDP2 llama3-8b-instruct CT
```shell
bash train.sh tasks/train_torch.py configs/pretrain/llama3.yaml
```

## Supported Models
| Model                                                             | Model size                       | Example config File                                       |
| ----------------------------------------------------------------- | -------------------------------- | --------------------------------------------------------- |
| [DeepSeek 2.5/3/R1](https://huggingface.co/deepseek-ai)           | 236B/671B                        | [deepseek.yaml](configs/pretrain/deepseek.yaml)           |
| [Llama 3-3.3](https://huggingface.co/meta-llama)                  | 1B/3B/8B/70B                     | [llama3.yaml](configs/pretrain/llama3.yaml)               |
| [Qwen 2-2.5](https://huggingface.co/Qwen)                         | 0.5B/1.5B/3B/7B/14B/32B/72B      | [qwen2_5.yaml](configs/pretrain/qwen2_5.yaml)             |
| [Qwen2-VL/Qwen2.5-VL/QVQ](https://huggingface.co/Qwen)            | 2B/3B/7B/32B/72B                 | [qwen2_vl.yaml](configs/multimodal/qwen2_vl/qwen2_vl.yaml)|
| Seed_omni                                                         | Any foundation model with any omni encoder and decoder | [seed_omni.yaml](configs/multimodal/omni/seed_omni.yaml)  |


> VeOmni supports all [transformers](https://github.com/huggingface/transformers) models, provided you don't need the sequence parallelism, expert parallelism, other parallelism, or CUDA kernel optimizations in VeOmni. We designed a [model registry mechanism](veomni/models/registry.py): when a model is registered in veomni, we automatically load the model and optimizer from VeOmni. Otherwise, loading falls back to the modeling file in transformers.

> To add a new model, register it in the model registry. See the [Support custom model]() docs.
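
The registry-with-fallback idea described above can be sketched in a few lines. This is a generic illustration of the pattern, not the API of `veomni/models/registry.py`: `MODEL_REGISTRY`, `register_model`, `build_model`, and the returned strings are all hypothetical stand-ins for real model classes.

```python
from typing import Callable, Dict

# Maps a model type to a builder for the framework-specific implementation.
MODEL_REGISTRY: Dict[str, Callable[[], str]] = {}


def register_model(model_type: str) -> Callable[[Callable[[], str]], Callable[[], str]]:
    """Decorator that records a builder under the given model type."""

    def decorator(builder: Callable[[], str]) -> Callable[[], str]:
        MODEL_REGISTRY[model_type] = builder
        return builder

    return decorator


@register_model("qwen2_vl")
def build_qwen2_vl() -> str:
    # Stand-in for constructing the optimized, parallelism-aware model.
    return "veomni implementation of qwen2_vl"


def build_from_transformers(model_type: str) -> str:
    # Stand-in for loading the stock modeling file from transformers.
    return f"transformers implementation of {model_type}"


def build_model(model_type: str) -> str:
    """Prefer a registered implementation, else fall back to transformers."""
    builder = MODEL_REGISTRY.get(model_type)
    if builder is not None:
        return builder()
    return build_from_transformers(model_type)


print(build_model("qwen2_vl"))  # veomni implementation of qwen2_vl
print(build_model("bert"))      # transformers implementation of bert
```

In this pattern, adding a model amounts to writing one more decorated builder, and unregistered models keep working through the fallback path.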

## Performance
Coming soon with tech report.

## Acknowledgement
Thanks to the following projects for their excellent work:
- [veScale](https://github.com/volcengine/veScale)
- [torchtitan](https://github.com/pytorch/torchtitan/)
- [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)
- [torchtune](https://github.com/pytorch/torchtune)
- [Liger-Kernel](https://github.com/linkedin/Liger-Kernel)
- [ByteCheckpoint](https://arxiv.org/abs/2407.20143)


## Citation
If you find VeOmni useful for your research and applications, feel free to give us a star ⭐ or cite us using:

```bibtex
@software{VeOmni,
      title={VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework},
      author={Qianli Ma and Yaowei Zheng and Zhelun Shi and Zhongkai Zhao and Bin Jia and Ziyue Huang and Zhi Zhang},
      year={2025},
      howpublished={GitHub repository},
      publisher={ByteDance Seed},
      url={https://github.com/ByteDance-Seed/VeOmni},
}
```

## Awesome work using VeOmni
- [UI-TARS](https://github.com/bytedance/UI-TARS)

## Contribution Guide
Contributions from the community are welcome! Please check out [CONTRIBUTING.md](CONTRIBUTING.md) and our project roadmap (to be updated).


## About [ByteDance Seed Team](https://team.doubao.com/)

![seed logo](https://github.com/user-attachments/assets/c42e675e-497c-4508-8bb9-093ad4d1f216)

Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.

You can get to know us better through the following channels👇
<p align="center">
  <br>
  <a href="https://team.doubao.com/">
    <img src="https://img.shields.io/badge/Website-%231e37ff?style=for-the-badge&logo=bytedance&logoColor=white"></a>
  <a href="https://github.com/user-attachments/assets/93481cda-a7f3-47f3-b333-fe6b3da86b78">
    <img src="https://img.shields.io/badge/WeChat-07C160?style=for-the-badge&logo=wechat&logoColor=white"></a>
 <a href="https://www.xiaohongshu.com/user/profile/668e7e15000000000303157d?xsec_token=ABl2-aqekpytY6A8TuxjrwnZskU-6BsMRE_ufQQaSAvjc%3D&xsec_source=pc_search">
    <img src="https://img.shields.io/badge/Xiaohongshu-%23FF2442?style=for-the-badge&logo=xiaohongshu&logoColor=white"></a>
  <a href="https://www.zhihu.com/org/dou-bao-da-mo-xing-tuan-dui/">
    <img src="https://img.shields.io/badge/zhihu-%230084FF?style=for-the-badge&logo=zhihu&logoColor=white"></a>
</p>
