Metadata-Version: 2.4
Name: diffgentor
Version: 0.1.1
Summary: A unified visual generation data synthesis factory supporting multiple backends (diffusers, xDiT, OpenAI API)
Project-URL: Homepage, https://github.com/ruihanglix/diffgentor
Project-URL: Documentation, https://github.com/ruihanglix/diffgentor#readme
Author: diffgentor team
License: Apache-2.0
License-File: LICENSE
Keywords: diffusion,image-editing,image-generation,text-to-image
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: datasets==4.4.1
Requires-Dist: diffusers==0.36.0
Requires-Dist: huggingface-hub==0.36.0
Requires-Dist: tokenizers==0.22.1
Requires-Dist: torch==2.8.0
Requires-Dist: torchaudio==2.8.0
Requires-Dist: torchvision==0.23.0
Requires-Dist: transformers==4.57.3
Provides-Extra: all
Requires-Dist: bitsandbytes; extra == 'all'
Requires-Dist: cache-dit; extra == 'all'
Requires-Dist: deepcache; extra == 'all'
Requires-Dist: distvae>=0.0.0b5; extra == 'all'
Requires-Dist: einops; extra == 'all'
Requires-Dist: google-genai; extra == 'all'
Requires-Dist: liger-kernel; extra == 'all'
Requires-Dist: omegaconf; extra == 'all'
Requires-Dist: openai; extra == 'all'
Requires-Dist: opencv-python-headless; extra == 'all'
Requires-Dist: para-attn; extra == 'all'
Requires-Dist: peft; extra == 'all'
Requires-Dist: protobuf; extra == 'all'
Requires-Dist: qwen-vl-utils; extra == 'all'
Requires-Dist: safetensors; extra == 'all'
Requires-Dist: sentencepiece==0.2.1; extra == 'all'
Requires-Dist: tiktoken; extra == 'all'
Requires-Dist: torchao; extra == 'all'
Requires-Dist: xformers; extra == 'all'
Requires-Dist: xfuser; extra == 'all'
Provides-Extra: bagel
Requires-Dist: einops; extra == 'bagel'
Requires-Dist: opencv-python-headless; extra == 'bagel'
Requires-Dist: safetensors; extra == 'bagel'
Requires-Dist: sentencepiece==0.2.1; extra == 'bagel'
Provides-Extra: cache-dit
Requires-Dist: cache-dit; extra == 'cache-dit'
Provides-Extra: deepcache
Requires-Dist: deepcache; extra == 'deepcache'
Provides-Extra: dev
Requires-Dist: black; extra == 'dev'
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: dreamomni2
Requires-Dist: peft; extra == 'dreamomni2'
Requires-Dist: protobuf; extra == 'dreamomni2'
Provides-Extra: emu35
Requires-Dist: einops; extra == 'emu35'
Requires-Dist: omegaconf; extra == 'emu35'
Requires-Dist: tiktoken; extra == 'emu35'
Provides-Extra: flash-attn
Requires-Dist: flash-attn; extra == 'flash-attn'
Provides-Extra: flux-kontext
Requires-Dist: einops; extra == 'flux-kontext'
Requires-Dist: opencv-python-headless; extra == 'flux-kontext'
Requires-Dist: safetensors; extra == 'flux-kontext'
Provides-Extra: google-genai
Requires-Dist: google-genai; extra == 'google-genai'
Provides-Extra: openai
Requires-Dist: openai; extra == 'openai'
Provides-Extra: para-attn
Requires-Dist: para-attn; extra == 'para-attn'
Provides-Extra: quantization
Requires-Dist: bitsandbytes; extra == 'quantization'
Requires-Dist: torchao; extra == 'quantization'
Provides-Extra: step1x
Requires-Dist: distvae>=0.0.0b5; extra == 'step1x'
Requires-Dist: einops; extra == 'step1x'
Requires-Dist: liger-kernel; extra == 'step1x'
Requires-Dist: qwen-vl-utils; extra == 'step1x'
Requires-Dist: xfuser; extra == 'step1x'
Provides-Extra: xdit
Requires-Dist: xfuser; extra == 'xdit'
Provides-Extra: xformers
Requires-Dist: xformers; extra == 'xformers'
Description-Content-Type: text/markdown

# Diffgentor

A unified visual generation data synthesis tool for batch image generation and editing, designed for GenArena evaluation and beyond.

## Abstract

Diffgentor is an efficient pipeline for batch image generation using various image generation and editing models. It supports multiple backends, including diffusers, xDiT, the OpenAI API, Google GenAI (Gemini), and third-party models such as Step1X-Edit, BAGEL, and Emu3.5.

Key features:
- **Multiple Backends**: diffusers, xDiT (multi-GPU), OpenAI, Google GenAI, and third-party models
- **Batch Processing**: Efficient batch inference with multi-process/multi-thread support
- **GenArena Integration**: Generate model outputs for GenArena pairwise evaluation
- **Optimization Suite**: VAE slicing/tiling, torch.compile, attention backends, and more

## Quick Start

### Installation

**Option 1: pip install**

```bash
# Core installation (diffusers backend; API backends need the `openai` / `google-genai` extras)
pip install diffgentor

# Install with all optional backends
pip install "diffgentor[all]"
```
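
To see which optional backends are usable after installation, you can check whether the distributions behind each extra are present. The following is a minimal sketch, not part of diffgentor: the `EXTRA_REQUIREMENTS` table is copied by hand from a few of the extras declared in the package metadata, and the helper names `is_installed` and `missing_for_extra` are hypothetical.

```python
from importlib import metadata

# Hand-copied subset of the extras declared in the package metadata.
# Extend this table if you rely on other extras.
EXTRA_REQUIREMENTS = {
    "openai": ["openai"],
    "google-genai": ["google-genai"],
    "xdit": ["xfuser"],
    "quantization": ["bitsandbytes", "torchao"],
}


def is_installed(dist_name: str) -> bool:
    """Return True if a distribution with this name is installed."""
    try:
        metadata.version(dist_name)
        return True
    except metadata.PackageNotFoundError:
        return False


def missing_for_extra(extra: str, installed=is_installed) -> list[str]:
    """List the distributions an extra needs that are not installed."""
    return [name for name in EXTRA_REQUIREMENTS[extra] if not installed(name)]


if __name__ == "__main__":
    for extra in EXTRA_REQUIREMENTS:
        missing = missing_for_extra(extra)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{extra}: {status}")
```

If an extra reports missing distributions, installing it with `pip install "diffgentor[<extra>]"` should pull them in.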

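> **GPU users**: Depending on your platform, the torch wheel that pip selects from PyPI may be CPU-only or built for a different CUDA version than your driver supports. To select a specific CUDA build, add the matching PyTorch index: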
> ```bash
> pip install diffgentor --extra-index-url https://download.pytorch.org/whl/cu126
> ```
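
After installing, it can be worth confirming that the torch build you ended up with actually sees a GPU. The sketch below uses only `torch.cuda.is_available()` and `torch.cuda.device_count()`; the function name `torch_device_summary` is a hypothetical helper, not a diffgentor API.

```python
import importlib.util


def torch_device_summary() -> str:
    """Describe whether the installed torch build can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check also works without torch

    if torch.cuda.is_available():
        count = torch.cuda.device_count()
        return f"torch {torch.__version__}: CUDA available ({count} device(s))"
    return f"torch {torch.__version__}: CUDA not available"


if __name__ == "__main__":
    print(torch_device_summary())
```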

> **flash-attn**: The `flash-attn` optional dependency may need to compile CUDA extensions at install time, which requires a CUDA toolchain. Installing it manually is recommended:
> ```bash
> pip install flash-attn --no-build-isolation
> ```
> Or download a pre-built wheel from the [flash-attention releases](https://github.com/Dao-AILab/flash-attention/releases).

**Option 2: From source (for development)**

```bash
git clone https://github.com/ruihanglix/diffgentor.git
cd diffgentor
pip install -e ".[all]"
```

### Download GenArena Dataset

```bash
hf download rhli/genarena --repo-type dataset --local-dir ./data
```
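
The same download can be done from Python with `huggingface_hub.snapshot_download`, which is convenient inside a setup script. This is a sketch under the assumption that the dataset layout matches what the CLI command above produces; `genarena_download_kwargs` and `download_genarena` are hypothetical helper names.

```python
def genarena_download_kwargs(local_dir: str = "./data") -> dict:
    """Arguments mirroring the `hf download` command above."""
    return {
        "repo_id": "rhli/genarena",
        "repo_type": "dataset",
        "local_dir": local_dir,
    }


def download_genarena(local_dir: str = "./data") -> str:
    """Download the dataset and return the local folder path."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(**genarena_download_kwargs(local_dir))


# Uncomment to fetch the dataset (needs network access):
# print(download_genarena())
```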

### Generate Images for MultiRef Subset

Example using the FLUX.2 [klein] 4B model:

```bash
diffgentor edit --backend diffusers \
    --model_name black-forest-labs/FLUX.2-klein-4B \
    --input ./data/multiref/ \
    --output_dir ./output/multiref/FLUX2-klein-4B/
```
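
When generating outputs for several models, it may help to assemble the command line in a script. The sketch below builds the same invocation as above and only prints it; it uses just the flags shown in this README. The `MODELS` mapping and the `build_edit_command` helper are hypothetical, and the output folder layout simply follows the example above.

```python
import shlex

# Output folder label -> model id. Only the model from the example above
# is listed; add your own entries as needed.
MODELS = {
    "FLUX2-klein-4B": "black-forest-labs/FLUX.2-klein-4B",
}


def build_edit_command(model_name: str, input_dir: str, output_dir: str,
                       backend: str = "diffusers") -> list[str]:
    """Build the argument list for one `diffgentor edit` run."""
    return [
        "diffgentor", "edit",
        "--backend", backend,
        "--model_name", model_name,
        "--input", input_dir,
        "--output_dir", output_dir,
    ]


if __name__ == "__main__":
    for label, model_name in MODELS.items():
        cmd = build_edit_command(
            model_name, "./data/multiref/", f"./output/multiref/{label}/"
        )
        # Dry run: print the command. To execute it, pass `cmd` to
        # subprocess.run(cmd, check=True) instead.
        print(shlex.join(cmd))
```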

## Supported Backends

| Backend | Type | Description |
|---------|------|-------------|
| `diffusers` | T2I / Editing | HuggingFace diffusers with auto pipeline detection |
| `xdit` | T2I | Multi-GPU inference with xDiT parallelism |
| `openai` | T2I / Editing | OpenAI API (GPT-Image, DALL-E) |
| `google_genai` | T2I / Editing | Google GenAI (Gemini native image models) |
| `step1x` | Editing | Step1X-Edit model |
| `bagel` | Editing | ByteDance BAGEL model |
| `emu35` | Editing | BAAI Emu3.5 model |
| `dreamomni2` | Editing | DreamOmni2 (FLUX.1-Kontext + Qwen2.5-VL) |
| `flux_kontext_official` | Editing | BFL official Flux Kontext |
| `hunyuan_image_3` | Editing | Tencent HunyuanImage-3.0-Instruct |
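
For scripts that choose a backend programmatically, the table above can be mirrored as a small lookup. This is an illustrative sketch only: the `BACKENDS` mapping is transcribed by hand from the table, and `backends_for` is a hypothetical helper, not something diffgentor exposes.

```python
# Backend name -> supported task types, transcribed from the table above.
BACKENDS = {
    "diffusers": {"t2i", "editing"},
    "xdit": {"t2i"},
    "openai": {"t2i", "editing"},
    "google_genai": {"t2i", "editing"},
    "step1x": {"editing"},
    "bagel": {"editing"},
    "emu35": {"editing"},
    "dreamomni2": {"editing"},
    "flux_kontext_official": {"editing"},
    "hunyuan_image_3": {"editing"},
}


def backends_for(task: str) -> list[str]:
    """Return the backend names that support a task, sorted by name."""
    return sorted(name for name, tasks in BACKENDS.items() if task in tasks)


if __name__ == "__main__":
    print("t2i:", ", ".join(backends_for("t2i")))
    print("editing:", ", ".join(backends_for("editing")))
```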

## Documentation

| Document | Description |
|----------|-------------|
| [Image Editing Guide](./docs/editing/README.md) | Comprehensive guide for image editing |
| [Text-to-Image Guide](./docs/t2i/README.md) | Text-to-image generation guide |
| [Optimization Guide](./docs/optimization.md) | Memory and speed optimization |
| [Prompt Enhancement](./docs/prompt_enhance.md) | LLM-based prompt enhancement |
| [Environment Variables](./docs/env_vars.md) | Configuration via environment variables |

### Backend-Specific Guides

- [Diffusers Models](./docs/editing/diffusers.md) - Qwen, FLUX, LongCat
- [Step1X-Edit](./docs/editing/step1x.md) - Step1X-Edit v1.0/v1.1
- [BAGEL](./docs/editing/bagel.md) - ByteDance BAGEL
- [Emu3.5](./docs/editing/emu35.md) - BAAI Emu3.5
- [DreamOmni2](./docs/editing/dreamomni2.md) - DreamOmni2
- [Flux Kontext](./docs/editing/flux_kontext.md) - BFL official
- [HunyuanImage-3.0](./docs/editing/hunyuan_image_3.md) - Tencent HunyuanImage
- [OpenAI](./docs/editing/openai.md) - GPT-Image API
- [Google GenAI](./docs/editing/google_genai.md) - Gemini

## Environment Variables

Model-specific parameters are configured via `DG_*` environment variables, and the API backends read their keys from `OPENAI_API_KEY` and `GEMINI_API_KEY`:

```bash
# Step1X-Edit
DG_STEP1X_VERSION=v1.1
DG_STEP1X_SIZE_LEVEL=512

# BAGEL
DG_BAGEL_CFG_TEXT_SCALE=3.0
DG_BAGEL_CFG_IMG_SCALE=1.5

# API backends
OPENAI_API_KEY=your_key
GEMINI_API_KEY=your_key
```
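
If you launch diffgentor from a Python script, one way to apply these settings per run is to build an environment mapping and pass it to `subprocess.run`. The sketch below assumes the variables are read when the process starts; `backend_env` and `STEP1X_ENV` are hypothetical names, and the variable values are the ones shown above.

```python
import os

# Environment variable values must be strings.
STEP1X_ENV = {
    "DG_STEP1X_VERSION": "v1.1",
    "DG_STEP1X_SIZE_LEVEL": "512",
}


def backend_env(overrides: dict[str, str], base=None) -> dict[str, str]:
    """Return a copy of the base environment with overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


if __name__ == "__main__":
    env = backend_env(STEP1X_ENV)
    # Pass `env=env` to subprocess.run(...) together with your usual
    # `diffgentor edit --backend step1x` command line.
    print({key: env[key] for key in STEP1X_ENV})
```

Copying the environment instead of mutating `os.environ` keeps settings for one backend from leaking into later runs in the same script.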

See [Environment Variables](./docs/env_vars.md) for the complete list.

## Citation

```bibtex
@misc{li2026genarenaachievehumanalignedevaluation,
      title={GenArena: How Can We Achieve Human-Aligned Evaluation for Visual Generation Tasks?},
      author={Ruihang Li and Leigang Qu and Jingxu Zhang and Dongnan Gui and Mengde Xu and Xiaosong Zhang and Han Hu and Wenjie Wang and Jiaqi Wang},
      year={2026},
      eprint={2602.06013},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.06013},
}
```

## License

Apache-2.0
