Metadata-Version: 2.4
Name: omgformer
Version: 2.0.5
Summary: Parallel Diffusion Language Model with 60 features — MoE, LoRA, attention variants, and more
Author: OMGFormer Contributors
License: Apache-2.0
Project-URL: Homepage, https://github.com/fastloraoffical/OMGformers
Project-URL: Repository, https://github.com/fastloraoffical/OMGformers
Project-URL: Bug Tracker, https://github.com/fastloraoffical/OMGformers/issues
Keywords: transformer,diffusion,language-model,mixture-of-experts,lora,attention,deep-learning,pytorch
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.1.0
Provides-Extra: safetensors
Requires-Dist: safetensors>=0.4.0; extra == "safetensors"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: safetensors>=0.4.0; extra == "dev"
Provides-Extra: all
Requires-Dist: safetensors>=0.4.0; extra == "all"
Dynamic: license-file

# OMGFormer

**Parallel Diffusion Language Model — 60 features, production-ready**

[![PyPI](https://img.shields.io/pypi/v/omgformer)](https://pypi.org/project/omgformer/)
[![License](https://img.shields.io/badge/license-Apache--2.0-blue)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.9%2B-blue)]()
[![PyTorch](https://img.shields.io/badge/pytorch-2.1%2B-orange)]()

OMGFormer is a modular PyTorch library for building and training parallel masked diffusion language models. It ships with a comprehensive set of attention variants, MoE routing strategies, LoRA fine-tuning methods, and training utilities.

---

## Installation

```bash
pip install omgformer
# with safetensors support (recommended for LoRA saving)
pip install "omgformer[safetensors]"
```

---

## Quick Start

```python
from omgformer import OMGConfig, OMGModel

config = OMGConfig(
    vocab_size=32000,
    hidden_size=768,
    num_layers=12,
    num_heads=12,
)
model = OMGModel(config)
```
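
OMGModel should behave like a regular `torch.nn.Module`, so the usual PyTorch workflow applies. The forward call below is a hedged sketch: it assumes the model consumes a `(batch, seq)` tensor of token IDs and that `vocab_size` is readable from the config object; the exact return value may differ (the MoE example further down additionally returns an auxiliary loss).

```python
import torch

# Parameter count as a quick sanity check.
print(f"parameters: {sum(p.numel() for p in model.parameters()):,}")

input_ids = torch.randint(0, config.vocab_size, (2, 128))  # dummy (batch, seq) token IDs
output = model(input_ids)                                   # per-token predictions
```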

---

## Feature Highlights

### Attention (Features #1–#16)
- **Grouped Query Attention (GQA)** — reduced KV heads for efficient inference (see the GQA sketch after this list)
- **Multi-Head Latent Attention (MLA)** — DeepSeek-style latent compression
- **Sliding Window Attention** — Mistral-style local context windows
- **Linear Attention** — O(n) complexity via kernel feature maps
- **Block Sparse Attention** — memory-efficient sparse patterns
- **RoPE variants**: standard, YaRN, NTK-aware, LongRoPE
- **ALiBi and T5 relative position biases**
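
To make the GQA bullet concrete, here is a minimal plain-PyTorch sketch of grouped-query attention (an illustration of the technique, not OMGFormer's implementation): groups of query heads share a smaller set of KV heads, which shrinks the KV cache at inference time.

```python
import torch
import torch.nn.functional as F

def gqa(q, k, v, num_heads=12, num_kv_heads=4):
    """Grouped Query Attention: groups of query heads share one KV head.

    q: (batch, seq, num_heads * head_dim)
    k, v: (batch, seq, num_kv_heads * head_dim)
    """
    b, s, _ = q.shape
    head_dim = q.size(-1) // num_heads
    q = q.view(b, s, num_heads, head_dim).transpose(1, 2)      # (b, H, s, d)
    k = k.view(b, s, num_kv_heads, head_dim).transpose(1, 2)   # (b, H_kv, s, d)
    v = v.view(b, s, num_kv_heads, head_dim).transpose(1, 2)
    # Repeat each KV head so every group of H / H_kv query heads shares it.
    k = k.repeat_interleave(num_heads // num_kv_heads, dim=1)
    v = v.repeat_interleave(num_heads // num_kv_heads, dim=1)
    out = F.scaled_dot_product_attention(q, k, v)               # (b, H, s, d)
    return out.transpose(1, 2).reshape(b, s, num_heads * head_dim)

q = torch.randn(2, 128, 12 * 64)
kv = torch.randn(2, 128, 4 * 64)
ctx = gqa(q, kv, kv)   # (2, 128, 768)
```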

### Layers (Features #17–#28)
- **SwiGLU, GeGLU, ReGLU** FFN variants (see the SwiGLU sketch after this list)
- **RMSNorm and ScaleNorm**
- **AdaLN modulation** for diffusion timestep conditioning
- **Token Merging** — dynamic sequence length reduction
- **Stochastic Depth** — drop-path regularization
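
As a reference point for the FFN bullets, this is the standard SwiGLU formulation, `y = W_down(SiLU(W_gate x) * W_up x)`, sketched in plain PyTorch (illustrative only; OMGFormer's layer may differ in naming and defaults).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU FFN: the up-projection is gated by a SiLU-activated branch."""

    def __init__(self, d_model: int = 768, d_ff: int = 2048):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```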

### Mixture of Experts (Features #37–#40)
- **Standard top-K token-choice routing** (see the routing sketch after this list)
- **Expert Choice routing** (Zhou et al., 2022) — perfect load balance by letting each expert pick its tokens
- **Soft MoE** (Puigcerver et al., 2023) — fully differentiable routing
- **Shared expert** (DeepSeek MoE) — always-on dense + sparse experts
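
The routing sketch below contrasts the two selection directions on raw router scores (plain PyTorch, for illustration; OMGFormer's routers add capacity handling, dispatch, and auxiliary losses on top): token choice takes a top-K over experts per token, expert choice takes a top-C over tokens per expert.

```python
import torch

# Router logits for a flattened batch of 16 tokens over 8 experts.
scores = torch.randn(16, 8)

# Token choice (standard top-K): each token picks its top-K experts,
# so some experts can be overloaded and an auxiliary balance loss is needed.
token_choice = scores.topk(k=2, dim=-1).indices          # (16, 2) expert ids per token

# Expert choice: each expert picks its top-C tokens, so every expert
# processes exactly C tokens; load balance holds by construction.
capacity = 4
expert_choice = scores.topk(k=capacity, dim=0).indices   # (4, 8) token ids per expert
```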

### LoRA / PEFT (Features #41–#44)
- **Standard LoRA** with configurable rank and alpha (see the scaling sketch after this list)
- **DoRA** — weight-decomposed LoRA (Liu et al., 2024)
- **rsLoRA** — rank-stabilized scaling for high-rank training
- **LoRA+** — different learning rates for A and B matrices
- Save/load only adapter weights (~MB, not GB)
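
The scaling sketch referenced above is a plain-tensor illustration of the two update formulas, not OMGFormer's adapter code: standard LoRA scales the low-rank update by `alpha / r`, while rsLoRA uses `alpha / sqrt(r)`, which keeps the update magnitude stable as the rank grows.

```python
import math
import torch

d, r, alpha = 768, 16, 32
W = torch.randn(d, d)          # frozen base weight
A = torch.randn(r, d) * 0.02   # trainable, small init
B = torch.zeros(d, r)          # trainable, zero init: training starts at the base model
x = torch.randn(d)

# Standard LoRA: y = W x + (alpha / r) * B A x
y_lora = W @ x + (alpha / r) * (B @ (A @ x))

# rsLoRA: rescale by alpha / sqrt(r) so high-rank adapters train stably.
y_rslora = W @ x + (alpha / math.sqrt(r)) * (B @ (A @ x))
```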

### Training (Features #45–#52)
- **Lion optimizer** (Chen et al., 2023; see the update-rule sketch after this list)
- **EMA** with warmup annealing
- **Warmup + cosine LR schedule** with optional restarts
- **Gradient checkpointing** helper
- **FSDP wrapping** helper
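
For reference, the Lion update rule (Chen et al., 2023) takes the sign of an interpolated momentum and applies decoupled weight decay. The sketch below shows the rule on raw tensors; it is illustrative only, so prefer the library's optimizer class for actual training.

```python
import torch

def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update: a fixed-magnitude signed step plus decoupled weight decay."""
    update = (beta1 * momentum + (1 - beta1) * grad).sign()
    param.add_(update + weight_decay * param, alpha=-lr)   # signed step + weight decay
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)       # momentum tracks the raw gradient
```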

### Advanced (Features #53–#60)
- **KV Cache** for autoregressive decoding
- **Multi-Token Prediction Head** (MTP)
- **Model merging**: SLERP, DARE, TIES (see the SLERP sketch after this list)
- **Reward model head + PPO step**
- **INT8 / INT4 quantization stubs**
- **GGUF export stub**
- **RAG context injector**
- **Dynamic batching engine**
- **Chunked long-document attention**
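
As a worked illustration of the SLERP merge mentioned above (not OMGFormer's merging utility): two weight tensors are interpolated along the arc between them rather than along the straight line, which better preserves their norms.

```python
import torch

def slerp(w0: torch.Tensor, w1: torch.Tensor, t: float = 0.5) -> torch.Tensor:
    """Spherical linear interpolation between two same-shaped weight tensors."""
    a, b = w0.flatten().float(), w1.flatten().float()
    cos_omega = torch.clamp((a @ b) / (a.norm() * b.norm() + 1e-8), -1.0, 1.0)
    omega = torch.acos(cos_omega)        # angle between the two weight vectors
    if omega < 1e-4:                     # nearly parallel: plain lerp is fine
        merged = (1 - t) * a + t * b
    else:
        so = torch.sin(omega)
        merged = (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return merged.reshape(w0.shape).to(w0.dtype)
```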

---

## LoRA Fine-tuning Example

```python
from omgformer import OMGConfig, OMGModel, add_lora, merge_lora, save_lora

model = OMGModel(OMGConfig())

# Add LoRA adapters (freezes base weights automatically)
model = add_lora(model, rank=16, alpha=32)

# ... fine-tune ...

# Merge and save
model = merge_lora(model)
save_lora(model, "./my_adapter")
```

---

## MoE Example

```python
import torch
import torch.nn.functional as F

from omgformer import OMGConfig, OMGModel

config = OMGConfig(
    use_moe=True,
    num_experts=8,
    num_experts_per_token=2,
    moe_expert_choice=False,   # or True for Expert Choice routing
)
model = OMGModel(config)

input_ids = torch.randint(0, config.vocab_size, (2, 128))  # dummy token batch

# The forward pass also returns the router's load-balancing auxiliary loss;
# add it to the task loss during training.
output, aux_loss = model(input_ids)

# Placeholder target: assumes `output` holds per-token logits.
ce_loss = F.cross_entropy(output.view(-1, output.size(-1)), input_ids.view(-1))
loss = ce_loss + aux_loss
```

---

## License

Apache 2.0 — see [LICENSE](LICENSE).

## Links

- GitHub: https://github.com/fastloraoffical/OMGformers
- PyPI: https://pypi.org/project/omgformer/
