Metadata-Version: 2.4
Name: agi-transformer
Version: 5.1.0
Summary: AGi Transformer v5.1 Advanced — FP4-E2M1, Parallel Blocks, Mamba, Double Buffering
Author: AGi Architecture Framework
License: MIT
Project-URL: Homepage, https://github.com/agi-framework/agi-transformer
Project-URL: Documentation, https://agi-transformer.readthedocs.io
Keywords: transformer,quantization,mamba,triton,fp4,llm
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.1.0
Requires-Dist: triton>=3.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: matplotlib>=3.7.0
Requires-Dist: tqdm>=4.65.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Dynamic: license-file

# AGi Transformer v5.1 — Advanced Core

High-performance transformer library with FP4-E2M1 quantization, Parallel Blocks, Mamba layers, and Triton double-buffering kernels.

## Features

- **FP4-E2M1 Format**: 4-bit floats with 1 sign bit, 2 exponent bits, and 1 mantissa bit (positioned as an alternative to NF4 for Gaussian-distributed weights)
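- **Parallel Block Architecture**: PaLM-style blocks that compute Attention and FFN in parallel from the same normalized input (the PaLM paper reports roughly 15% faster training at large scale)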
- **Mamba/TTT Hybrid**: Replaces the middle attention layers with linear recurrent layers that cost O(L) in sequence length L
- **Double Buffering**: Triton kernel preloads Block K+1 while computing Block K
- **Matrix Swizzling**: XOR-based shared memory indexing eliminates bank conflicts
- **Dynamic Scaled Clamping**: Per-row adaptive bounds based on block maxima
- **Vector SmoothAlpha**: A per-channel scaling vector (4096 parameters) instead of a single scalar
- **AdaRound + KLD**: Learnable rounding with distribution-preserving loss
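
To make the FP4-E2M1 layout above concrete, here is a minimal pure-Python sketch, not taken from the `agi_core` sources. It assumes the common E2M1 convention (exponent bias 1, one subnormal step, no Inf/NaN codes) and uses a per-row scale derived from the row maximum as a stand-in for the scaled clamping mentioned above. The names `decode_e2m1`, `quantize_row`, and `dequantize_row` are hypothetical.

```python
# Illustrative sketch only; helper names are hypothetical, not the library API.

def decode_e2m1(code: int) -> float:
    """Decode a 4-bit code: bit 3 = sign, bits 2-1 = exponent, bit 0 = mantissa."""
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 1
    if exponent == 0:
        # Subnormal: no implicit leading 1, so the magnitude is 0 or 0.5.
        return sign * 0.5 * mantissa
    # Normal: implicit leading 1 and an assumed exponent bias of 1.
    return sign * (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)


# The eight non-negative magnitudes: 0, 0.5, 1, 1.5, 2, 3, 4, 6.
E2M1_GRID = [decode_e2m1(c) for c in range(8)]


def quantize_row(row):
    """Map a row of floats to 4-bit codes, scaling so the row max lands on 6."""
    amax = max(abs(v) for v in row)
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
    codes = []
    for v in row:
        mag = abs(v) / scale
        # Nearest grid point; ties resolve to the smaller magnitude.
        idx = min(range(8), key=lambda i: abs(E2M1_GRID[i] - mag))
        codes.append(idx | (8 if v < 0 else 0))
    return codes, scale


def dequantize_row(codes, scale):
    """Invert quantize_row up to rounding error."""
    return [decode_e2m1(c) * scale for c in codes]
```

A call such as `quantize_row(weights)` returns the codes together with the scale needed to reconstruct the row, so the scale has to be stored alongside each quantized row.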

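The XOR-based swizzling listed under Features can be illustrated without a GPU. The sketch below is a simplified model, assuming 32 shared-memory banks of 4-byte words and a 32x32 tile of 4-byte elements. It counts how many threads of a warp hit the same bank when thread `t` reads element `(t, col)`. The function names are hypothetical and the actual Triton kernels may use a different layout.

```python
# Simplified bank-conflict model; not the library's kernel code.
NUM_BANKS = 32  # assumption: 32 banks, each one 4-byte word wide
TILE = 32       # assumption: 32x32 tile of 4-byte elements


def bank_linear(row: int, col: int) -> int:
    """Bank of element (row, col) in a plain row-major layout."""
    return (row * TILE + col) % NUM_BANKS


def bank_swizzled(row: int, col: int) -> int:
    """Bank after storing element (row, col) at column (col XOR row)."""
    return (row * TILE + (col ^ row)) % NUM_BANKS


def max_conflict(bank_fn, col: int) -> int:
    """Largest number of threads hitting one bank when thread t reads (t, col)."""
    hits = [0] * NUM_BANKS
    for row in range(TILE):
        hits[bank_fn(row, col)] += 1
    return max(hits)


if __name__ == "__main__":
    print("row-major column read:", max_conflict(bank_linear, 0), "threads per bank")
    print("swizzled column read: ", max_conflict(bank_swizzled, 0), "thread per bank")
```

In this model a column read in the row-major layout sends all 32 threads to one bank, while the XOR layout spreads them over 32 distinct banks. Because `col ^ row` is a permutation of the columns within each row, row-wise reads stay conflict-free as well.
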
## Quick Start

```bash
pip install -e .
python scripts/train.py --steps 5000 --d-model 512 --n-layers 8
```

## Project Structure

```
src/agi_core/
├── kernels/      # Triton AOT kernels (FP4, Double Buffer, Swizzle)
├── layers/       # Quantized Linear, Mamba, Parallel Block
├── model/        # AGiParallelTransformer
├── ops/csrc/     # C++/CUDA extensions (optional AOT)
└── utils/        # Quantization helpers, logging
```

## License

MIT
