Metadata-Version: 2.4
Name: transformer_nuggets
Version: 0.0.4
Summary: A place to store reusable transformer components found around the interwebs
Author-email: Driss Guessous <drisspguessous@gmail.com>
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Requires-Dist: matplotlib
Requires-Dist: pandas
Requires-Dist: rich
Requires-Dist: scipy
Requires-Dist: seaborn
Requires-Dist: tabulate
Requires-Dist: torch
Requires-Dist: tqdm
Requires-Dist: typer
Provides-Extra: dev
Requires-Dist: bumpver; extra == 'dev'
Requires-Dist: docstring-parser; extra == 'dev'
Requires-Dist: jsonargparse; extra == 'dev'
Requires-Dist: pip-tools; extra == 'dev'
Requires-Dist: prek>=0.2.27; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: flash
Requires-Dist: triton; extra == 'flash'
Provides-Extra: llama
Requires-Dist: datasets==2.15.0; extra == 'llama'
Requires-Dist: fire==0.5.0; extra == 'llama'
Requires-Dist: float8-experimental; extra == 'llama'
Requires-Dist: sentencepiece==0.1.99; extra == 'llama'
Provides-Extra: qlora
Requires-Dist: bitsandbytes; extra == 'qlora'
Description-Content-Type: text/markdown

## transformer_nuggets

A grab-bag of experimental transformer kernels and utilities (mostly PyTorch + Triton).

![transformer_nuggies](https://github.com/drisspg/transformer_nuggets/assets/32754868/8329986a-aa9f-41a6-a332-49a0d71438aa)

### What’s in here

- **`transformer_nuggets/flash`**: Triton FlashAttention experiments + masking/bias utilities.
- **`transformer_nuggets/quant`**: NF4 tensor subclass + QLoRA building blocks (pure PyTorch).
- **`transformer_nuggets/fp8`**: FP8 casting / scaled-quantization kernels (Triton). Try helion :)
- **`transformer_nuggets/cute`**: CUTE DSL experiments and tooling (includes an intra-kernel profiler).
- **`transformer_nuggets/misc`**: Odds and ends (e.g. attention wrappers, utilities).
- **`transformer_nuggets/llama`**: LLaMA-ish model + training/finetune scripts (research-grade).

This repository is research code: APIs are not stable and may change.

### Install

You’ll need a working PyTorch install first (CPU or CUDA). Follow the official
[PyTorch install instructions](https://pytorch.org/get-started/locally/).

To install from PyPI:

```shell
pip install transformer_nuggets
```

To hack on the code locally:

```shell
git clone https://github.com/drisspg/transformer_nuggets.git
cd transformer_nuggets
pip install -e .
```

Optional extras:

```shell
pip install "transformer_nuggets[flash]"  # triton
pip install "transformer_nuggets[qlora]"  # bitsandbytes (optional comparisons)
pip install "transformer_nuggets[llama]"  # llama training utilities
```
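
To see which optional extras are usable in the current environment, a small check like the one below can help. This is a minimal sketch, not part of the package: the `available_extras` helper and the `OPTIONAL_MODULES` mapping are hypothetical names, and the module names are assumed from the extras listed above (`triton` for `flash`, `bitsandbytes` for `qlora`).

```python
from importlib.util import find_spec

# Hypothetical mapping from extra name to the top-level module it is assumed to provide.
OPTIONAL_MODULES = {
    "flash": "triton",
    "qlora": "bitsandbytes",
}


def available_extras(modules=OPTIONAL_MODULES):
    """Return {extra_name: bool}, True when the extra's module can be found."""
    # find_spec returns None for a top-level module that is not installed.
    return {extra: find_spec(mod) is not None for extra, mod in modules.items()}


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```

The check only looks up module specs and does not import anything, so it stays cheap even when the heavier GPU packages are installed.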

### Quick examples

Quantization (NF4 / QLoRA): use torchao :)

FlashAttention (requires CUDA + Triton; API is experimental): use flex-attention :)

CUTE intra-kernel profiling (writes a Perfetto trace):

```shell
python -m transformer_nuggets.cute.profiler.example
```

### Repo layout

- **`transformer_nuggets/`**: Python package.
- **`benchmarks/`**: Microbenchmarks and profiling scripts.
- **`examples/`**: Small runnable examples.
- **`scripts/`**: One-off utilities.
- **`test/`**: pytest suite.

### Development

```shell
pip install -e ".[dev]"
prek install
pytest
```
