Metadata-Version: 2.4
Name: b12x
Version: 0.12.3
Summary: Unapologetically SM120-only CuTe DSL kernels for NVFP4 GEMM and MoE.
Author: Luke
License: Apache-2.0
Requires-Python: <4.0,>=3.10
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.10.0
Requires-Dist: cuda-python
Requires-Dist: nvidia-cutlass-dsl>=4.4.1
Requires-Dist: nvidia-cutlass-dsl-libs-base>=4.4.1
Requires-Dist: nvidia-cutlass-dsl-libs-cu13>=4.4.1
Requires-Dist: apache-tvm-ffi!=0.1.8,!=0.1.8.post0,<0.2,>=0.1.6
Requires-Dist: rich>=13
Requires-Dist: fastapi>=0.115
Requires-Dist: uvicorn>=0.34
Requires-Dist: transformers>=4.51
Requires-Dist: safetensors>=0.6
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"


`b12x` is an SM120/SM121 CuTe DSL kernel library for (primarily) NVFP4 LLM inference.

It is intentionally narrow. This is not a generic CUDA kernel collection or a
full model-serving stack, and it does not target any other GPU architectures,
including SM100. It is a focused package: a small number of high-performance
kernels plus the runtime glue needed to launch them cleanly from `sglang`/`vllm`.

Currently supported kernels:
- NVFP4 fused MoE GEMM
- NVFP4 dense GEMM
- BF16/FP8 paged attention
- Sparse MLA attention (DSA/NSA only)

```bash
pip install b12x
```
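Because the kernels are SM120/SM121-only, it is worth checking the device's compute capability before wiring `b12x` into a serving stack. A minimal sketch of such a guard is below; the helper name `is_supported_arch` is ours, not part of `b12x`, and the tuple it inspects is the `(major, minor)` pair returned by the standard `torch.cuda.get_device_capability()`:

```python
# Hypothetical pre-flight check: b12x targets compute capability 12.0/12.1
# (SM120/SM121) only, so reject anything else up front.

def is_supported_arch(capability: tuple[int, int]) -> bool:
    """Return True for SM120/SM121, i.e. compute capability 12.0 or 12.1."""
    return capability in ((12, 0), (12, 1))

# In a real deployment you would feed it the live device capability:
#   import torch
#   assert is_supported_arch(torch.cuda.get_device_capability()), \
#       "b12x requires an SM120/SM121 GPU"
```

Keeping the check as a pure function (rather than calling `torch` directly inside it) makes it trivial to exercise in CI hosts without a GPU.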

Ask your friendly neighborhood AI agent for further information on how to use this library.
