Metadata-Version: 2.4
Name: mlx-wigner
Version: 0.1.0
Summary: GPU-accelerated Wigner 3j/6j/9j symbols via Apple MLX — the world's first GPU implementation
Author-email: Sheng-Kai Huang <akai@fawstudio.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/akaiHuang/mlx-wigner
Project-URL: Issues, https://github.com/akaiHuang/mlx-wigner/issues
Keywords: wigner,3j,6j,9j,clebsch-gordan,angular momentum,quantum mechanics,gpu,mlx
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Physics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mlx>=0.4.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: sympy>=1.12; extra == "dev"
Provides-Extra: benchmark
Requires-Dist: sympy>=1.12; extra == "benchmark"
Dynamic: license-file

# mlx-wigner

GPU-accelerated Wigner 3j/6j/9j symbols for Apple Silicon via [MLX](https://github.com/ml-explore/mlx). Fully vectorized: a single call can evaluate millions of symbols.

## Why

Wigner 3j/6j/9j symbols appear throughout quantum mechanics, atomic physics, and nuclear structure calculations. Widely used implementations such as sympy, wigxjpf, and SHTOOLS run on the CPU and are not designed for batched GPU evaluation. For applications that require millions of coupling coefficients (e.g., spherical tensor networks, many-body perturbation theory, angular power spectra), serial CPU evaluation becomes the bottleneck.

`mlx-wigner` solves this by:

- **GPU-vectorized Racah formula** — all symbols in a batch are computed simultaneously
- **Log-gamma numerics** — stable for angular momenta up to j ~ 100
- **Zero CPU loops** — the entire computation graph runs on GPU
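To make the log-gamma point concrete, here is a scalar, CPU-only sketch of the Racah formula for the 3j symbol evaluated in log space. The Algorithm section names a Lanczos approximation for log Γ; the g = 7, 9-coefficient variant used here is an assumption, not necessarily the one `mlx-wigner` uses. The names `lanczos_lgamma` and `wigner_3j_ref` are hypothetical and not part of the package API. Angular momenta are handled in doubled units (2j) so half-integers stay exact.

```python
import math

# Lanczos coefficients for g = 7, n = 9 (a widely published set; assumed here,
# not necessarily the set mlx-wigner uses).
_G = 7
_P = (
    0.99999999999980993, 676.5203681218851, -1259.1392167224028,
    771.32342877765313, -176.61502916214059, 12.507343278686905,
    -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7,
)


def lanczos_lgamma(x: float) -> float:
    """log Gamma(x) for x >= 0.5 via the Lanczos approximation."""
    z = x - 1.0
    series = _P[0]
    for i in range(1, len(_P)):
        series += _P[i] / (z + i)
    t = z + _G + 0.5
    return 0.5 * math.log(2.0 * math.pi) + (z + 0.5) * math.log(t) - t + math.log(series)


def _log_fact(n: int) -> float:
    """log(n!) for a non-negative integer n."""
    return lanczos_lgamma(n + 1.0)


def wigner_3j_ref(j1, j2, j3, m1, m2, m3) -> float:
    """Hypothetical scalar reference: Racah formula for the 3j symbol in log space."""
    # Doubled units keep half-integer angular momenta exact.
    t1, t2, t3, u1, u2, u3 = (round(2 * v) for v in (j1, j2, j3, m1, m2, m3))

    # Selection rules: m-sum, triangle inequality, |m| <= j, j + m integer.
    if u1 + u2 + u3 != 0:
        return 0.0
    if (t1 + t2 + t3) % 2 or not abs(t1 - t2) <= t3 <= t1 + t2:
        return 0.0
    for t, u in ((t1, u1), (t2, u2), (t3, u3)):
        if abs(u) > t or (t + u) % 2:
            return 0.0

    # Log of the prefactor: triangle coefficient times sqrt of the (j +/- m)! terms.
    log_pref = 0.5 * (
        _log_fact((t1 + t2 - t3) // 2) + _log_fact((t1 - t2 + t3) // 2)
        + _log_fact((-t1 + t2 + t3) // 2) - _log_fact((t1 + t2 + t3) // 2 + 1)
        + sum(_log_fact((t + u) // 2) + _log_fact((t - u) // 2)
              for t, u in ((t1, u1), (t2, u2), (t3, u3)))
    )

    # Summation limits: every factorial argument must be non-negative.
    d1 = (t3 - t2 + u1) // 2  # j3 - j2 + m1
    d2 = (t3 - t1 - u2) // 2  # j3 - j1 - m2
    e1 = (t1 + t2 - t3) // 2  # j1 + j2 - j3
    e2 = (t1 - u1) // 2       # j1 - m1
    e3 = (t2 + u2) // 2       # j2 + m2
    total = 0.0
    for k in range(max(0, -d1, -d2), min(e1, e2, e3) + 1):
        log_term = -(_log_fact(k) + _log_fact(k + d1) + _log_fact(k + d2)
                     + _log_fact(e1 - k) + _log_fact(e2 - k) + _log_fact(e3 - k))
        total += (-1.0 if k % 2 else 1.0) * math.exp(log_pref + log_term)

    # Overall phase (-1)^(j1 - j2 - m3); the exponent is an integer at this point.
    return -total if ((t1 - t2 - u3) // 2) % 2 else total


print(wigner_3j_ref(1, 1, 2, 0, 0, 0))  # ≈ 0.36515 (= sqrt(2/15))
```

Working with log-factorials keeps each term's magnitude in range before exponentiation; a batched version would evaluate the same per-term arithmetic elementwise rather than in a Python loop.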

## Installation

```bash
pip install mlx-wigner
```

Or from source:

```bash
git clone https://github.com/akaiHuang/mlx-wigner.git
cd mlx-wigner
pip install -e ".[dev]"
```

Requires Python >= 3.10 and Apple Silicon (M1/M2/M3/M4).
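A quick way to confirm an interpreter meets these requirements before installing is a stdlib-only check like the sketch below (the function name is hypothetical). It tests `platform.machine()` because an x86_64 Python running under Rosetta on an Apple Silicon Mac reports `x86_64`, and MLX needs a native arm64 interpreter.

```python
import platform
import sys


def looks_like_supported_mac() -> bool:
    """Heuristic check: Python >= 3.10 running natively on an Apple Silicon Mac."""
    is_py310 = sys.version_info >= (3, 10)
    is_macos = sys.platform == "darwin"
    # Under Rosetta, an x86_64 interpreter reports "x86_64" even on Apple Silicon.
    is_arm64 = platform.machine() == "arm64"
    return is_py310 and is_macos and is_arm64


if __name__ == "__main__":
    print("supported:", looks_like_supported_mac())
```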

## Quick start

```python
import mlx.core as mx
from mlx_wigner import wigner_3j, wigner_6j, wigner_9j, clebsch_gordan

# Single symbol: (1 1 2; 0 0 0) = sqrt(2/15) ≈ 0.3651
w = wigner_3j(1, 1, 2, 0, 0, 0)

# Clebsch-Gordan coefficient <j1 m1; j2 m2 | J M>: <1 0; 1 0 | 2 0> = sqrt(2/3) ≈ 0.8165
cg = clebsch_gordan(1, 0, 1, 0, 2, 0)

# Batch: 100,000 symbols in a single call
n = 100_000
j1 = mx.ones(n)
j2 = mx.ones(n)
j3 = mx.full((n,), 2.0)
m1 = mx.zeros(n)
m2 = mx.zeros(n)
m3 = mx.zeros(n)
result = wigner_3j(j1, j2, j3, m1, m2, m3)
mx.eval(result)  # MLX is lazy; this forces the GPU computation to run

# 6j and 9j symbols: {1 1 1; 1 1 1} = 1/6, {1 1 0; 1 1 0; 0 0 0} = 1/3
s6 = wigner_6j(1, 1, 1, 1, 1, 1)
s9 = wigner_9j(1, 1, 0, 1, 1, 0, 0, 0, 0)
```
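The batch example above repeats one symbol 100,000 times. A more typical batch sweeps every allowed projection for fixed (j1, j2, j3). The sketch below uses a hypothetical plain-Python helper, `m_triples`, to enumerate all (m1, m2, m3) with |mi| <= ji and m1 + m2 + m3 = 0, working in doubled units so half-integer j is exact.

```python
def m_triples(j1, j2, j3):
    """Hypothetical helper: all (m1, m2, m3) with |mi| <= ji and m1 + m2 + m3 == 0.

    Returns three equal-length lists of floats, ready to wrap in arrays.
    """
    # Doubled units (2j, 2m) keep half-integers exact.
    t1, t2, t3 = (round(2 * j) for j in (j1, j2, j3))
    m1s, m2s, m3s = [], [], []
    for a in range(-t1, t1 + 1, 2):        # 2*m1; m1 steps by 1
        for b in range(-t2, t2 + 1, 2):    # 2*m2
            c = -(a + b)                   # 2*m3 is fixed by m1 + m2 + m3 = 0
            # Keep only |m3| <= j3 with j3 + m3 an integer.
            if abs(c) <= t3 and (t3 + c) % 2 == 0:
                m1s.append(a / 2)
                m2s.append(b / 2)
                m3s.append(c / 2)
    return m1s, m2s, m3s


m1, m2, m3 = m_triples(1, 1, 2)
print(len(m1))  # 9
```

Each list can then be converted with `mx.array(...)` and passed to `wigner_3j` together with `mx.full((n,), j)` arrays for the fixed j values, where `n` is the list length. This assumes equal-length array arguments, as in the batch example.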

## API

| Function | Signature | Description |
|---|---|---|
| `wigner_3j` | `(j1, j2, j3, m1, m2, m3)` | Wigner 3j symbol |
| `wigner_6j` | `(j1, j2, j3, j4, j5, j6)` | Wigner 6j symbol (Racah W) |
| `wigner_9j` | `(j1, ..., j9)` | Wigner 9j symbol |
| `clebsch_gordan` | `(j1, m1, j2, m2, J, M)` | Clebsch-Gordan coefficient |

All functions accept scalars or MLX arrays. Array inputs are batch-computed on GPU.
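For reference, the Clebsch-Gordan and Racah W entries in the table relate to the 3j and 6j symbols through the standard identities below (Condon-Shortley phase convention). That `clebsch_gordan` follows this convention is an assumption, consistent with the tests being checked against `sympy.physics.wigner`. The last line works through the Quick start example.

```math
\begin{aligned}
\langle j_1 m_1; j_2 m_2 \mid J M \rangle &= (-1)^{j_1 - j_2 + M} \sqrt{2J + 1} \begin{pmatrix} j_1 & j_2 & J \\ m_1 & m_2 & -M \end{pmatrix} \\
W(a\,b\,c\,d;\,e\,f) &= (-1)^{a + b + c + d} \begin{Bmatrix} a & b & e \\ d & c & f \end{Bmatrix} \\
\langle 1\,0; 1\,0 \mid 2\,0 \rangle &= (-1)^{0} \sqrt{5}\,\sqrt{2/15} = \sqrt{2/3} \approx 0.8165
\end{aligned}
```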

## Benchmark

```bash
python benchmark.py
```

Typical results on M3 Max (vs sympy on CPU):

| Batch size | mlx-wigner (GPU) | sympy (CPU) | Speedup |
|---|---|---|---|
| 1 | ~0.5 ms | ~2 ms | 4x |
| 1,000 | ~1 ms | ~2,000 ms | 2,000x |
| 100,000 | ~10 ms | — | — |

The GPU advantage grows with batch size because fixed per-call overhead is amortized over more symbols and the symbols in a batch are evaluated in parallel.
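When reproducing numbers like these, keep in mind that MLX evaluates lazily. A call returns an unevaluated array, and the GPU work runs only when `mx.eval` (or a conversion such as printing) forces it. The minimal timing sketch below assumes you time calls yourself rather than through `benchmark.py`. It takes a `sync` callback and includes that callback in the timed region, and it runs a few warm-up calls so one-time setup costs stay out of the median. The helper name `median_time_ms` is hypothetical.

```python
import statistics
import time
from typing import Callable


def median_time_ms(
    fn: Callable[[], object],
    sync: Callable[[object], None] = lambda _: None,
    warmup: int = 2,
    repeats: int = 10,
) -> float:
    """Median wall-clock time of sync(fn()) in milliseconds."""
    for _ in range(warmup):
        sync(fn())  # warm-up calls, not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        sync(fn())  # the forced evaluation is inside the timed region
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


if __name__ == "__main__":
    print(f"{median_time_ms(lambda: sum(range(100_000))):.3f} ms")
```

With MLX, pass `sync=mx.eval`, for example `median_time_ms(lambda: wigner_3j(j1, j2, j3, m1, m2, m3), sync=mx.eval)`, reusing the arrays from the Quick start. Using the median rather than the mean makes the result less sensitive to occasional slow runs.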

## Algorithm

- **Wigner 3j**: Racah formula with log-gamma (Lanczos approximation) for numerical stability
- **Wigner 6j**: Direct Racah sum formula (not built from 3j products)
- **Wigner 9j**: Sum over products of three 6j symbols, with one internal summation variable
- **Selection rules**: Vectorized boolean masks applied before computation

Half-integer angular momenta are fully supported.
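The 6j and 9j bullets above can be illustrated with a scalar, CPU-only sketch. It computes the direct Racah sum for the 6j symbol and builds the 9j symbol as a sum over x of three 6j symbols. The names `wigner_6j_ref` and `wigner_9j_ref` are hypothetical. The sketch uses Python float64 and `math.lgamma` in place of a hand-written Lanczos routine. A vectorized GPU version would presumably evaluate the same terms elementwise, replacing the early `return 0.0` selection checks with boolean masks. Doubled units (2j) keep half-integer inputs exact.

```python
import math


def _log_fact(n: int) -> float:
    """log(n!) for a non-negative integer n."""
    return math.lgamma(n + 1)


def _triangle(ta: int, tb: int, tc: int) -> bool:
    """Triangle rule in doubled units: a + b + c integer and |a - b| <= c <= a + b."""
    return (ta + tb + tc) % 2 == 0 and abs(ta - tb) <= tc <= ta + tb


def _log_delta(ta: int, tb: int, tc: int) -> float:
    """log Delta(abc), Delta = sqrt[(a+b-c)! (a-b+c)! (-a+b+c)! / (a+b+c+1)!]."""
    return 0.5 * (_log_fact((ta + tb - tc) // 2) + _log_fact((ta - tb + tc) // 2)
                  + _log_fact((-ta + tb + tc) // 2) - _log_fact((ta + tb + tc) // 2 + 1))


def wigner_6j_ref(j1, j2, j3, j4, j5, j6) -> float:
    """Hypothetical scalar reference for {j1 j2 j3; j4 j5 j6} via the Racah sum."""
    t1, t2, t3, t4, t5, t6 = (round(2 * j) for j in (j1, j2, j3, j4, j5, j6))
    triads = [(t1, t2, t3), (t1, t5, t6), (t4, t2, t6), (t4, t5, t3)]
    if not all(_triangle(*tr) for tr in triads):
        return 0.0  # selection rule: every triad must satisfy the triangle rule
    log_pref = sum(_log_delta(*tr) for tr in triads)
    a = [sum(tr) // 2 for tr in triads]  # triad sums, e.g. j1 + j2 + j3
    b = [(t1 + t2 + t4 + t5) // 2, (t2 + t3 + t5 + t6) // 2, (t3 + t1 + t6 + t4) // 2]
    total = 0.0
    for t in range(max(a), min(b) + 1):
        log_term = (_log_fact(t + 1)
                    - sum(_log_fact(t - ai) for ai in a)
                    - sum(_log_fact(bi - t) for bi in b))
        total += (-1.0 if t % 2 else 1.0) * math.exp(log_pref + log_term)
    return total


def wigner_9j_ref(j1, j2, j3, j4, j5, j6, j7, j8, j9) -> float:
    """Hypothetical scalar reference: 9j as a sum over x of three 6j symbols."""
    t1, t2, t4, t6, t8, t9 = (round(2 * j) for j in (j1, j2, j4, j6, j8, j9))
    lo = max(abs(t1 - t9), abs(t4 - t8), abs(t2 - t6))  # 2 * x_min
    hi = min(t1 + t9, t4 + t8, t2 + t6)                 # 2 * x_max
    total = 0.0
    for tx in range(lo, hi + 1, 2):
        x = tx / 2
        total += ((-1.0 if tx % 2 else 1.0) * (tx + 1)  # (-1)^(2x) (2x + 1)
                  * wigner_6j_ref(j1, j4, j7, j8, j9, x)
                  * wigner_6j_ref(j2, j5, j8, j4, x, j6)
                  * wigner_6j_ref(j3, j6, j9, x, j1, j2))
    return total


print(wigner_6j_ref(1, 1, 1, 1, 1, 1))              # ≈ 0.16667 (= 1/6)
print(wigner_9j_ref(1, 1, 0, 1, 1, 0, 0, 0, 0))     # ≈ 0.33333 (= 1/3)
```

The three 6j factors in the 9j sum jointly contain every row and column triad of the 9j symbol, so the 6j triangle checks also enforce the 9j selection rules.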

## Testing

```bash
pip install -e ".[dev]"
pytest tests/ -v
```

Tests validate against exact values from `sympy.physics.wigner`.

## License

MIT License. Copyright (c) 2026 Sheng-Kai Huang.
