Metadata-Version: 2.4
Name: mlx-gamma
Version: 0.1.1
Summary: GPU-accelerated gamma and lgamma functions for Apple MLX
Author-email: Sheng-Kai Huang <akai@fawstudio.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/akaiHuang/mlx-gamma
Project-URL: Issues, https://github.com/apple/mlx/issues/2030
Keywords: mlx,gamma,lgamma,digamma,beta,gpu,apple-silicon
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mlx>=0.10.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: scipy>=1.10; extra == "dev"
Requires-Dist: numpy>=1.24; extra == "dev"
Provides-Extra: benchmark
Requires-Dist: scipy>=1.10; extra == "benchmark"
Requires-Dist: numpy>=1.24; extra == "benchmark"
Dynamic: license-file

# mlx-gamma

GPU-accelerated gamma and lgamma functions for [Apple MLX](https://github.com/ml-explore/mlx).

Among the major ML frameworks, MLX currently lacks built-in `gamma`/`lgamma`
support ([mlx#2030](https://github.com/apple/mlx/issues/2030)), while PyTorch,
JAX, and CuPy all provide these functions. **mlx-gamma** fills that gap with
pure-MLX vectorized implementations that run entirely on the Apple GPU.

## Installation

```bash
pip install mlx-gamma
```

Or from source:

```bash
git clone https://github.com/akaiHuang/mlx-gamma.git
cd mlx-gamma
pip install -e ".[dev]"
```

## Quick start

```python
import mlx.core as mx
from mlx_gamma import lgamma, gamma, digamma, beta

x = mx.array([1.0, 2.0, 3.0, 5.0, 10.0, 100.0])

lgamma(x)   # log-gamma
gamma(x)    # gamma function (with sign handling)
digamma(x)  # psi function, d/dx ln Gamma(x)

beta(mx.array([2.0, 3.0]), mx.array([3.0, 4.0]))  # Beta function
```

## Functions

| Function | Description | Domain |
|----------|-------------|--------|
| `lgamma(x)` | Log of absolute value of Gamma(x) | x != 0, -1, -2, ... |
| `gamma(x)` | Gamma function via exp(lgamma(x)) with sign | x != 0, -1, -2, ... |
| `digamma(x)` | Psi function (logarithmic derivative of Gamma) | x != 0, -1, -2, ... |
| `beta(a, b)` | Beta function B(a,b) = Gamma(a)Gamma(b)/Gamma(a+b) | a, b > 0 |
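
The Beta identity in the last row can be sanity-checked against the standard
library's log-gamma, independently of mlx-gamma (a reference sketch, not the
package's MLX code):

```python
import math

def beta_ref(a: float, b: float) -> float:
    """Reference Beta via the identity B(a, b) = Gamma(a)Gamma(b)/Gamma(a+b),
    computed in log space for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

print(beta_ref(2.0, 3.0))  # B(2,3) = 1! * 2! / 4! = 1/12
print(beta_ref(0.5, 0.5))  # B(1/2,1/2) = Gamma(1/2)^2 / Gamma(1) = pi
```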

## Algorithm details

- **lgamma**: Lanczos approximation (g=7, 9 coefficients) for x >= 0.5;
  reflection formula for x < 0.5. Accurate to ~13 decimal digits in float64,
  ~6 in float32.
- **gamma**: `exp(lgamma(x))` with sign correction for negative arguments.
- **digamma**: Asymptotic expansion for x >= 7; recurrence relation to shift
  small arguments upward; reflection formula for x < 0.
- **beta**: Computed via `exp(lgamma(a) + lgamma(b) - lgamma(a+b))` for
  numerical stability.
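
The `lgamma`/`gamma` path above can be sketched in scalar pure Python (a
reference implementation, not the package's vectorized MLX code; the
coefficients are the standard g=7, 9-term Lanczos set):

```python
import math

# Standard Lanczos coefficients for g = 7, 9 terms.
_G = 7
_COEF = [
    0.99999999999980993, 676.5203681218851, -1259.1392167224028,
    771.32342877765313, -176.61502916214059, 12.507343278686905,
    -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7,
]

def lgamma_ref(x: float) -> float:
    """log|Gamma(x)|: Lanczos for x >= 0.5, reflection formula below."""
    if x < 0.5:
        # Reflection: Gamma(x) * Gamma(1 - x) = pi / sin(pi * x)
        return math.log(math.pi / abs(math.sin(math.pi * x))) - lgamma_ref(1.0 - x)
    x -= 1.0
    a = _COEF[0]
    for i in range(1, len(_COEF)):
        a += _COEF[i] / (x + i)
    t = x + _G + 0.5
    return 0.5 * math.log(2.0 * math.pi) + (x + 0.5) * math.log(t) - t + math.log(a)

def gamma_ref(x: float) -> float:
    """exp(lgamma) with the sign restored for negative non-integer x,
    where sign(Gamma(x)) = sign(sin(pi * x))."""
    sign = -1.0 if (x < 0.0 and math.sin(math.pi * x) < 0.0) else 1.0
    return sign * math.exp(lgamma_ref(x))
```

For example, `gamma_ref(-0.5)` recovers Gamma(-1/2) = -2*sqrt(pi) via the
reflection branch plus the sign correction.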

## Benchmarks

```bash
python benchmark.py
```

Measured on Apple M1 Max, MLX 0.31.1, SciPy 1.16.2, float32, 50 timed
runs per cell with median wall-clock time (warmup excluded). GPU dispatch
overhead dominates for small arrays; the crossover where mlx-gamma overtakes
SciPy is around N = 50k--100k.

|         N | lgamma | gamma  | digamma |
|----------:|-------:|-------:|--------:|
|     1,000 |  0.02x |  0.03x |   0.02x |
|    10,000 |  0.21x |  0.50x |   0.22x |
|   100,000 |  2.33x |  3.59x |   1.66x |
| 1,000,000 |  6.41x | 10.37x |   3.67x |

Numbers are `scipy_time / mlx_time` (>1 means mlx-gamma is faster).
Full timings, beta-function results, accuracy table, and methodology in
[benchmark_results.md](benchmark_results.md).
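
The timing methodology (warmup runs followed by the median of repeated timed
runs) can be reproduced with a small stdlib-only helper; the workload below is
a placeholder, not the actual `benchmark.py` script:

```python
import math
import statistics
import time

def median_time(fn, *args, warmup: int = 3, runs: int = 50) -> float:
    """Median wall-clock seconds over `runs` calls, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

xs = [1.0 + i / 1000.0 for i in range(10_000)]
t = median_time(lambda v: [math.lgamma(x) for x in v], xs)
print(f"median lgamma over {len(xs)} values: {t * 1e3:.3f} ms")
```

The speedup ratios in the table above correspond to dividing one such median
(SciPy) by another (mlx-gamma) for the same N.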

## Running tests

```bash
pip install -e ".[dev]"
pytest tests/ -v
```

## License

MIT -- see [LICENSE](LICENSE).

## Author

Sheng-Kai Huang (akai@fawstudio.com)

## References

- MLX issue: [apple/mlx#2030](https://github.com/apple/mlx/issues/2030)
- Lanczos, C. (1964). "A Precision Approximation of the Gamma Function." SIAM J. Numer. Anal. 1: 86--96.
- Pugh, G. R. (2004). "An Analysis of the Lanczos Gamma Approximation." PhD thesis, University of British Columbia.
