Metadata-Version: 2.4
Name: mlx-qre
Version: 0.2.0
Summary: Quantum Relative Entropy + Petz Recovery toolkit on Apple Silicon via MLX
Author-email: Sheng-Kai Huang <akai@fawstudio.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/akaiHuang/mlx-qre
Project-URL: Repository, https://github.com/akaiHuang/mlx-qre
Keywords: quantum-information,relative-entropy,apple-silicon,gpu,mlx,petz-recovery,quantum-channels
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Physics
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mlx>=0.4.0
Requires-Dist: numpy>=1.24.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-benchmark; extra == "dev"
Dynamic: license-file

# mlx-qre

**Quantum Relative Entropy + Petz Recovery toolkit** on Apple Silicon via [MLX](https://github.com/ml-explore/mlx).

$$\Sigma = D(\rho \| \sigma) = \mathrm{Tr}[\rho(\ln\rho - \ln\sigma)]$$

A complete, self-contained library for quantum information quantities (QRE, von Neumann entropy, Rényi / JSD), quantum channels (thermal attenuator, depolarizing, dephasing), and Petz recovery analysis (recovery map, fidelity, retrodiction). Built as the computational companion to the **Σ = 2 ln Q** entropy-production framework — see [petz-recovery-unification](https://github.com/akaiHuang/petz-recovery-unification) and [tau-chrono](https://github.com/akaiHuang/tau-chrono).

> **Performance note.** Eigendecomposition runs on the Metal GPU via MLX. For **small matrices (N < 500)**, NumPy + Accelerate (CPU) is typically faster due to lower dispatch overhead — use NumPy if you only need a handful of small QREs. The GPU path becomes useful for **batched evaluation** and **N ≥ 500**, where it pulls ahead of NumPy. See [benchmark_results.md](benchmark_results.md) for the full breakdown.
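For the small-N CPU regime described above, the exact computation is straightforward in plain NumPy. The sketch below is illustrative only (`qre_numpy` is a hypothetical helper, not part of this package):

```python
import numpy as np

def qre_numpy(rho, sigma, eps=1e-12):
    """Exact D(rho || sigma) = Tr[rho (ln rho - ln sigma)] via CPU eigendecomposition.

    Assumes rho and sigma are Hermitian PSD with unit trace and that the
    support of rho lies within the support of sigma.
    """
    def logm(H):
        # Matrix logarithm of a Hermitian PSD matrix, clipping tiny eigenvalues.
        w, U = np.linalg.eigh(H)
        return (U * np.log(np.clip(w, eps, None))) @ U.conj().T

    return float(np.real(np.trace(rho @ (logm(rho) - logm(sigma)))))
```

For `rho = sigma` this returns 0, and for commuting (diagonal) inputs it reduces to the classical KL divergence.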

## Installation

```bash
pip install -e .
```

Requires Python 3.10+ and Apple Silicon (M1/M2/M3/M4).

## Quick Start

```python
import mlx.core as mx
from mlx_qre import quantum_relative_entropy, random_density_matrix

# Two 100x100 density matrices on GPU
rho = random_density_matrix(100)
sigma = random_density_matrix(100)

# Compute D(rho || sigma) — eigendecomposition runs on Metal GPU
D = quantum_relative_entropy(rho, sigma)
mx.eval(D)
print(f"D(rho || sigma) = {D.item():.6f}")

# Batched: 500 pairs simultaneously
rho_batch = random_density_matrix(50, batch_size=500)
sigma_batch = random_density_matrix(50, batch_size=500)
D_batch = quantum_relative_entropy(rho_batch, sigma_batch)
```

## Stochastic Lanczos backend (large N)

For `N >= 1000`, the O(N^3) eigendecomposition is the bottleneck, and the
Metal `eigh` kernel can hit GPU command-buffer timeouts past `N = 2000`.
mlx-qre ships a Stochastic Lanczos Quadrature (SLQ) estimator that
replaces the eigendecomposition with `O(k * m * N^2)` matvecs.

**This implementation runs entirely on MLX (no NumPy fallback in the hot
path).** All m probe vectors are stacked as a single `(N, m)` matrix and
each Lanczos step is a single block matmul `A @ V`, which both
amortises GPU dispatch overhead and lets MLX tile the m probes across
the GPU. Only the small `(k, k)` tridiagonal eigh is delegated to MLX's
CPU eigh stream; at `k <= 30`, the CPU is the right place for it.

```python
import mlx.core as mx
from mlx_qre import (
    random_density_matrix,
    quantum_relative_entropy,
    von_neumann_entropy_lanczos,
    quantum_relative_entropy_lanczos,
)

rho = random_density_matrix(2000)
sigma = random_density_matrix(2000)

# Direct API
S = von_neumann_entropy_lanczos(rho, k=25, m=20, seed=0)
D = quantum_relative_entropy_lanczos(rho, sigma, k=25, m=20, seed=0)

# Or via the main API with method="lanczos"
D_mx = quantum_relative_entropy(rho, sigma, method="lanczos", k=25, m=20, seed=0)
```
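For reference, here is a simplified NumPy sketch of SLQ for `Tr[f(A)]`, processing one probe at a time with full reorthogonalization. This is not the library's implementation (which blocks all m probes into a single MLX matmul, as described above); `slq_tr_f` is a hypothetical name, and the sketch assumes real symmetric `A` for simplicity:

```python
import numpy as np

def slq_tr_f(A, f, k=25, m=20, seed=0):
    """Estimate Tr[f(A)] for a real symmetric PSD A by Stochastic Lanczos Quadrature."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    total = 0.0
    for _ in range(m):
        v = rng.choice([-1.0, 1.0], size=N)   # Rademacher probe, ||v||^2 = N
        Q = np.zeros((N, k))
        alpha = np.zeros(k)
        beta = np.zeros(max(k - 1, 1))
        Q[:, 0] = v / np.sqrt(N)
        steps = k
        for j in range(k):
            w = A @ Q[:, j]
            alpha[j] = Q[:, j] @ w
            # Full reorthogonalization against all previous Lanczos vectors.
            w = w - Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
            if j < k - 1:
                b = np.linalg.norm(w)
                if b < 1e-12:                 # invariant subspace reached; truncate
                    steps = j + 1
                    break
                beta[j] = b
                Q[:, j + 1] = w / b
        T = (np.diag(alpha[:steps])
             + np.diag(beta[:steps - 1], 1)
             + np.diag(beta[:steps - 1], -1))
        theta, U = np.linalg.eigh(T)
        # Gauss quadrature: v^T f(A) v ~= N * sum_j U[0, j]^2 * f(theta_j)
        total += N * np.sum(U[0, :] ** 2 * f(theta))
    return total / m

# Example: von Neumann entropy of a 256x256 random density matrix.
rng = np.random.default_rng(1)
G = rng.standard_normal((256, 256))
rho = G @ G.T
rho /= np.trace(rho)

f = lambda x: np.where(x > 1e-12, x * np.log(np.clip(x, 1e-12, None)), 0.0)
S_slq = -slq_tr_f(rho, f, k=25, m=20, seed=0)
S_exact = -float(np.sum(f(np.linalg.eigvalsh(rho))))
```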

Typical accuracy at default `k=25, m=20`:

- `S(rho)`: ~1-2% median relative error (clean SLQ on a single matrix)
- `D(rho || sigma)`: ~3-7% median relative error (cross-term has higher
  variance because `Tr[rho ln sigma]` is not a single-matrix spectral
  function). Increase `m` for tighter accuracy.

Speed crossover (M1 Max, dense complex Haar-random density matrices,
SLQ at `k=25, m=20`):

| N    | MLX eigh (ms) | NumPy eigh (ms) | SLQ pure-MLX (ms) |
|---:|---:|---:|---:|
| 100  |    4 |    4 |   19 |
| 500  |  160 |  287 |   20 |
| 1000 | 1077 | 1988 |   24 |
| 2000 | timeout | 24048 |   37 |

SLQ wins from `N >= 500` and the gap blows up at `N >= 1000`
(~50-80x vs MLX eigh, ~80-660x vs NumPy eigh). The pure-MLX block
matvec hot path is also ~30-140x faster than the previous
NumPy-Accelerate hot path on the same machine. See
[benchmark_results.md](benchmark_results.md) for the full breakdown.

## Features

| Module | Function | Description |
|--------|----------|-------------|
| `qre` | `quantum_relative_entropy(rho, sigma, method="exact"\|"lanczos")` | D(rho \|\| sigma) via Metal eigendecomposition or SLQ |
| `qre` | `von_neumann_entropy(rho)` | S(rho) = -Tr[rho ln rho] |
| `qre` | `relative_entropy_pure_state(psi, sigma)` | Efficient D for pure states: -⟨psi\|ln sigma\|psi⟩ |
| `lanczos` | `von_neumann_entropy_lanczos(rho, k, m)` | S(rho) via Stochastic Lanczos Quadrature |
| `lanczos` | `quantum_relative_entropy_lanczos(rho, sigma, k, m)` | D via Hutchinson + Lanczos |
| `lanczos` | `stochastic_lanczos_logtr(A, k, m)` | Tr[A ln A] for any Hermitian PSD A |
| `classical` | `kl_divergence(p, q)` | Classical KL divergence |
| `classical` | `jensen_shannon_divergence(p, q)` | Symmetric JSD |
| `classical` | `renyi_divergence(p, q, alpha)` | Renyi divergence of order alpha |
| `channels` | `thermal_attenuator(eta)` | Gravitational channel eta = -g_00 |
| `channels` | `channel_entropy_production(K, rho, sigma)` | Sigma through channel |
| `channels` | `depolarizing_channel(p)` | Depolarizing noise |
| `channels` | `dephasing_channel(gamma)` | Dephasing noise |
| `petz` | `petz_recovery_map(K, sigma)` | Construct Petz recovery R |
| `petz` | `petz_recovery_fidelity(rho, sigma, K)` | F(rho, R o N(rho)) |
| `petz` | `verify_petz_bound(rho, sigma, K)` | Check F >= exp(-Sigma/2) |
| `petz` | `retrodiction_quality(rho, sigma, K)` | tau = 1 - F |
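The `classical` quantities follow the standard definitions. A minimal NumPy sketch (the function names mirror the table above, but the bodies are illustrative, not the library's code):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i ln(p_i / q_i), with 0 ln 0 := 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > eps
    return float(np.sum(p[mask] * np.log(p[mask] / np.clip(q[mask], eps, None))))

def jensen_shannon_divergence(p, q):
    """JSD(p, q) = (1/2) KL(p || m) + (1/2) KL(q || m), with m = (p + q)/2."""
    m = (np.asarray(p, float) + np.asarray(q, float)) / 2.0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def renyi_divergence(p, q, alpha, eps=1e-12):
    """D_alpha(p || q) = ln(sum_i p_i^alpha q_i^(1-alpha)) / (alpha - 1), alpha != 1."""
    p = np.clip(np.asarray(p, float), eps, None)
    q = np.clip(np.asarray(q, float), eps, None)
    return float(np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0))
```

As alpha → 1, the Rényi divergence tends to the KL divergence.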

## Use Cases

- **Gravitational entropy production**: Sigma_grav = D(N_eta(rho) || N_eta(sigma)) with eta = 1/Q^2
- **Quantum channel analysis**: entropy production, data processing inequality
- **Petz recovery bounds**: F >= exp(-Sigma/2), retrodiction quality
- **Quantum ML**: kernel methods using QRE as a distance measure
- **Neural entropy**: EEG/neural signal entropy production analysis
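As a worked illustration of the Petz-recovery use case, the following self-contained NumPy sketch builds a qubit depolarizing channel, the textbook Petz map, and the square-root (Uhlmann) fidelity, then compares F against exp(-Sigma/2). None of these names are this package's API, and the square-root fidelity convention is an assumption:

```python
import numpy as np

def _eig_fn(H, fn):
    """Apply a scalar function to a Hermitian matrix via eigendecomposition."""
    w, U = np.linalg.eigh(H)
    return (U * fn(w)) @ U.conj().T

def qre(rho, sigma, eps=1e-12):
    """D(rho || sigma) = Tr[rho (ln rho - ln sigma)]."""
    log = lambda w: np.log(np.clip(w, eps, None))
    return float(np.real(np.trace(rho @ (_eig_fn(rho, log) - _eig_fn(sigma, log)))))

def fidelity(rho, tau):
    """Square-root fidelity F = Tr sqrt(sqrt(rho) tau sqrt(rho))."""
    sr = _eig_fn(rho, lambda w: np.sqrt(np.clip(w, 0.0, None)))
    w = np.linalg.eigvalsh(sr @ tau @ sr)
    return float(np.sum(np.sqrt(np.clip(w, 0.0, None))))

# Qubit depolarizing channel with error probability p (standard Kraus operators).
p = 0.3
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
kraus = [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]

channel = lambda r: sum(K @ r @ K.conj().T for K in kraus)   # N(rho)
adjoint = lambda x: sum(K.conj().T @ x @ K for K in kraus)   # N^dag(x)

def petz_recovery(x, sigma, eps=1e-12):
    """Petz map R(x) = sqrt(sigma) N^dag( N(sigma)^{-1/2} x N(sigma)^{-1/2} ) sqrt(sigma)."""
    s_half = _eig_fn(sigma, lambda w: np.sqrt(np.clip(w, 0.0, None)))
    ns_mhalf = _eig_fn(channel(sigma), lambda w: np.clip(w, eps, None) ** -0.5)
    return s_half @ adjoint(ns_mhalf @ x @ ns_mhalf) @ s_half

def rand_dm(n, rng):
    """Random full-rank density matrix."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    R = G @ G.conj().T + 1e-3 * np.eye(n)
    return R / np.trace(R)

rng = np.random.default_rng(0)
rho, sigma = rand_dm(2, rng), rand_dm(2, rng)

Sigma = qre(rho, sigma) - qre(channel(rho), channel(sigma))  # >= 0 by data processing
F = fidelity(rho, petz_recovery(channel(rho), sigma))
print(f"Sigma = {Sigma:.6f}, F = {F:.6f}, exp(-Sigma/2) = {np.exp(-Sigma / 2):.6f}")
```

By construction the Petz map recovers the reference state exactly, `R(N(sigma)) = sigma`, which is a useful sanity check on any implementation.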

## Benchmark

```bash
python -m mlx_qre.benchmark
```

Compares MLX (Apple Silicon GPU) vs NumPy (CPU) across matrix sizes N = 10 to 1000. Summary on M1 Max:

| N | MLX (ms) | NumPy (ms) | MLX vs NumPy |
|---:|---:|---:|---:|
| 10 | 0.74 | 0.04 | **0.06×** (NumPy wins) |
| 100 | 3.95 | 3.57 | 0.90× |
| 500 | 158 | 278 | 1.76× |
| 1000 | 1042 | 2010 | 1.93× |

For batched QRE at N=100 the GPU sustains ~460 pairs/sec. **Use NumPy for one-off small problems; reach for `mlx-qre` for batched / large-N work and for the integrated Petz / channel utilities below.** See [benchmark_results.md](benchmark_results.md) for the full table.

## Tests

```bash
pip install -e ".[dev]"
pytest tests/ -v
```

## Theory

The quantum relative entropy D(rho || sigma) is the quantum generalization of KL divergence. In the retrocausality framework:

- **Sigma = 2 ln Q**: unified entropy production formula
- **Petz bound**: F >= exp(-Sigma/2) quantifies retrodiction cost
- **tau = 1 - F**: retrodiction deficit (0 = perfect, 1 = irreversible)
- **Zero-entropy limit**: Sigma -> 0 implies perfect retrodiction (no time arrow)

## License

MIT License. Copyright (c) 2026 Sheng-Kai Huang.
