Metadata-Version: 2.4
Name: qumat-qdp
Version: 0.2.0rc2
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Physics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10, <3.13
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# qumat-qdp

GPU-accelerated quantum state encoding for [Apache Mahout Qumat](https://github.com/apache/mahout).

## Installation

```bash
pip install "qumat[qdp]"
```

Requires one of:
- NVIDIA GPU (CUDA path via `QdpEngine`)
- AMD GPU with ROCm (AMD path via `QdpEngine(backend="amd")`)

Recommended environment setup:

```bash
python -m venv .venv
source .venv/bin/activate

# Install the GPU runtime for your platform first:
# - NVIDIA users: CUDA-compatible torch / triton
# - AMD users: ROCm-compatible torch / triton

uv sync --active --project qdp/qdp-python --group dev
```

Use `--active` so `uv` reuses the environment that already has the correct GPU
runtime stack.

## Usage

```python
import qumat.qdp as qdp
import torch

# Initialize the unified QDP engine on GPU 0.
# Choose the backend explicitly.
engine = qdp.QdpEngine(device_id=0, backend="cuda")

# Encode data into a quantum state
qtensor = engine.encode([1.0, 2.0, 3.0, 4.0], num_qubits=2, encoding_method="amplitude")

# Zero-copy transfer to PyTorch
tensor = torch.from_dlpack(qtensor)
print(tensor)  # Complex tensor on CUDA
```
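
For intuition, the `amplitude` call above can be mirrored in plain Python. This is a minimal reference sketch, not the engine's implementation: `amplitude_reference` is a hypothetical helper, and it assumes the input length equals `2 ** num_qubits` (how the engine treats shorter inputs is not covered here).

```python
import math


def amplitude_reference(values, num_qubits):
    """L2-normalize `values` so they can serve as state amplitudes.

    Hypothetical helper for illustration; not part of qumat.qdp.
    """
    if len(values) != 2 ** num_qubits:
        raise ValueError("expected exactly 2**num_qubits values")
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [v / norm for v in values]


amps = amplitude_reference([1.0, 2.0, 3.0, 4.0], num_qubits=2)
# The squared magnitudes of a valid state sum to 1.
total_probability = sum(a * a for a in amps)
```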

### AMD ROCm Usage

```python
import qumat.qdp as qdp
import torch

# Route the unified engine to the AMD backend.
engine = qdp.QdpEngine(device_id=0, precision="float32", backend="amd")
# ROCm builds of PyTorch expose AMD GPUs under the "cuda" device name.
qt = engine.encode(torch.randn(8, 4, device="cuda"), 2, "amplitude")
state = torch.from_dlpack(qt)
print(state.device, state.dtype)  # cuda:0, complex64
```

The public `QdpEngine` is a unified Python facade with explicit backend selection:
- `backend="cuda"` routes to the Rust `_qdp.QdpEngine`
- `backend="amd"` routes to the Triton AMD engine directly

See `qdp/qdp-python/TRITON_AMD_BACKEND.md` for Triton AMD setup and validation details.
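
When one script has to run on both vendors, the `backend` argument can be derived from the installed PyTorch build. A minimal sketch, assuming only the two backend strings listed above: `pick_backend` is a hypothetical helper that expects the value of `torch.version.hip`, which is `None` on CUDA builds of PyTorch and a version string on ROCm builds. It does not detect CPU-only builds.

```python
def pick_backend(hip_version):
    """Map `torch.version.hip` to a QdpEngine backend name.

    Hypothetical helper; pass `torch.version.hip` as the argument.
    """
    return "amd" if hip_version else "cuda"


# Intended use (needs torch and qumat.qdp installed, so shown as a comment):
#   engine = qdp.QdpEngine(device_id=0, backend=pick_backend(torch.version.hip))
```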

## Encoding Methods

| Method | Description |
|--------|-------------|
| `amplitude` | Normalize input as quantum amplitudes |
| `angle` | Map values to rotation angles (one per qubit) |
| `basis` | Encode integer as computational basis state |
| `iqp` | IQP-style encoding with full ZZ entanglement |
| `iqp-z` | IQP encoding with Z-only diagonal (no ZZ pairs) |
| `phase` | Per-qubit phase product state via H⊗P(x_k) |
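
Two of the rows above are simple enough to write out as reference state vectors in plain Python. This is a sketch under stated assumptions, not the engine's kernels: `basis_reference` and `phase_reference` are hypothetical names, `phase` is read as applying H and then P(x_k) to each qubit starting from |0⟩, and bit `k` of the state index is assumed to correspond to qubit `k` (the engine's actual qubit ordering may differ).

```python
import cmath
import math


def basis_reference(index, num_qubits):
    """State vector with a single 1 at position `index` (hypothetical helper)."""
    dim = 2 ** num_qubits
    if not 0 <= index < dim:
        raise ValueError("index out of range for num_qubits")
    return [1.0 + 0j if i == index else 0j for i in range(dim)]


def phase_reference(xs):
    """Product state of (|0> + e^{i x_k}|1>) / sqrt(2) over all qubits.

    Assumes bit k of the state index belongs to qubit k.
    """
    n = len(xs)
    scale = 1.0 / math.sqrt(2 ** n)
    state = []
    for i in range(2 ** n):
        # Sum the phases of every qubit that is |1> in basis state i.
        total = sum(x for k, x in enumerate(xs) if (i >> k) & 1)
        state.append(scale * cmath.exp(1j * total))
    return state
```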

Backend support boundary:
- CUDA (`QdpEngine`): `amplitude`, `angle`, `basis`, `iqp`, `iqp-z`, `phase`
  - `phase` is currently reachable on the CUDA path only via host inputs
    (Python list / NumPy / file / CPU torch tensor). The Python extension's
    CUDA-tensor validation does not yet allowlist `phase`, so CUDA-resident
    torch tensors must be moved with `.cpu()` first when targeting `phase`.
    This is tracked as a follow-up.
- AMD (`QdpEngine(..., backend="amd")`): `amplitude`, `angle`, `basis`, `iqp`, `iqp-z`, `phase`
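
The `.cpu()` workaround for `phase` on the CUDA path can be wrapped in a small guard. A minimal sketch: `prepare_input` is a hypothetical helper, not part of `qumat.qdp`, and it relies only on the `is_cuda` attribute and `.cpu()` method that torch tensors provide. Lists, NumPy arrays, and file paths pass through unchanged.

```python
def prepare_input(data, encoding_method, backend="cuda"):
    """Move GPU-resident tensors to the host when the CUDA path needs it.

    Hypothetical helper; only `phase` on backend="cuda" triggers the copy.
    """
    needs_host = backend == "cuda" and encoding_method == "phase"
    if needs_host and getattr(data, "is_cuda", False):
        return data.cpu()
    return data


# Intended use (needs torch and qumat.qdp installed, so shown as a comment):
#   qt = engine.encode(prepare_input(x, "phase"), 2, "phase")
```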

### Pipeline / loader dtype (Rust internals)

`QuantumDataLoader` and `run_throughput_pipeline` build a Rust `PipelineConfig` with an
`encoding` and a `dtype` (float32 or float64). The prefetch thread can keep a
**float32 host batch** end to end only for encodings whose GPU stack implements the batch
**f32** path (`encode_batch_f32`). **Today that is amplitude only.** Angle and basis still
fall back to float64 in that loop until their batch f32 implementations exist. The eventual
support matrix (e.g. angle/basis under `supports_f32` once kernels are wired) is broader
than what the pipeline uses today.
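
The fallback rule described above can be summarized as a lookup. This is an illustrative sketch in Python, not the Rust code: `F32_BATCH_ENCODINGS` and `effective_host_dtype` are hypothetical names, and treating every encoding other than `amplitude` as a float64 fallback is an assumption that extends what is stated for angle and basis.

```python
# Encodings with a batch f32 path today, per the paragraph above.
F32_BATCH_ENCODINGS = frozenset({"amplitude"})


def effective_host_dtype(encoding, requested_dtype):
    """Return the host batch dtype the prefetch loop would end up using.

    Hypothetical helper mirroring the documented fallback behaviour.
    """
    if requested_dtype not in ("float32", "float64"):
        raise ValueError("dtype must be 'float32' or 'float64'")
    if requested_dtype == "float32" and encoding in F32_BATCH_ENCODINGS:
        return "float32"
    return "float64"
```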

## Input Sources

```python
import numpy as np
import torch

# `engine` is the QdpEngine instance created in the Usage section.

# Python list
qtensor = engine.encode([1.0, 2.0, 3.0, 4.0], 2, "amplitude")

# NumPy array
qtensor = engine.encode(np.array([[1, 2, 3, 4], [4, 3, 2, 1]]), 2, "amplitude")

# PyTorch tensor (CPU or CUDA)
qtensor = engine.encode(torch.tensor([1.0, 2.0, 3.0, 4.0]), 2, "amplitude")

# File formats
qtensor = engine.encode("data.parquet", 10, "amplitude")
qtensor = engine.encode("data.arrow", 10, "amplitude")
qtensor = engine.encode("data.npy", 10, "amplitude")
qtensor = engine.encode("data.pt", 10, "amplitude")

# Remote object storage URLs (requires a build with the remote-io feature)
qtensor = engine.encode("s3://my-bucket/data.parquet", 10, "amplitude")
qtensor = engine.encode("gs://my-bucket/data.parquet", 10, "amplitude")
```

## Links

- [Documentation](https://mahout.apache.org/)
- [GitHub](https://github.com/apache/mahout)
- [Qumat Package](https://pypi.org/project/qumat/)

## License

Apache License 2.0

