Metadata-Version: 2.4
Name: auvux-dsp
Version: 0.1.0.dev0
Summary: Fast differentiable audio transforms (STFT, mel, MFCC, CQT, chroma) on CPU and GPU
Keywords: audio,dsp,stft,mel,mfcc,cqt,chroma,spectrogram,gpu
Author-Email: Peter Kiers <pkiers.1983@gmail.com>
License-Expression: MIT
License-File: LICENSE
License-File: THIRD_PARTY_LICENSES
Classifier: Programming Language :: C++
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Requires-Python: >=3.10
Requires-Dist: numpy>=1.22
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: librosa; extra == "test"
Requires-Dist: soxr; extra == "test"
Requires-Dist: torch; extra == "test"
Description-Content-Type: text/markdown

# auvux-dsp

Fast differentiable audio transforms (STFT, iSTFT, mel, MFCC, CQT/VQT, chroma)
with native CPU backends (vDSP on macOS, PFFFT elsewhere) and GPU support
(Metal, CUDA). Forward *and* backward passes run in native kernels; PyTorch
autograd plugs in when you pass a torch tensor, and torch is never required
otherwise.

```python
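import numpy as np
import torch  # optional: only needed for the autograd part below

import auvux.dsp as dsp

# Placeholder input: one second of mono noise; substitute your own audio.
clip = np.random.randn(44100).astype(np.float32)

mel = dsp.MelSpectrogram(sr=44100, n_fft=2048, hop_length=512, n_mels=128)
S = mel(clip)                    # numpy in -> numpy out
S = mel(clip, backend="gpu")     # Metal / CUDA kernels

y = torch.tensor(clip, requires_grad=True)
loss = mel(y, output="db").sum() # native forward
loss.backward()                  # native adjoint kernel, no torch recompute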
```

```
pip install auvux-dsp --pre    # preview release; drop --pre once 0.1.0 is out
```

## Performance

Benchmarks against librosa and torchaudio (forward passes and full training
steps, CPU / staged GPU / GPU-resident) live in `benchmarks/benchmark.py`:

```
python benchmarks/benchmark.py
```

*Results table to be published with the first release.*

### Complex STFT layout

Complex spectra are returned as `(..., bins, frames)` backed by frame-major
memory: each frame's spectrum is contiguous, so the bins axis has unit stride
and the frames axis steps over one whole spectrum at a time. This is the
same physical layout both references use (librosa allocates its stft output
`order='F'`; `torch.stft` returns a transposed view over frame-major memory),
so values *and* bytes match librosa, and the GPU-resident path returns a
tensor with the exact strides `torch.stft` produces. It is also what makes
the STFT fast: no backend ever materializes the bins-major transpose. `istft`
accepts both this layout (zero-copy) and compact C-order arrays. Float
outputs (power/db/mel/...) are ordinary C-contiguous arrays.
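
The layout described above can be reproduced with plain NumPy. The following is
a minimal sketch that does not call auvux-dsp at all; the sizes and variable
names are made up for illustration, and it only shows what "frame-major memory
behind a `(bins, frames)` shape" means in terms of strides.

```python
import numpy as np

bins, frames = 5, 3  # tiny illustrative sizes, not real STFT dimensions

# Frame-major storage: one contiguous run of `bins` complex values per frame.
frame_major = np.arange(frames * bins, dtype=np.complex64).reshape(frames, bins)

# The (bins, frames) view is a transpose over the same buffer; nothing is copied.
spec = frame_major.T
itemsize = spec.itemsize  # 8 bytes for complex64

print(spec.shape)    # (bins, frames)
print(spec.strides)  # unit stride along bins, one full spectrum along frames

# Allocating with order="F" produces the same strides for a 2-D array.
ref = np.zeros((bins, frames), dtype=np.complex64, order="F")
same_layout = ref.strides == spec.strides

# A compact C-order array with the same values requires an explicit copy.
compact = np.ascontiguousarray(spec)
```

Here `spec` and `frame_major` share one buffer, while `compact` is a separate
bins-major copy, which is the transpose the paragraph above says is never
materialized.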

## Status

Under construction.

- CPU (vDSP/PFFFT): STFT, ISTFT, MelSpectrogram, MFCC, CQT, VQT, Chroma —
  forward and native backward, librosa-parity tested, torch autograd built in.
- Metal: all of the above on GPU (`n_fft <= 4096`), forward + backward,
  parity-tested against the CPU path. torch MPS tensors stay on the GPU end to
  end (DLPack), and `backend="auto"` routes them there; no flags needed.
- CUDA: kernel-for-kernel twin of the Metal backend including the resident
  paths; parity-tested on NVIDIA hardware (RTX 4090, CUDA 12.9), with pinned
  double-buffered staging for the numpy-in/numpy-out GPU paths.
- Pending: iCQT.
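
The DLPack hand-off mentioned for the resident GPU paths is a zero-copy buffer
exchange between libraries. As a rough, CPU-only illustration of that mechanism
(this is not auvux-dsp's internal code, and it assumes a NumPy recent enough to
expose `np.from_dlpack`), a producer and consumer can share one buffer like
this:

```python
import numpy as np

# Producer side: any object exposing __dlpack__ / __dlpack_device__.
src = np.arange(8, dtype=np.float32)

# Consumer side: wrap the producer's buffer without copying it.
view = np.from_dlpack(src)

# Both names refer to the same memory, so a write through the producer
# is visible through the consumer.
src[0] = 42.0
shared = np.shares_memory(src, view)
print(shared, view[0])
```

The same protocol works across libraries and devices, which is what lets a
tensor stay where it already lives instead of being staged through host memory.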

## Development

```
pip install scikit-build-core pybind11 numpy pytest
./scripts/dev-build.sh
pytest
```

Note for packagers: `python/auvux/` is a PEP 420 namespace package — it must
never contain an `__init__.py`, or it will shadow sibling `auvux-*`
distributions.
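
The shadowing problem can be reproduced in isolation. The sketch below uses a
throwaway namespace called `demo_ns` and two hypothetical distributions
(`dist_a`, `dist_b`) created in a temporary directory, so it does not touch an
installed `auvux`; all of these names are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_dist(root: Path, subpackage: str) -> None:
    """Create root/demo_ns/<subpackage>/__init__.py with no demo_ns/__init__.py."""
    pkg = root / "demo_ns" / subpackage
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(f"NAME = {subpackage!r}\n")


tmp = Path(tempfile.mkdtemp())
dist_a, dist_b = tmp / "dist_a", tmp / "dist_b"
make_dist(dist_a, "dsp")
make_dist(dist_b, "sibling")
sys.path[:0] = [str(dist_a), str(dist_b)]

# Without demo_ns/__init__.py, both directories merge into one namespace package.
dsp = importlib.import_module("demo_ns.dsp")
sibling = importlib.import_module("demo_ns.sibling")
both_visible = (dsp.NAME, sibling.NAME) == ("dsp", "sibling")

# Now add an __init__.py to one distribution and start from a clean import state.
(dist_a / "demo_ns" / "__init__.py").write_text("")
for name in [m for m in sys.modules if m == "demo_ns" or m.startswith("demo_ns.")]:
    del sys.modules[name]
importlib.invalidate_caches()

# demo_ns is now a regular package rooted in dist_a, so dist_b is never searched.
try:
    importlib.import_module("demo_ns.sibling")
    shadowed = False
except ModuleNotFoundError:
    shadowed = True

print(both_visible, shadowed)
```

The second import fails because a regular package pins `__path__` to a single
directory, whereas a namespace package collects every matching directory on
`sys.path`.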
