Metadata-Version: 2.4
Name: audiotq
Version: 0.1.0
Summary: A high-performance lossy audio compression engine inspired by LLM weight quantization
Author: Lost Martian
License: GPL-3.0-only
License-File: LICENSE
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Multimedia :: Sound/Audio :: Conversion
Requires-Python: >=3.11
Requires-Dist: numpy>=2.4.6
Requires-Dist: scipy>=1.17.1
Description-Content-Type: text/markdown

<p align="center">
  <img src="docs/image.png" alt="TurboQuant Audio Banner" width="800">
</p>

<p align="center">
  <a href="LICENSE"><img src="https://img.shields.io/badge/License-GPL%20v3-blue.svg" alt="License"></a>
  <img src="https://img.shields.io/badge/Version-0.1.0-orange.svg" alt="Version">
  <img src="https://img.shields.io/badge/Python-3.11%2B-blue.svg" alt="Python">
</p>

A minimal, high-performance lossy audio compression engine built in Python and NumPy. 

TurboQuant Audio (TQA) adapts the mathematical principles of [TurboQuant](https://arxiv.org/pdf/2504.19874) (originally designed for data-oblivious weight quantization of LLMs on GPU clusters) into a localized, cache-aligned CPU audio codec. Hadamard rotations map each block of high-dimensional audio amplitudes into an approximately zero-centered, symmetric Gaussian distribution that suits a Gaussian-optimized codebook. The result is a ~3x memory reduction while preserving transient fidelity.
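
As a rough illustration of the rotation step, the sketch below implements an orthonormal Fast Walsh-Hadamard Transform in NumPy and applies it to a block containing a single spike. The function name `fwht` and the 64-sample block length are assumptions made for this example; the actual code in `audiotq/rotator.py` may be organized differently (for instance, it may apply a sign vector before the transform).

```python
import numpy as np


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    y = np.asarray(x, dtype=np.float64).copy()
    n = y.size
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    h = 1
    while h < n:
        # Butterfly stage: pair each run of h samples with the run h samples later.
        pairs = y.reshape(-1, 2, h)
        y = np.stack([pairs[:, 0] + pairs[:, 1],
                      pairs[:, 0] - pairs[:, 1]], axis=1).reshape(n)
        h *= 2
    return y / np.sqrt(n)  # this scaling keeps the total energy unchanged


block = np.zeros(64)
block[5] = 8.0                 # a single transient spike
rotated = fwht(block)

print(np.abs(rotated).max())                    # peak magnitude after rotation
print(np.sum(block**2), np.sum(rotated**2))     # energy before and after
```

The spike's energy ends up shared equally by all 64 coefficients (each has magnitude 1), and applying `fwht` a second time recovers the original block, because the normalized transform is its own inverse.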

---

## Features

- **Data-Oblivious Energy Flattening**: Spreads transient spike energy uniformly across audio blocks using Fast Walsh-Hadamard Transform (FWHT) rotations.
- **Optimal Centroid Clustering**: Employs an iterative Lloyd-Max solver to converge on the Mean-Squared-Error (MSE) optimal 6-bit codebook.
- **1-Bit QJL Residual Layer**: Uses a Quantized Johnson-Lindenstrauss (QJL) error sign layer to track quantization rounding errors and suppress distortion.
- **Minimal Dependencies**: Built on Python with only NumPy and SciPy as runtime requirements.
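
The Lloyd-Max step can be sketched as follows. This is a minimal version fitted to synthetic Gaussian samples, not the solver in `audiotq/quantizer.py`; the names `lloyd_max` and `quantize`, the quantile-based initialization, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np


def lloyd_max(samples, bits=6, iters=50):
    """Fit a 2**bits-level scalar codebook to `samples` by Lloyd-Max iteration."""
    levels = 2 ** bits
    # Initial guess: evenly spaced quantiles of the data.
    centroids = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        # Decision boundaries sit midway between neighbouring centroids.
        edges = (centroids[:-1] + centroids[1:]) / 2
        idx = np.searchsorted(edges, samples)
        counts = np.bincount(idx, minlength=levels)
        sums = np.bincount(idx, weights=samples, minlength=levels)
        # Move each centroid to the mean of its cell; leave empty cells alone.
        filled = counts > 0
        centroids[filled] = sums[filled] / counts[filled]
    return centroids


def quantize(samples, centroids):
    """Return the codebook indices and the reconstructed values."""
    edges = (centroids[:-1] + centroids[1:]) / 2
    idx = np.searchsorted(edges, samples)
    return idx, centroids[idx]


rng = np.random.default_rng(0)
train = rng.standard_normal(20_000)   # stand-in for rotated, standardized audio
codebook = lloyd_max(train)
indices, recon = quantize(train, codebook)
mse = np.mean((train - recon) ** 2)
print(f"levels={codebook.size}  MSE={mse:.5f}")
```

Each iteration alternates the two optimality conditions: samples go to the nearest centroid, then each centroid moves to the mean of its samples. Neither step can increase the empirical MSE, so the loop converges to a locally optimal codebook.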



## Directory Structure

```text
├── audiotq/
│   ├── engine.py       # Core compression & decompression pipeline loops
│   ├── rotator.py      # Fast Walsh-Hadamard Transform (FWHT) rotations
│   ├── quantizer.py    # Iterative Lloyd-Max solver & bit-packing mechanics
│   └── wrapper.py      # WAV file read/write wrapper (Supports 8, 16, and 24-bit PCM)
├── audio_cli.py        # CLI for processing audio files and running telemetry
├── main.py             # CLI for generating and compressing synthetic signals
└── tests/
    └── run_tests.py    # System stress tests and SQNR verification
```



## Installation

Ensure you have Python 3.11+ and `numpy` / `scipy` installed:

```bash
pip install numpy scipy
```



## Usage

### 1. Run System Verification Tests
To run the automated test suite (unit tests, benchmarks, and verification):
```bash
uv run pytest
```
You can also run the legacy sequential stress tests via:
```bash
uv run python3 -m tests.run_tests
```

### 2. Compress and Decompress Audio
Use the `audio_cli.py` utility to process any standard `.wav` audio track:
```bash
uv run audio_cli.py run -i input.wav -o output_reconstructed.wav
```

### 3. Compare Reconstructed Audio
Extract mathematical fidelity metrics (MSE, SQNR, correlation, and envelope preservation) between raw and processed signals:
```bash
uv run audio_cli.py compare -f1 input.wav -f2 output_reconstructed.wav
```

### 4. Run Synthetic Simulations
Generate custom synthetic signals (e.g., sine waves, square waves, noise, transients) with custom parameters:
```bash
uv run main.py --type square --frequency 440 --duration 2.0 --spikes 5
```


## Performance Benchmarks

Running `uv run pytest` executes telemetry benchmarking on a standard high-sample dataset (44.1 kHz stereo/mono):

| Metric | Performance Profile |
| --- | --- |
| **Original Dataset Size** | 2.52 MB (15.0 seconds) |
| **Compressed On-Disk Footprint** | 0.65 MB |
| **Compression Ratio** | **~3.91x smaller footprint (74.4% reduction)** |
| **Fidelity (SQNR)** | **30.24 dB** |
| **Compression Throughput** | ~1.32 MB/s |
| **Decompression Throughput** | ~1.35 MB/s |


## Known Limitations & Failure Modes

### 1. WHT Basis Alignment (Extreme Sparsity Failure)
The codec relies on a Fast Walsh-Hadamard Transform (FWHT) rotation to "Gaussianize" audio blocks, making them fit the Gaussian Lloyd-Max codebook. 
* **Failure Scenario**: If an input block perfectly aligns with one of the Walsh-Hadamard basis vectors (e.g. `signal = rotator.hadamardSigns`), the rotated vector becomes a single extreme Kronecker delta spike.
* **Result**: Because the Lloyd-Max codebook is optimized for normal distributions, it clips this extreme spike to the outermost centroid boundary ($\pm 2.41$ standard deviations). This clipping noise destroys reconstruction quality, dropping the SQNR to **~1.31 dB**. (Proven in `tests/test_failures.py::test_failure_hadamard_basis_alignment`).
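
The mechanism behind this failure can be reproduced in a few lines. The sketch below builds an explicit 64-point Hadamard matrix with `np.kron` instead of calling the project's rotator, and uses the $\pm 2.41$ clipping limit quoted above. It leaves out the rest of the pipeline (codebook lookup, residual layer), so it shows why the spike is clipped but is not expected to reproduce the ~1.31 dB figure.

```python
import numpy as np

n = 64
# Build the orthonormal Sylvester-Hadamard matrix by repeated Kronecker products.
H = np.array([[1.0]])
while H.shape[0] < n:
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
H /= np.sqrt(n)

signal = H[5] * np.sqrt(n)     # a +/-1 block equal to one Walsh-Hadamard basis vector
rotated = H @ signal           # all energy collapses into coefficient 5

clip = 2.41                    # outermost centroid, in standard deviations
normalized = rotated / rotated.std()
clipped = np.clip(normalized, -clip, clip)

print("non-zero coefficients:", np.count_nonzero(np.abs(rotated) > 1e-9))
print("spike before clipping:", normalized.max())
print("spike after clipping: ", clipped.max())
```

Instead of a Gaussian-looking block, the rotation yields one coefficient of height $\sqrt{n}$ and zeros elsewhere. After standardization the spike sits far beyond the outermost centroid, so most of its amplitude is lost when it is clipped.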

### 2. Silent Block Edge Case
* **Edge Case**: Silent blocks have a standard deviation of `0.0`. Dividing by this value during block standardization would lead to `NaN` or `Inf` errors.
* **Resolution**: The engine implements a safety threshold guard (`std_dev > 1e-6`). Silent blocks bypass normalization and are reconstructed as perfect silence. (Proven in `tests/test_failures.py::test_boundary_silent_signal`).
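
A guard of this kind might look like the sketch below. The helper names `standardize` and `restore` are hypothetical and are not taken from `audiotq/engine.py`; only the `1e-6` threshold comes from the description above.

```python
import numpy as np

EPS = 1e-6  # safety threshold quoted above


def standardize(block):
    """Scale a block to unit standard deviation, skipping (near-)silent blocks."""
    block = np.asarray(block, dtype=np.float64)
    std_dev = float(np.std(block))
    if std_dev > EPS:
        return block / std_dev, std_dev
    # Silent block: skip the division and flag it with a scale of 0.0.
    return np.zeros_like(block), 0.0


def restore(normalized, std_dev):
    """Undo standardize(); a scale of 0.0 reconstructs exact silence."""
    return normalized * std_dev


silent = np.zeros(64)
norm, scale = standardize(silent)
print(scale, np.isfinite(norm).all(), np.array_equal(restore(norm, scale), silent))
```

Because the division is skipped whenever the standard deviation falls at or below the threshold, no `NaN` or `Inf` values can enter the quantizer from a silent block.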

## License

This project is licensed under the GNU General Public License v3.0. See the [LICENSE](LICENSE) file for details.
