Metadata-Version: 2.4
Name: coremlcrepe
Version: 0.1.0
Summary: CoreML/ANE port of the CREPE pitch estimator with a torchcrepe-compatible API
Author: CoreMLCREPE
License: MIT
Project-URL: Homepage, https://github.com/sakamoto-poteko/coremlcrepe
Project-URL: Repository, https://github.com/sakamoto-poteko/coremlcrepe
Project-URL: Issues, https://github.com/sakamoto-poteko/coremlcrepe/issues
Keywords: pitch,crepe,coreml,ane,audio,f0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Requires-Python: <3.13,>=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.21
Requires-Dist: coremltools>=7.0
Provides-Extra: audio
Requires-Dist: soundfile>=0.12; extra == "audio"
Provides-Extra: dsp
Requires-Dist: librosa>=0.10; extra == "dsp"
Requires-Dist: resampy>=0.4; extra == "dsp"
Provides-Extra: convert
Requires-Dist: torch>=2.0; extra == "convert"
Requires-Dist: torchcrepe>=0.0.20; extra == "convert"
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"

# coremlcrepe

A vibe-coded **CoreML / Apple Neural Engine (ANE)** port of the
[CREPE](https://github.com/marl/crepe) pitch estimator, exposing a
**torch-free, [`torchcrepe`](https://github.com/maxrmorrison/torchcrepe)-compatible
API**. At inference time it only needs `numpy` + `coremltools` — no PyTorch.

- 🎯 Drop-in-style API: `coremlcrepe.predict(...)`, `coremlcrepe.decode`,
  `coremlcrepe.filter`, `coremlcrepe.threshold`, `coremlcrepe.convert`, ...
- ⚡ Runs on the ANE via a float16 `mlprogram` (roughly 12–13× faster per frame
  than Torch CPU for the `full` model at batch sizes of 10 or more, about 2×
  at batch size 1).
- ✅ Validated against the original torchcrepe model (< 1 cent mean F0 error
  on the `full` model).

## Model I/O

| Tensor | Shape, dtype, and contents |
|---|---|
| **Input** `frames` | `(batch, 1024)` float32 — 16 kHz audio windows, per-frame mean-centered + unit-std normalized |
| **Output** `probabilities` | `(batch, 360)` float32 — sigmoid probabilities over 360 pitch bins (20 cents/bin) |
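
As a rough illustration of the input contract above, the sketch below frames 16 kHz mono audio into `(batch, 1024)` windows and normalizes each one. The 160-sample hop (10 ms), the half-window zero padding, and the `1e-10` floor on the standard deviation are assumptions modeled on the original CREPE preprocessing, and `frame_and_normalize` is a hypothetical helper, not part of the coremlcrepe API.

```python
import numpy as np

WINDOW_SIZE = 1024  # samples per frame at 16 kHz, per the table above


def frame_and_normalize(audio, hop_length=160):
    """Slice 16 kHz mono audio into normalized (batch, 1024) float32 frames."""
    audio = np.asarray(audio, dtype=np.float32)
    # Assumption: zero-pad half a window on each side so frames are centered.
    padded = np.pad(audio, WINDOW_SIZE // 2)
    n_frames = 1 + len(audio) // hop_length
    starts = np.arange(n_frames) * hop_length
    frames = np.stack([padded[s:s + WINDOW_SIZE] for s in starts])
    # Per-frame mean-centering and unit-std scaling.
    frames = frames - frames.mean(axis=1, keepdims=True)
    std = frames.std(axis=1, keepdims=True)
    frames = frames / np.maximum(std, 1e-10)  # floor avoids division by zero
    return frames.astype(np.float32)


# One second of a 220 Hz sine at 16 kHz.
t = np.arange(16000) / 16000.0
frames = frame_and_normalize(np.sin(2 * np.pi * 220.0 * t))
print(frames.shape, frames.dtype)
```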

## Project layout

```
coremlcrepe/            The library (torch-free, mirrors torchcrepe)
├── __init__.py         Public API + constants
├── core.py             predict / preprocess / infer / postprocess / *_from_file
├── model.py            CoreML MLModel wrapper (Crepe)
├── decode.py           argmax / weighted_argmax / viterbi
├── convert.py          bins <-> cents <-> frequency conversions
├── filter.py           mean / median (+ NaN-aware) sequence filters
├── threshold.py        At / Hysteresis / Silence thresholding
├── load.py             model + audio loading
├── loudness.py         A-weighted perceptual loudness
└── assets/             full.mlpackage, tiny.mlpackage  (the CoreML models)

convert_crepe.py        Torch -> CoreML conversion script
validate_and_benchmark.py  Validation + latency benchmark vs torchcrepe
examples/               Runnable usage examples
tests/                  pytest suite (torch-free + optional parity tests)
pyproject.toml          Package metadata (pip install -e .)
```
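
The conversions that `convert.py` covers follow from the 20-cent bin width. A minimal sketch, assuming the original CREPE convention in which cents are measured relative to 10 Hz and bin 0 sits at about 1997.38 cents. The function names are illustrative and may not match the library's signatures.

```python
import math

CENTS_PER_BIN = 20                 # bin width, per the Model I/O table
CENTS_OFFSET = 1997.3794084376191  # assumed position of bin 0, in cents above 10 Hz


def bins_to_cents(bins):
    return CENTS_PER_BIN * bins + CENTS_OFFSET


def cents_to_frequency(cents):
    return 10.0 * 2.0 ** (cents / 1200.0)


def frequency_to_cents(hz):
    return 1200.0 * math.log2(hz / 10.0)


def frequency_to_bins(hz):
    # Rounding to the nearest bin is a choice made for this sketch.
    return round((frequency_to_cents(hz) - CENTS_OFFSET) / CENTS_PER_BIN)


lowest = cents_to_frequency(bins_to_cents(0))
highest = cents_to_frequency(bins_to_cents(359))
print(f"bin 0 = {lowest:.1f} Hz, bin 359 = {highest:.1f} Hz")
```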

## Install

```bash
# Python 3.9–3.12 (the package requires >=3.9,<3.13).
python3 -m venv .venv && source .venv/bin/activate

# Runtime only (numpy + coremltools) + the package:
pip install -e .

# Optional extras:
pip install -e ".[audio]"    # soundfile, to read audio files
pip install -e ".[dsp]"      # librosa + resampy (viterbi/loudness/hi-q resample)
pip install -e ".[convert]"  # torch + torchcrepe, to (re)build the .mlpackage
pip install -e ".[test]"     # pytest
```

The converted models are shipped in `coremlcrepe/assets/`. To rebuild them:

```bash
python convert_crepe.py --capacity full   # -> coremlcrepe/assets/full.mlpackage
python convert_crepe.py --capacity tiny   # -> coremlcrepe/assets/tiny.mlpackage
# Most ANE-friendly: bake a static batch size
python convert_crepe.py --capacity full --fixed-batch 100
```

## Usage

The API mirrors torchcrepe, using numpy arrays of shape `(1, time)`:

```python
import numpy as np
import coremlcrepe

# Load audio (needs the `audio` extra) or bring your own numpy array.
audio, sr = coremlcrepe.load.audio("audio.wav")

# Predict pitch (Hz) and periodicity (confidence).
pitch, periodicity = coremlcrepe.predict(
    audio, sr,
    fmin=50., fmax=550.,
    model="full",                 # or "tiny"
    decoder=coremlcrepe.decode.weighted_argmax,
    return_periodicity=True,
)
```

### Recommended cleanup pipeline (matches torchcrepe)

```python
# Remove periodicity in silent regions (needs the `dsp` extra for loudness).
periodicity = coremlcrepe.threshold.Silence(-60.)(periodicity, audio, sr)

# Mark low-confidence frames unvoiced (NaN).
pitch = coremlcrepe.threshold.At(0.21)(pitch, periodicity)

# Smooth.
pitch = coremlcrepe.filter.median(pitch, 3)
periodicity = coremlcrepe.filter.mean(periodicity, 3)
```
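
To make the thresholding and smoothing steps concrete, here is a numpy-only sketch of the same ideas on arrays of shape `(1, time)`. It illustrates the semantics described in the comments above: low-confidence frames become `NaN`, then a NaN-aware median smooths the track. `threshold_at` and `nan_median` are hypothetical stand-ins, and the edge handling is an assumption rather than the library's behavior.

```python
import numpy as np


def threshold_at(pitch, periodicity, value):
    """Return a copy of pitch with frames below `value` periodicity set to NaN."""
    pitch = np.array(pitch, dtype=np.float64)  # np.array copies the input
    pitch[np.asarray(periodicity) < value] = np.nan
    return pitch


def nan_median(pitch, win_length=3):
    """Median-filter a (1, time) track, ignoring NaN frames in each window."""
    pitch = np.asarray(pitch, dtype=np.float64)
    half = win_length // 2
    out = np.full(pitch.shape, np.nan)
    for t in range(pitch.shape[1]):
        # Assumption: windows are simply truncated at the track edges.
        window = pitch[0, max(0, t - half):t + half + 1]
        voiced = window[~np.isnan(window)]
        if voiced.size:  # windows with no voiced frame stay NaN
            out[0, t] = np.median(voiced)
    return out


pitch = np.array([[100.0, 101.0, 400.0, 103.0, 104.0, 105.0]])
periodicity = np.array([[0.9, 0.9, 0.9, 0.9, 0.9, 0.1]])
voiced = threshold_at(pitch, periodicity, 0.21)  # last frame becomes NaN
smoothed = nan_median(voiced, 3)                 # the 400 Hz outlier is removed
print(voiced, smoothed)
```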

### Choosing a decoder

| decoder | notes | needs librosa? |
|---------|-------|:---:|
| `coremlcrepe.decode.weighted_argmax` | **default**, sub-bin accurate | no |
| `coremlcrepe.decode.argmax` | fastest, bin-quantized | no |
| `coremlcrepe.decode.viterbi` | temporally smooth path | librosa if installed, else pure-numpy fallback |
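
The difference between the two argmax decoders can be sketched in numpy. This assumes the torchcrepe-style scheme, in which `weighted_argmax` averages bin centers (in cents) over a small window around the peak, weighted by probability. The window radius of 4 bins, the cents offset, and the helper names are assumptions for illustration.

```python
import numpy as np

CENTS_PER_BIN = 20
CENTS_OFFSET = 1997.3794084376191  # assumed CREPE offset of bin 0, in cents above 10 Hz


def bins_to_hz(bins):
    cents = CENTS_PER_BIN * np.asarray(bins, dtype=np.float64) + CENTS_OFFSET
    return 10.0 * 2.0 ** (cents / 1200.0)


def argmax_decode(probabilities):
    """Pick the single most likely bin per frame (bin-quantized)."""
    return bins_to_hz(probabilities.argmax(axis=1))


def weighted_argmax_decode(probabilities, radius=4):
    """Probability-weighted average of bin centers near the peak (sub-bin)."""
    n_bins = probabilities.shape[1]
    bins = np.arange(n_bins)
    peak = probabilities.argmax(axis=1)
    # Keep only bins within `radius` of each frame's peak.
    mask = np.abs(bins[None, :] - peak[:, None]) <= radius
    weights = probabilities * mask
    bin_cents = CENTS_PER_BIN * bins + CENTS_OFFSET
    cents = (weights * bin_cents).sum(axis=1) / weights.sum(axis=1)
    return 10.0 * 2.0 ** (cents / 1200.0)


# Synthetic frame whose true peak lies between bins 100 and 101.
bins = np.arange(360)
probs = np.exp(-0.5 * ((bins - 100.4) / 1.25) ** 2)[None, :]
print(argmax_decode(probs), weighted_argmax_decode(probs))
```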

### Selecting compute units

```python
# Force a specific backend when loading. CPU_AND_NE restricts execution to the
# CPU and ANE; ALL lets the runtime choose among CPU, GPU, and ANE.
coremlcrepe.load.model("full", compute_units="CPU_AND_NE")
```

## Examples

```bash
python examples/01_basic_prediction.py          # synthetic tone
python examples/02_from_file_with_cleanup.py a.wav   # file + threshold + filter
python examples/03_pitch_sweep.py               # track a glissando, compare decoders
```

## Tests

```bash
pytest                # torch-free core tests always run
                      # torchcrepe parity tests run only if torch is installed
```

The suite covers unit conversions, decoders, filters/thresholds, the
end-to-end prediction pipeline, and — when torch is available — numerical
parity with the original torchcrepe model.

### Audio corpus suite

[`tests/audio_corpus.py`](tests/audio_corpus.py) deterministically generates a
diverse, labeled corpus (73 cases): notes across octaves in several timbres
(sine / harmonic / saw / square), vibrato, glissando, and harmonic tones at
decreasing SNR. [`tests/test_audio_corpus.py`](tests/test_audio_corpus.py)
runs prediction over the whole corpus and asserts per-case and per-category
accuracy. Averaged over all cases, the per-case median error is ~2 cents.

Get a standalone accuracy report (and optionally export the corpus to wav):

```bash
python tests/corpus_report.py                       # full model, weighted_argmax
python tests/corpus_report.py --model tiny --decoder viterbi --verbose
python tests/corpus_report.py --export tests/audio/generated
```

Drop your own `.wav` / `.flac` files into
[`tests/audio/real/`](tests/audio/real/) to include them automatically; encode
an expected fundamental as `_<f0>hz` in the filename (e.g. `violin_440hz.wav`)
to also assert accuracy.
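
For illustration, a filename tag like the one above could be parsed as follows. This is a sketch: `expected_f0` is a hypothetical helper, and the exact pattern the test suite accepts (for example, whether decimals are allowed) is an assumption.

```python
import re
from pathlib import Path

# Assumed pattern: "_<number>hz" at the end of the filename stem.
_F0_TAG = re.compile(r"_(\d+(?:\.\d+)?)hz$", re.IGNORECASE)


def expected_f0(path):
    """Return the fundamental (Hz) encoded in a filename, or None if absent."""
    match = _F0_TAG.search(Path(path).stem)
    return float(match.group(1)) if match else None


print(expected_f0("tests/audio/real/violin_440hz.wav"))
print(expected_f0("tests/audio/real/speech.flac"))
```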

## Validation vs torchcrepe

CoreML (float16, ANE) vs the original Torch model on harmonic sine waves:

| Model | max &#124;Δprob&#124; (CoreML vs Torch) | mean &#124;F0 error&#124; |
|-------|------------------------------------------|---------------------------|
| full  | 3.4e-04 | 0.74 cents |
| tiny  | 7.9e-03 | 2.54 cents |

Decoder parity vs torchcrepe (median |Δ| in cents): `weighted_argmax` ≈ 3.3,
`argmax` ≈ 6.3, `viterbi` ≈ 6.0.
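
The cents figures above compare two frequencies on a logarithmic scale (1200 cents per octave). A small sketch of how such errors could be computed over voiced frames. The helper names and the NaN-skipping rule are assumptions, not the validation script's code.

```python
import math
from statistics import median


def cents_error(estimate_hz, reference_hz):
    """Absolute pitch error in cents between two frequencies."""
    return abs(1200.0 * math.log2(estimate_hz / reference_hz))


def median_cents_error(estimates_hz, references_hz):
    """Median error over frames where both values are voiced (not NaN)."""
    errors = [
        cents_error(e, r)
        for e, r in zip(estimates_hz, references_hz)
        if not (math.isnan(e) or math.isnan(r))
    ]
    return median(errors)


# The NaN frame is skipped; the median is taken over the remaining two.
print(median_cents_error([440.0, float("nan"), 441.0], [440.0, 440.0, 440.0]))
```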

## Benchmarks (Apple Silicon)

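Per-frame time is the processing time per frame, where each frame advances the
audio by one 10 ms hop. "×RT" is 10 ms divided by the per-frame time, so values
above 1 are faster than realtime.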

**Full model**

| batch | CoreML per-frame | CoreML ×RT | Torch (CPU) per-frame | Torch ×RT |
|------:|-----------------:|-----------:|----------------------:|----------:|
| 1     | 2.99 ms | 3.3× | 6.84 ms | 1.5× |
| 10    | 0.49 ms | 20.6× | 5.73 ms | 1.7× |
| 100   | 0.30 ms | 33.4× | 3.99 ms | 2.5× |
| 500   | 0.38 ms | 26.3× | 4.82 ms | 2.1× |

**Tiny model**

| batch | CoreML per-frame | CoreML ×RT | Torch (CPU) per-frame | Torch ×RT |
|------:|-----------------:|-----------:|----------------------:|----------:|
| 1     | 0.32 ms | 30.9× | 3.04 ms | 3.3× |
| 10    | 0.19 ms | 52.1× | 0.86 ms | 11.6× |
| 100   | 0.16 ms | 63.8× | 0.51 ms | 19.6× |
| 500   | 0.15 ms | 65.4× | 0.44 ms | 22.6× |

Single-frame latency: **full ≈ 3.0 ms**, **tiny ≈ 0.32 ms**. Per-frame cost drops
sharply with batching and levels off around 100 frames; for the full model,
batch 500 is slower per frame than batch 100. Regenerate these numbers with:

```bash
python validate_and_benchmark.py --capacity full
python validate_and_benchmark.py --capacity tiny
```
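
The ×RT columns follow directly from the per-frame times and the 10 ms hop. The sketch below shows the arithmetic, plus a rough batch latency estimate that assumes the per-frame time is the batch time divided by the batch size. Values computed from the rounded table entries can differ slightly from the printed ×RT figures.

```python
HOP_MS = 10.0  # one frame per 10 ms hop of audio


def realtime_factor(per_frame_ms, hop_ms=HOP_MS):
    """How many times faster than realtime a given per-frame time is."""
    return hop_ms / per_frame_ms


def batch_latency_ms(per_frame_ms, batch):
    """Approximate wall time to process one batch (assumption: linear in batch)."""
    return per_frame_ms * batch


# Full model, batch 1 and batch 100, using per-frame times from the table.
print(round(realtime_factor(2.99), 1))
print(batch_latency_ms(0.30, 100))
```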

## ANE notes

- Exported as an **`mlprogram`** with **float16** precision and
  `ComputeUnit.ALL`, which lets the runtime schedule work on the ANE.
- A **fixed batch size** (`--fixed-batch`) is the most ANE-friendly layout;
  flexible `RangeDim` shapes may fall back to GPU/CPU for some ops.
- Verify actual placement by opening the `.mlpackage` in Xcode and generating a
  **Performance report**, or by profiling with Instruments' Core ML template.
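
The export settings above map onto `coremltools` roughly as follows. This is a sketch under stated assumptions, not the contents of `convert_crepe.py`: `convert_for_ane` is a hypothetical helper, `traced_model` is assumed to be a `torch.jit.trace` result mapping `(batch, 1024)` frames to `(batch, 360)` probabilities, and the upper bound of 512 for the flexible batch is arbitrary.

```python
def convert_for_ane(traced_model, fixed_batch=None, max_batch=512):
    """Convert a traced CREPE model to a float16 mlprogram (illustrative)."""
    import coremltools as ct  # imported lazily; needed only for conversion

    if fixed_batch is not None:
        # Static batch size: the most ANE-friendly layout.
        batch_dim = fixed_batch
    else:
        # Flexible batch size: some ops may fall back to GPU/CPU.
        batch_dim = ct.RangeDim(lower_bound=1, upper_bound=max_batch, default=1)

    return ct.convert(
        traced_model,
        inputs=[ct.TensorType(name="frames", shape=ct.Shape(shape=(batch_dim, 1024)))],
        convert_to="mlprogram",
        compute_precision=ct.precision.FLOAT16,
        compute_units=ct.ComputeUnit.ALL,
    )
```

Calling `convert_for_ane(traced_model, fixed_batch=100)` would correspond to the `--fixed-batch 100` layout, with the result saved via the returned model's `save()` method.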

## Differences from torchcrepe

- **Torch-free at runtime** — arrays are numpy, the model runs via CoreML.
- **`embed()` is not supported** — the CoreML model outputs pitch-bin
  probabilities only. Re-export with an embedding output if you need it.
- **`weighted_argmax` is the default decoder** (torchcrepe defaults to
  `viterbi`), so no librosa is required out of the box. Pass
  `decoder=coremlcrepe.decode.viterbi` to match torchcrepe's default.
- Unvoiced pitch is represented as `NaN` (`coremlcrepe.UNVOICED`).
