Metadata-Version: 2.4
Name: audio-arrange
Version: 0.1.0
Summary: Declarative timeline-based multi-track audio mixing for voice, music, and SFX.
Project-URL: Homepage, https://github.com/opusmorale/audio-arrange
Project-URL: Repository, https://github.com/opusmorale/audio-arrange
Project-URL: Issues, https://github.com/opusmorale/audio-arrange/issues
Author-email: Trollfabriken AITrix AB <dev@trollfabriken.se>
License-Expression: MIT
License-File: LICENSE
Keywords: audio,mixing,numpy,podcast,soundfile,timeline
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Multimedia :: Sound/Audio :: Mixers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Requires-Dist: pydantic>=2.5
Requires-Dist: scipy>=1.10
Requires-Dist: soundfile>=0.12
Requires-Dist: tomli>=2.0; python_version < '3.11'
Provides-Extra: all
Requires-Dist: pedalboard>=0.9; extra == 'all'
Requires-Dist: pyloudnorm>=0.1.1; extra == 'all'
Requires-Dist: voice-duck>=0.1; extra == 'all'
Provides-Extra: dev
Requires-Dist: build; extra == 'dev'
Requires-Dist: pytest-cov>=4; extra == 'dev'
Requires-Dist: pytest>=7; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: tqdm; extra == 'dev'
Provides-Extra: duck
Requires-Dist: voice-duck>=0.1; extra == 'duck'
Provides-Extra: lufs
Requires-Dist: pyloudnorm>=0.1.1; extra == 'lufs'
Provides-Extra: mp3
Requires-Dist: pedalboard>=0.9; extra == 'mp3'
Description-Content-Type: text/markdown

# audio-arrange

Declarative timeline-based multi-track audio mixing for voice, music, and SFX.

Built from the MusicVideoCreator/CineForge pipeline at Trollfabriken AITrix AB, where ffmpeg
`amix` filter graphs became unmaintainable past 5 tracks. The library decodes each clip once
with `soundfile`, does all mixing as numpy operations on pre-aligned float32 arrays, and
encodes once at the end. It renders a 5-minute, 6-track project in under 1.5 seconds.

---

## What it solves

| Previous problem | Solution |
| --- | --- |
| Mixing 6 tracks with pydub takes 25+ seconds | Single-pass numpy mix engine; same job in <1.5s |
| ffmpeg `amix` filter graphs unreadable past 5 tracks | Declarative `timeline.add(clip, track, at=...)` API |
| pydub crossfades click on non-zero-crossing boundaries | Equal-power crossfade with frame-aligned arithmetic |
| No clean way to do voice-over ducking in Python | First-class `timeline.duck(target, trigger)` |
| LUFS normalization requires a separate ffmpeg-normalize pass | `timeline.normalize_lufs(-16)` integrated into render |
| Clipping when summing many tracks | Auto-headroom + optional tanh soft clip |

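The equal-power crossfade named in the table can be sketched in a few lines of numpy. This is an illustrative sketch, not the library's `effects/crossfade.py`; the function name `equal_power_crossfade` and its `overlap` parameter (a frame count) are assumptions made for the example.

```python
import numpy as np

def equal_power_crossfade(a: np.ndarray, b: np.ndarray, overlap: int) -> np.ndarray:
    """Join a and b, blending the last `overlap` frames of a into the first of b."""
    if not 0 < overlap <= min(len(a), len(b)):
        raise ValueError("overlap must be between 1 and the shorter clip length")
    # Quarter-cycle cosine/sine curves: fade_out**2 + fade_in**2 == 1 at every
    # frame, so the summed power of uncorrelated material stays constant.
    theta = np.linspace(0.0, np.pi / 2, overlap, dtype=np.float32)
    fade_out, fade_in = np.cos(theta), np.sin(theta)
    if a.ndim == 2:  # broadcast the curves over the channel axis
        fade_out, fade_in = fade_out[:, None], fade_in[:, None]
    blended = a[-overlap:] * fade_out + b[:overlap] * fade_in
    return np.concatenate([a[:-overlap], blended, b[overlap:]])
```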
---

## Installation

```bash
pip install audio-arrange
```

Optional extras:

```bash
pip install "audio-arrange[mp3]"      # MP3 decode via pedalboard (no ffmpeg subprocess)
pip install "audio-arrange[lufs]"     # LUFS normalization via pyloudnorm
pip install "audio-arrange[duck]"     # Production-grade voice ducking via voice-duck
pip install "audio-arrange[all]"      # Everything above
```
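
To see which optional backends are importable in the current environment, a small standard-library check is enough. This is a sketch rather than part of the package: `installed_extras` is a hypothetical helper, and `voice_duck` is an assumed import name for the `voice-duck` distribution.

```python
from importlib.util import find_spec

# Extra name -> import name of the package that extra pulls in.
# "voice_duck" is an assumed import name for the voice-duck distribution.
EXTRAS = {"mp3": "pedalboard", "lufs": "pyloudnorm", "duck": "voice_duck"}

def installed_extras() -> dict[str, bool]:
    """Report which optional backends can be imported right now."""
    return {extra: find_spec(module) is not None for extra, module in EXTRAS.items()}

print(installed_extras())
```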

---

## Quick start

```python
from audio_arrange import Timeline, Clip

tl = Timeline(sample_rate=48000, channels=2)

# Load clips from disk
voice = Clip("narration.wav")
music = Clip("bed.flac", start_offset=4.0)   # skip 4s intro
sfx   = Clip("transition.wav")

# Place clips on named tracks
tl.add(voice, track="voice", at=0.0, fade_in=0.05, fade_out=0.1)
tl.add(music, track="music", at=0.0, gain_db=-6.0)
tl.add(sfx,   track="sfx",   at=12.5, gain_db=-3.0, fade_out=0.3)

# Duck music under voice — no separate pass needed
tl.duck(target="music", trigger="voice", reduction_db=-12.0)

# Normalize to podcast loudness target
tl.normalize_lufs(-16.0)

# Render to file — returns the path written
out = tl.render("episode_01.wav", bit_depth=16)
print(out)  # path of the rendered file, e.g. episode_01.wav
```
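
For intuition about what `tl.duck(...)` does to the music track, here is a minimal RMS envelope-follower sketch. It is neither the library's `effects/duck.py` nor the `voice-duck` backend; `duck_gain`, `threshold`, and `window_s` are hypothetical names and values chosen for illustration, and the trigger is assumed to be longer than one analysis window.

```python
import numpy as np

def duck_gain(trigger: np.ndarray, sample_rate: int, reduction_db: float = -12.0,
              threshold: float = 0.02, window_s: float = 0.05) -> np.ndarray:
    """Per-frame linear gain for the target track, lowered while the trigger is active."""
    mono = trigger.mean(axis=1) if trigger.ndim == 2 else trigger
    win = max(1, int(window_s * sample_rate))
    kernel = np.ones(win) / win
    # Moving-average RMS envelope of the trigger signal.
    rms = np.sqrt(np.convolve(mono.astype(np.float64) ** 2, kernel, mode="same"))
    active = (rms > threshold).astype(np.float64)
    # Smooth the on/off decision so the gain ramps instead of stepping.
    depth = np.convolve(active, kernel, mode="same")
    floor = 10.0 ** (reduction_db / 20.0)  # -12 dB is a factor of about 0.25
    return (1.0 - depth * (1.0 - floor)).astype(np.float32)

# Usage on (frames, channels) arrays of equal length:
#   ducked_music = music * duck_gain(voice, 48000)[:, None]
```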

---

## The pipeline

```
  ┌─────────────────────────────────────────────────────────────────┐
  │  Clip loading                                                   │
  │  ① soundfile.read() → float32 (frames, channels)               │
  │  ② resample_poly if sample_rate != timeline.sample_rate         │
  └────────────────────────────┬────────────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────────────┐
  │  Timeline.add()                                                 │
  │  ③ Record Event(clip, track, at, gain_db, fade_in, fade_out)    │
  │     Nothing rendered yet — fully declarative                    │
  └────────────────────────────┬────────────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────────────┐
  │  mixer.mix()  — single-pass on .render() call                   │
  │  ④ Allocate zero buffer at target length                        │
  │  ⑤ For each event: buffer[start:end] += gain * samples          │
  │  ⑥ Apply crossfades (opposing equal-power curves in-place)      │
  └────────────────────────────┬────────────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────────────┐
  │  Effects chain                                                  │
  │  ⑦ Duck envelope follower (RMS or voice-duck if installed)      │
  │  ⑧ LUFS normalization (pyloudnorm if installed)                 │
  │  ⑨ Auto-headroom → optional tanh soft clip → TPDF dither        │
  └────────────────────────────┬────────────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────────────┐
  │  Writer                                                         │
  │  ⑩ soundfile.write() → WAV / FLAC / OGG at chosen bit depth    │
  └─────────────────────────────────────────────────────────────────┘
```
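
Steps ④ and ⑤ of the diagram amount to a sum into a preallocated buffer. The sketch below assumes clips are already resampled to the timeline rate and have the right channel count; the `Event` dataclass and `mix` function are simplified stand-ins, not the library's `models.Event` or `mixer.mix()`.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Event:  # hypothetical stand-in for the library's Event model
    samples: np.ndarray   # float32, shape (frames, channels)
    at: float             # start time in seconds
    gain_db: float = 0.0

def mix(events: list[Event], sample_rate: int, channels: int = 2) -> np.ndarray:
    starts = [int(round(e.at * sample_rate)) for e in events]
    total = max((s + len(e.samples) for s, e in zip(starts, events)), default=0)
    buffer = np.zeros((total, channels), dtype=np.float32)   # step 4: zero buffer
    for start, e in zip(starts, events):                      # step 5: slice-add
        gain = np.float32(10.0 ** (e.gain_db / 20.0))
        buffer[start:start + len(e.samples)] += gain * e.samples
    return buffer
```

Each event is written with a single slice-add, so the work grows with the total number of clip frames rather than with the number of tracks.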

---

## Configuration

```python
from audio_arrange import RenderConfig

config = RenderConfig(
    sample_rate=48000,    # output sample rate; clips resampled on load
    channels=2,           # 1 = mono, 2 = stereo
    headroom_db=-1.0,     # peak ceiling before final clip; leaves margin for intersample peaks
    soft_clip=True,       # tanh curve instead of hard clip; preserves transient shape
    dither=True,          # TPDF dither for bit-depth reduction (16-bit outputs)
    progress=False,       # show tqdm bar during render (useful for long projects)
)

out = tl.render("episode.wav", bit_depth=16, config=config)
```

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `sample_rate` | `int` | `48000` | Output sample rate in Hz |
| `channels` | `int` | `2` | Channel count; mono clips are upmixed |
| `headroom_db` | `float` | `-1.0` | Peak limit applied before encode |
| `soft_clip` | `bool` | `True` | tanh soft clip instead of hard truncation |
| `dither` | `bool` | `True` | TPDF dither for 16/24-bit renders |
| `progress` | `bool` | `False` | tqdm progress bar during render pass; tqdm is listed only under the `dev` extra, so install it separately |

---
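The `headroom_db`, `soft_clip`, and `dither` options correspond to three small numpy operations. The functions below are a sketch of each technique in isolation; their names and signatures are assumptions, and how the library combines them inside the render pass is not specified here.

```python
import numpy as np

def apply_headroom(x: np.ndarray, headroom_db: float = -1.0) -> np.ndarray:
    """Scale the whole buffer down if its peak exceeds the ceiling; never scale up."""
    ceiling = 10.0 ** (headroom_db / 20.0)  # -1 dB is a factor of about 0.891
    peak = float(np.max(np.abs(x))) if x.size else 0.0
    return x if peak <= ceiling else x * np.float32(ceiling / peak)

def soft_clip(x: np.ndarray, headroom_db: float = -1.0) -> np.ndarray:
    """tanh curve: near-linear for quiet samples, saturating smoothly at the ceiling."""
    ceiling = np.float32(10.0 ** (headroom_db / 20.0))
    return ceiling * np.tanh(x / ceiling)

def tpdf_dither(x: np.ndarray, bit_depth: int = 16, rng=None) -> np.ndarray:
    """Add triangular-PDF noise: the sum of two uniform draws of +/-0.5 LSB each."""
    if rng is None:
        rng = np.random.default_rng()
    lsb = 1.0 / (2 ** (bit_depth - 1))
    noise = rng.uniform(-0.5, 0.5, x.shape) + rng.uniform(-0.5, 0.5, x.shape)
    return (x + noise * lsb).astype(np.float32)
```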

## Output format

`timeline.render(path)` writes a single audio file and returns the resolved `Path`. The
format is inferred from the file extension: `.wav`, `.flac`, and `.ogg` are supported natively
via `soundfile`. For `.mp3` output, install the `[mp3]` extra.

There is no JSON sidecar, no metadata file, and no intermediate temp file. The output is
written in one `soundfile.write()` call after the entire mix buffer is assembled in memory.

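Inferring the format from the extension can be sketched with `pathlib` alone. `resolve_format` is a hypothetical helper, not the library's writer; the returned strings are the format and subtype names `soundfile` uses (`WAV`, `FLAC`, `OGG`, `PCM_16`, `PCM_24`, `VORBIS`). The `.mp3` path is left out because its implementation is not described here.

```python
from pathlib import Path

FORMATS = {".wav": "WAV", ".flac": "FLAC", ".ogg": "OGG"}
PCM_SUBTYPES = {16: "PCM_16", 24: "PCM_24"}

def resolve_format(path, bit_depth: int = 16) -> tuple[str, str]:
    """Map an output path and bit depth to a (format, subtype) pair."""
    ext = Path(path).suffix.lower()
    if ext not in FORMATS:
        raise ValueError(f"unsupported output extension: {ext!r}")
    if ext == ".ogg":
        return FORMATS[ext], "VORBIS"  # lossy codec, so bit depth does not apply
    if bit_depth not in PCM_SUBTYPES:
        raise ValueError(f"unsupported bit depth: {bit_depth}")
    return FORMATS[ext], PCM_SUBTYPES[bit_depth]

# The pair can then be passed on, for example:
#   fmt, subtype = resolve_format("episode_01.flac", 24)
#   soundfile.write("episode_01.flac", samples, 48000, format=fmt, subtype=subtype)
```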
---

## Testing without files

All clips can be built from numpy arrays. No disk access required.

```python
import numpy as np
from audio_arrange import Timeline, Clip

SR = 48000

# Synthesise 10 seconds of voice-like noise (seeded so test runs are reproducible)
rng = np.random.default_rng(0)
voice_samples = rng.standard_normal((SR * 10, 2)).astype(np.float32) * 0.3

# Synthesise a 440 Hz music bed
t = np.linspace(0, 10, SR * 10, endpoint=False)
music_samples = (np.sin(2 * np.pi * 440 * t)[:, None] * np.ones((1, 2))).astype(np.float32) * 0.2

voice_clip = Clip(voice_samples, sample_rate=SR)
music_clip = Clip(music_samples, sample_rate=SR)

tl = Timeline(sample_rate=SR, channels=2)
tl.add(voice_clip, track="voice", at=0.0)
tl.add(music_clip, track="music", at=0.0, gain_db=-6.0)
tl.duck(target="music", trigger="voice")

# Render to array — no file I/O at all
samples, sr = tl.render_to_array()
assert samples.shape == (SR * 10, 2)
assert sr == SR
```

---

## CLI

```bash
# Minimal voice + music mix with auto-ducking and a podcast loudness target
# (decoding bed.mp3 needs the [mp3] extra)
audio-arrange \
    --voice narration.wav \
    --music bed.mp3 \
    --duck \
    --target-lufs -16 \
    --output episode_01.wav

# Manifest-driven arrangement for complex multi-track projects
audio-arrange --manifest episode.toml --output episode_01.wav

# Manifest with loudness override at render time
audio-arrange --manifest episode.toml --target-lufs -14 --output youtube_cut.wav

# Inspect a manifest without rendering (dry run)
audio-arrange --manifest episode.toml --dry-run

# Render to FLAC at 24-bit
audio-arrange --manifest episode.toml --output episode_01.flac --bit-depth 24
```
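
The manifest schema is not shown in this README, so the following is only a sketch of how a TOML arrangement could be parsed into event records. Every key name (`sample_rate`, `events`, `clip`, `track`, `at`, `gain_db`) and the `load_events` helper are hypothetical; the `tomllib`/`tomli` switch mirrors the package's conditional `tomli` dependency.

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib  # backport used on Python 3.10

# Hypothetical manifest layout, for illustration only.
MANIFEST = """
sample_rate = 48000

[[events]]
clip = "narration.wav"
track = "voice"
at = 0.0

[[events]]
clip = "bed.flac"
track = "music"
at = 0.0
gain_db = -6.0
"""

def load_events(text: str) -> list[dict]:
    """Parse a manifest string into plain event dicts with defaults filled in."""
    data = tomllib.loads(text)
    return [
        {
            "clip": e["clip"],
            "track": e["track"],
            "at": float(e.get("at", 0.0)),
            "gain_db": float(e.get("gain_db", 0.0)),
        }
        for e in data.get("events", [])
    ]
```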

---

## Package structure

```
src/audio_arrange/
├── __init__.py             ← version + public re-exports (Timeline, Clip, RenderConfig)
├── timeline.py             ← Timeline class; orchestrates add/duck/crossfade/render
├── clip.py                 ← Clip class; lazy soundfile load, numpy mmap when possible
├── models.py               ← Pydantic v2: Track, Event, RenderConfig
├── config.py               ← RenderConfig alias (convenience import)
├── mixer.py                ← pure-numpy single-pass mix engine
├── manifest.py             ← TOML manifest parser → Timeline
├── cli.py                  ← argparse CLI entry point
├── utils.py                ← dB/linear conversion, frame helpers
├── effects/
│   ├── __init__.py         ← re-exports crossfade, duck, gain, pan
│   ├── gain.py             ← gain ramps and equal-power fade envelopes
│   ├── pan.py              ← equal-power stereo panning
│   ├── crossfade.py        ← equal-power and linear crossfade curves
│   └── duck.py             ← fallback RMS envelope-follower ducker
├── io/
│   ├── __init__.py         ← re-exports reader, writer, resample
│   ├── reader.py           ← soundfile-backed loader; resamples on read
│   ├── writer.py           ← soundfile-backed writer; dither + format selection
│   └── resample.py         ← scipy.signal.resample_poly wrapper
└── lufs/
    ├── __init__.py         ← re-exports normalize
    └── normalize.py        ← pyloudnorm delegation with clear ImportError guard
```
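
The dB/linear conversions and frame helpers that `utils.py` is described as holding are standard formulas. A minimal sketch follows; the function names and the `floor_db` parameter are assumptions, not the module's actual API.

```python
import math

def db_to_linear(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def linear_to_db(amplitude: float, floor_db: float = -120.0) -> float:
    """Convert a linear amplitude to decibels, clamped so silence stays finite."""
    if amplitude <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(amplitude))

def seconds_to_frames(seconds: float, sample_rate: int) -> int:
    """Round a time offset to the nearest whole frame index."""
    return int(round(seconds * sample_rate))
```

With these, the `gain_db=-6.0` used in the quick start corresponds to a linear factor of roughly 0.5, and `at=12.5` at 48 kHz lands on frame 600000.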

---

© Trollfabriken AITrix AB — MIT licensed
