Metadata-Version: 2.4
Name: eluate
Version: 1.0.0
Summary: Remove background music from videos - for accessibility and personal use
Author: borderedprominent
License-Expression: MIT
Project-URL: Homepage, https://github.com/borderedprominent/ELUATE
Project-URL: Repository, https://github.com/borderedprominent/ELUATE
Project-URL: Issues, https://github.com/borderedprominent/ELUATE/issues
Project-URL: Documentation, https://github.com/borderedprominent/ELUATE/blob/main/docs/api.md
Project-URL: Changelog, https://github.com/borderedprominent/ELUATE/blob/main/CHANGELOG.md
Keywords: video,audio,music-removal,documentary,ai,bandit,accessibility,audio-processing,source-separation
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: End Users/Desktop
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: rich>=13.0.0
Requires-Dist: torch>=2.1.0
Requires-Dist: torchaudio>=2.1.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: scipy>=1.11.0
Requires-Dist: librosa>=0.10.0
Requires-Dist: soundfile>=0.12.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: einops>=0.7.0
Requires-Dist: rotary-embedding-torch>=0.5.0
Requires-Dist: transformers>=4.35.0
Requires-Dist: huggingface-hub>=0.23.0
Requires-Dist: omegaconf>=2.2.0
Requires-Dist: ml-collections>=1.0.0
Requires-Dist: tqdm>=4.60.0
Requires-Dist: beartype>=0.14.0
Requires-Dist: spafe>=0.3.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: cpu
Requires-Dist: torch>=2.1.0; extra == "cpu"
Requires-Dist: torchaudio>=2.1.0; extra == "cpu"
Provides-Extra: train
Requires-Dist: wandb>=0.15.0; extra == "train"
Requires-Dist: auraloss>=0.4.0; extra == "train"
Requires-Dist: pytorch-lightning>=2.0.0; extra == "train"
Requires-Dist: accelerate>=0.25.0; extra == "train"
Requires-Dist: loralib>=0.1.0; extra == "train"
Dynamic: license-file

# ELUATE

As a Muslim, I've run into this friction many times: the
documentary I want to watch, the lecture I want to learn from, the
long-form video I want to understand almost always come with a
score running underneath. One day I sat down to watch a documentary;
the soundtrack sat on top of every scene, and I went looking for a
tool that would just take the video and hand me back the same video
with the music gone. Audio-only stem splitters, GUIs aimed at music
producers, cloud services that wanted my upload: none of them did
the simple thing.

ELUATE is the simple thing. You give it a video file and it gives
you back a video file: the music is gone, the dialogue and sound
effects stay, and the video stream is copied through untouched, so
the picture is bit-for-bit unchanged.

From the terminal:

```bash
eluate documentary.mp4
# → ~/Documents/ELUATE/documentary_eluted.mp4
```

From Python, the same engine behind a small stable API, so you can
wire ELUATE into something larger (a content pipeline, a batch job,
your own project):

```python
import eluate

eluate.elute("documentary.mp4")
```

Under the hood it's an AI source-separation model
([Bandit v2](https://arxiv.org/abs/2407.07275)) running locally on
your machine. It splits the audio into speech, music, and sfx,
discards the music stem, mixes the other two back together, and
remuxes the new audio into a copy of the video. The video stream is
never re-encoded.
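The drop-and-mix step is conceptually just per-sample arithmetic on
the separated stems. A minimal sketch in pure Python (the lists stand
in for decoded sample buffers; ELUATE's real code works on tensors
and this sketch omits clipping protection):

```python
def remix_without_music(speech, sfx):
    # Sum the kept stems sample-by-sample; the music stem is simply
    # never added back in. A real mixer would also guard against
    # clipping, which this sketch omits.
    return [s + f for s, f in zip(speech, sfx)]

# Hypothetical three-sample stems from the separator:
speech = [0.10, 0.20, 0.30]
music = [0.50, 0.50, 0.50]   # discarded
sfx = [0.00, 0.05, 0.00]

mixed = remix_without_music(speech, sfx)  # music never enters the mix
```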

I built it for Muslims who run into the same friction trying to
learn, research, or produce content. The CLI is for one-off cleanup;
the API is for building larger things on top. Other people have
related reasons: hearing loss where the mix fights the narration,
focus or auditory-processing conditions where the score becomes
another competing stream you have to filter out. If that's you too,
you're welcome here. The tool is generic; anyone can use it.

It's a one-maintainer project, and I'd rather you treat it as a
useful tool you can fork than as infrastructure. If a fellow Muslim
dev takes what I've started here and makes it better than I could,
that's honestly the outcome I'd be happiest with.

### Why this and not something else

Most OSS in this space comes from music production, where the goal is
to split a song into vocals, drums, bass, and other. That's the
four-stem split you'll find in Demucs, in UVR5, in
python-audio-separator. For cinematic content (documentaries,
lectures, sermons, podcasts) that split is the wrong shape. What you
actually want is speech, music, and sound effects. ELUATE is built
around that three-stem split because that's what fits the use case.

The model behind it is [Bandit v2](https://arxiv.org/abs/2407.07275),
a research-grade separator from ICASSP 2025. It isn't shipped by
default in UVR5, python-audio-separator, or Demucs, so as far as I
know this is the only place you'll find it pre-wired into a
video-in/video-out tool. ELUATE fetches the checkpoint and the
config for you on first run; you don't have to chase them down.

The other thing I cared about was long files. Most separation tools
load the whole audio track into RAM, which is fine for a song and
rough for a feature-length documentary. ELUATE uses a fixed-size
ring buffer instead, processing the audio in a moving window without
ever holding the full track in memory. On an 84-minute documentary
on my M4 with 24 GB, that meant 19 GB of peak memory for ELUATE
versus 41 GB for the upstream reference, which only finished because
macOS swapped about 20 GB to disk. About 2.15× less in practice.
Numbers and methodology in [`docs/bench/`](https://github.com/borderedprominent/ELUATE/blob/main/docs/bench/).

### What ELUATE isn't

Before you install it, a few things it doesn't do, so you don't find
out the wrong way.

It's terminal only. There's no GUI and no drag-and-drop installer. If
that's a dealbreaker, [UVR5](https://github.com/Anjok07/ultimatevocalremovergui)
is probably what you want for audio, or one of the cloud tools for
one-shot video.

It isn't faster than [Demucs](https://github.com/facebookresearch/demucs).
I optimised the memory path for long files, not raw throughput, and
you'll feel that on short inputs where Demucs finishes first.

It doesn't take YouTube URLs, doesn't process anything in the cloud,
and doesn't parallelise across machines. One local file at a time,
or a folder processed sequentially.

It also doesn't give you the stems. ELUATE's job is to drop the
music and hand back a video; if you want an isolated-stem export,
that's a different tool.

It isn't easier to install than a signed binary. You'll need a
terminal, `brew`, and `git`. If that's a barrier, the
[alternatives](#alternatives) section at the bottom lists tools
that are.

## Install

macOS on Apple Silicon is the only platform I actually develop and
benchmark on. CUDA and CPU paths exist in the code but I haven't
tested them; treat them as experimental. If you're on Linux or
Windows, the [alternatives](#alternatives) section at the bottom
of this README will get you there faster than waiting for me.

The first run downloads a ~450 MB model checkpoint from
[Zenodo](https://zenodo.org/records/12701995) into `~/.eluate/`.
It only happens once.

The quickest path on macOS:

```bash
git clone --recursive https://github.com/borderedprominent/ELUATE.git
cd ELUATE
./scripts/install.sh
```

This creates a venv at `~/.eluate/venv`, installs ELUATE in editable
mode, downloads the default model, and (by default) appends
`export PATH="$HOME/.eluate:$PATH"` to your `~/.zshrc` or `~/.bashrc`
so you can run `eluate` from any terminal.

If you'd rather manage your own PATH:

```bash
./scripts/install.sh --no-rc-edit
```

Open a new terminal after installing (so the PATH change takes effect)
and run:

```bash
eluate info   # verify device, FFmpeg, and model are ready
```

### Manual install

For contributors or anyone who doesn't want a shell installer:

```bash
git clone --recursive https://github.com/borderedprominent/ELUATE.git
cd ELUATE
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
brew install ffmpeg       # if not already installed
eluate --version
```

The `--recursive` flag is required: ELUATE's separator calls into a
pinned copy of
[ZFTurbo's Music-Source-Separation-Training framework](https://github.com/ZFTurbo/Music-Source-Separation-Training)
as a git submodule at `vendor/mss-training/`.

## Usage

```bash
eluate                                # interactive mode (prompts for file)
eluate video.mp4                      # single file → ~/Documents/ELUATE/
eluate video.mp4 -o custom.mp4        # custom output path
eluate --checkpoint eng video.mp4     # use the English-optimised model
eluate --batch files.txt              # process paths from a list file
eluate --folder /path/to/videos       # process every video in a folder
eluate --folder ./videos --batch-size 10   # folder in batches of 10
eluate info                           # system / model status
eluate video.mp4 --device cpu         # force CPU, skip MPS
eluate video.mp4 --force              # skip duration / disk-space checks
```

Supported input containers: `mp4`, `mkv`, `avi`, `mov`, `webm`, `flv`,
`wmv`, `m4v`.

### Model

ELUATE ships a single model: **Bandit v2**, CC-BY-SA 4.0, 48 kHz.
CC-BY-SA permits commercial use under share-alike terms; see
[Licensing](#licensing) for the caveats before shipping anything
commercial.

Bandit v2 ships per-language checkpoints (`multi`, `eng`, `deu`, `fra`,
`spa`, `cmn`, `fao`). The default is `multi`. Swap with
`--checkpoint eng` etc.

### Python API

For batch loops or wiring ELUATE into your own code, instantiate a
`Session` so the model loads once and is reused:

```python
import eluate

with eluate.Session() as session:
    for path in ["a.mp4", "b.mp4", "c.mp4"]:
        session.elute(path)
```

`eluate.elute()`, `eluate.Session`, `eluate.Result`, and the typed
exception hierarchy under `eluate.EluateError` form the entire v1.x
public surface; semver applies to those names only. Full reference,
including progress callbacks and stem-only outputs, in
[`docs/api.md`](https://github.com/borderedprominent/ELUATE/blob/main/docs/api.md).

## How it works

```
 input.mp4 ─┬─▶ ffmpeg audio extract ─▶ 48 kHz WAV
            │
            │                            Bandit v2 (streaming demix)
            │                                  │
            │                    ┌─────────────┼─────────────┐
            │                    ▼             ▼             ▼
            │                  speech        music          sfx
            │                    │           drop            │
            │                    └────────── mix ────────────┘
            │                                │
            └─▶ ffmpeg mux  ◀─────  new audio track (speech + sfx)
                   │
                   ▼
           output_eluted.mp4   (video stream copied as-is)
```

The separator uses a **streaming demix** path: a fixed-size ring buffer
and virtual padded-chunk construction so the full audio track is never
held in memory. On an 84-minute documentary this uses ~2.15× less peak
memory than the vendor's batched `demix()`, which swaps 20 GB to disk
on a 24 GB Mac. See [`docs/bench/`](https://github.com/borderedprominent/ELUATE/blob/main/docs/bench/) for the plain-language
summary, [`memory-benchmark.md`](https://github.com/borderedprominent/ELUATE/blob/main/docs/bench/memory-benchmark.md) for the
full methodology, and notes on a windowing bug the benchmarking
uncovered and fixed.
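The windowing idea can be sketched in a few lines. This is an
illustration of the technique, not ELUATE's actual implementation
(`stream_chunks` and the list-based reader are hypothetical; the
real code works on tensors and a true fixed-size ring buffer):

```python
def stream_chunks(read, chunk, hop):
    """Yield overlapping fixed-size windows from a streaming reader
    without ever holding the full track in memory."""
    buf = []
    while True:
        block = read(chunk - len(buf))
        if not block and not buf:
            return
        buf.extend(block)
        if len(buf) < chunk:
            # Virtual padding: zero-fill the final partial window.
            yield buf + [0.0] * (chunk - len(buf))
            return
        yield list(buf)
        buf = buf[hop:]  # keep only the overlap for the next window

def make_reader(samples):
    # Stand-in for a streaming audio decoder: returns up to n samples
    # per call, then empty lists once the source is exhausted.
    it = iter(samples)
    return lambda n: [x for _, x in zip(range(n), it)]

windows = list(stream_chunks(make_reader(range(10)), chunk=4, hop=2))
# Each window overlaps the previous one by chunk - hop samples.
```

Peak memory is bounded by `chunk` regardless of how long the input
is, which is the whole point on a feature-length track.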

## Configuration and data

ELUATE writes data to two places, both under your home directory:

| Path | What lives there |
|---|---|
| `~/.eluate/models/` | Downloaded model checkpoints (~450 MB each), configs |
| `~/.eluate/venv/` | Python virtual env (created by `install.sh`) |
| `~/.eluate/telemetry.jsonl` | Local debug log (only if you enable it) |
| `~/Documents/ELUATE/` | Processed output videos |

Nothing is sent over the network after the initial model download.

### Local debug log (off by default)

ELUATE can write a local JSONL log of processing stages (wall time,
peak memory, MPS allocation) to help debug performance issues. **The
log never leaves your machine.** It's a plain file you can read,
delete, or ignore.

Telemetry is **off by default.** Enable it when you want a paper trail:

```bash
ELUATE_TELEMETRY=1 eluate video.mp4
```

The log contains wall time, peak RSS, and MPS memory at each stage
boundary; nothing that identifies you or your files. Delete it any
time.
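Because the log is plain JSONL, it's easy to inspect yourself. A
sketch of summing per-stage wall time (the field names `"stage"` and
`"wall_s"` are illustrative assumptions, not ELUATE's documented
schema; check your own `~/.eluate/telemetry.jsonl`):

```python
import json

def stage_wall_times(jsonl_lines):
    # Accumulate wall time per processing stage. Field names here
    # ("stage", "wall_s") are hypothetical; inspect the real log
    # for the actual keys before relying on this.
    totals = {}
    for line in jsonl_lines:
        rec = json.loads(line)
        totals[rec["stage"]] = totals.get(rec["stage"], 0.0) + rec["wall_s"]
    return totals

log = [
    '{"stage": "extract", "wall_s": 4.2}',
    '{"stage": "separate", "wall_s": 310.5}',
    '{"stage": "separate", "wall_s": 295.1}',
    '{"stage": "mux", "wall_s": 3.0}',
]
```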

## Security

The Bandit v2 checkpoint format is a Python pickle. `torch.load` has to
be called with `weights_only=False` to deserialize it, which means
loading a checkpoint **executes arbitrary pickled code from that file**.
This is a known caveat of running PyTorch checkpoints from any source.

ELUATE mitigates the risk in two ways:

1. **Pinned source.** Model downloads only come from the specific Zenodo
   record encoded in [`eluate/utils/paths.py`](eluate/utils/paths.py)
   (`12701995` for Bandit v2). The URL is not user-configurable.
2. **SHA256 verification** against `CHECKPOINT_SHA256` in the same file.
   A mismatched download is rejected before it's loaded.
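The digest check itself is straightforward. A sketch of the idea
(the function name is illustrative; ELUATE's real helper lives in
`eluate/utils/paths.py` and may differ in detail):

```python
import hashlib

def digest_matches(path, expected_sha256):
    # Hash the file in 1 MiB blocks so a ~450 MB checkpoint never
    # has to be read into memory at once.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest() == expected_sha256
```

A mismatch means the file is discarded before `torch.load` ever
sees it.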

The default Bandit v2 `multi` checkpoint has a verified digest recorded.
The other language variants (`eng`, `deu`, `fra`, `spa`, `cmn`, `fao`)
currently download without integrity checks; the CLI prints a clear
warning when this is the case. Digests will be populated after a
known-good download and review.

If you don't trust this chain, don't run ELUATE. The threat model here
is "Zenodo or the install pipeline is compromised"; against that, only
a signed release helps, and there isn't one yet.

## Development

```bash
pip install -e .[dev]
pytest
ruff check .
```

The test suite has a 35 % coverage floor (enforced in CI on macOS across
Python 3.10 / 3.11 / 3.12). Most tests are offline and fast. Tests that
exercise the actual model (numerical parity against the real checkpoint)
are gated behind `ELUATE_RUN_MODEL_TESTS=1` so CI doesn't try to
download 450 MB of weights.

```bash
ELUATE_RUN_MODEL_TESTS=1 pytest tests/test_core_separator_streaming.py
```

An end-to-end test that drives `EluatePipeline.process()` on a tiny
synthetic video fixture is at `tests/test_pipeline_e2e.py` and runs
without model weights (it stubs the separator and uses real FFmpeg).

## Licensing

ELUATE is a personal tool, not an enterprise product. Read this
section before using it for anything beyond personal viewing.

- **ELUATE's own code**: MIT (see [`LICENSE`](LICENSE)).
- **Bandit v2 model weights**: CC-BY-SA 4.0, from
  [Zenodo 12701995](https://zenodo.org/records/12701995). CC-BY-SA
  permits commercial use under **share-alike** terms: any derivative
  work you distribute under these weights must itself be licensed
  CC-BY-SA. That clause is operationally hostile to a lot of commercial
  software (it arguably extends to downstream derivative works).
  **Consult a lawyer before shipping a commercial product built on
  ELUATE.** The README can't and doesn't give legal advice.
- **Vendored separator framework** at `vendor/mss-training/`: ZFTurbo's
  [Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training),
  MIT-licensed.

## Project risks

ELUATE carries real single-points-of-failure you should know about
before building anything on top of it:

- **Upstream model maintenance.** The
  [Bandit v2 research repo](https://github.com/kwatcharasupat/bandit-v2)
  has a small number of commits and no ongoing release cadence. If the
  author moves on, the model itself won't get fixes.
- **Vendored separator framework.** ELUATE calls into
  [ZFTurbo/Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training)
  via a git submodule pinned to a specific SHA. Upstream inference-API
  changes would require ELUATE to track or fork.
- **Forked model config.** `eluate/configs/bandit_v2.yaml` ships an
  explicit copy of the Bandit v2 architecture hyperparameters (band
  count, RNN dim, etc.). If upstream releases a Bandit v2.1 with a
  different architecture, ELUATE's config will be silently incompatible
  with the new checkpoint; you'll get a state-dict shape mismatch, not
  a friendly error. Checkpoint downloads are pinned to the Zenodo record
  currently compatible with this config, so existing installs keep
  working; only voluntarily pointing at a new upstream release would
  trigger this.
- **One-maintainer project.** Treat it as a useful personal tool that
  you can fork if it stops being maintained, not as infrastructure.
- **Niche model.** Bandit v2 isn't shipped by default in UVR5,
  `python-audio-separator`, or Demucs, so there's no adjacent
  community to rely on for ecosystem-level fixes.

The MIT license and local-only design mean you can always fork and
keep running what works for you. That's intentional.

## Alternatives

If ELUATE isn't the right fit, use the right tool instead:

- **[asaah18/video-music-remover](https://github.com/asaah18/video-music-remover)**:
  closest direct competitor. Video in, video out, CLI, uses
  [Demucs](https://github.com/facebookresearch/demucs). Pick it if you
  specifically want Demucs or the 4-stem music-production split.
- **[UVR5 (Ultimate Vocal Remover GUI)](https://github.com/Anjok07/ultimatevocalremovergui)**:
  the de facto GUI. Audio-only (you'd still need to remux with FFmpeg).
  Signed Win/Mac/Linux bundles, huge community. Best choice for
  non-technical users on a desktop.
- **[python-audio-separator](https://github.com/nomadkaraoke/python-audio-separator)**:
  maintained CLI/library successor for the UVR model collection
  (MDX-Net, Demucs, BS-RoFormer, etc.). Power-user-friendly.
- **[Demucs](https://github.com/facebookresearch/demucs)**: the
  research-grade reference for music source separation. Parent repo is
  archived; active development is on the
  [adefossez fork](https://github.com/adefossez/demucs).
- **Cloud services** (MVSEP, Lalal.ai, vocalremover.org, etc.):
  if you'd rather not install anything and are comfortable uploading
  your audio. Usually the fastest path for a non-technical user.

ELUATE is specifically for the case where you want **video-in /
video-out / local / long-form / cinematic 3-stem**. If your needs are
different, one of the above is probably better.

## Contributing

Open an issue or PR at
<https://github.com/borderedprominent/ELUATE/issues>.

For bug reports, running with the local debug log enabled helps a lot:

```bash
ELUATE_TELEMETRY=1 eluate your-video.mp4
# then attach ~/.eluate/telemetry.jsonl (it never leaves your machine until you share it)
```

## Acknowledgements

- [Karn N. Watcharasupat, Chih-Wei Wu, and Iroro Orife](https://arxiv.org/abs/2407.07275)
  for the Bandit v2 architecture and the Divide-and-Remaster v3 dataset.
- [ZFTurbo](https://github.com/ZFTurbo) for the
  Music-Source-Separation-Training framework that ELUATE's separator is
  built on.
