Metadata-Version: 2.4
Name: will0w-musicdl
Version: 0.2.0
Summary: CLI music downloader with playlist provider support and source fallbacks
Author: musicdl contributors
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: yt-dlp>=2025.2.19
Requires-Dist: requests>=2.32.3
Requires-Dist: mutagen>=1.47.0

# music-downloader

`musicdl` is a CLI for resolving playlist tracks and downloading matching audio without provider-specific API keys. It is built around three ideas: resolve playlists from multiple input types, score multiple public search results per song, and keep on-disk state consistent across repeated runs.

## What It Does

- Resolves playlists from Spotify URLs, generic supported playlist URLs, or local `.txt` files.
- Downloads a single song by artist and title without needing a playlist.
- Downloads one `.mp3` per discovered song into a chosen output folder.
- Skips songs already present in the output folder or already recorded in logs.
- Persists download, failure, and removal state across runs.
- Rebuilds log state from the real contents of the output directory on every run.

## How It Works

The CLI flow is:

1. Parse CLI arguments and create the output directory.
2. Sync log files with the current `.mp3` files on disk.
3. Resolve the input into a normalized `Playlist` of `Song` records.
4. For each song, try multiple public search backends until one candidate is accepted.
5. Download the chosen source with `yt-dlp`, rename the file, and update logs.

For a deeper code-level overview, see `docs/architecture.md`.

## Requirements

- Python 3.10+
- `yt-dlp` available in `PATH`
- Network access to the source platform being resolved

## Installation

Install from PyPI:

```bash
pip install will0w-musicdl
```

Or install in editable mode for development:

```bash
pip install -e .
```

This exposes the `musicdl` command defined in `pyproject.toml`.

## CLI Usage

Basic usage:

```bash
musicdl [url-or-path]
```

Typical examples:

```bash
musicdl "https://open.spotify.com/playlist/..." -o ~/Music/MyPlaylist
musicdl "https://music.youtube.com/playlist?..." -o ./downloads
musicdl ./my-songs.txt -o ./downloads
```

Download a single song by artist and title:

```bash
musicdl --artist 'Daft Punk' --title 'One More Time' -o ~/Music
musicdl --artist 'Daft Punk' --title 'One More Time' --album 'Discovery' -o ~/Music
```

Download a single song from a direct URL:

```bash
musicdl 'https://www.youtube.com/watch?v=FGBhQbmPwH8' -o ~/Music
musicdl 'https://soundcloud.com/artist/track' -o ~/Music
```

Provide metadata when downloading from a URL (overrides provider metadata for the supplied fields):

```bash
musicdl 'https://www.youtube.com/watch?v=FGBhQbmPwH8' --artist 'Daft Punk' --title 'One More Time' --album 'Discovery' -o ~/Music
```

Interactively search for a song and pick which result to download:

```bash
musicdl --search 10 --artist 'Daft Punk' --title 'One More Time' -o ~/Music
musicdl --search 5 --title 'feminine urge'
musicdl --search 8 --artist 'Radiohead'
```

This displays a numbered table of candidates from YouTube Music, YouTube, and SoundCloud. Enter a number to download that result, or `q` to quit.

If you rerun a playlist against an existing output folder, matching `.mp3` files are
not redownloaded. Instead, `musicdl` uses the resolved playlist song metadata to
backfill only missing tags such as album, track number, release date, ISRC,
artwork, and genres.

Retry only previously failed songs for an output folder:

```bash
musicdl --download-failed -o ./downloads
```

Export metadata for all downloaded songs in a folder:

```bash
musicdl --metadata-folder ./downloads
musicdl --metadata-folder ./downloads --metadata-output ./downloads/all-metadata.json
```

Write artist/title tags into existing MP3 files in a folder:

```bash
musicdl --tag-folder ./downloads
musicdl --tag-folder ./downloads --tag-metadata ./downloads/metadata.json
```

If `--tag-metadata` is omitted, `musicdl` will automatically try `metadata.json`
and then `downloaded.json` inside the target folder. If neither exists, it will
scan existing tags/filenames and then attempt online enrichment by artist/title
to fill album, release date, artwork, source URL, and genres when a strong
catalog match is found.
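
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `find_tag_metadata` and its parameters are hypothetical names, and the final fallback (scanning tags and filenames, then online enrichment) is left to the caller.

```python
from pathlib import Path
from typing import Optional


def find_tag_metadata(folder: Path, explicit: Optional[Path] = None) -> Optional[Path]:
    """Pick the metadata JSON to use for tagging (hypothetical helper)."""
    # An explicitly supplied --tag-metadata path always wins.
    if explicit is not None:
        return explicit
    # Otherwise try the two well-known file names, in the documented order.
    for name in ("metadata.json", "downloaded.json"):
        candidate = folder / name
        if candidate.is_file():
            return candidate
    # Nothing found: the caller falls back to scanning existing tags/filenames.
    return None
```

Returning `None` instead of raising keeps the "no metadata file" case a normal code path, which matches the documented fallback to tag and filename scanning.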

Process a limited number of songs while testing matcher changes:

```bash
musicdl "https://open.spotify.com/playlist/..." --max-songs 10 -o ./downloads
```

### Arguments

- `playlist`: Playlist URL, single song URL, or path to a `.txt` playlist file.
- `-o`, `--output`: Destination folder for downloaded files and log files. Defaults to the current directory.
- `--max-songs`: Optional cap for the current run. `0` means no cap.
- `--download-failed`: Retry only songs currently listed in `failed.json` for the chosen output folder.
- `--metadata-folder`: Scan all `.mp3` files in a folder and export metadata as JSON.
- `--metadata-output`: Optional output path for metadata JSON. Defaults to `metadata.json` inside `--metadata-folder`.
- `--tag-folder`: Write ID3 artist/title tags for all `.mp3` files in a folder.
- `--tag-metadata`: Optional metadata JSON input used by `--tag-folder` to also apply album/track fields.
- `--artist`: Artist name for single-song download, or metadata override when used with a URL. Requires `--title` when used without a URL.
- `--title`: Track title for single-song download, or metadata override when used with a URL. Requires `--artist` when used without a URL.
- `--album`: Optional album name. Can be used with `--artist`/`--title` or with a URL to set the album field.
- `--search N`: Search for N candidates and interactively pick one to download. Requires at least `--artist` or `--title`. Cannot be combined with a playlist or `--download-failed`.
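
The combination rules listed above could be checked along these lines. This is only a sketch under assumptions: it covers a subset of the flags, `build_parser` and `validate` are hypothetical names, and the error messages are invented for illustration rather than copied from `musicdl`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Only the flags needed to illustrate the rules; the real CLI has more.
    parser = argparse.ArgumentParser(prog="musicdl")
    parser.add_argument("playlist", nargs="?")
    parser.add_argument("--artist")
    parser.add_argument("--title")
    parser.add_argument("--search", type=int, metavar="N")
    parser.add_argument("--download-failed", action="store_true")
    return parser


def validate(args: argparse.Namespace) -> list:
    """Return a list of human-readable rule violations (empty if valid)."""
    errors = []
    if args.search is not None:
        # --search needs something to search for.
        if not (args.artist or args.title):
            errors.append("--search requires --artist or --title")
        # --search is a standalone mode.
        if args.playlist or args.download_failed:
            errors.append("--search cannot be combined with a playlist or --download-failed")
    elif not args.playlist and bool(args.artist) != bool(args.title):
        # Without a URL, artist and title are only meaningful together.
        errors.append("--artist and --title must be given together without a URL")
    return errors
```

For example, `validate(build_parser().parse_args(["--search", "5"]))` reports one violation, while adding `--title` to the same call reports none.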

## Supported Inputs

### Single song by artist and title

A single song can be downloaded by specifying `--artist` and `--title`. The optional `--album` flag attaches album metadata to the resulting file. This bypasses playlist resolution entirely and feeds the song straight into the download engine.

### Single song URL

A direct URL to a song (e.g. a YouTube video or SoundCloud track) can be passed as the positional argument. The URL is resolved through the same provider pipeline as playlists. When the provider returns a single track, it is downloaded directly from that URL rather than searched for. The `--artist`, `--title`, and `--album` flags can be combined with a URL to override the metadata extracted by the provider, which is useful when the video title or uploader name doesn't match the actual song.

### Spotify playlists

Spotify playlists are resolved through a native provider that uses Spotify's public web surfaces. The implementation paginates through results, so playlists longer than 100 tracks are not truncated.

### Generic URLs

Other URLs (playlists or single tracks) are resolved through `yt-dlp --flat-playlist --dump-single-json`. This covers platforms that `yt-dlp` already knows how to inspect. When the URL points to a single track, the metadata is extracted from the top-level response and the song is downloaded directly from the source URL.
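
As a rough sketch of what this resolution step might look like, the snippet below runs the same `yt-dlp` command and flattens the JSON into a track list. The helper names (`build_command`, `extract_tracks`, `resolve`) are hypothetical, and the handling of the `_type`, `entries`, `webpage_url`, and `url` fields reflects typical `yt-dlp` output rather than the project's actual provider code.

```python
import json
import subprocess


def build_command(url: str) -> list:
    # Same flags as described above; assumes yt-dlp is on PATH.
    return ["yt-dlp", "--flat-playlist", "--dump-single-json", url]


def extract_tracks(info: dict) -> list:
    """Flatten yt-dlp JSON into a list of {"title", "url"} dicts."""
    if info.get("_type") == "playlist":
        entries = info.get("entries") or []
    else:
        # Single track: the metadata lives in the top-level object.
        entries = [info]
    tracks = []
    for entry in entries:
        if not entry:  # unavailable items can appear as null entries
            continue
        tracks.append({
            "title": entry.get("title"),
            "url": entry.get("webpage_url") or entry.get("url"),
        })
    return tracks


def resolve(url: str) -> list:
    # Raises CalledProcessError if yt-dlp exits with a non-zero status.
    result = subprocess.run(
        build_command(url), capture_output=True, text=True, check=True
    )
    return extract_tracks(json.loads(result.stdout))
```

Keeping `extract_tracks` separate from the subprocess call makes the JSON handling testable without network access.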

### Text playlists

The `.txt` provider accepts one song per line in `Artist - Title` format:

```txt
Artist One - Song One
Artist Two - Song Two
# comments are ignored
```

Blank lines and comment lines are ignored.
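
A minimal parser for this format might look like the sketch below. The `Song` dataclass here is a simplified stand-in for the project's own record type, `parse_txt_playlist` is a hypothetical name, and silently skipping malformed lines is an assumption rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    # Simplified stand-in for the project's Song record.
    artist: str
    title: str


def parse_txt_playlist(text: str) -> list:
    """Parse 'Artist - Title' lines, ignoring blanks and '#' comments."""
    songs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first " - " only, so titles may contain dashes.
        artist, sep, title = line.partition(" - ")
        if not sep or not artist.strip() or not title.strip():
            continue  # assumption: malformed lines are skipped
        songs.append(Song(artist.strip(), title.strip()))
    return songs
```

Splitting on the first `" - "` means a line such as `Artist - Song - Live` is read as artist `Artist` and title `Song - Live`.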

## Download and Matching Strategy

After a playlist is resolved, each song is searched across multiple backends in this order:

1. YouTube Music-style query (`ytsearch5` with a `topic` suffix)
2. Standard YouTube search (`ytsearch5`)
3. SoundCloud search (`scsearch3`)

Candidate scoring prefers track-like uploads and rejects obviously wrong variants. Cases it guards against include:

- club mix, acoustic, remaster, remix, karaoke, nightcore, and similar variants
- preview URLs and podcast-like results
- music videos when a cleaner topic/audio source is available
- wrong franchise/theme matches for generic titles such as `Theme (From ...)`

The engine stops on the first backend that yields an accepted download.
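
The ordered fallback can be illustrated with the sketch below. `build_queries`, `first_accepted`, and the `try_backend` callback are hypothetical names; the `ytsearch5:` and `scsearch3:` prefixes are `yt-dlp` search prefixes, but the exact query text, including where the `topic` suffix goes, is an assumption.

```python
from typing import Callable, Optional


def build_queries(artist: str, title: str) -> list:
    """Build search queries in the documented backend order."""
    base = f"{artist} {title}"
    return [
        f"ytsearch5:{base} topic",  # YouTube Music-style query (assumed form)
        f"ytsearch5:{base}",        # standard YouTube search
        f"scsearch3:{base}",        # SoundCloud search
    ]


def first_accepted(
    queries: list, try_backend: Callable[[str], Optional[str]]
) -> Optional[str]:
    """Return the first accepted result, or None if every backend fails."""
    for query in queries:
        result = try_backend(query)
        if result is not None:
            return result  # stop at the first backend that succeeds
    return None
```

Here `try_backend` stands for "search, score, and download"; it returns `None` when no candidate from that backend is accepted, which lets the loop move on to the next one.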

Detailed matcher regression history lives in `docs/song-regression-playbook.md`.

## Output Files and Persistent State

Each output directory contains the audio files plus JSON state files:

- `downloaded.json`: songs currently known as downloaded
- `failed.json`: songs that failed all attempted sources
- `removed.json`: songs that were once downloaded but no longer exist on disk

### State behavior

- If a song downloads successfully, it is upserted into `downloaded.json`.
- Both `downloaded.json` and `failed.json` persist the song source URL when the provider exposes it.
- Successful downloads write ID3 tags from provider metadata (`artist`, `title`, `album`, `tracknumber`, `date`, `genre`, `ISRC`, and cover art when available).
- If you rerun the same playlist into the same folder, songs that are already present are matched from disk and have only their missing tags backfilled from the playlist metadata.
- If a song fails all search/download attempts, it is recorded in `failed.json`.
- If a file exists in the output folder but is missing from the logs, it is reconstructed into `downloaded.json` when the filename matches `Artist - Title.mp3`.
- If a file was previously tracked in `downloaded.json` but no longer exists on disk, it is moved into `removed.json`.

This design keeps the output folder as the source of truth rather than trusting stale logs.
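
The reconciliation rules above can be sketched as a pure function over file names and logged songs. This is illustrative only: `parse_filename` and `reconcile` are hypothetical names, songs are reduced to `(artist, title)` tuples, and dropping a reappearing song from the removed set is an assumption not stated above.

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_filename(name: str) -> Optional[Tuple[str, str]]:
    """Map 'Artist - Title.mp3' to (artist, title); None if it doesn't fit."""
    path = Path(name)
    if path.suffix.lower() != ".mp3":
        return None
    artist, sep, title = path.stem.partition(" - ")
    if not sep or not artist.strip() or not title.strip():
        return None
    return artist.strip(), title.strip()


def reconcile(disk_files: list, downloaded: set, removed: set) -> Tuple[set, set]:
    """Rebuild (downloaded, removed) using the folder as the source of truth."""
    present = set()
    for name in disk_files:
        song = parse_filename(name)
        if song is not None:
            present.add(song)
    # Tracked songs that vanished from disk move to the removed set.
    new_removed = removed | (downloaded - present)
    # Assumption: a song that is back on disk is no longer "removed".
    new_removed -= present
    return present, new_removed
```

Files that do not match the naming convention are simply ignored here, which mirrors the conservative filename-based reconstruction described above.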

## Operational Notes

- No platform API keys are required.
- Private, missing, geo-restricted, or paywalled playlists are surfaced as explicit errors when detectable.
- Actual audio correctness still depends on public search quality and available metadata.
- Manually added files are only mapped automatically when filenames follow `Artist - Title.mp3`.

## Cron Automation

This repo includes `music-cron.sh`, a project-local shell script that activates the virtual environment and runs several curated playlist sync jobs. It is intentionally simple and relies on absolute paths so it can be called from `cron` without inheriting a shell session.

If you adapt it for another machine, update:

- the repository path
- the virtual environment path
- the playlist URLs
- the output directories

## Project Layout

```text
src/musicdl/
	cli.py                  CLI entrypoint and orchestration
	fs.py                   state file loading/saving/synchronization
	types.py                core dataclasses and normalization helpers
	downloader/engine.py    song search, candidate scoring, downloads
	providers/              playlist input resolvers
tests/
	test_engine_matching.py matcher regression coverage
	test_fs_logs.py         output-state and log synchronization tests
	test_txt_provider.py    text playlist parsing tests
docs/
	architecture.md         component and data-flow overview
	song-regression-playbook.md
```

## Development

Run the full test suite:

```bash
pytest
```

Recommended workflow for matcher changes:

1. Read `docs/song-regression-playbook.md`.
2. Update or add focused matcher tests.
3. Run `pytest`.
4. Manually sanity-check one or two representative songs.

## Known Limitations

- Filename-based reconstruction is intentionally conservative and depends on a stable naming convention.
- The downloader produces `.mp3` output and does not currently expose format configuration through the CLI.
- Provider coverage for non-Spotify URLs depends entirely on `yt-dlp` extractor support.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for a full list of changes in each version.
