Metadata-Version: 2.4
Name: flac-detective
Version: 0.16.1
Summary: Advanced FLAC authenticity analyzer - Detects MP3-to-FLAC transcodes with high precision
Author-email: Guillain d'Erceville <guillain@poulpe.us>
License-Expression: MIT
Project-URL: Homepage, https://github.com/Guillain-RDCDE/FLAC_Detective
Project-URL: Repository, https://github.com/Guillain-RDCDE/FLAC_Detective
Project-URL: Documentation, https://github.com/Guillain-RDCDE/FLAC_Detective/tree/main/docs
Project-URL: Changelog, https://github.com/Guillain-RDCDE/FLAC_Detective/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/Guillain-RDCDE/FLAC_Detective/issues
Keywords: flac,audio,analysis,transcode,detection,mp3,quality,authenticity
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: mutagen>=1.45.0
Requires-Dist: soundfile>=0.10.0
Requires-Dist: rich>=13.0.0
Provides-Extra: ml
Requires-Dist: torch>=2.0; extra == "ml"
Requires-Dist: librosa>=0.10; extra == "ml"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: flake8-docstrings>=1.7.0; extra == "dev"
Requires-Dist: flake8-bugbear>=23.0.0; extra == "dev"
Requires-Dist: flake8-comprehensions>=3.14.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pylint>=2.17.0; extra == "dev"
Requires-Dist: pre-commit>=3.5.0; extra == "dev"
Requires-Dist: bandit>=1.7.0; extra == "dev"
Requires-Dist: interrogate>=1.5.0; extra == "dev"
Requires-Dist: commitizen>=3.0.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=7.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=2.0.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=1.25.0; extra == "docs"
Requires-Dist: myst-parser>=2.0.0; extra == "docs"
Dynamic: license-file

# 🎵 FLAC Detective

![FLAC Detective Banner](https://raw.githubusercontent.com/Guillain-RDCDE/FLAC_Detective/main/assets/flac_detective_banner.png)

[![Python Version](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/downloads/)
[![PyPI version](https://img.shields.io/pypi/v/flac-detective)](https://pypi.org/project/flac-detective/)
[![PyPI Downloads](https://img.shields.io/pypi/dm/flac-detective)](https://pypi.org/project/flac-detective/)
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![CI](https://github.com/Guillain-RDCDE/FLAC_Detective/actions/workflows/ci.yml/badge.svg)](https://github.com/Guillain-RDCDE/FLAC_Detective/actions/workflows/ci.yml)
[![Status](https://img.shields.io/badge/status-active--development-brightgreen)](https://github.com/Guillain-RDCDE/FLAC_Detective)
[![codecov](https://codecov.io/gh/Guillain-RDCDE/FLAC_Detective/branch/main/graph/badge.svg)](https://codecov.io/gh/Guillain-RDCDE/FLAC_Detective)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)

**Advanced FLAC Authenticity Analyzer for Detecting MP3-to-FLAC Transcodes**

FLAC Detective is a professional-grade command-line tool that analyzes FLAC audio files to detect MP3-to-FLAC transcodes with high precision. Using spectral analysis, an 11-rule scoring system and an optional CNN classifier, it helps you keep your lossless music collection genuinely lossless.

---

## 🔍 How it works

Transcode an MP3 back to FLAC and the file is lossless *as a container*, but the
audio has already been through a lossy codec, and that leaves fingerprints. The clearest
is a **spectral cliff**: MP3 encoders typically discard everything above a
bitrate-dependent frequency (~16 kHz at 128 kbps, ~20 kHz at 320 kbps), so the spectrum
drops off abruptly where a real recording keeps going.
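
To make the spectral cliff concrete, here is a minimal NumPy sketch that estimates where a signal's spectrum falls away. It is an illustration only, not FLAC Detective's actual rule: the function name `estimate_cutoff_hz`, the 40 dB drop threshold and the 64-bin smoothing window are all assumptions chosen for the demo, and white noise stands in for real music.

```python
import numpy as np


def estimate_cutoff_hz(samples, sample_rate, drop_db=40.0):
    """Return the highest frequency whose smoothed level is within drop_db of the peak.

    Hypothetical helper for illustration; thresholds are not the project's.
    """
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    # Moving average so that a single noisy bin does not decide the result.
    kernel = np.ones(64) / 64
    smooth = np.convolve(spectrum, kernel, mode="same")
    level_db = 20 * np.log10(smooth / smooth.max() + 1e-12)
    above = np.nonzero(level_db > -drop_db)[0]
    return float(freqs[above[-1]])


rate = 44_100
rng = np.random.default_rng(0)
noise = rng.standard_normal(rate * 2)  # 2 s of full-band white noise

# Simulate a lossy encoder's low-pass: zero every bin above 16 kHz.
spec = np.fft.rfft(noise)
bin_freqs = np.fft.rfftfreq(len(noise), d=1.0 / rate)
spec[bin_freqs > 16_000] = 0
lowpassed = np.fft.irfft(spec, n=len(noise))

full_cutoff = estimate_cutoff_hz(noise, rate)       # close to Nyquist (22.05 kHz)
lossy_cutoff = estimate_cutoff_hz(lowpassed, rate)  # close to 16 kHz
print(f"full-band: {full_cutoff:.0f} Hz, low-passed: {lossy_cutoff:.0f} Hz")
```

A cutoff on its own proves nothing, since old or band-limited recordings also roll off early, which is why the rules described next combine several signals.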

FLAC Detective scores each file with **11 heuristic rules** built around that idea —
cutoff frequency vs. sample rate, MP3-bitrate signatures, compression artefacts
(pre-echo, aliasing), bitrate sanity — plus *protection* rules so genuine vinyl rips,
cassette transfers and naturally quiet recordings aren't flagged. An **optional 12th
rule** is a small CNN (`pip install "flac-detective[ml]"`) that *sharpens borderline
verdicts*: in measurements, it raises confidence on already-suspect files far more than
it catches fakes the heuristics miss outright. The rules sum to a 0–150 score and a 4-level verdict:

| Verdict | Score | What to do |
|---|---|---|
| ✅ **AUTHENTIC** | ≤ 30 | keep it |
| ⚡ **WARNING** | 31–54 | borderline — check manually |
| ⚠️ **SUSPICIOUS** | 55–85 | likely a transcode |
| ❌ **FAKE_CERTAIN** | ≥ 86 | multiple indicators — definitely transcoded |

The guiding principle throughout is **"protect authentic files first"**: a false alarm
on real music is worse than missing a borderline fake.
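
The score bands in the table above can be written as a small lookup. This is a sketch that assumes integer scores in the 0–150 range; `verdict_for` is a hypothetical name, not a function exported by the package.

```python
def verdict_for(score: int) -> str:
    """Map a 0-150 score to one of the four verdicts from the table."""
    if not 0 <= score <= 150:
        raise ValueError(f"score must be between 0 and 150, got {score}")
    if score <= 30:
        return "AUTHENTIC"
    if score <= 54:
        return "WARNING"
    if score <= 85:
        return "SUSPICIOUS"
    return "FAKE_CERTAIN"


for example in (12, 31, 58, 120):
    print(example, verdict_for(example))
```

A score of 58, where real transcodes tend to cluster according to the v0.15.1 notes, lands in SUSPICIOUS under these bands.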

→ Every rule explained: [Technical Details](docs/technical-details.md).

## 🤖 The ML side is a case study worth reading

Rule 12's model went through a real R&D saga, written up as a **learning resource**:
a false-positive audit over 11 234 real FLACs, four dead-ends that *didn't* work (each
instructive), a debunked "AUC 0.99" false discovery caught by cross-validation, and a
twist where a "fundamental limit" turned out to be an artifact of listening in **mono** —
fixed by going **stereo**.

📖 **[Read the ML detective story →](ml/README.md)** — worth a look even if you never
enable the ML extra.

## 🆕 Latest release — v0.16 (ALAC & APE support)

- **Analyses ALAC (`.m4a`) and APE (`.ape`) too** (v0.16.0), decoded via ffmpeg —
  detection is codec-agnostic, so it's the same spectral pipeline. A lossy AAC `.m4a`
  is still correctly rejected (the real codec is probed, never trusted by extension).
- **Analyses WAV files too**, not just FLAC — same spectral pipeline (v0.15.0).
- **Sharper WARNING/SUSPICIOUS boundary** (v0.15.1): a score-distribution study found
  real transcodes cluster around a score of ~58, so the SUSPICIOUS floor moved 61 → 55.
  That puts about 5 percentage points more transcodes in the actionable range while
  authentic false positives stay at ~1 %.
- **One source of truth for verdicts** (v0.15.2–v0.15.3): the console, the text/JSON
  reports and the Python API now all derive the verdict from the same thresholds.

The Rule 12 classifier reads the stereo **mid + side** channels instead of mono (v0.14),
fixing its weak spot on band-limited music (baroque, jazz, old recordings). Real-world
specificity on a library of 11 234 authentic FLACs climbed from **80 % to 95 %**:

| | v0.12 (mono) | **v0.14 (stereo + gate)** |
|---|---|---|
| Specificity (authentic kept) | 80 % | **95 %** |
| Transcode recall | 87 % | **94 %** |
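
For readers unfamiliar with mid/side, the sketch below uses the conventional definition (half-sum and half-difference of left and right). The exact scaling and preprocessing used by Rule 12 are not stated here, so treat `mid_side` as a hypothetical helper that only shows why a mono downmix can hide information.

```python
import numpy as np


def mid_side(stereo):
    """Split an (n_samples, 2) left/right array into mid and side channels."""
    left, right = stereo[:, 0], stereo[:, 1]
    mid = (left + right) / 2.0   # what a mono downmix keeps
    side = (left - right) / 2.0  # what a mono downmix throws away
    return mid, side


t = np.arange(1000) / 44_100
tone = np.sin(2 * np.pi * 440 * t)

# Extreme case: the same tone, phase-inverted between the two channels.
stereo = np.column_stack([tone, -tone])
mid, side = mid_side(stereo)

print(np.abs(mid).max())   # the tone cancels completely in the mono mix
print(np.abs(side).max())  # but it is fully present in the side channel
```

Anything that lives mostly in the side channel is invisible to a mono analysis, which is consistent with the gain reported for the stereo model above.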

Full version-by-version history → **[CHANGELOG](CHANGELOG.md)**.

---

## ✨ Key Features

- **🎯 High Precision Detection**: 11-rule scoring system with intelligent protection mechanisms
- **📊 4-Level Verdict System**: Clear confidence ratings from AUTHENTIC to FAKE_CERTAIN
- **⚡ Performance Optimized**: 80% faster than baseline through smart caching and parallel processing
- **🔍 Advanced Analysis**: Spectral analysis, compression artifact detection, and multi-segment validation
- **🛡️ Protection Layers**: Prevents false positives for vinyl rips, cassette transfers, and other high-quality sources
- **📝 Flexible Output**: Console reports with Rich formatting, JSON export, and detailed logging
- **🔧 Robust Error Handling**: Automatic retries, partial file reading, and comprehensive diagnostic tracking
- **🔨 Optional Repair**: Corrupted FLAC files can be repaired with the `--repair` flag, with full metadata preservation
- **🤖 CNN classifier (optional)**: A small ML model bundled with the package adds a 12th scoring rule on borderline cases. `pip install "flac-detective[ml]"` to enable.

---

## 🚀 Quick Start

### Installation

```bash
# Install via pip (Recommended)
pip install flac-detective

# OR with the optional CNN classifier (Rule 12)
pip install "flac-detective[ml]"

# OR run with Docker (multi-arch: linux/amd64 + linux/arm64)
docker pull ghcr.io/guillain-rdcde/flac_detective:latest
```

### Upgrading to the latest version

`pip install flac-detective` does **not** upgrade an existing install — if
you already have an older version, pip prints `Requirement already
satisfied` and exits without doing anything. To get the latest release,
add the `--upgrade` flag (short form `-U`):

```bash
# Upgrade to the latest version on PyPI
pip install --upgrade flac-detective

# Same thing with the optional ML extra
pip install --upgrade "flac-detective[ml]"

# Verify the new version
flac-detective --version

# Docker: pull again to refresh the image
docker pull ghcr.io/guillain-rdcde/flac_detective:latest
```

**📦 See [Getting Started](docs/getting-started.md) for complete installation instructions.**

### Basic Usage

```bash
# Analyze current directory
flac-detective .

# Analyze specific directory
flac-detective /path/to/music

# Interactive mode (prompts for paths, accepts drag-and-drop in Windows cmd)
flac-detective
```

### Common Options

```bash
# Show version and help
flac-detective --version
flac-detective --help

# Verbose log + JSON output to a custom path
flac-detective -v --format json --output report.json /music

# Quick scan (15 s sample instead of default 30 s)
flac-detective --sample-duration 15 /music
```
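
If you want to launch the same scans from a script, a thin wrapper around the CLI is enough. The sketch below only uses flags shown above (`-v`, `--format json`, `--output`, `--sample-duration`); the helper names `build_scan_command` and `run_scan` are hypothetical, and combining all four flags in one call is an assumption.

```python
import subprocess


def build_scan_command(music_dir, report_path, sample_duration=None, verbose=False):
    """Build the argument list for a JSON-report scan."""
    cmd = ["flac-detective"]
    if verbose:
        cmd.append("-v")
    cmd += ["--format", "json", "--output", str(report_path)]
    if sample_duration is not None:
        cmd += ["--sample-duration", str(sample_duration)]
    cmd.append(str(music_dir))
    return cmd


def run_scan(music_dir, report_path, **options):
    """Run the scan; requires flac-detective to be installed and on PATH."""
    # check=False because the meaning of non-zero exit codes is not documented here.
    return subprocess.run(build_scan_command(music_dir, report_path, **options), check=False)


print(build_scan_command("/music", "report.json", sample_duration=15, verbose=True))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.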

**📖 See [User Guide](docs/user-guide.md) for detailed usage examples and command line options.**

### Try it Now (Minimal Setup)

**Option 1: Docker with Sample File**
```bash
# Download a sample FLAC file (public domain)
curl -O https://archive.org/download/test_flac/sample.flac

# Run analysis with Docker (mount current directory)
docker run --rm -v "$(pwd)":/data ghcr.io/guillain-rdcde/flac_detective:latest /data/sample.flac
```

**Option 2: Quick Python Test**
```bash
# Using Python (if you have pip installed)
pip install flac-detective
flac-detective --version
flac-detective --help
```

**Option 3: Interactive Demo Script** ⭐ (Best for Quick Test)
```bash
# Clone and run demo with synthetic test files
git clone https://github.com/Guillain-RDCDE/FLAC_Detective.git
cd FLAC_Detective
pip install -e .
python examples/quick_test.py
```
This creates test files and shows FLAC Detective in action in 30 seconds!

**Option 4: GitHub Codespaces** (Fully Interactive Online)
1. Click the "Code" button → "Codespaces" → "Create codespace"
2. Wait for environment setup (~30 seconds)
3. Run: `pip install -e . && python examples/quick_test.py`

> **No sample files?** The tool works with **any FLAC file** from your music collection!

---

## 🎬 Demo

### Live Demo

![FLAC Detective in Action](assets/demo.gif)

Watch FLAC Detective analyze files with real-time progress bars and colored output!

### Example Output
```
======================================================================
  FLAC AUTHENTICITY ANALYZER
  Detection of MP3s transcoded to FLAC
======================================================================

⠋ Analyzing audio files... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  15% 0:02:34

======================================================================
  ANALYSIS COMPLETE
======================================================================
  FLAC files analyzed: 245
  Authentic files: 215 (87.8%)
  Fake/Suspicious files: 12 (4.9%)
  Text report: flac_report_20251220_143022.txt
======================================================================
```

---

## ⚡ Performance

FLAC Detective is optimized for both speed and accuracy:

- **Speed**: 2-5 seconds per file (30s sample, default)
- **Throughput**: 700-1,800 files/hour on modern hardware
- **Memory**: ~150-300 MB peak usage
- **Optimization**: 80% faster than baseline through intelligent caching and parallel processing
- **Scalability**: Handles libraries with 10,000+ files efficiently
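
As a rough planning aid, the per-file figure above translates into wall-clock time as follows. This sketch assumes files are processed one after another at 2–5 s each; `scan_time_range_hours` is a hypothetical helper, not part of the package.

```python
def scan_time_range_hours(n_files, sec_per_file=(2.0, 5.0)):
    """Return (best, worst) scan time in hours for n_files."""
    fast, slow = sec_per_file
    return n_files * fast / 3600, n_files * slow / 3600


low, high = scan_time_range_hours(1_000)
print(f"1,000 files: {low * 60:.0f} to {high * 60:.0f} min")  # 33 to 83 min

low, high = scan_time_range_hours(10_000)
print(f"10,000 files: {low:.1f} to {high:.1f} h")  # 5.6 to 13.9 h
```

The same arithmetic gives 3600 / 5 = 720 and 3600 / 2 = 1,800 files per hour, which matches the throughput range quoted above.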

**Customizable Performance**:
```bash
# Faster analysis (15 s sample per file) - good for quick scans
flac-detective /music --sample-duration 15

# Balanced (30 s sample per file) - default, recommended
flac-detective /music

# More thorough (60 s sample per file) - maximum accuracy
flac-detective /music --sample-duration 60
```

---

## ❓ Frequently Asked Questions

### Does it work on Windows/Mac/Linux?

Yes! FLAC Detective is cross-platform and works on:
- ✅ Windows (10, 11)
- ✅ macOS (10.14+)
- ✅ Linux (all major distributions)

### How accurate is the detection?

FLAC Detective uses an 11-rule scoring system with protection layers:
- **High confidence**: >95% accuracy for AUTHENTIC and FAKE_CERTAIN verdicts
- **Protection mechanisms**: Prevents false positives for vinyl rips, cassette transfers, and high-quality sources
- **4-level system**: AUTHENTIC, WARNING, SUSPICIOUS, FAKE_CERTAIN for nuanced results
- **Known blind spot**: high-bitrate AAC and VBR transcodes, and transcodes of already band-limited recordings (baroque, historical, acoustic), are hard for *any* spectral tool to detect. On such material, treat AUTHENTIC as "no evidence of transcoding" rather than a guarantee.

### Will it damage or modify my files?

**No!** FLAC Detective is read-only by default:
- ✅ Analyzes files without modifying them unless you pass `--repair`
- ✅ Safe for your entire music collection
- ✅ Optional `--repair` flag for corrupted files (preserves all metadata)

### Can I trust the results?

Yes, with common sense. Each score band and what to do about it is in the verdict
table near the top of this README. For critical decisions, confirm with a
complementary tool (e.g. Spek for visual spectral analysis).

### What file formats are supported?

Currently:
- ✅ FLAC files (.flac) — read natively
- ✅ WAV files (.wav) — read natively, since v0.15.0
- ✅ ALAC (Apple Lossless, `.m4a`) and APE (Monkey's Audio, `.ape`) — since v0.16.0,
  decoded via **ffmpeg** (a hard dependency for these formats only; FLAC/WAV never
  need it). An `.m4a` holding lossy AAC is correctly rejected, not analysed.
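
Checking the real codec instead of the extension can be done with `ffprobe`, which ships with ffmpeg. The sketch below is one way to do it, not necessarily how FLAC Detective does: `probe_audio_codec`, `is_lossless_codec` and the `LOSSLESS_CODECS` set are hypothetical names, and the set only covers the formats listed above plus PCM.

```python
import subprocess

# ffmpeg codec names for the lossless formats mentioned above (assumed list).
LOSSLESS_CODECS = {"flac", "alac", "ape"}


def is_lossless_codec(codec_name):
    """True for the lossless codecs above and for uncompressed PCM (WAV)."""
    name = codec_name.strip().lower()
    return name in LOSSLESS_CODECS or name.startswith("pcm_")


def probe_audio_codec(path):
    """Ask ffprobe for the codec name of the first audio stream."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# An .m4a can hold either codec; only the probe tells them apart.
print(is_lossless_codec("alac"))  # True
print(is_lossless_codec("aac"))   # False
```

With ffmpeg installed, `is_lossless_codec(probe_audio_codec("track.m4a"))` would accept an ALAC file and reject an AAC one despite the shared extension.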

### How long does analysis take?

About 2–5 s per file with the default 30 s sample: roughly 35–85 min for 1,000
files, and about 6–14 hours for a 10,000-file library. The **Performance** section above
covers throughput and how `--sample-duration` trades speed for thoroughness.

### Can I use it in my own application?

Yes! FLAC Detective provides a Python API:

```python
from flac_detective import FLACAnalyzer

analyzer = FLACAnalyzer()
result = analyzer.analyze_file("song.flac")
print(result['verdict'])  # AUTHENTIC, WARNING, SUSPICIOUS, or FAKE_CERTAIN
```

See [examples/](examples/) directory for integration examples.
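
Building on the snippet above, a small loop can summarize a whole folder. This sketch relies only on what is shown here (`analyze_file` taking a path and returning a mapping with a `verdict` key); `tally_verdicts` is a hypothetical helper, and the analyzer is passed in as a callable so the loop can be tested without audio files.

```python
from collections import Counter
from pathlib import Path


def tally_verdicts(root, analyze):
    """Count verdicts for every .flac file under root.

    analyze is any callable that takes a path string and returns a mapping
    with a "verdict" key, for example FLACAnalyzer().analyze_file.
    """
    counts = Counter()
    for path in sorted(Path(root).rglob("*.flac")):
        result = analyze(str(path))
        counts[result["verdict"]] += 1
    return counts


# Usage with the real analyzer (requires flac-detective to be installed):
# from flac_detective import FLACAnalyzer
# print(tally_verdicts("/path/to/music", FLACAnalyzer().analyze_file))
```

Reusing one analyzer instance for the whole run, as in the commented usage, avoids constructing it once per file.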

### Is it free and open source?

Yes! MIT License:
- ✅ Free for personal and commercial use
- ✅ Open source on GitHub
- ✅ Contributions welcome

### How can I contribute?

Bug reports, code, docs, and testing are all welcome — see
[CONTRIBUTING.md](.github/CONTRIBUTING.md).

---

## 📚 Documentation

Detailed documentation is available in the `docs/` directory:

- [**Documentation Index**](docs/index.md) - Overview and navigation
- [**Getting Started**](docs/getting-started.md) - Installation and first analysis
- [**User Guide**](docs/user-guide.md) - Complete usage guide with examples
- [**Technical Details**](docs/technical-details.md) - Deep dive into detection rules and algorithms
- [**API Reference**](docs/api-reference.md) - Python API documentation
- [**Contributing**](.github/CONTRIBUTING.md) - Development guide

---

## 🎯 Use Cases

- **Library Maintenance**: Clean your music collection of fake lossless files
- **Quality Verification**: Validate FLAC authenticity before archiving
- **Batch Processing**: Analyze large music libraries efficiently
- **Format Validation**: Ensure genuine lossless quality for critical listening

### 💡 Quick Examples

See the [examples/](examples/) directory for ready-to-run scripts:
- **[basic_usage.py](examples/basic_usage.py)** - Simple file and directory analysis
- **[batch_processing.py](examples/batch_processing.py)** - Process multiple directories with statistics
- **[json_export.py](examples/json_export.py)** - Export results to JSON for further processing
- **[api_integration.py](examples/api_integration.py)** - Advanced API usage and integration patterns

---

## 🤝 Contributing

Contributions are welcome! Please read our [CONTRIBUTING.md](.github/CONTRIBUTING.md) for detailed guidelines and [CODE_OF_CONDUCT.md](.github/CODE_OF_CONDUCT.md) for community standards.

---

## 🔒 Security

For security policy and vulnerability reporting, please see [SECURITY.md](.github/SECURITY.md).

---

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 📞 Support

- **Issues**: [GitHub Issues](https://github.com/Guillain-RDCDE/FLAC_Detective/issues)
- **Discussions**: [GitHub Discussions](https://github.com/Guillain-RDCDE/FLAC_Detective/discussions)
- **Security**: see [SECURITY.md](.github/SECURITY.md)

---

## 🙏 Acknowledgements

Thanks to the community members who took the time to report bugs and confirm fixes — first issues are special.

- **[@GearKite](https://github.com/GearKite)** — Filed [#7](https://github.com/Guillain-RDCDE/FLAC_Detective/issues/7) with a clean traceback that pinpointed the circular import in v0.9.6, and [#6](https://github.com/Guillain-RDCDE/FLAC_Detective/issues/6) spotting the underscore-vs-dash Docker image name.
- **[@Aakiles](https://github.com/Aakiles)** — Diagnosed the circular import end-to-end and shipped a working patch via comment. The v0.9.7 fix is a refinement of his approach.
- **[@AnotherMuggle](https://github.com/AnotherMuggle)** and **[@tomelephant-git](https://github.com/tomelephant-git)** — Confirmed the fix across operating systems, including Windows 11 LTSC.
- **[@AKHwyJunkie](https://github.com/AKHwyJunkie)** — Confirmed the v0.9.6 import crash, validating @GearKite's report.
- **[@pblue3](https://github.com/pblue3)** — First reported the Docker image inaccessibility ([#6](https://github.com/Guillain-RDCDE/FLAC_Detective/issues/6)).

---

## ⭐ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=Guillain-RDCDE/FLAC_Detective&type=Date)](https://star-history.com/#Guillain-RDCDE/FLAC_Detective&Date)

---

**FLAC Detective** - *Maintaining authentic lossless audio collections*
