Metadata-Version: 2.4
Name: flac-detective
Version: 0.14.1
Summary: Advanced FLAC authenticity analyzer - Detects MP3-to-FLAC transcodes with high precision
Author-email: Guillain d'Erceville <guillain@poulpe.us>
License-Expression: MIT
Project-URL: Homepage, https://github.com/Guillain-RDCDE/FLAC_Detective
Project-URL: Repository, https://github.com/Guillain-RDCDE/FLAC_Detective
Project-URL: Documentation, https://github.com/Guillain-RDCDE/FLAC_Detective/tree/main/docs
Project-URL: Changelog, https://github.com/Guillain-RDCDE/FLAC_Detective/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/Guillain-RDCDE/FLAC_Detective/issues
Keywords: flac,audio,analysis,transcode,detection,mp3,quality,authenticity
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: mutagen>=1.45.0
Requires-Dist: soundfile>=0.10.0
Requires-Dist: rich>=13.0.0
Provides-Extra: ml
Requires-Dist: torch>=2.0; extra == "ml"
Requires-Dist: librosa>=0.10; extra == "ml"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: flake8-docstrings>=1.7.0; extra == "dev"
Requires-Dist: flake8-bugbear>=23.0.0; extra == "dev"
Requires-Dist: flake8-comprehensions>=3.14.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pylint>=2.17.0; extra == "dev"
Requires-Dist: pre-commit>=3.5.0; extra == "dev"
Requires-Dist: bandit>=1.7.0; extra == "dev"
Requires-Dist: interrogate>=1.5.0; extra == "dev"
Requires-Dist: commitizen>=3.0.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=7.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=2.0.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=1.25.0; extra == "docs"
Requires-Dist: myst-parser>=2.0.0; extra == "docs"
Dynamic: license-file

# 🎵 FLAC Detective

![FLAC Detective Banner](https://raw.githubusercontent.com/Guillain-RDCDE/FLAC_Detective/main/assets/flac_detective_banner.png)

[![Python Version](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/downloads/)
[![PyPI version](https://img.shields.io/pypi/v/flac-detective)](https://pypi.org/project/flac-detective/)
[![PyPI Downloads](https://img.shields.io/pypi/dm/flac-detective)](https://pypi.org/project/flac-detective/)
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![CI](https://github.com/Guillain-RDCDE/FLAC_Detective/actions/workflows/ci.yml/badge.svg)](https://github.com/Guillain-RDCDE/FLAC_Detective/actions/workflows/ci.yml)
[![Status](https://img.shields.io/badge/status-active--development-brightgreen)](https://github.com/Guillain-RDCDE/FLAC_Detective)
[![codecov](https://codecov.io/gh/Guillain-RDCDE/FLAC_Detective/branch/main/graph/badge.svg)](https://codecov.io/gh/Guillain-RDCDE/FLAC_Detective)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)

**Advanced FLAC Authenticity Analyzer for Detecting MP3-to-FLAC Transcodes**

FLAC Detective is a professional-grade command-line tool that analyzes FLAC audio files to detect MP3-to-FLAC transcodes with high precision. Using advanced spectral analysis and an 11-rule scoring system, it helps you maintain an authentic lossless music collection.

---

## 🆕 What's new in v0.14.0 — Stereo CNN (May 2026)

v0.13 *gated around* Rule 12's blind spot on band-limited music; v0.14 *fixes*
it. The insight: the model was listening in **mono**, but MP3 joint-stereo coding
leaves its clearest fingerprint in the **side channel** (L−R) — exactly where a
band-limited transcode is otherwise invisible. A controlled probe nailed it: on
band-limited material a mid-only CNN is a coin flip (AUC 0.51), while the same CNN
on mid+side hits **0.72, even at 320 kbps**. So we retrained EfficientNet-B0 with
a 2-channel (mid+side) input.
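
As a rough illustration of the mid/side split mentioned above, the sketch below derives both channels from left and right samples. The function name and the 1/2 scaling are assumptions for illustration (the text only describes the side channel as L−R), and plain lists are used to keep it dependency-free; the project's actual feature extraction may differ.

```python
def mid_side(left, right):
    """Split stereo samples into mid and side channels.

    Hypothetical helper: the 0.5 scaling is a common convention,
    not something this project documents.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


# Identical channels (pure mono content) leave the side channel silent.
mid, side = mid_side([0.5, -0.25], [0.5, -0.25])
```

A mono-only model sees just `mid`; a 2-channel input carries `side` as well, which is where the text above locates the joint-stereo fingerprint.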

| Held-out test                     | v3 (mono) | **v4 (stereo)** |
|-----------------------------------|-----------|-----------------|
| Balanced accuracy                 | 0.834     | **0.905**       |
| Recall (transcoded)               | 86.9 %    | **94.1 %**      |
| Specificity (recall on authentic) | 80.0 %    | **86.9 %**      |

On the real library of 11 234 authentic FLACs, false positives drop in every
rolloff regime; shipped as **v4 + the v0.13 reliability gate**, real-world
specificity reaches **95.1 %** (from v3's 80.2 %). The reversal of the v0.13
"fundamental limit" conclusion — and the bit-depth confound and audit-offset bug
caught along the way — is written up in
[ml/README.md](ml/README.md#the-fifth-attempt-that-worked--stereo-v014).
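
For readers checking the table, balanced accuracy is the mean of recall on the transcoded class and specificity on the authentic class. A minimal sketch, with a hypothetical function name and the v4 figures taken from the table above:

```python
def balanced_accuracy(recall, specificity):
    """Mean of the two per-class recalls, each given as a fraction in [0, 1]."""
    return (recall + specificity) / 2


# v4 (stereo) row: 94.1 % recall on transcoded, 86.9 % specificity
v4 = balanced_accuracy(0.941, 0.869)
print(f"{v4:.3f}")  # 0.905
```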

## 🆕 What's new in v0.13.0 — Reliability Gate (May 2026)

No new model — a small, empirically-grounded gate that fixes v3's one weak spot:
false alarms on **band-limited music** (baroque, historical, acoustic). An audit
of all 11 234 certified-authentic FLACs showed the model's false-positive rate
ran from 5 % on full-range material to **57 % below 4 kHz of rolloff** — because
when a recording already rolls off that early, an MP3 transcode removes nothing
detectable, so authentic and fake are physically near-identical.

We exhausted the alternatives (threshold tuning, compression ratio, stereo and
in-band texture, MP3 frame-rate modulation — none separate them) and concluded
it's a near-fundamental limit. So **Rule 12 now abstains where its precision is a
coin flip** (rolloff < 7 kHz) and defers to the heuristic rules:

| Metric                            | v0.12 (v3) | **v0.13** |
|-----------------------------------|------------|-----------|
| Specificity (recall on authentic) | 80.2 %     | **92.8 %** |

The only detection given up is in a regime where Rule 12 was guessing anyway —
and where a transcode is the least harmful (a 320 kbps MP3 of a 5 kHz-bandwidth
source is sonically transparent). The full R&D write-up — the audit, the four
dead ends, the debunked false discovery — is in
[ml/README.md](ml/README.md#the-reliability-gate-and-the-four-dead-ends-before-it-v013).
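
The gate itself is conceptually tiny. The sketch below is only an illustration of the abstain-below-7-kHz behaviour described above: the function and constant names are hypothetical, and how rolloff is measured or how an abstention is consumed by the scorer is not specified here.

```python
ROLLOFF_GATE_HZ = 7_000  # threshold quoted above: abstain below 7 kHz


def gated_rule12(cnn_probability, rolloff_hz):
    """Return the CNN probability, or None when Rule 12 should abstain.

    Hypothetical sketch, not the project's implementation.
    """
    if rolloff_hz < ROLLOFF_GATE_HZ:
        return None  # band-limited material: defer to the heuristic rules
    return cnn_probability
```

Returning `None` rather than a neutral score makes the abstention explicit, so a caller cannot mistake "no opinion" for "looks authentic".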

## 🆕 What's new in v0.12.0 — ML v3 (May 2026)

Smaller, faster, more accurate. Same conservative philosophy.

| Metric                              | v0.11 (v2)   | **v0.12 (v3)**    | Δ           |
|-------------------------------------|--------------|--------------------|-------------|
| Balanced accuracy                   | 0.811        | **0.834**          | +0.023      |
| Recall on transcoded                | 82.7 %       | **86.9 %**         | **+4.2 pp** |
| Recall on authentic (specificity)   | 80.0 %       | 80.0 %             | ≈           |
| Model size (bundled)                | 43 MB        | **16 MB**          | **−63 %**   |
| Architecture                        | ResNet-18    | EfficientNet-B0    |             |

**4 more transcoded files out of every 100 are now caught**, at the same
false-positive rate. The wheel is also 27 MB lighter.

Under the hood: more training data (5 964 source files across 10 codecs, for
65 244 samples vs 24 451 in v2), a pretrained EfficientNet-B0 replacing
ResNet-18, Mixup augmentation, cosine-annealing LR scheduling, and mmap-backed
feature loading (the 27 GB feature tensor stays on disk, so the training
process plays nice on shared hosts). Full story in the [CHANGELOG](CHANGELOG.md).

## 🕰️ What's new in v0.11.0 — ML v2, Properly Trained (May 2026)

The 12th scoring rule, introduced in v0.10.0, was technically functional
but had a **95 % false-positive rate** on authentic FLAC files. v0.11.0
ships a properly-trained model that fixes this.

| Metric                              | v0.10 (v1)    | **v0.11 (v2)**     |
|-------------------------------------|---------------|---------------------|
| Balanced accuracy                   | ~0.55         | **0.81**            |
| Specificity (recall on authentic)   | 4.5 %         | **80 %**            |
| Precision (transcoded)              | 87.5 %        | **97.6 %**          |
| Threshold needed for safe use       | 0.85 (hack)   | **0.5 (natural)**   |
| Architecture                        | Custom 5-block CNN | ResNet-18 (ImageNet-pretrained) |
| Model size                          | 1.6 MB        | 43 MB               |

**The 80 % specificity is the headline**: out of 333 known-authentic test
files, v1 misclassified 318 as transcoded; v2 misclassifies 68. That is
roughly a 4.7× drop in false positives.
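
The specificity figures in the table follow directly from those counts. A small worked check, using a hypothetical helper name:

```python
def specificity(authentic_total, false_positives):
    """Fraction of authentic files that were correctly left alone."""
    return (authentic_total - false_positives) / authentic_total


print(f"v1: {specificity(333, 318):.1%}")  # v1: 4.5%
print(f"v2: {specificity(333, 68):.1%}")   # v2: 79.6%
```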

The path to a working model was **five training attempts** that taught
specific lessons (focal loss double-balancing, biased F1 selection,
insufficient model capacity, and the root cause: feature extraction was
downsampling audio to 22 kHz, erasing the very MP3 cutoff signal we were
trying to learn). The full story is in the [CHANGELOG](CHANGELOG.md) and
[ml/README.md](ml/README.md).

- **Opt-in** via `pip install "flac-detective[ml]"`. PyTorch and librosa are
  optional — without them, Rule 12 is a graceful no-op and the existing
  11-rule pipeline runs unchanged.
- **Trained on a Hetzner GPU** (RTX 4000 Ada) over 2 237 certified-authentic
  FLACs (CD rips verified by EAC / XLD / Audiochecker logs) plus 22 258
  transcodes generated across **10** codec/bitrate combinations
  (MP3 CBR 128/192/256/320, MP3 VBR V0/V2, AAC 192/256, Opus 128, Vorbis q5).
- **Reproducible**: the full training pipeline lives in `ml/`. Eight
  scripts, one `run_pipeline.sh` to chain them, ~2 h end-to-end on a
  modest GPU.
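
One common way to get the "graceful no-op" behaviour described in the first bullet is to probe for the optional packages before touching them. This is a sketch of that general pattern, not the project's code: the function names and the `run_model` callable are hypothetical.

```python
from importlib.util import find_spec

ML_MODULES = ("torch", "librosa")  # the packages pulled in by the [ml] extra


def ml_extra_available(modules=ML_MODULES):
    """True only if every optional ML dependency can be found."""
    return all(find_spec(name) is not None for name in modules)


def rule12_score(run_model, modules=ML_MODULES):
    """Return the model's score, or None (no-op) when the extra is missing.

    `run_model` is a hypothetical zero-argument callable wrapping the CNN.
    """
    if not ml_extra_available(modules):
        return None
    return run_model()
```

With neither package installed, `rule12_score` returns `None` and the caller can carry on with the other eleven rules.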

For the v0.9.7 → v0.10.1 fix trail (circular import, Docker image,
documentation refresh, CLI catch-up, branch protection, …) see the
[CHANGELOG](CHANGELOG.md).

---

## ✨ Key Features

- **🎯 High Precision Detection**: 11-rule scoring system with intelligent protection mechanisms
- **📊 4-Level Verdict System**: Clear confidence ratings from AUTHENTIC to FAKE_CERTAIN
- **⚡ Performance Optimized**: 80% faster than baseline through smart caching and parallel processing
- **🔍 Advanced Analysis**: Spectral analysis, compression artifact detection, and multi-segment validation
- **🛡️ Protection Layers**: Prevents false positives for vinyl rips, cassette transfers, and high-quality MP3s
- **📝 Flexible Output**: Console reports with Rich formatting, JSON export, and detailed logging
- **🔧 Robust Error Handling**: Automatic retries, partial file reading, and comprehensive diagnostic tracking
- **🔨 Automatic Repair**: With the optional `--repair` flag, corrupted FLAC files are repaired automatically with full metadata preservation
- **🤖 CNN classifier (optional)**: A small ML model bundled with the package adds a 12th scoring rule on borderline cases. `pip install "flac-detective[ml]"` to enable.

---

## 🚀 Quick Start

### Installation

```bash
# Install via pip (Recommended)
pip install flac-detective

# OR with the optional CNN classifier (Rule 12)
pip install "flac-detective[ml]"

# OR run with Docker (multi-arch: linux/amd64 + linux/arm64)
docker pull ghcr.io/guillain-rdcde/flac_detective:latest
```

### Upgrading to the latest version

`pip install flac-detective` does **not** upgrade an existing install — if
you already have an older version, pip prints `Requirement already
satisfied` and exits without doing anything. To get the latest release,
add the `--upgrade` flag (short form `-U`):

```bash
# Upgrade to the latest version on PyPI
pip install --upgrade flac-detective

# Same thing with the optional ML extra
pip install --upgrade "flac-detective[ml]"

# Verify the new version
flac-detective --version

# Docker: pull again to refresh the image
docker pull ghcr.io/guillain-rdcde/flac_detective:latest
```

**📦 See [Getting Started](docs/getting-started.md) for complete installation instructions.**

### Basic Usage

```bash
# Analyze current directory
flac-detective .

# Analyze specific directory
flac-detective /path/to/music

# Interactive mode (prompts for paths, accepts drag-and-drop in Windows cmd)
flac-detective
```

### Common Options

```bash
# Show version and help
flac-detective --version
flac-detective --help

# Verbose log + JSON output to a custom path
flac-detective -v --format json --output report.json /music

# Quick scan (15 s sample instead of default 30 s)
flac-detective --sample-duration 15 /music
```

**📖 See [User Guide](docs/user-guide.md) for detailed usage examples and command line options.**

### Try it Now (No Installation Required)

**Option 1: Docker with Sample File**
```bash
# Download a sample FLAC file (public domain); -L follows archive.org's redirect
curl -L -O https://archive.org/download/test_flac/sample.flac

# Run analysis with Docker (mount current directory)
docker run --rm -v "$(pwd)":/data ghcr.io/guillain-rdcde/flac_detective:latest /data/sample.flac
```

**Option 2: Quick Python Test**
```bash
# Using Python (if you have pip installed)
pip install flac-detective
flac-detective --version
flac-detective --help
```

**Option 3: Interactive Demo Script** ⭐ (Best for Quick Test)
```bash
# Clone and run demo with synthetic test files
git clone https://github.com/Guillain-RDCDE/FLAC_Detective.git
cd FLAC_Detective
pip install -e .
python examples/quick_test.py
```
This creates test files and shows FLAC Detective in action in 30 seconds!

**Option 4: GitHub Codespaces** (Fully Interactive Online)
1. Click the "Code" button → "Codespaces" → "Create codespace"
2. Wait for environment setup (~30 seconds)
3. Run: `pip install -e . && python examples/quick_test.py`

> **No sample files?** The tool works with **any FLAC file** from your music collection!

---

## 🎬 Demo

### Live Demo

![FLAC Detective in Action](assets/demo.gif)

Watch FLAC Detective analyze files with real-time progress bars and colored output!

### Example Output
```
======================================================================
  FLAC AUTHENTICITY ANALYZER
  Detection of MP3s transcoded to FLAC
======================================================================

⠋ Analyzing audio files... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  15% 0:02:34

======================================================================
  ANALYSIS COMPLETE
======================================================================
  FLAC files analyzed: 245
  Authentic files: 215 (87.8%)
  Fake/Suspicious files: 12 (4.9%)
  Text report: flac_report_20251220_143022.txt
======================================================================
```

---

## ⚡ Performance

FLAC Detective is optimized for both speed and accuracy:

- **Speed**: 2-5 seconds per file (30s sample, default)
- **Throughput**: 700-1,800 files/hour on modern hardware
- **Memory**: ~150-300 MB peak usage
- **Optimization**: 80% faster than baseline through intelligent caching and parallel processing
- **Scalability**: Handles libraries with 10,000+ files efficiently

**Customizable Performance**:
```bash
# Faster analysis (15 s sample per file) - good for quick scans
flac-detective /music --sample-duration 15

# Balanced (30 s sample per file) - default, recommended
flac-detective /music

# More thorough (60 s sample per file) - maximum accuracy
flac-detective /music --sample-duration 60
```

---

## ❓ Frequently Asked Questions

### Does it work on Windows/Mac/Linux?

Yes! FLAC Detective is cross-platform and works on:
- ✅ Windows (10, 11)
- ✅ macOS (10.14+)
- ✅ Linux (all major distributions)

### How accurate is the detection?

FLAC Detective uses an 11-rule scoring system with protection layers:
- **High confidence**: >95% accuracy for AUTHENTIC and FAKE_CERTAIN verdicts
- **Protection mechanisms**: Prevents false positives for vinyl rips, cassette transfers, and high-quality sources
- **4-level system**: AUTHENTIC, WARNING, SUSPICIOUS, FAKE_CERTAIN for nuanced results

### Will it damage or modify my files?

**No!** FLAC Detective is read-only by default:
- ✅ Only analyzes files, never modifies them
- ✅ Safe for your entire music collection
- ✅ Optional `--repair` flag for corrupted files (preserves all metadata)

### Can I trust the results?

Yes, but use common sense:
- ✅ **AUTHENTIC** (score ≤30): Very high confidence, keep the file
- ⚡ **WARNING** (31-60): Borderline case, manual verification recommended
- ⚠️ **SUSPICIOUS** (61-85): High-confidence transcode, consider replacing
- ❌ **FAKE_CERTAIN** (≥86): Multiple indicators, definitely a transcode

For critical decisions, use complementary tools (e.g., Spek for visual spectral analysis) to confirm.
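
If you post-process scores yourself, the thresholds listed above translate into a simple lookup. The function name is hypothetical and integer scores are assumed; the tool already reports the verdict, so this is only a convenience sketch.

```python
def verdict_for(score):
    """Map a numeric score to one of the four verdict labels listed above."""
    if score <= 30:
        return "AUTHENTIC"
    if score <= 60:
        return "WARNING"
    if score <= 85:
        return "SUSPICIOUS"
    return "FAKE_CERTAIN"
```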

### What file formats are supported?

Currently:
- ✅ FLAC files (.flac)
- 🔜 Future: WAV, ALAC, APE (planned for v1.0)

### How long does analysis take?

- **Single file**: 2-5 seconds (30s sample)
- **100 files**: ~5-10 minutes
- **1,000 files**: ~50-90 minutes
- **10,000 files**: ~8-15 hours

Use `--sample-duration 15` for faster scans of large libraries.
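
For a library of a different size, a back-of-envelope estimate can be derived from the 2-5 seconds-per-file figure alone. The helper below is a hypothetical sketch that assumes strictly serial processing and ignores startup and disk I/O, so treat the ranges quoted above as the more realistic guide.

```python
def estimate_minutes(n_files, sec_per_file=(2, 5)):
    """Return (low, high) wall-clock minutes for n_files, processed serially."""
    low, high = sec_per_file
    return n_files * low / 60, n_files * high / 60


low, high = estimate_minutes(1_000)
print(f"{low:.0f}-{high:.0f} min")  # 33-83 min
```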

### Can I use it in my own application?

Yes! FLAC Detective provides a Python API:

```python
from flac_detective import FLACAnalyzer

analyzer = FLACAnalyzer()
result = analyzer.analyze_file("song.flac")
print(result['verdict'])  # AUTHENTIC, WARNING, SUSPICIOUS, or FAKE_CERTAIN
```

See [examples/](examples/) directory for integration examples.
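
As a small extension of the snippet above, the sketch below tallies verdicts over several files. It assumes only what that snippet shows, namely that `analyze_file` returns a mapping with a `'verdict'` key; `tally_verdicts` itself is a hypothetical helper, and the analyzer is passed in as a callable so the function can be exercised without audio files.

```python
from collections import Counter


def tally_verdicts(paths, analyze):
    """Count verdicts over several files.

    `analyze` is any callable that takes a path and returns a mapping
    with a 'verdict' key, e.g. the `analyze_file` method shown above.
    """
    return Counter(analyze(path)["verdict"] for path in paths)


# Intended use with the library (not executed here):
#   from flac_detective import FLACAnalyzer
#   counts = tally_verdicts(["a.flac", "b.flac"], FLACAnalyzer().analyze_file)
```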

### Is it free and open source?

Yes! MIT License:
- ✅ Free for personal and commercial use
- ✅ Open source on GitHub
- ✅ Contributions welcome

### How can I contribute?

See [CONTRIBUTING.md](.github/CONTRIBUTING.md) for:
- Bug reports and feature requests
- Code contributions
- Documentation improvements
- Testing and feedback

---

## 📚 Documentation

Detailed documentation is available in the `docs/` directory:

- [**Documentation Index**](docs/index.md) - Overview and navigation
- [**Getting Started**](docs/getting-started.md) - Installation and first analysis
- [**User Guide**](docs/user-guide.md) - Complete usage guide with examples
- [**Technical Details**](docs/technical-details.md) - Deep dive into detection rules and algorithms
- [**API Reference**](docs/api-reference.md) - Python API documentation
- [**Contributing**](.github/CONTRIBUTING.md) - Development guide

---

## 🎯 Use Cases

- **Library Maintenance**: Clean your music collection of fake lossless files
- **Quality Verification**: Validate FLAC authenticity before archiving
- **Batch Processing**: Analyze large music libraries efficiently
- **Format Validation**: Ensure genuine lossless quality for critical listening

### 💡 Quick Examples

See the [examples/](examples/) directory for ready-to-run scripts:
- **[basic_usage.py](examples/basic_usage.py)** - Simple file and directory analysis
- **[batch_processing.py](examples/batch_processing.py)** - Process multiple directories with statistics
- **[json_export.py](examples/json_export.py)** - Export results to JSON for further processing
- **[api_integration.py](examples/api_integration.py)** - Advanced API usage and integration patterns

---

## 🤝 Contributing

Contributions are welcome! Please read our [CONTRIBUTING.md](.github/CONTRIBUTING.md) for detailed guidelines and [CODE_OF_CONDUCT.md](.github/CODE_OF_CONDUCT.md) for community standards.

---

## 🔒 Security

For security policy and vulnerability reporting, please see [SECURITY.md](.github/SECURITY.md).

---

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 📞 Support

- **Issues**: [GitHub Issues](https://github.com/Guillain-RDCDE/FLAC_Detective/issues)
- **Discussions**: [GitHub Discussions](https://github.com/Guillain-RDCDE/FLAC_Detective/discussions)
- **Security**: see [SECURITY.md](.github/SECURITY.md)

---

## 🙏 Acknowledgements

Thanks to the community members who took the time to report bugs and confirm fixes — first issues are special.

- **[@GearKite](https://github.com/GearKite)** — Filed [#7](https://github.com/Guillain-RDCDE/FLAC_Detective/issues/7) with a clean traceback that pinpointed the circular import in v0.9.6, and [#6](https://github.com/Guillain-RDCDE/FLAC_Detective/issues/6) spotting the underscore-vs-dash Docker image name.
- **[@Aakiles](https://github.com/Aakiles)** — Diagnosed the circular import end-to-end and shipped a working patch via comment. The v0.9.7 fix is a refinement of his approach.
- **[@AnotherMuggle](https://github.com/AnotherMuggle)** and **[@tomelephant-git](https://github.com/tomelephant-git)** — Confirmed the fix across operating systems, including Windows 11 LTSC.
- **[@AKHwyJunkie](https://github.com/AKHwyJunkie)** — Confirmed the v0.9.6 import crash, validating @GearKite's report.
- **[@pblue3](https://github.com/pblue3)** — First reported the Docker image inaccessibility ([#6](https://github.com/Guillain-RDCDE/FLAC_Detective/issues/6)).

---

## ⭐ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=Guillain-RDCDE/FLAC_Detective&type=Date)](https://star-history.com/#Guillain-RDCDE/FLAC_Detective&Date)

---

**FLAC Detective** - *Maintaining authentic lossless audio collections*
