Metadata-Version: 2.4
Name: birdpreen
Version: 0.4.1
Summary: Scan macOS Photos library, detect and identify birds, write species captions
Author: Dazhen Pan
License: AGPL-3.0-or-later
Keywords: birds,photos,macos,identification,osea,yolo
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: MacOS X
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Multimedia :: Graphics
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: Pillow>=9.0.0
Requires-Dist: pillow-heif>=0.10.0
Requires-Dist: pi-heif
Requires-Dist: ultralytics>=8.0.0
Requires-Dist: torch>=2.0.0
Requires-Dist: torchvision
Requires-Dist: numpy>=1.20.0
Requires-Dist: huggingface_hub
Requires-Dist: photoscript>=0.3.0
Requires-Dist: pyobjc-framework-Photos
Requires-Dist: pypinyin
Requires-Dist: pillow-jxl-plugin
Requires-Dist: pillow-avif-plugin
Requires-Dist: rawpy
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Dynamic: license-file

# preen

Groom your photo library — automatically find and name every bird.

Preen scans your macOS Photos library, detects birds with YOLO, identifies species with [SuperPicky](https://gitcode.com/Jamesphotography/SuperPicky/tree/master)'s OSEA classifier (10,964 species), and writes bilingual keywords and captions.

## Features

- Scans the entire Photos library, including iCloud photos
- YOLO multi-bird detection — finds all birds in a photo
- OSEA species identification with GPS-based eBird regional filtering
- Keywords: `白鹭 (Little Egret)` + pinyin `bailu` per species
- Captions: `白鹭, 苍鹭 (Little Egret, Grey Heron)`
- Parallel iCloud downloads via PhotoKit (no Photos.app dependency for reads)
- SQLite checkpoint — pause/resume, incremental or full rescan
- Auto-retries failed iCloud exports on next run
- Supports JPEG, HEIC, JXL, AVIF, and RAW formats (ARW, CR2, CR3, NEF, DNG, RAF)

## Requirements

- macOS with Photos.app
- Python 3.11+

## Installation

```bash
pipx install birdpreen
```

Or with pip:

```bash
pip install birdpreen
```

The first `preen scan` automatically downloads the model files (~260 MB) from Hugging Face.

## Usage

```bash
# Scan new photos (incremental)
preen scan

# Full library rescan
preen scan --full

# Dry run — detect and identify without writing
preen scan --dry-run

# Custom confidence threshold (default: 70%)
preen scan --threshold 65

# Process in batches
preen scan --batch-size 500

# Adjust parallel iCloud downloads (default: 16)
preen scan --workers 32

# Override regional species filter
preen scan --country US
preen scan --region US-CA

# Stricter threshold when no regional filter matches (default: 90%)
preen scan --global-threshold 95

# Customize caption and keyword format
preen scan --caption-format "{en} ({latin})"
preen scan --caption-separator " / "
preen scan --keyword-format "{cn} {en} {latin} {pinyin}"

# Check progress
preen status

# Reset checkpoint (auto-creates backup)
preen reset

# Restore checkpoint from latest backup
preen restore
```
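
The interplay between `--threshold` and `--global-threshold` can be pictured with a minimal Python sketch. This is an illustration of the behaviour described in the comments above, not preen's actual code: the function name `accept_identification`, its parameters, and the use of `>=` for the comparison are all assumptions.

```python
def accept_identification(confidence, region_matched,
                          threshold=70.0, global_threshold=90.0):
    """Return True if a prediction clears the applicable cutoff (in percent).

    Hypothetical helper: `region_matched` stands for "a regional species
    filter applied to this photo". When it did not, the stricter global
    cutoff is used instead of the regular one.
    """
    cutoff = threshold if region_matched else global_threshold
    return confidence >= cutoff


# A 75% prediction passes with a regional filter, but not without one.
print(accept_identification(75, region_matched=True))
print(accept_identification(75, region_matched=False))
```

The defaults mirror the documented ones (70% and 90%), so photos without a usable regional filter need a noticeably more confident prediction before anything is written.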

### Caption and keyword format

The `--caption-format` flag controls how each species appears in the photo description. The `--keyword-format` flag controls which fields become individual keywords (space-delimited). Available placeholders:

| Placeholder | Example |
|------------|---------|
| `{cn}` | 白鹭 |
| `{cn_trad}` | 白鷺 |
| `{en}` | Little Egret |
| `{latin}` | Egretta garzetta |
| `{pinyin}` | bailu |
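
As a rough illustration of how these placeholders could expand, the sketch below uses Python's built-in `str.format`. It is not preen's implementation: the `SPECIES` records, the helper names `render_caption` and `render_keywords`, and the default `", "` separator are assumptions made for the example, and the keyword helper assumes the template is split on spaces before the placeholders are filled in.

```python
# Hypothetical species records; the keys mirror the placeholders in the table.
SPECIES = [
    {"cn": "白鹭", "cn_trad": "白鷺", "en": "Little Egret",
     "latin": "Egretta garzetta", "pinyin": "bailu"},
    {"cn": "苍鹭", "cn_trad": "蒼鷺", "en": "Grey Heron",
     "latin": "Ardea cinerea", "pinyin": "canglu"},
]


def render_caption(species, caption_format, separator=", "):
    """Format each species with the template, then join with the separator."""
    return separator.join(caption_format.format(**s) for s in species)


def render_keywords(species, keyword_format):
    """Split the template on spaces; each token yields one keyword per species."""
    keywords = []
    for s in species:
        for token in keyword_format.split():
            keyword = token.format(**s)
            if keyword not in keywords:  # skip duplicates, keep order
                keywords.append(keyword)
    return keywords


caption = render_caption(SPECIES, "{en} ({latin})", separator=" / ")
keywords = render_keywords(SPECIES[:1], "{cn} {en} {latin} {pinyin}")
print(caption)
print(keywords)
```

Because the split happens on the template rather than on the rendered text, a multi-word value such as `Little Egret` stays a single keyword in this sketch.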

### Tuning `--workers`

The `--workers` flag controls how many iCloud photos are downloaded in parallel (default: 16). The scan output shows a queue indicator in the form `q:ready/total`, for example `q:12/16`. "Ready" is the number of photos downloaded and waiting for the GPU; "total" is the queue size.

- If "ready" often drops to 0, downloads can't keep up, so increase `--workers`
- If the queue is often full (e.g. `q:16/16`), the GPU is the bottleneck, so check for other GPU-intensive processes
- Each queued image uses roughly 50-100 MB of RAM; if you have plenty of memory, 32 workers is safe
- For sequential processing (the most reliable option), use `--workers 1`
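
The rules of thumb above can be turned into a small Python helper for reading a run of queue indicators. This is a sketch, not part of preen: the function names are hypothetical, "often" is interpreted as "in more than half of the samples", and the memory estimate assumes the queue holds as many images as there are workers.

```python
import re


def parse_queue_indicator(text):
    """Extract (ready, total) from a string such as 'q:12/16'."""
    match = re.search(r"q:(\d+)/(\d+)", text)
    if match is None:
        raise ValueError(f"no queue indicator in {text!r}")
    return int(match.group(1)), int(match.group(2))


def suggest(samples):
    """Classify a list of indicator samples using the rules of thumb above."""
    parsed = [parse_queue_indicator(s) for s in samples]
    empty = sum(1 for ready, _ in parsed if ready == 0)
    full = sum(1 for ready, total in parsed if ready == total)
    if empty > len(parsed) / 2:   # assumed meaning of "often"
        return "downloads are the bottleneck: try more workers"
    if full > len(parsed) / 2:
        return "GPU is the bottleneck: check for other GPU-heavy processes"
    return "balanced"


def queue_memory_mb(workers, per_image_mb=(50, 100)):
    """Rough RAM range (MB) for a full queue at ~50-100 MB per image."""
    low, high = per_image_mb
    return workers * low, workers * high


print(suggest(["q:0/16", "q:0/16", "q:3/16"]))
print(queue_memory_mb(32))
```

Under these assumptions, `--workers 32` corresponds to roughly 1.6-3.2 GB of queued images when the queue is full.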

## Credits

- OSEA bird classification model (10,964 species) by [Sun Jiao](https://gitcode.com/sunjiao)
- Bird identification logic (OSEA classifier, AVONET geographic filtering, eBird species data) extracted from [SuperPicky](https://gitcode.com/Jamesphotography/SuperPicky/tree/master)
- YOLO11 segmentation model by [Ultralytics](https://github.com/ultralytics/ultralytics)
- Photos library access via [PhotoKit](https://developer.apple.com/documentation/photokit) through [PyObjC](https://pyobjc.readthedocs.io/)
