Metadata-Version: 2.4
Name: pdf-email-optimizer
Version: 0.1.0
Summary: Shrink PDFs to email-safe sizes while preserving visual quality.
Author: PDF Email Optimizer contributors
License: MIT
Project-URL: Homepage, https://github.com/petehottelet/pdf-email-optimizer
Project-URL: Issues, https://github.com/petehottelet/pdf-email-optimizer/issues
Keywords: pdf,compression,email,optimization
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Graphics
Classifier: Topic :: Office/Business
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pypdf>=4.0
Requires-Dist: Pillow>=10.0
Requires-Dist: pypdfium2>=4.0
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: jsonschema; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: pikepdf>=8; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: reportlab>=4.0; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: twine; extra == "dev"
Provides-Extra: qa
Requires-Dist: pypdfium2>=4.0; extra == "qa"
Provides-Extra: ghostscript
Provides-Extra: pikepdf
Requires-Dist: pikepdf>=8; extra == "pikepdf"
Provides-Extra: fixtures
Requires-Dist: reportlab>=4.0; extra == "fixtures"
Dynamic: license-file

<p align="center">
  <img src="assets/logo.png" alt="PDF Email Optimizer" width="480">
</p>

# PDF Email Optimizer

[![CI](https://github.com/petehottelet/pdf-email-optimizer/actions/workflows/ci.yml/badge.svg)](https://github.com/petehottelet/pdf-email-optimizer/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/pdf-email-optimizer.svg)](https://pypi.org/project/pdf-email-optimizer/)
[![Python](https://img.shields.io/pypi/pyversions/pdf-email-optimizer.svg)](https://pypi.org/project/pdf-email-optimizer/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Agent Skill](https://img.shields.io/badge/Agent_Skill-SKILL.md-orange.svg)](SKILL.md)

![Profiles](https://img.shields.io/badge/profiles-quality_%7C_balanced_%7C_aggressive-blue)
![Backends](https://img.shields.io/badge/backends-pypdf_%7C_pikepdf_%7C_ghostscript-555)
![Optimizes](https://img.shields.io/badge/optimizes-photos_%7C_scans_%7C_screenshots_%7C_vectors-blueviolet)

Shrink PDFs to email-safe sizes while preserving visual quality.

PDF Email Optimizer is built for posters, brochures, reports, photo-heavy decks, and design-tool exports that need to fit a size target such as 7 MB or a 5-7 MB range. It starts with structural cleanup, recompresses images only when needed, and warns when the requested size cannot be reached without sacrificing visual quality.

## Install

From a checkout:

```bash
python -m pip install -e ".[dev]"
pdf-email-optimizer --help
```

Once published to a package index:

```bash
pipx install pdf-email-optimizer
pdf-email-optimizer input.pdf output.pdf --target-mb 7 --profile quality
```

It can also be run with `uvx` (no separate install step) or as a Python module:

```bash
uvx pdf-email-optimizer input.pdf output.pdf --target 7mb
python -m pdf_email_optimizer input.pdf output.pdf --target-mb 7
```
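The CLI accepts a budget either as a megabyte count (`--target-mb 7`) or as a size string (`--target 7mb`, and `--range 5-7mb` in Quick Start). The sketch below shows one way such strings could be turned into byte budgets. `parse_target`, `parse_range`, `mb_to_bytes`, and the decimal definition of a megabyte are assumptions for illustration, not the package's actual parser.

```python
import re
from typing import Tuple

# Assumption: 1 MB = 1,000,000 bytes. The real tool may use 1,048,576 (MiB).
BYTES_PER_MB = 1_000_000

_NUMBER = r"(\d+(?:\.\d+)?)"
_TARGET_RE = re.compile(rf"^\s*{_NUMBER}\s*mb\s*$", re.IGNORECASE)
_RANGE_RE = re.compile(rf"^\s*{_NUMBER}\s*-\s*{_NUMBER}\s*mb\s*$", re.IGNORECASE)


def mb_to_bytes(value: float) -> int:
    """Convert a megabyte count (as given to --target-mb) to bytes."""
    return round(value * BYTES_PER_MB)


def parse_target(text: str) -> int:
    """Parse a size string such as '7mb' (as given to --target)."""
    match = _TARGET_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return mb_to_bytes(float(match.group(1)))


def parse_range(text: str) -> Tuple[int, int]:
    """Parse a range such as '5-7mb' (as given to --range) into (low, high) bytes."""
    match = _RANGE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized range: {text!r}")
    low, high = (mb_to_bytes(float(g)) for g in match.groups())
    if low > high:
        raise ValueError(f"range lower bound exceeds upper bound: {text!r}")
    return low, high
```

Under these assumptions, `--target 7mb` and `--target-mb 7` describe the same 7,000,000-byte budget.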

## Quick Start

```bash
# Ordinary email optimization
pdf-email-optimizer input.pdf output_email.pdf --target-mb 7

# Preserve photos, screenshots, maps, and other detail
pdf-email-optimizer input.pdf output_email.pdf --target 7mb --quality

# Land inside a 5-7 MB range when possible
pdf-email-optimizer input.pdf output_email.pdf --range 5-7mb --quality

# Also write a Markdown report to report.md
pdf-email-optimizer input.pdf output_email.pdf --target-mb 7 --report report.md

# Inspect without writing an optimized PDF
pdf-email-optimizer input.pdf --audit
```

The source PDF is never overwritten. Existing output files are rejected unless `--force` is supplied.
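Both safety rules can be expressed as a small pre-flight check. This is a minimal sketch with hypothetical names (`check_output_path`, `OutputPathError`), not the tool's own code. It resolves both paths so a relative path or a symlink pointing at the source is still caught, and it refuses to write over the source even when `force` is set.

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


class OutputPathError(ValueError):
    """Raised when an output path would be unsafe to write."""


def check_output_path(source: PathLike, output: PathLike, force: bool = False) -> Path:
    """Return the resolved output path, or raise if writing it is not allowed."""
    # resolve() normalizes relative paths and follows symlinks (hard links are not detected).
    src = Path(source).resolve()
    dst = Path(output).resolve()
    if dst == src:
        # The source is protected even when force=True.
        raise OutputPathError("refusing to overwrite the source PDF")
    if dst.exists() and not force:
        raise OutputPathError(f"{dst} already exists; pass --force to replace it")
    return dst
```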

## Profiles

| Profile | Use When | Behavior |
|---|---|---|
| `quality` | Photos, screenshots, maps, product images, "do not degrade" requests | High JPEG floor, protects small images, runs render QA, does not use Ghostscript by default |
| `balanced` | General email delivery | Moderate recompression ladder and conservative structural cleanup |
| `aggressive` | Smallest file matters more than perfect fidelity | Lower quality floor, smaller long-edge caps, optional Ghostscript fallback |

If `quality` mode cannot hit the requested size, the tool keeps the smallest quality-preserving output and emits an explicit warning with suggested next steps.
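One way to picture the profiles is a descending JPEG recompression ladder with a per-profile quality floor. The sketch below is illustrative only: the ladder, the floor values in `HYPOTHETICAL_FLOORS`, and the `encode` callback (which returns the output size for a given JPEG quality) are assumptions, not the tool's real settings. It stops at the first step that fits the target; otherwise it keeps the smallest output produced at or above the floor and returns a warning, mirroring the `quality` fallback described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical JPEG quality floors; the real profiles may use different values.
HYPOTHETICAL_FLOORS = {"quality": 85, "balanced": 70, "aggressive": 50}


@dataclass
class Attempt:
    quality: int
    size_bytes: int


def pick_output(
    ladder: Sequence[int],
    encode: Callable[[int], int],
    target_bytes: int,
    floor: int,
) -> Tuple[Attempt, Optional[str]]:
    """Try qualities from highest to lowest, never going below `floor`."""
    attempts: List[Attempt] = []
    for quality in sorted(ladder, reverse=True):
        if quality < floor:
            break  # stop rather than degrade past the profile's floor
        attempt = Attempt(quality, encode(quality))
        attempts.append(attempt)
        if attempt.size_bytes <= target_bytes:
            return attempt, None  # highest-quality step that fits the target
    if not attempts:
        raise ValueError("no ladder step is at or above the floor")
    smallest = min(attempts, key=lambda a: a.size_bytes)
    warning = (
        f"target of {target_bytes} bytes missed; kept the smallest output "
        f"at quality {smallest.quality} ({smallest.size_bytes} bytes)"
    )
    return smallest, warning
```

With a toy encoder where size grows linearly with quality, the same target can be met by a `balanced`-style floor but missed, with a warning, by a stricter `quality`-style floor.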

## Output

Use `--json` for machine-readable summaries:

```bash
pdf-email-optimizer input.pdf output.pdf --target-mb 7 --json
```

The JSON contract is documented in [docs/json-output.md](docs/json-output.md) and validated by [schema/output-summary.schema.json](schema/output-summary.schema.json). Important fields include input/output size, target status, strategy, page count, private payload removals, image statistics, render QA, quality status, and warnings.
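From a script, the documented flags can be assembled into an argument list and the summary read back with the standard library. This sketch assumes the `--json` summary is the only thing printed to standard output. `build_command` and `run_optimizer` are hypothetical helpers, and the code deliberately avoids assuming any field names, since those are defined by the schema.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, List, Optional, Union

PathLike = Union[str, Path]


def build_command(
    source: PathLike,
    output: PathLike,
    target_mb: float,
    profile: str = "balanced",
    report: Optional[PathLike] = None,
    force: bool = False,
) -> List[str]:
    """Assemble an argument list using only flags shown in this README."""
    cmd = [
        "pdf-email-optimizer", str(source), str(output),
        "--target-mb", str(target_mb),
        "--profile", profile,
        "--json",
    ]
    if report is not None:
        cmd += ["--report", str(report)]
    if force:
        cmd.append("--force")
    return cmd


def run_optimizer(cmd: List[str]) -> Any:
    """Run the CLI and parse its JSON summary from stdout (assumed to be pure JSON)."""
    completed = subprocess.run(cmd, capture_output=True, text=True)
    try:
        return json.loads(completed.stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError(
            f"no JSON summary (exit status {completed.returncode}): {completed.stderr.strip()}"
        ) from exc
```

For example, `run_optimizer(build_command("input.pdf", "output_email.pdf", 7, profile="quality"))` returns the parsed summary; check the schema for the exact field names.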

## Gallery

Original page (left) vs. email copy (right). All inputs are synthetic CC0 fixtures generated by `benchmarks/make_fixtures.py`; regenerate the images with `python benchmarks/make_gallery.py`.

**InDesign-style export — 2.35 MB → 0.18 MB (92% smaller, PSNR 57.8 dB)**

![InDesign-style export before and after](docs/gallery/indesign_export.png)

**Scanned document — 0.73 MB → 0.25 MB (66% smaller)**

![Scanned document before and after](docs/gallery/scanned_pdf.png)

**Repeated images — 0.81 MB → 0.14 MB (83% smaller, lossless dedupe)**

![Repeated images before and after](docs/gallery/repeated_images.png)

## Benchmarks

The benchmark harness runs against the bundled redistributable fixtures:

```bash
python benchmarks/make_fixtures.py        # (re)generate CC0 sample PDFs
python benchmarks/run_benchmarks.py --manifest benchmarks/benchmark_manifest.yaml --output benchmarks/results/latest.json
```

It writes JSON plus a Markdown table. Missing fixtures are marked as skipped so published results stay honest. The table below is generated output (`benchmarks/results/latest.md`); PSNR/RMS compare the optimized copy against the original render, and `inf`/`0.0` denote a pixel-identical (lossless) result.

| Case | Input | Target | Profile | Output | Reduction | Target Hit | Worst PSNR | Worst RMS | Strategy |
|---|---:|---:|---|---:|---:|---|---:|---:|---|
| photo_brochure | 1.10 MB | 0.6 MB | quality | 1.10 MB | 0.1% | No | inf | 0.0 | pikepdf-structural |
| indesign_export | 2.35 MB | 1 MB | balanced | 0.18 MB | 92.3% | Yes | 57.822 | 0.327679 | image-recompress |
| illustrator_export | 0.01 MB | 7 MB | balanced | 0.01 MB | 18.6% | Yes | inf | 0.0 | structural-cleanup |
| private_payload_export | 0.16 MB | 7 MB | quality | 0.16 MB | 0.1% | Yes | inf | 0.0 | structural-cleanup |
| screenshot_report | 0.27 MB | 0.2 MB | quality | 0.09 MB | 66.4% | Yes | inf | 0.0 | structural-cleanup |
| text_vector_document | 0.00 MB | 7 MB | balanced | 0.00 MB | 12.2% | Yes | inf | 0.0 | structural-cleanup |
| scanned_pdf | 0.73 MB | 0.4 MB | balanced | 0.25 MB | 66.6% | Yes | inf | 0.0 | structural-cleanup |
| mixed_transparency | 1.75 MB | 1 MB | quality | 1.75 MB | -0.0% | No | inf | 0.0 | structural-cleanup |
| embedded_metadata | 0.12 MB | 7 MB | balanced | 0.12 MB | 0.1% | Yes | inf | 0.0 | structural-cleanup |
| repeated_images | 0.81 MB | 0.5 MB | balanced | 0.14 MB | 83.2% | Yes | inf | 0.0 | structural-cleanup |
| forms_annotations | 0.01 MB | 7 MB | quality | 0.01 MB | 3.9% | Yes | inf | 0.0 | structural-cleanup |
| encrypted_pdf | - | 7.0 MB | balanced | failed | - | - | - | - | Encrypted PDFs must be unlocked before optimization. |

For `photo_brochure` and `mixed_transparency`, the `quality` profile deliberately misses the target rather than degrading the images, and emits a warning instead of shipping a blurry file.

See [docs/benchmarking.md](docs/benchmarking.md) before adding fixtures.
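The Worst PSNR and Worst RMS columns in the benchmark table appear to follow the usual definitions for 8-bit renders: RMS is the square root of the mean squared pixel difference, and PSNR is 20 * log10(255 / RMS), which is infinite when RMS is 0. As a check, an RMS of 0.327679 gives about 57.82 dB, matching the `indesign_export` row. The helpers below are a pure-Python sketch over flat pixel sequences (presumably evaluated per page, with the worst page reported); they are not the harness's actual code, and the Reduction formula is an assumption consistent with the table.

```python
import math
from typing import Sequence


def rms(original: Sequence[int], optimized: Sequence[int]) -> float:
    """Root-mean-square difference between two equally sized 8-bit renders."""
    if not original or len(original) != len(optimized):
        raise ValueError("renders must be non-empty and the same size")
    squared = sum((a - b) ** 2 for a, b in zip(original, optimized))
    return math.sqrt(squared / len(original))


def psnr_from_rms(rms_value: float, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for pixel-identical renders."""
    if rms_value == 0:
        return math.inf
    return 20 * math.log10(peak / rms_value)


def reduction_percent(input_size: float, output_size: float) -> float:
    """Size reduction as a percentage of the input, rounded to one decimal."""
    return round(100 * (input_size - output_size) / input_size, 1)
```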

## Visual QA

Render and compare two PDFs:

```bash
pdf-email-render-compare original.pdf optimized.pdf --output-dir qa-renders
```

This reports page-level pixel differences and can write original, optimized, and amplified diff PNGs for review.
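An amplified diff scales the per-pixel difference so that small changes become visible in a PNG. The sketch below works on flat lists of 8-bit values; the `gain` of 10 and the helper names `amplified_diff` and `page_summary` are assumptions, not the settings or API of `pdf-email-render-compare`.

```python
from typing import Dict, List, Sequence


def amplified_diff(original: Sequence[int], optimized: Sequence[int], gain: int = 10) -> List[int]:
    """Scale absolute differences by `gain`, clipped to the 8-bit range."""
    if len(original) != len(optimized):
        raise ValueError("renders must have the same number of pixels")
    return [min(255, abs(a - b) * gain) for a, b in zip(original, optimized)]


def page_summary(
    original: Sequence[int], optimized: Sequence[int], tolerance: int = 0
) -> Dict[str, float]:
    """Largest difference and share of pixels that changed by more than `tolerance`."""
    if not original or len(original) != len(optimized):
        raise ValueError("renders must be non-empty and the same size")
    diffs = [abs(a - b) for a, b in zip(original, optimized)]
    changed = sum(1 for d in diffs if d > tolerance)
    return {"max_abs_diff": max(diffs), "changed_fraction": changed / len(diffs)}
```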

## Agent Usage

The repo includes [SKILL.md](SKILL.md) for agent runtimes that load local skills. The short version:

- Use `quality` when the user asks to preserve image fidelity.
- Use `balanced` for ordinary email optimization.
- Use `aggressive` only when visible quality loss is acceptable.
- Report size, target status, strategy, and warnings.
- Never overwrite the source PDF.

More examples are in [docs/agent-usage.md](docs/agent-usage.md).
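The routing rules above are simple enough to encode directly. This hypothetical helper picks a profile from two yes/no signals (an explicit fidelity request wins if both are set, the conservative choice) and formats the items an agent should report. The function names and report wording are illustrative assumptions, not part of SKILL.md.

```python
from typing import Sequence


def choose_profile(preserve_fidelity: bool, quality_loss_ok: bool) -> str:
    """Map the user's stated priorities to a profile name."""
    if preserve_fidelity:
        return "quality"  # an explicit fidelity request always wins
    if quality_loss_ok:
        return "aggressive"
    return "balanced"


def report_line(
    input_mb: float,
    output_mb: float,
    target_hit: bool,
    strategy: str,
    warnings: Sequence[str] = (),
) -> str:
    """Summarize size, target status, strategy, and warnings in one line."""
    parts = [
        f"{input_mb:.2f} MB -> {output_mb:.2f} MB",
        "target hit" if target_hit else "target missed",
        f"strategy: {strategy}",
    ]
    parts.extend(f"warning: {w}" for w in warnings)
    return "; ".join(parts)
```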

## Development

```bash
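python -m pip install -e ".[dev]"   # editable install with the dev extras
pytest                              # run the test suite
pytest --cov                        # run tests with coverage (pytest-cov)
ruff check .                        # lint
python -m build                     # build the sdist and wheel into dist/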
```

CI runs linting, tests, coverage, package build, and CLI smoke checks on Python 3.9-3.13.

## Documentation

- [Installation](docs/installation.md)
- [Examples](docs/examples.md)
- [Benchmarking](docs/benchmarking.md)
- [Compatibility](docs/compatibility.md)
- [JSON output](docs/json-output.md)
- [Agent usage](docs/agent-usage.md)
- [Known limitations](docs/known-limitations.md)
- [Troubleshooting](docs/troubleshooting.md)

## License

[MIT](LICENSE)
