Metadata-Version: 2.4
Name: slimpdf
Version: 0.1.0
Summary: Local-first PDF compressor: shrink PDFs below a target size without sending them to any third-party service.
Project-URL: Homepage, https://github.com/thisis-gp/slimpdf
Project-URL: Repository, https://github.com/thisis-gp/slimpdf
Project-URL: Issues, https://github.com/thisis-gp/slimpdf/issues
Project-URL: Changelog, https://github.com/thisis-gp/slimpdf/blob/main/CHANGELOG.md
Author-email: D A Gurupriyan <am400718@gmail.com>
License: MIT
License-File: LICENSE
Keywords: compression,local-first,offline,optimize,pdf,privacy,qpdf,shrink
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Graphics :: Graphics Conversion
Classifier: Topic :: Utilities
Requires-Python: >=3.11
Requires-Dist: pikepdf>=9.0
Requires-Dist: pillow>=10.0
Requires-Dist: pypdfium2>=4.30
Provides-Extra: benchmark
Requires-Dist: numpy>=1.24; extra == 'benchmark'
Requires-Dist: scikit-image>=0.22; extra == 'benchmark'
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: reportlab>=4.0; extra == 'dev'
Description-Content-Type: text/markdown

# slimpdf

[![PyPI](https://img.shields.io/pypi/v/slimpdf.svg)](https://pypi.org/project/slimpdf/)
[![Python](https://img.shields.io/pypi/pyversions/slimpdf.svg)](https://pypi.org/project/slimpdf/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![CI](https://github.com/thisis-gp/slimpdf/actions/workflows/ci.yml/badge.svg)](https://github.com/thisis-gp/slimpdf/actions/workflows/ci.yml)

Get a PDF under a size limit (say 3 MB for an upload form) without sending it to
some website. slimpdf is a command-line tool and Python library that compresses
PDFs entirely on your own machine — no uploads, no account, no network calls.

```bash
pip install slimpdf
slimpdf compress big-scan.pdf --target 3mb -o small.pdf
```

![slimpdf demo — compress an 8.4 MB PDF to 484 KB fully offline](docs/demo.gif)

## Why this exists

I kept hitting the same wall at work: a scanned hospital bill or KYC document
would be 15–20 MB, the upload form capped at 3 MB, and the only quick fix was to
drop the file into an online compressor. For medical and identity documents,
handing them to a random website is exactly what you don't want to do.

Command-line options exist, but they have rough edges. Ghostscript's presets are
a blunt instrument: `/screen` either over-compresses into mush or doesn't get
small enough, and there's no "just get it under 3 MB" mode. It's also AGPL, so
bundling it into a closed-source product means buying a commercial license.
slimpdf is the tool I wanted instead:

- **It runs locally.** Your files never leave the machine. No telemetry either.
- **It aims at a size, not a vague preset.** Tell it `--target 3mb` and it works
  out how much compression is actually needed, so it doesn't throw away quality
  it didn't have to.
- **It won't quietly wreck a file.** Every result is re-opened and checked (page
  count, text still extractable) before it's kept, and if it can't beat the
  original it leaves the file alone instead of writing a bigger one.
- **MIT, with no AGPL or GPL dependencies.** The runtime dependencies are
  permissive or, in pikepdf's case, MPL-2.0 (file-level copyleft), so you can
  drop it into closed-source software without a lawyer conversation.

On a real set of scanned documents it held its own against Ghostscript —
similar size reduction at noticeably better visual fidelity. Numbers and method
are in [docs/benchmark-results.md](docs/benchmark-results.md).

## Install

```bash
pip install slimpdf
# or, from a source checkout with uv:
git clone https://github.com/thisis-gp/slimpdf && cd slimpdf
uv sync --extra dev
```

## CLI

```bash
# Compress below 3 MB using the claim-upload preset
slimpdf compress input.pdf --target 3mb -o output.pdf --report report.json

# Inspect a PDF (read-only): pages, images, text layer, encryption
slimpdf inspect input.pdf --json

# Batch a folder and emit a benchmark table
slimpdf batch ./samples --target 3mb --out ./compressed --csv bench.csv --md bench.md

# Compare slimpdf against other engines on size AND quality (SSIM)
pip install "slimpdf[benchmark]"      # adds SSIM scoring
slimpdf compare ./samples --out ./out --min-mb 3 --csv compare.csv
```

`compare` picks up Ghostscript and mutool if they're on your PATH and scores
each engine on size, visual fidelity (SSIM) and text retention, so you're not
just trusting that smaller is better. There's a real run in
[docs/benchmark-results.md](docs/benchmark-results.md).

Exit code is `2` when a target was requested but not achieved — handy in scripts.
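
If you drive the CLI from Python, the same exit code can be checked there. This is a minimal sketch, not part of slimpdf: `classify_exit` and `compress_with_cli` are hypothetical helper names, and it assumes that `0` means success and that any other non-zero code is a general failure (only `2` is documented here).

```python
import subprocess

TARGET_MISSED = 2  # documented: a target was requested but not achieved


def classify_exit(returncode: int) -> str:
    """Map a slimpdf exit code to a coarse outcome label."""
    if returncode == 0:  # assumption: 0 is the usual success code
        return "ok"
    if returncode == TARGET_MISSED:
        return "target-missed"
    return "error"  # assumption: anything else is a general failure


def compress_with_cli(src: str, dst: str, target: str = "3mb",
                      run=subprocess.run) -> str:
    """Run `slimpdf compress` and report how it ended.

    `run` is injectable so the wrapper can be exercised without the CLI.
    """
    cmd = ["slimpdf", "compress", src, "--target", target, "-o", dst]
    return classify_exit(run(cmd).returncode)
```

A caller can then branch on `"target-missed"` to, say, retry with a smaller preset instead of treating the run as a hard failure.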

### Presets

| preset         | target | max DPI | quality (start→min) | rasterize |
|----------------|--------|---------|---------------------|-----------|
| `claim-upload` | 3 MB   | 150     | 75 → 55             | off       |
| `screen`       | —      | 120     | 70 → 45             | off       |
| `archive`      | —      | 200     | 85 → 70             | off       |

Override any knob: `--max-dpi`, `--quality`, `--min-quality`, `--target`,
`--allow-rasterize`, `--keep-metadata`, `--password`, `--force-output`.
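
One way to picture how a preset and its overrides combine is the sketch below. It is an illustration only: `Preset`, `PRESETS` and `resolve` are hypothetical names, not slimpdf's internals. The numbers come from the table above, with 3 MB taken as 3,000,000 bytes to match the Python API example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Preset:
    target_bytes: Optional[int]  # None where the table shows no target
    max_dpi: int
    quality: int                 # starting quality
    min_quality: int             # floor the search will not go below
    allow_rasterize: bool = False


# Values copied from the preset table; the structure itself is hypothetical.
PRESETS = {
    "claim-upload": Preset(3_000_000, 150, 75, 55),
    "screen": Preset(None, 120, 70, 45),
    "archive": Preset(None, 200, 85, 70),
}


def resolve(preset: str, **overrides) -> Preset:
    """Start from a named preset and apply only the overrides that were given."""
    given = {key: value for key, value in overrides.items() if value is not None}
    return replace(PRESETS[preset], **given)
```

For example, `resolve("screen", max_dpi=100)` keeps the `screen` quality range but lowers the DPI cap.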

## Python API

```python
from slimpdf import compress, inspect, CompressOptions

info = inspect("input.pdf")
print(info.page_count, info.images_found, info.text_layer_detected)

result = compress(
    "input.pdf", "output.pdf",
    CompressOptions(preset="claim-upload", target_bytes=3_000_000),
)
print(result.compressed_size_bytes, result.target_achieved, result.mode_used)
```
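
The CLI takes sizes like `3mb`, while `target_bytes` is a plain integer. If you need to convert between the two in your own code, a small parser is enough. This sketch assumes decimal units (1 MB = 1,000,000 bytes), which matches the `3_000_000` used above but is not confirmed as the CLI's own rule; `parse_size` is a hypothetical helper, not a slimpdf function.

```python
import re

# Assumption: decimal (SI) units, so "3mb" means 3_000_000 bytes.
_UNITS = {"b": 1, "kb": 1_000, "mb": 1_000_000, "gb": 1_000_000_000}


def parse_size(text: str) -> int:
    """Convert a size such as '3mb' or '500 KB' to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmg]?b)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])
```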

## How it works

There are three compression passes. slimpdf tries them least-destructive first
and keeps the gentlest one that gets you under the target:

1. **Structural** — recompress streams, pack objects, drop junk metadata. Fully
   lossless; sometimes enough on its own for a bloated-but-not-scanned PDF.
2. **Image rewrite** — the main worker. Downsamples and re-encodes the oversized
   embedded images, binary-searching JPEG quality and DPI until it hits the
   target. Text, forms and vector graphics are left intact.
3. **Raster fallback** — off unless you pass `--allow-rasterize`. Flattens each
   page to an image and rebuilds the PDF. It'll hit almost any size, but you lose
   selectable text, form fields and signatures, so it's a last resort.
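
The control flow above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not slimpdf's actual code: the function names are hypothetical, the search covers quality only (the real pass also varies DPI), and `encoded_size` is assumed not to shrink as quality goes up.

```python
from typing import Callable, Iterable, Optional, Tuple


def highest_quality_that_fits(
    encoded_size: Callable[[int], int],
    target_bytes: int,
    start_quality: int,
    min_quality: int,
) -> Optional[int]:
    """Binary-search the highest quality in [min_quality, start_quality] that fits.

    Returns None when even min_quality is too large.
    """
    lo, hi = min_quality, start_quality
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if encoded_size(mid) <= target_bytes:
            best = mid    # fits: try to keep more quality
            lo = mid + 1
        else:
            hi = mid - 1  # too big: compress harder
    return best


def gentlest_pass_that_fits(
    passes: Iterable[Tuple[str, Callable[[], int]]],
    target_bytes: int,
) -> Optional[str]:
    """Run passes in order (least destructive first); stop at the first that fits."""
    for name, run in passes:
        if run() <= target_bytes:
            return name
    return None
```

Because the passes are tried lazily, a later and more destructive pass never runs when an earlier one already meets the target.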

## Demo

The animation at the top (`docs/demo.gif`) is generated from a real slimpdf run
on a synthetic file — regenerate it any time with:

```bash
uv run python docs/make_demo.py     # writes docs/demo.gif
```

(A [VHS](https://github.com/charmbracelet/vhs) tape, `docs/demo.tape`, is also
included if you prefer recording a live terminal.)

## Notes

**It really is offline.** slimpdf makes no network calls and writes nothing
outside the paths you give it. Safe to run on documents you wouldn't upload.

**What if it can't hit the target?** It returns the smallest valid result it
found and reports `target_achieved: false` (CLI exit code `2`). It never ships a
file larger than the input unless you pass `--force-output`.
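
That fallback rule can be written down as a small decision function. This is a sketch of the behaviour described here, not slimpdf's code: `Candidate`, `Decision` and `decide` are hypothetical names, candidates are assumed to arrive gentlest first, and treating an equal-size result as "no better than the original" is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    size_bytes: int
    valid: bool  # passed the re-open checks (page count, text still extractable)


@dataclass(frozen=True)
class Decision:
    chosen: Optional[Candidate]  # None means "keep the original file"
    target_achieved: bool


def decide(original_size: int, candidates: Iterable[Candidate],
           target_bytes: int, force_output: bool = False) -> Decision:
    valid = [c for c in candidates if c.valid]
    fitting = [c for c in valid if c.size_bytes <= target_bytes]
    if fitting:
        best = fitting[0]  # gentlest candidate that meets the target
    else:
        best = min(valid, key=lambda c: c.size_bytes, default=None)
    # Assumption: a result no smaller than the input is dropped unless forced.
    if best is not None and best.size_bytes >= original_size and not force_output:
        best = None
    final_size = best.size_bytes if best is not None else original_size
    return Decision(best, final_size <= target_bytes)
```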

**Scanned PDFs with no text layer** compress on image content alone; there's
nothing to preserve text-wise, so they tend to shrink the most.

## Licensing

slimpdf is **MIT**. None of the runtime dependencies is AGPL or GPL; they are
permissive, apart from pikepdf's MPL-2.0, which is a file-level (weak) copyleft:

| dependency  | license        | role                                                    |
|-------------|----------------|---------------------------------------------------------|
| `pikepdf`   | MPL-2.0        | parsing + structural rewrite (bundles qpdf, Apache-2.0) |
| `pypdfium2` | BSD-3/Apache   | page rendering (bundles PDFium, BSD-3)                  |
| `Pillow`    | MIT-CMU (HPND) | image encode/decode                                     |

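So you can use it inside a proprietary product without opening your own source;
MPL-2.0 only asks that changes to pikepdf's own files be shared. PyMuPDF and
Ghostscript would both have been easier in places, but they're AGPL (unless you
buy a commercial license), which is the whole reason they're not here.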

## Status

Alpha — works well on the documents I've thrown at it, but it hasn't seen the
long tail yet. Two things to know: any rewrite invalidates a PDF's digital
signature (that's unavoidable when you change the bytes), and a few unusual image
filters/colorspaces are skipped with a warning rather than risk producing a
broken file. Bug reports with a sample PDF are very welcome.
