Metadata-Version: 2.4
Name: collate-pdf
Version: 0.2.0
Summary: Objective file-comparison engine for the Print With Synergy stack — reports measured plate ↔ 1-up differences as neutral FACTS (coverage / geometry deltas + presence + diff images). Tolerance + pass/fail policy lives in lint.
Author-email: Think Neverland <dev@thinkneverland.com>
License-Expression: AGPL-3.0-or-later
License-File: LICENSE
Requires-Python: >=3.12
Requires-Dist: codex-pdf>=1.37.0
Requires-Dist: fastapi>=0.115
Requires-Dist: gunicorn>=23.0
Requires-Dist: httpx>=0.27
Requires-Dist: numpy>=1.26
Requires-Dist: pillow>=10.0
Requires-Dist: pydantic>=2.8
Requires-Dist: python-multipart>=0.0.9
Requires-Dist: uvicorn>=0.30
Description-Content-Type: text/markdown

# collate

**The objective file-comparison engine for the [Print With Synergy](https://github.com/printwithsynergy) stack.**

collate answers one question — *"do these two files match, and where exactly do
they differ?"* — and answers it with **measured facts**, never a verdict. Its
first capability is **plate ↔ 1-up** comparison: align a set of decoded
separation plates (1-bit TIFF / Esko LEN) against an approved 1-up PDF and report
the per-ink coverage and geometry differences, which separations are missing or
extra, and (optionally) a per-ink visual difference image and an AI visual note.

## Where collate sits in the stack

The Print With Synergy engines split by **who owns what**:

| Engine | Owns |
|---|---|
| **codex** | Extraction — single-file **facts** (per-separation coverage, screen ruling/angle, Pantone, dieline…). |
| **collate** | Comparison — two-file **difference facts** (coverage/geometry deltas, presence, diff images). |
| **lint** | Policy — **rules + verdicts** (`LPDF_PLATE_CMP_*`, pass/fail, tolerances). |
| **lens** | Display — the visual inspection UX. |

collate is an **objective-layer** engine: it states the numbers. Whether a
3-point coverage delta or a 1.5 mm geometry shift is *acceptable* is a tolerance
decision, and tolerances are **policy** — they live in lint, not here. collate
ships **no** `coverage_mismatch` flag, **no** `overall_match` roll-up, and **no**
tolerance constant. It measures; lint judges.

collate owns no raster primitives of its own — it reuses
[`codex-pdf`](https://pypi.org/project/codex-pdf/) for plate decode, the
Ghostscript `tiffsep` separation render, ink normalization, and the Pantone
catalogue. codex stays the extraction layer; collate is comparison built on top.

## API

| Route | Purpose |
|---|---|
| `GET /healthz` | Liveness + capability flags (Ghostscript availability, codex version). |
| `GET /readyz` | Readiness (the PDF render path degrades gracefully without Ghostscript). |
| `GET /v1/contract` | Engine / version / schema / routes / capabilities. |
| `POST /v1/compare/plates` | Compare a plate set (1+ TIFF/LEN files) against a 1-up PDF → `CollateCompareResult`. |

`POST /v1/compare/plates` is `multipart/form-data`:

- `plates` — one or more separation files (repeat the field per file).
- `pdf` — the approved 1-up PDF.
- `page` (default `1`), `dpi` (default `150`), `ai` (default `false`),
  `diff_images` (default `false`).

```bash
curl -sS https://<collate-host>/v1/compare/plates \
  -F plates=@cyan.tif -F plates=@magenta.tif -F plates=@black.tif \
  -F pdf=@approved-1up.pdf \
  -F dpi=150 -F diff_images=true
```

The response is neutral comparison facts — see
[`CollateCompareResult`](src/collate/models/compare.py). Errors are RFC 7807
Problem Details (`application/problem+json`).

When Ghostscript is unavailable the PDF side **self-skips**: the result carries
the plate-side facts, `pdf_rendered: false`, and a note. A consumer must not read
that as a clean compare — lint floors it to `INCONCLUSIVE`.

## Library use

The same comparison runs in process via the client (no HTTP), which is how lint
consumes collate when they share a host:

```python
from collate.client import CollateClient

client = CollateClient()  # in-process; or CollateClient(base_url="https://…") for HTTP
result = client.compare_plates(
    [(cyan_bytes, "cyan.tif"), (black_bytes, "black.tif")],
    pdf_bytes,
    dpi=150,
    diff_images=True,
)
for ink in result.inks:
    print(ink.ink_name, ink.presence, ink.coverage_delta_percent)
```

## Local dev

The distribution is published as **`collate-pdf`** (the bare `collate` name is
taken on PyPI), matching the `codex-pdf` family; the import package is `collate`.

```bash
pip install -e . pytest httpx ruff   # pulls codex-pdf from the index
ruff check src tests
pytest                                # gs-free (the PDF side is monkeypatched)
uvicorn collate.api.main:app --reload --port 8080
```

The comparison's PDF-render path needs **Ghostscript** (`gs`) at runtime; the
test suite is gs-free, but a real `POST /v1/compare/plates` against a PDF needs
it installed (the Docker image ships it).

## License

AGPL-3.0-or-later, matching the engine stack.
