Metadata-Version: 2.4
Name: exis-pdfeditor
Version: 3.8.0
Summary: Python wrapper for Exis.PdfEditor — comprehensive PDF toolkit (find/replace, merge, split, forms, redaction, watermark, encryption, signatures, PDF/A, and more)
Author-email: Exis LLC <support@exisone.com>
License-Expression: LicenseRef-Proprietary
Project-URL: Homepage, https://pdfbatcheditor.com/developers
Project-URL: Documentation, https://pdfbatcheditor.com/developers
Keywords: pdf,pdf-editor,find-replace,merge,split,form-filling,redaction,watermark,encryption,digital-signatures,pdf-a,text-extraction
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Office/Business
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# Exis.PdfEditor for Python

Comprehensive PDF toolkit for Python — find/replace, merge, split, form filling, redaction, image editing, watermark, Bates stamping, page editing, encryption, optimization, digital signatures, PDF/A compliance, XMP + `/Info` metadata, OCR for scanned PDFs, and more.

Powered by the [Exis.PdfEditor](https://pdfbatcheditor.com/developers) .NET library, compiled to a native binary via .NET Native AOT. **No .NET runtime required.**

## Installation

```bash
pip install exis-pdfeditor
```

Platform-specific wheels are available for:
- Windows x64 (`win_amd64`)
- Linux x64 (`manylinux_2_17_x86_64`)
- macOS ARM64 / Apple Silicon (`macosx_11_0_arm64`)
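
To check ahead of time whether one of the wheels listed above matches the current machine, a minimal standard-library sketch might look like the following. The helper name `expected_wheel_tag` and the machine-name normalization are illustrative assumptions and not part of exis-pdfeditor; pip performs the real tag matching itself, including the glibc check implied by `manylinux_2_17`, which this sketch does not attempt.

```python
import platform

# Wheel tags from the list above, keyed by (system, normalized machine).
# The normalization rules below are an assumption for illustration only.
SUPPORTED_WHEELS = {
    ("Windows", "x86_64"): "win_amd64",
    ("Linux", "x86_64"): "manylinux_2_17_x86_64",
    ("Darwin", "arm64"): "macosx_11_0_arm64",
}


def expected_wheel_tag(system=None, machine=None):
    """Return the matching wheel tag, or None if no listed wheel applies."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if machine == "amd64":        # Windows reports AMD64 for x64
        machine = "x86_64"
    elif machine == "aarch64":    # Linux name for 64-bit ARM
        machine = "arm64"
    return SUPPORTED_WHEELS.get((system, machine))


print(expected_wheel_tag() or "no listed wheel for this platform")
```

A `None` result (for example on Intel macOS or ARM64 Linux) suggests that `pip install` will not find a matching binary wheel.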

## Quick start

```python
import exis_pdfeditor

# Optional: set a license key (or use EXIS_PDF_LICENSE_KEY env var).
# Without a key, a 14-day trial is activated automatically.
exis_pdfeditor.initialize("XXXX-XXXX-XXXX-XXXX")

# Inspect a PDF (free — no license required)
info = exis_pdfeditor.inspect("document.pdf")
print(f"{info.pageCount} pages, title: {info.title}")

# Find and replace text
result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    "old text", "new text",
    case_sensitive=False,
)
print(f"{result.totalReplacements} replacements made")
```

## Features

| Feature | Function |
|---------|----------|
| **Inspect** | `inspect(path)` — metadata, fonts, pages, encryption status |
| **Text extraction** | `extract_text(path)`, `extract_text_structured(path)` |
| **Find & replace** | `find_replace(...)` — regex, case-insensitive, whole-word, styling |
| **Merge** | `merge(paths, output)` |
| **Split** | `split(path, output_dir)` |
| **Extract pages** | `extract_pages(path, output, pages=[1,3,5])` |
| **Form fields** | `list_fields(path, split_duplicate_widgets=...)`, `fill_form(..., text_alignment=..., split_duplicate_widgets=..., flatten=...)` |
| **Redaction** | `redact(path, output, redactions)` |
| **Watermark** | `watermark(path, output, "DRAFT", position="across")` |
| **Stamp** | `stamp(path, output, stamp_pdf, mode="overlay")` |
| **Bates stamping** | `bates_stamp(path, output, prefix="ABC", digits=6)` |
| **Optimize** | `optimize(path, output, downsample_images=True)` |
| **Encrypt / Decrypt** | `encrypt(...)`, `decrypt(...)` |
| **Page editing** | `rotate(...)`, `crop(...)`, `reorder(...)`, `delete_pages(...)`, `insert_blank_pages(...)` |
| **Images** | `find_images(path)`, `replace_image(...)` |
| **Digital signatures** | `sign(...)`, `verify(...)` |
| **PDF/A** | `pdfa_validate(path)`, `pdfa_convert(path, output)` |
| **Metadata (XMP + `/Info`)** | `get_metadata(path)`, `set_xmp(...)`, `set_info(...)`, `remove_xmp(...)`, `remove_info(...)`, `remove_metadata(...)` |
| **Page classification** | `analyze_pages(path)` — pre-OCR triage (free, no license) |
| **OCR — searchable PDF** *(Windows)* | `make_searchable_pdf(path, output, languages=..., progress=...)` |
| **OCR — de-identify scans** *(Windows)* | `redact_scanned_pdf(path, output, terms=..., visible_replacement=...)` |

---

## Platform support

| Function group | Windows | Linux | macOS |
|---|:-:|:-:|:-:|
| All non-OCR functions (find/replace, merge, forms, redaction, watermark, optimization, signatures, PDF/A, metadata, …) | yes | yes | yes |
| `analyze_pages` — page classification | yes | yes | yes |
| `make_searchable_pdf`, `redact_scanned_pdf` — OCR | yes | not yet — raises `OcrNotSupportedError` | not yet — raises `OcrNotSupportedError` |

OCR is currently Windows-only because the page rasterizer (`Exis.PdfOcr.Windows`) depends on the WinRT `Windows.Data.Pdf` API and `System.Drawing.Common`. When a cross-platform rasterizer ships, the Python side will pick it up automatically.
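
Code that must run on all three platforms can treat OCR as optional. The sketch below takes the OCR function and the exception class as arguments, so it makes no claim about where `OcrNotSupportedError` is exposed. `ocr_or_copy` is a hypothetical helper name, and copying the input unchanged is just one possible fallback policy.

```python
import shutil


def ocr_or_copy(src, dst, ocr_func, not_supported_exc):
    """Try OCR; if the platform does not support it, copy the file unchanged.

    Returns True when OCR ran, False when the fallback copy was used.
    """
    try:
        ocr_func(src, dst)
        return True
    except not_supported_exc:
        # OCR is unavailable here: keep the pipeline moving with the original.
        shutil.copyfile(src, dst)
        return False
```

A call might pass `exis_pdfeditor.make_searchable_pdf` as `ocr_func` and the package's `OcrNotSupportedError` class as `not_supported_exc`; the import location of that class is an assumption to verify against your installed version.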

---

## Inspect

Retrieve metadata, page info, fonts, encryption status, and form field counts. No license required.

```python
info = exis_pdfeditor.inspect("document.pdf")

print(f"Version:    {info.version}")
print(f"Pages:      {info.pageCount}")
print(f"Title:      {info.title}")
print(f"Author:     {info.author}")
print(f"Encrypted:  {info.isEncrypted}")
print(f"Has forms:  {info.hasFormFields} ({info.formFieldCount} fields)")
print(f"Fonts:      {info.fontsUsed}")

for page in info.pages:
    print(f"  Page {page.pageNumber}: {page.widthInPoints}x{page.heightInPoints} pt, "
          f"{page.characterCount} chars")
```

## Text extraction

### Plain text

```python
result = exis_pdfeditor.extract_text("document.pdf")
print(result.fullText)

for page in result.pages:
    print(f"Page {page.pageNumber}: {page.text[:100]}...")
```

### Specific pages only

```python
result = exis_pdfeditor.extract_text("document.pdf", pages=[1, 3, 5])
```

### Structured text (with positions and font data)

```python
result = exis_pdfeditor.extract_text_structured("document.pdf")

for page in result.pages:
    for block in page.textBlocks:
        print(f"  [{block.x:.0f}, {block.y:.0f}] {block.fontName} {block.fontSize}pt: "
              f"{block.text}")
```

## Find & replace

### Single pair

```python
result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    "old text", "new text",
)
print(f"{result.totalReplacements} replacements across {result.pagesModified} pages")
```

### Multiple pairs

```python
result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    pairs=[
        {"search": "foo", "replace": "bar"},
        {"search": "hello", "replace": "world"},
    ],
)
```

### Regex

```python
result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    pairs=[
        {"search": r"\d{3}-\d{2}-\d{4}", "replace": "XXX-XX-XXXX", "isRegex": True},
    ],
)
```
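
Before running a pattern against a PDF, it can help to preview it on plain text, for example text returned by `extract_text`. The sketch below uses Python's own `re` module; the library may use a different regex dialect, so treat this as a sanity check for simple patterns only. `preview_replacements` is a hypothetical helper, not a library function.

```python
import re

SSN_PATTERN = r"\d{3}-\d{2}-\d{4}"


def preview_replacements(text, pattern, replacement):
    """Return (new_text, match_count) using Python's re module."""
    return re.subn(pattern, replacement, text)


new_text, count = preview_replacements(
    "SSN 123-45-6789 on file; alt 987-65-4321.",
    SSN_PATTERN,
    "XXX-XX-XXXX",
)
print(count, new_text)
```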

### All options

```python
result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    "confidential", "[REDACTED]",
    case_sensitive=False,
    whole_word=True,
    use_regex=False,
    page_range=[1, 2, 3],
    text_fitting="adaptive",         # none, preserve_width, fit_to_page, adaptive
    min_horizontal_scale=70,
    max_font_size_reduction=1.5,
    replacement_text_color={"r": 1, "g": 0, "b": 0},
    replacement_highlight_color={"r": 1, "g": 1, "b": 0},
    replacement_bold=True,
    replacement_underline=False,
    replacement_strikethrough=False,
    preserve_form_fields=True,
    use_incremental_update=True,
)

for detail in result.details:
    print(f"  Page {detail.pageNumber}: '{detail.originalText}' -> '{detail.replacementText}'")
```

## Merge

```python
exis_pdfeditor.merge(["part1.pdf", "part2.pdf", "part3.pdf"], "merged.pdf")
```

## Split

Split a PDF into one file per page:

```python
result = exis_pdfeditor.split("document.pdf", "output_folder/")
print(f"Split into {result.pageCount} files")
for path in result.files:
    print(f"  {path}")
```

## Extract pages

Extract specific pages into a new PDF:

```python
exis_pdfeditor.extract_pages("document.pdf", "pages_1_and_3.pdf", pages=[1, 3])
```

## Form fields

Each field has a PDF `name`, an optional `displayName` (a smart label derived from nearby page text), `type`, `value`, `options` (choice fields), `isReadOnly`, and `hasDuplicateWidgets` (true when one logical field has multiple widgets at different positions, e.g. two-up receipts). Pass `split_duplicate_widgets=True` to `list_fields` / `fill_form` to address each widget separately with the keys `name_1`, `name_2`, … (in page-reading order).

### List all fields

```python
fields = exis_pdfeditor.list_fields("form.pdf")

for field in fields:
    print(f"  {field.name} ({field.type}): {field.value}")
    if field.displayName:
        print(f"    Label: {field.displayName}")
    if field.options:
        print(f"    Options: {field.options}")
    if field.hasDuplicateWidgets:
        print("    (duplicate widgets — use split_duplicate_widgets=True for per-widget names)")
```

### List with duplicate-widget split (two-up / carbon copy)

```python
fields = exis_pdfeditor.list_fields("two-up-receipt.pdf", split_duplicate_widgets=True)
# e.g. Address_1, Address_2, date_1, date_2, ...
```

### Fill a form

```python
result = exis_pdfeditor.fill_form(
    "form.pdf", "filled.pdf",
    fields={
        "FirstName": "John",
        "LastName": "Doe",
        "Email": "john@example.com",
        "AgreeToTerms": "Yes",
    },
)
print(f"Filled {result.fieldsFilled} fields, {result.fieldsNotFound} not found")
```

### Force text alignment (default honors each field's PDF `/Q`)

```python
result = exis_pdfeditor.fill_form(
    "form.pdf", "filled_centered.pdf",
    fields={"Comments": "Centered text"},
    text_alignment="center",  # "auto" | "left" | "center" | "right"
)
```

### Fill duplicate-widget fields (different value per widget; auto-flattens)

```python
result = exis_pdfeditor.fill_form(
    "two-up-receipt.pdf", "filled.pdf",
    fields={
        "Address_1": "Mr. David G Cruz",
        "date_1": "10-Jan-67",
        "Address_2": "Ms. Claudia Morales",
        "date_2": "22-Feb-26",
    },
    split_duplicate_widgets=True,
)
```
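
When the per-widget values already live in lists, the `name_1`, `name_2`, … keys can be generated instead of typed by hand. This is a small sketch with a hypothetical helper, `per_widget_fields`; it assumes the list order matches the page-reading order of the widgets.

```python
def per_widget_fields(values_by_field):
    """Expand {"name": [v1, v2, ...]} into {"name_1": v1, "name_2": v2, ...}."""
    fields = {}
    for name, values in values_by_field.items():
        # Widget keys are numbered from 1, one per value in the list.
        for index, value in enumerate(values, start=1):
            fields[f"{name}_{index}"] = value
    return fields


fields = per_widget_fields({
    "Address": ["Mr. David G Cruz", "Ms. Claudia Morales"],
    "date": ["10-Jan-67", "22-Feb-26"],
})
print(fields)
```

The resulting dict can be passed as `fields=` together with `split_duplicate_widgets=True`.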

### Fill and flatten (make fields non-editable)

```python
result = exis_pdfeditor.fill_form(
    "form.pdf", "filled_flat.pdf",
    fields={"Name": "Jane Doe"},
    flatten=True,
)
```

## Redaction

### Redact text

```python
result = exis_pdfeditor.redact(
    "document.pdf", "redacted.pdf",
    redactions=[
        {"text": "John Doe", "replaceWith": "[NAME REDACTED]"},
        {"text": r"\d{3}-\d{2}-\d{4}", "isRegex": True, "replaceWith": "[SSN]"},
    ],
)
print(f"{result.redactionsApplied} redactions applied")
```

### Redact a specific area on a page

```python
result = exis_pdfeditor.redact(
    "document.pdf", "redacted.pdf",
    redactions=[
        {
            "area": {"x": 100, "y": 700, "width": 200, "height": 20},
            "pageNumber": 1,
        },
    ],
)
```

### Combined text + area redaction

```python
result = exis_pdfeditor.redact(
    "document.pdf", "redacted.pdf",
    redactions=[
        {"text": "confidential", "caseSensitive": False},
        {"area": {"x": 50, "y": 50, "width": 500, "height": 30}, "pageNumber": 2},
    ],
)
```

## Page classification (pre-OCR triage)

`analyze_pages` classifies every page as `Digital`, `Scanned`, `AlreadyOcrd`, or `Empty` purely from the content stream — no rendering and no OCR. Use it to predict which pages will need OCR before paying for a run. **Free, no license required, available on every wheel.**

```python
pages = exis_pdfeditor.analyze_pages("mixed.pdf")
for p in pages:
    print(f"  page {p.pageNumber}: {p.kind}  "
          f"(text {p.textCoverageRatio:.0%}, images {p.imageCoverageRatio:.0%})")

scanned = [p.pageNumber for p in pages if p.kind == "Scanned"]
if scanned:
    print(f"{len(scanned)} pages need OCR: {scanned}")
```

Each entry has `pageNumber`, `kind`, `textCharCount`, `textCoverageRatio`, `imageCoverageRatio`, and `hasInvisibleTextLayer` (signature of an already-OCR'd page).
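
For routing decisions it is often handy to group page numbers by `kind`. The sketch below works on any objects with `pageNumber` and `kind` attributes; the sample data is a stand-in built with `SimpleNamespace`, not real `analyze_pages` output, and `group_pages_by_kind` is a hypothetical helper.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_pages_by_kind(pages):
    """Map each `kind` to the sorted list of page numbers with that kind."""
    groups = defaultdict(list)
    for p in pages:
        groups[p.kind].append(p.pageNumber)
    return {kind: sorted(nums) for kind, nums in groups.items()}


# Stand-in entries shaped like the ones described above.
sample = [
    SimpleNamespace(pageNumber=1, kind="Digital"),
    SimpleNamespace(pageNumber=2, kind="Scanned"),
    SimpleNamespace(pageNumber=3, kind="Scanned"),
]
print(group_pages_by_kind(sample))
```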

## Searchable PDFs (OCR) — Windows only

`make_searchable_pdf` adds an **invisible, selectable text layer** over scanned pages so find/replace, extraction, and redaction work on the result like any digital PDF. Born-digital and already-OCR'd pages pass through untouched; only true scans are rasterized and recognized, which keeps processing fast on mixed documents.

```python
result = exis_pdfeditor.make_searchable_pdf(
    "scan.pdf", "searchable.pdf",
    languages=("eng",),                   # default
    progress=lambda p: print(f"  page {p.page}/{p.total}: {p.phase}"),
)

print(f"OCR'd {result.pagesProcessed}, skipped {result.pagesSkipped}, "
      f"avg confidence {result.averageConfidence:.0%}")

# Confidence is never swallowed — flag low-confidence pages for human review.
for page in result.pages:
    if page.wasOcrd and page.confidence < 0.80:
        print(f"  page {page.pageNumber} needs review (conf {page.confidence:.0%})")
```

### Additional languages

The wheel ships English (`eng.traineddata`). To add languages, drop the matching `xxx.traineddata` files into the wheel's bundled `tessdata/` folder, or place them in your own folder and point at it:

```python
result = exis_pdfeditor.make_searchable_pdf(
    "scan.pdf", "searchable.pdf",
    languages=("eng", "spa", "vie"),
    tessdata_path=r"C:\my-app\tessdata",
)
```

The invisible text layer embeds a Unicode font (DejaVu Sans), so accented and non-Latin names (e.g. *Nguyễn*, *Peña*) survive find/replace intact rather than being mangled.

### Options

| Parameter | Default | Notes |
|---|---|---|
| `languages` | `("eng",)` | Tesseract language codes |
| `dpi` | `300` | Rendering DPI for non-image pages (full-page scans use their native resolution) |
| `apply_deskew` | `True` | Pre-process hint (engine-dependent) |
| `apply_denoise` | `False` | Pre-process hint (engine-dependent) |
| `min_confidence_to_include` | `0.0` | Words below this are still included (dropping them would risk leaking names you need to match) but counted in per-page confidence |
| `tessdata_path` | bundled | Override the trained-data folder |
| `progress` | `None` | Callback receiving one progress object with `page`, `total`, `phase`, and `confidence` attributes |

## De-identify scanned pages — Windows only

`redact_scanned_pdf` finds terms on scanned pages via OCR and **burns opaque boxes into the actual page image**. Unlike OCR-layer redaction (which only changes selectable text), this destroys the underlying pixels — the term is gone from the visible scan and unrecoverable, appropriate for sending de-identified scans externally.

```python
r = exis_pdfeditor.redact_scanned_pdf(
    "intake_scan.pdf", "intake_redacted.pdf",
    terms=["Jane Doe", "555-12-3456"],
    visible_replacement="[REDACTED]",     # drawn into the raster
    fill="white",                         # or "black"
)

print(f"Redacted {r.occurrencesRedacted} occurrences on {r.pagesAffected} pages")
if r.pagesSkipped:
    print(f"WARNING: {r.pagesSkipped} pages could not be safely burned — "
          "review them before sending the output.")

for occ in r.occurrences:
    print(f"  page {occ.pageNumber}: {occ.term!r} (conf {occ.confidence:.0%})")
```

> **Scope:** this handles image-based pages. Born-digital text pages are NOT modified — use `redact(...)` for those.
>
> Pages that are image-based but not a single full-page image are reported in `pagesSkipped` and left unchanged. A non-zero value means the output is **not safe to send as-is**.

The output is rewritten via the optimizer to drop the superseded original image objects, so the un-redacted pixels are not recoverable with PDF forensics.
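
Because a non-zero `pagesSkipped` means the output still contains unredacted image content, a pipeline may want to fail loudly instead of printing a warning. The following is a minimal sketch of such a gate; `UnsafeRedactionOutput` and `require_fully_redacted` are hypothetical names, and the only assumption about the result object is that it has a `pagesSkipped` attribute as used above.

```python
from types import SimpleNamespace


class UnsafeRedactionOutput(Exception):
    """Raised when some image-based pages were left unredacted."""


def require_fully_redacted(result):
    """Return the result unchanged, or raise if any pages were skipped."""
    if result.pagesSkipped:
        raise UnsafeRedactionOutput(
            f"{result.pagesSkipped} page(s) were skipped; review before sending"
        )
    return result


# Stand-in result object for illustration only.
require_fully_redacted(SimpleNamespace(pagesSkipped=0))
```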

## Watermark

### Basic diagonal watermark

```python
result = exis_pdfeditor.watermark("input.pdf", "watermarked.pdf", "DRAFT")
print(f"Watermarked {result.pagesWatermarked} of {result.totalPages} pages")
```

### All watermark options

```python
result = exis_pdfeditor.watermark(
    "input.pdf", "watermarked.pdf",
    "CONFIDENTIAL",
    position="across",          # top, bottom, center, across
    font_size=72,
    text_color={"r": 1, "g": 0, "b": 0},
    opacity=0.15,
    page_range=[1, 2],
)
```

## Stamp (PDF overlay / underlay)

### Overlay a letterhead on top

```python
result = exis_pdfeditor.stamp(
    "document.pdf", "stamped.pdf",
    "letterhead.pdf",
    mode="overlay",
)
print(f"Stamped {result.pagesStamped} pages")
```

### Underlay a background behind content

```python
result = exis_pdfeditor.stamp(
    "document.pdf", "stamped.pdf",
    "background.pdf",
    mode="underlay",
    opacity=0.5,
    page_range=[1],
)
```

## Bates stamping

Sequential page numbering for legal production and discovery workflows. Each page receives
a zero-padded identifier (e.g. `ABC000001`, `ABC000002`) rendered in a chosen corner of the
visual page. Placement is relative to the page's `/Rotate` orientation, so mixed-rotation
documents render consistently. An XMP audit block recording the range, digit width, and
prefix/suffix is written to the document catalog by default.
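
The identifier format described above can be reproduced in plain Python, which is useful for building a production log alongside the stamped files. This is a sketch of the padding rule only; `bates_label` is a hypothetical helper, and the library's own rendering may differ in details not covered here.

```python
def bates_label(number, prefix="", suffix="", digits=6):
    """Format a Bates identifier, zero-padded to at least `digits` characters."""
    # The format spec pads with zeros but never truncates, so a number
    # wider than `digits` simply uses more characters.
    return f"{prefix}{number:0{digits}d}{suffix}"


print(bates_label(1, prefix="ABC"))
print(bates_label(1234567, prefix="ABC"))
```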

### Defaults (number starts at 1, 6 digits, bottom-right corner)

```python
result = exis_pdfeditor.bates_stamp("input.pdf", "stamped.pdf")
print(f"Stamped pages {result.firstNumber}-{result.lastNumber}")
print(f"Digits used: {result.digitsUsed}")
```

### Prefix, custom position, color, confidentiality label

```python
result = exis_pdfeditor.bates_stamp(
    "input.pdf", "stamped.pdf",
    prefix="ABC",
    start_number=1,
    digits=6,                                 # -> "ABC000001"
    position="bottom_right",                  # top_left, top_center, top_right,
                                              # bottom_left, bottom_center, bottom_right
    font_size=10,
    text_color={"r": 0, "g": 0, "b": 0},
    background_color={"r": 1, "g": 1, "b": 1},  # opaque box behind stamp
    margin_inches=0.5,
    confidentiality_label="CONFIDENTIAL",     # stacked above the Bates number
)
```

### Continuous numbering across a batch

Thread `lastNumber + 1` into the next call:

```python
next_n = 1
for path in docs:
    r = exis_pdfeditor.bates_stamp(
        path, path + ".stamped.pdf",
        prefix="ABC", start_number=next_n,
    )
    next_n = r.lastNumber + 1
```

### Skip the cover page (legal convention)

The cover is `ABC000001` in the production log even if physically unstamped:

```python
exis_pdfeditor.bates_stamp(
    "input.pdf", "stamped.pdf",
    prefix="ABC",
    skip_first_page=True,
    counter_advances_on_skipped_pages=True,   # default
)
```
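
To reason about which number lands on which page, the two options can be modeled in a few lines. The sketch below reflects one reading of the option names (a skipped cover either consumes a number or does not); it is an assumption for illustration, not the library's implementation, and `assign_bates_numbers` is a hypothetical helper.

```python
def assign_bates_numbers(page_count, start_number=1, skip_first_page=False,
                         counter_advances_on_skipped_pages=True):
    """Return {page_number: bates_number} for the pages that get stamped."""
    assigned = {}
    counter = start_number
    for page in range(1, page_count + 1):
        skipped = skip_first_page and page == 1
        if not skipped:
            assigned[page] = counter
        # A skipped page consumes a number only when the counter advances.
        if not skipped or counter_advances_on_skipped_pages:
            counter += 1
    return assigned


print(assign_bates_numbers(3, skip_first_page=True))
```

Under this model, the default leaves the cover's number reserved, so page 2 is stamped with the second number in the sequence.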

### Stamp only selected pages

```python
exis_pdfeditor.bates_stamp(
    "input.pdf", "stamped.pdf",
    page_range=[2, 3, 5],                     # 1-based
)
```

### Signed input

Stamping invalidates signatures — opt in explicitly:

```python
result = exis_pdfeditor.bates_stamp(
    "signed.pdf", "stamped.pdf",
    allow_signed_input=True,                  # result.warnings will record it
)
for w in result.warnings:
    print(f"  warning: {w}")
```

### Suppress the XMP audit block

```python
exis_pdfeditor.bates_stamp(
    "input.pdf", "stamped.pdf",
    write_xmp_metadata=False,
)
```

### All options at a glance

| Parameter | Default | Purpose |
|-----------|---------|---------|
| `prefix` / `suffix` | `""` | Text bracketing the number |
| `start_number` | `1` | First Bates number (for batch continuation) |
| `digits` | `6` | Minimum zero-padded width; auto-expands if needed |
| `position` | `"bottom_right"` | Corner/edge on the visual page |
| `font_size` | `10.0` | Point size (Helvetica) |
| `text_color` | black | RGB dict `{"r": 0, "g": 0, "b": 0}` |
| `background_color` | none | Opaque rectangle behind stamp for legibility |
| `margin_inches` | `0.5` | Distance from trimmed page edge |
| `confidentiality_label` | none | e.g. `"CONFIDENTIAL"` |
| `confidentiality_position` | same corner | Override to place label elsewhere |
| `confidentiality_font_size` | matches `font_size` | Label point size |
| `page_range` | all pages | 1-based page numbers to stamp |
| `skip_first_page` | `False` | Skip page 1 (cover sheets) |
| `counter_advances_on_skipped_pages` | `True` | Advance counter on skipped pages |
| `allow_signed_input` | `False` | Stamp PDFs with digital signatures |
| `write_xmp_metadata` | `True` | Write an XMP audit block to the catalog |

**Result fields:** `firstNumber`, `lastNumber`, `pagesStamped`, `digitsUsed` (equals
`digits` unless auto-expanded), `warnings` (non-fatal diagnostics: digit expansion,
signed-input stamped anyway, etc.).

## Optimize

### Default optimization (compress + deduplicate)

```python
result = exis_pdfeditor.optimize("large.pdf", "smaller.pdf")
print(f"Reduced {result.originalSize:,} -> {result.optimizedSize:,} bytes "
      f"({result.reductionPercent:.1f}% smaller)")
print(f"  Streams compressed: {result.streamsCompressed}")
print(f"  Duplicates removed: {result.duplicatesRemoved}")
```

### With image downsampling

```python
result = exis_pdfeditor.optimize(
    "large.pdf", "smaller.pdf",
    downsample_images=True,
    max_image_dpi=150,
    remove_metadata=True,
)
print(f"Images downsampled: {result.imagesDownsampled}")
```

## Encrypt & decrypt

### Encrypt with a password

```python
exis_pdfeditor.encrypt(
    "document.pdf", "protected.pdf",
    user_password="openme",
    owner_password="secret",
    permissions=["Print", "CopyText"],
)
```

Available permissions: `Print`, `ModifyContents`, `CopyText`, `AddAnnotations`,
`FillForms`, `PrintHighQuality`, `All`.
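
A typo in a permission name is easy to miss, so it may be worth checking names before calling `encrypt`. The sketch below validates against the list above; whether the library itself rejects unknown names or treats them case-insensitively is not stated here, so this check is an assumption-free guard on the caller's side. `check_permissions` is a hypothetical helper.

```python
KNOWN_PERMISSIONS = frozenset({
    "Print", "ModifyContents", "CopyText", "AddAnnotations",
    "FillForms", "PrintHighQuality", "All",
})


def check_permissions(permissions):
    """Return the names as a list, or raise ValueError naming unknown entries."""
    unknown = [p for p in permissions if p not in KNOWN_PERMISSIONS]
    if unknown:
        raise ValueError(f"unknown permission(s): {', '.join(unknown)}")
    return list(permissions)


print(check_permissions(["Print", "CopyText"]))
```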

### Decrypt

```python
exis_pdfeditor.decrypt("protected.pdf", "unlocked.pdf", password="openme")
```

## Page editing

### Rotate pages

```python
result = exis_pdfeditor.rotate("input.pdf", "rotated.pdf", angle=90)
print(f"Rotated {result.pagesModified} pages")

# Rotate only specific pages
result = exis_pdfeditor.rotate("input.pdf", "rotated.pdf", angle=180, pages=[2, 4])
```

### Crop pages

```python
result = exis_pdfeditor.crop(
    "input.pdf", "cropped.pdf",
    rect={"x": 50, "y": 50, "width": 500, "height": 700},
)
```

### Reorder pages

```python
exis_pdfeditor.reorder("input.pdf", "reordered.pdf", order=[3, 1, 2])
```

### Delete pages

```python
exis_pdfeditor.delete_pages("input.pdf", "trimmed.pdf", pages=[2, 4])
```

### Insert blank pages

Each insertion specifies an `afterPage` anchor, the 1-based number of the page that the
blank page follows (`0` inserts before page 1), and an optional page size. Supported
sizes: `letter` (default), `legal`, `a4`, `a3`, `a5`, `tabloid`.

```python
exis_pdfeditor.insert_blank_pages(
    "input.pdf", "with_blanks.pdf",
    insertions=[
        {"afterPage": 0, "size": "a4"},       # before page 1
        {"afterPage": 3, "size": "letter"},   # after page 3
    ],
)
```
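
The anchor semantics in the example (`0` before page 1, `3` after page 3) can be previewed without touching a PDF. The sketch below assumes every `afterPage` value refers to the original page numbering, which the text above does not state explicitly; `layout_after_insertions` is a hypothetical helper.

```python
def layout_after_insertions(page_count, insertions):
    """Return the resulting page sequence; blanks appear as 'blank:<size>'."""
    blanks = {}
    for ins in insertions:
        # Group requested blank pages by the original page they follow.
        blanks.setdefault(ins["afterPage"], []).append(ins.get("size", "letter"))
    layout = [f"blank:{size}" for size in blanks.get(0, [])]
    for page in range(1, page_count + 1):
        layout.append(page)
        layout.extend(f"blank:{size}" for size in blanks.get(page, []))
    return layout


print(layout_after_insertions(4, [
    {"afterPage": 0, "size": "a4"},
    {"afterPage": 3, "size": "letter"},
]))
```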

## Images

### Find all images

```python
result = exis_pdfeditor.find_images("document.pdf")
print(f"Found {result.totalImages} images across {result.pagesSearched} pages")

for img in result.images:
    print(f"  Image {img.index}: {img.pixelWidth}x{img.pixelHeight} "
          f"{img.colorSpace} {img.format}")
    print(f"    Pages: {img.pageNumbers}")
```

### Find images and save to disk

```python
result = exis_pdfeditor.find_images("document.pdf", output_dir="extracted_images/")
```

### Replace all images

```python
result = exis_pdfeditor.replace_image(
    "document.pdf", "replaced.pdf",
    "new_logo.png",
)
print(f"Replaced {result.imagesReplaced} of {result.imagesFound} images")
```

### Replace specific images by index or page

```python
result = exis_pdfeditor.replace_image(
    "document.pdf", "replaced.pdf",
    "new_logo.jpg",
    image_indices=[0, 2],
    page_range=[1],
    scale_mode="scale_to_fit",  # match_original_size, preserve_aspect_ratio, scale_to_fit
)
```

## Digital signatures

### Sign a PDF

```python
exis_pdfeditor.sign(
    "document.pdf", "signed.pdf",
    cert_path="certificate.pfx",
    cert_password="certpass",
    reason="Approved",
    location="New York, NY",
    signer_name="John Doe",
)
```

### Sign with a visible signature

```python
exis_pdfeditor.sign(
    "document.pdf", "signed.pdf",
    cert_path="certificate.pfx",
    cert_password="certpass",
    visible=True,
    page=1,
    rect={"x": 50, "y": 50, "width": 200, "height": 60},
    reason="Reviewed and approved",
)
```

### Verify signatures

```python
info = exis_pdfeditor.verify("signed.pdf")
print(f"Signed: {info.isSigned}")
print(f"Signer: {info.signerName}")
print(f"Valid:  {info.isValid}")
print(f"Reason: {info.reason}")
print(f"Date:   {info.signDate}")

# Verify all signatures in a multi-signed document
signatures = exis_pdfeditor.verify("multi_signed.pdf", all_signatures=True)
for sig in signatures:
    print(f"  {sig.signerName}: valid={sig.isValid}")
```

## PDF/A compliance

### Validate

```python
result = exis_pdfeditor.pdfa_validate("document.pdf", level="2b")
print(f"Compliant: {result.isCompliant}")

if not result.isCompliant:
    for v in result.violations:
        print(f"  [{v.code}] {v.message} (auto-fixable: {v.canAutoFix})")
```

### Convert to PDF/A

```python
exis_pdfeditor.pdfa_convert("document.pdf", "archive.pdf", level="2b")
```

Supported levels: `1b`, `2b`, `2u`, `3b`, `3u`.

## XMP + `/Info` metadata

PDFs carry document metadata in two places:

- **XMP** — the modern RDF/XML packet referenced from the Catalog `/Metadata` entry.
- **`/Info`** — the legacy trailer dictionary (`Title`, `Author`, `Subject`, `Keywords`,
  `Creator`, `Producer`, `CreationDate`, `ModDate`, plus custom keys).

`get_metadata()` is free. It also works on encrypted PDFs, where only the XMP packet is returned.
All mutation calls (`set_xmp`, `set_info`, `remove_xmp`, `remove_info`, `remove_metadata`)
require a license and reject encrypted input — decrypt first with `decrypt()`.

### Read metadata

```python
meta = exis_pdfeditor.get_metadata("document.pdf")

if meta.hasXmp:
    print(f"XMP packet: {meta.xmpByteSize} bytes")
    print(meta.xmpXml)          # Full <?xpacket ... ?> payload as UTF-8 text

if meta.hasInfo:
    info = meta.info
    print(f"Title:    {info.title}")
    print(f"Author:   {info.author}")
    print(f"Subject:  {info.subject}")
    print(f"Keywords: {info.keywords}")
    print(f"Creator:  {info.creator}")
    print(f"Producer: {info.producer}")
    print(f"Created:  {info.creationDate}")        # ISO 8601 string or None
    print(f"Modified: {info.modificationDate}")    # same

    # Non-standard /Info keys (e.g. "Company", custom producer fields)
    for key, value in vars(info.custom).items():
        print(f"  [custom] {key} = {value}")
```

### Replace the XMP packet

Pass a full `<?xpacket ...?>`-wrapped RDF/XML document as a string:

```python
xmp = """<?xpacket begin=""?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Quarterly Report</rdf:li></rdf:Alt></dc:title>
      <dc:creator><rdf:Seq><rdf:li>Finance Team</rdf:li></rdf:Seq></dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""

exis_pdfeditor.set_xmp("input.pdf", "output.pdf", xmp)
```

Passing an empty string writes an **empty** packet (still present, no data).
To drop the `/Metadata` reference entirely, use `remove_xmp()`.

### Replace the `/Info` dictionary

The PDF must already have an `/Info` reference in its trailer; otherwise
this raises `PdfEditorError`. Fields you omit are written as *missing* (not
empty strings) so "not set" semantics are preserved.

```python
from datetime import datetime, timezone

exis_pdfeditor.set_info(
    "input.pdf", "output.pdf",
    info={
        "title":    "Quarterly Report",
        "author":   "Finance Team",
        "subject":  "Q1 2026 earnings summary",
        "keywords": "earnings, Q1, 2026",
        "creator":  "exis-pdfeditor",
        "producer": "Exis.PdfEditor 3.6",
        "creationDate":     datetime(2026, 4, 1, 9, 0, tzinfo=timezone.utc),
        "modificationDate": datetime.now(timezone.utc),
        # Arbitrary non-standard /Info keys
        "custom": {
            "Company":  "Exis LLC",
            "Revision": "v3.6.4",
        },
    },
)
```

Dates can be provided as `datetime` / `date` objects or ISO 8601 strings
(e.g. `"2026-04-20T09:00:00Z"`). Snake_case aliases `creation_date` and
`modification_date` are also accepted.
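
If dates are stored as strings, for example in a JSON job file, an ISO 8601 value in the form shown above can be produced with the standard library. `to_iso8601_utc` is a hypothetical helper; it rejects naive datetimes so a missing time zone is not silently guessed.

```python
from datetime import datetime, timezone


def to_iso8601_utc(dt):
    """Format a timezone-aware datetime as an ISO 8601 UTC string ending in 'Z'."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_iso8601_utc(datetime(2026, 4, 20, 9, 0, tzinfo=timezone.utc)))
```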

### Remove metadata (wipe for privacy)

```python
# Drop the XMP packet only
exis_pdfeditor.remove_xmp("input.pdf", "stripped-xmp.pdf")

# Empty the /Info dict (trailer reference kept, contents cleared)
exis_pdfeditor.remove_info("input.pdf", "stripped-info.pdf")

# Drop both in one incremental update (smaller output than calling both)
exis_pdfeditor.remove_metadata("input.pdf", "stripped.pdf")
```

## Diagnostic structure dump

When a PDF fails to process and you can't share the file, `dump_structure()` produces
a self-contained report you can paste into a bug report. It walks every object in the
file, tallies filter chains and font subtypes, lists encryption details, and records
any streams that fail to decode, so the problem can be diagnosed without the original file. No license required.

### Human-readable report

```python
dump = exis_pdfeditor.dump_structure("problem.pdf")

# The dump has a built-in text report — just print it
# (or paste it into a support email / bug report)
print(dump)
```

### Individual fields

```python
dump = exis_pdfeditor.dump_structure("problem.pdf")

print(f"PDF version:    {dump.version}")
print(f"Pages:          {dump.pageCount}")
print(f"Total objects:  {dump.totalObjects}")
print(f"Stream objects: {dump.streamObjectCount}")
print(f"Xref entries:   {dump.xrefEntryCount}")
print(f"Xref streams:   {dump.usesXrefStreams}")
print(f"Producer:       {dump.producer}")
print(f"Creator:        {dump.creator}")

# Encryption details
print(f"Encrypted:      {dump.isEncrypted}")
if dump.isEncrypted:
    print(f"  Version:      {dump.encryptionVersion}")
    print(f"  Revision:     {dump.encryptionRevision}")
    print(f"  Key length:   {dump.encryptionKeyLengthBits} bits")

# Catalog flags
print(f"AcroForm:       {dump.hasAcroForm}")
print(f"Signed:         {dump.hasDigitalSignature}")
print(f"Embedded files: {dump.hasEmbeddedFiles}")

# Filter chains — which compression methods are used and how often
for chain in dump.filterChains:
    print(f"  {chain}: {dump.filterChains[chain]} streams")

# Font subtypes
for subtype in dump.fontSubtypes:
    print(f"  /{subtype}: {dump.fontSubtypes[subtype]}")

# Streams that failed to decode (capped at 50)
for bad in dump.unsupportedStreams:
    print(f"  obj {bad.objectNumber} [{bad.filterChain}]: {bad.error}")

# Free-form notes about parse anomalies
for note in dump.notes:
    print(f"  {note}")
```

---

## Licensing

- **Free**: `inspect()`, `analyze_pages()`, `pdfa_validate()`, `dump_structure()`, and `get_metadata()` work without a license.
- **Trial**: Call `exis_pdfeditor.initialize()` with no key for a 14-day full-feature trial.
- **Licensed**: Pass your key to `exis_pdfeditor.initialize("XXXX-XXXX-XXXX-XXXX")` or set the `EXIS_PDF_LICENSE_KEY` environment variable.
- **Evaluation**: After trial expiry, all features work on documents up to 3 pages.
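
In deployments where silently falling back to trial or evaluation mode would be a problem, the key lookup can be made explicit at startup. This is a minimal sketch with a hypothetical helper, `resolve_license_key`; only the `EXIS_PDF_LICENSE_KEY` variable name comes from the list above.

```python
import os


def resolve_license_key(explicit_key=None, environ=None):
    """Pick the explicit key, else EXIS_PDF_LICENSE_KEY, else None."""
    environ = os.environ if environ is None else environ
    key = explicit_key or environ.get("EXIS_PDF_LICENSE_KEY", "")
    # Treat empty or whitespace-only values as "no key configured".
    return key.strip() or None


key = resolve_license_key()
print("license key configured" if key else "no key: trial or evaluation mode")
```

A caller could then pass a non-`None` result to `exis_pdfeditor.initialize(key)`, or stop with an error when a key is required.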

Purchase a license at [pdfbatcheditor.com/developers](https://pdfbatcheditor.com/developers).

## Requirements

- Python 3.9+
- No external dependencies — the native binary is bundled in the wheel.
