Metadata-Version: 2.4
Name: pdfluent
Version: 1.0.0b6
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Multimedia :: Graphics
Classifier: Topic :: Office/Business
Classifier: Topic :: Text Processing
Requires-Dist: numpy ; extra == 'numpy'
Requires-Dist: pillow ; extra == 'pillow'
Requires-Dist: pytest>=7.0 ; extra == 'test'
Provides-Extra: numpy
Provides-Extra: pillow
Provides-Extra: test
License-File: LICENSE
Summary: Enterprise PDF SDK — render, extract, annotate, sign, and validate PDFs. Pure Rust, zero system dependencies.
Keywords: pdf,render,text-extraction,forms,pdf-a,annotations,redaction,rust,pdfluent
Author-email: Innovation Trigger BV <hello@pdfluent.com>
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Bug Tracker, https://pdfluent.com/support
Project-URL: Documentation, https://pdfluent.com/docs
Project-URL: Homepage, https://pdfluent.com
Project-URL: How-to Guides, https://pdfluent.com/how-to
Project-URL: Pricing, https://pdfluent.com/pricing

# pdfluent

**Enterprise PDF SDK for Python — built on a pure-Rust stack, zero system dependencies.**

Render pages, extract text, fill forms, annotate, redact, encrypt, merge, and validate PDF/A — all from a single `pip install`.

## Installation

```bash
pip install pdfluent

# Optional extras
pip install "pdfluent[pillow]"   # Pillow (PIL Image) support
pip install "pdfluent[numpy]"    # NumPy array support
```

> Requires Python ≥ 3.8. Pre-built wheels are provided for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64).
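Rendering to PIL Images or NumPy arrays depends on the optional extras above. Below is a minimal sketch that checks which extras are importable before you call those conversions. It uses only the standard library. `PIL` and `numpy` are the import names Pillow and NumPy install, and `available_extras` is a hypothetical helper, not part of pdfluent.

```python
import importlib.util
from typing import Dict


def available_extras() -> Dict[str, bool]:
    """Report which optional pdfluent extras can be imported.

    Hypothetical helper (not part of pdfluent). It only checks whether the
    modules installed by the `pillow` and `numpy` extras are importable.
    """
    # Pillow installs the top-level module "PIL"; NumPy installs "numpy".
    modules = {"pillow": "PIL", "numpy": "numpy"}
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in modules.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        status = "available" if ok else f'missing (pip install "pdfluent[{extra}]")'
        print(f"{extra}: {status}")
```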

## Quick Start

```python
from pdfluent import Document

# Open, inspect, render
with Document("invoice.pdf") as doc:
    print(f"{doc.page_count} pages — {doc.metadata.title}")

    img = doc[0].render(dpi=150)
    img.save("page_0.png")          # requires Pillow

# Extract text
doc = Document("report.pdf")
for page in doc:
    print(page.extract_text())

# Fill a form field and save
doc = Document("form.pdf")
doc.set_form_field("Name", "Jane Doe")
doc.save("form_filled.pdf")

# Search-and-redact
doc = Document("contract.pdf")
report = doc.redact_text("Confidential")
print(f"Redacted {report.areas_redacted} areas on {report.pages_affected} pages")
doc.save("contract_redacted.pdf")

# PDF/A validation
from pdfluent import validate_pdfa

report = validate_pdfa("archive.pdf")
if report.is_compliant:
    print(f"✓ {report.pdfa_level} compliant")
else:
    for issue in report.issues:
        print(f"[{issue.severity}] {issue.rule}: {issue.message}")

# Merge PDFs
from pdfluent import merge_pdfs
merge_pdfs(["a.pdf", "b.pdf", "c.pdf"], "merged.pdf")

# Encrypt / decrypt
doc = Document("sensitive.pdf")
doc.encrypt("sensitive_enc.pdf", password="s3cr3t")

from pdfluent import decrypt_pdf
decrypt_pdf("sensitive_enc.pdf", "sensitive_dec.pdf", password="s3cr3t")
```
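For brevity, the Quick Start passes `password="s3cr3t"` as a literal. In real scripts the secret is usually kept out of source code. The sketch below assumes the password is stored in an environment variable. The variable name `PDFLUENT_PDF_PASSWORD` and the helper `pdf_password` are hypothetical, and nothing in this README suggests pdfluent reads that variable on its own.

```python
import os


def pdf_password(var: str = "PDFLUENT_PDF_PASSWORD") -> str:
    """Read a PDF password from the environment instead of hard-coding it.

    Hypothetical helper: the default variable name is only an illustration.
    """
    password = os.environ.get(var)
    if not password:
        # Fail loudly rather than encrypting with an empty or missing password.
        raise RuntimeError(f"Set the {var} environment variable first")
    return password
```

You can then pass the result wherever the Quick Start uses a literal, e.g. `doc.encrypt("sensitive_enc.pdf", password=pdf_password())`.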

## Features

| Feature | Description |
|---|---|
| **Render** | Pages to RGBA pixels, PIL Images, or NumPy arrays at any DPI |
| **Text extraction** | Plain text or structured `TextBlock`/`TextSpan` with position |
| **Text search** | Find pages containing a query string |
| **Forms (AcroForm)** | Read and fill text, checkbox, and dropdown fields |
| **Annotations** | Read existing annotations; add highlights and free-text notes |
| **Redaction** | Search-and-redact: black-box all occurrences of a string |
| **Encryption** | AES-256 (PDF 2.0) encrypt/decrypt with user and owner passwords |
| **Merge / split** | Merge multiple PDFs; split into individual pages (via page slicing) |
| **PDF/A validation** | Validate against PDF/A-1b, PDF/A-2b, and PDF/A-3b with issue-level reporting |
| **Metadata** | Read title, author, subject, keywords, creator, producer |
| **Bookmarks** | Traverse the document outline tree |
| **Thumbnails** | Fast downscaled preview images |
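This README does not say how `search(query)` matches text, for example whether it ignores case or line breaks. If you need case-insensitive matching that also survives line breaks in extracted text, you can apply your own rule to the output of `Page.extract_text()`. The sketch below does that. `find_pages` is a hypothetical helper, and the 0-based page indices follow the `doc[0]` convention used in this README.

```python
import re
from typing import Iterable, List


def find_pages(page_texts: Iterable[str], query: str) -> List[int]:
    """Return 0-based indices of pages whose text contains `query`.

    Matching is case-insensitive and treats any run of whitespace as a single
    space, so a phrase split across lines in the extracted text still matches.
    """
    def normalise(text: str) -> str:
        return re.sub(r"\s+", " ", text).strip().casefold()

    needle = normalise(query)
    return [i for i, text in enumerate(page_texts) if needle in normalise(text)]
```

With a real document this would be called as `find_pages((page.extract_text() for page in doc), "payment terms")`.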

## API Overview

### `Document(source, password=None)`

Opens a PDF from a file path (`str`) or raw bytes.

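```python
from pathlib import Path
from pdfluent import Document

doc = Document("file.pdf")                      # from a path (str)
doc = Document(Path("file.pdf").read_bytes())   # from bytes; read_bytes() closes the file
doc = Document("encrypted.pdf", password="pw")  # encrypted PDF
```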
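The constructor is documented as accepting a `str` path or raw bytes. This README does not say whether it also takes `pathlib.Path` objects or other bytes-like buffers. The sketch below narrows such inputs to the two documented types. `as_document_source` is a hypothetical adapter, not a pdfluent function.

```python
import os
from typing import Union


def as_document_source(source: object) -> Union[str, bytes]:
    """Narrow a path-like or bytes-like value to `str` or `bytes`.

    Hypothetical adapter: `str` paths and raw `bytes` are the two source
    types this README documents for `Document(source, password=None)`.
    """
    if isinstance(source, (bytes, bytearray, memoryview)):
        return bytes(source)        # raw PDF content, copied into immutable bytes
    if isinstance(source, (str, os.PathLike)):
        return os.fsdecode(source)  # Path("a.pdf") -> "a.pdf"
    raise TypeError(f"expected a path or PDF bytes, got {type(source).__name__}")
```

Usage: `Document(as_document_source(Path("file.pdf")))`.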

- **Properties:** `page_count`, `metadata`, `bookmarks`
- **Methods:** `render_all(dpi)`, `search(query)`, `extract_text(page_num)`, `save(path)`, `get_form_fields()`, `set_form_field(name, value)`, `get_annotations(page)`, `add_annotation(page, type, rect, content)`, `redact_text(term, page=None)`, `encrypt(path, password)`, `decrypt(path, password)`
- **Protocols:** `len(doc)`, `doc[0]`, `for page in doc`, `with Document(...) as doc`
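`redact_text(term, page=None)` takes one search term per call. To black out several terms, you can call it once per term and total the reported areas. The sketch below uses only the method and report field shown in this README (`redact_text`, `areas_redacted`). `redact_terms` is a hypothetical helper. It deliberately does not add up `pages_affected`, because several terms can hit the same page.

```python
from typing import Dict, Iterable, Optional


def redact_terms(doc, terms: Iterable[str],
                 page: Optional[int] = None) -> Dict[str, int]:
    """Redact each term in turn; return the number of areas redacted per term.

    `doc` is expected to be an open pdfluent `Document`. Only the documented
    `redact_text(term, page=None)` method and the report's `areas_redacted`
    field are used, so any object with that shape works (handy for tests).
    """
    areas: Dict[str, int] = {}
    for term in terms:
        report = doc.redact_text(term, page=page)
        # Accumulate in case the same term appears twice in `terms`.
        areas[term] = areas.get(term, 0) + report.areas_redacted
    return areas
```

For example, `counts = redact_terms(doc, ["Confidential", "Internal"])`, then `sum(counts.values())` for a total and `doc.save(...)` as in the Quick Start.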

### `Page`

- **Properties:** `index`, `width`, `height`, `rotation`, `geometry`
- **Methods:** `render(dpi, width, height, background)`, `thumbnail(max_dimension)`, `extract_text()`, `extract_text_blocks()`
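This README does not state the units of `Page.width` and `Page.height`. If they are PDF points (1/72 inch, the default PDF user-space unit), the pixel size of `render(dpi=...)` follows from `points × dpi / 72`. A thumbnail bounded by `max_dimension` can likewise be estimated by scaling the longer side. The sketch below works under those assumptions. It ignores `rotation`, its rounding rule may differ from pdfluent's by a pixel, and both helper names are hypothetical.

```python
from typing import Tuple


def render_size(width_pt: float, height_pt: float, dpi: float) -> Tuple[int, int]:
    """Estimate the pixel size of a page rendered at `dpi`.

    Assumes `width_pt`/`height_pt` are PDF points (1 pt = 1/72 inch).
    """
    scale = dpi / 72.0                      # pixels per point
    return round(width_pt * scale), round(height_pt * scale)


def thumbnail_size(width_pt: float, height_pt: float,
                   max_dimension: int) -> Tuple[int, int]:
    """Estimate a thumbnail size whose longer side equals `max_dimension`.

    Keeps the page's aspect ratio; never returns a side smaller than 1 pixel.
    """
    longest = max(width_pt, height_pt)
    if longest <= 0:
        raise ValueError("page dimensions must be positive")
    scale = max_dimension / longest
    return max(1, round(width_pt * scale)), max(1, round(height_pt * scale))
```

For a US Letter page (612 × 792 pt), `render_size(612, 792, 150)` gives `(1275, 1650)`.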

### `RenderedImage`

- **Properties:** `width`, `height`, `pixels` (raw RGBA bytes)
- **Methods:** `to_pil()` (requires the `pillow` extra), `to_numpy()` (requires the `numpy` extra), `save(path)` (requires Pillow)
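`pixels` is described as raw RGBA bytes. If the buffer uses the common layout (row-major, top row first, 4 bytes per pixel, no padding between rows, so `len(pixels) == width * height * 4`), you can inspect pixels without Pillow or NumPy, for example to detect blank pages. The sketch below relies on that layout. `pixel_at` and `is_uniform` are hypothetical helpers, so compare their results with `to_numpy()` output before relying on them.

```python
from typing import Tuple


def pixel_at(pixels: bytes, width: int, height: int,
             x: int, y: int) -> Tuple[int, ...]:
    """Return the (r, g, b, a) values at column x, row y of a packed RGBA buffer.

    Assumes row-major order, top row first, 4 bytes per pixel, no row padding.
    """
    if len(pixels) != width * height * 4:
        raise ValueError("buffer size does not match width * height * 4")
    if not (0 <= x < width and 0 <= y < height):
        raise IndexError(f"pixel ({x}, {y}) outside {width}x{height} image")
    offset = (y * width + x) * 4            # 4 bytes (R, G, B, A) per pixel
    return tuple(pixels[offset:offset + 4])


def is_uniform(pixels: bytes) -> bool:
    """True if every pixel in a packed RGBA buffer is identical (e.g. a blank page)."""
    first = pixels[:4]
    return len(pixels) % 4 == 0 and pixels == first * (len(pixels) // 4)
```

Usage: `img = doc[0].render(dpi=72)`, then `pixel_at(img.pixels, img.width, img.height, 0, 0)` or `is_uniform(img.pixels)`.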

### Module-level functions

| Function | Description |
|---|---|
| `open_pdf(path, password=None)` | Alias for `Document(path, password=password)` |
| `merge_pdfs(paths, output)` | Merge a list of PDFs |
| `validate_pdfa(path)` → `ComplianceReport` | Run PDF/A validation |
| `decrypt_pdf(input, output, password)` | Decrypt to a new file |
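A `ComplianceReport` exposes `is_compliant`, `pdfa_level`, and a list of `issues`, each with `severity`, `rule`, and `message` (as used in the Quick Start). For documents with many issues, a per-severity summary is often easier to read than the full list. The sketch below relies only on those attributes. This README does not document the set of severity values, so the sketch counts whatever values appear. `summarize_issues` is a hypothetical helper.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def summarize_issues(report) -> Tuple[Counter, Dict[str, List[str]]]:
    """Count issues per severity and list the distinct rules behind each.

    Uses only `report.issues` and each issue's `severity` and `rule`.
    Returns (counts, rules): counts maps severity -> number of issues,
    rules maps severity -> sorted list of distinct rule identifiers.
    """
    counts: Counter = Counter()
    rules = defaultdict(set)
    for issue in report.issues:
        severity = str(issue.severity)      # type not documented; normalise to str
        counts[severity] += 1
        rules[severity].add(str(issue.rule))
    return counts, {sev: sorted(found) for sev, found in rules.items()}
```

For example, `counts, rules = summarize_issues(validate_pdfa("archive.pdf"))`, then print `counts.most_common()`.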

## Comparison

| | pdfluent | pypdf | pdfminer | pdfplumber | pikepdf |
|---|---|---|---|---|---|
| Rendering | ✓ | – | – | ✓ (via pypdfium2) | – |
| Text extraction | ✓ | ✓ | ✓ | ✓ | – |
| Form fill | ✓ | ✓ | – | – | ✓ |
| Redaction | ✓ | – | – | – | ✓ |
| Encryption | ✓ (AES-256) | ✓ | – | – | ✓ |
| PDF/A validation | ✓ | – | – | – | – |
| Native deps | **none** | none | none | none | libqpdf |
| Language | **Rust** | Python | Python | Python | C++ |

## Building from Source

Source code is available to licensed customers only (see [pricing](https://pdfluent.com/pricing)). Building it requires a Rust toolchain and `maturin`.

```bash
pip install maturin
maturin develop --release            # build and install into the current virtualenv
maturin build --release --out dist   # build a wheel into ./dist/
```

## License

PDFluent is **proprietary, commercial software** distributed under the PDFluent Commercial License.

- [Pricing & licensing](https://pdfluent.com/pricing)
- Commercial enquiries: hello@pdfluent.com

Pre-built wheels (`pip install pdfluent`) are available for evaluation. Production use requires a valid license. See the `LICENSE` file included in this distribution for full terms.

