Metadata-Version: 2.4
Name: pdfblah
Version: 0.3.0
Summary: Real find and replace on the actual text in a PDF. No overlay, metadata preserved.
Project-URL: Homepage, https://pdfblah.com
Project-URL: Source, https://github.com/KuvopLLC/pdfblah
Project-URL: Issues, https://github.com/KuvopLLC/pdfblah/issues
Author: Kuvop LLC
License-Expression: MIT
License-File: LICENSE
Keywords: acrobat-alternative,cli,edit,find,pdf,pdf-editor,redact,replace,text
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Text Processing
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Requires-Dist: pdfplumber>=0.11
Requires-Dist: pikepdf>=8
Provides-Extra: test
Requires-Dist: pytest; extra == 'test'
Requires-Dist: reportlab; extra == 'test'
Description-Content-Type: text/markdown

# pdfblah

[![PyPI](https://img.shields.io/pypi/v/pdfblah)](https://pypi.org/project/pdfblah/)
[![Python](https://img.shields.io/pypi/pyversions/pdfblah)](https://pypi.org/project/pdfblah/)
[![CI](https://github.com/KuvopLLC/pdfblah/actions/workflows/ci.yml/badge.svg)](https://github.com/KuvopLLC/pdfblah/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

Real find and replace on the actual text in a PDF, from the command line.

![pdfblah demo](https://pdfblah.com/demo.gif?v=2)

Most tools "edit" a PDF by painting a box over the old text and drawing new text
on top, which leaves the original underneath (copy and paste still reveals it) and
often adds a watermark. `pdfblah` rewrites the real text in the content stream, so:

- the old text is genuinely gone (`pdftotext`, Ctrl-F, and copy show only the new value)
- no overlay, no watermark
- the original metadata (dates, Producer, XMP) is preserved byte for byte
- alignment is auto-detected and kept, so right-aligned numbers stay flush
- fonts it cannot reproduce are refused instead of garbled

Pure Python (pdfplumber + pikepdf). No system dependencies.

## Install

```sh
pipx install pdfblah      # recommended, isolated; or:  pip install pdfblah
```

On a Mac with Homebrew, use Homebrew's pipx:

```sh
brew install pipx && pipx install pdfblah
```

Also works with [uv](https://github.com/astral-sh/uv): `uv tool install pdfblah`.

## Use

Replace the first match:

```sh
pdfblah in.pdf out.pdf --find "Old Name" --replace "New Name"
```

Options:

```
--scope all         change every match           (default: first)
--scope 3           change the 3rd match
--ci                ignore case
--word              whole word only ("cat" will not match "category")
--page 2            only page 2
--replace ""        delete the text
```
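
To make the matching flags concrete, here is a minimal sketch on a plain string using Python's `re` module. It does not call pdfblah and is not how pdfblah locates text inside a PDF content stream. `find_matches` is a hypothetical helper that only mirrors the documented meaning of `--ci`, `--word`, and `--scope`.

```python
import re

def find_matches(text, find, ci=False, word=False, scope="first"):
    """Return the (start, end) spans that the given flags would select in `text`.

    Hypothetical illustration only; not part of the pdfblah API.
    """
    pattern = re.escape(find)
    if word:
        # Whole word only: "cat" must not match inside "category".
        pattern = r"(?<!\w)" + pattern + r"(?!\w)"
    flags = re.IGNORECASE if ci else 0
    spans = [m.span() for m in re.finditer(pattern, text, flags)]
    if scope == "all":
        return spans
    # "first" selects the 1st match; an integer N selects the Nth (1-based).
    n = 1 if scope == "first" else int(scope)
    return spans[n - 1:n]

text = "cat category Cat cat"
print(find_matches(text, "cat"))                             # default: first match
print(find_matches(text, "cat", word=True, scope="all"))     # skips "category"
print(find_matches(text, "cat", ci=True, word=True, scope=3))
```

An out-of-range scope (for example the 5th match when only three exist) yields an empty list in this sketch; how pdfblah itself reports that case is not covered here.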

Apply many rules from a file, one `FIND | REPLACE | FLAGS` rule per line:

```sh
pdfblah in.pdf out.pdf --rules rules.txt
```

```
# rules.txt
Old Company Name | New Company Name | all
CONFIDENTIAL DRAFT | FINAL | ci
Jane Doe | John Smith | all word
Total | Sum | 2
delete this phrase |
```
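
One way to read the rule format above, sketched in plain Python. This is not pdfblah's own parser (the library exposes `parse_rules_file` for that). `parse_rule_line` and the dict layout it returns are hypothetical, and the sketch assumes that `#` starts a comment line, that flags are separated by spaces, and that FIND and REPLACE contain no `|`.

```python
def parse_rule_line(line):
    """Split one `FIND | REPLACE | FLAGS` line into a dict.

    Returns None for blank lines and comments. Hypothetical helper,
    inferred from the example rules file only.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    parts = [p.strip() for p in stripped.split("|")]
    parts += [""] * (3 - len(parts))  # a missing REPLACE or FLAGS field becomes ""
    find, replace, flags = parts[0], parts[1], parts[2].split()
    rule = {"find": find, "replace": replace,
            "scope": "first", "ci": False, "word": False}
    for flag in flags:
        if flag in ("ci", "word"):
            rule[flag] = True
        elif flag == "all":
            rule["scope"] = "all"
        elif flag.isdigit():
            rule["scope"] = int(flag)  # Nth match, as with --scope 3
        else:
            raise ValueError(f"unknown flag: {flag!r}")
    return rule

for line in ["# rules.txt", "Jane Doe | John Smith | all word", "delete this phrase |"]:
    print(parse_rule_line(line))
```

Under these assumptions, an empty REPLACE field (as in the last rule) maps to an empty replacement string, which matches the documented `--replace ""` deletion.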

## Library

```python
from pdfblah import process, apply_rules, parse_rules_file

process("in.pdf", "out.pdf", "999.00", "42.00", scope="all", ci=True)
```

Each call returns a report dict (`ok`, `count`, `refused`, `reason`, ...).
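
A minimal sketch of acting on that report. The key names come from the list above, but their exact types and semantics are assumptions here (`ok` and `refused` as booleans, `count` as an integer, `reason` as a string). `summarize_report` is a hypothetical helper, and the sample dicts are hand-written, not real pdfblah output.

```python
def summarize_report(report):
    """Turn a report dict into a one-line status message.

    Uses .get() throughout because the presence and types of the
    keys are assumed, not confirmed.
    """
    if report.get("refused"):
        return f"refused: {report.get('reason', 'no reason given')}"
    if report.get("ok"):
        return f"ok: {report.get('count', 0)} replacement(s)"
    return f"failed: {report.get('reason', 'unknown')}"

# Hand-written sample reports for illustration only.
print(summarize_report({"ok": True, "count": 2}))
print(summarize_report({"ok": False, "refused": True, "reason": "font not embedded"}))
```

In a script, the dict returned by `process(...)` could be passed straight to a helper like this, with a non-zero exit when the edit was refused.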

## What it does not do

Scanned PDFs (image only, no text layer) cannot be edited. Fonts that are neither
embedded nor standard, or that use a custom encoding, are refused rather than
rendered wrong. This is by design: a wrong-looking edit is worse than a clear "no".

## Hosted version

Want it without installing anything, or for a non-technical colleague? The hosted
version at **[pdfblah.com](https://pdfblah.com)** does the same edit in the browser:
upload, preview for free, download.

## License

MIT, (c) 2026 Kuvop LLC.
