Metadata-Version: 2.4
Name: pdf2ofd
Version: 0.1.0
Summary: Convert PDF files to OFD (Open Fixed-layout Document) format
Project-URL: Homepage, https://github.com/wanglrebe/pdf2ofd
Project-URL: Issues, https://github.com/wanglrebe/pdf2ofd/issues
Author: wanglrebe
License: MIT
License-File: LICENSE
Keywords: chinese,conversion,converter,document,electronic-document,gb-t-33190,invoice,ofd,pdf
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Office/Business
Classifier: Topic :: Text Processing
Requires-Python: >=3.10
Requires-Dist: lxml>=4.9
Requires-Dist: pillow>=9.0
Requires-Dist: pymupdf<1.25.0,>=1.24.0
Description-Content-Type: text/markdown

# pdf2ofd

Convert PDF files to OFD (Open Fixed-layout Document) format — a Chinese national standard for electronic documents (GB/T 33190-2016).

## Features

- Full vector path conversion (SVG → OFD), supporting all SVG path commands: `M L H V Q C S T A Z`
- Native OFD path commands: `M L Q B A C` — no lossy approximation
- Image extraction with RGBA transparency support
- Multi-page PDF support
- Command-line interface and Python API

## Installation

```bash
pip install pdf2ofd
```

> **Note:** Requires `pymupdf>=1.24.0,<1.25.0`. Newer versions have SVG export issues that affect conversion accuracy.

## Usage

### Command Line

```bash
# Convert all pages
pdf2ofd input.pdf output.ofd

# Convert specific pages (0-indexed)
pdf2ofd input.pdf output.ofd --pages 0,1,2

# Suppress output
pdf2ofd input.pdf output.ofd --quiet

# Show version
pdf2ofd --version
```

### Python API

```python
from pdf2ofd import convert

# Convert all pages
convert("input.pdf", "output.ofd")

# Convert specific pages
convert("input.pdf", "output.ofd", pages="0,1,2")

# Suppress progress output
convert("input.pdf", "output.ofd", verbose=False)
```

## How It Works

```
PDF
 ├── PyMuPDF SVG export  →  vector paths (glyphs, lines, curves)
 └── PyMuPDF image API   →  images (with RGBA/transparency)
          ↓
        OFD
```

- Vector paths are extracted from PyMuPDF's SVG export and converted to OFD `AbbreviatedData` format
- Images are extracted directly from PDF resources via `get_image_rects()`, preserving original quality and supporting multiple placements of the same image
- All SVG path commands are mapped to native OFD equivalents — cubic Bézier (`C`) maps to OFD `B`, arcs (`A`) map to OFD `A`

## Requirements

- Python >= 3.10
- pymupdf >= 1.24.0, < 1.25.0
- lxml >= 4.9
- pillow >= 9.0

## License

MIT