Metadata-Version: 2.4
Name: pdf2docx-healer
Version: 0.1.2
Summary: Formatting-preserving PDF-to-DOCX converter that fixes bullet lists, hyperlinks, CJK fonts, and scanned PDFs
License-Expression: MIT
Project-URL: homepage, https://github.com/krockxz/pdf2docx-healer
Project-URL: repository, https://github.com/krockxz/pdf2docx-healer
Project-URL: issues, https://github.com/krockxz/pdf2docx-healer/issues
Project-URL: changelog, https://github.com/krockxz/pdf2docx-healer/releases
Keywords: pdf,docx,converter,pdf-to-docx,formatting,hyperlink,ocr,cjk,bullet-list
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing
Classifier: Topic :: Office/Business :: Office Suites
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pdf2docx>=0.5.0
Requires-Dist: PyMuPDF>=1.23.0
Requires-Dist: python-docx>=0.8.11
Requires-Dist: lxml
Provides-Extra: ocr
Requires-Dist: pytesseract>=0.3.10; extra == "ocr"

# pdf2docx-healer

Formatting-preserving PDF-to-DOCX converter that fixes:

- Bullet and numbered lists (proper Word list styles)
- Hyperlinks (extracted from PDF annotations)
- Font fallback for CJK text and for fonts that are not available
- Scanned/image-based PDFs (via OCR; needs the optional `ocr` extra, which pulls in `pytesseract`)
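
To illustrate what "extracted from PDF annotations" can mean in practice, here is a minimal sketch that lists the external links stored in a PDF using PyMuPDF, which this package already depends on. It is not necessarily how pdf2docx-healer does it internally; the helper names `uri_links` and `collect_uri_links` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

LINK_URI = 2  # value of PyMuPDF's fitz.LINK_URI (external URI link)


def uri_links(page_number: int, links: Iterable[Dict]) -> List[Tuple[int, str]]:
    """Keep only the external URI links from one page's link dictionaries."""
    return [
        (page_number, link["uri"])
        for link in links
        if link.get("kind") == LINK_URI and link.get("uri")
    ]


def collect_uri_links(pdf_path: str) -> List[Tuple[int, str]]:
    """Return (zero-based page number, uri) pairs for every URI link in the PDF."""
    import fitz  # PyMuPDF; imported lazily so uri_links() has no dependencies

    found: List[Tuple[int, str]] = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # page.get_links() returns one dict per link on the page
            found.extend(uri_links(page.number, page.get_links()))
    return found
```

Each link dictionary also carries a `from` rectangle, which a converter could use to match the link to the text it covers.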

## Usage

From Python:

```python
from docx_healer import heal

heal("input.pdf", "output.docx")
```

From the command line:

```bash
pdf2docx-heal input.pdf -o output.docx
```
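
To convert a whole folder, the single-file call above can be wrapped in a small loop. The sketch below is an assumption, not part of the package: `heal_folder` is a hypothetical helper, and it takes the converter as an argument, assuming only the `heal(input_path, output_path)` call shape shown above.

```python
from pathlib import Path
from typing import Callable, List, Union

PathLike = Union[str, Path]


def heal_folder(
    src_dir: PathLike,
    dst_dir: PathLike,
    convert: Callable[[str, str], object],
) -> List[Path]:
    """Run convert(input_pdf, output_docx) for every *.pdf directly in src_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the output folder if needed

    written: List[Path] = []
    for pdf in sorted(src.glob("*.pdf")):  # sorted for a stable processing order
        target = dst / (pdf.stem + ".docx")
        convert(str(pdf), str(target))
        written.append(target)
    return written


# Usage, assuming the import shown in the README:
#   from docx_healer import heal
#   heal_folder("pdfs", "converted", heal)
```

Passing the converter in keeps the helper independent of the package, so it can be tried with a stub function before running real conversions.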
