Metadata-Version: 2.4
Name: pdf2any
Version: 0.5.13
Summary: Open source Python library converting PDF to docx, HTML, and Markdown.
Home-page: https://artifex.com/
Author: Artifex
Author-email: support@artifex.com
License: MIT
Keywords: pdf-to-word,pdf-to-docx,pdf-to-html,pdf-to-markdown,pdf-converter
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyMuPDF>=1.26.7
Requires-Dist: python-docx>=0.8.10
Requires-Dist: fonttools>=4.24.0
Requires-Dist: numpy>=1.17.2
Requires-Dist: opencv-python>=4.5
Requires-Dist: fire>=0.3.0
Requires-Dist: lxml
Dynamic: author
Dynamic: author-email
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# pdf2any

![python-version](https://img.shields.io/badge/python-%3E=3.10-green.svg)

**PDF to DOCX, HTML, and Markdown converter — extract text, tables, and images from PDFs.**

## Features

- Convert PDF to **DOCX** (Word documents with full formatting)
- Convert PDF to **HTML** (preserves layout, tables and images)
- Convert PDF to **Markdown** (clean, readable text with tables)
- Preserve document structure: paragraphs, tables, images, text styling
- Extract tables from PDFs
- Multi-processing support for large documents
- Command-line interface and Python API

## Installation

```bash
pip install pdf2any
```

## Quick Start

### Command Line

```bash
# Convert PDF to DOCX
pdf2any convert input.pdf output.docx

# Convert PDF to HTML
pdf2any convert-html input.pdf output.html

# Convert PDF to Markdown (no page breaks)
pdf2any convert-md input.pdf output.md --nopage_break

# Convert specific pages
pdf2any convert input.pdf output.docx --pages=1,3,5
```

### Python API

```python
from pdf2any import Converter

# Convert to DOCX
cv = Converter("input.pdf")
cv.convert("output.docx")

# Convert to HTML (no page breaks)
cv.convert_html("output.html", page_break=False)

# Convert to Markdown (no page breaks)
cv.convert_md("output.md", page_break=False)

# Extract tables
tables = cv.extract_tables()

# Close the converter when finished
cv.close()
```
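
When the target format varies from file to file, the three conversion methods shown above can be chosen from the output file's extension. The following is a minimal sketch, not part of pdf2any: `convert_by_extension`, `METHOD_BY_SUFFIX`, and the `converter_factory` parameter are hypothetical names introduced for illustration, and the code assumes only the `Converter` methods used in the example above (`convert`, `convert_html`, `convert_md`, `close`).

```python
from pathlib import Path

# Maps an output suffix to the Converter method used in the Quick Start example.
METHOD_BY_SUFFIX = {
    ".docx": "convert",
    ".html": "convert_html",
    ".md": "convert_md",
}


def convert_by_extension(pdf_path, out_path, converter_factory=None):
    """Convert pdf_path, choosing the method from out_path's suffix.

    Hypothetical helper, not part of pdf2any. converter_factory defaults
    to pdf2any.Converter and exists so the helper can be tested with a stub.
    Returns the name of the Converter method that was called.
    """
    suffix = Path(out_path).suffix.lower()
    if suffix not in METHOD_BY_SUFFIX:
        raise ValueError(f"unsupported output format: {suffix!r}")

    if converter_factory is None:
        # Imported lazily so this module loads even without pdf2any installed.
        from pdf2any import Converter as converter_factory

    cv = converter_factory(str(pdf_path))
    try:
        getattr(cv, METHOD_BY_SUFFIX[suffix])(str(out_path))
    finally:
        # Always close the converter, even if the conversion raises.
        cv.close()
    return METHOD_BY_SUFFIX[suffix]
```

With pdf2any installed, `convert_by_extension("input.pdf", "output.md")` would call `convert_md("output.md")` with its default options. The `try`/`finally` is a design choice of this sketch so that the converter is closed even when a conversion fails.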

### Key Options

| Option | Description | Default |
|--------|-------------|---------|
| `--pages` | Specific pages to convert (e.g. `1,3,5`) | All |
| `--nopage_break` | Remove page separators in output | `False` |
| `--remove_header_footer` | Remove headers and footers | `False` |
| `--multi_processing` | Enable parallel processing | `False` |
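
To drive the CLI from a script, the options in the table can be assembled into an argument list. This is a sketch under stated assumptions: `build_command` and `SUBCOMMANDS` are hypothetical names, the flag spellings are copied from the table and the Command Line examples, and it is assumed (not confirmed here) that each of the three subcommands accepts every option.

```python
# Subcommand names as they appear in the Command Line examples.
SUBCOMMANDS = {"docx": "convert", "html": "convert-html", "md": "convert-md"}


def build_command(pdf, out, fmt="docx", pages=None, page_break=True,
                  remove_header_footer=False, multi_processing=False):
    """Return the pdf2any argument list for one conversion.

    Hypothetical helper, not part of pdf2any. Assumes every flag is
    valid for every subcommand, which this README does not confirm.
    """
    cmd = ["pdf2any", SUBCOMMANDS[fmt], pdf, out]
    if pages:
        # Same form as the Quick Start example: --pages=1,3,5
        cmd.append("--pages=" + ",".join(str(p) for p in pages))
    if not page_break:
        cmd.append("--nopage_break")
    if remove_header_footer:
        cmd.append("--remove_header_footer")
    if multi_processing:
        cmd.append("--multi_processing")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the standard library. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.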

## Documentation

- [Installation](https://pdf2any.readthedocs.io/en/latest/installation.html)
- [Quickstart](https://pdf2any.readthedocs.io/en/latest/quickstart.html)
  - [Convert PDF](https://pdf2any.readthedocs.io/en/latest/quickstart.convert.html)
  - [Extract table](https://pdf2any.readthedocs.io/en/latest/quickstart.table.html)
  - [Command Line Interface](https://pdf2any.readthedocs.io/en/latest/quickstart.cli.html)
- [API Documentation](https://pdf2any.readthedocs.io/en/latest/modules.html)

## License

MIT License — see [LICENSE](LICENSE) for details.
