Metadata-Version: 2.4
Name: unpdf-markdown
Version: 0.6.3
Summary: Python bindings for unpdf - High-performance PDF content extraction
Author: iyulab
License: MIT
Project-URL: Homepage, https://github.com/iyulab/unpdf
Project-URL: Documentation, https://github.com/iyulab/unpdf
Project-URL: Repository, https://github.com/iyulab/unpdf
Project-URL: Issues, https://github.com/iyulab/unpdf/issues
Keywords: pdf,markdown,text-extraction,document,parser
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# unpdf

Python bindings for [unpdf](https://github.com/iyulab/unpdf): high-performance extraction of PDF content to Markdown, plain text, and JSON.

## Installation

```bash
pip install unpdf-markdown
```

The package is published as `unpdf-markdown` but imported as `unpdf`.

## Quick Start

```python
import unpdf

# Convert PDF to Markdown
markdown = unpdf.to_markdown("document.pdf")
print(markdown)

# Convert PDF to plain text
text = unpdf.to_text("document.pdf")
print(text)

# Convert PDF to JSON
json_data = unpdf.to_json("document.pdf", pretty=True)
print(json_data)

# Get document information
info = unpdf.get_info("document.pdf")
print(info)

# Get page count
pages = unpdf.get_page_count("document.pdf")
print(f"Total pages: {pages}")

# Check whether the file is a valid PDF
is_valid = unpdf.is_pdf("document.pdf")
print(f"Is valid PDF: {is_valid}")
```
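
To convert a whole folder, the Quick Start calls can be combined into a small loop. The sketch below is an illustration, not part of the unpdf API: `convert_folder` is a hypothetical helper, and it assumes `is_pdf` and `to_markdown` behave as listed in the API Reference (each takes a path string and returns a `bool` or a Markdown `str`, respectively). Both functions are injectable parameters that default to unpdf's own, so the loop can be exercised without real PDFs.

```python
from pathlib import Path
from typing import Callable, Optional


def convert_folder(
    src_dir: str,
    out_dir: str,
    is_pdf: Optional[Callable[[str], bool]] = None,
    to_markdown: Optional[Callable[[str], str]] = None,
) -> list[Path]:
    """Write a .md file to out_dir for each *.pdf in src_dir that is_pdf accepts.

    Hypothetical helper, not part of unpdf. Returns the paths it wrote.
    """
    if is_pdf is None or to_markdown is None:
        import unpdf  # lazy import: injected stand-ins work without the package

        is_pdf = is_pdf or unpdf.is_pdf
        to_markdown = to_markdown or unpdf.to_markdown

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        # Skip files that have a .pdf extension but fail the validity check
        if not is_pdf(str(pdf)):
            continue
        target = out / f"{pdf.stem}.md"
        target.write_text(to_markdown(str(pdf)), encoding="utf-8")
        written.append(target)
    return written
```

For example, `convert_folder("reports", "reports_md")` would write one Markdown file per valid PDF found directly in `reports`. Note that the `*.pdf` pattern is case-sensitive on most Linux file systems, so files named `*.PDF` would be skipped there.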

## API Reference

### `to_markdown(path: str) -> str`
Convert a PDF file to Markdown format.

### `to_text(path: str) -> str`
Convert a PDF file to plain text.

### `to_json(path: str, pretty: bool = False) -> str`
Convert a PDF file to JSON, returned as a string. Pass `pretty=True` for pretty-printed output.
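
Because `to_json` returns a string rather than Python objects, it is typically decoded with the standard `json` module before use. The structure of the JSON is not documented in this README, so the sketch below assumes only that the output is valid JSON and that `pretty` affects whitespace alone. `load_pdf_json` is a hypothetical helper name; the converter is injectable and defaults to `unpdf.to_json`.

```python
import json
from typing import Any, Callable, Optional


def load_pdf_json(path: str, to_json: Optional[Callable[..., str]] = None) -> Any:
    """Return unpdf's JSON output for `path` decoded into Python objects.

    Hypothetical helper; the shape of the result depends on unpdf's schema.
    """
    if to_json is None:
        import unpdf  # lazy import so a stand-in can be injected for testing

        to_json = unpdf.to_json
    # Compact output is sufficient: whitespace disappears once parsed
    return json.loads(to_json(path, pretty=False))
```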

### `get_info(path: str) -> dict`
Get document metadata (title, author, page count, etc.) as a dictionary.
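
PDF metadata fields are optional and frequently empty, so code that reads the dictionary returned by `get_info` should not assume a value is present. The exact key names are not listed here; the sketch below assumes, hypothetically, a `"title"` key, reads it with `dict.get`, and falls back to the file name. `display_title` is an illustrative helper, not part of unpdf.

```python
from pathlib import Path
from typing import Any, Mapping


def display_title(path: str, info: Mapping[str, Any]) -> str:
    """Pick a human-readable title from a get_info()-style dictionary.

    Assumes a hypothetical "title" key; falls back to the file name
    (without extension) when the title is missing, blank, or not a string.
    """
    title = info.get("title")
    if isinstance(title, str) and title.strip():
        return title.strip()
    return Path(path).stem
```

In practice it would be called as `display_title("document.pdf", unpdf.get_info("document.pdf"))`.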

### `get_page_count(path: str) -> int`
Get the number of pages in a PDF file.

### `is_pdf(path: str) -> bool`
Check if a file is a valid PDF.

### `version() -> str`
Get the version of the native library.

## License

MIT License
