Metadata-Version: 2.4
Name: epubify
Version: 0.1.0
Summary: Convert PDF files to nicely structured Markdown and EPUB format
Project-URL: Homepage, https://github.com/mustafa-zidan/epubify
Project-URL: Repository, https://github.com/mustafa-zidan/epubify
Project-URL: Issues, https://github.com/mustafa-zidan/epubify/issues
Author-email: Mustafa <mustafa@zidan.me>
License: MIT
License-File: LICENSE
Keywords: converter,ebook,epub,markdown,ocr,pdf
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Graphics :: Graphics Conversion
Classifier: Topic :: Text Processing :: Markup
Requires-Python: >=3.10
Requires-Dist: latex2mathml==3.79.0
Requires-Dist: markdown==3.10.2
Requires-Dist: marker-pdf==1.10.2
Requires-Dist: pillow
Requires-Dist: torch
Requires-Dist: torchaudio
Requires-Dist: torchvision
Requires-Dist: transformers==4.57.6
Description-Content-Type: text/markdown

# Epubify

[![CI](https://github.com/mustafa-zidan/epubify/actions/workflows/ci.yml/badge.svg)](https://github.com/mustafa-zidan/epubify/actions/workflows/ci.yml)
[![PyPI version](https://badge.fury.io/py/epubify.svg)](https://pypi.org/project/epubify/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://pypi.org/project/epubify/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Convert PDF files to nicely structured Markdown and EPUB format with intelligent layout detection.

## Features

- Smart layout detection for books and academic papers
- Advanced text extraction and OCR capabilities
- Table detection and formatting
- Image extraction and optimization
- Clean Markdown output with preserved structure
- EPUB generation with customizable styling
- Multi-language support
- GPU acceleration support (NVIDIA, AMD, Apple Silicon)

## Installation

### From PyPI (recommended)

```bash
pip install epubify
```

### Using uv

```bash
uv tool install epubify
```

### Using pipx

```bash
pipx install epubify
```

### From source

```bash
git clone https://github.com/mustafa-zidan/epubify.git
cd epubify
uv sync
```

### Homebrew (planned)

A Homebrew tap is planned for future releases:

```bash
# Coming soon
brew install mustafa-zidan/tap/epubify
```

### GPU support

For GPU acceleration (NVIDIA, AMD, or Apple Silicon), install a PyTorch build that matches your hardware by following the official [PyTorch installation guide](https://pytorch.org/get-started/locally/).
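
To see which backend an installed PyTorch build can use, a minimal check might look like the sketch below. The helper name `pick_device` is hypothetical and not part of epubify, and whether epubify selects its device this way is an assumption; the sketch only reports what PyTorch itself detects.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU label makes sense.
        return "cpu"

    # ROCm builds of PyTorch also report AMD GPUs through torch.cuda.
    if torch.cuda.is_available():
        return "cuda"

    # Apple Silicon GPUs are exposed through the MPS backend.
    # getattr guards against older PyTorch builds without this module.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(f"PyTorch would use: {pick_device()}")
```

If this prints `cpu` on a machine with a supported GPU, the installed PyTorch build is probably the CPU-only variant.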

## Dependencies

- Python 3.10+
- [uv](https://github.com/astral-sh/uv) (recommended for dependency management)
- PyTorch (`torch`, `torchvision`, `torchaudio`), optionally with CUDA, ROCm, or MPS support
- marker-pdf, transformers, markdown, latex2mathml, pillow

## Usage

### Command Line

```bash
epubify input.pdf
```

Or via `uv`:

```bash
uv run epubify input.pdf
```

Options:

| Option             | Description                                           |
|--------------------|-------------------------------------------------------|
| `--max-pages INT`  | Maximum number of pages to process                    |
| `--start-page INT` | Page number to start from                             |
| `--skip-epub`      | Skip EPUB generation, only create Markdown            |
| `--skip-md`        | Skip Markdown generation, use existing Markdown files |
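
When the CLI is driven from a script, the options above can be assembled programmatically. The following is a sketch only: `build_command` is a hypothetical helper, and it assumes the options may follow the input path on the command line, which the table does not state.

```python
from pathlib import Path
from typing import List, Optional, Union


def build_command(
    pdf: Union[str, Path],
    max_pages: Optional[int] = None,
    start_page: Optional[int] = None,
    skip_epub: bool = False,
    skip_md: bool = False,
) -> List[str]:
    """Build an argument list for the epubify CLI from the documented options."""
    cmd = ["epubify", str(pdf)]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if start_page is not None:
        cmd += ["--start-page", str(start_page)]
    if skip_epub:
        cmd.append("--skip-epub")
    if skip_md:
        cmd.append("--skip-md")
    return cmd


if __name__ == "__main__":
    cmd = build_command(Path("input.pdf"), max_pages=20, skip_epub=True)
    print(" ".join(cmd))
    # To run the conversion (requires epubify on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.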

### As a Library

```python
from pathlib import Path
from epubify.pdf2md import convert_pdf
from epubify.mark2epub import convert_to_epub

# Convert PDF to Markdown
convert_pdf("input.pdf", Path("./output/input"))

# Convert Markdown to EPUB
convert_to_epub(Path("./output/input"), Path("./output"))
```
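
The two calls above extend naturally to a whole folder of PDFs. The sketch below is an illustration, not part of epubify: `convert_directory` is a hypothetical helper, and it takes the two converters as parameters so it only assumes the argument order shown in the example above (source PDF and per-document directory, then per-document directory and output root).

```python
from pathlib import Path
from typing import Callable, List


def convert_directory(
    pdf_dir: Path,
    output_root: Path,
    pdf_to_md: Callable[[str, Path], object],
    md_to_epub: Callable[[Path, Path], object],
) -> List[Path]:
    """Run both conversion steps for every PDF in pdf_dir, one subdirectory each."""
    converted = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        doc_dir = output_root / pdf.stem
        # Creating the directory up front is harmless if the converter does it too.
        doc_dir.mkdir(parents=True, exist_ok=True)
        pdf_to_md(str(pdf), doc_dir)      # PDF -> Markdown
        md_to_epub(doc_dir, output_root)  # Markdown -> EPUB
        converted.append(doc_dir)
    return converted
```

With epubify installed, this could be called as `convert_directory(Path("pdfs"), Path("output"), convert_pdf, convert_to_epub)`, where `pdfs` is a placeholder directory name.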

### Output Structure

```
output_directory/
└── document_name/
    ├── document_name.md
    ├── document_name.epub
    ├── document_name_metadata.json
    └── images/
        ├── image1.png
        ├── image2.jpg
        └── ...
```
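
A script that post-processes the results can check for this layout before continuing. The sketch below uses hypothetical helpers (`expected_outputs`, `missing_outputs`) and assumes the file names follow the tree above exactly. Some entries may be absent legitimately, for example the EPUB when `--skip-epub` is used.

```python
from pathlib import Path
from typing import List


def expected_outputs(output_dir: Path, name: str) -> List[Path]:
    """List the paths the documented layout implies for one document."""
    doc_dir = output_dir / name
    return [
        doc_dir / f"{name}.md",
        doc_dir / f"{name}.epub",
        doc_dir / f"{name}_metadata.json",
        doc_dir / "images",
    ]


def missing_outputs(output_dir: Path, name: str) -> List[Path]:
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_outputs(output_dir, name) if not p.exists()]
```

An empty list from `missing_outputs(Path("output_directory"), "document_name")` means every entry in the tree is present.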

## Development

### Setup

```bash
git clone https://github.com/mustafa-zidan/epubify.git
cd epubify
uv sync --group dev
```

### Running tests

```bash
uv run pytest
```

### CI/CD

This project uses GitHub Actions for:

- **CI** (`ci.yml`) - Runs tests across Python 3.10-3.13 on every push/PR
- **Qodana** (`qodana_code_quality.yml`) - Static code analysis via JetBrains Qodana
- **Publish** (`publish.yml`) - Automatically publishes to PyPI on GitHub releases using [trusted publishing](https://docs.pypi.org/trusted-publishers/)

### Publishing a new release

1. Update the version in `pyproject.toml`
2. Create a GitHub release with a tag matching the version (e.g., `v0.1.0`)
3. The publish workflow will automatically build and upload to PyPI

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a new branch for your feature
3. Commit your changes
4. Push to your branch
5. Create a Pull Request

## Known Issues

- Some embedded images may need manual adjustment
- Complex mathematical equations may not convert perfectly
- Some multi-column PDF layouts may require manual adjustment
- Font detection can be imperfect in some cases

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- [marker-pdf](https://github.com/VikParuchuri/marker) for PDF processing
- [PyTorch](https://pytorch.org/) for GPU acceleration
- [Transformers](https://huggingface.co/transformers) for advanced text processing
