Metadata-Version: 2.4
Name: ocr-my-mess
Version: 0.2.0
Summary: A PDF pipeline to convert, OCR, and merge documents.
Project-URL: Homepage, https://github.com/example/ocr-my-mess
Project-URL: Bug Tracker, https://github.com/example/ocr-my-mess/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: rich>=13.0.0
Requires-Dist: typer[all]>=0.9.0
Requires-Dist: pypdf>=3.0.0
Requires-Dist: ttkbootstrap>=1.10.1
Requires-Dist: ocrmypdf
Dynamic: license-file

# ocr-my-mess

A complete and modular Python pipeline to convert, OCR, and merge all your documents into a single, searchable PDF.

## Features

- **Recursive Conversion**: Traverses a directory to find all supported files (images, office documents, archives, existing PDFs).
- **OCR Processing**: Applies OCR to all documents using `ocrmypdf` to make them text-searchable.
- **Hierarchical Merging**: Merges all generated PDFs into a single file with a table of contents that mirrors the original folder structure.
- **Dual Interfaces**: Usable as both a powerful Command-Line Interface (`ocr-my-mess-cli`) and a simple Graphical User Interface (`ocr-my-mess-gui`).
- **Cross-Platform**: Packaged with PyInstaller to run on Windows, macOS, and Linux.
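
The recursive conversion step described above can be pictured with a short sketch. This is not the project's actual code: the function name `find_supported_files` and the extension set are hypothetical, chosen only to show how a directory tree might be walked and filtered by file type.

```python
from pathlib import Path
from typing import List

# Hypothetical extension set; the project's real list of supported types may differ.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".docx", ".odt", ".zip"}


def find_supported_files(root: Path) -> List[Path]:
    """Return every supported file under root, sorted by path relative to root."""
    matches = [
        path
        for path in root.rglob("*")  # rglob walks all subdirectories
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    ]
    return sorted(matches, key=lambda p: p.relative_to(root).as_posix())
```

Matching on the lower-cased suffix means files such as `scan.PNG` are picked up as well, and sorting by relative path keeps the result stable across platforms.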

## Installation

### Using Conda (Recommended)

This is the easiest way to get started, as Conda handles all dependencies, including Python itself and the latest version of Tesseract.

```bash
# 1. Create the conda environment
conda env create -f environment.yml

# 2. Activate the environment
conda activate ocr-my-mess

# 3. Install the project in editable mode
pip install -e .
```

### Using Pip

```bash
# 1. Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Install the project in editable mode
pip install -e .
```

**Note**: For office document conversion, you must have `LibreOffice` installed and available in your system's PATH.
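
To check whether LibreOffice is reachable from your PATH before running a conversion, a small script along these lines may help. It is a sketch using only the standard library; the helper name `find_libreoffice` is hypothetical, and the launcher names `soffice` and `libreoffice` are the common ones but can vary by platform and installation.

```python
import shutil
from typing import Optional, Sequence

# Common LibreOffice launcher names; which one exists depends on the platform and install.
LIBREOFFICE_COMMANDS = ("soffice", "libreoffice")


def find_libreoffice(commands: Sequence[str] = LIBREOFFICE_COMMANDS) -> Optional[str]:
    """Return the full path of the first launcher found on PATH, or None."""
    for command in commands:
        location = shutil.which(command)
        if location:
            return location
    return None


if __name__ == "__main__":
    print(find_libreoffice() or "LibreOffice was not found on PATH")
```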

## Usage

The application can be run in two modes:

- **Command-Line Interface (CLI)**: If you provide any arguments.
- **Graphical User Interface (GUI)**: If you run it without any arguments.
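
The choice between the two modes can be sketched as a simple check on the argument list. This is an illustrative assumption about how such a dispatch could work, not the project's actual entry point; `choose_mode` is a hypothetical name.

```python
import sys
from typing import Sequence


def choose_mode(argv: Sequence[str]) -> str:
    """Return "cli" when arguments follow the program name, otherwise "gui"."""
    # argv[0] is the program name, so any extra element is a user-supplied argument.
    return "cli" if len(argv) > 1 else "gui"


if __name__ == "__main__":
    print(choose_mode(sys.argv))
```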

### Command-Line Interface (CLI)

The CLI provides several commands, including `run`, `convert` and `merge`.

```bash
# General help
ocr-my-mess --help

# Get version
ocr-my-mess -v

# Run the full pipeline on a directory
ocr-my-mess run --input /path/to/docs --output /path/to/final.pdf --lang eng+fra

# Just convert and OCR all documents in a folder
ocr-my-mess convert --input-dir /path/to/docs --output-dir /path/to/output

# Just merge all PDFs in a folder into a single file
ocr-my-mess merge --input-dir /path/to/output --output-file /path/to/final.pdf
```
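
The merged PDF's table of contents mirrors the folder structure of the input. As a rough illustration of that idea, the sketch below turns relative PDF paths into nested outline entries. It covers only the hierarchy, not the PDF merging itself, and the function name `outline_entries` is hypothetical rather than part of the project.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def outline_entries(relative_pdfs: Iterable[str]) -> List[Tuple[int, str]]:
    """Flatten relative PDF paths into (depth, title) outline entries.

    Each folder is emitted once, before the documents it contains.
    """
    entries: List[Tuple[int, str]] = []
    seen_folders = set()
    # Sorting by path components keeps the contents of a folder together.
    for parts in sorted(PurePosixPath(raw).parts for raw in relative_pdfs):
        folders = parts[:-1]
        for depth in range(len(folders)):
            folder = folders[: depth + 1]
            if folder not in seen_folders:
                seen_folders.add(folder)
                entries.append((depth, folder[-1]))
        # The document title is the file name without its extension.
        entries.append((len(folders), PurePosixPath(parts[-1]).stem))
    return entries


if __name__ == "__main__":
    paths = ["invoices/2023/jan.pdf", "invoices/2023/feb.pdf", "readme.pdf"]
    for depth, title in outline_entries(paths):
        print("  " * depth + title)
```

Each `(depth, title)` pair could then be mapped onto a bookmark in the merged file, with `depth` deciding which earlier entry acts as its parent.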

### Graphical User Interface (GUI)

For a more visual approach, you can launch the GUI by running the command without any arguments.

```bash
ocr-my-mess
```

This will open a window allowing you to:
- Select input and output directories.
- Choose OCR languages.
- Run the full pipeline.
- See live logs and progress.

## Development

### Running Tests

To ensure everything is working correctly, run the automated tests:

```bash
pytest
```

### Building Executables

This project uses PyInstaller to create a standalone executable. A build script is provided in the `scripts/` directory.

```bash
# Build the executable
python scripts/build.py
```

The executable will be located in the `dist/` directory.

**Note**: When running the GUI from the executable on Windows or macOS, a console window will appear alongside the main application window. This is expected behavior.
