Metadata-Version: 2.1
Name: cucaracha
Version: 0.6.0
Summary: Mr. Franz Cucaracha will be glad to assist you with document analysis and processing routines
License: MIT
Author: Antonio Senra Filho
Author-email: acsenrafilho@gmail.com
Requires-Python: >=3.10,<3.13
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Processing
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Dist: kagglehub (>=0.3.4,<0.4.0)
Requires-Dist: numpy (>=1.22.4,<2.0.0)
Requires-Dist: opencv-python (>=4.10.0.84,<5.0.0.0)
Requires-Dist: pymupdf (>=1.24.13,<2.0.0)
Requires-Dist: rich (>=13.9.3,<14.0.0)
Requires-Dist: scikit-learn (>=1.6.1,<2.0.0)
Requires-Dist: scipy (>=1.15.1,<2.0.0)
Requires-Dist: seaborn (>=0.13.2,<0.14.0)
Requires-Dist: tensorflow (==2.16.2)
Requires-Dist: tensorflow-io-gcs-filesystem (==0.37.1)
Project-URL: Code, https://github.com/acsenrafilho/cucaracha
Project-URL: Code Issues, https://github.com/acsenrafilho/cucaracha/issues
Project-URL: Documentation, https://cucaracha.readthedocs.io/en/latest/
Description-Content-Type: text/markdown

<img src="https://raw.githubusercontent.com/acsenrafilho/cucaracha/refs/heads/main/docs/assets/cucaracha-logo.png" width=700>

[![Documentation Status](https://readthedocs.org/projects/cucaracha/badge/?version=main)](https://cucaracha.readthedocs.io/en/main/?badge=main)
[![codecov](https://codecov.io/gh/acsenrafilho/cucaracha/graph/badge.svg?token=TgmCLPoIbW)](https://codecov.io/gh/acsenrafilho/cucaracha)
[![CI Main](https://github.com/acsenrafilho/cucaracha/actions/workflows/ci-lib.yml/badge.svg?branch=main)](https://github.com/acsenrafilho/cucaracha/actions/workflows/ci-lib.yml)
[![CI Develop](https://github.com/acsenrafilho/cucaracha/actions/workflows/ci-lib.yml/badge.svg?branch=develop)](https://github.com/acsenrafilho/cucaracha/actions/workflows/ci-lib.yml)
[![PyPI version](https://badge.fury.io/py/cucaracha.svg)](https://badge.fury.io/py/cucaracha)

# cucaracha 🪳

`cucaracha` is inspired by Gregor Samsa, the protagonist of Franz Kafka's *The Metamorphosis*: an ordinary man who wakes up to find himself transformed into a cockroach. Here, `cucaracha` embodies the metaphor of a tireless, sometimes bureaucratic helper working in the background. In the digital age, Mr. Cucaracha is here to assist you with the complex and often tedious tasks of document processing and analysis.

## Meet Mr. Cucaracha: Your Assistant for Digital Document Processing and Analysis

`cucaracha` is an open-source library crafted to help with digital document analysis and processing. It provides a toolkit for working with both structured and unstructured data, allowing users to collect, transform, and interpret textual content from various document formats, including PDFs and images. 

### Key Features

- **Text Extraction**: Efficiently retrieve text from PDFs and image files, transforming them into usable data.
- **Content Structuring**: Process extracted text into structured formats, aiding in more organized data handling and downstream applications.
- **Context Recognition**: Perform contextual analysis to interpret and label document content based on intended usage.

The main goal of this project is to offer an accessible, open-source alternative for document processing, with analysis algorithms that simplify tasks that are traditionally time-consuming or hard to automate.

Check out the public datasets and ML models used in this project on Kaggle: [Cucaracha Project](https://www.kaggle.com/organizations/cucaracha-project)

### Why `cucaracha`?

The name `cucaracha` reflects the tireless, behind-the-scenes nature of the tool. Like Kafka's transformed character, Mr. Cucaracha deals with the mundanity and bureaucracy often present in document processing tasks. He's designed to tackle these repetitive and complex tasks with minimal oversight, ensuring efficient and structured data extraction without the typical hurdles of document handling.

### Getting Started

Check out the [full documentation](https://cucaracha.readthedocs.io/en/main/) for detailed instructions on how to use, implement, and keep up with updates to `cucaracha`. 

### Contributing to `cucaracha`

We welcome contributions to `cucaracha`! To get involved, take a look at the [open issues](https://github.com/acsenrafilho/cucaracha/issues) and join us in enhancing Mr. Cucaracha's capabilities. Whether you're here to fix bugs, suggest features, or work on documentation, your input is valuable to the project.

Happy document processing with Mr. Cucaracha! 🪳

## How to install

The quickest way to install `cucaracha` is via `pip`:

> [!NOTE]
> The installation requires Python 3.10 or higher (and lower than 3.13)

```bash
pip install cucaracha
```
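
If the install fails or the package cannot be found afterwards, a quick environment check can help. The sketch below is not part of `cucaracha`; it uses only the Python standard library, and the helper names `python_is_supported` and `installed_version` are hypothetical. It assumes the supported range declared in the package metadata (`>=3.10,<3.13`).

```python
import sys
from importlib import metadata

# Supported interpreter range, assumed from the package metadata: >=3.10,<3.13
MIN_PYTHON = (3, 10)
MAX_PYTHON_EXCLUSIVE = (3, 13)


def python_is_supported(version=None):
    """Return True if the (major, minor) version falls in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


def installed_version(package="cucaracha"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("cucaracha version:", installed_version() or "not installed")
```

Running the script in the same environment where `pip install cucaracha` was executed shows whether the interpreter is in range and whether the package is visible to it.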

