Metadata-Version: 2.1
Name: document-classification
Version: 0.0.2a0
Summary: Awesome document classification - Implementation of major techniques
License: MIT
Author: Amit Timalsina
Author-email: amittimalsina14@gmail.com
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: albumentations (>=1.4.18,<2.0.0)
Requires-Dist: fasttext (>=0.9.3,<0.10.0)
Requires-Dist: google (>=3.0.0,<4.0.0)
Requires-Dist: google-cloud-vision (>=3.7.4,<4.0.0)
Requires-Dist: instructor (>=1.6.3,<2.0.0)
Requires-Dist: langsmith (>=0.1.139,<0.2.0)
Requires-Dist: loguru (>=0.7.2,<0.8.0)
Requires-Dist: paddleocr (>=2.9.0,<3.0.0)
Requires-Dist: paddlepaddle (>=2.6.2,<3.0.0)
Requires-Dist: pandas (>=2.2.3,<3.0.0)
Requires-Dist: pydantic (>=2.7.0,<3.0.0)
Requires-Dist: torch (>=2.5.1,<3.0.0)
Description-Content-Type: text/markdown

# Document Classification: All in one place
This package lets you classify documents with the most popular available methods. It also provides a single interface for OCR, covering open-source engines such as Tesseract and PaddleOCR as well as commercial services such as Google OCR.

PyPI: [document-classification](https://pypi.org/project/document-classification/)

## Features
- OCR
    - Tesseract
    - Google OCR
- Classification
    - fastText (train, evaluate, predict)
    - Language Models like BERT (train, evaluate, predict)
    - Language + Layout Models like LayoutLM (train, evaluate, predict)
    - LLM (evaluate, predict)
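The fastText option above builds on the upstream fastText library, whose supervised mode reads plain-text training files with one example per line, each prefixed by `__label__<label>`. As a rough sketch of preparing such a file, the helpers below use only the standard library. The names `to_fasttext_line` and `write_training_file` are hypothetical and are not part of this package's API.

```python
from __future__ import annotations

from collections.abc import Iterable


def to_fasttext_line(label: str, text: str) -> str:
    """Format one (label, text) pair as a fastText supervised-training line.

    Hypothetical helper for illustration; not part of document-classification.
    """
    # fastText tokenizes on whitespace, so a label must not contain spaces.
    clean_label = "_".join(label.split())
    # Each example must fit on one line, so collapse newlines and tabs.
    clean_text = " ".join(text.split())
    return f"__label__{clean_label} {clean_text}"


def write_training_file(examples: Iterable[tuple[str, str]], path: str) -> int:
    """Write (label, text) pairs to `path` and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for label, text in examples:
            handle.write(to_fasttext_line(label, text) + "\n")
            count += 1
    return count
```

A file written this way can be passed to the upstream call `fasttext.train_supervised(input=path)`. How this package wraps that step may differ, so treat the examples directory as the reference.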

## Installation
Install with a single command:
```bash
pip install -U document-classification
```
Or, if you use Poetry (like me):
```bash
poetry add document-classification
```
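
To confirm that the installation worked, one option is to query the installed distribution's version with the standard library. This is a minimal sketch; the function name `installed_version` is hypothetical and not something the package ships.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "document-classification") -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "document-classification is not installed")
```

The lookup uses the distribution name as published on PyPI, which can differ from the name used in `import` statements.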

## Usage
See the [examples](https://github.com/amit-timalsina/document_classification/tree/master/examples) directory for how to use the package.

## Contributing

Your contributions are welcome! If you have great examples or find neat patterns, clone the repo and add another example. 
The goal is to find great patterns and cool examples to highlight.

If you encounter any issues or want to provide feedback, you can create an issue in this repository. You can also reach out to me on Twitter at [@amittimalsina14](https://x.com/amittimalsina14).

Check the [todo.md](https://github.com/amit-timalsina/document_classification/blob/master/todo.md) file for the list of features that are coming next with their due dates.

## What's coming next?
I will first add tests and refactor the code to make it more readable, usable, and maintainable. Then I will release documentation and more examples.
