Metadata-Version: 2.1
Name: py-doc
Version: 0.1.3
Summary: Used for working with documents in Python.
Home-page: https://github.com/connorholm/py-doc
Author: Connor Holm
Author-email: connorjholm@gmail.com
License: MIT
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Requires-Dist: matplotlib (>=3.2.2)
Requires-Dist: numpy (<1.24.0,>=1.18.5)
Requires-Dist: opencv-python (>=4.1.1)
Requires-Dist: Pillow (>=7.1.2)
Requires-Dist: PyYAML (>=5.3.1)
Requires-Dist: requests (>=2.23.0)
Requires-Dist: scipy (>=1.4.1)
Requires-Dist: torch (!=1.12.0,>=1.7.0)
Requires-Dist: torchvision (!=0.13.0,>=0.8.1)
Requires-Dist: tqdm (>=4.41.0)
Requires-Dist: protobuf (<4.21.3)
Requires-Dist: pytesseract
Requires-Dist: pymupdf
Requires-Dist: tensorboard (>=2.4.1)
Requires-Dist: pandas (>=1.1.4)
Requires-Dist: seaborn (>=0.11.0)
Requires-Dist: ipython
Requires-Dist: psutil
Requires-Dist: thop
Requires-Dist: py-doc

# PyDoc
A library for interacting with PDF documents.

### Installation
```
pip install py-doc
```

### Get Started
How to use the library:

```python
from py_doc import Image

# Instantiate an Image object
image = Image('path/to/image.jpg')

# Use draw_classifications() to draw the document classifications and save the result
image.draw_classifications("output.jpg")

# Additionally, if you just want the bounding boxes, use get_bboxes()
image.get_bboxes()
```
To use optical character recognition (OCR), install [Tesseract](https://github.com/tesseract-ocr/tesseract)
on your machine and make sure it is on your PATH. If you don't need OCR, you can skip this step.

```python
from py_doc import Image
image = Image('path/to/image.jpg')

# Use get_text() to get all the text from the image
print(image.get_text())

# Use get_text_from_bbox() to get text from a specific bounding box
bbox = image.get_bboxes()[0]
print(image.get_text_from_bbox(bbox))
```
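
Before calling the OCR methods, it can help to confirm that Tesseract is actually discoverable on your PATH. The following is a minimal sketch that uses only the Python standard library; the helper name `tesseract_available` is hypothetical and is not part of py-doc.

```python
import shutil


def tesseract_available(executable="tesseract"):
    """Return True if the given executable can be found on PATH.

    Hypothetical helper for illustration; it is not part of py-doc.
    """
    return shutil.which(executable) is not None


if not tesseract_available():
    # Assumption: the OCR methods need the Tesseract binary to be on PATH.
    print("Tesseract was not found on PATH; OCR calls such as get_text() will likely fail.")
```

If the check fails right after installing Tesseract, opening a new terminal session so the updated PATH is picked up is usually the first thing to try.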

### Documentation
The documentation for this library can be found [here](https://py-doc.readthedocs.io/en/latest/index.html#).

### Examples
The image below is sample output from the `draw_classifications()` method, with bounding boxes drawn around the classified regions of the document.
![Sample Output](tests/documents/output.jpg)

### Contributing
Run the following steps after pushing to the repo:
1. make html (updates the documentation; run from the docs directory)
2. update the version in setup.py
3. python setup.py sdist bdist_wheel (builds the package)
4. delete previous versions from the dist folder
5. twine check dist/* (checks the package)
6. twine upload dist/* (uploads the package to PyPI)
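
The "delete previous versions" step above can be scripted so that only the freshly built files are checked and uploaded. This is a rough sketch, not part of the project's tooling: the function names are hypothetical, and it assumes the usual artifact naming, where an sdist ends in `-<version>.tar.gz` and a wheel contains `-<version>-` in its file name.

```python
from pathlib import Path


def stale_artifacts(dist_dir, current_version):
    """Return files in dist_dir that were not built for current_version."""
    # Assumed naming: "<name>-<version>.tar.gz" for sdists and
    # "<name>-<version>-<tags>.whl" for wheels.
    sdist_suffix = f"-{current_version}.tar.gz"
    wheel_marker = f"-{current_version}-"
    stale = []
    for path in sorted(Path(dist_dir).iterdir()):
        if not path.is_file():
            continue
        is_current = path.name.endswith(sdist_suffix) or wheel_marker in path.name
        if not is_current:
            stale.append(path)
    return stale


def remove_stale_artifacts(dist_dir, current_version):
    """Delete stale artifacts and return the file names that were removed."""
    removed = []
    for path in stale_artifacts(dist_dir, current_version):
        path.unlink()
        removed.append(path.name)
    return removed
```

For example, `remove_stale_artifacts("dist", "0.1.3")` would leave only the 0.1.3 files in place. Calling `stale_artifacts` first lists what would be deleted without removing anything.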

