Metadata-Version: 2.4
Name: pdf_ingest
Version: 1.0.8
Summary: PDF Ingester
License: BSD 3-Clause License
Keywords: template-python-cmd
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: langdetect

# PDF Ingest

[![Lint](https://github.com/zackees/pdf-ingest/actions/workflows/lint.yml/badge.svg)](https://github.com/zackees/pdf-ingest/actions/workflows/lint.yml)
[![Build Docker Image](https://github.com/zackees/pdf-ingest/actions/workflows/build_docker_image.yml/badge.svg)](https://github.com/zackees/pdf-ingest/actions/workflows/build_docker_image.yml)

# Language(X) -> English translations

Reference: [fairseq](https://github.com/facebookresearch/fairseq), in particular its [translation examples README](https://github.com/facebookresearch/fairseq/blob/main/examples/translation/README.md).


# Use

```
pdf-ingest X:\yourfiles
```


# Misc

  * How to use the GPU in PaddleOCR: https://github.com/PaddlePaddle/PaddleOCR/issues/10429


# Instructions from Mike Adams

  * MA: I was wondering though if the filename could have a pre-extension based on language, like *-EN.txt
  * MA: Or *-RUS.txt, etc.
  * MA: Like, if it's easy for your program to realize what language it is

  * ME: That's trivial
  * ME: But how is your AI going to make sense out of different languages?

  * MA: We are just gonna archive non-English for now, and only process English
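
The naming scheme discussed above (a language tag before the `.txt` extension, with only English processed for now) could look roughly like the sketch below. It is an illustration under stated assumptions, not the package's actual implementation: the function names, the `LANG_TAGS` mapping, and the fallback of upper-casing unknown codes are all hypothetical. The only library call used is `langdetect.detect`, which returns language codes such as `en` or `ru`.

```python
from pathlib import Path

# Hypothetical mapping from langdetect codes to the tags mentioned in the
# conversation (EN, RUS). Codes not listed fall back to their upper-cased form.
LANG_TAGS = {"en": "EN", "ru": "RUS"}


def language_tag(lang_code: str) -> str:
    """Return the filename tag for a language code (assumed naming scheme)."""
    return LANG_TAGS.get(lang_code.lower(), lang_code.upper())


def tagged_txt_path(pdf_path: Path, lang_code: str) -> Path:
    """Build '<stem>-<TAG>.txt' next to the source PDF."""
    return pdf_path.with_name(f"{pdf_path.stem}-{language_tag(lang_code)}.txt")


def should_process(lang_code: str) -> bool:
    """Only English is processed for now; other languages are just archived."""
    return lang_code.lower() == "en"


def detect_language(text: str) -> str:
    """Detect the language of extracted text with langdetect."""
    # Imported lazily so the naming helpers work without langdetect installed.
    from langdetect import detect

    return detect(text)
```

For example, `tagged_txt_path(Path("report.pdf"), "ru")` yields a path named `report-RUS.txt`, and `should_process("ru")` is false, so that file would be archived rather than processed.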

