Metadata-Version: 2.4
Name: any-document-extractor
Version: 0.1.3
Summary: A Python library for extracting text content from any document format.
Home-page: https://github.com/yeqing215777/any-document-extractor
Author: yeqing
Author-email: 215777@qq.com
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: python-docx
Requires-Dist: python-multipart
Requires-Dist: openpyxl
Requires-Dist: pdfminer.six
Requires-Dist: camelot-py
Requires-Dist: python-pptx
Dynamic: author
Dynamic: author-email
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Any Document Extractor

A Python library for extracting text content from common document formats such as PPTX, DOCX, PDF, and XLSX.

## Features

- Supports multiple document formats (PPTX, DOCX, PDF, XLSX)
- Returns clean extracted text

## Installation

```bash
pip install any-document-extractor
```



## Usage
Basic usage example:

```python
from anydocumentextractor import DocumentExtractor


def main(fp: str):
    """Extract the content of the document at path `fp`."""
    extractor = DocumentExtractor(fp)
    return extractor.extract()


if __name__ == '__main__':
    fp = 'text.docx'  # Can be any supported document
    content = main(fp)
    print(content)
```
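
To process several files in one go, a small wrapper can collect results and failures separately. The sketch below is not part of the library: `extract_all` and its `extract_one` parameter are hypothetical names, and because the exceptions raised by `DocumentExtractor` are not documented here, it catches `Exception` broadly.

```python
from typing import Callable, Dict, Iterable, Tuple


def extract_all(
    paths: Iterable[str],
    extract_one: Callable[[str], object],
) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Run `extract_one` on each path and return (results, errors).

    `extract_one` is any callable that takes a file path and returns the
    extracted content (hypothetical helper, not a library API).
    """
    results: Dict[str, object] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            results[path] = extract_one(path)
        except Exception as exc:
            # Broad catch on purpose: the library's exception types are
            # an unknown here, so every failure is recorded per file.
            errors[path] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

With the library installed, this could be called as `extract_all(paths, lambda p: DocumentExtractor(p).extract())`, so one unreadable file does not abort the whole batch.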

## Supported Formats
- Microsoft Office: PPTX, DOCX, XLSX
- OpenDocument: ODT, ODP
- PDF documents
- Plain text files
- And more...
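
Checking a file's extension before extraction can give an early, readable error. The following is a minimal sketch under assumptions: `SUPPORTED_SUFFIXES` and `is_supported` are hypothetical names, the suffix set is derived only from the formats named in the list above (with `.txt` assumed for plain text), and it does not reflect how the library itself detects formats.

```python
from pathlib import Path

# Assumed mapping from the formats listed above to file suffixes.
# ".txt" for plain text is a guess; the real list may be longer.
SUPPORTED_SUFFIXES = {".pptx", ".docx", ".xlsx", ".odt", ".odp", ".pdf", ".txt"}


def is_supported(path: str) -> bool:
    """Return True if the file suffix matches one of the assumed formats."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES
```

The comparison is case-insensitive, so `Report.DOCX` and `report.docx` are treated the same; files without a suffix are rejected.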

## Build and Publish to PyPI

```shell
rm -rf dist/ build/ *.egg-info/
python setup.py sdist bdist_wheel
twine upload dist/*
```

## License
MIT License - Free for commercial and personal use.


