ZMS content extraction toolkit module
This module provides helper functions and classes for use in Python scripts. It can be imported from Python with the statement "import Products.zms.content_extraction".
| Function | extract | No summary |
| Function | extract | Apply the pdfminer.six library to extract text from a PDF file. Pdfminer.six is a community-maintained fork of the original PDFMiner. It is a tool for extracting information from PDF documents. It focuses on getting and analyzing text data... |
| Function | extract | Apache Tika: a content analysis toolkit |
| Function | extract | Removes HTML tags and converts HTML entities to plain text. |
| Variable | security | Undocumented |
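The Apache Tika entry gives only the toolkit's name. The sketch below shows one common way to use Tika from Python: send the raw bytes to a running tika-server with a PUT request to its `/tika` endpoint and ask for `text/plain`. This is an assumption and not necessarily how this module calls Tika. The server URL `http://localhost:9998/tika` (tika-server's default port) and the helper names `build_tika_request` and `extract_with_tika` are hypothetical.

```python
import urllib.request

# Assumption: a tika-server instance is listening locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data: bytes, content_type: str,
                       url: str = TIKA_URL) -> urllib.request.Request:
    """Prepare a PUT request asking tika-server to return plain text (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type, "Accept": "text/plain"},
    )


def extract_with_tika(data: bytes, content_type: str,
                      url: str = TIKA_URL, timeout: float = 30.0) -> str:
    """Send the document to tika-server and decode the extracted text (hypothetical helper)."""
    request = build_tika_request(data, content_type, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it keeps the HTTP details inspectable without a live server. `extract_with_tika` itself requires a reachable tika-server.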
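The HTML entry says only that tags are removed and entities are converted to plain text. A minimal sketch of that idea with the standard library's `html.parser.HTMLParser` follows. The parser's `convert_charrefs=True` option decodes entities such as `&amp;`. The helper name `html_to_text` and the choice to drop `<script>`/`<style>` contents are assumptions, not this module's confirmed behavior.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities like &amp; and &lt;
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self) -> str:
        # Join text nodes with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(self._parts).split())


def html_to_text(html_source: str) -> str:
    """Strip tags and decode entities (hypothetical helper)."""
    parser = _TextCollector()
    parser.feed(html_source)
    parser.close()
    return parser.text()
```

Joining text nodes with a space keeps adjacent block elements such as `<p>a</p><p>b</p>` from running together. The cost is that a word split by inline markup, as in `<b>Hel</b>lo`, gains a space.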
Apply the pdfminer.six library to extract text from a PDF file. Pdfminer.six is a community-maintained fork of the original PDFMiner, a tool for extracting information from PDF documents that focuses on getting and analyzing text data. Pdfminer.six extracts the text of a page directly from the source code of the PDF. Install with: pip install pdfminer.six
| Parameters | |
| context | the ZMS context |
| b (bytes) | the PDF data stream |
| content | the content type of the data |
| See Also | |
| //github.com/pdfminer/pdfminer.six | |
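As a rough illustration of the pdfminer.six approach for the `b` (bytes) parameter, the sketch wraps the raw PDF bytes in a binary file object and passes it to `pdfminer.high_level.extract_text`, which accepts a path or a binary file-like object. The helper name `extract_pdf_text` is hypothetical. The sketch omits the ZMS `context` and `content` parameters and does not reproduce this module's actual implementation.

```python
import io


def extract_pdf_text(b: bytes) -> str:
    """Return the text of a PDF given as raw bytes (hypothetical helper)."""
    # Validate the input before importing, so a wrong type fails clearly.
    if not isinstance(b, (bytes, bytearray)):
        raise TypeError("expected PDF data as bytes, got %s" % type(b).__name__)
    # Imported lazily: pdfminer.six is an optional dependency (pip install pdfminer.six).
    from pdfminer.high_level import extract_text
    # extract_text accepts a path or a binary file-like object such as BytesIO.
    return extract_text(io.BytesIO(bytes(b)))
```

The lazy import lets code that never handles PDFs load without pdfminer.six installed.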