abacusai.document_data
Module Contents
Classes
DocumentData: Data extracted from a docstore document.
- class abacusai.document_data.DocumentData(client, docId=None, mimeType=None, pageCount=None, extractedText=None, embeddedText=None, pages=None, tokens=None, metadata=None)
Bases: abacusai.return_class.AbstractApiClass
Data extracted from a docstore document.
- Parameters:
client (ApiClient) – An authenticated API Client instance
docId (str) – Unique Docstore string identifier for the document.
mimeType (str) – The mime type of the document.
pageCount (int) – The total number of pages in the document.
extractedText (str) – The extracted text in the document obtained from OCR.
embeddedText (str) – The embedded text in the document. Only available for digital documents.
pages (list) – List of embedded text for each page in the document. Only available for digital documents.
tokens (list) – List of extracted tokens in the document obtained from OCR.
metadata (list) – List of metadata for each page in the document.
- __repr__()
Return repr(self).
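The fields above suggest a common access pattern: embeddedText is only populated for digital documents, so callers may want to fall back to the OCR-derived extractedText. The sketch below illustrates that pattern without contacting the API. DocumentDataStub and best_text are hypothetical names introduced here for illustration; the stub only mirrors the documented attribute names and is not the library's class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentDataStub:
    """Hypothetical stand-in that mirrors the documented DocumentData fields."""
    docId: Optional[str] = None
    mimeType: Optional[str] = None
    pageCount: Optional[int] = None
    extractedText: Optional[str] = None  # text obtained from OCR
    embeddedText: Optional[str] = None   # only set for digital documents
    pages: Optional[list] = None
    tokens: Optional[list] = None
    metadata: Optional[list] = None


def best_text(doc) -> str:
    """Prefer embedded text when present; otherwise fall back to OCR text."""
    if doc.embeddedText:
        return doc.embeddedText
    return doc.extractedText or ""


# Assumed example values, not real API output.
scanned = DocumentDataStub(docId="doc-1", pageCount=1, extractedText="ocr text")
digital = DocumentDataStub(
    docId="doc-2",
    pageCount=1,
    extractedText="ocr text",
    embeddedText="embedded text",
    pages=["embedded text"],
)

print(best_text(scanned))
print(best_text(digital))
```

Because best_text only reads attributes, the same helper should work on any object exposing embeddedText and extractedText, which is assumed to include a real DocumentData instance.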