abacusai.api_class.dataset
Module Contents
Classes
ParsingConfig | Helper class that provides a standard way to create an ABC using inheritance.
DocumentProcessingConfig | Document processing configuration.
- class abacusai.api_class.dataset.ParsingConfig
Bases: abacusai.api_class.abstract.ApiClass
Helper class that provides a standard way to create an ABC using inheritance.
- class abacusai.api_class.dataset.DocumentProcessingConfig
Bases: abacusai.api_class.abstract.ApiClass
Document processing configuration.
- Parameters:
extract_bounding_boxes (bool) – Whether to perform OCR and extract bounding boxes. If False, no OCR is performed and only the text already embedded in digital documents is extracted. Defaults to False.
ocr_mode (OcrMode) – OCR mode. There are different OCR modes available for different kinds of documents and use cases. This option only takes effect when extract_bounding_boxes is True.
use_full_ocr (bool) – Whether to perform full OCR. If True, OCR will be performed on the full page. If False, OCR will be performed on the non-text regions only. By default, it will be decided automatically based on the OCR mode and the document type. This option only takes effect when extract_bounding_boxes is True.
remove_header_footer (bool) – Whether to remove headers and footers. Defaults to False. This option only takes effect when extract_bounding_boxes is True.
remove_watermarks (bool) – Whether to remove watermarks. By default, it will be decided automatically based on the OCR mode and the document type. This option only takes effect when extract_bounding_boxes is True.
convert_to_markdown (bool) – Whether to convert extracted text to markdown. Defaults to False. This option only takes effect when extract_bounding_boxes is True.
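Every parameter except extract_bounding_boxes is documented as taking effect only when extract_bounding_boxes is True. The sketch below illustrates that dependency with a hypothetical stand-in dataclass; it does not import abacusai. The names DocProcessingSketch, OCR_DEPENDENT and ignored_options are invented for illustration, ocr_mode is modelled as an optional string rather than the real OcrMode enum, and None stands for "decided automatically" or "unspecified". These are assumptions, not the library's actual defaults or behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-in that mirrors the documented parameters.
# It is NOT abacusai.api_class.dataset.DocumentProcessingConfig.
@dataclass
class DocProcessingSketch:
    extract_bounding_boxes: bool = False
    ocr_mode: Optional[str] = None            # placeholder for the OcrMode enum
    use_full_ocr: Optional[bool] = None       # None: decided automatically
    remove_header_footer: bool = False
    remove_watermarks: Optional[bool] = None  # None: decided automatically
    convert_to_markdown: bool = False


# Options documented as taking effect only when extract_bounding_boxes is True.
OCR_DEPENDENT = (
    "ocr_mode",
    "use_full_ocr",
    "remove_header_footer",
    "remove_watermarks",
    "convert_to_markdown",
)


def ignored_options(cfg: DocProcessingSketch) -> List[str]:
    """List options that were changed from their default but have no effect
    because extract_bounding_boxes is False."""
    if cfg.extract_bounding_boxes:
        return []
    defaults = DocProcessingSketch()
    return [
        name
        for name in OCR_DEPENDENT
        if getattr(cfg, name) != getattr(defaults, name)
    ]


if __name__ == "__main__":
    cfg = DocProcessingSketch(remove_header_footer=True, convert_to_markdown=True)
    print(ignored_options(cfg))
```

A check of this kind can flag a configuration that requests markdown conversion or header removal without enabling extract_bounding_boxes, which, per the parameter descriptions above, would otherwise have no effect.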
- ocr_mode: abacusai.api_class.enums.OcrMode