abacusai.api_class.dataset

Module Contents

Classes

ParsingConfig

Helper class that provides a standard way to create an ABC using inheritance.

DocumentProcessingConfig

Document processing configuration.

class abacusai.api_class.dataset.ParsingConfig

Bases: abacusai.api_class.abstract.ApiClass

Helper class that provides a standard way to create an ABC using inheritance.

escape: str
csv_delimiter: str
file_path_with_schema: str
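
The fields above are listed without descriptions. Assuming that csv_delimiter and escape play their usual CSV roles (field separator and escape character), the sketch below shows the effect of such settings using only Python's standard csv module. It does not call the abacusai library, and the sample values are hypothetical.

```python
import csv
import io

# Assumed values for illustration only; the source names these fields
# but does not describe them.
csv_delimiter = "|"
escape = "\\"

# Two lines of text: a header, then a row whose second field contains
# an escaped delimiter (a\|b).
raw = "id|note\n1|a\\|b\n"

reader = csv.reader(
    io.StringIO(raw),
    delimiter=csv_delimiter,
    escapechar=escape,
    quoting=csv.QUOTE_NONE,  # no quote handling; rely on the escape character
)
rows = list(reader)
print(rows)
```

With these settings the escaped delimiter stays inside the second field instead of starting a third one.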

class abacusai.api_class.dataset.DocumentProcessingConfig

Bases: abacusai.api_class.abstract.ApiClass

Document processing configuration.

Parameters:
  • extract_bounding_boxes (bool) – Whether to perform OCR and extract bounding boxes. If False, no OCR will be done but only the embedded text from digital documents will be extracted. Defaults to False.

  • ocr_mode (OcrMode) – OCR mode. There are different OCR modes available for different kinds of documents and use cases. This option only takes effect when extract_bounding_boxes is True.

  • use_full_ocr (bool) – Whether to perform full OCR. If True, OCR will be performed on the full page. If False, OCR will be performed on the non-text regions only. By default, it will be decided automatically based on the OCR mode and the document type. This option only takes effect when extract_bounding_boxes is True.

  • remove_header_footer (bool) – Whether to remove headers and footers. Defaults to False. This option only takes effect when extract_bounding_boxes is True.

  • remove_watermarks (bool) – Whether to remove watermarks. By default, it will be decided automatically based on the OCR mode and the document type. This option only takes effect when extract_bounding_boxes is True.

  • convert_to_markdown (bool) – Whether to convert extracted text to markdown. Defaults to False. This option only takes effect when extract_bounding_boxes is True.

extract_bounding_boxes: bool = False
ocr_mode: abacusai.api_class.enums.OcrMode
use_full_ocr: bool
remove_header_footer: bool = False
remove_watermarks: bool
convert_to_markdown: bool = False
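
The parameter descriptions say that most options only take effect when extract_bounding_boxes is True. The sketch below models that dependency with a stand-in dataclass. DocumentProcessingConfigSketch, the OcrMode member AUTO, and the ignored_options helper are all hypothetical names made up for illustration; none of them is part of the abacusai library, and using None to mean "decided automatically" is likewise an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class OcrMode(Enum):
    # Hypothetical placeholder member; the real members of
    # abacusai.api_class.enums.OcrMode are not listed in this section.
    AUTO = "AUTO"


@dataclass
class DocumentProcessingConfigSketch:
    """Stand-in that mirrors the documented fields (not the library class)."""
    extract_bounding_boxes: bool = False
    ocr_mode: Optional[OcrMode] = None
    use_full_ocr: Optional[bool] = None       # None: decided automatically (assumed)
    remove_header_footer: bool = False
    remove_watermarks: Optional[bool] = None  # None: decided automatically (assumed)
    convert_to_markdown: bool = False


# Options documented as taking effect only when extract_bounding_boxes is True.
OCR_DEPENDENT = (
    "ocr_mode",
    "use_full_ocr",
    "remove_header_footer",
    "remove_watermarks",
    "convert_to_markdown",
)


def ignored_options(config: DocumentProcessingConfigSketch) -> List[str]:
    """Return OCR-dependent options that were changed from their defaults
    but have no effect because extract_bounding_boxes is False."""
    if config.extract_bounding_boxes:
        return []
    defaults = DocumentProcessingConfigSketch()
    return [
        name
        for name in OCR_DEPENDENT
        if getattr(config, name) != getattr(defaults, name)
    ]


# convert_to_markdown is set, but OCR extraction is off, so it is ignored.
text_only = DocumentProcessingConfigSketch(convert_to_markdown=True)
print(ignored_options(text_only))

# With extract_bounding_boxes=True, every option takes effect.
with_ocr = DocumentProcessingConfigSketch(
    extract_bounding_boxes=True,
    ocr_mode=OcrMode.AUTO,
    convert_to_markdown=True,
)
print(ignored_options(with_ocr))
```

A check like this can catch a configuration that silently does nothing, for example one that requests markdown conversion without enabling bounding-box extraction.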