abacusai.api_client_utils

Module Contents

Classes

DocstoreUtils

Utility class for loading docstore data.

Functions

clean_column_name(column)

get_clean_function_source_code(func)

avro_to_pandas_dtype(avro_type)

get_non_nullable_type(types)

get_object_from_context(client, context, ...)

load_as_pandas_from_avro_fd(fd)

load_as_pandas_from_avro_files(files, download_method)

try_abacus_internal_copy(src_suffix, dst_local[, ...])

Returns True if the file was copied, False otherwise.

Attributes

INVALID_PANDAS_COLUMN_NAME_CHARACTERS

abacusai.api_client_utils.INVALID_PANDAS_COLUMN_NAME_CHARACTERS = '[^A-Za-z0-9_]'
abacusai.api_client_utils.clean_column_name(column)
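
The reference gives no description for clean_column_name, so the following is only a minimal sketch of how the INVALID_PANDAS_COLUMN_NAME_CHARACTERS pattern could be applied. The helper name sketch_clean_column_name and the choice of an underscore as the replacement character are assumptions for illustration, not the library's confirmed behavior.

```python
import re

# Pattern copied from the module attribute documented above.
INVALID_PANDAS_COLUMN_NAME_CHARACTERS = '[^A-Za-z0-9_]'


def sketch_clean_column_name(column: str) -> str:
    """Hypothetical stand-in: replace every disallowed character with '_'.

    The real clean_column_name may apply further rules; this only shows
    what the character class matches.
    """
    return re.sub(INVALID_PANDAS_COLUMN_NAME_CHARACTERS, '_', column)


print(sketch_clean_column_name('order-id'))    # hyphen is outside the allowed set
print(sketch_clean_column_name('total cost'))  # so is the space
```

The pattern is a negated character class, so anything other than ASCII letters, digits, and underscores is matched one character at a time.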
abacusai.api_client_utils.get_clean_function_source_code(func)
Parameters:

func (Callable) –

abacusai.api_client_utils.avro_to_pandas_dtype(avro_type)
abacusai.api_client_utils.get_non_nullable_type(types)
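
In Avro schemas, a nullable field is usually declared as a union such as ["null", "long"]. The sketch below illustrates how a non-null branch might be picked out of such a union and mapped to a pandas dtype string. The function names and the mapping table are hypothetical and are not taken from the library; they only illustrate the idea the two function names above suggest.

```python
from typing import List, Union

# Hypothetical mapping from Avro primitive type names to pandas dtype strings.
# The library's actual mapping is not documented here and may differ.
AVRO_TO_PANDAS = {
    'boolean': 'boolean',
    'int': 'Int32',
    'long': 'Int64',
    'float': 'float32',
    'double': 'float64',
    'string': 'object',
    'bytes': 'object',
}


def sketch_non_nullable_type(types: Union[str, List[str]]) -> str:
    """Return the first non-'null' branch of an Avro union, or the type itself."""
    if isinstance(types, str):
        return types
    for avro_type in types:
        if avro_type != 'null':
            return avro_type
    raise ValueError('union contains only null')


def sketch_avro_to_pandas_dtype(avro_type: Union[str, List[str]]) -> str:
    """Map an Avro type (or nullable union) to a pandas dtype string."""
    resolved = sketch_non_nullable_type(avro_type)
    # Complex Avro types (records, arrays, ...) fall back to 'object' here.
    if not isinstance(resolved, str):
        return 'object'
    return AVRO_TO_PANDAS.get(resolved, 'object')
```

The nullable pandas dtypes ('Int64', 'boolean') are used in this sketch so that a missing value does not force an integer column to float.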
abacusai.api_client_utils.get_object_from_context(client, context, variable_name, return_type)
abacusai.api_client_utils.load_as_pandas_from_avro_fd(fd)
Parameters:

fd (IO) –

abacusai.api_client_utils.load_as_pandas_from_avro_files(files, download_method, max_workers=10)
Parameters:
  • files (List[str]) –

  • download_method (Callable) –

  • max_workers (int) –
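
The max_workers parameter suggests that files are fetched concurrently. The sketch below shows one common way to do that with a thread pool. It assumes, without confirmation from this reference, that download_method takes a single file identifier and returns its bytes. The helper name sketch_fetch_all is hypothetical, and the Avro parsing step that the real function presumably performs is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def sketch_fetch_all(
    files: List[str],
    download_method: Callable[[str], bytes],  # assumed signature
    max_workers: int = 10,
) -> List[bytes]:
    """Download every file concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `files` regardless of completion order.
        return list(pool.map(download_method, files))


def fake_download(name: str) -> bytes:
    """Stand-in downloader used only for this example."""
    return name.encode('utf-8')


print(sketch_fetch_all(['part-0.avro', 'part-1.avro'], fake_download, max_workers=2))
```

Threads suit this kind of work because downloads are I/O-bound, so the workers spend most of their time waiting on the network rather than competing for the interpreter.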

class abacusai.api_client_utils.DocstoreUtils

Utility class for loading docstore data. Needs to be updated if docstore formats change.

DOC_ID = 'doc_id'
PREDICTION_PREFIX = 'prediction'
FIRST_PAGE = 'first_page'
LAST_PAGE = 'last_page'
PAGE_TEXT = 'page_text'
PAGES = 'pages'
TOKENS = 'tokens'
PAGES_ZIP_METADATA = 'pages_zip_metadata'
PAGE_DATA = 'page_data'
static get_archive_id(doc_id)
Parameters:

doc_id (str) –

static get_page_id(doc_id, page)
Parameters:
  • doc_id (str) –

  • page (int) –

classmethod get_pandas_pages_df(df, feature_group_version, doc_id_column, document_column, get_docstore_resource_bytes, max_workers=10)
Parameters:
  • feature_group_version (str) –

  • doc_id_column (str) –

  • document_column (str) –

  • get_docstore_resource_bytes (Callable[..., bytes]) –

  • max_workers (int) –

classmethod get_pandas_documents_df(df, feature_group_version, doc_id_column, document_column, get_docstore_resource_bytes, max_workers=10)
Parameters:
  • feature_group_version (str) –

  • doc_id_column (str) –

  • document_column (str) –

  • get_docstore_resource_bytes (Callable) –

  • max_workers (int) –

abacusai.api_client_utils.try_abacus_internal_copy(src_suffix, dst_local, raise_exception=True)

Returns True if the file was copied, False otherwise.