abacusai
Subpackages
abacusai.api_class
abacusai.api_class.abstract
abacusai.api_class.batch_prediction
abacusai.api_class.blob_input
abacusai.api_class.dataset
abacusai.api_class.document_retriever
abacusai.api_class.enums
abacusai.api_class.feature_group
abacusai.api_class.model
abacusai.api_class.monitor
abacusai.api_class.monitor_alert
abacusai.api_class.project
abacusai.api_class.python_functions
abacusai.api_class.refresh
Submodules
abacusai.abacus_api
abacusai.agent
abacusai.agent_version
abacusai.ai_building_task
abacusai.algorithm
abacusai.annotation
abacusai.annotation_config
abacusai.annotation_document
abacusai.annotation_entry
abacusai.annotations_status
abacusai.api_client_utils
abacusai.api_endpoint
abacusai.api_key
abacusai.app_user_group
abacusai.application_connector
abacusai.batch_prediction
abacusai.batch_prediction_version
abacusai.categorical_range_violation
abacusai.chat_message
abacusai.chat_session
abacusai.client
abacusai.code_source
abacusai.concatenation_config
abacusai.cpu_gpu_memory_specs
abacusai.cryptography
abacusai.custom_loss_function
abacusai.custom_metric
abacusai.custom_metric_version
abacusai.custom_train_function_info
abacusai.data_consistency_duplication
abacusai.data_metrics
abacusai.data_prep_logs
abacusai.data_quality_results
abacusai.database_connector
abacusai.dataset
abacusai.dataset_column
abacusai.dataset_version
abacusai.dataset_version_logs
abacusai.deployment
abacusai.deployment_auth_token
abacusai.deployment_conversation
abacusai.deployment_conversation_event
abacusai.document
abacusai.document_annotation
abacusai.document_retriever
abacusai.document_retriever_config
abacusai.document_retriever_lookup_result
abacusai.document_retriever_version
abacusai.document_store
abacusai.document_store_import
abacusai.drift_distribution
abacusai.drift_distributions
abacusai.eda
abacusai.eda_chart_description
abacusai.eda_collinearity
abacusai.eda_data_consistency
abacusai.eda_feature_association
abacusai.eda_feature_collinearity
abacusai.eda_forecasting_analysis
abacusai.eda_version
abacusai.embedding_feature_drift_distribution
abacusai.execute_feature_group_operation
abacusai.external_application
abacusai.feature
abacusai.feature_distribution
abacusai.feature_drift_record
abacusai.feature_drift_summary
abacusai.feature_group
abacusai.feature_group_document
abacusai.feature_group_export
abacusai.feature_group_export_config
abacusai.feature_group_export_download_url
abacusai.feature_group_lineage
abacusai.feature_group_refresh_export_config
abacusai.feature_group_row
abacusai.feature_group_row_process
abacusai.feature_group_row_process_logs
abacusai.feature_group_row_process_summary
abacusai.feature_group_template
abacusai.feature_group_template_variable_options
abacusai.feature_group_version
abacusai.feature_importance
abacusai.feature_mapping
abacusai.feature_record
abacusai.file_connector
abacusai.file_connector_instructions
abacusai.file_connector_verification
abacusai.forecasting_analysis_graph_data
abacusai.forecasting_monitor_item_analysis
abacusai.forecasting_monitor_summary
abacusai.function_logs
abacusai.generated_pit_feature_config_option
abacusai.graph_dashboard
abacusai.holdout_analysis
abacusai.holdout_analysis_version
abacusai.indexing_config
abacusai.inferred_feature_mappings
abacusai.item_statistics
abacusai.llm_code_block
abacusai.llm_execution_preview
abacusai.llm_execution_result
abacusai.llm_input
abacusai.llm_parameters
abacusai.llm_response
abacusai.memory_options
abacusai.model
abacusai.model_artifacts_export
abacusai.model_blueprint_export
abacusai.model_blueprint_stage
abacusai.model_location
abacusai.model_metrics
abacusai.model_monitor
abacusai.model_monitor_org_summary
abacusai.model_monitor_summary
abacusai.model_monitor_summary_from_org
abacusai.model_monitor_version
abacusai.model_monitor_version_metric_data
abacusai.model_training_type_for_deployment
abacusai.model_upload
abacusai.model_version
abacusai.modification_lock_info
abacusai.module
abacusai.monitor_alert
abacusai.monitor_alert_version
abacusai.natural_language_explanation
abacusai.nested_feature
abacusai.null_violation
abacusai.organization_external_application_settings
abacusai.organization_group
abacusai.organization_search_result
abacusai.organization_secret
abacusai.page_data
abacusai.pipeline
abacusai.pipeline_reference
abacusai.pipeline_step
abacusai.pipeline_step_version
abacusai.pipeline_step_version_logs
abacusai.pipeline_step_version_reference
abacusai.pipeline_version
abacusai.pipeline_version_logs
abacusai.point_in_time_feature
abacusai.point_in_time_group
abacusai.point_in_time_group_feature
abacusai.prediction_client
abacusai.prediction_dataset
abacusai.prediction_feature_group
abacusai.prediction_input
abacusai.prediction_metric
abacusai.prediction_metric_version
abacusai.prediction_operator
abacusai.prediction_operator_version
abacusai.problem_type
abacusai.project
abacusai.project_config
abacusai.project_feature_group
abacusai.project_feature_group_schema
abacusai.project_feature_group_schema_version
abacusai.project_validation
abacusai.python_function
abacusai.python_function_validator
abacusai.python_plot_function
abacusai.range_violation
abacusai.refresh_pipeline_run
abacusai.refresh_policy
abacusai.refresh_schedule
abacusai.resolved_feature_group_template
abacusai.return_class
abacusai.schema
abacusai.streaming_auth_token
abacusai.streaming_client
abacusai.streaming_connector
abacusai.test_point_predictions
abacusai.training_config_options
abacusai.type_violation
abacusai.upload
abacusai.upload_part
abacusai.use_case
abacusai.use_case_requirements
abacusai.user
abacusai.user_exception
abacusai.webhook
Package Contents
Classes
Attributes
- class abacusai.ApiClass
Bases:
abc.ABC
Helper class that provides a standard way to create an ABC using inheritance.
- __post_init__()
- classmethod _get_builder()
- __str__()
Return str(self).
- _repr_html_()
- to_dict()
Standardizes converting an ApiClass to dictionary. Keys of response dictionary are converted to camel case. This also validates the fields ( type, value, etc ) received in the dictionary.
- class abacusai._ApiClassFactory
Bases:
abc.ABC
Helper class that provides a standard way to create an ABC using inheritance.
- config_abstract_class
- config_class_key
- config_class_map
- class abacusai.BatchPredictionArgs
Bases:
abacusai.api_class.abstract.ApiClass
Helper class that provides a standard way to create an ABC using inheritance.
- problem_type: abacusai.api_class.enums.ProblemType
- classmethod _get_builder()
- class abacusai.AnomalyDetectionBatchPredictionArgs
Bases:
BatchPredictionArgs
Batch Prediction Config for the ANOMALY_DETECTION problem type.
- Parameters:
for_eval (bool) – If True, the test fold which was created during training and used for metrics calculation will be used as input data. These predictions are hence used for model evaluation.
prediction_time_endpoint (str) – The end point for predictions.
prediction_time_range (int) – Over what period of time should we make predictions (in seconds).
minimum_anomaly_score (int) – Exclude results with an anomaly score (1 in x event) below this threshold. Range: [1, 1_000_000_000_000].
summary_mode (bool) – Only show top anomalies per ID.
attach_raw_data (bool) – Return raw data along with anomalies.
small_batch (bool) – Size of batch data guaranteed to be small.
- __post_init__()
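A construction sketch using only the fields listed above; the timestamp format and all values are illustrative assumptions, not defaults.
from abacusai import AnomalyDetectionBatchPredictionArgs

bp_args = AnomalyDetectionBatchPredictionArgs(
    for_eval=False,
    prediction_time_endpoint='2023-09-01 00:00:00',  # assumed timestamp format
    prediction_time_range=86400,   # predict over one day, in seconds
    minimum_anomaly_score=1000,    # keep only 1-in-1000 (or rarer) events
    summary_mode=True,             # only show top anomalies per ID
    attach_raw_data=False,
)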
- class abacusai.AnomalyOutliersBatchPredictionArgs
Bases:
BatchPredictionArgs
Batch Prediction Config for the ANOMALY_OUTLIERS problem type.
- Parameters:
for_eval (bool) – If True, the test fold which was created during training and used for metrics calculation will be used as input data. These predictions are hence used for model evaluation.
threshold (float) – The threshold for detecting an anomaly. Range: [0.8, 0.99].
- __post_init__()
- class abacusai.ForecastingBatchPredictionArgs
Bases:
BatchPredictionArgs
Batch Prediction Config for the FORECASTING problem type.
- Parameters:
for_eval (bool) – If True, the test fold which was created during training and used for metrics calculation will be used as input data. These predictions are hence used for model evaluation.
predictions_start_date (str) – The start date for predictions.
use_prediction_offset (bool) – If True, use prediction offset.
start_date_offset (int) – Sets prediction start date as this offset relative to the prediction start date.
forecasting_horizon (int) – The number of timestamps to predict in the future. Range: [1, 1000].
item_attributes_to_include_in_the_result (list) – List of columns to include in the prediction output.
explain_predictions (bool) – If True, explain predictions for the forecast.
- __post_init__()
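A construction sketch under the same assumptions (illustrative values, assumed date format):
from abacusai import ForecastingBatchPredictionArgs

fc_args = ForecastingBatchPredictionArgs(
    for_eval=False,
    predictions_start_date='2024-01-01',  # assumed date format
    forecasting_horizon=28,               # 28 future timestamps (range [1, 1000])
    item_attributes_to_include_in_the_result=['category', 'region'],  # hypothetical columns
    explain_predictions=True,
)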
- class abacusai.NamedEntityExtractionBatchPredictionArgs
Bases:
BatchPredictionArgs
Batch Prediction Config for the NAMED_ENTITY_EXTRACTION problem type.
- Parameters:
for_eval (bool) – If True, the test fold which was created during training and used for metrics calculation will be used as input data. These predictions are hence used for model evaluation.
verbose_predictions (bool) – Return prediction inputs, predicted annotations and token label probabilities.
- __post_init__()
- class abacusai.PersonalizationBatchPredictionArgs
Bases:
BatchPredictionArgs
Batch Prediction Config for the PERSONALIZATION problem type.
- Parameters:
for_eval (bool) – If True, the test fold which was created during training and used for metrics calculation will be used as input data. These predictions are hence used for model evaluation.
number_of_items (int) – Number of items to recommend.
result_columns (list) – List of columns to include in the prediction output.
score_field (str) – If specified, relative item scores will be returned using a field with this name.
- __post_init__()
- class abacusai.PredictiveModelingBatchPredictionArgs
Bases:
BatchPredictionArgs
Batch Prediction Config for the PREDICTIVE_MODELING problem type.
- Parameters:
for_eval (bool) – If True, the test fold which was created during training and used for metrics calculation will be used as input data. These predictions are hence used for model evaluation.
explainer_type (enums.ExplainerType) – The type of explainer to use to generate explanations on the batch prediction.
number_of_samples_to_use_for_explainer (int) – Number of samples to use for the Kernel Explainer.
include_multi_class_explanations (bool) – If True, includes explanations for all classes in multi-class classification.
features_considered_constant_for_explanations (str) – Comma-separated list of fields to treat as constant in SHAP explanations.
importance_of_records_in_nested_columns (str) – Returns importance of each index in the specified nested column instead of SHAP column explanations.
explanation_filter_lower_bound (float) – If set, explanations will be limited to predictions above this value. Range: [0, 1].
explanation_filter_upper_bound (float) – If set, explanations will be limited to predictions below this value. Range: [0, 1].
bound_label (str) – For classification problems, specifies the label to which the explanation bounds are applied.
output_columns (list) – A list of column names to include in the prediction result.
- explainer_type: abacusai.api_class.enums.ExplainerType
- __post_init__()
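A sketch combining this config with the ExplainerType enum documented below; column names are hypothetical:
from abacusai import PredictiveModelingBatchPredictionArgs, ExplainerType

pm_args = PredictiveModelingBatchPredictionArgs(
    for_eval=False,
    explainer_type=ExplainerType.KERNEL_EXPLAINER,
    number_of_samples_to_use_for_explainer=1000,
    explanation_filter_lower_bound=0.2,   # only explain predictions in [0.2, 0.8]
    explanation_filter_upper_bound=0.8,
    output_columns=['account_id', 'prediction'],  # hypothetical column names
)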
- class abacusai.PretrainedModelsBatchPredictionArgs
Bases:
BatchPredictionArgs
Batch Prediction Config for the PRETRAINED_MODELS problem type.
- Parameters:
for_eval (bool) – If True, the test fold which was created during training and used for metrics calculation will be used as input data. These predictions are hence used for model evaluation.
files_output_location_prefix (str) – The output location prefix for the files.
channel_id_to_label_map (str) – JSON string for the map from channel ids to their labels.
- __post_init__()
- class abacusai.SentenceBoundaryDetectionBatchPredictionArgs
Bases:
BatchPredictionArgs
Batch Prediction Config for the SENTENCE_BOUNDARY_DETECTION problem type.
- Parameters:
for_eval (bool) – If True, the test fold which was created during training and used for metrics calculation will be used as input data. These predictions are hence used for model evaluation.
explode_output (bool) – Explode data so there is one sentence per row.
- __post_init__()
- class abacusai.ThemeAnalysisBatchPredictionArgs
Bases:
BatchPredictionArgs
Batch Prediction Config for the THEME_ANALYSIS problem type.
- Parameters:
for_eval (bool) – If True, the test fold which was created during training and used for metrics calculation will be used as input data. These predictions are hence used for model evaluation.
analysis_frequency (str) – The length of each analysis interval.
start_date (str) – The end point for predictions.
analysis_days (int) – How many days to analyze.
- __post_init__()
- class abacusai.ChatLLMBatchPredictionArgs
Bases:
BatchPredictionArgs
Batch Prediction Config for the ChatLLM problem type.
- Parameters:
for_eval (bool) – If True, the test fold which was created during training and used for metrics calculation will be used as input data. These predictions are hence used for model evaluation.
product (bool) – Generate a response for every question and chunk combination.
- __post_init__()
- class abacusai._BatchPredictionArgsFactory
Bases:
abacusai.api_class.abstract._ApiClassFactory
Helper class that provides a standard way to create an ABC using inheritance.
- config_abstract_class
- config_class_key = 'problemType'
- config_class_map
- class abacusai.BlobInput
Bases:
abacusai.api_class.abstract.ApiClass
Binary large object input data.
- Parameters:
- class abacusai.ParsingConfig
Bases:
abacusai.api_class.abstract.ApiClass
Helper class that provides a standard way to create an ABC using inheritance.
- class abacusai.DocumentProcessingConfig
Bases:
abacusai.api_class.abstract.ApiClass
Helper class that provides a standard way to create an ABC using inheritance.
- class abacusai.VectorStoreTextEncoder
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- E5 = 'E5'
- OPENAI = 'OPENAI'
- SENTENCE_BERT = 'SENTENCE_BERT'
- E5_SMALL = 'E5_SMALL'
- class abacusai.VectorStoreConfig
Bases:
abacusai.api_class.abstract.ApiClass
Configs for vector store indexing.
- Parameters:
chunk_size (int) – The size of text chunks in the vector store.
chunk_overlap_fraction (float) – The fraction of overlap between chunks.
text_encoder (VectorStoreTextEncoder) – Encoder used to index texts from the documents.
- text_encoder: abacusai.api_class.enums.VectorStoreTextEncoder
- class abacusai.DocumentRetrieverConfig
Bases:
VectorStoreConfig
Configs for document retriever.
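A sketch of building a retriever config from the inherited VectorStoreConfig fields; the values are illustrative only:
from abacusai import DocumentRetrieverConfig, VectorStoreTextEncoder

# DocumentRetrieverConfig inherits the VectorStoreConfig fields listed above.
retriever_config = DocumentRetrieverConfig(
    chunk_size=512,               # size of text chunks in the vector store
    chunk_overlap_fraction=0.1,   # 10% overlap between consecutive chunks
    text_encoder=VectorStoreTextEncoder.E5,
)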
- class abacusai.ApiEnum
Bases:
enum.Enum
Generic enumeration.
Derive from this class to define new enumerations.
- __eq__(other)
Return self==value.
- __hash__()
Return hash(self).
- class abacusai.ProblemType
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- AI_AGENT = 'ai_agent'
- ANOMALY_DETECTION = 'anomaly_new'
- ANOMALY_OUTLIERS = 'anomaly'
- EVENT_ANOMALY = 'event_anomaly'
- CLUSTERING = 'clustering'
- CLUSTERING_TIMESERIES = 'clustering_timeseries'
- CUMULATIVE_FORECASTING = 'cumulative_forecasting'
- NAMED_ENTITY_EXTRACTION = 'nlp_ner'
- NATURAL_LANGUAGE_SEARCH = 'nlp_search'
- CHAT_LLM = 'chat_llm'
- SENTENCE_BOUNDARY_DETECTION = 'nlp_sentence_boundary_detection'
- SENTIMENT_DETECTION = 'nlp_sentiment'
- DOCUMENT_CLASSIFICATION = 'nlp_classification'
- DOCUMENT_SUMMARIZATION = 'nlp_summarization'
- DOCUMENT_VISUALIZATION = 'nlp_document_visualization'
- PERSONALIZATION = 'personalization'
- PREDICTIVE_MODELING = 'regression'
- FORECASTING = 'forecasting'
- CUSTOM_TRAINED_MODEL = 'plug_and_play'
- CUSTOM_ALGORITHM = 'trainable_plug_and_play'
- FEATURE_STORE = 'feature_store'
- IMAGE_CLASSIFICATION = 'vision_classification'
- OBJECT_DETECTION = 'vision_object_detection'
- IMAGE_VALUE_PREDICTION = 'vision_regression'
- MODEL_MONITORING = 'model_monitoring'
- LANGUAGE_DETECTION = 'language_detection'
- OPTIMIZATION = 'optimization'
- PRETRAINED_MODELS = 'pretrained'
- THEME_ANALYSIS = 'theme_analysis'
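Since ApiEnum derives from enum.Enum, members behave like standard Python enums; a small sketch using the values listed above:
from abacusai import ProblemType

# Members wrap the internal string identifiers listed above.
assert ProblemType.FORECASTING.value == 'forecasting'
assert ProblemType('regression') is ProblemType.PREDICTIVE_MODELING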
- class abacusai.RegressionObjective
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- AUC = 'auc'
- ACCURACY = 'acc'
- LOG_LOSS = 'log_loss'
- PRECISION = 'precision'
- RECALL = 'recall'
- F1_SCORE = 'fscore'
- MAE = 'mae'
- MAPE = 'mape'
- WAPE = 'wape'
- RMSE = 'rmse'
- R_SQUARED_COEFFICIENT_OF_DETERMINATION = 'r^2'
- class abacusai.RegressionTreeHPOMode
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- RAPID = ('rapid',)
- THOROUGH = 'thorough'
- class abacusai.RegressionAugmentationStrategy
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- SMOTE = 'smote'
- RESAMPLE = 'resample'
- class abacusai.RegressionTargetTransform
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- LOG = 'log'
- QUANTILE = 'quantile'
- YEO_JOHNSON = 'yeo-johnson'
- BOX_COX = 'box-cox'
- class abacusai.RegressionTypeOfSplit
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- RANDOM = 'Random Sampling'
- TIMESTAMP_BASED = 'Timestamp Based'
- ROW_INDICATOR_BASED = 'Row Indicator Based'
- class abacusai.RegressionTimeSplitMethod
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- TEST_SPLIT_PERCENTAGE_BASED = 'Test Split Percentage Based'
- TEST_START_TIMESTAMP_BASED = 'Test Start Timestamp Based'
- class abacusai.RegressionLossFunction
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- HUBER = 'Huber'
- MSE = 'Mean Squared Error'
- MAE = 'Mean Absolute Error'
- MAPE = 'Mean Absolute Percentage Error'
- MSLE = 'Mean Squared Logarithmic Error'
- TWEEDIE = 'Tweedie'
- CROSS_ENTROPY = 'Cross Entropy'
- FOCAL_CROSS_ENTROPY = 'Focal Cross Entropy'
- AUTOMATIC = 'Automatic'
- CUSTOM = 'Custom'
- class abacusai.ExplainerType
Bases:
enum.Enum
Generic enumeration.
Derive from this class to define new enumerations.
- KERNEL_EXPLAINER = 'KERNEL_EXPLAINER'
- LIME_EXPLAINER = 'LIME_EXPLAINER'
- TREE_EXPLAINER = 'TREE_EXPLAINER'
- EBM_EXPLAINER = 'EBM_EXPLAINER'
- class abacusai.SamplingMethodType
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- N_SAMPLING = 'N_SAMPLING'
- PERCENT_SAMPLING = 'PERCENT_SAMPLING'
- class abacusai.MergeMode
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- LAST_N = 'LAST_N'
- TIME_WINDOW = 'TIME_WINDOW'
- class abacusai.FillLogic
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- AVERAGE = 'average'
- MAX = 'max'
- MEDIAN = 'median'
- MIN = 'min'
- CUSTOM = 'custom'
- BACKFILL = 'bfill'
- FORWARDFILL = 'ffill'
- LINEAR = 'linear'
- NEAREST = 'nearest'
- class abacusai.BatchSize
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- BATCH_8 = 8
- BATCH_16 = 16
- BATCH_32 = 32
- BATCH_64 = 64
- BATCH_128 = 128
- BATCH_256 = 256
- BATCH_384 = 384
- BATCH_512 = 512
- BATCH_740 = 740
- BATCH_1024 = 1024
- class abacusai.HolidayCalendars
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- AU = 'AU'
- UK = 'UK'
- US = 'US'
- class abacusai.ExperimentationMode
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- RAPID = 'rapid'
- THOROUGH = 'thorough'
- class abacusai.PersonalizationTrainingMode
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- EXPERIMENTAL = 'EXP'
- PRODUCTION = 'PROD'
- class abacusai.PersonalizationObjective
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- NDCG = 'ndcg'
- NDCG_5 = 'ndcg@5'
- NDCG_10 = 'ndcg@10'
- MAP = 'map'
- MAP_5 = 'map@5'
- MAP_10 = 'map@10'
- MRR = 'mrr'
- PERSONALIZATION = 'personalization@10'
- COVERAGE = 'coverage'
- class abacusai.ForecastingObjective
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- ACCURACY = 'w_c_accuracy'
- WAPE = 'wape'
- MAPE = 'mape'
- CMAPE = 'cmape'
- RMSE = 'rmse'
- CV = 'coefficient_of_variation'
- BIAS = 'bias'
- SRMSE = 'srmse'
- class abacusai.ForecastingFrequency
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- HOURLY = '1H'
- DAILY = '1D'
- WEEKLY_SUNDAY_START = '1W'
- WEEKLY_MONDAY_START = 'W-MON'
- WEEKLY_SATURDAY_START = 'W-SAT'
- MONTH_START = 'MS'
- MONTH_END = '1M'
- QUARTER_START = 'QS'
- QUARTER_END = '1Q'
- YEARLY = '1Y'
- class abacusai.ForecastingDataSplitType
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- AUTO = 'Automatic Time Based'
- TIMESTAMP = 'Timestamp Based'
- ITEM = 'Item Based'
- PREDICTION_LENGTH = 'Force Prediction Length'
- class abacusai.ForecastingLossFunction
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- CUSTOM = 'Custom'
- MEAN_ABSOLUTE_ERROR = 'mae'
- NORMALIZED_MEAN_ABSOLUTE_ERROR = 'nmae'
- PEAKS_MEAN_ABSOLUTE_ERROR = 'peaks_mae'
- MEAN_ABSOLUTE_PERCENTAGE_ERROR = 'stable_mape'
- POINTWISE_ACCURACY = 'accuracy'
- ROOT_MEAN_SQUARE_ERROR = 'rmse'
- NORMALIZED_ROOT_MEAN_SQUARE_ERROR = 'nrmse'
- ASYMMETRIC_MEAN_ABSOLUTE_PERCENTAGE_ERROR = 'asymmetric_mape'
- STABLE_STANDARDIZED_MEAN_ABSOLUTE_PERCENTAGE_ERROR = 'stable_standardized_mape_with_cmape'
- GAUSSIAN = 'mle_gaussian_local'
- GAUSSIAN_FULL_COVARIANCE = 'mle_gaussfullcov'
- GUASSIAN_EXPONENTIAL = 'mle_gaussexp'
- MIX_GAUSSIANS = 'mle_gaussmix'
- WEIBULL = 'mle_weibull'
- NEGATIVE_BINOMIAL = 'mle_negbinom'
- LOG_ROOT_MEAN_SQUARE_ERROR = 'log_rmse'
- class abacusai.ForecastingLocalScaling
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- ZSCORE = 'zscore'
- SLIDING_ZSCORE = 'sliding_zscore'
- LAST_POINT = 'lastpoint'
- MIN_MAX = 'minmax'
- MIN_STD = 'minstd'
- ROBUST = 'robust'
- ITEM = 'item'
- class abacusai.ForecastingFillMethod
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- BACK = 'BACK'
- MIDDLE = 'MIDDLE'
- FUTURE = 'FUTURE'
- class abacusai.ForecastingQuanitlesExtensionMethod
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- DIRECT = 'direct'
- QUADRATIC = 'quadratic'
- ANCESTRAL_SIMULATION = 'simulation'
- class abacusai.NERObjective
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- LOG_LOSS = 'log_loss'
- AUC = 'auc'
- PRECISION = 'precision'
- RECALL = 'recall'
- ANNOTATIONS_PRECISION = 'annotations_precision'
- ANNOTATIONS_RECALL = 'annotations_recall'
- class abacusai.NERModelType
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- PRETRAINED_BERT = 'pretrained_bert'
- PRETRAINED_ROBERTA_27 = 'pretrained_roberta_27'
- PRETRAINED_ROBERTA_43 = 'pretrained_roberta_43'
- PRETRAINED_MULTILINGUAL = 'pretrained_multilingual'
- LEARNED = 'learned'
- class abacusai.NLPDocumentFormat
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- AUTO = 'auto'
- TEXT = 'text'
- DOC = 'doc'
- TOKENS = 'tokens'
- class abacusai.SentimentType
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- VALENCE = 'valence'
- EMOTION = 'emotion'
- class abacusai.ClusteringImputationMethod
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- AUTOMATIC = 'Automatic'
- ZEROS = 'Zeros'
- INTERPOLATE = 'Interpolate'
- class abacusai.ConnectorType
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- FILE = 'FILE'
- DATABASE = 'DATABASE'
- STREAMING = 'STREAMING'
- APPLICATION = 'APPLICATION'
- class abacusai.PythonFunctionArgumentType
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- FEATURE_GROUP = 'FEATURE_GROUP'
- INTEGER = 'INTEGER'
- STRING = 'STRING'
- BOOLEAN = 'BOOLEAN'
- FLOAT = 'FLOAT'
- JSON = 'JSON'
- LIST = 'LIST'
- DATASET_ID = 'DATASET_ID'
- MODEL_ID = 'MODEL_ID'
- FEATURE_GROUP_ID = 'FEATURE_GROUP_ID'
- MONITOR_ID = 'MONITOR_ID'
- BATCH_PREDICTION_ID = 'BATCH_PREDICTION_ID'
- DEPLOYMENT_ID = 'DEPLOYMENT_ID'
- class abacusai.PythonFunctionOutputArgumentType
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- NTEGER = 'INTEGER'
- STRING = 'STRING'
- BOOLEAN = 'BOOLEAN'
- FLOAT = 'FLOAT'
- JSON = 'JSON'
- LIST = 'LIST'
- DATASET_ID = 'DATASET_ID'
- MODEL_ID = 'MODEL_ID'
- FEATURE_GROUP_ID = 'FEATURE_GROUP_ID'
- MONITOR_ID = 'MONITOR_ID'
- BATCH_PREDICTION_ID = 'BATCH_PREDICTION_ID'
- DEPLOYMENT_ID = 'DEPLOYMENT_ID'
- ANY = 'ANY'
- class abacusai.LLMName
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- OPENAI_GPT4 = 'OPENAI_GPT4'
- OPENAI_GPT3_5 = 'OPENAI_GPT3_5'
- OPENAI_GPT3_5_SHORT = 'OPENAI_GPT3_5_SHORT'
- CLAUDE_V2 = 'CLAUDE_V2'
- ABACUS_GIRAFFE = 'ABACUS_GIRAFFE'
- ABACUS_LLAMA2_QA = 'ABACUS_LLAMA2_QA'
- ABACUS_LLAMA2_CODE = 'ABACUS_LLAMA2_CODE'
- LLAMA2_CHAT = 'LLAMA2_CHAT'
- PALM = 'PALM'
- PALM_TEXT = 'PALM_TEXT'
- class abacusai.MonitorAlertType
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- ACCURACY_BELOW_THRESHOLD = 'AccuracyBelowThreshold'
- FEATURE_DRIFT = 'FeatureDrift'
- DATA_INTEGRITY_VIOLATIONS = 'DataIntegrityViolations'
- BIAS_VIOLATIONS = 'BiasViolations'
- class abacusai.FeatureDriftType
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- KL = 'kl'
- KS = 'ks'
- WS = 'ws'
- JS = 'js'
- class abacusai.DataIntegrityViolationType
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- NULL_VIOLATIONS = 'null_violations'
- TYPE_MISMATCH_VIOLATIONS = 'type_mismatch_violations'
- RANGE_VIOLATIONS = 'range_violations'
- CATEGORICAL_RANGE_VIOLATION = 'categorical_range_violations'
- TOTAL_VIOLATIONS = 'total_violations'
- class abacusai.BiasType
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- DEMOGRAPHIC_PARITY = 'demographic_parity'
- EQUAL_OPPORTUNITY = 'equal_opportunity'
- GROUP_BENEFIT_EQUALITY = 'group_benefit'
- TOTAL = 'total'
- class abacusai.AlertActionType
Bases:
ApiEnum
Generic enumeration.
Derive from this class to define new enumerations.
- EMAIL = 'Email'
- class abacusai.SamplingConfig
Bases:
abacusai.api_class.abstract.ApiClass
An abstract class for the sampling config of a feature group
- classmethod _get_builder()
- __post_init__()
- class abacusai.NSamplingConfig
Bases:
SamplingConfig
The number of distinct values of the key columns to include in the sample, or number of rows if key columns not specified.
- Parameters:
sampling_method (SamplingMethodType) – N_SAMPLING
sample_count (int) – The number of rows to include in the sample
key_columns (List[str]) – The feature(s) to use as the key(s) when sampling
- sampling_method: abacusai.api_class.enums.SamplingMethodType
- class abacusai.PercentSamplingConfig
Bases:
SamplingConfig
The fraction of distinct values of the feature group to include in the sample.
- Parameters:
sampling_method (SamplingMethodType) – PERCENT_SAMPLING
sample_percent (float) – The percentage of the rows to sample
key_columns (List[str]) – The feature(s) to use as the key(s) when sampling
- sampling_method: abacusai.api_class.enums.SamplingMethodType
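A sketch of both sampling configs; user_id is a hypothetical key column, and whether sample_percent expects a fraction or a percentage should be checked against the installed package:
from abacusai import NSamplingConfig, PercentSamplingConfig, SamplingMethodType

# Keep 1000 distinct values of the (hypothetical) user_id key column.
n_config = NSamplingConfig(
    sampling_method=SamplingMethodType.N_SAMPLING,
    sample_count=1000,
    key_columns=['user_id'],
)

# Keep a fraction of the distinct user_id values instead.
pct_config = PercentSamplingConfig(
    sampling_method=SamplingMethodType.PERCENT_SAMPLING,
    sample_percent=0.1,  # assumed to be a fraction; verify the expected scale
    key_columns=['user_id'],
)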
- class abacusai._SamplingConfigFactory
Bases:
abacusai.api_class.abstract._ApiClassFactory
Helper class that provides a standard way to create an ABC using inheritance.
- config_class_key = 'sampling_method'
- config_class_map
- class abacusai.MergeConfig
Bases:
abacusai.api_class.abstract.ApiClass
An abstract class for the merge config of a feature group
- classmethod _get_builder()
- __post_init__()
- class abacusai.LastNMergeConfig
Bases:
MergeConfig
Merge LAST N chunks/versions of an incremental dataset.
- Parameters:
- merge_mode: abacusai.api_class.enums.MergeMode
- class abacusai.TimeWindowMergeConfig
Bases:
MergeConfig
Merge rows within a given time window of the most recent timestamp.
- Parameters:
- merge_mode: abacusai.api_class.enums.MergeMode
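A sketch of the two merge configs. Only merge_mode is documented above; num_versions, feature_name and time_window_size_ms are assumed field names and should be verified against the installed package:
from abacusai import LastNMergeConfig, TimeWindowMergeConfig, MergeMode

# Merge only the most recent chunks/versions of an incremental dataset.
last_n = LastNMergeConfig(
    merge_mode=MergeMode.LAST_N,
    num_versions=5,  # assumed field name
)

# Merge rows that fall within a window of the most recent timestamp.
time_window = TimeWindowMergeConfig(
    merge_mode=MergeMode.TIME_WINDOW,
    feature_name='event_timestamp',               # assumed field name
    time_window_size_ms=7 * 24 * 60 * 60 * 1000,  # assumed field name; 7 days
)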
- class abacusai._MergeConfigFactory
Bases:
abacusai.api_class.abstract._ApiClassFactory
Helper class that provides a standard way to create an ABC using inheritance.
- config_class_key = 'merge_mode'
- config_class_map
- class abacusai.TrainingConfig
Bases:
abacusai.api_class.abstract.ApiClass
Helper class that provides a standard way to create an ABC using inheritance.
- problem_type: abacusai.api_class.enums.ProblemType
- classmethod _get_builder()
- class abacusai.PersonalizationTrainingConfig
Bases:
TrainingConfig
Training config for the PERSONALIZATION problem type.
- Parameters:
objective (PersonalizationObjective) – Ranking scheme used to select final best model.
sort_objective (PersonalizationObjective) – Ranking scheme used to sort models on the metrics page.
training_mode (PersonalizationTrainingMode) – Whether to train in production or experimental mode. Defaults to EXP.
target_action_types (List[str]) – List of action types to use as targets for training.
target_action_weights (Dict[str, float]) – Dictionary of action types to weights for training.
session_event_types (List[str]) – List of event types to treat as occurrences of sessions.
test_split (int) – Percent of dataset to use for test data. We support using a range between 6% and 20% of your dataset as test data.
recent_days_for_training (int) – Limit training data to a certain latest number of days.
training_start_date (str) – Only consider training interaction data after this date. Specified in the timezone of the dataset.
test_on_user_split (bool) – Use user splits instead of time splits when validating and testing the model.
test_split_on_last_k_items (bool) – Use last k items instead of global timestamp splits when validating and testing the model.
test_last_items_length (int) – Number of items to leave out for each user when using leave k out folds.
test_window_length_hours (int) – Duration (in hours) of the most recent time window to use when validating and testing the model.
explicit_time_split (bool) – Sets an explicit time-based test boundary.
test_row_indicator (str) – Column indicating which rows to use for training (TRAIN), validation (VAL) and testing (TEST).
full_data_retraining (bool) – Train models separately with all the data.
sequential_training (bool) – Train a model sequentially through time.
data_split_feature_group_table_name (str) – Specify the table name of the feature group to export training data with the fold column.
optimized_event_type (str) – The final event type to optimize for and compute metrics on.
dropout_rate (int) – Dropout rate for neural network.
batch_size (BatchSize) – Batch size for neural network.
disable_transformer (bool) – Disable training the transformer algorithm.
disable_gpu (bool) – Disable training on GPU.
filter_history (bool) – Do not recommend items the user has already interacted with.
max_history_length (int) – Maximum length of user-item history to include for a user in training examples.
compute_rerank_metrics (bool) – Compute metrics based on rerank results.
add_time_features (bool) – Include interaction time as a feature.
disable_timestamp_scalar_features (bool) – Exclude timestamp scalar features.
compute_session_metrics (bool) – Evaluate models based on how well they are able to predict the next session of interactions.
max_user_history_len_percentile (int) – Filter out users with history length above this percentile.
downsample_item_popularity_percentile (float) – Downsample items more popular than this percentile.
- sort_objective: abacusai.api_class.enums.PersonalizationObjective
- training_mode: abacusai.api_class.enums.PersonalizationTrainingMode
- batch_size: abacusai.api_class.enums.BatchSize
- __post_init__()
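A sketch that sets a handful of the fields documented above; the event types and values are hypothetical:
from abacusai import (
    BatchSize,
    PersonalizationObjective,
    PersonalizationTrainingConfig,
    PersonalizationTrainingMode,
)

training_config = PersonalizationTrainingConfig(
    objective=PersonalizationObjective.NDCG_10,
    training_mode=PersonalizationTrainingMode.PRODUCTION,
    target_action_types=['click', 'purchase'],               # hypothetical event types
    target_action_weights={'click': 1.0, 'purchase': 5.0},
    test_split=15,            # 15% of the dataset held out for testing
    filter_history=True,      # do not recommend already-seen items
    batch_size=BatchSize.BATCH_128,
)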
- class abacusai.RegressionTrainingConfig
Bases:
TrainingConfig
Training config for the PREDICTIVE_MODELING problem type.
- Parameters:
objective (RegressionObjective) – Ranking scheme used to select final best model.
sort_objective (RegressionObjective) – Ranking scheme used to sort models on the metrics page.
tree_hpo_mode (RegressionTreeHPOMode) – Turning off Rapid Experimentation will take longer to train.
type_of_split (RegressionTypeOfSplit) – Type of data splitting into train/test (validation also).
test_split (int) – Percent of dataset to use for test data. We support using a range between 5% and 20% of your dataset as test data.
disable_test_val_fold (bool) – Do not create a TEST_VAL set. All records which would otherwise be part of the TEST_VAL fold remain in the TEST fold.
k_fold_cross_validation (bool) – Use this to force k-fold cross validation bagging on or off.
num_cv_folds (int) – Specify the value of k in k-fold cross validation.
timestamp_based_splitting_column (str) – Timestamp column selected for splitting into test and train.
timestamp_based_splitting_method (RegressionTimeSplitMethod) – Method of selecting the TEST set, top percentile wise or after a given timestamp.
test_splitting_timestamp (str) – Rows with timestamp greater than this will be considered to be in the test set.
sampling_unit_keys (List[str]) – Constrain train/test separation to partition a column.
test_row_indicator (str) – Column indicating which rows to use for training (TRAIN) and testing (TEST). Validation (VAL) can also be specified.
full_data_retraining (bool) – Train models separately with all the data.
rebalance_classes (bool) – Class weights are computed as the inverse of the class frequency from the training dataset when this option is selected as "Yes". It is useful when the classes in the dataset are unbalanced. Re-balancing classes generally boosts recall at the cost of precision on rare classes.
rare_class_augmentation_threshold (float) – Augments any rare class whose relative frequency with respect to the most frequent class is less than this threshold. Default = 0.1 for classification problems with rare classes.
augmentation_strategy (RegressionAugmentationStrategy) – Strategy to deal with class imbalance and data augmentation.
training_rows_downsample_ratio (float) – Uses this ratio to train on a sample of the dataset provided.
active_labels_column (str) – Specify a column to use as the active columns in a multi label setting.
min_categorical_count (int) – Minimum threshold to consider a value different from the unknown placeholder.
sample_weight (str) – Specify a column to use as the weight of a sample for training and eval.
numeric_clipping_percentile (float) – Uses this option to clip the top and bottom x percentile of numeric feature columns where x is the value of this option.
target_transform (RegressionTargetTransform) – Specify a transform (e.g. log, quantile) to apply to the target variable.
ignore_datetime_features (bool) – Remove all datetime features from the model. Useful while generalizing to different time periods.
max_text_words (int) – Maximum number of words to use from text fields.
perform_feature_selection (bool) – If enabled, additional algorithms which support feature selection as a pretraining step will be trained separately with the selected subset of features. The details about their selected features can be found in their respective logs.
feature_selection_intensity (int) – This determines the strictness with which features will be filtered out. 1 being very lenient (more features kept), 100 being very strict.
batch_size (BatchSize) – Batch size.
dropout_rate (int) – Dropout percentage rate.
pretrained_model_name (str) – Enable algorithms which process text using pretrained multilingual NLP models.
is_multilingual (bool) – Enable algorithms which process text using pretrained multilingual NLP models.
loss_function (RegressionLossFunction) – Loss function to be used as objective for model training.
loss_parameters (str) – Loss function params in format <key>=<value>;<key>=<value>;…..
target_encode_categoricals (bool) – Use this to turn target encoding on categorical features on or off.
drop_original_categoricals (bool) – This option helps us choose whether to also feed the original label encoded categorical columns to the models along with their target encoded versions.
monotonically_increasing_features (List[str]) – Constrain the model such that it behaves as if the target feature is monotonically increasing with the selected features
monotonically_decreasing_features (List[str]) – Constrain the model such that it behaves as if the target feature is monotonically decreasing with the selected features
data_split_feature_group_table_name (str) – Specify the table name of the feature group to export training data with the fold column.
custom_loss_functions (List[str]) – Registered custom losses available for selection.
custom_metrics (List[str]) – Registered custom metrics available for selection.
- sort_objective: abacusai.api_class.enums.RegressionObjective
- tree_hpo_mode: abacusai.api_class.enums.RegressionTreeHPOMode
- type_of_split: abacusai.api_class.enums.RegressionTypeOfSplit
- timestamp_based_splitting_method: abacusai.api_class.enums.RegressionTimeSplitMethod
- augmentation_strategy: abacusai.api_class.enums.RegressionAugmentationStrategy
- target_transform: abacusai.api_class.enums.RegressionTargetTransform
- batch_size: abacusai.api_class.enums.BatchSize
- loss_function: abacusai.api_class.enums.RegressionLossFunction
- __post_init__()
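A sketch using a subset of the fields documented above with a timestamp-based split; the column name is hypothetical:
from abacusai import (
    RegressionLossFunction,
    RegressionObjective,
    RegressionTrainingConfig,
    RegressionTypeOfSplit,
)

training_config = RegressionTrainingConfig(
    objective=RegressionObjective.AUC,
    type_of_split=RegressionTypeOfSplit.TIMESTAMP_BASED,
    timestamp_based_splitting_column='created_at',  # hypothetical column name
    test_split=10,             # 10% of the dataset held out for testing
    rebalance_classes=True,    # inverse-frequency class weights
    loss_function=RegressionLossFunction.CROSS_ENTROPY,
)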
- class abacusai.ForecastingTrainingConfig
Bases:
TrainingConfig
Training config for the FORECASTING problem type.
- Parameters:
prediction_length (int) – How many timesteps in the future to predict.
objective (ForecastingObjective) – Ranking scheme used to select final best model.
sort_objective (ForecastingObjective) – Ranking scheme used to sort models on the metrics page.
forecast_frequency (ForecastingFrequency) – Forecast frequency.
probability_quantiles (List[float]) – Prediction quantiles.
force_prediction_length (int) – Force length of test window to be the same as prediction length.
filter_items (bool) – Filter items with small history and volume.
enable_feature_selection (bool) – Enable feature selection.
enable_padding (bool) – Pad series to the max_date of the dataset.
enable_cold_start (bool) – Enable cold start forecasting by training/predicting for zero history items.
enable_multiple_backtests (bool) – Whether to enable multiple backtesting or not.
num_backtesting_windows (int) – Total backtesting windows to use for the training.
backtesting_window_step_size (int) – Use this step size to shift backtesting windows for model training.
full_data_retraining (bool) – Train models separately with all the data.
additional_forecast_keys (List[str]) – List of categoricals in timeseries that can act as multi-identifier.
experimentation_mode (ExperimentationMode) – Selecting Thorough Experimentation will take longer to train.
type_of_split (ForecastingDataSplitType) – Type of data splitting into train/test.
test_by_item (bool) – Partition train/test data by item rather than time if true.
test_start (str) – Limit training data to dates before the given test start.
test_split (int) – Percent of dataset to use for test data. We support using a range between 5% and 20% of your dataset as test data.
loss_function (ForecastingLossFunction) – Loss function for training neural network.
underprediction_weight (float) – Weight for underpredictions.
disable_networks_without_analytic_quantiles (bool) – Disable neural networks whose quantile functions do not have analytic expressions (e.g., mixture models).
initial_learning_rate (float) – Initial learning rate.
l2_regularization_factor (float) – L2 regularization factor.
dropout_rate (int) – Dropout percentage rate.
recurrent_layers (int) – Number of recurrent layers to stack in network.
recurrent_units (int) – Number of units in each recurrent layer.
convolutional_layers (int) – Number of convolutional layers to stack on top of recurrent layers in network.
convolution_filters (int) – Number of filters in each convolution.
local_scaling_mode (ForecastingLocalScaling) – Options to make NN inputs stationary in high dynamic range datasets.
zero_predictor (bool) – Include subnetwork to classify points where target equals zero.
skip_missing (bool) – Make the RNN ignore missing entries rather than processing them.
batch_size (BatchSize) – Batch size.
batch_renormalization (bool) – Enable batch renormalization between layers.
history_length (int) – While training, how much history to consider.
prediction_step_size (int) – Number of future periods to include in objective for each training sample.
training_point_overlap (float) – Amount of overlap to allow between training samples.
max_scale_context (int) – Maximum context to use for local scaling.
quantiles_extension_method (ForecastingQuanitlesExtensionMethod) – Quantile extension method.
number_of_samples (int) – Number of samples for ancestral simulation.
symmetrize_quantiles (bool) – Force symmetric quantiles (like in Gaussian distribution).
use_log_transforms (bool) – Apply logarithmic transformations to input data.
smooth_history (float) – Smooth (low pass filter) the timeseries.
local_scale_target (bool) – Use per training/prediction window target scaling.
timeseries_weight_column (str) – If set, we use the values in this column from timeseries data to assign time dependent item weights during training and evaluation.
item_attributes_weight_column (str) – If set, we use the values in this column from item attributes data to assign weights to items during training and evaluation.
use_timeseries_weights_in_objective (bool) – If True, we include weights from the column set as "TIMESERIES WEIGHT COLUMN" in objective functions.
use_item_weights_in_objective (bool) – If True, we include weights from the column set as "ITEM ATTRIBUTES WEIGHT COLUMN" in objective functions.
skip_timeseries_weight_scaling (bool) – If True, we will avoid normalizing the weights.
timeseries_loss_weight_column (str) – Use value in this column to weight the loss while training.
use_item_id (bool) – Include a feature to indicate the item being forecast.
use_all_item_totals (bool) – Include as input the total target across items.
handle_zeros_as_missing_values (bool) – If True, handle zero values in demand as missing data.
datetime_holiday_calendars (List[HolidayCalendars]) – Holiday calendars to augment training with.
fill_missing_values (List[dict]) – Strategy for filling in missing values.
enable_clustering (bool) – Enable clustering in forecasting.
data_split_feature_group_table_name (str) – Specify the table name of the feature group to export training data with the fold column.
custom_loss_functions (List[str]) – Registered custom losses available for selection.
custom_metrics (List[str]) – Registered custom metrics available for selection.
- sort_objective: abacusai.api_class.enums.ForecastingObjective
- forecast_frequency: abacusai.api_class.enums.ForecastingFrequency
- experimentation_mode: abacusai.api_class.enums.ExperimentationMode
- type_of_split: abacusai.api_class.enums.ForecastingDataSplitType
- loss_function: abacusai.api_class.enums.ForecastingLossFunction
- local_scaling_mode: abacusai.api_class.enums.ForecastingLocalScaling
- batch_size: abacusai.api_class.enums.BatchSize
- quantiles_extension_method: abacusai.api_class.enums.ForecastingQuanitlesExtensionMethod
- datetime_holiday_calendars: List[abacusai.api_class.enums.HolidayCalendars]
- __post_init__()
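A sketch of a daily forecasting setup using a few of the fields documented above; values are illustrative:
from abacusai import (
    ForecastingFrequency,
    ForecastingLossFunction,
    ForecastingObjective,
    ForecastingTrainingConfig,
    HolidayCalendars,
)

training_config = ForecastingTrainingConfig(
    prediction_length=28,                       # forecast 28 steps ahead
    objective=ForecastingObjective.WAPE,
    forecast_frequency=ForecastingFrequency.DAILY,
    probability_quantiles=[0.1, 0.5, 0.9],
    loss_function=ForecastingLossFunction.MEAN_ABSOLUTE_ERROR,
    datetime_holiday_calendars=[HolidayCalendars.US],
)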
- class abacusai.NamedEntityExtractionTrainingConfig
Bases:
TrainingConfig
Training config for the NAMED_ENTITY_EXTRACTION problem type.
- Parameters:
objective (NERObjective) – Ranking scheme used to select final best model.
sort_objective (NERObjective) – Ranking scheme used to sort models on the metrics page.
ner_model_type (NERModelType) – Type of NER model to use.
test_split (int) – Percent of dataset to use for test data. We support using a range between 5 (i.e. 5%) to 20 (i.e. 20%) of your dataset.
test_row_indicator (str) – Column indicating which rows to use for training (TRAIN) and testing (TEST).
dropout_rate (float) – Dropout rate for neural network.
batch_size (BatchSize) – Batch size for neural network.
active_labels_column (str) – Entities that have been marked in a particular text.
document_format (NLPDocumentFormat) – Format of the input documents.
include_longformer (bool) – Whether to include the longformer model.
- objective: abacusai.api_class.enums.NERObjective
- sort_objective: abacusai.api_class.enums.NERObjective
- ner_model_type: abacusai.api_class.enums.NERModelType
- batch_size: abacusai.api_class.enums.BatchSize
- document_format: abacusai.api_class.enums.NLPDocumentFormat
- __post_init__()
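A sketch using the enums documented earlier; values are illustrative:
from abacusai import (
    NamedEntityExtractionTrainingConfig,
    NERModelType,
    NLPDocumentFormat,
)

ner_config = NamedEntityExtractionTrainingConfig(
    ner_model_type=NERModelType.PRETRAINED_ROBERTA_43,
    document_format=NLPDocumentFormat.TOKENS,
    test_split=10,       # 10% of the dataset held out for testing
    dropout_rate=0.1,
)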
- class abacusai.NaturalLanguageSearchTrainingConfig
Bases:
TrainingConfig
Training config for the NATURAL_LANGUAGE_SEARCH problem type.
- Parameters:
abacus_internal_model (bool) – Use an Abacus.AI LLM to answer questions about your data without using any external APIs.
num_completion_tokens (int) – Default for maximum number of tokens for chat answers. Reducing this will get faster responses which are more succinct.
larger_embeddings (bool) – Use a higher dimension embedding model.
search_chunk_size (int) – Chunk size for indexing the documents.
chunk_overlap_fraction (float) – Overlap in chunks while indexing the documents.
test_split (int) – Percent of dataset to use for test data. We support using a range between 5 (i.e. 5%) to 20 (i.e. 20%) of your dataset.
- __post_init__()
- class abacusai.ChatLLMTrainingConfig
Bases:
TrainingConfig
Training config for the CHAT_LLM problem type.
- Parameters:
document_retrievers (List[str]) – List of document retriever names to use for the feature stores this model was trained with.
num_completion_tokens (int) – Default for maximum number of tokens for chat answers. Reducing this will get faster responses which are more succinct.
system_message (str) – The generative LLM system message.
temperature (float) – The generative LLM temperature.
metadata_columns (list) – Include the metadata column values in the retrieved search results.
- __post_init__()
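A sketch with hypothetical retriever and column names:
from abacusai import ChatLLMTrainingConfig

chat_config = ChatLLMTrainingConfig(
    document_retrievers=['support_articles_retriever'],  # hypothetical retriever name
    num_completion_tokens=512,
    system_message='Answer using only the retrieved documents.',
    temperature=0.2,
    metadata_columns=['source_url'],  # hypothetical metadata column
)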
- class abacusai.SentenceBoundaryDetectionTrainingConfig
Bases:
TrainingConfig
Training config for the SENTENCE_BOUNDARY_DETECTION problem type :param test_split: Percent of dataset to use for test data. We support using a range between 5 ( i.e. 5% ) to 20 ( i.e. 20% ) of your dataset. :type test_split: int :param dropout_rate: Dropout rate for neural network. :type dropout_rate: float :param batch_size: Batch size for neural network. :type batch_size: BatchSize
- batch_size: abacusai.api_class.enums.BatchSize
- __post_init__()
- class abacusai.SentimentDetectionTrainingConfig
Bases:
TrainingConfig
Training config for the SENTIMENT_DETECTION problem type :param sentiment_type: Type of sentiment to detect. :type sentiment_type: SentimentType :param test_split: Percent of dataset to use for test data. We support using a range between 5 ( i.e. 5% ) to 20 ( i.e. 20% ) of your dataset. :type test_split: int :param dropout_rate: Dropout rate for neural network. :type dropout_rate: float :param batch_size: Batch size for neural network. :type batch_size: BatchSize :param compute_metrics: Whether to compute metrics. :type compute_metrics: bool
- sentiment_type: abacusai.api_class.enums.SentimentType
- batch_size: abacusai.api_class.enums.BatchSize
- __post_init__()
- class abacusai.DocumentClassificationTrainingConfig
Bases:
TrainingConfig
Training config for the DOCUMENT_CLASSIFICATION problem type :param zero_shot_hypotheses: Zero shot hypotheses. Example text: ‘This text is about pricing’. :type zero_shot_hypotheses: List[str] :param test_split: Percent of dataset to use for test data. We support using a range between 5 ( i.e. 5% ) to 20 ( i.e. 20% ) of your dataset. :type test_split: int :param dropout_rate: Dropout rate for neural network. :type dropout_rate: float :param batch_size: Batch size for neural network. :type batch_size: BatchSize
- batch_size: abacusai.api_class.enums.BatchSize
- __post_init__()
- class abacusai.DocumentSummarizationTrainingConfig
Bases:
TrainingConfig
Training config for the DOCUMENT_SUMMARIZATION problem type :param test_split: Percent of dataset to use for test data. We support using a range between 5 ( i.e. 5% ) to 20 ( i.e. 20% ) of your dataset. :type test_split: int :param dropout_rate: Dropout rate for neural network. :type dropout_rate: float :param batch_size: Batch size for neural network. :type batch_size: BatchSize
- batch_size: abacusai.api_class.enums.BatchSize
- __post_init__()
- class abacusai.DocumentVisualizationTrainingConfig
Bases:
TrainingConfig
Training config for the DOCUMENT_VISUALIZATION problem type :param test_split: Percent of dataset to use for test data. We support using a range between 5 ( i.e. 5% ) to 20 ( i.e. 20% ) of your dataset. :type test_split: int :param dropout_rate: Dropout rate for neural network. :type dropout_rate: float :param batch_size: Batch size for neural network. :type batch_size: BatchSize
- batch_size: abacusai.api_class.enums.BatchSize
- __post_init__()
- class abacusai.ClusteringTrainingConfig
Bases:
TrainingConfig
Training config for the CLUSTERING problem type :param num_clusters_selection: Number of clusters. If None, will be selected automatically. :type num_clusters_selection: int
- __post_init__()
- class abacusai.ClusteringTimeseriesTrainingConfig
Bases:
TrainingConfig
Training config for the CLUSTERING_TIMESERIES problem type :param num_clusters_selection: Number of clusters. If None, will be selected automatically. :type num_clusters_selection: int :param imputation: Imputation method for missing values. :type imputation: ClusteringImputationMethod
- __post_init__()
- class abacusai.EventAnomalyTrainingConfig
Bases:
TrainingConfig
Training config for the EVENT_ANOMALY problem type :param anomaly_fraction: The fraction of the dataset to classify as anomalous, between 0 and 0.5 :type anomaly_fraction: float
- __post_init__()
- class abacusai.CumulativeForecastingTrainingConfig
Bases:
TrainingConfig
Training config for the CUMULATIVE_FORECASTING problem type :param test_split: Percent of dataset to use for test data. We support using a range between 5 ( i.e. 5% ) to 20 ( i.e. 20% ) of your dataset. :type test_split: int :param historical_frequency: Forecast frequency :type historical_frequency: str :param cumulative_prediction_lengths: List of Cumulative Prediction Frequencies. Each prediction length must be between 1 and 365. :type cumulative_prediction_lengths: List[int] :param skip_input_transform: Avoid doing numeric scaling transformations on the input. :type skip_input_transform: bool :param skip_target_transform: Avoid doing numeric scaling transformations on the target. :type skip_target_transform: bool :param predict_residuals: Predict residuals instead of totals at each prediction step. :type predict_residuals: bool
- __post_init__()
- class abacusai.AnomalyDetectionTrainingConfig
Bases:
TrainingConfig
Training config for the ANOMALY_DETECTION problem type :param test_split: Percent of dataset to use for test data. We support using a range between 5 (i.e. 5%) to 20 (i.e. 20%) of your dataset to use as test data. :type test_split: int :param value_high: Detect unusually high values. :type value_high: bool :param mixture_of_gaussians: Detect unusual combinations of values using mixture of Gaussians. :type mixture_of_gaussians: bool :param variational_autoencoder: Use variational autoencoder for anomaly detection. :type variational_autoencoder: bool :param spike_up: Detect outliers with a high value. :type spike_up: bool :param spike_down: Detect outliers with a low value. :type spike_down: bool :param trend_change: Detect changes to the trend. :type trend_change: bool
- __post_init__()
- class abacusai.ThemeAnalysisTrainingConfig
Bases:
TrainingConfig
Training config for the THEME_ANALYSIS problem type
- __post_init__()
- class abacusai.AIAgentTrainingConfig
Bases:
TrainingConfig
Training config for the AI_AGENT problem type :param description: Description of the agent function. :type description: str :param enable_binary_input: If True, the agent will be able to accept binary data as inputs. :type enable_binary_input: bool
- __post_init__()
- class abacusai.CustomTrainedModelTrainingConfig
Bases:
TrainingConfig
Training config for the CUSTOM_TRAINED_MODEL problem type :param max_catalog_size: Maximum expected catalog size. :type max_catalog_size: int :param max_dimension: Maximum expected dimension of the catalog. :type max_dimension: int :param index_output_path: Fully qualified cloud location (GCS, S3, etc) to export snapshots of the embedding to. :type index_output_path: str :param docker_image_uri: Docker image URI. :type docker_image_uri: str :param service_port: Service port. :type service_port: int
- __post_init__()
- class abacusai.CustomAlgorithmTrainingConfig
Bases:
TrainingConfig
Training config for the CUSTOM_ALGORITHM problem type :param train_function_name: The name of the train function. :type train_function_name: str :param predict_many_function_name: The name of the predict many function. :type predict_many_function_name: str :param training_input_tables: List of tables to use for training. :type training_input_tables: List[str] :param predict_function_name: Optional name of the predict function if the predict many function is not given. :type predict_function_name: str :param train_module_name: The name of the train module - only relevant if the model is being uploaded from a zip file or GitHub repository. :type train_module_name: str :param predict_module_name: The name of the predict module - only relevant if the model is being uploaded from a zip file or GitHub repository. :type predict_module_name: str :param test_split: Percent of dataset to use for test data. We support using between 6% and 20% of your dataset as test data. :type test_split: int
- __post_init__()
- class abacusai.OptimizationTrainingConfig
Bases:
TrainingConfig
Training config for the OPTIMIZATION problem type :param solve_time_limit: The maximum time in seconds to spend solving the problem. Accepts values between 0 and 86400. :type solve_time_limit: float
- __post_init__()
- class abacusai._TrainingConfigFactory
Bases:
abacusai.api_class.abstract._ApiClassFactory
Helper class that provides a standard way to create an ABC using inheritance.
- config_abstract_class
- config_class_key = 'problem_type'
- config_class_map
- class abacusai.ForecastingMonitorConfig
Bases:
abacusai.api_class.abstract.ApiClass
Helper class that provides a standard way to create an ABC using inheritance.
- to_dict()
Standardizes converting an ApiClass to dictionary. Keys of response dictionary are converted to camel case. This also validates the fields ( type, value, etc ) received in the dictionary.
- class abacusai.AlertConditionConfig
Bases:
abacusai.api_class.abstract.ApiClass
Helper class that provides a standard way to create an ABC using inheritance.
- alert_type: abacusai.api_class.enums.MonitorAlertType
- classmethod _get_builder()
- class abacusai.AccuracyBelowThresholdConditionConfig
Bases:
AlertConditionConfig
Accuracy Below Threshold Condition Config for Monitor Alerts :param threshold: Threshold for when to consider a column to be in violation. The alert will only fire when the drift value is strictly greater than the threshold. :type threshold: float
- class abacusai.FeatureDriftConditionConfig
Bases:
AlertConditionConfig
Feature Drift Condition Config for Monitor Alerts :param feature_drift_type: Feature drift type to apply the threshold on to determine whether a column has drifted significantly enough to be a violation. :type feature_drift_type: str :param threshold: Threshold for when to consider a column to be in violation. The alert will only fire when the drift value is strictly greater than the threshold. :type threshold: float :param minimum_violations: Number of columns that must exceed the specified threshold to trigger an alert. :type minimum_violations: int
- feature_drift_type: abacusai.api_class.enums.FeatureDriftType
- class abacusai.DataIntegrityViolationConditionConfig
Bases:
AlertConditionConfig
Data Integrity Violation Condition Config for Monitor Alerts :param data_integrity_type: This option selects the data integrity violations to monitor for this alert. :type data_integrity_type: enums.DataIntegrityViolationType :param minimum_violations: Number of columns that must exceed the specified threshold to trigger an alert. :type minimum_violations: int
- data_integrity_type: abacusai.api_class.enums.DataIntegrityViolationType
- class abacusai.BiasViolationConditionConfig
Bases:
AlertConditionConfig
Bias Violation Condition Config for Monitor Alerts :param bias_type: This option selects the bias metric to monitor for this alert. :type bias_type: enums.BiasType :param threshold: Threshold for when to consider a column to be in violation. The alert will only fire when the drift value is strictly greater than the threshold. :type threshold: float :param minimum_violations: Number of columns that must exceed the specified threshold to trigger an alert. :type minimum_violations: int
- bias_type: abacusai.api_class.enums.BiasType
- class abacusai._AlertConditionConfigFactory
Bases:
abacusai.api_class.abstract._ApiClassFactory
Helper class that provides a standard way to create an ABC using inheritance.
- config_abstract_class
- config_class_key = 'alert_type'
- config_class_key_value_camel_case = True
- config_class_map
- class abacusai.AlertActionConfig
Bases:
abacusai.api_class.abstract.ApiClass
Helper class that provides a standard way to create an ABC using inheritance.
- action_type: abacusai.api_class.enums.AlertActionType
- classmethod _get_builder()
- class abacusai.EmailActionConfig
Bases:
AlertActionConfig
Email Action Config for Monitor Alerts :param email_recipients: List of email addresses to send the alert to. :type email_recipients: List[str] :param email_body: Body of the email to send. :type email_body: str
- __post_init__()
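A minimal sketch of pairing a condition config with an email action config for a monitor alert; the drift type string, threshold, and recipient address are illustrative placeholders:

    from abacusai import FeatureDriftConditionConfig, EmailActionConfig

    # Fire when more than 3 columns exceed a drift value of 0.3
    # (the feature drift type and threshold here are illustrative).
    condition = FeatureDriftConditionConfig(
        feature_drift_type='kl',
        threshold=0.3,
        minimum_violations=3,
    )

    # Send the alert to these (placeholder) addresses.
    action = EmailActionConfig(
        email_recipients=['ml-team@example.com'],
        email_body='Feature drift detected on the deployed model.',
    )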
- class abacusai._AlertActionConfigFactory
Bases:
abacusai.api_class.abstract._ApiClassFactory
Helper class that provides a standard way to create an ABC using inheritance.
- config_abstract_class
- config_class_key = 'action_type'
- config_class_map
- class abacusai.FeatureMappingConfig
Bases:
abacusai.api_class.abstract.ApiClass
Helper class that provides a standard way to create an ABC using inheritance.
- class abacusai.ProjectFeatureGroupTypeMappingsConfig
Bases:
abacusai.api_class.abstract.ApiClass
Helper class that provides a standard way to create an ABC using inheritance.
- feature_mappings: List[FeatureMappingConfig]
- class abacusai.PythonFunctionArgument
Bases:
abacusai.api_class.abstract.ApiClass
A config class for python function arguments
- Parameters:
variable_type (PythonFunctionArgumentType) – The type of the python function argument
name (str) – The name of the python function variable
is_required (bool) – Whether the argument is required
value (Any) – The value of the argument
pipeline_variable (str) – The name of the pipeline variable to use as the value
- variable_type: abacusai.api_class.enums.PythonFunctionArgumentType
- value: Any
- class abacusai.OutputVariableMapping
Bases:
abacusai.api_class.abstract.ApiClass
A config class for python function arguments
- Parameters:
variable_type (PythonFunctionOutputArgumentType) – The type of the python function output argument
name (str) – The name of the python function variable
- variable_type: abacusai.api_class.enums.PythonFunctionOutputArgumentType
- class abacusai.FeatureGroupExportConfig
Bases:
abacusai.api_class.abstract.ApiClass
Helper class that provides a standard way to create an ABC using inheritance.
- connector_type: abacusai.api_class.enums.ConnectorType
- classmethod _get_builder()
- class abacusai.FileConnectorExportConfig
Bases:
FeatureGroupExportConfig
Helper class that provides a standard way to create an ABC using inheritance.
- connector_type: abacusai.api_class.enums.ConnectorType
- to_dict()
Standardizes converting an ApiClass to dictionary. Keys of response dictionary are converted to camel case. This also validates the fields ( type, value, etc ) received in the dictionary.
- class abacusai.DatabaseConnectorExportConfig
Bases:
FeatureGroupExportConfig
Helper class that provides a standard way to create an ABC using inheritance.
- connector_type: abacusai.api_class.enums.ConnectorType
- to_dict()
Standardizes converting an ApiClass to dictionary. Keys of response dictionary are converted to camel case. This also validates the fields ( type, value, etc ) received in the dictionary.
- class abacusai._FeatureGroupExportConfigFactory
Bases:
abacusai.api_class.abstract._ApiClassFactory
Helper class that provides a standard way to create an ABC using inheritance.
- config_abstract_class
- config_class_key = 'connectorType'
- config_class_map
- class abacusai.ApiClient(api_key=None, server=None, client_options=None, skip_version_check=False)
Bases:
ReadOnlyClient
Abacus.AI API Client
- Parameters:
api_key (str) – The api key to use as authentication to the server
server (str) – The base server url to use to send API requets to
client_options (ClientOptions) – Optional API client configurations
skip_version_check (bool) – If true, will skip checking the server’s current API version on initializing the client
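For example, assuming a valid API key (the value below is a placeholder), a client can be created and then used with the helper methods documented under this class:

    from abacusai import ApiClient

    # Authenticate with your organization's API key (placeholder value here).
    client = ApiClient(api_key='YOUR_API_KEY')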
- create_dataset_from_pandas(feature_group_table_name, df, clean_column_names=False)
[Deprecated] Creates a Dataset from a pandas dataframe
- Parameters:
feature_group_table_name (str) – The table name to assign to the feature group created by this call
df (pandas.DataFrame) – The dataframe to upload
clean_column_names (bool) – If true, the dataframe’s column names will be automatically cleaned to be compliant with Abacus.AI’s column requirements. Otherwise, it will raise a ValueError.
- Returns:
The dataset object created
- Return type:
- create_dataset_version_from_pandas(table_name_or_id, df, clean_column_names=False)
[Deprecated] Updates an existing dataset from a pandas dataframe
- Parameters:
table_name_or_id (str) – The table name of the feature group or the ID of the dataset to update
df (pandas.DataFrame) – The dataframe to upload
clean_column_names (bool) – If true, the dataframe’s column names will be automatically cleaned to be compliant with Abacus.AI’s column requirements. Otherwise, it will raise a ValueError.
- Returns:
The dataset updated
- Return type:
- create_feature_group_from_pandas_df(table_name, df, clean_column_names=False)
Create a Feature Group from a local Pandas DataFrame.
- Parameters:
table_name (str) – The table name to assign to the feature group created by this call
df (pandas.DataFrame) – The dataframe to upload
clean_column_names (bool) – If true, the dataframe’s column names will be automatically cleaned to be compliant with Abacus.AI’s column requirements. Otherwise, it will raise a ValueError.
- Return type:
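A minimal usage sketch, assuming client is an ApiClient instance; the table name and data below are placeholders:

    import pandas as pd

    df = pd.DataFrame({'user_id': [1, 2, 3], 'plan': ['free', 'pro', 'pro']})

    # clean_column_names=True lets the client rename columns that are not
    # compliant with Abacus.AI's column requirements instead of raising.
    feature_group = client.create_feature_group_from_pandas_df(
        table_name='user_plans',
        df=df,
        clean_column_names=True,
    )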
- update_feature_group_from_pandas_df(table_name, df, clean_column_names=False)
Updates a DATASET Feature Group from a local Pandas DataFrame.
- Parameters:
table_name (str) – The table name to assign to the feature group created by this call
df (pandas.DataFrame) – The dataframe to upload
clean_column_names (bool) – If true, the dataframe’s column names will be automatically cleaned to be compliant with Abacus.AI’s column requirements. Otherwise, it will raise a ValueError.
- Return type:
- create_feature_group_from_spark_df(table_name, df)
Create a Feature Group from a local Spark DataFrame.
- Parameters:
df (pyspark.sql.DataFrame) – The dataframe to upload
table_name (str) – The table name to assign to the feature group created by this call
- Return type:
- update_feature_group_from_spark_df(table_name, df)
Updates a Feature Group from a local Spark DataFrame.
- Parameters:
df (pyspark.sql.DataFrame) – The dataframe to upload
table_name (str) – The table name to assign to the feature group created by this call
should_wait_for_upload (bool) – Wait for dataframe to upload before returning. Some FeatureGroup methods, like materialization, may not work until upload is complete.
timeout (int, optional) – If waiting for upload, time out after this limit.
- Return type:
- create_spark_df_from_feature_group_version(session, feature_group_version)
Create a Spark Dataframe in the provided Spark Session context, for a materialized Abacus Feature Group Version.
- Parameters:
session (pyspark.sql.SparkSession) – Spark session
feature_group_version (str) – Feature group version to load from
- Returns:
pyspark.sql.DataFrame
- create_prediction_operator_from_functions(name, project_id, predict_function=None, initialize_function=None, feature_group_ids=None, cpu_size=None, memory=None, included_modules=None, package_requirements=None, use_gpu=False)
Create a new prediction operator.
- Parameters:
prediction_operator_id (str) – The unique ID of the prediction operator.
name (str) – Name of the prediction operator.
function_source_code (str) – Contents of a valid Python source code file. The source code should contain the function predictFunctionName, and the function ‘initializeFunctionName’ if defined.
initialize_function_name (str) – Name of the optional initialize function found in the source code. This function will generate anything used by predictions, based on input feature groups.
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions.
feature_group_ids (list) – List of feature groups that are supplied to the initialize function as parameters. Each of the parameters is a materialized DataFrame.
cpu_size (str) – Size of the CPU for the prediction operator.
memory (int) – Memory (in GB) for the prediction operator.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
use_gpu (bool) – Whether this prediction operator needs a GPU.
project_id (str) –
predict_function (callable) –
initialize_function (callable) –
included_modules (list) –
- Returns:
PredictionOperator: The created prediction operator object.
- update_prediction_operator_from_functions(prediction_operator_id, name=None, predict_function=None, initialize_function=None, feature_group_ids=None, cpu_size=None, memory=None, included_modules=None, package_requirements=None, use_gpu=False)
Update an existing prediction operator.
- Parameters:
name (str) – The name of the prediction operator
project_id (str) – The project to create the prediction in
predict_function (callable) – The predict function callable to serialize and upload
initialize_function (callable) – The initialize function callable to serialize and upload
initialize_input_tables (list) – The input table names of the feature groups to pass to the train function
cpu_size (str) – Size of the cpu for the training function
memory (int) – Memory (in GB) for the training function
package_requirements (List) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
included_modules (list) – A list of user-created modules that will be included, which is equivalent to ‘from module import *’
use_gpu (bool) – Whether this prediction needs gpu
prediction_operator_id (str) –
feature_group_ids (list) –
- create_model_from_functions(project_id, train_function, predict_function=None, training_input_tables=None, predict_many_function=None, initialize_function=None, cpu_size=None, memory=None, training_config=None, exclusive_run=False, included_modules=None, package_requirements=None, name=None, use_gpu=False)
Creates a model from a python function
- Parameters:
project_id (str) – The project to create the model in
train_function (callable) – The training function callable to serialize and upload
predict_function (callable) – The predict function callable to serialize and upload
predict_many_function (callable) – The predict many function callable to serialize and upload
initialize_function (callable) – The initialize function callable to serialize and upload
training_input_tables (list) – The input table names of the feature groups to pass to the train function
cpu_size (str) – Size of the cpu for the training function
memory (int) – Memory (in GB) for the training function
package_requirements (List) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
included_modules (list) – A list of user-created modules that will be included, which is equivalent to ‘from module import *’
name (str) – The name of the model
use_gpu (bool) – Whether this model needs gpu
training_config (dict) –
exclusive_run (bool) –
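A minimal sketch, assuming client is an ApiClient instance; the project ID, table name, and the toy train/predict function bodies and signatures below are illustrative placeholders, not a prescribed interface:

    def train(training_df):
        # Toy 'model': remember the mean of a numeric column.
        return {'mean_value': training_df['value'].mean()}

    def predict(model, query):
        # Return the stored mean regardless of the query (illustrative only).
        return {'prediction': model['mean_value']}

    model = client.create_model_from_functions(
        project_id='your_project_id',               # placeholder
        train_function=train,
        predict_function=predict,
        training_input_tables=['training_table'],   # placeholder table name
        name='mean_model',
        memory=16,
    )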
- update_model_from_functions(model_id, train_function, predict_function=None, predict_many_function=None, initialize_function=None, training_input_tables=None, cpu_size=None, memory=None, included_modules=None, package_requirements=None, use_gpu=False)
Updates a model from a python function. Please pass in all the functions, even if you don’t update them.
- Parameters:
model_id (str) – The id of the model to update
train_function (callable) – The training function callable to serialize and upload
predict_function (callable) – The predict function callable to serialize and upload
predict_many_function (callable) – The predict many function callable to serialize and upload
initialize_function (callable) – The initialize function callable to serialize and upload
training_input_tables (list) – The input table names of the feature groups to pass to the train function
cpu_size (str) – Size of the cpu for the training function
memory (int) – Memory (in GB) for the training function
package_requirements (List) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
included_modules (list) – A list of user-created modules that will be included, which is equivalent to ‘from module import *’
use_gpu (bool) – Whether this model needs gpu
- create_pipeline_step_from_function(pipeline_id, step_name, function, step_input_mappings=None, output_variable_mappings=None, step_dependencies=None, package_requirements=None, cpu_size=None, memory=None)
Creates a step in a given pipeline from a python function.
- Parameters:
pipeline_id (str) – The ID of the pipeline to add the step to.
step_name (str) – The name of the step.
function (callable) – The python function.
step_input_mappings (List[PythonFunctionArguments]) – List of Python function arguments.
output_variable_mappings (List[OutputVariableMapping]) – List of Python function outputs.
step_dependencies (List[str]) – List of step names this step depends on.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].
cpu_size (str) – Size of the CPU for the step function.
memory (int) – Memory (in GB) for the step function.
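A minimal sketch, assuming client is an ApiClient instance; the pipeline ID, table names, and the FEATURE_GROUP enum members referenced below are assumptions for illustration:

    from abacusai import PythonFunctionArgument, OutputVariableMapping
    from abacusai.api_class.enums import (
        PythonFunctionArgumentType,
        PythonFunctionOutputArgumentType,
    )

    def transform_step(input_fg):
        # Placeholder transformation over the materialized input feature group.
        return input_fg.dropna()

    step = client.create_pipeline_step_from_function(
        pipeline_id='your_pipeline_id',   # placeholder
        step_name='clean_rows',
        function=transform_step,
        step_input_mappings=[
            PythonFunctionArgument(
                variable_type=PythonFunctionArgumentType.FEATURE_GROUP,  # assumed enum member
                name='input_fg',
                is_required=True,
                value='raw_events',      # placeholder feature group table name
            )
        ],
        output_variable_mappings=[
            OutputVariableMapping(
                variable_type=PythonFunctionOutputArgumentType.FEATURE_GROUP,  # assumed enum member
                name='cleaned_events',
            )
        ],
    )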
- update_pipeline_step_from_function(pipeline_step_id, function, step_input_mappings=None, output_variable_mappings=None, step_dependencies=None, package_requirements=None, cpu_size=None, memory=None)
Updates a pipeline step from a python function.
- Parameters:
pipeline_step_id (str) – The ID of the pipeline_step to update.
function (callable) – The python function.
step_input_mappings (List[PythonFunctionArguments]) – List of Python function arguments.
output_variable_mappings (List[OutputVariableMapping]) – List of Python function outputs.
step_dependencies (List[str]) – List of step names this step depends on.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].
cpu_size (str) – Size of the CPU for the step function.
memory (int) – Memory (in GB) for the step function.
- create_feature_group_from_python_function(function, table_name, input_tables=None, python_function_name=None, python_function_bindings=None, cpu_size=None, memory=None, package_requirements=None, included_modules=None)
Creates a feature group from a python function
- Parameters:
function (callable) – The function callable for the feature group
table_name (str) – The table name to give the feature group
input_tables (list) – The input table names of the feature groups as input to the feature group function
python_function_name (str) – The name of the python function to create a feature group from.
python_function_bindings (List<PythonFunctionArguments>) – List of python function arguments
cpu_size (str) – Size of the cpu for the feature group function
memory (int) – Memory (in GB) for the feature group function
package_requirements (List) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
included_modules (list) – A list of user-created modules that will be included, which is equivalent to ‘from module import *’
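A minimal sketch, assuming client is an ApiClient instance; the table names and the join logic are placeholders:

    def join_events_and_users(events_df, users_df):
        # Both inputs arrive as materialized pandas DataFrames.
        return events_df.merge(users_df, on='user_id', how='left')

    feature_group = client.create_feature_group_from_python_function(
        function=join_events_and_users,
        table_name='events_with_users',             # placeholder output table name
        input_tables=['raw_events', 'user_plans'],  # placeholder input tables
        memory=16,
    )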
- update_python_function_code(name, function=None, function_variable_mappings=None, package_requirements=None, included_modules=None)
Update custom python function with user inputs for the given python function.
- Parameters:
name (String) – The unique name to identify the python function in an organization.
function (callable) – The function callable to serialize and upload.
function_variable_mappings (List<PythonFunctionArguments>) – List of python function arguments
package_requirements (List) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
included_modules (list) – A list of user-created modules that will be included, which is equivalent to ‘from module import *’
- Returns:
The python_function object.
- Return type:
- create_algorithm_from_function(name, problem_type, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function=None, predict_function=None, predict_many_function=None, initialize_function=None, common_functions=None, config_options=None, is_default_enabled=False, project_id=None, use_gpu=False, package_requirements=None, included_modules=None)
Create a new algorithm, or update an existing algorithm if the name already exists
- Parameters:
name (String) – The name to identify the algorithm, only uppercase letters, numbers and underscore allowed
problem_type (str) – The type of the problem this algorithm will work on
train_function (callable) – The training function callable to serialize and upload
predict_function (callable) – The predict function callable to serialize and upload
predict_many_function (callable) – The predict many function callable to serialize and upload
initialize_function (callable) – The initialize function callable to serialize and upload
common_functions (List of callables) – A list of functions that will be used by both train and predict functions, e.g. some data processing utilities
training_data_parameter_names_mapping (Dict) – The mapping from feature group types to training data parameter names in the train function
training_config_parameter_name (string) – The train config parameter name in the train function
config_options (Dict) – Map dataset types and configs to train function parameter names
is_default_enabled (bool) – Whether to train with the algorithm by default
project_id (Unique String Identifier) – The unique ID of the project
use_gpu (Boolean) – Whether this algorithm needs to run on GPU
package_requirements (List) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
included_modules (list) – A list of user-created modules that will be included, which is equivalent to ‘from module import *’
- update_algorithm_from_function(algorithm, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function=None, predict_function=None, predict_many_function=None, initialize_function=None, common_functions=None, config_options=None, is_default_enabled=None, use_gpu=None, package_requirements=None, included_modules=None)
Create a new algorithm, or update an existing algorithm if the name already exists
- Parameters:
algorithm (String) – The name to identify the algorithm, only uppercase letters, numbers and underscore allowed
train_function (callable) – The training function callable to serialize and upload
predict_function (callable) – The predict function callable to serialize and upload
predict_many_function (callable) – The predict many function callable to serialize and upload
initialize_function (callable) – The initialize function callable to serialize and upload
common_functions (List of callables) – A list of functions that will be used by both train and predict functions, e.g. some data processing utilities
training_data_parameter_names_mapping (Dict) – The mapping from feature group types to training data parameter names in the train function
training_config_parameter_name (string) – The train config parameter name in the train function
config_options (Dict) – Map dataset types and configs to train function parameter names
is_default_enabled (Boolean) – Whether to train with the algorithm by default
use_gpu (Boolean) – Whether this algorithm needs to run on GPU
package_requirements (List) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
included_modules (list) – A list of user-created modules that will be included, which is equivalent to ‘from module import *’
- get_train_function_input(project_id, training_table_names=None, training_data_parameter_name_override=None, training_config_parameter_name_override=None, training_config=None, custom_algorithm_config=None)
Get the input data for the train function to test locally.
- Parameters:
project_id (String) – The id of the project
training_table_names (List) – A list of feature group tables used for training
training_data_parameter_name_override (Dict) – The mapping from feature group types to training data parameter names in the train function
training_config_parameter_name_override (String) – The train config parameter name in the train function
training_config (Dict) – A dictionary for Abacus.AI defined training options and values
custom_algorithm_config (Any) – User-defined config that can be serialized by JSON
- Returns:
A dictionary that maps train function parameter names to their values.
- get_train_function_input_from_model_version(model_version, algorithm=None, training_config=None, custom_algorithm_config=None)
Get the input data for the train function to test locally, based on a trained model version.
- Parameters:
model_version (String) – The string identifier of the model version
algorithm (String) – The particular algorithm’s name, whose train function to test with
training_config (Dict) – A dictionary for Abacus.AI defined training options and values
custom_algorithm_config (Any) – User-defined config that can be serialized by JSON
- Returns:
A dictionary that maps train function parameter names to their values.
- create_custom_loss_function(name, loss_function_type, loss_function)
Registers a new custom loss function which can be used as an objective function during model training.
- Parameters:
name (String) – A name for the loss. Should be unique per organization. Limit - 50 chars. Only underscores, numbers, and uppercase letters are allowed
loss_function_type (String) – The category of problems that this loss would be applicable to. Ex - REGRESSION_DL_TF, CLASSIFICATION_DL_TF, etc.
loss_function (Callable) – A python functor which can take required arguments (Ex - (y_true, y_pred)) and returns loss value(s) (Ex - An array of loss values of size batch size)
- Returns:
A description of the registered custom loss function
- Return type:
- Raises:
InvalidParameterError – If either loss function name or type or the passed function is invalid/incompatible
AlreadyExistsError – If the loss function with the same name already exists in the organization
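A minimal sketch, assuming client is an ApiClient instance and a TensorFlow-based regression loss; the loss name and weighting logic are illustrative:

    import tensorflow as tf

    def weighted_mse(y_true, y_pred):
        # Penalize under-prediction twice as much as over-prediction (illustrative).
        error = y_true - y_pred
        weights = tf.where(error > 0, 2.0, 1.0)
        return tf.reduce_mean(weights * tf.square(error))

    loss = client.create_custom_loss_function(
        name='WEIGHTED_MSE',                  # uppercase letters, numbers, underscores only
        loss_function_type='REGRESSION_DL_TF',
        loss_function=weighted_mse,
    )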
- update_custom_loss_function(name, loss_function)
Updates a previously registered custom loss function with a new function implementation.
- Parameters:
name (String) – name of the registered custom loss.
loss_function (Callable) – A python functor which can take required arguments (Ex - (y_true, y_pred)) and returns loss value(s) (Ex - An array of loss values of size batch size)
- Returns:
A description of the updated custom loss function
- Return type:
- Raises:
InvalidParameterError – If either loss function name or type or the passed function is invalid/incompatible
DataNotFoundError – If a loss function with given name is not found in the organization
- create_custom_metric_from_function(name, problem_type, custom_metric_function)
Registers a new custom metric which can be used as an evaluation metric for the trained model.
- Parameters:
name (String) – A name for the metric. Should be unique per organization. Limit - 50 chars. Only underscores, numbers, and uppercase letters are allowed.
problem_type (String) – The problem type that this metric would be applicable to. e.g. - REGRESSION, FORECASTING, etc.
custom_metric_function (Callable) – A python functor which can take required arguments e.g. (y_true, y_pred) and returns the metric value.
- Returns:
The newly created custom metric.
- Return type:
- Raises:
InvalidParameterError – If either custom metric name or type or the passed function is invalid/incompatible.
AlreadyExistsError – If a custom metric with given name already exists in the organization.
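A minimal sketch, assuming client is an ApiClient instance; the metric name and metric body are illustrative:

    import numpy as np

    def median_absolute_error(y_true, y_pred):
        # Robust alternative to mean absolute error (illustrative metric body).
        return float(np.median(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

    metric = client.create_custom_metric_from_function(
        name='MEDIAN_ABS_ERROR',       # uppercase letters, numbers, underscores only
        problem_type='REGRESSION',
        custom_metric_function=median_absolute_error,
    )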
- update_custom_metric_from_function(name, custom_metric_function)
Updates a previously registered custom metric.
- Parameters:
name (String) – A name for the metric. Should be unique per organization. Limit - 50 chars. Only underscores, numbers, and uppercase letters are allowed.
custom_metric_function (Callable) – A python functor which can take required arguments e.g. (y_true, y_pred) and returns the metric value.
- Returns:
The updated custom metric.
- Return type:
- Raises:
InvalidParameterError – If either custom metric name or type or the passed function is invalid/incompatible.
DataNotFoundError – If a custom metric with given name is not found in the organization.
- create_module_from_notebook(file_path, name)
Create a module with the code marked in the notebook. Use ‘#module_start#’ to mark the starting code cell and ‘#module_end#’ for the ending code cell.
- Parameters:
file_path (String) – Notebook’s relative path to the root directory, e.g. ‘n1.ipynb’
name (String) – Name of the module to create.
- Returns:
the created Abacus.ai module object
- Return type:
- update_module_from_notebook(file_path, name)
Update the module with the code marked in the notebook. Use ‘#module_start#’ to mark the starting code cell and ‘#module_end#’ for the ending code cell.
- Parameters:
file_path (String) – Notebook’s relative path to the root directory, e.g. ‘n1.ipynb’
name (String) – Name of the module to update.
- Returns:
the updated Abacus.ai module object
- Return type:
- import_module(name)
Import a module created previously. It will be reloaded if it has been imported before. This is equivalent to including the code from that module file.
- Parameters:
name (String) – Name of the module to import.
- Returns:
the imported python module
- Return type:
module
- create_agent_from_function(project_id, agent_function, name=None, memory=None, package_requirements=None)
Creates the agent from a python function
- Parameters:
project_id (str) – The project to create the model in
agent_function (callable) – The agent function callable to serialize and upload
name (str) – The name of the agent
memory (int) – Memory (in GB) for hosting the agent
package_requirements (List) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
- update_agent_with_function(model_id, agent_function, memory=None, package_requirements=None)
Updates the agent with a new agent function.
- Parameters:
model_id (str) – The unique ID associated with the AI Agent to be changed.
agent_function (callable) – The new agent function callable to serialize and upload
memory (int) – Memory (in GB) for hosting the agent
package_requirements (List) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
- execute_feature_group_sql(sql, timeout=3600, delay=2)
Execute a SQL query on the feature groups
- Parameters:
sql (str) – The SQL query to execute.
- Returns:
The result of the query.
- Return type:
pandas.DataFrame
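For example, assuming client is an ApiClient instance and a feature group table named user_plans (a placeholder), the result comes back as a pandas DataFrame:

    result_df = client.execute_feature_group_sql(
        sql='SELECT plan, COUNT(*) AS n_users FROM user_plans GROUP BY plan'
    )
    print(result_df.head())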
- get_agent_context_chat_history()
Gets the history of chat messages from the current request context. Applicable within an AIAgent execute function.
- Returns:
List[ChatMessage]: The chat history for the current request being processed by the Agent.
- set_agent_context_chat_history(chat_history)
Sets the history of chat messages from the current request context.
- Parameters:
chat_history (List[ChatMessage]) – The chat history associated with the current request context.
- clear_agent_context()
Clears the current request context.
- streaming_evaluate_prompt(prompt, system_message=None, llm_name=None, max_tokens=None)
Generate a response to the prompt using the specified model. This works similarly to evaluate_prompt, but also streams the result so that the user is aware of the current status of the generation.
- Parameters:
prompt (str) – Prompt to use for generation.
system_message (str) – System message for models that support it.
llm_name (str) – Name of the underlying LLM to be used for generation. Should be one of ‘gpt-4’ or ‘gpt-3.5-turbo’. Default is auto selection.
max_tokens (int) – Maximum number of tokens to generate. If set, the model will just stop generating after this token limit is reached.
- Returns:
The response from the model, raw text and parsed components.
- Return type:
LLMResponse
- _get_agent_async_app_request_id()
Gets the request ID for the current request context of the async app. Applicable within an AIAgent execute function.
- Returns:
The request ID for the current request being processed by the Agent.
- Return type:
- _get_agent_async_app_caller()
Gets the caller for the current request context of the async app. Applicable within an AIAgent execute function.
- Returns:
The caller for the current request being processed by the Agent.
- Return type:
- stream_message(message)
Streams a message to the current request context. Applicable within an AIAgent execute function. If the request is from the Abacus.AI app, the response will be streamed to the UI; otherwise, it will be logged as info when used from a notebook or Python script.
- Parameters:
message (str) – The message to be streamed.
- Return type:
None
- _call_aiagent_asyncapp_sync_message(request_id, caller, message=None, llm_args=None)
Calls the AI Agent AsyncApp sync message endpoint.
- Parameters:
- Returns:
The response from the AsyncApp.
- Return type:
- _status_poll(url, wait_states, method, body={}, headers=None, delay=2, timeout=600)
- _proxy_request(name, method='POST', query_params=None, body=None, files=None, parse_type=None, is_sync=False, streamable_response=False)
- execute_data_query_using_llm(query, feature_group_ids, prompt_context=None, llm_name=None, temperature=None, preview=False, schema_document_retriever_ids=None, timeout=3600, delay=2)
Execute a data query using a large language model.
- Parameters:
query (str) – The natural language query to execute. The query is converted to a SQL query using the language model.
feature_group_ids (List[str]) – A list of feature group IDs that the query should be executed against.
prompt_context (str) – The context message used to construct the prompt for the language model. If not provided, a default context message is used.
llm_name (str) – The name of the language model to use. If not provided, the default language model is used.
temperature (float) – The temperature to use for the language model if supported. If not provided, the default temperature is used.
preview (bool) – If True, a preview of the query execution is returned.
schema_document_retriever_ids (List[str]) – A list of document retrievers to retrieve schema information for the data query. Otherwise, they are retrieved from the feature group metadata.
- Returns:
The result of the query.
- Return type:
pandas.DataFrame
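A minimal sketch, assuming client is an ApiClient instance; the feature group ID and question are placeholders:

    answer_df = client.execute_data_query_using_llm(
        query='How many users are on the pro plan?',
        feature_group_ids=['your_feature_group_id'],   # placeholder
        preview=False,
    )
    print(answer_df)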
- get_matching_documents(document_retriever_id, query, filters=None, limit=None, result_columns=None, max_words=None, num_retrieval_margin_words=None, max_words_per_chunk=None)
Look up the deployed document retriever and return the documents that match the given query.
Original documents are split into chunks and stored in the document retriever. This lookup function returns the relevant chunks from the document retriever. Where permitted by the provided settings, the returned chunks may be expanded to include more words from the original documents and merged if they overlap. The returned chunks are sorted by relevance.
- Parameters:
document_retriever_id (str) – A unique string identifier associated with the document retriever.
query (str) – The query to search for.
filters (dict) – A dictionary mapping column names to a list of values to restrict the retrieved search results.
limit (int) – If provided, will limit the number of results to the value specified.
result_columns (list) – If provided, will limit the column properties present in each result to those specified in this list.
max_words (int) – If provided, will limit the total number of words in the results to the value specified.
num_retrieval_margin_words (int) – If provided, will add this number of words from left and right of the returned chunks.
max_words_per_chunk (int) – If provided, will limit the number of words in each chunk to the value specified. If the value provided is smaller than the actual size of the chunk on disk, which is determined during document retriever creation, the actual size of the chunk will be used. That is, chunks looked up from document retrievers will not be split into smaller chunks during lookup due to this setting.
- Returns:
The relevant documentation results found from the document retriever.
- Return type:
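A minimal sketch of a lookup against a deployed document retriever, assuming client is an ApiClient instance; the retriever ID and filter column are placeholders:

    results = client.get_matching_documents(
        document_retriever_id='your_document_retriever_id',   # placeholder
        query='How do I rotate my API key?',
        filters={'source': ['internal_wiki']},                # placeholder filter column
        limit=5,
        num_retrieval_margin_words=20,
    )
    for chunk in results:
        print(chunk)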
- add_user_to_organization(email)
Invite a user to your organization. This method will send the specified email address an invitation link to join your organization.
- Parameters:
email (str) – The email address to invite to your organization.
- create_organization_group(group_name, permissions, default_group=False)
Creates a new Organization Group.
- Parameters:
- Returns:
Information about the created Organization Group.
- Return type:
- add_organization_group_permission(organization_group_id, permission)
Adds a permission to the specified Organization Group.
- remove_organization_group_permission(organization_group_id, permission)
Removes a permission from the specified Organization Group.
- delete_organization_group(organization_group_id)
Deletes the specified Organization Group
- Parameters:
organization_group_id (str) – Unique string identifier of the organization group.
- add_user_to_organization_group(organization_group_id, email)
Adds a user to the specified Organization Group.
- remove_user_from_organization_group(organization_group_id, email)
Removes a user from an Organization Group.
- set_default_organization_group(organization_group_id)
Sets the default Organization Group to which all new users joining an organization are automatically added.
- Parameters:
organization_group_id (str) – Unique string identifier of the Organization Group.
- delete_api_key(api_key_id)
Delete a specified API key.
- Parameters:
api_key_id (str) – The ID of the API key to delete.
- remove_user_from_organization(email)
Removes the specified user from the organization. You can remove yourself; otherwise, you must be an organization administrator to use this method to remove other users from the organization.
- Parameters:
email (str) – The email address of the user to remove from the organization.
- create_deployment_webhook(deployment_id, endpoint, webhook_event_type, payload_template=None)
Create a webhook attached to a given deployment ID.
- Parameters:
deployment_id (str) – Unique string identifier for the deployment this webhook will attach to.
endpoint (str) – URI that the webhook will send HTTP POST requests to.
webhook_event_type (str) – One of ‘DEPLOYMENT_START’, ‘DEPLOYMENT_SUCCESS’, or ‘DEPLOYMENT_FAILED’.
payload_template (dict) – Template for the body of the HTTP POST requests. Defaults to {}.
- Returns:
The webhook attached to the deployment.
- Return type:
- update_webhook(webhook_id, endpoint=None, webhook_event_type=None, payload_template=None)
Update the webhook
- delete_webhook(webhook_id)
Delete the webhook
- Parameters:
webhook_id (str) – Unique identifier of the target webhook.
- create_project(name, use_case)
Creates a project with the specified project name and use case. Creating a project creates a container for all datasets and models associated with a particular problem/project. For example, if you want to create a model to detect fraud, you need to first create a project, upload datasets, create feature groups, and then create one or more models to get predictions for your use case.
- Parameters:
name (str) – The project’s name.
use_case (str) – The use case that the project solves. Refer to our [guide on use cases](https://api.abacus.ai/app/help/useCases) for further details of each use case. The following enums are currently available for you to choose from: LANGUAGE_DETECTION, NLP_SENTIMENT, NLP_SEARCH, NLP_CHAT, CHAT_LLM, NLP_SENTENCE_BOUNDARY_DETECTION, NLP_CLASSIFICATION, NLP_SUMMARIZATION, NLP_DOCUMENT_VISUALIZATION, AI_AGENT, EMBEDDINGS_ONLY, MODEL_WITH_EMBEDDINGS, TORCH_MODEL, TORCH_MODEL_WITH_EMBEDDINGS, PYTHON_MODEL, NOTEBOOK_PYTHON_MODEL, DOCKER_MODEL, DOCKER_MODEL_WITH_EMBEDDINGS, CUSTOMER_CHURN, ENERGY, EVENT_ANOMALY_DETECTION, FINANCIAL_METRICS, CUMULATIVE_FORECASTING, FRAUD_ACCOUNT, FRAUD_THREAT, FRAUD_TRANSACTIONS, OPERATIONS_CLOUD, CLOUD_SPEND, TIMESERIES_ANOMALY_DETECTION, OPERATIONS_MAINTENANCE, OPERATIONS_INCIDENT, PERS_PROMOTIONS, PREDICTING, FEATURE_STORE, RETAIL, SALES_FORECASTING, SALES_SCORING, FEED_RECOMMEND, USER_RANKINGS, NAMED_ENTITY_RECOGNITION, USER_RECOMMENDATIONS, USER_RELATED, VISION, VISION_REGRESSION, VISION_OBJECT_DETECTION, FEATURE_DRIFT, SCHEDULING, GENERIC_FORECASTING, PRETRAINED_IMAGE_TEXT_DESCRIPTION, PRETRAINED_SPEECH_RECOGNITION, PRETRAINED_STYLE_TRANSFER, PRETRAINED_TEXT_TO_IMAGE_GENERATION, PRETRAINED_OCR_DOCUMENT_TO_TEXT, THEME_ANALYSIS, CLUSTERING, CLUSTERING_TIMESERIES, PRETRAINED_INSTRUCT_PIX2PIX, PRETRAINED_TEXT_CLASSIFICATION.
- Returns:
This object represents the newly created project.
- Return type:
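For example, assuming client is an ApiClient instance (the project name below is a placeholder), a CHAT_LLM project could be created as follows:

    project = client.create_project(
        name='Internal Docs Assistant',   # placeholder name
        use_case='CHAT_LLM',              # one of the use-case enums listed above
    )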
- rename_project(project_id, name)
This method renames a project after it is created.
- delete_project(project_id)
Delete a specified project from your organization.
This method deletes the project, its associated trained models, and deployments. The datasets attached to the specified project remain available for use with other projects in the organization.
This method will not delete a project that contains active deployments. Ensure that all active deployments are stopped before using the delete option.
Note: All projects, models, and deployments cannot be recovered once they are deleted.
- Parameters:
project_id (str) – The unique ID of the project to delete.
- add_project_tags(project_id, tags)
This method adds a tag to a project.
- remove_project_tags(project_id, tags)
This method removes a tag from a project.
- add_feature_group_to_project(feature_group_id, project_id, feature_group_type='CUSTOM_TABLE')
Adds a feature group to a project.
- set_project_feature_group_config(feature_group_id, project_id, project_config=None)
Sets a feature group’s project config
- remove_feature_group_from_project(feature_group_id, project_id)
Removes a feature group from a project.
- set_feature_group_type(feature_group_id, project_id, feature_group_type='CUSTOM_TABLE')
Update the feature group type in a project. The feature group must already be added to the project.
- set_feature_mapping(project_id, feature_group_id, feature_name, feature_mapping=None, nested_column_name=None)
Set a column’s feature mapping. If the column mapping is single-use and already set in another column in this feature group, this call will first remove the other column’s mapping and move it to this column.
- Parameters:
project_id (str) – The unique ID associated with the project.
feature_group_id (str) – The unique ID associated with the feature group.
feature_name (str) – The name of the feature.
feature_mapping (str) – The mapping of the feature in the feature group.
nested_column_name (str) – The name of the nested column if the input feature is part of a nested feature group for the given feature_group_id.
- Returns:
A list of objects that describes the resulting feature group’s schema after the feature’s featureMapping is set.
- Return type:
- add_annotation(annotation, feature_group_id, feature_name, doc_id=None, feature_group_row_identifier=None, annotation_source='ui', status=None, comments=None, project_id=None, save_metadata=False, pages=None)
Add an annotation entry to the database.
- Parameters:
annotation (dict) – The annotation to add. Format of the annotation is determined by its annotation type.
feature_group_id (str) – The ID of the feature group the annotation is on.
feature_name (str) – The name of the feature the annotation is on.
doc_id (str) – The ID of the primary document the annotation is on. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group’s primary / identifier key value. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
annotation_source (str) – Indicator of whether the annotation came from the UI, bulk upload, etc.
status (str) – The status of the annotation. Can be one of ‘todo’, ‘in_progress’, ‘done’. This is optional.
comments (dict) – Comments for the annotation. This is a dictionary of feature name to the corresponding comment. This is optional.
project_id (str) – The ID of the project that the annotation is associated with. This is optional.
save_metadata (bool) – Whether to save the metadata for the annotation. This is optional.
pages (list) – pages (list): List of page numbers to consider while processing the annotation. This is optional. doc_id must be provided if pages is provided.
- Returns:
The annotation entry that was added.
- Return type:
- describe_annotation(feature_group_id, feature_name=None, doc_id=None, feature_group_row_identifier=None)
Get the latest annotation entry for a given feature group, feature, and document.
- Parameters:
feature_group_id (str) – The ID of the feature group the annotation is on.
feature_name (str) – The name of the feature the annotation is on.
doc_id (str) – The ID of the primary document the annotation is on. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group’s primary / identifier key value. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
- Returns:
The latest annotation entry for the given feature group, feature, document, and/or annotation key value.
- Return type:
- update_annotation_status(feature_group_id, feature_name, status, doc_id=None, feature_group_row_identifier=None, save_metadata=False)
Update the status of an annotation entry.
- Parameters:
feature_group_id (str) – The ID of the feature group the annotation is on.
feature_name (str) – The name of the feature the annotation is on.
status (str) – The new status of the annotation. Must be one of the following: ‘TODO’, ‘IN_PROGRESS’, ‘DONE’.
doc_id (str) – The ID of the primary document the annotation is on. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group’s primary / identifier key value. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
save_metadata (bool) – If True, save the metadata for the annotation entry.
- Returns:
The updated annotation entry.
- Return type:
- get_document_to_annotate(feature_group_id, project_id, feature_name, feature_group_row_identifier=None, get_previous=False)
Get an available document that needs to be annotated for an annotation feature group.
- Parameters:
feature_group_id (str) – The ID of the feature group the annotation is on.
project_id (str) – The ID of the project that the annotation is associated with.
feature_name (str) – The name of the feature the annotation is on.
feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the primary key value. If provided, fetch the immediate next (or previous) available document.
get_previous (bool) – If True, get the previous document instead of the next document. Applicable if feature_group_row_identifier is provided.
- Returns:
The document to annotate.
- Return type:
- import_annotation_labels(feature_group_id, file, annotation_type)
Imports annotation labels from a CSV file. All valid values in the file will be imported as labels (including the header row if present).
- Parameters:
feature_group_id (str) – The unique string identifier of the feature group.
file (io.TextIOBase) – The file to import. Must be a csv file.
annotation_type (str) – The type of the annotation.
- Returns:
The annotation config for the feature group.
- Return type:
- create_feature_group(table_name, sql, description=None)
Creates a new FeatureGroup from a SQL statement.
- Parameters:
- Returns:
The created FeatureGroup.
- Return type:
- create_feature_group_from_template(table_name, feature_group_template_id, template_bindings=None, should_attach_feature_group_to_template=True, description=None)
Creates a new feature group from a feature group template.
- Parameters:
table_name (str) – The unique name to be given to the feature group.
feature_group_template_id (str) – The unique ID associated with the template that will be used to create this feature group.
template_bindings (list) – Variable bindings that override the template’s variable values.
should_attach_feature_group_to_template (bool) – Set to False to create a feature group but not leave it attached to the template that created it.
description (str) – A user-friendly description of this feature group.
- Returns:
The created feature group.
- Return type:
- create_feature_group_from_function(table_name, function_source_code=None, function_name=None, input_feature_groups=None, description=None, cpu_size=None, memory=None, package_requirements=None, use_original_csv_names=False, python_function_name=None, python_function_bindings=None, use_gpu=None)
Creates a new Feature Group from user-provided code. Python is currently the only supported code language.
If a list of input feature groups is supplied, we will provide DataFrames (pandas, in the case of Python) with the materialized feature groups for those input feature groups as arguments to the function.
This method expects the source code to be a valid language source file containing a function. This function needs to return a DataFrame when executed; this DataFrame will be used as the materialized version of this feature group table.
- Parameters:
table_name (str) – The unique name to be given to the feature group.
function_source_code (str) – Contents of a valid source code file in a supported Feature Group specification language (currently only Python). The source code should contain a function called function_name. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.
input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each of these parameters is a materialized DataFrame (the same type as the function's return value).
description (str) – The description for this feature group.
cpu_size (str) – Size of the CPU for the feature group function.
memory (int) – Memory (in GB) for the feature group function.
package_requirements (list) – List of package requirements for the feature group function. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
use_original_csv_names (bool) – Defaults to False. If set, the original column names are used for input feature groups sourced from CSV datasets.
python_function_name (str) – Name of Python Function that contains the source code and function arguments.
python_function_bindings (list) – List of arguments to be supplied to the function as parameters in the format [{‘name’: ‘function_argument’, ‘variable_type’: ‘FEATURE_GROUP’, ‘value’: ‘name_of_feature_group’}].
use_gpu (bool) – Whether the feature group function needs a GPU. Defaults to CPU otherwise.
- Returns:
The created feature group
- Return type:
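A rough sketch of the function-based variant, assuming the client from the earlier sketch and an existing input feature group named users; the function name and transformation are illustrative only:

    import textwrap

    feature_transform_code = textwrap.dedent("""
        import pandas as pd

        def build_features(users):
            # 'users' arrives as a materialized pandas DataFrame.
            users['signup_year'] = pd.to_datetime(users['signup_date']).dt.year
            return users
    """)

    fg = client.create_feature_group_from_function(
        table_name='users_with_signup_year',  # hypothetical table name
        function_source_code=feature_transform_code,
        function_name='build_features',
        input_feature_groups=['users'],
        memory=16,
    )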
- create_sampling_feature_group(feature_group_id, table_name, sampling_config, description=None)
Creates a new Feature Group defined as a sample of rows from another Feature Group.
For efficiency, sampling is approximate unless otherwise specified (e.g., the number of rows may vary slightly from what was requested).
- Parameters:
feature_group_id (str) – The unique ID associated with the pre-existing Feature Group that will be sampled by this new Feature Group. i.e. the input for sampling.
table_name (str) – The unique name to be given to this sampling Feature Group.
sampling_config (SamplingConfig) – Dictionary defining the sampling method and its parameters.
description (str) – A human-readable description of this Feature Group.
- Returns:
The created Feature Group.
- Return type:
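A minimal sketch, assuming the client from the earlier sketch; the feature group ID is a placeholder and the sampling_config keys shown are illustrative (consult SamplingConfig for the exact fields of the method you choose):

    sampled_fg = client.create_sampling_feature_group(
        feature_group_id='fg_id_123',  # hypothetical source feature group ID
        table_name='users_sample',
        sampling_config={'sampling_method': 'N_SAMPLING', 'sample_count': 10000},  # illustrative keys
        description='Approximate 10k-row sample of the users feature group',
    )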
- create_merge_feature_group(source_feature_group_id, table_name, merge_config, description=None)
Creates a new feature group defined as the union of other feature group versions.
- Parameters:
source_feature_group_id (str) – Unique string identifier corresponding to the dataset feature group that will have its versions merged into this feature group.
table_name (str) – Unique string identifier to be given to this merge feature group.
merge_config (MergeConfig) – JSON object defining the merging method and its parameters.
description (str) – Human-readable description of this feature group.
- Returns:
The created feature group.
- Return type:
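A minimal sketch, assuming the client from the earlier sketch; the ID is a placeholder and the merge_config keys are illustrative (consult MergeConfig for the exact fields):

    merged_fg = client.create_merge_feature_group(
        source_feature_group_id='fg_id_123',  # hypothetical dataset feature group ID
        table_name='events_merged',
        merge_config={'merge_mode': 'LAST_N', 'num_versions': 2},  # illustrative keys
        description='Union of the two most recent dataset versions',
    )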
- create_transform_feature_group(source_feature_group_id, table_name, transform_config, description=None)
Creates a new Feature Group defined by a pre-defined transform applied to another Feature Group.
- Parameters:
source_feature_group_id (str) – Unique string identifier corresponding to the Feature Group to which the transformation will be applied.
table_name (str) – Unique string identifier for the transform Feature Group.
transform_config (dict) – JSON object (aka map) defining the transform and its parameters.
description (str) – Human-readable description of the Feature Group.
- Returns:
The created Feature Group.
- Return type:
- create_snapshot_feature_group(feature_group_version, table_name)
Creates a Snapshot Feature Group corresponding to a specific Feature Group version.
- Parameters:
- Returns:
Feature Group corresponding to the newly created Snapshot.
- Return type:
- create_online_feature_group(table_name, primary_key, description=None)
Creates an Online Feature Group.
- Parameters:
- Returns:
The created online feature group.
- Return type:
- set_feature_group_sampling_config(feature_group_id, sampling_config)
Set a FeatureGroup’s sampling to the config values provided, so that the rows the FeatureGroup returns will be a sample of those it would otherwise have returned.
- Parameters:
feature_group_id (str) – The unique identifier associated with the FeatureGroup.
sampling_config (SamplingConfig) – A JSON string object specifying the sampling method and parameters specific to that sampling method. An empty sampling_config indicates no sampling.
- Returns:
The updated FeatureGroup.
- Return type:
- set_feature_group_merge_config(feature_group_id, merge_config)
Set a MergeFeatureGroup’s merge config to the values provided, so that the feature group only returns a bounded range of an incremental dataset.
- Parameters:
feature_group_id (str) – Unique identifier associated with the feature group.
merge_config (MergeConfig) – JSON object string specifying the merge rule. An empty merge_config will default to only including the latest dataset version.
- Returns:
The updated FeatureGroup.
- Return type:
- set_feature_group_transform_config(feature_group_id, transform_config)
Set a TransformFeatureGroup’s transform config to the values provided.
- set_feature_group_schema(feature_group_id, schema)
Creates a new schema and points the feature group to the new feature group schema ID.
- create_feature(feature_group_id, name, select_expression)
Creates a new feature in a Feature Group from a SQL select statement.
- Parameters:
- Returns:
A Feature Group object with the newly added feature.
- Return type:
- add_feature_group_tag(feature_group_id, tag)
Adds a tag to the feature group
- remove_feature_group_tag(feature_group_id, tag)
Removes a tag from the specified feature group.
- add_annotatable_feature(feature_group_id, name, annotation_type)
Add an annotatable feature in a Feature Group
- Parameters:
- Returns:
The feature group after the feature has been set
- Return type:
- set_feature_as_annotatable_feature(feature_group_id, feature_name, annotation_type, feature_group_row_identifier_feature=None, doc_id_feature=None)
Sets an existing feature as an annotatable feature (Feature that can be annotated).
- Parameters:
feature_group_id (str) – The unique ID associated with the feature group.
feature_name (str) – The name of the feature to set as annotatable.
annotation_type (str) – The type of annotation label to add.
feature_group_row_identifier_feature (str) – The key value of the feature group row the annotation is on (cast to string) and uniquely identifies the feature group row. At least one of the doc_id or key value must be provided so that the correct annotation can be identified.
doc_id_feature (str) – The name of the document ID feature.
- Returns:
A feature group object with the newly added annotatable feature.
- Return type:
- set_annotation_status_feature(feature_group_id, feature_name)
Sets a feature as the annotation status feature for a feature group.
- Parameters:
- Returns:
The updated feature group.
- Return type:
- unset_feature_as_annotatable_feature(feature_group_id, feature_name)
Unsets a feature as annotatable
- Parameters:
- Returns:
The feature group after unsetting the feature
- Return type:
- add_feature_group_annotation_label(feature_group_id, label_name, annotation_type, label_definition=None)
Adds an annotation label
- Parameters:
- Returns:
The feature group after adding the annotation label
- Return type:
- remove_feature_group_annotation_label(feature_group_id, label_name)
Removes an annotation label
- Parameters:
- Returns:
The feature group after removing the annotation label
- Return type:
- add_feature_tag(feature_group_id, feature, tag)
Adds a tag on a feature
- remove_feature_tag(feature_group_id, feature, tag)
Removes a tag from a feature
- create_nested_feature(feature_group_id, nested_feature_name, table_name, using_clause, where_clause=None, order_clause=None)
Creates a new nested feature in a feature group from a SQL statement.
- Parameters:
feature_group_id (str) – The unique ID associated with the feature group.
nested_feature_name (str) – The name of the feature.
table_name (str) – The table name of the feature group to nest.
using_clause (str) – The SQL join column or logic to join the nested table with the parent.
where_clause (str) – A SQL WHERE statement to filter the nested rows.
order_clause (str) – A SQL clause to order the nested rows.
- Returns:
A feature group object with the newly added nested feature.
- Return type:
- update_nested_feature(feature_group_id, nested_feature_name, table_name=None, using_clause=None, where_clause=None, order_clause=None, new_nested_feature_name=None)
Updates a previously existing nested feature in a feature group.
- Parameters:
feature_group_id (str) – The unique ID associated with the feature group.
nested_feature_name (str) – The name of the feature to be updated.
table_name (str) – The name of the table.
using_clause (str) – The SQL join column or logic to join the nested table with the parent.
where_clause (str) – An SQL WHERE statement to filter the nested rows.
order_clause (str) – An SQL clause to order the nested rows.
new_nested_feature_name (str) – New name for the nested feature.
- Returns:
A feature group object with the updated nested feature.
- Return type:
- delete_nested_feature(feature_group_id, nested_feature_name)
Delete a nested feature.
- Parameters:
- Returns:
A feature group object without the specified nested feature.
- Return type:
- create_point_in_time_feature(feature_group_id, feature_name, history_table_name, aggregation_keys, timestamp_key, historical_timestamp_key, expression, lookback_window_seconds=None, lookback_window_lag_seconds=0, lookback_count=None, lookback_until_position=0)
Creates a new point in time feature in a feature group using another historical feature group, window spec, and aggregate expression.
We use the aggregation keys and either the lookbackWindowSeconds or the lookbackCount values to perform the window aggregation for every row in the current feature group.
If the window is specified in seconds, then all rows in the history table which match the aggregation keys and whose historicalTimeFeature is greater than or equal to lookbackStartCount and less than the value of the current row's timeFeature are considered. An optional lookbackWindowLagSeconds (positive or negative) can be used to offset the current value of the timeFeature. If this value is negative, we will look at future rows in the history table, so care must be taken to ensure that these rows are available in the online context when we are performing a lookup on this feature group. If the window is specified in counts, then we order the historical table rows by time and consider rows from the window whose rank order is greater than or equal to lookbackCount, including the row just prior to the current one. The lag is specified in terms of positions using lookbackUntilPosition.
- Parameters:
feature_group_id (str) – The unique ID associated with the feature group.
feature_name (str) – The name of the feature to create.
history_table_name (str) – The table name of the history table.
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature.
historical_timestamp_key (str) – Name of feature which contains the historical timestamp.
expression (str) – SQL aggregate expression which can convert a sequence of rows into a scalar value.
lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.
lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row).
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.
- Returns:
A feature group object with the newly added nested feature.
- Return type:
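As a sketch of the seconds-based window case, assuming the client from the earlier sketch; all IDs, table names, and column names are hypothetical:

    # Rolling 7-day spend per user, aggregated from a purchase history table.
    fg = client.create_point_in_time_feature(
        feature_group_id='fg_id_123',
        feature_name='spend_last_7d',
        history_table_name='purchase_history',
        aggregation_keys=['user_id'],
        timestamp_key='event_time',
        historical_timestamp_key='purchase_time',
        expression='SUM(amount)',
        lookback_window_seconds=7 * 24 * 3600,
    )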
- update_point_in_time_feature(feature_group_id, feature_name, history_table_name=None, aggregation_keys=None, timestamp_key=None, historical_timestamp_key=None, expression=None, lookback_window_seconds=None, lookback_window_lag_seconds=None, lookback_count=None, lookback_until_position=None, new_feature_name=None)
Updates an existing Point-in-Time (PiT) feature in a feature group. See createPointInTimeFeature for detailed semantics.
- Parameters:
feature_group_id (str) – The unique ID associated with the feature group.
feature_name (str) – The name of the feature.
history_table_name (str) – The table name of the history table. If not specified, we use the current table to do a self join.
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
timestamp_key (str) – Name of the feature which contains the timestamp value for the PiT feature.
historical_timestamp_key (str) – Name of the feature which contains the historical timestamp.
expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.
lookback_window_seconds (float) – If the window is specified in terms of time, the number of seconds in the past from the current time for the start of the window.
lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If the window is specified in terms of count, the start position of the window (0 is the current row).
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.
new_feature_name (str) – New name for the PiT feature.
- Returns:
A feature group object with the newly added nested feature.
- Return type:
- create_point_in_time_group(feature_group_id, group_name, window_key, aggregation_keys, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=0, lookback_count=None, lookback_until_position=0)
Create a Point-in-Time Group
- Parameters:
feature_group_id (str) – The unique ID associated with the feature group to add the point in time group to.
group_name (str) – The name of the point in time group.
window_key (str) – Name of feature to use for ordering the rows on the source table.
aggregation_keys (list) – List of keys to perform on the source table for the window aggregation.
history_table_name (str) – The table to use for aggregating, if not provided, the source table will be used.
history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used.
history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys.
lookback_window (float) – Number of seconds in the past from the current time for the start of the window. If 0, the lookback will include all rows.
lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed. If it is negative, “future” rows in the history table are used.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row).
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed by that many rows. If it is negative, those many “future” rows in the history table are used.
- Returns:
The feature group after the point in time group has been created.
- Return type:
- generate_point_in_time_features(feature_group_id, group_name, columns, window_functions, prefix=None)
Generates and adds PIT features given the selected columns to aggregate over, and the operations to include.
- Parameters:
feature_group_id (str) – Unique string identifier associated with the feature group.
group_name (str) – Name of the point-in-time group.
columns (list) – List of columns to generate point-in-time features for.
window_functions (list) – List of window functions to operate on.
prefix (str) – Prefix for generated features, defaults to group name
- Returns:
Feature group object with newly added point-in-time features.
- Return type:
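A minimal sketch, assuming the client from the earlier sketch, an existing point-in-time group, and hypothetical column names; the window function names are illustrative:

    fg = client.generate_point_in_time_features(
        feature_group_id='fg_id_123',
        group_name='user_activity_pit',  # hypothetical point-in-time group
        columns=['amount', 'num_items'],
        window_functions=['SUM', 'AVG'],  # illustrative operations
        prefix='pit_',
    )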
- update_point_in_time_group(feature_group_id, group_name, window_key=None, aggregation_keys=None, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=None, lookback_count=None, lookback_until_position=None)
Update Point-in-Time Group
- Parameters:
feature_group_id (str) – The unique ID associated with the feature group.
group_name (str) – The name of the point-in-time group.
window_key (str) – Name of feature which contains the timestamp value for the point-in-time feature.
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
history_table_name (str) – The table to use for aggregating, if not provided, the source table will be used.
history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used.
history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys.
lookback_window (float) – Number of seconds in the past from the current time for the start of the window.
lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed. If it is negative, future rows in the history table are looked at.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row).
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed by that many rows. If it is negative, those many future rows in the history table are looked at.
- Returns:
The feature group after the update has been applied.
- Return type:
- delete_point_in_time_group(feature_group_id, group_name)
Delete point in time group
- Parameters:
- Returns:
The feature group after the point in time group has been deleted.
- Return type:
- create_point_in_time_group_feature(feature_group_id, group_name, name, expression)
Create point in time group feature
- Parameters:
feature_group_id (str) – A unique string identifier associated with the feature group.
group_name (str) – The name of the point-in-time group.
name (str) – The name of the feature to add to the point-in-time group.
expression (str) – A SQL aggregate expression which can convert a sequence of rows into a scalar value.
- Returns:
The feature group after the update has been applied.
- Return type:
- update_point_in_time_group_feature(feature_group_id, group_name, name, expression)
Update a feature’s SQL expression in a point in time group
- Parameters:
feature_group_id (str) – The unique ID associated with the feature group.
group_name (str) – The name of the point-in-time group.
name (str) – The name of the feature to add to the point-in-time group.
expression (str) – SQL aggregate expression which can convert a sequence of rows into a scalar value.
- Returns:
The feature group after the update has been applied.
- Return type:
- set_feature_type(feature_group_id, feature, feature_type)
Set the type of a feature in a feature group. Specify the feature group ID, feature name, and feature type, and the method will return the new column with the changes reflected.
- Parameters:
feature_group_id (str) – The unique ID associated with the feature group.
feature (str) – The name of the feature.
feature_type (str) – The machine learning type of the data in the feature. Refer to the [guide on feature types](https://api.abacus.ai/app/help/class/FeatureType) for more information.
- Returns:
The feature group after the feature type is applied.
- Return type:
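A minimal sketch, assuming the client from the earlier sketch; the feature group ID and column are placeholders, and the feature type value should come from the feature types guide linked above:

    fg = client.set_feature_type(
        feature_group_id='fg_id_123',
        feature='store_id',
        feature_type='CATEGORICAL',  # value taken from the feature types guide
    )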
- invalidate_streaming_feature_group_data(feature_group_id, invalid_before_timestamp)
Invalidates all streaming data with timestamp before invalidBeforeTimestamp
- concatenate_feature_group_data(feature_group_id, source_feature_group_id, merge_type='UNION', replace_until_timestamp=None, skip_materialize=False)
Concatenates data from one Feature Group to another. Feature Groups can be merged if their schemas are compatible, they have the special updateTimestampKey column, and (if set) the primaryKey column. The second operand in the concatenate operation will be appended to the first operand (merge target).
- Parameters:
feature_group_id (str) – The destination Feature Group.
source_feature_group_id (str) – The Feature Group to concatenate with the destination Feature Group.
merge_type (str) – UNION or INTERSECTION.
replace_until_timestamp (int) – The UNIX timestamp to specify the point until which we will replace data from the source Feature Group.
skip_materialize (bool) – If True, will not materialize the concatenated Feature Group.
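A minimal sketch, assuming the client from the earlier sketch and two schema-compatible feature groups with hypothetical IDs:

    # Append a streaming feature group's rows onto a batch (destination) feature group.
    client.concatenate_feature_group_data(
        feature_group_id='fg_batch_123',          # destination
        source_feature_group_id='fg_stream_456',  # source to append
        merge_type='UNION',
        skip_materialize=True,
    )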
- remove_concatenation_config(feature_group_id)
Removes the concatenation config on a destination feature group.
- Parameters:
feature_group_id (str) – Unique identifier of the destination feature group to remove the concatenation configuration from.
- set_feature_group_indexing_config(feature_group_id, primary_key=None, update_timestamp_key=None, lookup_keys=None)
Sets various attributes of the feature group used for primary key, deployment lookups and streaming updates.
- Parameters:
feature_group_id (str) – Unique string identifier for the feature group.
primary_key (str) – Name of the feature which defines the primary key of the feature group.
update_timestamp_key (str) – Name of the feature which defines the update timestamp of the feature group. Used in concatenation and primary key deduplication.
lookup_keys (list) – List of feature names which can be used in the lookup API to restrict the computation to a set of dataset rows. These feature names have to correspond to underlying dataset columns.
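A minimal sketch, assuming the client from the earlier sketch and hypothetical column names:

    client.set_feature_group_indexing_config(
        feature_group_id='fg_id_123',
        primary_key='user_id',
        update_timestamp_key='updated_at',
        lookup_keys=['email'],
    )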
- describe_async_feature_group_operation(feature_group_operation_run_id)
Gets the status of the execution of a feature group operation.
- Parameters:
feature_group_operation_run_id (str) – The unique ID associated with the execution.
- Returns:
A dict that contains the execution status
- Return type:
- update_feature_group(feature_group_id, description=None)
Modify an existing Feature Group.
- Parameters:
- Returns:
Updated Feature Group object.
- Return type:
- detach_feature_group_from_template(feature_group_id)
Update a feature group to detach it from a template.
- Parameters:
feature_group_id (str) – Unique string identifier associated with the feature group.
- Returns:
The updated feature group.
- Return type:
- update_feature_group_template_bindings(feature_group_id, template_bindings=None)
Update the feature group template bindings for a template feature group.
- Parameters:
- Returns:
Updated feature group.
- Return type:
- update_feature_group_python_function_bindings(feature_group_id, python_function_bindings)
Updates an existing Feature Group’s Python function bindings from a user-provided Python Function. If a list of feature groups are supplied within the Python function bindings, we will provide DataFrames (Pandas in the case of Python) with the materialized feature groups for those input feature groups as arguments to the function.
- update_feature_group_python_function(feature_group_id, python_function_name, python_function_bindings=[])
Updates an existing Feature Group’s Python function from a user-provided Python Function. If a list of feature groups is supplied within the Python function
bindings, we will provide DataFrames (pandas, in the case of Python) with the materialized feature groups for those input feature groups as arguments to the function.
- Parameters:
feature_group_id (str) – The unique ID associated with the feature group.
python_function_name (str) – The name of the python function to be associated with the feature group.
python_function_bindings (list) – List of arguments to be supplied to the function as parameters in the format [{‘name’: ‘function_argument’, ‘variable_type’: ‘FEATURE_GROUP’, ‘value’: ‘name_of_feature_group’}].
- update_feature_group_sql_definition(feature_group_id, sql)
Updates the SQL statement for a feature group.
- Parameters:
- Returns:
The updated feature group.
- Return type:
- update_dataset_feature_group_feature_expression(feature_group_id, feature_expression)
Updates the SQL feature expression for a Dataset FeatureGroup’s custom features
- Parameters:
- Returns:
The updated feature group.
- Return type:
- update_feature_group_function_definition(feature_group_id, function_source_code=None, function_name=None, input_feature_groups=None, cpu_size=None, memory=None, package_requirements=None, use_original_csv_names=False, python_function_bindings=None, use_gpu=None)
Updates the function definition for a feature group
- Parameters:
feature_group_id (str) – The unique ID associated with the feature group.
function_source_code (str) – Contents of a valid source code file in a supported Feature Group specification language (currently only Python). The source code should contain a function called function_name. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.
input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each of these parameters is a materialized DataFrame (the same type as the function's return value).
cpu_size (str) – Size of the CPU for the feature group function.
memory (int) – Memory (in GB) for the feature group function.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].
use_original_csv_names (bool) – If set to True, feature group uses the original column names for input feature groups from CSV datasets.
python_function_bindings (list) – List of arguments to be supplied to the function as parameters in the format [{‘name’: ‘function_argument’, ‘variable_type’: ‘FEATURE_GROUP’, ‘value’: ‘name_of_feature_group’}].
use_gpu (bool) – Whether the feature group function needs a GPU. Defaults to CPU otherwise.
- Returns:
The updated feature group.
- Return type:
- update_feature(feature_group_id, name, select_expression=None, new_name=None)
Modifies an existing feature in a feature group.
- Parameters:
- Returns:
Updated feature group object.
- Return type:
- export_feature_group_version_to_file_connector(feature_group_version, location, export_file_format, overwrite=False)
Export Feature group to File Connector.
- Parameters:
feature_group_version (str) – Unique string identifier for the feature group instance to export.
location (str) – Cloud file location to export to.
export_file_format (str) – Enum string specifying the file format to export to.
overwrite (bool) – If true and a file exists at this location, this process will overwrite the file.
- Returns:
The FeatureGroupExport instance.
- Return type:
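A minimal sketch, assuming the client from the earlier sketch, a verified file connector for the bucket, and a hypothetical feature group version:

    export = client.export_feature_group_version_to_file_connector(
        feature_group_version='fg_version_123',       # hypothetical version ID
        location='s3://my-bucket/exports/users.csv',  # hypothetical, pre-verified bucket path
        export_file_format='CSV',
        overwrite=True,
    )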
- export_feature_group_version_to_database_connector(feature_group_version, database_connector_id, object_name, write_mode, database_feature_mapping, id_column=None, additional_id_columns=None)
Export Feature group to Database Connector.
- Parameters:
feature_group_version (str) – Unique string identifier for the Feature Group instance to export.
database_connector_id (str) – Unique string identifier for the Database Connector to export to.
object_name (str) – Name of the database object to write to.
write_mode (str) – Enum string indicating whether to use INSERT or UPSERT.
database_feature_mapping (dict) – Key/value pair JSON object of “database connector column” -> “feature name” pairs.
id_column (str) – Required if write_mode is UPSERT. Indicates which database column should be used as the lookup key.
additional_id_columns (list) – For database connectors which support it, additional ID columns to use as a complex key for upserting.
- Returns:
The FeatureGroupExport instance.
- Return type:
- export_feature_group_version_to_console(feature_group_version, export_file_format)
Export Feature group to console.
- Parameters:
- Returns:
The FeatureGroupExport instance.
- Return type:
- set_feature_group_modifier_lock(feature_group_id, locked=True)
Lock a feature group to prevent modification.
- add_user_to_feature_group_modifiers(feature_group_id, email)
Adds a user to a feature group.
- add_organization_group_to_feature_group_modifiers(feature_group_id, organization_group_id)
Add OrganizationGroup to a feature group modifiers list
- remove_user_from_feature_group_modifiers(feature_group_id, email)
Removes a user from a specified feature group.
- remove_organization_group_from_feature_group_modifiers(feature_group_id, organization_group_id)
Removes an OrganizationGroup from a feature group modifiers list
- delete_feature(feature_group_id, name)
Removes a feature from the feature group.
- Parameters:
- Returns:
Updated feature group object.
- Return type:
- delete_feature_group(feature_group_id)
Deletes a Feature Group.
- Parameters:
feature_group_id (str) – Unique string identifier for the feature group to be removed.
- create_feature_group_version(feature_group_id, variable_bindings=None)
Creates a snapshot for a specified feature group.
- Parameters:
- Returns:
A feature group version.
- Return type:
- create_feature_group_template(feature_group_id, name, template_sql, template_variables, description=None, template_bindings=None, should_attach_feature_group_to_template=False)
Create a feature group template.
- Parameters:
feature_group_id (str) – Unique identifier of the feature group this template was created from.
name (str) – User-friendly name for this feature group template.
template_sql (str) – The template SQL that will be resolved by applying values from the template variables to generate SQL for a feature group.
template_variables (list) – The template variables for resolving the template.
description (str) – Description of this feature group template.
template_bindings (list) – If the feature group will be attached to the newly created template, set these variable bindings on that feature group.
should_attach_feature_group_to_template (bool) – Set to True to convert the feature group to a template feature group and attach it to the newly created template.
- Returns:
The created feature group template.
- Return type:
- delete_feature_group_template(feature_group_template_id)
Delete an existing feature group template.
- Parameters:
feature_group_template_id (str) – Unique string identifier associated with the feature group template.
- update_feature_group_template(feature_group_template_id, template_sql=None, template_variables=None, description=None, name=None)
Update a feature group template.
- Parameters:
feature_group_template_id (str) – Unique identifier of the feature group template to update.
template_sql (str) – If provided, the new value to use for the template SQL.
template_variables (list) – If provided, the new value to use for the template variables.
description (str) – Description of this feature group template.
name (str) – User-friendly name for this feature group template.
- Returns:
The updated feature group template.
- Return type:
- preview_feature_group_template_resolution(feature_group_template_id=None, template_bindings=None, template_sql=None, template_variables=None, should_validate=True)
Resolve template sql using template variables and template bindings.
- Parameters:
feature_group_template_id (str) – Unique string identifier. If specified, use this template, otherwise assume an empty template.
template_bindings (list) – Values to override the template variable values specified by the template.
template_sql (str) – If specified, use this as the template SQL instead of the feature group template’s SQL.
template_variables (list) – Template variables to use. If a template is provided, this overrides the template’s template variables.
should_validate (bool) – If true, validates the resolved SQL.
- Returns:
The resolved template
- Return type:
- cancel_upload(upload_id)
Cancels an upload.
- Parameters:
upload_id (str) – A unique string identifier for the upload.
- upload_part(upload_id, part_number, part_data)
Uploads part of a large dataset file from your bucket to our system. Our system currently supports parts of up to 5GB and full files of up to 5TB. Note that each part must be at least 5MB in size, unless it is the last part in the sequence of parts for the full file.
- Parameters:
upload_id (str) – A unique identifier for this upload.
part_number (int) – The 1-indexed number denoting the position of the file part in the sequence of parts for the full file.
part_data (io.TextIOBase) – The multipart/form-data for the current part of the full file.
- Returns:
The object ‘UploadPart’ which encapsulates the hash and the etag for the part that got uploaded.
- Return type:
- mark_upload_complete(upload_id)
Marks an upload process as complete.
- create_dataset_from_file_connector(table_name, location, file_format=None, refresh_schedule=None, csv_delimiter=None, filename_column=None, start_prefix=None, until_prefix=None, location_date_format=None, date_format_lookback_days=None, incremental=False, is_documentset=False, extract_bounding_boxes=False, merge_file_schemas=False, reference_only_documentset=False, parsing_config=None)
Creates a dataset from a file located in a cloud storage, such as Amazon AWS S3, using the specified dataset name and location.
- Parameters:
table_name (str) – Organization-unique table name or the name of the feature group table to create using the source table.
location (str) – The URI location format of the dataset source. The URI location format needs to be specified to match the location_date_format when location_date_format is specified. For example, Location = s3://bucket1/dir1/dir2/event_date=YYYY-MM-DD/* when location_date_format is specified. The URI location format needs to include both the start_prefix and until_prefix when both are specified. For example, Location s3://bucket1/dir1/* includes both s3://bucket1/dir1/dir2/event_date=2021-08-02/* and s3://bucket1/dir1/dir2/event_date=2021-08-08/*
file_format (str) – The file format of the dataset.
refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.
csv_delimiter (str) – If the file format is CSV, use a specific csv delimiter.
filename_column (str) – Adds a new column to the dataset with the external URI path.
start_prefix (str) – The start prefix (inclusive) for a range based search on a cloud storage location URI.
until_prefix (str) – The end prefix (exclusive) for a range based search on a cloud storage location URI.
location_date_format (str) – The date format in which the data is partitioned in the cloud storage location. For example, if the data is partitioned as s3://bucket1/dir1/dir2/event_date=YYYY-MM-DD/dir4/filename.parquet, then the location_date_format is YYYY-MM-DD. This format needs to be consistent across all files within the specified location.
date_format_lookback_days (int) – The number of days to look back from the current day for import locations that are date partitioned. For example, import date 2021-06-04 with date_format_lookback_days = 3 will retrieve data for all the dates in the range [2021-06-02, 2021-06-04].
incremental (bool) – Signifies if the dataset is an incremental dataset.
is_documentset (bool) – Signifies if the dataset is a docstore dataset. A docstore dataset contains documents like images, PDFs, audio files, etc., or is tabular data with links to such files.
extract_bounding_boxes (bool) – Signifies whether to extract bounding boxes out of the documents. Only valid if is_documentset is True.
merge_file_schemas (bool) – Signifies if the merge file schema policy is enabled. If is_documentset is True, this is also set to True by default.
reference_only_documentset (bool) – Signifies if the data reference only policy is enabled.
parsing_config (ParsingConfig) – Custom config for dataset parsing.
- Returns:
The dataset created.
- Return type:
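As a sketch of the date-partitioned case, assuming the client from the earlier sketch and a verified S3 location; the URI, format, and schedule are illustrative:

    dataset = client.create_dataset_from_file_connector(
        table_name='daily_events',
        location='s3://my-bucket/events/event_date=YYYY-MM-DD/*',  # hypothetical URI
        file_format='PARQUET',
        location_date_format='YYYY-MM-DD',
        date_format_lookback_days=3,
        refresh_schedule='0 6 * * *',  # refresh daily at 06:00 UTC
    )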
- create_dataset_version_from_file_connector(dataset_id, location=None, file_format=None, csv_delimiter=None, merge_file_schemas=None, parsing_config=None)
Creates a new version of the specified dataset.
- Parameters:
dataset_id (str) – Unique string identifier associated with the dataset.
location (str) – External URI to import the dataset from. If not specified, the last location will be used.
file_format (str) – File format to be used. If not specified, the service will try to detect the file format.
csv_delimiter (str) – If the file format is CSV, use a specific CSV delimiter.
merge_file_schemas (bool) – Signifies if the merge file schema policy is enabled.
parsing_config (ParsingConfig) – Custom config for dataset parsing.
- Returns:
The new Dataset Version created.
- Return type:
- create_dataset_from_database_connector(table_name, database_connector_id, object_name=None, columns=None, query_arguments=None, refresh_schedule=None, sql_query=None, incremental=False, timestamp_column=None)
Creates a dataset from a Database Connector.
- Parameters:
table_name (str) – Organization-unique table name.
database_connector_id (str) – Unique String Identifier of the Database Connector to import the dataset from.
object_name (str) – If applicable, the name/ID of the object in the service to query.
columns (str) – The columns to query from the external service object.
query_arguments (str) – Additional query arguments to filter the data.
refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.
sql_query (str) – The full SQL query to use when fetching data. If present, this parameter will override object_name, columns, timestamp_column, and query_arguments.
incremental (bool) – Signifies if the dataset is an incremental dataset.
timestamp_column (str) – If dataset is incremental, this is the column name of the required column in the dataset. This column must contain timestamps in descending order which are used to determine the increments of the incremental dataset.
- Returns:
The created dataset.
- Return type:
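A minimal sketch, assuming the client from the earlier sketch and an existing database connector; the connector ID, object name, and columns are placeholders:

    dataset = client.create_dataset_from_database_connector(
        table_name='crm_accounts',
        database_connector_id='db_conn_123',  # hypothetical connector ID
        object_name='ACCOUNTS',
        columns='id, name, created_at',
        refresh_schedule='0 4 * * *',  # refresh daily at 04:00 UTC
    )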
- create_dataset_from_application_connector(table_name, application_connector_id, object_id=None, start_timestamp=None, end_timestamp=None, refresh_schedule=None)
Creates a dataset from an Application Connector.
- Parameters:
table_name (str) – Organization-unique table name
application_connector_id (str) – Unique string identifier of the application connector to download data from
object_id (str) – If applicable, the ID of the object in the service to query.
start_timestamp (int) – Unix timestamp of the start of the period that will be queried.
end_timestamp (int) – Unix timestamp of the end of the period that will be queried.
refresh_schedule (str) – Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.
- Returns:
The created dataset.
- Return type:
- create_dataset_version_from_database_connector(dataset_id, object_name=None, columns=None, query_arguments=None, sql_query=None)
Creates a new version of the specified dataset.
- Parameters:
dataset_id (str) – The unique ID associated with the dataset.
object_name (str) – The name/ID of the object in the service to query. If not specified, the last name will be used.
columns (str) – The columns to query from the external service object. If not specified, the last columns will be used.
query_arguments (str) – Additional query arguments to filter the data. If not specified, the last arguments will be used.
sql_query (str) – The full SQL query to use when fetching data. If present, this parameter will override object_name, columns, and query_arguments.
- Returns:
The new Dataset Version created.
- Return type:
- create_dataset_version_from_application_connector(dataset_id, object_id=None, start_timestamp=None, end_timestamp=None)
Creates a new version of the specified dataset.
- Parameters:
dataset_id (str) – The unique ID associated with the dataset.
object_id (str) – The ID of the object in the service to query. If not specified, the last name will be used.
start_timestamp (int) – The Unix timestamp of the start of the period that will be queried.
end_timestamp (int) – The Unix timestamp of the end of the period that will be queried.
- Returns:
The new Dataset Version created.
- Return type:
- create_dataset_from_upload(table_name, file_format=None, csv_delimiter=None, is_documentset=False, extract_bounding_boxes=False, parsing_config=None, merge_file_schemas=False)
Creates a dataset and returns an upload ID that can be used to upload a file.
- Parameters:
table_name (str) – Organization-unique table name for this dataset.
file_format (str) – The file format of the dataset.
csv_delimiter (str) – If the file format is CSV, use a specific CSV delimiter.
is_documentset (bool) – Signifies if the dataset is a docstore dataset. A docstore dataset contains documents like images, PDFs, audio files etc. or is tabular data with links to such files.
extract_bounding_boxes (bool) – Signifies whether to extract bounding boxes out of the documents. Only valid if is_documentset is True.
parsing_config (ParsingConfig) – Custom config for dataset parsing.
merge_file_schemas (bool) – Signifies whether to merge the schemas of all files in the dataset. If is_documentset is True, this is also set to True by default.
- Returns:
A reference to be used when uploading file parts.
- Return type:
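Tying this together with upload_part and mark_upload_complete above, a rough sketch of uploading a local CSV in chunks, assuming the client from the earlier sketch and that the returned reference exposes an upload_id attribute; the file path and chunk size are illustrative:

    import io

    upload = client.create_dataset_from_upload(
        table_name='local_transactions',
        file_format='CSV',
    )

    chunk_size = 10 * 1024 * 1024  # 10 MB parts; every part except the last must be at least 5 MB
    with open('transactions.csv', 'rb') as f:
        part_number = 1
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            client.upload_part(upload.upload_id, part_number, io.BytesIO(chunk))
            part_number += 1

    client.mark_upload_complete(upload.upload_id)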
- create_dataset_version_from_upload(dataset_id, file_format=None)
Creates a new version of the specified dataset using a local file upload.
- create_streaming_dataset(table_name)
Creates a streaming dataset. Use a streaming dataset if your dataset is receiving information from multiple sources over an extended period of time.
- snapshot_streaming_data(dataset_id)
Snapshots the current data in the streaming dataset.
- Parameters:
dataset_id (str) – The unique ID associated with the dataset.
- Returns:
The new Dataset Version created by taking a snapshot of the current data in the streaming dataset.
- Return type:
- set_dataset_column_data_type(dataset_id, column, data_type)
Set a Dataset’s column type.
- Parameters:
dataset_id (str) – The unique ID associated with the dataset.
column (str) – The name of the column.
data_type (str) – The type of the data in the column. Refer to the [guide on data types](https://api.abacus.ai/app/help/class/DataType) for more information. Note: Some ColumnMappings may restrict the options or explicitly set the DataType.
- Returns:
The dataset and schema after the data type has been set.
- Return type:
- create_dataset_from_streaming_connector(table_name, streaming_connector_id, streaming_args=None, refresh_schedule=None)
Creates a dataset from a Streaming Connector
- Parameters:
table_name (str) – Organization-unique table name
streaming_connector_id (str) – Unique String Identifier for the Streaming Connector to import the dataset from
streaming_args (dict) – Dictionary of arguments to read data from the streaming connector
refresh_schedule (str) – Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. Time is specified in UTC.
- Returns:
The created dataset.
- Return type:
- set_streaming_retention_policy(dataset_id, retention_hours=None, retention_row_count=None, ignore_records_before_timestamp=None)
Sets the streaming retention policy.
- Parameters:
dataset_id (str) – Unique string identifier for the streaming dataset.
retention_hours (int) – Number of hours to retain streamed data in memory.
retention_row_count (int) – Number of rows to retain streamed data in memory.
ignore_records_before_timestamp (int) – The Unix timestamp (in seconds) to use as a cutoff to ignore all entries sent before it
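A minimal sketch, assuming the client from the earlier sketch and a hypothetical streaming dataset ID:

    client.set_streaming_retention_policy(
        dataset_id='dataset_id_123',
        retention_hours=48,
        ignore_records_before_timestamp=1672531200,  # 2023-01-01 00:00:00 UTC
    )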
- rename_database_connector(database_connector_id, name)
Renames a Database Connector
- rename_application_connector(application_connector_id, name)
Renames an Application Connector
- verify_database_connector(database_connector_id)
Checks if Abacus.AI can access the specified database.
- Parameters:
database_connector_id (str) – Unique string identifier for the database connector.
- verify_file_connector(bucket)
Checks to see if Abacus.AI can access the given bucket.
- Parameters:
bucket (str) – The bucket to test.
- Returns:
The result of the verification.
- Return type:
- delete_database_connector(database_connector_id)
Delete a database connector.
- Parameters:
database_connector_id (str) – The unique identifier for the database connector.
- delete_application_connector(application_connector_id)
Delete an application connector.
- Parameters:
application_connector_id (str) – The unique identifier for the application connector.
- delete_file_connector(bucket)
Deletes a file connector
- Parameters:
bucket (str) – The fully qualified URI of the bucket to remove.
- verify_application_connector(application_connector_id)
Checks if Abacus.AI can access the application using the provided application connector ID.
- Parameters:
application_connector_id (str) – Unique string identifier for the application connector.
- set_azure_blob_connection_string(bucket, connection_string)
Authenticates the specified Azure Blob Storage bucket using an authenticated Connection String.
- Parameters:
- Returns:
An object with the roleArn and verification status for the specified bucket.
- Return type:
- verify_streaming_connector(streaming_connector_id)
Checks to see if Abacus.AI can access the streaming connector.
- Parameters:
streaming_connector_id (str) – Unique string identifier for the streaming connector to be checked for Abacus.AI access.
- rename_streaming_connector(streaming_connector_id, name)
Renames a Streaming Connector
- delete_streaming_connector(streaming_connector_id)
Delete a streaming connector.
- Parameters:
streaming_connector_id (str) – The unique identifier for the streaming connector.
- create_streaming_token()
Creates a streaming token for the specified project. Streaming tokens are used to authenticate requests when appending data to streaming datasets.
- Returns:
The generated streaming token.
- Return type:
- delete_streaming_token(streaming_token)
Deletes the specified streaming token.
- Parameters:
streaming_token (str) – The streaming token to delete.
- delete_dataset(dataset_id)
Deletes the specified dataset from the organization.
- Parameters:
dataset_id (str) – Unique string identifier of the dataset to delete.
- get_training_config_options(project_id, feature_group_ids=None, for_retrain=False, current_training_config=None)
Retrieves the full initial description of the model training configuration options available for the specified project. The configuration options available are determined by the use case associated with the specified project. Refer to the Use Case Documentation for more information on use cases and use case-specific configuration options.
- Parameters:
project_id (str) – The unique ID associated with the project.
feature_group_ids (list) – The feature group IDs to be used for training.
for_retrain (bool) – Whether the training config options are used for retraining.
current_training_config (TrainingConfig) – The current state of the training config, with some options set, which shall be used to get new options after refresh. This is None by default initially.
- Returns:
An array of options that can be specified when training a model in this project.
- Return type:
- create_train_test_data_split_feature_group(project_id, training_config, feature_group_ids)
Get the train and test data split without training the model. Only supported for models with custom algorithms.
- Parameters:
project_id (str) – The unique ID associated with the project.
training_config (TrainingConfig) – The training config used to influence how the split is calculated.
feature_group_ids (list) – List of feature group IDs provided by the user, including the required one for data split and others to influence how to split.
- Returns:
The feature group containing the training data and folds information.
- Return type:
- train_model(project_id, name=None, training_config=None, feature_group_ids=None, refresh_schedule=None, custom_algorithms=None, custom_algorithms_only=False, custom_algorithm_configs=None, builtin_algorithms=None, cpu_size=None, memory=None, algorithm_training_configs=None)
Train a model for the specified project.
This method trains a model in the given project, using user-specified training configurations defined in the getTrainingConfigOptions method.
- Parameters:
project_id (str) – The unique ID associated with the project.
name (str) – The name of the model. Defaults to “<Project Name> Model”.
training_config (TrainingConfig) – The training config used to train this model.
feature_group_ids (list) – List of feature group IDs provided by the user to train the model on.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created model.
custom_algorithms (list) – List of user-defined algorithms to train. If not set, the default enabled custom algorithms will be used.
custom_algorithms_only (bool) – Whether to only run custom algorithms.
custom_algorithm_configs (dict) – Configs for each user-defined algorithm; key is the algorithm name, value is the config serialized to JSON.
builtin_algorithms (list) – List of IDs of the builtin algorithms provided by Abacus.AI to train. If not set, all applicable builtin algorithms will be used.
cpu_size (str) – Size of the CPU for the user-defined algorithms during training.
memory (int) – Memory (in GB) for the user-defined algorithms during training.
algorithm_training_configs (list) – List of algorithm-specific training configs that will be part of the model training AutoML run.
- Returns:
The new model which is being trained.
- Return type:
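A minimal sketch, assuming the client from the earlier sketch and hypothetical project and feature group IDs; real training_config fields depend on the project's use case (see get_training_config_options):

    model = client.train_model(
        project_id='project_id_123',
        name='Churn Model v1',
        feature_group_ids=['fg_id_123'],
        refresh_schedule='0 8 * * 1',  # retrain weekly, Mondays 08:00 UTC
    )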
- create_model_from_python(project_id, function_source_code, train_function_name, training_input_tables, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, name=None, cpu_size=None, memory=None, training_config=None, exclusive_run=False, package_requirements=None, use_gpu=False)
Initializes a new Model from user-provided Python code. If a list of input feature groups is supplied, they will be provided as arguments to the train and predict functions with the materialized feature groups for those input feature groups.
This method expects functionSourceCode to be a valid language source file which contains the functions named trainFunctionName and predictFunctionName. trainFunctionName returns the ModelVersion that is the result of training the model using trainFunctionName. predictFunctionName has no well-defined return type, as it returns the prediction made by predictFunctionName, which can be anything.
- Parameters:
project_id (str) – The unique ID associated with the project.
function_source_code (str) – Contents of a valid Python source code file. The source code should contain the functions named trainFunctionName and predictFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of these parameters is a materialized DataFrame (the same type as the function's return value).
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.
initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before using it to make predictions using the model
name (str) – The name you want your model to have. Defaults to “<Project Name> Model”
cpu_size (str) – Size of the CPU for the model training function
memory (int) – Memory (in GB) for the model training function
training_config (TrainingConfig) – Training configuration
exclusive_run (bool) – Decides if this model will be run exclusively or along with other Abacus.ai algorithms
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
use_gpu (bool) – Whether this model needs gpu
- Returns:
The new model, which has not been trained.
- Return type:
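A rough sketch, assuming the client from the earlier sketch; the train/predict bodies are placeholders and the exact prediction interface for your use case may differ:

    import textwrap

    model_code = textwrap.dedent("""
        def train(users):
            # 'users' is a materialized pandas DataFrame; return any picklable model object.
            return users['label'].mean()

        def predict(model, query):
            # 'model' is whatever train() returned; the return value is the prediction.
            return {'score': model}
    """)

    model = client.create_model_from_python(
        project_id='project_id_123',
        function_source_code=model_code,
        train_function_name='train',
        predict_function_name='predict',
        training_input_tables=['users'],
        memory=16,
    )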
- rename_model(model_id, name)
Renames a model
- update_python_model(model_id, function_source_code=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, training_input_tables=None, cpu_size=None, memory=None, package_requirements=None, use_gpu=None)
Updates an existing Python Model using user-provided Python code. If a list of input feature groups is supplied, they will be provided as arguments to the train and predict functions with the materialized feature groups for those input feature groups.
This method expects functionSourceCode to be a valid language source file which contains the functions named trainFunctionName and predictFunctionName. trainFunctionName returns the ModelVersion that is the result of training the model using trainFunctionName. predictFunctionName has no well-defined return type, as it returns the prediction made by the predictFunctionName, which can be anything.
- Parameters:
model_id (str) – The unique ID associated with the Python model to be changed.
function_source_code (str) – Contents of a valid Python source code file. The source code should contain the functions named trainFunctionName and predictFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the source code that will be executed to run batch predictions through the model. It is not executed when this function is run.
initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before using it to make predictions using the model.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function's return value).
cpu_size (str) – Size of the CPU for the model training function.
memory (int) – Memory (in GB) for the model training function.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].
use_gpu (bool) – Whether this model needs a GPU.
- Returns:
The updated model.
- Return type:
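For context, a minimal sketch of calling update_python_model is shown below. The API key, model ID, table name, and source file are placeholders, and constructing the client via abacusai.ApiClient is assumed:

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    # Python source containing the train()/predict() functions referenced below
    new_source = open('model.py').read()

    model = client.update_python_model(
        model_id='your_model_id',                 # placeholder model ID
        function_source_code=new_source,
        train_function_name='train',
        predict_function_name='predict',
        training_input_tables=['transactions'],   # feature group table names (placeholder)
        memory=16,                                # GB for the training function
        package_requirements=['pandas>=1.4.0'],
    )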
- update_python_model_zip(model_id, train_function_name=None, predict_function_name=None, predict_many_function_name=None, train_module_name=None, predict_module_name=None, training_input_tables=None, cpu_size=None, memory=None, package_requirements=None, use_gpu=None)
Updates an existing Python Model using a provided zip file. If a list of input feature groups is supplied, they will be provided as arguments to the train and predict functions with the materialized feature groups for those input feature groups.
This method expects trainModuleName and predictModuleName to be valid Python source files which contain the functions named trainFunctionName and predictFunctionName, respectively. trainFunctionName returns the ModelVersion that results from training the model, and predictFunctionName has no well-defined return type, since it returns whatever prediction the model produces.
- Parameters:
model_id (str) – The unique ID associated with the Python model to be changed.
train_function_name (str) – Name of the function found in the train module that will be executed to train the model. It is not executed when this function is run.
predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.
train_module_name (str) – Full path of the module that contains the train function from the root of the zip.
predict_module_name (str) – Full path of the module that contains the predict function from the root of the zip.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function's return value).
cpu_size (str) – Size of the CPU for the model training function.
memory (int) – Memory (in GB) for the model training function.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].
use_gpu (bool) – Whether this model needs a GPU.
- Returns:
The updated model.
- Return type:
- update_python_model_git(model_id, application_connector_id=None, branch_name=None, python_root=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, train_module_name=None, predict_module_name=None, training_input_tables=None, cpu_size=None, memory=None, use_gpu=None)
Updates an existing Python model using an existing Git application connector. If a list of input feature groups is supplied, these will be provided as arguments to the train and predict functions with the materialized feature groups for those input feature groups.
This method expects trainModuleName and predictModuleName to be valid Python source files which contain the functions named trainFunctionName and predictFunctionName, respectively. trainFunctionName returns the ModelVersion that results from training the model, and predictFunctionName has no well-defined return type, since it returns whatever prediction the model produces.
- Parameters:
model_id (str) – The unique ID associated with the Python model to be changed.
application_connector_id (str) – The unique ID associated with the Git application connector.
branch_name (str) – Name of the branch in the Git repository to be used for training.
python_root (str) – Path from the top level of the Git repository to the directory containing the Python source code. If not provided, the default is the root of the Git repository.
train_function_name (str) – Name of the function found in the train module that will be executed to train the model. It is not executed when this function is run.
predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.
train_module_name (str) – Full path of the module that contains the train function, relative to the root of the repository (or python_root, if specified).
predict_module_name (str) – Full path of the module that contains the predict function, relative to the root of the repository (or python_root, if specified).
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function's return value).
cpu_size (str) – Size of the CPU for the model training function.
memory (int) – Memory (in GB) for the model training function.
use_gpu (bool) – Whether this model needs a GPU.
- Returns:
The updated model.
- Return type:
- set_model_training_config(model_id, training_config, feature_group_ids=None)
Edits the default model training config
- Parameters:
model_id (str) – A unique string identifier of the model to update.
training_config (TrainingConfig) – The training config used to train this model.
feature_group_ids (list) – The list of feature groups used as input to the model.
- Returns:
The model object corresponding to the updated training config.
- Return type:
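As an illustration of set_model_training_config, the sketch below assumes a regression use case and that a RegressionTrainingConfig class with a test_split field is available from abacusai.api_class; substitute the training config class that matches your project's use case:

    from abacusai import ApiClient
    from abacusai.api_class import RegressionTrainingConfig  # assumed config class for a regression use case

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    model = client.set_model_training_config(
        model_id='your_model_id',                                 # placeholder model ID
        training_config=RegressionTrainingConfig(test_split=20),  # assumed field; holds out 20% for testing
        feature_group_ids=['fg_id_1', 'fg_id_2'],                 # placeholder feature group IDs
    )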
- set_model_objective(model_version, metric=None)
Sets the best model for all model instances of the model based on the specified metric, and updates the training configuration to use the specified metric for any future model versions.
If metric is set to None, the default selection is used.
- set_model_prediction_params(model_id, prediction_config)
Sets the model prediction config for the model
- retrain_model(model_id, deployment_ids=None, feature_group_ids=None, custom_algorithms=None, builtin_algorithms=None, custom_algorithm_configs=None, cpu_size=None, memory=None, training_config=None, algorithm_training_configs=None)
Retrains the specified model, with an option to choose the deployments to which the retraining will be deployed.
- Parameters:
model_id (str) – Unique string identifier of the model to retrain.
deployment_ids (list) – List of unique string identifiers of deployments to automatically deploy to.
feature_group_ids (list) – List of feature group IDs provided by the user to train the model on.
custom_algorithms (list) – List of user-defined algorithms to train. If not set, will honor the runs from the last time and applicable new custom algorithms.
builtin_algorithms (list) – List of the built-in algorithms provided by Abacus.AI to train. If not set, will honor the runs from the last time and applicable new built-in algorithms.
custom_algorithm_configs (dict) – User-defined training configs for each custom algorithm.
cpu_size (str) – Size of the CPU for the user-defined algorithms during training.
memory (int) – Memory (in GB) for the user-defined algorithms during training.
training_config (TrainingConfig) – The training config used to train this model.
algorithm_training_configs (list) – List of algorithm-specific training configs that will be part of the model training AutoML run.
- Returns:
The model that is being retrained.
- Return type:
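A minimal retrain_model sketch (IDs are placeholders; only documented arguments are used):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    model = client.retrain_model(
        model_id='your_model_id',                # placeholder model ID
        deployment_ids=['your_deployment_id'],   # auto-deploy the retrained version here
        feature_group_ids=['fg_id_1'],           # optionally override the training data
    )
    # The returned Model object reflects the retraining run that was started.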
- delete_model(model_id)
Deletes the specified model and all its versions. Models which are currently used in deployments cannot be deleted.
- Parameters:
model_id (str) – Unique string identifier of the model to delete.
- delete_model_version(model_version)
Deletes the specified model version. Model versions which are currently used in deployments cannot be deleted.
- Parameters:
model_version (str) – The unique identifier of the model version to delete.
- export_model_artifact_as_feature_group(model_version, table_name, artifact_type)
Exports metric artifact data for a model as a feature group.
- Parameters:
- Returns:
The created feature group.
- Return type:
- set_default_model_algorithm(model_id=None, algorithm=None, data_cluster_type=None)
Sets the model’s algorithm to default for all new deployments
- get_custom_train_function_info(project_id, feature_group_names_for_training=None, training_data_parameter_name_override=None, training_config=None, custom_algorithm_config=None)
Returns information about how to call the custom train function.
- Parameters:
project_id (str) – The unique ID of the project.
feature_group_names_for_training (list) – A list of feature group table names to be used for training.
training_data_parameter_name_override (dict) – Override from feature group type to parameter name in the train function.
training_config (TrainingConfig) – Training config for the options supported by the Abacus.ai platform.
custom_algorithm_config (dict) – User-defined config that can be serialized by JSON.
- Returns:
Information about how to call the customer-provided train function.
- Return type:
- export_custom_model_version(model_version, output_location, algorithm=None)
Bundle custom model artifacts to a zip file, and export to the specified location.
- Parameters:
- Returns:
Object describing the export and its status.
- Return type:
- create_model_monitor(project_id, prediction_feature_group_id, training_feature_group_id=None, name=None, refresh_schedule=None, target_value=None, target_value_bias=None, target_value_performance=None, feature_mappings=None, model_id=None, training_feature_mappings=None, feature_group_base_monitor_config=None, feature_group_comparison_monitor_config=None)
Runs a model monitor for the specified project.
- Parameters:
project_id (str) – The unique ID associated with the project.
prediction_feature_group_id (str) – The unique ID of the prediction data feature group.
training_feature_group_id (str) – The unique ID of the training data feature group.
name (str) – The name you want your model monitor to have. Defaults to “<Project Name> Model Monitor”.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created model monitor.
target_value (str) – A target positive value for the label to compute bias and PR/AUC for performance page.
target_value_bias (str) – A target positive value for the label to compute bias.
target_value_performance (str) – A target positive value for the label to compute PR curve/AUC for performance page.
feature_mappings (dict) – A JSON map to override features for prediction_feature_group, where keys are column names and the values are feature data use types.
model_id (str) – The unique ID of the model.
training_feature_mappings (dict) – A JSON map to override features for training_feature_group, where keys are column names and the values are feature data use types.
feature_group_base_monitor_config (dict) – Selection strategy for the base feature group, including the feature group version if one is selected.
feature_group_comparison_monitor_config (dict) – Selection strategy for the comparison feature group, including the feature group version if one is selected.
- Returns:
The new model monitor that was created.
- Return type:
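A hedged sketch of create_model_monitor follows; IDs are placeholders and the feature mapping values are illustrative (use the feature data use types configured for your project):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    monitor = client.create_model_monitor(
        project_id='your_project_id',                    # placeholder
        prediction_feature_group_id='prediction_fg_id',  # placeholder
        training_feature_group_id='training_fg_id',      # placeholder
        name='Churn Model Monitor',
        refresh_schedule='0 6 * * 1',                    # every Monday at 06:00 UTC
        target_value='1',                                # positive label for bias and PR/AUC
        feature_mappings={'churn_prediction': 'PREDICTED_VALUE'},   # illustrative data use type
        training_feature_mappings={'churned': 'TARGET'},            # illustrative data use type
    )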
- rerun_model_monitor(model_monitor_id)
Re-runs the specified model monitor.
- Parameters:
model_monitor_id (str) – Unique string identifier of the model monitor to re-run.
- Returns:
The model monitor that is being re-run.
- Return type:
- rename_model_monitor(model_monitor_id, name)
Renames a model monitor
- delete_model_monitor(model_monitor_id)
Deletes the specified Model Monitor and all its versions.
- Parameters:
model_monitor_id (str) – Unique identifier of the Model Monitor to delete.
- delete_model_monitor_version(model_monitor_version)
Deletes the specified model monitor version.
- Parameters:
model_monitor_version (str) – Unique identifier of the model monitor version to delete.
- create_vision_drift_monitor(project_id, prediction_feature_group_id, training_feature_group_id, name, feature_mappings, training_feature_mappings, target_value_performance=None, refresh_schedule=None)
Runs a vision drift monitor for the specified project.
- Parameters:
project_id (str) – Unique string identifier of the project.
prediction_feature_group_id (str) – Unique string identifier of the prediction data feature group.
training_feature_group_id (str) – Unique string identifier of the training data feature group.
name (str) – The name you want your model monitor to have. Defaults to “<Project Name> Model Monitor”.
feature_mappings (dict) – A JSON map to override features for prediction_feature_group, where keys are column names and the values are feature data use types.
training_feature_mappings (dict) – A JSON map to override features for training_feature_group, where keys are column names and the values are feature data use types.
target_value_performance (str) – A target positive value for the label to compute precision-recall curve/area under curve for performance page.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically rerun the created vision drift monitor.
- Returns:
The new model monitor that was created.
- Return type:
- create_nlp_drift_monitor(project_id, prediction_feature_group_id, training_feature_group_id, name, feature_mappings, training_feature_mappings, target_value_performance=None, refresh_schedule=None)
Runs an NLP drift monitor for the specified project.
- Parameters:
project_id (str) – Unique string identifier of the project.
prediction_feature_group_id (str) – Unique string identifier of the prediction data feature group.
training_feature_group_id (str) – Unique string identifier of the training data feature group.
name (str) – The name you want your model monitor to have. Defaults to “<Project Name> Model Monitor”.
feature_mappings (dict) – A JSON map to override features for prediction_feature_group, where keys are column names and the values are feature data use types.
training_feature_mappings (dict) – A JSON map to override features for training_feature_group, where keys are column names and the values are feature data use types.
target_value_performance (str) – A target positive value for the label to compute precision-recall curve/area under curve for performance page.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically rerun the created NLP drift monitor.
- Returns:
The new model monitor that was created.
- Return type:
- create_forecasting_monitor(project_id, name, prediction_feature_group_id, training_feature_group_id, training_forecast_config, prediction_forecast_config, forecast_frequency=None, refresh_schedule=None)
Runs a forecasting monitor for the specified project.
- Parameters:
project_id (str) – Unique string identifier of the project.
name (str) – The name you want your model monitor to have. Defaults to “<Project Name> Model Monitor”.
prediction_feature_group_id (str) – Unique string identifier of the prediction data feature group.
training_feature_group_id (str) – Unique string identifier of the training data feature group.
training_forecast_config (ForecastingMonitorConfig) – The configuration for the training data.
prediction_forecast_config (ForecastingMonitorConfig) – The configuration for the prediction data.
forecast_frequency (str) – The frequency of the forecast. Defaults to the frequency of the prediction data.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically rerun the created forecasting monitor.
- Returns:
The new model monitor that was created.
- Return type:
- create_eda(project_id, feature_group_id, name, refresh_schedule=None, include_collinearity=False, include_data_consistency=False, collinearity_keys=None, primary_keys=None, data_consistency_test_config=None, data_consistency_reference_config=None, feature_mappings=None, forecast_frequency=None)
Run an Exploratory Data Analysis (EDA) for the specified project.
- Parameters:
project_id (str) – The unique ID associated with the project.
feature_group_id (str) – The unique ID of the prediction data feature group.
name (str) – The name you want your model monitor to have. Defaults to “<Project Name> EDA”.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created EDA.
include_collinearity (bool) – Set to True if the EDA type is collinearity.
include_data_consistency (bool) – Set to True if the EDA type is data consistency.
collinearity_keys (list) – List of features to use for collinearity
primary_keys (list) – List of features that correspond to the primary keys or item IDs of the given feature group, for Data Consistency analysis or Forecasting analysis respectively.
data_consistency_test_config (dict) – Test feature group version selection strategy for Data Consistency EDA type.
data_consistency_reference_config (dict) – Reference feature group version selection strategy for Data Consistency EDA type.
feature_mappings (dict) – A JSON map to override features for the given feature_group, where keys are column names and the values are feature data use types. (In forecasting, used to set the timestamp column and target value)
forecast_frequency (str) – The frequency of the data. It can be HOURLY, DAILY, WEEKLY, MONTHLY, QUARTERLY, or YEARLY.
- Returns:
The new EDA object that was created.
- Return type:
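A sketch of create_eda that requests both collinearity and data consistency analyses (IDs, feature names, and keys are placeholders):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    eda = client.create_eda(
        project_id='your_project_id',     # placeholder
        feature_group_id='your_fg_id',    # placeholder
        name='Sales Data EDA',
        include_collinearity=True,
        collinearity_keys=['price', 'discount', 'units_sold'],  # illustrative feature names
        include_data_consistency=True,
        primary_keys=['order_id'],                               # illustrative primary key
    )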
- rerun_eda(eda_id)
Reruns the specified EDA object.
- rename_eda(eda_id, name)
Renames an EDA
- delete_eda(eda_id)
Deletes the specified EDA and all its versions.
- Parameters:
eda_id (str) – Unique string identifier of the EDA to delete.
- delete_eda_version(eda_version)
Deletes the specified EDA version.
- Parameters:
eda_version (str) – Unique string identifier of the EDA version to delete.
- create_holdout_analysis(name, model_id, feature_group_ids, model_version=None, algorithm=None)
Create a holdout analysis for a model
- Parameters:
name (str) – Name of the holdout analysis
model_id (str) – ID of the model to create a holdout analysis for
feature_group_ids (list) – List of feature group IDs to use for the holdout analysis
model_version (str) – (optional) Version of the model to use for the holdout analysis
algorithm (str) – (optional) ID of algorithm to use for the holdout analysis
- Returns:
The created holdout analysis
- Return type:
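A minimal create_holdout_analysis sketch (names and IDs are placeholders):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    analysis = client.create_holdout_analysis(
        name='Q3 Holdout',
        model_id='your_model_id',              # placeholder model ID
        feature_group_ids=['holdout_fg_id'],   # placeholder holdout feature group
        model_version='your_model_version',    # optional; a specific version to analyze
    )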
- rerun_holdout_analysis(holdout_analysis_id, model_version=None, algorithm=None)
Rerun a holdout analysis. A different model version and algorithm can be specified, provided they belong to the same model.
- Parameters:
- Returns:
The created holdout analysis version
- Return type:
- create_monitor_alert(project_id, model_monitor_id, alert_name, condition_config, action_config)
Create a monitor alert for the given conditions and monitor
- Parameters:
project_id (str) – Unique string identifier for the project.
model_monitor_id (str) – Unique string identifier for the model monitor created under the project.
alert_name (str) – Name of the alert.
condition_config (AlertConditionConfig) – Condition to run the actions for the alert.
action_config (AlertActionConfig) – Configuration for the action of the alert.
- Returns:
Object describing the monitor alert.
- Return type:
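The sketch below shows the shape of a create_monitor_alert call. The concrete condition and action config classes and their fields (FeatureDriftConditionConfig, EmailActionConfig, threshold, email_recipients, email_body) are assumptions; use the classes listed under abacusai.api_class.monitor_alert for your alert type:

    from abacusai import ApiClient
    from abacusai.api_class import FeatureDriftConditionConfig, EmailActionConfig  # assumed classes

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    alert = client.create_monitor_alert(
        project_id='your_project_id',          # placeholder
        model_monitor_id='your_monitor_id',    # placeholder
        alert_name='Feature drift above 0.1',
        condition_config=FeatureDriftConditionConfig(threshold=0.1),   # assumed field name
        action_config=EmailActionConfig(
            email_recipients=['ml-team@example.com'],                  # assumed field name
            email_body='Feature drift detected on the churn monitor',  # assumed field name
        ),
    )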
- update_monitor_alert(monitor_alert_id, alert_name=None, condition_config=None, action_config=None)
Update monitor alert
- Parameters:
monitor_alert_id (str) – Unique identifier of the monitor alert.
alert_name (str) – Name of the alert.
condition_config (AlertConditionConfig) – Condition to run the actions for the alert.
action_config (AlertActionConfig) – Configuration for the action of the alert.
- Returns:
Object describing the monitor alert.
- Return type:
- run_monitor_alert(monitor_alert_id)
Reruns a given monitor alert from the latest monitor instance.
- Parameters:
monitor_alert_id (str) – Unique identifier of a monitor alert.
- Returns:
Object describing the monitor alert.
- Return type:
- delete_monitor_alert(monitor_alert_id)
Deletes a monitor alert.
- Parameters:
monitor_alert_id (str) – The unique string identifier of the alert to delete.
- create_prediction_operator(name, project_id, source_code=None, predict_function_name=None, initialize_function_name=None, feature_group_ids=None, cpu_size=None, memory=None, package_requirements=None, use_gpu=False)
Create a new prediction operator.
- Parameters:
name (str) – Name of the prediction operator.
project_id (str) – The unique ID of the associated project.
source_code (str) – Contents of a valid Python source code file. The source code should contain the function predictFunctionName, and the function ‘initializeFunctionName’ if defined.
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions.
initialize_function_name (str) – Name of the optional initialize function found in the source code. This function will generate anything used by predictions, based on input feature groups.
feature_group_ids (list) – List of feature groups that are supplied to the initialize function as parameters. Each of the parameters is a materialized DataFrame. The order should match the initialize function's parameters.
cpu_size (str) – Size of the CPU for the prediction operator.
memory (int) – Memory (in GB) for the prediction operator.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
use_gpu (bool) – Whether this prediction operator needs a GPU.
- Returns:
The created prediction operator object.
- Return type:
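A minimal create_prediction_operator sketch; the source file, function names, and IDs are placeholders:

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    # Python source defining init(...) and predict(...) as described above
    source = open('prediction_operator.py').read()

    operator = client.create_prediction_operator(
        name='Scoring Operator',
        project_id='your_project_id',           # placeholder
        source_code=source,
        predict_function_name='predict',
        initialize_function_name='init',
        feature_group_ids=['lookup_fg_id'],     # placeholder; passed to init as a DataFrame
        memory=16,
        package_requirements=['pandas>=1.4.0'],
    )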
- update_prediction_operator(prediction_operator_id, name=None, feature_group_ids=None, source_code=None, initialize_function_name=None, predict_function_name=None, cpu_size=None, memory=None, package_requirements=None, use_gpu=None)
Update an existing prediction operator.
- Parameters:
prediction_operator_id (str) – The unique ID of the prediction operator.
name (str) – Name of the prediction operator.
feature_group_ids (list) – List of feature groups that are supplied to the initialize function as parameters. Each of the parameters is a materialized DataFrame. The order should match the initialize function's parameters.
source_code (str) – Contents of a valid Python source code file. The source code should contain the function predictFunctionName, and the function ‘initializeFunctionName’ if defined.
initialize_function_name (str) – Name of the optional initialize function found in the source code. This function will generate anything used by predictions, based on input feature groups.
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions.
cpu_size (str) – Size of the CPU for the prediction operator.
memory (int) – Memory (in GB) for the prediction operator.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
use_gpu (bool) – Whether this prediction operator needs a GPU.
- Returns:
The updated prediction operator object.
- Return type:
- delete_prediction_operator(prediction_operator_id)
Delete an existing prediction operator.
- Parameters:
prediction_operator_id (str) – The unique ID of the prediction operator.
- deploy_prediction_operator(prediction_operator_id, auto_deploy=True)
Deploy the prediction operator.
- Parameters:
- Returns:
The created deployment object.
- Return type:
- create_prediction_operator_version(prediction_operator_id)
Create a new version of the prediction operator.
- Parameters:
prediction_operator_id (str) – The unique ID of the prediction operator.
- Returns:
The created prediction operator version object.
- Return type:
- delete_prediction_operator_version(prediction_operator_version)
Delete a prediction operator version.
- Parameters:
prediction_operator_version (str) – The unique ID of the prediction operator version.
- Return type:
- create_deployment(name=None, model_id=None, model_version=None, algorithm=None, feature_group_id=None, project_id=None, description=None, calls_per_second=None, auto_deploy=True, start=True, enable_batch_streaming_updates=False, skip_metrics_check=False, model_deployment_config=None, deployment_config=None)
Creates a deployment with the specified name and description for the specified model or feature group.
A Deployment makes the trained model or feature group available for prediction requests.
- Parameters:
name (str) – The name of the deployment.
model_id (str) – The unique ID associated with the model.
model_version (str) – The unique ID associated with the model version to deploy.
algorithm (str) – The unique ID associated with the algorithm to deploy.
feature_group_id (str) – The unique ID associated with a feature group.
project_id (str) – The unique ID associated with a project.
description (str) – The description for the deployment.
calls_per_second (int) – The number of calls per second the deployment can handle.
auto_deploy (bool) – Flag to enable the automatic deployment when a new Model Version finishes training.
start (bool) – If true, the deployment will be started; otherwise it will be created offline.
enable_batch_streaming_updates (bool) – Flag to enable marking the feature group deployment to have a background process cache streamed in rows for quicker lookup.
skip_metrics_check (bool) – Flag to skip the metric regression check for this deployment.
model_deployment_config (dict) – The deployment config for the model to deploy.
deployment_config (dict) – Additional parameters specific to different use cases.
- Returns:
The new model or feature group deployment.
- Return type:
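A sketch of creating a model deployment and a project-scoped token for calling it. IDs are placeholders, and reading the token from the returned object's deployment_token attribute is an assumption:

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    deployment = client.create_deployment(
        name='Churn Model (production)',
        model_id='your_model_id',        # placeholder; use feature_group_id for feature group deployments
        project_id='your_project_id',    # placeholder
        description='Serves the latest churn model version',
        calls_per_second=5,
        auto_deploy=True,
    )

    token = client.create_deployment_token(project_id='your_project_id')
    print(token.deployment_token)        # assumed attribute holding the token string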
- create_deployment_token(project_id, name=None)
Creates a deployment token for the specified project.
Deployment tokens are used to authenticate requests to the prediction APIs and are scoped to the project level.
- Parameters:
- Returns:
The deployment token.
- Return type:
- update_deployment(deployment_id, description=None, auto_deploy=None, skip_metrics_check=None)
Updates a deployment’s properties.
- Parameters:
deployment_id (str) – Unique identifier of the deployment to update.
description (str) – The new description for the deployment.
auto_deploy (bool) – Flag to enable the automatic deployment when a new Model Version finishes training.
skip_metrics_check (bool) – Flag to skip the metric regression check for this deployment. This field is only relevant when auto_deploy is on.
- rename_deployment(deployment_id, name)
Updates a deployment’s name
- set_auto_deployment(deployment_id, enable=None)
Enable or disable auto deployment for the specified deployment.
When a model is scheduled to retrain, deployments with auto deployment enabled will be marked to automatically promote the new model version. After the newly trained model completes, a check on its metrics in comparison to the currently deployed model version will be performed. If the metrics are comparable or better, the newly trained model version is automatically promoted. If not, it will be marked as a failed model version promotion with an error indicating poor metrics performance.
- set_deployment_model_version(deployment_id, model_version, algorithm=None, model_deployment_config=None)
Promotes a model version and/or algorithm to be the active served deployment version
- Parameters:
deployment_id (str) – A unique identifier for the deployment.
model_version (str) – A unique identifier for the model version.
algorithm (str) – The algorithm to use for the model version. If not specified, the algorithm will be inferred from the model version.
model_deployment_config (dict) – The deployment configuration for the model to deploy.
- set_deployment_feature_group_version(deployment_id, feature_group_version)
Promotes a feature group version to be served in the deployment.
- set_deployment_prediction_operator_version(deployment_id, prediction_operator_version)
Promotes a prediction operator version to be served in the deployment.
- start_deployment(deployment_id)
Restarts the specified deployment that was previously suspended.
- Parameters:
deployment_id (str) – A unique string identifier associated with the deployment.
- stop_deployment(deployment_id)
Stops the specified deployment.
- Parameters:
deployment_id (str) – Unique string identifier of the deployment to be stopped.
- delete_deployment(deployment_id)
Deletes the specified deployment. The deployment’s models will not be affected. Note that the deployments are not recoverable after they are deleted.
- Parameters:
deployment_id (str) – Unique string identifier of the deployment to delete.
- delete_deployment_token(deployment_token)
Deletes the specified deployment token.
- Parameters:
deployment_token (str) – The deployment token to delete.
- set_deployment_feature_group_export_file_connector_output(deployment_id, file_format=None, output_location=None)
Sets the export output for the Feature Group Deployment to be a file connector.
- set_deployment_feature_group_export_database_connector_output(deployment_id, database_connector_id, object_name, write_mode, database_feature_mapping, id_column=None, additional_id_columns=None)
Sets the export output for the Feature Group Deployment to a Database connector.
- Parameters:
deployment_id (str) – The ID of the deployment for which the export type is set.
database_connector_id (str) – The unique string identifier of the database connector used.
object_name (str) – The object of the database connector to write to.
write_mode (str) – The write mode to use when writing to the database connector, either UPSERT or INSERT.
database_feature_mapping (dict) – The column/feature pairs mapping the features to the database columns.
id_column (str) – The id column to use as the upsert key.
additional_id_columns (list) – For database connectors which support it, a list of additional ID columns to use as a complex key for upserting.
- remove_deployment_feature_group_export_output(deployment_id)
Removes the export type that is set for the Feature Group Deployment
- Parameters:
deployment_id (str) – The ID of the deployment for which the export type is set.
- create_refresh_policy(name, cron, refresh_type, project_id=None, dataset_ids=[], feature_group_id=None, model_ids=[], deployment_ids=[], batch_prediction_ids=[], prediction_metric_ids=[], model_monitor_ids=[], notebook_id=None, prediction_operator_id=None, feature_group_export_config=None)
Creates a refresh policy with a particular cron pattern and refresh type.
A refresh policy allows for the scheduling of a set of actions at regular intervals. This can be useful for periodically updating data that needs to be re-imported into the project for retraining.
- Parameters:
name (str) – The name of the refresh policy.
cron (str) – A cron-like string specifying the frequency of the refresh policy.
refresh_type (str) – The refresh type used to determine what is being refreshed, such as a single dataset, dataset and model, or more.
project_id (str) – Optionally, a project ID can be specified so that all datasets, models, deployments, batch predictions, prediction metrics, model monitors, and notebooks are captured at the instant the policy is created.
dataset_ids (list) – Comma-separated list of dataset IDs.
feature_group_id (str) – Feature Group ID associated with refresh policy.
model_ids (list) – Comma-separated list of model IDs.
deployment_ids (list) – Comma-separated list of deployment IDs.
batch_prediction_ids (list) – Comma-separated list of batch prediction IDs.
prediction_metric_ids (list) – Comma-separated list of prediction metric IDs.
model_monitor_ids (list) – Comma-separated list of model monitor IDs.
notebook_id (str) – Notebook ID associated with refresh policy.
prediction_operator_id (str) – Prediction Operator ID associated with refresh policy.
feature_group_export_config (FeatureGroupExportConfig) – Feature group export configuration.
- Returns:
The created refresh policy.
- Return type:
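A minimal create_refresh_policy sketch; the refresh_type value and IDs below are placeholders (consult the platform's refresh type options for the exact string):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    policy = client.create_refresh_policy(
        name='Nightly retrain',
        cron='0 2 * * *',               # every day at 02:00 UTC
        refresh_type='MODEL',           # assumed refresh type value
        project_id='your_project_id',   # placeholder
        model_ids=['your_model_id'],    # placeholder
    )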
- delete_refresh_policy(refresh_policy_id)
Delete a refresh policy.
- Parameters:
refresh_policy_id (str) – Unique string identifier associated with the refresh policy to delete.
- pause_refresh_policy(refresh_policy_id)
Pauses a refresh policy
- Parameters:
refresh_policy_id (str) – Unique identifier associated with the refresh policy to be paused.
- resume_refresh_policy(refresh_policy_id)
Resumes a refresh policy
- Parameters:
refresh_policy_id (str) – The unique ID associated with this refresh policy.
- run_refresh_policy(refresh_policy_id)
Force a run of the refresh policy.
- Parameters:
refresh_policy_id (str) – Unique string identifier associated with the refresh policy to be run.
- update_refresh_policy(refresh_policy_id, name=None, cron=None, feature_group_export_config=None)
Update the name or cron string of a refresh policy
- Parameters:
refresh_policy_id (str) – Unique string identifier associated with the refresh policy.
name (str) – Name of the refresh policy to be updated.
cron (str) – Cron string describing the schedule from the refresh policy to be updated.
feature_group_export_config (FeatureGroupExportConfig) – Feature group export configuration to update a feature group refresh policy.
- Returns:
Updated refresh policy.
- Return type:
- lookup_features(deployment_token, deployment_id, query_data, limit_results=None, result_columns=None)
Returns the feature group deployed in the feature store project.
- Parameters:
deployment_token (str) – A deployment token used to authenticate access to created deployments. This token only authorizes predictions on deployments in this project, so it can be safely embedded inside an application or website.
deployment_id (str) – A unique identifier for a deployment created under the project.
query_data (dict) – A dictionary where the key is the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and the value is the unique value of the same entity.
limit_results (int) – If provided, will limit the number of results to the value specified.
result_columns (list) – If provided, will limit the columns present in each result to the columns specified in this list.
- Return type:
Dict
- predict(deployment_token, deployment_id, query_data)
Returns a prediction for Predictive Modeling
- Parameters:
deployment_token (str) – A deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, and is safe to embed in an application or website.
deployment_id (str) – A unique identifier for a deployment created under the project.
query_data (dict) – A dictionary where the key is the column name (e.g. a column with name ‘user_id’ in the dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed, and the value is the unique value of the same entity.
- Return type:
Dict
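A minimal predict sketch; the token, deployment ID, and query column are placeholders:

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    result = client.predict(
        deployment_token='your_deployment_token',  # project-scoped token, safe to embed client-side
        deployment_id='your_deployment_id',        # placeholder
        query_data={'user_id': 'u_12345'},         # column mapped to USER_ID -> entity value
    )
    print(result)  # a dict containing the model's prediction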
- predict_multiple(deployment_token, deployment_id, query_data)
Returns a list of predictions for predictive modeling.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, and is safe to embed in an application or website.
deployment_id (str) – The unique identifier for a deployment created under the project.
query_data (list) – A list of dictionaries, where the ‘key’ is the column name (e.g. a column with name ‘user_id’ in the dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed, and the ‘value’ is the unique value of the same entity.
- Return type:
Dict
- predict_from_datasets(deployment_token, deployment_id, query_data)
Returns a list of predictions for Predictive Modeling.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier for a deployment created under the project.
query_data (dict) – A dictionary where the ‘key’ is the source dataset name, and the ‘value’ is a list of records corresponding to the dataset rows.
- Return type:
Dict
- predict_lead(deployment_token, deployment_id, query_data, explain_predictions=False, explainer_type=None)
Returns the probability of a user being a lead based on their interaction with the service/product and their own attributes (e.g. income, assets, credit score, etc.). Note that the inputs to this method, wherever applicable, should be the column names in the dataset mapped to the column mappings in our system (e.g. column ‘user_id’ mapped to mapping ‘LEAD_ID’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – A dictionary containing user attributes and/or user’s interaction data with the product/service (e.g. number of clicks, items in cart, etc.).
explain_predictions (bool) – Will explain predictions for leads
explainer_type (str) – Type of explainer to use for explanations
- Return type:
Dict
- predict_churn(deployment_token, deployment_id, query_data)
Returns the probability of a user to churn out in response to their interactions with the item/product/service. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘churn_result’ mapped to mapping ‘CHURNED_YN’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where the ‘key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and the ‘value’ will be the unique value of the same entity.
- Return type:
Dict
- predict_takeover(deployment_token, deployment_id, query_data)
Returns a probability for each class label associated with the types of fraud or a ‘yes’ or ‘no’ type label for the possibility of fraud. Note that the inputs to this method, wherever applicable, will be the column names in the dataset mapped to the column mappings in our system (e.g., column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – A dictionary containing account activity characteristics (e.g., login id, login duration, login type, IP address, etc.).
- Return type:
Dict
- predict_fraud(deployment_token, deployment_id, query_data)
Returns the probability of a transaction performed under a specific account being fraudulent or not. Note that the inputs to this method, wherever applicable, should be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_number’ mapped to the mapping ‘ACCOUNT_ID’ in our system).
- Parameters:
deployment_token (str) – A deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique identifier to a deployment created under the project.
query_data (dict) – A dictionary containing transaction attributes (e.g. credit card type, transaction location, transaction amount, etc.).
- Return type:
Dict
- predict_class(deployment_token, deployment_id, query_data, threshold=None, threshold_class=None, thresholds=None, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)
Returns a classification prediction
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model within an application or website.
deployment_id (str) – The unique identifier for a deployment created under the project.
query_data (dict) – A dictionary where the ‘Key’ is the column name (e.g. a column with the name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and the ‘Value’ is the unique value of the same entity.
threshold (float) – A float value that is applied on the popular class label.
threshold_class (str) – The label upon which the threshold is added (binary labels only).
thresholds (list) – Maps labels to thresholds (multi-label classification only). Defaults to F1 optimal threshold if computed for the given class, else uses 0.5.
explain_predictions (bool) – If True, returns the SHAP explanations for all input features.
fixed_features (list) – A set of input features to treat as constant for explanations - only honored when the explainer type is KERNEL_EXPLAINER
nested (str) – If specified, generates a prediction delta for each index of the specified nested feature.
explainer_type (str) – The type of explainer to use.
- Return type:
Dict
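A hedged predict_class sketch with a custom threshold and explanations enabled (all identifiers are placeholders):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    result = client.predict_class(
        deployment_token='your_deployment_token',  # placeholder
        deployment_id='your_deployment_id',        # placeholder
        query_data={'transaction_id': 'txn_001'},  # key column mapped in the project -> entity value
        threshold=0.7,                             # applied to the popular class label
        explain_predictions=True,                  # include SHAP explanations in the response
    )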
- predict_target(deployment_token, deployment_id, query_data, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)
Returns a prediction from a classification or regression model. Optionally, includes explanations.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier of a deployment created under the project.
query_data (dict) – A dictionary where the ‘key’ is the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and the ‘value’ is the unique value of the same entity.
explain_predictions (bool) – If true, returns the SHAP explanations for all input features.
fixed_features (list) – Set of input features to treat as constant for explanations - only honored when the explainer type is KERNEL_EXPLAINER
nested (str) – If specified, generates prediction delta for each index of the specified nested feature.
explainer_type (str) – The type of explainer to use.
- Return type:
Dict
- get_anomalies(deployment_token, deployment_id, threshold=None, histogram=False)
Returns a list of anomalies from the training dataset.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
threshold (float) – The threshold score of what is an anomaly. Valid values are between 0.8 and 0.99.
histogram (bool) – If True, will return a histogram of the distribution of all points.
- Return type:
- is_anomaly(deployment_token, deployment_id, query_data=None)
Returns a list of anomaly attributes based on login information for a specified account. Note that the inputs to this method, wherever applicable, should be the column names in the dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – The input data for the prediction.
- Return type:
Dict
- get_event_anomaly_score(deployment_token, deployment_id, query_data=None)
Returns an anomaly score for an event.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – The input data for the prediction.
- Return type:
Dict
- get_forecast(deployment_token, deployment_id, query_data, future_data=None, num_predictions=None, prediction_start=None, explain_predictions=False, explainer_type=None)
Returns a list of forecasts for a given entity under the specified project deployment. Note that the inputs to the deployed model will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘holiday_yn’ mapped to mapping ‘FUTURE’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘store_id’ in your dataset) mapped to the column mapping ITEM_ID that uniquely identifies the entity against which forecasting is performed and ‘Value’ will be the unique value of the same entity.
future_data (list) – This will be a list of values known ahead of time that are relevant for forecasting (e.g. State Holidays, National Holidays, etc.). Each element is a dictionary, where the key and the value both will be of type ‘str’. For example future data entered for a Store may be [{“Holiday”:”No”, “Promo”:”Yes”, “Date”: “2015-07-31 00:00:00”}].
num_predictions (int) – The number of timestamps to predict in the future.
prediction_start (str) – The start date for predictions (e.g., “2015-08-01T00:00:00” as input for midnight of 2015-08-01).
explain_predictions (bool) – Will explain predictions for forecasting
explainer_type (str) – Type of explainer to use for explanations
- Return type:
Dict
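A get_forecast sketch using the future_data format described above (token, deployment ID, and entity values are placeholders):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    forecast = client.get_forecast(
        deployment_token='your_deployment_token',   # placeholder
        deployment_id='your_deployment_id',         # placeholder
        query_data={'store_id': 'store_42'},        # column mapped to ITEM_ID -> entity value
        future_data=[{'Holiday': 'No', 'Promo': 'Yes', 'Date': '2015-07-31 00:00:00'}],
        num_predictions=14,                         # forecast 14 future timestamps
        prediction_start='2015-08-01T00:00:00',
    )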
- get_k_nearest(deployment_token, deployment_id, vector, k=None, distance=None, include_score=False, catalog_id=None)
Returns the k nearest neighbors for the provided embedding vector.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
vector (list) – Input vector to perform the k nearest neighbors with.
k (int) – Overrideable number of items to return.
distance (str) – Specify the distance function to use when finding nearest neighbors.
include_score (bool) – If True, will return the score alongside the resulting embedding value.
catalog_id (str) – An optional parameter honored only for embeddings that provide a catalog id
- Return type:
Dict
- get_multiple_k_nearest(deployment_token, deployment_id, queries)
Returns the k nearest neighbors for the queries provided.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
queries (list) – List of mappings of format {“catalogId”: “cat0”, “vectors”: […], “k”: 20, “distance”: “euclidean”}. See getKNearest for additional information about the supported parameters.
- get_labels(deployment_token, deployment_id, query_data)
Returns a list of scored labels for a document.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.
- Return type:
Dict
- get_entities_from_pdf(deployment_token, deployment_id, pdf=None, doc_id=None, return_extracted_features=False, verbose=False)
Extracts text from the provided PDF and returns a list of recognized labels and their scores.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
pdf (io.TextIOBase) – (Optional) The pdf to predict on. One of pdf or docId must be specified.
doc_id (str) – (Optional) The pdf to predict on. One of pdf or docId must be specified.
return_extracted_features (bool) – (Optional) If True, will return all extracted features (e.g. all tokens in a page) from the PDF. Default is False.
verbose (bool) – (Optional) If True, will return all the extracted tokens probabilities for all the trained labels. Default is False.
- Return type:
Dict
- get_recommendations(deployment_token, deployment_id, query_data, num_items=None, page=None, exclude_item_ids=None, score_field=None, scaling_factors=None, restrict_items=None, exclude_items=None, explore_fraction=None, diversity_attribute_name=None, diversity_max_results_per_value=None)
Returns a list of recommendations for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘time’ mapped to mapping ‘TIMESTAMP’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user for whom recommendations are made, and ‘Value’ will be the unique value of that user. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.
num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.
page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.
exclude_item_ids (list) – [DEPRECATED]
score_field (str) – The relative item scores are returned in a separate field named with the same name as the key (score_field) for this argument.
scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key “column” takes the name of the column (“col0”); the key “values” takes the list of items ([“value0”, “value1”]) in reference to which the model recommendations need to be biased; and the key “factor” takes the factor by which the item scores are adjusted. Let’s take an example where the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}]. After we apply the model to get item probabilities, for every SUV and Sedan in the list, we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it, or there’s an item that always comes up and you want to demote it.
restrict_items (list) – It allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, “value3”, …]}. The key “column” takes the name of the column (“col0”); the key “values” takes the list of items ([“value0”, “value1”, “value3”, …]) to which the recommendations are restricted. Let’s take an example where the input to restrict_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. This input will restrict the recommendations to SUVs and Sedans. This type of restriction is particularly useful if there’s a list of items that you know is of use in some particular scenario and you want to restrict the recommendations only to that list.
exclude_items (list) – It allows you to exclude certain items from the list of recommendations. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, …]}. The key “column” takes the name of the column (“col0”); the key “values” takes the list of items ([“value0”, “value1”]) to exclude from the recommendations. Let’s take an example where the input to exclude_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. The resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful when there is a known list of items you never want to appear in the recommendations.
explore_fraction (float) – Explore fraction.
diversity_attribute_name (str) – item attribute column name which is used to ensure diversity of prediction results.
diversity_max_results_per_value (int) – maximum number of results per value of diversity_attribute_name.
- Return type:
Dict
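A get_recommendations sketch that biases results using scaling_factors (token, deployment ID, and column/values are placeholders):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    recs = client.get_recommendations(
        deployment_token='your_deployment_token',   # placeholder
        deployment_id='your_deployment_id',         # placeholder
        query_data={'user_name': 'John Doe'},       # column mapped to USER_ID -> user value
        num_items=10,
        page=1,
        scaling_factors=[{'column': 'VehicleType', 'values': ['SUV', 'Sedan'], 'factor': 1.4}],
    )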
- get_personalized_ranking(deployment_token, deployment_id, query_data, preserve_ranks=None, preserve_unknown_items=False, scaling_factors=None)
Returns a list of items with personalized promotions for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, should be the column names in the dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model in an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This should be a dictionary with two key-value pairs. The first pair represents a ‘Key’ where the column name (e.g. a column with name ‘user_id’ in the dataset) mapped to the column mapping USER_ID uniquely identifies the user against whom a prediction is made and a ‘Value’ which is the identifier value for that user. The second pair will have a ‘Key’ which will be the name of the column name (e.g. movie_name) mapped to ITEM_ID (unique item identifier) and a ‘Value’ which will be a list of identifiers that uniquely identifies those items.
preserve_ranks (list) – List of dictionaries of format {“column”: “col0”, “values”: [“value0, value1”]}, where the ranks of items in query_data is preserved for all the items in “col0” with values, “value0” and “value1”. This option is useful when the desired items are being recommended in the desired order and the ranks for those items need to be kept unchanged during recommendation generation.
preserve_unknown_items (bool) – If true, any items that are unknown to the model will not be reranked, and their original position in the query will be preserved.
scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}. The key "column" takes the name of the column, "col0"; the key "values" takes the list of items, ["value0", "value1"], towards which the model recommendations need to be biased; and the key "factor" takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], then after the model produces item probabilities, the probability of every SUV and Sedan in the list is multiplied by 1.4 before sorting. This is particularly useful if there is a type of item that might be less popular but you want to promote it, or there is an item that always comes up and you want to demote it.
- Return type:
Dict
- get_ranked_items(deployment_token, deployment_id, query_data, preserve_ranks=None, preserve_unknown_items=False, score_field=None, scaling_factors=None, diversity_attribute_name=None, diversity_max_results_per_value=None)
Returns a list of re-ranked items for a selected user when a list of items is required to be reranked according to the user’s preferences. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary with two key-value pairs. The first pair represents a ‘Key’ where the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID uniquely identifies the user against whom a prediction is made and a ‘Value’ which is the identifier value for that user. The second pair will have a ‘Key’ which will be the name of the column name (e.g. movie_name) mapped to ITEM_ID (unique item identifier) and a ‘Value’ which will be a list of identifiers that uniquely identifies those items.
preserve_ranks (list) – List of dictionaries of format {"column": "col0", "values": ["value0", "value1"]}, where the ranks of items in query_data are preserved for all the items in "col0" with values "value0" and "value1". This option is useful when the desired items are being recommended in the desired order and the ranks for those items need to be kept unchanged during recommendation generation.
preserve_unknown_items (bool) – If true, any items that are unknown to the model will not be reranked, and their original position in the query will be preserved.
score_field (str) – The relative item scores are returned in a separate field named with the same name as the key (score_field) for this argument.
scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}. The key "column" takes the name of the column, "col0"; the key "values" takes the list of items, ["value0", "value1"], towards which the model recommendations need to be biased; and the key "factor" takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], then after the model produces item probabilities, the probability of every SUV and Sedan in the list is multiplied by 1.4 before sorting. This is particularly useful if there is a type of item that might be less popular but you want to promote it, or there is an item that always comes up and you want to demote it.
diversity_attribute_name (str) – item attribute column name which is used to ensure diversity of prediction results.
diversity_max_results_per_value (int) – maximum number of results per value of diversity_attribute_name.
- Return type:
Dict
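As an illustration, a minimal get_ranked_items call might look like the sketch below. The client construction assumes the package's ApiClient entry point, and the token, IDs, column names, and values are placeholders rather than values from this reference.
from abacusai import ApiClient

client = ApiClient('API_KEY')  # placeholder API key
ranked = client.get_ranked_items(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    # hypothetical columns: 'user_id' mapped to USER_ID, 'item_code' mapped to ITEM_ID
    query_data={'user_id': 'u_123', 'item_code': ['i_1', 'i_2', 'i_3']},
    scaling_factors=[{'column': 'category', 'values': ['accessories'], 'factor': 1.4}],
    diversity_attribute_name='category',
    diversity_max_results_per_value=2,
)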
- get_related_items(deployment_token, deployment_id, query_data, num_items=50, page=1, scaling_factors=None, restrict_items=None, exclude_items=None)
Returns a list of related items for a given item under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where the ‘key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which related items are determined and the ‘value’ will be the unique value of the same item. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.
num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.
page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.
scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}. The key "column" takes the name of the column, "col0"; the key "values" takes the list of items, ["value0", "value1"], towards which the model recommendations need to be biased; and the key "factor" takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], then after the model produces item probabilities, the probability of every SUV and Sedan in the list is multiplied by 1.4 before sorting. This is particularly useful if there is a type of item that might be less popular but you want to promote it, or there is an item that always comes up and you want to demote it.
restrict_items (list) – It allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {"column": "col0", "values": ["value0", "value1", "value3", …]}. The key "column" takes the name of the column, "col0"; the key "values" takes the list of items, ["value0", "value1", "value3", …], to which the recommendations should be restricted. For example, if the input to restrict_items is [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], the recommendations will be restricted to SUVs and Sedans. This type of restriction is particularly useful if there is a list of items that you know is of use in some particular scenario and you want to restrict the recommendations only to that list.
exclude_items (list) – It allows you to exclude certain items from the list of recommendations. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, …]}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” to exclude from the recommendations. Let’s take an example where the input to exclude_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. The resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful if there’s a list of items that you know is of no use in some particular scenario and you don’t want to show those items present in that list.
- Return type:
Dict
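For example, assuming the related-items method above is exposed on the client as get_related_items, the restrict_items and exclude_items filters could be combined as in this sketch (all identifiers and column values are placeholders):
from abacusai import ApiClient

client = ApiClient('API_KEY')  # placeholder API key
related = client.get_related_items(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={'user_name': 'John Doe'},  # column mapped to USER_ID, per the description above
    num_items=20,
    page=1,
    restrict_items=[{'column': 'VehicleType', 'values': ['SUV', 'Sedan']}],
    exclude_items=[{'column': 'VehicleType', 'values': ['Convertible']}],
)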
- get_chat_response(deployment_token, deployment_id, messages, llm_name=None, num_completion_tokens=None, system_message=None, temperature=0.0, filter_key_values=None, search_score_cutoff=None, chat_config=None, ignore_documents=False)
Return a chat response which continues the conversation based on the input messages and search results.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
messages (list) – A list of chronologically ordered messages, starting with a user message and alternating between sources. Each message is a dict with attributes: is_user (bool): whether the message is from the user; text (str): the message's text.
llm_name (str) – Name of the specific LLM backend to use to power the chat experience
num_completion_tokens (int) – Default for maximum number of tokens for chat answers
system_message (str) – The generative LLM system message
temperature (float) – The generative LLM temperature
filter_key_values (dict) – A dictionary mapping column names to a list of values to restrict the retrieved search results.
search_score_cutoff (float) – Cutoff for the document retriever score. Matching search results below this score will be ignored.
chat_config (dict) – A dictionary specifying the query chat config override.
ignore_documents (bool) – If True, will ignore any documents and search results, and only use the messages to generate a response.
- Return type:
Dict
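A sketch of the messages format described above; the deployment identifiers are placeholders and the optional overrides are shown only to illustrate the keyword arguments:
from abacusai import ApiClient

client = ApiClient('API_KEY')  # placeholder API key
chat = client.get_chat_response(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    messages=[
        {'is_user': True, 'text': 'What does the warranty cover?'},
        {'is_user': False, 'text': 'It covers manufacturing defects for one year.'},
        {'is_user': True, 'text': 'Does it cover accidental damage?'},
    ],
    temperature=0.0,
    ignore_documents=False,
)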
- get_conversation_response(deployment_id, message, deployment_conversation_id=None, external_session_id=None, llm_name=None, num_completion_tokens=None, system_message=None, temperature=0.0, filter_key_values=None, search_score_cutoff=None, chat_config=None, ignore_documents=False)
Return a conversation response which continues the conversation based on the input message and deployment conversation id (if exists).
- Parameters:
deployment_id (str) – The unique identifier to a deployment created under the project.
message (str) – A message from the user
deployment_conversation_id (str) – The unique identifier of a deployment conversation to continue. If not specified, a new one will be created.
external_session_id (str) – The user-supplied unique identifier of a deployment conversation to continue. If specified, we will use this instead of an internal deployment conversation id.
llm_name (str) – Name of the specific LLM backend to use to power the chat experience
num_completion_tokens (int) – Default for maximum number of tokens for chat answers
system_message (str) – The generative LLM system message
temperature (float) – The generative LLM temperature
filter_key_values (dict) – A dictionary mapping column names to a list of values to restrict the retrieved search results.
search_score_cutoff (float) – Cutoff for the document retriever score. Matching search results below this score will be ignored.
chat_config (dict) – A dictionary specifying the query chat config override.
ignore_documents (bool) – If True, will ignore any documents and search results, and only use the message and past conversation to generate a response.
- Return type:
Dict
- get_search_results(deployment_token, deployment_id, query_data, num=15)
Return the most relevant search results to the search query from the uploaded documents.
- Parameters:
deployment_token (str) – A token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it can be securely embedded in an application or website.
deployment_id (str) – A unique identifier of a deployment created under the project.
query_data (dict) – A dictionary where the key is "Content" and the value is the search query text.
num (int) – Number of search results to return.
- Return type:
Dict
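For instance, a search over the uploaded documents might be issued as follows (identifiers and the query text are placeholders):
from abacusai import ApiClient

client = ApiClient('API_KEY')  # placeholder API key
results = client.get_search_results(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={'Content': 'warranty terms for refurbished devices'},
    num=5,
)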
- get_sentiment(deployment_token, deployment_id, document)
Predicts sentiment on a document
- Parameters:
deployment_token (str) – A token used to authenticate access to deployments created in this project. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for a deployment created under this project.
document (str) – The document to be analyzed for sentiment.
- Return type:
Dict
- get_entailment(deployment_token, deployment_id, document)
Predicts the classification of the document
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
document (str) – The document to be classified.
- Return type:
Dict
- get_classification(deployment_token, deployment_id, document)
Predicts the classification of the document
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
document (str) – The document to be classified.
- Return type:
Dict
- get_summary(deployment_token, deployment_id, query_data)
Returns a JSON of the predicted summary for the given document. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘text’ mapped to mapping ‘DOCUMENT’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – Raw data dictionary containing the required document data - must have a key ‘document’ corresponding to a DOCUMENT type text as value.
- Return type:
Dict
- predict_language(deployment_token, deployment_id, query_data)
Predicts the language of the text
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments within this project, making it safe to embed this model in an application or website.
deployment_id (str) – A unique string identifier for a deployment created under the project.
query_data (str) – The input string to detect.
- Return type:
Dict
- get_assignments(deployment_token, deployment_id, query_data, forced_assignments=None, solve_time_limit_seconds=None)
Get all positive assignments that match a query.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it can be safely embedded in an application or website.
deployment_id (str) – The unique identifier of a deployment created under the project.
query_data (dict) – Specifies the set of assignments being requested. The value for the key can be: 1. A simple scalar value, which is matched exactly 2. A list of values, which matches any element in the list 3. A dictionary with keys lower_in/lower_ex and upper_in/upper_ex, which matches values in an inclusive/exclusive range
forced_assignments (dict) – Set of assignments to force and resolve before returning query results.
solve_time_limit_seconds (float) – Maximum time in seconds to spend solving the query.
- Return type:
Dict
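The three query_data value forms described above (scalar, list, and range) can be mixed in one query, as in this sketch; the column names and values are illustrative only:
from abacusai import ApiClient

client = ApiClient('API_KEY')  # placeholder API key
assignments = client.get_assignments(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={
        'region': 'EMEA',                                # scalar: matched exactly
        'skill': ['plumbing', 'electrical'],             # list: matches any element
        'start_hour': {'lower_in': 8, 'upper_ex': 18},   # range: inclusive lower, exclusive upper bound
    },
    solve_time_limit_seconds=30.0,
)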
- get_alternative_assignments(deployment_token, deployment_id, query_data, add_constraints=None, solve_time_limit_seconds=None)
Get alternative positive assignments for a given query. Optimal assignments are ignored and the alternative assignments are returned instead.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it can be safely embedded in an application or website.
deployment_id (str) – The unique identifier of a deployment created under the project.
query_data (dict) – Specifies the set of assignments being requested. The value for the key can be: 1. A simple scalar value, which is matched exactly 2. A list of values, which matches any element in the list 3. A dictionary with keys lower_in/lower_ex and upper_in/upper_ex, which matches values in an inclusive/exclusive range
add_constraints (list) – List of constraint dicts to apply to the query. Each constraint dict should have the following keys: 1. query (dict): Specifies the set of assignments involved in the constraint; the format is the same as query_data. 2. operator (str): Constraint operator '=' or '<=' or '>='. 3. constant (int): Constraint RHS constant value.
solve_time_limit_seconds (float) – Maximum time in seconds to spend solving the query.
- Return type:
Dict
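A sketch of the add_constraints format described above (query, operator, and constant keys); all values are placeholders:
from abacusai import ApiClient

client = ApiClient('API_KEY')  # placeholder API key
alternatives = client.get_alternative_assignments(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={'region': 'EMEA'},
    add_constraints=[{
        'query': {'skill': ['plumbing']},  # assignments involved in the constraint, same format as query_data
        'operator': '<=',
        'constant': 3,                     # right-hand-side constant: at most 3 such assignments
    }],
    solve_time_limit_seconds=30.0,
)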
- check_constraints(deployment_token, deployment_id, query_data)
Check for any constraints violated by the overrides.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model within an application or website.
deployment_id (str) – The unique identifier for a deployment created under the project.
query_data (dict) – Assignment overrides to the solution.
- Return type:
Dict
- predict_with_binary_data(deployment_token, deployment_id, blob)
Make predictions for a given blob, e.g. image, audio
- Parameters:
deployment_token (str) – A token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model in an application or website.
deployment_id (str) – A unique identifier to a deployment created under the project.
blob (io.TextIOBase) – The multipart/form-data of the data.
- Return type:
Dict
- describe_image(deployment_token, deployment_id, image, categories, top_n=None)
Describe the similarity between an image and a list of categories.
- Parameters:
deployment_token (str) – Authentication token to access created deployments. This token is only authorized to predict on deployments in the current project, and can be safely embedded in an application or website.
deployment_id (str) – Unique identifier of a deployment created under the project.
image (io.TextIOBase) – Image to describe.
categories (list) – List of candidate categories to compare with the image.
top_n (int) – Return the N most similar categories.
- Return type:
Dict
- get_text_from_document(deployment_token, deployment_id, document=None, return_detected_images=False)
Generate text from a document
- Parameters:
deployment_token (str) – Authentication token to access created deployments. This token is only authorized to predict on deployments in the current project, and can be safely embedded in an application or website.
deployment_id (str) – Unique identifier of a deployment created under the project.
document (io.TextIOBase) – Input document, which can be an image, PDF, or Word document (some formats might not be supported yet).
return_detected_images (bool) – Whether the detected images should be saved in the docstore. If true, a docstore ID is added to the response (may not be available for some algorithms).
- Return type:
Dict
- transcribe_audio(deployment_token, deployment_id, audio)
Transcribe the audio
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to make predictions on deployments in this project, so it can be safely embedded in an application or website.
deployment_id (str) – The unique identifier of a deployment created under the project.
audio (io.TextIOBase) – The audio to transcribe.
- Return type:
Dict
- classify_image(deployment_token, deployment_id, image=None, doc_id=None)
Classify an image.
- Parameters:
deployment_token (str) – A deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier to a deployment created under the project.
image (io.TextIOBase) – The binary data of the image to classify. One of image or doc_id must be specified.
doc_id (str) – The document ID of the image. One of image or doc_id must be specified.
- Return type:
Dict
- classify_pdf(deployment_token, deployment_id, pdf=None)
Returns a classification prediction from a PDF
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model within an application or website.
deployment_id (str) – The unique identifier for a deployment created under the project.
pdf (io.TextIOBase) – (Optional) The pdf to predict on. One of pdf or docId must be specified.
- Return type:
Dict
- get_cluster(deployment_token, deployment_id, query_data)
Predicts the cluster for given data.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
query_data (dict) – A dictionary where each ‘key’ represents a column name and its corresponding ‘value’ represents the value of that column. For Timeseries Clustering, the ‘key’ should be ITEM_ID, and its value should represent a unique item ID that needs clustering.
- Return type:
Dict
- get_objects_from_image(deployment_token, deployment_id, image)
Detects objects in an image.
- Parameters:
deployment_token (str) – A deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier to a deployment created under the project.
image (io.TextIOBase) – The binary data of the image to detect objects from.
- Return type:
Dict
- score_image(deployment_token, deployment_id, image)
Score on image.
- Parameters:
deployment_token (str) – A deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier to a deployment created under the project.
image (io.TextIOBase) – The binary data of the image to get the score.
- Return type:
Dict
- transfer_style(deployment_token, deployment_id, source_image, style_image)
Change the source image to adopt the visual style from the style image.
- Parameters:
deployment_token (str) – A token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model in an application or website.
deployment_id (str) – A unique identifier to a deployment created under the project.
source_image (io.TextIOBase) – The source image to which the style will be applied.
style_image (io.TextIOBase) – The image that has the style as a reference.
- Return type:
- generate_image(deployment_token, deployment_id, query_data)
Generate an image from text prompt.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model within an application or website.
deployment_id (str) – A unique identifier to a deployment created under the project.
query_data (dict) – Specifies the text prompt. For example, {‘prompt’: ‘a cat’}
- Return type:
- execute_agent(deployment_token, deployment_id, arguments=None, keyword_arguments=None)
Executes a deployed AI agent function using the arguments as keyword arguments to the agent execute function.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
arguments (list) – Positional arguments to the agent execute function.
keyword_arguments (dict) – A dictionary where each ‘key’ represents the parameter name and its corresponding ‘value’ represents the value of that parameter for the agent execute function.
- Return type:
Dict
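For example, a deployed agent could be invoked with keyword arguments as in this sketch; the parameter names are hypothetical and depend on the agent's execute function signature:
from abacusai import ApiClient

client = ApiClient('API_KEY')  # placeholder API key
agent_result = client.execute_agent(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    keyword_arguments={'ticket_id': 'T-1001', 'priority': 'high'},  # hypothetical agent parameters
)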
- execute_conversation_agent(deployment_token, deployment_id, arguments=None, keyword_arguments=None, deployment_conversation_id=None, external_session_id=None, regenerate=False)
Executes a deployed AI agent function using the arguments as keyword arguments to the agent execute function.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
arguments (list) – Positional arguments to the agent execute function.
keyword_arguments (dict) – A dictionary where each ‘key’ represents the parameter name and its corresponding ‘value’ represents the value of that parameter for the agent execute function.
deployment_conversation_id (str) – A unique string identifier for the deployment conversation used for the conversation.
external_session_id (str) – A unique string identifier for the session used for the conversation. If both deployment_conversation_id and external_session_id are not provided, a new session will be created.
regenerate (bool) – If True, will regenerate the response from the last query.
- Return type:
Dict
- execute_agent_with_binary_data(deployment_token, deployment_id, blob, arguments=None, keyword_arguments=None, deployment_conversation_id=None, external_session_id=None)
Executes a deployed AI agent function with binary data as inputs.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
blob (io.TextIOBase) – The multipart/form-data of the binary data.
arguments (list) – Positional arguments to the agent execute function.
keyword_arguments (dict) – A dictionary where each ‘key’ represents the parameter name and its corresponding ‘value’ represents the value of that parameter for the agent execute function.
deployment_conversation_id (str) – A unique string identifier for the deployment conversation used for the conversation.
external_session_id (str) – A unique string identifier for the session used for the conversation. If both deployment_conversation_id and external_session_id are not provided, a new session will be created.
- Return type:
Dict
- lookup_matches(deployment_token, deployment_id, data=None, filters=None, num=None, result_columns=None, max_words=None, num_retrieval_margin_words=None, max_words_per_chunk=None)
Looks up document retrievers and returns the matching documents from the deployed document retriever for the given query.
Original documents are split into chunks and stored in the document retriever. This lookup function returns the relevant chunks from the document retriever. Where permitted by the provided settings, the returned chunks may be expanded to include more words from the original documents and merged if they overlap. The returned chunks are sorted by relevance.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments within this project, making it safe to embed this model in an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
data (str) – The query to search for.
filters (dict) – A dictionary mapping column names to a list of values to restrict the retrieved search results.
num (int) – If provided, will limit the number of results to the value specified.
result_columns (list) – If provided, will limit the column properties present in each result to those specified in this list.
max_words (int) – If provided, will limit the total number of words in the results to the value specified.
num_retrieval_margin_words (int) – If provided, will add this number of words from left and right of the returned chunks.
max_words_per_chunk (int) – If provided, will limit the number of words in each chunk to the value specified. If the value provided is smaller than the actual chunk size on disk, which is determined during document retriever creation, the actual chunk size will be used; i.e., chunks looked up from document retrievers will not be split into smaller chunks during lookup due to this setting.
- Returns:
The relevant documentation results found from the document retriever.
- Return type:
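A minimal lookup_matches call might look like the following sketch; the filter column, query text, and limits are placeholders:
from abacusai import ApiClient

client = ApiClient('API_KEY')  # placeholder API key
matches = client.lookup_matches(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    data='termination clauses in the vendor agreement',
    filters={'document_type': ['contract']},  # restrict results by column values
    num=5,
    max_words=1000,
    num_retrieval_margin_words=50,
)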
- create_batch_prediction(deployment_id, table_name=None, name=None, global_prediction_args=None, explanations=False, output_format=None, output_location=None, database_connector_id=None, database_output_config=None, refresh_schedule=None, csv_input_prefix=None, csv_prediction_prefix=None, csv_explanations_prefix=None, output_includes_metadata=None, result_input_columns=None, input_feature_groups=None)
Creates a batch prediction job description for the given deployment.
- Parameters:
deployment_id (str) – Unique string identifier for the deployment.
table_name (str) – Name of the feature group table to write the results of the batch prediction. Can only be specified if outputLocation and databaseConnectorId are not specified. If tableName is specified, the outputType will be enforced as CSV.
name (str) – Name of the batch prediction job.
global_prediction_args (BatchPredictionArgs) – Batch Prediction args specific to problem type.
explanations (bool) – If true, SHAP explanations will be provided for each prediction, if supported by the use case.
output_format (str) – Format of the batch prediction output (CSV or JSON).
output_location (str) – Location to write the prediction results. Otherwise, results will be stored in Abacus.AI.
database_connector_id (str) – Unique identifier of a Database Connection to write predictions to. Cannot be specified in conjunction with outputLocation.
database_output_config (dict) – Key-value pair of columns/values to write to the database connector. Only available if databaseConnectorId is specified.
refresh_schedule (str) – Cron-style string that describes a schedule in UTC to automatically run the batch prediction.
csv_input_prefix (str) – Prefix to prepend to the input columns, only applies when output format is CSV.
csv_prediction_prefix (str) – Prefix to prepend to the prediction columns, only applies when output format is CSV.
csv_explanations_prefix (str) – Prefix to prepend to the explanation columns, only applies when output format is CSV.
output_includes_metadata (bool) – If true, output will contain columns including prediction start time, batch prediction version, and model version.
result_input_columns (list) – If present, will limit result files or feature groups to only include columns present in this list.
input_feature_groups (dict) – A dict of {‘<feature_group_type>’: ‘<table_name>’} which overrides the default input data of that type for the Batch Prediction. Default input data is the training data that was used for training the deployed model.
- Returns:
The batch prediction description.
- Return type:
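As an illustration, a batch prediction writing CSV results to a file connector location on a daily schedule might be set up as below; the bucket path and schedule are placeholders, and starting a new version assumes the returned description exposes batch_prediction_id:
from abacusai import ApiClient

client = ApiClient('API_KEY')  # placeholder API key
batch_prediction = client.create_batch_prediction(
    deployment_id='DEPLOYMENT_ID',
    name='nightly_predictions',
    output_format='CSV',
    output_location='s3://my-bucket/predictions/',  # hypothetical file connector location
    refresh_schedule='0 8 * * *',                   # daily at 08:00 UTC
    csv_prediction_prefix='pred_',
)
# start a new version immediately (assumes the description object exposes batch_prediction_id)
version = client.start_batch_prediction(batch_prediction.batch_prediction_id)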
- start_batch_prediction(batch_prediction_id)
Creates a new batch prediction version job for a given batch prediction job description.
- Parameters:
batch_prediction_id (str) – The unique identifier of the batch prediction to create a new version of.
- Returns:
The batch prediction version started by this method call.
- Return type:
- update_batch_prediction(batch_prediction_id, deployment_id=None, global_prediction_args=None, explanations=None, output_format=None, csv_input_prefix=None, csv_prediction_prefix=None, csv_explanations_prefix=None, output_includes_metadata=None, result_input_columns=None, name=None)
Update a batch prediction job description.
- Parameters:
batch_prediction_id (str) – Unique identifier of the batch prediction.
deployment_id (str) – Unique identifier of the deployment.
global_prediction_args (BatchPredictionArgs) – Batch Prediction args specific to problem type.
explanations (bool) – If True, SHAP explanations for each prediction will be provided, if supported by the use case.
output_format (str) – If specified, sets the format of the batch prediction output (CSV or JSON).
csv_input_prefix (str) – Prefix to prepend to the input columns, only applies when output format is CSV.
csv_prediction_prefix (str) – Prefix to prepend to the prediction columns, only applies when output format is CSV.
csv_explanations_prefix (str) – Prefix to prepend to the explanation columns, only applies when output format is CSV.
output_includes_metadata (bool) – If True, output will contain columns including prediction start time, batch prediction version, and model version.
result_input_columns (list) – If present, will limit result files or feature groups to only include columns present in this list.
name (str) – If present, will rename the batch prediction.
- Returns:
The batch prediction.
- Return type:
- set_batch_prediction_file_connector_output(batch_prediction_id, output_format=None, output_location=None)
Updates the file connector output configuration of the batch prediction
- Parameters:
batch_prediction_id (str) – The unique identifier of the batch prediction.
output_format (str) – The format of the batch prediction output (CSV or JSON). If not specified, the default format will be used.
output_location (str) – The location to write the prediction results. If not specified, results will be stored in Abacus.AI.
- Returns:
The batch prediction description.
- Return type:
- set_batch_prediction_database_connector_output(batch_prediction_id, database_connector_id=None, database_output_config=None)
Updates the database connector output configuration of the batch prediction
- Parameters:
- Returns:
Description of the batch prediction.
- Return type:
- set_batch_prediction_feature_group_output(batch_prediction_id, table_name)
Creates a feature group and sets it as the batch prediction output.
- Parameters:
- Returns:
Batch prediction after the output has been applied.
- Return type:
- set_batch_prediction_output_to_console(batch_prediction_id)
Sets the batch prediction output to the console, clearing both the file connector and database connector configurations.
- Parameters:
batch_prediction_id (str) – The unique identifier of the batch prediction.
- Returns:
The batch prediction description.
- Return type:
- set_batch_prediction_feature_group(batch_prediction_id, feature_group_type, feature_group_id=None)
Sets the batch prediction input feature group.
- Parameters:
batch_prediction_id (str) – Unique identifier of the batch prediction.
feature_group_type (str) – Enum string representing the feature group type to set. The type is based on the use case under which the feature group is being created (e.g. Catalog Attributes for personalized recommendation use case).
feature_group_id (str) – Unique identifier of the feature group to set as input to the batch prediction.
- Returns:
Description of the batch prediction.
- Return type:
- set_batch_prediction_dataset_remap(batch_prediction_id, dataset_id_remap)
For the purposes of this batch prediction, swaps out datasets in the training feature groups.
- Parameters:
- Returns:
Batch prediction object.
- Return type:
- delete_batch_prediction(batch_prediction_id)
Deletes a batch prediction and associated data, such as associated monitors.
- Parameters:
batch_prediction_id (str) – Unique string identifier of the batch prediction.
- add_user_item_interaction(streaming_token, dataset_id, timestamp, user_id, item_id, event_type, additional_attributes)
Adds a user-item interaction record (data row) to a streaming dataset.
- Parameters:
streaming_token (str) – The streaming token for authenticating requests to the dataset.
dataset_id (str) – The unique string identifier for the streaming dataset to record data to.
timestamp (int) – The Unix timestamp of the event.
user_id (str) – The unique identifier for the user.
item_id (list) – The unique identifier for the items.
event_type (str) – The event type.
additional_attributes (dict) – Attributes of the user interaction.
- upsert_user_attributes(streaming_token, dataset_id, user_id, user_attributes)
Adds a user attribute record (data row) to a streaming dataset.
Either the streaming dataset ID or the project ID is required.
- Parameters:
- upsert_item_attributes(streaming_token, dataset_id, item_id, item_attributes)
Adds an item attributes record (data row) to a streaming dataset.
Either the streaming dataset ID or the project ID is required.
- Parameters:
- add_multiple_user_item_interactions(streaming_token, dataset_id, interactions)
Adds multiple user-item interaction records (data rows) to a streaming dataset.
- Parameters:
streaming_token (str) – The streaming token for authenticating requests to the dataset.
dataset_id (str) – The unique string identifier of the streaming dataset to record data to.
interactions (list) – List of interactions, each interaction of format {‘userId’: userId, ‘timestamp’: timestamp, ‘itemId’: itemId, ‘eventType’: eventType, ‘additionalAttributes’: {‘attribute1’: ‘abc’, ‘attribute2’: 123}}.
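A sketch of the interactions format described above; the streaming token, dataset ID, and attribute values are placeholders:
from abacusai import ApiClient

client = ApiClient('API_KEY')  # placeholder API key
client.add_multiple_user_item_interactions(
    streaming_token='STREAMING_TOKEN',
    dataset_id='DATASET_ID',
    interactions=[
        {'userId': 'u_1', 'timestamp': 1700000000, 'itemId': 'i_9',
         'eventType': 'click', 'additionalAttributes': {'device': 'mobile'}},
        {'userId': 'u_2', 'timestamp': 1700000060, 'itemId': 'i_4',
         'eventType': 'purchase', 'additionalAttributes': {'price': 19.99}},
    ],
)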
- upsert_multiple_user_attributes(streaming_token, dataset_id, upserts)
Adds multiple user attributes records (data rows) to a streaming dataset.
- Parameters:
streaming_token (str) – The streaming token for authenticating requests to the dataset.
dataset_id (str) – A unique string identifier for the streaming dataset to record data to.
upserts (list) – List of upserts, each upsert of format {‘userId’: userId, ‘userAttributes’: {‘attribute1’: ‘abc’, ‘attribute2’: 123}}.
- upsert_multiple_item_attributes(streaming_token, dataset_id, upserts)
Adds multiple item attributes records (data rows) to a streaming dataset.
- Parameters:
streaming_token (str) – The streaming token for authenticating requests to the dataset.
dataset_id (str) – A unique string identifier for the streaming dataset to record data to.
upserts (list) – A list of upserts, each upsert of format {‘itemId’: itemId, ‘itemAttributes’: {‘attribute1’: ‘abc’, ‘attribute2’: 123}}.
- upsert_item_embeddings(streaming_token, model_id, item_id, vector, catalog_id=None)
Upserts an embedding vector for an item id for a model_id.
- Parameters:
streaming_token (str) – The streaming token for authenticating requests to the model.
model_id (str) – A unique string identifier for the model to upsert item embeddings to.
item_id (str) – The item id for which its embeddings will be upserted.
vector (list) – The embedding vector.
catalog_id (str) – The name of the catalog in the model to update.
- delete_item_embeddings(streaming_token, model_id, item_ids, catalog_id=None)
Deletes KNN embeddings for a list of item IDs for a given model ID.
- Parameters:
streaming_token (str) – The streaming token for authenticating requests to the model.
model_id (str) – A unique string identifier for the model from which to delete item embeddings.
item_ids (list) – A list of item IDs whose embeddings will be deleted.
catalog_id (str) – An optional name to specify which catalog in a model to update.
- upsert_multiple_item_embeddings(streaming_token, model_id, upserts, catalog_id=None)
Upserts KNN embeddings for multiple item IDs for a given model ID.
- Parameters:
streaming_token (str) – The streaming token for authenticating requests to the model.
model_id (str) – The unique string identifier of the model to upsert item embeddings to.
upserts (list) – A list of dictionaries of the form {‘itemId’: …, ‘vector’: […]} for each upsert.
catalog_id (str) – Name of the catalog in the model to update.
- append_data(feature_group_id, streaming_token, data)
Appends new data into the feature group for a given lookup key recordId.
- upsert_multiple_data(feature_group_id, streaming_token, data)
Updates data in the feature group for a given lookup key recordId if the recordId is found; otherwise, inserts new data into the feature group.
- append_multiple_data(feature_group_id, streaming_token, data)
Appends new data into the feature group for a given lookup key recordId.
- upsert_data(feature_group_id, streaming_token=None, data=None)
Updates data in the feature group for a given lookup key record ID if the record ID is found; otherwise, inserts new data into the feature group.
- Parameters:
- Return type:
- upsert_online_data(feature_group_id, data)
Updates data in the feature group for a given lookup key record ID if the record ID is found; otherwise, inserts new data into the feature group.
- Parameters:
- Return type:
- delete_data(feature_group_id, primary_key)
Deletes a row from the feature group given the primary key
- add_feature_group_document(feature_group_id, document)
Adds a document to the feature group.
- Parameters:
feature_group_id (str) – The unique ID associated with the feature group.
document (io.TextIOBase) – The multipart/form-data of the document to add to the feature group.
- Return type:
- describe_feature_group_row_process_by_key(deployment_id, primary_key_value)
Gets the feature group row process.
- Parameters:
- Returns:
An object representing the feature group row process
- Return type:
- list_feature_group_row_processes(deployment_id, limit=None, status=None)
Gets a list of feature group row processes.
- Parameters:
- Returns:
A list of object representing the feature group row process
- Return type:
- get_feature_group_row_process_summary(deployment_id)
Gets a summary of the statuses of the individual feature group processes.
- Parameters:
deployment_id (str) – The deployment id for the process
- Returns:
An object representing the summary of the statuses of the individual feature group processes
- Return type:
- reset_feature_group_row_process_by_key(deployment_id, primary_key_value)
Resets a feature group row process so that it can be reprocessed
- Parameters:
- Returns:
An object representing the feature group row process.
- Return type:
- get_feature_group_row_process_logs_by_key(deployment_id, primary_key_value)
Gets the logs for a feature group row process
- Parameters:
- Returns:
An object representing the logs for the feature group row process
- Return type:
- create_python_function(name, source_code=None, function_name=None, function_variable_mappings=None, package_requirements=None, function_type='FEATURE_GROUP')
Creates a custom Python function that is reusable.
- Parameters:
name (str) – The name to identify the Python function.
source_code (str) – Contents of a valid Python source code file. The source code should contain the transform feature group functions. A list of allowed imports and system libraries for each language is specified in the user functions documentation section.
function_name (str) – The name of the Python function.
function_variable_mappings (list) – List of Python function arguments.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].
function_type (str) – Type of Python function to create.
- Returns:
The Python function that can be used (e.g. for feature group transform).
- Return type:
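For example, a reusable feature group transform could be registered as in this sketch; the function body and requirement pins are illustrative only:
from abacusai import ApiClient

client = ApiClient('API_KEY')  # placeholder API key
transform_source = '''
def drop_null_rows(input_fg):
    # illustrative transform: DataFrame in, DataFrame out
    return input_fg.dropna()
'''
python_function = client.create_python_function(
    name='drop_null_rows',
    source_code=transform_source,
    function_name='drop_null_rows',
    package_requirements=['pandas>=1.4.0'],
    function_type='FEATURE_GROUP',
)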
- update_python_function(name, source_code=None, function_name=None, function_variable_mappings=None, package_requirements=None)
Updates a custom Python function with user inputs for the given Python function.
- Parameters:
name (str) – The name to identify the Python function.
source_code (str) – Contents of a valid Python source code file. The source code should contain the transform feature group functions. A list of allowed imports and system libraries for each language is specified in the user functions documentation section.
function_name (str) – The name of the Python function within source_code.
function_variable_mappings (list) – List of arguments required by function_name.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].
- Returns:
The Python function object.
- Return type:
- delete_python_function(name)
Removes an existing Python function.
- Parameters:
name (str) – The name to identify the Python function.
- create_pipeline(pipeline_name, project_id=None, cron=None, is_prod=None)
Creates a pipeline for executing multiple steps.
- Parameters:
pipeline_name (str) – The name of the pipeline, which should be unique to the organization.
project_id (str) – A unique string identifier for the project.
cron (str) – A cron-like string specifying the frequency of pipeline reruns.
is_prod (bool) – Whether the pipeline is a production pipeline or not.
- Returns:
An object that describes a Pipeline.
- Return type:
- describe_pipeline(pipeline_id)
Describes a given pipeline.
- describe_pipeline_by_name(pipeline_name)
Describes a given pipeline.
- update_pipeline(pipeline_id, project_id=None, pipeline_variable_mappings=None, cron=None, is_prod=None)
Updates a pipeline for executing multiple steps.
- Parameters:
pipeline_id (str) – The ID of the pipeline to update.
project_id (str) – A unique string identifier for the project.
pipeline_variable_mappings (list) – List of Python function arguments for the pipeline.
cron (str) – A cron-like string specifying the frequency of the scheduled pipeline runs.
is_prod (bool) – Whether the pipeline is a production pipeline or not.
- Returns:
An object that describes a Pipeline.
- Return type:
- rename_pipeline(pipeline_id, pipeline_name)
Renames a pipeline.
- delete_pipeline(pipeline_id)
Deletes a pipeline.
- Parameters:
pipeline_id (str) – The ID of the pipeline to delete.
- list_pipeline_versions(pipeline_id, limit=200)
Lists the pipeline versions for a specified pipeline
- Parameters:
- Returns:
A list of pipeline versions.
- Return type:
- run_pipeline(pipeline_id, pipeline_variable_mappings=None)
Runs a specified pipeline with the arguments provided.
- Parameters:
- Returns:
The object describing the pipeline
- Return type:
- reset_pipeline_version(pipeline_version, steps=None, include_downstream_steps=True)
Reruns a pipeline version for the given steps and downstream steps if specified.
- Parameters:
- Returns:
Object describing the pipeline version
- Return type:
- create_pipeline_step(pipeline_id, step_name, function_name=None, source_code=None, step_input_mappings=None, output_variable_mappings=None, step_dependencies=None, package_requirements=None, cpu_size=None, memory=None)
Creates a step in a given pipeline.
- Parameters:
pipeline_id (str) – The ID of the pipeline to run.
step_name (str) – The name of the step.
function_name (str) – The name of the Python function.
source_code (str) – Contents of a valid Python source code file. The source code should contain the transform feature group functions. A list of allowed imports and system libraries for each language is specified in the user functions documentation section.
step_input_mappings (list) – List of Python function arguments.
output_variable_mappings (list) – List of Python function outputs.
step_dependencies (list) – List of step names this step depends on.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].
cpu_size (str) – Size of the CPU for the step function.
memory (int) – Memory (in GB) for the step function.
- Returns:
Object describing the pipeline.
- Return type:
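A sketch of adding a step to an existing pipeline; the step body, dependency name, CPU size label, and memory value are placeholders:
from abacusai import ApiClient

client = ApiClient('API_KEY')  # placeholder API key
step_source = '''
def build_features(raw_table):
    # illustrative step body
    return raw_table
'''
pipeline = client.create_pipeline_step(
    pipeline_id='PIPELINE_ID',
    step_name='build_features',
    function_name='build_features',
    source_code=step_source,
    step_dependencies=['ingest_raw_data'],  # hypothetical upstream step name
    cpu_size='MEDIUM',                      # assumed size label
    memory=16,
)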
- delete_pipeline_step(pipeline_step_id)
Deletes a step from a pipeline.
- Parameters:
pipeline_step_id (str) – The ID of the pipeline step.
- update_pipeline_step(pipeline_step_id, function_name=None, source_code=None, step_input_mappings=None, output_variable_mappings=None, step_dependencies=None, package_requirements=None, cpu_size=None, memory=None)
Updates a step in a given pipeline.
- Parameters:
pipeline_step_id (str) – The ID of the pipeline_step to update.
function_name (str) – The name of the Python function.
source_code (str) – Contents of a valid Python source code file. The source code should contain the transform feature group functions. A list of allowed imports and system libraries for each language is specified in the user functions documentation section.
step_input_mappings (list) – List of Python function arguments.
output_variable_mappings (list) – List of Python function outputs.
step_dependencies (list) – List of step names this step depends on.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].
cpu_size (str) – Size of the CPU for the step function.
memory (int) – Memory (in GB) for the step function.
- Returns:
Object describing the pipeline.
- Return type:
- rename_pipeline_step(pipeline_step_id, step_name)
Renames a step in a given pipeline.
- Parameters:
- Returns:
Object describing the pipeline.
- Return type:
- unset_pipeline_refresh_schedule(pipeline_id)
Deletes the refresh schedule for a given pipeline.
- pause_pipeline_refresh_schedule(pipeline_id)
Pauses the refresh schedule for a given pipeline.
- resume_pipeline_refresh_schedule(pipeline_id)
Resumes the refresh schedule for a given pipeline.
- create_graph_dashboard(project_id, name, python_function_ids=None)
Create a plot dashboard given selected python plots
- Parameters:
- Returns:
An object describing the graph dashboard.
- Return type:
- delete_graph_dashboard(graph_dashboard_id)
Deletes a graph dashboard
- Parameters:
graph_dashboard_id (str) – Unique string identifier for the graph dashboard to be deleted.
- update_graph_dashboard(graph_dashboard_id, name=None, python_function_ids=None)
Updates a graph dashboard
- Parameters:
- Returns:
An object describing the graph dashboard.
- Return type:
- add_graph_to_dashboard(python_function_id, graph_dashboard_id, function_variable_mappings=None, name=None)
Add a python plot function to a dashboard
- Parameters:
python_function_id (str) – Unique string identifier for the Python function.
graph_dashboard_id (str) – Unique string identifier for the graph dashboard to update.
function_variable_mappings (dict) – List of arguments to be supplied to the function as parameters, in the format [{‘name’: ‘function_argument’, ‘variable_type’: ‘FEATURE_GROUP’, ‘value’: ‘name_of_feature_group’}].
name (str) – Name of the added python plot
- Returns:
An object describing the graph dashboard.
- Return type:
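For example, a registered Python plot could be attached to a dashboard with a variable mapping in the format shown above; the identifiers and feature group name are placeholders:
from abacusai import ApiClient

client = ApiClient('API_KEY')  # placeholder API key
dashboard = client.add_graph_to_dashboard(
    python_function_id='PYTHON_FUNCTION_ID',
    graph_dashboard_id='GRAPH_DASHBOARD_ID',
    function_variable_mappings=[{
        'name': 'function_argument',
        'variable_type': 'FEATURE_GROUP',
        'value': 'name_of_feature_group',
    }],
    name='Sales by region',
)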
- update_graph_to_dashboard(graph_reference_id, function_variable_mappings=None, name=None)
Updates a Python plot function on a dashboard.
- Parameters:
graph_reference_id (str) – A unique string identifier for the graph reference.
function_variable_mappings (list) – A list of arguments to be supplied to the Python function as parameters in the format [{‘name’: ‘function_argument’, ‘variable_type’: ‘FEATURE_GROUP’, ‘value’: ‘name_of_feature_group’}].
name (str) – The updated name for the graph
- Returns:
An object describing the graph dashboard.
- Return type:
- create_algorithm(name, problem_type, source_code=None, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, config_options=None, is_default_enabled=False, project_id=None, use_gpu=False, package_requirements=None)
Creates a custom algorithm that is re-usable for model training.
- Parameters:
name (str) – The name to identify the algorithm; only uppercase letters, numbers, and underscores are allowed.
problem_type (str) – The type of problem this algorithm will work on.
source_code (str) – Contents of a valid Python source code file. The source code should contain the train/predict/predict_many/initialize functions. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
training_data_parameter_names_mapping (dict) – The mapping from feature group types to training data parameter names in the train function.
training_config_parameter_name (str) – The train config parameter name in the train function.
train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.
initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before using it to make predictions using the model.
config_options (dict) – Map dataset types and configs to train function parameter names.
is_default_enabled (bool) – Whether to train with the algorithm by default.
project_id (str) – The unique ID of the project.
use_gpu (bool) – Whether this algorithm needs to run on GPU.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].
- Returns:
The new custom model that can be used for training.
- Return type:
- delete_algorithm(algorithm)
Deletes the specified custom algorithm.
- Parameters:
algorithm (str) – The name of the algorithm to delete.
- update_algorithm(algorithm, source_code=None, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, config_options=None, is_default_enabled=None, use_gpu=None, package_requirements=None)
Update a custom algorithm for the given algorithm name. If source code is provided, all function names for the source code must also be provided.
- Parameters:
algorithm (str) – The name to identify the algorithm. Only uppercase letters, numbers, and underscores are allowed.
source_code (str) – Contents of a valid Python source code file. The source code should contain the train/predict/predict_many/initialize functions. A list of allowed imports and system libraries for each language is specified in the user functions documentation section.
training_data_parameter_names_mapping (dict) – The mapping from feature group types to training data parameter names in the train function.
training_config_parameter_name (str) – The train config parameter name in the train function.
train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.
initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before using it to make predictions using the model.
config_options (dict) – Map dataset types and configs to train function parameter names.
is_default_enabled (bool) – Whether to train with the algorithm by default.
use_gpu (bool) – Whether this algorithm needs to run on GPU.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].
- Returns:
The new custom model that can be used for training.
- Return type:
- list_builtin_algorithms(project_id, feature_group_ids, training_config=None)
Return list of built-in algorithms based on given input.
- Parameters:
- Returns:
List of applicable builtin algorithms.
- Return type:
- create_custom_loss_function_with_source_code(name, loss_function_type, loss_function_name, loss_function_source_code)
Registers a new custom loss function which can be used as an objective function during model training.
- Parameters:
name (str) – A name for the loss, unique per organization. Must be 50 characters or fewer, and can contain only underscores, numbers, and uppercase letters.
loss_function_type (str) – The category of problems that this loss would be applicable to, e.g. REGRESSION_DL_TF, CLASSIFICATION_DL_TF, etc.
loss_function_name (str) – The name of the function whose full source code is passed in loss_function_source_code.
loss_function_source_code (str) – Python source code string of the function.
- Returns:
A description of the registered custom loss function.
- Return type:
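A hedged sketch of registering a custom loss function. The loss body below is a stand-in that assumes a TensorFlow-style signature for the REGRESSION_DL_TF category; the name and API key are placeholders.

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder API key

loss_source = '''
import tensorflow as tf

def weighted_mae(y_true, y_pred):
    # Stand-in loss body: plain mean absolute error.
    return tf.reduce_mean(tf.abs(y_true - y_pred))
'''

loss_function = client.create_custom_loss_function_with_source_code(
    name='WEIGHTED_MAE',
    loss_function_type='REGRESSION_DL_TF',
    loss_function_name='weighted_mae',
    loss_function_source_code=loss_source,
)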
- update_custom_loss_function_with_source_code(name, loss_function_name, loss_function_source_code)
Updates a previously registered custom loss function with a new function implementation.
- Parameters:
- Returns:
A description of the updated custom loss function.
- Return type:
- delete_custom_loss_function(name)
Deletes a previously registered custom loss function.
- Parameters:
name (str) – The name of the custom loss function to be deleted.
- create_custom_metric(name, problem_type, custom_metric_function_name=None, source_code=None)
Registers a new custom metric which can be used as an evaluation metric for the trained model.
- Parameters:
name (str) – A unique name for the metric, with a limit of 50 characters. Only underscores, numbers, and uppercase letters are allowed.
problem_type (str) – The problem type that this metric would be applicable to, e.g. REGRESSION, FORECASTING, etc.
custom_metric_function_name (str) – The name of the function whose full source code is passed in source_code.
source_code (str) – The full source code of the custom metric function. This is required if custom_metric_function_name is passed.
- Returns:
The newly created custom metric.
- Return type:
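A minimal sketch of registering a custom metric. The metric body and its expected inputs are assumptions, shown only to illustrate how the source code string and function name fit together.

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder API key

metric_source = '''
def mean_absolute_percentage_error(y_true, y_pred):
    # Stand-in metric body; the real inputs depend on the problem type.
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
'''

metric = client.create_custom_metric(
    name='MAPE_METRIC',
    problem_type='REGRESSION',
    custom_metric_function_name='mean_absolute_percentage_error',
    source_code=metric_source,
)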
- update_custom_metric(name, custom_metric_function_name, source_code)
Updates a previously registered custom metric with a new function implementation.
- Parameters:
- Returns:
A description of the updated custom metric.
- Return type:
- delete_custom_metric(name)
Deletes a previously registered custom metric.
- Parameters:
name (str) – The name of the custom metric to be deleted.
- create_module(name, source_code=None)
Creates a module that is reusable in the customer’s code, e.g. a Python function, a bring-your-own algorithm, etc.
- delete_module(name)
Deletes the specified custom module.
- Parameters:
name (str) – The name of the custom module to delete.
- update_module(name, source_code=None)
Update the module.
- create_organization_secret(secret_key, value)
Creates a secret which can be accessed in functions and notebooks.
- Parameters:
- Returns:
The created secret.
- Return type:
- delete_organization_secret(secret_key)
Deletes a secret.
- Parameters:
secret_key (str) – The secret key.
- update_organization_secret(secret_key, value)
Updates a secret.
- Parameters:
- Returns:
The updated secret.
- Return type:
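A small sketch of the secret lifecycle using the create/update/get/delete calls documented in this reference; the secret key and values are placeholders.

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder API key

# Create the secret, rotate its value, read it back, and remove it.
client.create_organization_secret(secret_key='EXTERNAL_API_TOKEN', value='initial-value')
client.update_organization_secret(secret_key='EXTERNAL_API_TOKEN', value='rotated-value')
secret = client.get_organization_secret(secret_key='EXTERNAL_API_TOKEN')
client.delete_organization_secret(secret_key='EXTERNAL_API_TOKEN')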
- set_natural_language_explanation(short_explanation, long_explanation, feature_group_id=None, feature_group_version=None, model_id=None)
Saves the natural language explanation of an artifact with the given ID. The artifact can be a Feature Group or a Feature Group Version.
- Parameters:
short_explanation (str) – succinct explanation of the artifact with given ID
long_explanation (str) – verbose explanation of the artifact with given ID
feature_group_id (str) – A unique string identifier associated with the Feature Group.
feature_group_version (str) – A unique string identifier associated with the Feature Group Version.
model_id (str) – A unique string identifier associated with the Model.
- create_chat_session(project_id=None, name=None)
Creates a chat session with Abacus AI Chat.
- Parameters:
- Returns:
The chat session with Abacus AI Chat
- Return type:
- delete_chat_message(chat_session_id, message_index)
Deletes a message in a chat session and its associated response.
- export_chat_session(chat_session_id)
Exports a chat session to an HTML file
- Parameters:
chat_session_id (str) – Unique ID of the chat session.
- rename_chat_session(chat_session_id, name)
Renames a chat session with Abacus AI Chat.
- suggest_abacus_apis(query, verbosity=1, limit=5)
Suggests several Abacus APIs that are most relevant to the supplied natural language query.
- Parameters:
- Returns:
A list of suggested Abacus APIs
- Return type:
- create_deployment_conversation(deployment_id, name, deployment_token=None)
Creates a deployment conversation.
- Parameters:
- Returns:
The deployment conversation.
- Return type:
- delete_deployment_conversation(deployment_conversation_id, deployment_id=None, deployment_token=None)
Delete a Deployment Conversation.
- Parameters:
deployment_conversation_id (str) – A unique string identifier associated with the deployment conversation.
deployment_id (str) – The deployment this conversation belongs to. This is required if not logged in.
deployment_token (str) – The deployment token to authenticate access to the deployment. This is required if not logged in.
- clear_deployment_conversation(deployment_conversation_id=None, external_session_id=None, deployment_id=None, deployment_token=None, user_message_indices=None)
Clear the message history of a Deployment Conversation.
- Parameters:
deployment_conversation_id (str) – A unique string identifier associated with the deployment conversation.
external_session_id (str) – The external session id associated with the deployment conversation.
deployment_id (str) – The deployment this conversation belongs to. This is required if not logged in.
deployment_token (str) – The deployment token to authenticate access to the deployment. This is required if not logged in.
user_message_indices (list) – Optional list of user message indices to clear. The associated bot response will also be cleared. If not provided, all messages will be cleared.
- set_deployment_conversation_feedback(deployment_conversation_id, message_index, is_useful=None, is_not_useful=None, feedback=None, deployment_id=None, deployment_token=None)
Sets a deployment conversation message as useful or not useful
- Parameters:
deployment_conversation_id (str) – A unique string identifier associated with the deployment conversation.
message_index (int) – The index of the deployment conversation message
is_useful (bool) – If true, the message is marked as useful. If false, the useful flag is cleared.
is_not_useful (bool) – If true, the message is marked as not useful. If false, the not-useful flag is cleared.
feedback (str) – Optional feedback on why the message is useful or not useful
deployment_id (str) – The deployment this conversation belongs to. This is required if not logged in.
deployment_token (str) – The deployment token to authenticate access to the deployment. This is required if not logged in.
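For example, a hedged sketch of marking a conversation message as useful; the identifiers are placeholders, and deployment_id/deployment_token are omitted because they are only required when not logged in.

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder API key

client.set_deployment_conversation_feedback(
    deployment_conversation_id='CONVERSATION_ID',
    message_index=2,  # index of the message to rate; zero-based indexing is an assumption here
    is_useful=True,
    feedback='Accurate and concise answer.',
)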
- rename_deployment_conversation(deployment_conversation_id, name, deployment_id=None, deployment_token=None)
Rename a Deployment Conversation.
- Parameters:
deployment_conversation_id (str) – A unique string identifier associated with the deployment conversation.
name (str) – The new name of the conversation.
deployment_id (str) – The deployment this conversation belongs to. This is required if not logged in.
deployment_token (str) – The deployment token to authenticate access to the deployment. This is required if not logged in.
- create_app_user_group(name)
Creates a new App User Group. This user group is used to grant permissions to access the external chatbots.
- Parameters:
name (str) – The name of the App User Group.
- Returns:
The App User Group.
- Return type:
- delete_app_user_group(user_group_id)
Deletes an App User Group.
- Parameters:
user_group_id (str) – The ID of the App User Group.
- invite_user_to_app_user_group(email, user_group_id)
Invite a user to an App User Group. This method will send the specified email address an invitation link to join a specific user group.
This will allow them to use any chatbots that this user group has access to.
- add_users_to_app_user_group(user_group_id, user_emails)
Adds users to an App User Group.
- remove_users_from_app_user_group(user_group_id, user_emails)
Removes users from an App User Group.
- add_app_user_group_to_external_application(user_group_id, external_application_id)
Adds a permission for an App User Group to access an External Application.
- remove_app_user_group_from_external_application(user_group_id, external_application_id)
Removes a permission for an App User Group to access an External Application.
- create_external_application(deployment_id, name=None, logo=None, theme=None)
Creates a new External Application from an existing ChatLLM Deployment.
- Parameters:
- Returns:
The newly created External Application.
- Return type:
- update_external_application(external_application_id, name=None, theme=None)
Updates an External Application.
- Parameters:
- Returns:
The updated External Application.
- Return type:
- list_external_applications()
Lists External Applications in an organization.
- Returns:
List of External Applications.
- Return type:
- delete_external_application(external_application_id)
Deletes an External Application.
- Parameters:
external_application_id (str) – The ID of the External Application.
- create_agent(project_id, function_source_code, agent_function_name, name=None, memory=None, package_requirements=None, description=None, enable_binary_input=False)
Creates a new AI agent.
- Parameters:
project_id (str) – The unique ID associated with the project.
function_source_code (str) – The contents of a valid Python source code file. The source code should contain a function named agentFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
agent_function_name (str) – The name of the function found in the source code that will be executed when the agent is deployed.
name (str) – The name you want your agent to have, defaults to “<Project Name> Agent”.
memory (int) – The memory allocation (in GB) for the agent.
package_requirements (list) – A list of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].
description (str) – A description of the agent, including its purpose and instructions.
enable_binary_input (bool) – If True, the agent will be able to accept binary data as inputs.
- Returns:
The new agent
- Return type:
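A minimal sketch of creating an agent. The agent body is a placeholder, and the memory and package values are illustrative only.

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder API key

agent_source = '''
def run_agent(query):
    # Stand-in agent body; a real agent would call models, retrievers, or other tools here.
    return {'answer': 'You asked: ' + str(query)}
'''

agent = client.create_agent(
    project_id='YOUR_PROJECT_ID',
    function_source_code=agent_source,
    agent_function_name='run_agent',
    name='Support Agent',
    memory=16,
    package_requirements=['pandas>=1.4.0'],
    description='Answers product support questions.',
)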
- update_agent(model_id, function_source_code=None, agent_function_name=None, memory=None, package_requirements=None, description=None, enable_binary_input=False)
Updates an existing AI Agent using user-provided Python code. A new version of the agent will be created and published.
- Parameters:
model_id (str) – The unique ID associated with the AI Agent to be changed.
function_source_code (str) – Contents of a valid Python source code file. The source code should contain a function named agentFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
agent_function_name (str) – Name of the function found in the source code that will be executed when the agent is deployed.
memory (int) – Memory (in GB) for the agent.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
description (str) – A description of the agent, including its purpose and instructions.
enable_binary_input (bool) – If True, the agent will be able to accept binary data as inputs.
- Returns:
The updated agent
- Return type:
- evaluate_prompt(prompt, system_message=None, llm_name=None, max_tokens=None, temperature=0.0, messages=None)
Generate response to the prompt using the specified model.
- Parameters:
prompt (str) – Prompt to use for generation.
system_message (str) – System message for models that support it.
llm_name (str) – Name of the underlying LLM to be used for generation. Should be one of ‘OPENAI_GPT4’, ‘OPENAI_GPT3_5’, ‘CLAUDE_V2’, ‘ABACUS’, ‘ABACUS_LONG’, ‘PALM’, or ‘LLAMA2_CHAT’. Default is auto selection.
max_tokens (int) – Maximum number of tokens to generate. If set, the model will just stop generating after this token limit is reached.
temperature (float) – Temperature to use for generation. Higher temperatures make responses more non-deterministic; a value of zero makes responses mostly deterministic. Default is 0.0. A range of 0.0 - 2.0 is allowed.
messages (list) – A list of messages to use as conversation history. A message is a dict with attributes: is_user (bool): Whether the message is from the user. text (str): The message’s text.
- Returns:
The response from the model, raw text and parsed components.
- Return type:
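A hedged example of generating a completion with conversation history; the model name is one of the documented options, and the messages use the is_user/text attributes described above. The prompt text is a placeholder.

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder API key

response = client.evaluate_prompt(
    prompt='Summarize the quarterly sales trend in two sentences.',
    system_message='You are a concise analyst.',
    llm_name='OPENAI_GPT4',
    max_tokens=256,
    temperature=0.0,
    messages=[
        {'is_user': True, 'text': 'Can you help me with reporting?'},
        {'is_user': False, 'text': 'Of course. What would you like summarized?'},
    ],
)
# The returned object carries the raw text and any parsed components of the model response.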
- render_feature_groups_for_llm(feature_group_ids, token_budget=None, include_definition=True)
Encode feature groups as language model inputs.
- Parameters:
- Returns:
LLM input object comprising of information about the feature groups with given IDs.
- Return type:
- get_llm_parameters(prompt, system_message=None, llm_name=None, max_tokens=None)
Generates parameters for the prompt using the given inputs.
- Parameters:
prompt (str) – Prompt to use for generation.
system_message (str) – System message for models that support it.
llm_name (str) – Name of the underlying LLM to be used for generation. Should be one of ‘gpt-4’ or ‘gpt-3.5-turbo’. Default is auto selection.
max_tokens (int) – Maximum number of tokens to generate. If set, the model will just stop generating after this token limit is reached.
- Returns:
The parameters for LLM using the given inputs.
- Return type:
- generate_code_for_data_query_using_llm(query, feature_group_ids, prompt_context=None, llm_name=None, temperature=None)
Execute a data query using a large language model in an async fashion.
- Parameters:
query (str) – The natural language query to execute. The query is converted to a SQL query using the language model.
feature_group_ids (list) – A list of feature group IDs that the query should be executed against.
prompt_context (str) – The context message used to construct the prompt for the language model. If not provided, a default context message is used.
llm_name (str) – The name of the language model to use. If not provided, the default language model is used.
temperature (float) – The temperature to use for the language model if supported. If not provided, the default temperature is used.
- Return type:
- create_document_retriever(project_id, name, feature_group_id, document_retriever_config=None)
Returns a document retriever that stores embeddings for document chunks in a feature group.
Document columns in the feature group are broken into chunks. For cases with multiple document columns, chunks from all columns are combined together to form a single chunk.
- Parameters:
project_id (str) – The ID of project that the vector store is created in.
name (str) – The name of the vector store.
feature_group_id (str) – The ID of the feature group that the vector store is associated with.
document_retriever_config (DocumentRetrieverConfig) – The configuration, including chunk_size and chunk_overlap_fraction, for document retrieval.
- Returns:
The newly created document retriever.
- Return type:
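A minimal sketch of creating a document retriever over an existing feature group. The optional DocumentRetrieverConfig (which carries chunk_size and chunk_overlap_fraction) is only hinted at in a comment because its import path and constructor are not shown in this reference; the identifiers are placeholders.

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder API key

retriever = client.create_document_retriever(
    project_id='YOUR_PROJECT_ID',
    name='product_docs_retriever',
    feature_group_id='FEATURE_GROUP_ID',
    # document_retriever_config=...  # optional chunking configuration (DocumentRetrieverConfig)
)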
- update_document_retriever(document_retriever_id, name=None, feature_group_id=None, document_retriever_config=None)
Updates an existing document retriever.
- Parameters:
document_retriever_id (str) – The unique ID associated with the document retriever.
name (str) – The name to update the document retriever with.
feature_group_id (str) – The ID of the feature group to update the document retriever with.
document_retriever_config (DocumentRetrieverConfig) – The configuration, including chunk_size and chunk_overlap_fraction, for document retrieval.
- Returns:
The updated document retriever.
- Return type:
- create_document_retriever_version(document_retriever_id)
Creates a document retriever version from the latest version of the feature group that the document retriever is associated with.
- Parameters:
document_retriever_id (str) – The unique ID associated with the document retriever to create version with.
- Returns:
The newly created document retriever version.
- Return type:
- delete_document_retriever(vector_store_id)
Delete a Document Retriever.
- Parameters:
vector_store_id (str) – A unique string identifier associated with the document retriever.
- get_document_snippet(document_retriever_id, document_id, start_word_index=None, end_word_index=None)
Get a snippet from documents in the document retriever.
- Parameters:
document_retriever_id (str) – A unique string identifier associated with the document retriever.
document_id (str) – The ID of the document to retrieve the snippet from.
start_word_index (int) – If provided, will start the snippet at the index (of words in the document) specified.
end_word_index (int) – If provided, will end the snippet at the index (of words in the document) specified.
- Returns:
The documentation snippet found from the document retriever.
- Return type:
- restart_document_retriever(document_retriever_id)
Restart the document retriever if it is stopped.
- Parameters:
document_retriever_id (str) – A unique string identifier associated with the document retriever.
- get_relevant_snippets(doc_ids=None, blobs=None, query=None, document_retriever_config=None, honor_sentence_boundary=True, num_retrieval_margin_words=None, max_words_per_snippet=None, max_snippets_per_document=None, start_word_index=None, end_word_index=None, including_bounding_boxes=False)
Get relevant snippets from documents with respect to the query. Document retrievers may be created on-the-fly to perform lookup.
- Parameters:
doc_ids (list) – A list of document store IDs to retrieve the snippets from.
blobs (io.TextIOBase) – A dictionary mapping document names to the blob data.
query (str) – The query that the documents are relevant to.
document_retriever_config (DocumentRetrieverConfig) – If provided, used to configure the retrieval steps like chunking for embeddings.
honor_sentence_boundary (bool) – If provided, will honor sentence boundary when returning the snippets.
num_retrieval_margin_words (int) – If provided, will add this number of words from left and right of the returned snippets.
max_words_per_snippet (int) – If provided, will limit the number of words in each snippet to the value specified.
max_snippets_per_document (int) – If provided, will limit the number of snippets retrieved from each document to the value specified.
start_word_index (int) – If provided, will start the snippet at the index (of words in the document) specified.
end_word_index (int) – If provided, will end the snippet at the index (of words in the document) specified.
including_bounding_boxes (bool) – If true, will include the bounding boxes of the snippets if they are available.
- Returns:
The snippets found from the documents.
- Return type:
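For illustration, a hedged call that retrieves short snippets relevant to a question from two document store documents; the document IDs, query, and limits are placeholders.

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder API key

snippets = client.get_relevant_snippets(
    doc_ids=['DOC_ID_1', 'DOC_ID_2'],
    query='What is the refund policy?',
    honor_sentence_boundary=True,
    num_retrieval_margin_words=10,
    max_words_per_snippet=100,
    max_snippets_per_document=3,
)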
- exception abacusai.ApiException(message, http_status, exception=None, request_id=None)
Bases:
Exception
Default ApiException raised by APIs
- Parameters:
- __str__()
Return str(self).
- class abacusai.ClientOptions(exception_on_404=True, server=DEFAULT_SERVER)
Options for configuring the ApiClient
- class abacusai.ReadOnlyClient(api_key=None, server=None, client_options=None, skip_version_check=False)
Bases:
BaseApiClient
Abacus.AI Read Only API Client. Only contains GET methods
- Parameters:
api_key (str) – The api key to use as authentication to the server
server (str) – The base server URL to use to send API requests to
client_options (ClientOptions) – Optional API client configurations
skip_version_check (bool) – If true, will skip checking the server’s current API version on initializing the client
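A minimal sketch of constructing a read-only client with explicit options; the API key is a placeholder, and the option shown is simply the documented default.

from abacusai import ClientOptions, ReadOnlyClient

options = ClientOptions(exception_on_404=True)  # raise an exception on 404 responses (default behavior)
client = ReadOnlyClient(api_key='YOUR_API_KEY', client_options=options)
api_keys = client.list_api_keys()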
- list_api_keys()
Lists all of the user’s API keys
- list_organization_users()
Retrieves a list of all users in the organization, including pending users who have been invited.
- describe_user()
Retrieve the current user’s information, such as their name, email address, and admin status.
- Returns:
An object containing information about the current user.
- Return type:
- list_organization_groups()
Lists all Organization Groups
- Returns:
A list of all the organization groups within this organization.
- Return type:
- describe_organization_group(organization_group_id)
Returns the specific organization group passed in by the user.
- Parameters:
organization_group_id (str) – The unique identifier of the organization group to be described.
- Returns:
Information about a specific organization group.
- Return type:
- describe_webhook(webhook_id)
Describe the webhook with a given ID.
- list_deployment_webhooks(deployment_id)
List all the webhooks attached to a given deployment.
- list_use_cases()
Retrieves a list of all use cases with descriptions. Use the given mappings to specify a use case when needed.
- describe_problem_type(problem_type)
Describes a problem type
- Parameters:
problem_type (str) – The problem type to get details on
- Returns:
The problem type requirements
- Return type:
- describe_use_case_requirements(use_case)
This API call returns the feature requirements for a specified use case.
- Parameters:
use_case (str) – This contains the Enum String for the use case whose dataset requirements are needed.
- Returns:
The feature requirements of the use case are returned, including all the feature groups required for the use case along with their descriptions and feature mapping details.
- Return type:
- describe_project(project_id)
Returns a description of a project.
- list_projects(limit=100, start_after_id=None)
Retrieves a list of all projects in the current organization.
- get_project_feature_group_config(feature_group_id, project_id)
Gets a feature group’s project config
- Parameters:
- Returns:
The feature group’s project configuration.
- Return type:
- validate_project(project_id, feature_group_ids=None)
Validates that the specified project has all required feature group types for its use case and that all required feature columns are set.
- Parameters:
- Returns:
The project validation. If the specified project is missing required columns or feature groups, the response includes an array of objects for each missing required feature group and the missing required features in each feature group.
- Return type:
- infer_feature_mappings(project_id, feature_group_id)
Infer the feature mappings for the feature group in the project based on the problem type.
- Parameters:
- Returns:
A dict that contains the inferred feature mappings.
- Return type:
- verify_and_describe_annotation(feature_group_id, feature_name=None, doc_id=None, feature_group_row_identifier=None)
Get the latest annotation entry for a given feature group, feature, and document along with verification information.
- Parameters:
feature_group_id (str) – The ID of the feature group the annotation is on.
feature_name (str) – The name of the feature the annotation is on.
doc_id (str) – The ID of the primary document the annotation is on. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group’s primary / identifier key value. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
- Returns:
The latest annotation entry for the given feature group, feature, document, and/or annotation key value. Includes the verification information.
- Return type:
- get_annotations_status(feature_group_id, feature_name=None, check_for_materialization=False)
Get the status of the annotations for a given feature group and feature.
- Parameters:
- Returns:
The status of the annotations for the given feature group and feature.
- Return type:
- get_feature_group_schema(feature_group_id, project_id=None)
Returns a schema for a given FeatureGroup in a project.
- get_point_in_time_feature_group_creation_options()
Returns the options that can be used to generate PIT features.
- Returns:
List of possible generated aggregation function options.
- Return type:
- describe_feature_group(feature_group_id)
Describe a Feature Group.
- Parameters:
feature_group_id (str) – A unique string identifier associated with the feature group.
- Returns:
The feature group object.
- Return type:
- describe_feature_group_by_table_name(table_name)
Describe a Feature Group by its table name.
- Parameters:
table_name (str) – The unique table name of the Feature Group to look up.
- Returns:
The Feature Group.
- Return type:
- list_feature_groups(limit=100, start_after_id=None, feature_group_template_id=None, is_including_detached_from_template=False)
List all the feature groups
- Parameters:
limit (int) – The number of feature groups to retrieve.
start_after_id (str) – An offset parameter to exclude all feature groups up to a specified ID.
feature_group_template_id (str) – If specified, limit the results to feature groups attached to this template ID.
is_including_detached_from_template (bool) – When feature_group_template_id is specified, include feature groups that have been detached from that template ID.
- Returns:
All the feature groups in the organization associated with the specified project.
- Return type:
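Because the listing is capped by limit, paging through all feature groups can be done with start_after_id, as in this sketch; the feature_group_id attribute on the returned objects is an assumption.

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder API key

all_feature_groups = []
start_after_id = None
while True:
    page = client.list_feature_groups(limit=100, start_after_id=start_after_id)
    if not page:
        break
    all_feature_groups.extend(page)
    start_after_id = page[-1].feature_group_id  # assumed attribute name on the returned objects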
- describe_project_feature_group(project_id, feature_group_id)
Describe a feature group associated with a project
- Parameters:
- Returns:
The project feature group object.
- Return type:
- list_project_feature_groups(project_id, filter_feature_group_use=None)
List all the feature groups associated with a project
- Parameters:
project_id (str) – The unique ID associated with the project.
filter_feature_group_use (str) – If given, only feature groups in this project with the specified use are returned. Possible values are: ‘USER_CREATED’, ‘BATCH_PREDICTION_OUTPUT’.
- Returns:
All the Feature Groups in a project.
- Return type:
- list_python_function_feature_groups(name, limit=100)
List all the feature groups associated with a python function.
- Parameters:
- Returns:
All the feature groups associated with the specified Python function ID.
- Return type:
- execute_async_feature_group_operation(query=None)
Starts the execution of a feature group operation
- Parameters:
query (str) – The SQL to be executed.
- Returns:
A dict that contains the execution status
- Return type:
- get_execute_feature_group_operation_result_part_count(feature_group_operation_run_id)
Gets the number of parts in the result of the execution of a feature group operation
- download_execute_feature_group_operation_result_part_chunk(feature_group_operation_run_id, part, offset=0, chunk_size=10485760)
Downloads a chunk of the result of the execution of a feature group operation
- Parameters:
- Return type:
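A hedged sketch of downloading the full result of a feature group operation by iterating over its parts. The run ID and output path are placeholders; the part numbering (0- vs 1-based) and the chunk payload type (bytes vs. a file-like stream) are assumptions handled defensively below.

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')    # placeholder API key
run_id = 'FEATURE_GROUP_OPERATION_RUN_ID'     # placeholder run identifier

part_count = client.get_execute_feature_group_operation_result_part_count(run_id)
with open('operation_result.dat', 'wb') as out:
    for part in range(part_count):  # assumes parts are numbered from zero
        chunk = client.download_execute_feature_group_operation_result_part_chunk(run_id, part)
        out.write(chunk if isinstance(chunk, bytes) else chunk.read())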
- get_feature_group_version_export_download_url(feature_group_export_id)
Get a link to download the feature group version.
- Parameters:
feature_group_export_id (str) – Unique identifier of the Feature Group Export to get a signed URL for.
- Returns:
Instance containing the download URL and expiration time for the Feature Group Export.
- Return type:
- describe_feature_group_export(feature_group_export_id)
Describes a feature group export.
- Parameters:
feature_group_export_id (str) – Unique identifier of the feature group export.
- Returns:
The feature group export object.
- Return type:
- list_feature_group_exports(feature_group_id)
Lists all of the feature group exports for the feature group
- Parameters:
feature_group_id (str) – Unique identifier of the feature group
- Returns:
List of feature group exports
- Return type:
- get_feature_group_export_connector_errors(feature_group_export_id)
Returns a stream containing the write errors of the feature group export database connection, if any writes failed to the database connector.
- Parameters:
feature_group_export_id (str) – Unique identifier of the feature group export to get the errors for.
- Return type:
- list_feature_group_modifiers(feature_group_id)
List the users who can modify a given feature group.
- Parameters:
feature_group_id (str) – Unique string identifier of the feature group.
- Returns:
Information about the modification lock status and groups/organizations added to the feature group.
- Return type:
- get_materialization_logs(feature_group_version, stdout=False, stderr=False)
Returns logs for a materialized feature group version.
- Parameters:
- Returns:
A function logs object.
- Return type:
- list_feature_group_versions(feature_group_id, limit=100, start_after_version=None)
Retrieves a list of all feature group versions for the specified feature group.
- Parameters:
- Returns:
A list of feature group versions.
- Return type:
- describe_feature_group_version(feature_group_version)
Describe a feature group version.
- Parameters:
feature_group_version (str) – The unique identifier associated with the feature group version.
- Returns:
The feature group version.
- Return type:
- get_feature_group_version_metrics(feature_group_version, selected_columns=None, include_charts=False, include_statistics=True)
Get metrics for a specific feature group version.
- Parameters:
feature_group_version (str) – A unique string identifier associated with the feature group version.
selected_columns (list) – A list of columns to order first.
include_charts (bool) – A flag indicating whether charts should be included in the response. Default is false.
include_statistics (bool) – A flag indicating whether statistics should be included in the response. Default is true.
- Returns:
The metrics for the specified feature group version.
- Return type:
- describe_feature_group_template(feature_group_template_id)
Describe a Feature Group Template.
- Parameters:
feature_group_template_id (str) – The unique identifier of a feature group template.
- Returns:
The feature group template object.
- Return type:
- list_feature_group_templates(limit=100, start_after_id=None, feature_group_id=None, should_include_system_templates=False)
List feature group templates, optionally scoped by the feature group that created the templates.
- Parameters:
limit (int) – Maximum number of templates to be retrieved.
start_after_id (str) – Offset parameter to exclude all templates up to the specified feature group template ID.
feature_group_id (str) – If specified, limit to templates created from this feature group.
should_include_system_templates (bool) – If True, will include built-in templates.
- Returns:
All the feature groups in the organization, optionally limited by the feature group that created the template(s).
- Return type:
- list_project_feature_group_templates(project_id, limit=100, start_after_id=None, should_include_all_system_templates=False)
List feature group templates for feature groups associated with the project.
- Parameters:
project_id (str) – Unique string identifier to limit to templates associated with this project, e.g. templates associated with feature groups in this project.
limit (int) – Maximum number of templates to be retrieved.
start_after_id (str) – Offset parameter to exclude all templates up to the specified feature group template ID.
should_include_all_system_templates (bool) – If True, will include built-in templates.
- Returns:
All the feature groups in the organization, optionally limited by the feature group that created the template(s).
- Return type:
- suggest_feature_group_template_for_feature_group(feature_group_id)
Suggest values for a feature group template, based on a feature group.
- Parameters:
feature_group_id (str) – Unique identifier associated with the feature group to use for suggesting values to use in the template.
- Returns:
The suggested feature group template.
- Return type:
- get_dataset_schema(dataset_id)
Retrieves the column schema of a dataset.
- Parameters:
dataset_id (str) – Unique string identifier of the dataset schema to look up.
- Returns:
List of column schema definitions.
- Return type:
- set_dataset_database_connector_config(dataset_id, database_connector_id, object_name=None, columns=None, query_arguments=None, sql_query=None)
Sets database connector config for a dataset. This method is currently only supported for streaming datasets.
- Parameters:
dataset_id (str) – Unique String Identifier of the dataset_id.
database_connector_id (str) – Unique String Identifier of the Database Connector to import the dataset from.
object_name (str) – If applicable, the name/ID of the object in the service to query.
columns (str) – The columns to query from the external service object.
query_arguments (str) – Additional query arguments to filter the data.
sql_query (str) – The full SQL query to use when fetching data. If present, this parameter will override object_name, columns and query_arguments.
- get_dataset_version_metrics(dataset_version, selected_columns=None, include_charts=False, include_statistics=True)
Get metrics for a specific dataset version.
- Parameters:
dataset_version (str) – A unique string identifier associated with the dataset version.
selected_columns (list) – A list of columns to order first.
include_charts (bool) – A flag indicating whether charts should be included in the response. Default is false.
include_statistics (bool) – A flag indicating whether statistics should be included in the response. Default is true.
- Returns:
The metrics for the specified Dataset version.
- Return type:
- get_file_connector_instructions(bucket, write_permission=False)
Retrieves verification information to create a data connector to a cloud storage bucket.
- Parameters:
- Returns:
An object with a full description of the cloud storage bucket authentication options and bucket policy. Returns an error message if the parameters are invalid.
- Return type:
- list_database_connectors()
Retrieves a list of all database connectors along with their associated attributes.
- Returns:
An object containing the database connector and its attributes.
- Return type:
- list_file_connectors()
Retrieves a list of all connected services in the organization and their current verification status.
- Returns:
A list of cloud storage buckets connected to the organization.
- Return type:
- list_database_connector_objects(database_connector_id)
Lists queryable objects in the database connector.
- get_database_connector_object_schema(database_connector_id, object_name=None)
Get the schema of an object in a database connector.
- list_application_connectors()
Retrieves a list of all application connectors along with their associated attributes.
- Returns:
A list of application connectors.
- Return type:
- list_application_connector_objects(application_connector_id)
Lists queryable objects in the application connector.
- list_streaming_connectors()
Retrieves a list of all streaming connectors along with their corresponding attributes.
- Returns:
A list of StreamingConnector objects.
- Return type:
- list_streaming_tokens()
Retrieves a list of all streaming tokens.
- Returns:
A list of streaming tokens and their associated attributes.
- Return type:
- get_recent_feature_group_streamed_data(feature_group_id)
Returns recently streamed data to a streaming feature group.
- Parameters:
feature_group_id (str) – Unique string identifier associated with the feature group.
- list_uploads()
Lists all pending uploads
- describe_upload(upload_id)
Retrieves the current upload status (complete or inspecting) and the list of file parts uploaded for a specified dataset upload.
- list_datasets(limit=100, start_after_id=None, exclude_streaming=False)
Retrieves a list of all datasets in the organization.
- describe_dataset(dataset_id)
Retrieves a full description of the specified dataset, with attributes such as its ID, name, source type, etc.
- describe_dataset_version(dataset_version)
Retrieves a full description of the specified dataset version, including its ID, name, source type, and other attributes.
- Parameters:
dataset_version (str) – Unique string identifier associated with the dataset version.
- Returns:
The dataset version.
- Return type:
- list_dataset_versions(dataset_id, limit=100, start_after_version=None)
Retrieves a list of all dataset versions for the specified dataset.
- Parameters:
- Returns:
A list of dataset versions.
- Return type:
- get_dataset_version_logs(dataset_version)
Retrieves the dataset import logs.
- Parameters:
dataset_version (str) – The unique version ID of the dataset version.
- Returns:
The logs for the specified dataset version.
- Return type:
- get_docstore_document(doc_id)
Return a document store document by id.
- Parameters:
doc_id (str) – Unique Docstore string identifier for the document.
- Return type:
- get_docstore_image(doc_id, max_width=None, max_height=None)
Return a document store image by id.
- Parameters:
doc_id (str) – A unique Docstore string identifier for the image.
max_width (int) – Rescales the returned image so the width is less than or equal to the given maximum width, while preserving the aspect ratio.
max_height (int) – Rescales the returned image so the height is less than or equal to the given maximum height, while preserving the aspect ratio.
- Return type:
- get_docstore_page_data(doc_id, page)
Returns the extracted page data for a document page.
- describe_train_test_data_split_feature_group(model_id)
Get the train and test data split for a trained model by its unique identifier. This is only supported for models with custom algorithms.
- Parameters:
model_id (str) – The unique ID of the model. By default, the latest model version will be returned if no version is specified.
- Returns:
The feature group containing the training data and fold information.
- Return type:
- describe_train_test_data_split_feature_group_version(model_version)
Get the train and test data split for a trained model by model version. This is only supported for models with custom algorithms.
- Parameters:
model_version (str) – The unique version ID of the model version.
- Returns:
The feature group version containing the training data and folds information.
- Return type:
- list_models(project_id)
Retrieves the list of models in the specified project.
- describe_model(model_id)
Retrieves a full description of the specified model.
- get_model_metrics(model_id, model_version=None, return_graphs=False, validation=False)
Retrieves metrics for all the algorithms trained in this model version.
If only the model’s unique identifier (model_id) is specified, the latest trained version of the model (model_version) is used.
- Parameters:
model_id (str) – Unique string identifier for the model.
model_version (str) – Version of the model.
return_graphs (bool) – If true, will return the information used for the graphs on the model metrics page such as PR Curve per label.
validation (bool) – If true, will return the validation metrics instead of the test metrics.
- Returns:
An object containing the model metrics and explanations for what each metric means.
- Return type:
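For example, fetching metrics for the latest trained version of a model by omitting model_version; the model ID is a placeholder.

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder API key

metrics = client.get_model_metrics(
    model_id='MODEL_ID',
    return_graphs=False,  # skip the per-label graph data used by the metrics page
    validation=False,     # return test metrics rather than validation metrics
)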
- query_test_point_predictions(model_version, algorithm, to_row, from_row=0, sql_where_clause='')
Query the test points predictions data for a specific algorithm.
- Parameters:
- Returns:
TestPointPrediction
- Return type:
- list_model_versions(model_id, limit=100, start_after_version=None)
Retrieves a list of versions for a given model.
- Parameters:
- Returns:
An array of model versions.
- Return type:
- describe_model_version(model_version)
Retrieves a full description of the specified model version.
- Parameters:
model_version (str) – Unique string identifier of the model version.
- Returns:
A model version.
- Return type:
- get_feature_importance_by_model_version(model_version)
Gets the feature importance calculated by various methods for the model.
- Parameters:
model_version (str) – Unique string identifier for the model version.
- Returns:
Feature importances for the model.
- Return type:
- get_training_data_logs(model_version)
Retrieves the data preparation logs during model training.
- Parameters:
model_version (str) – The unique version ID of the model version.
- Returns:
A list of logs.
- Return type:
- get_training_logs(model_version, stdout=False, stderr=False)
Returns training logs for the model.
- Parameters:
- Returns:
A function logs object.
- Return type:
- describe_model_artifacts_export(model_artifacts_export_id)
Get the description and status of the model artifacts export.
- Parameters:
model_artifacts_export_id (str) – A unique string identifier for the export.
- Returns:
Object describing the export and its status.
- Return type:
- list_model_artifacts_exports(model_id, limit=25)
List all the model artifacts exports.
- Parameters:
- Returns:
List of model artifacts exports.
- Return type:
- list_model_monitors(project_id)
Retrieves the list of model monitors in the specified project.
- Parameters:
project_id (str) – Unique string identifier associated with the project.
- Returns:
A list of model monitors.
- Return type:
- describe_model_monitor(model_monitor_id)
Retrieves a full description of the specified model monitor.
- Parameters:
model_monitor_id (str) – Unique string identifier associated with the model monitor.
- Returns:
Description of the model monitor.
- Return type:
- get_prediction_drift(model_monitor_version)
Gets the label and prediction drifts for a model monitor.
- Parameters:
model_monitor_version (str) – Unique string identifier for a model monitor version created under the project.
- Returns:
Object describing training and prediction output label and prediction distributions.
- Return type:
- get_model_monitor_summary(model_monitor_id)
Gets the summary of a model monitor across versions.
- Parameters:
model_monitor_id (str) – A unique string identifier associated with the model monitor.
- Returns:
An object describing integrity, bias violations, model accuracy and drift for the model monitor.
- Return type:
- list_model_monitor_versions(model_monitor_id, limit=100, start_after_version=None)
Retrieves a list of versions for a given model monitor.
- Parameters:
- Returns:
A list of model monitor versions.
- Return type:
- describe_model_monitor_version(model_monitor_version)
Retrieves a full description of the specified model monitor version.
- Parameters:
model_monitor_version (str) – The unique version ID of the model monitor version.
- Returns:
A model monitor version.
- Return type:
- model_monitor_version_metric_data(model_monitor_version, metric_type, actual_values_to_detail=None)
Provides the data needed for decile metrics associated with the model monitor.
- Parameters:
- Returns:
Data associated with the metric.
- Return type:
- list_organization_model_monitors(only_starred=False)
Gets a list of Model Monitors for an organization.
- Parameters:
only_starred (bool) – Whether to return only starred Model Monitors. Defaults to False.
- Returns:
A list of Model Monitors.
- Return type:
- get_model_monitor_chart_from_organization(chart_type, limit=15)
Gets a list of model monitor summaries across monitors for an organization.
- Parameters:
- Returns:
List of ModelMonitorSummaryForOrganization objects describing accuracy, bias, drift, or integrity for all model monitors in an organization.
- Return type:
- get_model_monitor_summary_from_organization()
Gets a consolidated summary of model monitors for an organization.
- Returns:
A list of ModelMonitorSummaryForOrganization objects describing accuracy, bias, drift, and integrity for all model monitors in an organization.
- Return type:
- list_eda(project_id)
Retrieves the list of Exploratory Data Analysis (EDA) in the specified project.
- describe_eda(eda_id)
Retrieves a full description of the specified EDA object.
- list_eda_versions(eda_id, limit=100, start_after_version=None)
Retrieves a list of versions for a given EDA object.
- Parameters:
- Returns:
A list of EDA versions.
- Return type:
- describe_eda_version(eda_version)
Retrieves a full description of the specified EDA version.
- Parameters:
eda_version (str) – Unique string identifier of the EDA version.
- Returns:
An EDA version.
- Return type:
- get_eda_collinearity(eda_version)
Gets the Collinearity between all features for the Exploratory Data Analysis.
- Parameters:
eda_version (str) – Unique string identifier associated with the EDA instance.
- Returns:
An object with a record of correlations between each feature for the EDA.
- Return type:
- get_eda_data_consistency(eda_version, transformation_feature=None)
Gets the data consistency for the Exploratory Data Analysis.
- Parameters:
- Returns:
Object with duplication, deletion, and transformation data for data consistency analysis for an EDA.
- Return type:
- get_collinearity_for_feature(eda_version, feature_name=None)
Gets the Collinearity for the given feature from the Exploratory Data Analysis.
- Parameters:
- Returns:
Object with a record of correlations for the provided feature for an EDA.
- Return type:
- get_feature_association(eda_version, reference_feature_name, test_feature_name)
Gets the Feature Association for the given features from the feature group version within the eda_version.
- Parameters:
eda_version (str) – Unique string identifier associated with the EDA instance.
reference_feature_name (str) – Name of the feature for feature association (on x-axis for the plots generated for the Feature association in the product).
test_feature_name (str) – Name of the feature for feature association (on y-axis for the plots generated for the Feature association in the product).
- Returns:
An object with a record of data for the feature association between the two given features for an EDA version.
- Return type:
- get_eda_forecasting_analysis(eda_version)
Gets the Forecasting analysis for the Exploratory Data Analysis.
- Parameters:
eda_version (str) – Unique string identifier associated with the EDA version.
- Returns:
Object with forecasting analysis that includes sales_across_time, cummulative_contribution, missing_value_distribution, history_length, num_rows_histogram, product_maturity data.
- Return type:
- list_holdout_analysis(project_id, model_id=None)
List holdout analyses for a project. Optionally, filter by model.
- Parameters:
- Returns:
The holdout analyses
- Return type:
- describe_holdout_analysis(holdout_analysis_id)
Get a holdout analysis.
- Parameters:
holdout_analysis_id (str) – ID of the holdout analysis to get
- Returns:
The holdout analysis
- Return type:
- list_holdout_analysis_versions(holdout_analysis_id)
List holdout analysis versions for a holdout analysis.
- Parameters:
holdout_analysis_id (str) – ID of the holdout analysis to list holdout analysis versions for
- Returns:
The holdout analysis versions
- Return type:
- describe_holdout_analysis_version(holdout_analysis_version, get_metrics=False)
Get a holdout analysis version.
- Parameters:
- Returns:
The holdout analysis version
- Return type:
- describe_monitor_alert(monitor_alert_id)
Describes a given monitor alert id
- Parameters:
monitor_alert_id (str) – Unique identifier of the monitor alert.
- Returns:
Object containing information about the monitor alert.
- Return type:
- describe_monitor_alert_version(monitor_alert_version)
Describes a given monitor alert version id
- Parameters:
monitor_alert_version (str) – Unique string identifier for the monitor alert.
- Returns:
An object describing the monitor alert version.
- Return type:
- list_monitor_alerts_for_monitor(model_monitor_id)
Retrieves the list of monitor alerts for a specified monitor.
- Parameters:
model_monitor_id (str) – The unique ID associated with the model monitor.
- Returns:
A list of monitor alerts.
- Return type:
- list_monitor_alert_versions_for_monitor_version(model_monitor_version)
Retrieves the list of monitor alert versions for a specified monitor instance.
- Parameters:
model_monitor_version (str) – The unique ID associated with the model monitor.
- Returns:
A list of monitor alert versions.
- Return type:
- get_model_monitoring_logs(model_monitor_version, stdout=False, stderr=False)
Returns monitoring logs for the model.
- Parameters:
- Returns:
A function logs.
- Return type:
- get_drift_for_feature(model_monitor_version, feature_name, nested_feature_name=None)
Gets the feature drift associated with a single feature in an output feature group from a prediction.
- Parameters:
- Returns:
An object describing the training and prediction output feature distributions.
- Return type:
- get_outliers_for_feature(model_monitor_version, feature_name=None, nested_feature_name=None)
Gets a list of outliers measured by a single feature (or overall) in an output feature group from a prediction.
- Parameters:
- Return type:
Dict
- describe_prediction_operator(prediction_operator_id)
Describe an existing prediction operator.
- Parameters:
prediction_operator_id (str) – The unique ID of the prediction operator.
- Return type:
- list_prediction_operators(project_id)
List all the prediction operators inside a project.
- Parameters:
project_id (str) – The unique ID of the project.
- Return type:
- list_prediction_operator_versions(prediction_operator_id)
List all the prediction operator versions for a prediction operator.
- Parameters:
prediction_operator_id (str) – The unique ID of the prediction operator.
- Returns:
A list of prediction operator version objects.
- Return type:
- describe_deployment(deployment_id)
Retrieves a full description of the specified deployment.
- Parameters:
deployment_id (str) – Unique string identifier associated with the deployment.
- Returns:
Description of the deployment.
- Return type:
- list_deployments(project_id)
Retrieves a list of all deployments in the specified project.
- Parameters:
project_id (str) – The unique identifier associated with the project.
- Returns:
An array of deployments.
- Return type:
- list_deployment_tokens(project_id)
Retrieves a list of all deployment tokens associated with the specified project.
- Parameters:
project_id (str) – The unique ID associated with the project.
- Returns:
A list of deployment tokens.
- Return type:
- get_api_endpoint(deployment_token=None, deployment_id=None, streaming_token=None, feature_group_id=None, model_id=None)
Returns the API endpoint specific to an organization. This function can be utilized using either an API Key or a deployment ID and token for authentication.
- Parameters:
deployment_token (str) – Token used for authenticating access to deployed models.
deployment_id (str) – Unique identifier assigned to a deployment created under the specified project.
streaming_token (str) – Token used for authenticating access to streaming data.
feature_group_id (str) – Unique identifier assigned to a feature group.
model_id (str) – Unique identifier assigned to a model.
- Returns:
The API endpoint specific to the organization.
- Return type:
- get_model_training_types_for_deployment(model_id, model_version=None, algorithm=None)
Returns types of models that can be deployed for a given model instance ID.
- Parameters:
- Returns:
Model training types for deployment.
- Return type:
- describe_refresh_policy(refresh_policy_id)
Retrieve a single refresh policy
- Parameters:
refresh_policy_id (str) – The unique ID associated with this refresh policy.
- Returns:
An object representing the refresh policy.
- Return type:
- describe_refresh_pipeline_run(refresh_pipeline_run_id)
Retrieve a single refresh pipeline run
- Parameters:
refresh_pipeline_run_id (str) – Unique string identifier associated with the refresh pipeline run.
- Returns:
A refresh pipeline run object.
- Return type:
- list_refresh_policies(project_id=None, dataset_ids=[], feature_group_id=None, model_ids=[], deployment_ids=[], batch_prediction_ids=[], model_monitor_ids=[], prediction_metric_ids=[], notebook_ids=[])
List the refresh policies for the organization
- Parameters:
project_id (str) – Optionally, a Project ID can be specified so that all datasets, models, deployments, batch predictions, prediction metrics, model monitors, and notebooks are captured at the instant this policy was created.
dataset_ids (list) – Comma-separated list of Dataset IDs.
feature_group_id (str) – Feature Group ID for which we wish to see the refresh policies attached.
model_ids (list) – Comma-separated list of Model IDs.
deployment_ids (list) – Comma-separated list of Deployment IDs.
batch_prediction_ids (list) – Comma-separated list of Batch Prediction IDs.
model_monitor_ids (list) – Comma-separated list of Model Monitor IDs.
prediction_metric_ids (list) – Comma-separated list of Prediction Metric IDs.
notebook_ids (list) – Comma-separated list of Notebook IDs.
- Returns:
List of all refresh policies in the organization.
- Return type:
- list_refresh_pipeline_runs(refresh_policy_id)
List the times that the refresh policy has been run
- Parameters:
refresh_policy_id (str) – Unique identifier associated with the refresh policy.
- Returns:
List of refresh pipeline runs for the given refresh policy ID.
- Return type:
- download_batch_prediction_result_chunk(batch_prediction_version, offset=0, chunk_size=10485760)
Returns a stream containing the batch prediction results.
- Parameters:
- Return type:
- get_batch_prediction_connector_errors(batch_prediction_version)
Returns a stream containing the batch prediction database connection write errors, if any writes failed for the specified batch prediction job.
- Parameters:
batch_prediction_version (str) – Unique string identifier of the batch prediction job to get the errors for.
- Return type:
- list_batch_predictions(project_id)
Retrieves a list of batch predictions in the project.
- Parameters:
project_id (str) – Unique string identifier of the project.
- Returns:
List of batch prediction jobs.
- Return type:
- describe_batch_prediction(batch_prediction_id)
Describe the batch prediction.
- Parameters:
batch_prediction_id (str) – The unique identifier associated with the batch prediction.
- Returns:
The batch prediction description.
- Return type:
- list_batch_prediction_versions(batch_prediction_id, limit=100, start_after_version=None)
Retrieves a list of versions of a given batch prediction
- Parameters:
- Returns:
List of batch prediction versions.
- Return type:
- describe_batch_prediction_version(batch_prediction_version)
Describes a Batch Prediction Version.
- Parameters:
batch_prediction_version (str) – Unique string identifier of the Batch Prediction Version.
- Returns:
The Batch Prediction Version.
- Return type:
- get_data(feature_group_id, primary_key=None, num_rows=None)
Gets the feature group rows.
If primary_key is set, the row corresponding to primary_key is returned. If num_rows is set, at most num_rows of the most recently updated rows are returned.
- Parameters:
- Return type:
- list_pending_feature_group_documents(feature_group_id)
Lists all pending documents added to feature group.
- Parameters:
feature_group_id (str) – The unique ID associated with the feature group.
- Return type:
- describe_python_function(name)
Describe a Python Function.
- Parameters:
name (str) – The name to identify the Python function.
- Returns:
The Python function object.
- Return type:
- list_python_functions(function_type='FEATURE_GROUP')
List all python functions within the organization.
- Parameters:
function_type (str) – Optional argument to specify the type of function to list Python functions for; defaults to FEATURE_GROUP.
- Returns:
A list of PythonFunction objects.
- Return type:
- list_pipelines(project_id=None)
Lists the pipelines for an organization or a project
- describe_pipeline_version(pipeline_version)
Describes a specified pipeline version
- Parameters:
pipeline_version (str) – Unique string identifier for the pipeline version
- Returns:
Object describing the pipeline version
- Return type:
- describe_pipeline_step(pipeline_step_id)
Describes a pipeline step.
- Parameters:
pipeline_step_id (str) – The ID of the pipeline step.
- Returns:
An object describing the pipeline step.
- Return type:
- describe_pipeline_step_by_name(pipeline_id, step_name)
Describes a pipeline step by the step name.
- Parameters:
- Returns:
An object describing the pipeline step.
- Return type:
- describe_pipeline_step_version(pipeline_step_version)
Describes a pipeline step version.
- Parameters:
pipeline_step_version (str) – The ID of the pipeline step version.
- Return type:
- list_pipeline_version_logs(pipeline_version)
Gets the logs for the steps in a given pipeline version.
- Parameters:
pipeline_version (str) – The id of the pipeline version.
- Returns:
Object describing the logs for the steps in the pipeline.
- Return type:
- get_step_version_logs(pipeline_step_version)
Gets the logs for a given step version.
- Parameters:
pipeline_step_version (str) – The id of the pipeline step version.
- Returns:
Object describing the pipeline step logs.
- Return type:
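Example (inspecting a pipeline version and pulling logs for it and for one of its step versions; the IDs below are placeholders):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    pipeline_version_id = 'PIPELINE_VERSION_ID'  # placeholder pipeline version ID
    print(client.describe_pipeline_version(pipeline_version_id))
    print(client.list_pipeline_version_logs(pipeline_version_id))
    print(client.get_step_version_logs('PIPELINE_STEP_VERSION_ID'))  # placeholder step version ID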
- describe_graph_dashboard(graph_dashboard_id)
Describes a given graph dashboard.
- Parameters:
graph_dashboard_id (str) – Unique identifier for the graph dashboard.
- Returns:
An object containing information about the graph dashboard.
- Return type:
- list_graph_dashboards(project_id=None)
Lists the graph dashboards for a project
- Parameters:
project_id (str) – Unique string identifier for the project to list graph dashboards from.
- Returns:
A list of graph dashboards.
- Return type:
- delete_graph_from_dashboard(graph_reference_id)
Deletes a python plot function from a dashboard
- Parameters:
graph_reference_id (str) – Unique String Identifier for the graph
- describe_graph_for_dashboard(graph_reference_id)
Describes a Python plot added to a graph dashboard.
- Parameters:
graph_reference_id (str) – Unique string identifier of the Python function for the graph.
- Returns:
An object describing the graph dashboard.
- Return type:
- describe_algorithm(algorithm)
Retrieves a full description of the specified algorithm.
- list_algorithms(problem_type=None, project_id=None)
List all custom algorithms, with optional filtering on Problem Type and Project ID
- describe_custom_loss_function(name)
Retrieve a full description of a previously registered custom loss function.
- Parameters:
name (str) – Registered name of the custom loss function.
- Returns:
The description of the custom loss function with the given name.
- Return type:
- list_custom_loss_functions(name_prefix=None, loss_function_type=None)
Retrieves a list of registered custom loss functions and their descriptions.
- Parameters:
- Returns:
A list of registered custom loss functions and their descriptions.
- Return type:
- describe_custom_metric(name)
Retrieves a full description of a previously registered custom metric function.
- Parameters:
name (str) – Registered name of the custom metric.
- Returns:
The description of the custom metric with the given name.
- Return type:
- describe_custom_metric_version(custom_metric_version)
Describes a given custom metric version
- Parameters:
custom_metric_version (str) – A unique string identifier for the custom metric version.
- Returns:
An object describing the custom metric version.
- Return type:
- list_custom_metrics(name_prefix=None, problem_type=None)
Retrieves a list of registered custom metrics.
- Parameters:
- Returns:
A list of custom metrics.
- Return type:
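Example (listing registered custom loss functions and custom metrics, then describing one metric by name; the metric name is a placeholder and the problem-type string is purely illustrative):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    for loss_fn in client.list_custom_loss_functions():
        print(loss_fn)
    for metric in client.list_custom_metrics(problem_type='FORECASTING'):  # illustrative filter
        print(metric)
    print(client.describe_custom_metric('my_custom_metric'))  # placeholder metric name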
- describe_module(name)
Retrieves a full description of the specified module.
- get_organization_secret(secret_key)
Gets a secret.
- Parameters:
secret_key (str) – The secret key.
- Returns:
The secret.
- Return type:
- list_organization_secrets()
Lists all secrets for an organization.
- Returns:
list of secrets belonging to the organization.
- Return type:
- query_feature_group_code_generator(query, language, project_id=None)
Send a query to the feature group code generator tool to generate code for the query.
- Parameters:
- Returns:
The response from the model, raw text and parsed components.
- Return type:
- get_natural_language_explanation(feature_group_id=None, feature_group_version=None, model_id=None)
Returns the saved natural language explanation of an artifact with the given ID. The artifact can be a Feature Group, a Feature Group Version, or a Model.
- Parameters:
- Returns:
The object containing natural language explanation(s) as field(s).
- Return type:
- generate_natural_language_explanation(feature_group_id=None, feature_group_version=None, model_id=None)
Generates a natural language explanation of an artifact with the given ID. The artifact can be a Feature Group, a Feature Group Version, or a Model.
- Parameters:
- Returns:
The object containing natural language explanation(s) as field(s).
- Return type:
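Example (generating and then retrieving a natural language explanation for a feature group; the feature group ID is a placeholder):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key
    fg_id = 'FEATURE_GROUP_ID'          # placeholder feature group ID

    client.generate_natural_language_explanation(feature_group_id=fg_id)
    explanation = client.get_natural_language_explanation(feature_group_id=fg_id)
    print(explanation)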
- get_chat_session(chat_session_id)
Gets a chat session from Abacus AI Chat.
- Parameters:
chat_session_id (str) – Unique ID of the chat session.
- Returns:
The chat session with Abacus AI Chat
- Return type:
- list_chat_sessions(most_recent_per_project=False)
Lists all chat sessions for the current user
- Parameters:
most_recent_per_project (bool) – Optional parameter specifying whether to return only the most recent chat session per project. Defaults to False.
- Returns:
The chat sessions with Abacus AI Chat
- Return type:
- get_deployment_conversation(deployment_conversation_id=None, external_session_id=None, deployment_id=None, deployment_token=None)
Gets a deployment conversation.
- Parameters:
deployment_conversation_id (str) – Unique ID of the conversation. One of deployment_conversation_id or external_session_id must be provided.
external_session_id (str) – External session ID of the conversation.
deployment_id (str) – The deployment this conversation belongs to. This is required if not logged in.
deployment_token (str) – The deployment token to authenticate access to the deployment. This is required if not logged in.
- Returns:
The deployment conversation.
- Return type:
- list_deployment_conversations(deployment_id)
Lists all conversations for the given deployment and current user.
- Parameters:
deployment_id (str) – The deployment to get conversations for.
- Returns:
The deployment conversations.
- Return type:
- get_app_user_group(user_group_id)
Gets an App User Group.
- Parameters:
user_group_id (str) – The ID of the App User Group.
- Returns:
The App User Group.
- Return type:
- describe_external_application(external_application_id)
Describes an External Application.
- Parameters:
external_application_id (str) – The ID of the External Application.
- Returns:
The External Application.
- Return type:
- describe_agent(agent_id)
Retrieves a full description of the specified agent.
- describe_agent_version(agent_version)
Retrieves a full description of the specified agent version.
- Parameters:
agent_version (str) – Unique string identifier of the agent version.
- Returns:
An agent version.
- Return type:
- search_feature_groups(text, num_results=10, project_id=None, feature_group_ids=None)
Search feature groups based on text and filters.
- Parameters:
text (str) – Text to use for approximately matching feature groups.
num_results (int) – The maximum number of search results to retrieve. The length of the returned list is less than or equal to num_results.
project_id (str) – The ID of the project in which to restrict the search, if specified.
feature_group_ids (list) – A list of feature group IDs to restrict the search to.
- Returns:
A list of search results, each containing the retrieved object and its relevance score
- Return type:
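Example (a text search over feature groups restricted to one project; the API key and project ID are placeholders):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key

    results = client.search_feature_groups(
        'customer transactions joined with product catalog',
        num_results=5,
        project_id='PROJECT_ID',  # placeholder project ID
    )
    for result in results:
        print(result)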
- list_document_retrievers(project_id, limit=100, start_after_id=None)
List all the document retrievers.
- Parameters:
- Returns:
All the document retrievers in the organization associated with the specified project.
- Return type:
- describe_document_retriever(document_retriever_id)
Describe a Document Retriever.
- Parameters:
document_retriever_id (str) – A unique string identifier associated with the document retriever.
- Returns:
The document retriever object.
- Return type:
- describe_document_retriever_by_name(name)
Describe a document retriever by its name.
- Parameters:
name (str) – The unique name of the document retriever to look up.
- Returns:
The Document Retriever.
- Return type:
- list_document_retriever_versions(document_retriever_id, limit=100, start_after_version=None)
List all the document retriever versions with a given ID.
- Parameters:
- Returns:
All the document retriever versions associated with the document retriever.
- Return type:
- abacusai._request_context
- class abacusai.PredictionClient(client_options=None)
Bases:
abacusai.client.BaseApiClient
Abacus.AI Prediction API Client. Does not utilize authentication and only contains public prediction methods
- Parameters:
client_options (ClientOptions) – Optional API client configurations
- predict_raw(deployment_token, deployment_id, **kwargs)
Raw interface for returning predictions from Plug and Play deployments.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
**kwargs (dict) – Arbitrary key/value pairs may be passed in and are sent as part of the request body.
- lookup_features(deployment_token, deployment_id, query_data, limit_results=None, result_columns=None)
Returns the feature group deployed in the feature store project.
- Parameters:
deployment_token (str) – A deployment token used to authenticate access to created deployments. This token only authorizes predictions on deployments in this project, so it can be safely embedded inside an application or website.
deployment_id (str) – A unique identifier for a deployment created under the project.
query_data (dict) – A dictionary where the key is the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and the value is the unique value of the same entity.
limit_results (int) – If provided, will limit the number of results to the value specified.
result_columns (list) – If provided, will limit the columns present in each result to the columns specified in this list.
- Return type:
Dict
- predict(deployment_token, deployment_id, query_data)
Returns a prediction for Predictive Modeling
- Parameters:
deployment_token (str) – A deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, and is safe to embed in an application or website.
deployment_id (str) – A unique identifier for a deployment created under the project.
query_data (dict) – A dictionary where the key is the column name (e.g. a column with name ‘user_id’ in the dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed, and the value is the unique value of the same entity.
- Return type:
Dict
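Example (a minimal sketch of the public PredictionClient against a deployed predictive-modeling model; the deployment token, deployment ID, and column name are placeholders):

    from abacusai import PredictionClient

    client = PredictionClient()  # no API key; access is authorized by the deployment token

    prediction = client.predict(
        deployment_token='DEPLOYMENT_TOKEN',  # placeholder token
        deployment_id='DEPLOYMENT_ID',        # placeholder deployment ID
        query_data={'user_id': 'user_123'},   # column mapped to USER_ID in the project
    )
    print(prediction)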
- predict_multiple(deployment_token, deployment_id, query_data)
Returns a list of predictions for predictive modeling.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, and is safe to embed in an application or website.
deployment_id (str) – The unique identifier for a deployment created under the project.
query_data (list) – A list of dictionaries, where the ‘key’ is the column name (e.g. a column with name ‘user_id’ in the dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed, and the ‘value’ is the unique value of the same entity.
- Return type:
Dict
- predict_from_datasets(deployment_token, deployment_id, query_data)
Returns a list of predictions for Predictive Modeling.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier for a deployment created under the project.
query_data (dict) – A dictionary where the ‘key’ is the source dataset name, and the ‘value’ is a list of records corresponding to the dataset rows.
- Return type:
Dict
- predict_lead(deployment_token, deployment_id, query_data, explain_predictions=False, explainer_type=None)
Returns the probability of a user being a lead based on their interaction with the service/product and their own attributes (e.g. income, assets, credit score, etc.). Note that the inputs to this method, wherever applicable, should be the column names in the dataset mapped to the column mappings in our system (e.g. column ‘user_id’ mapped to mapping ‘LEAD_ID’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – A dictionary containing user attributes and/or user’s interaction data with the product/service (e.g. number of clicks, items in cart, etc.).
explain_predictions (bool) – Will explain predictions for leads
explainer_type (str) – Type of explainer to use for explanations
- Return type:
Dict
- predict_churn(deployment_token, deployment_id, query_data)
Returns the probability of a user to churn out in response to their interactions with the item/product/service. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘churn_result’ mapped to mapping ‘CHURNED_YN’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where the ‘key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and the ‘value’ will be the unique value of the same entity.
- Return type:
Dict
- predict_takeover(deployment_token, deployment_id, query_data)
Returns a probability for each class label associated with the types of fraud or a ‘yes’ or ‘no’ type label for the possibility of fraud. Note that the inputs to this method, wherever applicable, will be the column names in the dataset mapped to the column mappings in our system (e.g., column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – A dictionary containing account activity characteristics (e.g., login id, login duration, login type, IP address, etc.).
- Return type:
Dict
- predict_fraud(deployment_token, deployment_id, query_data)
Returns the probability of a transaction performed under a specific account being fraudulent or not. Note that the inputs to this method, wherever applicable, should be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_number’ mapped to the mapping ‘ACCOUNT_ID’ in our system).
- Parameters:
deployment_token (str) – A deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique identifier to a deployment created under the project.
query_data (dict) – A dictionary containing transaction attributes (e.g. credit card type, transaction location, transaction amount, etc.).
- Return type:
Dict
- predict_class(deployment_token, deployment_id, query_data, threshold=None, threshold_class=None, thresholds=None, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)
Returns a classification prediction
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model within an application or website.
deployment_id (str) – The unique identifier for a deployment created under the project.
query_data (dict) – A dictionary where the ‘Key’ is the column name (e.g. a column with the name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and the ‘Value’ is the unique value of the same entity.
threshold (float) – A float value that is applied on the popular class label.
threshold_class (str) – The label upon which the threshold is added (binary labels only).
thresholds (list) – Maps labels to thresholds (multi-label classification only). Defaults to F1 optimal threshold if computed for the given class, else uses 0.5.
explain_predictions (bool) – If True, returns the SHAP explanations for all input features.
fixed_features (list) – A set of input features to treat as constant for explanations - only honored when the explainer type is KERNEL_EXPLAINER
nested (str) – If specified generates prediction delta for each index of the specified nested feature.
explainer_type (str) – The type of explainer to use.
- Return type:
Dict
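Example (a classification request with a custom threshold and SHAP explanations; all identifiers and the column name are placeholders):

    from abacusai import PredictionClient

    client = PredictionClient()

    result = client.predict_class(
        deployment_token='DEPLOYMENT_TOKEN',  # placeholder token
        deployment_id='DEPLOYMENT_ID',        # placeholder deployment ID
        query_data={'user_id': 'user_123'},   # column mapped to USER_ID
        threshold=0.7,                         # applied to the popular class label
        explain_predictions=True,              # return SHAP explanations for all input features
    )
    print(result)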
- predict_target(deployment_token, deployment_id, query_data, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)
Returns a prediction from a classification or regression model. Optionally, includes explanations.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier of a deployment created under the project.
query_data (dict) – A dictionary where the ‘key’ is the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and the ‘value’ is the unique value of the same entity.
explain_predictions (bool) – If true, returns the SHAP explanations for all input features.
fixed_features (list) – Set of input features to treat as constant for explanations - only honored when the explainer type is KERNEL_EXPLAINER
nested (str) – If specified, generates prediction delta for each index of the specified nested feature.
explainer_type (str) – The type of explainer to use.
- Return type:
Dict
- get_anomalies(deployment_token, deployment_id, threshold=None, histogram=False)
Returns a list of anomalies from the training dataset.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
threshold (float) – The threshold score of what is an anomaly. Valid values are between 0.8 and 0.99.
histogram (bool) – If True, will return a histogram of the distribution of all points.
- Return type:
- is_anomaly(deployment_token, deployment_id, query_data=None)
Returns a list of anomaly attributes based on login information for a specified account. Note that the inputs to this method, wherever applicable, should be the column names in the dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – The input data for the prediction.
- Return type:
Dict
- get_event_anomaly_score(deployment_token, deployment_id, query_data=None)
Returns an anomaly score for an event.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – The input data for the prediction.
- Return type:
Dict
- get_forecast(deployment_token, deployment_id, query_data, future_data=None, num_predictions=None, prediction_start=None, explain_predictions=False, explainer_type=None)
Returns a list of forecasts for a given entity under the specified project deployment. Note that the inputs to the deployed model will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘holiday_yn’ mapped to mapping ‘FUTURE’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘store_id’ in your dataset) mapped to the column mapping ITEM_ID that uniquely identifies the entity against which forecasting is performed and ‘Value’ will be the unique value of the same entity.
future_data (list) – This will be a list of values known ahead of time that are relevant for forecasting (e.g. State Holidays, National Holidays, etc.). Each element is a dictionary, where the key and the value both will be of type ‘str’. For example future data entered for a Store may be [{“Holiday”:”No”, “Promo”:”Yes”, “Date”: “2015-07-31 00:00:00”}].
num_predictions (int) – The number of timestamps to predict in the future.
prediction_start (str) – The start date for predictions (e.g., "2015-08-01T00:00:00" as input for midnight of 2015-08-01).
explain_predictions (bool) – Will explain predictions for forecasting
explainer_type (str) – Type of explainer to use for explanations
- Return type:
Dict
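Example (a forecasting call with known future data, following the future_data format shown in the parameter description above; the identifiers and column name are placeholders):

    from abacusai import PredictionClient

    client = PredictionClient()

    forecast = client.get_forecast(
        deployment_token='DEPLOYMENT_TOKEN',  # placeholder token
        deployment_id='DEPLOYMENT_ID',        # placeholder deployment ID
        query_data={'store_id': 'store_42'},  # column mapped to ITEM_ID
        future_data=[{'Holiday': 'No', 'Promo': 'Yes', 'Date': '2015-07-31 00:00:00'}],
        num_predictions=14,                    # number of future timestamps to predict
        prediction_start='2015-08-01T00:00:00',
    )
    print(forecast)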
- get_k_nearest(deployment_token, deployment_id, vector, k=None, distance=None, include_score=False, catalog_id=None)
Returns the k nearest neighbors for the provided embedding vector.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
vector (list) – Input vector to perform the k nearest neighbors with.
k (int) – Overrideable number of items to return.
distance (str) – Specify the distance function to use when finding nearest neighbors.
include_score (bool) – If True, will return the score alongside the resulting embedding value.
catalog_id (str) – An optional parameter honored only for embeddings that provide a catalog id
- Return type:
Dict
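Example (a nearest-neighbour lookup on an embedding vector; the token, deployment ID, and vector values are placeholders, and the distance name follows the getKNearest documentation above):

    from abacusai import PredictionClient

    client = PredictionClient()

    neighbors = client.get_k_nearest(
        deployment_token='DEPLOYMENT_TOKEN',  # placeholder token
        deployment_id='DEPLOYMENT_ID',        # placeholder deployment ID
        vector=[0.12, -0.07, 0.33, 0.95],     # illustrative embedding vector
        k=10,
        distance='euclidean',
        include_score=True,
    )
    print(neighbors)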
- get_multiple_k_nearest(deployment_token, deployment_id, queries)
Returns the k nearest neighbors for the queries provided.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
queries (list) – List of mappings of format {“catalogId”: “cat0”, “vectors”: […], “k”: 20, “distance”: “euclidean”}. See getKNearest for additional information about the supported parameters.
- get_labels(deployment_token, deployment_id, query_data)
Returns a list of scored labels for a document.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.
- Return type:
Dict
- get_entities_from_pdf(deployment_token, deployment_id, pdf=None, doc_id=None, return_extracted_features=False, verbose=False)
Extracts text from the provided PDF and returns a list of recognized labels and their scores.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
pdf (io.TextIOBase) – (Optional) The pdf to predict on. One of pdf or docId must be specified.
doc_id (str) – (Optional) The document ID of the PDF to predict on. One of pdf or docId must be specified.
return_extracted_features (bool) – (Optional) If True, will return all extracted features (e.g. all tokens in a page) from the PDF. Default is False.
verbose (bool) – (Optional) If True, will return all the extracted tokens probabilities for all the trained labels. Default is False.
- Return type:
Dict
- get_recommendations(deployment_token, deployment_id, query_data, num_items=None, page=None, exclude_item_ids=None, score_field=None, scaling_factors=None, restrict_items=None, exclude_items=None, explore_fraction=None, diversity_attribute_name=None, diversity_max_results_per_value=None)
Returns a list of recommendations for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘time’ mapped to mapping ‘TIMESTAMP’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which recommendations are made and ‘Value’ will be the unique value of the same item. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.
num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.
page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.
exclude_item_ids (list) – [DEPRECATED]
score_field (str) – If provided, the relative item scores are returned in a separate field whose name is the value passed for this argument.
scaling_factors (list) – Allows you to bias the model towards certain items. Each element is a dictionary of the form {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}: "column" is the column name, "values" is the list of items towards which the recommendations should be biased, and "factor" is the multiplier applied to those items' scores. For example, with scaling_factors set to [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], the probability of every SUV and Sedan is multiplied by 1.4 before sorting. This is useful for promoting a less popular type of item or demoting an item that always comes up.
restrict_items (list) – Allows you to restrict the recommendations to certain items. Each element is a dictionary of the form {"column": "col0", "values": ["value0", "value1", "value3", …]}: "column" is the column name and "values" is the list of items to which the recommendations are restricted. For example, with restrict_items set to [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], only SUVs and Sedans are recommended. This is useful when you know only a particular list of items is relevant in a given scenario.
exclude_items (list) – Allows you to exclude certain items from the recommendations. Each element is a dictionary of the form {"column": "col0", "values": ["value0", "value1", …]}: "column" is the column name and "values" is the list of items to exclude. For example, with exclude_items set to [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], all SUVs and Sedans are excluded from the results. This is useful when you know a particular list of items is of no use in a given scenario and you do not want to show them.
explore_fraction (float) – Explore fraction.
diversity_attribute_name (str) – item attribute column name which is used to ensure diversity of prediction results.
diversity_max_results_per_value (int) – maximum number of results per value of diversity_attribute_name.
- Return type:
Dict
- get_personalized_ranking(deployment_token, deployment_id, query_data, preserve_ranks=None, preserve_unknown_items=False, scaling_factors=None)
Returns a list of items with personalized promotions for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, should be the column names in the dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model in an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This should be a dictionary with two key-value pairs. The first pair represents a ‘Key’ where the column name (e.g. a column with name ‘user_id’ in the dataset) mapped to the column mapping USER_ID uniquely identifies the user against whom a prediction is made and a ‘Value’ which is the identifier value for that user. The second pair will have a ‘Key’ which will be the name of the column name (e.g. movie_name) mapped to ITEM_ID (unique item identifier) and a ‘Value’ which will be a list of identifiers that uniquely identifies those items.
preserve_ranks (list) – List of dictionaries of the form {"column": "col0", "values": ["value0", "value1"]}, where the ranks of the items in query_data are preserved for all items in "col0" with values "value0" and "value1". This is useful when the desired items are already being recommended in the desired order and their ranks must be kept unchanged while the recommendations are generated.
preserve_unknown_items (bool) – If True, items that are unknown to the model are not reranked and keep their original position in the query.
scaling_factors (list) – Allows you to bias the model towards certain items. Each element is a dictionary of the form {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}: "column" is the column name, "values" is the list of items towards which the recommendations should be biased, and "factor" is the multiplier applied to those items' scores. For example, with scaling_factors set to [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], the probability of every SUV and Sedan is multiplied by 1.4 before sorting. This is useful for promoting a less popular type of item or demoting an item that always comes up.
- Return type:
Dict
- get_ranked_items(deployment_token, deployment_id, query_data, preserve_ranks=None, preserve_unknown_items=False, score_field=None, scaling_factors=None, diversity_attribute_name=None, diversity_max_results_per_value=None)
Returns a list of re-ranked items for a selected user when a list of items is required to be reranked according to the user’s preferences. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary with two key-value pairs. The first pair represents a ‘Key’ where the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID uniquely identifies the user against whom a prediction is made and a ‘Value’ which is the identifier value for that user. The second pair will have a ‘Key’ which will be the name of the column name (e.g. movie_name) mapped to ITEM_ID (unique item identifier) and a ‘Value’ which will be a list of identifiers that uniquely identifies those items.
preserve_ranks (list) – List of dictionaries of the form {"column": "col0", "values": ["value0", "value1"]}, where the ranks of the items in query_data are preserved for all items in "col0" with values "value0" and "value1". This is useful when the desired items are already being recommended in the desired order and their ranks must be kept unchanged while the recommendations are generated.
preserve_unknown_items (bool) – If True, items that are unknown to the model are not reranked and keep their original position in the query.
score_field (str) – If provided, the relative item scores are returned in a separate field whose name is the value passed for this argument.
scaling_factors (list) – Allows you to bias the model towards certain items. Each element is a dictionary of the form {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}: "column" is the column name, "values" is the list of items towards which the recommendations should be biased, and "factor" is the multiplier applied to those items' scores. For example, with scaling_factors set to [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], the probability of every SUV and Sedan is multiplied by 1.4 before sorting. This is useful for promoting a less popular type of item or demoting an item that always comes up.
diversity_attribute_name (str) – item attribute column name which is used to ensure diversity of prediction results.
diversity_max_results_per_value (int) – maximum number of results per value of diversity_attribute_name.
- Return type:
Dict
- get_related_items(deployment_token, deployment_id, query_data, num_items=None, page=None, scaling_factors=None, restrict_items=None, exclude_items=None)
Returns a list of related items for a given item under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where the ‘key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which related items are determined and the ‘value’ will be the unique value of the same item. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.
num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.
page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.
scaling_factors (list) – Allows you to bias the model towards certain items. Each element is a dictionary of the form {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}: "column" is the column name, "values" is the list of items towards which the recommendations should be biased, and "factor" is the multiplier applied to those items' scores. For example, with scaling_factors set to [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], the probability of every SUV and Sedan is multiplied by 1.4 before sorting. This is useful for promoting a less popular type of item or demoting an item that always comes up.
restrict_items (list) – Allows you to restrict the recommendations to certain items. Each element is a dictionary of the form {"column": "col0", "values": ["value0", "value1", "value3", …]}: "column" is the column name and "values" is the list of items to which the recommendations are restricted. For example, with restrict_items set to [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], only SUVs and Sedans are recommended. This is useful when you know only a particular list of items is relevant in a given scenario.
exclude_items (list) – Allows you to exclude certain items from the recommendations. Each element is a dictionary of the form {"column": "col0", "values": ["value0", "value1", …]}: "column" is the column name and "values" is the list of items to exclude. For example, with exclude_items set to [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], all SUVs and Sedans are excluded from the results. This is useful when you know a particular list of items is of no use in a given scenario and you do not want to show them.
- Return type:
Dict
- get_chat_response(deployment_token, deployment_id, messages, llm_name=None, num_completion_tokens=None, system_message=None, temperature=0.0, filter_key_values=None, search_score_cutoff=None, chat_config=None, ignore_documents=False)
Return a chat response which continues the conversation based on the input messages and search results.
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
messages (list) – A list of chronologically ordered messages, starting with a user message and alternating sources. A message is a dict with attributes: is_user (bool): Whether the message is from the user. text (str): The message’s text.
llm_name (str) – Name of the specific LLM backend to use to power the chat experience
num_completion_tokens (int) – Default for maximum number of tokens for chat answers
system_message (str) – The generative LLM system message
temperature (float) – The generative LLM temperature
filter_key_values (dict) – A dictionary mapping column names to a list of values to restrict the retrieved search results.
search_score_cutoff (float) – Cutoff for the document retriever score. Matching search results below this score will be ignored.
chat_config (dict) – A dictionary specifying the query chat config override.
ignore_documents (bool) – If True, will ignore any documents and search results, and only use the messages to generate a response.
- Return type:
Dict
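Example (a single-turn chat request using the message format described above; the token, deployment ID, and question text are placeholders):

    from abacusai import PredictionClient

    client = PredictionClient()

    response = client.get_chat_response(
        deployment_token='DEPLOYMENT_TOKEN',  # placeholder token
        deployment_id='DEPLOYMENT_ID',        # placeholder deployment ID
        messages=[{'is_user': True, 'text': 'What does our refund policy say?'}],
        num_completion_tokens=512,
        temperature=0.0,
    )
    print(response)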
- get_conversation_response(deployment_id, message, deployment_conversation_id=None, external_session_id=None, llm_name=None, num_completion_tokens=None, system_message=None, temperature=0.0, filter_key_values=None, search_score_cutoff=None, chat_config=None, ignore_documents=False)
Return a conversation response which continues the conversation based on the input message and deployment conversation id (if exists).
- Parameters:
deployment_id (str) – The unique identifier to a deployment created under the project.
message (str) – A message from the user
deployment_conversation_id (str) – The unique identifier of a deployment conversation to continue. If not specified, a new one will be created.
external_session_id (str) – The user-supplied unique identifier of a deployment conversation to continue. If specified, it is used instead of an internal deployment conversation ID.
llm_name (str) – Name of the specific LLM backend to use to power the chat experience
num_completion_tokens (int) – Default for maximum number of tokens for chat answers
system_message (str) – The generative LLM system message
temperature (float) – The generative LLM temperature
filter_key_values (dict) – A dictionary mapping column names to a list of values to restrict the retrieved search results.
search_score_cutoff (float) – Cutoff for the document retriever score. Matching search results below this score will be ignored.
chat_config (dict) – A dictionary specifying the query chat config override.
ignore_documents (bool) – If True, will ignore any documents and search results, and only use the message and past conversation to generate a response.
- Return type:
Dict
- get_search_results(deployment_token, deployment_id, query_data, num=15)
Return the most relevant search results to the search query from the uploaded documents.
- Parameters:
deployment_token (str) – A token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it can be securely embedded in an application or website.
deployment_id (str) – A unique identifier of a deployment created under the project.
query_data (dict) – A dictionary where the key is “Content” and the value is the text from which entities are to be extracted.
num (int) – Number of search results to return.
- Return type:
Dict
- get_sentiment(deployment_token, deployment_id, document)
Predicts sentiment on a document
- Parameters:
deployment_token (str) – A token used to authenticate access to deployments created in this project. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for a deployment created under this project.
document (str) – The document to be analyzed for sentiment.
- Return type:
Dict
- get_entailment(deployment_token, deployment_id, document)
Predicts the classification of the document
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
document (str) – The document to be classified.
- Return type:
Dict
- get_classification(deployment_token, deployment_id, document)
Predicts the classification of the document
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
document (str) – The document to be classified.
- Return type:
Dict
- get_summary(deployment_token, deployment_id, query_data)
Returns a JSON of the predicted summary for the given document. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘text’ mapped to mapping ‘DOCUMENT’ in our system).
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – Raw data dictionary containing the required document data - must have a key ‘document’ corresponding to a DOCUMENT type text as value.
- Return type:
Dict
- predict_language(deployment_token, deployment_id, query_data)
Predicts the language of the text
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments within this project, making it safe to embed this model in an application or website.
deployment_id (str) – A unique string identifier for a deployment created under the project.
query_data (str) – The input string to detect.
- Return type:
Dict
- get_assignments(deployment_token, deployment_id, query_data, forced_assignments=None, solve_time_limit_seconds=None)
Get all positive assignments that match a query.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it can be safely embedded in an application or website.
deployment_id (str) – The unique identifier of a deployment created under the project.
query_data (dict) – Specifies the set of assignments being requested. The value for the key can be: 1. A simple scalar value, which is matched exactly 2. A list of values, which matches any element in the list 3. A dictionary with keys lower_in/lower_ex and upper_in/upper_ex, which matches values in an inclusive/exclusive range
forced_assignments (dict) – Set of assignments to force and resolve before returning query results.
solve_time_limit_seconds (float) – Maximum time in seconds to spend solving the query.
- Return type:
Dict
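Example (an assignment query that mixes an exact match, a list match, and a range, following the query_data description above; the identifiers and column names are placeholders):

    from abacusai import PredictionClient

    client = PredictionClient()

    assignments = client.get_assignments(
        deployment_token='DEPLOYMENT_TOKEN',  # placeholder token
        deployment_id='DEPLOYMENT_ID',        # placeholder deployment ID
        query_data={
            'region': 'EMEA',                 # scalar value: matched exactly
            'skill': ['python', 'sql'],       # list: matches any element in the list
            'start_date': {'lower_in': '2024-01-01', 'upper_ex': '2024-02-01'},  # inclusive/exclusive range
        },
        solve_time_limit_seconds=30.0,
    )
    print(assignments)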
- get_alternative_assignments(deployment_token, deployment_id, query_data, add_constraints=None, solve_time_limit_seconds=None)
Get alternative positive assignments for given query. Optimal assignments are ignored and the alternative assignments are returned instead.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it can be safely embedded in an application or website.
deployment_id (str) – The unique identifier of a deployment created under the project.
query_data (dict) – Specifies the set of assignments being requested. The value for the key can be: 1. A simple scalar value, which is matched exactly 2. A list of values, which matches any element in the list 3. A dictionary with keys lower_in/lower_ex and upper_in/upper_ex, which matches values in an inclusive/exclusive range
add_constraints (list) – List of constraints dict to apply to the query. The constraint dict should have the following keys: 1. query (dict): Specifies the set of assignments involved in the constraint. The format is same as query_data. 2. operator (str): Constraint operator ‘=’ or ‘<=’ or ‘>=’. 3. constant (int): Constraint RHS constant value.
solve_time_limit_seconds (float) – Maximum time in seconds to spend solving the query.
- Return type:
Dict
- check_constraints(deployment_token, deployment_id, query_data)
Check for any constraints violated by the overrides.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model within an application or website.
deployment_id (str) – The unique identifier for a deployment created under the project.
query_data (dict) – Assignment overrides to the solution.
- Return type:
Dict
- predict_with_binary_data(deployment_token, deployment_id, blob)
Make predictions for a given blob, e.g. image, audio
- Parameters:
deployment_token (str) – A token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model in an application or website.
deployment_id (str) – A unique identifier to a deployment created under the project.
blob (io.TextIOBase) – The multipart/form-data of the data.
- Return type:
Dict
- describe_image(deployment_token, deployment_id, image, categories, top_n=None)
Describe the similarity between an image and a list of categories.
- Parameters:
deployment_token (str) – Authentication token to access created deployments. This token is only authorized to predict on deployments in the current project, and can be safely embedded in an application or website.
deployment_id (str) – Unique identifier of a deployment created under the project.
image (io.TextIOBase) – Image to describe.
categories (list) – List of candidate categories to compare with the image.
top_n (int) – Return the N most similar categories.
- Return type:
Dict
- get_text_from_document(deployment_token, deployment_id, document=None, return_detected_images=False)
Generate text from a document
- Parameters:
deployment_token (str) – Authentication token to access created deployments. This token is only authorized to predict on deployments in the current project, and can be safely embedded in an application or website.
deployment_id (str) – Unique identifier of a deployment created under the project.
document (io.TextIOBase) – Input document which can be an image, pdf, or word document (Some formats might not be supported yet)
return_detected_images (bool) – Whether the detected images should be saved in the docstore. If True, a docstore ID is added to the response (may not be available for some algorithms).
- Return type:
Dict
- transcribe_audio(deployment_token, deployment_id, audio)
Transcribe the audio
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to make predictions on deployments in this project, so it can be safely embedded in an application or website.
deployment_id (str) – The unique identifier of a deployment created under the project.
audio (io.TextIOBase) – The audio to transcribe.
- Return type:
Dict
- classify_image(deployment_token, deployment_id, image=None, doc_id=None)
Classify an image.
- Parameters:
deployment_token (str) – A deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier to a deployment created under the project.
image (io.TextIOBase) – The binary data of the image to classify. One of image or doc_id must be specified.
doc_id (str) – The document ID of the image. One of image or doc_id must be specified.
- Return type:
Dict
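Example (classifying an image from a local file; the file path, token, and deployment ID are placeholders, and opening the file in binary mode is an assumption about how the blob should be supplied):

    from abacusai import PredictionClient

    client = PredictionClient()

    with open('cat.jpg', 'rb') as image_file:  # placeholder image path
        result = client.classify_image(
            deployment_token='DEPLOYMENT_TOKEN',  # placeholder token
            deployment_id='DEPLOYMENT_ID',        # placeholder deployment ID
            image=image_file,
        )
    print(result)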
- classify_pdf(deployment_token, deployment_id, pdf=None)
Returns a classification prediction from a PDF
- Parameters:
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model within an application or website.
deployment_id (str) – The unique identifier for a deployment created under the project.
pdf (io.TextIOBase) – (Optional) The pdf to predict on. One of pdf or docId must be specified.
- Return type:
Dict
- get_cluster(deployment_token, deployment_id, query_data)
Predicts the cluster for given data.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
query_data (dict) – A dictionary where each ‘key’ represents a column name and its corresponding ‘value’ represents the value of that column. For Timeseries Clustering, the ‘key’ should be ITEM_ID, and its value should represent a unique item ID that needs clustering.
- Return type:
Dict
- get_objects_from_image(deployment_token, deployment_id, image)
Detects objects in an image.
- Parameters:
deployment_token (str) – A deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier to a deployment created under the project.
image (io.TextIOBase) – The binary data of the image to detect objects from.
- Return type:
Dict
- score_image(deployment_token, deployment_id, image)
Scores an image.
- Parameters:
deployment_token (str) – A deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier to a deployment created under the project.
image (io.TextIOBase) – The binary data of the image to get the score.
- Return type:
Dict
- transfer_style(deployment_token, deployment_id, source_image, style_image)
Change the source image to adopt the visual style from the style image.
- Parameters:
deployment_token (str) – A token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model in an application or website.
deployment_id (str) – A unique identifier to a deployment created under the project.
source_image (io.TextIOBase) – The source image to which the style will be applied.
style_image (io.TextIOBase) – The reference image providing the style.
- Return type:
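Example (a minimal sketch with placeholder identifiers and file paths, assuming abacusai.PredictionClient; the shape of the returned value is not documented above, so none is assumed here):

    from abacusai import PredictionClient

    client = PredictionClient()
    with open('portrait.jpg', 'rb') as source, open('painting.jpg', 'rb') as style:
        result = client.transfer_style(
            deployment_token='YOUR_DEPLOYMENT_TOKEN',
            deployment_id='YOUR_DEPLOYMENT_ID',
            source_image=source,
            style_image=style,
        )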
- generate_image(deployment_token, deployment_id, query_data)
Generate an image from a text prompt.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model within an application or website.
deployment_id (str) – A unique identifier to a deployment created under the project.
query_data (dict) – Specifies the text prompt. For example, {‘prompt’: ‘a cat’}
- Return type:
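Example (a minimal sketch; identifiers are placeholders, query_data follows the format shown above, and abacusai.PredictionClient is assumed):

    from abacusai import PredictionClient

    client = PredictionClient()
    generated = client.generate_image(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'prompt': 'a cat'},  # text prompt, as in the documented example
    )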
- execute_agent(deployment_token, deployment_id, arguments=None, keyword_arguments=None)
Executes a deployed AI agent function using the arguments as keyword arguments to the agent execute function.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
arguments (list) – Positional arguments to the agent execute function.
keyword_arguments (dict) – A dictionary where each ‘key’ represents the parameter name and its corresponding ‘value’ represents the value of that parameter for the agent execute function.
- Return type:
Dict
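Example (a minimal sketch; the agent parameter name ‘question’ is hypothetical and depends on how the agent’s execute function was defined, and abacusai.PredictionClient is assumed):

    from abacusai import PredictionClient

    client = PredictionClient()
    response = client.execute_agent(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        keyword_arguments={'question': 'What is our refund policy?'},  # hypothetical agent argument
    )
    print(response)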
- execute_conversation_agent(deployment_token, deployment_id, arguments=None, keyword_arguments=None, deployment_conversation_id=None, external_session_id=None, regenerate=False)
Executes a deployed AI agent function using the arguments as keyword arguments to the agent execute function.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
arguments (list) – Positional arguments to the agent execute function.
keyword_arguments (dict) – A dictionary where each ‘key’ represents the parameter name and its corresponding ‘value’ represents the value of that parameter for the agent execute function.
deployment_conversation_id (str) – A unique string identifier for the deployment conversation used for the conversation.
external_session_id (str) – A unique string identifier for the session used for the conversation. If neither deployment_conversation_id nor external_session_id is provided, a new session will be created.
regenerate (bool) – If True, will regenerate the response from the last query.
- Return type:
Dict
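Example (a minimal sketch; the session ID and agent parameter name are hypothetical, and abacusai.PredictionClient is assumed):

    from abacusai import PredictionClient

    client = PredictionClient()
    # First turn, using a caller-managed session ID so later turns share the conversation.
    first = client.execute_conversation_agent(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        keyword_arguments={'question': 'Summarize the latest report.'},  # hypothetical agent argument
        external_session_id='session-123',
    )
    # Regenerate the response to the last query in the same session.
    retry = client.execute_conversation_agent(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        external_session_id='session-123',
        regenerate=True,
    )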
- execute_agent_with_binary_data(deployment_token, deployment_id, blob, arguments=None, keyword_arguments=None, deployment_conversation_id=None, external_session_id=None)
Executes a deployed AI agent function with binary data as inputs.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
blob (io.TextIOBase) – The multipart/form-data of the binary data.
arguments (list) – Positional arguments to the agent execute function.
keyword_arguments (dict) – A dictionary where each ‘key’ represents the parameter name and its corresponding ‘value’ represents the value of that parameter for the agent execute function.
deployment_conversation_id (str) – A unique string identifier for the deployment conversation used for the conversation.
external_session_id (str) – A unique string identifier for the session used for the conversation. If neither deployment_conversation_id nor external_session_id is provided, a new session will be created.
- Return type:
Dict
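Example (a minimal sketch; the file path and agent parameter name are hypothetical, and abacusai.PredictionClient is assumed):

    from abacusai import PredictionClient

    client = PredictionClient()
    with open('contract.pdf', 'rb') as blob_file:
        result = client.execute_agent_with_binary_data(
            deployment_token='YOUR_DEPLOYMENT_TOKEN',
            deployment_id='YOUR_DEPLOYMENT_ID',
            blob=blob_file,
            keyword_arguments={'task': 'extract_terms'},  # hypothetical agent argument
        )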
- lookup_matches(deployment_token, deployment_id, data=None, filters=None, num=None, result_columns=None, max_words=None, num_retrieval_margin_words=None, max_words_per_chunk=None)
Look up the deployed document retriever and return the documents matching the given query.
Original documents are split into chunks and stored in the document retriever. This lookup function returns the relevant chunks from the document retriever. Where the provided settings permit, the returned chunks may be expanded to include additional words from the original documents and merged when they overlap. The returned chunks are sorted by relevance.
- Parameters:
deployment_token (str) – The deployment token used to authenticate access to created deployments. This token is only authorized to predict on deployments within this project, making it safe to embed this model in an application or website.
deployment_id (str) – A unique string identifier for the deployment created under the project.
data (str) – The query to search for.
filters (dict) – A dictionary mapping column names to a list of values to restrict the retrieved search results.
num (int) – If provided, will limit the number of results to the value specified.
result_columns (list) – If provided, will limit the column properties present in each result to those specified in this list.
max_words (int) – If provided, will limit the total number of words in the results to the value specified.
num_retrieval_margin_words (int) – If provided, will add this number of words from left and right of the returned chunks.
max_words_per_chunk (int) – If provided, will limit the number of words in each chunk to the value specified. If the specified value is smaller than the stored chunk size (determined when the document retriever was created), the stored chunk size will be used instead; that is, chunks looked up from document retrievers will not be split into smaller chunks during lookup because of this setting.
- Returns:
The relevant document results found by the document retriever.
- Return type:
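Example (a minimal sketch; the query, filter column, and limits are placeholders, abacusai.PredictionClient is assumed, and the results are assumed to come back as an iterable of lookup results):

    from abacusai import PredictionClient

    client = PredictionClient()
    matches = client.lookup_matches(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        data='How do I rotate my API keys?',
        filters={'source': ['security-handbook']},  # hypothetical filter column and values
        num=5,                                      # return at most five chunks
        num_retrieval_margin_words=20,              # pad each chunk with surrounding words
    )
    for match in matches:
        print(match)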
- class abacusai.StreamingClient(client_options=None)
Bases:
abacusai.client.BaseApiClient
Abacus.AI Streaming API Client. Does not utilize authentication and only contains public streaming methods.
- Parameters:
client_options (ClientOptions) – Optional API client configurations
- upsert_item_embeddings(streaming_token, model_id, item_id, vector, catalog_id=None)
Upserts an embedding vector for an item ID for a given model ID.
- Parameters:
streaming_token (str) – The streaming token for authenticating requests to the model.
model_id (str) – A unique string identifier for the model to upsert item embeddings to.
item_id (str) – The item ID whose embeddings will be upserted.
vector (list) – The embedding vector.
catalog_id (str) – The name of the catalog in the model to update.
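Example (a minimal sketch; the streaming token, model ID, item ID, and vector values are placeholders):

    from abacusai import StreamingClient

    client = StreamingClient()
    client.upsert_item_embeddings(
        streaming_token='YOUR_STREAMING_TOKEN',
        model_id='YOUR_MODEL_ID',
        item_id='sku-0001',                # item whose embedding is being written
        vector=[0.12, -0.48, 0.33, 0.07],  # embedding vector for the item
    )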
- delete_item_embeddings(streaming_token, model_id, item_ids, catalog_id=None)
Deletes KNN embeddings for a list of item IDs for a given model ID.
- Parameters:
streaming_token (str) – The streaming token for authenticating requests to the model.
model_id (str) – A unique string identifier for the model from which to delete item embeddings.
item_ids (list) – A list of item IDs whose embeddings will be deleted.
catalog_id (str) – An optional name to specify which catalog in a model to update.
- upsert_multiple_item_embeddings(streaming_token, model_id, upserts, catalog_id=None)
Upserts KNN embeddings for multiple item IDs for a given model ID.
- Parameters:
streaming_token (str) – The streaming token for authenticating requests to the model.
model_id (str) – The unique string identifier of the model to upsert item embeddings to.
upserts (list) – A list of dictionaries of the form {‘itemId’: …, ‘vector’: […]} for each upsert.
catalog_id (str) – Name of the catalog in the model to update.
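Example (a minimal sketch; identifiers and vectors are placeholders, and each upsert follows the documented {‘itemId’: …, ‘vector’: […]} shape):

    from abacusai import StreamingClient

    client = StreamingClient()
    client.upsert_multiple_item_embeddings(
        streaming_token='YOUR_STREAMING_TOKEN',
        model_id='YOUR_MODEL_ID',
        upserts=[
            {'itemId': 'sku-0001', 'vector': [0.12, -0.48, 0.33]},
            {'itemId': 'sku-0002', 'vector': [0.91, 0.05, -0.27]},
        ],
    )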
- append_data(feature_group_id, streaming_token, data)
Appends new data to the feature group for a given lookup key recordId.
- upsert_multiple_data(feature_group_id, streaming_token, data)
Updates data in the feature group for a given lookup key recordId if the recordId is found; otherwise, inserts new data into the feature group.
- append_multiple_data(feature_group_id, streaming_token, data)
Appends new data to the feature group for a given lookup key recordId.
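Example (a minimal sketch of append_data; the same calling pattern applies to upsert_multiple_data and append_multiple_data. The feature group ID, streaming token, and record fields are placeholders, since the parameter details are not spelled out above):

    from abacusai import StreamingClient

    client = StreamingClient()
    client.append_data(
        feature_group_id='YOUR_FEATURE_GROUP_ID',
        streaming_token='YOUR_STREAMING_TOKEN',
        data={'recordId': 'user-42', 'event': 'page_view'},  # hypothetical record payload
    )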
- abacusai.__version__ = '0.77.8'