- builtins.object
    - gen_ai_hub.evaluations.client.EvaluationClient
    - gen_ai_hub.evaluations.models.artifact_source.ArtifactSource
    - gen_ai_hub.evaluations.models.dataset_config.Dataset
    - gen_ai_hub.evaluations.models.evaluation_config.EvaluationConfig
    - gen_ai_hub.evaluations.models.evaluation_run.EvaluationRun
    - gen_ai_hub.evaluations.models.evaluation_run.Results
    - gen_ai_hub.evaluations.models.metric_config.MetricConfig
    - gen_ai_hub.evaluations.models.metric_config.MetricRef
class ArtifactSource(builtins.object)
ArtifactSource(file_type: Literal['csv', 'json', 'jsonl'], artifact: Union[str, ai_api_client_sdk.models.artifact.Artifact], path: Optional[str] = None)
Extends an AI Core artifact with an optional relative path to the dataset file inside it, for use in an EvaluationConfig.
Example Usage:
>>> ArtifactSource(
...     artifact={
...         "id": "xyfz-rtyu-2456-ojns-yu6s",
...         "name": "dataset-artifact",
...         "url": "ai://default/eval_dataset",
...         # ... (other artifact fields)
...     },
...     path="rootfolder/data.csv",
...     file_type="csv"
... )
>>> ArtifactSource(
...     artifact="xyfz-rtyu-2456-ojns-yu6s",
...     path="rootfolder/data.json",
...     file_type="json"
... )
Methods defined here:
- __init__(self, file_type: Literal['csv', 'json', 'jsonl'], artifact: Union[str, ai_api_client_sdk.models.artifact.Artifact], path: Optional[str] = None)
- Parameters:
artifact(Union[str, Artifact]): Either the artifact ID as a string or an Artifact object from the AI API client SDK (ai_api_client_sdk).
path(Optional[str]): Path relative to the artifact's root; it must point to a single file.
file_type(Literal["csv", "json", "jsonl"]): Format of the dataset file; one of the supported file types.
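The parameter rules above can be illustrated with a small standalone sketch. `normalize_artifact_source` and `StandInArtifact` are hypothetical names invented here (the real class takes an `ai_api_client_sdk` Artifact), and rejecting absolute or folder-like paths is an assumed reading of "should point to a single file", not confirmed library behavior.

```python
from dataclasses import dataclass
from typing import Optional, Union

SUPPORTED_FILE_TYPES = ("csv", "json", "jsonl")


@dataclass
class StandInArtifact:
    """Hypothetical stand-in for the AI API client SDK's Artifact; only `id` is read."""
    id: str
    name: str = ""
    url: str = ""


def normalize_artifact_source(
    file_type: str,
    artifact: Union[str, StandInArtifact],
    path: Optional[str] = None,
) -> dict:
    """Hypothetical helper: check ArtifactSource-style arguments and flatten them."""
    if file_type not in SUPPORTED_FILE_TYPES:
        raise ValueError(f"file_type must be one of {SUPPORTED_FILE_TYPES}, got {file_type!r}")
    # Accept either a bare artifact ID or an object that carries an `id` attribute.
    artifact_id = artifact if isinstance(artifact, str) else artifact.id
    # Assumed rule: the path is relative and names one file, not a folder.
    if path is not None and (path.startswith("/") or path.endswith("/")):
        raise ValueError("path must be a relative path to a single file")
    return {"artifact_id": artifact_id, "path": path, "file_type": file_type}
```

Both call styles from the examples above (ID string or artifact object) reduce to the same flat record in this sketch.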
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
class Dataset(builtins.object)
Dataset(source: Union[str, pathlib.Path, gen_ai_hub.evaluations.models.artifact_source.ArtifactSource])
Dataset object for the evaluations flow.
The Dataset class accepts various source types for evaluation datasets including
local file paths (as strings or Path objects) or AI Core artifacts.
:param source: Source of the dataset - can be a file path string, Path object, or ArtifactSource
:type source: Union[str, Path, ArtifactSource]
**Examples**:
Using a Path object:
>>> Dataset(Path("data/sample.json"))
Using a string path:
>>> Dataset("data/sample.json")
Using an ArtifactSource with artifact dictionary:
>>> Dataset(
... ArtifactSource(
... artifact={
... "id": "xyfz-rtyu-2456-ojns-yu6s",
... "name": "dataset-artifact",
... "url": "ai://default/eval_dataset"
... },
... path="rootfolder/data.csv",
... file_type="csv"
... )
... )
Using an ArtifactSource with artifact ID:
>>> Dataset(
... ArtifactSource(
... artifact="xyfz-rtyu-2456-ojns-yu6s",
... path="rootfolder/data.csv",
... file_type="csv"
... )
... )
Methods defined here:
- __init__(self, source: Union[str, pathlib.Path, gen_ai_hub.evaluations.models.artifact_source.ArtifactSource])
- Initialize a Dataset instance.
:param source: Source of the dataset - can be a file path string, Path object, or ArtifactSource
:type source: Union[str, Path, ArtifactSource]
Readonly properties defined here:
- file_type
- Infer the file type from the source.
For ArtifactSource, returns the explicitly set file_type.
For file paths, infers the type from the file extension.
:return: File type (e.g., "json", "jsonl", "csv") or None if cannot be determined
:rtype: Optional[str]
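For local paths, the extension-based inference described above might look like the following sketch. `infer_file_type` is a hypothetical helper, not the library's property, and case-insensitive matching of the extension is an assumption.

```python
from pathlib import Path
from typing import Optional, Union

# Extension-to-type table; case-insensitive matching is an assumption of this sketch.
_EXTENSION_TO_TYPE = {".csv": "csv", ".json": "json", ".jsonl": "jsonl"}


def infer_file_type(source: Union[str, Path]) -> Optional[str]:
    """Return "csv", "json" or "jsonl" based on the file extension, else None."""
    suffix = Path(source).suffix.lower()
    return _EXTENSION_TO_TYPE.get(suffix)
```

For example, `infer_file_type("data/sample.json")` returns `"json"`, while a path with an unrecognized or missing extension returns `None`.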
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
class EvaluationClient(builtins.object)
EvaluationClient(base_url: str, auth_url: str = None, client_id: str = None, client_secret: str = None, cert_str: str = None, key_str: str = None, cert_file_path: str = None, key_file_path: str = None, resource_group: str = None, aws_access_key_id: str = None, aws_secret_access_key: str = None, ai_core_client: ai_core_sdk.ai_core_v2_client.AICoreV2Client = None, orchestration_url: str = None, input_object_store_secret_name: str = None, provider_name: str = 'aws')
Base client for the Evaluations service.
Methods defined here:
- __init__(self, base_url: str, auth_url: str = None, client_id: str = None, client_secret: str = None, cert_str: str = None, key_str: str = None, cert_file_path: str = None, key_file_path: str = None, resource_group: str = None, aws_access_key_id: str = None, aws_secret_access_key: str = None, ai_core_client: ai_core_sdk.ai_core_v2_client.AICoreV2Client = None, orchestration_url: str = None, input_object_store_secret_name: str = None, provider_name: str = 'aws')
- Create the root EvaluationClient object used for Evaluations.
:param base_url: Base URL of the AI Core instance (must include `/v2` suffix).
:type base_url: str
:param auth_url: Authentication URL used to retrieve access tokens.
:type auth_url: str, optional
:param client_id: OAuth client ID.
:type client_id: str, optional
:param client_secret: OAuth client secret.
:type client_secret: str, optional
:param cert_str: X.509 certificate content as a string.
:type cert_str: str, optional
:param key_str: X.509 private key content as a string.
:type key_str: str, optional
:param cert_file_path: File path to X.509 certificate.
:type cert_file_path: str, optional
:param key_file_path: File path to X.509 private key.
:type key_file_path: str, optional
:param resource_group: Resource group name within the AI Core instance.
:type resource_group: str, optional
:param aws_access_key_id: AWS access key ID.
:type aws_access_key_id: str, optional
:param aws_secret_access_key: AWS secret access key.
:type aws_secret_access_key: str, optional
:param ai_core_client: Pre-configured AI Core client instance.
:type ai_core_client: AICoreV2Client, optional
:param orchestration_url: Pre-existing orchestration deployment URL.
:type orchestration_url: str, optional
:param input_object_store_secret_name: Name of input object store secret.
:type input_object_store_secret_name: str, optional
:param provider_name: Hyperscaler provider name (e.g., "aws").
:type provider_name: str, optional
:raises ValueError: If required hyperscaler provider parameters are missing.
- __repr__(self)
- Return repr(self).
- create_or_update_object_store_secret(self, *, context, secret_body: dict, is_default: bool, result_key: str, attr_name: str, creator_mapping: dict, replace_existing: bool, result: dict)
- evaluate(self, evaluation_configs: List[gen_ai_hub.evaluations.models.evaluation_config.EvaluationConfig]) -> List[gen_ai_hub.evaluations.models.evaluation_run.EvaluationRun]
- Main entry point that creates the evaluation jobs for the given configurations.
Parameters:
evaluation_configs(List[EvaluationConfig]): A list of one or more of the EvaluationConfig objects
Returns:
List[EvaluationRun]: A list of EvaluationRun objects, one for each EvaluationConfig provided.
- get_system_supported_metrics(self) -> List[str]
- Helper method that returns the IDs of all system-supported metrics.
- list_available_models(self)
- List all available LLMs.
- resolve_orchestration_deployment_url(self) -> str
- Resolves the orchestration deployment URL.
For non-default resource groups, creates a new deployment.
For default resource group, attempts to discover existing deployment
with the default config name using the orchestration service,
or creates one if not found.
:return: The orchestration deployment URL.
:rtype: str
- setup(self, input_secret_body: dict | None = None, default_secret_body: dict | None = None, replace_existing: bool = False)
- One-time setup that creates the object store secrets and, if no orchestration
deployment URL was provided, creates the orchestration deployment URL.
- validate_secret_type(self, secret_type: str, creator_mapping: dict)
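The decision flow documented for `resolve_orchestration_deployment_url` can be sketched with injected callables so it runs without AI Core. `resolve_orchestration_url`, `find_existing`, `create_deployment`, the resource group name `"default"` and the config name `"default-orchestration-config"` are all hypothetical placeholders; the real method talks to the orchestration service directly.

```python
from typing import Callable, Optional

DEFAULT_RESOURCE_GROUP = "default"  # assumption: name of the default resource group


def resolve_orchestration_url(
    resource_group: str,
    find_existing: Callable[[str], Optional[str]],
    create_deployment: Callable[[str], str],
    default_config_name: str = "default-orchestration-config",  # hypothetical name
) -> str:
    """Sketch of the documented flow: reuse in the default group, otherwise create."""
    if resource_group != DEFAULT_RESOURCE_GROUP:
        # Non-default resource groups always get a new deployment.
        return create_deployment(resource_group)
    # Default resource group: look for a deployment of the default config first.
    existing = find_existing(default_config_name)
    if existing is not None:
        return existing
    # Nothing found, so fall back to creating one.
    return create_deployment(resource_group)
```

Injecting the lookup and creation steps keeps the branching testable on its own; only `create_deployment` would have side effects in a real setup.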
Static methods defined here:
- from_env(profile_name: str = None, **kwargs)
- Alternative way to create an EvaluationClient object.
Parameter resolution precedence:
1. Explicit keyword arguments
2. Environment variables
3. Configuration file
4. VCAP_SERVICES environment variable
:param profile_name: Profile name defined in configuration.
:type profile_name: str, optional
:param kwargs: Additional parameters passed to constructor.
:return: Configured EvaluationClient instance.
:rtype: EvaluationClient
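The four-level precedence listed for `from_env` can be sketched per parameter as below. `resolve_param` is hypothetical, the `AICORE_<NAME>` environment variable pattern is an assumed naming scheme, and the VCAP_SERVICES lookup assumes credential keys match the parameter names, which real service bindings may not.

```python
import json
import os
from typing import Any, Mapping, Optional


def resolve_param(
    name: str,
    kwargs: Mapping[str, Any],
    config_file: Mapping[str, Any],
    env: Mapping[str, str] = os.environ,
    env_var: Optional[str] = None,
) -> Optional[Any]:
    """Resolve one parameter: kwargs, then env vars, then config file, then VCAP_SERVICES."""
    # 1. Explicit keyword arguments win.
    if kwargs.get(name) is not None:
        return kwargs[name]
    # 2. Environment variables (the AICORE_<NAME> pattern is an assumption).
    env_name = env_var or f"AICORE_{name.upper()}"
    if env.get(env_name):
        return env[env_name]
    # 3. Values already loaded from the selected profile in the configuration file.
    if config_file.get(name) is not None:
        return config_file[name]
    # 4. VCAP_SERVICES: {service_label: [{"credentials": {...}}, ...]}.
    vcap = env.get("VCAP_SERVICES")
    if vcap:
        for bindings in json.loads(vcap).values():
            for binding in bindings:
                credentials = binding.get("credentials", {})
                if credentials.get(name) is not None:
                    return credentials[name]
    return None
```

Passing `env` explicitly, rather than always reading `os.environ`, makes the precedence easy to check in isolation.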
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
class EvaluationConfig(builtins.object)
EvaluationConfig(dataset_config: gen_ai_hub.evaluations.models.dataset_config.Dataset, metrics: List[gen_ai_hub.evaluations.models.metric_config.MetricConfig], llm: Optional[gen_ai_hub.orchestration_v2.models.llm_model_details.LLMModelDetails] = None, template: Union[str, gen_ai_hub.prompt_registry.models.prompt_template.PromptTemplateSpec, gen_ai_hub.orchestration_v2.models.template_ref.TemplateRef, NoneType] = None, orchestration_registry_reference: Optional[str] = None, template_variable_mapping: Optional[dict] = None, test_row_count: Optional[int] = -1, repetitions: Optional[int] = 1, tags: Optional[dict] = '{}', debug_mode: Optional[bool] = False)
Defines the evaluation configuration object for the Evaluations flow.
This class encapsulates all configuration parameters needed to run an evaluation job,
including the model/template configuration, dataset, metrics, and execution settings.
At least one of the following must be provided:
- ``llm`` and ``template`` combination (using orchestration_v2 models)
- ``orchestration_registry_reference`` (UUID of a registered orchestration configuration)
:param dataset_config: Dataset configuration object specifying the evaluation dataset
:type dataset_config: Dataset
:param metrics: List of metric configurations for evaluation
:type metrics: List[MetricConfig]
:param llm: LLM configuration from orchestration_v2 (LLMModelDetails)
:type llm: Optional[LLM]
:param template: Prompt template as string, PromptTemplateSpec, or TemplateRef
:type template: Optional[Union[str, PromptTemplateSpec, TemplateRef]]
:param orchestration_registry_reference: UUID of registered orchestration configuration
:type orchestration_registry_reference: Optional[str]
:param template_variable_mapping: Variable mapping for the prompt template
:type template_variable_mapping: Optional[dict]
:param test_row_count: Number of rows to sample from dataset (-1 for all rows), defaults to -1
:type test_row_count: Optional[int]
:param repetitions: Number of times to repeat evaluation over the dataset, defaults to 1
:type repetitions: Optional[int]
:param tags: User-defined metadata as key-value pairs, defaults to "{}"
:type tags: Optional[dict]
:param debug_mode: Enable debug logs in hyperscaler output path, defaults to False
:type debug_mode: Optional[bool]
.. note::
This module uses orchestration_v2 models directly.
**Example using TemplateRef with ID**:
>>> from gen_ai_hub.evaluations.models import EvaluationConfig, Dataset, MetricConfig
>>> from gen_ai_hub.evaluations.models.metric_config import MetricRef
>>> from gen_ai_hub.orchestration_v2.models.llm_model_details import LLMModelDetails as LLM
>>> from gen_ai_hub.orchestration_v2.models.template_ref import TemplateRef, TemplateRefByID
>>> config = EvaluationConfig(
... dataset_config=Dataset("data/test.jsonl"),
... metrics=[MetricConfig(reference=MetricRef(name="accuracy"))],
... llm=LLM(name="gpt-4", version="latest"),
... template=TemplateRef(template_ref=TemplateRefByID(id="template-id-here")),
... test_row_count=100
... )
**Example using TemplateRef with scenario/name/version**:
>>> from gen_ai_hub.orchestration_v2.models.template_ref import TemplateRefByScenarioNameVersion
>>> config = EvaluationConfig(
... dataset_config=Dataset("data/test.jsonl"),
... metrics=[MetricConfig(reference=MetricRef(name="accuracy"))],
... llm=LLM(name="gpt-4", version="latest", params={"temperature": 0.7}),
... template=TemplateRef(template_ref=TemplateRefByScenarioNameVersion(
... scenario="foundation-models", name="prompt1", version="1.0"
... )),
... test_row_count=100
... )
Methods defined here:
- __init__(self, dataset_config: gen_ai_hub.evaluations.models.dataset_config.Dataset, metrics: List[gen_ai_hub.evaluations.models.metric_config.MetricConfig], llm: Optional[gen_ai_hub.orchestration_v2.models.llm_model_details.LLMModelDetails] = None, template: Union[str, gen_ai_hub.prompt_registry.models.prompt_template.PromptTemplateSpec, gen_ai_hub.orchestration_v2.models.template_ref.TemplateRef, NoneType] = None, orchestration_registry_reference: Optional[str] = None, template_variable_mapping: Optional[dict] = None, test_row_count: Optional[int] = -1, repetitions: Optional[int] = 1, tags: Optional[dict] = '{}', debug_mode: Optional[bool] = False)
- Initialize an EvaluationConfig instance.
:param dataset_config: Dataset configuration object
:type dataset_config: Dataset
:param metrics: List of metric configurations
:type metrics: List[MetricConfig]
:param llm: LLM object from orchestration_v2 (LLMModelDetails), defaults to None
:type llm: Optional[LLM]
:param template: Prompt template (string, PromptTemplateSpec, or TemplateRef), defaults to None
:type template: Optional[Union[str, PromptTemplateSpec, TemplateRef]]
:param orchestration_registry_reference: UUID of orchestration config, defaults to None
:type orchestration_registry_reference: Optional[str]
:param template_variable_mapping: Variable mapping for prompt template, defaults to None
:type template_variable_mapping: Optional[dict]
:param test_row_count: Number of dataset rows to sample (-1 for all), defaults to -1
:type test_row_count: Optional[int]
:param repetitions: Number of evaluation repetitions (minimum: 1), defaults to 1
:type repetitions: Optional[int]
:param tags: Key-value metadata pairs applied to all runs, defaults to "{}"
:type tags: Optional[dict]
:param debug_mode: Enable debug logging, defaults to False
:type debug_mode: Optional[bool]
:raises ValueError: If neither (llm, template) nor orchestration_registry_reference is provided
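The constructor's documented requirement (either `llm` plus `template`, or `orchestration_registry_reference`) can be expressed as a standalone check. `validate_evaluation_inputs` is a hypothetical helper; the `repetitions >= 1` rule comes from the docstring, while rejecting `test_row_count` values other than -1 or a positive integer is an assumption.

```python
from typing import Optional


def validate_evaluation_inputs(
    llm: Optional[object],
    template: Optional[object],
    orchestration_registry_reference: Optional[str],
    test_row_count: int = -1,
    repetitions: int = 1,
) -> None:
    """Raise ValueError if the inputs cannot describe a valid evaluation."""
    has_inline_config = llm is not None and template is not None
    if not has_inline_config and orchestration_registry_reference is None:
        raise ValueError("Provide both llm and template, or orchestration_registry_reference")
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    # Assumption: -1 means "all rows"; any other value must be a positive row count.
    if test_row_count != -1 and test_row_count < 1:
        raise ValueError("test_row_count must be -1 (all rows) or a positive integer")
```

Because the docstring says "at least one", supplying both the inline pair and a registry reference passes this check.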
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
class EvaluationRun(builtins.object)
EvaluationRun(run_id: str, execution_id: str, ai_core_client: ai_core_sdk.ai_core_v2_client.AICoreV2Client, configuration_id: str = None, artifact_id: str = None, resource_group: str = None, object_store_credentials: gen_ai_hub.evaluations._internal._models._AWSObjectStoreData = None, metrics_list: List[str] = None)
Represents an individual EvaluationRun object and its associated context.
:param run_id: Unique identifier for the evaluation run
:type run_id: str
:param execution_id: ID of the AI Core execution
:type execution_id: str
:param ai_core_client: AI Core client instance
:type ai_core_client: AICoreV2Client
:param configuration_id: ID of the configuration, defaults to None
:type configuration_id: str
:param artifact_id: ID of the artifact, defaults to None
:type artifact_id: str
:param resource_group: Resource group name, defaults to None
:type resource_group: str
:param object_store_credentials: Object store credentials, defaults to None
:type object_store_credentials: _AWSObjectStoreData
:param metrics_list: List of metrics to evaluate, defaults to None
:type metrics_list: List[str]
Methods defined here:
- __init__(self, run_id: str, execution_id: str, ai_core_client: ai_core_sdk.ai_core_v2_client.AICoreV2Client, configuration_id: str = None, artifact_id: str = None, resource_group: str = None, object_store_credentials: gen_ai_hub.evaluations._internal._models._AWSObjectStoreData = None, metrics_list: List[str] = None)
- Initialize self. See help(type(self)) for accurate signature.
- get_current_status(self)
- Get the current status of the evaluation run.
:return: Current status of the run
:rtype: Status
:raises ValueError: If failed to retrieve the current status
- get_debug_info(self) -> gen_ai_hub.evaluations.models.evaluation_run.ExecutionStatusDetails
- Return debug information when the execution status is FAILED or DEAD.
:return: Execution status details including failed pod information
:rtype: ExecutionStatusDetails
- get_debug_logs(self)
- Get the complete trace of execution logs.
:return: List of log entries as dictionaries
:rtype: list
- load_results_tables(self)
- Download results from S3 and load the required table data.
:return: Dictionary containing completions and metrics table data
:rtype: dict
:raises RuntimeError: If failed to download results
- results(self)
- Get the results of the evaluation run.
:return: Results object for accessing completion and metric results
:rtype: Results
:raises ValueError: If execution is not completed
- set_cached_results_data(self, data)
- Set the cached results data from the child results class.
:param data: Results data to cache
:type data: Any
- wait_for_completion(self, timeout: Optional[int] = None)
- Wait for the evaluation run to complete by polling status.
:param timeout: Maximum time to wait in seconds; if None, the documented default of 3600 seconds (1 hour) applies
:type timeout: Optional[int]
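A polling loop in the spirit of `wait_for_completion` is sketched below. `wait_until_terminal` is hypothetical, statuses are plain strings here (the real `get_current_status` returns a `Status` object), the terminal set `{"COMPLETED", "DEAD", "STOPPED"}` and the 10-second interval are assumptions, and `None` is treated as the documented 3600-second default.

```python
import time
from typing import Callable, Optional

TERMINAL_STATUSES = {"COMPLETED", "DEAD", "STOPPED"}  # assumed terminal states


def wait_until_terminal(
    get_status: Callable[[], str],
    timeout: Optional[float] = None,
    poll_interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status until a terminal status appears or the timeout expires."""
    timeout = 3600 if timeout is None else timeout
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run still {status} after {timeout} seconds")
        sleep(poll_interval)
```

Using `time.monotonic` avoids surprises from wall-clock adjustments, and injecting `sleep` and `clock` lets the loop be exercised without real waiting.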
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
class MetricConfig(builtins.object)
MetricConfig(reference: gen_ai_hub.evaluations.models.metric_config.MetricRef, variable_mapping: dict = None)
Defines a metric configuration for the evaluation flow.
Parameters:
reference(MetricRef): Reference to the metric to evaluate, identified by name, by UUID (`id`), or by scenario/name/version.
variable_mapping(Optional[dict]): Any variable mapping associated with the metric.
Methods defined here:
- __init__(self, reference: gen_ai_hub.evaluations.models.metric_config.MetricRef, variable_mapping: dict = None)
- Initialize self. See help(type(self)) for accurate signature.
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
class MetricRef(builtins.object)
MetricRef(scenario: str = None, name: str = None, version: str = None, id: str = None)
Represents a reference to a specific metric definition.
A metric can be identified in multiple ways:
- By its UUID from metric management service (`id`)
- By name (`name`)
- By a combination of scenario, name, and version (`scenario`, `name`, `version`)
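The three identification modes listed above can be modeled with a small dataclass. `MetricRefSketch` and its `mode` method are hypothetical, and both the precedence of `id` over the other fields and the rejection of partial scenario/name/version combinations are assumptions rather than documented behavior of `MetricRef`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricRefSketch:
    scenario: Optional[str] = None
    name: Optional[str] = None
    version: Optional[str] = None
    id: Optional[str] = None

    def mode(self) -> str:
        """Report which identification mode the populated fields describe."""
        # Assumption: an explicit UUID takes precedence over the other fields.
        if self.id is not None:
            return "id"
        if self.scenario is not None and self.name is not None and self.version is not None:
            return "scenario/name/version"
        if self.name is not None and self.scenario is None and self.version is None:
            return "name"
        raise ValueError("Set id, name, or all of scenario, name and version")
```

For example, `MetricRefSketch(name="accuracy").mode()` reports `"name"`, while setting only `scenario` and `name` raises `ValueError`.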
Methods defined here:
- __init__(self, scenario: str = None, name: str = None, version: str = None, id: str = None)
- Initialize self. See help(type(self)) for accurate signature.
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
class Results(builtins.object)
Results(run: gen_ai_hub.evaluations.models.evaluation_run.EvaluationRun)
Represents the Results handler for an EvaluationRun object.
This class provides methods to access completion results, metric results,
and aggregated results for a specific evaluation run.
:param run: The parent EvaluationRun object
:type run: EvaluationRun
Methods defined here:
- __init__(self, run: gen_ai_hub.evaluations.models.evaluation_run.EvaluationRun)
- Initialize self. See help(type(self)) for accurate signature.
- aggregations(self)
- Get the aggregated results for the run from the tracking service.
:return: JSON response containing aggregated metric results
:rtype: dict
:raises ValueError: If error occurs while fetching aggregation results
- completions(self)
- Get the completion results for the run.
:return: DataFrame containing completion results for the run
:rtype: pd.DataFrame
:raises ValueError: If error occurs while fetching completions
- metrics(self)
- Get the metric-level results for the run.
:return: DataFrame containing metric results for the run
:rtype: pd.DataFrame
:raises ValueError: If error occurs while fetching metric results
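To show how row-level metric results relate to aggregated ones, here is a minimal stdlib sketch that averages scores per metric. `aggregate_metric_scores` and the `"metric"`/`"score"` column names are hypothetical. The real `aggregations()` returns a JSON response from the tracking service whose schema and statistics are not documented here, so the mean is only an illustrative choice.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Mapping


def aggregate_metric_scores(rows: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Average the per-row scores for each metric (column names are assumed)."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        scores[str(row["metric"])].append(float(row["score"]))
    return {metric: mean(values) for metric, values in scores.items()}
```

With a pandas DataFrame from `metrics()`, an analogous result would typically come from a `groupby` on the metric column, provided the real column names are checked first.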
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)