| | |
- builtins.object
  - gen_ai_hub.evaluations.models.artifact_source.ArtifactSource
  - gen_ai_hub.evaluations.models.dataset_config.Dataset
  - gen_ai_hub.evaluations.models.evaluation_config.EvaluationConfig
  - gen_ai_hub.evaluations.models.evaluation_run.EvaluationRun
  - gen_ai_hub.evaluations.models.evaluation_run.Results
  - gen_ai_hub.evaluations.models.metric_config.MetricConfig
  - gen_ai_hub.evaluations.models.metric_config.MetricRef
class ArtifactSource(builtins.object) |
| |
ArtifactSource(file_type: Literal['csv', 'json', 'jsonl'], artifact: Union[str, ai_api_client_sdk.models.artifact.Artifact], path: Optional[str] = None)
Wraps an artifact together with an optional relative path inside it, for use with EvaluationConfig.
Example Usage:
>>> ArtifactSource(
    artifact={
        "id": "xyfz-rtyu-2456-ojns-yu6s",
        "name": "dataset-artifact",
        "url": "ai://default/eval_dataset"
        ...
    },
    path="rootfolder/data.csv",
    file_type="csv"
)
>>> ArtifactSource(
    artifact="xyfz-rtyu-2456-ojns-yu6s",
    path="rootfolder/data.json",
    file_type="json"
) |
| |
Methods defined here:
- __init__(self, file_type: Literal['csv', 'json', 'jsonl'], artifact: Union[str, ai_api_client_sdk.models.artifact.Artifact], path: Optional[str] = None)
- Parameters:
artifact(Union[str, Artifact]): Either the artifact ID as a string or an Artifact object from the AI API client SDK.
path(Optional[str]): Relative path within the artifact; it should point to a single file.
file_type(Literal["csv", "json", "jsonl"]): One of the supported file types.
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class Dataset(builtins.object) |
| |
Dataset(source: Union[str, pathlib.Path, gen_ai_hub.evaluations.models.artifact_source.ArtifactSource])
Dataset object for the evaluations flow.
The Dataset class accepts various source types for evaluation datasets including
local file paths (as strings or Path objects) or AI Core artifacts.
:param source: Source of the dataset - can be a file path string, Path object, or ArtifactSource
:type source: Union[str, Path, ArtifactSource]
**Examples**:
Using a Path object:
>>> Dataset(Path("data/sample.json"))
Using a string path:
>>> Dataset("data/sample.json")
Using an ArtifactSource with artifact dictionary:
>>> Dataset(
... ArtifactSource(
... artifact={
... "id": "xyfz-rtyu-2456-ojns-yu6s",
... "name": "dataset-artifact",
... "url": "ai://default/eval_dataset"
... },
... path="rootfolder/data.csv",
... file_type="csv"
... )
... )
Using an ArtifactSource with artifact ID:
>>> Dataset(
... ArtifactSource(
... artifact="xyfz-rtyu-2456-ojns-yu6s",
... path="rootfolder/data.csv",
... file_type="csv"
... )
... ) |
| |
Methods defined here:
- __init__(self, source: Union[str, pathlib.Path, gen_ai_hub.evaluations.models.artifact_source.ArtifactSource])
- Initialize a Dataset instance.
:param source: Source of the dataset - can be a file path string, Path object, or ArtifactSource
:type source: Union[str, Path, ArtifactSource]
Readonly properties defined here:
- file_type
- Infer the file type from the source.
For ArtifactSource, returns the explicitly set file_type.
For file paths, infers the type from the file extension.
:return: File type (e.g., "json", "jsonl", "csv") or None if cannot be determined
:rtype: Optional[str]
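The extension-based inference described for file paths can be sketched as follows. This is an illustration only, not the SDK's implementation: the helper name `infer_file_type` and the constant `SUPPORTED_FILE_TYPES` are hypothetical, and the set of accepted types is assumed from the `ArtifactSource` signature.

```python
from pathlib import Path
from typing import Optional, Union

# Assumed from the Literal['csv', 'json', 'jsonl'] in the ArtifactSource signature.
SUPPORTED_FILE_TYPES = {"csv", "json", "jsonl"}


def infer_file_type(source: Union[str, Path]) -> Optional[str]:
    """Return the type implied by the file extension, or None if unrecognised."""
    # Path.suffix yields e.g. ".json"; normalise case and drop the leading dot.
    suffix = Path(source).suffix.lower().lstrip(".")
    return suffix if suffix in SUPPORTED_FILE_TYPES else None
```

A path without an extension, or with an extension outside the assumed set, yields None in this sketch, matching the documented "None if cannot be determined" case.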
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class EvaluationConfig(builtins.object) |
| |
EvaluationConfig(dataset_config: gen_ai_hub.evaluations.models.dataset_config.Dataset, metrics: List[gen_ai_hub.evaluations.models.metric_config.MetricConfig], llm: Optional[gen_ai_hub.orchestration_v2.models.llm_model_details.LLMModelDetails] = None, template: Union[str, gen_ai_hub.prompt_registry.models.prompt_template.PromptTemplateSpec, gen_ai_hub.orchestration_v2.models.template_ref.TemplateRef, NoneType] = None, orchestration_registry_reference: Optional[str] = None, template_variable_mapping: Optional[dict] = None, test_row_count: Optional[int] = -1, repetitions: Optional[int] = 1, tags: Optional[dict] = '{}', debug_mode: Optional[bool] = False)
Defines the evaluation configuration object for the Evaluations flow.
This class encapsulates all configuration parameters needed to run an evaluation job,
including the model/template configuration, dataset, metrics, and execution settings.
At least one of the following must be provided:
- ``llm`` and ``template`` combination (using orchestration_v2 models)
- ``orchestration_registry_reference`` (UUID of a registered orchestration configuration)
:param dataset_config: Dataset configuration object specifying the evaluation dataset
:type dataset_config: Dataset
:param metrics: List of metric configurations for evaluation
:type metrics: List[MetricConfig]
:param llm: LLM configuration from orchestration_v2 (LLMModelDetails)
:type llm: Optional[LLM]
:param template: Prompt template as string, PromptTemplateSpec, or TemplateRef
:type template: Optional[Union[str, PromptTemplateSpec, TemplateRef]]
:param orchestration_registry_reference: UUID of registered orchestration configuration
:type orchestration_registry_reference: Optional[str]
:param template_variable_mapping: Variable mapping for the prompt template
:type template_variable_mapping: Optional[dict]
:param test_row_count: Number of rows to sample from dataset (-1 for all rows), defaults to -1
:type test_row_count: Optional[int]
:param repetitions: Number of times to repeat evaluation over the dataset, defaults to 1
:type repetitions: Optional[int]
:param tags: User-defined metadata as key-value pairs, defaults to "{}"
:type tags: Optional[dict]
:param debug_mode: Enable debug logs in hyperscaler output path, defaults to False
:type debug_mode: Optional[bool]
.. note::
This module uses orchestration_v2 models directly.
**Example using TemplateRef with ID**:
>>> from gen_ai_hub.evaluations.models import EvaluationConfig, Dataset, MetricConfig
>>> from gen_ai_hub.evaluations.models.metric_config import MetricRef
>>> from gen_ai_hub.orchestration_v2.models.llm_model_details import LLMModelDetails as LLM
>>> from gen_ai_hub.orchestration_v2.models.template_ref import TemplateRef, TemplateRefByID
>>> config = EvaluationConfig(
... dataset_config=Dataset("data/test.jsonl"),
... metrics=[MetricConfig(reference=MetricRef(name="accuracy"))],
... llm=LLM(name="gpt-4", version="latest"),
... template=TemplateRef(template_ref=TemplateRefByID(id="template-id-here")),
... test_row_count=100
... )
**Example using TemplateRef with scenario/name/version**:
>>> from gen_ai_hub.evaluations.models.metric_config import MetricRef
>>> from gen_ai_hub.orchestration_v2.models.template_ref import TemplateRefByScenarioNameVersion
>>> config = EvaluationConfig(
... dataset_config=Dataset("data/test.jsonl"),
... metrics=[MetricConfig(reference=MetricRef(name="accuracy"))],
... llm=LLM(name="gpt-4", version="latest", params={"temperature": 0.7}),
... template=TemplateRef(template_ref=TemplateRefByScenarioNameVersion(
... scenario="foundation-models", name="prompt1", version="1.0"
... )),
... test_row_count=100
... ) |
| |
Methods defined here:
- __init__(self, dataset_config: gen_ai_hub.evaluations.models.dataset_config.Dataset, metrics: List[gen_ai_hub.evaluations.models.metric_config.MetricConfig], llm: Optional[gen_ai_hub.orchestration_v2.models.llm_model_details.LLMModelDetails] = None, template: Union[str, gen_ai_hub.prompt_registry.models.prompt_template.PromptTemplateSpec, gen_ai_hub.orchestration_v2.models.template_ref.TemplateRef, NoneType] = None, orchestration_registry_reference: Optional[str] = None, template_variable_mapping: Optional[dict] = None, test_row_count: Optional[int] = -1, repetitions: Optional[int] = 1, tags: Optional[dict] = '{}', debug_mode: Optional[bool] = False)
- Initialize an EvaluationConfig instance.
:param dataset_config: Dataset configuration object
:type dataset_config: Dataset
:param metrics: List of metric configurations
:type metrics: List[MetricConfig]
:param llm: LLM object from orchestration_v2 (LLMModelDetails), defaults to None
:type llm: Optional[LLM]
:param template: Prompt template (string, PromptTemplateSpec, or TemplateRef), defaults to None
:type template: Optional[Union[str, PromptTemplateSpec, TemplateRef]]
:param orchestration_registry_reference: UUID of orchestration config, defaults to None
:type orchestration_registry_reference: Optional[str]
:param template_variable_mapping: Variable mapping for prompt template, defaults to None
:type template_variable_mapping: Optional[dict]
:param test_row_count: Number of dataset rows to sample (-1 for all), defaults to -1
:type test_row_count: Optional[int]
:param repetitions: Number of evaluation repetitions (minimum: 1), defaults to 1
:type repetitions: Optional[int]
:param tags: Key-value metadata pairs applied to all runs, defaults to "{}"
:type tags: Optional[dict]
:param debug_mode: Enable debug logging, defaults to False
:type debug_mode: Optional[bool]
:raises ValueError: If neither (llm, template) nor orchestration_registry_reference is provided
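The rule behind this ValueError (either `llm` together with `template`, or `orchestration_registry_reference`) can be sketched as a standalone check. The function `check_model_source` is hypothetical and only mirrors the documented condition; the SDK's actual validation may differ in detail.

```python
from typing import Any, Optional


def check_model_source(
    llm: Optional[Any] = None,
    template: Optional[Any] = None,
    orchestration_registry_reference: Optional[str] = None,
) -> None:
    """Raise ValueError unless one of the two documented sources is complete."""
    has_inline = llm is not None and template is not None
    has_reference = orchestration_registry_reference is not None
    if not (has_inline or has_reference):
        raise ValueError(
            "Provide either both llm and template, "
            "or orchestration_registry_reference"
        )
```

Note that supplying only `llm` or only `template` does not satisfy the check in this sketch, since the text above describes them as a combination.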
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class EvaluationRun(builtins.object) |
| |
EvaluationRun(run_id: str, execution_id: str, ai_core_client: ai_core_sdk.ai_core_v2_client.AICoreV2Client, configuration_id: str = None, artifact_id: str = None, resource_group: str = None, object_store_credentials: gen_ai_hub.evaluations._internal._models._AWSObjectStoreData = None, metrics_list: List[str] = None)
Represents an individual EvaluationRun object and its associated context.
:param run_id: Unique identifier for the evaluation run
:type run_id: str
:param execution_id: ID of the AI Core execution
:type execution_id: str
:param ai_core_client: AI Core client instance
:type ai_core_client: AICoreV2Client
:param configuration_id: ID of the configuration, defaults to None
:type configuration_id: str
:param artifact_id: ID of the artifact, defaults to None
:type artifact_id: str
:param resource_group: Resource group name, defaults to None
:type resource_group: str
:param object_store_credentials: Object store credentials, defaults to None
:type object_store_credentials: _AWSObjectStoreData
:param metrics_list: List of metrics to evaluate, defaults to None
:type metrics_list: List[str] |
| |
Methods defined here:
- __init__(self, run_id: str, execution_id: str, ai_core_client: ai_core_sdk.ai_core_v2_client.AICoreV2Client, configuration_id: str = None, artifact_id: str = None, resource_group: str = None, object_store_credentials: gen_ai_hub.evaluations._internal._models._AWSObjectStoreData = None, metrics_list: List[str] = None)
- Initialize self. See help(type(self)) for accurate signature.
- get_current_status(self)
- Get the current status of the evaluation run.
:return: Current status of the run
:rtype: Status
:raises ValueError: If failed to retrieve the current status
- get_debug_info(self) -> gen_ai_hub.evaluations.models.evaluation_run.ExecutionStatusDetails
- Provide debug information when execution status is FAILED or DEAD.
:return: Execution status details including failed pod information
:rtype: ExecutionStatusDetails
- get_debug_logs(self)
- Get the complete trace of execution logs.
:return: List of log entries as dictionaries
:rtype: list
- load_results_tables(self)
- Download results from S3 and load the required table data.
:return: Dictionary containing completions and metrics table data
:rtype: dict
:raises RuntimeError: If failed to download results
- results(self)
- Get the results of the evaluation run.
:return: Results object for accessing completion and metric results
:rtype: Results
:raises ValueError: If execution is not completed
- set_cached_results_data(self, data)
- Set the cached results data from the child results class.
:param data: Results data to cache
:type data: Any
- wait_for_completion(self, timeout: Optional[int] = None)
- Wait for the evaluation run to complete by polling status.
:param timeout: Maximum time to wait in seconds, defaults to 3600 (1 hour)
:type timeout: Optional[int]
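Status polling with a timeout, as described for wait_for_completion, might look like the sketch below. It is not the SDK's code. The names `poll_until_done`, `TERMINAL_STATES`, and `interval` are hypothetical; FAILED and DEAD come from the get_debug_info description, while COMPLETED is an assumed status name. The sketch also assumes that a timeout of None falls back to the 3600 seconds mentioned in the docstring.

```python
import time
from typing import Callable, Optional

# FAILED and DEAD are named in this reference; COMPLETED is an assumption.
TERMINAL_STATES = {"COMPLETED", "FAILED", "DEAD"}


def poll_until_done(
    get_status: Callable[[], str],
    timeout: Optional[float] = None,
    interval: float = 5.0,
) -> str:
    """Call get_status() until a terminal state is seen or the timeout elapses."""
    limit = 3600.0 if timeout is None else timeout  # assumed fallback of 1 hour
    deadline = time.monotonic() + limit
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {limit} seconds")
        time.sleep(interval)
```

With a real run, `get_status` could be a small wrapper that converts the value returned by `get_current_status()` to a string; how the Status type is represented is not specified here.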
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class MetricConfig(builtins.object) |
| |
MetricConfig(reference: gen_ai_hub.evaluations.models.metric_config.MetricRef, variable_mapping: dict = None)
Defines the metric configuration for the evaluation flow.
Parameters:
reference(MetricRef): Reference to the metric to be evaluated; it can be given by name, by UUID (id), or by scenario/name/version.
variable_mapping(Optional[dict]): Any variable mapping associated with the metric. |
| |
Methods defined here:
- __init__(self, reference: gen_ai_hub.evaluations.models.metric_config.MetricRef, variable_mapping: dict = None)
- Initialize self. See help(type(self)) for accurate signature.
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class MetricRef(builtins.object) |
| |
MetricRef(scenario: str = None, name: str = None, version: str = None, id: str = None)
Represents a reference to a specific metric definition.
A metric can be identified in multiple ways:
- By its UUID from metric management service (`id`)
- By name (`name`)
- By a combination of scenario, name, and version (`scenario`, `name`, `version`) |
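The three identification styles listed above can be told apart with a small helper. This is a hypothetical sketch: `reference_kind` is not part of the SDK, and the precedence it applies when several fields are set (id first, then the full triple, then name alone) is an assumption made for illustration.

```python
from typing import Optional


def reference_kind(
    scenario: Optional[str] = None,
    name: Optional[str] = None,
    version: Optional[str] = None,
    id: Optional[str] = None,
) -> str:
    """Classify which of the three documented identification styles is used."""
    if id is not None:
        return "id"
    if scenario is not None and name is not None and version is not None:
        return "scenario/name/version"
    if name is not None:
        return "name"
    raise ValueError("no complete metric reference given")
```

The parameter names match the MetricRef signature, so the same keyword arguments could be passed to both.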
| |
Methods defined here:
- __init__(self, scenario: str = None, name: str = None, version: str = None, id: str = None)
- Initialize self. See help(type(self)) for accurate signature.
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class Results(builtins.object) |
| |
Results(run: gen_ai_hub.evaluations.models.evaluation_run.EvaluationRun)
Represents the Results handler for an EvaluationRun object.
This class provides methods to access completion results, metric results,
and aggregated results for a specific evaluation run.
:param run: The parent EvaluationRun object
:type run: EvaluationRun |
| |
Methods defined here:
- __init__(self, run: gen_ai_hub.evaluations.models.evaluation_run.EvaluationRun)
- Initialize self. See help(type(self)) for accurate signature.
- aggregations(self)
- Get the aggregated results for the run from the tracking service.
:return: JSON response containing aggregated metric results
:rtype: dict
:raises ValueError: If error occurs while fetching aggregation results
- completions(self)
- Get the completion results for the run.
:return: DataFrame containing completion results for the run
:rtype: pd.DataFrame
:raises ValueError: If error occurs while fetching completions
- metrics(self)
- Get the metric-level results for the run.
:return: DataFrame containing metric results for the run
:rtype: pd.DataFrame
:raises ValueError: If error occurs while fetching metric results
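Putting the documented methods together, a typical sequence is to wait for the run, obtain its Results handler, and read the three result views. The helper below is a sketch under that assumption; `collect_results` is a hypothetical name, and `run` is expected to be an EvaluationRun (or any object exposing the same methods).

```python
from typing import Any, Dict


def collect_results(run: Any, timeout: int = 3600) -> Dict[str, Any]:
    """Wait for a run, then gather the three result views documented above."""
    run.wait_for_completion(timeout=timeout)
    results = run.results()  # documented to raise ValueError if not completed
    return {
        "completions": results.completions(),    # documented as pd.DataFrame
        "metrics": results.metrics(),            # documented as pd.DataFrame
        "aggregations": results.aggregations(),  # documented as dict
    }
```

Because each accessor is documented to raise ValueError on failure, a caller may want to wrap this in a try/except and fall back to `get_debug_info()` when the run ended in FAILED or DEAD.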
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
| |