scitex_ml.classification.reporters.reporter_utils.aggregation
Metric aggregation utilities for cross-fold analysis.
Provides functions to aggregate metrics across folds and create summary tables.
Functions
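| aggregate_classification_reports(reports, weighted_average=True) | Aggregate classification reports across folds. |
| aggregate_confusion_matrices(confusion_matrices) | Aggregate confusion matrices across folds. |
| aggregate_fold_metrics(fold_results, metrics_to_aggregate=None) | Aggregate metrics across folds into arrays. |
| calculate_mean_std(values, ddof=1) | Calculate mean and standard deviation. |
| calculate_metric_confidence_interval(values, confidence=0.95) | Calculate confidence interval for a metric. |
| create_summary_table(fold_results, metrics=None, include_stats=True, format_digits=3) | Create a summary table with fold results and statistics. |
| merge_fold_results(results_dir, n_folds) | Merge results from multiple fold directories. |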
- scitex_ml.classification.reporters.reporter_utils.aggregation.aggregate_fold_metrics(fold_results, metrics_to_aggregate=None)[source]
Aggregate metrics across folds into arrays.
- Parameters:
- Returns:
Arrays of metric values across folds
- Return type:
Dict[str, np.ndarray]
Examples
>>> fold_results = [
...     {'balanced_accuracy': 0.85, 'mcc': 0.70},
...     {'balanced_accuracy': 0.87, 'mcc': 0.73},
...     {'balanced_accuracy': 0.83, 'mcc': 0.68}
... ]
>>> aggregated = aggregate_fold_metrics(fold_results)
>>> print(f"BA values: {aggregated['balanced_accuracy']}")
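To make the "metrics into arrays" idea concrete, here is a minimal sketch of how per-fold dictionaries could be stacked with NumPy. The helper `stack_fold_metrics` is a hypothetical stand-in written for illustration, not the library's implementation, and the default of using the first fold's keys is an assumption.

```python
import numpy as np


def stack_fold_metrics(fold_results, metrics_to_aggregate=None):
    """Hypothetical stand-in: collect each metric's per-fold values into an array."""
    if metrics_to_aggregate is None:
        # Assumption: default to every key present in the first fold.
        metrics_to_aggregate = list(fold_results[0].keys())
    return {
        name: np.array([fold[name] for fold in fold_results if name in fold])
        for name in metrics_to_aggregate
    }


fold_results = [
    {"balanced_accuracy": 0.85, "mcc": 0.70},
    {"balanced_accuracy": 0.87, "mcc": 0.73},
    {"balanced_accuracy": 0.83, "mcc": 0.68},
]
aggregated = stack_fold_metrics(fold_results)
print(f"BA values: {aggregated['balanced_accuracy']}")
```

Each entry of the resulting dictionary is a one-dimensional array with one value per fold, which is the shape the statistics helpers on this page would naturally consume.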
- scitex_ml.classification.reporters.reporter_utils.aggregation.calculate_mean_std(values, ddof=1)[source]
Calculate mean and standard deviation.
- Parameters:
- Returns:
(mean, std)
- Return type:
Examples
>>> mean, std = calculate_mean_std([0.85, 0.87, 0.83])
>>> print(f"Mean: {mean:.3f}, Std: {std:.3f}")
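The `ddof=1` default in the signature suggests the sample standard deviation (division by n - 1) rather than the population form. The sketch below uses plain NumPy to show the difference on the same three values; it assumes `ddof` has the same meaning as in `numpy.std` and does not call the library function itself.

```python
import numpy as np

values = np.array([0.85, 0.87, 0.83])

mean = values.mean()
sample_std = values.std(ddof=1)      # divides by n - 1 (assumed default behaviour)
population_std = values.std(ddof=0)  # divides by n

print(f"Mean: {mean:.3f}")
print(f"Std (ddof=1): {sample_std:.3f}")
print(f"Std (ddof=0): {population_std:.3f}")
```

With only a handful of folds, the two estimates differ noticeably, so it is worth stating which one a reported "± std" refers to.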
- scitex_ml.classification.reporters.reporter_utils.aggregation.create_summary_table(fold_results, metrics=None, include_stats=True, format_digits=3)[source]
Create a summary table with fold results and statistics.
- Parameters:
- Returns:
Summary table with folds as rows and metrics as columns
- Return type:
pd.DataFrame
Examples
>>> fold_results = [
...     {'fold': 0, 'balanced_accuracy': 0.85, 'mcc': 0.70},
...     {'fold': 1, 'balanced_accuracy': 0.87, 'mcc': 0.73},
...     {'fold': 2, 'balanced_accuracy': 0.83, 'mcc': 0.68}
... ]
>>> df = create_summary_table(fold_results, include_stats=True)
>>> print(df.to_string())
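A table with folds as rows, metrics as columns, and appended statistics can be approximated with pandas alone. The sketch below is an illustration under assumptions: the row labels `mean` and `std` and the rounding to three digits mirror `include_stats=True` and `format_digits=3`, but the layout the library actually produces may differ.

```python
import pandas as pd

fold_results = [
    {"fold": 0, "balanced_accuracy": 0.85, "mcc": 0.70},
    {"fold": 1, "balanced_accuracy": 0.87, "mcc": 0.73},
    {"fold": 2, "balanced_accuracy": 0.83, "mcc": 0.68},
]

# One row per fold, one column per metric.
per_fold = pd.DataFrame(fold_results).set_index("fold")

# Summary rows; pandas' std uses ddof=1 by default.
stats = per_fold.agg(["mean", "std"])

# Assumed layout: statistics appended below the per-fold rows.
summary = pd.concat([per_fold, stats]).round(3)
print(summary.to_string())
```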
- scitex_ml.classification.reporters.reporter_utils.aggregation.aggregate_confusion_matrices(confusion_matrices)[source]
Aggregate confusion matrices across folds.
- Parameters:
confusion_matrices (List[np.ndarray]) – List of confusion matrices from each fold
- Returns:
Summed confusion matrix
- Return type:
np.ndarray
Examples
>>> import numpy as np
>>> cms = [np.array([[8, 2], [1, 9]]) for _ in range(3)]
>>> total_cm = aggregate_confusion_matrices(cms)
>>> print(total_cm)
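Since the return value is described as a summed confusion matrix, the aggregation amounts to an element-wise sum over folds. The NumPy sketch below reproduces that sum for the example input and adds a row normalization step, which is an extra illustration and not something the function is documented to do.

```python
import numpy as np

cms = [np.array([[8, 2], [1, 9]]) for _ in range(3)]

# Element-wise sum across folds; all matrices must share the same shape.
total_cm = np.sum(cms, axis=0)
print(total_cm)

# Optional extra step: per-class rates (each row sums to 1).
row_normalized = total_cm / total_cm.sum(axis=1, keepdims=True)
print(row_normalized)
```

Summing counts before normalizing weights every sample equally, whereas averaging per-fold normalized matrices would weight every fold equally.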
- scitex_ml.classification.reporters.reporter_utils.aggregation.aggregate_classification_reports(reports, weighted_average=True)[source]
Aggregate classification reports across folds.
- Parameters:
reports (List[pd.DataFrame]) – List of classification report DataFrames
weighted_average (bool) – Whether to use weighted average based on support
- Returns:
Aggregated classification report
- Return type:
pd.DataFrame
Examples
>>> reports = [report_fold1_df, report_fold2_df, report_fold3_df]
>>> agg_report = aggregate_classification_reports(reports)
>>> print(agg_report)
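The `weighted_average` option is described as weighting by support. One plausible reading is sketched below: each class's metric is averaged across folds with that fold's support as the weight. The column names (`precision`, `recall`, `f1-score`, `support`) follow the scikit-learn report convention and are an assumption about the input, and `weighted_report_average` is a hypothetical helper, not the library function.

```python
import pandas as pd


def weighted_report_average(reports):
    """Hypothetical helper: support-weighted mean of per-class metrics across folds."""
    metric_cols = ["precision", "recall", "f1-score"]
    total_support = sum(r["support"] for r in reports)
    # Weight each fold's metrics by that fold's per-class support.
    weighted = sum(r[metric_cols].mul(r["support"], axis=0) for r in reports)
    out = weighted.div(total_support, axis=0)
    out["support"] = total_support
    return out


# Made-up illustrative reports for two folds and two classes.
report_a = pd.DataFrame(
    {"precision": [0.80, 0.90], "recall": [0.70, 0.95],
     "f1-score": [0.75, 0.92], "support": [10, 30]},
    index=["neg", "pos"],
)
report_b = pd.DataFrame(
    {"precision": [0.60, 0.80], "recall": [0.90, 0.85],
     "f1-score": [0.72, 0.82], "support": [30, 10]},
    index=["neg", "pos"],
)

agg_report = weighted_report_average([report_a, report_b])
print(agg_report)
```

With unequal supports the weighted result can differ substantially from a plain mean: for the `neg` class the weighted precision is 0.65, while the unweighted mean of 0.80 and 0.60 would be 0.70.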
- scitex_ml.classification.reporters.reporter_utils.aggregation.calculate_metric_confidence_interval(values, confidence=0.95)[source]
Calculate confidence interval for a metric.
- Parameters:
- Returns:
(mean, lower_bound, upper_bound)
- Return type:
Examples
>>> values = [0.85, 0.87, 0.83, 0.86, 0.84]
>>> mean, lower, upper = calculate_metric_confidence_interval(values)
>>> print(f"Mean: {mean:.3f} [{lower:.3f}, {upper:.3f}]")
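The page does not say which interval construction is used. A common choice for a small number of folds is the Student t interval around the mean, sketched below with SciPy. Treat the t-distribution and the helper name `t_confidence_interval` as assumptions made for illustration; the library may compute the bounds differently.

```python
import numpy as np
from scipy import stats


def t_confidence_interval(values, confidence=0.95):
    """Hypothetical helper: two-sided Student t interval for the mean."""
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    # Standard error of the mean, using the sample standard deviation.
    sem = values.std(ddof=1) / np.sqrt(n)
    # Two-sided critical value with n - 1 degrees of freedom.
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width


values = [0.85, 0.87, 0.83, 0.86, 0.84]
mean, lower, upper = t_confidence_interval(values)
print(f"Mean: {mean:.3f} [{lower:.3f}, {upper:.3f}]")
```

Note that fold scores from cross-validation are not independent, so an interval of this kind is best read as a rough indication of spread and not as an exact coverage guarantee.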
- scitex_ml.classification.reporters.reporter_utils.aggregation.merge_fold_results(results_dir, n_folds)[source]
Merge results from multiple fold directories.
- Parameters:
- Returns:
Merged results dictionary
- Return type:
Dict[str, Any]
Examples
>>> merged = merge_fold_results("./results", n_folds=5)
>>> print(f"Found {len(merged['folds'])} folds")
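The on-disk layout that `merge_fold_results` expects is not documented on this page. Purely as an illustration of the merging idea, the sketch below assumes a hypothetical layout of `fold_00/metrics.json`, `fold_01/metrics.json`, and so on, and skips folds whose file is missing. The directory names, file name, and the helper `merge_fold_dirs` are all assumptions; only the `folds` key is taken from the example above.

```python
import json
import tempfile
from pathlib import Path


def merge_fold_dirs(results_dir, n_folds):
    """Hypothetical helper: gather per-fold JSON metrics into one dictionary."""
    merged = {"folds": []}
    for i in range(n_folds):
        # Assumed layout: <results_dir>/fold_XX/metrics.json
        metrics_file = Path(results_dir) / f"fold_{i:02d}" / "metrics.json"
        if not metrics_file.exists():
            continue  # tolerate missing folds instead of failing
        with metrics_file.open() as fh:
            merged["folds"].append({"fold": i, **json.load(fh)})
    return merged


# Build a throwaway results directory with three of five folds present.
with tempfile.TemporaryDirectory() as tmp:
    for i, ba in enumerate([0.85, 0.87, 0.83]):
        fold_dir = Path(tmp) / f"fold_{i:02d}"
        fold_dir.mkdir()
        (fold_dir / "metrics.json").write_text(json.dumps({"balanced_accuracy": ba}))
    merged = merge_fold_dirs(tmp, n_folds=5)

print(f"Found {len(merged['folds'])} folds")
```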