scitex_ml.classification.reporters.reporter_utils
Classification reporter utilities for modular metric calculation and reporting.
This module provides separated, focused utilities for:
- Metric calculations
- File organization
- Validation
- Report generation
- scitex_ml.classification.reporters.reporter_utils.calc_bacc(y_true, y_pred, labels=None, fold=None)
Calculate balanced accuracy with robust label handling.
- Parameters:
y_true (np.ndarray) – True labels (can be str or int)
y_pred (np.ndarray) – Predicted labels (can be str or int)
labels (List, optional) – Expected label list
fold (int, optional) – Fold number for tracking
- Returns:
{'metric': 'balanced_accuracy', 'value': float, 'fold': int}
- Return type:
Dict[str, Any]
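Balanced accuracy is the mean of per-class recall, which is why it stays informative on imbalanced data. The following is a minimal, dependency-free sketch of that calculation packaged in the documented result shape. The name balanced_accuracy_sketch is hypothetical and the code does not call calc_bacc; how the library itself handles the labels argument is not shown here.

```python
from collections import defaultdict


def balanced_accuracy_sketch(y_true, y_pred, fold=None):
    """Mean per-class recall, returned in the documented dict shape."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    value = sum(recalls) / len(recalls)
    return {"metric": "balanced_accuracy", "value": value, "fold": fold}


# String labels work as well as integer labels.
result = balanced_accuracy_sketch(
    ["cat", "cat", "cat", "dog"],
    ["cat", "cat", "dog", "dog"],
    fold=0,
)
print(result)
```

Here recall is 2/3 for "cat" and 1/1 for "dog", so the value is their mean rather than the plain accuracy of 3/4.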
- scitex_ml.classification.reporters.reporter_utils.calc_mcc(y_true, y_pred, labels=None, fold=None)
Calculate Matthews Correlation Coefficient with robust label handling.
- Parameters:
y_true (np.ndarray) – True labels (can be str or int)
y_pred (np.ndarray) – Predicted labels (can be str or int)
labels (List, optional) – Expected label list
fold (int, optional) – Fold number for tracking
- Returns:
{'metric': 'mcc', 'value': float, 'fold': int}
- Return type:
Dict[str, Any]
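For reference, the Matthews Correlation Coefficient can be computed from label counts alone using the standard multiclass formulation. The sketch below is an illustration under that assumption, not the library's implementation; mcc_sketch is a hypothetical name, and returning 0.0 for a zero denominator is a convention chosen here.

```python
import math
from collections import Counter


def mcc_sketch(y_true, y_pred, fold=None):
    """Multiclass MCC from counts, returned in the documented dict shape."""
    s = len(y_true)                                    # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))    # correct predictions
    t_counts = Counter(y_true)                         # occurrences of each true class
    p_counts = Counter(y_pred)                         # occurrences of each predicted class
    classes = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in classes)
    cov_tt = s * s - sum(v * v for v in t_counts.values())
    cov_pp = s * s - sum(v * v for v in p_counts.values())
    denom = math.sqrt(cov_tt * cov_pp)
    # Assumption: a degenerate case (single class) maps to 0.0.
    value = cov_tp / denom if denom else 0.0
    return {"metric": "mcc", "value": value, "fold": fold}


partial = mcc_sketch([1, 1, 0, 0], [1, 0, 0, 0], fold=2)
perfect = mcc_sketch([0, 1, 1], [0, 1, 1])
print(partial["value"], perfect["value"])
```

The value always falls in [-1, 1], matching the range listed for 'mcc' in MetricValidator.METRIC_RANGES.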
- scitex_ml.classification.reporters.reporter_utils.calc_conf_mat(y_true, y_pred, labels=None, fold=None, normalize=None)
Calculate confusion matrix with robust label handling.
- Parameters:
- Returns:
{'metric': 'confusion_matrix', 'value': pd.DataFrame, 'fold': int, 'labels': list}
- Return type:
Dict[str, Any]
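Passing an explicit label list matters when a fold happens to miss a class: the matrix keeps a row and column for that class instead of silently shrinking. A rough sketch of this idea follows, using nested lists in place of the documented pd.DataFrame. The name confusion_matrix_sketch is hypothetical, and the normalize option is left out because its accepted values are not documented above.

```python
def confusion_matrix_sketch(y_true, y_pred, labels=None, fold=None):
    """Count matrix with rows = true labels and columns = predicted labels."""
    if labels is None:
        # Assumption: fall back to the sorted union of observed labels.
        labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return {
        "metric": "confusion_matrix",
        "value": counts,
        "fold": fold,
        "labels": list(labels),
    }


# "b" is never predicted, yet it keeps its own column of zeros.
cm = confusion_matrix_sketch(
    ["a", "b", "a", "c"],
    ["a", "a", "a", "c"],
    labels=["a", "b", "c"],
    fold=1,
)
print(cm["value"])
```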
- scitex_ml.classification.reporters.reporter_utils.calc_clf_report(y_true, y_pred, labels=None, fold=None)
Generate classification report with robust label handling.
- Parameters:
y_true (np.ndarray) – True labels (can be str or int)
y_pred (np.ndarray) – Predicted labels (can be str or int)
labels (List, optional) – Expected label list
fold (int, optional) – Fold number for tracking
- Returns:
{'metric': 'classification_report', 'value': pd.DataFrame, 'fold': int, 'labels': list}
- Return type:
Dict[str, Any]
- scitex_ml.classification.reporters.reporter_utils.calc_roc_auc(y_true, y_proba, labels=None, fold=None, return_curve=False)
Calculate ROC AUC score with robust handling.
- Parameters:
- Returns:
{'metric': 'roc_auc', 'value': float, 'fold': int}
- Return type:
Dict[str, Any]
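In the binary case, ROC AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way. It is illustrative only: roc_auc_sketch and its positive argument are hypothetical, the multiclass and return_curve behavior of calc_roc_auc is not covered, and raising on a single-class fold is an assumption.

```python
def roc_auc_sketch(y_true, y_score, positive=1, fold=None):
    """Binary ROC AUC as a pairwise ranking statistic."""
    pos = [s for t, s in zip(y_true, y_score) if t == positive]
    neg = [s for t, s in zip(y_true, y_score) if t != positive]
    if not pos or not neg:
        # Assumption: AUC is treated as undefined when a class is absent.
        raise ValueError("ROC AUC needs both classes to be present")
    # Each positive/negative pair scores 1 for a win and 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    value = wins / (len(pos) * len(neg))
    return {"metric": "roc_auc", "value": value, "fold": fold}


auc = roc_auc_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], fold=0)
print(auc["value"])
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a demonstration but not for large folds.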
- scitex_ml.classification.reporters.reporter_utils.calc_pre_rec_auc(y_true, y_proba, labels=None, fold=None, return_curve=False)
Calculate Precision-Recall AUC with robust handling.
- Parameters:
- Returns:
{'metric': 'pr_auc', 'value': float, 'fold': int}
- Return type:
Dict[str, Any]
- class scitex_ml.classification.reporters.reporter_utils.MetricStorage(base_dir, precision=3, verbose=True)
Enhanced storage handler with lazy creation and precision control.
Features:
- Creates directories only when actually needed
- Rounds numerical values to specified precision
- Graceful error handling with informative messages
- Supports all standard data formats
- __init__(base_dir, precision=3, verbose=True)
Initialize storage with base directory and precision.
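The two ideas named above, lazy directory creation and precision control, can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not MetricStorage itself: LazyJsonStorage, its save method, and the JSON-only output are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class LazyJsonStorage:
    """Hypothetical stand-in: nothing touches the disk until save() runs."""

    def __init__(self, base_dir, precision=3):
        self.base_dir = Path(base_dir)  # recorded only, not created
        self.precision = precision

    def _round(self, obj):
        # Round floats recursively; leave other values untouched.
        if isinstance(obj, float):
            return round(obj, self.precision)
        if isinstance(obj, dict):
            return {k: self._round(v) for k, v in obj.items()}
        if isinstance(obj, (list, tuple)):
            return [self._round(v) for v in obj]
        return obj

    def save(self, data, relative_path):
        path = self.base_dir / relative_path
        # The directory is created here, at the moment it is needed.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(self._round(data)))
        return path


with tempfile.TemporaryDirectory() as tmp:
    storage = LazyJsonStorage(Path(tmp) / "results", precision=3)
    created_before_save = storage.base_dir.exists()
    saved = storage.save({"value": 0.857142, "fold": 0}, "fold_00/bacc.json")
    stored = json.loads(saved.read_text())

print(created_before_save, stored)
```

Deferring mkdir to the write path means an aborted run leaves no empty directory tree behind.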
- scitex_ml.classification.reporters.reporter_utils.save_metric(metric_value, path, fold=None, precision=4)
Save an individual metric with precision control.
- scitex_ml.classification.reporters.reporter_utils.organize_outputs(base_dir)
Create directory structure mapping without actually creating directories.
This returns paths that can be created later when actually needed.
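The pattern described here, planning paths without creating them, can be sketched with pathlib alone. The function name plan_output_dirs and the subdirectory names below are hypothetical; the actual keys returned by organize_outputs are not documented above.

```python
import tempfile
from pathlib import Path


def plan_output_dirs(base_dir, names=("metrics", "plots", "reports")):
    """Return a name -> Path mapping; no directory is created."""
    base = Path(base_dir)
    return {name: base / name for name in names}


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "run_01"
    planned = plan_output_dirs(base)
    existed_after_planning = base.exists()
    # A caller creates a directory only when it is about to write into it.
    planned["metrics"].mkdir(parents=True)
    metrics_created = planned["metrics"].is_dir()
    plots_created = planned["plots"].exists()

print(sorted(planned), existed_after_planning, metrics_created, plots_created)
```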
- class scitex_ml.classification.reporters.reporter_utils.MetricValidator(required_metrics)
Validates classification metrics for completeness and consistency.
This class checks that all required metrics are present across folds and validates metric values are within expected ranges.
- METRIC_RANGES = {'accuracy': (0.0, 1.0), 'balanced_accuracy': (0.0, 1.0), 'f1': (0.0, 1.0), 'mcc': (-1.0, 1.0), 'pr_auc': (0.0, 1.0), 'precision': (0.0, 1.0), 'recall': (0.0, 1.0), 'roc_auc': (0.0, 1.0)}
- __init__(required_metrics)
Initialize validator with required metrics.
- Parameters:
required_metrics (List[str]) – List of metric names that must be present
- validate_metric_value(metric_name, value)
Validate a single metric value is within expected range.
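A range check against a table like METRIC_RANGES might look like the sketch below. The ranges are copied from the attribute listed above (a subset, for brevity); the function name value_in_range is hypothetical, and the handling of NaN, non-numeric values, and unknown metric names is an assumption, since the return contract of validate_metric_value is not documented here.

```python
import math

# Subset of the ranges listed in MetricValidator.METRIC_RANGES.
METRIC_RANGES = {
    "balanced_accuracy": (0.0, 1.0),
    "mcc": (-1.0, 1.0),
    "roc_auc": (0.0, 1.0),
}


def value_in_range(metric_name, value):
    """Return True when value is a finite number inside the metric's range."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False  # assumption: non-numeric values are rejected
    if math.isnan(value):
        return False  # assumption: NaN never validates
    # Assumption: metrics without a known range are not constrained.
    low, high = METRIC_RANGES.get(metric_name, (-math.inf, math.inf))
    return low <= value <= high


checks = {
    "mcc_negative": value_in_range("mcc", -0.2),
    "bacc_too_high": value_in_range("balanced_accuracy", 1.2),
    "auc_nan": value_in_range("roc_auc", float("nan")),
}
print(checks)
```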
- scitex_ml.classification.reporters.reporter_utils.validate_completeness(output_dir, required_metrics, n_folds)
Validate completeness of saved metrics.
- Parameters:
- Returns:
Validation report
- Return type:
Dict[str, Any]
Examples
>>> report = validate_completeness(
...     "./results",
...     ['balanced_accuracy', 'mcc', 'confusion_matrix'],
...     n_folds=5
... )
>>> if report['complete']:
...     print("All metrics present!")
- scitex_ml.classification.reporters.reporter_utils.check_required_metrics(metrics_dict, required)
Check if all required metrics are present.
- Parameters:
- Returns:
(all_present, missing_metrics)
- Return type:
Examples
>>> metrics = {'balanced_accuracy': 0.85, 'mcc': 0.7}
>>> complete, missing = check_required_metrics(
...     metrics,
...     ['balanced_accuracy', 'mcc', 'roc_auc']
... )
>>> print(f"Missing: {missing}")  # ['roc_auc']
- scitex_ml.classification.reporters.reporter_utils.generate_markdown_report(results, output_path, include_plots=True, verbose=True)
Generate comprehensive markdown report.
- Parameters:
- Returns:
Path to generated report
- Return type:
Path
- scitex_ml.classification.reporters.reporter_utils.generate_latex_report(results, output_path, verbose=True)
Generate LaTeX report for academic papers.
- scitex_ml.classification.reporters.reporter_utils.create_summary_statistics(results)
Create comprehensive summary statistics from results.
- scitex_ml.classification.reporters.reporter_utils.export_for_paper(results, output_dir, verbose=True)
Export results in formats suitable for academic papers.
- scitex_ml.classification.reporters.reporter_utils.aggregate_fold_metrics(fold_results, metrics_to_aggregate=None)
Aggregate metrics across folds into arrays.
- Parameters:
- Returns:
Arrays of metric values across folds
- Return type:
Dict[str, np.ndarray]
Examples
>>> fold_results = [
...     {'balanced_accuracy': 0.85, 'mcc': 0.70},
...     {'balanced_accuracy': 0.87, 'mcc': 0.73},
...     {'balanced_accuracy': 0.83, 'mcc': 0.68}
... ]
>>> aggregated = aggregate_fold_metrics(fold_results)
>>> print(f"BA values: {aggregated['balanced_accuracy']}")
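Conceptually, this aggregation pivots a list of per-fold dictionaries into one sequence per metric. The sketch below shows that pivot with plain lists instead of the documented np.ndarray values. collect_by_metric is a hypothetical name, and skipping folds that lack a metric is an assumption rather than documented behavior.

```python
def collect_by_metric(fold_results, metrics=None):
    """Pivot [{metric: value}, ...] into {metric: [value per fold]}."""
    if metrics is None:
        # Default to every metric name seen in any fold.
        metrics = sorted({name for fold in fold_results for name in fold})
    # Assumption: a fold that lacks a metric is skipped for that metric.
    return {m: [fold[m] for fold in fold_results if m in fold] for m in metrics}


folds = [
    {"balanced_accuracy": 0.85, "mcc": 0.70},
    {"balanced_accuracy": 0.87, "mcc": 0.73},
    {"balanced_accuracy": 0.83, "mcc": 0.68},
]
by_metric = collect_by_metric(folds)
only_mcc = collect_by_metric(folds, metrics=["mcc"])
print(by_metric["balanced_accuracy"])
```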
- scitex_ml.classification.reporters.reporter_utils.calculate_mean_std(values, ddof=1)
Calculate mean and standard deviation.
- Parameters:
- Returns:
(mean, std)
- Return type:
Examples
>>> mean, std = calculate_mean_std([0.85, 0.87, 0.83])
>>> print(f"Mean: {mean:.3f}, Std: {std:.3f}")
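The ddof argument sets the divisor of the variance to n - ddof, so the default ddof=1 gives the sample standard deviation, which is the usual choice for a handful of cross-validation folds. A minimal pure-Python sketch of that arithmetic follows. mean_std_sketch is a hypothetical name, and returning NaN when too few values are available is an assumption.

```python
import math


def mean_std_sketch(values, ddof=1):
    """Mean and standard deviation with divisor n - ddof."""
    n = len(values)
    mean = sum(values) / n
    if n - ddof <= 0:
        # Assumption: report NaN when the divisor would not be positive.
        return mean, float("nan")
    variance = sum((v - mean) ** 2 for v in values) / (n - ddof)
    return mean, math.sqrt(variance)


sample_mean, sample_std = mean_std_sketch([0.85, 0.87, 0.83])        # ddof=1
_, population_std = mean_std_sketch([0.85, 0.87, 0.83], ddof=0)
print(f"Mean: {sample_mean:.3f}, Std: {sample_std:.3f}")
```

With three folds the sample estimate (0.020) is noticeably larger than the population estimate (about 0.016), so it is worth stating which one a report uses.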
- scitex_ml.classification.reporters.reporter_utils.create_summary_table(fold_results, metrics=None, include_stats=True, format_digits=3)
Create a summary table with fold results and statistics.
- Parameters:
- Returns:
Summary table with folds as rows and metrics as columns
- Return type:
pd.DataFrame
Examples
>>> fold_results = [
...     {'fold': 0, 'balanced_accuracy': 0.85, 'mcc': 0.70},
...     {'fold': 1, 'balanced_accuracy': 0.87, 'mcc': 0.73},
...     {'fold': 2, 'balanced_accuracy': 0.83, 'mcc': 0.68}
... ]
>>> df = create_summary_table(fold_results, include_stats=True)
>>> print(df.to_string())
- scitex_ml.classification.reporters.reporter_utils.aggregate_confusion_matrices(confusion_matrices)
Aggregate confusion matrices across folds.
- Parameters:
confusion_matrices (List[np.ndarray]) – List of confusion matrices from each fold
- Returns:
Summed confusion matrix
- Return type:
np.ndarray
Examples
>>> cms = [np.array([[8, 2], [1, 9]]) for _ in range(3)]
>>> total_cm = aggregate_confusion_matrices(cms)
>>> print(total_cm)
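Summing is only meaningful when every fold's matrix uses the same label order and shape. The dependency-free sketch below performs the element-wise sum on nested lists and checks the shape first. sum_confusion_matrices is a hypothetical name, and raising ValueError on a mismatch is an assumption about reasonable behavior, not documented behavior of aggregate_confusion_matrices.

```python
def sum_confusion_matrices(matrices):
    """Element-wise sum of equally shaped count matrices (nested lists)."""
    if not matrices:
        raise ValueError("at least one confusion matrix is required")
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    total = [[0] * n_cols for _ in range(n_rows)]
    for matrix in matrices:
        if len(matrix) != n_rows or any(len(row) != n_cols for row in matrix):
            # Assumption: mismatched folds are an error, not silently padded.
            raise ValueError("all confusion matrices must share one shape")
        for i, row in enumerate(matrix):
            for j, count in enumerate(row):
                total[i][j] += count
    return total


# Same input as the example above: three identical 2x2 folds.
total = sum_confusion_matrices([[[8, 2], [1, 9]] for _ in range(3)])
print(total)
```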
Modules
- Metric aggregation utilities for cross-fold analysis.
- Data models for classification reporting using dataclasses.
- Validation utilities for classification metrics.