scitex_ml.classification.reporters
Reporter implementations for classification.
- class scitex_ml.classification.reporters.ClassificationReporter(output_dir, tasks=None, precision=3, required_metrics=['balanced_accuracy', 'mcc', 'confusion_matrix', 'classification_report', 'roc_auc', 'roc_curve', 'pre_rec_auc', 'pre_rec_curve'], verbose=True, **kwargs)
Unified classification reporter for single and multi-task scenarios.
This reporter automatically adapts to your use case:
  - Single task: Just use it without specifying tasks
  - Multiple tasks: Specify tasks upfront or create them dynamically
  - Seamless switching between single and multi-task workflows
Features:
  - Comprehensive metrics calculation (balanced accuracy, MCC, ROC-AUC, PR-AUC, etc.)
  - Automated visualization generation:
    - Confusion matrices
    - ROC and Precision-Recall curves
    - Feature importance plots (via plotter)
    - CV aggregation plots with faded fold lines
    - Comprehensive metrics dashboard
  - Multi-format report generation (Org, Markdown, LaTeX, HTML, DOCX, PDF)
  - Cross-validation support with automatic fold aggregation
  - Multi-task classification tracking
- Parameters:
output_dir (Union[str, Path]) – Base directory for outputs. If None, creates timestamped directory.
tasks (List[str], optional) – List of task names. If None, tasks are created dynamically as needed.
precision (int, default 3) – Number of decimal places for numerical outputs
required_metrics (List[str], optional) – List of metrics to calculate. Defaults to comprehensive set.
verbose (bool, default True) – Whether to print initialization messages
**kwargs – Additional arguments passed to base class
Examples
>>> # Single task usage (no tasks specified)
>>> reporter = ClassificationReporter("./results")
>>> reporter.calculate_metrics(y_true, y_pred, y_proba)
>>> # Multi-task with predefined tasks
>>> reporter = ClassificationReporter("./results", tasks=["binary", "multiclass"])
>>> reporter.calculate_metrics(y_true, y_pred, task="binary")
>>> # Dynamic task creation
>>> reporter = ClassificationReporter("./results")
>>> reporter.calculate_metrics(y_true1, y_pred1, task="task1")
>>> reporter.calculate_metrics(y_true2, y_pred2, task="task2")
>>> # Feature importance visualization (via plotter)
>>> reporter._single_reporter.plotter.create_feature_importance_plot(
...     feature_importance=importances,
...     feature_names=feature_names,
...     save_path="./results/feature_importance.png"
... )
>>> # CV aggregation plots (automatically created on save_summary)
>>> for fold in range(5):
...     metrics = reporter.calculate_metrics(y_true, y_pred, y_proba, fold=fold)
>>> reporter.save_summary()  # Creates CV aggregation plots with faded fold lines
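To make two of the default `required_metrics` concrete, here is a minimal pure-Python sketch of balanced accuracy and the binary Matthews correlation coefficient (MCC), using their standard textbook definitions. The helper names `balanced_accuracy` and `binary_mcc` are illustrative only and are not part of scitex_ml; how the reporter computes these values internally is not shown in this reference.

```python
from collections import Counter
from math import sqrt


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / n for c, n in support.items()) / len(support)


def binary_mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0.0 when a marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data: four positives (one missed), two negatives (one false alarm).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
precision = 3  # plays the role of the reporter's `precision` parameter

print(round(balanced_accuracy(y_true, y_pred), precision))  # 0.625
print(round(binary_mcc(y_true, y_pred), precision))         # 0.25
```

On imbalanced data like this, plain accuracy would be 4/6, while balanced accuracy averages the recalls 0.75 and 0.5, which is one reason it is a sensible default metric.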
- __init__(output_dir, tasks=None, precision=3, required_metrics=['balanced_accuracy', 'mcc', 'confusion_matrix', 'classification_report', 'roc_auc', 'roc_curve', 'pre_rec_auc', 'pre_rec_curve'], verbose=True, **kwargs)
- calculate_metrics(y_true, y_pred, y_proba=None, labels=None, fold=None, task=None, verbose=True, model=None, feature_names=None)
Calculate metrics for classification.
Automatically handles single vs multi-task scenarios:
  - If no task specified and no tasks defined: creates “default” task
  - If no task specified but tasks exist: uses first task
  - If task specified: uses/creates that specific task
- Parameters:
y_true (np.ndarray) – True class labels
y_pred (np.ndarray) – Predicted class labels
y_proba (np.ndarray, optional) – Prediction probabilities (required for AUC metrics)
labels (List[str], optional) – Class labels for display
fold (int, optional) – Fold index for cross-validation
task (str, optional) – Task identifier. If None and no tasks exist, creates “default” task.
verbose (bool, default True) – Whether to print progress
model (object, optional) – Trained model for automatic feature importance extraction
feature_names (List[str], optional) – Feature names for feature importance (required if model is provided)
- Returns:
Dictionary of calculated metrics
- Return type:
Dict[str, Any]
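The task-selection rules described for `calculate_metrics` can be sketched as a small standalone function. This is an illustration of the documented behavior only: `resolve_task` is a hypothetical helper, not a scitex_ml function, and the real reporter may track tasks differently.

```python
from typing import List, Optional


def resolve_task(task: Optional[str], tasks: List[str]) -> str:
    """Hypothetical helper mirroring the three documented rules."""
    if task is None:
        # No task given: use the first known task, else fall back to "default".
        task = tasks[0] if tasks else "default"
    if task not in tasks:
        tasks.append(task)  # create the task on first use
    return task


known_tasks: List[str] = []
print(resolve_task(None, known_tasks))      # no tasks yet, so "default" is created
print(resolve_task("binary", known_tasks))  # explicit task is created on demand
print(resolve_task(None, known_tasks))      # tasks exist, so the first one is used
print(known_tasks)
```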
- save(data, relative_path, task=None, fold=None)
Save custom data with automatic task/fold organization.
- Parameters:
data (Any) – Data to save (any format supported by scitex_io.save)
relative_path (Union[str, Path]) – Relative path from output directory
task (Optional[str], default None) – Task name. If provided, saves to task-specific directory
fold (Optional[int], default None) – If provided, automatically prepends “fold_{fold:02d}/” to path
- Returns:
Absolute path to the saved file
- Return type:
Path
Examples
>>> # Single task mode (no task specified)
>>> reporter.save({"accuracy": 0.95}, "metrics.json")
>>> # Multi-task mode
>>> reporter.save(results, "results.csv", task="binary", fold=0)
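As a rough illustration of the task/fold organization, the sketch below composes a target path with `pathlib`. The `fold_{fold:02d}/` prefix comes from the parameter description above; the helper name `resolve_save_path` and the choice to nest the fold directory inside the task directory are assumptions, and nothing is written to disk (the actual saving is delegated to `scitex_io.save` according to the description).

```python
from pathlib import Path
from typing import Optional, Union


def resolve_save_path(
    output_dir: Union[str, Path],
    relative_path: Union[str, Path],
    task: Optional[str] = None,
    fold: Optional[int] = None,
) -> Path:
    """Hypothetical path composition; the directory order is an assumption."""
    path = Path(relative_path)
    if fold is not None:
        # Documented behavior: prepend a zero-padded fold directory.
        path = Path(f"fold_{fold:02d}") / path
    if task is not None:
        # Assumed layout: the task directory sits above the fold directory.
        path = Path(task) / path
    return Path(output_dir) / path


print(resolve_save_path("results", "metrics.json").as_posix())
print(resolve_save_path("results", "results.csv", task="binary", fold=0).as_posix())
```

The documented method returns an absolute path; calling `.resolve()` on the result would give that, but it is left out here so the printed paths do not depend on the working directory.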
- get_summary()
Get summary of all calculated metrics.
- Returns:
Summary of metrics across all tasks and folds
- Return type:
Dict[str, Any]
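The structure of the summary dictionary is not specified in this reference. As one plausible shape for "metrics across folds", the sketch below aggregates per-fold scalar metrics into a mean and sample standard deviation, rounded to a `precision` number of decimal places. The function `aggregate_folds`, the key names, and the toy numbers are all hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def aggregate_folds(
    fold_metrics: List[Dict[str, float]], precision: int = 3
) -> Dict[str, Dict[str, float]]:
    """Hypothetical fold aggregation: mean and sample std per metric."""
    summary = {}
    for name in fold_metrics[0]:
        values = [m[name] for m in fold_metrics]
        summary[name] = {
            "mean": round(mean(values), precision),
            # stdev needs at least two values; report 0.0 for a single fold.
            "std": round(stdev(values), precision) if len(values) > 1 else 0.0,
            "n_folds": len(values),
        }
    return summary


# Made-up per-fold scores, for illustration only.
folds = [
    {"balanced_accuracy": 0.80, "mcc": 0.60},
    {"balanced_accuracy": 0.85, "mcc": 0.70},
    {"balanced_accuracy": 0.90, "mcc": 0.80},
]
for metric, stats in aggregate_folds(folds).items():
    print(metric, stats)
```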
- save_feature_importance(model, feature_names, fold=None, task=None)
Calculate and save feature importance for tree-based models.
- Parameters:
model (object) – Fitted classifier (must have feature_importances_)
feature_names (List[str]) – Names of features
fold (int, optional) – Fold number for tracking
task (str, optional) – Task name for multi-task mode
- Returns:
Dictionary of feature importances {feature_name: importance}
- Return type:
- class scitex_ml.classification.reporters.SingleTaskClassificationReporter(output_dir, config=None, verbose=True, **kwargs)
Improved single-task classification reporter with unified API.
Key improvements:
  - Inherits from BaseClassificationReporter for consistent API
  - Lazy directory creation (no empty folders)
  - Numerical precision control
  - Graceful plotting with proper error handling
  - Consistent parameter names across all methods
Features:
  - Comprehensive metrics calculation (balanced accuracy, MCC, ROC-AUC, PR-AUC, etc.)
  - Automated visualization generation:
    - Confusion matrices
    - ROC and Precision-Recall curves
    - Feature importance plots
    - CV aggregation plots with faded fold lines
    - Comprehensive metrics dashboard
  - Multi-format report generation (Org, Markdown, LaTeX, HTML, DOCX, PDF)
  - Cross-validation support with automatic fold aggregation
- Parameters:
Examples
>>> # Basic usage
>>> reporter = SingleTaskClassificationReporter("./results")
>>> metrics = reporter.calculate_metrics(y_true, y_pred, y_proba, labels=['A', 'B'])
>>> reporter.save_summary()
>>> # Cross-validation with automatic CV aggregation plots
>>> for fold, (train_idx, test_idx) in enumerate(cv.split(X, y)):
...     # Fit on train_idx and predict on test_idx to obtain y_test, y_pred, y_proba
...     metrics = reporter.calculate_metrics(
...         y_test, y_pred, y_proba, fold=fold
...     )
>>> reporter.save_summary()  # Automatically creates CV aggregation visualizations
>>> # Feature importance visualization
>>> reporter.plotter.create_feature_importance_plot(
...     feature_importance=importances,
...     feature_names=feature_names,
...     save_path=output_dir / "feature_importance.png"
... )
- set_session_config(config)
Set the SciTeX session CONFIG object for inclusion in reports.
- Parameters:
config (Any) – The SciTeX session CONFIG object
- Return type:
Modules
Classification reporter utilities for modular metric calculation and reporting.