scitex_ml.classification.reporters

Reporter implementations for classification.

class scitex_ml.classification.reporters.ClassificationReporter(output_dir, tasks=None, precision=3, required_metrics=['balanced_accuracy', 'mcc', 'confusion_matrix', 'classification_report', 'roc_auc', 'roc_curve', 'pre_rec_auc', 'pre_rec_curve'], verbose=True, **kwargs)[source]

Unified classification reporter for single and multi-task scenarios.

This reporter automatically adapts to your use case:

  • Single task: use it without specifying tasks

  • Multiple tasks: specify tasks upfront or create them dynamically

  • Seamless switching between single and multi-task workflows

Features:

  • Comprehensive metrics calculation (balanced accuracy, MCC, ROC-AUC, PR-AUC, etc.)

  • Automated visualization generation:

      • Confusion matrices

      • ROC and Precision-Recall curves

      • Feature importance plots (via plotter)

      • CV aggregation plots with faded fold lines

      • Comprehensive metrics dashboard

  • Multi-format report generation (Org, Markdown, LaTeX, HTML, DOCX, PDF)

  • Cross-validation support with automatic fold aggregation

  • Multi-task classification tracking

Parameters:
  • output_dir (Union[str, Path]) – Base directory for outputs. If None, a timestamped directory is created.

  • tasks (List[str], optional) – List of task names. If None, tasks are created dynamically as needed.

  • precision (int, default 3) – Number of decimal places for numerical outputs

  • required_metrics (List[str], optional) – List of metrics to calculate. Defaults to comprehensive set.

  • verbose (bool, default True) – Whether to print initialization messages

  • **kwargs – Additional arguments passed to base class

Examples

>>> # Single task usage (no tasks specified)
>>> reporter = ClassificationReporter("./results")
>>> reporter.calculate_metrics(y_true, y_pred, y_proba)
>>> # Multi-task with predefined tasks
>>> reporter = ClassificationReporter("./results", tasks=["binary", "multiclass"])
>>> reporter.calculate_metrics(y_true, y_pred, task="binary")
>>> # Dynamic task creation
>>> reporter = ClassificationReporter("./results")
>>> reporter.calculate_metrics(y_true1, y_pred1, task="task1")
>>> reporter.calculate_metrics(y_true2, y_pred2, task="task2")
>>> # Feature importance visualization (via plotter)
>>> reporter._single_reporter.plotter.create_feature_importance_plot(
...     feature_importance=importances,
...     feature_names=feature_names,
...     save_path="./results/feature_importance.png"
... )
>>> # CV aggregation plots (automatically created on save_summary)
>>> for fold in range(5):
...     metrics = reporter.calculate_metrics(y_true, y_pred, y_proba, fold=fold)
>>> reporter.save_summary()  # Creates CV aggregation plots with faded fold lines

__init__(output_dir, tasks=None, precision=3, required_metrics=['balanced_accuracy', 'mcc', 'confusion_matrix', 'classification_report', 'roc_auc', 'roc_curve', 'pre_rec_auc', 'pre_rec_curve'], verbose=True, **kwargs)[source]

calculate_metrics(y_true, y_pred, y_proba=None, labels=None, fold=None, task=None, verbose=True, model=None, feature_names=None)[source]

Calculate metrics for classification.

Automatically handles single vs multi-task scenarios:

  • If no task is specified and no tasks are defined: creates a “default” task

  • If no task is specified but tasks exist: uses the first task

  • If a task is specified: uses or creates that specific task
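The three rules above can be sketched in a few lines. This is a minimal illustration, not the library's code: `resolve_task` is a hypothetical helper, and the assumption that known tasks live in a plain list is ours.

```python
from typing import List, Optional


def resolve_task(task: Optional[str], tasks: List[str]) -> str:
    """Hypothetical helper mirroring the documented task-resolution rules."""
    if task is None:
        if not tasks:
            # No task given and none defined: fall back to a "default" task.
            tasks.append("default")
            return "default"
        # No task given but tasks exist: use the first one.
        return tasks[0]
    if task not in tasks:
        # A named task that does not exist yet is created on the fly.
        tasks.append(task)
    return task


known_tasks: List[str] = []
first = resolve_task(None, known_tasks)      # creates and returns "default"
second = resolve_task("task1", known_tasks)  # creates and returns "task1"
third = resolve_task(None, known_tasks)      # returns the first task
```

Under these assumptions, calling `calculate_metrics` without `task` after tasks exist always reports against the first task, so passing `task` explicitly is the safer habit in multi-task code.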

Parameters:
  • y_true (np.ndarray) – True class labels

  • y_pred (np.ndarray) – Predicted class labels

  • y_proba (np.ndarray, optional) – Prediction probabilities (required for AUC metrics)

  • labels (List[str], optional) – Class labels for display

  • fold (int, optional) – Fold index for cross-validation

  • task (str, optional) – Task identifier. If None and no tasks exist, creates “default” task.

  • verbose (bool, default True) – Whether to print progress

  • model (object, optional) – Trained model for automatic feature importance extraction

  • feature_names (List[str], optional) – Feature names for feature importance (required if model is provided)

Returns:

Dictionary of calculated metrics

Return type:

Dict[str, Any]
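To make two of the default metrics concrete, the sketch below computes balanced accuracy and MCC by hand for a binary problem, using their standard definitions. It is an illustration of what the numbers mean, not how the reporter computes them; the helper names are hypothetical and labels are assumed to be encoded as 0 and 1 with both classes present.

```python
import math
from typing import Sequence, Tuple


def binary_confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for labels encoded as 0 (negative) and 1 (positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    # Mean of per-class recall; assumes both classes occur in y_true.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is 0 when any marginal count is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
counts = binary_confusion_counts(y_true, y_pred)
print(round(balanced_accuracy(*counts), 3))  # 0.625
print(round(mcc(*counts), 3))                # 0.258
```

The `round(..., 3)` calls mirror the reporter's default `precision=3`; whether the returned dictionary holds rounded or full-precision values is not stated here.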

save(data, relative_path, task=None, fold=None)[source]

Save custom data with automatic task/fold organization.

Parameters:
  • data (Any) – Data to save (any format supported by scitex_io.save)

  • relative_path (Union[str, Path]) – Relative path from output directory

  • task (Optional[str], default None) – Task name. If provided, saves to task-specific directory

  • fold (Optional[int], default None) – If provided, automatically prepends “fold_{fold:02d}/” to path

Returns:

Absolute path to the saved file

Return type:

Path

Examples

>>> # Single task mode (no task specified)
>>> reporter.save({"accuracy": 0.95}, "metrics.json")
>>> # Multi-task mode
>>> reporter.save(results, "results.csv", task="binary", fold=0)
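The task and fold organization described above can be pictured with a small path-building sketch. `build_save_path` is a hypothetical helper, and the nesting order (task directory first, then the fold directory) is an assumption; only the `fold_{fold:02d}/` prefix format comes from the parameter description.

```python
from pathlib import Path
from typing import Optional, Union


def build_save_path(
    output_dir: Union[str, Path],
    relative_path: Union[str, Path],
    task: Optional[str] = None,
    fold: Optional[int] = None,
) -> Path:
    """Hypothetical sketch of where save() might place a file."""
    path = Path(output_dir)
    if task is not None:
        # Assumed layout: one sub-directory per task.
        path = path / task
    if fold is not None:
        # Documented prefix: zero-padded, two-digit fold directory.
        path = path / f"fold_{fold:02d}"
    return path / relative_path


single = build_save_path("results", "metrics.json")
multi = build_save_path("results", "results.csv", task="binary", fold=0)
print(single.as_posix())  # results/metrics.json
print(multi.as_posix())   # results/binary/fold_00/results.csv
```

The sketch only composes the path; it writes nothing and does not resolve to an absolute path as `save` is documented to do.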

get_summary()[source]

Get summary of all calculated metrics.

Returns:

Summary of metrics across all tasks and folds

Return type:

Dict[str, Any]

save_summary(filename='summary.json', verbose=True)[source]

Save summary to file.

Parameters:
  • filename (str) – Filename for summary

  • verbose (bool) – Whether to print summary

Returns:

Path to saved summary file

Return type:

Path

save_feature_importance(model, feature_names, fold=None, task=None)[source]

Calculate and save feature importance for tree-based models.

Parameters:
  • model (object) – Fitted classifier (must have feature_importances_)

  • feature_names (List[str]) – Names of features

  • fold (int, optional) – Fold number for tracking

  • task (str, optional) – Task name for multi-task mode

Returns:

Dictionary of feature importances {feature_name: importance}

Return type:

Dict[str, float]
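The returned mapping can be illustrated without a real classifier. In the sketch below, `importance_dict` is a hypothetical helper and a `SimpleNamespace` stands in for a fitted tree-based model; sorting by descending importance is an assumption, not documented behavior.

```python
from types import SimpleNamespace
from typing import Dict, List


def importance_dict(model: object, feature_names: List[str]) -> Dict[str, float]:
    """Hypothetical sketch: pair feature names with model.feature_importances_."""
    if not hasattr(model, "feature_importances_"):
        raise AttributeError("model must expose feature_importances_")
    importances = list(model.feature_importances_)
    if len(importances) != len(feature_names):
        raise ValueError("feature_names must match the number of importances")
    # Assumption: most important features first.
    ranked = sorted(zip(feature_names, importances), key=lambda kv: kv[1], reverse=True)
    return {name: float(value) for name, value in ranked}


# Stand-in for a fitted tree-based classifier.
fake_model = SimpleNamespace(feature_importances_=[0.1, 0.6, 0.3])
result = importance_dict(fake_model, ["age", "dose", "weight"])
print(result)  # {'dose': 0.6, 'weight': 0.3, 'age': 0.1}
```

Any estimator exposing `feature_importances_` after fitting (for example a random forest) would fit the same shape.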

save_feature_importance_summary(all_importances, task=None)[source]

Create summary visualization of feature importances across all folds.

Parameters:
  • all_importances (List[Dict[str, float]]) – List of feature importance dicts from each fold

  • task (str, optional) – Task name for multi-task mode

Return type:

None
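Before plotting, per-fold importances have to be combined somehow. The sketch below shows one plausible aggregation, mean and population standard deviation per feature; `summarize_importances` is a hypothetical helper, and both the choice of statistics and the treatment of a feature missing from a fold as 0.0 are assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List


def summarize_importances(
    all_importances: List[Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Hypothetical sketch: per-feature mean and spread across CV folds."""
    names = sorted({name for fold in all_importances for name in fold})
    summary = {}
    for name in names:
        # Assumption: a feature absent from a fold counts as 0.0 importance.
        values = [fold.get(name, 0.0) for fold in all_importances]
        summary[name] = {"mean": mean(values), "std": pstdev(values)}
    return summary


folds = [
    {"age": 0.2, "dose": 0.8},
    {"age": 0.4, "dose": 0.6},
]
summary = summarize_importances(folds)
for name, stats in summary.items():
    print(f"{name}: {stats['mean']:.3f} +/- {stats['std']:.3f}")
```

The `all_importances` list is what you would collect by appending the return value of `save_feature_importance` once per fold.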

class scitex_ml.classification.reporters.SingleTaskClassificationReporter(output_dir, config=None, verbose=True, **kwargs)[source]

Improved single-task classification reporter with unified API.

Key improvements:

  • Inherits from BaseClassificationReporter for consistent API

  • Lazy directory creation (no empty folders)

  • Numerical precision control

  • Graceful plotting with proper error handling

  • Consistent parameter names across all methods

Features:

  • Comprehensive metrics calculation (balanced accuracy, MCC, ROC-AUC, PR-AUC, etc.)

  • Automated visualization generation:

      • Confusion matrices

      • ROC and Precision-Recall curves

      • Feature importance plots

      • CV aggregation plots with faded fold lines

      • Comprehensive metrics dashboard

  • Multi-format report generation (Org, Markdown, LaTeX, HTML, DOCX, PDF)

  • Cross-validation support with automatic fold aggregation

Parameters:
  • output_dir (Union[str, Path]) – Base directory for outputs. If None, a timestamped directory is created.

  • config (ReporterConfig, optional) – Configuration object for advanced settings

  • verbose (bool, default True) – Print initialization message

  • **kwargs – Additional arguments passed to base class

Examples

>>> # Basic usage
>>> reporter = SingleTaskClassificationReporter("./results")
>>> metrics = reporter.calculate_metrics(y_true, y_pred, y_proba, labels=['A', 'B'])
>>> reporter.save_summary()
>>> # Cross-validation with automatic CV aggregation plots
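>>> # Assumes `model` is a scikit-learn-style classifier and X, y are NumPy arrays
>>> for fold, (train_idx, test_idx) in enumerate(cv.split(X, y)):
...     model.fit(X[train_idx], y[train_idx])
...     y_test = y[test_idx]
...     y_pred = model.predict(X[test_idx])
...     y_proba = model.predict_proba(X[test_idx])
...     metrics = reporter.calculate_metrics(
...         y_test, y_pred, y_proba, fold=fold
...     )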
>>> reporter.save_summary()  # Automatically creates CV aggregation visualizations
>>> # Feature importance visualization
>>> reporter.plotter.create_feature_importance_plot(
...     feature_importance=importances,
...     feature_names=feature_names,
...     save_path=output_dir / "feature_importance.png"
... )

__init__(output_dir, config=None, verbose=True, **kwargs)[source]

set_session_config(config)[source]

Set the SciTeX session CONFIG object for inclusion in reports.

Parameters:

config (Any) – The SciTeX session CONFIG object

Return type:

None

save_summary(filename='cv_summary/summary.json', verbose=True)[source]

Save summary to file, create CV summary visualizations, and generate reports.

Parameters:
  • filename (str, default "cv_summary/summary.json") – Filename for the summary, relative to the output directory. By default it is saved in the cv_summary directory.

  • verbose (bool, default True) – Print summary to console

Returns:

Path to saved summary file

Return type:

Path
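The default filename contains a sub-directory, which ties in with the lazy directory creation mentioned for this class. A minimal sketch of that pattern follows; `save_json_lazily` is a hypothetical helper, and writing plain JSON with the standard library is an assumption about the file format, not the reporter's actual implementation.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


def save_json_lazily(
    output_dir: Union[str, Path], filename: str, data: Dict[str, Any]
) -> Path:
    """Hypothetical sketch: create directories only when a file is written."""
    path = Path(output_dir) / filename
    # Parent directories (e.g. cv_summary/) are created at write time,
    # so a reporter that never saves leaves no empty folders behind.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2))
    return path.resolve()


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "results"
    existed_before = base.exists()  # nothing is created up front
    saved = save_json_lazily(
        base, "cv_summary/summary.json", {"balanced_accuracy": 0.625}
    )
    loaded = json.loads(saved.read_text())
```

With this pattern, constructing a reporter is free of side effects on disk until the first save.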

Modules

reporter_utils

Classification reporter utilities for modular metric calculation and reporting.