scitex_ml
- class scitex_ml.ClassificationReporter(output_dir, tasks=None, precision=3, required_metrics=['balanced_accuracy', 'mcc', 'confusion_matrix', 'classification_report', 'roc_auc', 'roc_curve', 'pre_rec_auc', 'pre_rec_curve'], verbose=True, **kwargs)[source]
Unified classification reporter for single and multi-task scenarios.
This reporter automatically adapts to your use case:
- Single task: use it without specifying tasks
- Multiple tasks: specify tasks upfront or create them dynamically
- Seamless switching between single and multi-task workflows
Features:
- Comprehensive metrics calculation (balanced accuracy, MCC, ROC-AUC, PR-AUC, etc.)
- Automated visualization generation:
  - Confusion matrices
  - ROC and Precision-Recall curves
  - Feature importance plots (via plotter)
  - CV aggregation plots with faded fold lines
  - Comprehensive metrics dashboard
- Multi-format report generation (Org, Markdown, LaTeX, HTML, DOCX, PDF)
- Cross-validation support with automatic fold aggregation
- Multi-task classification tracking
- Parameters:
output_dir (Union[str, Path]) – Base directory for outputs. If None, creates timestamped directory.
tasks (List[str], optional) – List of task names. If None, tasks are created dynamically as needed.
precision (int, default 3) – Number of decimal places for numerical outputs
required_metrics (List[str], optional) – List of metrics to calculate. Defaults to comprehensive set.
verbose (bool, default True) – Whether to print initialization messages
**kwargs – Additional arguments passed to base class
Examples
>>> # Single task usage (no tasks specified)
>>> reporter = ClassificationReporter("./results")
>>> reporter.calculate_metrics(y_true, y_pred, y_proba)

>>> # Multi-task with predefined tasks
>>> reporter = ClassificationReporter("./results", tasks=["binary", "multiclass"])
>>> reporter.calculate_metrics(y_true, y_pred, task="binary")

>>> # Dynamic task creation
>>> reporter = ClassificationReporter("./results")
>>> reporter.calculate_metrics(y_true1, y_pred1, task="task1")
>>> reporter.calculate_metrics(y_true2, y_pred2, task="task2")

>>> # Feature importance visualization (via plotter)
>>> reporter._single_reporter.plotter.create_feature_importance_plot(
...     feature_importance=importances,
...     feature_names=feature_names,
...     save_path="./results/feature_importance.png"
... )

>>> # CV aggregation plots (automatically created on save_summary)
>>> for fold in range(5):
...     metrics = reporter.calculate_metrics(y_true, y_pred, y_proba, fold=fold)
>>> reporter.save_summary()  # Creates CV aggregation plots with faded fold lines
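Fold aggregation can be pictured as a per-metric mean and standard deviation across folds. The sketch below is a minimal illustration using only the standard library. The helper `aggregate_folds` and the sample values are hypothetical, and the reporter's own aggregation may differ.

```python
from statistics import mean, stdev
from typing import Dict, List


def aggregate_folds(
    fold_metrics: List[Dict[str, float]], precision: int = 3
) -> Dict[str, Dict[str, float]]:
    """Summarize scalar metrics across CV folds as mean and sample std."""
    summary: Dict[str, Dict[str, float]] = {}
    for name in fold_metrics[0]:
        values = [m[name] for m in fold_metrics]
        summary[name] = {
            "mean": round(mean(values), precision),
            # stdev needs at least two values; report 0.0 for a single fold
            "std": round(stdev(values), precision) if len(values) > 1 else 0.0,
            "n_folds": len(values),
        }
    return summary


# Illustrative numbers only, not real results
folds = [
    {"balanced_accuracy": 0.80, "mcc": 0.60},
    {"balanced_accuracy": 0.85, "mcc": 0.70},
    {"balanced_accuracy": 0.90, "mcc": 0.80},
]
cv_summary = aggregate_folds(folds)
```

Rounding to `precision` decimal places mirrors the reporter's `precision` parameter, although where the library applies rounding is not stated in this reference.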
- __init__(output_dir, tasks=None, precision=3, required_metrics=['balanced_accuracy', 'mcc', 'confusion_matrix', 'classification_report', 'roc_auc', 'roc_curve', 'pre_rec_auc', 'pre_rec_curve'], verbose=True, **kwargs)[source]
- calculate_metrics(y_true, y_pred, y_proba=None, labels=None, fold=None, task=None, verbose=True, model=None, feature_names=None)[source]
Calculate metrics for classification.
Automatically handles single vs multi-task scenarios:
- If no task specified and no tasks defined: creates “default” task
- If no task specified but tasks exist: uses first task
- If task specified: uses/creates that specific task
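The resolution rules above can be sketched as a small function. This is an illustration of the described behavior, not the library's code. The name `resolve_task` and the use of a plain list to hold tasks are assumptions.

```python
from typing import List, Optional


def resolve_task(task: Optional[str], known_tasks: List[str]) -> str:
    """Return the task name to use, registering it if it is new."""
    if task is None:
        # No task given: fall back to the first known task, or to "default"
        task = known_tasks[0] if known_tasks else "default"
    if task not in known_tasks:
        # Unknown names are created on demand
        known_tasks.append(task)
    return task


tasks: List[str] = []
first = resolve_task(None, tasks)       # no tasks yet, so "default" is created
second = resolve_task("binary", tasks)  # an explicit name is created on demand
third = resolve_task(None, tasks)       # tasks exist, so the first one is reused
```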
- Parameters:
y_true (np.ndarray) – True class labels
y_pred (np.ndarray) – Predicted class labels
y_proba (np.ndarray, optional) – Prediction probabilities (required for AUC metrics)
labels (List[str], optional) – Class labels for display
fold (int, optional) – Fold index for cross-validation
task (str, optional) – Task identifier. If None and no tasks exist, creates “default” task.
verbose (bool, default True) – Whether to print progress
model (object, optional) – Trained model for automatic feature importance extraction
feature_names (List[str], optional) – Feature names for feature importance (required if model is provided)
- Returns:
Dictionary of calculated metrics
- Return type:
Dict[str, Any]
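Two of the default metrics, balanced accuracy and MCC, can be derived from confusion-matrix counts. The following is a minimal binary-classification sketch in pure Python. The helper names are hypothetical, labels are assumed to be encoded as 0 and 1, and nothing here is taken from the library's implementation.

```python
import math
from typing import Sequence, Tuple


def binary_confusion(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[int, int, int, int]:
    """Count (tp, tn, fp, fn) for labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of sensitivity and specificity (both classes must be present)."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative labels only
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
counts = binary_confusion(y_true, y_pred)
bal_acc = balanced_accuracy(*counts)
mcc_value = matthews_corrcoef(*counts)
```

Both metrics stay informative under class imbalance, which is presumably why they lead the default `required_metrics` list.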
- save(data, relative_path, task=None, fold=None)[source]
Save custom data with automatic task/fold organization.
- Parameters:
data (Any) – Data to save (any format supported by scitex_io.save)
relative_path (Union[str, Path]) – Relative path from output directory
task (Optional[str], default None) – Task name. If provided, saves to task-specific directory
fold (Optional[int], default None) – If provided, automatically prepends “fold_{fold:02d}/” to path
- Returns:
Absolute path to the saved file
- Return type:
Path
Examples
>>> # Single task mode (no task specified)
>>> reporter.save({"accuracy": 0.95}, "metrics.json")

>>> # Multi-task mode
>>> reporter.save(results, "results.csv", task="binary", fold=0)
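The task and fold handling described for `save` can be illustrated with `pathlib`. This sketch assumes the task directory sits directly under the output directory and the fold directory is nested inside it. That ordering, and the helper name `resolve_save_path`, are assumptions rather than documented behavior.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_save_path(
    output_dir: Union[str, Path],
    relative_path: Union[str, Path],
    task: Optional[str] = None,
    fold: Optional[int] = None,
) -> Path:
    """Build the target path; the real method returns an absolute path."""
    path = Path(relative_path)
    if fold is not None:
        # Prepend the zero-padded fold directory, e.g. fold_00
        path = Path(f"fold_{fold:02d}") / path
    if task is not None:
        # Assumed layout: <output_dir>/<task>/fold_XX/<relative_path>
        path = Path(task) / path
    return Path(output_dir) / path


single = resolve_save_path("results", "metrics.json")
multi = resolve_save_path("results", "results.csv", task="binary", fold=0)
```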
- get_summary()[source]
Get summary of all calculated metrics.
- Returns:
Summary of metrics across all tasks and folds
- Return type:
Dict[str, Any]
- save_feature_importance(model, feature_names, fold=None, task=None)[source]
Calculate and save feature importance for tree-based models.
- Parameters:
model (object) – Fitted classifier (must have feature_importances_)
feature_names (List[str]) – Names of features
fold (int, optional) – Fold number for tracking
task (str, optional) – Task name for multi-task mode
- Returns:
Dictionary of feature importances {feature_name: importance}
- Return type:
- class scitex_ml.Classifier(class_weight=None, random_state=42)[source]
Server for initializing various scikit-learn classifiers with a consistent interface.
Example
>>> clf_server = Classifier(class_weight={0: 1.0, 1: 2.0}, random_state=42)
>>> clf = clf_server("SVC", scaler=_StandardScaler())
>>> print(clf_server.list)
['CatBoostClassifier', 'Perceptron', ...]
- Parameters:
- class scitex_ml.EarlyStopping(patience=7, verbose=False, delta=1e-05, direction='minimize')[source]
Stops training early if the validation score does not improve within a given patience period.
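Patience-based early stopping with a `delta` threshold and a `direction` can be sketched as below. `PatienceStopper` is a hypothetical stand-in written for illustration. Its `step` method and internal attributes are assumptions and do not describe the actual `EarlyStopping` interface.

```python
from typing import Optional


class PatienceStopper:
    """Hypothetical stand-in showing patience-based early stopping."""

    def __init__(
        self, patience: int = 7, delta: float = 1e-5, direction: str = "minimize"
    ) -> None:
        if direction not in ("minimize", "maximize"):
            raise ValueError("direction must be 'minimize' or 'maximize'")
        self.patience = patience
        self.delta = delta
        self.direction = direction
        self.best: Optional[float] = None
        self.counter = 0

    def step(self, score: float) -> bool:
        """Record a validation score; return True when training should stop."""
        if self.best is None:
            improved = True
        elif self.direction == "minimize":
            # An improvement must beat the best score by more than delta
            improved = score < self.best - self.delta
        else:
            improved = score > self.best + self.delta
        if improved:
            self.best = score
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience


stopper = PatienceStopper(patience=2)
val_losses = [0.9, 0.8, 0.81, 0.82, 0.5]  # illustrative values only
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

With `patience=2`, the loop stops at the second consecutive epoch without improvement, so the later score of 0.5 is never seen. That is the usual trade-off when choosing a small patience.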
- class scitex_ml.LearningCurveLogger[source]
Records and visualizes learning metrics during model training.
Example
>>> logger = LearningCurveLogger()
>>> metrics = {
...     "loss_plot": 0.5,
...     "balanced_ACC_plot": 0.8,
...     "pred_proba": pred_proba,
...     "true_class": labels,
...     "i_fold": 0,
...     "i_epoch": 1,
...     "i_global": 100
... }
>>> logger(metrics, "Training")
>>> fig = logger.plot_learning_curves()
- property dfs: Dict[str, DataFrame]
Returns DataFrames of logged metrics.
- Returns:
Dictionary of DataFrames for each step
- Return type:
Dict[str, pd.DataFrame]
- to_metrics_df()[source]
Convert logged data to metrics DataFrame for plot_learning_curve.
- Returns:
DataFrame with columns: step, i_global, i_epoch, i_batch, and metric columns
- Return type:
pd.DataFrame
- plot_learning_curves(title=None, max_n_ticks=4, linewidth=1, scattersize=3, yscale='linear', spath=None)[source]
Plots learning curves from logged metrics.
Delegates to scitex_ml.plt.plot_learning_curve for consistent plotting.
- Parameters:
- Returns:
Figure containing learning curves
- Return type:
matplotlib.figure.Figure
- class scitex_ml.MultiTaskLoss(*args: Any, **kwargs: Any)[source]
Example
>>> are_regression = [False, False]
>>> mtl = MultiTaskLoss(are_regression)
>>> losses = [torch.rand(1, requires_grad=True) for _ in range(len(are_regression))]
>>> loss = mtl(losses)
>>> print(loss)
>>> # [tensor([0.4215], grad_fn=<AddBackward0>), tensor([0.6190], grad_fn=<AddBackward0>)]
- scitex_ml.get_optimizer(name)[source]
Get optimizer class by name.
- Parameters:
name (str) – Optimizer name (adam, ranger, rmsprop, sgd)
- Returns:
Optimizer class
- Raises:
ValueError – If optimizer name is not supported
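A name-to-class lookup that raises `ValueError` for unknown names can be sketched with a dictionary registry. The classes below are empty placeholders rather than real optimizers. The function name `lookup_optimizer` and the case-insensitive matching are assumptions, and only two of the four documented names are registered to keep the sketch short.

```python
from typing import Dict


class Adam:
    """Placeholder standing in for a real optimizer class."""


class SGD:
    """Placeholder standing in for a real optimizer class."""


# Hypothetical registry; the documented function also accepts ranger and rmsprop
_OPTIMIZERS: Dict[str, type] = {"adam": Adam, "sgd": SGD}


def lookup_optimizer(name: str) -> type:
    """Return the class registered under `name`, or raise ValueError."""
    try:
        return _OPTIMIZERS[name.lower()]
    except KeyError:
        supported = ", ".join(sorted(_OPTIMIZERS))
        raise ValueError(
            f"Unsupported optimizer {name!r}; choose from: {supported}"
        ) from None


optimizer_cls = lookup_optimizer("adam")
```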
Modules
- Scitex act module.
- Classification utilities with unified API.
- Scitex clustering module.
- Feature selection utilities for machine learning.
- Scitex loss module.
- Scitex metrics module.
- Scitex optim module.
- Scitex centralized plotting module.
- Scitex sk module.
- Sklearn wrappers and utilities.
- Training utilities.
- Scitex utils module.