scitex_ml

class scitex_ml.ClassificationReporter(output_dir, tasks=None, precision=3, required_metrics=['balanced_accuracy', 'mcc', 'confusion_matrix', 'classification_report', 'roc_auc', 'roc_curve', 'pre_rec_auc', 'pre_rec_curve'], verbose=True, **kwargs)[source]

Unified classification reporter for single and multi-task scenarios.

This reporter automatically adapts to your use case:

  • Single task: Just use it without specifying tasks

  • Multiple tasks: Specify tasks upfront or create them dynamically

  • Seamless switching between single and multi-task workflows

Features:

  • Comprehensive metrics calculation (balanced accuracy, MCC, ROC-AUC, PR-AUC, etc.)

  • Automated visualization generation: confusion matrices, ROC and Precision-Recall curves, feature importance plots (via plotter), CV aggregation plots with faded fold lines, and a comprehensive metrics dashboard

  • Multi-format report generation (Org, Markdown, LaTeX, HTML, DOCX, PDF)

  • Cross-validation support with automatic fold aggregation

  • Multi-task classification tracking

Parameters:
  • output_dir (Union[str, Path]) – Base directory for outputs. If None, creates timestamped directory.

  • tasks (List[str], optional) – List of task names. If None, tasks are created dynamically as needed.

  • precision (int, default 3) – Number of decimal places for numerical outputs

  • required_metrics (List[str], optional) – List of metrics to calculate. Defaults to comprehensive set.

  • verbose (bool, default True) – Whether to print initialization messages

  • **kwargs – Additional arguments passed to base class

Examples

>>> # Single task usage (no tasks specified)
>>> reporter = ClassificationReporter("./results")
>>> reporter.calculate_metrics(y_true, y_pred, y_proba)
>>> # Multi-task with predefined tasks
>>> reporter = ClassificationReporter("./results", tasks=["binary", "multiclass"])
>>> reporter.calculate_metrics(y_true, y_pred, task="binary")
>>> # Dynamic task creation
>>> reporter = ClassificationReporter("./results")
>>> reporter.calculate_metrics(y_true1, y_pred1, task="task1")
>>> reporter.calculate_metrics(y_true2, y_pred2, task="task2")
>>> # Feature importance visualization (via plotter)
>>> reporter._single_reporter.plotter.create_feature_importance_plot(
...     feature_importance=importances,
...     feature_names=feature_names,
...     save_path="./results/feature_importance.png"
... )
>>> # CV aggregation plots (automatically created on save_summary)
>>> for fold in range(5):
...     metrics = reporter.calculate_metrics(y_true, y_pred, y_proba, fold=fold)
>>> reporter.save_summary()  # Creates CV aggregation plots with faded fold lines
__init__(output_dir, tasks=None, precision=3, required_metrics=['balanced_accuracy', 'mcc', 'confusion_matrix', 'classification_report', 'roc_auc', 'roc_curve', 'pre_rec_auc', 'pre_rec_curve'], verbose=True, **kwargs)[source]
calculate_metrics(y_true, y_pred, y_proba=None, labels=None, fold=None, task=None, verbose=True, model=None, feature_names=None)[source]

Calculate metrics for classification.

Automatically handles single vs multi-task scenarios:

  • If no task specified and no tasks defined: creates “default” task

  • If no task specified but tasks exist: uses first task

  • If task specified: uses/creates that specific task
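The task-selection rules above can be sketched as a small helper. This is only an illustration of the documented behaviour, not the library's code: `resolve_task` and its arguments are hypothetical names, and the real reporter may track tasks differently.

```python
from typing import List, Optional


def resolve_task(task: Optional[str], tasks: List[str]) -> str:
    """Pick a task name following the documented rules (hypothetical helper)."""
    if task is None:
        if not tasks:
            # No task given and none defined: fall back to a "default" task.
            tasks.append("default")
        # No task given but tasks exist: use the first one.
        return tasks[0]
    if task not in tasks:
        # Task given but not yet known: create it on the fly.
        tasks.append(task)
    return task


known = ["binary", "multiclass"]
print(resolve_task(None, known))     # binary
print(resolve_task("task3", known))  # task3
```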

Parameters:
  • y_true (np.ndarray) – True class labels

  • y_pred (np.ndarray) – Predicted class labels

  • y_proba (np.ndarray, optional) – Prediction probabilities (required for AUC metrics)

  • labels (List[str], optional) – Class labels for display

  • fold (int, optional) – Fold index for cross-validation

  • task (str, optional) – Task identifier. If None and no tasks exist, creates “default” task.

  • verbose (bool, default True) – Whether to print progress

  • model (object, optional) – Trained model for automatic feature importance extraction

  • feature_names (List[str], optional) – Feature names for feature importance (required if model is provided)

Returns:

Dictionary of calculated metrics

Return type:

Dict[str, Any]
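For readers unfamiliar with two of the default metrics, the sketch below computes balanced accuracy (mean per-class recall) and the binary Matthews correlation coefficient in plain Python. It illustrates the standard definitions only; the function names are hypothetical and this page does not show how the reporter computes them internally.

```python
import math
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the recall obtained on each class present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def binary_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(round(balanced_accuracy(y_true, y_pred), 3))  # 0.625
print(round(binary_mcc(y_true, y_pred), 3))         # 0.25
```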

save(data, relative_path, task=None, fold=None)[source]

Save custom data with automatic task/fold organization.

Parameters:
  • data (Any) – Data to save (any format supported by scitex_io.save)

  • relative_path (Union[str, Path]) – Relative path from output directory

  • task (Optional[str], default None) – Task name. If provided, saves to task-specific directory

  • fold (Optional[int], default None) – If provided, automatically prepends “fold_{fold:02d}/” to path

Returns:

Absolute path to the saved file

Return type:

Path

Examples

>>> # Single task mode (no task specified)
>>> reporter.save({"accuracy": 0.95}, "metrics.json")
>>> # Multi-task mode
>>> reporter.save(results, "results.csv", task="binary", fold=0)
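The path organization described for save can be pictured with the sketch below. Only the "fold_{fold:02d}/" prefix comes from the documentation above; the helper name `build_save_path` and the choice of a directory named after the task are assumptions made for illustration. The sketch returns a relative path, whereas the real method is documented to return an absolute one.

```python
from pathlib import Path
from typing import Optional, Union


def build_save_path(
    output_dir: Union[str, Path],
    relative_path: Union[str, Path],
    task: Optional[str] = None,
    fold: Optional[int] = None,
) -> Path:
    """Compose output_dir / [task] / [fold_XX] / relative_path (hypothetical)."""
    path = Path(relative_path)
    if fold is not None:
        # Documented behaviour: prepend a zero-padded fold directory.
        path = Path(f"fold_{fold:02d}") / path
    if task is not None:
        # Assumption: the task-specific directory is named after the task.
        path = Path(task) / path
    return Path(output_dir) / path


print(build_save_path("./results", "results.csv", task="binary", fold=0).as_posix())
# results/binary/fold_00/results.csv
```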
get_summary()[source]

Get summary of all calculated metrics.

Returns:

Summary of metrics across all tasks and folds

Return type:

Dict[str, Any]

save_summary(filename='summary.json', verbose=True)[source]

Save summary to file.

Parameters:
  • filename (str) – Filename for summary

  • verbose (bool) – Whether to print summary

Returns:

Path to saved summary file

Return type:

Path

save_feature_importance(model, feature_names, fold=None, task=None)[source]

Calculate and save feature importance for tree-based models.

Parameters:
  • model (object) – Fitted classifier (must have feature_importances_)

  • feature_names (List[str]) – Names of features

  • fold (int, optional) – Fold number for tracking

  • task (str, optional) – Task name for multi-task mode

Returns:

Dictionary of feature importances {feature_name: importance}

Return type:

Dict[str, float]
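A minimal sketch of the returned {feature_name: importance} mapping, assuming the model exposes `feature_importances_` as scikit-learn tree ensembles do. `StubTreeModel` and `importance_dict` are hypothetical stand-ins, and sorting by decreasing importance is an assumption rather than documented behaviour.

```python
from typing import Dict, List


class StubTreeModel:
    """Stand-in for a fitted tree-based classifier (hypothetical)."""

    feature_importances_ = [0.2, 0.5, 0.3]


def importance_dict(model, feature_names: List[str]) -> Dict[str, float]:
    """Map feature names to importances, largest first."""
    if not hasattr(model, "feature_importances_"):
        raise AttributeError("model must expose feature_importances_")
    importances = list(model.feature_importances_)
    if len(importances) != len(feature_names):
        raise ValueError("feature_names must match the number of importances")
    # Sort pairs by importance so the most influential feature comes first.
    pairs = sorted(zip(feature_names, importances), key=lambda kv: kv[1], reverse=True)
    return {name: float(value) for name, value in pairs}


print(importance_dict(StubTreeModel(), ["a", "b", "c"]))
# {'b': 0.5, 'c': 0.3, 'a': 0.2}
```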

save_feature_importance_summary(all_importances, task=None)[source]

Create summary visualization of feature importances across all folds.

Parameters:
  • all_importances (List[Dict[str, float]]) – List of feature importance dicts from each fold

  • task (str, optional) – Task name for multi-task mode

Return type:

None
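Before plotting, per-fold importances are typically reduced to a mean and spread per feature. The sketch below shows one way to do that with the standard library. `summarize_importances` is a hypothetical helper, it assumes every fold reports the same features, and it is not taken from the library.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize_importances(
    all_importances: List[Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Return {feature: {"mean": ..., "std": ...}} across folds."""
    summary = {}
    for name in all_importances[0]:
        values = [fold[name] for fold in all_importances]
        summary[name] = {
            "mean": mean(values),
            # Sample standard deviation needs at least two folds.
            "std": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


folds = [{"a": 0.2, "b": 0.8}, {"a": 0.4, "b": 0.6}]
for name, stats in summarize_importances(folds).items():
    print(name, round(stats["mean"], 3), round(stats["std"], 3))
```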

class scitex_ml.Classifier(class_weight=None, random_state=42)[source]

Server for initializing various scikit-learn classifiers with a consistent interface.

Example

>>> clf_server = Classifier(class_weight={0: 1.0, 1: 2.0}, random_state=42)
>>> clf = clf_server("SVC", scaler=_StandardScaler())
>>> print(clf_server.list)
['CatBoostClassifier', 'Perceptron', ...]
Parameters:
  • class_weight (Optional[Dict[int, float]]) – Class weights for handling imbalanced datasets

  • random_state (int) – Random seed for reproducibility

__init__(class_weight=None, random_state=42)[source]
property list: List[str]
class scitex_ml.EarlyStopping(patience=7, verbose=False, delta=1e-05, direction='minimize')[source]

Stops training early if the validation score doesn’t improve within a given patience period.

__init__(patience=7, verbose=False, delta=1e-05, direction='minimize')[source]
Parameters:
  • patience (int) – How long to wait after last time validation score improved. Default: 7

  • verbose (bool) – If True, prints a message for each validation score improvement. Default: False

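  • delta (float) – Minimum change in the monitored quantity to qualify as an improvement. Default: 1e-05

  • direction (str) – Direction in which the validation score is expected to improve. Default: 'minimize'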

is_best(val_score)[source]
save(current_score, models_spaths_dict, i_global)[source]

Saves the model when the validation score decreases.
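The interaction of patience, delta and direction can be sketched as follows. `PatienceTracker` is a hypothetical class written for illustration, the `"maximize"` value is an assumption (only `'minimize'` appears in the signature above), and the real EarlyStopping additionally saves models, which is omitted here.

```python
class PatienceTracker:
    """Minimal patience-based stopping rule (hypothetical sketch)."""

    def __init__(self, patience: int = 7, delta: float = 1e-5, direction: str = "minimize"):
        self.patience = patience
        self.delta = delta
        # Flip the sign so that "lower is better" holds in both directions.
        self.sign = 1.0 if direction == "minimize" else -1.0
        self.best = None
        self.counter = 0

    def step(self, score: float) -> bool:
        """Record a validation score; return True when training should stop."""
        value = self.sign * score
        if self.best is None or value < self.best - self.delta:
            # Improvement larger than delta: remember it and reset the counter.
            self.best = value
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience


tracker = PatienceTracker(patience=2)
print([tracker.step(s) for s in [1.0, 0.9, 0.95, 0.93]])
# [False, False, False, True]
```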

class scitex_ml.LearningCurveLogger[source]

Records and visualizes learning metrics during model training.

Example

>>> logger = LearningCurveLogger()
>>> metrics = {
...     "loss_plot": 0.5,
...     "balanced_ACC_plot": 0.8,
...     "pred_proba": pred_proba,
...     "true_class": labels,
...     "i_fold": 0,
...     "i_epoch": 1,
...     "i_global": 100
... }
>>> logger(metrics, "Training")
>>> fig = logger.plot_learning_curves()
__init__()[source]
property dfs: Dict[str, DataFrame]

Returns DataFrames of logged metrics.

Returns:

Dictionary of DataFrames for each step

Return type:

Dict[str, pd.DataFrame]

to_metrics_df()[source]

Convert logged data to metrics DataFrame for plot_learning_curve.

Returns:

DataFrame with columns: step, i_global, i_epoch, i_batch, and metric columns

Return type:

pd.DataFrame

plot_learning_curves(title=None, max_n_ticks=4, linewidth=1, scattersize=3, yscale='linear', spath=None)[source]

Plots learning curves from logged metrics.

Delegates to scitex_ml.plt.plot_learning_curve for consistent plotting.

Parameters:
  • title (str, optional) – Plot title

  • max_n_ticks (int) – Maximum number of ticks on axes

  • linewidth (float) – Width of plot lines

  • scattersize (float) – Size of scatter points

  • yscale (str) – Y-axis scale (‘linear’ or ‘log’)

  • spath (str, optional) – Save path for the figure

Returns:

Figure containing learning curves

Return type:

matplotlib.figure.Figure

get_x_of_i_epoch(x, step, i_epoch)[source]

Gets metric values for a specific epoch.

Parameters:
  • x (str) – Name of metric to retrieve

  • step (str) – Training phase

  • i_epoch (int) – Epoch number

Returns:

Array of metric values for specified epoch

Return type:

np.ndarray

print(step)[source]

Prints metrics for given step.

Parameters:

step (str) – Training phase to print metrics for

Return type:

None

class scitex_ml.MultiTaskLoss(*args: Any, **kwargs: Any)[source]

Reference: https://openaccess.thecvf.com/content_cvpr_2018/papers/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.pdf

Example

>>> import torch
>>> are_regression = [False, False]
>>> mtl = MultiTaskLoss(are_regression)
>>> losses = [torch.rand(1, requires_grad=True) for _ in range(len(are_regression))]
>>> loss = mtl(losses)
>>> print(loss)  # values vary because the inputs are random
[tensor([0.4215], grad_fn=<AddBackward0>), tensor([0.6190], grad_fn=<AddBackward0>)]

__init__(are_regression=[False, False], reduction='none')[source]
forward(losses)[source]
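The referenced paper weights each task loss by a learned uncertainty: roughly L_i / (2σ_i²) + log σ_i for regression tasks and L_i / σ_i² + log σ_i for classification tasks. The sketch below evaluates that weighting with plain floats. It illustrates the paper's formula only: `weight_losses` and `log_vars` are hypothetical names, and whether MultiTaskLoss uses exactly these terms is not confirmed by this page.

```python
import math
from typing import List


def weight_losses(
    losses: List[float],
    log_vars: List[float],
    are_regression: List[bool],
) -> List[float]:
    """Apply uncertainty weighting per task; log_vars[i] stands for log(sigma_i^2)."""
    weighted = []
    for loss, log_var, is_regression in zip(losses, log_vars, are_regression):
        precision = math.exp(-log_var)          # 1 / sigma^2
        scale = 0.5 if is_regression else 1.0   # regression gets the extra 1/2
        # log(sigma) equals 0.5 * log(sigma^2).
        weighted.append(scale * precision * loss + 0.5 * log_var)
    return weighted


# With sigma = 1 for both tasks, only the regression loss is halved.
print(weight_losses([2.0, 3.0], [0.0, 0.0], [True, False]))  # [1.0, 3.0]
```

Returning one value per task mirrors the list shown in the example output for the default reduction='none'.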
scitex_ml.get_optimizer(name)[source]

Get optimizer class by name.

Parameters:

name (str) – Optimizer name (adam, ranger, rmsprop, sgd)

Returns:

Optimizer class

Raises:

ValueError – If optimizer name is not supported

scitex_ml.set_optimizer(models, optimizer_name, lr)[source]

Set optimizer for models.

Parameters:
  • models – Model or list of models

  • optimizer_name (str) – Name of optimizer

  • lr (float) – Learning rate

Returns:

Configured optimizer instance

Modules

activation

Scitex act module.

classification

Classification utilities with unified API.

clustering

Scitex clustering module.

feature_extraction

feature_selection

Feature selection utilities for machine learning.

loss

Scitex loss module.

metrics

Scitex metrics module.

optim

Scitex optim module.

plt

Scitex centralized plotting module.

sk

Scitex sk module.

sklearn

Sklearn wrappers and utilities.

training

Training utilities.

utils

Scitex utils module.