scitex_ml.metrics

Scitex metrics module.

Standardized naming convention:

  • calc_* functions: Modern standardized metric calculations

  • Legacy names (bACC, balanced_accuracy, etc.): For backward compatibility

scitex_ml.metrics.calc_bacc(y_true, y_pred, labels=None, fold=None)

Calculate balanced accuracy with robust label handling.

Parameters:
  • y_true (np.ndarray) – True labels (can be str or int)

  • y_pred (np.ndarray) – Predicted labels (can be str or int)

  • labels (List, optional) – Expected label list

  • fold (int, optional) – Fold number for tracking

Returns:

{‘metric’: ‘balanced_accuracy’, ‘value’: float, ‘fold’: int}

Return type:

Dict[str, Any]
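Balanced accuracy is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. The sketch below illustrates that definition and the documented return dictionary. It is not the scitex_ml implementation: the function name `balanced_accuracy_sketch` is hypothetical, and casting all labels to `str` is one assumed way of treating str and int labels uniformly.

```python
import numpy as np


def balanced_accuracy_sketch(y_true, y_pred, labels=None, fold=None):
    """Hypothetical re-implementation: mean per-class recall over the true classes."""
    y_true = np.asarray(y_true).astype(str)  # treat str and int labels uniformly
    y_pred = np.asarray(y_pred).astype(str)
    if labels is None:
        labels = np.unique(y_true)
    else:
        labels = np.asarray(labels).astype(str)
    recalls = []
    for label in labels:
        mask = y_true == label
        if mask.any():  # skip classes that never occur in y_true
            recalls.append(np.mean(y_pred[mask] == label))
    value = float(np.mean(recalls)) if recalls else float("nan")
    return {"metric": "balanced_accuracy", "value": value, "fold": fold}


# Class "a": recall 2/3; class "b": recall 1/1; balanced accuracy = 5/6.
result = balanced_accuracy_sketch(["a", "a", "a", "b"], ["a", "a", "b", "b"], fold=0)
print(result)
```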

scitex_ml.metrics.calc_mcc(y_true, y_pred, labels=None, fold=None)

Calculate Matthews Correlation Coefficient with robust label handling.

Parameters:
  • y_true (np.ndarray) – True labels (can be str or int)

  • y_pred (np.ndarray) – Predicted labels (can be str or int)

  • labels (List, optional) – Expected label list

  • fold (int, optional) – Fold number for tracking

Returns:

{‘metric’: ‘mcc’, ‘value’: float, ‘fold’: int}

Return type:

Dict[str, Any]
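For readers who want the formula behind the value, here is a minimal sketch of the multiclass Matthews Correlation Coefficient (Gorodkin's R_K), computed from the confusion matrix; for two classes it reduces to the familiar (TP·TN − FP·FN) / sqrt(...) form. The name `mcc_sketch` is hypothetical, and returning 0.0 when the denominator is zero is an assumed convention, not necessarily what calc_mcc does.

```python
import numpy as np


def mcc_sketch(y_true, y_pred, labels=None, fold=None):
    """Hypothetical multiclass MCC computed from the confusion matrix."""
    y_true = np.asarray(y_true).astype(str)
    y_pred = np.asarray(y_pred).astype(str)
    if labels is None:
        labels = np.unique(np.concatenate([y_true, y_pred]))
    labels = [str(label) for label in labels]
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1  # rows: true labels, columns: predicted labels
    s = cm.sum()          # total number of samples
    c = np.trace(cm)      # correctly classified samples
    t_k = cm.sum(axis=1)  # how often each class truly occurs
    p_k = cm.sum(axis=0)  # how often each class is predicted
    numerator = c * s - t_k @ p_k
    denominator = np.sqrt((s**2 - p_k @ p_k) * (s**2 - t_k @ t_k))
    value = float(numerator / denominator) if denominator > 0 else 0.0
    return {"metric": "mcc", "value": value, "fold": fold}


# Binary case: TP=1, FN=1, TN=2, FP=0, so MCC = 2 / sqrt(12), about 0.577.
print(mcc_sketch([1, 1, 0, 0], [1, 0, 0, 0]))
```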

scitex_ml.metrics.calc_conf_mat(y_true, y_pred, labels=None, fold=None, normalize=None)

Calculate confusion matrix with robust label handling.

Parameters:
  • y_true (np.ndarray) – True labels (can be str or int)

  • y_pred (np.ndarray) – Predicted labels (can be str or int)

  • labels (List, optional) – Expected label list

  • fold (int, optional) – Fold number for tracking

  • normalize (str, optional) – ‘true’, ‘pred’, ‘all’, or None

Returns:

{‘metric’: ‘confusion_matrix’, ‘value’: pd.DataFrame, ‘fold’: int, ‘labels’: list}

Return type:

Dict[str, Any]
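The three normalize options divide the counts by row sums (‘true’), column sums (‘pred’), or the grand total (‘all’). The following minimal sketch shows that behaviour under stated assumptions: `conf_mat_sketch` is a hypothetical name, rows are assumed to be true labels and columns predictions, and the matrix is returned as a plain NumPy array rather than the pd.DataFrame that calc_conf_mat documents.

```python
import numpy as np


def conf_mat_sketch(y_true, y_pred, labels=None, fold=None, normalize=None):
    """Hypothetical confusion matrix; rows are true labels, columns are predictions."""
    y_true = np.asarray(y_true).astype(str)
    y_pred = np.asarray(y_pred).astype(str)
    if labels is None:
        labels = np.unique(np.concatenate([y_true, y_pred]))
    labels = [str(label) for label in labels]
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    with np.errstate(divide="ignore", invalid="ignore"):
        if normalize == "true":    # each row sums to 1 (per-class recall on the diagonal)
            cm = cm / cm.sum(axis=1, keepdims=True)
        elif normalize == "pred":  # each column sums to 1 (per-class precision on the diagonal)
            cm = cm / cm.sum(axis=0, keepdims=True)
        elif normalize == "all":   # the whole matrix sums to 1
            cm = cm / cm.sum()
        elif normalize is not None:
            raise ValueError("normalize must be 'true', 'pred', 'all', or None")
    cm = np.nan_to_num(cm)  # empty rows/columns become 0 instead of NaN
    return {"metric": "confusion_matrix", "value": cm, "fold": fold, "labels": labels}


out = conf_mat_sketch(["cat", "cat", "dog"], ["cat", "dog", "dog"], normalize="true")
print(out["labels"], out["value"])
```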

scitex_ml.metrics.calc_clf_report(y_true, y_pred, labels=None, fold=None)

Generate classification report with robust label handling.

Parameters:
  • y_true (np.ndarray) – True labels (can be str or int)

  • y_pred (np.ndarray) – Predicted labels (can be str or int)

  • labels (List, optional) – Expected label list

  • fold (int, optional) – Fold number for tracking

Returns:

{‘metric’: ‘classification_report’, ‘value’: pd.DataFrame, ‘fold’: int, ‘labels’: list}

Return type:

Dict[str, Any]
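A classification report typically lists per-class precision, recall, F1-score and support. The sketch below computes those four numbers with plain NumPy so the quantities are concrete. The function name `clf_report_sketch`, the nested-dict output (instead of a DataFrame) and the 0.0 fallback for empty denominators are all assumptions, not scitex_ml behaviour.

```python
import numpy as np


def clf_report_sketch(y_true, y_pred, labels=None):
    """Hypothetical per-class precision, recall, F1 and support as nested dicts."""
    y_true = np.asarray(y_true).astype(str)
    y_pred = np.asarray(y_pred).astype(str)
    if labels is None:
        labels = np.unique(np.concatenate([y_true, y_pred]))
    report = {}
    for label in map(str, labels):
        tp = np.sum((y_true == label) & (y_pred == label))
        fp = np.sum((y_true != label) & (y_pred == label))
        fn = np.sum((y_true == label) & (y_pred != label))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": float(precision), "recall": float(recall),
                         "f1-score": float(f1), "support": int(tp + fn)}
    return report


report = clf_report_sketch(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(report)
```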

scitex_ml.metrics.calc_roc_auc(y_true, y_proba, labels=None, fold=None, return_curve=False)

Calculate ROC AUC score with robust handling.

Parameters:
  • y_true (np.ndarray) – True labels (can be str or int)

  • y_proba (np.ndarray) – Predicted probabilities

  • labels (List, optional) – Expected label list

  • fold (int, optional) – Fold number for tracking

  • return_curve (bool) – Whether to return ROC curve data

Returns:

{‘metric’: ‘roc_auc’, ‘value’: float, ‘fold’: int}

Return type:

Dict[str, Any]
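For the binary case, ROC AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counting one half. The sketch below uses that pairwise definition. It covers only binary labels with a 1-D score vector; how calc_roc_auc handles multiclass y_proba (for example one-vs-rest averaging) is not stated here, and `roc_auc_binary_sketch` is a hypothetical name.

```python
import numpy as np


def roc_auc_binary_sketch(y_true, y_score, positive_label):
    """Hypothetical binary ROC AUC as P(score_pos > score_neg), ties counting 1/2."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == positive_label]
    neg = y_score[y_true != positive_label]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # AUC is undefined when only one class is present
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


# 3 of the 4 positive/negative pairs are ranked correctly, so AUC = 0.75.
auc = roc_auc_binary_sketch(["n", "n", "p", "p"], [0.1, 0.4, 0.35, 0.8], positive_label="p")
print(auc)
```

The pairwise comparison costs O(n_pos × n_neg) memory, which is fine for illustration; a rank-based formula scales better on large datasets.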

scitex_ml.metrics.calc_pre_rec_auc(y_true, y_proba, labels=None, fold=None, return_curve=False)

Calculate Precision-Recall AUC with robust handling.

Parameters:
  • y_true (np.ndarray) – True labels (can be str or int)

  • y_proba (np.ndarray) – Predicted probabilities

  • labels (List, optional) – Expected label list

  • fold (int, optional) – Fold number for tracking

  • return_curve (bool) – Whether to return PR curve data

Returns:

{‘metric’: ‘pr_auc’, ‘value’: float, ‘fold’: int}

Return type:

Dict[str, Any]
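The area under the precision-recall curve can be estimated in several ways. One common estimator is average precision: the mean of the precision values at each rank where a positive sample is retrieved. The sketch below implements that estimator for binary labels only. Whether calc_pre_rec_auc uses average precision or trapezoidal integration is not stated in this documentation, the sketch assumes no tied scores, and `average_precision_sketch` is a hypothetical name.

```python
import numpy as np


def average_precision_sketch(y_true, y_score, positive_label):
    """Hypothetical PR-AUC estimate via step-wise average precision (no tie handling)."""
    is_pos = np.asarray(y_true) == positive_label
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score, kind="mergesort")  # highest score first
    hits = is_pos[order]
    n_pos = hits.sum()
    if n_pos == 0:
        return float("nan")  # no positives: precision-recall is undefined
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision at every rank where a positive is retrieved.
    return float(precision_at_k[hits].sum() / n_pos)


# Ranked: 0.8 (pos), 0.4 (neg), 0.35 (pos), 0.1 (neg); AP = (1/1 + 2/3) / 2 = 5/6.
ap = average_precision_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], positive_label=1)
print(ap)
```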

scitex_ml.metrics.calc_bacc_from_conf_mat(cm)

Calculate balanced accuracy from confusion matrix.

Parameters:

cm (np.ndarray) – Confusion matrix

Returns:

Balanced accuracy

Return type:

float
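Given a confusion matrix, balanced accuracy is the mean of the diagonal entries divided by their row sums. The sketch below assumes rows are true classes and columns are predictions (the orientation is not stated above) and skips classes with no true samples. `bacc_from_conf_mat_sketch` is a hypothetical name.

```python
import numpy as np


def bacc_from_conf_mat_sketch(cm):
    """Hypothetical: mean per-class recall, assuming rows are true classes."""
    cm = np.asarray(cm, dtype=float)
    support = cm.sum(axis=1)  # number of true samples per class
    present = support > 0     # ignore classes that never occur
    recalls = np.diag(cm)[present] / support[present]
    return float(recalls.mean())


# Recalls 2/3 and 1/1, so balanced accuracy = 5/6.
print(bacc_from_conf_mat_sketch([[2, 1], [0, 1]]))
```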

scitex_ml.metrics.calc_seizure_window_prediction_metrics(y_true, y_pred, metadata, window_duration_min=1.0)

Calculate clinical seizure prediction metrics (window-based).

This function calculates window-based sensitivity, meaning it measures the percentage of seizure time windows that were correctly identified. This is NOT event-based sensitivity, which would measure the percentage of seizure events detected regardless of how many windows within each event are flagged.

Parameters:
  • y_true (np.ndarray) – True labels (string: ‘seizure’ or ‘interictal_control’)

  • y_pred (np.ndarray) – Predicted labels (string: ‘seizure’ or ‘interictal_control’)

  • metadata (pd.DataFrame) – Metadata with ‘seizure_type’ column indicating seizure/interictal periods

  • window_duration_min (float, optional) – Duration of each time window in minutes (default: 1.0)

Returns:

Dictionary containing:

  • seizure_sensitivity: % of seizure time windows detected (NOT event-based)

  • fp_per_hour: False positives per hour during interictal periods

  • time_in_warning: % of total time in alarm state

  • n_seizure_windows: Number of seizure windows

  • n_interictal_windows: Number of interictal windows

  • n_true_positives: Correctly predicted seizure windows

  • n_false_positives: Incorrectly predicted as seizure

  • n_false_negatives: Missed seizure windows

  • n_true_negatives: Correctly predicted as interictal

  • meets_sensitivity_target: Whether sensitivity ≥ 90%

  • meets_fp_target: Whether FP/h ≤ 0.2

  • meets_tiw_target: Whether time in warning ≤ 20%

Return type:

Dict[str, float]

Notes

  • False positives are calculated only during interictal periods

  • True positives/false negatives are calculated only during seizure periods

  • Clinical targets based on FDA guidance for seizure prediction devices

  • For event-based sensitivity, use calc_seizure_event_prediction_metrics instead

Example

>>> # 1 seizure spanning 20 windows, detect 5 windows
>>> # Window-based sensitivity: 5/20 = 25%
>>> # This measures temporal coverage of the seizure
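The worked example above can be reproduced with a small sketch of the window-based quantities. Several details are assumptions rather than documented behaviour: seizure versus interictal windows are taken from y_true instead of the metadata ‘seizure_type’ column, sensitivity and time in warning are expressed in percent, and time in warning is the share of all windows predicted as ‘seizure’. `window_metrics_sketch` is a hypothetical name.

```python
import numpy as np


def window_metrics_sketch(y_true, y_pred, window_duration_min=1.0):
    """Hypothetical window-based metrics; labels are 'seizure' / 'interictal_control'."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    is_sz = y_true == "seizure"   # assumption: seizure windows read from y_true
    alarm = y_pred == "seizure"
    tp = int(np.sum(is_sz & alarm))
    fn = int(np.sum(is_sz & ~alarm))
    fp = int(np.sum(~is_sz & alarm))  # false positives only counted in interictal windows
    tn = int(np.sum(~is_sz & ~alarm))
    interictal_hours = (fp + tn) * window_duration_min / 60.0
    sensitivity = 100.0 * tp / (tp + fn) if tp + fn else float("nan")
    fp_per_hour = fp / interictal_hours if interictal_hours else float("nan")
    time_in_warning = float(100.0 * np.mean(alarm)) if alarm.size else float("nan")
    return {
        "seizure_sensitivity": sensitivity,
        "fp_per_hour": fp_per_hour,
        "time_in_warning": time_in_warning,
        "meets_sensitivity_target": bool(sensitivity >= 90),
        "meets_fp_target": bool(fp_per_hour <= 0.2),
        "meets_tiw_target": bool(time_in_warning <= 20),
    }


# 1 seizure spanning 20 windows with 5 detected, plus 100 correctly quiet interictal windows.
y_true = np.array(["seizure"] * 20 + ["interictal_control"] * 100)
y_pred = np.array(["seizure"] * 5 + ["interictal_control"] * 115)
metrics = window_metrics_sketch(y_true, y_pred)
print(metrics)
```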

References

FDA guidance on seizure prediction devices

scitex_ml.metrics.calc_seizure_event_prediction_metrics(y_true, y_pred, metadata, window_duration_min=1.0)

Calculate clinical seizure prediction metrics (event-based).

This function calculates event-based sensitivity, meaning it measures whether each seizure EVENT was detected (at least one alarm raised), regardless of how many windows within that event were predicted.

This is clinically more relevant because one timely alarm per seizure event is sufficient for intervention; it matches the clinical requirement: “Did the system raise an alarm for this seizure?”

Parameters:
  • y_true (np.ndarray) – True labels (string: ‘seizure’ or ‘interictal_control’)

  • y_pred (np.ndarray) – Predicted labels (string: ‘seizure’ or ‘interictal_control’)

  • metadata (pd.DataFrame) – Metadata with ‘seizure_type’ and ‘seizure_id’ columns. ‘seizure_id’ is a unique identifier for each seizure event (e.g., ‘sz_001’, ‘sz_002’) and should be NaN or empty for interictal periods.

  • window_duration_min (float, optional) – Duration of each time window in minutes (default: 1.0)

Returns:

Dictionary containing:

  • seizure_sensitivity: % of seizure events detected (event-based)

  • fp_per_hour: False positives per hour during interictal periods

  • time_in_warning: % of total time in alarm state

  • n_seizure_events: Number of unique seizure events

  • n_detected_events: Number of events with at least one alarm

  • n_missed_events: Number of events with zero alarms

  • n_interictal_windows: Number of interictal windows

  • n_false_positives: Incorrectly predicted as seizure

  • n_true_negatives: Correctly predicted as interictal

  • meets_sensitivity_target: Whether sensitivity ≥ 90%

  • meets_fp_target: Whether FP/h ≤ 0.2

  • meets_tiw_target: Whether time in warning ≤ 20%

Return type:

Dict[str, float]

Notes

  • Requires ‘seizure_id’ column in metadata to group windows by event

  • False positives are calculated only during interictal periods

  • Event detection requires at least one window predicted as seizure

  • Clinical targets based on FDA guidance for seizure prediction devices

  • For window-based sensitivity, use calc_seizure_window_prediction_metrics instead

Example

>>> # 1 seizure spanning 20 windows, detect just 1 window
>>> # Event-based sensitivity: 1/1 = 100% (event was detected!)
>>> # This measures "did we catch the seizure at all?"
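Event-based sensitivity groups windows by ‘seizure_id’ and counts an event as detected if any of its windows is predicted as ‘seizure’. The sketch below shows only that grouping step. It takes y_pred and the seizure_id column as plain sequences (an assumption, instead of the full metadata DataFrame), treats None, NaN or an empty string as interictal, and reports sensitivity in percent. `event_metrics_sketch` is a hypothetical name.

```python
import math


def _is_interictal(sid):
    """A missing seizure_id (None, NaN or empty string) marks an interictal window."""
    return sid is None or sid == "" or (isinstance(sid, float) and math.isnan(sid))


def event_metrics_sketch(y_pred, seizure_id):
    """Hypothetical event-based sensitivity: one alarm anywhere in an event detects it."""
    detected = {}
    for pred, sid in zip(y_pred, seizure_id):
        if _is_interictal(sid):
            continue  # not part of any seizure event
        detected[sid] = detected.get(sid, False) or pred == "seizure"
    n_events = len(detected)
    n_detected = sum(detected.values())
    sensitivity = 100.0 * n_detected / n_events if n_events else float("nan")
    return {"seizure_sensitivity": sensitivity, "n_seizure_events": n_events,
            "n_detected_events": n_detected, "n_missed_events": n_events - n_detected}


# sz_001: 20 windows, only the first flagged (detected); sz_002: 10 windows, none flagged.
y_pred = ["seizure"] + ["interictal_control"] * 34
seizure_id = ["sz_001"] * 20 + ["sz_002"] * 10 + [float("nan")] * 5
events = event_metrics_sketch(y_pred, seizure_id)
print(events)
```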

References

FDA guidance on seizure prediction devices

scitex_ml.metrics.calc_seizure_prediction_metrics(y_true, y_pred, metadata, window_duration_min=1.0)

Calculate clinical seizure prediction metrics (window-based).

This function calculates window-based sensitivity, meaning it measures the percentage of seizure time windows that were correctly identified. This is NOT event-based sensitivity, which would measure the percentage of seizure events detected regardless of how many windows within each event are flagged.

Parameters:
  • y_true (np.ndarray) – True labels (string: ‘seizure’ or ‘interictal_control’)

  • y_pred (np.ndarray) – Predicted labels (string: ‘seizure’ or ‘interictal_control’)

  • metadata (pd.DataFrame) – Metadata with ‘seizure_type’ column indicating seizure/interictal periods

  • window_duration_min (float, optional) – Duration of each time window in minutes (default: 1.0)

Returns:

Dictionary containing:

  • seizure_sensitivity: % of seizure time windows detected (NOT event-based)

  • fp_per_hour: False positives per hour during interictal periods

  • time_in_warning: % of total time in alarm state

  • n_seizure_windows: Number of seizure windows

  • n_interictal_windows: Number of interictal windows

  • n_true_positives: Correctly predicted seizure windows

  • n_false_positives: Incorrectly predicted as seizure

  • n_false_negatives: Missed seizure windows

  • n_true_negatives: Correctly predicted as interictal

  • meets_sensitivity_target: Whether sensitivity ≥ 90%

  • meets_fp_target: Whether FP/h ≤ 0.2

  • meets_tiw_target: Whether time in warning ≤ 20%

Return type:

Dict[str, float]

Notes

  • False positives are calculated only during interictal periods

  • True positives/false negatives are calculated only during seizure periods

  • Clinical targets based on FDA guidance for seizure prediction devices

  • For event-based sensitivity, use calc_seizure_event_prediction_metrics instead

Example

>>> # 1 seizure spanning 20 windows, detect 5 windows
>>> # Window-based sensitivity: 5/20 = 25%
>>> # This measures temporal coverage of the seizure

References

FDA guidance on seizure prediction devices

scitex_ml.metrics.calc_silhouette_score_slow(X, labels, metric='euclidean', sample_size=None, random_state=None, **kwds)

Compute the mean Silhouette Coefficient of all samples.

This implementation is computationally more expensive than the reference implementation.

The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the mean distance between a sample and the samples of the nearest cluster that the sample is not a part of.

This function returns the mean Silhouette Coefficient over all samples. To obtain the values for each sample, use calc_silhouette_samples_slow.

The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster, because a different cluster is more similar.

Parameters:
  • X (array [n_samples_a, n_features]) – Feature array, or a pairwise distance array of shape [n_samples_a, n_samples_a] when metric is “precomputed”.

  • labels (array, shape = [n_samples]) – label values for each sample

  • metric (string, or callable) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by metrics.pairwise._pairwise_distances. If X is the distance array itself, use “precomputed” as the metric.

  • sample_size (int or None) – The size of the sample to use when computing the Silhouette Coefficient. If sample_size is None, no sampling is used.

  • random_state (integer or numpy.RandomState, optional) – The generator used to select a random subset of samples when sample_size is not None. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.

  • **kwds (optional keyword parameters) – Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.

Returns:

silhouette – Mean Silhouette Coefficient for all samples.

Return type:

float
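The definition above translates directly into a naive O(n²) computation, which is the kind of cost the “slow” variant refers to. The sketch below computes the mean coefficient with Euclidean distance only and supports optional subsampling. It is not the scitex_ml code: `silhouette_score_sketch` is a hypothetical name, it uses NumPy's `default_rng` rather than a `numpy.RandomState`, it gives singleton clusters a coefficient of 0 (an assumed convention), and it needs at least two clusters.

```python
import numpy as np


def silhouette_score_sketch(X, labels, sample_size=None, random_state=None):
    """Hypothetical naive mean silhouette (Euclidean), with optional subsampling."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    if sample_size is not None:
        # random_state only matters here: it picks which samples are scored
        rng = np.random.default_rng(random_state)
        idx = rng.choice(len(X), size=sample_size, replace=False)
        X, labels = X[idx], labels[idx]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # pairwise distances
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1  # other members of the sample's own cluster
        if n_same == 0:
            continue  # singleton cluster: coefficient left at 0
        a = D[i, same].sum() / n_same  # mean intra-cluster distance (self-distance is 0)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        if max(a, b) > 0:
            scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


# Two tight, well-separated 1-D clusters give a score close to 1.
print(silhouette_score_sketch([[0], [1], [10], [11]], [0, 0, 1, 1]))
```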

References

Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics 20: 53-65. doi:10.1016/0377-0427(87)90125-7.

http://en.wikipedia.org/wiki/Silhouette_(clustering)

scitex_ml.metrics.calc_silhouette_samples_slow(X, labels, metric='euclidean', **kwds)

Compute the Silhouette Coefficient for each sample.

The Silhouette Coefficient is a measure of how well samples are clustered with samples that are similar to themselves. Clustering models with a high Silhouette Coefficient are said to be dense, where samples in the same cluster are similar to each other, and well separated, where samples in different clusters are not very similar to each other.

The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b).

This function returns the Silhouette Coefficient for each sample.

The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters.

Parameters:
  • X (array [n_samples_a, n_features]) – Feature array, or a pairwise distance array of shape [n_samples_a, n_samples_a] when metric is “precomputed”.

  • labels (array, shape = [n_samples]) – label values for each sample

  • metric (string, or callable) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by metrics.pairwise._pairwise_distances. If X is the distance array itself, use “precomputed” as the metric.

  • **kwds (optional keyword parameters) – Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.

Returns:

silhouette – Silhouette Coefficient for each sample.

Return type:

array, shape = [n_samples]
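Per-sample coefficients can also be computed without an explicit loop by multiplying the distance matrix with a one-hot cluster-membership matrix, which yields each sample's mean distance to every cluster at once. The sketch below uses that vectorised approach with Euclidean distance; it still needs O(n²) memory and at least two clusters. `silhouette_samples_sketch` is a hypothetical name, and assigning 0 to singleton clusters is an assumed convention. The usage deliberately places one point in the wrong cluster to show a negative coefficient.

```python
import numpy as np


def silhouette_samples_sketch(X, labels):
    """Hypothetical vectorised per-sample silhouette (Euclidean); needs >= 2 clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # (n, n) distances
    clusters = np.unique(labels)
    member = (labels[:, None] == clusters[None, :]).astype(float)  # (n, k) one-hot
    sizes = member.sum(axis=0)
    mean_dist = (D @ member) / sizes  # mean distance from each sample to each cluster
    rows = np.arange(len(X))
    own = member.argmax(axis=1)       # column index of each sample's own cluster
    own_size = sizes[own]
    with np.errstate(divide="ignore", invalid="ignore"):
        # a: rescale so the zero self-distance is not counted in the mean
        a = mean_dist[rows, own] * own_size / (own_size - 1)
        mean_dist[rows, own] = np.inf  # exclude own cluster when searching for b
        b = mean_dist.min(axis=1)
        s = (b - a) / np.maximum(a, b)
    return np.where(own_size > 1, s, 0.0)  # singleton clusters get 0


# The point at 9 is labelled with cluster 0 but lies next to cluster 1.
s = silhouette_samples_sketch([[0], [1], [9], [10], [11]], [0, 0, 0, 1, 1])
print(s)
```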

References

Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics 20: 53-65. doi:10.1016/0377-0427(87)90125-7.

http://en.wikipedia.org/wiki/Silhouette_(clustering)

scitex_ml.metrics.calc_silhouette_score_block(X, labels, metric='euclidean', sample_size=None, random_state=None, n_jobs=1, **kwds)

Compute the mean Silhouette Coefficient of all samples.

The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the mean distance between a sample and the samples of the nearest cluster that the sample is not a part of.

This function returns the mean Silhouette Coefficient over all samples. To obtain the values for each sample, use calc_silhouette_samples_block.

The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster, because a different cluster is more similar.

Parameters:
  • X (array [n_samples_a, n_features]) – Feature array, or a pairwise distance array of shape [n_samples_a, n_samples_a] when metric is “precomputed”.

  • labels (array, shape = [n_samples]) – label values for each sample

  • metric (string, or callable) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by metrics.pairwise._pairwise_distances. If X is the distance array itself, use “precomputed” as the metric.

  • sample_size (int or None) – The size of the sample to use when computing the Silhouette Coefficient. If sample_size is None, no sampling is used.

  • random_state (integer or numpy.RandomState, optional) – The generator used to select a random subset of samples when sample_size is not None. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.

  • **kwds (optional keyword parameters) – Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.

Returns:

silhouette – Mean Silhouette Coefficient for all samples.

Return type:

float

References

Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics 20: 53-65. doi:10.1016/0377-0427(87)90125-7.

http://en.wikipedia.org/wiki/Silhouette_(clustering)

scitex_ml.metrics.calc_silhouette_samples_block(X, labels, metric='euclidean', n_jobs=1, **kwds)

Compute the Silhouette Coefficient for each sample.

The Silhouette Coefficient is a measure of how well samples are clustered with samples that are similar to themselves. Clustering models with a high Silhouette Coefficient are said to be dense, where samples in the same cluster are similar to each other, and well separated, where samples in different clusters are not very similar to each other.

The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b).

This function returns the Silhouette Coefficient for each sample.

The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters.

Parameters:
  • X (array [n_samples_a, n_features]) – Feature array, or a pairwise distance array of shape [n_samples_a, n_samples_a] when metric is “precomputed”.

  • labels (array, shape = [n_samples]) – label values for each sample

  • metric (string, or callable) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by metrics.pairwise._pairwise_distances. If X is the distance array itself, use “precomputed” as the metric.

  • **kwds (optional keyword parameters) – Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.

Returns:

silhouette – Silhouette Coefficient for each sample.

Return type:

array, shape = [n_samples]

References

Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics 20: 53-65. doi:10.1016/0377-0427(87)90125-7.

http://en.wikipedia.org/wiki/Silhouette_(clustering)

scitex_ml.metrics.calc_feature_importance(model, feature_names=None, top_n=None)

Calculate feature importance from a trained model.

Parameters:
  • model (object) – Trained model with feature importance attributes. Supports:

      ◦ Tree-based: feature_importances_ (RandomForest, XGBoost, etc.)

      ◦ Linear: coef_ (LogisticRegression, LinearSVC, etc.)

  • feature_names (List[str], optional) – Names of features. If None, uses feature_0, feature_1, …

  • top_n (int, optional) – Return only top N most important features

Return type:

Tuple[Dict[str, float], ndarray]

Returns:

  • importance_dict (Dict[str, float]) – Dictionary mapping feature names to importance scores

  • importance_array (np.ndarray) – Array of importance scores (same order as feature_names)

Raises:

ValueError – If model doesn’t support feature importance extraction

Examples

>>> from sklearn.ensemble import RandomForestClassifier
>>> import numpy as np
>>> from scitex_ml.metrics import calc_feature_importance
>>> X = np.random.rand(100, 5)
>>> y = np.random.randint(0, 2, 100)
>>> model = RandomForestClassifier().fit(X, y)
>>> importance_dict, importance_array = calc_feature_importance(
...     model, feature_names=['f1', 'f2', 'f3', 'f4', 'f5']
... )

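To make the two supported model families concrete, the following sketch reads `feature_importances_` when present and otherwise falls back to the absolute values of `coef_`. It is an illustration under stated assumptions, not the scitex_ml implementation: `feature_importance_sketch` is a hypothetical name, averaging |coef_| across classes for multi-row coefficient matrices is an assumed choice, and `top_n` only trims the dictionary while the array keeps all features in their original order.

```python
from types import SimpleNamespace

import numpy as np


def feature_importance_sketch(model, feature_names=None, top_n=None):
    """Hypothetical: read feature_importances_ (trees) or |coef_| (linear models)."""
    if hasattr(model, "feature_importances_"):
        scores = np.asarray(model.feature_importances_, dtype=float)
    elif hasattr(model, "coef_"):
        coef = np.atleast_2d(np.asarray(model.coef_, dtype=float))
        scores = np.abs(coef).mean(axis=0)  # assumption: average |coef| across classes
    else:
        raise ValueError("model exposes neither feature_importances_ nor coef_")
    if feature_names is None:
        feature_names = [f"feature_{i}" for i in range(len(scores))]
    order = np.argsort(scores)[::-1]  # indices, most important first
    if top_n is not None:
        order = order[:top_n]
    importance_dict = {feature_names[i]: float(scores[i]) for i in order}
    return importance_dict, scores


# A stand-in "linear model" with a single row of coefficients.
stub = SimpleNamespace(coef_=np.array([[0.5, -2.0, 0.1]]))
importance_dict, importance_array = feature_importance_sketch(stub, ["a", "b", "c"], top_n=2)
print(importance_dict)  # {'b': 2.0, 'a': 0.5}
```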
scitex_ml.metrics.calc_permutation_importance(model, X, y, feature_names=None, n_repeats=10, random_state=None, scoring=None)

Calculate permutation feature importance.

Can be more reliable than a model’s built-in importance for some models, but is slower because each feature is permuted and the model re-scored n_repeats times.

Parameters:
  • model (object) – Trained model

  • X (np.ndarray) – Feature matrix

  • y (np.ndarray) – Target vector

  • feature_names (List[str], optional) – Names of features

  • n_repeats (int, default 10) – Number of times to permute each feature

  • random_state (int, optional) – Random seed for reproducibility

  • scoring (str, optional) – Scoring metric (default uses model’s score method)

Return type:

Tuple[Dict[str, float], Dict[str, float]]

Returns:

  • importance_mean (Dict[str, float]) – Mean importance for each feature

  • importance_std (Dict[str, float]) – Standard deviation of importance for each feature
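Permutation importance measures how much a score drops when one feature column is shuffled, which breaks that feature's link to the target while keeping its distribution. The sketch below shows the idea with NumPy only. It differs from the documented signature by taking a hypothetical `score_fn(X, y)` callable (higher is better) instead of a model plus `scoring` string. `permutation_importance_sketch` is a hypothetical name, and it uses NumPy's `default_rng` for `random_state`.

```python
import numpy as np


def permutation_importance_sketch(score_fn, X, y, feature_names=None,
                                  n_repeats=10, random_state=None):
    """Hypothetical permutation importance: mean/std score drop per shuffled column."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(random_state)
    baseline = score_fn(X, y)
    if feature_names is None:
        feature_names = [f"feature_{j}" for j in range(X.shape[1])]
    importance_mean, importance_std = {}, {}
    for j, name in enumerate(feature_names):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature/target link
            drops.append(baseline - score_fn(X_perm, y))
        importance_mean[name] = float(np.mean(drops))
        importance_std[name] = float(np.std(drops))
    return importance_mean, importance_std


def score_fn(X, y):
    """Toy 'model': predicts True when feature 0 exceeds 0.5; returns accuracy."""
    return float(np.mean((X[:, 0] > 0.5) == y))


rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = X[:, 0] > 0.5
mean, std = permutation_importance_sketch(score_fn, X, y, ["signal", "noise"], random_state=0)
print(mean, std)
```

Because the toy model never looks at the second column, shuffling it leaves the score unchanged and its importance is exactly zero.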
