models subpackage
Overview
The models subpackage is the backbone for managing and using machine learning models throughout the package. It supports the development, evaluation, and optimization of models and is designed to integrate cleanly with the med3pa and Detectron methodologies.
This subpackage uses several design patterns, namely Factory, Singleton, Prototype, and Strategy, to keep model handling robust, modular, and scalable.
Through this structured approach, the subpackage provides abstract base classes that give all models a uniform interface, concrete implementations of specific algorithms, and utilities for model evaluation and data handling.
This subpackage is composed of the following modules:
factories.py: Utilizes the factory design pattern to facilitate flexible and scalable model instantiation, enhancing the modularity of model creation.
abstract_models.py: Defines the abstract base classes for all model types, including general models, classification models, and regression models. These classes provide a common interface for model operations.
concrete_classifiers.py: Contains concrete implementations of classification models, like the XGBoostModel.
concrete_regressors.py: Provides implementations of regression models, namely RandomForestRegressorModel and DecisionTreeRegressorModel.
abstract_metrics.py: Provides the abstract base classes for all evaluation metrics, centralizing the logic for metric calculations across different model types.
classification_metrics.py: Implements a variety of evaluation metrics specifically for classification tasks, such as accuracy, precision, and recall.
regression_metrics.py: Hosts evaluation metrics for regression tasks, including mean squared error and R2 score, crucial for assessing model performance.
data_strategies.py: Offers various strategies for preparing data, ensuring compatibility and optimal formatting for model training and evaluation.
base.py: Provides a singleton manager for the base model that handles its instantiation and cloning across methods, so that every method works from a consistent reference model.
The subpackage includes the following modules and classes:
factories module
This module uses the Factory design pattern to abstract the creation of machine learning models. It defines a general factory class and specialized factories for specific model types, such as XGBoost. This setup allows models to be instantiated dynamically from provided specifications or configurations. By decoupling model creation from usage, it lets new model types be supported without changing the code that consumes them.
- class MED3pa.models.factories.ModelFactory
Bases:
object
A factory class for creating models of different types using the Factory design pattern. It supports creating models from hyperparameters or loading them from pickled files.
- static create_model_from_pickled(pickled_file_path: str) → Model
Creates a model by loading it from a pickled file.
- Parameters:
pickled_file_path (str) – The file path to the pickled model file.
- Returns:
A model instance loaded from the pickled file.
- Return type:
Model
- Raises:
IOError – If there is an error loading the model from the file.
TypeError – If the loaded model is not of a supported type.
- static create_model_with_hyperparams(model_type: str, hyperparams: dict) → Model
Creates a model of the specified type with the given hyperparameters.
- Parameters:
model_type (str) – The type of model to create.
hyperparams (dict) – A dictionary of hyperparameters for the model.
- Returns:
A model instance of the specified type, initialized with the given hyperparameters.
- Return type:
Model
- factories = {'XGBoostModel': <function ModelFactory.<lambda>>}
- static get_factory(model_type: str) → ModelFactory
Retrieves the factory object for the given model type.
- Parameters:
model_type (str) – The type of model for which the factory is to be retrieved.
- Returns:
An instance of the factory associated with the given model type.
- Return type:
ModelFactory
- Raises:
ValueError – If no factory is available for the given model type.
- static get_supported_models() → list
Retrieves a list of all supported model types.
- Returns:
A list containing the keys from model_mapping which represent the supported model types.
- Return type:
list
- model_mapping = {'XGBoostModel': [<class 'xgboost.core.Booster'>, <class 'xgboost.sklearn.XGBClassifier'>]}
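The lookup-and-delegate flow behind get_factory and create_model_with_hyperparams can be sketched with a small registry of factory builders, mirroring the shape of the factories attribute above. All class names here (ToyModelFactory, ToyXGBoostFactory, ToyXGBoostModel) are hypothetical stand-ins; the real XGBoostModel wraps an actual xgboost model.

```python
from typing import Any, Callable, Dict


class ToyXGBoostModel:
    """Hypothetical stand-in for XGBoostModel; it only stores its hyperparameters."""

    def __init__(self, params: Dict[str, Any]) -> None:
        self.params = params


class ToyXGBoostFactory:
    """Hypothetical concrete factory for the 'XGBoostModel' type."""

    def create_model_with_hyperparams(self, hyperparams: Dict[str, Any]) -> ToyXGBoostModel:
        return ToyXGBoostModel(params=hyperparams)


class ToyModelFactory:
    # Registry mapping a model-type name to a callable that builds its factory,
    # mirroring the shape of the documented `factories` attribute.
    factories: Dict[str, Callable[[], Any]] = {
        "XGBoostModel": lambda: ToyXGBoostFactory(),
    }

    @staticmethod
    def get_factory(model_type: str) -> Any:
        factory_builder = ToyModelFactory.factories.get(model_type)
        if factory_builder is None:
            # Same failure mode as the documented get_factory.
            raise ValueError(f"No factory available for model type: {model_type}")
        return factory_builder()

    @staticmethod
    def create_model_with_hyperparams(model_type: str, hyperparams: Dict[str, Any]) -> Any:
        # Delegate the actual construction to the type-specific factory.
        factory = ToyModelFactory.get_factory(model_type)
        return factory.create_model_with_hyperparams(hyperparams)


model = ToyModelFactory.create_model_with_hyperparams("XGBoostModel", {"max_depth": 3})
print(type(model).__name__, model.params)  # prints: ToyXGBoostModel {'max_depth': 3}
```

Adding support for a new model type in this design only requires a new registry entry and its factory class; callers keep using the same two static methods.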
- class MED3pa.models.factories.XGBoostFactory
Bases:
ModelFactory
A factory for creating XGBoost model objects, either from hyperparameters or by loading from pickled files. Inherits from ModelFactory and specifies creation methods for XGBoost models.
- check_version(loaded_model: xgboost.core.Booster | xgboost.sklearn.XGBClassifier) → bool
Checks the version of the loaded XGBoost model to ensure it is supported.
- Parameters:
loaded_model (xgb.Booster | xgb.XGBClassifier) – The loaded model object.
- Returns:
True if the model version is supported, False otherwise.
- Return type:
bool
- create_model_from_pickled(loaded_model: xgboost.core.Booster | xgboost.sklearn.XGBClassifier) → XGBoostModel
Recreates an XGBoostModel from a loaded pickled model.
- Parameters:
loaded_model (xgb.Booster | xgb.XGBClassifier) – The loaded model object, expected to be an instance of xgb.Booster or xgb.XGBClassifier.
- Returns:
An instance of XGBoostModel created from the loaded model.
- Return type:
XGBoostModel
- Raises:
TypeError – If the loaded model is not a supported implementation of the XGBoost model.
ValueError – If the XGBoost model version is not supported.
- create_model_with_hyperparams(hyperparams: dict) → XGBoostModel
Creates an XGBoostModel with the given hyperparameters.
- Parameters:
hyperparams (dict) – A dictionary of hyperparameters for the XGBoost model.
- Returns:
An instance of XGBoostModel initialized with the given hyperparameters.
- Return type:
XGBoostModel
- extract_params(loaded_model: xgboost.core.Booster | xgboost.sklearn.XGBClassifier) → dict
Extracts the parameters from a loaded XGBoost model.
- Parameters:
loaded_model (xgb.Booster | xgb.XGBClassifier) – The loaded model object.
- Returns:
A dictionary of extracted parameters.
- Return type:
dict
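Loading from a pickled file combines the two factory layers: the generic factory unpickles the object, identifies its model type through model_mapping, and hands it to the matching type-specific factory. A minimal sketch of the first two steps follows; FakeBooster, MODEL_MAPPING, and the helper names are hypothetical, and the XGBoost version check is omitted because the supported versions are not listed here.

```python
import pickle
from typing import Any, Dict, List, Tuple


class FakeBooster:
    """Hypothetical stand-in for xgboost.core.Booster."""


# Hypothetical counterpart of the documented model_mapping attribute.
MODEL_MAPPING: Dict[str, List[type]] = {"XGBoostModel": [FakeBooster]}


def resolve_model_type(loaded_model: Any) -> str:
    """Return the model-type key whose supported classes include the loaded object."""
    for model_type, supported_classes in MODEL_MAPPING.items():
        if isinstance(loaded_model, tuple(supported_classes)):
            return model_type
    # Mirrors the documented TypeError for unsupported model types.
    raise TypeError(f"Unsupported model type: {type(loaded_model).__name__}")


def load_pickled_model(pickled_file_path: str) -> Tuple[str, Any]:
    """Unpickle a model and identify its type, raising the documented error types."""
    try:
        with open(pickled_file_path, "rb") as file:
            loaded_model = pickle.load(file)
    except (OSError, pickle.UnpicklingError, EOFError) as exc:
        # IOError is an alias of OSError in Python 3.
        raise IOError(f"Could not load a model from {pickled_file_path}") from exc
    # A full implementation would now pass loaded_model to the factory
    # returned by get_factory(model_type).
    return resolve_model_type(loaded_model), loaded_model
```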
abstract_models module
The abstract_models.py module defines the core abstract classes on which model management in the system is built.
It includes Model, which standardizes basic operations such as evaluation and parameter validation across all models.
It also introduces the specialized abstract classes ClassificationModel and RegressionModel, each of which adapts these operations to the needs of classification and regression tasks.
- class MED3pa.models.abstract_models.ClassificationModel
Bases:
Model
Abstract base class for classification models, extending the generic Model class with additional classification-specific methods.
- balance_train_weights(y_train: ndarray) → ndarray
Balances the training weights based on the class distribution in the training data.
- Parameters:
y_train (np.ndarray) – Labels for training.
- Returns:
Balanced training weights.
- Return type:
np.ndarray
- Raises:
AssertionError – If balancing is attempted on non-binary classification data.
- abstract predict(X: ndarray, return_proba: bool = False, threshold: float = 0.5) → ndarray
Makes predictions for the given input observations.
- Parameters:
X (np.ndarray) – Observations for prediction.
return_proba (bool, optional) – Whether to return probabilities instead of class labels. Defaults to False.
threshold (float, optional) – Threshold for converting probabilities to class labels. Defaults to 0.5.
- Returns:
The predicted labels or probabilities.
- Return type:
np.ndarray
- Raises:
NotImplementedError – Must be implemented by subclasses.
- abstract train(x_train: ndarray, y_train: ndarray, x_validation: ndarray, y_validation: ndarray, training_parameters: Optional[Dict[str, Any]], balance_train_classes: bool) → None
Trains the classification model using provided training and validation data.
- Parameters:
x_train (np.ndarray) – Observations for training.
y_train (np.ndarray) – Labels for training.
x_validation (np.ndarray) – Observations for validation.
y_validation (np.ndarray) – Labels for validation.
training_parameters (Dict[str, Any], optional) – Additional training parameters.
balance_train_classes (bool) – Whether to balance the training classes.
- Raises:
NotImplementedError – Must be implemented by subclasses.
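balance_train_weights and the threshold argument of predict can be illustrated in a few lines. This sketch assumes inverse-frequency weighting (each of the two classes receives the same total weight) and that probabilities greater than or equal to the threshold map to class 1; the page does not state the exact formula or comparison, and the function names here are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def balance_train_weights(y_train: Sequence[int]) -> List[float]:
    """Hypothetical sketch: inverse-frequency sample weights for binary labels."""
    counts = Counter(y_train)
    # Mirror the documented AssertionError for non-binary data.
    assert len(counts) == 2, "Balancing is only supported for binary classification."
    n_samples = len(y_train)
    # Assumed formula: each class ends up with a total weight of n_samples / 2.
    return [n_samples / (2 * counts[label]) for label in y_train]


def to_class_labels(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Hypothetical helper: 0/1 labels from positive-class probabilities (>= is assumed)."""
    return [1 if p >= threshold else 0 for p in probabilities]


weights = balance_train_weights([0, 0, 0, 1])
print(weights)  # each class-0 sample gets 4/6, the single class-1 sample gets 2.0
print(to_class_labels([0.2, 0.5, 0.9]))  # [0, 1, 1]
```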
- class MED3pa.models.abstract_models.Model
Bases:
ABC
An abstract base class for all models, defining a common API for model operations such as evaluation and parameter validation.
- model
The underlying model instance.
- Type:
Any
- model_class
The class type of the underlying model instance.
- Type:
type
- params
The parameters used to initialize the model.
- Type:
dict
- data_preparation_strategy
Strategy for preparing data before training or evaluation.
- Type:
DataPreparingStrategy
- pickled_model
A boolean indicating whether or not the model has been loaded from a pickled file.
- Type:
bool
- abstract evaluate(X: ndarray, y: ndarray, eval_metrics: List[str], print_results: bool = False) → Dict[str, float]
Evaluates the model using specified metrics.
- Parameters:
X (np.ndarray) – observations for evaluation.
y (np.ndarray) – True labels for evaluation.
eval_metrics (List[str]) – Metrics to use for evaluation.
print_results (bool, optional) – Whether to print the evaluation results. Defaults to False.
- Returns:
A dictionary with metric names and their evaluated scores.
- Return type:
Dict[str, float]
- get_data_strategy() → Optional[str]
Retrieves the data preparation strategy associated with the model. This strategy handles how data should be formatted before being passed to the model for training or evaluation.
- Returns:
The name of the current data preparation strategy if set, None otherwise.
- Return type:
Optional[str]
- get_info() → Dict[str, Any]
Retrieves detailed information about the model.
- Returns:
A dictionary containing information about the model’s type, parameters, data preparation strategy, and whether it’s a pickled model.
- Return type:
Dict[str, Any]
- get_model() → Any
Retrieves the underlying model instance, which is typically a machine learning model object.
- Returns:
The underlying model instance if set, None otherwise.
- Return type:
Any
- get_model_type() → Optional[str]
Retrieves the class type of the underlying model instance, which indicates the specific implementation of the model used.
- Returns:
The class of the model if set, None otherwise.
- Return type:
Optional[str]
- get_params()
Retrieves the underlying model’s parameters.
- Returns:
The model’s parameters.
- Return type:
Dict[str, Any]
- is_pickled()
Returns whether or not the model has been loaded from a pickled file.
- Returns:
True if the model was loaded from a pickled file, False otherwise.
- Return type:
bool
- print_evaluation_results(results: Dict[str, float]) → None
Prints the evaluation results in a formatted manner.
- Parameters:
results (Dict[str, float]) – A dictionary with metric names and their evaluated scores.
- save(path: str) → None
Saves the model instance as a pickled file and the parameters as a JSON file within the specified directory.
- Parameters:
path (str) – The directory path where the model and parameters will be saved.
- set_data_strategy(strategy: DataPreparingStrategy)
Sets the underlying model’s data preparation strategy.
- Parameters:
strategy (DataPreparingStrategy) – Strategy used to prepare the data for training, validation, and other steps.
- set_model(model: Any) → None
Sets the underlying model instance and updates the model class to match the type of the given model.
- Parameters:
model (Any) – The model instance to be set.
- set_params(params: dict)
Sets the parameters for the model. These parameters are typically used for model initialization or configuration.
- Parameters:
params (Dict[str, Any]) – A dictionary of parameters for the model.
- update_params(params: dict)
Updates the current model parameters by merging new parameter values from the given dictionary. This method allows for dynamic adjustment of model configuration during runtime.
- Parameters:
params (Dict[str, Any]) – A dictionary containing parameter names and values to be updated.
- validate_params(params: Dict[str, Any], valid_param_sets: List[set]) → Dict[str, Any]
Validates the model parameters against a list of valid parameter sets.
- Parameters:
params (Dict[str, Any]) – Parameters to validate.
valid_param_sets (List[set]) – A list of sets containing valid parameter names.
- Returns:
Validated parameters.
- Return type:
Dict[str, Any]
- Raises:
ValueError – If any invalid parameters are found.
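Several Model methods are plain bookkeeping around the wrapped estimator. The sketch below shows how set_model, update_params, and validate_params could be written on top of Python's abc module; SketchModel and ConstantModel are hypothetical, and treating a parameter as valid when it appears in any of the valid_param_sets is an assumption about the documented behavior.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class SketchModel(ABC):
    """Hypothetical minimal base class mirroring a few documented Model methods."""

    def __init__(self) -> None:
        self.model: Any = None
        self.model_class: Optional[type] = None
        self.params: Dict[str, Any] = {}
        self.pickled_model: bool = False

    def set_model(self, model: Any) -> None:
        # Keep model_class in sync with the wrapped object.
        self.model = model
        self.model_class = type(model)

    def update_params(self, params: Dict[str, Any]) -> None:
        # Merge new values into the existing parameters; existing keys are overwritten.
        self.params.update(params)

    def validate_params(self, params: Dict[str, Any], valid_param_sets: List[set]) -> Dict[str, Any]:
        # Assumption: a name is valid if it appears in at least one of the sets.
        valid_names = set().union(*valid_param_sets)
        invalid = sorted(set(params) - valid_names)
        if invalid:
            raise ValueError(f"Invalid parameters: {invalid}")
        return params

    @abstractmethod
    def evaluate(self, X: Any, y: Any, eval_metrics: List[str],
                 print_results: bool = False) -> Dict[str, float]:
        """Subclasses compute the requested metrics."""


class ConstantModel(SketchModel):
    """Hypothetical toy subclass used only to exercise the base class."""

    def evaluate(self, X, y, eval_metrics, print_results=False):
        return {name: 0.0 for name in eval_metrics}


model = ConstantModel()
model.set_model([1, 2, 3])  # any object stands in for a real estimator here
model.update_params({"max_depth": 3})
model.update_params({"max_depth": 5, "eta": 0.1})
print(model.model_class.__name__, model.params)
```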
- class MED3pa.models.abstract_models.RegressionModel
Bases:
Model
Abstract base class for regression models, providing a framework for training and prediction in regression tasks.
- abstract predict(X: ndarray) → ndarray
Makes predictions for the given input observations.
- Parameters:
X (np.ndarray) – Observations for prediction.
- Returns:
The predicted values.
- Return type:
np.ndarray
- Raises:
NotImplementedError – Must be implemented by subclasses.
- abstract train(x_train: ndarray, y_train: ndarray, x_validation: ndarray, y_validation: ndarray, training_parameters: Optional[Dict[str, Any]]) → None
Trains the regression model using provided training and validation data.
- Parameters:
x_train (np.ndarray) – Observations for training.
y_train (np.ndarray) – Labels for training.
x_validation (np.ndarray) – Observations for validation.
y_validation (np.ndarray) – Labels for validation.
training_parameters (Dict[str, Any], optional) – Additional training parameters.
- Raises:
NotImplementedError – Must be implemented by subclasses.
concrete_classifiers module
This module provides concrete implementations of specific classification models, such as XGBoost.
It adapts the abstract interfaces defined in abstract_models.py to provide fully functional models ready for training and prediction.
- class MED3pa.models.concrete_classifiers.XGBoostModel(params: Optional[Dict[str, Any]] = None, model: Optional[Union[Booster, XGBClassifier]] = None)
Bases:
ClassificationModel
A concrete implementation of the ClassificationModel class for XGBoost models. This class provides functionalities to train, predict, and evaluate models built with the XGBoost library.
- evaluate(X: ndarray, y: ndarray, eval_metrics: Union[str, List[str]], print_results: bool = False) → Dict[str, float]
Evaluates the model using specified metrics.
- Parameters:
X (np.ndarray) – Features for evaluation.
y (np.ndarray) – True labels for evaluation.
eval_metrics (Union[str, List[str]]) – Metric name, or list of metric names, to use for evaluation.
print_results (bool, optional) – Whether to print the evaluation results.
- Returns:
A dictionary with metric names and their evaluated scores.
- Return type:
Dict[str, float]
- Raises:
ValueError – If the model has not been trained before evaluation.
- predict(X: ndarray, return_proba: bool = False, threshold: float = 0.5) → ndarray
Makes predictions using the model for the given input.
- Parameters:
X (np.ndarray) – Features for prediction.
return_proba (bool, optional) – Whether to return probabilities. Defaults to False.
threshold (float, optional) – Threshold for converting probabilities to class labels. Defaults to 0.5.
- Returns:
Predictions made by the model.
- Return type:
np.ndarray
- Raises:
ValueError – If the model has not been initialized.
NotImplementedError – If prediction is not implemented for the model class.
- train(x_train: ndarray, y_train: ndarray, x_validation: ndarray, y_validation: ndarray, training_parameters: Optional[Dict[str, Any]], balance_train_classes: bool) → None
Trains the model on the provided dataset.
- Parameters:
x_train (np.ndarray) – Features for training.
y_train (np.ndarray) – Labels for training.
x_validation (np.ndarray) – Features for validation.
y_validation (np.ndarray) – Labels for validation.
training_parameters (Optional[Dict[str, Any]]) – Additional training parameters.
balance_train_classes (bool) – Whether to balance the training classes.
- Raises:
ValueError – If parameters for xgb.Booster are not initialized before training.
NotImplementedError – If the model_class is not supported for training.
- train_to_disagree(x_train: ndarray, y_train: ndarray, x_validation: ndarray, y_validation: ndarray, x_test: ndarray, y_test: ndarray, training_parameters: Optional[Dict[str, Any]], balance_train_classes: bool, N: int) → None
Trains the model to disagree with another model using a specified dataset.
This method is intended for scenarios where the model is trained to produce outputs that intentionally diverge from those of another model, for use in the Detectron method.
- Parameters:
x_train (np.ndarray) – Features for training.
y_train (np.ndarray) – Labels for training.
x_validation (np.ndarray) – Features for validation.
y_validation (np.ndarray) – Labels for validation.
x_test (np.ndarray) – Features for testing or disagreement evaluation.
y_test (np.ndarray) – Labels for testing or disagreement evaluation.
training_parameters (Optional[Dict[str, Any]]) – Additional parameters for training the model.
balance_train_classes (bool) – Whether to balance the class distribution in the training data.
N (int) – The number of examples in the testing set that should be used for calculating disagreement.
- Raises:
ValueError – If the necessary parameters for training are not properly initialized.
NotImplementedError – If the model class does not support this type of training.
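train_to_disagree supports Detectron's idea of a model that fits the training data while disagreeing with the base model on test data. A minimal sketch of how such a training set could be assembled, assuming binary 0/1 labels, that y_test holds the labels the base model assigned to the test rows, and that each flipped test row is down-weighted by 1 / (N + 1) (our reading of the Detectron approach, not confirmed by this page). The helper name is hypothetical; its output would then go to an ordinary weighted training call.

```python
from typing import List, Sequence, Tuple


def build_disagreement_set(
    x_train: Sequence[Sequence[float]],
    y_train: Sequence[int],
    x_test: Sequence[Sequence[float]],
    y_test: Sequence[int],
    N: int,
) -> Tuple[List[Sequence[float]], List[int], List[float]]:
    """Hypothetical helper: training rows plus N test rows with flipped 0/1 labels."""
    x_disagree = list(x_test[:N])
    # Flip the binary labels so the model is pushed to disagree on these rows.
    y_disagree = [1 - label for label in y_test[:N]]
    x_combined = list(x_train) + x_disagree
    y_combined = list(y_train) + y_disagree
    # Assumed weighting: training rows 1.0, each flipped test row 1 / (N + 1).
    weights = [1.0] * len(x_train) + [1.0 / (N + 1)] * len(x_disagree)
    return x_combined, y_combined, weights


x, y, w = build_disagreement_set([[0.0], [1.0]], [0, 1], [[2.0], [3.0], [4.0]], [1, 0, 1], N=2)
print(y, w)
```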
concrete_regressors module
Similar to concrete_classifiers.py, this module contains implementations of regression models, namely RandomForestRegressorModel and DecisionTreeRegressorModel.
These ready-to-use models comply with the abstract definitions, which makes them easy to integrate and use in med3pa and Detectron.
- class MED3pa.models.concrete_regressors.DecisionTreeRegressorModel(params: Dict[str, Any])
Bases:
RegressionModel
A concrete implementation of the RegressionModel class for decision tree regression models.
- evaluate(X: ndarray, y: ndarray, eval_metrics: List[str], print_results: bool = False) → Dict[str, float]
Evaluates the model using specified metrics.
- Parameters:
X (np.ndarray) – Observations for evaluation.
y (np.ndarray) – True labels for evaluation.
eval_metrics (List[str]) – Metrics to use for evaluation.
print_results (bool, optional) – Whether to print the evaluation results.
- Returns:
A dictionary with metric names and their evaluated scores.
- Return type:
Dict[str, float]
- Raises:
ValueError – If the model has not been trained before evaluation.
- predict(X: ndarray) → ndarray
Makes predictions with the model for the given input.
- Parameters:
X (np.ndarray) – Observations for prediction.
- Returns:
Predictions made by the model.
- Return type:
np.ndarray
- Raises:
ValueError – If the DecisionTreeRegressorModel has not been initialized before prediction.
- train(x_train: ndarray, y_train: ndarray, x_validation: ndarray = None, y_validation: ndarray = None, training_parameters: Optional[Dict[str, Any]] = None) → None
Trains the model on the provided dataset.
- Parameters:
x_train (np.ndarray) – Observations for training.
y_train (np.ndarray) – Labels for training.
x_validation (np.ndarray, optional) – Observations for validation.
y_validation (np.ndarray, optional) – Labels for validation.
training_parameters (dict, optional) – Additional training parameters.
- Raises:
ValueError – If the DecisionTreeRegressorModel has not been initialized before training.
- class MED3pa.models.concrete_regressors.RandomForestRegressorModel(params: Dict[str, Any])
Bases:
RegressionModel
A concrete implementation of the RegressionModel class for random forest regression models.
- evaluate(X: ndarray, y: ndarray, eval_metrics: List[str], print_results: bool = False) → Dict[str, float]
Evaluates the model using specified metrics.
- Parameters:
X (np.ndarray) – Observations for evaluation.
y (np.ndarray) – True labels for evaluation.
eval_metrics (List[str]) – Metrics to use for evaluation.
print_results (bool, optional) – Whether to print the evaluation results.
- Returns:
A dictionary with metric names and their evaluated scores.
- Return type:
Dict[str, float]
- Raises:
ValueError – If the model has not been trained before evaluation.
- predict(X: ndarray) → ndarray
Makes predictions with the model for the given input.
- Parameters:
X (np.ndarray) – Observations for prediction.
- Returns:
Predictions made by the model.
- Return type:
np.ndarray
- Raises:
ValueError – If the RandomForestRegressorModel has not been initialized before prediction.
- train(x_train: ndarray, y_train: ndarray, x_validation: ndarray = None, y_validation: ndarray = None, training_parameters: Optional[Dict[str, Any]] = None) → None
Trains the model on the provided dataset.
- Parameters:
x_train (np.ndarray) – Observations for training.
y_train (np.ndarray) – Labels for training.
x_validation (np.ndarray, optional) – Observations for validation.
y_validation (np.ndarray, optional) – Labels for validation.
training_parameters (dict, optional) – Additional training parameters.
- Raises:
ValueError – If the RandomForestRegressorModel has not been initialized before training.
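Both regressors share one contract: validation data is optional, training_parameters is optional, and using the model before it is ready raises ValueError. The sketch below illustrates that contract with a deliberately trivial stand-in that predicts the training mean instead of fitting a tree or forest; the class name, the "MAE" metric key, and merging training_parameters into params are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


def _mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


class MeanRegressorSketch:
    """Hypothetical stand-in regressor that always predicts the training mean."""

    # Hypothetical metric registry keyed by name.
    METRICS: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {"MAE": _mae}

    def __init__(self, params: Dict[str, Any]) -> None:
        self.params = dict(params)
        self.mean_: Optional[float] = None  # set by train()

    def train(self, x_train, y_train, x_validation=None, y_validation=None,
              training_parameters: Optional[Dict[str, Any]] = None) -> None:
        # Validation data is optional, as in the documented signature; unused here.
        if training_parameters:
            self.params.update(training_parameters)
        self.mean_ = sum(y_train) / len(y_train)

    def predict(self, X) -> List[float]:
        if self.mean_ is None:
            raise ValueError("The model must be trained before prediction.")
        return [self.mean_ for _ in X]

    def evaluate(self, X, y, eval_metrics: List[str], print_results: bool = False) -> Dict[str, float]:
        predictions = self.predict(X)  # raises ValueError if untrained
        results = {name: self.METRICS[name](y, predictions) for name in eval_metrics}
        if print_results:
            for name, score in results.items():
                print(f"{name}: {score:.4f}")
        return results


model = MeanRegressorSketch(params={})
model.train([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
print(model.evaluate([[0.0], [1.0]], [1.0, 3.0], ["MAE"]))  # {'MAE': 1.0}
```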
abstract_metrics module
The abstract_metrics.py module defines the EvaluationMetric abstract base class, which provides a standard interface for calculating metric values during model evaluation.
- class MED3pa.models.abstract_metrics.EvaluationMetric
Bases:
ABC
Abstract base class for all evaluation metrics. This class provides a standardized interface for calculating metric values across different types of tasks, ensuring consistency and reusability.
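Both metric classes documented below expose their metrics through a get_metric class method. The sketch shows one way such an interface could look with Python's abc module. This page does not list EvaluationMetric's abstract methods, so making get_metric the abstract hook is an assumption, as are the names MetricFamily and ToyClassificationMetrics, the "Accuracy" key, and returning None for unknown names.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional, Sequence


class MetricFamily(ABC):
    """Hypothetical counterpart of EvaluationMetric: subclasses expose metrics by name."""

    @classmethod
    @abstractmethod
    def get_metric(cls, metric_name: str = "") -> Optional[Callable]:
        """Return the metric function registered under metric_name."""


class ToyClassificationMetrics(MetricFamily):
    @staticmethod
    def accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                 sample_weight: Optional[Sequence[float]] = None) -> Optional[float]:
        if len(y_true) == 0:
            return None  # assumption: undefined on empty input
        weights = sample_weight if sample_weight is not None else [1.0] * len(y_true)
        correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
        return correct / sum(weights)

    @classmethod
    def get_metric(cls, metric_name: str = "") -> Optional[Callable]:
        # Hypothetical registry; unknown names return None instead of raising.
        registry = {"Accuracy": cls.accuracy}
        return registry.get(metric_name)


metric = ToyClassificationMetrics.get_metric("Accuracy")
print(metric([1, 0, 1], [1, 1, 1]))
```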
classification_metrics module
The classification_metrics.py module defines the ClassificationEvaluationMetrics class, which contains various classification metrics for assessing a model’s performance.
- class MED3pa.models.classification_metrics.ClassificationEvaluationMetrics
Bases:
EvaluationMetric
A class to compute various classification evaluation metrics.
- static accuracy(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → Optional[float]
Calculate the accuracy score.
- Parameters:
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted labels.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Accuracy score.
- Return type:
float
- static average_precision(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → Optional[float]
Calculate the average precision score.
- Parameters:
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted probabilities.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Average precision score.
- Return type:
float
- static balanced_accuracy(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → Optional[float]
Calculate the balanced accuracy score.
- Parameters:
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted labels.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Balanced accuracy score.
- Return type:
float
- static f1_score(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → Optional[float]
Calculate the F1 score.
- Parameters:
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted labels.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
F1 score.
- Return type:
float
- classmethod get_metric(metric_name: str = '')
Get the metric function based on the metric name.
- Parameters:
metric_name (str) – The name of the metric.
- Returns:
The function corresponding to the metric.
- Return type:
function
- static log_loss(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → Optional[float]
Calculate the log loss score.
- Parameters:
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted probabilities.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Log loss score.
- Return type:
float
- static matthews_corrcoef(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → Optional[float]
Calculate the Matthews correlation coefficient.
- Parameters:
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted labels.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Matthews correlation coefficient.
- Return type:
float
- static npv(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → Optional[float]
Calculate the negative predictive value (NPV).
- Parameters:
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted labels.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Negative predictive value.
- Return type:
float
- static ppv(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → Optional[float]
Calculate the positive predictive value (PPV).
- Parameters:
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted labels.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Positive predictive value.
- Return type:
float
- static precision(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → Optional[float]
Calculate the precision score.
- Parameters:
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted labels.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Precision score.
- Return type:
float
- static recall(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → Optional[float]
Calculate the recall score.
- Parameters:
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted labels.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Recall score.
- Return type:
float
- static roc_auc(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → Optional[float]
Calculate the ROC AUC score.
- Parameters:
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted probabilities.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
ROC AUC score.
- Return type:
float
- static sensitivity(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → Optional[float]
Calculate the sensitivity (recall for the positive class).
- Parameters:
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted labels.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Sensitivity score.
- Return type:
float
- static specificity(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → Optional[float]
Calculate the specificity (recall for the negative class).
- Parameters:
y_true (np.ndarray) – True labels.
y_pred (np.ndarray) – Predicted labels.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Specificity score.
- Return type:
float
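PPV, NPV, sensitivity, and specificity all derive from the four cells of the binary confusion matrix (PPV equals precision and sensitivity equals recall). A standalone sketch of how they relate, assuming 0/1 labels and optional per-sample weights, and returning None when a denominator is zero, which is one plausible reading of the Optional[float] return types rather than confirmed behavior. The function names mirror the documented ones, but the code is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     sample_weight: Optional[Sequence[float]] = None) -> Tuple[float, float, float, float]:
    """Weighted (tp, fp, tn, fn) for 0/1 labels."""
    weights = sample_weight if sample_weight is not None else [1.0] * len(y_true)
    tp = fp = tn = fn = 0.0
    for t, p, w in zip(y_true, y_pred, weights):
        if p == 1 and t == 1:
            tp += w
        elif p == 1:  # predicted positive, actually negative
            fp += w
        elif t == 0:  # predicted negative, actually negative
            tn += w
        else:  # predicted negative, actually positive
            fn += w
    return tp, fp, tn, fn


def _ratio(numerator: float, denominator: float) -> Optional[float]:
    # Assumption: None signals an undefined metric (zero denominator).
    return numerator / denominator if denominator > 0 else None


def ppv(y_true, y_pred, sample_weight=None) -> Optional[float]:
    tp, fp, _, _ = confusion_counts(y_true, y_pred, sample_weight)
    return _ratio(tp, tp + fp)


def npv(y_true, y_pred, sample_weight=None) -> Optional[float]:
    _, _, tn, fn = confusion_counts(y_true, y_pred, sample_weight)
    return _ratio(tn, tn + fn)


def sensitivity(y_true, y_pred, sample_weight=None) -> Optional[float]:
    tp, _, _, fn = confusion_counts(y_true, y_pred, sample_weight)
    return _ratio(tp, tp + fn)


def specificity(y_true, y_pred, sample_weight=None) -> Optional[float]:
    _, fp, tn, _ = confusion_counts(y_true, y_pred, sample_weight)
    return _ratio(tn, tn + fp)


y_true, y_pred = [1, 1, 0, 0, 1], [1, 0, 0, 1, 1]
print(ppv(y_true, y_pred), npv(y_true, y_pred), sensitivity(y_true, y_pred), specificity(y_true, y_pred))
```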
regression_metrics module
The regression_metrics.py module defines the RegressionEvaluationMetrics class, which contains various regression metrics for assessing a model’s performance.
- class MED3pa.models.regression_metrics.RegressionEvaluationMetrics
Bases:
EvaluationMetric
A class to compute various regression evaluation metrics.
- classmethod get_metric(metric_name: str)
Get the metric function based on the metric name.
- Parameters:
metric_name (str) – The name of the metric.
- Returns:
The function corresponding to the metric.
- Return type:
function
- static mean_absolute_error(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → float
Calculate the Mean Absolute Error (MAE).
- Parameters:
y_true (np.ndarray) – True values.
y_pred (np.ndarray) – Predicted values.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Mean Absolute Error.
- Return type:
float
- static mean_squared_error(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → float
Calculate the Mean Squared Error (MSE).
- Parameters:
y_true (np.ndarray) – True values.
y_pred (np.ndarray) – Predicted values.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Mean Squared Error.
- Return type:
float
- static r2_score(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → float
Calculate the R-squared (R2) score.
- Parameters:
y_true (np.ndarray) – True values.
y_pred (np.ndarray) – Predicted values.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
R-squared score.
- Return type:
float
- static root_mean_squared_error(y_true: ndarray, y_pred: ndarray, sample_weight: ndarray = None) → float
Calculate the Root Mean Squared Error (RMSE).
- Parameters:
y_true (np.ndarray) – True values.
y_pred (np.ndarray) – Predicted values.
sample_weight (np.ndarray, optional) – Sample weights.
- Returns:
Root Mean Squared Error.
- Return type:
float
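The four regression metrics follow standard weighted definitions. A standalone sketch, assuming per-sample weights enter each average as a weighted mean and that R2 uses the weighted mean of y_true as its baseline (the convention scikit-learn follows); the functions mirror the documented names but are not the package's code, and returning NaN for a constant target is an assumption.

```python
import math
from typing import List, Optional, Sequence


def _weights(n: int, sample_weight: Optional[Sequence[float]]) -> List[float]:
    # Uniform weights when none are given.
    return list(sample_weight) if sample_weight is not None else [1.0] * n


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float],
                        sample_weight: Optional[Sequence[float]] = None) -> float:
    w = _weights(len(y_true), sample_weight)
    return sum(wi * abs(t - p) for t, p, wi in zip(y_true, y_pred, w)) / sum(w)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float],
                       sample_weight: Optional[Sequence[float]] = None) -> float:
    w = _weights(len(y_true), sample_weight)
    return sum(wi * (t - p) ** 2 for t, p, wi in zip(y_true, y_pred, w)) / sum(w)


def root_mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float],
                            sample_weight: Optional[Sequence[float]] = None) -> float:
    return math.sqrt(mean_squared_error(y_true, y_pred, sample_weight))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float],
             sample_weight: Optional[Sequence[float]] = None) -> float:
    w = _weights(len(y_true), sample_weight)
    mean_true = sum(wi * t for t, wi in zip(y_true, w)) / sum(w)
    ss_res = sum(wi * (t - p) ** 2 for t, p, wi in zip(y_true, y_pred, w))
    ss_tot = sum(wi * (t - mean_true) ** 2 for t, wi in zip(y_true, w))
    # R2 is undefined for a constant target; NaN is returned in that case (assumption).
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan


y_true, y_pred = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
print(mean_absolute_error(y_true, y_pred), r2_score(y_true, y_pred))
```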
data_strategies module
This module handles data preparation using the Strategy design pattern: it offers multiple strategies that transform raw data into the format each model type expects for training and evaluation.
- class MED3pa.models.data_strategies.DataPreparingStrategy
Bases:
object
Abstract base class for data preparation strategies.
- static execute(observations, labels=None, weights=None)
Prepares data for model training or prediction.
- Parameters:
observations (array-like) – Observations array.
labels (array-like, optional) – Labels array.
weights (array-like, optional) – Weights array.
- Returns:
Prepared data in the required format for the model.
- Return type:
object
- Raises:
NotImplementedError – If the method is not implemented by a subclass.
- class MED3pa.models.data_strategies.ToDataframesStrategy
Bases:
DataPreparingStrategy
Converts input data to pandas DataFrames, suitable for models requiring DataFrame inputs.
- static execute(column_labels: list, observations: ndarray, labels: ndarray = None, weights: ndarray = None) → tuple
Converts observations, labels, and weights into pandas DataFrames with specified column labels.
- Parameters:
column_labels (list) – Column labels for the observations DataFrame.
observations (np.ndarray) – Observations array.
labels (np.ndarray, optional) – Labels array.
weights (np.ndarray, optional) – Weights array.
- Returns:
DataFrames for observations, labels, and weights. Returns None for labels and weights DataFrames if not provided.
- Return type:
tuple
- Raises:
ValueError – If the observations array is empty.
- class MED3pa.models.data_strategies.ToDmatrixStrategy
Bases:
DataPreparingStrategy
Concrete implementation for converting data into DMatrix format suitable for XGBoost models.
- static execute(observations, labels=None, weights=None) → DMatrix
Converts observations, labels, and weights into an XGBoost DMatrix.
- Parameters:
observations (array-like) – Observations data.
labels (array-like, optional) – Labels data.
weights (array-like, optional) – Weights data.
- Returns:
A DMatrix object ready for use with XGBoost.
- Return type:
xgb.DMatrix
- Raises:
ValueError – If any input data types are not supported.
- static is_supported_data(observations, labels=None, weights=None) → bool
Checks if the data types of observations, labels, and weights are supported for conversion to DMatrix.
- Parameters:
observations (array-like) – observations data.
labels (array-like, optional) – Labels data.
weights (array-like, optional) – Weights data.
- Returns:
True if all data types are supported, False otherwise.
- Return type:
bool
- class MED3pa.models.data_strategies.ToNumpyStrategy
Bases:
DataPreparingStrategy
Converts input data to NumPy arrays, ensuring compatibility with models expecting NumPy inputs.
- static execute(observations, labels=None, weights=None) → tuple
Converts observations, labels, and weights into NumPy arrays.
- Parameters:
observations (array-like) – Observations data.
labels (array-like, optional) – Labels data.
weights (array-like, optional) – Weights data.
- Returns:
A tuple of NumPy arrays for observations, labels, and weights. Returns None for labels and weights if they are not provided.
- Return type:
tuple
- Raises:
ValueError – If the observations or labels are empty arrays.
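In this Strategy setup, a model holds a strategy object and calls its execute method without knowing which conversion it performs. The sketch below uses a hypothetical ToListsStrategy (plain Python lists) instead of the real NumPy, DataFrame, or DMatrix strategies so that it runs with the standard library alone; all class names are illustrative.

```python
from typing import Any, List, Optional, Sequence, Tuple


class DataPreparingStrategySketch:
    """Hypothetical base strategy mirroring DataPreparingStrategy."""

    @staticmethod
    def execute(observations, labels=None, weights=None):
        raise NotImplementedError("Subclasses must implement execute().")


class ToListsStrategy(DataPreparingStrategySketch):
    """Hypothetical strategy that converts inputs to plain Python lists."""

    @staticmethod
    def execute(observations: Sequence[Sequence[float]],
                labels: Optional[Sequence[Any]] = None,
                weights: Optional[Sequence[float]] = None,
                ) -> Tuple[List[List[float]], Optional[List[Any]], Optional[List[float]]]:
        if len(observations) == 0:
            raise ValueError("The observations array is empty.")
        rows = [list(row) for row in observations]
        # Like ToNumpyStrategy, missing labels or weights come back as None.
        label_list = list(labels) if labels is not None else None
        weight_list = list(weights) if weights is not None else None
        return rows, label_list, weight_list


class ToyModel:
    """Hypothetical model that only knows its strategy through execute()."""

    def __init__(self, strategy: DataPreparingStrategySketch) -> None:
        self.data_preparation_strategy = strategy

    def prepare(self, observations, labels=None, weights=None):
        return self.data_preparation_strategy.execute(observations, labels, weights)


print(ToyModel(ToListsStrategy()).prepare(((1.0, 2.0), (3.0, 4.0)), labels=(0, 1)))
```

Swapping in a different conversion only means passing another strategy object to the model; the model code itself does not change.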
base module
This module introduces a singleton manager that handles the instantiation and cloning of a base model, which is particularly useful for methods such as med3pa and Detectron that require a consistent reference model.
It employs the Singleton and Prototype design patterns so that the base model is instantiated once and can be cloned without reinitialization.
- class MED3pa.models.base.BaseModelManager
Bases:
object
Singleton manager class for the base model. It ensures that the base model is set only once.
- classmethod clone_base_model() → Model
Creates and returns a deep clone of the base model, following the Prototype pattern.
This method uses serialization and deserialization to clone complex model attributes, allowing for independent modification of the cloned model.
- Returns:
A cloned instance of the base model.
- Raises:
TypeError – If the base model has not been initialized yet.
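The Singleton and Prototype roles of BaseModelManager can be sketched with class-level state and a serialize/deserialize round trip, matching the description of clone_base_model. BaseModelManagerSketch and set_base_model are hypothetical names (this page only documents clone_base_model), and raising RuntimeError on a second set_base_model call is an assumed way of enforcing "set only once".

```python
import pickle
from typing import Any, Optional


class BaseModelManagerSketch:
    """Hypothetical singleton-style holder for a single base model."""

    _base_model: Optional[Any] = None  # shared class-level state

    @classmethod
    def set_base_model(cls, model: Any) -> None:
        if cls._base_model is not None:
            # Assumption: a second assignment is rejected.
            raise RuntimeError("The base model has already been set.")
        cls._base_model = model

    @classmethod
    def clone_base_model(cls) -> Any:
        if cls._base_model is None:
            # Same error type as the documented clone_base_model.
            raise TypeError("The base model has not been initialized yet.")
        # Prototype: a pickle round trip yields an independent deep copy.
        return pickle.loads(pickle.dumps(cls._base_model))


BaseModelManagerSketch.set_base_model({"params": {"max_depth": 3}})
clone = BaseModelManagerSketch.clone_base_model()
clone["params"]["max_depth"] = 10  # modifying the clone leaves the base model intact
print(BaseModelManagerSketch.clone_base_model()["params"]["max_depth"])  # 3
```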