hidimstat.BlockBasedImportance
- class hidimstat.BlockBasedImportance(estimator='DNN', importance_estimator='sampling_RF', coffeine_transformer=None, do_hypertuning=True, dict_hypertuning=None, problem_type='regression', encoding_input=True, sampling_with_repetition=True, split_percentage=0.8, conditional=True, variables_categories=None, residuals_sampling=False, n_permutations=50, n_jobs=1, verbose=0, groups=None, group_stacking=False, sub_groups=None, k_fold=2, prop_out_subLayers=0, iteration_index=None, random_state=2023, do_compute_importance=True, group_fold=None)
This class implements Block-Based Importance (BBI), a framework for variable importance computation with statistical guarantees. It consists of two blocks of estimators: a learner block (predicting the outcome from the data) and an importance block (resampling the variable or group of interest to assess the impact on the loss). For single-level (variable-wise) importance see Chamma et al. [1]; for group-level importance see Chamma et al. [2].
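For orientation, here is a minimal usage sketch with the default learner block ("DNN") and importance block ("sampling_RF"), assuming the scikit-learn style fit/compute_importance interface listed under Methods below. The synthetic data and the exact structure of the returned importance results are illustrative assumptions, not part of this reference.

```python
# Minimal single-level sketch using the documented defaults:
# the "DNN" (MLP) learner block and the "sampling_RF" importance block.
import numpy as np
from hidimstat import BlockBasedImportance

rng = np.random.RandomState(2023)
X = rng.standard_normal((200, 10))
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(200)

bbi = BlockBasedImportance(
    estimator="DNN",                     # learner block (default MLP)
    importance_estimator="sampling_RF",  # importance block (default)
    problem_type="regression",
    n_permutations=50,
    k_fold=2,
    random_state=2023,
)

bbi.fit(X, y)
# Importance scores and statistical guarantees per variable/group;
# the structure of `results` is not specified here, so inspect it directly.
results = bbi.compute_importance(X, y)
print(results)
```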
- Parameters:
- estimator : str or sklearn.base.BaseEstimator, default="DNN"
The estimator used for the learner block. The default is a custom Multi-Layer Perceptron (MLP) learner.
- String options:
"DNN" for the Multi-Layer Perceptron
"RF" for the Random Forest
- Other options:
any sklearn.base.BaseEstimator
- importance_estimator : str or sklearn.base.BaseEstimator, default="sampling_RF"
The estimator used for the importance block. The default is the sampling Random Forest, where sampling is performed in the leaf of the corresponding instance, among its neighbors.
- String options:
"sampling_RF" for the sampling Random Forest
"residuals_RF" for the Random Forest combined with the residuals path for importance computation
- Other options:
any sklearn.base.BaseEstimator
- coffeine_transformer : tuple, default=None
Apply coffeine's pipeline for filterbank models on electrophysiological data. The tuple consists of (coffeine pipeline, new number of variables) or (coffeine pipeline, new number of variables, list of variables to keep after variable selection).
- do_hypertuning : bool, default=True
Whether to tune the hyperparameters of the provided estimator.
- dict_hypertuning : dict, default=None
The dictionary of hyperparameters to tune, depending on the provided estimator.
- problem_type : str, default='regression'
Whether the problem is a classification or a regression problem.
- encoding_input : bool, default=True
Whether to one-hot or ordinal encode the nominal and ordinal input variables.
- sampling_with_repetition : bool, default=True
Whether to sample with repetition the train part of the train/valid split within the training set. The number of samples drawn for train equals the number of instances in the training set.
- split_percentage : float, default=0.8
The training/validation split proportion for the provided data.
- conditional : bool, default=True
Whether to use the conditional sampling approach (True) or the permutation approach (False).
- variables_categories : dict, default=None
The dictionary of binary, nominal and ordinal variables.
- residuals_sampling : bool, default=False
Whether to use permutations or random sampling of the residuals with the conditional sampling approach.
- n_permutations : int, default=50
The number of permutations/random samplings applied to each column.
- n_jobs : int, default=1
The number of workers for parallel processing.
- verbose : int, default=0
If verbose > 0, the fitted iterations are printed.
- groups : dict, default=None
The knowledge-driven/data-driven grouping of the variables, if provided (see the grouping sketch after this parameter list).
- group_stacking : bool, default=False
Whether to apply the stacking-based method to the provided groups.
- sub_groups : dict, default=None
The indices of the provided variables to condition on, per variable/group of interest (by default, all the remaining variables).
- k_fold : int, default=2
The number of folds for k-fold cross-fitting.
- prop_out_subLayers : int, default=0
If group_stacking is True, the proportion of outputs for the linear sub-layers per group.
- iteration_index : int, default=None
The index of the iteration currently being processed.
- random_state : int, default=2023
Seed of the random number generator, fixed for reproducibility.
- do_compute_importance : bool, default=True
Whether to compute the importance scores.
- group_fold : list, default=None
The list of group labels used with GroupKFold to keep subjects within the same training or test set.
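As a hedged illustration of the group-level options above (groups, group_stacking, conditional), the sketch below configures a group-wise analysis. The exact mapping format expected by groups (group label to column indices) is an assumption based on the descriptions in this list.

```python
# Group-level configuration sketch. The format of the `groups` mapping
# (group label -> list of column indices) is an assumption.
from hidimstat import BlockBasedImportance

groups = {
    "block_A": [0, 1, 2],
    "block_B": [3, 4, 5, 6],
    "block_C": [7, 8, 9],
}

bbi_group = BlockBasedImportance(
    estimator="RF",            # Random Forest learner block
    problem_type="classification",
    conditional=True,          # conditional sampling rather than plain permutation
    n_permutations=100,
    groups=groups,
    group_stacking=True,       # stacking-based handling of the provided groups
    k_fold=2,
    random_state=2023,
)
```

Setting conditional=False instead falls back to the permutation approach for each variable/group, as described for the conditional parameter above.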
References
- __init__(estimator='DNN', importance_estimator='sampling_RF', coffeine_transformer=None, do_hypertuning=True, dict_hypertuning=None, problem_type='regression', encoding_input=True, sampling_with_repetition=True, split_percentage=0.8, conditional=True, variables_categories=None, residuals_sampling=False, n_permutations=50, n_jobs=1, verbose=0, groups=None, group_stacking=False, sub_groups=None, k_fold=2, prop_out_subLayers=0, iteration_index=None, random_state=2023, do_compute_importance=True, group_fold=None)
Methods
- __init__([estimator, importance_estimator, ...])
- compute_importance([X, y]): Compute the importance scores and the statistical guarantees per variable/group of interest.
- fit(X[, y]): Build the provided estimator with the training set (X, y).
- fit_transform(X[, y]): Fit to data, then transform it.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- predict([X]): Predict the regression target for the input samples X.
- predict_proba([X]): Predict the class probabilities for the input samples X.
- set_output(*[, transform]): Set output container.
- set_params(**params): Set the parameters of this estimator.
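To tie the methods together, here is a brief classification-oriented sketch assuming the interface documented above; the synthetic data, the shape of the returned probabilities, and the contents of the importance results are assumptions for illustration.

```python
# Classification workflow sketch: fit the learner block, inspect class
# probabilities via predict_proba, then compute importance scores.
import numpy as np
from hidimstat import BlockBasedImportance

rng = np.random.RandomState(0)
X = rng.standard_normal((150, 5))
y = (X[:, 0] + X[:, 2] > 0).astype(int)

clf = BlockBasedImportance(
    problem_type="classification",
    k_fold=2,
    random_state=2023,
)
clf.fit(X, y)

proba = clf.predict_proba(X)          # assumed shape: (n_samples, n_classes)
results = clf.compute_importance(X, y)  # importance scores per variable
```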