Subkey

Description

Variance

Percentage of times that features with a variance < 0.01 are excluded. Based on `sklearn's VarianceThreshold <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold.html/>`_.

GroupwiseSearch

Randomly select which feature groups to use. The parameters are determined by the SelectFeatGroup config part, see below.

SelectFromModel

Percentage of times features are selected by first training a machine learning model which can rank the features with an ``importance``. See also sklearn's SelectFromModel.

SelectFromModel_estimator

Machine learning model / estimator used: can be LASSO, LogisticRegression, or a random forest.

SelectFromModel_lasso_alpha

When using LASSO, search space of the weight of the L1 term, see also `sklearn <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html/>`_.

SelectFromModel_n_trees

When using a random forest, search space of the number of trees used.

UsePCA

Percentage of times Principal Component Analysis (PCA) is used to select features.

PCAType

Method to select the number of components when using PCA: either the number of components that explains 95% of the variance (``95variance``), or a fixed number of components.

StatisticalTestUse

Percentage of times a statistical test is used to select features.

StatisticalTestMetric

Define the type of statistical test to be used.

StatisticalTestThreshold

Specify the range of the p-value threshold used in the statistical test to select features. The first element defines the lower boundary, the second the upper boundary. Random sampling will occur between the boundaries.

ReliefUse

Percentage of times Relief is used to select features.

ReliefNN

Minimum and maximum of the search range for the number of nearest neighbors in Relief.

ReliefSampleSize

Minimum and maximum of the search range for the sample size in Relief.

ReliefDistanceP

Minimum and maximum of the search range for the positive distance in Relief.

ReliefNumFeatures

Minimum and maximum of the search range for the number of features selected in Relief.
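
The settings in this table fall into two kinds: "percentage of times" switches and lower/upper search ranges. The following is a minimal sketch of how one random draw from such a search space could look. It assumes that percentages are written as fractions between 0 and 1 and that ranges are sampled uniformly; the actual sampling distributions may differ. The values, the ``SEARCH_SPACE`` dictionary and the ``sample_configuration`` helper are hypothetical and only serve as illustration.

```python
import random

# Hypothetical values for illustration only; not defaults from the table above.
SEARCH_SPACE = {
    "UsePCA": 0.25,                            # assumed fraction (0 to 1) of draws that use PCA
    "StatisticalTestUse": 0.25,                # assumed fraction of draws that use a statistical test
    "StatisticalTestThreshold": (0.01, 0.10),  # (lower, upper) boundary of the p-value threshold
    "ReliefUse": 0.25,                         # assumed fraction of draws that use Relief
    "ReliefNN": (2, 6),                        # (min, max) number of nearest neighbors
}


def sample_configuration(space, rng):
    """Draw one feature-selection configuration from the search space."""
    config = {}
    for key, value in space.items():
        if isinstance(value, tuple):
            low, high = value
            if isinstance(low, int) and isinstance(high, int):
                # Integer range: pick a whole number, both ends included.
                config[key] = rng.randint(low, high)
            else:
                # Continuous range: sample uniformly between the boundaries.
                config[key] = rng.uniform(low, high)
        else:
            # "Percentage of times" setting: switch the step on with that probability.
            config[key] = rng.random() < value
    return config


rng = random.Random(0)  # fixed seed so the draws are reproducible
configs = [sample_configuration(SEARCH_SPACE, rng) for _ in range(1000)]
pca_fraction = sum(c["UsePCA"] for c in configs) / len(configs)
print(f"PCA used in {pca_fraction:.0%} of sampled configurations")
```

Over many draws, the fraction of configurations with a step switched on approaches the configured value, while every range setting stays within its boundaries.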