Knockoff Filter API Reference
class knockpy.knockoff_filter.KnockoffFilter(fstat='lasso', ksampler='gaussian', fstat_kwargs={}, knockoff_kwargs={})
Performs knockoff-based inference, from start to finish.
This wraps both the knockoffs.KnockoffSampler and knockoff_stats.FeatureStatistic classes.
- Parameters
- fstat : str or knockpy.knockoff_stats.FeatureStatistic
The feature statistic to use in the knockoff filter. This may also be a string identifier, including:
- ‘lasso’ or ‘lcd’: Cross-validated lasso coefficient differences
- ‘lsm’: Signed maximum of the lasso path statistic, as in Barber and Candes 2015
- ‘dlasso’: Cross-validated debiased lasso coefficients
- ‘ridge’: Cross-validated ridge coefficients
- ‘ols’: Ordinary least squares coefficients
- ‘margcorr’: Marginal correlations between features and response
- ‘deeppink’: The deepPINK statistic, as in Lu et al. 2018
- ‘randomforest’: A random forest with swap importances
- ksampler : str or knockpy.knockoffs.KnockoffSampler
The knockoff sampler to use in the knockoff filter. This may also be a string identifier, including:
- ‘gaussian’: Gaussian Model-X knockoffs
- ‘fx’: Fixed-X knockoffs
- ‘metro’: Generic metropolized knockoff sampler
- ‘artk’: t-tailed Markov chain
- ‘blockt’: Blocks of t-distributed
- ‘gibbs_grid’: Discrete Gibbs grid
An alternative to specifying the ksampler is to pass a knockoff matrix directly during the forward call.
- fstat_kwargs : dict
Kwargs to pass to the feature statistic fit function, excluding the required arguments. Defaults to {}.
- knockoff_kwargs : dict
Kwargs for instantiating the knockoff sampler if the ksampler argument is a string identifier. This can be the empty dict for some identifiers such as “gaussian” or “fx”, but additional keyword arguments are required for complex samplers such as the “metro” identifier. Defaults to {}.
Examples
Here we fit the KnockoffFilter on fake data from a Gaussian linear model:
    # Fake data-generating process for Gaussian linear model
    import knockpy as kpy
    dgprocess = kpy.dgp.DGP()
    dgprocess.sample_data(n=500, p=500, sparsity=0.1)

    # LCD statistic with Gaussian MX knockoffs
    # This uses LedoitWolf covariance estimation by default.
    from knockpy.knockoff_filter import KnockoffFilter
    kfilter = KnockoffFilter(
        fstat='lcd',
        ksampler='gaussian',
        knockoff_kwargs={"method":"mvr"},
    )
    rejections = kfilter.forward(X=dgprocess.X, y=dgprocess.y)
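In a simulation like the one above, the true non-null features are known, so the returned rejections can be scored. The following is a minimal sketch in plain Python, not part of knockpy: the helper name `fdp_and_power` and the toy inputs are hypothetical, and it assumes you have a boolean mask marking which features truly have nonzero coefficients.

```python
def fdp_and_power(rejections, is_nonnull):
    """Return (false discovery proportion, power) for one run.

    rejections : sequence of 0/1 flags, one per feature
    is_nonnull : sequence of booleans, True where the true coefficient is nonzero
    """
    n_rejected = sum(1 for r in rejections if r)
    n_true_rejected = sum(1 for r, t in zip(rejections, is_nonnull) if r and t)
    n_nonnull = sum(1 for t in is_nonnull if t)
    # max(1, .) avoids division by zero when nothing is rejected / nothing is non-null
    fdp = (n_rejected - n_true_rejected) / max(1, n_rejected)
    power = n_true_rejected / max(1, n_nonnull)
    return fdp, power


# Hypothetical outputs for p = 6 features (not produced by knockpy)
rejections = [1, 1, 0, 1, 0, 0]
is_nonnull = [True, True, True, False, False, False]
fdp, power = fdp_and_power(rejections, is_nonnull)
```

The false discovery rate that the filter aims to control is the expectation of this proportion over repeated runs, so a single run can exceed the target level.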
- Attributes
- fstat : knockpy.knockoff_stats.FeatureStatistic
The feature statistic to use for inference. This inherits from knockoff_stats.FeatureStatistic.
- ksampler : knockpy.knockoffs.KnockoffSampler
The knockoff sampler to use during inference. This eventually inherits from knockoffs.KnockoffSampler.
- fstat_kwargs : dict
Dictionary of kwargs to pass to the fit call of self.fstat.
- knockoff_kwargs : dict
If ksampler is not yet initialized, kwargs to pass to ksampler.
- Z : np.ndarray
A 2p-dimensional array of feature and knockoff importances. The first p coordinates correspond to features, the last p correspond to knockoffs.
- W : np.ndarray
An array of feature statistics. This is (p,)-dimensional for regular knockoffs and (num_groups,)-dimensional for group knockoffs.
- S : np.ndarray
The (p, p)-shaped knockoff S-matrix used to generate knockoffs.
- X : np.ndarray
The (n, p)-shaped design matrix.
- Xk : np.ndarray
The (n, p)-shaped matrix of knockoffs.
- groups : np.ndarray
For group knockoffs, a p-length array of integers from 1 to num_groups such that groups[j] == i indicates that variable j is a member of group i. Defaults to None (regular knockoffs).
- rejections : np.ndarray
A (p,)-shaped boolean array where rejections[j] == 1 iff the knockoff filter rejects the jth feature.
- G : np.ndarray
The (2p, 2p)-shaped feature-knockoff covariance matrix.
- threshold : float
The data-dependent knockoff threshold used to select variables.
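To make the relationship between Z and W concrete, here is a minimal sketch in plain Python. It assumes the usual coefficient-difference form W_j = |Z_j| - |Z_{j+p}| for regular (non-group) knockoffs; the function name `lcd_statistics` is hypothetical, and the library's own computation may differ in detail.

```python
def lcd_statistics(Z):
    """Combine 2p importances into p statistics: W[j] = |Z[j]| - |Z[j + p]|.

    Z : sequence of length 2p; the first p entries are feature importances,
        the last p are the importances of the matching knockoffs.
    """
    p = len(Z) // 2
    return [abs(Z[j]) - abs(Z[j + p]) for j in range(p)]


# Toy importances for p = 3 (illustrative values only)
Z = [2.0, -0.5, 0.0, 0.1, 1.5, 0.0]
W = lcd_statistics(Z)
```

A large positive W[j] means feature j looks more important than its knockoff, which is evidence against the null; a negative value means the knockoff won.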
Methods
forward(X, y[, Xk, mu, Sigma, groups, fdr, …]): Runs the knockoff filter; returns whether each feature was rejected.
make_selections(W, fdr): Calculates the data-dependent threshold and selections.
Samples knockoffs during forward.
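The data-dependent threshold behind make_selections can be sketched as follows. This is a plain-Python illustration, not the library's code: it assumes the "knockoff+" rule with an offset of 1, and the names `knockoff_threshold` and `make_selections_sketch` are hypothetical.

```python
import math


def knockoff_threshold(W, fdr=0.1, offset=1):
    """Smallest t among the nonzero |W_j| such that
    (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) <= fdr.
    Returns math.inf if no candidate qualifies."""
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        n_neg = sum(1 for w in W if w <= -t)
        n_pos = sum(1 for w in W if w >= t)
        if (offset + n_neg) / max(1, n_pos) <= fdr:
            return t
    return math.inf


def make_selections_sketch(W, fdr=0.1):
    """Return 0/1 flags: reject feature j when W[j] >= threshold."""
    T = knockoff_threshold(W, fdr)
    return [int(w >= T) for w in W]


# Toy statistics (illustrative values only)
W = [5, 4, 3, 2.5, 2, 1.5, 1.2, -1, 0.5, -0.3]
T = knockoff_threshold(W, fdr=0.2)
selected = make_selections_sketch(W, fdr=0.2)
```

With a stricter level such as fdr=0.1, this toy W has no valid threshold: at least 10 positive statistics with no negatives would be needed, so the threshold is infinite and nothing is selected.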
forward(X, y, Xk=None, mu=None, Sigma=None, groups=None, fdr=0.1, fstat_kwargs={}, knockoff_kwargs={}, shrinkage='ledoitwolf', recycle_up_to=None)
Runs the knockoff filter; returns whether each feature was rejected.
- Parameters
- X : np.ndarray
(n, p)-shaped design matrix.
- y : np.ndarray
(n,)-shaped response vector.
- Xk : np.ndarray
(n, p)-shaped knockoff matrix. If None, this will construct knockoffs using self.ksampler.
- mu : np.ndarray
(p,)-shaped mean of the features. If None, this defaults to the empirical mean of the features.
- Sigma : np.ndarray
(p, p)-shaped covariance matrix of the features. If None, this is estimated using the shrinkage option. This is ignored for fixed-X knockoffs.
- groups : np.ndarray
For group knockoffs, a p-length array of integers from 1 to num_groups such that groups[j] == i indicates that variable j is a member of group i. Defaults to None (regular knockoffs).
- fdr : float
The desired level of false discovery rate control.
- fstat_kwargs : dict
Extra kwargs to pass to the feature statistic fit function, excluding the required arguments.
- knockoff_kwargs : dict
Extra kwargs for instantiating the knockoff sampler if the ksampler argument is a string identifier. This can be the empty dict for some identifiers such as “gaussian” or “fx”, but additional keyword arguments are required for complex samplers such as the “metro” identifier. Defaults to {}.
- shrinkage : str
Shrinkage method if estimating the covariance matrix. Defaults to “LedoitWolf”. Other options are “MLE” and “glasso” (graphical lasso).
- recycle_up_to : int or float
Three options:
- If None, does nothing.
- If an integer > 1, uses the first “recycle_up_to” rows of X as the first “recycle_up_to” rows of knockoffs.
- If a float between 0 and 1 (inclusive), interpreted as the proportion of rows to recycle.
For more on recycling, see https://arxiv.org/abs/1602.03574
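The three recycle_up_to cases can be illustrated with a short plain-Python sketch. The helper names `rows_to_recycle` and `recycle_rows` are hypothetical, and treating a value of exactly 1 as a proportion (all rows) is an assumption based on the "inclusive" wording above rather than a confirmed detail of the library.

```python
def rows_to_recycle(recycle_up_to, n):
    """Translate recycle_up_to into a number of rows, given n total rows."""
    if recycle_up_to is None:
        return 0                       # no recycling
    if recycle_up_to <= 1:
        return int(recycle_up_to * n)  # proportion of rows (assumed to include 1)
    return int(recycle_up_to)          # explicit row count


def recycle_rows(X, Xk, recycle_up_to):
    """Return a knockoff matrix whose first k rows are copied from X."""
    k = rows_to_recycle(recycle_up_to, len(X))
    return [list(row) for row in X[:k]] + [list(row) for row in Xk[k:]]


# Toy 4 x 2 matrices as nested lists (illustrative values only)
X = [[1, 1], [2, 2], [3, 3], [4, 4]]
Xk = [[-1, -1], [-2, -2], [-3, -3], [-4, -4]]
Xk_recycled = recycle_rows(X, Xk, 0.5)
```

Here half of the rows of the knockoff matrix are replaced by the corresponding rows of X, while the remaining rows keep their sampled knockoff values.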