Knockoff Filter API Reference

class knockpy.knockoff_filter.KnockoffFilter(fstat='lasso', ksampler='gaussian', fstat_kwargs={}, knockoff_kwargs={})

Performs knockoff-based inference, from start to finish.

This wraps both the knockoffs.KnockoffSampler and knockoff_stats.FeatureStatistic classes.

Parameters
fstat : str or knockpy.knockoff_stats.FeatureStatistic

The feature statistic to use in the knockoff filter. This may also be a string identifier, including:

  • ‘lasso’ or ‘lcd’: Cross-validated lasso coefficient differences

  • ‘lsm’: Signed maximum of the lasso path statistic, as in Barber and Candes 2015

  • ‘dlasso’: Cross-validated debiased lasso coefficients

  • ‘ridge’: Cross-validated ridge coefficients

  • ‘ols’: Ordinary least squares coefficients

  • ‘margcorr’: marginal correlations between features and response

  • ‘deeppink’: The deepPINK statistic as in Lu et al. 2018

  • ‘randomforest’: A random forest with swap importances

ksampler : str or knockpy.knockoffs.KnockoffSampler

The knockoff sampler to use in the knockoff filter. This may also be a string identifier, including:

  • ‘gaussian’: Gaussian Model-X knockoffs

  • ‘fx’: Fixed-X knockoffs

  • ‘metro’: Generic metropolized knockoff sampler

  • ‘artk’: t-tailed Markov chain

  • ‘blockt’: Blocks of t-distributed

  • ‘gibbs_grid’: Discrete Gibbs grid

An alternative to specifying the ksampler is to pass in a knockoff matrix during the forward call.

fstat_kwargs : dict

Kwargs to pass to the feature statistic's fit function, excluding the required arguments. Defaults to {}.

knockoff_kwargs : dict

Kwargs for instantiating the knockoff sampler if the ksampler argument is a string identifier. This can be the empty dict for some identifiers such as “gaussian” or “fx”, but additional keyword arguments are required for complex samplers such as the “metro” identifier. Defaults to {}.

Examples

Here we fit the KnockoffFilter on fake data from a Gaussian linear model:

# Fake data-generating process for Gaussian linear model
import knockpy as kpy
dgprocess = kpy.dgp.DGP()
dgprocess.sample_data(n=500, p=500, sparsity=0.1)

# LCD statistic with Gaussian MX knockoffs
# This uses LedoitWolf covariance estimation by default.
from knockpy.knockoff_filter import KnockoffFilter 
kfilter = KnockoffFilter( 
    fstat='lcd', 
    ksampler='gaussian', 
    knockoff_kwargs={"method":"mvr"}, 
)
rejections = kfilter.forward(X=dgprocess.X, y=dgprocess.y)
Attributes
fstat : knockpy.knockoff_stats.FeatureStatistic

The feature statistics to use for inference. This inherits from knockoff_stats.FeatureStatistic.

ksampler : knockpy.knockoffs.KnockoffSampler

The knockoff sampler to use during inference. This eventually inherits from knockoffs.KnockoffSampler.

fstat_kwargs : dict

Dictionary of kwargs to pass to the fit call of self.fstat.

knockoff_kwargs : dict

If ksampler is not yet initialized, kwargs to pass to ksampler.

Z : np.ndarray

a 2p-dimensional array of feature and knockoff importances. The first p coordinates correspond to features, the last p correspond to knockoffs.

W : np.ndarray

an array of feature statistics. This is (p,)-dimensional for regular knockoffs and (num_groups,)-dimensional for group knockoffs.

S : np.ndarray

the (p, p)-shaped knockoff S-matrix used to generate knockoffs.

X : np.ndarray

the (n, p)-shaped design matrix

Xk : np.ndarray

the (n, p)-shaped matrix of knockoffs

groups : np.ndarray

For group knockoffs, a p-length array of integers from 1 to num_groups such that groups[j] == i indicates that variable j is a member of group i. Defaults to None (regular knockoffs).

rejections : np.ndarray

a (p,)-shaped boolean array where rejections[j] == 1 iff the knockoff filter rejects the jth feature.

G : np.ndarray

the (2p, 2p)-shaped feature-knockoff covariance matrix

threshold : float

the knockoff data-dependent threshold used to select variables
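
The layout of Z and its relation to W can be illustrated in plain NumPy. The sketch below assumes a coefficient-difference statistic of the form W[j] = |Z[j]| - |Z[j + p]|, which is how the ‘lcd’ identifier is described above. The helper name lcd_statistic and the Z values are made up for illustration and are not part of knockpy.

```python
import numpy as np

def lcd_statistic(Z):
    """Hypothetical helper: coefficient-difference statistic from a 2p-vector Z."""
    Z = np.asarray(Z, dtype=float)
    if Z.shape[0] % 2 != 0:
        raise ValueError("Z must have even length 2p")
    p = Z.shape[0] // 2
    # The first p entries are feature importances, the last p are knockoff importances.
    return np.abs(Z[:p]) - np.abs(Z[p:])

# Illustrative importances for p = 3 features followed by their 3 knockoffs.
Z = np.array([2.0, -0.5, 0.0, 0.1, 1.5, 0.0])
W = lcd_statistic(Z)
```

A large positive W[j] indicates that feature j looks more important than its knockoff, while a negative value indicates the opposite.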

Methods

forward(X, y[, Xk, mu, Sigma, groups, fdr, …])

Runs the knockoff filter; returns whether each feature was rejected.

make_selections(W, fdr)

Calculates the data-dependent threshold and selections.

sample_knockoffs()

Samples knockoffs during forward.

forward(X, y, Xk=None, mu=None, Sigma=None, groups=None, fdr=0.1, fstat_kwargs={}, knockoff_kwargs={}, shrinkage='ledoitwolf', recycle_up_to=None)

Runs the knockoff filter; returns whether each feature was rejected.

Parameters
X : np.ndarray

(n, p)-shaped design matrix

y : np.ndarray

(n,)-shaped response vector

Xk : np.ndarray

(n, p)-shaped knockoff matrix. If None, this will construct knockoffs using self.ksampler.

mu : np.ndarray

(p,)-shaped mean of the features. If None, this defaults to the empirical mean of the features.

Sigma : np.ndarray

(p, p)-shaped covariance matrix of the features. If None, this is estimated using the shrinkage option. This is ignored for fixed-X knockoffs.

groups : np.ndarray

For group knockoffs, a p-length array of integers from 1 to num_groups such that groups[j] == i indicates that variable j is a member of group i. Defaults to None (regular knockoffs).

fdr : float

The desired level of false discovery rate control.

fstat_kwargs : dict

Extra kwargs to pass to the feature statistic fit function, excluding the required arguments.

knockoff_kwargs : dict

Extra kwargs for instantiating the knockoff sampler argument if the ksampler argument is a string identifier. This can be the empty dict for some identifiers such as “gaussian” or “fx”, but additional keyword arguments are required for complex samplers such as the “metro” identifier. Defaults to {}

shrinkage : str

Shrinkage method if estimating the covariance matrix. Defaults to “LedoitWolf.” Other options are “MLE” and “glasso” (graphical lasso).

recycle_up_to : int or float

Three options:

  • if None, does nothing.

  • if an integer > 1, uses the first “recycle_up_to” rows of X as the first “recycle_up_to” rows of knockoffs.

  • if a float between 0 and 1 (inclusive), interpreted as the proportion of rows to recycle.

For more on recycling, see https://arxiv.org/abs/1602.03574
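
The three recycle_up_to cases can be sketched as follows. This is an illustration of the semantics described above, not knockpy's implementation: the helpers resolve_recycle_rows and recycle are hypothetical, and truncating the proportion to a whole number of rows is an assumption.

```python
import numpy as np

def resolve_recycle_rows(recycle_up_to, n):
    """Hypothetical helper: number of leading rows of X to reuse as knockoffs."""
    if recycle_up_to is None:
        return 0  # no recycling
    if isinstance(recycle_up_to, float) and 0 <= recycle_up_to <= 1:
        # Proportion of rows; truncation to an integer is an assumption.
        return int(recycle_up_to * n)
    return int(recycle_up_to)  # an explicit row count

def recycle(X, Xk, recycle_up_to):
    """Hypothetical helper: copy the first rows of X into a copy of Xk."""
    k = resolve_recycle_rows(recycle_up_to, X.shape[0])
    Xk_recycled = Xk.copy()
    Xk_recycled[:k] = X[:k]
    return Xk_recycled

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
Xk = rng.normal(size=(8, 3))   # stand-in for a sampled knockoff matrix

Xk_int = recycle(X, Xk, 3)      # first 3 rows recycled
Xk_frac = recycle(X, Xk, 0.25)  # first 25% of 8 rows, i.e. 2 rows
Xk_none = recycle(X, Xk, None)  # unchanged
```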

make_selections(W, fdr)

Calculates the data-dependent threshold and selections.
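
As a rough illustration of what a data-dependent threshold looks like, the sketch below implements the knockoff+ rule from Barber and Candes 2015: pick the smallest t such that (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) is at most fdr, with offset = 1. The helper knockoff_threshold and the W values are hypothetical, and knockpy's own routine may differ in details such as the choice of offset.

```python
import numpy as np

def knockoff_threshold(W, fdr=0.1, offset=1):
    """Hypothetical helper: smallest t whose estimated FDP is <= fdr, else inf."""
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: distinct nonzero magnitudes of W, in increasing order.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        n_neg = np.sum(W <= -t)          # proxy for the number of false discoveries
        n_pos = max(1, np.sum(W >= t))   # number of selections at threshold t
        if (offset + n_neg) / n_pos <= fdr:
            return float(t)
    return np.inf  # no threshold achieves the target level; nothing is selected

# Illustrative statistics: ten clearly positive values, one negative, one small.
W = np.array([5.0, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.2, 1.1, 1.0, -0.5, 0.3])
threshold = knockoff_threshold(W, fdr=0.15)
rejections = W >= threshold
```

Here the thresholds 0.3 and 0.5 give estimated proportions of 2/11 and 2/10, both above 0.15, so the first admissible threshold is 1.0 and ten features are selected.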

sample_knockoffs()

Samples knockoffs during forward.
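
For intuition about the ‘gaussian’ sampler, the sketch below follows the textbook Gaussian Model-X construction: given X with mean mu and covariance Sigma, knockoffs are drawn from a normal distribution with mean X - (X - mu) Sigma^{-1} S and covariance 2S - S Sigma^{-1} S. It is not knockpy's implementation. The helper sample_gaussian_knockoffs is hypothetical, and S is a simple equicorrelated choice, shrunk by an assumed factor of 0.99 to keep the conditional covariance positive definite, rather than the “mvr” matrix used in the example above.

```python
import numpy as np

def sample_gaussian_knockoffs(X, mu, Sigma, S, rng):
    """Hypothetical helper: draw Xk | X for Gaussian Model-X knockoffs."""
    Sigma_inv_S = np.linalg.solve(Sigma, S)      # Sigma^{-1} S, shape (p, p)
    cond_mean = X - (X - mu) @ Sigma_inv_S       # conditional mean, shape (n, p)
    cond_cov = 2 * S - S @ Sigma_inv_S           # conditional covariance, shape (p, p)
    L = np.linalg.cholesky(cond_cov)             # requires cond_cov to be positive definite
    return cond_mean + rng.standard_normal(X.shape) @ L.T

p, rho = 4, 0.5
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))  # equicorrelated features
mu = np.zeros(p)

# Equicorrelated S-matrix: s * I with s just below min(1, 2 * lambda_min(Sigma)).
lam_min = np.linalg.eigvalsh(Sigma)[0]
S = 0.99 * min(1.0, 2 * lam_min) * np.eye(p)

# (2p, 2p) joint covariance of features and knockoffs.
G = np.block([[Sigma, Sigma - S], [Sigma - S, Sigma]])

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=200)
Xk = sample_gaussian_knockoffs(X, mu, Sigma, S, rng)
```

The matrix G built here has the same block layout as the (2p, 2p)-shaped feature-knockoff covariance described under Attributes, and it must be positive semidefinite for the knockoffs to be valid.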