select_h

QuadratiK.kernel_test.select_h(x, y=None, alternative='location', method='subsampling', b=0.8, num_iter=150, delta_dim=1, delta=None, h_values=None, n_rep=50, n_jobs=8, quantile=0.95, k_threshold=10, power_plot=False, random_state=None)

This function computes the kernel bandwidth of the Gaussian kernel for the one-sample, two-sample and k-sample kernel-based quadratic distance (KBQD) tests.

The function selects the optimal value of the tuning parameter h of the normal kernel function for the two-sample and k-sample KBQD tests. It runs a small simulation study, generating samples according to the specified family of alternatives, for the chosen values of h_values and delta.
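To make the role of h concrete, the sketch below evaluates a Gaussian kernel for a few bandwidths. It is an illustration only: the helper `gaussian_kernel_matrix` is hypothetical and not part of QuadratiK, and it uses the unnormalized form exp(-||x - y||^2 / (2 h^2)), which may differ from the library's kernel by a normalizing constant.

```python
import numpy as np


def gaussian_kernel_matrix(x, y, h):
    """Hypothetical helper: unnormalized Gaussian kernel with bandwidth h."""
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)
    # Squared Euclidean distance between every row of x and every row of y.
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * h ** 2))


rng = np.random.default_rng(0)
sample = rng.standard_normal((5, 2))

for h in (0.5, 1.0, 2.0):
    k = gaussian_kernel_matrix(sample, sample, h)
    # Average over off-diagonal entries; the diagonal is always 1.
    off_diag_mean = (k.sum() - np.trace(k)) / (k.size - k.shape[0])
    print(f"h={h}: mean off-diagonal kernel value = {off_diag_mean:.3f}")
```

Larger values of h make distant points look more similar, so the off-diagonal entries move toward 1. This is why the choice of h affects the power of the test and why a data-driven selection is useful.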

Parameters

x : numpy.ndarray or pandas.DataFrame

Data set of observations from X.

y : numpy.ndarray or pandas.DataFrame, optional

Data set of observations from Y for the two-sample test, or set of labels for the k-sample test. Defaults to None.

alternative : str, optional

Family of alternatives used for selecting h; must be one of “location”, “scale”, or “skewness”. Defaults to “location”.

method : str, optional

The method used for critical value estimation, must be one of “subsampling”, “bootstrap”, or “permutation”. Defaults to “subsampling”.

b : float, optional

The size of the subsamples used in the subsampling algorithm. Defaults to 0.8.
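As a rough illustration of what a subsample of size b means, the sketch below draws rows without replacement. It assumes b is read as a proportion of the sample size, which the default of 0.8 suggests but which is not stated above; `draw_subsample` is a hypothetical helper, not a QuadratiK function.

```python
import numpy as np


def draw_subsample(data, b, rng):
    """Hypothetical helper: draw floor(b * n) rows without replacement."""
    n = data.shape[0]
    m = int(np.floor(b * n))  # assumed interpretation of b as a proportion
    idx = rng.choice(n, size=m, replace=False)
    return data[idx]


rng = np.random.default_rng(42)
data = rng.standard_normal((200, 2))
sub = draw_subsample(data, b=0.8, rng=rng)
print(sub.shape)  # (160, 2)
```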

num_iter : int, optional

The number of iterations to use for critical value estimation. Defaults to 150.

delta_dim : int or numpy.ndarray, optional

Array of coefficients of the alternative with respect to each dimension. Defaults to 1.

delta : numpy.ndarray, optional

Array of parameter values indicating chosen alternatives. Defaults to None.

h_values : numpy.ndarray, optional

Values of the tuning parameter used for the selection. Defaults to None.

n_rep : int, optional

Number of bootstrap replications. Defaults to 50.

n_jobs : int, optional

Maximum number of concurrently running workers. If 1 is given, no joblib parallelism is used at all, which is useful for debugging. For more information on joblib's n_jobs, see https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html. Defaults to 8.

quantile : float, optional

Quantile to use for critical value estimation. Defaults to 0.95.
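The sketch below shows, in generic terms, how a quantile turns a set of resampled test statistics into a critical value. The statistics here are simulated stand-ins rather than KBQD statistics, and `empirical_critical_value` is a hypothetical helper; the library's internal procedure may differ in detail.

```python
import numpy as np


def empirical_critical_value(resampled_stats, quantile=0.95):
    """Hypothetical helper: empirical quantile of resampled statistics."""
    return float(np.quantile(resampled_stats, quantile))


rng = np.random.default_rng(0)
# Stand-in for statistics from 150 resampling iterations (cf. num_iter).
resampled_stats = rng.chisquare(df=2, size=150)

critical_value = empirical_critical_value(resampled_stats, quantile=0.95)
observed_stat = 9.0  # made-up observed statistic for illustration
reject = observed_stat > critical_value
print(f"critical value = {critical_value:.3f}, reject H0: {reject}")
```

With quantile=0.95, roughly 5% of the resampled statistics lie above the critical value, which corresponds to a nominal 5% significance level.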

k_threshold : int, optional

Maximum number of groups allowed. Defaults to 10.

power_plot : boolean, optional

If True, the plot of power for the values in h_values and delta is displayed. Defaults to False.

random_state : int or None, optional

Seed for random number generation. Defaults to None.

Returns

h : float

The selected value of the tuning parameter h.

h vs Power table : pandas.DataFrame

A table containing the values of h and delta and the corresponding powers.

References

Markatou M., Saraceno G., Chen Y. (2023). “Two- and k-Sample Tests Based on Quadratic Distances.” Manuscript, Department of Biostatistics, University at Buffalo.

Examples

>>> import numpy as np
>>> from QuadratiK.kernel_test import select_h
>>> np.random.seed(42)
>>> X = np.random.randn(200, 2)
>>> np.random.seed(42)
>>> y = np.random.randint(0, 2, 200)
>>> h_selected, all_values, power_plot = select_h(
...    X, y, alternative='location', power_plot=True, random_state=42)
>>> print("Selected h is: ", h_selected)
Selected h is:  1.2