KernelTest

class QuadratiK.kernel_test.KernelTest(h=None, method='subsampling', num_iter=150, b=0.9, quantile=0.95, mu_hat=None, sigma_hat=None, centering_type='nonparam', alternative=None, k_threshold=10, random_state=None, n_jobs=8)

Class for performing the kernel-based quadratic distance goodness-of-fit tests using the Gaussian kernel with tuning parameter h. Depending on the input y, the test method performs the test of multivariate normality, the non-parametric two-sample test, or the k-sample test.
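To make the idea of a Gaussian-kernel distance between two samples concrete, here is a minimal NumPy sketch of a generic kernel two-sample U-statistic. It is an illustration under stated assumptions, not the statistic that KernelTest computes: the helper names gaussian_kernel and two_sample_u_statistic are hypothetical, and the kernel is left unnormalized.

```python
import numpy as np


def gaussian_kernel(a, b, h):
    """Matrix with entries exp(-||a_i - b_j||^2 / (2 h^2)) (unnormalized; an assumption)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * h ** 2))


def two_sample_u_statistic(x, y, h):
    """Generic kernel two-sample U-statistic (illustrative, not QuadratiK's code)."""
    n, m = len(x), len(y)
    k_xx = gaussian_kernel(x, x, h)
    k_yy = gaussian_kernel(y, y, h)
    k_xy = gaussian_kernel(x, y, h)
    # Within-sample terms exclude the diagonal (i == j) pairs.
    term_x = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_y = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    # Between-sample term averages over all (i, j) pairs.
    return term_x + term_y - 2.0 * k_xy.mean()


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))
y_same = rng.normal(size=(100, 4))              # same distribution as x
y_shift = rng.normal(loc=1.0, size=(100, 4))    # mean shifted by 1 in every coordinate

stat_same = two_sample_u_statistic(x, y_same, h=2.0)
stat_shift = two_sample_u_statistic(x, y_shift, h=2.0)
print(stat_same, stat_shift)
```

The statistic stays near zero when both samples come from the same distribution and grows when they differ, which is why it is compared against a critical value.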

Parameters

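h : float, optional

Bandwidth for the kernel function. Defaults to None, in which case h is chosen by the tuning parameter selection algorithm.

method : str, optional

The method used for critical value estimation (“subsampling”, “bootstrap”, or “permutation”). Defaults to “subsampling”.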

num_iter : int, optional

The number of iterations to use for critical value estimation. Defaults to 150.

b : float, optional

The size of the subsamples used in the subsampling algorithm. Defaults to 0.9.

quantile : float, optional

The quantile to use for critical value estimation. Defaults to 0.95.

mu_hat : numpy.ndarray, optional

Mean vector for the reference distribution. Defaults to None.

sigma_hat : numpy.ndarray, optional

Covariance matrix of the reference distribution. Defaults to None.

alternative : str, optional

String indicating the type of alternative used by the tuning parameter selection algorithm to compute h when h is not provided. Defaults to None.

k_threshold : int, optional

Maximum number of groups allowed. Defaults to 10. Increase this value when the data contain more than 10 groups.

random_state : int, None, optional

Seed for random number generation. Defaults to None.

n_jobs : int, optional

The maximum number of concurrently running workers. If 1 is given, no joblib parallelism is used at all, which is useful for debugging. For more information on joblib n_jobs, see https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html. Defaults to 8.
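The roles of num_iter, b, and quantile can be illustrated with a generic subsampling scheme. The sketch below is an assumption about how such a scheme can work, not a description of QuadratiK's internals: the names subsampling_cv and mean_gap are hypothetical, b is interpreted as the fraction of each sample that is kept, and subsamples are drawn without replacement from the pooled data.

```python
import numpy as np


def mean_gap(x, y):
    """Toy statistic: squared Euclidean distance between the sample means."""
    return float(np.sum((x.mean(axis=0) - y.mean(axis=0)) ** 2))


def subsampling_cv(x, y, statistic, num_iter=150, b=0.9, quantile=0.95, seed=None):
    """Estimate a critical value by subsampling the pooled data (illustrative only)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    # Assumption: b is the fraction of each sample kept in a subsample.
    sub_x = int(round(b * len(x)))
    sub_y = int(round(b * len(y)))
    draws = np.empty(num_iter)
    for i in range(num_iter):
        # Draw without replacement; the returned indices come back shuffled,
        # so splitting them yields two random groups from the pooled data.
        idx = rng.choice(len(pooled), size=sub_x + sub_y, replace=False)
        draws[i] = statistic(pooled[idx[:sub_x]], pooled[idx[sub_x:]])
    # The critical value is the requested quantile of the subsampled statistics.
    return float(np.quantile(draws, quantile))


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
y = rng.normal(loc=1.0, size=(200, 4))

observed = mean_gap(x, y)
cv = subsampling_cv(x, y, mean_gap, num_iter=150, b=0.9, quantile=0.95, seed=42)
print(observed, cv, observed > cv)
```

A larger num_iter gives a more stable quantile estimate at a proportionally higher cost, and fixing the seed makes the critical value reproducible, which is the role random_state plays in the constructor.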

Attributes

For Normality Test:
test_type_ : str

The type of test performed on the data

execution_time : float

Time taken for the test method to execute

un_h0_rejected_ : boolean

Whether the null hypothesis using Un is rejected (True) or not (False)

vn_h0_rejected_ : boolean

Whether the null hypothesis using Vn is rejected (True) or not (False)

un_test_statistic_ : float

Un test statistic of the performed test type

vn_test_statistic_ : float

Vn test statistic of the performed test type

un_cv_ : float

Critical value for Un

vn_cv_ : float

Critical value for Vn

For Two-Sample and K-Sample Test:
test_type_ : str

The type of test performed on the data

execution_time : float

Time taken for the test method to execute

un_h0_rejected_ : boolean

Whether the null hypothesis using Un is rejected (True) or not (False)

un_test_statistic_ : float

Un test statistic of the performed test type

un_cv_ : float

Critical value for Un

cv_method_ : str

Critical value method used for performing the test

References

Markatou M., Saraceno G., Chen Y. (2023). “Two- and k-Sample Tests Based on Quadratic Distances.” Manuscript, Department of Biostatistics, University at Buffalo.

Lindsay B.G., Markatou M., Ray S. (2014). “Kernels, Degrees of Freedom, and Power Properties of Quadratic Distance Goodness-of-Fit Tests.” Journal of the American Statistical Association, 109(505), 395-410. DOI: 10.1080/01621459.2013.836972

Examples

>>> import numpy as np
>>> np.random.seed(78990)
>>> from QuadratiK.kernel_test import KernelTest
>>> # data generation
>>> data_norm = np.random.multivariate_normal(mean=np.zeros(4), cov=np.eye(4), size=500)
>>> # performing the normality test
>>> normality_test = KernelTest(h=0.4, num_iter=150, method="subsampling", random_state=42).test(data_norm)
>>> print(f"Test : {normality_test.test_type_}")
>>> print(f"Execution time: {normality_test.execution_time:.3f}")
>>> print(f"H0 is Rejected : {normality_test.un_h0_rejected_}")
>>> print(f"Test Statistic : {normality_test.un_test_statistic_}")
>>> print(f"Critical Value (CV) : {normality_test.un_cv_}")
>>> print(f"CV Method : {normality_test.cv_method_}")
... Test : Kernel-based quadratic distance Normality test
... Execution time: 0.356
... H0 is Rejected : False
... Test Statistic : 0.01018599246239244
... Critical Value (CV) : 0.07765034009837886
... CV Method : Empirical

>>> import numpy as np
>>> np.random.seed(0)
>>> from scipy.stats import skewnorm
>>> from QuadratiK.kernel_test import KernelTest
>>> # data generation
>>> X_2 = np.random.multivariate_normal(mean=np.zeros(4), cov=np.eye(4), size=200)
>>> Y_2 = skewnorm.rvs(size=(200, 4), loc=np.zeros(4), scale=np.ones(4), a=np.repeat(0.5, 4), random_state=20)
>>> # performing the two sample test
>>> two_sample_test = KernelTest(h=2, num_iter=150, random_state=42).test(X_2, Y_2)
>>> print("Test : {}".format(two_sample_test.test_type_))
>>> print("Execution time: {:.3f} seconds".format(two_sample_test.execution_time))
>>> print("H0 is Rejected : {}".format(two_sample_test.un_h0_rejected_))
>>> print("Test Statistic : {}".format(two_sample_test.un_test_statistic_))
>>> print("Critical Value (CV) : {}".format(two_sample_test.un_cv_))
>>> print("CV Method : {}".format(two_sample_test.cv_method_))
>>> print("Selected tuning parameter : {}".format(two_sample_test.h))
... Test : Kernel-based quadratic distance two-sample test
... Execution time: 1.900 seconds
... H0 is Rejected : [ True  True]
... Test Statistic : [ 5.061213   15.75171816]
... Critical Value (CV) : [0.49011552 1.52578287]
... CV Method : subsampling
... Selected tuning parameter : 2
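
The rejection flags printed in the two-sample output above are consistent with an element-wise comparison of each test statistic against its critical value. A minimal sketch, assuming that decision rule (the variable names are illustrative and the numbers are copied from the printed output):

```python
import numpy as np

# Values copied from the two-sample example output above.
test_statistic = np.array([5.061213, 15.75171816])
critical_value = np.array([0.49011552, 1.52578287])

# Assumed rule: reject H0 wherever the statistic exceeds its critical value.
h0_rejected = test_statistic > critical_value
print(h0_rejected)  # [ True  True]
```

The normality example follows the same pattern: its statistic (about 0.0102) is below its critical value (about 0.0777), so H0 is not rejected.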

Methods

KernelTest.stats()

Function to generate descriptive statistics per variable (and per group if available).

KernelTest.summary([print_fmt])

Summary function generates a table for the kernel test results and the summary statistics.

KernelTest.test(x[, y])

Function to perform the kernel-based quadratic distance tests using the Gaussian kernel with bandwidth parameter h.


KernelTest.stats()

Function to generate descriptive statistics per variable (and per group if available).

Returns

summary_stats_df : pandas.DataFrame

DataFrame of descriptive statistics.

KernelTest.summary(print_fmt='simple_grid')

Summary function generates a table for the kernel test results and the summary statistics.

Parameters

print_fmt : str, optional

Format used for printing the output. Defaults to “simple_grid”. All formats available in tabulate are supported; see https://pypi.org/project/tabulate/

Returns

summary : str

A string formatted in the desired output format with the kernel test results and summary statistics.

KernelTest.test(x, y=None)

Function to perform the kernel-based quadratic distance tests using the Gaussian kernel with bandwidth parameter h. Depending on the shape of y, the function performs the test of multivariate normality, the non-parametric two-sample test, or the k-sample test.

Parameters

x : numpy.ndarray or pandas.DataFrame

A numeric array of data values.

y : numpy.ndarray or pandas.DataFrame, optional

A numeric array of data values (for the two-sample test) or a 1D array of class labels (for the k-sample test). Defaults to None, in which case the normality test is performed.

Returns

self : object

Fitted estimator.
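
For the k-sample case, x holds the observations of all groups stacked row-wise and y holds one class label per row. The sketch below only builds inputs in that layout with NumPy. The group sizes, the mean shifts, and the value of h in the commented call are made up for illustration, and the call itself is left as a comment because it requires QuadratiK to be installed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical groups of 100 bivariate observations each;
# the third group has a shifted mean.
groups = [rng.normal(loc=shift, size=(100, 2)) for shift in (0.0, 0.0, 0.5)]

# x: all observations stacked row-wise; y: 1D array with one label per row.
x = np.vstack(groups)
y = np.repeat(np.arange(1, len(groups) + 1), [len(g) for g in groups])

print(x.shape, y.shape, np.unique(y))

# With QuadratiK installed, the documented signature suggests a call such as:
# from QuadratiK.kernel_test import KernelTest
# k_sample_test = KernelTest(h=1.5, method="subsampling", random_state=42).test(x, y)
```

If the data contain more than 10 groups, k_threshold needs to be raised accordingly when constructing KernelTest.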