KernelTest#
- class QuadratiK.kernel_test.KernelTest(h=None, method='subsampling', num_iter=150, b=0.9, quantile=0.95, mu_hat=None, sigma_hat=None, centering_type='nonparam', alternative=None, k_threshold=10, random_state=None, n_jobs=8)#
Class for performing kernel-based quadratic distance goodness-of-fit tests using the Gaussian kernel with tuning parameter h. Depending on the input y passed to test, it performs the test of multivariate normality (y is None), the non-parametric two-sample test (y is a numeric array of data values), or the k-sample test (y is a 1D array of class labels).
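To make the role of the tuning parameter h concrete, here is a minimal NumPy sketch of a Gaussian kernel matrix with bandwidth h. The helper name gaussian_kernel_matrix and the unnormalized form exp(-||x - y||^2 / (2 h^2)) are assumptions made for illustration; the kernel used internally by KernelTest may be scaled or centered differently.

```python
import numpy as np


def gaussian_kernel_matrix(x, y, h):
    """Hypothetical helper (not part of QuadratiK).

    Returns K with K[i, j] = exp(-||x_i - y_j||^2 / (2 * h^2)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Squared Euclidean distance between every row of x and every row of y.
    diff = x[:, None, :] - y[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * h ** 2))


rng = np.random.default_rng(0)
sample = rng.standard_normal((5, 4))

# A small h makes the kernel very local; a large h smooths over distances.
K_narrow = gaussian_kernel_matrix(sample, sample, h=0.4)
K_wide = gaussian_kernel_matrix(sample, sample, h=2.0)
```

With a small h, only points that are very close contribute to the kernel; a larger h lets more distant points contribute.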
Parameters#
- h : float, optional
Bandwidth for the kernel function. Defaults to None, in which case h is chosen by the tuning parameter selection algorithm.
- method : str, optional
The method used for critical value estimation (“subsampling”, “bootstrap”, or “permutation”). Defaults to “subsampling”.
- num_iter : int, optional
The number of iterations to use for critical value estimation. Defaults to 150.
- b : float, optional
The size of the subsamples used in the subsampling algorithm. Defaults to 0.9.
- quantile : float, optional
The quantile to use for critical value estimation. Defaults to 0.95.
- mu_hat : numpy.ndarray, optional
Mean vector of the reference distribution. Defaults to None.
- sigma_hat : numpy.ndarray, optional
Covariance matrix of the reference distribution. Defaults to None.
- alternative : str, optional
The type of alternative used by the tuning parameter selection algorithm to calculate h when h is not provided. Defaults to None.
- k_threshold : int, optional
Maximum number of groups allowed. Defaults to 10. Increase it when the data contain more than 10 groups.
- random_state : int, None, optional
Seed for random number generation. Defaults to None.
- n_jobs : int, optional
The maximum number of concurrently running workers. If 1 is given, no joblib parallelism is used at all, which is useful for debugging. Defaults to 8. For more information on joblib n_jobs, see https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html.
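The interplay of num_iter, b, quantile, and random_state can be sketched with a generic subsampling loop. This is only an illustration: the helper subsampling_critical_value is hypothetical, the reading of b as a fraction of the sample size is an assumption, and the way KernelTest actually builds the null distribution of its statistics is not reproduced here.

```python
import numpy as np


def subsampling_critical_value(data, statistic, num_iter=150, b=0.9,
                               quantile=0.95, random_state=None):
    """Hypothetical helper (not part of QuadratiK).

    Evaluates `statistic` on `num_iter` subsamples drawn without
    replacement and returns the requested empirical quantile.
    """
    rng = np.random.default_rng(random_state)
    data = np.asarray(data)
    n = data.shape[0]
    # Assumption: b is the subsample size as a fraction of n.
    m = max(1, int(np.ceil(b * n)))
    values = np.empty(num_iter)
    for i in range(num_iter):
        idx = rng.choice(n, size=m, replace=False)
        values[i] = statistic(data[idx])
    return float(np.quantile(values, quantile))


def toy_statistic(sample):
    """Toy statistic for the demo: absolute value of the overall mean."""
    return float(np.abs(sample.mean()))


rng = np.random.default_rng(0)
data = rng.standard_normal((200, 4))
cv_a = subsampling_critical_value(data, toy_statistic, random_state=42)
cv_b = subsampling_critical_value(data, toy_statistic, random_state=42)
```

Fixing random_state makes the estimated critical value reproducible across runs, which is why the two calls above return the same value.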
Attributes#
- For Normality Test:
- test_type_ : str
The type of test performed on the data
- execution_time : float
Time taken for the test method to execute
- un_h0_rejected_ : boolean
Whether the null hypothesis using Un is rejected (True) or not (False)
- vn_h0_rejected_ : boolean
Whether the null hypothesis using Vn is rejected (True) or not (False)
- un_test_statistic_ : float
Un test statistic of the performed test type
- vn_test_statistic_ : float
Vn test statistic of the performed test type
- un_cv_ : float
Critical value for Un
- vn_cv_ : float
Critical value for Vn
- For Two-Sample and K-Sample Test:
- test_type_ : str
The type of test performed on the data
- execution_time : float
Time taken for the test method to execute
- un_h0_rejected_ : boolean
Whether the null hypothesis using Un is rejected (True) or not (False)
- un_test_statistic_ : float
Un test statistic of the performed test type
- un_cv_ : float
Critical value for Un
- cv_method_ : str
Critical value method used for performing the test
References#
Markatou M., Saraceno G., Chen Y. (2023). “Two- and k-Sample Tests Based on Quadratic Distances.” Manuscript, (Department of Biostatistics, University at Buffalo)
Lindsay BG, Markatou M. & Ray S. (2014) Kernels, Degrees of Freedom, and Power Properties of Quadratic Distance Goodness-of-Fit Tests, Journal of the American Statistical Association, 109:505, 395-410, DOI: 10.1080/01621459.2013.836972
Examples#
>>> import numpy as np
>>> np.random.seed(78990)
>>> from QuadratiK.kernel_test import KernelTest
>>> # data generation
>>> data_norm = np.random.multivariate_normal(mean=np.zeros(4), cov=np.eye(4), size=500)
>>> # performing the normality test
>>> normality_test = KernelTest(h=0.4, num_iter=150, method="subsampling", random_state=42).test(data_norm)
>>> print(f"Test : {normality_test.test_type_}")
>>> print(f"Execution time: {normality_test.execution_time:.3f}")
>>> print(f"H0 is Rejected : {normality_test.un_h0_rejected_}")
>>> print(f"Test Statistic : {normality_test.un_test_statistic_}")
>>> print(f"Critical Value (CV) : {normality_test.un_cv_}")
>>> print(f"CV Method : {normality_test.cv_method_}")
... Test : Kernel-based quadratic distance Normality test
... Execution time: 0.356
... H0 is Rejected : False
... Test Statistic : 0.01018599246239244
... Critical Value (CV) : 0.07765034009837886
... CV Method : Empirical
>>> import numpy as np
>>> np.random.seed(0)
>>> from scipy.stats import skewnorm
>>> from QuadratiK.kernel_test import KernelTest
>>> # data generation
>>> X_2 = np.random.multivariate_normal(mean=np.zeros(4), cov=np.eye(4), size=200)
>>> Y_2 = skewnorm.rvs(size=(200, 4), loc=np.zeros(4), scale=np.ones(4), a=np.repeat(0.5, 4), random_state=20)
>>> # performing the two sample test
>>> two_sample_test = KernelTest(h=2, num_iter=150, random_state=42).test(X_2, Y_2)
>>> print("Test : {}".format(two_sample_test.test_type_))
>>> print("Execution time: {:.3f} seconds".format(two_sample_test.execution_time))
>>> print("H0 is Rejected : {}".format(two_sample_test.un_h0_rejected_))
>>> print("Test Statistic : {}".format(two_sample_test.un_test_statistic_))
>>> print("Critical Value (CV) : {}".format(two_sample_test.un_cv_))
>>> print("CV Method : {}".format(two_sample_test.cv_method_))
>>> print("Selected tuning parameter : {}".format(two_sample_test.h))
... Test : Kernel-based quadratic distance two-sample test
... Execution time: 1.900 seconds
... H0 is Rejected : [ True True]
... Test Statistic : [ 5.061213 15.75171816]
... Critical Value (CV) : [0.49011552 1.52578287]
... CV Method : subsampling
... Selected tuning parameter : 2
Methods
stats() | Function to generate descriptive statistics per variable (and per group if available).
summary(print_fmt='simple_grid') | Summary function generates a table for the kernel test results and the summary statistics.
test(x, y=None) | Function to perform the kernel-based quadratic distance tests using the Gaussian kernel with bandwidth parameter h.
- KernelTest.stats()#
Function to generate descriptive statistics per variable (and per group if available).
Returns#
- summary_stats_df : pandas.DataFrame
Dataframe of descriptive statistics
- KernelTest.summary(print_fmt='simple_grid')#
Summary function generates a table for the kernel test results and the summary statistics.
Parameters#
- print_fmt : str, optional
The table format used for printing the output. Defaults to “simple_grid”. Supports all formats available in tabulate; see https://pypi.org/project/tabulate/
Returns#
- summary : str
A string formatted in the desired output format with the kernel test results and summary statistics.
- KernelTest.test(x, y=None)#
Function to perform the kernel-based quadratic distance tests using the Gaussian kernel with bandwidth parameter h. Depending on y, the function performs the test of multivariate normality, the non-parametric two-sample test, or the k-sample test.
Parameters#
- x : numpy.ndarray or pandas.DataFrame
A numeric array of data values.
- y : numpy.ndarray or pandas.DataFrame, optional
A numeric array of data values (for the two-sample test) or a 1D array of class labels (for the k-sample test). Defaults to None, in which case the normality test is performed.
Returns#
- self : object
Fitted estimator
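The way y selects the test can be pictured with a small dispatch function. The helper infer_test_type below is hypothetical and follows only the parameter descriptions above (no y, a numeric data array, or a 1D label array); the rule the library actually applies, which may also involve k_threshold, is not shown in this reference.

```python
import numpy as np


def infer_test_type(x, y=None):
    """Hypothetical helper (not part of QuadratiK).

    Mirrors the description of `y`: None -> normality test,
    1D label array -> k-sample test, otherwise -> two-sample test.
    """
    x = np.asarray(x)
    if y is None:
        return "normality"
    y = np.asarray(y)
    if y.ndim == 1:
        # Assumption: a 1D y holds one class label per row of x.
        if y.shape[0] != x.shape[0]:
            raise ValueError("expected one label per row of x")
        return "k-sample"
    # Assumption: a 2D y is a second sample with the same columns as x.
    if y.shape[1] != x.shape[1]:
        raise ValueError("x and y must have the same number of columns")
    return "two-sample"


rng = np.random.default_rng(0)
x = rng.standard_normal((30, 4))
y_data = rng.standard_normal((20, 4))
y_labels = np.repeat([1, 2, 3], 10)

kinds = [
    infer_test_type(x),
    infer_test_type(x, y_data),
    infer_test_type(x, y_labels),
]
```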