Author: | Pierre Barbier de Reuille <pierre.barbierdereuille@gmail.com> |
---|
Module implementing kernel-based estimation of density of probability.
Perform a kernel based density estimation in 1D, possibly on a bounded domain \([L,U]\).
Parameters: | data (ndarray) – 1D array with the data points |
---|
Any other named argument will be equivalent to setting the property after the fact. For example:
>>> xs = [1,2,3]
>>> k = KDE1D(xs, lower=0)
will be equivalent to:
>>> k = KDE1D(xs)
>>> k.lower = 0
The method rely on an estimator of kernel density given by:
where \(h\) is the bandwidth of the kernel (bandwidth), and \(K\) is the kernel used for the density estimation (kernel), \(w_i\) are the weights of the data points (weights) and \(\lambda_i\) are the adaptation factor of the kernel width (lambdas). \(K\) should be a function such that:
Which translates into, the function should be of sum 1 (i.e. a valid density of probability), of average 0 (i.e. centered) and of finite variance. It is even recommanded that the variance is close to 1 to give a uniform meaning to the bandwidth.
If the domain of the density estimation is bounded to the interval \([L,U]\) (i.e. from lower to upper), the density is then estimated with:
Where \(\hat{K}\) is a modified kernel that depends on the exact method used.
To express the various methods, we will refer to the following functions:
The default methods are implemented in the kde_methods module.
Bandwidth of the kernel. Can be set either as a fixed value or using a bandwidth calculator, that is a function of signature w(xdata) that returns a single value.
Note
A ndarray with a single value will be converted to a floating point value.
Covariance of the gaussian kernel. Can be set either as a fixed value or using a bandwidth calculator, that is a function of signature w(xdata) that returns a single value.
Note
A ndarray with a single value will be converted to a floating point value.
Evaluate the density on a grid of N points spanning the whole dataset.
Returns: | a tuple with the mesh on which the density is evaluated and |
---|
the density itself
Kernel object. Should provide the following methods:
By default, the kernel is an instance of kernels.normal_kernel1d
Scaling of the bandwidth, per data point. It can be either a single value or an array with one value per data point.
When deleted, the lamndas are reset to 1.
Select the method to use. Available methods in the pyqt_fit.kde_methods sub-module.
The method is an object that should provide the following:
Compute the Kernel Density Estimate of a dataset, transforming it first to a domain where distances are “more meaningful”.
Often, KDE is best estimated in a different domain. This object takes a KDE1D object (or one compatible), and a transformation function.
Given a random variable \(X\) of distribution \(f_X\), the random variable \(Y = g(X)\) has a distribution \(f_Y\) given by:
In our term, \(Y\) is the random variable the user is interested in, and \(X\) the random variable we can estimate using the KDE. In this case, \(g\) is the transform from \(Y\) to \(X\).
So to estimate the distribution on a set of points given in \(x\), we need a total of three functions:
- Direct function: transform from the original space to the one in which the KDE will be perform (i.e. \(g^{-1}: y \mapsto x\))
- Invert function: transform from the KDE space to the original one (i.e. \(g: x \mapsto y\))
- Derivative of the invert function
If the derivative is not provided, it will be estimated numerically.
Parameters: |
|
---|
Any unknown member is forwarded to the underlying KDE object.
Returns the covariance matrix:
where \(\tau\) is a correcting factor that depends on the method.
The Silverman bandwidth is defined as a variance bandwidth with factor:
The Scotts bandwidth is defined as a variance bandwidth with factor:
Implementation of the KDE bandwidth selection method outline in:
Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density estimation via diffusion. The Annals of Statistics, 38(5):2916-2957, 2010.
Based on the implementation of Daniel B. Smith, PhD.
The object is a callable returning the bandwidth for a 1D kernel.