\[\DeclareMathOperator{\erf}{erf} \DeclareMathOperator{\argmin}{argmin} \newcommand{\R}{\mathbb{R}} \newcommand{\n}{\boldsymbol{n}}\]

Module pyqt_fit.kernel_smoothing

Author: Pierre Barbier de Reuille <pierre.barbierdereuille@gmail.com>

Module implementing non-parametric regressions using kernel smoothing methods.

Kernel Smoothing Methods

class pyqt_fit.kernel_smoothing.SpatialAverage(xdata, ydata, cov=scotts_covariance)

Perform a Nadaraya-Watson regression on the data (also called local-constant regression) using a Gaussian kernel.

The Nadaraya-Watson estimate is given by:

\[f_n(x) \triangleq \frac{\sum_i K\left(\frac{x-X_i}{h}\right) Y_i} {\sum_i K\left(\frac{x-X_i}{h}\right)}\]

Where \(K(x)\) is the kernel, which must have zero mean (i.e. \(\int x K(x)\,dx = 0\)), and \(h\) is the bandwidth of the method.
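The estimator above fits in a few lines of NumPy. This is a minimal 1D illustration of the formula, not the SpatialAverage implementation. The function name nadaraya_watson is hypothetical, the kernel is a standard Gaussian, and the bandwidth h is a fixed scalar rather than a covariance matrix.

```python
import numpy as np

def nadaraya_watson(x, xdata, ydata, h):
    """Local-constant (Nadaraya-Watson) estimate at the points x.

    Hypothetical 1D sketch: standard Gaussian kernel, scalar bandwidth h.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    xdata = np.asarray(xdata, dtype=float)
    ydata = np.asarray(ydata, dtype=float)
    # u[j, i] = (x_j - X_i) / h for every evaluation point j and sample i
    u = (x[:, None] - xdata[None, :]) / h
    # Gaussian kernel values; the normalising constant cancels in the ratio
    w = np.exp(-0.5 * u ** 2)
    # Kernel-weighted average of the Y_i, one value per evaluation point
    return (w @ ydata) / w.sum(axis=1)

# Usage on noisy samples of sin(x)
rng = np.random.default_rng(0)
xs = np.sort(rng.uniform(0, 2 * np.pi, 200))
ys = np.sin(xs) + rng.normal(scale=0.2, size=xs.size)
print(nadaraya_watson([np.pi / 2, np.pi], xs, ys, h=0.3))
```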

Parameters:
  • xdata (ndarray) – Explanatory variables (at most 2D array)
  • ydata (ndarray) – Explained variable (should be a 1D array)
  • cov (ndarray or callable) – If an ndarray, it should be a 2D array giving the covariance matrix of the Gaussian kernel. Otherwise, it should be a function cov(xdata, ydata) returning the covariance matrix.
__call__(*args, **kwords)

This method is an alias for SpatialAverage.evaluate().

bandwidth

Bandwidth of the kernel. It cannot be set directly, but rather should be set via the covariance attribute.

correction

The correction coefficient changes the width of the kernel depending on the point considered. It can be either a constant (to correct the kernel width globally) or a 1D array of the same size as the input.

covariance

Covariance of the Gaussian kernel. It can be set either as a fixed value or using a bandwidth calculator, that is, a function of signature w(xdata, ydata) that returns a 2D matrix for the covariance of the kernel.
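The following hypothetical calculator illustrates the w(xdata, ydata) signature. It applies Scott's rule, which scales the sample covariance of the data by \(n^{-2/(d+4)}\). It assumes xdata has shape (N,D) or (N,) and ignores ydata. It sketches the rule and is not the source of pyqt_fit's scotts_covariance.

```python
import numpy as np

def scott_covariance_sketch(xdata, ydata):
    """Hypothetical bandwidth calculator with the signature w(xdata, ydata).

    Scott's rule: sample covariance of xdata times n**(-2/(d+4)).
    ydata is accepted for signature compatibility but not used.
    """
    x = np.asarray(xdata, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # assume (N,) means N points in dimension 1
    n, d = x.shape
    # Scott's factor applies to standard deviations, so it is squared below
    factor = n ** (-1.0 / (d + 4))
    # rowvar=False: rows are observations, columns are variables
    sample_cov = np.atleast_2d(np.cov(x, rowvar=False))
    return sample_cov * factor ** 2
```

The result is always a (d,d) matrix, including the 1D case where it is (1,1).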

evaluate(points, result=None)

Evaluate the spatial averaging on a set of points.

Parameters:
  • points (ndarray) – Points to evaluate the averaging on
  • result (ndarray) – If provided, the result will be put in this array
set_density_correction()

Add a correction coefficient depending on the density of the input.

class pyqt_fit.kernel_smoothing.LocalLinearKernel1D(xdata, ydata, cov=scotts_covariance)

Perform a local-linear regression using a Gaussian kernel.

The local-linear estimate \(f_n(x)\) is the intercept \(a_0\) of the weighted least-squares fit that minimises, for each position \(x\), the following criterion over both \(a_0\) and \(a_1\):

\[f_n(x) \triangleq \argmin_{a_0\in\mathbb{R}} \sum_i K\left(\frac{x-X_i}{h}\right) \left(Y_i - a_0 - a_1(x-X_i)\right)^2\]

Where \(K(x)\) is the kernel, which must have zero mean (i.e. \(\int_\mathbb{R} x K(x)\,dx = 0\)), and \(h\) is the bandwidth of the method.
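The minimisation above is a 2x2 weighted least-squares problem with a closed-form solution. The sketch below solves the normal equations at each evaluation point and returns \(a_0\). The function name local_linear is hypothetical, and the sketch assumes 1D data, a standard Gaussian kernel and a scalar bandwidth h. It is not the LocalLinearKernel1D code.

```python
import numpy as np

def local_linear(x, xdata, ydata, h):
    """Hypothetical local-linear estimate at the points x (1D, Gaussian kernel).

    For each x, minimise sum_i w_i (Y_i - a0 - a1 (x - X_i))^2 over (a0, a1)
    and return a0.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    xdata = np.asarray(xdata, dtype=float)
    ydata = np.asarray(ydata, dtype=float)
    out = np.empty_like(x)
    for j, xj in enumerate(x):
        d = xj - xdata                   # (x - X_i), as in the formula
        w = np.exp(-0.5 * (d / h) ** 2)  # Gaussian weights K((x - X_i)/h)
        # Normal equations: s0*a0 + s1*a1 = t0 and s1*a0 + s2*a1 = t1
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        t0, t1 = (w * ydata).sum(), (w * d * ydata).sum()
        # Cramer's rule for the intercept a0
        out[j] = (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)
    return out
```

Because the fit is linear in \(x - X_i\), it reproduces straight lines exactly (up to rounding), which the local-constant estimate does not do near the edges of the data.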

Parameters:
  • xdata (ndarray) – Explanatory variables (at most 2D array)
  • ydata (ndarray) – Explained variable (should be a 1D array)
  • cov (float or callable) – If a float, it should be the variance of the Gaussian kernel. Otherwise, it should be a function cov(xdata, ydata) returning the variance.
__call__(*args, **kwords)

This method is an alias for LocalLinearKernel1D.evaluate().

bandwidth

Bandwidth of the kernel.

covariance

Covariance of the Gaussian kernel. It can be set either as a fixed value or using a bandwidth calculator, that is, a function of signature w(xdata, ydata) that returns a single value.

Note

An ndarray with a single value will be converted to a floating-point value.

evaluate(points, out=None)

Evaluate the local-linear regression on a set of points.

Parameters:
  • points (ndarray) – Points at which to evaluate the regression
  • out (ndarray) – If provided, the result will be put in this array
class pyqt_fit.kernel_smoothing.LocalPolynomialKernel1D(xdata, ydata, q=3, **kwords)

Perform a local-polynomial regression using a user-provided kernel (Gaussian by default).

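The local-polynomial estimate \(f_n(x)\) is the constant term \(a_0\) of the weighted polynomial fit that minimises, for each position \(x\), the following criterion over \(a_0, \ldots, a_q\):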

\[f_n(x) \triangleq \argmin_{a_0\in\mathbb{R}} \sum_i K\left(\frac{x-X_i}{h}\right) \left(Y_i - a_0 - a_1(x-X_i) - \ldots - a_q \frac{(x-X_i)^q}{q!}\right)^2\]

Where \(K(x)\) is the kernel, which must have zero mean (i.e. \(\int_\mathbb{R} x K(x)\,dx = 0\)), \(q\) is the order of the fitted polynomial and \(h\) is the bandwidth of the method. It is also recommended that \(\int_\mathbb{R} x^2K(x)dx = 1\) (i.e. the kernel has unit variance); otherwise, the effective bandwidth is scaled by the square root of this integral (i.e. the standard deviation of the kernel).
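The same idea extends to an arbitrary order \(q\). The hypothetical function local_polynomial builds the columns \((x-X_i)^k/k!\) and scales the rows by \(\sqrt{K}\). It then solves the weighted least-squares problem with numpy.linalg.lstsq and keeps \(a_0\). Like the previous sketches, it assumes 1D data, a standard Gaussian kernel (unit variance, as recommended above) and a scalar bandwidth h. It is not the LocalPolynomialKernel1D code.

```python
import numpy as np
from math import factorial

def local_polynomial(x, xdata, ydata, h, q=3):
    """Hypothetical local-polynomial estimate of order q at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    xdata = np.asarray(xdata, dtype=float)
    ydata = np.asarray(ydata, dtype=float)
    # 1/k! scaling of each column, matching a_k (x - X_i)^k / k!
    scale = np.array([1.0 / factorial(k) for k in range(q + 1)])
    out = np.empty_like(x)
    for j, xj in enumerate(x):
        d = xj - xdata
        # Square root of the Gaussian weights exp(-0.5 (d/h)^2)
        sw = np.exp(-0.25 * (d / h) ** 2)
        # Design matrix with columns (x - X_i)^k / k!, k = 0..q
        design = d[:, None] ** np.arange(q + 1) * scale
        # Weighted least squares: multiply rows by sqrt(w_i), then solve
        coef, *_ = np.linalg.lstsq(design * sw[:, None], ydata * sw, rcond=None)
        out[j] = coef[0]  # a_0 is the estimate f_n(x)
    return out
```

With the \(1/k!\) scaling, \(a_k\) estimates \((-1)^k f^{(k)}(x)\), because the Taylor expansion of \(f\) around \(x\) is written in powers of \(X_i - x\) while the criterion uses powers of \(x - X_i\).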

Parameters:
  • xdata (ndarray) – Explanatory variables (at most 2D array)
  • ydata (ndarray) – Explained variable (should be a 1D array)
  • q (int) – Order of the polynomial to fit. Default: 3
  • cov (float or callable) – If a float, it should be the variance of the Gaussian kernel. Otherwise, it should be a function cov(xdata, ydata) returning the variance. Default: scotts_covariance
__call__(*args, **kwords)

This method is an alias for LocalPolynomialKernel1D.evaluate().

bandwidth

Bandwidth of the kernel.

covariance

Covariance of the Gaussian kernel. It can be set either as a fixed value or using a bandwidth calculator, that is, a function of signature w(xdata, ydata) that returns a single value.

Note

An ndarray with a single value will be converted to a floating-point value.

evaluate(points, out=None)

Evaluate the local-polynomial regression on a set of points.

Parameters:
  • points (ndarray) – Points at which to evaluate the regression
  • out (ndarray) – If provided, the result will be put in this array
class pyqt_fit.kernel_smoothing.LocalPolynomialKernel(xdata, ydata, q=3, cov=scotts_covariance, kernel=None)

Perform a local-polynomial regression in N-D using a user-provided kernel (Gaussian by default).

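The local-polynomial estimate \(f_n(x)\) is the constant term \(a_0\) of the weighted polynomial fit that minimises, for each position \(x\), the following criterion over \(a_0\) and the coefficients of \(\mathcal{P}_q\):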

\[f_n(x) \triangleq \argmin_{a_0\in\mathbb{R}} \sum_i K\left(\frac{x-X_i}{h}\right) \left(Y_i - a_0 - \mathcal{P}_q(X_i-x)\right)^2\]

Where \(K(x)\) is the kernel, which must have zero mean (i.e. \(\int x K(x)\,dx = 0\)), \(q\) is the order of the fitted polynomial, \(\mathcal{P}_q(x)\) is a polynomial of order \(q\) in \(x\) with no constant term, and \(h\) is the bandwidth of the method.

The polynomial \(\mathcal{P}_q(x)\) is of the form:

\[\mathcal{P}_q(x_1,\ldots,x_d) = \sum_{k=1}^q \sum_{\n\in\mathcal{F}_d(k)} a_{k,\n} \prod_{i=1}^d x_i^{n_i}\]

where \(\mathcal{F}_d(k)\) is the set of exponent vectors of total degree \(k\):

\[\mathcal{F}_d(k) = \left\{ \n \in \mathbb{N}^d \middle| \sum_{i=1}^d n_i = k \right\}\]

For example we have:

\[\mathcal{P}_2(x,y) = a_{110} x + a_{101} y + a_{220} x^2 + a_{211} xy + a_{202} y^2\]
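The sets \(\mathcal{F}_d(k)\) can be enumerated recursively. The hypothetical helper below lists the coefficients of \(\mathcal{P}_q\) in the order of the example above, with the power of \(x_1\) decreasing within each degree. That order is chosen here for readability, and pyqt_fit's internal order may differ.

```python
def compositions(k, d):
    """Yield every n in N^d with n_1 + ... + n_d == k, i.e. the set F_d(k)."""
    if d == 1:
        yield (k,)
        return
    for first in range(k, -1, -1):  # highest power of x_1 first
        for rest in compositions(k - first, d - 1):
            yield (first,) + rest

def polynomial_terms(q, d):
    """List the (k, n) pairs indexing the coefficients a_{k,n} of P_q, k = 1..q."""
    return [(k, n) for k in range(1, q + 1) for n in compositions(k, d)]

# Prints a_110, a_101, a_220, a_211, a_202 (one per line), as in P_2(x, y)
for k, n in polynomial_terms(2, 2):
    print("a_" + str(k) + "".join(map(str, n)))
```

The number of terms of \(\mathcal{P}_q\) is \(\binom{d+q}{q} - 1\), for example 19 for \(d = q = 3\).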
Parameters:
  • xdata (ndarray) – Explanatory variables (at most 2D array). The shape should be (N,D), with D the dimension of the problem and N the number of points. A 1D array of shape (N,) is converted to an (N,1) array.
  • ydata (ndarray) – Explained variable (should be a 1D array). The shape must be (N,).
  • q (int) – Order of the polynomial to fit. Default: 3
  • kernel (callable) – Kernel to use for the weights. It is called as kernel(points) and should return an array of values of the same size as points. If None, the kernel will be normal_kernel(D).
  • cov (float or callable) – If a float, it should be the variance of the Gaussian kernel. Otherwise, it should be a function cov(xdata, ydata) returning the variance. Default: scotts_covariance
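The kernel parameter only needs to be a callable. The sketch below shows one plausible contract, assuming points are passed as a (D,N) array and one weight is returned per point. The function normal_kernel_sketch is a hypothetical stand-in for normal_kernel(D). The function kernel_weights assumes the covariance acts through a bandwidth matrix \(H\) with \(HH^T\) equal to the covariance, so that \(K(H^{-1}(x-X_i))\) generalises \(K((x-X_i)/h)\). Neither function is pyqt_fit's code.

```python
import numpy as np

def normal_kernel_sketch(D):
    """Hypothetical stand-in for normal_kernel(D): standard D-variate normal pdf.

    The returned callable takes points of shape (D, N) and returns N values.
    """
    norm = (2 * np.pi) ** (-D / 2)

    def kernel(points):
        z = np.asarray(points, dtype=float).reshape(D, -1)
        return norm * np.exp(-0.5 * np.sum(z * z, axis=0))

    return kernel

def kernel_weights(x, xdata, cov, kernel):
    """Weights K(H^{-1}(x - X_i)) for one point x of shape (D,) and xdata of shape (N, D).

    Assumption: H is the Cholesky factor of cov, so H H^T == cov.
    """
    xdata = np.asarray(xdata, dtype=float)
    H = np.linalg.cholesky(np.atleast_2d(cov))
    diff = (np.asarray(x, dtype=float)[None, :] - xdata).T  # shape (D, N)
    z = np.linalg.solve(H, diff)                           # H^{-1}(x - X_i)
    return kernel(z)
```

With this choice, \(\|H^{-1}(x - X_i)\|^2\) equals the Mahalanobis distance \((x - X_i)^T \Sigma^{-1} (x - X_i)\) for the covariance \(\Sigma\).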
__call__(*args, **kwords)

This method is an alias for LocalPolynomialKernel.evaluate().

bandwidth

Bandwidth of the kernel.

covariance

Covariance of the Gaussian kernel. It can be set either as a fixed value or using a bandwidth calculator, that is, a function of signature w(xdata, ydata) that returns a DxD matrix.

Note

An ndarray with a single value will be converted to a floating-point value.

evaluate(points, out=None)

Evaluate the local-polynomial regression on a set of points.

Parameters:
  • points (ndarray) – Points at which to evaluate the regression
  • out (ndarray) – Pre-allocated array for the result

Utility functions

class pyqt_fit.kernel_smoothing.PolynomialDesignMatrix(dim, deg)

Class used to create a design matrix for polynomial regression.

__call__(x, out=None)

Creates the design matrix for polynomial fitting using the points x.

Parameters:
  • x (ndarray) – Points used to create the design matrix. The shape must be (D,N) or (N,), where D is the dimension of the problem (1 if the first axis is absent).
  • deg (int) – Degree of the fitting polynomial
  • factors (ndarray) – Scaling factor for the columns of the design matrix. The shape should be (M,) or (M,1), where M is the number of columns of out. This value can be obtained using the designMatrixSize() function.
Returns:

The design matrix as a (M,N) matrix.
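One possible construction of such a design matrix follows. It assumes the first row is the constant term and the remaining rows are the monomials of degree 1 to deg. The function name polynomial_design_matrix is hypothetical. The sketch ignores the factors scaling and the out argument, and its row order is not claimed to match PolynomialDesignMatrix.

```python
import numpy as np
from itertools import combinations_with_replacement

def polynomial_design_matrix(x, deg):
    """Hypothetical (M, N) polynomial design matrix for points x of shape (D, N) or (N,).

    Row 0 is the constant term; the other rows are all monomials of total
    degree 1..deg in the D coordinates.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[None, :]  # (N,) is treated as (1, N), i.e. D = 1
    D, N = x.shape
    rows = [np.ones(N)]
    for k in range(1, deg + 1):
        # Each multiset of k coordinate indices is one monomial of degree k
        for idx in combinations_with_replacement(range(D), k):
            rows.append(np.prod(x[list(idx), :], axis=0))
    return np.array(rows)
```

For D coordinates and degree deg, this gives \(M = \binom{D+deg}{deg}\) rows, for example 6 for D = 2 and deg = 2.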
