Coverage for /home/martinb/.local/share/virtualenvs/camcops/lib/python3.6/site-packages/statsmodels/nonparametric/smoothers_lowess.py : 9%

Hot-keys on this page
r m x p toggle line displays
j k next/prev highlighted chunk
0 (zero) top of page
1 (one) first highlighted chunk
1# -*- coding: utf-8 -*-
2"""Lowess - wrapper for cythonized extension
4Author : Chris Jordan-Squire
5Author : Carl Vogel
6Author : Josef Perktold
8"""
10import numpy as np
11from ._smoothers_lowess import lowess as _lowess
13def lowess(endog, exog, frac=2.0/3.0, it=3, delta=0.0, is_sorted=False,
14 missing='drop', return_sorted=True):
15 '''LOWESS (Locally Weighted Scatterplot Smoothing)
17 A lowess function that outs smoothed estimates of endog
18 at the given exog values from points (exog, endog)
20 Parameters
21 ----------
22 endog : 1-D numpy array
23 The y-values of the observed points
24 exog : 1-D numpy array
25 The x-values of the observed points
26 frac : float
27 Between 0 and 1. The fraction of the data used
28 when estimating each y-value.
29 it : int
30 The number of residual-based reweightings
31 to perform.
32 delta : float
33 Distance within which to use linear-interpolation
34 instead of weighted regression.
35 is_sorted : bool
36 If False (default), then the data will be sorted by exog before
37 calculating lowess. If True, then it is assumed that the data is
38 already sorted by exog.
39 missing : str
40 Available options are 'none', 'drop', and 'raise'. If 'none', no nan
41 checking is done. If 'drop', any observations with nans are dropped.
42 If 'raise', an error is raised. Default is 'drop'.
43 return_sorted : bool
44 If True (default), then the returned array is sorted by exog and has
45 missing (nan or infinite) observations removed.
46 If False, then the returned array is in the same length and the same
47 sequence of observations as the input array.
49 Returns
50 -------
51 out: ndarray, float
52 The returned array is two-dimensional if return_sorted is True, and
53 one dimensional if return_sorted is False.
54 If return_sorted is True, then a numpy array with two columns. The
55 first column contains the sorted x (exog) values and the second column
56 the associated estimated y (endog) values.
57 If return_sorted is False, then only the fitted values are returned,
58 and the observations will be in the same order as the input arrays.
60 Notes
61 -----
62 This lowess function implements the algorithm given in the
63 reference below using local linear estimates.
65 Suppose the input data has N points. The algorithm works by
66 estimating the `smooth` y_i by taking the frac*N closest points
67 to (x_i,y_i) based on their x values and estimating y_i
68 using a weighted linear regression. The weight for (x_j,y_j)
69 is tricube function applied to abs(x_i-x_j).
71 If it > 1, then further weighted local linear regressions
72 are performed, where the weights are the same as above
73 times the _lowess_bisquare function of the residuals. Each iteration
74 takes approximately the same amount of time as the original fit,
75 so these iterations are expensive. They are most useful when
76 the noise has extremely heavy tails, such as Cauchy noise.
77 Noise with less heavy-tails, such as t-distributions with df>2,
78 are less problematic. The weights downgrade the influence of
79 points with large residuals. In the extreme case, points whose
80 residuals are larger than 6 times the median absolute residual
81 are given weight 0.
83 `delta` can be used to save computations. For each `x_i`, regressions
84 are skipped for points closer than `delta`. The next regression is
85 fit for the farthest point within delta of `x_i` and all points in
86 between are estimated by linearly interpolating between the two
87 regression fits.
89 Judicious choice of delta can cut computation time considerably
90 for large data (N > 5000). A good choice is ``delta = 0.01 * range(exog)``.
92 Some experimentation is likely required to find a good
93 choice of `frac` and `iter` for a particular dataset.
95 References
96 ----------
97 Cleveland, W.S. (1979) "Robust Locally Weighted Regression
98 and Smoothing Scatterplots". Journal of the American Statistical
99 Association 74 (368): 829-836.
101 Examples
102 --------
103 The below allows a comparison between how different the fits from
104 lowess for different values of frac can be.
106 >>> import numpy as np
107 >>> import statsmodels.api as sm
108 >>> lowess = sm.nonparametric.lowess
109 >>> x = np.random.uniform(low = -2*np.pi, high = 2*np.pi, size=500)
110 >>> y = np.sin(x) + np.random.normal(size=len(x))
111 >>> z = lowess(y, x)
112 >>> w = lowess(y, x, frac=1./3)
114 This gives a similar comparison for when it is 0 vs not.
116 >>> import numpy as np
117 >>> import scipy.stats as stats
118 >>> import statsmodels.api as sm
119 >>> lowess = sm.nonparametric.lowess
120 >>> x = np.random.uniform(low = -2*np.pi, high = 2*np.pi, size=500)
121 >>> y = np.sin(x) + stats.cauchy.rvs(size=len(x))
122 >>> z = lowess(y, x, frac= 1./3, it=0)
123 >>> w = lowess(y, x, frac=1./3)
125 '''
127 endog = np.asarray(endog, float)
128 exog = np.asarray(exog, float)
130 # Inputs should be vectors (1-D arrays) of the
131 # same length.
132 if exog.ndim != 1:
133 raise ValueError('exog must be a vector')
134 if endog.ndim != 1:
135 raise ValueError('endog must be a vector')
136 if endog.shape[0] != exog.shape[0] :
137 raise ValueError('exog and endog must have same length')
139 if missing in ['drop', 'raise']:
140 # Cut out missing values
141 mask_valid = (np.isfinite(exog) & np.isfinite(endog))
142 all_valid = np.all(mask_valid)
143 if all_valid:
144 y = endog
145 x = exog
146 else:
147 if missing == 'drop':
148 x = exog[mask_valid]
149 y = endog[mask_valid]
150 else:
151 raise ValueError('nan or inf found in data')
152 elif missing == 'none':
153 y = endog
154 x = exog
155 all_valid = True # we assume it's true if missing='none'
156 else:
157 raise ValueError("missing can only be 'none', 'drop' or 'raise'")
159 if not is_sorted:
160 # Sort both inputs according to the ascending order of x values
161 sort_index = np.argsort(x)
162 x = np.array(x[sort_index])
163 y = np.array(y[sort_index])
165 res = _lowess(y, x, frac=frac, it=it, delta=delta)
166 _, yfitted = res.T
168 if return_sorted:
169 return res
170 else:
171 # rebuild yfitted with original indices
172 # a bit messy: y might have been selected twice
173 if not is_sorted:
174 yfitted_ = np.empty_like(y)
175 yfitted_.fill(np.nan)
176 yfitted_[sort_index] = yfitted
177 yfitted = yfitted_
178 else:
179 yfitted = yfitted
181 if not all_valid:
182 yfitted_ = np.empty_like(endog)
183 yfitted_.fill(np.nan)
184 yfitted_[mask_valid] = yfitted
185 yfitted = yfitted_
187 # we do not need to return exog anymore
188 return yfitted