acgc.stats.bivariate_lines

Specialized methods of bivariate line fitting

  • Standard major axis (SMA) also called reduced major axis (RMA)
  • York regression, for data with errors in x and y
  • Theil-Sen, non-parametric slope estimation (use numba to accelerate the function in this module)
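
As a rough illustration of the first method in the list above, the sketch below computes an unweighted SMA line using only the Python standard library. The helper name `sma_slope_intercept` and the sample data are hypothetical and are not part of this module; the module's own `sma` function additionally handles weights, robust estimators, and confidence intervals. The sketch assumes the covariance of x and y is non-zero.

```python
import math
import statistics

def sma_slope_intercept(x, y):
    """Return (slope, intercept) of an unweighted standard major axis line.

    Hypothetical helper for illustration; assumes non-zero covariance.
    """
    mx = statistics.mean(x)
    my = statistics.mean(y)
    sx = statistics.stdev(x)
    sy = statistics.stdev(y)
    # The sample covariance only decides the sign of the slope
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    slope = math.copysign(sy / sx, cov)
    intercept = my - slope * mx
    return slope, intercept

# Made-up sample data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope_yx, intercept_yx = sma_slope_intercept(x, y)   # fit of y against x
slope_xy, intercept_xy = sma_slope_intercept(y, x)   # fit of x against y

# Both fits describe the same line, so the slopes are reciprocals
print(slope_yx, 1 / slope_xy)
```

Swapping the roles of x and y leaves the fitted line unchanged, which is the symmetry property that distinguishes SMA from ordinary least squares.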
# -*- coding: utf-8 -*-
"""Specialized methods of bivariate line fitting

* Standard major axis (SMA) also called reduced major axis (RMA)
* York regression, for data with errors in x and y
* Theil-Sen, non-parametric slope estimation (use numba to accelerate the function in this module)
"""
# from collections import namedtuple
import warnings
import numpy as np
import scipy.stats as stats
from sklearn.covariance import MinCovDet
import statsmodels.formula.api as smf
import statsmodels.robust.norms as norms
from scipy.stats import theilslopes
#from numba import jit

__all__ = [
    "bivariate_line_equation",
    "sma",
    "smafit",
    "sen",
    "sen_slope",
    "sen_numba",
    "york"
]
# Aliases
def sen_slope(*args,**kwargs):
    '''Alias for `sen`'''
    return sen(*args,**kwargs)
def smafit(*args,**kwargs):
    '''Alias for `sma`'''
    return sma(*args,**kwargs)

def bivariate_line_equation(fitresult,
                    floatformat='{:.3f}',
                    ystring='include',
                    include_error=False ):
    '''Write equation for the fitted line as a string
    
    Parameters
    ----------
    fitresult : dict
        results of the line fit
    floatformat : str
        format string for the numerical values (default='{:.3f}')
    ystring : {'include' (default), 'separate', 'none'}
        specifies whether "y =" should be included in result, a separate item in tuple, or none
    include_error : bool
        specifies whether uncertainty terms should be included in the equation
    
    Returns
    -------
    fitline_string : str
        equation for the fitted line, in the form "y = a x + b" or "y = a x".
        If uncertainty terms are included, then "y = (a ± c) x + (b ± d)" or "y = (a ± c) x".
        A tuple (lhs, rhs) is returned instead when ystring='separate'.
    '''

    # Left-hand side
    lhs = "y_"+fitresult['method']

    # Right-hand side
    if fitresult['fitintercept']:
        if include_error:
            rhs = f'({floatformat:s} ± {floatformat:s}) x + ({floatformat:s} ± {floatformat:s})'.\
                    format( fitresult['slope'], fitresult['slope_ste'], fitresult['intercept'], fitresult['intercept_ste'] )
        else:
            rhs = f'{floatformat:s} x + {floatformat:s}'.\
                    format( fitresult['slope'], fitresult['intercept'] )
    else:
        if include_error:
            rhs = f'({floatformat:s} ± {floatformat:s}) x'.\
                    format( fitresult['slope'], fitresult['slope_ste'] )
        else:
            rhs = f'{floatformat:s} x'.\
                    format( fitresult['slope'] )

    # Combine right and left-hand sides
    if ystring=='include':
        equation = f'{lhs:s} = {rhs:s}'
    elif ystring=='separate':
        equation = (lhs,rhs)
    elif ystring=='none':
        equation = rhs
    else:
        raise ValueError('Unrecognized value of ystring: '+ystring)

    return equation

def sma(X,Y,W=None,
           data=None,
           alpha=0.95,
           intercept=True,
           robust=False,robust_method='FastMCD'):
    '''Standard Major-Axis (SMA) line fitting
    
    Calculate standard major axis, aka reduced major axis, fit to 
    data X and Y. The main advantage of this over ordinary least squares is 
    that the best fit of Y to X will be the same as the best fit of X to Y.
    
    The fit equations and confidence intervals are implemented following 
    Warton et al. (2006). Robust fits use the FastMCD covariance estimate 
    from Rousseeuw and Van Driessen (1999). While there are many alternative 
    robust covariance estimators (e.g. other papers by D.I. Warton using M-estimators), 
    the FastMCD algorithm is default in Matlab. When the standard error or 
    uncertainty of each point is known, then weighted SMA may be preferable to 
    robust SMA. The conventional choice of weights for each point i is 
    W_i = 1 / ( var(X_i) + var(Y_i) ), where var() is the variance 
    (squared standard error).
    
    References 
    Warton, D. I., Wright, I. J., Falster, D. S. and Westoby, M.: 
        Bivariate line-fitting methods for allometry, Biol. Rev., 81(02), 259, 
        doi:10.1017/S1464793106007007, 2006.
    Rousseeuw, P. J. and Van Driessen, K.: A Fast Algorithm for the Minimum 
        Covariance Determinant Estimator, Technometrics, 41(3), 1999.

    Parameters
    ----------
    X, Y : array_like or str
        Input values. Must have the same length.
    W    : array_like or str, optional
        array of weights for each X-Y point, typically W_i = 1/(var(X_i)+var(Y_i)) 
    data : dict_like, optional
        data structure containing variables. Used when X, Y, or W are str.
    alpha : float (default = 0.95)
        Desired confidence level for output. Must be strictly between 0 and 1.
    intercept : bool, default=True
        Specify if the fitted model should include a non-zero intercept.
        The model will be forced through the origin (0,0) if intercept=False.
    robust : bool, default=False
        Use statistical methods that are robust to the presence of outliers
    robust_method: {'FastMCD' (default), 'Huber', 'Biweight'}
        Method for calculating robust variance and covariance. Options:
        - 'MCD' or 'FastMCD' for Fast MCD
        - 'Huber' for Huber's T: reduces, but does not eliminate, influence of outliers
        - 'Biweight' for Tukey's Biweight: reduces then eliminates influence of outliers

        
    Returns
    -------
    fitresult : dict 
        Contains the following keys:
        - slope (float)
            Slope or Gradient of Y vs. X
        - intercept (float)
            Y intercept.
        - slope_ste (float)
            Standard error of slope estimate
        - intercept_ste (float)
            standard error of intercept estimate
        - slope_interval ([float, float])
            confidence interval for gradient at confidence level alpha
        - intercept_interval ([float, float])
            confidence interval for intercept at confidence level alpha
        - alpha (float)
            confidence level (0,1) for slope and intercept intervals
        - df_model (float)
            degrees of freedom for model
        - df_resid (float)
            degrees of freedom for residuals
        - params ([float,float])
            array of fitted parameters
        - nobs (float)
            number (or effective number) of observations used in the fit
        - fittedvalues (ndarray)
            array of fitted values
        - resid (ndarray)
            array of residual values
        - method (str)
            name of the fit method
        - fitintercept (bool)
            whether the fitted model includes an intercept
    '''

    def str2var( v, data ):
        '''Extract variable named v from the data structure named data'''
        try:
            return data[v]
        except Exception as exc:
            raise ValueError( 'Argument data must be provided with a key named '+v ) from exc

    # If variables are provided as strings, get values from the data structure
    if isinstance( X, str ):
        X = str2var( X, data )
    if isinstance( Y, str ):
        Y = str2var( Y, data )
    if isinstance( W, str ):
        W = str2var( W, data )

    # Make sure arrays have the same length
    assert ( len(X) == len(Y) ), 'Arrays X and Y must have the same length'
    if W is None:
        W = np.zeros_like(X) + 1
    else:
        assert ( len(W) == len(X) ), 'Array W must have the same length as X and Y'

    # Make sure alpha is within the range 0-1
    assert (alpha < 1), 'alpha must be less than 1'
    assert (alpha > 0), 'alpha must be greater than 0'

    # Drop any NaN elements of X, Y, or W
    # Infinite values are allowed but will make the result undefined
    idx = ~np.isnan(X) & ~np.isnan(Y) & ~np.isnan(W)

    X0 = X[idx]
    Y0 = Y[idx]
    W0 = W[idx]

    # Number of observations
    N = len(X0)

    include_intercept = intercept

    # Degrees of freedom for the model
    if include_intercept:
        dfmod = 2
    else:
        dfmod = 1

    method = 'SMA'

    # Choose whether to use methods robust to outliers
    if robust:

        method = 'rSMA'

        # Choose the robust method
        if robust_method.lower() in ('mcd', 'fastmcd'):
            # FAST MCD

            if not include_intercept:
                # intercept=False could possibly be supported by calculating
                # using mcd.support_ as weights in an explicit variance/covariance calculation
                raise NotImplementedError('FastMCD method only supports SMA with intercept')

            # Fit robust model of mean and covariance
            mcd = MinCovDet().fit( np.array([X0,Y0]).T )

            # Robust mean
            Xmean = mcd.location_[0]
            Ymean = mcd.location_[1]

            # Robust variance of X, Y
            Vx    = mcd.covariance_[0,0]
            Vy    = mcd.covariance_[1,1]

            # Robust covariance
            Vxy   = mcd.covariance_[0,1]

            # Number of observations used in mean and covariance estimate
            # excludes observations marked as outliers
            N = mcd.support_.sum()

        elif robust_method.lower() in ('biweight', 'huber'):

            # Tukey's Biweight and Huber's T
            if robust_method.lower()=='biweight':
                norm = norms.TukeyBiweight()
            else:
                norm = norms.HuberT()

            # Get weights for downweighting outliers
            # Fitting a linear model is the easiest way to get these
            # Options include "TukeyBiweight" (totally removes large deviates)
            # "HuberT" (linear, not squared weighting of large deviates)
            rweights = smf.rlm('y~x+1',{'x':X0,'y':Y0},M=norm).fit().weights

            # Sum of weights and weights squared, for convenience
            rsum  = np.sum( rweights )
            rsum2 = np.sum( rweights**2 )

            # Mean
            Xmean = np.sum( X0 * rweights ) / rsum
            Ymean = np.sum( Y0 * rweights ) / rsum

            # Force intercept through zero, if requested
            if not include_intercept:
                Xmean = 0
                Ymean = 0

            # Variance & Covariance
            Vx    = np.sum( (X0-Xmean)**2 * rweights**2 ) / rsum2
            Vy    = np.sum( (Y0-Ymean)**2 * rweights**2 ) / rsum2
            Vxy   = np.sum( (X0-Xmean) * (Y0-Ymean) * rweights**2 ) / rsum2

            # Effective number of observations
            N = rsum

        else:

            # '{:%s}' is not a valid format specifier for str; use '{:s}'
            raise NotImplementedError("sma hasn't implemented robust_method={:s}".\
                                      format(robust_method))
    else:

        if include_intercept:

            # Sum of weights, using only the points that passed the NaN filter
            wsum = np.sum(W0)

            # Average values
            Xmean = np.sum(X0 * W0) / wsum
            Ymean = np.sum(Y0 * W0) / wsum

            # Covariance matrix
            cov = np.cov( X0, Y0, ddof=1, aweights=W0**2 )

            # Variance
            Vx = cov[0,0]
            Vy = cov[1,1]

            # Covariance
            Vxy = cov[0,1]

        else:

            # Force the line to pass through origin by setting means to zero
            Xmean = 0
            Ymean = 0

            wsum = np.sum(W0)

            # Sum of squares in place of variance and covariance
            Vx = np.sum( X0**2 * W0 ) / wsum
            Vy = np.sum( Y0**2 * W0 ) / wsum
            Vxy= np.sum( X0*Y0 * W0 ) / wsum

    # Standard deviation
    Sx = np.sqrt( Vx )
    Sy = np.sqrt( Vy )

    # Correlation coefficient (equivalent to np.corrcoef()[1,0] for non-robust cases)
    R = Vxy / np.sqrt( Vx * Vy )

    #############
    # SLOPE

    Slope  = np.sign(R) * Sy / Sx

    # Standard error of slope estimate
    ste_slope = np.sqrt( 1/(N-dfmod) * Sy**2 / Sx**2 * (1-R**2) )

    # Confidence interval for Slope
    B = (1-R**2)/(N-dfmod) * stats.f.isf(1-alpha, 1, N-dfmod)
    ci_grad = Slope * ( np.sqrt( B+1 ) + np.sqrt(B)*np.array([-1,+1]) )

    #############
    # INTERCEPT

    if include_intercept:
        Intercept = Ymean - Slope * Xmean

        # Standard deviation of residuals
        # New Method: Formula from smatr R package (Warton)
        # This formula avoids large residuals of outliers when using robust=True
        Sr = np.sqrt((Vy - 2 * Slope * Vxy + Slope**2 *  Vx ) * (N-1) / (N-dfmod) )

        # OLD METHOD
        # Standard deviation of residuals
        #resid = Y0 - (Intercept + Slope * X0 )
        # Population standard deviation of the residuals
        #Sr = np.std( resid, ddof=0 )

        # Standard error of the intercept estimate
        ste_int = np.sqrt( Sr**2/N + Xmean**2 * ste_slope**2  )

        # Confidence interval for Intercept
        tcrit = stats.t.isf((1-alpha)/2,N-dfmod)
        ci_int = Intercept + ste_int * np.array([-tcrit,tcrit])

    else:

        # Set Intercept quantities to zero
        Intercept = 0
        ste_int   = 0
        ci_int    = np.array([0,0])

    result = dict( method           = method,
                   fitintercept     = include_intercept,
                   slope            = Slope,
                   intercept        = Intercept,
                   slope_ste        = ste_slope,
                   intercept_ste    = ste_int,
                   slope_interval   = ci_grad,
                   intercept_interval = ci_int,
                   alpha            = alpha,
                   df_model         = dfmod,
                   df_resid         = N-dfmod,
                   params           = np.array([Slope,Intercept]),
                   nobs             = N,
                   fittedvalues     = Intercept + Slope * X0,
                   resid            = Intercept + Slope * X0 - Y0 )

    # return Slope, Intercept, ste_slope, ste_int, ci_grad, ci_int
    return result

def york( x, y, err_x=1, err_y=1, rerr_xy=0 ):
    '''York regression accounting for error in x and y
    Follows the notation and algorithm of York et al. (2004) Section III
    
    Parameters
    ----------
    x, y : ndarray
        independent (x) and dependent (y) variables for fitting
    err_x, err_y : float or ndarray (default=1)
        standard deviation of errors/uncertainty in x and y
    rerr_xy : float or ndarray (default=0)
        correlation coefficient for errors in x and y, 
        default to rerr_xy=0 meaning that the errors in x are unrelated to errors in y
        err_x, err_y, and rerr_xy can be constants or arrays of the same length as x and y
    
    Returns
    -------
    fitresult : dict 
        Contains the following keys:
        - slope (float)
            Slope or Gradient of Y vs. X
        - intercept (float)
            Y intercept.
        - slope_ste (float)
            Standard error of slope estimate
        - intercept_ste (float)
            standard error of intercept estimate
        - slope_interval ([None, None])
            confidence interval for gradient; not calculated by this function
        - intercept_interval ([None, None])
            confidence interval for intercept; not calculated by this function
        - alpha (None)
            confidence level; None because no intervals are calculated
        - df_model (float)
            degrees of freedom for model
        - df_resid (float)
            degrees of freedom for residuals
        - params ([float,float])
            array of fitted parameters
        - fittedvalues (ndarray)
            array of fitted values
        - resid (ndarray)
            array of residual values
    '''

    # relative error tolerance required for convergence
    rtol = 1e-15

    # Initial guess for slope, from ordinary least squares
    result = stats.linregress( x, y )
    b = result[0]

    # Weights for x and y
    wx = 1 / err_x**2
    wy = 1 / err_y**2

    # Combined weights (alpha in the notation of York et al.; not a confidence level)
    alpha = np.sqrt( wx * wy )

    # Iterate until solution converges, but no more than 50 times
    maxiter = 50
    for _ in range(maxiter):

        # Weight for each point
        W = wx * wy / ( wx + b**2 * wy - 2 * b * rerr_xy * alpha )
        Wsum = np.sum( W )

        # Weighted means
        Xbar = np.sum( W * x ) / Wsum
        Ybar = np.sum( W * y ) / Wsum

        # Deviation from weighted means
        U = x - Xbar
        V = y - Ybar

        # parameter needed for slope
        beta = W * ( U / wy + b*V / wx - (b*U + V) * rerr_xy / alpha )

        # Update slope estimate
        bnew = np.sum( W * beta * V ) / np.sum( W * beta * U )

        # Break from loop if new value is very close to old value
        if np.abs( (bnew-b)/b ) < rtol:
            break
        b = bnew

    else:
        # The loop ended without meeting the convergence criterion.
        # (The original test `i==maxiter` could never be true.)
        raise ValueError( f'York regression failed to converge in {maxiter:d} iterations' )

    # Intercept
    a = Ybar - b * Xbar

    # least-squares adjusted points, expectation values of X and Y
    xa = Xbar + beta
    ya = Ybar + b*beta

    # Mean of adjusted points
    xabar = np.sum( W * xa ) / Wsum
    yabar = np.sum( W * ya ) / Wsum

    # Deviation of adjusted points from their means
    u = xa - xabar
    v = ya - yabar

    # Variance of slope and intercept estimates
    varb = 1 / np.sum( W * u**2 )
    vara = 1 / Wsum + xabar**2 * varb

    # Standard error of slope and intercept
    siga = np.sqrt( vara )
    sigb = np.sqrt( varb )

    dfmod = 2
    N = np.sum( ~np.isnan(x) * ~np.isnan(y) )

    # Confidence intervals are not calculated, so alpha is None.
    # (The local variable `alpha` above holds the combined weights.)
    result = dict( method        = 'York',
                fitintercept     = True,
                slope            = b,
                intercept        = a,
                slope_ste        = sigb,
                intercept_ste    = siga,
                slope_interval   = [None,None],
                intercept_interval = [None,None],
                alpha            = None,
                df_model         = dfmod,
                df_resid         = N-dfmod,
                params           = np.array([b,a]),
                nobs             = N,
                fittedvalues     = a + b * x,
                resid            = a + b * x - y )

    return result

def sen( x, y, alpha=0.95, method='separate' ):
    '''Theil-Sen slope estimate
    
    This function wraps `scipy.stats.theilslopes` and provides
    results in the same dict format as the other line fitting methods 
    in this module
    
    Parameters
    ----------
    x, y : ndarray
        independent (x) and dependent (y) variables for fitting
    alpha : float (default = 0.95)
        Desired confidence level [0,1] for output. 
    method : {'separate' (default), 'joint'}
        Method for estimating intercept. 
        - 'separate' uses np.median(y) - slope * np.median(x)
        - 'joint' uses np.median( y - slope * x )
            
    Returns
    -------
    fitresult : dict 
        Contains the following keys:
        - slope (float)
            Slope or Gradient of Y vs. X
        - intercept (float)
            Y intercept.
        - slope_ste (None)
            Standard error of slope estimate; not calculated by this function
        - intercept_ste (None)
            standard error of intercept estimate; not calculated by this function
        - slope_interval ([float, float])
            confidence interval for gradient at confidence level alpha
        - intercept_interval ([None, None])
            confidence interval for intercept; not calculated by this function
        - alpha (float)
            confidence level [0,1] for the slope interval
        - df_model (float)
            degrees of freedom for model
        - df_resid (float)
            degrees of freedom for residuals
        - params ([float,float])
            array of fitted parameters
        - fittedvalues (ndarray)
            array of fitted values
        - resid (ndarray)
            array of residual values
    '''

    slope, intercept, low_slope, high_slope = theilslopes(y, x, alpha=alpha, method=method)

    dfmod = 2
    N = np.sum( ~np.isnan(x) * ~np.isnan(y) )

    result = dict( method        = 'Theil-Sen',
                fitintercept     = True,
                slope            = slope,
                intercept        = intercept,
                slope_ste        = None,
                intercept_ste    = None,
                slope_interval   = [low_slope,high_slope],
                intercept_interval = [None,None],
                alpha            = alpha,
                df_model         = dfmod,
                df_resid         = N-dfmod,
                params           = np.array([slope,intercept]),
                nobs             = N,
                fittedvalues     = intercept + slope * x,
                resid            = intercept + slope * x - y )

    return result

#@jit(nopython=True)
def sen_numba( x, y ):
    '''Estimate linear trend using the Theil-Sen method
    
    This non-parametric method finds the median slope among all
    pairs of points. 
    scipy.stats.theilslopes provides the same slope estimate, with 
    confidence intervals. However, this function is faster for 
    large datasets when it is compiled with Numba.
    
    Parameters
    ----------
    x : array_like (N,)
        independent variable
    y : array_like (N,)
        dependent variable
    
    Returns
    -------
    sen : float
        the median slope
    slopes : array (N*(N-1)/2,)
        slope estimates from all pairs of points
    '''

    with warnings.catch_warnings():
        warnings.simplefilter('always', DeprecationWarning)
        warnings.warn('Sen function is slow unless numba.jit is used. Use scipy.stats.theilslopes instead.',
                    DeprecationWarning, stacklevel=2)
        
    if len( x ) != len( y ):
        print('Inputs x and y must have same dimension')
        return np.nan

    # Find number of points
    n = len( x )

    # Array to hold all slope estimates, one per pair of points
    slopes = np.zeros(  np.ceil( n * ( n-1 ) / 2 ).astype('int') )
    slopes[:] = np.nan

    count = 0

    for i in range(n):
        for j in range(i+1, n):

            # Slope between elements i and j
            slopeij = ( y[j] - y[i] ) / ( x[j] - x[i] )

            slopes[count] = slopeij

            count += 1

    # Theil-Sen estimate is the median slope, neglecting NaN
    sen = np.nanmedian( slopes )

    return sen, slopes
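
The pairwise-slope idea behind the Theil-Sen estimate can be sketched with the standard library alone. The function name `theil_sen_slope` and the data below are hypothetical. Unlike `sen_numba`, this sketch skips pairs with identical x values instead of relying on NaN handling; that is an assumption of the example, not the module's behavior.

```python
from itertools import combinations
from statistics import median

def theil_sen_slope(x, y):
    """Return the median pairwise slope and the list of all pairwise slopes.

    Hypothetical helper for illustration; pairs with equal x are skipped.
    """
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    return median(slopes), slopes

# Points on the line y = 2 x, except for one large outlier at the end
x = [0, 1, 2, 3, 4]
y = [0, 2, 4, 6, 100]

slope, slopes = theil_sen_slope(x, y)
print(slope, len(slopes))
```

Six of the ten pairwise slopes equal 2, so the median is unaffected by the outlier, whereas a least-squares slope would be pulled strongly upward.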
def bivariate_line_equation( fitresult, floatformat='{:.3f}', ystring='include', include_error=False):

Write equation for the fitted line as a string

Parameters
  • fitresult (dict): results of the line fit
  • floatformat (str): format string for the numerical values (default='{:.3f}')
  • ystring ({'include' (default), 'separate', 'none'}): specifies whether "y =" should be included in result, a separate item in tuple, or none
  • include_error (bool): specifies whether uncertainty terms should be included in the equation
Returns
  • fitline_string (str): equation for the fitted line, in the form "y = a x + b" or "y = a x". If uncertainty terms are included, then "y = (a ± c) x + (b ± d)" or "y = (a ± c) x".
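
A minimal sketch of how such an equation string can be assembled from a fit-result dict follows. The helper `format_line` and the values in `fitresult` are hypothetical; the sketch covers only the case without uncertainty terms and is not the module's implementation.

```python
def format_line(fitresult, floatformat='{:.3f}'):
    """Build "y_<method> = a x + b" (or "a x") from a fit-result dict.

    Hypothetical helper for illustration only.
    """
    if fitresult['fitintercept']:
        # Two numeric fields: slope and intercept
        template = floatformat + ' x + ' + floatformat
        rhs = template.format(fitresult['slope'], fitresult['intercept'])
    else:
        # One numeric field: slope only
        template = floatformat + ' x'
        rhs = template.format(fitresult['slope'])
    return 'y_' + fitresult['method'] + ' = ' + rhs

# Made-up fit result using the same keys as the dicts returned by this module
fitresult = {
    'method': 'SMA',
    'fitintercept': True,
    'slope': 1.98765,
    'intercept': 0.1234,
}

equation = format_line(fitresult)
print(equation)   # y_SMA = 1.988 x + 0.123
```

Passing a different `floatformat`, such as `'{:.1f}'`, changes only the number of digits shown for each coefficient.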
def sma( X, Y, W=None, data=None, alpha=0.95, intercept=True, robust=False, robust_method='FastMCD'):
251            N = mcd.support_.sum()
252
253        elif ((robust_method.lower() =='biweight') or (robust_method.lower() == 'huber') ):
254
255            # Tukey's Biweight and Huber's T
256            if robust_method.lower()=='biweight':
257                norm = norms.TukeyBiweight()
258            else:
259                norm = norms.HuberT()
260
261            # Get weights for downweighting outliers
262            # Fitting a linear model is the easiest way to get these
263            # Options include "TukeyBiweight" (totally removes large deviates)
264            # "HuberT" (linear, not squared weighting of large deviates)
265            rweights = smf.rlm('y~x+1',{'x':X0,'y':Y0},M=norm).fit().weights
266
267            # Sum of weights and weights squared, for convenience
268            rsum  = np.sum( rweights )
269            rsum2 = np.sum( rweights**2 )
270
271            # Mean
272            Xmean = np.sum( X0 * rweights ) / rsum
273            Ymean = np.sum( Y0 * rweights ) / rsum
274
275            # Force intercept through zero, if requested
276            if not include_intercept:
277                Xmean = 0
278                Ymean = 0
279
280            # Variance & Covariance
281            Vx    = np.sum( (X0-Xmean)**2 * rweights**2 ) / rsum2
282            Vy    = np.sum( (Y0-Ymean)**2 * rweights**2 ) / rsum2
283            Vxy   = np.sum( (X0-Xmean) * (Y0-Ymean) * rweights**2 ) / rsum2
284
285            # Effective number of observations
286            N = rsum
287
288        else:
289
290            raise NotImplementedError("sma hasn't implemented robust_method={:s}".\
291                                      format(robust_method))
292    else:
293
294        if include_intercept:
295
296            wsum = np.sum(W0)
297
298            # Average values
299            Xmean = np.sum(X0 * W0) / wsum
300            Ymean = np.sum(Y0 * W0) / wsum
301
302            # Covariance matrix
303            cov = np.cov( X0, Y0, ddof=1, aweights=W0**2 )
304
305            # Variance
306            Vx = cov[0,0]
307            Vy = cov[1,1]
308
309            # Covariance
310            Vxy = cov[0,1]
311
312        else:
313
314            # Force the line to pass through origin by setting means to zero
315            Xmean = 0
316            Ymean = 0
317
318            wsum = np.sum(W0)
319
320            # Sum of squares in place of variance and covariance
321            Vx = np.sum( X0**2 * W0 ) / wsum
322            Vy = np.sum( Y0**2 * W0 ) / wsum
323            Vxy= np.sum( X0*Y0 * W0 ) / wsum
324
325    # Standard deviation
326    Sx = np.sqrt( Vx )
327    Sy = np.sqrt( Vy )
328
329    # Correlation coefficient (equivalent to np.corrcoef()[1,0] for non-robust cases)
330    R = Vxy / np.sqrt( Vx * Vy )
331
332    #############
333    # SLOPE
334
335    Slope  = np.sign(R) * Sy / Sx
336
337    # Standard error of slope estimate
338    ste_slope = np.sqrt( 1/(N-dfmod) * Sy**2 / Sx**2 * (1-R**2) )
339
340    # Confidence interval for Slope
341    B = (1-R**2)/(N-dfmod) * stats.f.isf(1-alpha, 1, N-dfmod)
342    ci_grad = Slope * ( np.sqrt( B+1 ) + np.sqrt(B)*np.array([-1,+1]) )
343
344    #############
345    # INTERCEPT
346
347    if include_intercept:
348        Intercept = Ymean - Slope * Xmean
349
350        # Standard deviation of residuals
351        # New Method: Formula from smatr R package (Warton)
352        # This formula avoids large residuals of outliers when using robust=True
353        Sr = np.sqrt((Vy - 2 * Slope * Vxy + Slope**2 *  Vx ) * (N-1) / (N-dfmod) )
354
355        # OLD METHOD
356        # Standard deviation of residuals
357        #resid = Y0 - (Intercept + Slope * X0 )
358        # Population standard deviation of the residuals
359        #Sr = np.std( resid, ddof=0 )
360
361        # Standard error of the intercept estimate
362        ste_int = np.sqrt( Sr**2/N + Xmean**2 * ste_slope**2  )
363
364        # Confidence interval for Intercept
365        tcrit = stats.t.isf((1-alpha)/2,N-dfmod)
366        ci_int = Intercept + ste_int * np.array([-tcrit,tcrit])
367
368    else:
369
370        # Set Intercept quantities to zero
371        Intercept = 0
372        ste_int   = 0
373        ci_int    = np.array([0,0])
374
375    result = dict( method           = method,
376                   fitintercept     = include_intercept,
377                   slope            = Slope,
378                   intercept        = Intercept,
379                   slope_ste        = ste_slope,
380                   intercept_ste    = ste_int,
381                   slope_interval   = ci_grad,
382                   intercept_interval = ci_int,
383                   alpha            = alpha,
384                   df_model         = dfmod,
385                   df_resid         = N-dfmod,
386                   params           = np.array([Slope,Intercept]),
387                   nobs             = N,
388                   fittedvalues     = Intercept + Slope * X0,
389                   resid            = Intercept + Slope * X0 - Y0 )
390
391    # return Slope, Intercept, ste_slope, ste_int, ci_grad, ci_int
392    return result

Standard Major-Axis (SMA) line fitting

Calculate standard major axis, aka reduced major axis, fit to data X and Y. The main advantage of this over ordinary least squares is that the best fit of Y to X will be the same as the best fit of X to Y.

The fit equations and confidence intervals are implemented following Warton et al. (2006). Robust fits use the FastMCD covariance estimate from Rousseeuw and Van Driessen (1999). While there are many alternative robust covariance estimators (e.g. other papers by D.I. Warton using M-estimators), the FastMCD algorithm is default in Matlab. When the standard error or uncertainty of each point is known, then weighted SMA may be preferable to robust SMA. The conventional choice of weights for each point i is W_i = 1 / ( var(X_i) + var(Y_i) ), where var() is the variance (squared standard error).
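As a small illustration of the weighting convention described above, the weights can be computed from per-point standard errors as follows. This is only a sketch: the names `se_x` and `se_y` and the values in them are hypothetical and do not come from this module.

```python
import numpy as np

# Hypothetical standard errors for each point (illustrative values only)
se_x = np.array([0.1, 0.2, 0.1, 0.4])
se_y = np.array([0.3, 0.1, 0.2, 0.3])

# W_i = 1 / ( var(X_i) + var(Y_i) ), where var is the squared standard error
W = 1.0 / (se_x**2 + se_y**2)
```

An array built this way is the kind of input the `W` argument of `sma` is documented to accept; points with larger combined uncertainty receive smaller weights.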

References
  • Warton, D. I., Wright, I. J., Falster, D. S. and Westoby, M.: Bivariate line-fitting methods for allometry, Biol. Rev., 81(02), 259, doi:10.1017/S1464793106007007, 2006.
  • Rousseeuw, P. J. and Van Driessen, K.: A Fast Algorithm for the Minimum Covariance Determinant Estimator, Technometrics, 41(3), 1999.

Parameters
  • X, Y (array_like or str): Input values. Must have the same length.
  • W (array_like or str, optional): array of weights for each X-Y point, typically W_i = 1/(var(X_i)+var(Y_i))
  • data (dict_like, optional): data structure containing variables. Used when X, Y, or W are str.
  • alpha (float (default = 0.95)): Desired confidence level [0,1] for output.
  • intercept (bool, default=True): Specify if the fitted model should include a non-zero intercept. The model will be forced through the origin (0,0) if intercept=False.
  • robust (bool, default=False): Use statistical methods that are robust to the presence of outliers
  • robust_method ({'FastMCD' (default), 'Huber', 'Biweight'}): Method for calculating robust variance and covariance. Options:
    • 'MCD' or 'FastMCD' for Fast MCD
    • 'Huber' for Huber's T: reduce, not eliminate, influence of outliers
    • 'Biweight' for Tukey's Biweight: reduces then eliminates influence of outliers
Returns
  • fitresult (dict): Contains the following keys:
    • slope (float) Slope or Gradient of Y vs. X
    • intercept (float) Y intercept.
    • slope_ste (float) Standard error of slope estimate
    • intercept_ste (float) standard error of intercept estimate
    • slope_interval ([float, float]) confidence interval for gradient at confidence level alpha
    • intercept_interval ([float, float]) confidence interval for intercept at confidence level alpha
    • alpha (float) confidence level [0,1] for slope and intercept intervals
    • df_model (float) degrees of freedom for model
    • df_resid (float) degrees of freedom for residuals
    • params ([float,float]) array of fitted parameters
    • fittedvalues (ndarray) array of fitted values
    • resid (ndarray) array of residual values
    • method (str) name of the fit method
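The symmetry property described above (the fit of Y to X matches the fit of X to Y) can be checked with a minimal sketch of the unweighted, non-robust case. The helper `sma_sketch` is hypothetical and written only for illustration; it is not the `sma` function from this module and omits weights, robust options, and confidence intervals.

```python
import numpy as np

def sma_sketch(x, y):
    """Unweighted, non-robust SMA slope and intercept (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sign of the slope comes from the correlation coefficient
    r = np.corrcoef(x, y)[0, 1]
    # Magnitude of the slope is the ratio of standard deviations
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    # The line passes through the point of means
    intercept = np.mean(y) - slope * np.mean(x)
    return slope, intercept

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

slope_yx, intercept_yx = sma_sketch(x, y)   # fit of Y to X
slope_xy, intercept_xy = sma_sketch(y, x)   # fit of X to Y
```

Inverting the Y-on-X line gives the X-on-Y line: `slope_xy` is the reciprocal of `slope_yx`, and `intercept_xy` equals `-intercept_yx / slope_yx`. Ordinary least squares does not have this property.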
def smafit(*args, **kwargs):
32def smafit(*args,**kwargs):
33    '''Alias for `sma`'''
34    return sma(*args,**kwargs)

Alias for sma

def sen(x, y, alpha=0.95, method='separate'):
534def sen( x, y, alpha=0.95, method='separate' ):
535    '''Theil-Sen slope estimate
536    
537    This function wraps `scipy.stats.theilslopes` and provides
538    results in the same dict format as the other line fitting methods 
539    in this module
540    
541    Parameters
542    ----------
543    x, y : ndarray
544        independent (x) and dependent (y) variables for fitting
545    alpha : float (default = 0.95)
546        Desired confidence level [0,1] for output. 
547    method : {'separate' (default), 'joint'}
548        Method for estimating intercept. 
549        - 'separate' uses np.median(y) - slope * np.median(x)
550        - 'joint' uses np.median( y - slope * x )
551            
552    Returns
553    -------
554    fitresult : dict 
555        Contains the following keys:
556        - slope (float)
557            Slope or Gradient of Y vs. X
558        - intercept (float)
559            Y intercept.
560        - slope_ste (float)
561            Standard error of slope estimate
562        - intercept_ste (float)
563            standard error of intercept estimate
564        - slope_interval ([float, float])
565            confidence interval for gradient at confidence level alpha
566        - intercept_interval ([float, float])
567            confidence interval for intercept at confidence level alpha
568        - alpha (float)
569            confidence level [0,1] for slope and intercept intervals
570        - df_model (float)
571            degrees of freedom for model
572        - df_resid (float)
573            degrees of freedom for residuals
574        - params ([float,float])
575            array of fitted parameters
576        - fittedvalues (ndarray)
577            array of fitted values
578        - resid (ndarray)
579            array of residual values
580    '''
581
582    slope, intercept, low_slope, high_slope = theilslopes(y,x,alpha,method)
583
584    dfmod = 2
585    N = np.sum( ~np.isnan(x) * ~np.isnan(y) )
586
587    result = dict( method        = 'Theil-Sen',
588                fitintercept     = True,
589                slope            = slope,
590                intercept        = intercept,
591                slope_ste        = None,
592                intercept_ste    = None,
593                slope_interval   = [low_slope,high_slope],
594                intercept_interval = [None,None],
595                alpha            = alpha,
596                df_model         = dfmod,
597                df_resid         = N-dfmod,
598                params           = np.array([slope,intercept]),
599                nobs             = N,
600                fittedvalues     = intercept + slope * x,
601                resid            = intercept + slope * x - y )
602
603    return result

Theil-Sen slope estimate

This function wraps scipy.stats.theilslopes and provides results in the same dict format as the other line fitting methods in this module

Parameters
  • x, y (ndarray): independent (x) and dependent (y) variables for fitting
  • alpha (float (default = 0.95)): Desired confidence level [0,1] for output.
  • method ({'separate' (default), 'joint'}): Method for estimating intercept.
    • 'separate' uses np.median(y) - slope * np.median(x)
    • 'joint' uses np.median( y - slope * x )
Returns
  • fitresult (dict): Contains the following keys:
    • slope (float) Slope or Gradient of Y vs. X
    • intercept (float) Y intercept.
    • slope_ste (float) Standard error of slope estimate
    • intercept_ste (float) standard error of intercept estimate
    • slope_interval ([float, float]) confidence interval for gradient at confidence level alpha
    • intercept_interval ([float, float]) confidence interval for intercept at confidence level alpha
    • alpha (float) confidence level [0,1] for slope and intercept intervals
    • df_model (float) degrees of freedom for model
    • df_resid (float) degrees of freedom for residuals
    • params ([float,float]) array of fitted parameters
    • fittedvalues (ndarray) array of fitted values
    • resid (ndarray) array of residual values
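The two intercept options listed under `method` can be compared directly. The sketch below calls `scipy.stats.theilslopes` with positional arguments only and then evaluates both intercept formulas with NumPy; the data are synthetic and purely illustrative.

```python
import numpy as np
from scipy.stats import theilslopes

# Synthetic data with one outlier, for illustration only
rng = np.random.default_rng(1)
x = np.arange(30, dtype=float)
y = 0.5 * x + 3.0 + rng.normal(scale=0.2, size=x.size)
y[5] = 40.0

# theilslopes takes y first, then x, then the confidence level
res = theilslopes(y, x, 0.95)
slope, intercept_default = res[0], res[1]

# 'separate': difference of the medians
intercept_separate = np.median(y) - slope * np.median(x)
# 'joint': median of the de-trended values
intercept_joint = np.median(y - slope * x)
```

With SciPy's default settings the returned intercept matches the 'separate' formula. The two estimates are usually close but need not be identical.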
def sen_slope(*args, **kwargs):
29def sen_slope(*args,**kwargs):
30    '''Alias for `sen`'''
31    return sen(*args,**kwargs)

Alias for sen

def sen_numba(x, y):
606def sen_numba( x, y ):
607    '''Estimate linear trend using the Theil-Sen method
608    
609    This non-parametric method finds the median slope among all
610    combinations of time points. 
611    scipy.stats.theilslopes provides the same slope estimate, with  
612    confidence intervals. This function is only fast for large 
613    datasets when compiled with numba.jit, which is disabled here 
614    
615    Parameters
616    ----------
617    x : array_like (N,)
618        independent variable
619    y : array_like (N,)
620        dependent variable
621    
622    Returns
623    -------
624    sen : float
625        the median slope
626    slopes : array (N*(N-1)/2,)
627        slope estimates from all pairs of points in x and y
628    '''
629
630    with warnings.catch_warnings():
631        warnings.simplefilter('always', DeprecationWarning)
632        warnings.warn('Sen function is slow unless numba.jit is used. Use scipy.stats.theilslopes instead.',
633                    DeprecationWarning, stacklevel=2)
634        
635    if len( x ) != len( y ):
636        # Raise rather than print, so callers unpacking two values get a clear error
637        raise ValueError('Inputs x and y must have same dimension')
638
639    # Find number of time points
640    n = len( x )
641
642    # Array to hold all slope estimates
643    slopes = np.zeros(  np.ceil( n * ( n-1 ) / 2 ).astype('int') )
644    slopes[:] = np.nan
645
646    count = 0
647
648    for i in range(n):
649        for j in range(i+1, n):
650
651            # Slope between elements i and j
652            slopeij = ( y[j] - y[i] ) / ( x[j] - x[i] )
653
654            slopes[count] = slopeij
655
656            count += 1
657
658    # Theil-Sen estimate is the median slope, neglecting NaN
659    sen = np.nanmedian( slopes )
660
661    return sen, slopes

Estimate linear trend using the Theil-Sen method

This non-parametric method finds the median slope among all combinations of time points. scipy.stats.theilslopes provides the same slope estimate, with confidence intervals. This function is only fast for large datasets when compiled with numba.jit, which is disabled here.

Parameters
  • x (array_like (N,)): independent variable
  • y (array_like (N,)): dependent variable
Returns
  • sen (float): the median slope
  • slopes (array (N*(N-1)/2,)): slope estimates from all pairs of points in x and y
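The pairwise-slope calculation can also be written without explicit loops. The following is a sketch of one vectorized alternative using `np.triu_indices`; the helper name `pairwise_slopes` is hypothetical, and dropping pairs with identical x is a choice made here rather than the behavior of `sen_numba`.

```python
import numpy as np

def pairwise_slopes(x, y):
    """Slopes between all pairs (i, j) with i < j (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Index arrays for the N*(N-1)/2 unique pairs
    i, j = np.triu_indices(x.size, k=1)
    dx = x[j] - x[i]
    dy = y[j] - y[i]
    # Assumption of this sketch: skip pairs with equal x to avoid division by zero
    keep = dx != 0
    return dy[keep] / dx[keep]

# Four points on the line y = 2x + 1, plus one outlier
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 100.0])

slopes = pairwise_slopes(x, y)
sen_estimate = np.median(slopes)
```

Five points give ten pairwise slopes. Six of them equal 2, so the median is unaffected by the outlier. Memory use grows with the square of N, so this form suits moderate sample sizes.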
def york(x, y, err_x=1, err_y=1, rerr_xy=0):
394def york( x, y, err_x=1, err_y=1, rerr_xy=0 ):
395    '''York regression accounting for error in x and y
396    Follows the notation and algorithm of York et al. (2004) Section III
397    
398    Parameters
399    ----------
400    x, y : ndarray
401        independent (x) and dependent (y) variables for fitting
402    err_x, err_y : ndarray (default=1)
403        standard deviation of errors/uncertainty in x and y
404    rerr_xy : float (default=0)
405        correlation coefficient for errors in x and y, 
406        default to rerr_xy=0 meaning that the errors in x are unrelated to errors in y
407        err_x, err_y, and rerr_xy can be constants or arrays of the same length as x and y
408    
409    Returns
410    -------
411    fitresult : dict 
412        Contains the following keys:
413        - slope (float)
414            Slope or Gradient of Y vs. X
415        - intercept (float)
416            Y intercept.
417        - slope_ste (float)
418            Standard error of slope estimate
419        - intercept_ste (float)
420            standard error of intercept estimate
421        - slope_interval ([float, float])
422            confidence interval for gradient at confidence level alpha
423        - intercept_interval ([float, float])
424            confidence interval for intercept at confidence level alpha
425        - alpha (float)
426            confidence level [0,1] for slope and intercept intervals
427        - df_model (float)
428            degrees of freedom for model
429        - df_resid (float)
430            degrees of freedom for residuals
431        - params ([float,float])
432            array of fitted parameters
433        - fittedvalues (ndarray)
434            array of fitted values
435        - resid (ndarray)
436            array of residual values
437    '''
438
439    # relative error tolerance required for convergence
440    rtol = 1e-15
441
442    # Initial guess for slope, from ordinary least squares
443    result = stats.linregress( x, y )
444    b = result[0]
445
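446    # Weights for x and y, broadcast to the shape of x so constant errors also work
447    wx = np.broadcast_to( 1 / np.asarray(err_x,dtype=float)**2, np.shape(x) )
448    wy = np.broadcast_to( 1 / np.asarray(err_y,dtype=float)**2, np.shape(x) )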
449
450    # Combined weights
451    alpha = np.sqrt( wx * wy )
452
453    # Iterate until solution converges, but no more than 50 times
454    maxiter=50
455    for i in range(maxiter):
456
457        # Weight for point i
458        W = wx * wy / ( wx + b**2 * wy - 2 * b * rerr_xy * alpha )
459        Wsum = np.sum( W )
460
461        # Weighted means
462        Xbar = np.sum( W * x ) / Wsum
463        Ybar = np.sum( W * y ) / Wsum
464
465        # Deviation from weighted means
466        U = x - Xbar
467        V = y - Ybar
468
469        # parameter needed for slope
470        beta = W * ( U / wy + b*V / wx - (b*U + V) * rerr_xy / alpha )
471
472        # Update slope estimate
473        bnew = np.sum( W * beta * V ) / np.sum( W * beta * U )
474
475        # Break from loop if new value is very close to old value
476        if np.abs( (bnew-b)/b ) < rtol:
477            break
478        b = bnew
479
480    else:
481        # for-else: reached only when the loop finishes without a break
482        raise ValueError( f'York regression failed to converge in {maxiter:d} iterations' )
483
484    # Intercept
485    a = Ybar - b * Xbar
486
487    # least-squares adjusted points, expectation values of X and Y
488    xa = Xbar + beta
489    ya = Ybar + b*beta
490
491    # Mean of adjusted points
492    xabar = np.sum( W * xa ) / Wsum
493    yabar = np.sum( W * ya ) / Wsum
494
495    # Deviation of adjusted points from their means
496    u = xa - xabar
497    v = ya - yabar
498
499    # Variance of slope and intercept estimates
500    varb = 1 / np.sum( W * u**2 )
501    vara = 1 / Wsum + xabar**2 * varb
502
503    # Standard error of slope and intercept
504    siga = np.sqrt( vara )
505    sigb = np.sqrt( varb )
506
507    # Define a named tuple type that will contain the results
508    # result = namedtuple( 'result', 'slope intercept sigs sigi params sigma' )
509
510    # Return results as a named tuple, User can access as a regular tuple too
511    # return result( b, a, sigb, siga, [b,a], [sigb, siga] )
512
513    dfmod = 2
514    N = np.sum( ~np.isnan(x) * ~np.isnan(y) )
515
516    result = dict( method        = 'York',
517                fitintercept     = True,
518                slope            = b,
519                intercept        = a,
520                slope_ste        = sigb,
521                intercept_ste    = siga,
522                slope_interval   = [None,None],
523                intercept_interval = [None,None],
524                alpha            = None, # no confidence level; local alpha holds York's combined weights
525                df_model         = dfmod,
526                df_resid         = N-dfmod,
527                params           = np.array([b,a]),
528                nobs             = N,
529                fittedvalues     = a + b * x,
530                resid            = a + b * x - y )
531
532    return result

York regression accounting for error in x and y. Follows the notation and algorithm of York et al. (2004), Section III.

Parameters
  • x, y (ndarray): independent (x) and dependent (y) variables for fitting
  • err_x, err_y (ndarray (default=1)): standard deviation of errors/uncertainty in x and y
  • rerr_xy (float (default=0)): correlation coefficient for errors in x and y. The default, rerr_xy=0, means that the errors in x are unrelated to errors in y. err_x, err_y, and rerr_xy can be constants or arrays of the same length as x and y.
Returns
  • fitresult (dict): Contains the following keys:
    • slope (float) Slope or Gradient of Y vs. X
    • intercept (float) Y intercept.
    • slope_ste (float) Standard error of slope estimate
    • intercept_ste (float) standard error of intercept estimate
    • slope_interval ([float, float]) confidence interval for gradient at confidence level alpha
    • intercept_interval ([float, float]) confidence interval for intercept at confidence level alpha
    • alpha (float) confidence level [0,1] for slope and intercept intervals
    • df_model (float) degrees of freedom for model
    • df_resid (float) degrees of freedom for residuals
    • params ([float,float]) array of fitted parameters
    • fittedvalues (ndarray) array of fitted values
    • resid (ndarray) array of residual values
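The iteration behind the York fit can be summarized in a compact sketch. The helper `york_sketch` below is hypothetical: it follows the same update equations as the listing above but returns only the slope and intercept, and its argument names, tolerance, and use of `np.polyfit` for the starting slope are assumptions made for illustration.

```python
import numpy as np

def york_sketch(x, y, err_x=1.0, err_y=1.0, r=0.0, rtol=1e-12, maxiter=50):
    """Illustrative York-style iteration; returns (slope, intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Per-point weights; broadcasting lets scalar errors act as constants
    wx = np.broadcast_to(1.0 / np.asarray(err_x, dtype=float) ** 2, x.shape)
    wy = np.broadcast_to(1.0 / np.asarray(err_y, dtype=float) ** 2, x.shape)
    alpha = np.sqrt(wx * wy)
    # Ordinary least squares slope as the initial guess
    b = np.polyfit(x, y, 1)[0]
    for _ in range(maxiter):
        # Combined weight of each point for the current slope
        W = wx * wy / (wx + b**2 * wy - 2 * b * r * alpha)
        xbar = np.sum(W * x) / np.sum(W)
        ybar = np.sum(W * y) / np.sum(W)
        U = x - xbar
        V = y - ybar
        beta = W * (U / wy + b * V / wx - (b * U + V) * r / alpha)
        b_new = np.sum(W * beta * V) / np.sum(W * beta * U)
        converged = abs(b_new - b) <= rtol * abs(b_new)
        b = b_new
        if converged:
            break
    else:
        raise ValueError(f"no convergence in {maxiter} iterations")
    return b, ybar - b * xbar

# Points lying exactly on y = 2x + 1, with made-up uncertainties
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
slope, intercept = york_sketch(x, y, err_x=np.array([0.1, 0.2, 0.1, 0.3, 0.2]), err_y=0.5)
```

For data that fall exactly on a line, the weights do not change the answer, so the sketch recovers a slope of 2 and an intercept of 1. With noisy data the per-point uncertainties determine how strongly each point pulls on the fit.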