pytesmo Package

anomaly Module

Created on June 20, 2013

@author: Alexander Gruber Alexander.Gruber@geo.tuwien.ac.at

pytesmo.anomaly.calc_anomaly(Ser, window_size=35, climatology=None)

Calculates the anomaly of a time series (pandas Series). Either climatology-based or moving-average-based anomalies can be calculated.

Parameters :

Ser : pandas.Series (index must be a DateTimeIndex)

window_size : float, optional

The size [days] of the moving-average window used to calculate the anomaly reference (only used if climatology is not provided). Default: 35 (days)

climatology : pandas.Series (index: 1-366), optional

if provided, anomalies will be based on the climatology

timespann : [timespan_from, timespan_to], datetime.datetime(y,m,d), optional

If set, only a subset

Returns :

anomaly : pandas.Series

Series containing the calculated anomalies
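The two anomaly variants described above can be sketched as follows. This is a minimal illustration, not the pytesmo implementation: the helper name anomaly_sketch is hypothetical, the series is assumed to be sampled once per day so that a window of N elements equals N days, and the climatology lookup by day of year is an assumption about how the 1-366 index is used.

```python
import numpy as np
import pandas as pd


def anomaly_sketch(ser, window_size=35, climatology=None):
    """Illustrative anomaly calculation; not the pytesmo implementation."""
    if climatology is not None:
        # Look up the climatology value for each timestamp's day of year.
        doy = pd.Series(ser.index.dayofyear, index=ser.index)
        reference = doy.map(climatology)
    else:
        # Centered moving average over `window_size` daily values.
        reference = ser.rolling(window=int(window_size), center=True,
                                min_periods=1).mean()
    return ser - reference


# Daily series: a constant level plus one spike on day 50.
index = pd.date_range("2013-01-01", periods=100, freq="D")
values = np.full(100, 0.25)
values[50] += 0.35
ser = pd.Series(values, index=index)

anom = anomaly_sketch(ser, window_size=35)
```

The spike stands out in anom because the 35-day mean barely moves, while the constant part of the series maps to an anomaly of zero.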

pytesmo.anomaly.calc_climatology(Ser, moving_avg_orig=5, moving_avg_clim=30, median=False, timespan=None)

Calculates the climatology of a data set

Parameters :

Ser : pandas.Series (index must be a DateTimeIndex)

moving_avg_orig : float, optional

The size of the moving-average window [days] that will be applied to the input Series (gap filling, short-term rainfall correction). Default: 5

moving_avg_clim : float, optional

The size of the moving-average window [days] that will be applied to the calculated climatology (long-term event correction). Default: 30

median : boolean, optional

if set to True, the climatology will be based on the median conditions

timespan : [timespan_from, timespan_to], datetime.datetime(y,m,d), optional

Set this to calculate the climatology based on a subset of the input Series

Returns :

climatology : pandas.Series

Series containing the calculated climatology
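The core of a climatology is one value per day of year. The sketch below shows only that step, under stated assumptions: the helper name climatology_sketch is hypothetical, and the two smoothing passes controlled by moving_avg_orig and moving_avg_clim are left out.

```python
import numpy as np
import pandas as pd


def climatology_sketch(ser, median=False):
    """Mean (or median) value per day of year; smoothing steps omitted."""
    grouped = ser.groupby(ser.index.dayofyear)
    clim = grouped.median() if median else grouped.mean()
    # Reindex to 1..366 so that days without data show up as NaN.
    return clim.reindex(np.arange(1, 367))


# Two non-leap years of daily data with a different constant level each year.
index = pd.date_range("2010-01-01", "2011-12-31", freq="D")
ser = pd.Series(np.where(index.year == 2010, 0.2, 0.4), index=index)

clim = climatology_sketch(ser)
```

Day 366 stays NaN here because neither input year is a leap year.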

pytesmo.anomaly.moving_average(Ser, window_size=1, no_date=False, sample_to_days=False, fast=False)

Applies a moving average (box) filter on an input time series

Parameters :

Ser : pandas.Series (index must be a DateTimeIndex)

window_size : float, optional

The size of the moving-average window [days] that will be applied to the input Series. Default: 1

no_date : boolean, optional

Set this if the index is not a DateTimeIndex. The window_size will then refer to array elements instead of days.

sample_to_days : boolean, optional

If set, the series will be sampled to full days (gaps are filled)

fast : boolean, optional

Uses the pandas implementation, which is faster but fills the last window_size/2 of the time series with NaN values

Returns :

Ser : pandas.Series

moving-average filtered time series
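A box filter replaces each value by the unweighted mean of its neighbours. The sketch below works on plain array elements (comparable to the no_date case) and assumes an odd window size; box_filter is a hypothetical helper, not the pytesmo function.

```python
import numpy as np


def box_filter(values, window_size=3):
    """Centered box filter over array elements; edges are left as NaN."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window_size) / window_size
    smoothed = np.full(values.shape, np.nan)
    half = window_size // 2
    # mode="valid" only returns positions where the window fits completely.
    smoothed[half:len(values) - half] = np.convolve(values, kernel, mode="valid")
    return smoothed


filtered = box_filter([1.0, 2.0, 3.0, 4.0, 5.0], window_size=3)
# filtered is [nan, 2.0, 3.0, 4.0, nan]
```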

metrics Module

Created on Apr 17, 2013

@author: Christoph Paulik christoph.paulik@geo.tuwien.ac.at

@author: Sebastian Hahn sebastian.hahn@geo.tuwien.ac.at

@author: Alexander Gruber alexander.gruber@geo.tuwien.ac.at

pytesmo.metrics.RSS(x, y)

Residual sum of squares

Parameters :

x : numpy.array

1D numpy array to calculate the metric

y : numpy.array

1D numpy array to calculate the metric

Returns :

Residual sum of squares :
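For reference, the residual sum of squares is the sum of squared element-wise differences. A minimal NumPy sketch with made-up values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.5, 2.0, 2.0])

# Sum of squared element-wise differences: 0.25 + 0.0 + 1.0
rss = np.sum((x - y) ** 2)
```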

pytesmo.metrics.bias(x, y)

Bias
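The bias is commonly taken as the difference between the two means. The sign convention used below (x minus y) is an assumption, not confirmed by this page.

```python
import numpy as np

x = np.array([0.30, 0.35, 0.40])
y = np.array([0.25, 0.30, 0.35])

# Difference of the means; positive when x is on average larger than y.
bias = np.mean(x) - np.mean(y)
```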

pytesmo.metrics.kendalltau(x, y)

Wrapper for scipy.stats.kendalltau

Parameters :

x : numpy.array

1D numpy array to calculate the metric

y : numpy.array

1D numpy array to calculate the metric

Returns :

Kendall’s tau : float

The tau statistic

p-value : float

The two-sided p-value for a hypothesis test whose null hypothesis is an absence of association, tau = 0.

See also

scipy.stats.kendalltau

pytesmo.metrics.mse(x, y)

Mean square error (MSE) as a decomposition of the RMSD into individual error components
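One standard way to split the mean square error into components is into a correlation term, a variance term and a bias term. The sketch below verifies that identity numerically; whether pytesmo returns exactly these components, and in which order, is an assumption.

```python
import numpy as np

x = np.array([0.10, 0.20, 0.30, 0.40, 0.50])
y = np.array([0.15, 0.18, 0.35, 0.45, 0.40])

r = np.corrcoef(x, y)[0, 1]
# Population standard deviations (ddof=0) make the identity exact.
std_x, std_y = np.std(x), np.std(y)

mse_corr = 2 * std_x * std_y * (1 - r)      # imperfect correlation
mse_var = (std_x - std_y) ** 2              # different variability
mse_bias = (np.mean(x) - np.mean(y)) ** 2   # different mean level

# The three components add up to the total mean square error.
mse_total = np.mean((x - y) ** 2)
```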

pytesmo.metrics.nash_sutcliffe(x, y)

Nash Sutcliffe model efficiency coefficient

Parameters :

x : numpy.array

1D numpy array to calculate the metric

y : numpy.array

1D numpy array to calculate the metric

Returns :

Nash Sutcliffe coefficient : float

Nash Sutcliffe model efficiency coefficient
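The Nash Sutcliffe coefficient compares the squared model error with the variance of the observations. In the sketch below the first argument is treated as the observation and the second as the simulation; that argument order is an assumption, and nash_sutcliffe_sketch is a hypothetical helper.

```python
import numpy as np


def nash_sutcliffe_sketch(obs, sim):
    """1 - (squared error) / (squared deviation of obs from its mean)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


obs = np.array([1.0, 2.0, 3.0, 4.0])
perfect = nash_sutcliffe_sketch(obs, obs)                         # 1.0
mean_model = nash_sutcliffe_sketch(obs, np.full(4, obs.mean()))   # 0.0
```

A value of 1 indicates a perfect match, and 0 means the simulation is no better than the mean of the observations.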

pytesmo.metrics.nrmsd(x, y)

Normalized root-mean-square deviation
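Several normalizations are in use for this metric. The sketch below divides the RMSD by the value range of x, which is one common choice and an assumption here.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 4.0])
y = np.array([0.0, 1.0, 2.0, 2.0])

rmsd = np.sqrt(np.mean((x - y) ** 2))
# Normalize by the range of x (assumed convention).
nrmsd = rmsd / (np.max(x) - np.min(x))
```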

pytesmo.metrics.pearsonr(x, y)

Wrapper for scipy.stats.pearsonr

Parameters :

x : numpy.array

1D numpy array to calculate the metric

y : numpy.array

1D numpy array to calculate the metric

Returns :

Pearson’s r : float

Pearson’s correlation coefficient

p-value : float

Two-tailed p-value

See also

scipy.stats.pearsonr
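Since this function wraps scipy.stats.pearsonr, the underlying SciPy call illustrates the return values. The example calls SciPy directly and uses made-up data.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0  # perfectly linear in x

# Returns the correlation coefficient and the two-tailed p-value.
r, p_value = stats.pearsonr(x, y)
```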

pytesmo.metrics.rmsd(x, y)

Root-mean-square deviation
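The root-mean-square deviation is the square root of the mean squared difference. A minimal sketch with a hypothetical helper name:

```python
import numpy as np


def rmsd_sketch(x, y):
    """Square root of the mean squared element-wise difference."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sqrt(np.mean((x - y) ** 2))


# Squared differences are 1, 0, 0 and 4, so the mean is 1.25.
rmsd = rmsd_sketch([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 3.0, 6.0])
```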

pytesmo.metrics.spearmanr(x, y)

Wrapper for scipy.stats.spearmanr

Parameters :

x : numpy.array

1D numpy array to calculate the metric

y : numpy.array

1D numpy array to calculate the metric

Returns :

rho : float

Spearman correlation coefficient

p-value : float

The two-sided p-value for a hypothesis test whose null hypothesis is that two sets of data are uncorrelated

See also

scipy.stats.spearmanr
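Rank-based coefficients only depend on the ordering of the values, so a monotonic but nonlinear relationship still yields a coefficient of 1. The example calls the underlying SciPy functions directly, with made-up data.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3  # monotonic, but not linear

rho, p_rho = stats.spearmanr(x, y)
tau, p_tau = stats.kendalltau(x, y)
```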

pytesmo.metrics.tcol_error(x, y, z)

Triple collocation error estimate

Parameters :

x : numpy.array

1D numpy array to calculate the errors

y : numpy.array

1D numpy array to calculate the errors

z : numpy.array

1D numpy array to calculate the errors

Returns :

triple collocation error for x : float

triple collocation error for y : float

triple collocation error for z : float
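A common difference-based formulation of triple collocation is sketched below. It assumes that the three series are already scaled to a common reference and that their errors are mutually uncorrelated; whether pytesmo uses exactly this formulation is an assumption, and tcol_error_sketch is a hypothetical helper.

```python
import numpy as np


def tcol_error_sketch(x, y, z):
    """Difference-based triple collocation error estimates (illustrative)."""
    e_x = np.sqrt(np.abs(np.mean((x - y) * (x - z))))
    e_y = np.sqrt(np.abs(np.mean((y - x) * (y - z))))
    e_z = np.sqrt(np.abs(np.mean((z - x) * (z - y))))
    return e_x, e_y, e_z


# Synthetic data: one common signal plus independent noise of known size.
rng = np.random.default_rng(0)
truth = rng.normal(0.0, 1.0, 100_000)
x = truth + rng.normal(0.0, 0.1, truth.size)
y = truth + rng.normal(0.0, 0.2, truth.size)
z = truth + rng.normal(0.0, 0.3, truth.size)

e_x, e_y, e_z = tcol_error_sketch(x, y, z)
```

With this synthetic input the estimates come out close to the noise levels 0.1, 0.2 and 0.3 that were used to generate the data.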

pytesmo.metrics.ubrmsd(x, y)

Unbiased root-mean-square deviation
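The unbiased RMSD removes the mean of each series before the deviation is computed. The sketch below also checks the relation between RMSD, bias and unbiased RMSD on made-up data.

```python
import numpy as np

x = np.array([0.30, 0.42, 0.38, 0.51])
y = np.array([0.20, 0.35, 0.25, 0.44])

# Remove each series' mean before taking the RMSD.
ubrmsd = np.sqrt(np.mean(((x - x.mean()) - (y - y.mean())) ** 2))

# Relation between the metrics: rmsd**2 == bias**2 + ubrmsd**2
rmsd = np.sqrt(np.mean((x - y) ** 2))
bias = x.mean() - y.mean()
```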

scaling Module

Created on Apr 17, 2013

@author: Christoph Paulik christoph.paulik@geo.tuwien.ac.at

pytesmo.scaling.add_scaled(df, method='linreg', label_in=None, label_scale=None)

Takes a DataFrame and appends a scaled time series to it. If no labels are given, the first column will be scaled to the second column of the DataFrame.

Parameters :

df : pandas.DataFrame

input dataframe

method : string

scaling method

label_in : string, optional

The column of the dataframe that should be scaled to the column given by label_scale. Default: the first column

label_scale : string, optional

The column of the dataframe that the label_in column should be scaled to. Default: the second column

Returns :

df : pandas.DataFrame

input dataframe with new column labeled label_in+’_scaled_’+method
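The column handling described above can be sketched as follows. The helper add_scaled_sketch is hypothetical; it hard-codes mean and standard deviation scaling instead of dispatching on method, and the column suffix assumes that 'mean_std' would be the corresponding method string.

```python
import numpy as np
import pandas as pd


def add_scaled_sketch(df, label_in=None, label_scale=None):
    """Append a mean/std-scaled copy of one column (illustrative only)."""
    # Fall back to the first and second column when no labels are given.
    label_in = df.columns[0] if label_in is None else label_in
    label_scale = df.columns[1] if label_scale is None else label_scale
    src, ref = df[label_in], df[label_scale]
    scaled = (src - src.mean()) / src.std() * ref.std() + ref.mean()
    out = df.copy()
    out[label_in + "_scaled_mean_std"] = scaled
    return out


df = pd.DataFrame({"sat": [10.0, 20.0, 30.0, 40.0],
                   "insitu": [0.1, 0.2, 0.3, 0.4]})
result = add_scaled_sketch(df)
```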

pytesmo.scaling.cdf_match(in_data, scale_to)

Matches the CDF of in_data to the CDF of scale_to in three steps:

  1. computes discrete cumulative distribution functions of in_data and scale_to at their respective bin edges;
  2. computes continuous CDFs by 6th-order polynomial fitting;
  3. matches the CDF of in_data to the CDF of scale_to.

Parameters :

in_data : numpy.array

input dataset which will be scaled

scale_to : numpy.array

in_data will be scaled to this dataset

Returns :

CDF matched values : numpy.array

dataset in_data with CDF as scale_to
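Polynomial CDF matching can be sketched roughly as follows. The helper cdf_match_sketch, the choice of 101 probability levels as bin edges, and the use of an inverse CDF fit for scale_to are assumptions made for illustration; the pytesmo source may differ in these details.

```python
import numpy as np


def cdf_match_sketch(in_data, scale_to, order=6, n_edges=101):
    """Polynomial CDF matching in three steps (illustrative only)."""
    in_data = np.asarray(in_data, dtype=float)
    scale_to = np.asarray(scale_to, dtype=float)
    probs = np.linspace(0.0, 1.0, n_edges)
    # Step 1: discrete CDFs, i.e. the data values at fixed probabilities.
    in_edges = np.quantile(in_data, probs)
    to_edges = np.quantile(scale_to, probs)
    # Step 2: continuous CDF of in_data and inverse CDF of scale_to.
    cdf_in = np.polyfit(in_edges, probs, order)
    inv_cdf_to = np.polyfit(probs, to_edges, order)
    # Step 3: value -> probability -> value in the scale_to distribution.
    return np.polyval(inv_cdf_to, np.polyval(cdf_in, in_data))


rng = np.random.default_rng(42)
in_data = rng.uniform(0.0, 1.0, 5000)
scale_to = rng.uniform(3.0, 5.0, 5000)
matched = cdf_match_sketch(in_data, scale_to)
```

After matching, the mean and spread of matched should be close to those of scale_to.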

pytesmo.scaling.lin_cdf_match(in_data, scale_to)

Computes cumulative distribution functions of in_data and scale_to at their respective bin edges by linear interpolation, then matches the CDF of in_data to the CDF of scale_to.

Parameters :

in_data : numpy.array

input dataset which will be scaled

scale_to : numpy.array

in_data will be scaled to this dataset

Returns :

CDF matched values : numpy.array

dataset in_data with CDF as scale_to
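A piecewise-linear variant of CDF matching can be sketched with percentiles and np.interp. The helper lin_cdf_match_sketch and the choice of 11 percentile levels as bin edges are assumptions made for illustration.

```python
import numpy as np


def lin_cdf_match_sketch(in_data, scale_to, n_edges=11):
    """Piecewise-linear CDF matching via percentiles (illustrative only)."""
    in_data = np.asarray(in_data, dtype=float)
    levels = np.linspace(0, 100, n_edges)
    # Data values of both datasets at the same percentile levels.
    in_edges = np.percentile(in_data, levels)
    to_edges = np.percentile(scale_to, levels)
    # Map each input value linearly between neighbouring edges.
    return np.interp(in_data, in_edges, to_edges)


in_data = np.arange(0.0, 101.0)
scale_to = 2.0 * in_data + 10.0
matched = lin_cdf_match_sketch(in_data, scale_to)
```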

pytesmo.scaling.linreg(in_data, scale_to)

Scales the input dataset to scale_to using linear regression

Parameters :

in_data : numpy.array

input dataset which will be scaled

scale_to : numpy.array

in_data will be scaled to this dataset

Returns :

scaled dataset : numpy.array

dataset scaled using linear regression
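Regression-based scaling fits a straight line from in_data to scale_to and applies it to in_data. The sketch below uses np.polyfit for the fit; the helper name linreg_sketch and the made-up data are assumptions.

```python
import numpy as np


def linreg_sketch(in_data, scale_to):
    """Apply the least-squares line scale_to ~ slope * in_data + intercept."""
    in_data = np.asarray(in_data, dtype=float)
    slope, intercept = np.polyfit(in_data, scale_to, 1)
    return slope * in_data + intercept


in_data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
scale_to = np.array([0.12, 0.19, 0.33, 0.38, 0.52])
scaled = linreg_sketch(in_data, scale_to)
```

A least-squares fit with intercept preserves the mean, so scaled has the same mean as scale_to.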

pytesmo.scaling.mean_std(in_data, scale_to)

Scales the input dataset so that it has the same mean and standard deviation as scale_to afterwards

Parameters :

in_data : numpy.array

input dataset which will be scaled

scale_to : numpy.array

in_data will be scaled to this dataset

Returns :

scaled dataset : numpy.array

dataset in_data with same mean and standard deviation as scale_to
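Mean and standard deviation matching standardizes in_data and then rescales it with the statistics of scale_to. A minimal sketch with a hypothetical helper name:

```python
import numpy as np


def mean_std_sketch(in_data, scale_to):
    """Shift and stretch in_data to the mean and std of scale_to."""
    in_data = np.asarray(in_data, dtype=float)
    scale_to = np.asarray(scale_to, dtype=float)
    standardized = (in_data - in_data.mean()) / in_data.std()
    return standardized * scale_to.std() + scale_to.mean()


in_data = np.array([10.0, 20.0, 30.0, 40.0])
scale_to = np.array([0.12, 0.25, 0.31, 0.44])
scaled = mean_std_sketch(in_data, scale_to)
```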

pytesmo.scaling.min_max(in_data, scale_to)

Scales the input dataset so that it has the same minimum and maximum as scale_to afterwards

Parameters :

in_data : numpy.array

input dataset which will be scaled

scale_to : numpy.array

in_data will be scaled to this dataset

Returns :

scaled dataset : numpy.array

dataset in_data with same maximum and minimum as scale_to
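Minimum and maximum matching maps in_data linearly onto the value range of scale_to. A minimal sketch with a hypothetical helper name:

```python
import numpy as np


def min_max_sketch(in_data, scale_to):
    """Map in_data linearly onto the value range of scale_to."""
    in_data = np.asarray(in_data, dtype=float)
    scale_to = np.asarray(scale_to, dtype=float)
    # Rescale to the unit interval first, then to the target range.
    unit = (in_data - in_data.min()) / (in_data.max() - in_data.min())
    return unit * (scale_to.max() - scale_to.min()) + scale_to.min()


in_data = np.array([5.0, 15.0, 25.0, 45.0])
scale_to = np.array([0.1, 0.2, 0.5, 0.3])
scaled = min_max_sketch(in_data, scale_to)
```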

temporal_matching Module

Created on Apr 12, 2013

Provides a temporal matching function

@author: Sebastian Hahn Sebastian.Hahn@geo.tuwien.ac.at

pytesmo.temporal_matching.df_match(reference, *args, **kwds)

Finds temporal match between the reference pandas.DataFrame (index has to be datetime) and n other pandas.DataFrame (index has to be datetime).

Parameters :

reference : pandas.DataFrame

The index of this dataframe will be the reference.

*args : pandas.DataFrame

The index of each of these dataframes will be matched to the reference index.

window : float

Fraction of days of the maximum pos./neg. distance allowed, i.e. the value of window represents the half-window size (e.g. window=0.5 will search for matches between -12 and +12 hours) (default: None)

dropna : boolean

Drop rows containing only NaNs (default: False)

dropduplicates : boolean

Drop duplicated temporal matches (default: False)

Returns :

temporal_matched_args : pandas.DataFrame or tuple of pandas.DataFrame

Dataframe with index from matched reference index
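The effect of the window parameter can be illustrated with a nearest-neighbour lookup in plain pandas. This sketch uses DataFrame.reindex rather than df_match itself, so it only mirrors the matching idea; the data and column names are made up.

```python
import pandas as pd

reference = pd.DataFrame(
    {"ref": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2013-04-12 00:00", "2013-04-13 00:00",
                          "2013-04-14 00:00"]),
)
other = pd.DataFrame(
    {"val": [10.0, 20.0]},
    index=pd.to_datetime(["2013-04-12 03:00", "2013-04-13 18:00"]),
)

window = 0.5  # half-window size in days, i.e. +/- 12 hours

# Nearest-neighbour lookup on the reference index, limited to the window.
matched = other.reindex(reference.index, method="nearest",
                        tolerance=pd.Timedelta(days=window))
```

The second reference timestamp has no observation within 12 hours, so its row is NaN; the other two rows pick up the observations 3 and 6 hours away.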
