pytesmo package¶
Subpackages¶
Submodules¶
pytesmo.df_metrics module¶
This module contains wrappers for the methods in pytesmo.metrics which can be given pandas.DataFrames instead of single numpy.arrays. If the DataFrame has more columns than the wrapped function has input parameters, the function is applied pairwise to all column combinations.
Created on Aug 14, 2013
@author: Christoph Paulik Christoph.Paulik@geo.tuwien.ac.at
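A minimal usage sketch of the pairwise behaviour described above (the column names and values are made up for illustration):

import numpy as np
import pandas as pd
import pytesmo.df_metrics as df_metrics

# three collocated example time series as columns of one DataFrame
df = pd.DataFrame({'insitu': np.random.rand(100),
                   'ascat': np.random.rand(100),
                   'model': np.random.rand(100)})

# bias for every column pair
b = df_metrics.bias(df)

# Pearson correlation and p-value for every column pair, returned as a
# namedtuple with element names such as 'insitu_and_ascat'
r = df_metrics.pearsonr(df)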
- exception pytesmo.df_metrics.DataFrameDimensionError[source]¶
Bases: exceptions.Exception
- pytesmo.df_metrics.RSS(df)[source]¶
Residual sum of squares
Returns: result : namedtuple
Element names are the column names of df used for the calculation, joined by ‘_and_’.
See also
pytesmo.metrics.RSS
- pytesmo.df_metrics.bias(df)[source]¶
Bias
Returns: bias : pandas.DataFrame
of shape (len(df.columns), len(df.columns))
See also
pytesmo.metrics.bias
- pytesmo.df_metrics.kendalltau(df)[source]¶
Wrapper for scipy.stats.kendalltau
Returns: result : namedtuple
Element names are the column names of df used for the calculation, joined by ‘_and_’.
- pytesmo.df_metrics.mse(df)[source]¶
Mean square error (MSE) as a decomposition of the RMSD into individual error components
Returns: result : namedtuple
Element names are the column names of df used for the calculation, joined by ‘_and_’.
See also
pytesmo.metrics.mse
- pytesmo.df_metrics.nash_sutcliffe(df)[source]¶
Nash Sutcliffe model efficiency coefficient
Returns: result : namedtuple
Element names are the column names of df used for the calculation, joined by ‘_and_’.
See also
pytesmo.metrics.nash_sutcliffe
- pytesmo.df_metrics.nrmsd(df)[source]¶
Normalized root-mean-square deviation
Returns: result : namedtuple
Element names are the column names of df used for the calculation, joined by ‘_and_’.
See also
pytesmo.metrics.nrmsd
- pytesmo.df_metrics.pairwise_apply(df, method, comm=False)[source]¶
Compute given method pairwise for all columns, excluding NA/null values
Parameters: df : pandas.DataFrame
input data, method will be applied to each column pair
method : function
method to apply to each column pair. Has to take 2 input arguments of type numpy.array and return one value or a tuple of values
Returns: results : pandas.DataFrame
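A sketch of calling pairwise_apply with a user-supplied function; mean_diff is a hypothetical example metric, not part of pytesmo:

import numpy as np
import pandas as pd
from pytesmo.df_metrics import pairwise_apply

df = pd.DataFrame({'a': np.arange(10.0),
                   'b': np.arange(10.0) + 2.0,
                   'c': np.random.rand(10)})

# simple example metric: mean difference between the two series of a pair
def mean_diff(x, y):
    return np.mean(x - y)

result = pairwise_apply(df, mean_diff)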
- pytesmo.df_metrics.pearsonr(df)[source]¶
Wrapper for scipy.stats.pearsonr
Returns: result : namedtuple
Element names are the column names of df used for the calculation, joined by ‘_and_’.
See also
pytesmo.metrics.pearsonr, scipy.stats.pearsonr
- pytesmo.df_metrics.rmsd(df)[source]¶
Root-mean-square deviation
Returns: result : namedtuple
Element names are the column names of df used for the calculation, joined by ‘_and_’.
See also
pytesmo.metrics.rmsd
- pytesmo.df_metrics.spearmanr(df)[source]¶
Wrapper for scipy.stats.spearmanr
Returns: result : namedtuple
Element names are the column names of df used for the calculation, joined by ‘_and_’.
See also
pytesmo.metrics.spearmanr, scipy.stats.spearmanr
- pytesmo.df_metrics.tcol_error(df)[source]¶
Triple collocation error estimate. In this case df has to have exactly 3 columns, since triple-wise application of a function is not yet implemented and would probably return a complicated structure
Returns: result : namedtuple
Element names are the column names of df.
See also
pytesmo.metrics.tcol_error
pytesmo.metrics module¶
Created on Apr 17, 2013
@author: Christoph Paulik christoph.paulik@geo.tuwien.ac.at
@author: Sebastian Hahn sebastian.hahn@geo.tuwien.ac.at
@author: Alexander Gruber alexander.gruber@geo.tuwien.ac.at
- pytesmo.metrics.RSS(x, y)[source]¶
Residual sum of squares
Parameters: x : numpy.array
1D numpy array to calculate the metric
y : numpy.array
1D numpy array to calculate the metric
Returns: Residual sum of squares
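A small worked example, assuming RSS follows the usual definition as the sum of squared differences between x and y:

import numpy as np
from pytesmo import metrics

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.5, 2.0, 2.0])

rss = metrics.RSS(x, y)

# under that definition the residuals are (-0.5, 0.0, 1.0),
# so the result should equal 0.25 + 0.0 + 1.0 = 1.25
rss_manual = np.sum((x - y) ** 2)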
- pytesmo.metrics.kendalltau(x, y)[source]¶
Wrapper for scipy.stats.kendalltau
Parameters: x : numpy.array
1D numpy array to calculate the metric
y : numpy.array
1D numpy array to calculate the metric
Returns: Kendall’s tau : float
The tau statistic
p-value : float
The two-sided p-value for a hypothesis test whose null hypothesis is an absence of association, tau = 0.
See also
scipy.stats.kendalltau
- pytesmo.metrics.mse(x, y)[source]¶
Mean square error (MSE) as a decomposition of the RMSD into individual error components
- pytesmo.metrics.nash_sutcliffe(x, y)[source]¶
Nash Sutcliffe model efficiency coefficient
Parameters: x : numpy.array
1D numpy array to calculate the metric
y : numpy.array
1D numpy array to calculate the metric
Returns: Nash Sutcliffe coefficient : float
Nash Sutcliffe model efficiency coefficient
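A usage sketch; the manual computation below assumes the common textbook form of the coefficient and treats the first argument x as the observed/reference series, which is an assumption not stated in the signature above:

import numpy as np
from pytesmo import metrics

x = np.array([3.0, 4.0, 5.0, 6.0])   # treated here as the observed/reference series (assumption)
y = np.array([2.5, 4.5, 5.0, 6.5])   # treated here as the modelled series

ns = metrics.nash_sutcliffe(x, y)

# common textbook form, normalising by the variance of the reference series
ns_manual = 1.0 - np.sum((x - y) ** 2) / np.sum((x - np.mean(x)) ** 2)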
- pytesmo.metrics.pearsonr(x, y)[source]¶
Wrapper for scipy.stats.pearsonr
Parameters: x : numpy.array
1D numpy array to calculate the metric
y : numpy.array
1D numpy array to calculate the metric
Returns: Pearson’s r : float
Pearson’s correlation coefficient
p-value : float
Two-tailed p-value
See also
scipy.stats.pearsonr
- pytesmo.metrics.spearmanr(x, y)[source]¶
Wrapper for scipy.stats.spearmanr
Parameters: x : numpy.array
1D numpy array to calculate the metric
y : numpy.array
1D numpy array to calculate the metric
Returns: rho : float
Spearman correlation coefficient
p-value : float
The two-sided p-value for a hypothesis test whose null hypothesis is that two sets of data are uncorrelated
See also
scipy.stats.spearmanr
- pytesmo.metrics.tcol_error(x, y, z)[source]¶
Triple collocation error estimate
Parameters: x : numpy.array
1D numpy array to calculate the errors
y : numpy.array
1D numpy array to calculate the errors
z : numpy.array
1D numpy array to calculate the errors
Returns: triple collocation error for x : float
triple collocation error for y : float
triple collocation error for z : float
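A usage sketch with three synthetic, collocated series (the noise levels are arbitrary); each returned value is the error estimate for the corresponding input:

import numpy as np
from pytesmo import metrics

n = 1000
truth = np.random.rand(n)
# three collocated, independently perturbed observations of the same signal
x = truth + 0.1 * np.random.randn(n)
y = truth + 0.2 * np.random.randn(n)
z = truth + 0.3 * np.random.randn(n)

e_x, e_y, e_z = metrics.tcol_error(x, y, z)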
pytesmo.scaling module¶
Created on Apr 17, 2013
@author: Christoph Paulik christoph.paulik@geo.tuwien.ac.at
- pytesmo.scaling.add_scaled(df, method='linreg', label_in=None, label_scale=None)[source]¶
Takes a DataFrame and appends a scaled time series to it. If no labels are given, the first column will be scaled to the second column of the DataFrame.
Parameters: df : pandas.DataFrame
input dataframe
method : string
scaling method
label_in : string, optional
the column of the dataframe that should be scaled to the column given by label_scale. Default is the first column.
label_scale : string, optional
the column of the dataframe that the label_in column should be scaled to. Default is the second column.
Returns: df : pandas.DataFrame
input dataframe with new column labeled label_in+’_scaled_’+method
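A usage sketch (the column names are illustrative):

import numpy as np
import pandas as pd
from pytesmo import scaling

df = pd.DataFrame({'ascat': np.random.rand(100),
                   'insitu': np.random.rand(100)})

# scale the 'ascat' column to the 'insitu' column and append the result;
# the new column is named 'ascat_scaled_linreg'
df = scaling.add_scaled(df, method='linreg',
                        label_in='ascat', label_scale='insitu')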
- pytesmo.scaling.cdf_match(in_data, scale_to)[source]¶
- computes discrete cumulative distribution functions of in_data and scale_to at their respective bin_edges;
- computes continuous CDFs by 6th order polynomial fitting;
- CDF of in_data is matched to CDF of scale_to
Parameters: in_data : numpy.array
input dataset which will be scaled
scale_to : numpy.array
in_data will be scaled to this dataset
Returns: CDF matched values : numpy.array
dataset in_data with a CDF matched to that of scale_to
- pytesmo.scaling.lin_cdf_match(in_data, scale_to)[source]¶
Computes cumulative distribution functions of in_data and scale_to at their respective bin edges by linear interpolation, then matches the CDF of in_data to the CDF of scale_to.
Parameters: in_data : numpy.array
input dataset which will be scaled
scale_to : numpy.array
in_data will be scaled to this dataset
Returns: CDF matched values : numpy.array
dataset in_data with a CDF matched to that of scale_to
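A usage sketch covering both CDF matching functions (the input distributions are arbitrary):

import numpy as np
from pytesmo import scaling

in_data = np.random.gamma(2.0, 2.0, 1000)   # dataset that will be scaled
scale_to = np.random.gamma(3.0, 1.0, 1000)  # reference dataset

matched_lin = scaling.lin_cdf_match(in_data, scale_to)   # linearly interpolated CDFs
matched_poly = scaling.cdf_match(in_data, scale_to)      # polynomial-fitted CDFs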
- pytesmo.scaling.linreg(in_data, scale_to)[source]¶
Scales the input dataset in_data to scale_to using linear regression.
Parameters: in_data : numpy.array
input dataset which will be scaled
scale_to : numpy.array
in_data will be scaled to this dataset
Returns: scaled dataset : numpy.array
dataset scaled using linear regression
- pytesmo.scaling.mean_std(in_data, scale_to)[source]¶
Scales in_data so that it has the same mean and standard deviation as scale_to afterwards.
Parameters: in_data : numpy.array
input dataset which will be scaled
scale_to : numpy.array
in_data will be scaled to this dataset
Returns: scaled dataset : numpy.array
dataset in_data with same mean and standard deviation as scale_to
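A usage sketch; the manual computation below assumes the usual mean/standard-deviation matching formulation:

import numpy as np
from pytesmo import scaling

in_data = np.random.rand(100) * 10.0
scale_to = np.random.rand(100)

scaled = scaling.mean_std(in_data, scale_to)

# the usual mean/standard-deviation matching, for comparison
scaled_manual = ((in_data - np.mean(in_data)) / np.std(in_data)
                 * np.std(scale_to) + np.mean(scale_to))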
- pytesmo.scaling.min_max(in_data, scale_to)[source]¶
Scales in_data so that it has the same minimum and maximum as scale_to afterwards.
Parameters: in_data : numpy.array
input dataset which will be scaled
scale_to : numpy.array
in_data will be scaled to this dataset
Returns: scaled dataset : numpy.array
dataset in_data with same maximum and minimum as scale_to
- pytesmo.scaling.scale(df, method='linreg', reference_index=0)[source]¶
Takes a pandas.DataFrame and scales all columns to the column specified by reference_index using the chosen method.
Parameters: df : pandas.DataFrame
containing matched time series that should be scaled
method : string, optional
scaling method; has to be the name of a function in globals() that takes two numpy.arrays as input and returns one numpy.array of the same length
reference_index : int, optional
default 0, column index of reference dataset in dataframe
Returns: scaled data : pandas.DataFrame
all time series of the input DataFrame scaled to the one specified by reference_index
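A usage sketch (the column names are illustrative); every column except the reference column is scaled to the reference column:

import numpy as np
import pandas as pd
from pytesmo import scaling

df = pd.DataFrame(np.random.rand(100, 3),
                  columns=['insitu', 'ascat', 'model'])

# scale 'ascat' and 'model' to the first column ('insitu') using mean/std matching
scaled_df = scaling.scale(df, method='mean_std', reference_index=0)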
pytesmo.temporal_matching module¶
Created on Apr 12, 2013
Provides a temporal matching function
@author: Sebastian Hahn Sebastian.Hahn@geo.tuwien.ac.at
- pytesmo.temporal_matching.df_match(reference, *args, **kwds)[source]¶
Finds temporal matches between the reference pandas.DataFrame (index has to be datetime) and n other pandas.DataFrames (index has to be datetime).
Parameters: reference : pandas.DataFrame or pandas.TimeSeries
The index of this dataframe will be the reference.
*args : pandas.DataFrame or pandas.TimeSeries
The indices of these dataframes will be matched.
window : float
Fraction of days of the maximum pos./neg. distance allowed, i.e. the value of window represents the half-window size (e.g. window=0.5 will search for matches between -12 and +12 hours) (default: None)
dropna : boolean
Drop rows containing only NaNs (default: False)
dropduplicates : boolean
Drop duplicated temporal matches (default: False)
Returns: temporal_matched_args : pandas.DataFrame or tuple of pandas.DataFrame
DataFrame(s) with the index taken from the matched reference index
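A usage sketch (the index values and column names are made up); the second DataFrame is matched to the reference index within a +/- 12 hour window:

import numpy as np
import pandas as pd
from pytesmo import temporal_matching

ref = pd.DataFrame({'sm_ref': np.random.rand(10)},
                   index=pd.date_range('2013-01-01', periods=10, freq='D'))
other = pd.DataFrame({'sm_other': np.random.rand(30)},
                     index=pd.date_range('2013-01-01 03:00', periods=30, freq='8H'))

# window=0.5 days, i.e. matches are accepted within -12/+12 hours of each reference time
matched = temporal_matching.df_match(ref, other, window=0.5, dropna=True)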
- pytesmo.temporal_matching.matching(reference, *args, **kwargs)[source]¶
Finds temporal matches between the reference pandas.TimeSeries (index has to be datetime) and n other pandas.TimeSeries (index has to be datetime).
Parameters: reference : pandas.TimeSeries
The index of this Series will be the reference.
*args : pandas.TimeSeries
The indices of these Series will be matched.
window : float
Fraction of days of the maximum pos./neg. distance allowed, i.e. the value of window represents the half-window size (e.g. window=0.5 will search for matches between -12 and +12 hours) (default: None)
Returns: temporal_match : pandas.DataFrame
containing the index of the reference Series and a column for each of the other input Series
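A usage sketch, assuming a current pandas version where a pandas.Series with a datetime index takes the place of the pandas.TimeSeries mentioned above (the index values are made up):

import numpy as np
import pandas as pd
from pytesmo import temporal_matching

ref_ser = pd.Series(np.random.rand(10),
                    index=pd.date_range('2013-01-01', periods=10, freq='D'))
other_ser = pd.Series(np.random.rand(40),
                      index=pd.date_range('2013-01-01', periods=40, freq='6H'))

# keep matches within -12/+12 hours of each reference time (window=0.5 days)
matched = temporal_matching.matching(ref_ser, other_ser, window=0.5)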