pyemma.coordinates.transform.PCA

class pyemma.coordinates.transform.PCA(dim=-1, var_cutoff=0.95, mean=None)

Principal component analysis.

__init__(dim=-1, var_cutoff=0.95, mean=None)

Principal component analysis.

Given a sequence of multivariate data \(X_t\), computes the mean-free covariance matrix

\[C = (X - \mu)^T (X - \mu)\]

and solves the eigenvalue problem

\[C r_i = \sigma_i r_i,\]

where \(r_i\) are the principal components and \(\sigma_i\) are their respective variances.

When used as a dimension reduction method, the input data is projected onto the dominant principal components.
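The steps above can be sketched with plain NumPy. This is an illustrative sketch rather than the library's implementation: the synthetic data, the variable names and the normalization by T - 1 are assumptions (the formula above is written without a normalization factor, which only rescales the eigenvalues).

```python
import numpy as np

# Synthetic data (assumption for illustration): T=500 frames, d=3 features
# with very different spreads along each axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])

mu = X.mean(axis=0)               # per-feature mean
Xc = X - mu                       # mean-free data
C = Xc.T @ Xc / (len(X) - 1)      # covariance matrix (normalized here)

# eigh suits the symmetric matrix C; it returns eigenvalues in ascending order.
sigma, R = np.linalg.eigh(C)
order = np.argsort(sigma)[::-1]   # reorder by descending variance
sigma, R = sigma[order], R[:, order]

dim = 2
Y = Xc @ R[:, :dim]               # project onto the dominant principal components
```

The columns of `R` play the role of \(r_i\) and the entries of `sigma` the role of \(\sigma_i\); the variance of each column of `Y` equals the corresponding eigenvalue.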

Parameters:
  • dim (int, optional, default -1) – the number of dimensions (principal components) to project onto. A call to the transform function reduces the d-dimensional input to only dim dimensions such that the data preserves the maximum possible variance amongst dim-dimensional linear projections. -1 means all numerically available dimensions will be used unless reduced by var_cutoff. Setting dim to a positive value is exclusive with var_cutoff.
  • var_cutoff (float in the range [0,1], optional, default 0.95) – determines the number of output dimensions by including dimensions until their cumulative variance exceeds the fraction var_cutoff. var_cutoff=1.0 means all numerically available dimensions will be used, unless set by dim. Setting var_cutoff smaller than 1.0 is exclusive with dim.
  • mean (ndarray, optional, default None) – Optionally pass pre-calculated means to avoid their re-computation. The shape has to match the input dimension.
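How a variance cutoff translates into a number of output dimensions can be illustrated with a small sketch. The helper `n_dims_for_cutoff` and the example variances are hypothetical and not part of the library; whether the threshold is treated as inclusive is also an assumption here.

```python
from itertools import accumulate

def n_dims_for_cutoff(variances, var_cutoff):
    """Smallest number of leading components whose cumulative variance
    fraction reaches var_cutoff (hypothetical helper, not library API)."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    for n, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= var_cutoff:
            return n
    return len(ordered)

variances = [6.0, 3.0, 0.8, 0.2]                 # made-up eigenvalues
n_keep = n_dims_for_cutoff(variances, 0.95)      # 3: fractions are 0.6, 0.9, 0.98
```

With these values the first two components cover 90% of the variance, so a cutoff of 0.95 needs a third one.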

Methods

__init__([dim, var_cutoff, mean]) Principal component analysis.
describe(*args, **kwargs) Get a descriptive string representation of this class.
dimension() output dimension
fit(X, **kwargs) For compatibility with sklearn
fit_transform(X, **kwargs) For compatibility with sklearn
get_output([dimensions, stride]) Maps all input data of this transformer and returns it as an array or list of arrays.
iterator([stride, lag]) Returns an iterator that allows access to the transformed data.
map(X) Deprecated: use transform(X)
n_frames_total([stride]) Returns total number of frames.
number_of_trajectories() Returns the number of trajectories.
output_type() By default transformers return single precision floats.
parametrize([stride]) Parametrize this Transformer
register_progress_callback(call_back[, stage]) Registers the progress reporter.
trajectory_length(itraj[, stride]) Returns the length of trajectory of the requested index.
trajectory_lengths([stride]) Returns the length of each trajectory.
transform(X) Maps the input data through the transformer to correspondingly shaped output data array/list.

Attributes

chunksize chunksize defines how much data is being processed at once.
covariance_matrix
data_producer where the transformer obtains its data.
in_memory are results stored in memory?
mean
name The name of this instance
ntraj
chunksize

chunksize defines how much data is being processed at once.

data_producer

where the transformer obtains its data.

describe(*args, **kwargs)

Get a descriptive string representation of this class.

dimension()

output dimension

fit(X, **kwargs)

For compatibility with sklearn

fit_transform(X, **kwargs)

For compatibility with sklearn

get_output(dimensions=slice(0, None, None), stride=1)

Maps all input data of this transformer and returns it as an array or list of arrays.

Parameters:
  • dimensions (list-like of indexes or slice) – indices of the dimensions you would like to keep, default = all
  • stride (int) – only take every n’th frame, default = 1
Returns:

output – the mapped data, where T is the number of time steps of the input data, or if stride > 1, floor(T_in / stride). d is the output dimension of this transformer. If the input consists of a list of trajectories, the output will also be a corresponding list of trajectories.

Return type:

ndarray(T, d) or list of ndarray(T_i, d)

Notes

  • This function may be RAM intensive if stride is too small or too many dimensions are selected.
  • If the in_memory attribute is True, the results of this method are cached.

Example

plotting trajectories

>>> import pyemma.coordinates as coor 
>>> import matplotlib.pyplot as plt 

Fill with some actual data!

>>> tica = coor.tica() 
>>> trajs = tica.get_output(dimensions=(0,), stride=100) 
>>> for traj in trajs: 
...     plt.figure() 
...     plt.plot(traj[:, 0])

in_memory

are results stored in memory?

iterator(stride=1, lag=0)

Returns an iterator that allows access to the transformed data.

Parameters:
  • stride (int) – Only transform every N’th frame, default = 1
  • lag (int) – Configure the iterator such that it will return time-lagged data with a lag time of lag. If lag is used together with stride the operation will work as if the striding operation is applied before the time-lagged trajectory is shifted by lag steps. Therefore the effective lag time will be stride*lag.
Returns:

iterator – If lag = 0, a call to the .next() method of this iterator will return the pair (itraj, X) : (int, ndarray(n, m)), where itraj corresponds to the input sequence number (e.g. trajectory index) and X is the transformed data, with n = chunksize or n < chunksize at the end of the input.

If lag > 0, a call to the .next() method of this iterator will return the tuple (itraj, X, Y) : (int, ndarray(n, m), ndarray(p, m)) where itraj and X are the same as above and Y contains the time-lagged data.

Return type:

a TransformerIterator
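The interplay of stride and lag described above can be made concrete with a toy sketch. The helper `lagged_pairs` is hypothetical and works on a plain list of frame indices; it mimics only the ordering "stride first, then shift by lag" for lag > 0, not the chunked iteration.

```python
def lagged_pairs(frames, stride=1, lag=1):
    """Apply striding first, then pair each kept frame with the frame
    lag steps later in the strided sequence (hypothetical helper)."""
    strided = frames[::stride]
    n = len(strided)
    return list(zip(strided[:n - lag], strided[lag:]))

frames = list(range(20))          # frame indices stand in for real data
pairs = lagged_pairs(frames, stride=3, lag=2)
# pairs == [(0, 6), (3, 9), (6, 12), (9, 15), (12, 18)]
```

Each pair is separated by stride * lag = 6 frames of the original trajectory, which is the effective lag time.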

logger

The logger for this class instance

map(X)

Deprecated: use transform(X)

Maps the input data through the transformer to correspondingly shaped output data array/list.

n_frames_total(stride=1)

Returns total number of frames.

Parameters:stride (int) – return value is the number of frames in trajectories when running through them with a step size of stride.
Returns:n_frames_total – the total number of frames
Return type:int

name

The name of this instance

number_of_trajectories()

Returns the number of trajectories.

Returns:the number of trajectories
Return type:int

output_type()

By default transformers return single precision floats.

parametrize(stride=1)

Parametrize this Transformer

register_progress_callback(call_back, stage=0)

Registers the progress reporter.

Parameters:
  • call_back (function) –

    This function will be called with the following arguments:

    1. stage (int)
    2. instance of pyemma.utils.progressbar.ProgressBar
    3. optional *args and named keywords (**kw), for future changes
  • stage (int, optional, default=0) – The stage at which the given callback function should be fired.
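A callback matching the argument list above might look like the following sketch. The function name `report_progress` and the `calls` list are hypothetical, and no attributes of the progress bar object are assumed; the direct call at the end is only a stand-in for the call the transformer would make.

```python
calls = []

def report_progress(stage, progressbar, *args, **kw):
    # Accepts the stage index, the progress bar instance and any extra
    # arguments; only records that it was invoked.
    calls.append((stage, type(progressbar).__name__))

# Stand-in invocation for illustration. In practice the function would be
# registered on a transformer instance (here hypothetically named pca):
#   pca.register_progress_callback(report_progress, stage=0)
report_progress(0, object())
```

Accepting `*args` and `**kw` keeps the callback compatible with the optional arguments reserved for future changes.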
trajectory_length(itraj, stride=1)

Returns the length of trajectory of the requested index.

Parameters:
  • itraj (int) – trajectory index
  • stride (int) – return value is the number of frames in the trajectory when running through it with a step size of stride.
Returns:

length of the trajectory

Return type:

int

trajectory_lengths(stride=1)

Returns the length of each trajectory.

Parameters:stride (int) – return value is the number of frames of the trajectories when running through them with a step size of stride.
Returns:array containing the length of each trajectory
Return type:array(dtype=int)

transform(X)

Maps the input data through the transformer to correspondingly shaped output data array/list.

Parameters:X (ndarray(T, n) or list of ndarray(T_i, n)) – The input data, where T is the number of time steps and n is the number of dimensions. If a list is provided, the number of time steps is allowed to vary, but the number of dimensions is required to be consistent.
Returns:Y – The mapped data, where T is the number of time steps of the input data and d is the output dimension of this transformer. If called with a list of trajectories, Y will also be a corresponding list of trajectories.
Return type:ndarray(T, d) or list of ndarray(T_i, d)
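The shape contract described above (array in, array out; list in, list out) can be sketched with NumPy. The function `project`, the zero mean and the projection matrix `R` are hypothetical stand-ins for a fitted transformer, not the library's code.

```python
import numpy as np

def project(X, mean, R):
    """Hypothetical stand-in for transform: center X and project it with R."""
    if isinstance(X, (list, tuple)):
        # A list of trajectories maps to a list of the same length.
        return [project(x, mean, R) for x in X]
    return (np.asarray(X, dtype=float) - mean) @ R

n, d = 3, 2                        # input and output dimension
mean = np.zeros(n)
R = np.eye(n)[:, :d]               # keeps the first d input coordinates

Y = project(np.ones((4, n)), mean, R)                       # shape (4, 2)
Ys = project([np.ones((4, n)), np.ones((7, n))], mean, R)   # shapes (4, 2), (7, 2)
```

The number of time steps may differ between trajectories, while the input dimension n must match the projection.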