pyemma.coordinates.transform.PCA
-
class pyemma.coordinates.transform.PCA(dim=-1, var_cutoff=0.95, mean=None)
Principal component analysis.
-
__init__(dim=-1, var_cutoff=0.95, mean=None)
Principal component analysis.
Given a sequence of multivariate data \(X_t\) with mean \(\mu\), computes the mean-free covariance matrix
\[C = (X - \mu)^T (X - \mu)\]
and solves the eigenvalue problem
\[C r_i = \sigma_i r_i,\]
where \(r_i\) are the principal components and \(\sigma_i\) are their respective variances.
When used as a dimension reduction method, the input data is projected onto the dominant principal components.
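The computation above can be sketched in a few lines of NumPy. This is a minimal illustration for a single in-memory trajectory, not PyEMMA's implementation; the function name pca_sketch is hypothetical. It divides the covariance by T - 1, which rescales the \(\sigma_i\) relative to the unnormalized formula above but leaves the principal components \(r_i\) unchanged.

```python
import numpy as np

def pca_sketch(X, dim):
    """Hypothetical helper: mean-free covariance, eigendecomposition, projection."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                  # mean of each input dimension
    Xc = X - mu                          # mean-free data
    C = Xc.T @ Xc / (X.shape[0] - 1)     # covariance, normalized by T - 1
    sigma, R = np.linalg.eigh(C)         # C is symmetric; eigh returns ascending eigenvalues
    order = np.argsort(sigma)[::-1]      # reorder by decreasing variance
    sigma, R = sigma[order], R[:, order]
    Y = Xc @ R[:, :dim]                  # project onto the dim dominant components
    return Y, sigma, R

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])
Y, sigma, R = pca_sketch(X, dim=2)
```

The variance of each projected coordinate equals the corresponding eigenvalue \(\sigma_i\), which is why keeping the leading components preserves the most variance.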
Parameters:
- dim (int, optional, default -1) – the number of dimensions (principal components) to project onto. A call to transform (or the deprecated map) reduces the d-dimensional input to only dim dimensions such that the data preserves the maximum possible variance amongst dim-dimensional linear projections. -1 means all numerically available dimensions will be used unless reduced by var_cutoff. Setting dim to a positive value is exclusive with var_cutoff.
- var_cutoff (float in the range [0,1], optional, default 0.95) – Determines the number of output dimensions by including dimensions until their cumulative variance exceeds the fraction var_cutoff. var_cutoff=1.0 means all numerically available dimensions will be used, unless set by dim. Setting var_cutoff smaller than 1.0 is exclusive with dim.
- mean (ndarray, optional, default None) – Optionally pass pre-calculated means to avoid their re-computation. The shape has to match the input dimension.
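How var_cutoff translates into an output dimension can be sketched as follows, given the variances \(\sigma_i\). The helper dims_for_var_cutoff is hypothetical, and PyEMMA's exact rule at the boundary (reaching versus strictly exceeding the cutoff) is an assumption here.

```python
import numpy as np

def dims_for_var_cutoff(variances, var_cutoff=0.95):
    """Hypothetical helper: number of leading components needed to reach var_cutoff."""
    v = np.sort(np.asarray(variances, dtype=float))[::-1]  # decreasing variances
    cumvar = np.cumsum(v) / np.sum(v)                       # cumulative fraction of total variance
    # first index where the cumulative fraction reaches the cutoff, +1 to get a count;
    # clipped so round-off near 1.0 never asks for more components than exist
    k = int(np.searchsorted(cumvar, var_cutoff)) + 1
    return min(k, len(v))
```

For variances [5, 3, 1, 1] the cumulative fractions are 0.5, 0.8, 0.9 and 1.0, so the default cutoff of 0.95 keeps all four components while a cutoff of 0.85 keeps three.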
Methods
__init__([dim, var_cutoff, mean]) – Principal component analysis.
describe(*args, **kwargs) – Get a descriptive string representation of this class.
dimension() – Output dimension.
fit(X, **kwargs) – For compatibility with sklearn.
fit_transform(X, **kwargs) – For compatibility with sklearn.
get_output([dimensions, stride]) – Maps all input data of this transformer and returns it as an array or list of arrays.
iterator([stride, lag]) – Returns an iterator that gives access to the transformed data.
map(X) – Deprecated: use transform(X).
n_frames_total([stride]) – Returns the total number of frames.
number_of_trajectories() – Returns the number of trajectories.
output_type() – By default transformers return single precision floats.
parametrize([stride]) – Parametrize this Transformer.
register_progress_callback(call_back[, stage]) – Registers the progress reporter.
trajectory_length(itraj[, stride]) – Returns the length of the trajectory with the requested index.
trajectory_lengths([stride]) – Returns the length of each trajectory.
transform(X) – Maps the input data through the transformer to correspondingly shaped output data array/list.
Attributes
chunksize – chunksize defines how much data is being processed at once.
covariance_matrix
data_producer – where the transformer obtains its data.
in_memory – are results stored in memory?
mean
name – The name of this instance.
ntraj
-
chunksize
chunksize defines how much data is being processed at once.
-
data_producer
Where the transformer obtains its data.
-
describe(*args, **kwargs)
Get a descriptive string representation of this class.
-
dimension()
Returns the output dimension.
-
fit(X, **kwargs)
For compatibility with sklearn.
-
fit_transform(X, **kwargs)
For compatibility with sklearn.
-
get_output(dimensions=slice(0, None, None), stride=1)
Maps all input data of this transformer and returns it as an array or list of arrays.
Parameters:
- dimensions (list-like of indexes or slice) – indices of the dimensions you want to keep, default = all
- stride (int) – only take every n’th frame, default = 1
Returns: output – the mapped data, where T is the number of time steps of the input data, or ceil(T_in / stride) if stride > 1, and d is the output dimension of this transformer. If the input consists of a list of trajectories, output will also be a corresponding list of trajectories.
Return type: ndarray(T, d) or list of ndarray(T_i, d)
Notes
- This function may be RAM intensive if stride is too small or too many dimensions are selected.
- If the in_memory attribute is True, the results of this method are cached.
Example
Plotting trajectories:
>>> import numpy as np
>>> import pyemma.coordinates as coor
>>> import matplotlib.pyplot as plt
>>> data = [np.random.randn(10000, 3) for _ in range(2)]  # placeholder: replace with actual data
>>> tica = coor.tica(data)
>>> trajs = tica.get_output(dimensions=(0,), stride=100)
>>> for traj in trajs:
...     plt.figure()
...     plt.plot(traj[:, 0])
-
in_memory
Are results stored in memory?
-
iterator(stride=1, lag=0)
Returns an iterator that gives access to the transformed data.
Parameters:
- stride (int) – Only transform every N’th frame, default = 1
- lag (int) – Configure the iterator such that it will return time-lagged data with a lag time of lag. If lag is used together with stride, the operation will work as if the striding operation is applied before the time-lagged trajectory is shifted by lag steps. Therefore the effective lag time will be stride*lag.
Returns: iterator – If lag = 0, a call to the .next() method of this iterator will return the pair (itraj, X) : (int, ndarray(n, m)), where itraj corresponds to the input sequence number (e.g. trajectory index) and X is the transformed data, with n = chunksize, or n < chunksize at the end of the input.
If lag > 0, a call to the .next() method of this iterator will return the tuple (itraj, X, Y) : (int, ndarray(n, m), ndarray(p, m)), where itraj and X are the same as above and Y contains the time-lagged data.
Return type: a TransformerIterator
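The interplay of stride and lag described above can be illustrated with plain NumPy for one in-memory trajectory. This is a sketch of the stated semantics only, not the TransformerIterator itself: lagged_chunks is a hypothetical helper, it omits the trajectory index itraj, and its chunk boundaries are an assumption.

```python
import numpy as np

def lagged_chunks(traj, chunksize, stride=1, lag=0):
    """Hypothetical helper: yield (X, Y) chunks, Y shifted by `lag` strided steps."""
    strided = traj[::stride]                 # striding is applied first
    n = len(strided) - lag                   # frames that still have a lagged partner
    for start in range(0, n, chunksize):
        stop = min(start + chunksize, n)     # the last chunk may be shorter than chunksize
        X = strided[start:stop]
        Y = strided[start + lag:stop + lag]  # lag counted in strided steps
        yield X, Y

traj = np.arange(20).reshape(-1, 1)          # frame i holds the value i
chunks = list(lagged_chunks(traj, chunksize=3, stride=2, lag=2))
```

Because frame i holds the value i here, every pair satisfies Y - X = stride * lag = 4, matching the effective lag time stated above.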
-
logger
The logger for this class instance.
-
map(X)
Deprecated: use transform(X).
Maps the input data through the transformer to correspondingly shaped output data array/list.
-
n_frames_total(stride=1)
Returns the total number of frames.
Parameters: stride (int) – return value is the number of frames in trajectories when running through them with a step size of stride.
Returns: n_frames_total – the total number of frames
Return type: int
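The strided frame counts reported by n_frames_total and trajectory_lengths can be reproduced by hand, assuming frames 0, stride, 2*stride, … are visited, as with array slicing X[::stride]. The helper n_frames_strided is hypothetical and only mirrors that arithmetic.

```python
import math

def n_frames_strided(lengths, stride=1):
    """Hypothetical helper: frames per trajectory when stepping with `stride`, and their total."""
    per_traj = [math.ceil(T / stride) for T in lengths]  # visits frames 0, stride, 2*stride, ...
    return per_traj, sum(per_traj)
```

For example, trajectories of length 10 and 7 read with stride 3 yield 4 and 3 frames (indices 0, 3, 6, 9 and 0, 3, 6), 7 in total.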
-
name
The name of this instance.
-
number_of_trajectories()
Returns the number of trajectories.
Returns: number of trajectories
Return type: int
-
output_type()
By default transformers return single precision floats.
-
parametrize(stride=1)
Parametrize this Transformer.
-
register_progress_callback(call_back, stage=0)
Registers the progress reporter.
Parameters:
- call_back (function) – This function will be called with the following arguments:
  - stage (int)
  - instance of pyemma.utils.progressbar.ProgressBar
  - optional *args and named keywords (**kw), for future changes
- stage (int, optional, default=0) – The stage at which the given callback function should be fired.
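A callback matching the argument order listed above might look like the sketch below. The names on_progress and progress_log are hypothetical, and the ProgressBar instance is deliberately not inspected because its attributes are not documented here; accepting *args and **kw keeps the callback working if extra arguments are added later.

```python
progress_log = []  # hypothetical store for observed stages

def on_progress(stage, progressbar, *args, **kw):
    """Hypothetical callback: record the stage it was fired for."""
    # progressbar is expected to be a pyemma.utils.progressbar.ProgressBar; unused here
    progress_log.append(stage)

# Registration on a PCA instance `pca` (assumed to exist) would then read:
# pca.register_progress_callback(on_progress, stage=0)
```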
-
trajectory_length(itraj, stride=1)
Returns the length of the trajectory with the requested index.
Parameters:
- itraj (int) – trajectory index
- stride (int) – return value is the number of frames in the trajectory when running through it with a step size of stride.
Returns: length of the trajectory
Return type: int
-
trajectory_lengths(stride=1)
Returns the length of each trajectory.
Parameters: stride (int) – return value is the number of frames of the trajectories when running through them with a step size of stride.
Returns: an array containing the length of each trajectory
Return type: array(dtype=int)
-
transform(X)
Maps the input data through the transformer to correspondingly shaped output data array/list.
Parameters: X (ndarray(T, n) or list of ndarray(T_i, n)) – The input data, where T is the number of time steps and n is the number of dimensions. If a list is provided, the number of time steps is allowed to vary, but the number of dimensions is required to be consistent.
Returns: Y – The mapped data, where T is the number of time steps of the input data and d is the output dimension of this transformer. If called with a list of trajectories, Y will also be a corresponding list of trajectories.
Return type: ndarray(T, d) or list of ndarray(T_i, d)
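The shape contract of transform (single array in, single array out; list in, list out with per-trajectory lengths preserved) can be sketched for a linear projection onto fitted eigenvectors. The helper project and its arguments mean and eigvecs are hypothetical stand-ins for a fitted model, not PyEMMA attributes.

```python
import numpy as np

def project(X, mean, eigvecs):
    """Hypothetical helper: project ndarray(T, n) or a list of ndarray(T_i, n) onto eigvecs (n, d)."""
    if isinstance(X, (list, tuple)):
        # each trajectory keeps its own length T_i; all must share the same n
        return [project(x, mean, eigvecs) for x in X]
    X = np.asarray(X, dtype=float)
    if X.shape[1] != eigvecs.shape[0]:
        raise ValueError("input dimension does not match the fitted model")
    return (X - mean) @ eigvecs  # shape (T, d)

rng = np.random.default_rng(1)
trajs = [rng.normal(size=(100, 4)), rng.normal(size=(37, 4))]
W = np.eye(4)[:, :2]             # toy "eigenvectors": keep the first two coordinates
Y = project(trajs, np.zeros(4), W)
```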
-