pyemma.coordinates.data.DataInMemory

class pyemma.coordinates.data.DataInMemory(data, chunksize=5000)

Multi-dimensional data fully stored in memory.

Used to pass arbitrary coordinates to a pipeline. Data is flattened to two dimensions to ensure compatibility.

Parameters: data (ndarray (nframe, ndim) or list of ndarrays (nframe, ndim)) – Data has to be either a single 2d array, which stores the number of frames in its first dimension and the coordinates/features in its second dimension, or a list of such arrays.

__init__(data, chunksize=5000)
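A minimal construction sketch (not taken from the source); the random arrays and their shapes are made-up placeholders for real coordinate data:

>>> import numpy as np
>>> from pyemma.coordinates.data import DataInMemory

>>> X = np.random.rand(1000, 3)    # one trajectory: 1000 frames, 3 features
>>> reader = DataInMemory(X)

>>> # a list of 2d arrays is treated as several trajectories
>>> reader = DataInMemory([np.random.rand(1000, 3), np.random.rand(500, 3)])
>>> reader.number_of_trajectories()
2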
Methods

__init__(data[, chunksize])
describe()
dimension() – Returns the number of output dimensions.
fit(X, **kwargs) – For compatibility with sklearn.
fit_transform(X, **kwargs) – For compatibility with sklearn.
get_output([dimensions, stride]) – Maps all input data of this transformer and returns it as an array or list of arrays.
iterator([stride, lag]) – Returns an iterator that allows access to the transformed data.
load_from_files(files) – Construct this by loading all files into memory.
map(X) – Deprecated: use transform(X).
n_frames_total([stride]) – Returns the total number of frames, over all trajectories.
number_of_trajectories() – Returns the number of trajectories.
output_type() – By default transformers return single precision floats.
parametrize([stride]) – Parametrize this Transformer.
register_progress_callback(call_back[, stage]) – Registers the progress reporter.
trajectory_length(itraj[, stride]) – Returns the length of a trajectory.
trajectory_lengths([stride]) – Returns the length of each trajectory.
transform(X)

Attributes

chunksize – chunksize defines how much data is being processed at once.
data_producer – Where the transformer obtains its data.
in_memory – Are results stored in memory?
name – The name of this instance.
ntraj
chunksize
chunksize defines how much data is being processed at once.

data_producer
Where the transformer obtains its data.

dimension()
Returns the number of output dimensions.
Returns: the number of output dimensions

fit(X, **kwargs)
For compatibility with sklearn.

fit_transform(X, **kwargs)
For compatibility with sklearn.
get_output(dimensions=slice(0, None, None), stride=1)
Maps all input data of this transformer and returns it as an array or list of arrays.

Parameters:
- dimensions (list-like of indexes or slice) – indices of the dimensions you would like to keep, default = all
- stride (int) – only take every n’th frame, default = 1

Returns: output – the mapped data, where T is the number of time steps of the input data or, if stride > 1, floor(T_in / stride). d is the output dimension of this transformer. If the input consists of a list of trajectories, the output will also be a corresponding list of trajectories.
Return type: ndarray(T, d) or list of ndarray(T_i, d)

Notes:
- This function may be RAM intensive if stride is too small or too many dimensions are selected.
- If the in_memory attribute is True, the results of this method are cached.
Example

Plotting trajectories:

>>> import pyemma.coordinates as coor
>>> import matplotlib.pyplot as plt

Fill with some actual data!

>>> tica = coor.tica()
>>> trajs = tica.get_output(dimensions=(0,), stride=100)
>>> for traj in trajs:
...     plt.figure()
...     plt.plot(traj[:, 0])
in_memory
Are results stored in memory?
iterator(stride=1, lag=0)
Returns an iterator that allows access to the transformed data.

Parameters:
- stride (int) – Only transform every N’th frame, default = 1
- lag (int) – Configure the iterator such that it will return time-lagged data with a lag time of lag. If lag is used together with stride, the operation will work as if the striding operation were applied before the time-lagged trajectory is shifted by lag steps. Therefore the effective lag time will be stride*lag.

Returns: iterator – If lag = 0, a call to the .next() method of this iterator will return the pair (itraj, X) : (int, ndarray(n, m)), where itraj corresponds to the input sequence number (e.g. trajectory index) and X is the transformed data, with n = chunksize or n < chunksize at the end of the input.
If lag > 0, a call to the .next() method of this iterator will return the tuple (itraj, X, Y) : (int, ndarray(n, m), ndarray(p, m)), where itraj and X are the same as above and Y contains the time-lagged data.
Return type: a TransformerIterator
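A hedged sketch of consuming the iterator for the hypothetical reader constructed earlier, assuming the returned TransformerIterator supports the usual Python iteration protocol (the documented access is via .next()); only the (itraj, X) / (itraj, X, Y) tuple structure described above is relied on:

>>> # plain pass over the data, every 10th frame only
>>> for itraj, X in reader.iterator(stride=10):
...     print(itraj, X.shape)    # chunks of at most `chunksize` frames

>>> # with lag > 0 each item also carries the time-lagged chunk Y
>>> for itraj, X, Y in reader.iterator(stride=1, lag=5):
...     assert X.shape[1] == Y.shape[1]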
classmethod load_from_files(files)
Construct this by loading all files into memory.

Parameters: files (str or list of str) – filenames to read from
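A usage sketch; the filenames are hypothetical, and it is assumed here that they are NumPy files holding (nframe, ndim) arrays:

>>> # hypothetical .npy files, each containing one (nframe, ndim) array
>>> reader = DataInMemory.load_from_files(['traj_a.npy', 'traj_b.npy'])
>>> reader.number_of_trajectories()
2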
logger
The logger for this class instance.

map(X)
Deprecated: use transform(X).
Maps the input data through the transformer to a correspondingly shaped output data array/list.
n_frames_total(stride=1)
Returns the total number of frames, over all trajectories.

Parameters: stride – return value is the number of frames in the trajectories when running through them with a step size of stride
Returns: the total number of frames, over all trajectories
name
The name of this instance.

number_of_trajectories()
Returns the number of trajectories.
Returns: number of trajectories

output_type()
By default transformers return single precision floats.

parametrize(stride=1)
Parametrize this Transformer.
register_progress_callback(call_back, stage=0)
Registers the progress reporter.

Parameters:
- call_back (function) – This function will be called with the following arguments:
  - stage (int)
  - instance of pyemma.utils.progressbar.ProgressBar
  - optional *args and named keywords (**kw), for future changes
- stage (int, optional, default=0) – The stage at which you want the given callback function to be fired.
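A hedged callback sketch based only on the argument list described above; what the callback does with the ProgressBar instance is left as a placeholder:

>>> def report(stage, progress_bar, *args, **kw):
...     # ProgressBar internals are not documented here; just note the stage
...     print('progress in stage', stage)
>>> reader.register_progress_callback(report, stage=0)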
trajectory_length(itraj, stride=1)
Returns the length of a trajectory.

Parameters:
- itraj – trajectory index
- stride – return value is the number of frames in the trajectory when running through it with a step size of stride

Returns: length of trajectory

trajectory_lengths(stride=1)
Returns the length of each trajectory.

Parameters: stride – return value is the number of frames in the trajectories when running through them with a step size of stride
Returns: numpy array containing the length of each trajectory