pyemma.msm.timescales_hmsm
pyemma.msm.timescales_hmsm(dtrajs, nstates, lags=None, nits=None, reversible=True, connected=True, errors=None, nsamples=100, n_jobs=1, show_progress=True)
Calculate implied timescales from Hidden Markov state models estimated at a series of lag times.
Warning: this can be slow!
Parameters: - dtrajs (array-like or list of array-likes) – discrete trajectories
- nstates (int) – number of hidden states
- lags (array-like of integers (optional)) – integer lag times at which the implied timescales will be calculated
- nits (int (optional)) – number of implied timescales to be computed. Fewer will be computed if the number of states is smaller. None means the number of timescales will be determined automatically.
- connected (boolean (optional)) – If true compute the connected set before transition matrix estimation at each lag separately
- reversible (boolean (optional)) – Estimate transition matrix reversibly (True) or nonreversibly (False)
- errors (None | 'bayes') – Specifies whether to compute statistical uncertainties (by default not), and which algorithm to use if yes. The only option is currently 'bayes'. This algorithm is much faster than MSM-based error calculation because the involved matrices are much smaller.
- nsamples (int) – Number of approximately independent HMSM samples generated for each lag time for uncertainty quantification. Only used if errors is not None.
- n_jobs (int, default = 1) – how many subprocesses to start to estimate the models for each lag time.
- show_progress (bool, default=True) – Show progressbars for calculation?
Returns: itsobj
Return type: ImpliedTimescales object
See also
ImpliedTimescales()
- The object returned by this function.
pyemma.plots.plot_implied_timescales()
- Plotting function for the ImpliedTimescales object
Example
>>> from pyemma import msm
>>> import numpy as np
>>> np.set_printoptions(precision=3)
>>> dtraj = [0,1,1,2,2,2,1,2,2,2,1,0,0,1,1,1,2,2,1,1,2,1,1,0,0,0,1,1,2,2,1]  # mini-trajectory
>>> ts = msm.timescales_hmsm(dtraj, 2, [1,2,3,4,5])
>>> print(ts.timescales)
[[ 1.691]
 [ 7.184]
 [ 2.037]
 [ 41.015]
 [ 10.853]]
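As background for reading these numbers, the following is a minimal NumPy sketch of the standard implied-timescale relation t_i = -tau / ln|lambda_i|, where lambda_i is an eigenvalue of a transition matrix estimated at lag time tau. It does not use PyEMMA and does not estimate a hidden Markov model; the helper name `implied_timescales` and the 2x2 matrix are assumptions made for illustration only.

```python
import numpy as np

def implied_timescales(T, lag, nits=None):
    """Illustrative helper (not part of PyEMMA): t_i = -lag / ln|lambda_i|."""
    eigvals = np.linalg.eigvals(np.asarray(T, dtype=float))
    # Sort eigenvalue magnitudes in descending order and skip the first one,
    # which equals 1 for a row-stochastic matrix (stationary process).
    mags = np.sort(np.abs(eigvals))[::-1][1:]
    if nits is not None:
        mags = mags[:nits]
    return -lag / np.log(mags)

# Hypothetical row-stochastic transition matrix at lag 1.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

ts_lag1 = implied_timescales(T, lag=1)
# For an exactly Markovian process, T(2*tau) = T(tau)^2, so the implied
# timescale does not depend on the lag time.
ts_lag2 = implied_timescales(np.linalg.matrix_power(T, 2), lag=2)
print(ts_lag1, ts_lag2)
```

Timescales that stay constant as the lag time grows are what an implied-timescales plot is checked for; the sketch reproduces that behaviour for an ideal Markov chain.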
-
class pyemma.msm.estimators.implied_timescales.ImpliedTimescales(estimator, lags=None, nits=None, n_jobs=1, show_progress=True)
Implied timescales for a series of lag times.
Methods
estimate(X, **params) – Estimates the model given the data X
fit(X) – Estimates parameters - for compatibility with sklearn.
get_params([deep]) – Get parameters for this estimator.
get_sample_conf([conf, process]) – Returns the confidence interval that contains alpha % of the sample data
get_sample_mean([process]) – Returns the sample means of implied timescales.
get_sample_std([process]) – Returns the standard error of implied timescales.
get_timescales([process]) – Returns the implied timescale estimates
register_progress_callback(call_back[, stage]) – Registers the progress reporter.
set_params(**params) – Set the parameters of this estimator.
Attributes
estimators – Returns the estimators for all lagtimes.
fraction_of_frames – Returns the fraction of frames used to compute the count matrix at each lagtime.
lags – Return the list of lag times for which timescales were computed.
lagtimes – Return the list of lag times for which timescales were computed.
logger – The logger for this class instance
model – The model estimated by this Estimator
models – Returns the models for all lagtimes.
name – The name of this instance
number_of_timescales – Return the number of timescales.
sample_mean – Returns the sample means of implied timescales.
sample_std – Returns the standard error of implied timescales.
samples_available – Returns True if samples are available and thus sample means, standard errors and confidence intervals can be obtained
timescales – Returns the implied timescale estimates
-
estimate(X, **params)
Estimates the model given the data X
Parameters: - X (object) – A reference to the data from which the model will be estimated
- **params – Estimation parameter values; these must be parameters of the __init__ method of this estimator. The present settings will overwrite the settings of parameters given in the __init__ method, i.e. the parameter values after this call will be those that have been used for this estimation. Use this option if only one or a few parameters change with respect to the __init__ settings for this run, and if you don't need to remember the original settings of these changed parameters.
Returns: model – The estimated model.
Return type: object
-
estimators
Returns the estimators for all lagtimes.
-
fit(X)
Estimates parameters - for compatibility with sklearn.
Parameters: X (object) – A reference to the data from which the model will be estimated
Returns: model – The estimated model.
Return type: object
-
fraction_of_frames
Returns the fraction of frames used to compute the count matrix at each lagtime.
Notes
In a list of discrete trajectories with varying lengths, the estimation at longer lagtimes will mean discarding some trajectories for which not even one count can be computed. This function returns the fraction of frames that was actually used in computing the count matrix.
Be aware: this fraction refers to the full count matrix, and not that of the largest connected set. Hence, the output is not necessarily the active fraction. For that, use the EstimatedMSM.active_count_fraction() function of the EstimatedMSM class object.
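The note above can be illustrated with a small NumPy sketch. It assumes that a trajectory contributes to the count matrix at a given lag only if it is longer than that lag (so that at least one count can be formed), and that all frames of such a trajectory are counted as used. The helper name and the trajectory lengths are hypothetical, and PyEMMA's internal bookkeeping may differ in detail.

```python
import numpy as np

def fraction_of_frames_sketch(traj_lengths, lags):
    """Illustrative helper (not PyEMMA code): fraction of frames usable per lag."""
    lengths = np.asarray(traj_lengths)
    total = lengths.sum()
    # Assumption: a trajectory is usable at lag tau only if len(traj) > tau.
    return np.array([lengths[lengths > lag].sum() / total for lag in lags])

# Three hypothetical trajectories with 100, 50 and 10 frames.
fractions = fraction_of_frames_sketch([100, 50, 10], lags=[1, 10, 60])
print(fractions)
```

With these lengths, every frame is used at lag 1, the 10-frame trajectory drops out at lag 10, and only the 100-frame trajectory remains at lag 60.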
-
get_params(deep=True)
Get parameters for this estimator.
Parameters: deep (bool) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any
-
get_sample_conf(conf=0.95, process=None)
Returns the confidence interval that contains a fraction conf of the sample data (e.g. 95 % for conf = 0.95).
Parameters: conf (float, default = 0.95) – the confidence level. Use:
- conf = 0.6827 for 1-sigma confidence interval
- conf = 0.9545 for 2-sigma confidence interval
- conf = 0.9973 for 3-sigma confidence interval
Returns: (L,R) – lower and upper timescales bounding the confidence interval
- if process is None, will return two (l x k) arrays, where l is the number of lag times and k is the number of computed timescales.
- if process is an integer, will return two (l)-arrays with the selected process time scale for every lag time
Return type: (float[],float[]) or (float[][],float[][])
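To make the meaning of conf and the shapes of (L, R) concrete, here is a minimal NumPy sketch that computes a central percentile interval from an array of sampled timescales. This is an assumption about one reasonable way to form such an interval, not a statement of how PyEMMA computes it internally; the function name `sample_conf_sketch` and the synthetic normally distributed samples are purely illustrative.

```python
import numpy as np

def sample_conf_sketch(samples, conf=0.95):
    """Illustrative helper (not PyEMMA code): central interval over axis 0.

    samples has shape (nsamples, l, k): one (l x k) timescale array per sample.
    """
    samples = np.asarray(samples, dtype=float)
    tail = 100.0 * (1.0 - conf) / 2.0  # percentage cut off on each side
    lower = np.percentile(samples, tail, axis=0)
    upper = np.percentile(samples, 100.0 - tail, axis=0)
    return lower, upper

# Synthetic stand-in for sampled timescales: 1000 samples, l = 5 lags, k = 2 processes.
rng = np.random.default_rng(0)
samples = rng.normal(loc=10.0, scale=1.0, size=(1000, 5, 2))

L, R = sample_conf_sketch(samples, conf=0.6827)  # 1-sigma interval
print(L.shape, R.shape)
```

Selecting a single process from these arrays, e.g. `L[:, 0]` and `R[:, 0]`, gives the two (l)-arrays described for an integer process argument.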
-
get_sample_mean(process=None)
Returns the sample means of implied timescales. Need to generate the samples first, e.g. by calling bootstrap.
Parameters: process (int or None, default = None) – index in [0:n-1] referring to the process whose timescale will be returned. By default, process = None and all computed process timescales will be returned.
Returns: - if process is None, will return a (l x k) array, where l is the number of lag times and k is the number of computed timescales.
- if process is an integer, will return a (l) array with the selected process time scale for every lag time
-
get_sample_std(process=None)
Returns the standard error of implied timescales. Need to generate the samples first, e.g. by calling bootstrap.
Parameters: process (int or None, default = None) – index in [0:n-1] referring to the process whose timescale will be returned. By default, process = None and all computed process timescales will be returned.
Returns: - if process is None, will return a (l x k) array, where l is the number of lag times and k is the number of computed timescales.
- if process is an integer, will return a (l) array with the selected process time scale for every lag time
-
get_timescales(process=None)
Returns the implied timescale estimates
Parameters: process (int or None, default = None) – index in [0:n-1] referring to the process whose timescale will be returned. By default, process = None and all computed process timescales will be returned.
Returns: - if process is None, will return a (l x k) array, where l is the number of lag times and k is the number of computed timescales.
- if process is an integer, will return a (l) array with the selected process time scale for every lag time
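The shape convention can be sketched with plain NumPy indexing. The array below reuses the values printed in the timescales_hmsm example (l = 5 lag times, k = 1 timescale); the helper `select_process` is a hypothetical stand-in that only mimics the documented return shapes and is not PyEMMA code.

```python
import numpy as np

# Values from the timescales_hmsm example: one row per lag time, one column per process.
timescales = np.array([[1.691],
                       [7.184],
                       [2.037],
                       [41.015],
                       [10.853]])

def select_process(ts, process=None):
    """Illustrative helper: mimic the documented (l x k) vs. (l) return shapes."""
    if process is None:
        return ts              # all processes: shape (l, k)
    return ts[:, process]      # one process at every lag time: shape (l,)

all_ts = select_process(timescales)
slowest = select_process(timescales, process=0)
print(all_ts.shape, slowest.shape)
```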
-
lags
Return the list of lag times for which timescales were computed.
-
lagtimes
Return the list of lag times for which timescales were computed.
-
logger
The logger for this class instance
-
model
The model estimated by this Estimator
-
models
Returns the models for all lagtimes.
-
name
The name of this instance
-
number_of_timescales
Return the number of timescales.
-
register_progress_callback(call_back, stage=0)
Registers the progress reporter.
Parameters: - call_back (function) – This function will be called with the following arguments:
  - stage (int)
  - instance of pyemma.utils.progressbar.ProgressBar
  - optional *args and named keywords (**kw), for future changes
- stage (int, optional, default=0) – The stage at which the given callback function should be fired.
-
sample_mean
Returns the sample means of implied timescales. Need to generate the samples first, e.g. by calling bootstrap.
Returns: timescales – mean timescales for all processes and lag times. l is the number of lag times and k is the number of computed timescales.
Return type: ndarray((l x k), dtype=float)
-
sample_std
Returns the standard error of implied timescales. Need to generate the samples first, e.g. by calling bootstrap.
Returns: timescales – standard deviations of timescales for all processes and lag times. l is the number of lag times and k is the number of computed timescales.
Return type: ndarray((l x k), dtype=float)
-
samples_available
Returns True if samples are available and thus sample means, standard errors and confidence intervals can be obtained
-
set_params(**params)
Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Returns: self
-
timescales
Returns the implied timescale estimates
Returns: timescales – timescales for all processes and lag times. l is the number of lag times and k is the number of computed timescales.
Return type: ndarray((l x k), dtype=float)
-
References
Implied timescales as a lagtime-selection and MSM-validation approach were suggested in [1]. Hidden Markov state model estimation is done here as described in [2]. For uncertainty quantification we employ the Bayesian sampling algorithm described in [3].
[1] W. C. Swope, J. W. Pitera and F. Suits: Describing protein folding kinetics by molecular dynamics simulations: 1. Theory. J. Phys. Chem. B 108, 6571-6581 (2004)
[2] F. Noe, H. Wu, J.-H. Prinz and N. Plattner: Projected and hidden Markov models for calculating kinetics and metastable states of complex molecules. J. Chem. Phys. 139, 184114 (2013)
[3] J. D. Chodera et al.: Bayesian hidden Markov model analysis of single-molecule force spectroscopy: Characterizing kinetics under measurement uncertainty. arXiv:1108.1430 (2011)