Overview of momi2

Before assembling the watch, lay out every part and understand what it does.

../../_images/fig_mini_momi2.png

momi2 at a glance. Panel A: The W-matrix (Polanski-Kimmel coefficients) as a heatmap – the combinatorial bridge between sample configurations and expected branch lengths. Panel B: Moran model transition dynamics showing how a delta distribution at frequency \(k\) spreads over time. Panel C: Admixture tensor – expected migrating lineage counts as a function of admixture fraction. Panel D: Expected time spent with \(j\) lineages under constant-size epochs with different population sizes \(N\).

What Does momi2 Do?

Given DNA sequence data from one or more populations, momi2 infers the demographic history that produced the observed patterns of genetic variation.

\[\text{Input: } \mathbf{D} = \text{observed SFS from } k \text{ populations}\]
\[\text{Output: } \hat{\boldsymbol{\Theta}} = \text{best-fit demographic parameters (sizes, times, admixture fractions)}\]

Concretely, momi2 asks: What history of population size changes, splits, and admixture events would produce a frequency spectrum that looks like the one we observe?

Think of a fine mechanical watch. A watchmaker can deduce every gear ratio inside the case by studying the positions of the hands on the dial. momi2 works the same way: it computes the expected hand positions (the expected SFS) for a given gear configuration (a demographic model), then adjusts the gears until the hands match what is observed. But unlike moments, which turns the gears forward in time to see where the hands end up, momi2 traces the hands backward through the coalescent to figure out what gear configuration must have produced them.

From the inferred history, you learn:

  • Population sizes through time – expansions, contractions, bottlenecks

  • Divergence times – when populations split from each other

  • Admixture fractions – the proportion of ancestry contributed by each source during a pulse admixture event

  • Growth rates – exponential growth or decline within epochs

How momi2 Differs from moments and dadi

All three tools infer demography from the SFS. The difference is how they compute the expected SFS under a given model.

Property

dadi

moments

momi2

Direction

Forward (diffusion PDE)

Forward (moment ODEs)

Backward (coalescent)

Core equation

Wright-Fisher diffusion PDE on a frequency grid

System of ODEs for SFS entries

Tensor products over a demographic event tree

Population model

Continuous diffusion

Continuous diffusion (via moments)

Moran model (discrete)

Gradient computation

Finite differences

Analytic or finite differences

Automatic differentiation (autograd)

Multi-population scaling

Grid grows as \(O(n^D)\) – expensive beyond 3 populations

ODE system grows polynomially – practical for 4–5 populations

Tensor operations along event tree – scales well for many populations

Selection

Yes

Yes

No (neutral models only)

The key trade-off

momi2 gives up selection modeling in exchange for two advantages: exact gradients via automatic differentiation (enabling faster, more reliable optimization) and a coalescent formulation that scales more naturally to many populations. If you need selection, use moments or dadi. If you have many populations and neutral models, momi2 is often the better choice.

The Key Innovation: Tensors + Autograd

The central insight of momi2 is that the expected SFS under any demographic model can be written as a sequence of tensor operations – matrix multiplications, convolutions, and antidiagonal summations – applied to likelihood tensors that track allele configurations across populations.

Each operation corresponds to a demographic event:

  • Moran transition (a period of constant or exponential population size): multiply the likelihood tensor along the relevant axis by the Moran transition matrix \(P(t) = V e^{\Lambda t} V^{-1}\)

  • Population merge (a split event, viewed backward in time): convolve two population axes into one

  • Admixture pulse: contract the likelihood tensor with a 3-tensor encoding the binomial mixing of lineages

Because every one of these operations is differentiable, the Python library autograd can compute exact gradients of the log-likelihood with respect to all demographic parameters by backpropagating through the entire computation graph. No hand-coded Jacobians, no finite-difference approximations, no truncation error.

The Moran Model Connection

Within each epoch (a period between demographic events), momi2 needs to compute how the distribution of lineage counts evolves over time. It uses the Moran model – a continuous-time Markov chain on the number of derived alleles in a sample of size \(n\).

The Moran model has a tridiagonal rate matrix with a known eigendecomposition. This means the transition matrix for any time \(t\) can be computed in closed form:

\[P(t) = V \, e^{\Lambda t} \, V^{-1}\]

where \(V\) is the matrix of eigenvectors and \(\Lambda\) is the diagonal matrix of eigenvalues. No numerical ODE integration is needed – just a single matrix operation per epoch.

For epochs with exponential growth, the time is rescaled by integrating the inverse population size: \(\tau = \int_0^T 2/N(s)\, ds\), and the expected coalescence times are computed via closed-form expressions involving exponential integrals.

Parameters

Parameter

Meaning

Typical units

\(N_e\)

Effective population size (reference)

Individuals

\(N_i(t)\)

Population size of deme \(i\) at time \(t\)

Individuals (or scaled by \(N_e\))

\(T_{\text{split}}\)

Time of population split (backward from present)

Generations (or scaled: \(T / 4N_e\))

\(f\)

Admixture fraction in a pulse event

Proportion (0 to 1)

\(g\)

Exponential growth rate within an epoch

Per generation

\(\mu\)

Per-site per-generation mutation rate

\(\sim 10^{-8}\)

The Flow in Detail

Step 1: Observed SFS
    Count derived allele frequencies across k populations
    from SNP data. Result: a k-dimensional histogram.

Step 2: Demographic model
    Specify a population tree: which populations split when,
    with what sizes, growth rates, and admixture events.

Step 3: Expected SFS via tensor computation
    Traverse the event tree bottom-up (leaves to root):
    - At each leaf: initialize a likelihood tensor
    - At each epoch boundary: apply Moran transition matrix
    - At each split: convolve two population tensors
    - At each pulse: apply admixture 3-tensor
    Accumulate the expected SFS along the way.

Step 4: Likelihood computation
    Compare observed and expected SFS via composite
    log-likelihood (multinomial or Poisson).

Step 5: Optimization
    Compute gradients via autograd, then optimize
    parameters using TNC, L-BFGS-B, ADAM, or SVRG.

Step 6: Uncertainty quantification
    Use the autograd Hessian or bootstrap to estimate
    confidence intervals on inferred parameters.

Ready to Build

The next four chapters disassemble momi2 gear by gear:

  • The Coalescent SFSThe dial: how expected branch lengths in the coalescent translate into expected SFS entries, and how demographic events modify these expectations.

  • The Moran ModelThe escapement: the Moran model as a continuous-time Markov chain, its eigendecomposition, and how it governs lineage dynamics within each epoch.

  • Tensor MachineryThe gear train: likelihood tensors, convolution for population merges, the junction tree algorithm, and how the full SFS is assembled from parts.

  • Automatic Differentiation & InferenceThe mainspring: autograd-powered optimization, composite likelihoods, stochastic gradient methods, and uncertainty quantification.