The Moran Model
The escapement of the watch: a discrete population model whose eigendecomposition lets us tell time exactly.
“The Moran model provides a tractable Markov chain whose eigensystem is known in closed form.”
—Kamm, Terhorst, Song, and Durbin (2017)
Biology Aside – Why a population model?
The central question of demographic inference is: given DNA from living
individuals, what was the population’s history? To answer this, we need a
mathematical model that predicts how allele frequencies – the proportions
of different genetic variants in the population – change over time. A
variant that exists in 5% of chromosomes today may have been at 2% a
thousand generations ago, or it may not have existed at all. The Moran
model gives us a precise, tractable way to compute the probability of any
such frequency trajectory. It is the mathematical engine inside momi2
that converts a demographic scenario (population sizes, split times,
migration rates) into a predicted site frequency spectrum that can be
compared to real data.
Step 1: The Moran Model as a Continuous-Time Markov Chain
The Moran model describes how allele frequencies change in a finite population of size \(n\). Unlike the Wright-Fisher model (which has discrete, non-overlapping generations), the Moran model operates in continuous time: at rate proportional to \(n\), one individual is chosen to reproduce and one to die.
The state of the system is the number of derived alleles \(i \in \{0, 1, \ldots, n\}\). The transition rates are:

\[
q(i, i+1) = q(i, i-1) = \frac{i(n-i)}{2}, \qquad i = 1, \ldots, n-1.
\]

The factor \(i(n-i)\) counts the ordered ways to pick one derived and one ancestral individual; the division by 2 reflects that either member of the pair is equally likely to be the parent, so the frequency is equally likely to move up or down.
Biology Aside – Derived and ancestral alleles
At any position in the genome, chromosomes carry one of two variants: the ancestral allele (inherited from a distant common ancestor of all samples) and the derived allele (introduced by a mutation at some point in the past). Knowing which allele is ancestral typically requires an outgroup species – for example, chimpanzee sequence is used to polarize human variants. The number of derived copies \(i\) in a sample of \(n\) chromosomes is what the site frequency spectrum (SFS) tabulates. The Moran model tracks exactly this quantity – how \(i\) fluctuates over evolutionary time due to the randomness of reproduction and death (genetic drift).
Probability Aside – Moran vs. Wright-Fisher
The Moran model and the Wright-Fisher model are different discrete models
that converge to the same diffusion limit as \(n \to \infty\). The key
advantage of the Moran model for momi2 is that its rate matrix has a
known eigendecomposition, which allows exact computation of transition
probabilities for any time \(t\). The Wright-Fisher model, being a
discrete-time chain with binomial transitions, does not have such a clean
eigensystem.
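To make the contrast concrete, here is a minimal sketch (not from momi2; the function name wright_fisher_matrix is ours, for illustration) of the Wright-Fisher one-generation transition matrix. Its rows are binomial distributions, and multi-generation transitions require repeated numerical matrix powers rather than a closed-form exponential:

import numpy as np
from scipy.stats import binom

def wright_fisher_matrix(n):
    """One-generation Wright-Fisher transition matrix (illustrative).

    Row i is Binomial(n, i/n): each of the n offspring draws its parent
    uniformly at random, so allele counts resample binomially each generation.
    """
    i = np.arange(n + 1)
    p = i / n
    # P_wf[i, j] = P(j derived copies next generation | i copies now)
    return binom.pmf(i[None, :], n, p[:, None])

n = 10
P_wf = wright_fisher_matrix(n)
assert np.allclose(P_wf.sum(axis=1), 1.0)

# Transitions over many generations need repeated numerical matrix powers;
# there is no clean closed form analogous to the Moran P(t) derived below.
P_wf_100 = np.linalg.matrix_power(P_wf, 100)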
Step 2: The Rate Matrix
The rate matrix \(Q\) is an \((n+1) \times (n+1)\) tridiagonal matrix with entries

\[
Q_{i,\,i+1} = \frac{i(n-i)}{2}, \qquad Q_{i,\,i-1} = \frac{i(n-i)}{2}, \qquad Q_{i,\,i} = -\,i(n-i),
\]

and zeros elsewhere. States 0 (all ancestral) and \(n\) (all derived) are absorbing – their rows are all zeros. This makes sense: once an allele fixes or is lost, there is no further change.
Biology Aside – Fixation and loss
In a real population, most new mutations are eventually lost – they drift to zero copies and disappear. A small fraction reach fixation – every chromosome carries the derived allele, and the variant becomes invisible to the SFS (it no longer varies). The absorbing boundaries at states 0 and \(n\) capture exactly this: lineages that leave the polymorphic frequency range never return. Only the transient frequencies \(1, 2, \ldots, n-1\) contribute to observable genetic variation. The rate at which alleles transit through these intermediate frequencies depends on the population size history – which is precisely what demographic inference aims to recover.
import numpy as np

def moran_rate_matrix(n):
    """Construct the Moran model rate matrix for sample size n.

    Returns a (n+1) x (n+1) tridiagonal matrix Q where:
    - Q[i, i+1] = i*(n-i)/2   (gain one derived copy)
    - Q[i, i-1] = i*(n-i)/2   (lose one derived copy)
    - Q[i, i]   = -i*(n-i)    (total departure rate)
    """
    i = np.arange(n + 1, dtype=float)
    off_diag = i * (n - i) / 2.0
    diag = -2.0 * off_diag
    Q = (np.diag(off_diag[:-1], k=1)     # superdiagonal: rates out of rows 0..n-1
         + np.diag(diag, k=0)
         + np.diag(off_diag[1:], k=-1))  # subdiagonal: rates out of rows 1..n
    return Q
# Verification: rows sum to zero (a property of all rate matrices)
n = 10
Q = moran_rate_matrix(n)
row_sums = Q.sum(axis=1)
assert np.allclose(row_sums, 0), f"Row sums: {row_sums}"

# Verification: states 0 and n are absorbing (their rows are all zeros)
assert np.allclose(Q[0], 0) and np.allclose(Q[n], 0)
Step 3: Eigendecomposition
Although \(Q\) is not symmetric (the absorbing boundary rows break detailed balance; e.g. \(Q_{1,2} \ne Q_{2,1}\)), it is diagonalizable with a complete set of real eigenvalues and linearly independent eigenvectors:

\[
Q = V \Lambda V^{-1},
\]

where \(\Lambda = \text{diag}(\lambda_0, \lambda_1, \ldots, \lambda_n)\) and \(V\) is the matrix of right eigenvectors.
The eigenvalues of the Moran rate matrix are:

\[
\lambda_j = -\binom{j}{2} = -\frac{j(j-1)}{2}, \qquad j = 0, 1, \ldots, n.
\]
Where does this formula come from? Recall from the coalescent that \(\binom{j}{2} = j(j-1)/2\) is the rate at which \(j\) lineages coalesce. This is not a coincidence: the Moran model and the coalescent describe the same process from different perspectives (forward vs. backward in time). The eigenvalue \(\lambda_j\) captures the rate at which the \(j\)-th mode of genetic variation decays – and this rate equals the coalescence rate for \(j\) lineages.
Concretely:
- \(\lambda_0 = 0\) and \(\lambda_1 = 0\): these correspond to the two absorbing states (allele fixed at 0 or at \(n\)). Once the allele is fixed or lost, nothing changes – hence zero decay rate.
- \(\lambda_2 = -1\): the slowest-decaying transient mode. This corresponds to the last pair of lineages coalescing, which happens at rate \(\binom{2}{2} = 1\).
- \(\lambda_n = -n(n-1)/2\): the fastest-decaying mode. This corresponds to all \(n\) lineages present, coalescing at rate \(\binom{n}{2}\).
import numpy as np

n = 10  # sample size (states 0..n)
eigenvalues = [-j * (j - 1) / 2 for j in range(n + 1)]
print("Eigenvalues of the Moran rate matrix (n=10):")
for j, lam in enumerate(eigenvalues):
    label = " (absorbing)" if lam == 0 else ""
    print(f"  j={j:2d}: lambda = {lam:8.1f}{label}")
Plain-language summary – What eigendecomposition gives us
An eigendecomposition breaks a complicated matrix into simpler components, much like splitting white light into individual colours with a prism. Each eigenvalue \(\lambda_j\) controls one “mode” of the system’s behaviour:
- Modes with \(\lambda = 0\): These never decay. They correspond to the allele being permanently fixed or lost – the absorbing states.
- Modes with large negative \(\lambda\): These decay quickly. They correspond to fast transient fluctuations that die out early.
- Modes with small negative \(\lambda\): These decay slowly. They correspond to the long-lived genetic variation that persists over many generations.
The practical payoff is enormous: to compute how allele frequencies change over a time span \(t\), we simply multiply each mode by \(e^{\lambda_j t}\). Modes with large negative eigenvalues are exponentially damped (ancient variation is lost), while the \(\lambda = 0\) modes persist forever. This gives us an exact, closed-form solution instead of requiring iterative numerical simulation.
def moran_eigensystem(n):
    """Compute the eigendecomposition of the Moran rate matrix.

    Returns (V, eigenvalues, V_inv) where Q = V @ diag(eigenvalues) @ V_inv.
    """
    Q = moran_rate_matrix(n)
    eigenvalues, V = np.linalg.eig(Q)  # Q is not symmetric, so eig, not eigh
    # the Moran generator has a real spectrum; drop round-off imaginary parts
    eigenvalues, V = eigenvalues.real, V.real
    V_inv = np.linalg.inv(V)
    return V, eigenvalues, V_inv
# Verification: eigenvalues match the theoretical formula
n = 10
V, eigs, V_inv = moran_eigensystem(n)
j = np.arange(n + 1)
theoretical_eigs = -j * (j - 1) / 2.0
# sort both to compare
assert np.allclose(np.sort(eigs), np.sort(theoretical_eigs), atol=1e-10)
# Verification: reconstruct Q from eigensystem
Q_reconstructed = V @ np.diag(eigs) @ V_inv
Q_original = moran_rate_matrix(n)
assert np.allclose(Q_reconstructed, Q_original, atol=1e-10)
Step 4: The Transition Matrix
The transition probability matrix for time \(t\) is the matrix exponential:

\[
P(t) = e^{Qt} = V e^{\Lambda t} V^{-1}, \qquad e^{\Lambda t} = \text{diag}\!\left(e^{\lambda_0 t}, \ldots, e^{\lambda_n t}\right).
\]

Entry \(P_{ij}(t)\) gives the probability that the number of derived alleles transitions from \(i\) to \(j\) in time \(t\).
This is the core computational advantage of the Moran model: instead of
numerically integrating an ODE system (as moments does) or discretizing a
PDE on a grid (as dadi does), momi2 computes exact transition
probabilities with a few matrix products. The eigendecomposition is
computed once per sample size and cached; applying it for a different time
\(t\) requires only exponentiating the eigenvalues.
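As a quick consistency check (a sketch using the moran_rate_matrix and moran_eigensystem functions defined above), the eigendecomposition route agrees with SciPy's general-purpose matrix exponential:

import numpy as np
from scipy.linalg import expm

n, t = 10, 0.7
Q = moran_rate_matrix(n)
V, eigs, V_inv = moran_eigensystem(n)

P_eig = V @ np.diag(np.exp(t * eigs)) @ V_inv  # closed-form eigensystem route
P_expm = expm(Q * t)                           # generic dense matrix exponential
assert np.allclose(P_eig, P_expm, atol=1e-8)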
Biology Aside – What the transition matrix means biologically
Each entry \(P_{ij}(t)\) answers a concrete biological question: if a
variant is currently present in \(i\) out of \(n\) chromosomes,
what is the probability that it will be present in \(j\) chromosomes
after \(t\) units of evolutionary time? When the population is
small, drift is strong and alleles rapidly fix or are lost – the matrix
concentrates probability at the edges (\(j = 0\) or \(j = n\)).
When the population is large, drift is weak and alleles linger at
intermediate frequencies – the matrix is more diffuse. By computing
\(P(t)\) for different values of \(t\) (corresponding to different
epoch durations in a demographic model), momi2 can predict how genetic
variation is shaped by any sequence of population size changes, splits, and
bottlenecks.
def moran_transition(t, n):
    """Compute the Moran transition matrix P(t) = exp(Q*t).

    t: time (in Moran model units, scaled by 2/N)
    n: sample size
    Returns (n+1) x (n+1) transition probability matrix.
    """
    V, eigs, V_inv = moran_eigensystem(n)
    D = np.diag(np.exp(t * eigs))
    P = V @ D @ V_inv
    # clamp small numerical errors
    P = np.clip(P, 0, None)
    P = P / P.sum(axis=1, keepdims=True)  # normalize rows
    return P
# Verification: P(0) is the identity matrix
n = 10
P0 = moran_transition(0, n)
assert np.allclose(P0, np.eye(n + 1), atol=1e-10)
# Verification: rows of P(t) sum to 1
P1 = moran_transition(1.0, n)
assert np.allclose(P1.sum(axis=1), 1.0, atol=1e-10)
# Verification: all entries non-negative
assert np.all(P1 >= -1e-15)
# Verification: P(s+t) = P(s) @ P(t) (Chapman-Kolmogorov)
P_s = moran_transition(0.5, n)
P_t = moran_transition(0.3, n)
P_st = moran_transition(0.8, n)
assert np.allclose(P_s @ P_t, P_st, atol=1e-8)
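Returning to the biology aside above, here is a small illustration (a sketch using the moran_transition function from this step): starting from an intermediate frequency, probability mass drains from the polymorphic interior toward the absorbing boundaries as \(t\) grows.

import numpy as np

n, i0 = 10, 5  # start from an intermediate frequency, 5 of 10
for t in [0.1, 1.0, 5.0, 20.0]:
    row = moran_transition(t, n)[i0]
    absorbed = row[0] + row[n]    # mass that has fixed or been lost
    polymorphic = row[1:n].sum()  # mass still at interior frequencies
    print(f"t={t:5.1f}: absorbed = {absorbed:.4f}, "
          f"still polymorphic = {polymorphic:.4f}")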
Step 5: Connecting to the Coalescent
How does the Moran transition matrix relate to the coalescent? The connection is through lineage counting.
Biology Aside – The coalescent, in brief
The coalescent is a way of thinking about genetic ancestry backward in
time. Start with \(n\) sampled chromosomes today and trace their
lineages into the past. Occasionally two lineages merge (“coalesce”) when
they find a common ancestor. The rate of coalescence depends on the
population size: in a small population, lineages bump into each other
quickly; in a large population, it takes longer. By connecting the Moran
model to the coalescent, momi2 can translate a forward-in-time model
of allele frequency change into the backward-in-time framework used to
compute the expected SFS. This bridge is what makes the tensor machinery
in the next chapter possible.
Consider \(n\) sampled chromosomes in a population of size \(N\). Going backward in time, we track how many distinct lineages remain. The Moran model (running backward) describes how the configuration of alleles in the sample evolves as we trace back through the population history.
Specifically, the transition matrix \(P(t)\) applied to a likelihood vector \(\mathbf{v}\) of length \(n+1\) computes:

\[
\mathbf{v}' = P(t)\,\mathbf{v}, \qquad v'_i = \sum_{j=0}^{n} P_{ij}(t)\, v_j.
\]

This transforms the likelihood of observing a given allele configuration at the bottom of an epoch into the likelihood at the top, accounting for all possible coalescence and drift events that could have occurred during the epoch.
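For instance (a sketch under the conventions above, using moran_transition from Step 4), if we observe exactly 3 derived copies at the bottom of an epoch, the bottom likelihood vector is an indicator, and a single matrix-vector product yields the likelihood for every possible count at the top:

import numpy as np

n, t = 10, 0.5
v_bottom = np.zeros(n + 1)
v_bottom[3] = 1.0  # observed: exactly 3 derived copies at the epoch bottom

# v_top[i] = P(observation | i derived copies at the top of the epoch)
v_top = moran_transition(t, n) @ v_bottom
print(np.round(v_top, 4))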
The time scaling is crucial. Real time \(T\) (in generations) maps to Moran model time via:

\[
t = \int_0^T \frac{2}{N(s)}\, ds,
\]

where \(N(s)\) is the population size at time \(s\) within the epoch (momi2 traverses epochs backward in time, so \(s\) runs from the epoch's bottom, its most recent point, toward the past). For constant size: \(t = 2T/N\). For exponential growth with rate \(g\) and bottom size \(N_0\), so that \(N(s) = N_0 e^{-gs}\) going backward:

\[
t = \frac{2\left(e^{gT} - 1\right)}{N_0\, g}.
\]
Plain-language summary – Why time must be scaled
Genetic drift runs on a “molecular clock” whose speed depends on population size. In a population of 1,000 individuals, 100 generations of drift produce the same amount of frequency change as 10,000 generations in a population of 100,000. The integral above converts real time (generations) into the Moran model’s intrinsic time units, which account for varying population size. When the population is small, the integral accumulates quickly (drift is fast); when it is large, the integral accumulates slowly (drift is slow). This is the reason that demographic events like bottlenecks leave such strong signatures in the SFS: a brief period of small population size corresponds to a large “jump” in drift-scaled time.
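The integral is easy to evaluate numerically for any size history. Below is a small sketch (the helper moran_scaled_time is ours, for illustration; momi2 uses closed forms per epoch) that reproduces the constant-size value and shows the "jump" a bottleneck produces:

from scipy.integrate import quad

def moran_scaled_time(N_of_s, T, breakpoints=None):
    """Integrate 2/N(s) over an epoch of T generations (illustrative helper)."""
    t, _ = quad(lambda s: 2.0 / N_of_s(s), 0.0, T, points=breakpoints)
    return t

# Constant size N = 1000 over 500 generations: t = 2T/N = 1.0
print(moran_scaled_time(lambda s: 1000.0, 500.0))

# Same epoch with a 10x bottleneck during generations 200-300:
# the brief small-size period dominates the accumulated drift (t = 2.8)
print(moran_scaled_time(lambda s: 100.0 if 200 <= s < 300 else 1000.0,
                        500.0, breakpoints=[200.0, 300.0]))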
Step 6: Applying Moran Transitions to Tensors
In momi2’s tensor framework, the Moran transition is applied to a
specific axis of a multi-dimensional likelihood tensor. If the tensor has
one axis per population, and we need to apply the Moran transition for
population \(k\), we multiply along axis \(k\):

\[
T'_{i_1 \cdots i'_k \cdots i_D} = \sum_{i_k = 0}^{n_k} P_{i'_k\, i_k}(t)\; T_{i_1 \cdots i_k \cdots i_D}.
\]
This is an einsum operation, which momi2 implements via optimized
batched matrix multiplication:
def moran_action(t, tensor, axis):
    """Apply Moran transition matrix to a tensor along a given axis.

    t: scaled time for this epoch
    tensor: multi-dimensional likelihood tensor
    axis: which axis (population) to apply the transition to
    Returns tensor with the Moran transition applied.
    """
    n = tensor.shape[axis] - 1
    P = moran_transition(t, n)
    # Contract the chosen axis of the tensor with the second index of P;
    # tensordot appends the surviving index of P at the end, so move it back.
    out = np.tensordot(tensor, P, axes=([axis], [1]))
    return np.moveaxis(out, -1, axis)
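A quick check of this routine (a sketch): on a two-population likelihood tensor, applying the transition along an axis must match ordinary matrix multiplication on that axis.

import numpy as np

rng = np.random.default_rng(0)
lik = rng.random((11, 7))  # two populations: n=10 and n=6 samples

# Along axis 0, the action equals a plain matrix product P @ lik
out0 = moran_action(0.4, lik, axis=0)
assert np.allclose(out0, moran_transition(0.4, 10) @ lik)

# Along axis 1, it equals lik @ P.T
out1 = moran_action(0.4, lik, axis=1)
assert np.allclose(out1, lik @ moran_transition(0.4, 6).T)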
Calculus Aside – Why eigendecomposition beats ODE integration
moments computes the SFS by integrating a system of ODEs forward in
time: \(d\phi/dt = A \phi\). For each epoch, this requires calling an
ODE solver (like RK45), which takes many small steps. momi2 instead
computes \(\phi(t) = e^{At} \phi(0)\) directly via the eigensystem.
Since the eigensystem is computed once and cached, each epoch requires only
\(O(n^2)\) work for the matrix-vector product. This makes momi2
particularly efficient when the same sample size appears in many epochs.
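To see this concretely, the following sketch (not how either package is implemented internally) integrates the forward equation \(dp/dt = pQ\) with a generic ODE solver and compares it against the exact eigensystem answer:

import numpy as np
from scipy.integrate import solve_ivp

n, t_end = 10, 1.0
Q = moran_rate_matrix(n)
p0 = np.zeros(n + 1)
p0[4] = 1.0  # start with 4 derived copies, probability 1

# Generic route: integrate the forward equation dp/dt = p Q with RK45
sol = solve_ivp(lambda t, p: p @ Q, (0.0, t_end), p0,
                method="RK45", rtol=1e-10, atol=1e-12)
p_ode = sol.y[:, -1]

# Eigensystem route: one matrix-vector product, no time stepping
p_exact = p0 @ moran_transition(t_end, n)
assert np.allclose(p_ode, p_exact, atol=1e-7)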
Step 7: The Moran Model in momi2’s Source Code
In the actual momi2 implementation, the Moran eigensystem is memoized
(cached by sample size \(n\)) and the transition matrix computation handles
edge cases for numerical stability:
# Simplified from momi/moran_model.py
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_eigensystem(n):
    """Memoized eigendecomposition -- computed once per sample size."""
    Q = moran_rate_matrix(n)
    eigenvalues, V = np.linalg.eig(Q)
    # the spectrum is real; discard any round-off imaginary parts
    eigenvalues, V = eigenvalues.real, V.real
    V_inv = np.linalg.inv(V)
    return V, eigenvalues, V_inv

def transition_with_checks(t, n):
    """Transition matrix with probability validation."""
    V, eigs, V_inv = cached_eigensystem(n)
    D = np.diag(np.exp(t * eigs))
    P = V @ D @ V_inv
    # numerical cleanup: clamp negatives, normalize rows
    P = np.maximum(P, 0)
    row_sums = P.sum(axis=1, keepdims=True)
    P = P / row_sums
    return P
The memoization is critical for performance: in a typical inference run, the optimizer evaluates hundreds of parameter combinations, but the sample sizes remain fixed. The eigensystem is computed once; only the time argument \(t\) changes between evaluations.
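A rough way to observe the effect (a sketch; timings are machine-dependent and illustrative only):

import time

cached_eigensystem.cache_clear()

start = time.perf_counter()
transition_with_checks(0.5, 100)  # first call pays for the eigensolve
first = time.perf_counter() - start

start = time.perf_counter()
transition_with_checks(0.9, 100)  # same n: eigensystem comes from the cache
second = time.perf_counter() - start

print(f"first call: {first:.4f}s, cached call: {second:.4f}s")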
Exercises
Exercise 1: Eigenvalue verification
Compute the eigenvalues of the Moran rate matrix for \(n = 5, 10, 20\) and verify they match the formula \(\lambda_j = -j(j-1)/2\).
Exercise 2: Stationary distribution
Show that the Moran model has two absorbing states (0 and \(n\)). For a large time \(t\), what does \(P(t)\) converge to? What is the probability of fixation starting from \(i\) derived alleles?
Exercise 3: Chapman-Kolmogorov
Verify the semigroup property \(P(s+t) = P(s) P(t)\) numerically for several values of \(s, t\) and \(n\). Why does this property make the eigendecomposition approach so natural?
Exercise 4: Time scaling
For a population that grows exponentially from \(N_0 = 1000\) to \(N_T = 10000\) over \(T = 500\) generations, compute the scaled Moran time \(t\). How does this compare to the constant-size case \(t = 2T/N_0\)?
Solutions
Solution 1: Eigenvalue verification
The theoretical eigenvalues are \(\lambda_j = -j(j-1)/2\) for \(j = 0, 1, \ldots, n\).
import numpy as np

for n in [5, 10, 20]:
    Q = moran_rate_matrix(n)
    # Q is not symmetric, so use the general eigenvalue routine
    numerical_eigs = np.sort(np.linalg.eigvals(Q).real)
    j = np.arange(n + 1)
    theoretical_eigs = np.sort(-j * (j - 1) / 2.0)
    assert np.allclose(numerical_eigs, theoretical_eigs, atol=1e-10), \
        f"Mismatch for n={n}"
    print(f"n={n}: eigenvalues match")
    print(f"  Theoretical: {theoretical_eigs[:6]} ...")
    print(f"  Numerical:   {numerical_eigs[:6]} ...")
For every \(n\), the numerically computed eigenvalues agree with the closed-form formula to machine precision. Note the two zero eigenvalues (\(j = 0\) and \(j = 1\)), corresponding to the absorbing states.
Solution 2: Stationary distribution
The two absorbing states are \(i = 0\) and \(i = n\), since \(q(0, \cdot) = 0\) and \(q(n, \cdot) = 0\) (the departure rates vanish). For large \(t\), all probability mass concentrates on these absorbing states.
The fixation probability starting from \(i\) derived alleles is \(i/n\) by symmetry of the Moran model (the up and down rates are equal).
import numpy as np

n = 10
t_large = 100.0  # large time
P = moran_transition(t_large, n)

# For large t, P[i, :] should have mass only at 0 and n
for i in range(n + 1):
    interior_mass = P[i, 1:n].sum()
    assert interior_mass < 1e-6, \
        f"Interior mass at state {i}: {interior_mass}"
    # Fixation probability: P[i, n] should be approximately i/n
    fix_prob = P[i, n]
    expected_fix = i / n
    print(f"  i={i}: P(fixation) = {fix_prob:.6f}, "
          f"expected = {expected_fix:.6f}")
    if 0 < i < n:
        assert abs(fix_prob - expected_fix) < 1e-4
This result follows from the martingale property of the Moran model: \(E[X(t) \mid X(0) = i] = i\) for all \(t\), and the only distributions on \(\{0, n\}\) with mean \(i\) assign probability \(i/n\) to \(n\) and \(1 - i/n\) to \(0\).
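The martingale property itself is easy to confirm numerically (a sketch using moran_transition from Step 4):

import numpy as np

# Martingale check: E[X(t) | X(0) = i] = sum_j j * P[i, j] = i
n = 10
states = np.arange(n + 1, dtype=float)
for t in [0.1, 1.0, 10.0]:
    conditional_means = moran_transition(t, n) @ states
    assert np.allclose(conditional_means, states, atol=1e-8)
print("E[X(t) | X(0) = i] = i holds for all tested t")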
Solution 3: Chapman-Kolmogorov
The semigroup property \(P(s+t) = P(s) P(t)\) follows directly from the eigendecomposition:
import numpy as np

for n in [5, 10, 15]:
    for s, t in [(0.1, 0.2), (0.5, 0.3), (1.0, 1.0), (0.01, 0.99)]:
        P_s = moran_transition(s, n)
        P_t = moran_transition(t, n)
        P_st = moran_transition(s + t, n)
        P_product = P_s @ P_t
        max_err = np.max(np.abs(P_product - P_st))
        assert max_err < 1e-8, \
            f"n={n}, s={s}, t={t}: max error = {max_err}"
        print(f"  n={n}, s={s}, t={t}: max error = {max_err:.2e}")
The eigendecomposition makes this natural because exponentiating diagonal matrices is trivial: \(e^{\Lambda s} e^{\Lambda t} = e^{\Lambda(s+t)}\). The intermediate \(V^{-1} V = I\) cancels. This is precisely why the eigendecomposition is the method of choice: it converts matrix exponentiation (an expensive operation) into scalar exponentiation (trivial), and the semigroup property is automatically satisfied.
Solution 4: Time scaling
For exponential growth from \(N_0 = 1000\) to \(N_T = 10000\) over \(T = 500\) generations, the growth rate is:

\[
g = \frac{1}{T} \ln\frac{N_T}{N_0} = \frac{\ln 10}{500} \approx 0.0046.
\]

The scaled Moran time follows from the integral \(t = \int_0^T 2/N(s)\, ds\). Measuring time forward with \(N(s) = N_0 e^{gs}\) (equivalently, applying the bottom-size formula with bottom size \(N_T\), since the recent end of this epoch has size \(N_T\)):

\[
t = \int_0^T \frac{2}{N_0 e^{gs}}\, ds = \frac{2}{g}\left(\frac{1}{N_0} - \frac{1}{N_T}\right) = \frac{2\left(e^{gT} - 1\right)}{N_T\, g} \approx 0.391.
\]
import numpy as np

N0 = 1000   # size at the start of the epoch (ancient end)
NT = 10000  # size at the end of the epoch (recent end)
T = 500     # epoch length in generations

g = np.log(NT / N0) / T
print(f"Growth rate g = {g:.6f}")

# Scaled time under exponential growth:
# integrate 2/N(s) with N(s) = N0 * exp(g*s)
t_exp = (2.0 / g) * (1.0 / N0 - 1.0 / NT)
print(f"Scaled Moran time (exponential growth): t = {t_exp:.6f}")

# Scaled time under constant size N0
t_const = 2.0 * T / N0
print(f"Scaled Moran time (constant size N0): t = {t_const:.6f}")

print(f"Ratio t_exp / t_const = {t_exp / t_const:.4f}")
The exponential growth case gives \(t_{\text{exp}} \approx 0.391\), while the constant-size case gives \(t_{\text{const}} = 1.0\). The growing population accumulates less drift-scaled time because drift is weaker when the population is large. The ratio \(t_{\text{exp}} / t_{\text{const}} \approx 0.39\) reflects the harmonic-mean-like weighting of population sizes: most of the integral accumulates early in the epoch (forward in time), while the population is still close to \(N_0\), and the large recent size \(N_T\) contributes almost no additional drift.
Next: Tensor Machinery