The Watchmaker's Guide to Population Genetics
The Workshop
The Watchmaker’s Philosophy
Why Build It Yourself?
The Watchmaker’s Way
The Gears of Understanding
On Mathematical Rigor
On Teaching Probability and Calculus
On Python Implementations
Your Journey
The Workbench (Prerequisites)
Likelihood-Based Probabilistic Inference
Why Likelihood?
The Likelihood Function
The Toolkit: Key Distributions
The Exponential Distribution: Coalescence Waiting Times
The Poisson Distribution: Mutations and the SFS
The Gamma Distribution: Ages and Rates
The Gaussian Distribution: Smoothness Priors
Maximum Likelihood Estimation (MLE)
Worked Example: Inferring Population Size from the SFS
Fisher Information and Confidence Intervals
Bayesian Inference
Conjugate Priors: When Bayesian Inference Has Closed-Form Solutions
Composite and Approximate Likelihoods
Worked Example: Composite Likelihood from Two Data Sources
The Other Paradigm: Neural Networks and Amortized Inference
The key idea
What amortized inference does well
What likelihood-based inference does well
Why this book focuses on the likelihood approach
Summary
Coalescent Theory
The Big Idea
The Wright-Fisher Model (Forward in Time)
Going Backwards: The Coalescent
The probability that two specific lineages coalesce in a given generation
Waiting time to coalescence
The Coalescent with \(n\) Samples
Expected Number of Lineages at Time \(t\)
Mutations on the Coalescent Tree
Summary
Ancestral Recombination Graphs
Why Trees Aren’t Enough
What Is Recombination?
What We’ve Established So Far
Recombination in the Coalescent
The Structure of an ARG: A Directed Acyclic Graph
Marginal Trees
The Tree Sequence Representation
Branch Lengths and the ARG
Why ARG Inference Is Hard
Summary
Hidden Markov Models
Why HMMs for ARG Inference?
A Warm-Up Example: Weather and Umbrellas
The Core Idea
Formal Definition
The Forward Algorithm
Scaling for Numerical Stability
Stochastic Traceback (Sampling)
The Li-Stephens Trick: Linear-Time Transitions
The Li-Stephens Transition Structure
The \(O(K)\) Forward Step
Summary
The Sequentially Markov Coalescent
The Problem with the Full Coalescent
What Does “Markov” Mean, and Why Does It Matter?
Intuitive Explanation
Formal Definition
Why Markov Matters for Computation
What Makes CwR Non-Markov?
The Mechanism
What Are Ghost Lineages? A Concrete Example
The SMC Approximation
Why Does This Restore the Markov Property?
How Good Is the Approximation?
The SMC Transition Probability
Deriving \(r_i\): The Recombination Probability
Deriving \(q_j\): The Re-joining Weights
PSMC: The Pairwise Case
The Cumulative Distribution Function
Why SMC Enables HMM Inference
Summary
The Diffusion Approximation
The Big Idea
From Wright-Fisher to Continuous Frequency
Mean and variance of \(\Delta x\)
The diffusion timescale
Code: WF trajectories converging to SDE paths
Stochastic Differential Equations
Euler-Maruyama simulation
From SDEs to PDEs: The Fokker-Planck Equation
The two terms: diffusion and advection
Boundary Conditions
Absorbing boundaries
Why \(x(1-x)\) vanishes at boundaries
The flux condition
Reflecting boundaries and mutation
Stationary Distributions
The neutral case
With mutation: the Beta distribution
With selection: exponential tilting
Numerical Solutions: Finite Differences for PDEs
Discretizing \(x\) on a grid
Finite-difference approximations
The method of lines
Crank-Nicolson time stepping
The curse of dimensionality
Code: 1D diffusion solver
Connection to the Site Frequency Spectrum
The binomial bridge
How dadi and moments differ
Summary
Ordinary Differential Equations
The Big Idea
What Is an ODE?
Euler’s Method
The Runge-Kutta Family
RK2: The Midpoint Method
RK4: The Classic Method
RK45: Adaptive Step Size (Dormand-Prince)
Systems of Coupled ODEs
Stiffness and Implicit Methods
The Matrix Exponential
Summary
Markov Chain Monte Carlo
The Big Idea: Why Sample?
Bayesian Inference in 60 Seconds
Markov Chains
Stationary Distribution
The Metropolis-Hastings Algorithm
Gibbs Sampling
Convergence Diagnostics
Practical Considerations
Proposal Tuning
Data-Informed Proposals
Parallel Tempering
When MCMC Is Not Enough
MCMC in Population Genetics: Three Applications
ARGweaver: Gibbs Sampling over ARGs
SINGER: MH with Data-Informed Proposals
PHLASH: Beyond MCMC
Summary
Timepieces
Verification Status
Timepiece I: PSMC
The Mechanism at a Glance
Why Just Two Sequences?
Chapters
Timepiece II: SMC++
The Mechanism at a Glance
Chapters
Timepiece III: The Li & Stephens HMM
The Mechanism at a Glance
Chapters
Timepiece IV: msprime
The Mechanism at a Glance
Chapters
Timepiece V: ARGweaver
The Mechanism at a Glance
Chapters
Timepiece VI: tsinfer
The Mechanism at a Glance
Chapters
Timepiece VII: SINGER
The Mechanism at a Glance
Chapters
Timepiece VIII: Threads
The Mechanism at a Glance
Chapters
Timepiece IX: tsdate
The Mechanism at a Glance
Where tsinfer Ends and tsdate Begins
Chapters
Timepiece X: moments
The Mechanism at a Glance
Chapters
Timepiece XI: dadi
The Mechanism at a Glance
dadi vs. moments
Chapters
Timepiece XII: momi2
The Mechanism at a Glance
Chapters
Timepiece XIII: Gamma-SMC
The Mechanism at a Glance
PSMC vs. Gamma-SMC
Chapters
Timepiece XIV: PHLASH
The Mechanism at a Glance
Chapters
Timepiece XV: CLUES
The Mechanism at a Glance
Why Detect Selection?
Chapters
Timepiece XVI: SLiM
The Mechanism at a Glance
Chapters
Timepiece XVII: Relate
The Mechanism at a Glance
Where tsinfer and SINGER End and Relate Begins
Chapters
Timepiece XVIII: discoal
The Mechanism at a Glance
Chapters
Index