Timepiece VII: SINGER
Sampling and Inference of Genealogies with Recombination
The Mechanism at a Glance
SINGER is a Bayesian method for sampling Ancestral Recombination Graphs (ARGs) from their posterior distribution, given observed genetic variation data. It is built on a threading algorithm: haplotypes are “threaded” one at a time onto a growing partial ARG, with Hidden Markov Models deciding where and when each new lineage attaches.
If PSMC is a two-hand watch (two lineages, one coalescence time at each position along the genome), then SINGER is a grand complication: a mechanism of far greater complexity that tracks the genealogical history of many haplotypes at once. Where PSMC infers the history of population size from two haplotypes, SINGER samples the full ancestral recombination graph: every coalescence event, every recombination, and every marginal tree, across the whole sample of haplotypes.
Primary Reference
The four gears of SINGER:
- Branch Sampling (the first gear train) – an HMM that determines which branch each new lineage joins at each genomic position. This solves the topological question: where in the existing tree does the new haplotype attach?
- Time Sampling (the second gear train) – a second HMM that determines when (at what time) the lineage joins, conditioned on the branch choice. This uses the PSMC transition density from Timepiece I – the simpler mechanism reappears as a component in the more complex one.
- ARG Rescaling (the regulator) – a post-processing step that adjusts coalescence times to better match the mutation clock, like a watchmaker calibrating the beat rate against a reference frequency.
- SGPR (the winding mechanism) – Sub-Graph Pruning and Re-grafting: the MCMC update that explores the space of ARGs by removing and re-threading subsets of the genealogy.
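To make the branch-sampling question concrete, here is a toy calculation of a prior over the branches a new lineage could join. It assumes a simplified model of our own, not necessarily the exact formula from Step 1 of the Branch Sampling chapter: all samples sit at time 0, time is in coalescent units, and the new lineage coalesces with each branch alive at time t at rate 1. The probability of joining a branch spanning [a, b] is then the integral of exp(-Λ(t)) over [a, b], where Λ(t) is the integral of the number of lineages k(u) from 0 to t. The helper name `joining_probabilities` and the (lower, upper) branch encoding are hypothetical.

```python
import math


def joining_probabilities(branches):
    """Toy prior over branches for a new lineage (hypothetical helper).

    branches: list of (lower, upper) times in coalescent units; leaf
    branches start at 0 and the root branch has upper = math.inf.
    Assumes the new lineage coalesces with each branch alive at time t
    at rate 1, so P(join [a, b]) = integral over [a, b] of exp(-Lambda(t)).
    """
    # Times at which the number of lineages k(t) can change.
    cuts = sorted({t for br in branches for t in br if math.isfinite(t)})
    cuts.append(math.inf)
    intervals = list(zip(cuts[:-1], cuts[1:]))

    # k in each interval, and cumulative hazard Lambda at each interval start.
    ks, lam = [], [0.0]
    for lo, hi in intervals:
        k = sum(1 for a, b in branches if a <= lo and hi <= b)
        ks.append(k)
        lam.append(lam[-1] + k * (hi - lo))

    probs = []
    for a, b in branches:
        p = 0.0
        for (lo, hi), k, lam_lo in zip(intervals, ks, lam):
            if a <= lo and hi <= b:
                # Integral of exp(-Lambda(t)) over an interval with constant k.
                p += math.exp(-lam_lo) * (1.0 - math.exp(-k * (hi - lo))) / k
        probs.append(p)
    return probs


# Samples 1 and 2 coalesce at t = 0.5; their ancestor meets sample 3 at t = 1.2.
tree = [(0.0, 0.5), (0.0, 0.5), (0.0, 1.2), (0.5, 1.2), (1.2, math.inf)]
probs = joining_probabilities(tree)
print([round(p, 3) for p in probs])  # [0.259, 0.259, 0.343, 0.084, 0.055]
print(round(sum(probs), 12))         # 1.0
```

The probabilities sum to 1 because the root branch extends to infinity, so under this model the new lineage must join the tree somewhere; longer branches and branches alive while few lineages remain collect more mass.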
These gears mesh together into a complete MCMC sampler:
Initialize ARG by threading haplotypes 1, 2, ..., n
            |
            v
 +---> Pick a sub-graph to prune (SGPR)
 |          |
 |          v
 |     Re-thread using Branch + Time Sampling
 |          |
 |          v
 |     Accept/reject (Metropolis-Hastings)
 |          |
 |          v
 |     Rescale the ARG
 |          |
 +----------+
   (repeat)
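The control flow of this diagram can be written as a generic Metropolis-Hastings loop. In this sketch `mcmc_loop`, `propose`, `log_target` and `rescale` are hypothetical names: the three callables stand in for SGPR plus re-threading, the ARG posterior, and ARG rescaling. The toy usage swaps in a one-dimensional Gaussian target so the loop runs on its own; it illustrates the order of operations only, not how SINGER computes its acceptance ratio.

```python
import math
import random


def mcmc_loop(init_state, propose, log_target, rescale, n_iter, seed=0):
    """Skeleton of the prune / re-thread / accept / rescale cycle (hypothetical)."""
    rng = random.Random(seed)
    state = init_state
    accepted = 0
    samples = []
    for _ in range(n_iter):
        # Prune and re-thread (SGPR): a candidate plus the log Hastings
        # correction log q(state | candidate) - log q(candidate | state).
        candidate, log_hastings = propose(state, rng)
        # Metropolis-Hastings: accept with probability min(1, exp(log_alpha)).
        log_alpha = log_target(candidate) - log_target(state) + log_hastings
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            state = candidate
            accepted += 1
        # Regulator: rescale the current state before the next iteration.
        state = rescale(state)
        samples.append(state)
    return samples, accepted / n_iter


# Toy usage: a random-walk proposal on a standard normal target.
samples, rate = mcmc_loop(
    0.0,
    propose=lambda x, rng: (x + rng.gauss(0.0, 1.0), 0.0),
    log_target=lambda x: -0.5 * x * x,
    rescale=lambda x: x,
    n_iter=2000,
)
print(len(samples), rate)
```

Keeping the proposal, the target and the rescaling step as separate callables mirrors the separate gears: each can be swapped out without touching the loop itself.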
Prerequisites for this Timepiece
SINGER draws on all the prerequisite chapters and builds on earlier Timepieces:
- Coalescent Theory – coalescence times and rates
- Ancestral Recombination Graphs – the data structure SINGER infers
- Hidden Markov Models – forward algorithm, stochastic traceback, Li-Stephens trick
- The SMC – the Markov approximation enabling HMM inference
- PSMC – the transition density reused in time sampling
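Both sampling gears rest on the forward algorithm followed by a stochastic traceback. Below is a minimal, generic sketch of that pair for a discrete HMM with dense transitions, in plain Python; the function name `sample_hidden_path` and the list-of-lists layout are our own choices. In branch sampling the hidden states would be branches and in time sampling sub-intervals of time, but SINGER's actual transition and emission probabilities (covered in the chapters) are not reproduced here. This dense version costs O(K²) per site for K states; the Li-Stephens trick reduces that to O(K) when transitions have the Li-Stephens form (stay put, or jump to a state drawn independently of the current one).

```python
import random


def sample_hidden_path(init, trans, emit, rng):
    """Forward algorithm + stochastic traceback for a discrete HMM.

    init[k]     : P(state k at the first site)
    trans[i][j] : P(state j at site l+1 | state i at site l)
    emit[l][k]  : P(observation at site l | state k)
    Returns one path drawn from P(path | observations).
    """
    K, L = len(init), len(emit)

    # Forward pass; each row is normalized to avoid underflow, which does
    # not change the conditional distributions used in the traceback.
    alpha = []
    for l in range(L):
        if l == 0:
            row = [init[k] * emit[0][k] for k in range(K)]
        else:
            prev = alpha[l - 1]
            row = [sum(prev[i] * trans[i][j] for i in range(K)) * emit[l][j]
                   for j in range(K)]
        total = sum(row)
        alpha.append([x / total for x in row])

    # Stochastic traceback: sample the last state, then walk backwards with
    # P(s_l = i | s_{l+1} = j, data up to l) proportional to alpha[l][i] * trans[i][j].
    path = [0] * L
    path[-1] = rng.choices(range(K), weights=alpha[-1])[0]
    for l in range(L - 2, -1, -1):
        weights = [alpha[l][i] * trans[i][path[l + 1]] for i in range(K)]
        path[l] = rng.choices(range(K), weights=weights)[0]
    return path


# Toy usage: 3 states that mostly stay put, with noisy emissions.
sticky = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
noisy = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
print(sample_hidden_path([1 / 3] * 3, sticky, noisy, random.Random(42)))
```

Sampling a path, rather than taking the single most likely one (Viterbi), is what lets each threading step draw from the posterior instead of collapsing onto a point estimate.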
If you’ve built those earlier mechanisms, you have every tool you need. SINGER is where all the gears finally mesh together into the most complex Timepiece in our collection.
Chapters
- Overview of SINGER
- Branch Sampling
  - Step 1: Joining Probability for a Branch
  - Step 2: The Deterministic Approximation
  - Step 3: Representative Joining Time
  - Step 4: Emission Probabilities
  - Step 5: Transition Probabilities (New Recombination)
  - Step 6: Partial Branch States
  - Step 7: Putting It All Together – The Branch Sampling HMM
  - Step 8: Stochastic Traceback
  - Exercises
  - Solutions
- Time Sampling
  - The Setup
  - Step 1: Discretizing the Time Interval
  - Step 2: The PSMC Transition Density
  - Step 3: Transition Probabilities Between Sub-Intervals
  - Step 4: The Linearization Trick
  - Step 5: Type B and Type C Transitions
  - Step 6: The Complete Time Sampling Algorithm
  - Step 7: Inference of Recombination Times
  - Exercises
  - Solutions
- ARG Rescaling
- Sub-Graph Pruning and Re-grafting (SGPR)
- Demo: Running SINGER on Simulated Data