.. _escapement_complication:

=========================================
Complication II: Escapement
=========================================

   *Simulation-Free Deep Coalescent Inference via Variational Genealogies*

The Mechanism at a Glance
==========================

Mainspring learns to invert simulations: train on millions of msprime outputs, then
hope real data looks like the training distribution. This is **amortized inference** --
fast at test time, but fundamentally limited by the simulation fidelity gap. If the
real biological process differs from the training simulations (and it always does), the
network fails silently.

Escapement takes the opposite approach: **no simulations at all**. Instead of learning a
simulator-to-inference mapping, it uses the coalescent likelihood itself -- the same
equations derived in every Timepiece -- as a differentiable loss function. The network
trains directly on the observed data.

Every Timepiece in this book derives two things:

1. **A prior**: :math:`P(\text{genealogy} \mid N_e)` from coalescent theory
2. **A likelihood**: :math:`P(\text{data} \mid \text{genealogy}, \mu)` from the mutation model

These are analytical. You don't need to simulate -- you can evaluate them in closed form
for any proposed genealogy. The intractable part is the posterior:

.. math::

   P(\text{genealogy} \mid \text{data}, N_e, \mu)
   \propto P(\text{data} \mid \text{genealogy}, \mu) \cdot P(\text{genealogy} \mid N_e)
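To make "evaluate them in closed form" concrete, here is a toy sketch (stdlib Python only, a single non-recombining tree, hypothetical numbers; the helper names and the specific values are illustrative, not Escapement's API). It scores one proposed genealogy by its Kingman coalescent log prior plus its Poisson mutation log likelihood -- the two analytical factors above:

```python
import math

def kingman_log_prior(coal_times, Ne):
    """Log P(genealogy | Ne) for one Kingman tree: while k lineages remain,
    the waiting time to the next coalescence is Exp(k*(k-1)/(2*Ne))."""
    n = len(coal_times) + 1          # number of samples
    logp, prev = 0.0, 0.0
    for k, t in zip(range(n, 1, -1), coal_times):
        rate = k * (k - 1) / (2.0 * Ne)
        logp += math.log(rate) - rate * (t - prev)
        prev = t
    return logp

def poisson_log_lik(n_mutations, total_branch_length, mu):
    """Log P(data | genealogy, mu): mutations fall on the tree as a
    Poisson process with mean mu * total branch length."""
    lam = mu * total_branch_length
    return n_mutations * math.log(lam) - lam - math.lgamma(n_mutations + 1)

# Unnormalized log posterior for one proposed genealogy (toy numbers:
# n = 4 samples, 7 mutations, total branch length 6.3):
coal_times = [0.5, 1.2, 2.0]
log_post = poisson_log_lik(7, 6.3, mu=1.0) + kingman_log_prior(coal_times, Ne=1.0)
```

No simulation is involved: any proposed genealogy can be scored this way, which is exactly what makes the posterior a target for direct optimization.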

The space of genealogies is combinatorial and enormous. Classical methods each handle
it differently: PSMC discretizes time and uses an HMM. ARGweaver and SINGER sample
genealogies with MCMC. tsinfer finds a point estimate via heuristics. tsdate uses a
variational approximation with hand-derived updates.

Escapement introduces a new option: **learn the variational posterior with a neural
network**, optimized against the true coalescent likelihood.
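The idea can be seen in miniature with a toy that strips away the neural network: two samples, a single coalescence time :math:`T`, a LogNormal variational posterior, and the reparameterization trick with a hand-derived gradient (a real implementation would use autodiff; the constants here are arbitrary):

```python
import math, random

# Toy coalescent VI: n = 2 samples, one coalescence time T.
# Prior: T ~ Exp(1/Ne). Likelihood: m mutations ~ Poisson(2*mu*T).
# Variational posterior: q(T) = LogNormal(loc, scale).
m, mu, Ne = 5, 1.0, 1.0          # observed mutations, mutation rate, N_e

def elbo_grad_loc(loc, scale):
    """One-sample Monte Carlo gradient of the ELBO w.r.t. loc, via the
    reparameterization T = exp(loc + scale*eps), eps ~ N(0,1).
    ELBO = E_q[m*log(2*mu*T) - 2*mu*T - T/Ne] + H[q] up to constants,
    so d/dloc = E[m - (2*mu + 1/Ne)*T] + 1  (the +1 is the entropy term)."""
    eps = random.gauss(0.0, 1.0)
    T = math.exp(loc + scale * eps)
    return m - (2 * mu + 1 / Ne) * T + 1

random.seed(0)
loc, scale, lr = 0.0, 0.1, 0.01
for _ in range(5000):
    loc += lr * elbo_grad_loc(loc, scale)
# Analytic optimum: E_q[T] = (m + 1) / (2*mu + 1/Ne) = 2.0
```

Escapement does the same thing at scale: the gradient flows through sampled genealogies back into network parameters, with the exact coalescent ELBO as the objective.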

.. admonition:: The name

   The escapement is the only part of a mechanical watch that needs no external
   calibration -- it generates its own rhythm from first principles. The geometry of
   the escape wheel and pallet fork, combined with the physics of the balance spring,
   produces a precise oscillation without reference to any external standard.
   Similarly, Escapement generates its inference from the mathematical principles of
   the coalescent, without reference to any external simulation standard.

The four modules of Escapement:

1. **The Genealogy Encoder** (the escape wheel) -- A Transformer that processes the
   genotype matrix, producing per-sample, per-position latent vectors. The same
   architecture as Mainspring's encoder, but optimized against a different loss.

2. **The Variational Tree Posterior** (the pallet fork) -- Produces a distribution
   over tree sequences: soft parent assignments via Gumbel-softmax (topology),
   reparameterized gamma distributions (branch lengths, from tsdate/Gamma-SMC),
   and Bernoulli breakpoints (from the SMC recombination model).

3. **The Differentiable Likelihood** (the balance spring) -- Pure math, no neural
   networks. Given a sampled genealogy, computes the Poisson mutation likelihood
   (from tsdate), the Kingman coalescent prior (from msprime/PSMC), and the
   entropy of the variational posterior. These three terms form the ELBO.

4. **Demographic Inference** (the regulator) -- :math:`N_e(t)` parameterized as a
   neural spline, Gaussian process, or piecewise-constant function. Optimized
   jointly with the variational posterior by maximizing the ELBO.
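Module 2 hinges on making discrete choices differentiable. A minimal stdlib sketch of the Gumbel-softmax relaxation used for topology (the logits here are hypothetical; Escapement would produce them from the encoder):

```python
import math, random

def gumbel_softmax(logits, temperature=0.5):
    """Differentiable relaxation of a categorical sample: perturb each
    logit with Gumbel(0,1) noise, then apply a temperature-scaled
    softmax. As temperature -> 0 the output approaches a hard one-hot
    parent assignment; at moderate temperatures gradients flow through
    the logits."""
    g = [-math.log(-math.log(random.random())) for _ in logits]
    z = [(l + gi) / temperature for l, gi in zip(logits, g)]
    zmax = max(z)
    e = [math.exp(zi - zmax) for zi in z]
    total = sum(e)
    return [ei / total for ei in e]

# Soft parent assignment for one node over three candidate parents:
random.seed(0)
weights = gumbel_softmax([2.0, 0.1, -1.0], temperature=0.5)
```

The branch-length gammas and breakpoint Bernoullis are relaxed analogously (reparameterized gamma sampling and the binary concrete distribution, respectively), so a full tree sequence can be sampled end-to-end differentiably.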

.. code-block:: text

   Observed genotype matrix D ∈ {0,1}^{n × L}
                      |
                      v
            +--------------------------+
            |  GENEALOGY ENCODER       |
            |  Transformer over        |
            |  genomic windows         |
            +--------------------------+
                      |
                      v
            +--------------------------+
            |  VARIATIONAL POSTERIOR   |
            |  q(τ | D, φ)             |
            |                          |
            |  Topology: Gumbel-softmax|
            |  Times: Gamma(α, β)      |
            |  Breaks: Bernoulli(σ)    |
            +--------------------------+
                      |
                      v (sample τ ~ q)
            +--------------------------+
            |  DIFFERENTIABLE          |
            |  LIKELIHOOD              |
            |  (pure math, no NN)      |
            |                          |
            |  log P(D | τ, μ)         |
            |  + log P(τ | N_e, ρ)     |
            |  + H[q]                  |
            |  = ELBO                  |
            +--------------------------+
                      |
                      v (maximize ELBO)
            +--------------------------+
            |  DEMOGRAPHIC INFERENCE   |
            |  N_e(t): neural spline   |
            |  or GP in log-space      |
            +--------------------------+
                      |
                      v
            Posterior over genealogies
            + N_e(t) with uncertainty
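The regulator's simplest parameterization -- piecewise-constant :math:`N_e(t)` in log space -- can be sketched as follows (the class name and grid values are hypothetical; neural splines and GPs are the smoother alternatives mentioned above):

```python
import math

class PiecewiseLogNe:
    """N_e(t) stored as log-values over a fixed time grid, constant within
    each epoch. Working in log space keeps N_e positive and puts the
    (typically orders-of-magnitude) population-size changes on an
    optimizer-friendly scale; the log-values are the parameters that
    would be optimized jointly with the variational posterior."""
    def __init__(self, grid_times, log_ne):
        assert len(grid_times) + 1 == len(log_ne)
        self.grid_times = grid_times   # epoch boundaries, increasing
        self.log_ne = log_ne           # one log N_e per epoch

    def ne(self, t):
        """Return N_e at time t (epoch lookup, then exponentiate)."""
        i = sum(1 for b in self.grid_times if t >= b)
        return math.exp(self.log_ne[i])

# Three epochs: N_e = 10,000 before time 1, a bottleneck to 5,000,
# then recovery to 20,000 after time 10 (toy values):
demog = PiecewiseLogNe([1.0, 10.0],
                       [math.log(1e4), math.log(5e3), math.log(2e4)])
```

Whatever the parameterization, :math:`N_e(t)` enters the ELBO only through the coalescent prior term, so its gradient comes for free from the same objective.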

.. admonition:: Prerequisites for this Complication

   Escapement synthesizes ideas from many Timepieces. Before starting, you should
   have worked through:

   - :ref:`PSMC <psmc_timepiece>` -- the SMC factorization that makes the prior tractable
   - :ref:`tsdate <tsdate_timepiece>` -- variational gamma posteriors for coalescence times
   - :ref:`tsinfer <tsinfer_timepiece>` -- attention as the copying model
   - :ref:`Gamma-SMC <gamma_smc_timepiece>` -- continuous time, no grid
   - :ref:`msprime <msprime_timepiece>` -- the coalescent prior
   - :ref:`Probabilistic Inference <probabilistic_inference>` -- variational inference
     and the ELBO

   Familiarity with variational autoencoders and the reparameterization trick is
   assumed.

Chapters
========

.. toctree::
   :maxdepth: 2

   overview
   variational_inference
   architecture
   differentiable_likelihood
   training
   comparison
