Metadata-Version: 2.4
Name: emu-gmm
Version: 0.3.1
Summary: Measure-theoretic Generalized Method of Moments estimation; estimation via E_mu.
License: CC-BY-NC-SA-4.0
License-File: LICENSE.org
Author: Ethan Ligon
Author-email: ligon@berkeley.edu
Requires-Python: >=3.11,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Requires-Dist: haliax (>=1.4.dev452,<1.5)
Requires-Dist: jax (>=0.4.25)
Requires-Dist: jax_dataclasses (>=1.6.1)
Requires-Dist: jaxtyping (>=0.2.30)
Requires-Dist: numpy (>=1.26)
Requires-Dist: optimistix (>=0.0.7)
Requires-Dist: pandas (>=2.0)
Requires-Dist: scipy (>=1.11)
Description-Content-Type: text/plain

#+TITLE: emu-gmm
#+SUBTITLE: Measure-theoretic GMM; estimation via $\mathbb{E}_\mu$
#+AUTHOR: Ethan Ligon
#+OPTIONS: toc:nil num:nil

=emu-gmm= is a JAX-native framework for Generalized Method of Moments estimation. The framework is named for the operator at its centre: =emu= reads as $\mathbb{E}_\mu$, the expectation under a measure $\mu$.  One operator interface --- implemented against empirical, analytical, or synthetic measures --- drives sample estimation, identification analysis, and simulation-based inference through a single computational pipeline.
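
As a rough illustration of the single-operator idea (not the package's actual interface), the sketch below gives three toy measures over a fair die a hypothetical =expect= method and evaluates the same moment function against each. Every class and method name here is an assumption made for the example, and plain Python is used so that it runs without JAX.

#+begin_src python
import random
from statistics import fmean

# Hypothetical toy measures; emu-gmm's real measure classes may look different.


class EmpiricalSketch:
    """Expectation as the sample mean over observed data."""

    def __init__(self, data):
        self.data = list(data)

    def expect(self, psi, theta):
        return fmean(psi(x, theta) for x in self.data)


class AnalyticalSketch:
    """Expectation as an exact probability-weighted sum over a finite support."""

    def __init__(self, support, probs):
        self.support, self.probs = list(support), list(probs)

    def expect(self, psi, theta):
        return sum(p * psi(x, theta) for x, p in zip(self.support, self.probs))


class SyntheticSketch:
    """Expectation as the mean over simulated draws from a sampler."""

    def __init__(self, sampler, n_sim, seed):
        rng = random.Random(seed)
        self.draws = [sampler(rng) for _ in range(n_sim)]

    def expect(self, psi, theta):
        return fmean(psi(x, theta) for x in self.draws)


def psi(x, theta):
    """Moment function: zero in expectation when theta is the mean of x."""
    return x - theta


faces = [1, 2, 3, 4, 5, 6]
measures = {
    "empirical": EmpiricalSketch([2, 3, 3, 5, 6, 2]),
    "analytical": AnalyticalSketch(faces, [1 / 6] * 6),
    "synthetic": SyntheticSketch(lambda rng: rng.randint(1, 6), 20_000, seed=0),
}

# The same psi and theta are integrated against each measure in turn.
moments = {name: mu.expect(psi, 3.5) for name, mu in measures.items()}
for name, value in moments.items():
    print(f"{name:>10}: E_mu[psi] = {value:+.4f}")
#+end_src

Only the object supplying the expectation changes between the three calls; the moment function and the parameter value are shared.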

The most important architectural commitment is that variance construction is orthogonal to integration: $\mathbb{E}_\mu[\psi]$ is a property of the measure; $V_\mu(\theta)$ is a property of a separate =CovarianceStrategy= that can be swapped (iid vs cluster-robust vs replicate-weight) without changing what is being integrated.
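
To make the orthogonality concrete, here is a minimal sketch for a scalar moment. The function names are hypothetical and are not the package's =CovarianceStrategy= classes; finite-sample corrections are omitted. The mean is computed the same way in both calls, while the variance rule is swapped.

#+begin_src python
from collections import defaultdict
from statistics import fmean


def iid_variance(residuals, clusters=None):
    """Variance of the sample mean, treating observations as independent."""
    n = len(residuals)
    centre = fmean(residuals)
    return sum((r - centre) ** 2 for r in residuals) / n**2


def cluster_variance(residuals, clusters):
    """Variance of the sample mean, allowing dependence within clusters."""
    n = len(residuals)
    centre = fmean(residuals)
    totals = defaultdict(float)
    for r, c in zip(residuals, clusters):
        totals[c] += r - centre  # sum centred residuals within each cluster
    return sum(t**2 for t in totals.values()) / n**2


def summarise(residuals, clusters, variance_rule):
    """The integration step (the mean) never depends on the variance rule."""
    return fmean(residuals), variance_rule(residuals, clusters)


residuals = [1.0, 1.2, 0.8, -1.0, -0.8, -1.2]
clusters = ["a", "a", "a", "b", "b", "b"]

mean_iid, var_iid = summarise(residuals, clusters, iid_variance)
mean_cl, var_cl = summarise(residuals, clusters, cluster_variance)
print(f"mean under both rules: {mean_iid:.3f} / {mean_cl:.3f}")
print(f"variance of the mean: iid={var_iid:.4f}, cluster={var_cl:.4f}")
#+end_src

Because the residuals are positively correlated within each cluster, the cluster rule reports a larger variance than the iid rule, yet the integrated quantity is identical.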

* Status

v2 (v0.2.0): the v1 pipeline plus Riemannian-manifold parameter geometry and the stratified / design-aware covariance ladder. The implemented menu --- everything importable and composable today --- is tabulated in =docs/design.org= under "Contract surface (implemented)", the single source of truth for status claims; the green gate is =make check= (ruff + black + mypy + the full pytest suite). The three measure paths (synthetic, analytical, empirical) all run end-to-end against the bundled multi-asset Euler example, and the empirical path additionally has a real-data acceptance test (=tests/test_estimator_realdata.py=). The architecture and theoretical scope are pinned in =docs/=.

* Quickstart

Bootstrap and try the worked example:

#+begin_src shell
make setup                                 # poetry install + .venv/
poetry run python examples/run_euler.py    # multi-asset Hansen-Singleton demo
#+end_src

The minimal estimation surface in code (synthetic-data variant):

#+begin_src python
import jax
from emu_gmm import estimate, SyntheticMeasure, SyntheticCovariance
from emu_gmm.examples.euler import (
    EulerParams, euler_residual, euler_sampler_factory,
)

measure = SyntheticMeasure(
    key=jax.random.PRNGKey(0),
    n_sim=5000,
    sampler=euler_sampler_factory(5000),
)

result = estimate(
    model=euler_residual,
    measure=measure,
    covariance=SyntheticCovariance(),
    theta_init=EulerParams(beta=0.9, gamma=1.0),
)

print(f"beta = {result.theta_hat.beta:.4f}  (truth 0.96)")
print(f"gamma = {result.theta_hat.gamma:.4f}  (truth 2.00)")
print(f"J-stat = {result.J_stat:.3f}  dof={result.J_dof}  p={result.J_pvalue:.3f}")
print(result.to_pandas()["Sigma_theta"])
#+end_src

The same =euler_residual= drives the analytical and empirical variants of the demo; only the =Measure= / =CovarianceStrategy= change. See =docs/api-sketch.org= Section 5 for the three side-by-side, and =examples/run_euler.py= for the runnable script.
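
For orientation, a per-observation residual for a multi-asset consumption Euler equation could look like the sketch below. It is a hypothetical stand-in in plain Python: the actual signature, data layout, and JAX types of =euler_residual= and =EulerParams= in =emu_gmm.examples.euler= are not shown in this README, so the names =ParamsSketch= and =euler_residual_sketch= and the =(growth, returns)= observation layout are assumptions.

#+begin_src python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamsSketch:
    """Hypothetical parameter container: discount factor and risk aversion."""

    beta: float
    gamma: float


def euler_residual_sketch(x, theta):
    """One residual per asset: beta * growth**(-gamma) * R_j - 1."""
    growth, returns = x  # consumption growth and gross asset returns
    sdf = theta.beta * growth ** (-theta.gamma)  # stochastic discount factor
    return [sdf * r - 1.0 for r in returns]


truth = ParamsSketch(beta=0.96, gamma=2.0)
growth = 1.02
sdf_at_truth = truth.beta * growth ** (-truth.gamma)

# Build returns so the first asset is priced exactly at the true parameters
# and the second asset is mispriced by 10 percent.
observation = (growth, [1.0 / sdf_at_truth, 1.1 / sdf_at_truth])

at_truth = euler_residual_sketch(observation, truth)
at_start = euler_residual_sketch(observation, ParamsSketch(beta=0.9, gamma=1.0))
print("residual at truth:", [round(v, 6) for v in at_truth])
print("residual at start:", [round(v, 6) for v in at_start])
#+end_src

The function returns one entry per asset, which is the =psi(x_i, theta) -> R^M= shape; averaging such residuals over observations is the part left to the measure.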

* Documents

- =docs/howto.org= --- *start here.* Integration HOWTO: the architecture
  at the level you need to wire an application to =estimate()= --- what
  code you write (the moment function, the parameter container, the
  measure/covariance choice) and what the framework hands back.
- =docs/design.org= --- architecture specification (four review rounds; stable). Its "Contract surface (implemented)" section is the single source of truth for the implemented menu.
- =docs/api-sketch.org= --- the v1 API surface, retained as the v1 design record; superseded on status by =design.org='s contract-surface section.
- =docs/implementation-plan.org= --- phased task list (Phases 1-7 complete, Phase 8 polish underway; the v2 roadmap is Section 13).
- =docs/mcar-asymptotics.org= --- companion theoretical note; consistency, asymptotic normality, PD properties of the pairwise-overlap estimator under MCAR.
- =docs/refs.bib= --- project-local bibliography (entries not in =~/bibtex/main.bib=).

* Setup

Requires Python >= 3.11 and [[https://python-poetry.org/][Poetry]].

#+begin_src shell
make setup
#+end_src

This runs =poetry install= and creates =.venv/= in the project root. Activate it with =direnv allow= or =poetry shell= (on Poetry 2.x, =poetry shell= is provided by the separate =poetry-plugin-shell= plugin).

* Development

#+begin_src shell
make check         # ruff + black + mypy + full pytest
make quick-check   # same, skipping slow tests
make test          # pytest only
#+end_src

To install pre-commit hooks (ruff + black on every commit):

#+begin_src shell
poetry run pre-commit install
#+end_src

* Layout

#+begin_example
docs/                      design specs and theoretical notes
src/emu_gmm/
    __init__.py            public API re-exports
    types.py               protocols + EstimationResult / Diagnostics
    estimator.py           estimate() / build_estimator() entry points
    measures/              SyntheticMeasure, AnalyticalMeasure, EmpiricalMeasure
    covariance/            the CovarianceStrategy ladder (iid, clustered,
                           stratified / design-aware, sum, analytical, synthetic)
    weighting.py           Identity, Fixed, IteratedWeighting, ContinuouslyUpdated
    regularization.py      DiagonalTikhonov
    penalty.py             TikhonovPenalty
    optimizer.py           optimistix_lm, scipy_lm, linear_solver
    manifolds/             parameter geometry (Euclidean, Positive, PSDFixedRank,
                           Product, ManifoldLeaf) + riemannian_lm
    parameter_space.py     ParameterSpace / on field-to-manifold declarations
    inference/             j_test, k_statistic, k_confidence_set, bootstraps
    numerics/              ridge_inverse
    studies/               Monte Carlo driver (subpackage-only; not re-exported)
    diagnostics.py         build_diagnostics, log_to_stdout
    examples/              shared example models (euler.py)
    _internal/             axes, params, cholesky, labels (private)
tests/                     test suite mirroring src/emu_gmm/
examples/                  runnable demo scripts
Makefile                   build automation
pyproject.toml             Poetry configuration
poetry.lock                pinned dependencies (committed)
#+end_example

* Public API

Everything user-facing is re-exported at the package top level
(=from emu_gmm import ...=), with one deliberate exception: the Monte
Carlo =studies= subpackage is subpackage-only (import from
=emu_gmm.studies= directly). The implemented menu --- entry points,
measures, covariance strategies, weighting, regularization, penalty,
optimisers, manifold geometry, inference helpers, and result types ---
is tabulated, one line of semantics and a module pointer per row, in
=docs/design.org= under "Contract surface (implemented)". That table is
the single source of truth this README defers to;
=sorted(emu_gmm.__all__)= is the export list it tracks.
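
One way to check that a documented list still tracks a package's =__all__= is sketched below. The helper =export_drift= is hypothetical and not part of =emu-gmm=; it is demonstrated on the standard-library =json= module so that it runs without the package installed.

#+begin_src python
import importlib


def export_drift(package_name, documented):
    """Compare a package's __all__ with a collection of documented names."""
    module = importlib.import_module(package_name)
    exported = set(getattr(module, "__all__", ()))
    documented = set(documented)
    # First list: exported but undocumented. Second list: documented but gone.
    return sorted(exported - documented), sorted(documented - exported)


# For this package one would pass "emu_gmm" and the names from the design table.
undocumented, stale = export_drift(
    "json", ["dump", "dumps", "load", "loads", "loadz"]
)
print("exported but not documented:", undocumented)
print("documented but not exported:", stale)
#+end_src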

* Using emu-gmm correctly (and reporting problems)

=emu-gmm= is intended as a /spare set of correct interfaces/. Your only
modelling input is a per-observation residual =psi(x_i, theta) -> R^M=;
the package owns the moment expectation, the design-aware covariance
=V_X=, the criterion, the \(J\)-statistic, standard errors, and
\(p\)-values. Two rules keep estimation correct:

1. *Read results off the package, never recompute them by hand.* Take the
   criterion, =J_stat=, =standard_errors=, and \(p\)-values from
   =EstimationResult= / =k_statistic= / the inference helpers. A
   hand-rolled criterion is how subtle scaling bugs get reintroduced ---
   notably the textbook habit of scaling moments by \(\sqrt{N}\) and
   weighting by \(V_X\), which is correct only when every moment has the
   /same/ number of observations. =emu-gmm= keeps moments as
   per-coordinate means and folds all the per-moment \(N_j\) bookkeeping
   into \(V_X\) (the \(1/(N_j N_k)\) normalisation); see =docs/design.org=
   "Scaling convention" and =docs/mcar-asymptotics.org=.

2. *Express per-moment observability only through the =mask=.* Different
   moments may be observed for different numbers of units; the \((N, M)\)
   =mask= on =EmpiricalMeasure= is the single place where that is recorded. Do not
   work around it by pre-aggregating or by assuming a common \(N\).
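
The two rules can be illustrated with a small sketch in plain Python. It assumes one simple form of the pairwise-overlap covariance of the moment means,

\[
V_X[j,k] = \frac{1}{N_j N_k} \sum_{i} m_{ij}\, m_{ik}\, (\psi_{ij} - \bar g_j)(\psi_{ik} - \bar g_k),
\]

where \(m_{ij}\) is the mask entry, \(N_j = \sum_i m_{ij}\), and \(\bar g_j\) is the mean of moment \(j\) over its observed rows. The function names are hypothetical, and the package's own estimator may differ in centring or finite-sample corrections.

#+begin_src python
def masked_means(psi, mask):
    """Per-coordinate means over observed entries, with the counts N_j."""
    n_moments = len(psi[0])
    counts = [sum(1 for obs in mask if obs[j]) for j in range(n_moments)]
    means = [
        sum(row[j] for row, obs in zip(psi, mask) if obs[j]) / counts[j]
        for j in range(n_moments)
    ]
    return means, counts


def covariance_of_means(psi, mask):
    """Pairwise-overlap covariance of the means with 1/(N_j N_k) scaling."""
    means, counts = masked_means(psi, mask)
    n_moments = len(means)
    v = [[0.0] * n_moments for _ in range(n_moments)]
    for j in range(n_moments):
        for k in range(n_moments):
            # Only rows where both moments are observed contribute.
            overlap = sum(
                (row[j] - means[j]) * (row[k] - means[k])
                for row, obs in zip(psi, mask)
                if obs[j] and obs[k]
            )
            v[j][k] = overlap / (counts[j] * counts[k])
    return v


# Four units, two moments; the second moment is observed for two units only.
# Entries with a False mask are placeholders and are never read.
psi = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0], [7.0, 0.0]]
mask = [[True, True], [True, True], [True, False], [True, False]]

means, counts = masked_means(psi, mask)
v_x = covariance_of_means(psi, mask)
print("N_j  :", counts)
print("means:", means)
print("V_X  :", v_x)
#+end_src

With a mask of all ones, every \(N_j\) equals \(N\) and this \(V_X\) reduces to \(S/N\), where \(S\) is the sample covariance with \(1/N\) normalisation, so \(\bar g^\top V_X^{-1} \bar g\) coincides with the textbook \(N \bar g^\top S^{-1} \bar g\). Once the \(N_j\) differ, no single \(\sqrt{N}\) factor reproduces the per-entry scaling.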

If an interface is missing a knob, or you believe it gives a wrong
answer: *file an issue against =emu-gmm=; do not reimplement the
statistic in your own project.* A local reimplementation forfeits the
"single correct implementation" guarantee for everyone downstream. The
fix belongs in the shared package.

* License

[[file:LICENSE.org][CC-BY-NC-SA-4.0]].

