
On this page

  • 1 Overview
  • 2 Sitrep
  • 3 Public Surface
  • 4 Optimizer Surface
  • 5 Conventions
  • 6 Estimator Reference
  • 7 Transformer Reference
  • 8 Optimizer Reference
  • 9 Verified Runtime Snapshot
  • 10 Usage Examples
    • 10.1 OLS
    • 10.2 Ridge
    • 10.3 Poisson
    • 10.4 TwoSLS
    • 10.5 EPLM
    • 10.6 AverageDerivative
    • 10.7 PartiallyLinearDML
    • 10.8 AIPW
    • 10.9 GMM
    • 10.10 FixedEffectsOLS
    • 10.11 MEstimator
  • 11 Current Sharp Edges

crabbymetrics API Reference

Library sitrep and verified Python surface

1 Overview

crabbymetrics is a compact Rust-backed econometrics library exposed to Python through pyo3 and packaged with maturin. The public API is intentionally small: estimator classes, transformer classes, an Optimizers helper namespace, numpy as the only runtime dependency (arrays are the sole data-interchange type), and a scikit-adjacent fit / predict / summary pattern.

This page combines two things:

  • A sitrep on the current state of the library.
  • A verified API reference rendered against the live Python module in this repository.
Show code
import inspect
from html import escape

import numpy as np
import crabbymetrics as cm
from IPython.display import HTML, display


def html_table(headers, rows, table_attrs=""):
    parts = [
        f"<table {table_attrs}".strip() + ">",
        "<thead>",
        "<tr>",
        *[f"<th>{escape(str(header))}</th>" for header in headers],
        "</tr>",
        "</thead>",
        "<tbody>",
    ]
    for row in rows:
        parts.append("<tr>")
        for cell in row:
            parts.append(f"<td>{cell}</td>")
        parts.append("</tr>")
    parts.extend(["</tbody>", "</table>"])
    return "".join(parts)

2 Sitrep

  • src/lib.rs registers the Python classes, while estimator implementations are split by family under src/estimators/.
  • Shared linear algebra, covariance, bootstrap, and NumPy conversion helpers live in src/utils.rs.
  • Packaging is lean. The Python package declares only numpy at runtime, while native builds are handled by maturin.
  • Release automation already exists in .github/workflows/wheels.yml for Linux and macOS wheels across Python 3.10 to 3.14.
  • Coverage is broad for a small codebase: OLS, Ridge, fixed-effects OLS, horizontal panel ridge, synthetic control, synthetic DID, matrix completion, balancing weights, EPLM, average-derivative estimators, partially linear DML, AIPW, ElasticNet, binary and multinomial logit, Poisson, TwoSLS, FTRL, a first-class callback-driven GMM, a lower-level MEstimator, plus PCA and KernelBasis as design-matrix preprocessors.
  • The module also exposes an Optimizers class with static methods for LBFGS, BFGS, nonlinear conjugate gradient, Gauss-Newton least squares, and simulated annealing.
  • The main maturity gaps are still API breadth and test depth. The public contract is now documented, but the estimator surface is intentionally narrow.
  • The public API is consistent, but not fully uniform. summary() always returns a plain Python dict, yet the exact keys vary by estimator.
  • TwoSLS now uses the closed-form linear IV / 2SLS estimator and accepts matrix-valued endogenous regressors and excluded instruments.
  • GMM now sits between the built-in estimators and MEstimator: users supply per-observation moments, optionally a sample Jacobian, and the class handles just-identified versus two-step fitting plus summary-time covariance choices.
  • MEstimator is still the lowest-level likelihood-style interface. Users are responsible for gradients, per-observation score matrices, and numerical safeguards inside their objective functions.
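The per-observation moment contract can be sketched independently in numpy. This is only an illustration of the contract, not the library's internals; the data dict and callback names are hypothetical stand-ins shaped like the ones GMM expects. For a just-identified linear model the moments are g_i(θ) = x_i (y_i − x_iᵀθ), and the θ that zeroes the sample mean moment is exactly OLS:

```python
import numpy as np

# Hypothetical data dict and callbacks sketching the per-observation
# moment contract: g_i(theta) = x_i * (y_i - x_i' theta).
rng = np.random.default_rng(0)
n, p = 500, 2
x = rng.normal(size=(n, p))
theta_true = np.array([1.5, -0.5])
y = x @ theta_true + rng.normal(size=n)
data = {"x": x, "y": y}

def moment_fn(theta, data):
    # Returns an (n, m) matrix of per-observation moments (here m == p).
    resid = data["y"] - data["x"] @ theta
    return data["x"] * resid[:, None]

def jacobian_fn(theta, data):
    # Sample Jacobian of the mean moment, shape (m, p): -X'X / n.
    xm = data["x"]
    return -(xm.T @ xm) / xm.shape[0]

# In the just-identified case, zeroing the mean moment reproduces OLS.
theta_hat = np.linalg.solve(x.T @ x, x.T @ y)
mean_moment = moment_fn(theta_hat, data).mean(axis=0)  # ~0 up to roundoff
```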

3 Public Surface

The table below is generated from the live module so the constructor and method signatures match the current build.

Show code
classes = [
    cm.OLS,
    cm.Ridge,
    cm.FixedEffectsOLS,
    cm.HorizontalPanelRidge,
    cm.SyntheticControl,
    cm.SyntheticDID,
    cm.MatrixCompletion,
    cm.InteractiveFixedEffects,
    cm.BalancingWeights,
    cm.EPLM,
    cm.AverageDerivative,
    cm.PartiallyLinearDML,
    cm.AIPW,
    cm.ElasticNet,
    cm.Logit,
    cm.MultinomialLogit,
    cm.Poisson,
    cm.TwoSLS,
    cm.GMM,
    cm.FTRL,
    cm.MEstimator,
    cm.PCA,
    cm.KernelBasis,
    cm.Optimizers,
]

rows = []
for cls in classes:
    methods = []
    for method_name in sorted(name for name in dir(cls) if not name.startswith("_")):
        method = getattr(cls, method_name)
        if not callable(method):
            continue
        methods.append(
            f"<code>{escape(method_name)}{escape(str(inspect.signature(method)))}</code>"
        )
    rows.append(
        [
            f"<code>{escape(cls.__name__)}{escape(str(inspect.signature(cls)))}</code>",
            "<br>".join(methods),
        ]
    )

display(HTML(html_table(["Constructor", "Methods"], rows)))
Constructor Methods
OLS() bootstrap(self, /, n_bootstrap, seed=None)
fit(self, /, x, y)
fit_weighted(self, /, x, y, sample_weight)
predict(self, /, x)
summary(self, /, vcov='hc1', lags=None, clusters=None)
Ridge(penalty=None, cv=5) bootstrap(self, /, n_bootstrap, seed=None)
fit(self, /, x, y)
fit_weighted(self, /, x, y, sample_weight)
predict(self, /, x)
summary(self, /, vcov='hc1', lags=None, clusters=None)
FixedEffectsOLS() bootstrap(self, /, n_bootstrap, seed=None)
fit(self, /, x, fe, y)
fit_weighted(self, /, x, fe, y, sample_weight)
summary(self, /, vcov='hc1', lags=None, clusters=None)
HorizontalPanelRidge(penalty=1.0) fit(self, /, y, w)
predict(self, /)
summary(self, /)
treatment_effect(self, /)
SyntheticControl(max_iterations=500) bootstrap(self, /, n_bootstrap, seed=None)
fit(self, /, donors, treated)
predict(self, /, donors)
summary(self, /)
SyntheticDID(zeta_omega=None, zeta_lambda=None, max_iterations=1000) fit(self, /, y, w)
predict(self, /)
summary(self, /)
treatment_effect(self, /)
MatrixCompletion(lambda_l=None, lambda_fraction=0.25, fit_unit_effects=True, fit_time_effects=True, max_iterations=500, effect_iterations=2, tolerance=1e-06) fit(self, /, y, w)
predict(self, /)
summary(self, /)
InteractiveFixedEffects(rank=0, force=3) fit(self, /, y)
predict(self, /)
summary(self, /)
BalancingWeights(objective='quadratic', solver='auto', autoscale=False, min_weight=0.0, max_weight=1.0, l2_norm=0.0, max_iterations=200, tolerance=1e-08) fit(self, /, covariates, target_covariates, baseline_weights=None, target_weights=None)
get_weights(self, /)
summary(self, /)
EPLM(fd_eps=1e-06) fit(self, /, y, d, w)
summary(self, /, vcov=None, lags=None, clusters=None)
AverageDerivative(method='dr', fd_eps=1e-06) fit(self, /, y, d, w)
summary(self, /, vcov=None, lags=None, clusters=None)
PartiallyLinearDML(penalty=None, cv=5, n_folds=5, seed=42) fit(self, /, y, d, x)
summary(self, /, vcov=None, lags=None, clusters=None)
AIPW(penalty=None, cv=5, n_folds=5, propensity_clip=0.02, seed=42) fit(self, /, y, d, x)
summary(self, /, vcov=None, lags=None, clusters=None)
ElasticNet(penalty=1.0, l1_ratio=0.5, tolerance=0.0001, max_iterations=1000) bootstrap(self, /, n_bootstrap, seed=None)
fit(self, /, x, y)
predict(self, /, x)
summary(self, /)
Logit(alpha=1.0, max_iterations=100, gradient_tolerance=0.0001) bootstrap(self, /, n_bootstrap, seed=None)
fit(self, /, x, y)
predict(self, /, x)
summary(self, /)
MultinomialLogit(alpha=1.0, max_iterations=100, gradient_tolerance=0.0001) bootstrap(self, /, n_bootstrap, seed=None)
fit(self, /, x, y)
predict(self, /, x)
summary(self, /)
Poisson(alpha=0.0, max_iterations=100, tolerance=0.0001) bootstrap(self, /, n_bootstrap, seed=None)
fit(self, /, x, y)
predict(self, /, x)
summary(self, /, vcov='vanilla')
TwoSLS() bootstrap(self, /, n_bootstrap, seed=None)
fit(self, /, x_endog, x_exog, z, y)
fit_weighted(self, /, x_endog, x_exog, z, y, sample_weight)
predict(self, /, x)
summary(self, /, vcov='hc1', lags=None, clusters=None)
GMM(moment_fn, jacobian_fn=None, max_iterations=100, tolerance=1e-06, ridge=1e-08, fd_eps=1e-06) fit(self, /, data, theta0, weighting='auto')
summary(self, /, vcov='sandwich', omega='iid', lags=None, clusters=None)
FTRL(alpha=0.1, beta=1.0, l1_ratio=1.0, l2_ratio=1.0) bootstrap(self, /, n_bootstrap, seed=None)
fit(self, /, x, y)
predict(self, /, x)
summary(self, /)
MEstimator(objective_fn, score_fn, max_iterations=100, tolerance=1e-06) bootstrap(self, /, n_bootstrap, seed=None)
compute_vcov(self, /)
fit(self, /, data, theta0)
summary(self, /)
PCA(n_components, whiten=False) fit(self, /, x)
fit_transform(self, /, x)
inverse_transform(self, /, scores)
summary(self, /)
transform(self, /, x)
KernelBasis(kernel='gaussian', bandwidth=0.5, coef0=1.0, degree=2.0) fit(self, /, x)
fit_transform(self, /, x)
summary(self, /)
transform(self, /, x)
Optimizers() minimize_bfgs(fun, x0, grad, max_iterations=100, tolerance=1e-06)
minimize_gauss_newton_ls(residual_fn, x0, jacobian_fn, max_iterations=100, tolerance=1e-06)
minimize_lbfgs(fun, x0, grad, max_iterations=100, tolerance=1e-06)
minimize_nonlinear_cg(fun, x0, grad, max_iterations=100, restart_iters=10, restart_orthogonality=0.1, tolerance=1e-06)
minimize_simulated_annealing(fun, x0, lower=None, upper=None, temp=15.0, step_size=0.1, max_iterations=5000, seed=None)

4 Optimizer Surface

The optimizer helpers live under cm.Optimizers as static methods. They mirror the usual scipy pattern: pass a Python callback, a starting value, and any derivative information the method needs, then receive a plain dictionary with x, fun, nit, success, message, and method.

Show code
optimizer_rows = []
for name in sorted(n for n in dir(cm.Optimizers) if n.startswith("minimize_")):
    fn = getattr(cm.Optimizers, name)
    optimizer_rows.append(
        [f"<code>Optimizers.{escape(name)}{escape(str(inspect.signature(fn)))}</code>"]
    )

display(HTML(html_table(["Function"], optimizer_rows)))
Function
Optimizers.minimize_bfgs(fun, x0, grad, max_iterations=100, tolerance=1e-06)
Optimizers.minimize_gauss_newton_ls(residual_fn, x0, jacobian_fn, max_iterations=100, tolerance=1e-06)
Optimizers.minimize_lbfgs(fun, x0, grad, max_iterations=100, tolerance=1e-06)
Optimizers.minimize_nonlinear_cg(fun, x0, grad, max_iterations=100, restart_iters=10, restart_orthogonality=0.1, tolerance=1e-06)
Optimizers.minimize_simulated_annealing(fun, x0, lower=None, upper=None, temp=15.0, step_size=0.1, max_iterations=5000, seed=None)
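The calling convention can be sketched with a pure-numpy stand-in. This is NOT the library's optimizer, just a minimal gradient-descent routine illustrating the callback signature and the result-dict schema described above: pass fun, x0, and grad, and get back a dict with x, fun, nit, success, message, and method.

```python
import numpy as np

# Pure-numpy stand-in (not the library's implementation) mirroring the
# Optimizers callback signature and result-dict schema.
def minimize_gd(fun, x0, grad, max_iterations=100, tolerance=1e-6, lr=0.1):
    x = np.asarray(x0, dtype=float).copy()
    nit = 0
    for nit in range(1, max_iterations + 1):
        g = np.asarray(grad(x))
        if np.linalg.norm(g) < tolerance:
            break
        x = x - lr * g
    success = np.linalg.norm(np.asarray(grad(x))) < tolerance
    return {
        "x": x,
        "fun": float(fun(x)),
        "nit": nit,
        "success": bool(success),
        "message": "converged" if success else "max_iterations reached",
        "method": "gd",
    }

# Minimize ||x - 3||^2 from the origin.
res = minimize_gd(
    lambda x: float(np.sum((x - 3.0) ** 2)),
    np.zeros(2),
    lambda x: 2.0 * (x - 3.0),
    max_iterations=500,
)
```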

5 Conventions

  • Input features are always 2D NumPy arrays.
  • Regression targets use float64 arrays.
  • PCA and KernelBasis fit on x alone and return transformed feature matrices that can be passed into the regression estimators.
  • The Optimizers methods return plain dictionaries rather than scipy result objects, but the schema is intentionally similar.
  • FixedEffectsOLS.fit(x, fe, y) expects fe to be a 2D uint32 matrix of zero-based category codes, one column per factor.
  • Panel causal estimators use fit(Y, W) as the paved path. Y is a balanced (n_units, n_periods) outcome matrix and W is a same-shaped binary absorbing treatment matrix; estimators infer treated units, first-treatment cohorts, never-treated donors, pre/post windows, and event-time summaries internally.
  • HorizontalPanelRidge.fit(Y, W) uses never-treated donor paths as contemporaneous features, trains cohort-specific ridge forecasts on pre-treatment periods, and predicts treated-unit counterfactual paths.
  • SyntheticDID.fit(Y, W) fits cohort-specific donor and pre-period weights under the same panel contract and stores the full counterfactual/effect matrices.
  • MatrixCompletion.fit(Y, W) treats untreated cells (W == 0) as observed entries, completes treated cells as counterfactuals, and stores the resulting ATT, treatment effects, cohort-level group means, weighted/unweighted group means, and event-study summaries.
  • SyntheticControl.fit(donors, treated) remains a lower-level simplex-weight API for a single treated path, with donor outcomes in an (n_periods, n_donors) matrix and a treated pre-period outcome vector.
  • InteractiveFixedEffects.fit(y) remains the no-covariate fect principal-components factor-model helper for balanced panels; panel_factor() and panel_fe() expose the lower-level fect SVD helpers.
  • BalancingWeights.fit(covariates, target_covariates, baseline_weights=None, target_weights=None) is the public low-level calibration-weights API, but not the paved path for panel causal estimators.
  • EPLM.fit(y, d, w) and AverageDerivative.fit(y, d, w) currently target a scalar continuous treatment and a 2D control matrix.
  • PartiallyLinearDML.fit(y, d, x) targets a scalar continuous treatment with cross-fit ridge nuisances.
  • AIPW.fit(y, d, x) targets a binary treatment coded as 0/1 and estimates the ATE with cross-fit ridge nuisance models.
  • Logit, MultinomialLogit, and FTRL expect integer class labels at fit time.
  • Most estimators store their training data internally so summary() and bootstrap() can be called after fit().
  • OLS, Ridge, ElasticNet, Logit, MultinomialLogit, Poisson, and TwoSLS now always fit an intercept.
  • FixedEffectsOLS, EPLM, AverageDerivative, PartiallyLinearDML, AIPW, GMM, and MEstimator deliberately have no predict() method. They are estimation-first interfaces.
  • When an estimator has an intercept, bootstrap draws place it in the first column.
  • summary() returns dictionaries, not rich result objects. Downstream code should index by key.
  • bootstrap(n_bootstrap, seed=None) now accepts an omitted seed across the full public surface.
  • OLS, Ridge, FixedEffectsOLS, and TwoSLS now share the same robust-inference surface: summary(vcov="vanilla" | "hc1" | "newey_west" | "cluster", lags=None, clusters=None).
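The fit(Y, W) panel contract above can be made concrete with a toy staggered design. The values here are made up for illustration, and the cohort/donor bookkeeping only mirrors what the panel estimators infer internally from W; it is not their implementation.

```python
import numpy as np

# Toy balanced panel following the fit(Y, W) contract: Y is a
# (n_units, n_periods) outcome matrix, W a same-shaped binary
# absorbing treatment matrix.
rng = np.random.default_rng(1)
n_units, n_periods, effect = 6, 8, 2.0
Y0 = rng.normal(size=(n_units, n_periods)).cumsum(axis=1)  # untreated paths

W = np.zeros((n_units, n_periods))
W[0, 4:] = 1.0  # unit 0: treated from period 4 onward (absorbing)
W[1, 5:] = 1.0  # unit 1: a later cohort; units 2-5 are never-treated donors
assert np.all(np.diff(W, axis=1) >= 0)  # absorbing: no switching back off

Y = Y0 + effect * W  # observed outcomes under a constant additive effect

# First-treatment cohorts and never-treated donors, as the estimators
# infer them internally from W alone (-1 marks never-treated units).
first_treated = np.where(W.any(axis=1), W.argmax(axis=1), -1)
never_treated = first_treated == -1
```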

6 Estimator Reference

Show code
reference_rows = [
    [
        "<code>OLS</code>",
        "<code>fit(x, y)</code>",
        "1D float predictions",
        "<code>intercept</code>, <code>coef</code>, <code>intercept_se</code>, <code>coef_se</code>",
        "<code>(B, p + 1)</code> with intercept, else <code>(B, p)</code>",
        "Uses Linfa linear regression; <code>summary(vcov='hc1')</code> is default, and the linear-summary surface also accepts <code>'vanilla'</code>, <code>'newey_west'</code>, and <code>'cluster'</code>.",
    ],
    [
        "<code>Ridge</code>",
        "<code>fit(x, y)</code>",
        "1D float predictions",
        "<code>intercept</code>, <code>coef</code>, <code>intercept_se</code>, <code>coef_se</code>, <code>penalty</code>, <code>penalties</code>, <code>best_penalty_index</code>, <code>cv_mse</code>, <code>intercept_path</code>, <code>coef_path</code>",
        "<code>(B, p + 1)</code> with intercept",
        "Closed-form L2-regularized least squares solved through an augmented QR least-squares system. Passing a penalty grid runs cross-validation, stores the full coefficient path, refits the selected penalty on the full sample, and exposes the same linear-summary covariance options as OLS.",
    ],
    [
        "<code>FixedEffectsOLS</code>",
        "<code>fit(x, fe_uint32, y)</code>",
        "No dedicated <code>predict()</code> method",
        "<code>coef</code>, <code>coef_se</code>",
        "<code>(B, p)</code>",
        "Uses <code>within</code> to partial out fixed effects, then Linfa OLS with <code>fit_intercept=False</code> on the residualized design. The summary covariance options match the other linear estimators.",
    ],
    [
        "<code>HorizontalPanelRidge</code>",
        "<code>fit(Y, W)</code>",
        "2D treated-unit counterfactual matrix",
        "<code>att</code>, <code>counterfactual</code>, <code>treatment_effect</code>, <code>event_study</code>, <code>group_means</code>, <code>cohort_intercepts</code>, <code>cohort_coef</code>, <code>pre_rmse</code>, <code>penalty</code>, <code>control_units</code>, <code>treated_units</code>, <code>cohorts</code>",
        "No bootstrap method",
        "Horizontal panel forecasting estimator. It infers absorbing treatment cohorts from <code>W</code>, trains cohort-specific ridge forecasts from never-treated donor paths, and stores both low-level coefficients and high-level causal summaries.",
    ],
    [
        "<code>SyntheticControl</code>",
        "<code>fit(donors, treated)</code>",
        "1D float synthetic path from donor outcomes",
        "<code>weights</code>, <code>pre_rmse</code>",
        "<code>(B, n_donors)</code>",
        "Fits nonnegative donor weights that sum to one by minimizing pre-treatment squared error under a simplex constraint.",
    ],
    [
        "<code>SyntheticDID</code>",
        "<code>fit(Y, W)</code>",
        "2D treated-unit counterfactual matrix",
        "<code>att</code>, <code>unit_weights</code>, <code>time_weights</code>, <code>counterfactual</code>, <code>synthetic_outcome</code>, <code>treatment_effect</code>, <code>event_study</code>, <code>group_means</code>, <code>pre_rmse</code>, <code>unit_intercept</code>, <code>time_intercept</code>, <code>zeta_omega</code>, <code>zeta_lambda</code>, <code>control_units</code>, <code>treated_units</code>, <code>cohorts</code>",
        "No bootstrap method",
        "Synthetic difference-in-differences for balanced outcome panels. It infers cohorts from <code>W</code>, fits cohort-specific simplex unit/time weights, and keeps weights available while exposing event-study and group-mean summaries as the modal output.",
    ],
    [
        "<code>MatrixCompletion</code>",
        "<code>fit(Y, W)</code>",
        "2D completed/counterfactual panel",
        "<code>att</code>, <code>completed</code>, <code>counterfactual</code>, <code>treatment_effect</code>, <code>event_study</code>, <code>group_means</code>, <code>low_rank</code>, <code>unit_effects</code>, <code>time_effects</code>, <code>singular_values</code>, <code>lambda_l</code>, <code>objective</code>, <code>iterations</code>, <code>history_objective</code>, <code>history_rmse</code>",
        "No bootstrap method",
        "MCPanel-style nuclear-norm matrix completion using untreated cells from <code>W</code> as observed entries; treated cells are completed as counterfactuals and summarized through the common panel causal outputs.",
    ],
    [
        "<code>BalancingWeights</code>",
        "<code>fit(covariates, target_covariates, baseline_weights=None, target_weights=None)</code>",
        "No dedicated <code>predict()</code> method",
        "<code>weights</code>, <code>beta</code>, <code>weighted_mean</code>, <code>target_mean</code>, <code>mean_diff</code>, <code>l2_diff</code>, <code>effective_sample_size</code>, <code>success</code>",
        "No bootstrap method",
        "Dual calibration solver for entropy and quadratic balancing weights. Intended for ATT-style reweighting, covariate shift, and domain adaptation tasks where one sample is reweighted to match another sample's covariate mean.",
    ],
    [
        "<code>EPLM</code>",
        "<code>fit(y, d, w)</code>",
        "No dedicated <code>predict()</code> method",
        "<code>coef</code>, <code>se</code>, <code>vcov</code>, <code>nuisance_coef</code>",
        "No bootstrap method",
        "Robins-Newey style partially linear E-estimator implemented as an exactly identified stacked-moment system with a linear working model for <code>E[D|W]</code>.",
    ],
    [
        "<code>AverageDerivative</code>",
        "<code>fit(y, d, w)</code>",
        "No dedicated <code>predict()</code> method",
        "<code>method</code>, <code>coef</code>, <code>se</code>, <code>vcov</code>",
        "No bootstrap method",
        "Graham-Pinto style average-derivative estimator with <code>method='ob' | 'ipw' | 'dr'</code> under a scalar continuous-treatment working model.",
    ],
    [
        "<code>PartiallyLinearDML</code>",
        "<code>fit(y, d, x)</code>",
        "No dedicated <code>predict()</code> method",
        "<code>coef</code>, <code>se</code>, <code>vcov</code>, <code>outcome_penalties</code>, <code>treatment_penalties</code>",
        "No bootstrap method",
        "Cross-fit ridge implementation of the partially linear Double-ML score. The nuisance fits are selected fold by fold from a common ridge penalty grid.",
    ],
    [
        "<code>AIPW</code>",
        "<code>fit(y, d, x)</code>",
        "No dedicated <code>predict()</code> method",
        "<code>ate</code>, <code>se</code>, <code>vcov</code>, <code>outcome0_penalties</code>, <code>outcome1_penalties</code>, <code>propensity_penalties</code>",
        "No bootstrap method",
        "Cross-fit ridge AIPW estimator for a binary-treatment ATE. Outcome regressions are fit separately by treatment arm and the ridge propensity fit is clipped away from 0 and 1.",
    ],
    [
        "<code>ElasticNet</code>",
        "<code>fit(x, y)</code>",
        "1D float predictions",
        "<code>intercept</code>, <code>coef</code>, <code>intercept_se</code>, <code>coef_se</code>",
        "<code>(B, p + 1)</code> with intercept, else <code>(B, p)</code>",
        "Regularized point estimates with HC1-style covariance built from residuals.",
    ],
    [
        "<code>Logit</code>",
        "<code>fit(x, y_int32)</code>",
        "1D <code>int32</code> class labels",
        "<code>intercept</code>, <code>coef</code>, <code>intercept_se</code>, <code>coef_se</code>",
        "<code>(B, p + 1)</code> with intercept, else <code>(B, p)</code>",
        "Binary logistic regression. Standard errors come from the Fisher information.",
    ],
    [
        "<code>MultinomialLogit</code>",
        "<code>fit(x, y_int32)</code>",
        "1D <code>int32</code> class labels",
        "<code>coef</code> and <code>se</code> matrices of shape <code>(classes, features_with_intercept)</code>",
        "<code>(B, classes * features_with_intercept)</code>",
        "Intercept and slopes are packed together by class in the summary matrix.",
    ],
    [
        "<code>Poisson</code>",
        "<code>fit(x, y)</code>",
        "1D float mean counts",
        "<code>intercept</code>, <code>coef</code>, <code>intercept_se</code>, <code>coef_se</code>",
        "<code>(B, p + 1)</code> with intercept, else <code>(B, p)</code>",
        "Custom Newton-CG Poisson MLE with optional L2 penalty via <code>alpha</code>; <code>summary(vcov='vanilla')</code> gives Fisher SEs and <code>summary(vcov='sandwich')</code> gives QMLE sandwich SEs.",
    ],
    [
        "<code>TwoSLS</code>",
        "<code>fit(x_endog, x_exog, z, y)</code>",
        "1D float predictions from the stage-two design matrix",
        "<code>intercept</code>, <code>coef</code>, <code>intercept_se</code>, <code>coef_se</code>",
        "<code>(B, k + 1)</code> with intercept, else <code>(B, k)</code>",
        "Closed-form linear IV / 2SLS estimator. Supports multiple endogenous regressors, and its summary method now shares the same <code>vanilla</code>, <code>hc1</code>, <code>newey_west</code>, and <code>cluster</code> covariance interface as the other linear estimators.",
    ],
    [
        "<code>FTRL</code>",
        "<code>fit(x, y_int32)</code>",
        "1D float scores in <code>[0, 1]</code>",
        "<code>coef</code>, <code>coef_se</code>",
        "<code>(B, p)</code>",
        "Online-style classification model with a different summary schema and no explicit intercept output.",
    ],
    [
        "<code>GMM</code>",
        "<code>fit(data_dict, theta0, weighting='auto')</code>",
        "No dedicated <code>predict()</code> method",
        "<code>coef</code>, <code>se</code>, <code>vcov</code>, <code>criterion</code>, <code>nit</code>, <code>weighting</code>, <code>vcov_type</code>, <code>omega_type</code>, <code>weight_matrix</code>, <code>nobs</code>, <code>n_moments</code>, <code>j_stat</code>, <code>j_df</code>",
        "No bootstrap method",
        "Requires <code>moment_fn(theta, data)</code> returning an <code>(n, m)</code> matrix of per-observation moments and optionally <code>jacobian_fn(theta, data)</code> returning the sample Jacobian of shape <code>(m, p)</code>.",
    ],
    [
        "<code>MEstimator</code>",
        "<code>fit(data_dict, theta0)</code>",
        "No dedicated <code>predict()</code> method",
        "<code>coef</code>, <code>se</code>",
        "<code>(B, p)</code>",
        "Requires <code>objective_fn(theta, data) -&gt; (obj, grad)</code> and <code>score_fn(theta, data)</code> returning an <code>(n, p)</code> matrix.",
    ],
]

display(
    HTML(
        "<div style='width: 100%; overflow-x: auto;'>"
        + html_table(
            ["Estimator", "Fit", "Predict", "Summary Keys", "Bootstrap", "Notes"],
            reference_rows,
            table_attrs="style='width: 100%; min-width: 100%; table-layout: auto;'",
        )
        + "</div>"
    )
)
Estimator Fit Predict Summary Keys Bootstrap Notes
OLS fit(x, y) 1D float predictions intercept, coef, intercept_se, coef_se (B, p + 1) with intercept, else (B, p) Uses Linfa linear regression; summary(vcov='hc1') is default, and the linear-summary surface also accepts 'vanilla', 'newey_west', and 'cluster'.
Ridge fit(x, y) 1D float predictions intercept, coef, intercept_se, coef_se, penalty, penalties, best_penalty_index, cv_mse, intercept_path, coef_path (B, p + 1) with intercept Closed-form L2-regularized least squares solved through an augmented QR least-squares system. Passing a penalty grid runs cross-validation, stores the full coefficient path, refits the selected penalty on the full sample, and exposes the same linear-summary covariance options as OLS.
FixedEffectsOLS fit(x, fe_uint32, y) No dedicated predict() method coef, coef_se (B, p) Uses within to partial out fixed effects, then Linfa OLS with fit_intercept=False on the residualized design. The summary covariance options match the other linear estimators.
HorizontalPanelRidge fit(Y, W) 2D treated-unit counterfactual matrix att, counterfactual, treatment_effect, event_study, group_means, cohort_intercepts, cohort_coef, pre_rmse, penalty, control_units, treated_units, cohorts No bootstrap method Horizontal panel forecasting estimator. It infers absorbing treatment cohorts from W, trains cohort-specific ridge forecasts from never-treated donor paths, and stores both low-level coefficients and high-level causal summaries.
SyntheticControl fit(donors, treated) 1D float synthetic path from donor outcomes weights, pre_rmse (B, n_donors) Fits nonnegative donor weights that sum to one by minimizing pre-treatment squared error under a simplex constraint.
SyntheticDID fit(Y, W) 2D treated-unit counterfactual matrix att, unit_weights, time_weights, counterfactual, synthetic_outcome, treatment_effect, event_study, group_means, pre_rmse, unit_intercept, time_intercept, zeta_omega, zeta_lambda, control_units, treated_units, cohorts No bootstrap method Synthetic difference-in-differences for balanced outcome panels. It infers cohorts from W, fits cohort-specific simplex unit/time weights, and keeps weights available while exposing event-study and group-mean summaries as the modal output.
MatrixCompletion fit(Y, W) 2D completed/counterfactual panel att, completed, counterfactual, treatment_effect, event_study, group_means, low_rank, unit_effects, time_effects, singular_values, lambda_l, objective, iterations, history_objective, history_rmse No bootstrap method MCPanel-style nuclear-norm matrix completion using untreated cells from W as observed entries; treated cells are completed as counterfactuals and summarized through the common panel causal outputs.
BalancingWeights fit(covariates, target_covariates, baseline_weights=None, target_weights=None) No dedicated predict() method weights, beta, weighted_mean, target_mean, mean_diff, l2_diff, effective_sample_size, success No bootstrap method Dual calibration solver for entropy and quadratic balancing weights. Intended for ATT-style reweighting, covariate shift, and domain adaptation tasks where one sample is reweighted to match another sample's covariate mean.
EPLM fit(y, d, w) No dedicated predict() method coef, se, vcov, nuisance_coef No bootstrap method Robins-Newey style partially linear E-estimator implemented as an exactly identified stacked-moment system with a linear working model for E[D|W].
AverageDerivative fit(y, d, w) No dedicated predict() method method, coef, se, vcov No bootstrap method Graham-Pinto style average-derivative estimator with method='ob' | 'ipw' | 'dr' under a scalar continuous-treatment working model.
PartiallyLinearDML fit(y, d, x) No dedicated predict() method coef, se, vcov, outcome_penalties, treatment_penalties No bootstrap method Cross-fit ridge implementation of the partially linear Double-ML score. The nuisance fits are selected fold by fold from a common ridge penalty grid.
AIPW fit(y, d, x) No dedicated predict() method ate, se, vcov, outcome0_penalties, outcome1_penalties, propensity_penalties No bootstrap method Cross-fit ridge AIPW estimator for a binary-treatment ATE. Outcome regressions are fit separately by treatment arm and the ridge propensity fit is clipped away from 0 and 1.
ElasticNet fit(x, y) 1D float predictions intercept, coef, intercept_se, coef_se (B, p + 1) with intercept, else (B, p) Regularized point estimates with HC1-style covariance built from residuals.
Logit fit(x, y_int32) 1D int32 class labels intercept, coef, intercept_se, coef_se (B, p + 1) with intercept, else (B, p) Binary logistic regression. Standard errors come from the Fisher information.
MultinomialLogit fit(x, y_int32) 1D int32 class labels coef and se matrices of shape (classes, features_with_intercept) (B, classes * features_with_intercept) Intercept and slopes are packed together by class in the summary matrix.
Poisson fit(x, y) 1D float mean counts intercept, coef, intercept_se, coef_se (B, p + 1) with intercept, else (B, p) Custom Newton-CG Poisson MLE with optional L2 penalty via alpha; summary(vcov='vanilla') gives Fisher SEs and summary(vcov='sandwich') gives QMLE sandwich SEs.
TwoSLS fit(x_endog, x_exog, z, y) 1D float predictions from the stage-two design matrix intercept, coef, intercept_se, coef_se (B, k + 1) with intercept, else (B, k) Closed-form linear IV / 2SLS estimator. Supports multiple endogenous regressors, and its summary method now shares the same vanilla, hc1, newey_west, and cluster covariance interface as the other linear estimators.
FTRL fit(x, y_int32) 1D float scores in [0, 1] coef, coef_se (B, p) Online-style classification model with a different summary schema and no explicit intercept output.
GMM fit(data_dict, theta0, weighting='auto') No dedicated predict() method coef, se, vcov, criterion, nit, weighting, vcov_type, omega_type, weight_matrix, nobs, n_moments, j_stat, j_df No bootstrap method Requires moment_fn(theta, data) returning an (n, m) matrix of per-observation moments and optionally jacobian_fn(theta, data) returning the sample Jacobian of shape (m, p).
MEstimator fit(data_dict, theta0) No dedicated predict() method coef, se (B, p) Requires objective_fn(theta, data) -> (obj, grad) and score_fn(theta, data) returning an (n, p) matrix.

7 Transformer Reference

Show code
transformer_rows = [
    [
        "<code>PCA</code>",
        "<code>fit(x)</code>",
        "<code>transform(x)</code>, <code>fit_transform(x)</code>, <code>inverse_transform(scores)</code>",
        "<code>n_components</code>, <code>n_features</code>, <code>whiten</code>, <code>components</code>, <code>mean</code>, <code>explained_variance</code>, <code>explained_variance_ratio</code>, <code>singular_values</code>",
        "Principal-components basis estimated from <code>x</code> alone. Useful for principal-components regression and other low-rank pipelines.",
    ],
    [
        "<code>KernelBasis</code>",
        "<code>fit(x)</code>",
        "<code>transform(x)</code>, <code>fit_transform(x)</code>",
        "<code>kernel</code>, <code>n_train</code>, <code>n_features</code>, <code>bandwidth</code>, <code>coef0</code>, <code>degree</code>, <code>diagonal</code>",
        "Stores the fitted training design and returns kernel features against that basis, so downstream regressions can work on nonlinear right-hand sides.",
    ],
]

display(
    HTML(
        html_table(
            ["Transformer", "Fit", "Transform", "Summary Keys", "Notes"],
            transformer_rows,
        )
    )
)
Transformer Fit Transform Summary Keys Notes
PCA fit(x) transform(x), fit_transform(x), inverse_transform(scores) n_components, n_features, whiten, components, mean, explained_variance, explained_variance_ratio, singular_values Principal-components basis estimated from x alone. Useful for principal-components regression and other low-rank pipelines.
KernelBasis fit(x) transform(x), fit_transform(x) kernel, n_train, n_features, bandwidth, coef0, degree, diagonal Stores the fitted training design and returns kernel features against that basis, so downstream regressions can work on nonlinear right-hand sides.
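As a concrete illustration of the KernelBasis note above, the core idea can be sketched with numpy alone: store the training design at fit time, then map each new row to its kernel similarities against every stored training row. This is a minimal RBF sketch; the exact kernel formulas and bandwidth conventions inside crabbymetrics are an assumption here.

```python
import numpy as np

# Numpy-only sketch of the kernel-feature idea: each new observation becomes
# a vector of RBF similarities against the stored training rows. The
# library's actual kernel and bandwidth conventions may differ.
def rbf_features(x_new, x_train, bandwidth=1.0):
    # squared Euclidean distances between every new row and every training row
    sq_dists = (
        (x_new**2).sum(axis=1)[:, None]
        - 2.0 * x_new @ x_train.T
        + (x_train**2).sum(axis=1)[None, :]
    )
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

rng = np.random.default_rng(0)
x_train = rng.normal(size=(30, 2))
x_new = rng.normal(size=(5, 2))

phi = rbf_features(x_new, x_train)  # shape (5, 30): one feature per training row
```

Downstream, features like `phi` can feed a linear estimator such as Ridge, which is what "nonlinear right-hand sides" amounts to in practice.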

8 Optimizer Reference

Show code
optimizer_reference_rows = [
    [
        "<code>Optimizers.minimize_lbfgs</code>",
        "<code>fun(theta) -> float</code>, <code>grad(theta) -> (p,)</code>",
        "<code>x</code>, <code>fun</code>, <code>nit</code>, <code>success</code>, <code>message</code>, <code>method</code>",
        "Limited-memory quasi-Newton method for smooth unconstrained objectives.",
    ],
    [
        "<code>Optimizers.minimize_bfgs</code>",
        "<code>fun(theta) -> float</code>, <code>grad(theta) -> (p,)</code>",
        "<code>x</code>, <code>fun</code>, <code>nit</code>, <code>success</code>, <code>message</code>, <code>method</code>",
        "Dense quasi-Newton method with an explicit inverse-Hessian approximation.",
    ],
    [
        "<code>Optimizers.minimize_nonlinear_cg</code>",
        "<code>fun(theta) -> float</code>, <code>grad(theta) -> (p,)</code>",
        "<code>x</code>, <code>fun</code>, <code>nit</code>, <code>success</code>, <code>message</code>, <code>method</code>",
        "Conjugate-gradient optimizer with Armijo backtracking and a gradient-norm success check.",
    ],
    [
        "<code>Optimizers.minimize_gauss_newton_ls</code>",
        "<code>residual(theta) -> (n,)</code>, <code>jacobian(theta) -> (n, p)</code>",
        "<code>x</code>, <code>fun</code>, <code>nit</code>, <code>success</code>, <code>message</code>, <code>method</code>",
        "Least-squares Gauss-Newton routine for moment or residual systems.",
    ],
    [
        "<code>Optimizers.minimize_simulated_annealing</code>",
        "<code>fun(theta) -> float</code>",
        "<code>x</code>, <code>fun</code>, <code>nit</code>, <code>success</code>, <code>message</code>, <code>method</code>",
        "Stochastic global optimizer with optional box bounds, temperature, and RNG seed.",
    ],
]

display(
    HTML(
        html_table(
            ["Function", "Required Callbacks", "Returns", "Notes"],
            optimizer_reference_rows,
        )
    )
)
Function Required Callbacks Returns Notes
Optimizers.minimize_lbfgs fun(theta) -> float, grad(theta) -> (p,) x, fun, nit, success, message, method Limited-memory quasi-Newton method for smooth unconstrained objectives.
Optimizers.minimize_bfgs fun(theta) -> float, grad(theta) -> (p,) x, fun, nit, success, message, method Dense quasi-Newton method with an explicit inverse-Hessian approximation.
Optimizers.minimize_nonlinear_cg fun(theta) -> float, grad(theta) -> (p,) x, fun, nit, success, message, method Conjugate-gradient optimizer with Armijo backtracking and a gradient-norm success check.
Optimizers.minimize_gauss_newton_ls residual(theta) -> (n,), jacobian(theta) -> (n, p) x, fun, nit, success, message, method Least-squares Gauss-Newton routine for moment or residual systems.
Optimizers.minimize_simulated_annealing fun(theta) -> float x, fun, nit, success, message, method Stochastic global optimizer with optional box bounds, temperature, and RNG seed.
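The `fun`/`grad` callback contract shared by `minimize_lbfgs`, `minimize_bfgs`, and `minimize_nonlinear_cg` can be exercised without the library at all: `fun(theta)` returns a scalar and `grad(theta)` returns a `(p,)` array. A finite-difference check is a cheap way to validate a gradient callback before handing it to any of the optimizers; the Rosenbrock objective below is just a standard test function, not something the library ships.

```python
import numpy as np

# Callback shapes expected by the smooth optimizers: fun(theta) -> float,
# grad(theta) -> (p,) array. Here we validate grad against central
# finite differences on the Rosenbrock objective.
def fun(theta):
    return (1.0 - theta[0]) ** 2 + 100.0 * (theta[1] - theta[0] ** 2) ** 2

def grad(theta):
    return np.array([
        -2.0 * (1.0 - theta[0]) - 400.0 * theta[0] * (theta[1] - theta[0] ** 2),
        200.0 * (theta[1] - theta[0] ** 2),
    ])

theta0 = np.array([-1.2, 1.0])
eps = 1e-6
fd = np.array([
    (fun(theta0 + eps * e) - fun(theta0 - eps * e)) / (2.0 * eps)
    for e in np.eye(2)
])
max_err = np.abs(fd - grad(theta0)).max()  # should be tiny if grad is correct
```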

9 Verified Runtime Snapshot

The next chunk fits each estimator on synthetic data and records its summary keys and bootstrap output shape. This acts as a lightweight, executable check that the reference above matches the current build.

Show code
rng = np.random.default_rng(123)
runtime_rows = []

# OLS
x = rng.normal(size=(40, 3))
y = 0.5 + x @ np.array([1.0, -2.0, 0.3]) + rng.normal(scale=0.2, size=40)
model = cm.OLS()
model.fit(x, y)
runtime_rows.append(
    [
        "<code>OLS</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3).shape),
    ]
)

# Ridge
model = cm.Ridge(penalty=np.array([0.0, 0.1, 1.0]), cv=3)
model.fit(x, y)
runtime_rows.append(
    [
        "<code>Ridge</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# FixedEffectsOLS
n = 120
x = rng.normal(size=(n, 2))
worker = rng.integers(0, 20, size=n, dtype=np.uint32)
firm = rng.integers(0, 12, size=n, dtype=np.uint32)
fe = np.column_stack([worker, firm]).astype(np.uint32)
y = (
    x @ np.array([1.1, -0.4])
    + rng.normal(size=20)[worker]
    + rng.normal(size=12)[firm]
    + rng.normal(scale=0.2, size=n)
)
model = cm.FixedEffectsOLS()
model.fit(x, fe, y)
runtime_rows.append(
    [
        "<code>FixedEffectsOLS</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# ElasticNet
model = cm.ElasticNet(penalty=0.1, l1_ratio=0.5)
model.fit(x, y)
runtime_rows.append(
    [
        "<code>ElasticNet</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# SyntheticControl
donors_pre = rng.normal(size=(50, 3))
weights_true = np.array([0.5, 0.35, 0.15])
treated_pre = donors_pre @ weights_true
model = cm.SyntheticControl(max_iterations=300)
model.fit(donors_pre, treated_pre)
runtime_rows.append(
    [
        "<code>SyntheticControl</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# SyntheticDID
panel = rng.normal(size=(8, 12))
w = np.zeros_like(panel)
w[6:, 8:] = 1.0
panel[6:, 8:] += 0.8
model = cm.SyntheticDID(max_iterations=300)
model.fit(panel, w)
runtime_rows.append(
    [
        "<code>SyntheticDID</code>",
        ", ".join(model.summary().keys()),
        "No bootstrap method",
    ]
)

# Logit
x = rng.normal(size=(60, 3))
eta = -0.2 + x @ np.array([0.8, -0.5, 1.2])
prob = 1.0 / (1.0 + np.exp(-eta))
y_bin = rng.binomial(1, prob, size=60).astype(np.int32)
model = cm.Logit(max_iterations=200)
model.fit(x, y_bin)
runtime_rows.append(
    [
        "<code>Logit</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# MultinomialLogit
x = rng.normal(size=(80, 2))
coef = np.array([[0.5, -0.7], [-0.3, 0.4], [0.2, 0.6]])
intercept = np.array([0.2, -0.1, 0.0])
logits = x @ coef.T + intercept
weights = np.exp(logits - logits.max(axis=1, keepdims=True))
prob = weights / weights.sum(axis=1, keepdims=True)
y_multi = np.array([rng.choice(3, p=row) for row in prob], dtype=np.int32)
model = cm.MultinomialLogit(max_iterations=200)
model.fit(x, y_multi)
runtime_rows.append(
    [
        "<code>MultinomialLogit</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# Poisson
x = rng.normal(size=(70, 2))
mu = np.exp(0.1 + x @ np.array([0.4, -0.2]))
y_pois = rng.poisson(mu).astype(float)
model = cm.Poisson(max_iterations=200)
model.fit(x, y_pois)
runtime_rows.append(
    [
        "<code>Poisson</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# TwoSLS
n = 80
z = rng.normal(size=(n, 2))
x_exog = rng.normal(size=(n, 1))
v = rng.normal(size=(n, 2))
u = 0.5 * v[:, 0] - 0.35 * v[:, 1] + rng.normal(scale=0.1, size=n)
x_endog = z @ np.array([[0.9, 0.1], [-0.25, 0.8]]) + 0.2 * x_exog + v
y_iv = 0.4 + x_endog @ np.array([1.3, -0.7]) - 0.6 * x_exog[:, 0] + u
model = cm.TwoSLS()
model.fit(x_endog, x_exog, z, y_iv)
runtime_rows.append(
    [
        "<code>TwoSLS</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# FTRL
x = rng.normal(size=(80, 4))
eta = x @ np.array([0.7, -0.4, 0.2, 0.3])
prob = 1.0 / (1.0 + np.exp(-eta))
y_ftrl = rng.binomial(1, prob, size=80).astype(np.int32)
model = cm.FTRL()
model.fit(x, y_ftrl)
runtime_rows.append(
    [
        "<code>FTRL</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# GMM
def gmm_moments(theta, data):
    X = data["X"]
    y = data["y"]
    centered = X[:, 0] - theta[0]
    resid = y - theta[1] - theta[2] * centered
    return np.column_stack([centered, resid, centered * resid])


def gmm_jacobian(theta, data):
    X = data["X"]
    y = data["y"]
    centered = X[:, 0] - theta[0]
    resid = y - theta[1] - theta[2] * centered
    return np.array(
        [
            [-1.0, 0.0, 0.0],
            [theta[2], -1.0, -centered.mean()],
            [(-resid + theta[2] * centered).mean(), -centered.mean(), -(centered**2).mean()],
        ]
    )


x = rng.normal(size=(60, 1))
y = 1.2 + 0.8 * (x[:, 0] - x[:, 0].mean()) + rng.normal(scale=0.2, size=60)
model = cm.GMM(gmm_moments, jacobian_fn=gmm_jacobian, max_iterations=200)
model.fit({"X": x, "y": y}, np.zeros(3))
runtime_rows.append(
    [
        "<code>GMM</code>",
        ", ".join(model.summary().keys()),
        "n/a",
    ]
)

# MEstimator
def ols_objective(theta, data):
    X = data["X"]
    y = data["y"]
    indices = data.get("indices", np.arange(len(y)))
    Xb = X[indices]
    yb = y[indices]
    residual = yb - Xb @ theta
    return 0.5 * np.sum(residual**2), -(Xb.T @ residual)


def ols_scores(theta, data):
    X = data["X"]
    y = data["y"]
    residual = y - X @ theta
    return -X * residual[:, None]


x = rng.normal(size=(50, 3))
y = x @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.2, size=50)
model = cm.MEstimator(ols_objective, ols_scores, max_iterations=200)
model.fit({"X": x, "y": y, "n": len(y)}, np.zeros(x.shape[1]))
runtime_rows.append(
    [
        "<code>MEstimator</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

display(HTML(html_table(["Estimator", "Summary Keys", "Bootstrap Shape"], runtime_rows)))
Estimator Summary Keys Bootstrap Shape
OLS intercept, coef, intercept_se, coef_se, vcov_type (3, 4)
Ridge intercept, coef, intercept_se, coef_se, penalty, penalties, vcov_type, best_penalty_index, cv_mse, intercept_path, coef_path (3, 4)
FixedEffectsOLS coef, coef_se, vcov_type (3, 2)
ElasticNet intercept, coef, intercept_se, coef_se (3, 3)
SyntheticControl weights, pre_rmse (3, 3)
SyntheticDID att, unit_weights, time_weights, counterfactual, synthetic_outcome, treatment_effect, event_study, group_means, pre_rmse, unit_intercept, time_intercept, zeta_omega, zeta_lambda, control_units, treated_units, cohorts No bootstrap method
Logit intercept, coef, intercept_se, coef_se (3, 4)
MultinomialLogit coef, se (3, 9)
Poisson intercept, coef, intercept_se, coef_se, vcov_type (3, 3)
TwoSLS intercept, coef, intercept_se, coef_se, vcov_type (3, 4)
FTRL coef, coef_se (3, 4)
GMM coef, se, vcov, criterion, nit, weighting, vcov_type, omega_type, weight_matrix, nobs, n_moments, j_stat, j_df n/a
MEstimator coef, se (3, 3)

10 Usage Examples

10.1 OLS

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
beta = np.array([1.2, -0.7, 0.5])
y = 0.4 + x @ beta + rng.normal(scale=0.2, size=200)

model = cm.OLS()
model.fit(x, y)

summary = model.summary()
summary_vanilla = model.summary(vcov="vanilla")
summary_hac = model.summary(vcov="newey_west", lags=4)
clusters = np.repeat(np.arange(20, dtype=np.int64), len(y) // 20)
summary_cluster = model.summary(vcov="cluster", clusters=clusters)
bootstrap_draws = model.bootstrap(5, seed=42)

summary["coef"], summary_vanilla["coef_se"], summary_hac["coef_se"], summary_cluster["coef_se"]
(array([ 1.18200348, -0.71923052,  0.51060512]),
 array([0.01415748, 0.01431653, 0.01472891]),
 array([0.01376542, 0.01521404, 0.01344638]),
 array([0.01486123, 0.01548593, 0.01560255]))
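The default heteroskedasticity-robust covariance can be cross-checked against the textbook HC1 construction with a few lines of numpy. This sketch assumes the usual `n / (n - k)` degrees-of-freedom correction; it is the standard formula, not a claim that crabbymetrics matches it digit-for-digit.

```python
import numpy as np

# Textbook HC1 covariance for OLS with an intercept, as a reference point
# for summary(vcov=...) above. Same DGP as the OLS example.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = 0.4 + x @ np.array([1.2, -0.7, 0.5]) + rng.normal(scale=0.2, size=200)

X = np.column_stack([np.ones(len(y)), x])       # intercept first
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

n, k = X.shape
bread = np.linalg.inv(X.T @ X)
meat = X.T @ (X * resid[:, None] ** 2)          # X' diag(e^2) X
vcov_hc1 = (n / (n - k)) * bread @ meat @ bread
se_hc1 = np.sqrt(np.diag(vcov_hc1))
```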

10.2 Ridge

rng = np.random.default_rng(2)
x = rng.normal(size=(250, 4))
beta = np.array([1.0, -0.8, 0.0, 0.3])
y = 0.6 + x @ beta + rng.normal(scale=0.4, size=250)

model = cm.Ridge(penalty=np.array([0.0, 0.05, 0.2, 1.0]), cv=5)
model.fit(x, y)

summary = model.summary()
summary_hac = model.summary(vcov="newey_west", lags=4)

summary["penalty"], summary["best_penalty_index"], summary_hac["coef_se"], summary["coef_path"][:, :2]
(0.0,
 0,
 array([0.01932908, 0.02579194, 0.02352418, 0.02311446]),
 array([[ 0.98320114,  0.9830328 ],
        [-0.81795424, -0.81778592],
        [ 0.04373202,  0.04372377],
        [ 0.2482089 ,  0.24816171]]))
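The `coef_path` above has a closed-form analogue: for each penalty, ridge coefficients solve `(X'X + λI) β = X'y`. This sketch leaves the intercept unpenalized by centering `x` and `y` first; crabbymetrics' exact centering and scaling conventions are an assumption here, so treat this as the textbook path rather than a bit-for-bit reproduction.

```python
import numpy as np

# Closed-form ridge path over a penalty grid, intercept unpenalized via
# centering. One column of coef_path per penalty value.
rng = np.random.default_rng(2)
x = rng.normal(size=(250, 4))
y = 0.6 + x @ np.array([1.0, -0.8, 0.0, 0.3]) + rng.normal(scale=0.4, size=250)

xc = x - x.mean(axis=0)
yc = y - y.mean()

penalties = np.array([0.0, 0.05, 0.2, 1.0])
coef_path = np.column_stack([
    np.linalg.solve(xc.T @ xc + lam * np.eye(x.shape[1]), xc.T @ yc)
    for lam in penalties
])                                               # shape (features, penalties)
intercepts = y.mean() - x.mean(axis=0) @ coef_path
```

The λ = 0 column is plain OLS, and the coefficient norm shrinks monotonically as the penalty grows.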

10.3 Poisson

rng = np.random.default_rng(4)
x = rng.normal(size=(300, 2))
mu = np.exp(0.2 + x @ np.array([0.5, -0.25]))
y = rng.poisson(mu).astype(float)

model = cm.Poisson(max_iterations=200, tolerance=1e-8)
model.fit(x, y)

vanilla = model.summary(vcov="vanilla")
sandwich = model.summary(vcov="sandwich")

vanilla["coef"], sandwich["coef_se"]
(array([ 0.51333285, -0.24653569]), array([0.04435521, 0.04865882]))
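The two `vcov` options correspond to the textbook Poisson-MLE covariances: `vanilla` uses the inverse Fisher information `A⁻¹` with `A = X' diag(μ) X`, while `sandwich` uses the QMLE form `A⁻¹ B A⁻¹` with `B` built from squared score contributions. The numpy sketch below fits the MLE with a small Newton loop on the same DGP; it shows the standard formulas, without guaranteeing an exact match to the library's output.

```python
import numpy as np

# Fisher vs. QMLE-sandwich standard errors for a Poisson MLE, fit by
# Newton-Raphson on the log-likelihood. Same DGP as the Poisson example.
rng = np.random.default_rng(4)
x = rng.normal(size=(300, 2))
y = rng.poisson(np.exp(0.2 + x @ np.array([0.5, -0.25]))).astype(float)

X = np.column_stack([np.ones(len(y)), x])
beta = np.zeros(X.shape[1])
for _ in range(50):                       # Newton-Raphson iterations
    mu = np.exp(X @ beta)
    score = X.T @ (y - mu)
    fisher = X.T @ (X * mu[:, None])      # A = X' diag(mu) X
    step = np.linalg.solve(fisher, score)
    beta = beta + step
    if np.abs(step).max() < 1e-10:
        break

mu = np.exp(X @ beta)
A = X.T @ (X * mu[:, None])
B = X.T @ (X * (y - mu)[:, None] ** 2)    # meat from squared scores
se_fisher = np.sqrt(np.diag(np.linalg.inv(A)))
se_sandwich = np.sqrt(np.diag(np.linalg.inv(A) @ B @ np.linalg.inv(A)))
```

Under a correctly specified Poisson model the two sets of standard errors agree asymptotically; under overdispersion the sandwich SEs are the ones to trust.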

10.4 TwoSLS

rng = np.random.default_rng(6)
n = 500
z = rng.normal(size=(n, 3))
x_exog = rng.normal(size=(n, 1))
v = rng.normal(size=(n, 1))
eps = rng.normal(size=n)
clusters = np.repeat(np.arange(25, dtype=np.int64), n // 25)

x_endog = z @ np.array([[0.8], [0.35], [-0.25]]) + 0.2 * x_exog + v
y = 0.5 + 1.1 * x_endog[:, 0] - 0.6 * x_exog[:, 0] + (0.6 * v[:, 0] + 0.25 * eps)

model = cm.TwoSLS()
model.fit(x_endog, x_exog, z, y)

hc1 = model.summary()
cluster = model.summary(vcov="cluster", clusters=clusters)

hc1["coef"], cluster["coef_se"]
(array([ 1.09071698, -0.56326909]), array([0.03436151, 0.03780577]))
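The closed form behind TwoSLS is the usual projection estimator: with instrument matrix `Z` (excluded instruments plus the exogenous regressors and a constant), the coefficients solve `(X' P_Z X) β = X' P_Z y`. The numpy sketch below reproduces that construction on the same DGP; the library's internal coefficient ordering is an assumption.

```python
import numpy as np

# Textbook 2SLS: project the second-stage design onto the instrument span,
# then run OLS of y on the projected design. Same DGP as the TwoSLS example.
rng = np.random.default_rng(6)
n = 500
z = rng.normal(size=(n, 3))
x_exog = rng.normal(size=(n, 1))
v = rng.normal(size=(n, 1))
eps = rng.normal(size=n)
x_endog = z @ np.array([[0.8], [0.35], [-0.25]]) + 0.2 * x_exog + v
y = 0.5 + 1.1 * x_endog[:, 0] - 0.6 * x_exog[:, 0] + (0.6 * v[:, 0] + 0.25 * eps)

X = np.column_stack([np.ones(n), x_endog, x_exog])  # second-stage design
Z = np.column_stack([np.ones(n), z, x_exog])        # full instrument set
PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)         # first-stage fitted values
beta_2sls = np.linalg.solve(PZX.T @ X, PZX.T @ y)   # (intercept, endog, exog)
```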

10.5 EPLM

rng = np.random.default_rng(7)
w = rng.normal(size=(300, 3))
d = 0.4 + w @ np.array([0.7, -0.4, 0.2]) + rng.normal(scale=0.8, size=300)
y = 1.3 * d + 0.8 + w @ np.array([0.3, -0.1, 0.2]) + rng.normal(scale=0.5, size=300)

model = cm.EPLM()
model.fit(y, d, w)

summary = model.summary()
summary["coef"], summary["se"]
(1.275186460120313, 0.03939548509156887)
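EPLM's point estimate is in the spirit of Robinson's partially linear regression: residualize both `y` and `d` on the controls, then regress the `y`-residuals on the `d`-residuals. The library's internal nuisance estimators are an assumption here; with plain linear nuisance fits the idea collapses to the numpy sketch below.

```python
import numpy as np

# Robinson-style partialling-out with linear nuisances: project y and d off
# the span of (1, w), then take the residual-on-residual slope.
rng = np.random.default_rng(7)
w = rng.normal(size=(300, 3))
d = 0.4 + w @ np.array([0.7, -0.4, 0.2]) + rng.normal(scale=0.8, size=300)
y = 1.3 * d + 0.8 + w @ np.array([0.3, -0.1, 0.2]) + rng.normal(scale=0.5, size=300)

W = np.column_stack([np.ones(len(y)), w])
proj = W @ np.linalg.solve(W.T @ W, W.T)    # projection onto (1, w)
d_res = d - proj @ d
y_res = y - proj @ y
theta = (d_res @ y_res) / (d_res @ d_res)   # residual-on-residual slope
```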

10.6 AverageDerivative

rng = np.random.default_rng(8)
w = rng.normal(size=(320, 2))
d = 0.2 + w @ np.array([0.6, -0.3]) + rng.normal(scale=0.6, size=320)
wc = w - w.mean(axis=0)
y = 1.0 + wc @ np.array([0.2, -0.3]) + d * (0.9 + wc @ np.array([0.15, -0.1])) + rng.normal(scale=0.4, size=320)

model = cm.AverageDerivative(method="dr")
model.fit(y, d, w)

summary = model.summary()
summary["method"], summary["coef"], summary["se"]
('dr', 0.90461952484454, 0.03794115183556153)

10.7 PartiallyLinearDML

rng = np.random.default_rng(10)
x = rng.normal(size=(400, 4))
d = 0.3 + x @ np.array([0.5, -0.4, 0.2, 0.1]) + rng.normal(scale=0.8, size=400)
y = 1.4 * d + 1.0 + x @ np.array([0.4, -0.2, 0.1, 0.3]) + rng.normal(scale=0.6, size=400)

penalty_grid = np.logspace(-4, 2, 25)

model = cm.PartiallyLinearDML(penalty=penalty_grid, cv=5, n_folds=5, seed=1)
model.fit(y, d, x)

summary = model.summary()
summary["coef"], summary["se"], summary["outcome_penalties"][:2]
(1.4404211526609862, 0.042450917748467834, array([3.16227766, 1.77827941]))
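The `n_folds` argument points at the cross-fitting step: each observation's nuisance predictions come from models fit on the other folds, so own-observation overfitting does not leak into the final regression. The sketch below uses two folds and plain OLS nuisances in place of the library's cross-validated ridge fits, purely to show the fold mechanics.

```python
import numpy as np

# Minimal cross-fitting sketch: out-of-fold residualization of y and d on x,
# then a pooled residual-on-residual regression. Same DGP as the DML example.
rng = np.random.default_rng(10)
x = rng.normal(size=(400, 4))
d = 0.3 + x @ np.array([0.5, -0.4, 0.2, 0.1]) + rng.normal(scale=0.8, size=400)
y = 1.4 * d + 1.0 + x @ np.array([0.4, -0.2, 0.1, 0.3]) + rng.normal(scale=0.6, size=400)

X = np.column_stack([np.ones(len(y)), x])
folds = np.arange(len(y)) % 2               # two folds for brevity
d_res = np.empty_like(d)
y_res = np.empty_like(y)
for k in (0, 1):
    train, hold = folds != k, folds == k
    bd = np.linalg.solve(X[train].T @ X[train], X[train].T @ d[train])
    by = np.linalg.solve(X[train].T @ X[train], X[train].T @ y[train])
    d_res[hold] = d[hold] - X[hold] @ bd    # out-of-fold residuals only
    y_res[hold] = y[hold] - X[hold] @ by

theta = (d_res @ y_res) / (d_res @ d_res)
```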

10.8 AIPW

rng = np.random.default_rng(11)
x = rng.normal(size=(450, 3))
pi = 1.0 / (1.0 + np.exp(-(0.1 + x @ np.array([0.7, -0.4, 0.2]))))
d = rng.binomial(1, pi, size=450).astype(float)
y = 0.5 + x @ np.array([0.3, -0.2, 0.4]) + 1.1 * d + rng.normal(scale=0.8, size=450)

penalty_grid = np.logspace(-4, 2, 25)

model = cm.AIPW(penalty=penalty_grid, cv=5, n_folds=5, seed=1)
model.fit(y, d, x)

summary = model.summary()
summary["ate"], summary["se"], summary["propensity_penalties"][:2]
(1.1263630572688277, 0.08781046463340195, array([31.6227766, 31.6227766]))
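The `ate` and `se` fields target the standard AIPW moment. As a reference, the estimator can be written through the efficient influence function; the sketch below plugs in oracle (true) nuisance functions for this DGP rather than the cross-fit ridge nuisances the library actually estimates, so it illustrates the moment, not the fitting pipeline.

```python
import numpy as np

# AIPW via the efficient influence function with oracle nuisances:
# phi_i = m1(x) - m0(x) + d(y - m1)/pi - (1 - d)(y - m0)/(1 - pi).
rng = np.random.default_rng(11)
x = rng.normal(size=(450, 3))
pi = 1.0 / (1.0 + np.exp(-(0.1 + x @ np.array([0.7, -0.4, 0.2]))))
d = rng.binomial(1, pi, size=450).astype(float)
y = 0.5 + x @ np.array([0.3, -0.2, 0.4]) + 1.1 * d + rng.normal(scale=0.8, size=450)

m0 = 0.5 + x @ np.array([0.3, -0.2, 0.4])   # true E[Y | X, D=0]
m1 = m0 + 1.1                               # true E[Y | X, D=1]
phi = (
    m1 - m0
    + d * (y - m1) / pi
    - (1 - d) * (y - m0) / (1 - pi)
)                                           # per-observation influence values
ate = phi.mean()
se = phi.std(ddof=1) / np.sqrt(len(phi))    # plug-in standard error
```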

10.9 GMM

def simple_iv_moments(theta, data):
    resid = data["y"] - data["x"] * theta[0]
    return data["z"] * resid[:, None]


def simple_iv_jacobian(theta, data):
    del theta
    return -(data["z"].T @ data["x"][:, None]) / data["x"].shape[0]


rng = np.random.default_rng(9)
n = 500
z = rng.normal(size=(n, 3))
v = rng.normal(size=n)
eps = rng.normal(size=n)
x = z @ np.array([0.9, 0.4, -0.3]) + v
y = 1.0 * x + 0.5 * v + 0.3 * eps

model = cm.GMM(simple_iv_moments, jacobian_fn=simple_iv_jacobian, max_iterations=200)
model.fit({"x": x, "y": y, "z": z}, np.array([0.0]), weighting="identity")

vanilla = model.summary(vcov="vanilla")
sandwich = model.summary(vcov="sandwich", omega="iid")

vanilla["coef"], sandwich["se"]
(array([0.95457291]), array([0.02801009]))

10.10 FixedEffectsOLS

This estimator follows the Frisch-Waugh-Lovell path directly: it partials out the fixed effects with a within (group-demeaning) transformation, then regresses the residualized outcome on the residualized regressors with no intercept.

rng = np.random.default_rng(12)
n = 400
x = rng.normal(size=(n, 2))
worker = rng.integers(0, 30, size=n, dtype=np.uint32)
firm = rng.integers(0, 15, size=n, dtype=np.uint32)
fe = np.column_stack([worker, firm]).astype(np.uint32)
beta = np.array([1.0, -0.6])
y = (
    x @ beta
    + rng.normal(scale=0.8, size=30)[worker]
    + rng.normal(scale=0.5, size=15)[firm]
    + rng.normal(scale=0.15, size=n)
)

model = cm.FixedEffectsOLS()
model.fit(x, fe, y)

cluster = model.summary(vcov="cluster", clusters=worker.astype(np.int64))
cluster
{'coef': array([ 1.01369715, -0.60410017]),
 'coef_se': array([0.00786918, 0.00722453]),
 'vcov_type': 'cluster'}
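The Frisch-Waugh-Lovell route described above can be checked with numpy: alternately demean within worker and firm cells until the two-way within transformation converges, then run no-intercept OLS on the residualized data. This is the standard alternating-projections algorithm for two-way fixed effects, not a claim about the library's internals.

```python
import numpy as np

# Two-way within transformation by alternating group demeaning, followed by
# no-intercept OLS on the residualized outcome and regressors.
def demean(v, groups):
    # subtract each observation's group mean; max(counts, 1) guards empty ids
    sums = np.bincount(groups, weights=v)
    counts = np.bincount(groups)
    return v - (sums / np.maximum(counts, 1))[groups]

rng = np.random.default_rng(12)
n = 400
x = rng.normal(size=(n, 2))
worker = rng.integers(0, 30, size=n)
firm = rng.integers(0, 15, size=n)
y = (
    x @ np.array([1.0, -0.6])
    + rng.normal(scale=0.8, size=30)[worker]
    + rng.normal(scale=0.5, size=15)[firm]
    + rng.normal(scale=0.15, size=n)
)

Zy, Zx = y.copy(), x.copy()
for _ in range(200):                         # alternating projections
    Zy = demean(demean(Zy, worker), firm)
    Zx = np.column_stack([demean(demean(Zx[:, j], worker), firm) for j in range(2)])

beta_fe = np.linalg.solve(Zx.T @ Zx, Zx.T @ Zy)  # no intercept after demeaning
```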

10.11 MEstimator

This is the lowest-level public interface in the library. The callback contract matters more than the optimizer choice:

  • objective_fn(theta, data) must return (objective_value, gradient_vector).
  • score_fn(theta, data) must return an (n_obs, n_params) matrix of per-observation scores.
  • For bootstrap support, the data dictionary must include n, and user code should respect an optional indices key.
import numpy as np


def ols_objective(theta, data):
    X = data["X"]
    y = data["y"]
    indices = data.get("indices", np.arange(len(y)))
    Xb = X[indices]
    yb = y[indices]
    residual = yb - Xb @ theta
    objective = 0.5 * np.sum(residual**2)
    gradient = -(Xb.T @ residual)
    return objective, gradient


def ols_scores(theta, data):
    X = data["X"]
    y = data["y"]
    residual = y - X @ theta
    return -X * residual[:, None]


rng = np.random.default_rng(7)
X = rng.normal(size=(300, 2))
y = X @ np.array([1.5, -0.4]) + rng.normal(scale=0.1, size=300)

model = cm.MEstimator(
    objective_fn=ols_objective,
    score_fn=ols_scores,
    max_iterations=200,
)
model.fit({"X": X, "y": y, "n": len(y)}, np.zeros(X.shape[1]))

model.summary()
{'coef': array([ 1.49784531, -0.39974307]),
 'se': array([0.60847816, 0.63437929])}

11 Current Sharp Edges

  • There is still no rich result object layer. Consumers should expect raw dictionaries and NumPy arrays.
  • Most estimators do not expose fit statistics or formula handling.
  • GMM is intentionally narrow: closed-form linear IV still lives in TwoSLS, and the generic class supports only just-identified Gauss-Newton solves or two-step weighting for overidentified systems.
  • MEstimator favors flexibility over guardrails. If your objective has fragile regions, add clipping or other stability controls inside the callback.
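The clipping advice in the last bullet can look like the hypothetical callback below: a Poisson-style objective_fn that clips the linear predictor before exponentiating, so extreme trial values of theta cannot overflow. The signature follows the contract documented in 10.11; the clip bounds and the objective itself are illustrative choices, not library defaults.

```python
import numpy as np

# Hypothetical MEstimator objective_fn with a stability guard: clipping the
# linear predictor keeps the objective and gradient finite even at absurd
# trial points the optimizer may visit early on.
def poisson_objective(theta, data):
    X, y = data["X"], data["y"]
    indices = data.get("indices", np.arange(len(y)))
    Xb, yb = X[indices], y[indices]
    eta = np.clip(Xb @ theta, -30.0, 30.0)   # guard on the link scale
    mu = np.exp(eta)
    objective = np.sum(mu - yb * eta)        # negative log-likelihood, up to a constant
    gradient = Xb.T @ (mu - yb)
    return objective, gradient

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = rng.poisson(np.exp(0.3 + 0.2 * X[:, 0] - 0.1 * X[:, 1])).astype(float)

# even an extreme trial point stays finite thanks to the clip
obj, grad = poisson_objective(np.array([500.0, -500.0]), {"X": X, "y": y})
```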