
On this page

  • 1 Overview
  • 2 Sitrep
  • 3 Public Surface
  • 4 Conventions
  • 5 Estimator Reference
  • 6 Verified Runtime Snapshot
  • 7 Usage Examples
    • 7.1 OLS
    • 7.2 FixedEffectsOLS
    • 7.3 MEstimator
  • 8 Current Sharp Edges

crabbymetrics API Reference

Library sitrep and verified Python surface

1 Overview

crabbymetrics is a compact Rust-backed econometrics library exposed to Python through pyo3 and packaged with maturin. The public API is intentionally small: nine estimator classes, NumPy as the only runtime dependency, NumPy arrays as the sole exchange format, and a scikit-adjacent fit / predict / summary / bootstrap pattern.
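A minimal sketch of that shared pattern, using OLS; sections 6 and 7 below verify and expand on it:

import numpy as np
import crabbymetrics as cm

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = 0.3 + x @ np.array([1.0, -0.5]) + rng.normal(scale=0.1, size=50)

model = cm.OLS()
model.fit(x, y)                       # estimate
yhat = model.predict(x)               # 1D float predictions
stats = model.summary()               # plain dict: intercept, coef, intercept_se, coef_se
draws = model.bootstrap(100, seed=0)  # (100, 3): intercept column first, then the two slopes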

This page combines two things:

  • A sitrep on the current state of the library.
  • A verified API reference rendered against the live Python module in this repository.
import inspect
from html import escape

import numpy as np
import crabbymetrics as cm
from IPython.display import HTML, display


def html_table(headers, rows):
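    """Build an HTML table string: headers are escaped here; row cells are inserted as-is."""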
    parts = [
        "<table>",
        "<thead>",
        "<tr>",
        *[f"<th>{escape(str(header))}</th>" for header in headers],
        "</tr>",
        "</thead>",
        "<tbody>",
    ]
    for row in rows:
        parts.append("<tr>")
        for cell in row:
            parts.append(f"<td>{cell}</td>")
        parts.append("</tr>")
    parts.extend(["</tbody>", "</table>"])
    return "".join(parts)

2 Sitrep

  • The implementation is concentrated in two Rust files: src/lib.rs registers the Python classes and src/estimators.rs holds nearly all estimator logic.
  • Shared linear algebra, covariance, bootstrap, and NumPy conversion helpers live in src/utils.rs.
  • Packaging is lean. The Python package declares only numpy at runtime, while native builds are handled by maturin.
  • Release automation already exists in .github/workflows/wheels.yml for Linux and macOS wheels across Python 3.10 to 3.14.
  • Coverage is broad for a small codebase: OLS, fixed-effects OLS, ElasticNet, binary and multinomial logit, Poisson, TwoSLS, FTRL, and a callback-driven general MEstimator.
  • The main maturity gaps are still API breadth and test depth. The public contract is now documented, but the estimator surface is intentionally narrow.
  • The public API is consistent, but not fully uniform. summary() always returns a plain Python dict, yet the exact keys vary by estimator.
  • TwoSLS is effectively a single-endogenous implementation today, even though x_endog is typed as a matrix.
  • MEstimator is flexible but low-level. Users are responsible for gradients, per-observation score matrices, and numerical safeguards inside their objective functions.

3 Public Surface

The table below is generated from the live module so the constructor and method signatures match the current build.

classes = [
    cm.OLS,
    cm.FixedEffectsOLS,
    cm.ElasticNet,
    cm.Logit,
    cm.MultinomialLogit,
    cm.Poisson,
    cm.TwoSLS,
    cm.FTRL,
    cm.MEstimator,
]

rows = []
for cls in classes:
    methods = []
    for method_name in sorted(name for name in dir(cls) if not name.startswith("_")):
        method = getattr(cls, method_name)
        methods.append(
            f"<code>{escape(method_name)}{escape(str(inspect.signature(method)))}</code>"
        )
    rows.append(
        [
            f"<code>{escape(cls.__name__)}{escape(str(inspect.signature(cls)))}</code>",
            "<br>".join(methods),
        ]
    )

display(HTML(html_table(["Constructor", "Methods"], rows)))
OLS()
    bootstrap(self, /, n_bootstrap, seed=None)
    fit(self, /, x, y)
    predict(self, /, x)
    summary(self, /)

FixedEffectsOLS()
    bootstrap(self, /, n_bootstrap, seed=None)
    fit(self, /, x, fe, y)
    summary(self, /)

ElasticNet(penalty=1.0, l1_ratio=0.5, tolerance=0.0001, max_iterations=1000)
    bootstrap(self, /, n_bootstrap, seed=None)
    fit(self, /, x, y)
    predict(self, /, x)
    summary(self, /)

Logit(alpha=1.0, max_iterations=100, gradient_tolerance=0.0001)
    bootstrap(self, /, n_bootstrap, seed=None)
    fit(self, /, x, y)
    predict(self, /, x)
    summary(self, /)

MultinomialLogit(alpha=1.0, max_iterations=100, gradient_tolerance=0.0001)
    bootstrap(self, /, n_bootstrap, seed=None)
    fit(self, /, x, y)
    predict(self, /, x)
    summary(self, /)

Poisson(alpha=0.0, max_iterations=100, tolerance=0.0001)
    bootstrap(self, /, n_bootstrap, seed=None)
    fit(self, /, x, y)
    predict(self, /, x)
    summary(self, /)

TwoSLS()
    bootstrap(self, /, n_bootstrap, seed=None)
    fit(self, /, x_endog, x_exog, z, y)
    predict(self, /, x)
    summary(self, /)

FTRL(alpha=0.1, beta=1.0, l1_ratio=1.0, l2_ratio=1.0)
    bootstrap(self, /, n_bootstrap, seed=None)
    fit(self, /, x, y)
    predict(self, /, x)
    summary(self, /)

MEstimator(objective_fn, score_fn, max_iterations=100, tolerance=1e-06)
    bootstrap(self, /, n_bootstrap, seed=None)
    compute_vcov(self, /)
    fit(self, /, data, theta0)
    summary(self, /)

4 Conventions

  • Input features are always 2D NumPy arrays.
  • Regression targets use float64 arrays.
  • FixedEffectsOLS.fit(x, fe, y) expects fe to be a 2D uint32 matrix of zero-based category codes, one column per factor.
  • Logit, MultinomialLogit, and FTRL expect integer class labels at fit time.
  • Most estimators store their training data internally so summary() and bootstrap() can be called after fit().
  • OLS, ElasticNet, Logit, MultinomialLogit, Poisson, and TwoSLS now always fit an intercept.
  • FixedEffectsOLS deliberately has no predict() method. It stores slope coefficients and uses the absorbed-effects representation only for estimation and bootstrap.
  • When an estimator has an intercept, bootstrap draws place it in the first column.
  • summary() returns dictionaries, not rich result objects. Downstream code should index by key.
  • bootstrap(n_bootstrap, seed=None) now accepts an omitted seed across the full public surface.
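A minimal sketch that exercises several of these conventions at once; the data here is arbitrary, and only the dtypes and shapes matter:

import numpy as np
import crabbymetrics as cm

rng = np.random.default_rng(0)

# 2D float64 features; fixed-effect codes are a 2D uint32 matrix, one column per factor.
x = rng.normal(size=(100, 2))
fe = rng.integers(0, 8, size=(100, 2)).astype(np.uint32)
y = x @ np.array([1.0, -0.5]) + rng.normal(size=100)

fe_model = cm.FixedEffectsOLS()
fe_model.fit(x, fe, y)
fe_model.bootstrap(10)        # seed omitted; no intercept, so shape (10, 2)

# Classifiers take integer class labels at fit time.
labels = (y > 0).astype(np.int32)
clf = cm.Logit()
clf.fit(x, labels)
clf.bootstrap(10, seed=1)     # intercept in the first column, so shape (10, 3)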

5 Estimator Reference

reference_rows = [
    [
        "<code>OLS</code>",
        "<code>fit(x, y)</code>",
        "1D float predictions",
        "<code>intercept</code>, <code>coef</code>, <code>intercept_se</code>, <code>coef_se</code>",
        "<code>(B, p + 1)</code> with intercept, else <code>(B, p)</code>",
        "Uses Linfa linear regression with HC1 robust standard errors.",
    ],
    [
        "<code>FixedEffectsOLS</code>",
        "<code>fit(x, fe_uint32, y)</code>",
        "No dedicated <code>predict()</code> method",
        "<code>coef</code>, <code>coef_se</code>",
        "<code>(B, p)</code>",
        "Uses <code>within</code> to partial out fixed effects, then Linfa OLS with <code>fit_intercept=False</code> on the residualized design.",
    ],
    [
        "<code>ElasticNet</code>",
        "<code>fit(x, y)</code>",
        "1D float predictions",
        "<code>intercept</code>, <code>coef</code>, <code>intercept_se</code>, <code>coef_se</code>",
        "<code>(B, p + 1)</code> with intercept, else <code>(B, p)</code>",
        "Regularized point estimates with HC1-style covariance built from residuals.",
    ],
    [
        "<code>Logit</code>",
        "<code>fit(x, y_int32)</code>",
        "1D <code>int32</code> class labels",
        "<code>intercept</code>, <code>coef</code>, <code>intercept_se</code>, <code>coef_se</code>",
        "<code>(B, p + 1)</code> with intercept, else <code>(B, p)</code>",
        "Binary logistic regression. Standard errors come from the Fisher information.",
    ],
    [
        "<code>MultinomialLogit</code>",
        "<code>fit(x, y_int32)</code>",
        "1D <code>int32</code> class labels",
        "<code>coef</code> and <code>se</code> matrices of shape <code>(classes, features_with_intercept)</code>",
        "<code>(B, classes * features_with_intercept)</code>",
        "Intercept and slopes are packed together by class in the summary matrix.",
    ],
    [
        "<code>Poisson</code>",
        "<code>fit(x, y)</code>",
        "1D float mean counts",
        "<code>intercept</code>, <code>coef</code>, <code>intercept_se</code>, <code>coef_se</code>",
        "<code>(B, p + 1)</code> with intercept, else <code>(B, p)</code>",
        "Custom Newton-CG Poisson MLE with optional L2 penalty via <code>alpha</code>.",
    ],
    [
        "<code>TwoSLS</code>",
        "<code>fit(x_endog, x_exog, z, y)</code>",
        "1D float predictions from the stage-two design matrix",
        "<code>intercept</code>, <code>coef</code>, <code>intercept_se</code>, <code>coef_se</code>",
        "<code>(B, k + 1)</code> with intercept, else <code>(B, k)</code>",
        "Currently single-endogenous in practice. Exogenous regressors follow the endogenous block in <code>predict()</code>.",
    ],
    [
        "<code>FTRL</code>",
        "<code>fit(x, y_int32)</code>",
        "1D float scores in <code>[0, 1]</code>",
        "<code>coef</code>, <code>coef_se</code>",
        "<code>(B, p)</code>",
        "Online-style classification model with a different summary schema and no explicit intercept output.",
    ],
    [
        "<code>MEstimator</code>",
        "<code>fit(data_dict, theta0)</code>",
        "No dedicated <code>predict()</code> method",
        "<code>coef</code>, <code>se</code>",
        "<code>(B, p)</code>",
        "Requires <code>objective_fn(theta, data) -&gt; (obj, grad)</code> and <code>score_fn(theta, data)</code> returning an <code>(n, p)</code> matrix.",
    ],
]

display(
    HTML(
        html_table(
            ["Estimator", "Fit", "Predict", "Summary Keys", "Bootstrap", "Notes"],
            reference_rows,
        )
    )
)
OLS
    Fit: fit(x, y)
    Predict: 1D float predictions
    Summary keys: intercept, coef, intercept_se, coef_se
    Bootstrap: (B, p + 1) with intercept, else (B, p)
    Notes: Uses Linfa linear regression with HC1 robust standard errors.

FixedEffectsOLS
    Fit: fit(x, fe_uint32, y)
    Predict: no dedicated predict() method
    Summary keys: coef, coef_se
    Bootstrap: (B, p)
    Notes: Uses within to partial out fixed effects, then Linfa OLS with fit_intercept=False on the residualized design.

ElasticNet
    Fit: fit(x, y)
    Predict: 1D float predictions
    Summary keys: intercept, coef, intercept_se, coef_se
    Bootstrap: (B, p + 1) with intercept, else (B, p)
    Notes: Regularized point estimates with HC1-style covariance built from residuals.

Logit
    Fit: fit(x, y_int32)
    Predict: 1D int32 class labels
    Summary keys: intercept, coef, intercept_se, coef_se
    Bootstrap: (B, p + 1) with intercept, else (B, p)
    Notes: Binary logistic regression. Standard errors come from the Fisher information.

MultinomialLogit
    Fit: fit(x, y_int32)
    Predict: 1D int32 class labels
    Summary keys: coef and se matrices of shape (classes, features_with_intercept)
    Bootstrap: (B, classes * features_with_intercept)
    Notes: Intercept and slopes are packed together by class in the summary matrix.

Poisson
    Fit: fit(x, y)
    Predict: 1D float mean counts
    Summary keys: intercept, coef, intercept_se, coef_se
    Bootstrap: (B, p + 1) with intercept, else (B, p)
    Notes: Custom Newton-CG Poisson MLE with optional L2 penalty via alpha.

TwoSLS
    Fit: fit(x_endog, x_exog, z, y)
    Predict: 1D float predictions from the stage-two design matrix
    Summary keys: intercept, coef, intercept_se, coef_se
    Bootstrap: (B, k + 1) with intercept, else (B, k)
    Notes: Currently single-endogenous in practice. Exogenous regressors follow the endogenous block in predict().

FTRL
    Fit: fit(x, y_int32)
    Predict: 1D float scores in [0, 1]
    Summary keys: coef, coef_se
    Bootstrap: (B, p)
    Notes: Online-style classification model with a different summary schema and no explicit intercept output.

MEstimator
    Fit: fit(data_dict, theta0)
    Predict: no dedicated predict() method
    Summary keys: coef, se
    Bootstrap: (B, p)
    Notes: Requires objective_fn(theta, data) -> (obj, grad) and score_fn(theta, data) returning an (n, p) matrix.
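One practical consequence of the MultinomialLogit packing deserves a sketch: bootstrap draws come back flattened, and the summary coef matrix gives the shape needed to unflatten them. The reshape below assumes the draws are packed class-major, matching the summary layout; treat that as an assumption to verify, not a guarantee:

import numpy as np
import crabbymetrics as cm

rng = np.random.default_rng(0)
x = rng.normal(size=(90, 2))
labels = rng.integers(0, 3, size=90).astype(np.int32)

m = cm.MultinomialLogit()
m.fit(x, labels)

coef = m.summary()["coef"]                 # (classes, features_with_intercept), here (3, 3)
draws = m.bootstrap(50, seed=0)            # (50, classes * features_with_intercept) == (50, 9)
draws_3d = draws.reshape(-1, *coef.shape)  # assumed class-major packing; check before relying on it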

6 Verified Runtime Snapshot

The next chunk fits each estimator on synthetic data and records its summary keys and bootstrap output shape. This acts as a lightweight executable check that the reference above matches the current build.

rng = np.random.default_rng(123)
runtime_rows = []

# OLS
x = rng.normal(size=(40, 3))
y = 0.5 + x @ np.array([1.0, -2.0, 0.3]) + rng.normal(scale=0.2, size=40)
model = cm.OLS()
model.fit(x, y)
runtime_rows.append(
    [
        "<code>OLS</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3).shape),
    ]
)

# FixedEffectsOLS
n = 120
x = rng.normal(size=(n, 2))
worker = rng.integers(0, 20, size=n, dtype=np.uint32)
firm = rng.integers(0, 12, size=n, dtype=np.uint32)
fe = np.column_stack([worker, firm]).astype(np.uint32)
y = (
    x @ np.array([1.1, -0.4])
    + rng.normal(size=20)[worker]
    + rng.normal(size=12)[firm]
    + rng.normal(scale=0.2, size=n)
)
model = cm.FixedEffectsOLS()
model.fit(x, fe, y)
runtime_rows.append(
    [
        "<code>FixedEffectsOLS</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# ElasticNet
model = cm.ElasticNet(penalty=0.1, l1_ratio=0.5)
model.fit(x, y)
runtime_rows.append(
    [
        "<code>ElasticNet</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# Logit
x = rng.normal(size=(60, 3))
eta = -0.2 + x @ np.array([0.8, -0.5, 1.2])
prob = 1.0 / (1.0 + np.exp(-eta))
y_bin = rng.binomial(1, prob, size=60).astype(np.int32)
model = cm.Logit(max_iterations=200)
model.fit(x, y_bin)
runtime_rows.append(
    [
        "<code>Logit</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# MultinomialLogit
x = rng.normal(size=(80, 2))
coef = np.array([[0.5, -0.7], [-0.3, 0.4], [0.2, 0.6]])
intercept = np.array([0.2, -0.1, 0.0])
logits = x @ coef.T + intercept
weights = np.exp(logits - logits.max(axis=1, keepdims=True))
prob = weights / weights.sum(axis=1, keepdims=True)
y_multi = np.array([rng.choice(3, p=row) for row in prob], dtype=np.int32)
model = cm.MultinomialLogit(max_iterations=200)
model.fit(x, y_multi)
runtime_rows.append(
    [
        "<code>MultinomialLogit</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# Poisson
x = rng.normal(size=(70, 2))
mu = np.exp(0.1 + x @ np.array([0.4, -0.2]))
y_pois = rng.poisson(mu).astype(float)
model = cm.Poisson(max_iterations=200)
model.fit(x, y_pois)
runtime_rows.append(
    [
        "<code>Poisson</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# TwoSLS
n = 80
z = rng.normal(size=(n, 1))
x_exog = rng.normal(size=(n, 1))
u = rng.normal(size=n)
x_endog = (0.9 * z[:, 0] + 0.2 * u + rng.normal(scale=0.1, size=n)).reshape(-1, 1)
y_iv = 0.4 + 1.3 * x_endog[:, 0] - 0.6 * x_exog[:, 0] + u + rng.normal(scale=0.1, size=n)
model = cm.TwoSLS()
model.fit(x_endog, x_exog, z, y_iv)
runtime_rows.append(
    [
        "<code>TwoSLS</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# FTRL
x = rng.normal(size=(80, 4))
eta = x @ np.array([0.7, -0.4, 0.2, 0.3])
prob = 1.0 / (1.0 + np.exp(-eta))
y_ftrl = rng.binomial(1, prob, size=80).astype(np.int32)
model = cm.FTRL()
model.fit(x, y_ftrl)
runtime_rows.append(
    [
        "<code>FTRL</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

# MEstimator
def ols_objective(theta, data):
    X = data["X"]
    y = data["y"]
    indices = data.get("indices", np.arange(len(y)))
    Xb = X[indices]
    yb = y[indices]
    residual = yb - Xb @ theta
    return 0.5 * np.sum(residual**2), -(Xb.T @ residual)


def ols_scores(theta, data):
    X = data["X"]
    y = data["y"]
    residual = y - X @ theta
    return -X * residual[:, None]


x = rng.normal(size=(50, 3))
y = x @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.2, size=50)
model = cm.MEstimator(ols_objective, ols_scores, max_iterations=200)
model.fit({"X": x, "y": y, "n": len(y)}, np.zeros(x.shape[1]))
runtime_rows.append(
    [
        "<code>MEstimator</code>",
        ", ".join(model.summary().keys()),
        str(model.bootstrap(3, seed=1).shape),
    ]
)

display(HTML(html_table(["Estimator", "Summary Keys", "Bootstrap Shape"], runtime_rows)))
Estimator         Summary Keys                              Bootstrap Shape
OLS               intercept, coef, intercept_se, coef_se    (3, 4)
FixedEffectsOLS   coef, coef_se                             (3, 2)
ElasticNet        intercept, coef, intercept_se, coef_se    (3, 3)
Logit             intercept, coef, intercept_se, coef_se    (3, 4)
MultinomialLogit  coef, se                                  (3, 9)
Poisson           intercept, coef, intercept_se, coef_se    (3, 3)
TwoSLS            intercept, coef, intercept_se, coef_se    (3, 3)
FTRL              coef, coef_se                             (3, 4)
MEstimator        coef, se                                  (3, 3)

7 Usage Examples

7.1 OLS

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
beta = np.array([1.2, -0.7, 0.5])
y = 0.4 + x @ beta + rng.normal(scale=0.2, size=200)

model = cm.OLS()
model.fit(x, y)

summary = model.summary()
bootstrap_draws = model.bootstrap(5, seed=42)

summary, bootstrap_draws[:2]
({'intercept': 0.39277223269364075,
  'coef': array([ 1.18200348, -0.71923052,  0.51060512]),
  'intercept_se': 0.014288725808334476,
  'coef_se': array([0.01372427, 0.01536895, 0.01524985])},
 array([[ 0.3748746 ,  1.18080508, -0.71074639,  0.50098661],
        [ 0.37294673,  1.15877656, -0.73534743,  0.52333748]]))
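Percentile confidence intervals fall straight out of the draws. A short continuation of the example above, relying on the intercept-first column convention:

draws = model.bootstrap(500, seed=42)           # (500, 4): intercept, then the three slopes
lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
list(zip(lo.round(3), hi.round(3)))             # 95% percentile interval per column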

7.2 FixedEffectsOLS

This estimator follows the Frisch-Waugh-Lovell path directly: partial out the fixed effects with the within transform, then regress the residualized outcome on the residualized regressors with no intercept.

rng = np.random.default_rng(12)
n = 400
x = rng.normal(size=(n, 2))
worker = rng.integers(0, 30, size=n, dtype=np.uint32)
firm = rng.integers(0, 15, size=n, dtype=np.uint32)
fe = np.column_stack([worker, firm]).astype(np.uint32)
beta = np.array([1.0, -0.6])
y = (
    x @ beta
    + rng.normal(scale=0.8, size=30)[worker]
    + rng.normal(scale=0.5, size=15)[firm]
    + rng.normal(scale=0.15, size=n)
)

model = cm.FixedEffectsOLS()
model.fit(x, fe, y)

model.summary()
{'coef': array([ 1.01369715, -0.60410017]),
 'coef_se': array([0.00809961, 0.00688549])}
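For intuition only, here is what the one-factor version of that within transform looks like in NumPy. This is an illustration of the algebra, not the library's Rust implementation, which absorbs both factors:

def within_demean(values, codes):
    # One-way within transform: subtract each group's mean from its members.
    counts = np.bincount(codes)
    means = np.bincount(codes, weights=values) / counts
    return values - means[codes]

# Demeaning by the worker factor alone; the fitted model above absorbs worker and firm.
x_tilde = np.column_stack([within_demean(x[:, j], worker) for j in range(x.shape[1])])
y_tilde = within_demean(y, worker)
# Regressing y_tilde on x_tilde with no intercept is the Frisch-Waugh-Lovell step.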

7.3 MEstimator

This is the lowest-level public interface in the library. The callback contract matters more than the optimizer choice:

  • objective_fn(theta, data) must return (objective_value, gradient_vector).
  • score_fn(theta, data) must return an (n_obs, n_params) matrix of per-observation scores.
  • For bootstrap support, the data dictionary must include n, and user code should respect an optional indices key.
import numpy as np


def ols_objective(theta, data):
    X = data["X"]
    y = data["y"]
    indices = data.get("indices", np.arange(len(y)))
    Xb = X[indices]
    yb = y[indices]
    residual = yb - Xb @ theta
    objective = 0.5 * np.sum(residual**2)
    gradient = -(Xb.T @ residual)
    return objective, gradient


def ols_scores(theta, data):
    X = data["X"]
    y = data["y"]
    residual = y - X @ theta
    return -X * residual[:, None]


rng = np.random.default_rng(7)
X = rng.normal(size=(300, 2))
y = X @ np.array([1.5, -0.4]) + rng.normal(scale=0.1, size=300)

model = cm.MEstimator(
    objective_fn=ols_objective,
    score_fn=ols_scores,
    max_iterations=200,
)
model.fit({"X": X, "y": y, "n": len(y)}, np.zeros(X.shape[1]))

model.summary()
{'coef': array([ 1.49784531, -0.39974307]),
 'se': array([0.60847816, 0.63437929])}
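Because ols_objective honors the optional indices key and the data dict carries n, the nonparametric bootstrap works on this fit without further changes:

draws = model.bootstrap(200, seed=0)   # (200, 2): one resampled theta per row
draws.std(axis=0)                      # bootstrap spread of the two slope estimates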

8 Current Sharp Edges

  • There is still no rich result object layer. Consumers should expect raw dictionaries and NumPy arrays.
  • Most estimators do not expose fit statistics or formula handling.
  • TwoSLS should be treated as a single-endogenous implementation until the stage-one and bootstrap paths are generalized.
  • MEstimator favors flexibility over guardrails. If your objective has fragile regions, add clipping or other stability controls inside the callback.
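As a concrete illustration of that last bullet, here is one way to build a guardrail into a Poisson-style objective for MEstimator; the clip bound of 30 is an arbitrary choice for this sketch, not a library default:

import numpy as np


def safe_poisson_objective(theta, data):
    X, y = data["X"], data["y"]
    indices = data.get("indices", np.arange(len(y)))
    Xb, yb = X[indices], y[indices]
    eta = np.clip(Xb @ theta, -30.0, 30.0)   # keep exp() away from overflow
    mu = np.exp(eta)
    objective = np.sum(mu - yb * eta)        # negative Poisson log-likelihood, up to a constant
    gradient = Xb.T @ (mu - yb)              # ignores the clip, which only binds in extreme regions
    return objective, gradient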