crabbymetrics

MatrixCompletion Example

Low-rank counterfactual completion for treated panel cells

MatrixCompletion treats untreated panel cells as observed entries and treated cells as missing counterfactuals. It shares its public API with the other high-level panel estimators: fit(Y, W), where Y is a balanced outcome matrix (units by periods) and W is an absorbing-treatment indicator of the same shape, so a unit that becomes treated stays treated.
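The input contract described above can be checked before fitting. The following is a minimal sketch in plain NumPy; `check_panel_inputs` is a hypothetical helper written for this illustration and is not part of crabbymetrics, which may perform its own validation.

```python
import numpy as np


def check_panel_inputs(Y, W):
    """Hypothetical helper: validate a (units x periods) outcome/treatment pair."""
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W, dtype=float)
    if Y.ndim != 2 or Y.shape != W.shape:
        raise ValueError("Y and W must be 2-D arrays of the same shape")
    # Balanced panel: every unit-period cell must carry a finite outcome.
    if not np.isfinite(Y).all():
        raise ValueError("Y must be balanced (no missing or infinite cells)")
    if not np.isin(W, (0.0, 1.0)).all():
        raise ValueError("W must contain only 0s and 1s")
    # Absorbing treatment: within each row, W never drops from 1 back to 0.
    if (np.diff(W, axis=1) < 0).any():
        raise ValueError("W is not absorbing: some unit leaves treatment")
    return Y, W


Y = np.zeros((3, 5))
W = np.zeros((3, 5))
W[0, 2:] = 1.0  # unit 0 is treated from period 2 onward
check_panel_inputs(Y, W)  # passes silently

W_bad = W.copy()
W_bad[0, 4] = 0.0  # unit 0 leaves treatment, violating the absorbing rule
try:
    check_panel_inputs(Y, W_bad)
except ValueError as err:
    print("rejected:", err)
```

Running a check like this first keeps shape or coding mistakes in W from silently changing which cells are treated as missing.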

This example simulates an untreated panel that is low-rank by construction (unit effects, time effects, and a rank-2 factor structure, plus small noise), marks treated cells through W, and asks MatrixCompletion to fill in the counterfactual surface.

1 Simulate A Low-Rank Panel

import matplotlib.pyplot as plt
import numpy as np

from crabbymetrics import MatrixCompletion

np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(4402)

n_units = 52
n_periods = 30
time = np.arange(n_periods)
rank = 2

unit_effects = rng.normal(scale=0.4, size=n_units)
time_effects = 0.05 * time + 0.4 * np.sin(time / 4.0)
loadings = rng.normal(size=(n_units, rank))
factors = np.vstack(
    [
        np.sin(np.linspace(0.0, 2.5 * np.pi, n_periods)),
        np.cos(np.linspace(0.0, 1.5 * np.pi, n_periods)),
    ]
)

Y0 = unit_effects[:, None] + time_effects[None, :] + loadings @ factors
Y = Y0 + rng.normal(scale=0.08, size=Y0.shape)
W = np.zeros_like(Y)

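# Staggered adoption: two cohorts of four units, treated from period 18 and 23.
treated_units = np.arange(8)
cohort_starts = np.r_[np.repeat(18, 4), np.repeat(23, 4)]
for unit, start in zip(treated_units, cohort_starts):
    W[unit, start:] = 1.0
    # The effect starts at 0.9 and grows by 0.04 per period since adoption.
    Y[unit, start:] += 0.9 + 0.04 * np.arange(n_periods - start)

# Y - Y0 on treated cells is the effect plus the noise draw, so this benchmark
# is the realized average effect rather than the noise-free one.
true_att = float((Y - Y0)[W == 1].mean())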
print("Y shape:", Y.shape)
print("treated cells:", int(W.sum()))
print("true ATT:", round(true_att, 4))

Output:

Y shape: (52, 30)
treated cells: 76
true ATT: 1.085
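To see why the untreated surface counts as low-rank, the singular values of the noise-free matrix can be inspected directly. This sketch rebuilds Y0 with the same recipe as the simulation. Additive unit and time effects contribute at most rank 2 on top of the rank-2 factor term, so the numerical rank should not exceed 4.

```python
import numpy as np

rng = np.random.default_rng(4402)
n_units, n_periods, rank = 52, 30, 2
time = np.arange(n_periods)

unit_effects = rng.normal(scale=0.4, size=n_units)
time_effects = 0.05 * time + 0.4 * np.sin(time / 4.0)
loadings = rng.normal(size=(n_units, rank))
factors = np.vstack(
    [
        np.sin(np.linspace(0.0, 2.5 * np.pi, n_periods)),
        np.cos(np.linspace(0.0, 1.5 * np.pi, n_periods)),
    ]
)
Y0 = unit_effects[:, None] + time_effects[None, :] + loadings @ factors

# unit_effects * 1' and 1 * time_effects' are each rank 1; loadings @ factors is rank 2.
singular_values = np.linalg.svd(Y0, compute_uv=False)
numerical_rank = int(np.linalg.matrix_rank(Y0))

print("numerical rank:", numerical_rank)
print("leading singular values:", singular_values[: rank + 3])
```

Adding the noise term makes the matrix full rank in the strict sense, but the extra singular values stay small relative to the leading ones, which is the regime low-rank completion is designed for.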

2 Fit Matrix Completion

model = MatrixCompletion(
    lambda_fraction=0.20,
    fit_unit_effects=True,
    fit_time_effects=True,
    max_iterations=400,
    tolerance=1e-6,
)
model.fit(Y, W)
summary = model.summary()

print("estimated ATT:", round(float(summary["att"]), 4))
print("lambda_l:", round(float(summary["lambda_l"]), 4))
print("iterations:", summary["iterations"])
print("final observed-cell RMSE:", round(float(summary["history_rmse"][-1]), 4))

Output:

estimated ATT: 1.1784
lambda_l: 0.0088
iterations: 7
final observed-cell RMSE: 0.2593

completed = np.asarray(summary["completed"])
counterfactual = np.asarray(summary["counterfactual"])
treatment_effect = np.asarray(summary["treatment_effect"])

fig, axes = plt.subplots(1, 3, figsize=(13, 4.0), constrained_layout=True)

for ax, matrix, title in [
    (axes[0], Y, "Observed Y"),
    (axes[1], counterfactual, "Counterfactual Surface"),
    (axes[2], treatment_effect, "Observed - Counterfactual"),
]:
    im = ax.imshow(matrix, aspect="auto", cmap="viridis")
    ax.set_title(title)
    ax.set_xlabel("Period")
    ax.set_ylabel("Unit")
    fig.colorbar(im, ax=ax, shrink=0.78)

plt.show()

summary() also returns the low-rank component, singular values, event-study summaries, and group means. The important contract is that the estimator owns the panel bookkeeping: users supply matrices, not hand-built donor lists or long data frames.
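For intuition about what "filling the counterfactual surface" involves, here is a minimal sketch of nuclear-norm completion by iterative singular-value soft-thresholding (often called soft-impute). It is an illustration under stated assumptions and not the crabbymetrics implementation: the function `soft_impute`, its `lam` penalty, and the toy panel are all hypothetical, the toy panel is noise-free with a constant effect of 1.0, and unit and time effects are omitted for brevity.

```python
import numpy as np


def soft_impute(Y, W, lam, max_iterations=500, tolerance=1e-7):
    """Illustrative soft-impute loop (hypothetical, not the library's solver)."""
    observed = W == 0
    L = np.zeros_like(Y, dtype=float)
    for _ in range(max_iterations):
        # Keep observed (untreated) cells; fill treated cells with the current estimate.
        filled = np.where(observed, Y, L)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Soft-threshold the singular values: the proximal step for the nuclear norm.
        L_new = (U * np.maximum(s - lam, 0.0)) @ Vt
        change = np.linalg.norm(L_new - L) / max(np.linalg.norm(L), 1.0)
        L = L_new
        if change < tolerance:
            break
    return L


# Toy panel (assumption): exact rank 2, no noise, constant effect of 1.0.
rng = np.random.default_rng(0)
n_units, n_periods = 40, 25
Y0 = rng.normal(size=(n_units, 2)) @ rng.normal(size=(2, n_periods))
W = np.zeros_like(Y0)
W[:6, 18:] = 1.0  # six units treated from period 18 onward
Y = Y0 + 1.0 * W

counterfactual = soft_impute(Y, W, lam=0.5)
# ATT: average gap between observed and imputed outcomes on treated cells.
att = float((Y - counterfactual)[W == 1].mean())
print("estimated ATT:", round(att, 3))
```

The penalty shrinks the leading singular values as well as the trailing ones, so the imputed surface is slightly biased toward zero. That trade-off is why a regularization setting such as lambda_fraction matters in the fitted model above.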