pyfector.fect
Public entry point for pyfector.
This module exposes fect(), the single function most users call,
and the FectResult dataclass that packages its output.
fect() orchestrates the full pipeline:
pyfector.panel.prepare_panel()converts the input DataFrame to dense(T, N)matrices with treatment timing and unit classification.pyfector.panel.initial_fit()computes the starting fitY0and initial covariate coefficients on the control subsample.- If the user passes a range for
r(IFE) or leaveslam=None(MC),pyfector.cvchooses the hyperparameter by cross-validation. pyfector.estimatorsiterates the EM loop for the chosen method (fe,ife,mc, orcfe).- Treatment effects are computed from
eff = Y - Y_ct; the overall ATT and dynamic ATT by relative event time are derived in_compute_effects()vianumpy.bincount(). Raw missing outcomes remain excluded from effect averages; model imputations are counterfactual predictions, not substitutes for unobserved treated outcomes. - When
se=Truethe requested inference routine (bootstraporjackknife) frompyfector.inferenceis called with a closure that re-runs the EM loop on resampled units.
Example
::
import pyfector
result = pyfector.fect(
data=df,
Y="outcome", D="treat",
index=("unit", "year"),
X=["gdp", "pop"],
method="ife",
r=(0, 5), # CV over 0..5 factors
se=True,
nboots=200,
device="cpu", # or "gpu"
n_jobs=4,
seed=42,
)
result.summary()
result.plot(kind="gap")
1""" 2Public entry point for pyfector. 3 4This module exposes :func:`fect`, the single function most users call, 5and the :class:`FectResult` dataclass that packages its output. 6 7:func:`fect` orchestrates the full pipeline: 8 91. :func:`pyfector.panel.prepare_panel` converts the input DataFrame to 10 dense ``(T, N)`` matrices with treatment timing and unit 11 classification. 122. :func:`pyfector.panel.initial_fit` computes the starting fit ``Y0`` 13 and initial covariate coefficients on the control subsample. 143. If the user passes a range for ``r`` (IFE) or leaves ``lam=None`` 15 (MC), :mod:`pyfector.cv` chooses the hyperparameter by 16 cross-validation. 174. :mod:`pyfector.estimators` iterates the EM loop for the chosen 18 method (``fe``, ``ife``, ``mc``, or ``cfe``). 195. Treatment effects are computed from ``eff = Y - Y_ct``; the overall 20 ATT and dynamic ATT by relative event time are derived in 21 :func:`_compute_effects` via :func:`numpy.bincount`. Raw missing 22 outcomes remain excluded from effect averages; model imputations are 23 counterfactual predictions, not substitutes for unobserved treated 24 outcomes. 256. When ``se=True`` the requested inference routine 26 (``bootstrap`` or ``jackknife``) from :mod:`pyfector.inference` is 27 called with a closure that re-runs the EM loop on resampled units. 28 29Example 30------- 31:: 32 33 import pyfector 34 35 result = pyfector.fect( 36 data=df, 37 Y="outcome", D="treat", 38 index=("unit", "year"), 39 X=["gdp", "pop"], 40 method="ife", 41 r=(0, 5), # CV over 0..5 factors 42 se=True, 43 nboots=200, 44 device="cpu", # or "gpu" 45 n_jobs=4, 46 seed=42, 47 ) 48 result.summary() 49 result.plot(kind="gap") 50""" 51 52from __future__ import annotations 53 54from dataclasses import dataclass, field 55from typing import Literal 56import os 57 58import numpy as np 59 60from .backend import set_device, get_backend, to_numpy, to_device, make_rng 61from .panel import PanelData, prepare_panel, initial_fit 62from .estimators import estimate_ife, estimate_mc, estimate_cfe, EstimationResult 63from .cv import cv_ife, cv_mc, CVResult 64from .inference import bootstrap, jackknife, InferenceResult 65 66 67@dataclass 68class FectResult: 69 """Container for all fect estimation results.""" 70 # Method info 71 method: str 72 r_cv: int | None = None 73 lambda_cv: float | None = None 74 75 # Point estimates 76 att_avg: float = 0.0 77 att_avg_unit: float = 0.0 78 79 # Dynamic effects 80 att_on: np.ndarray | None = None 81 time_on: np.ndarray | None = None 82 count_on: np.ndarray | None = None 83 84 # Exit effects (treatment reversal) 85 att_off: np.ndarray | None = None 86 time_off: np.ndarray | None = None 87 88 # Coefficients 89 beta: np.ndarray | None = None 90 covariate_names: list[str] = field(default_factory=list) 91 92 # Fixed effects 93 mu: float = 0.0 94 alpha: np.ndarray | None = None # unit FE 95 xi: np.ndarray | None = None # time FE 96 factors: np.ndarray | None = None 97 loadings: np.ndarray | None = None 98 99 # Counterfactual and effects matrices 100 Y_ct: np.ndarray | None = None # T×N counterfactual 101 eff: np.ndarray | None = None # T×N treatment effects 102 residuals: np.ndarray | None = None 103 104 # Model fit 105 sigma2: float = 0.0 106 IC: float = 0.0 107 PC: float = 0.0 108 rmse: float = 0.0 109 niter: int = 0 110 converged: bool = False 111 112 # Inference 113 inference: InferenceResult | None = None 114 115 # CV 116 cv_result: CVResult | None = None 117 118 # Panel metadata 119 panel: PanelData | None = None 120 121 # Reproducibility 122 seed: int | None = None 123 124 def summary(self) -> str: 125 """Print summary table of results.""" 126 lines = [] 127 lines.append(f"pyfector estimation results") 128 lines.append(f"{'='*60}") 129 lines.append(f"Method: {self.method}") 130 if self.r_cv is not None: 131 lines.append(f"Number of factors (CV): {self.r_cv}") 132 if self.lambda_cv is not None: 133 lines.append(f"Lambda (CV): {self.lambda_cv:.6f}") 134 lines.append(f"Converged: {self.converged} (iter={self.niter})") 135 lines.append(f"Sigma^2: {self.sigma2:.6f}") 136 lines.append(f"") 137 lines.append(f"ATT (average): {self.att_avg:.6f}") 138 if self.inference is not None: 139 inf = self.inference 140 lines.append(f" SE: {inf.att_avg_se:.6f}") 141 lines.append(f" CI: [{inf.att_avg_ci[0]:.6f}, {inf.att_avg_ci[1]:.6f}]") 142 lines.append(f" p-val: {inf.att_avg_pval:.4f}") 143 144 if self.beta is not None and len(self.beta) > 0: 145 lines.append(f"") 146 lines.append(f"Coefficients:") 147 for i, name in enumerate(self.covariate_names): 148 lines.append(f" {name}: {self.beta[i]:.6f}") 149 150 if self.att_on is not None and self.time_on is not None: 151 lines.append(f"") 152 lines.append(f"Dynamic effects (ATT by relative time):") 153 lines.append(f" {'Time':>6s} {'ATT':>10s} {'Count':>6s}", ) 154 for i, t in enumerate(self.time_on): 155 count = self.count_on[i] if self.count_on is not None else "" 156 att = self.att_on[i] 157 if self.inference is not None: 158 se = self.inference.att_on_se[i] 159 lines.append(f" {t:>6.0f} {att:>10.4f} ({se:.4f}) {count}") 160 else: 161 lines.append(f" {t:>6.0f} {att:>10.4f} {count}") 162 163 lines.append(f"{'='*60}") 164 if self.panel is not None: 165 lines.append(f"N={self.panel.N}, T={self.panel.T}") 166 if self.seed is not None: 167 lines.append(f"Seed: {self.seed}") 168 return "\n".join(lines) 169 170 def __repr__(self): 171 return self.summary() 172 173 def plot(self, kind="gap", **kwargs): 174 """Plot results. Shortcut for ``pyfector.plot(self, kind, ...)``.""" 175 from .plotting import plot as _plot 176 return _plot(self, kind=kind, **kwargs) 177 178 def diagnose(self, **kwargs): 179 """Run diagnostic tests. Shortcut for ``pyfector.run_diagnostics(self, ...)``.""" 180 from .diagnostics import run_diagnostics 181 return run_diagnostics(self, **kwargs) 182 183 184def fect( 185 data, 186 Y: str, 187 D: str, 188 index: tuple[str, str], 189 X: list[str] | None = None, 190 W: str | None = None, 191 group: str | None = None, 192 method: Literal["fe", "ife", "mc", "cfe", "both"] = "ife", 193 force: Literal["none", "unit", "time", "two-way"] = "two-way", 194 r: int | tuple[int, int] = 0, 195 lam: float | None = None, 196 nlambda: int = 10, 197 CV: bool = True, 198 k: int = 10, 199 cv_prop: float = 0.1, 200 cv_nobs: int = 3, 201 cv_treat: bool = True, 202 cv_donut: int = 0, 203 criterion: str = "mspe", 204 cv_rule: Literal["min", "onepct"] = "min", 205 se: bool = False, 206 vartype: Literal["bootstrap", "jackknife"] = "bootstrap", 207 nboots: int = 200, 208 alpha: float = 0.05, 209 tol: float = 1e-7, 210 max_iter: int = 5000, 211 min_T0: int = 1, 212 min_T0_strict: bool = False, 213 max_missing: float = 1.0, 214 normalize: bool = False, 215 # CFE-specific 216 Z: list[str] | None = None, 217 Q: list[str] | None = None, 218 # Performance 219 device: Literal["cpu", "gpu"] = "cpu", 220 n_jobs: int | None = -1, 221 seed: int | None = None, 222) -> FectResult: 223 """Estimate counterfactual treatment effects for panel data. 224 225 This is the main Python entry point for the counterfactual estimator 226 workflow. Where the paper and the historical R package differ, 227 pyfector defaults to the paper's statistical definition and exposes 228 R-package-style behavior through explicit options. 229 230 Missing outcome policy 231 ---------------------- 232 pyfector distinguishes raw missing outcomes from counterfactual 233 missingness caused by treatment. Observed untreated cells 234 (``D == 0`` and non-missing ``Y``) fit the response surface. Observed 235 treated cells (``D == 1`` and non-missing ``Y``) contribute to ATT as 236 ``Y - Y_ct``. If a treated outcome is missing in the input data, the 237 model can still produce a counterfactual ``Y_ct`` for that cell, but 238 the cell is not counted in ``att_avg`` or ``att_on`` because the 239 treated potential outcome was not observed. 240 241 By default, ``min_T0`` is enforced only for treated and reversal 242 units. Sparse controls are retained if they have at least one 243 observed outcome, because they may still inform the low-rank response 244 surface. Set ``min_T0_strict=True`` to require controls to satisfy 245 ``min_T0`` too, matching the more conservative R fect sparse-panel 246 behavior. 247 248 Parameters 249 ---------- 250 data : polars.DataFrame, pandas.DataFrame 251 Long-format panel data. 252 Y, D : str 253 Column names for outcome and binary treatment indicator. 254 index : (str, str) 255 Column names for (unit_id, time_period). 256 X : list of str, optional 257 Time-varying covariates. 258 W : str, optional 259 Observation weight column. 260 group : str, optional 261 Reserved for grouped estimation. Currently raises 262 ``NotImplementedError`` when supplied. 263 method : {"fe", "ife", "mc", "cfe", "both"} 264 Estimation method. 265 force : {"none", "unit", "time", "two-way"} 266 Fixed effects specification. 267 r : int or (int, int) 268 Number of factors. If tuple, CV selects from range. 269 lam : float, optional 270 Nuclear norm penalty for MC. If None with CV=True, auto-selected. 271 nlambda : int 272 Number of automatically generated lambda candidates for MC CV. 273 CV : bool 274 If True, cross-validate over ``r`` for IFE when ``r`` is a tuple, 275 or over ``lam`` for MC when ``lam`` is None. 276 k : int 277 Number of CV folds. 278 cv_prop : float 279 Fraction of eligible observed control cells masked per CV fold. 280 cv_nobs : int 281 Number of consecutive within-unit observations to mask as a block. 282 cv_treat : bool 283 If True, restrict CV masks to pre-treatment cells of ever-treated 284 units. If False, use all observed control cells. 285 cv_donut : int 286 Exclude this many periods around treatment onset from CV evaluation. 287 criterion : {"mspe", "gmspe", "mad"} 288 Cross-validation loss. 289 cv_rule : {"min", "onepct"} 290 CV selection rule. ``"min"`` chooses the strict minimum-score 291 candidate and is the paper-faithful default. ``"onepct"`` chooses 292 the simplest candidate within 1% of the best score (lower ``r`` for 293 IFE, higher ``lam`` for MC). 294 se : bool 295 Compute standard errors via bootstrap/jackknife. 296 vartype : {"bootstrap", "jackknife"} 297 Inference method when ``se=True``. 298 nboots : int 299 Number of bootstrap replications. Ignored for jackknife. 300 alpha : float 301 Significance level for confidence intervals and tests. 302 tol : float 303 EM convergence tolerance for final point estimation. 304 max_iter : int 305 Maximum EM iterations. 306 min_T0 : int 307 Minimum untreated/pre-treatment observed periods. By default this is 308 enforced only for treated and treatment-reversal units. 309 min_T0_strict : bool 310 If True, enforce ``min_T0`` on all units, including controls. This 311 matches R fect's conservative handling of sparse control rows. 312 max_missing : float 313 Maximum missing-outcome fraction per unit, in ``[0, 1]``. Units with 314 no observed outcomes are always dropped, regardless of this threshold, 315 because they provide neither fitting information nor observed treated 316 effects. 317 normalize : bool 318 If True, estimate on an outcome standardized by its observed standard 319 deviation, then transform effects back to the original scale. 320 Z, Q : list of str, optional 321 Reserved CFE interaction arguments. Currently raise 322 ``NotImplementedError`` when supplied. 323 device : {"cpu", "gpu"} 324 Compute device. 325 n_jobs : int, optional 326 Parallel workers for CV and bootstrap. ``-1`` or ``None`` uses 327 all available CPUs. 328 seed : int, optional 329 Random seed for full reproducibility. 330 """ 331 # Set device 332 set_device(device) 333 xp = get_backend() 334 n_jobs = _resolve_n_jobs(n_jobs) 335 if device == "gpu": 336 n_jobs = 1 337 338 if group is not None: 339 raise NotImplementedError("The `group` argument is not implemented yet.") 340 if Z is not None or Q is not None: 341 raise NotImplementedError("The `Z` and `Q` CFE interaction arguments are not implemented yet.") 342 if criterion not in {"mspe", "gmspe", "mad"}: 343 raise ValueError("criterion must be 'mspe', 'gmspe', or 'mad'") 344 if cv_rule not in {"min", "onepct"}: 345 raise ValueError("cv_rule must be 'min' or 'onepct'") 346 if min_T0 < 0: 347 raise ValueError("min_T0 must be non-negative") 348 if not 0.0 <= max_missing <= 1.0: 349 raise ValueError("max_missing must be between 0 and 1") 350 351 # Map force string to int 352 force_map = {"none": 0, "unit": 1, "time": 2, "two-way": 3} 353 force_int = force_map[force] 354 355 # Prepare panel data 356 panel = prepare_panel( 357 data, Y=Y, D=D, index=index, X=X, W=W, 358 group=group, min_T0=min_T0, min_T0_strict=min_T0_strict, 359 max_missing=max_missing, 360 ) 361 362 # Move to device 363 Y_mat = to_device(panel.Y) 364 D_mat = to_device(panel.D) 365 I_mat = to_device(panel.I) 366 II_mat = to_device(panel.II) 367 X_mat = to_device(panel.X) if panel.X is not None else None 368 W_mat = to_device(panel.W) if panel.W is not None else None 369 370 # Normalize 371 norm_factor = 1.0 372 if normalize: 373 sd_y = float(xp.std(Y_mat[I_mat > 0])) 374 if sd_y > 0: 375 Y_mat = Y_mat / sd_y 376 norm_factor = sd_y 377 378 # Initial fit 379 Y0, beta0 = initial_fit(Y_mat, X_mat, II_mat, force_int) 380 381 # Determine r and lambda 382 r_cv = None 383 lambda_cv = None 384 cv_result = None 385 386 if method == "ife": 387 if isinstance(r, tuple) and CV: 388 cv_result = cv_ife( 389 Y_mat, Y0, X_mat, I_mat, II_mat, D_mat, W_mat, beta0, 390 force=force_int, r_range=r, k=k, cv_prop=cv_prop, 391 cv_nobs=cv_nobs, cv_treat=cv_treat, cv_donut=cv_donut, 392 criterion=criterion, cv_rule=cv_rule, 393 tol=tol, max_iter=max_iter, 394 n_jobs=n_jobs, seed=seed, 395 ) 396 r_cv = cv_result.best_r 397 else: 398 r_cv = r if isinstance(r, int) else r[0] 399 400 elif method == "mc": 401 if lam is None and CV: 402 cv_result = cv_mc( 403 Y_mat, Y0, X_mat, I_mat, II_mat, D_mat, W_mat, beta0, 404 force=force_int, nlambda=nlambda, k=k, cv_prop=cv_prop, 405 cv_nobs=cv_nobs, cv_treat=cv_treat, cv_donut=cv_donut, 406 criterion=criterion, cv_rule=cv_rule, 407 tol=tol, max_iter=max_iter, 408 n_jobs=n_jobs, seed=seed, 409 ) 410 lambda_cv = cv_result.best_lambda 411 else: 412 lambda_cv = lam if lam is not None else 0.0 413 414 elif method == "fe": 415 r_cv = 0 416 417 elif method == "cfe": 418 r_cv = r if isinstance(r, int) else r[0] 419 420 # Point estimation 421 if method in ("fe", "ife"): 422 est = estimate_ife( 423 Y_mat, Y0, X_mat, II_mat, W_mat, beta0, 424 r=r_cv, force=force_int, tol=tol, max_iter=max_iter, 425 ) 426 elif method == "mc": 427 est = estimate_mc( 428 Y_mat, Y0, X_mat, II_mat, W_mat, beta0, 429 lam=lambda_cv, force=force_int, tol=tol, max_iter=max_iter, 430 ) 431 elif method == "cfe": 432 est = estimate_cfe( 433 Y_mat, Y0, X_mat, II_mat, W_mat, beta0, 434 r=r_cv, force=force_int, tol=tol, max_iter=max_iter, 435 ) 436 elif method == "both": 437 # Run both IFE and MC, return IFE results with MC comparison 438 if isinstance(r, tuple) and CV: 439 cv_result = cv_ife( 440 Y_mat, Y0, X_mat, I_mat, II_mat, D_mat, W_mat, beta0, 441 force=force_int, r_range=r, k=k, cv_prop=cv_prop, 442 cv_nobs=cv_nobs, cv_treat=cv_treat, cv_donut=cv_donut, 443 criterion=criterion, cv_rule=cv_rule, 444 tol=tol, max_iter=max_iter, 445 n_jobs=n_jobs, seed=seed, 446 ) 447 r_cv = cv_result.best_r 448 else: 449 r_cv = r if isinstance(r, int) else r[0] 450 est = estimate_ife( 451 Y_mat, Y0, X_mat, II_mat, W_mat, beta0, 452 r=r_cv, force=force_int, tol=tol, max_iter=max_iter, 453 ) 454 else: 455 raise ValueError(f"Unknown method: {method}") 456 457 # Compute effects 458 eff = Y_mat - est.fit 459 Y_ct = est.fit 460 461 # Denormalize 462 if normalize and norm_factor != 1.0: 463 eff = eff * norm_factor 464 Y_ct = Y_ct * norm_factor 465 Y_mat = Y_mat * norm_factor 466 if est.beta is not None: 467 est = est._replace(beta=est.beta * norm_factor) 468 469 # ATT computation 470 T_on = to_device(panel.T_on) 471 att_avg, att_on, time_on, count_on, att_avg_unit = _compute_effects( 472 to_numpy(eff), to_numpy(D_mat), to_numpy(panel.T_on), to_numpy(I_mat), 473 ) 474 475 # Build result 476 result = FectResult( 477 method=method, 478 r_cv=r_cv, 479 lambda_cv=lambda_cv, 480 att_avg=att_avg, 481 att_avg_unit=att_avg_unit, 482 att_on=att_on, 483 time_on=time_on, 484 count_on=count_on, 485 beta=to_numpy(est.beta) if est.beta is not None else None, 486 covariate_names=panel.covariate_names, 487 mu=est.mu, 488 alpha=to_numpy(est.alpha) if est.alpha is not None else None, 489 xi=to_numpy(est.xi) if est.xi is not None else None, 490 factors=to_numpy(est.factors) if est.factors is not None else None, 491 loadings=to_numpy(est.loadings) if est.loadings is not None else None, 492 Y_ct=to_numpy(Y_ct), 493 eff=to_numpy(eff), 494 residuals=to_numpy(est.residuals), 495 sigma2=est.sigma2, 496 IC=est.IC, 497 PC=est.PC, 498 niter=est.niter, 499 converged=est.converged, 500 cv_result=cv_result, 501 panel=panel, 502 seed=seed, 503 ) 504 505 # Inference 506 if se: 507 result.inference = _run_inference( 508 result, panel, Y_mat, X_mat, W_mat, beta0, Y0, 509 method=method, r_cv=r_cv, lambda_cv=lambda_cv, 510 force_int=force_int, tol=tol, max_iter=max_iter, 511 vartype=vartype, nboots=nboots, alpha=alpha, 512 n_jobs=n_jobs, seed=seed, normalize=normalize, 513 norm_factor=norm_factor, 514 ) 515 516 return result 517 518 519def _compute_effects(eff, D, T_on, I): 520 """Compute overall ATT, per-unit ATT, and dynamic ATT by event time. 521 522 ATT averages are defined only over observed outcome cells. A missing 523 raw treated outcome has no observed ``Y(1)``, so it is excluded even 524 though the estimator may have produced a counterfactual ``Y_ct`` for 525 that matrix position. 526 527 Dynamic ATT grouping is done with :func:`numpy.bincount` so the cost 528 is O(n_observed_ever_treated_cells) regardless of the number of 529 distinct relative-time values. Per-unit ATT uses a column-wise 530 masked mean via ``np.add.reduce`` rather than a Python loop. 531 """ 532 treated = (D > 0) & (I > 0) 533 n_treated = int(np.sum(treated)) 534 535 att_avg = float(np.sum(eff[treated]) / max(n_treated, 1)) 536 537 # Per-unit ATT (post-treatment only) — sum then divide by count, 538 # keeping only units that have at least one treated observation. 539 col_counts = treated.sum(axis=0) # (N,) 540 col_sums = (eff * treated).sum(axis=0) # (N,) 541 has_any = col_counts > 0 542 if np.any(has_any): 543 unit_atts = col_sums[has_any] / col_counts[has_any] 544 att_avg_unit = float(unit_atts.mean()) 545 else: 546 att_avg_unit = 0.0 547 548 # Dynamic ATT by relative event time. Include pre-treatment periods 549 # for ever-treated units (counterfactual gaps before onset). 550 ever_treated = np.any(D > 0, axis=0) # (N,) 551 all_periods = (I > 0) & ever_treated[np.newaxis, :] # (T, N) 552 553 T_on_flat = T_on[all_periods] 554 eff_flat = eff[all_periods] 555 valid = ~np.isnan(T_on_flat) 556 T_on_flat = T_on_flat[valid].astype(np.int64) 557 eff_flat = eff_flat[valid] 558 559 if T_on_flat.size == 0: 560 return att_avg, np.array([]), np.array([]), np.array([], dtype=np.int64), att_avg_unit 561 562 # Bincount over the shifted integer indices. 563 offset = int(T_on_flat.min()) 564 idx = T_on_flat - offset 565 minlength = int(T_on_flat.max() - offset + 1) 566 sums = np.bincount(idx, weights=eff_flat, minlength=minlength) 567 counts = np.bincount(idx, minlength=minlength) 568 keep = counts > 0 569 time_on = (offset + np.arange(minlength))[keep].astype(np.float64) 570 att_on = (sums[keep] / counts[keep]).astype(np.float64) 571 count_on = counts[keep].astype(np.int64) 572 573 return att_avg, att_on, time_on, count_on, att_avg_unit 574 575 576def _run_inference( 577 result, panel, Y_mat, X_mat, W_mat, beta0, Y0, 578 method, r_cv, lambda_cv, force_int, tol, max_iter, 579 vartype, nboots, alpha, n_jobs, seed, normalize, norm_factor, 580): 581 """Run bootstrap or jackknife inference. 582 583 Bootstrap replicates use a relaxed convergence tolerance 584 (``max(tol, 1e-3)``) because bootstrap SEs converge to 3–4 585 significant digits regardless of inner precision. The full-sample 586 fit is used as the warm-start initialiser for each replicate, 587 which cuts EM iterations by ~30-60 %. 588 """ 589 xp = get_backend() 590 591 # Relaxed tolerance: bootstrap SEs don't benefit from tight inner tol. 592 boot_tol = max(tol, 1e-3) 593 594 # Pre-move panel data to device once (avoids CPU→GPU copy per bootstrap rep) 595 II_dev = to_device(panel.II) 596 D_dev = to_device(panel.D) 597 I_dev = to_device(panel.I) 598 T_on_np = panel.T_on # keep on CPU for ATT computation 599 600 # Warm-start: use full-sample fitted values as initial imputation 601 # for each replicate instead of recomputing initial_fit from scratch. 602 Y0_full = to_device(result.Y_ct) 603 604 def _estimate_fn(unit_idx): 605 """Re-estimate on a subset of units.""" 606 Y_sub = Y_mat[:, unit_idx] 607 II_sub = II_dev[:, unit_idx] 608 D_sub = D_dev[:, unit_idx] 609 I_sub = I_dev[:, unit_idx] 610 X_sub = X_mat[:, unit_idx, :] if X_mat is not None else None 611 W_sub = W_mat[:, unit_idx] if W_mat is not None else None 612 T_on_sub = T_on_np[:, unit_idx] 613 beta0_sub = beta0 614 615 # Warm-start from full-sample fit (columns for resampled units). 616 # This is much closer to the replicate's solution than a cold 617 # initial_fit, cutting EM iterations significantly. 618 Y0_sub = Y0_full[:, unit_idx].copy() 619 620 if method in ("fe", "ife"): 621 est = estimate_ife( 622 Y_sub, Y0_sub, X_sub, II_sub, W_sub, beta0_sub, 623 r=r_cv, force=force_int, tol=boot_tol, max_iter=max_iter, 624 ) 625 elif method == "mc": 626 est = estimate_mc( 627 Y_sub, Y0_sub, X_sub, II_sub, W_sub, beta0_sub, 628 lam=lambda_cv, force=force_int, tol=boot_tol, max_iter=max_iter, 629 ) 630 else: 631 est = estimate_ife( 632 Y_sub, Y0_sub, X_sub, II_sub, W_sub, beta0_sub, 633 r=r_cv or 0, force=force_int, tol=boot_tol, max_iter=max_iter, 634 ) 635 636 eff = to_numpy(Y_sub - est.fit) 637 if normalize and norm_factor != 1.0: 638 eff = eff * norm_factor 639 640 return eff, to_numpy(D_sub), T_on_sub, to_numpy(I_sub) 641 642 if vartype == "bootstrap": 643 return bootstrap( 644 _estimate_fn, to_numpy(Y_mat), to_numpy(panel.D), 645 to_numpy(panel.I), panel.T_on, panel.unit_type, 646 nboots=nboots, alpha=alpha, n_jobs=n_jobs, seed=seed, 647 ) 648 else: 649 return jackknife( 650 _estimate_fn, to_numpy(Y_mat), to_numpy(panel.D), 651 to_numpy(panel.I), panel.T_on, panel.unit_type, 652 alpha=alpha, n_jobs=n_jobs, 653 ) 654 655 656def _resolve_n_jobs(n_jobs: int | None) -> int: 657 """Normalize public n_jobs values before passing them to joblib.""" 658 if n_jobs is None or n_jobs == -1: 659 return os.cpu_count() or 1 660 if n_jobs == 0 or n_jobs < -1: 661 raise ValueError("n_jobs must be a positive integer, -1, or None") 662 return int(n_jobs)
68@dataclass 69class FectResult: 70 """Container for all fect estimation results.""" 71 # Method info 72 method: str 73 r_cv: int | None = None 74 lambda_cv: float | None = None 75 76 # Point estimates 77 att_avg: float = 0.0 78 att_avg_unit: float = 0.0 79 80 # Dynamic effects 81 att_on: np.ndarray | None = None 82 time_on: np.ndarray | None = None 83 count_on: np.ndarray | None = None 84 85 # Exit effects (treatment reversal) 86 att_off: np.ndarray | None = None 87 time_off: np.ndarray | None = None 88 89 # Coefficients 90 beta: np.ndarray | None = None 91 covariate_names: list[str] = field(default_factory=list) 92 93 # Fixed effects 94 mu: float = 0.0 95 alpha: np.ndarray | None = None # unit FE 96 xi: np.ndarray | None = None # time FE 97 factors: np.ndarray | None = None 98 loadings: np.ndarray | None = None 99 100 # Counterfactual and effects matrices 101 Y_ct: np.ndarray | None = None # T×N counterfactual 102 eff: np.ndarray | None = None # T×N treatment effects 103 residuals: np.ndarray | None = None 104 105 # Model fit 106 sigma2: float = 0.0 107 IC: float = 0.0 108 PC: float = 0.0 109 rmse: float = 0.0 110 niter: int = 0 111 converged: bool = False 112 113 # Inference 114 inference: InferenceResult | None = None 115 116 # CV 117 cv_result: CVResult | None = None 118 119 # Panel metadata 120 panel: PanelData | None = None 121 122 # Reproducibility 123 seed: int | None = None 124 125 def summary(self) -> str: 126 """Print summary table of results.""" 127 lines = [] 128 lines.append(f"pyfector estimation results") 129 lines.append(f"{'='*60}") 130 lines.append(f"Method: {self.method}") 131 if self.r_cv is not None: 132 lines.append(f"Number of factors (CV): {self.r_cv}") 133 if self.lambda_cv is not None: 134 lines.append(f"Lambda (CV): {self.lambda_cv:.6f}") 135 lines.append(f"Converged: {self.converged} (iter={self.niter})") 136 lines.append(f"Sigma^2: {self.sigma2:.6f}") 137 lines.append(f"") 138 lines.append(f"ATT (average): {self.att_avg:.6f}") 139 if self.inference is not None: 140 inf = self.inference 141 lines.append(f" SE: {inf.att_avg_se:.6f}") 142 lines.append(f" CI: [{inf.att_avg_ci[0]:.6f}, {inf.att_avg_ci[1]:.6f}]") 143 lines.append(f" p-val: {inf.att_avg_pval:.4f}") 144 145 if self.beta is not None and len(self.beta) > 0: 146 lines.append(f"") 147 lines.append(f"Coefficients:") 148 for i, name in enumerate(self.covariate_names): 149 lines.append(f" {name}: {self.beta[i]:.6f}") 150 151 if self.att_on is not None and self.time_on is not None: 152 lines.append(f"") 153 lines.append(f"Dynamic effects (ATT by relative time):") 154 lines.append(f" {'Time':>6s} {'ATT':>10s} {'Count':>6s}", ) 155 for i, t in enumerate(self.time_on): 156 count = self.count_on[i] if self.count_on is not None else "" 157 att = self.att_on[i] 158 if self.inference is not None: 159 se = self.inference.att_on_se[i] 160 lines.append(f" {t:>6.0f} {att:>10.4f} ({se:.4f}) {count}") 161 else: 162 lines.append(f" {t:>6.0f} {att:>10.4f} {count}") 163 164 lines.append(f"{'='*60}") 165 if self.panel is not None: 166 lines.append(f"N={self.panel.N}, T={self.panel.T}") 167 if self.seed is not None: 168 lines.append(f"Seed: {self.seed}") 169 return "\n".join(lines) 170 171 def __repr__(self): 172 return self.summary() 173 174 def plot(self, kind="gap", **kwargs): 175 """Plot results. Shortcut for ``pyfector.plot(self, kind, ...)``.""" 176 from .plotting import plot as _plot 177 return _plot(self, kind=kind, **kwargs) 178 179 def diagnose(self, **kwargs): 180 """Run diagnostic tests. Shortcut for ``pyfector.run_diagnostics(self, ...)``.""" 181 from .diagnostics import run_diagnostics 182 return run_diagnostics(self, **kwargs)
Container for all fect estimation results.
125 def summary(self) -> str: 126 """Print summary table of results.""" 127 lines = [] 128 lines.append(f"pyfector estimation results") 129 lines.append(f"{'='*60}") 130 lines.append(f"Method: {self.method}") 131 if self.r_cv is not None: 132 lines.append(f"Number of factors (CV): {self.r_cv}") 133 if self.lambda_cv is not None: 134 lines.append(f"Lambda (CV): {self.lambda_cv:.6f}") 135 lines.append(f"Converged: {self.converged} (iter={self.niter})") 136 lines.append(f"Sigma^2: {self.sigma2:.6f}") 137 lines.append(f"") 138 lines.append(f"ATT (average): {self.att_avg:.6f}") 139 if self.inference is not None: 140 inf = self.inference 141 lines.append(f" SE: {inf.att_avg_se:.6f}") 142 lines.append(f" CI: [{inf.att_avg_ci[0]:.6f}, {inf.att_avg_ci[1]:.6f}]") 143 lines.append(f" p-val: {inf.att_avg_pval:.4f}") 144 145 if self.beta is not None and len(self.beta) > 0: 146 lines.append(f"") 147 lines.append(f"Coefficients:") 148 for i, name in enumerate(self.covariate_names): 149 lines.append(f" {name}: {self.beta[i]:.6f}") 150 151 if self.att_on is not None and self.time_on is not None: 152 lines.append(f"") 153 lines.append(f"Dynamic effects (ATT by relative time):") 154 lines.append(f" {'Time':>6s} {'ATT':>10s} {'Count':>6s}", ) 155 for i, t in enumerate(self.time_on): 156 count = self.count_on[i] if self.count_on is not None else "" 157 att = self.att_on[i] 158 if self.inference is not None: 159 se = self.inference.att_on_se[i] 160 lines.append(f" {t:>6.0f} {att:>10.4f} ({se:.4f}) {count}") 161 else: 162 lines.append(f" {t:>6.0f} {att:>10.4f} {count}") 163 164 lines.append(f"{'='*60}") 165 if self.panel is not None: 166 lines.append(f"N={self.panel.N}, T={self.panel.T}") 167 if self.seed is not None: 168 lines.append(f"Seed: {self.seed}") 169 return "\n".join(lines)
Print summary table of results.
174 def plot(self, kind="gap", **kwargs): 175 """Plot results. Shortcut for ``pyfector.plot(self, kind, ...)``.""" 176 from .plotting import plot as _plot 177 return _plot(self, kind=kind, **kwargs)
Plot results. Shortcut for pyfector.plot(self, kind, ...).
179 def diagnose(self, **kwargs): 180 """Run diagnostic tests. Shortcut for ``pyfector.run_diagnostics(self, ...)``.""" 181 from .diagnostics import run_diagnostics 182 return run_diagnostics(self, **kwargs)
Run diagnostic tests. Shortcut for pyfector.run_diagnostics(self, ...).
185def fect( 186 data, 187 Y: str, 188 D: str, 189 index: tuple[str, str], 190 X: list[str] | None = None, 191 W: str | None = None, 192 group: str | None = None, 193 method: Literal["fe", "ife", "mc", "cfe", "both"] = "ife", 194 force: Literal["none", "unit", "time", "two-way"] = "two-way", 195 r: int | tuple[int, int] = 0, 196 lam: float | None = None, 197 nlambda: int = 10, 198 CV: bool = True, 199 k: int = 10, 200 cv_prop: float = 0.1, 201 cv_nobs: int = 3, 202 cv_treat: bool = True, 203 cv_donut: int = 0, 204 criterion: str = "mspe", 205 cv_rule: Literal["min", "onepct"] = "min", 206 se: bool = False, 207 vartype: Literal["bootstrap", "jackknife"] = "bootstrap", 208 nboots: int = 200, 209 alpha: float = 0.05, 210 tol: float = 1e-7, 211 max_iter: int = 5000, 212 min_T0: int = 1, 213 min_T0_strict: bool = False, 214 max_missing: float = 1.0, 215 normalize: bool = False, 216 # CFE-specific 217 Z: list[str] | None = None, 218 Q: list[str] | None = None, 219 # Performance 220 device: Literal["cpu", "gpu"] = "cpu", 221 n_jobs: int | None = -1, 222 seed: int | None = None, 223) -> FectResult: 224 """Estimate counterfactual treatment effects for panel data. 225 226 This is the main Python entry point for the counterfactual estimator 227 workflow. Where the paper and the historical R package differ, 228 pyfector defaults to the paper's statistical definition and exposes 229 R-package-style behavior through explicit options. 230 231 Missing outcome policy 232 ---------------------- 233 pyfector distinguishes raw missing outcomes from counterfactual 234 missingness caused by treatment. Observed untreated cells 235 (``D == 0`` and non-missing ``Y``) fit the response surface. Observed 236 treated cells (``D == 1`` and non-missing ``Y``) contribute to ATT as 237 ``Y - Y_ct``. If a treated outcome is missing in the input data, the 238 model can still produce a counterfactual ``Y_ct`` for that cell, but 239 the cell is not counted in ``att_avg`` or ``att_on`` because the 240 treated potential outcome was not observed. 241 242 By default, ``min_T0`` is enforced only for treated and reversal 243 units. Sparse controls are retained if they have at least one 244 observed outcome, because they may still inform the low-rank response 245 surface. Set ``min_T0_strict=True`` to require controls to satisfy 246 ``min_T0`` too, matching the more conservative R fect sparse-panel 247 behavior. 248 249 Parameters 250 ---------- 251 data : polars.DataFrame, pandas.DataFrame 252 Long-format panel data. 253 Y, D : str 254 Column names for outcome and binary treatment indicator. 255 index : (str, str) 256 Column names for (unit_id, time_period). 257 X : list of str, optional 258 Time-varying covariates. 259 W : str, optional 260 Observation weight column. 261 group : str, optional 262 Reserved for grouped estimation. Currently raises 263 ``NotImplementedError`` when supplied. 264 method : {"fe", "ife", "mc", "cfe", "both"} 265 Estimation method. 266 force : {"none", "unit", "time", "two-way"} 267 Fixed effects specification. 268 r : int or (int, int) 269 Number of factors. If tuple, CV selects from range. 270 lam : float, optional 271 Nuclear norm penalty for MC. If None with CV=True, auto-selected. 272 nlambda : int 273 Number of automatically generated lambda candidates for MC CV. 274 CV : bool 275 If True, cross-validate over ``r`` for IFE when ``r`` is a tuple, 276 or over ``lam`` for MC when ``lam`` is None. 277 k : int 278 Number of CV folds. 279 cv_prop : float 280 Fraction of eligible observed control cells masked per CV fold. 281 cv_nobs : int 282 Number of consecutive within-unit observations to mask as a block. 283 cv_treat : bool 284 If True, restrict CV masks to pre-treatment cells of ever-treated 285 units. If False, use all observed control cells. 286 cv_donut : int 287 Exclude this many periods around treatment onset from CV evaluation. 288 criterion : {"mspe", "gmspe", "mad"} 289 Cross-validation loss. 290 cv_rule : {"min", "onepct"} 291 CV selection rule. ``"min"`` chooses the strict minimum-score 292 candidate and is the paper-faithful default. ``"onepct"`` chooses 293 the simplest candidate within 1% of the best score (lower ``r`` for 294 IFE, higher ``lam`` for MC). 295 se : bool 296 Compute standard errors via bootstrap/jackknife. 297 vartype : {"bootstrap", "jackknife"} 298 Inference method when ``se=True``. 299 nboots : int 300 Number of bootstrap replications. Ignored for jackknife. 301 alpha : float 302 Significance level for confidence intervals and tests. 303 tol : float 304 EM convergence tolerance for final point estimation. 305 max_iter : int 306 Maximum EM iterations. 307 min_T0 : int 308 Minimum untreated/pre-treatment observed periods. By default this is 309 enforced only for treated and treatment-reversal units. 310 min_T0_strict : bool 311 If True, enforce ``min_T0`` on all units, including controls. This 312 matches R fect's conservative handling of sparse control rows. 313 max_missing : float 314 Maximum missing-outcome fraction per unit, in ``[0, 1]``. Units with 315 no observed outcomes are always dropped, regardless of this threshold, 316 because they provide neither fitting information nor observed treated 317 effects. 318 normalize : bool 319 If True, estimate on an outcome standardized by its observed standard 320 deviation, then transform effects back to the original scale. 321 Z, Q : list of str, optional 322 Reserved CFE interaction arguments. Currently raise 323 ``NotImplementedError`` when supplied. 324 device : {"cpu", "gpu"} 325 Compute device. 326 n_jobs : int, optional 327 Parallel workers for CV and bootstrap. ``-1`` or ``None`` uses 328 all available CPUs. 329 seed : int, optional 330 Random seed for full reproducibility. 331 """ 332 # Set device 333 set_device(device) 334 xp = get_backend() 335 n_jobs = _resolve_n_jobs(n_jobs) 336 if device == "gpu": 337 n_jobs = 1 338 339 if group is not None: 340 raise NotImplementedError("The `group` argument is not implemented yet.") 341 if Z is not None or Q is not None: 342 raise NotImplementedError("The `Z` and `Q` CFE interaction arguments are not implemented yet.") 343 if criterion not in {"mspe", "gmspe", "mad"}: 344 raise ValueError("criterion must be 'mspe', 'gmspe', or 'mad'") 345 if cv_rule not in {"min", "onepct"}: 346 raise ValueError("cv_rule must be 'min' or 'onepct'") 347 if min_T0 < 0: 348 raise ValueError("min_T0 must be non-negative") 349 if not 0.0 <= max_missing <= 1.0: 350 raise ValueError("max_missing must be between 0 and 1") 351 352 # Map force string to int 353 force_map = {"none": 0, "unit": 1, "time": 2, "two-way": 3} 354 force_int = force_map[force] 355 356 # Prepare panel data 357 panel = prepare_panel( 358 data, Y=Y, D=D, index=index, X=X, W=W, 359 group=group, min_T0=min_T0, min_T0_strict=min_T0_strict, 360 max_missing=max_missing, 361 ) 362 363 # Move to device 364 Y_mat = to_device(panel.Y) 365 D_mat = to_device(panel.D) 366 I_mat = to_device(panel.I) 367 II_mat = to_device(panel.II) 368 X_mat = to_device(panel.X) if panel.X is not None else None 369 W_mat = to_device(panel.W) if panel.W is not None else None 370 371 # Normalize 372 norm_factor = 1.0 373 if normalize: 374 sd_y = float(xp.std(Y_mat[I_mat > 0])) 375 if sd_y > 0: 376 Y_mat = Y_mat / sd_y 377 norm_factor = sd_y 378 379 # Initial fit 380 Y0, beta0 = initial_fit(Y_mat, X_mat, II_mat, force_int) 381 382 # Determine r and lambda 383 r_cv = None 384 lambda_cv = None 385 cv_result = None 386 387 if method == "ife": 388 if isinstance(r, tuple) and CV: 389 cv_result = cv_ife( 390 Y_mat, Y0, X_mat, I_mat, II_mat, D_mat, W_mat, beta0, 391 force=force_int, r_range=r, k=k, cv_prop=cv_prop, 392 cv_nobs=cv_nobs, cv_treat=cv_treat, cv_donut=cv_donut, 393 criterion=criterion, cv_rule=cv_rule, 394 tol=tol, max_iter=max_iter, 395 n_jobs=n_jobs, seed=seed, 396 ) 397 r_cv = cv_result.best_r 398 else: 399 r_cv = r if isinstance(r, int) else r[0] 400 401 elif method == "mc": 402 if lam is None and CV: 403 cv_result = cv_mc( 404 Y_mat, Y0, X_mat, I_mat, II_mat, D_mat, W_mat, beta0, 405 force=force_int, nlambda=nlambda, k=k, cv_prop=cv_prop, 406 cv_nobs=cv_nobs, cv_treat=cv_treat, cv_donut=cv_donut, 407 criterion=criterion, cv_rule=cv_rule, 408 tol=tol, max_iter=max_iter, 409 n_jobs=n_jobs, seed=seed, 410 ) 411 lambda_cv = cv_result.best_lambda 412 else: 413 lambda_cv = lam if lam is not None else 0.0 414 415 elif method == "fe": 416 r_cv = 0 417 418 elif method == "cfe": 419 r_cv = r if isinstance(r, int) else r[0] 420 421 # Point estimation 422 if method in ("fe", "ife"): 423 est = estimate_ife( 424 Y_mat, Y0, X_mat, II_mat, W_mat, beta0, 425 r=r_cv, force=force_int, tol=tol, max_iter=max_iter, 426 ) 427 elif method == "mc": 428 est = estimate_mc( 429 Y_mat, Y0, X_mat, II_mat, W_mat, beta0, 430 lam=lambda_cv, force=force_int, tol=tol, max_iter=max_iter, 431 ) 432 elif method == "cfe": 433 est = estimate_cfe( 434 Y_mat, Y0, X_mat, II_mat, W_mat, beta0, 435 r=r_cv, force=force_int, tol=tol, max_iter=max_iter, 436 ) 437 elif method == "both": 438 # Run both IFE and MC, return IFE results with MC comparison 439 if isinstance(r, tuple) and CV: 440 cv_result = cv_ife( 441 Y_mat, Y0, X_mat, I_mat, II_mat, D_mat, W_mat, beta0, 442 force=force_int, r_range=r, k=k, cv_prop=cv_prop, 443 cv_nobs=cv_nobs, cv_treat=cv_treat, cv_donut=cv_donut, 444 criterion=criterion, cv_rule=cv_rule, 445 tol=tol, max_iter=max_iter, 446 n_jobs=n_jobs, seed=seed, 447 ) 448 r_cv = cv_result.best_r 449 else: 450 r_cv = r if isinstance(r, int) else r[0] 451 est = estimate_ife( 452 Y_mat, Y0, X_mat, II_mat, W_mat, beta0, 453 r=r_cv, force=force_int, tol=tol, max_iter=max_iter, 454 ) 455 else: 456 raise ValueError(f"Unknown method: {method}") 457 458 # Compute effects 459 eff = Y_mat - est.fit 460 Y_ct = est.fit 461 462 # Denormalize 463 if normalize and norm_factor != 1.0: 464 eff = eff * norm_factor 465 Y_ct = Y_ct * norm_factor 466 Y_mat = Y_mat * norm_factor 467 if est.beta is not None: 468 est = est._replace(beta=est.beta * norm_factor) 469 470 # ATT computation 471 T_on = to_device(panel.T_on) 472 att_avg, att_on, time_on, count_on, att_avg_unit = _compute_effects( 473 to_numpy(eff), to_numpy(D_mat), to_numpy(panel.T_on), to_numpy(I_mat), 474 ) 475 476 # Build result 477 result = FectResult( 478 method=method, 479 r_cv=r_cv, 480 lambda_cv=lambda_cv, 481 att_avg=att_avg, 482 att_avg_unit=att_avg_unit, 483 att_on=att_on, 484 time_on=time_on, 485 count_on=count_on, 486 beta=to_numpy(est.beta) if est.beta is not None else None, 487 covariate_names=panel.covariate_names, 488 mu=est.mu, 489 alpha=to_numpy(est.alpha) if est.alpha is not None else None, 490 xi=to_numpy(est.xi) if est.xi is not None else None, 491 factors=to_numpy(est.factors) if est.factors is not None else None, 492 loadings=to_numpy(est.loadings) if est.loadings is not None else None, 493 Y_ct=to_numpy(Y_ct), 494 eff=to_numpy(eff), 495 residuals=to_numpy(est.residuals), 496 sigma2=est.sigma2, 497 IC=est.IC, 498 PC=est.PC, 499 niter=est.niter, 500 converged=est.converged, 501 cv_result=cv_result, 502 panel=panel, 503 seed=seed, 504 ) 505 506 # Inference 507 if se: 508 result.inference = _run_inference( 509 result, panel, Y_mat, X_mat, W_mat, beta0, Y0, 510 method=method, r_cv=r_cv, lambda_cv=lambda_cv, 511 force_int=force_int, tol=tol, max_iter=max_iter, 512 vartype=vartype, nboots=nboots, alpha=alpha, 513 n_jobs=n_jobs, seed=seed, normalize=normalize, 514 norm_factor=norm_factor, 515 ) 516 517 return result
Estimate counterfactual treatment effects for panel data.
This is the main Python entry point for the counterfactual estimator workflow. Where the paper and the historical R package differ, pyfector defaults to the paper's statistical definition and exposes R-package-style behavior through explicit options.
Missing outcome policy
pyfector distinguishes raw missing outcomes from counterfactual
missingness caused by treatment. Observed untreated cells
(D == 0 and non-missing Y) fit the response surface. Observed
treated cells (D == 1 and non-missing Y) contribute to ATT as
Y - Y_ct. If a treated outcome is missing in the input data, the
model can still produce a counterfactual Y_ct for that cell, but
the cell is not counted in att_avg or att_on because the
treated potential outcome was not observed.
By default, min_T0 is enforced only for treated and reversal
units. Sparse controls are retained if they have at least one
observed outcome, because they may still inform the low-rank response
surface. Set min_T0_strict=True to require controls to satisfy
min_T0 too, matching the more conservative R fect sparse-panel
behavior.
Parameters
data : polars.DataFrame, pandas.DataFrame
Long-format panel data.
Y, D : str
Column names for outcome and binary treatment indicator.
index : (str, str)
Column names for (unit_id, time_period).
X : list of str, optional
Time-varying covariates.
W : str, optional
Observation weight column.
group : str, optional
Reserved for grouped estimation. Currently raises
NotImplementedError when supplied.
method : {"fe", "ife", "mc", "cfe", "both"}
Estimation method.
force : {"none", "unit", "time", "two-way"}
Fixed effects specification.
r : int or (int, int)
Number of factors. If tuple, CV selects from range.
lam : float, optional
Nuclear norm penalty for MC. If None with CV=True, auto-selected.
nlambda : int
Number of automatically generated lambda candidates for MC CV.
CV : bool
If True, cross-validate over r for IFE when r is a tuple,
or over lam for MC when lam is None.
k : int
Number of CV folds.
cv_prop : float
Fraction of eligible observed control cells masked per CV fold.
cv_nobs : int
Number of consecutive within-unit observations to mask as a block.
cv_treat : bool
If True, restrict CV masks to pre-treatment cells of ever-treated
units. If False, use all observed control cells.
cv_donut : int
Exclude this many periods around treatment onset from CV evaluation.
criterion : {"mspe", "gmspe", "mad"}
Cross-validation loss.
cv_rule : {"min", "onepct"}
CV selection rule. "min" chooses the strict minimum-score
candidate and is the paper-faithful default. "onepct" chooses
the simplest candidate within 1% of the best score (lower r for
IFE, higher lam for MC).
se : bool
Compute standard errors via bootstrap/jackknife.
vartype : {"bootstrap", "jackknife"}
Inference method when se=True.
nboots : int
Number of bootstrap replications. Ignored for jackknife.
alpha : float
Significance level for confidence intervals and tests.
tol : float
EM convergence tolerance for final point estimation.
max_iter : int
Maximum EM iterations.
min_T0 : int
Minimum untreated/pre-treatment observed periods. By default this is
enforced only for treated and treatment-reversal units.
min_T0_strict : bool
If True, enforce min_T0 on all units, including controls. This
matches R fect's conservative handling of sparse control rows.
max_missing : float
Maximum missing-outcome fraction per unit, in [0, 1]. Units with
no observed outcomes are always dropped, regardless of this threshold,
because they provide neither fitting information nor observed treated
effects.
normalize : bool
If True, estimate on an outcome standardized by its observed standard
deviation, then transform effects back to the original scale.
Z, Q : list of str, optional
Reserved CFE interaction arguments. Currently raise
NotImplementedError when supplied.
device : {"cpu", "gpu"}
Compute device.
n_jobs : int, optional
Parallel workers for CV and bootstrap. -1 or None uses
all available CPUs.
seed : int, optional
Random seed for full reproducibility.