causalis.scenarios.cuped.model
Module Contents
Classes
CUPEDModel | CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.
API
- class causalis.scenarios.cuped.model.CUPEDModel(cov_type: str = 'HC2', alpha: float = 0.05, strict_binary_treatment: bool = True, use_t: Optional[bool] = None, use_t_auto_n_threshold: int = 5000, relative_ci_method: Literal['delta_nocov', 'bootstrap'] = 'delta_nocov', relative_ci_bootstrap_draws: int = 1000, relative_ci_bootstrap_seed: Optional[int] = None, covariate_variance_min: float = 1e-12, condition_number_warn_threshold: float = 100000000.0, run_regression_checks: bool = True, check_action: Literal['ignore', 'raise'] = 'ignore', raise_on_yellow: bool = False, corr_near_one_tol: float = 1e-10, vif_warn_threshold: float = 20.0, winsor_q: Optional[float] = 0.01, tiny_one_minus_h_tol: float = 1e-08)
CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.
Fits an outcome regression with pre-treatment covariates (always centered over the full sample, never within treatment groups) implemented as Lin (2013) fully interacted OLS:
Y ~ 1 + D + X^c + D * X^c
The reported effect is the coefficient on D, with robust covariance as requested. This specification ensures the coefficient on D is the ATE/ITT even if the treatment effect is heterogeneous with respect to covariates. This is broader than canonical single-theta CUPED (Y - theta*(X - mean(X))).
Parameters
cov_type : str, default="HC2"
    Covariance estimator passed to statsmodels (e.g., "nonrobust", "HC0", "HC1", "HC2", "HC3"). Note: for cluster-randomized designs, use cluster-robust SEs (not implemented here).
alpha : float, default=0.05
    Significance level for confidence intervals.
strict_binary_treatment : bool, default=True
    If True, require treatment to be binary {0,1}.
use_t : bool | None, default=None
    If bool, passed to statsmodels .fit(..., use_t=use_t) directly. If None, automatic policy is used: for robust HC* covariances, use_t=True when n < use_t_auto_n_threshold, else False. For non-robust covariance, use_t=True.
use_t_auto_n_threshold : int, default=5000
    Sample-size threshold for automatic use_t selection when use_t=None and covariance is HC* robust.
relative_ci_method : {"delta_nocov", "bootstrap"}, default="delta_nocov"
    Method for relative CI of 100 * tau / mu_c.
    - "delta_nocov": delta method using robust Var(tau) and Var(mu_c) while setting Cov(tau, mu_c)=0 (safe fallback without unsupported hybrid IF covariance).
    - "bootstrap": percentile bootstrap CI on the relative effect.
relative_ci_bootstrap_draws : int, default=1000
    Number of bootstrap resamples used when relative_ci_method="bootstrap".
relative_ci_bootstrap_seed : int | None, default=None
    RNG seed used for bootstrap relative CI.
covariate_variance_min : float, default=1e-12
    Minimum variance threshold for retaining a CUPED covariate. Covariates with variance less than or equal to this threshold are dropped before fitting.
condition_number_warn_threshold : float, default=1e8
    Trigger diagnostics signal when the design matrix condition number exceeds this threshold.
run_regression_checks : bool, default=True
    Whether to compute regression diagnostics payload during fit().
check_action : {"ignore", "raise"}, default="ignore"
    Action used when a diagnostics threshold is violated.
raise_on_yellow : bool, default=False
    When check_action="raise", also raise on YELLOW assumption flags.
corr_near_one_tol : float, default=1e-10
    Correlation tolerance used to mark near-duplicate centered covariates.
vif_warn_threshold : float, default=20.0
    VIF threshold that triggers a diagnostics signal.
winsor_q : float | None, default=0.01
    Quantile used for winsor sensitivity refit. Set None to disable.
tiny_one_minus_h_tol : float, default=1e-8
    Threshold for flagging near-degenerate 1 - leverage terms in HC2/HC3.
Notes
Validity requires covariates be pre-treatment. Post-treatment covariates can bias estimates.
Covariates are globally centered over the full sample only. This centering convention is required so the treatment coefficient in the Lin specification remains the ATE/ITT.
The Lin (2013) specification is recommended as a robust regression-adjustment default in RCTs.
Initialization
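The automatic `use_t` policy described under Parameters can be read as a small decision rule. The function below is a hypothetical helper written for illustration (`resolve_use_t` is not part of causalis), and it assumes that any `cov_type` not starting with "HC" is treated as non-robust.

```python
from typing import Optional


def resolve_use_t(
    use_t: Optional[bool],
    cov_type: str,
    n: int,
    use_t_auto_n_threshold: int = 5000,
) -> bool:
    """Hypothetical helper mirroring the documented use_t policy."""
    if use_t is not None:
        # An explicit bool is forwarded unchanged.
        return use_t
    if cov_type.upper().startswith("HC"):
        # Robust HC* covariance: t-based inference only below the threshold.
        return n < use_t_auto_n_threshold
    # Assumption: everything else counts as non-robust, which always uses t.
    return True


print(resolve_use_t(None, "HC2", n=1200))
print(resolve_use_t(None, "HC2", n=50_000))
```

Note that the comparison is strict, so a sample of exactly `use_t_auto_n_threshold` observations falls on the normal-approximation side.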
- fit(data: causalis.dgp.causaldata.CausalData, covariates: Optional[Sequence[str]] = None, run_checks: Optional[bool] = None) → causalis.scenarios.cuped.model.CUPEDModel
Fit CUPED-style regression adjustment (Lin-interacted OLS) on a CausalData object.
Parameters
data : CausalData
    Validated dataset with columns: outcome (post), treatment, and confounders (pre covariates).
covariates : Sequence[str], required
    Explicit subset of data_contracts.confounders_names to use as CUPED covariates. Pass [] for an unadjusted (naive) fit.
run_checks : bool | None, optional
    Override whether regression checks are computed in this fit call. If None, uses self.run_regression_checks.
Returns
CUPEDModel
    Fitted estimator.
Raises
ValueError
    If covariates is omitted, is not a sequence of strings, contains columns missing from the DataFrame, or contains columns outside data_contracts.confounders_names; if treatment is not binary when strict_binary_treatment=True; or if the design matrix is rank deficient.
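A common way to use `fit` is to compare the covariate-adjusted estimate with the unadjusted one obtained by passing `[]`. The sketch below relies only on calls documented on this page (`fit(data, covariates=...)` returning the fitted estimator, and `estimate()`). The helper name `compare_adjusted_and_naive`, the `make_model` factory argument, and the column name `"pre_metric"` are hypothetical, and constructing the `CausalData` object is out of scope here.

```python
from typing import Any, Callable, Sequence, Tuple


def compare_adjusted_and_naive(
    make_model: Callable[[], Any],
    data: Any,
    covariates: Sequence[str],
) -> Tuple[Any, Any]:
    """Hypothetical helper: fit once with covariates and once with []."""
    # fit() returns the fitted estimator, so estimate() can be chained.
    adjusted = make_model().fit(data, covariates=list(covariates)).estimate()
    # An empty list requests the unadjusted (naive) fit.
    naive = make_model().fit(data, covariates=[]).estimate()
    return adjusted, naive


# Intended use, assuming causalis is installed and `data` is a CausalData
# whose confounders include the hypothetical column "pre_metric":
#
#   from causalis.scenarios.cuped.model import CUPEDModel
#   adjusted, naive = compare_adjusted_and_naive(
#       lambda: CUPEDModel(cov_type="HC2", alpha=0.05), data, ["pre_metric"]
#   )
```

A fresh model is created for each fit so the two results do not share state.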
- estimate(alpha: Optional[float] = None, diagnostic_data: bool = True) → causalis.data_contracts.causal_estimate.CausalEstimate
Return the adjusted ATE/ITT estimate and inference.
Parameters
alpha : float, optional
    Override the instance significance level for confidence intervals.
diagnostic_data : bool, default True
    Whether to include diagnostic data in the result.
Returns
CausalEstimate
    A results object containing effect estimates and inference.
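For the relative effect 100 * tau / mu_c, the "delta_nocov" option described in the constructor parameters corresponds to a first-order delta-method variance with the covariance term dropped. The function below is a minimal sketch of that formula, not the library's code: the name `relative_ci_delta_nocov` and its arguments are hypothetical, and it assumes a normal critical value, whereas the library may use a t quantile depending on `use_t`.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def relative_ci_delta_nocov(
    tau: float, se_tau: float, mu_c: float, se_mu_c: float, alpha: float = 0.05
) -> Tuple[float, float, float]:
    """Delta-method CI for 100 * tau / mu_c with Cov(tau, mu_c) set to 0."""
    if mu_c == 0:
        raise ValueError("mu_c must be non-zero for a relative effect")
    rel = 100.0 * tau / mu_c
    # First-order variance of tau / mu_c, dropping the covariance term:
    # Var(tau) / mu_c^2 + tau^2 * Var(mu_c) / mu_c^4
    var_ratio = se_tau**2 / mu_c**2 + (tau**2) * se_mu_c**2 / mu_c**4
    se_rel = 100.0 * sqrt(var_ratio)
    # Assumption: normal critical value; the library may use a t quantile.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return rel, rel - z * se_rel, rel + z * se_rel


# Made-up inputs: absolute effect 1.0 (SE 0.5), control mean 10.0 (SE 0.2).
rel, lo, hi = relative_ci_delta_nocov(1.0, 0.5, 10.0, 0.2)
print(f"relative effect {rel:.1f}%, CI [{lo:.2f}, {hi:.2f}]")
```

Because the covariance term is set to zero, the interval can be wider or narrower than one that accounts for the dependence between tau and mu_c; the "bootstrap" option avoids that approximation at the cost of resampling.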
- summary_dict(alpha: Optional[float] = None) → Dict[str, Any]
Convenience JSON/logging output.
Parameters
alpha : float, optional
    Override the instance significance level for confidence intervals.
Returns
dict
    Dictionary with estimates, inference, and diagnostics.
- assumptions_table() → Optional[pandas.DataFrame]
Return fitted regression assumptions table (GREEN/YELLOW/RED) when available.
- __repr__() → str