causalis.scenarios.cuped.model

Module Contents

Classes

CUPEDModel

CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.

API

class causalis.scenarios.cuped.model.CUPEDModel(cov_type: str = 'HC2', alpha: float = 0.05, strict_binary_treatment: bool = True, use_t: Optional[bool] = None, use_t_auto_n_threshold: int = 5000, relative_ci_method: Literal['delta_nocov', 'bootstrap'] = 'delta_nocov', relative_ci_bootstrap_draws: int = 1000, relative_ci_bootstrap_seed: Optional[int] = None, covariate_variance_min: float = 1e-12, condition_number_warn_threshold: float = 1e8, run_regression_checks: bool = True, check_action: Literal['ignore', 'raise'] = 'ignore', raise_on_yellow: bool = False, corr_near_one_tol: float = 1e-10, vif_warn_threshold: float = 20.0, winsor_q: Optional[float] = 0.01, tiny_one_minus_h_tol: float = 1e-08)

CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.

Fits an outcome regression with pre-treatment covariates (always centered over the full sample, never within treatment groups) implemented as Lin (2013) fully interacted OLS:

Y ~ 1 + D + X^c + D * X^c

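Here X^c denotes the covariates centered at their full-sample means. The reported effect is the coefficient on D, with standard errors from the covariance estimator selected by cov_type. This specification ensures the coefficient on D is the ATE/ITT even if the treatment effect is heterogeneous with respect to covariates. It is broader than canonical single-theta CUPED, which analyzes the adjusted outcome Y - theta*(X - mean(X)) with one pooled theta.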
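The specification can be sketched with plain NumPy. This is an illustrative sketch only, not the library's implementation: the simulated data, the variable names, and the use of numpy.linalg.lstsq are assumptions made for the example, and no robust covariance is computed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Simulated randomized experiment (illustrative data, not from the library).
x = rng.normal(loc=2.0, scale=1.0, size=n)      # pre-treatment covariate
d = rng.integers(0, 2, size=n).astype(float)    # binary treatment, P(D=1) = 0.5
# The treatment effect varies with x: 0.5 + 1.0 * (x - 2), so the true ATE is 0.5.
y = 1.0 + 2.0 * x + d * (0.5 + 1.0 * (x - 2.0)) + rng.normal(size=n)

# Center the covariate over the FULL sample, not within treatment arms.
xc = x - x.mean()

# Design matrix for Y ~ 1 + D + X^c + D * X^c.
design = np.column_stack([np.ones(n), d, xc, d * xc])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

tau_lin = beta[1]                                 # coefficient on D
tau_naive = y[d == 1].mean() - y[d == 0].mean()   # unadjusted difference in means
print(f"Lin-adjusted: {tau_lin:.3f}, difference in means: {tau_naive:.3f}")
```

Both numbers estimate the same ATE; the adjusted one is typically less noisy when the covariate predicts the outcome.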

Parameters

cov_type : str, default="HC2"
    Covariance estimator passed to statsmodels (e.g., "nonrobust", "HC0", "HC1", "HC2", "HC3"). Note: for cluster-randomized designs, use cluster-robust SEs (not implemented here).
alpha : float, default=0.05
    Significance level for confidence intervals.
strict_binary_treatment : bool, default=True
    If True, require treatment to be binary {0,1}.
use_t : bool | None, default=None
    If bool, passed to statsmodels .fit(..., use_t=use_t) directly. If None, automatic policy is used: for robust HC* covariances, use_t=True when n < use_t_auto_n_threshold, else False. For non-robust covariance, use_t=True.
use_t_auto_n_threshold : int, default=5000
    Sample-size threshold for automatic use_t selection when use_t=None and covariance is HC* robust.
relative_ci_method : {"delta_nocov", "bootstrap"}, default="delta_nocov"
    Method for relative CI of 100 * tau / mu_c.
    - "delta_nocov": delta method using robust Var(tau) and Var(mu_c) while setting Cov(tau, mu_c)=0 (safe fallback without unsupported hybrid IF covariance).
    - "bootstrap": percentile bootstrap CI on the relative effect.
relative_ci_bootstrap_draws : int, default=1000
    Number of bootstrap resamples used when relative_ci_method="bootstrap".
relative_ci_bootstrap_seed : int | None, default=None
    RNG seed used for bootstrap relative CI.
covariate_variance_min : float, default=1e-12
    Minimum variance threshold for retaining a CUPED covariate. Covariates with variance less than or equal to this threshold are dropped before fitting.
condition_number_warn_threshold : float, default=1e8
    Trigger diagnostics signal when the design matrix condition number exceeds this threshold.
run_regression_checks : bool, default=True
    Whether to compute regression diagnostics payload during fit().
check_action : {"ignore", "raise"}, default="ignore"
    Action used when a diagnostics threshold is violated.
raise_on_yellow : bool, default=False
    When check_action="raise", also raise on YELLOW assumption flags.
corr_near_one_tol : float, default=1e-10
    Correlation tolerance used to mark near-duplicate centered covariates.
vif_warn_threshold : float, default=20.0
    VIF threshold that triggers a diagnostics signal.
winsor_q : float | None, default=0.01
    Quantile used for winsor sensitivity refit. Set None to disable.
tiny_one_minus_h_tol : float, default=1e-8
    Threshold for flagging near-degenerate 1 - leverage terms in HC2/HC3.
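To make the "delta_nocov" option concrete, the sketch below works through the delta-method arithmetic for the relative effect 100 * tau / mu_c with the covariance term set to zero. The function name, its arguments, and the normal quantile are assumptions made for illustration; how the library estimates Var(mu_c) and which quantile it uses are not shown in this reference.

```python
from statistics import NormalDist


def relative_ci_delta_nocov(tau, var_tau, mu_c, var_mu_c, alpha=0.05):
    """Delta-method CI for 100 * tau / mu_c, treating Cov(tau, mu_c) as 0.

    Hypothetical helper written for this example; not part of the library.
    """
    if mu_c == 0:
        raise ValueError("mu_c must be non-zero for a relative effect")
    rel = 100.0 * tau / mu_c
    # The gradient of g(tau, mu_c) = tau / mu_c is (1 / mu_c, -tau / mu_c**2).
    # With the covariance term dropped, only the two squared terms remain.
    var_ratio = var_tau / mu_c**2 + (tau**2) * var_mu_c / mu_c**4
    se_rel = 100.0 * var_ratio**0.5
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # normal quantile (assumption)
    return rel, rel - z * se_rel, rel + z * se_rel


rel, lo, hi = relative_ci_delta_nocov(tau=2.0, var_tau=0.04, mu_c=10.0, var_mu_c=0.01)
print(f"relative effect: {rel:.2f}% (95% CI {lo:.2f}% to {hi:.2f}%)")
```

Dropping the covariance term keeps the calculation simple; whether that widens or narrows the interval depends on the sign of the omitted covariance.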

Notes

  • Validity requires covariates be pre-treatment. Post-treatment covariates can bias estimates.

  • Covariates are globally centered over the full sample only. This centering convention is required so the treatment coefficient in the Lin specification remains the ATE/ITT.

  • The Lin (2013) specification is recommended as a robust regression-adjustment default in RCTs.
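The centering note can be checked numerically. In the sketch below (simulated data and helper names are assumptions, not library code), centering the covariate within each treatment arm makes the coefficient on D collapse to the raw difference in means, so the adjustment is lost; centering over the full sample keeps the adjusted estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)                          # pre-treatment covariate
d = rng.integers(0, 2, size=n).astype(float)    # binary treatment
y = 1.0 + 0.5 * d + 2.0 * x + rng.normal(size=n)


def coef_on_d(xc):
    """OLS coefficient on D in Y ~ 1 + D + xc + D * xc."""
    design = np.column_stack([np.ones(n), d, xc, d * xc])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


# Global centering: subtract the full-sample mean.
tau_global = coef_on_d(x - x.mean())

# Within-arm centering: subtract each arm's own mean (shown only as a contrast).
arm_mean = np.where(d == 1, x[d == 1].mean(), x[d == 0].mean())
tau_within = coef_on_d(x - arm_mean)

diff_in_means = y[d == 1].mean() - y[d == 0].mean()
print(f"global: {tau_global:.4f}, within-arm: {tau_within:.4f}, "
      f"difference in means: {diff_in_means:.4f}")
```

The within-arm result matches the difference in means up to floating-point error because each arm's fitted line, evaluated at that arm's own covariate mean, returns the arm's mean outcome.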

Initialization

fit(data: causalis.dgp.causaldata.CausalData, covariates: Optional[Sequence[str]] = None, run_checks: Optional[bool] = None) -> causalis.scenarios.cuped.model.CUPEDModel

Fit CUPED-style regression adjustment (Lin-interacted OLS) on a CausalData object.

Parameters

data : CausalData
    Validated dataset with columns: outcome (post), treatment, and confounders (pre-treatment covariates).
covariates : Sequence[str], required
    Explicit subset of data_contracts.confounders_names to use as CUPED covariates. Pass [] for an unadjusted (naive) fit. Although the signature default is None, omitting this argument raises ValueError.
run_checks : bool | None, optional
    Override whether regression checks are computed in this fit call. If None, uses self.run_regression_checks.

Returns

CUPEDModel
    Fitted estimator.

Raises

ValueError
    If covariates is omitted, is not a sequence of strings, contains columns missing from the DataFrame, or contains columns outside data_contracts.confounders_names; if treatment is not binary when strict_binary_treatment=True; or if the design matrix is rank deficient.

estimate(alpha: Optional[float] = None, diagnostic_data: bool = True) -> causalis.data_contracts.causal_estimate.CausalEstimate

Return the adjusted ATE/ITT estimate and inference.

Parameters

alpha : float, optional
    Override the instance significance level for confidence intervals.
diagnostic_data : bool, default True
    Whether to include diagnostic data in the result.

Returns

CausalEstimate
    A results object containing effect estimates and inference.

summary_dict(alpha: Optional[float] = None) -> Dict[str, Any]

Convenience dictionary output for JSON export or logging.

Parameters

alpha : float, optional
    Override the instance significance level for confidence intervals.

Returns

dict
    Dictionary with estimates, inference, and diagnostics.

assumptions_table() -> Optional[pandas.DataFrame]

Return the assumptions table for the fitted regression, with GREEN/YELLOW/RED flags, when available; otherwise return None.
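The GREEN/YELLOW/RED flags interact with the check_action and raise_on_yellow constructor options. The sketch below shows one plausible reading of that policy. The helper name should_raise is hypothetical, and the rule that RED always triggers a raise under check_action="raise" is an assumption inferred from the parameter descriptions, not confirmed library behavior.

```python
from typing import Iterable


def should_raise(flags: Iterable[str], check_action: str = "ignore",
                 raise_on_yellow: bool = False) -> bool:
    """Hypothetical helper: decide whether a set of flags warrants raising."""
    if check_action not in {"ignore", "raise"}:
        raise ValueError(f"unknown check_action: {check_action!r}")
    if check_action == "ignore":
        return False
    # Assumption: RED always blocks; YELLOW blocks only when requested.
    blocking = {"RED", "YELLOW"} if raise_on_yellow else {"RED"}
    return any(flag in blocking for flag in flags)


print(should_raise(["GREEN", "YELLOW"], check_action="raise"))
print(should_raise(["GREEN", "YELLOW"], check_action="raise", raise_on_yellow=True))
```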

__repr__() -> str