causalis.scenarios.unconfoundedness.refutation.unconfoundedness.unconfoundedness_validation
Unconfoundedness diagnostics focused on covariate balance (SMD).
Module Contents
Functions
run_unconfoundedness_diagnostics | Run covariate-balance diagnostics implied by unconfoundedness.
Data
API
- causalis.scenarios.unconfoundedness.refutation.unconfoundedness.unconfoundedness_validation.run_unconfoundedness_diagnostics(data: causalis.dgp.causaldata.CausalData, estimate: causalis.data_contracts.causal_estimate.CausalEstimate, *, threshold: float = 0.1, normalize: Optional[bool] = None, return_summary: bool = True) → Dict[str, Any]
Run covariate-balance diagnostics implied by unconfoundedness.
The diagnostic compares the treated and control pseudo-populations induced by the estimated propensity score. For ATE, the effective weights are

.. math::

    w_{1i} = \bar w_i \frac{D_i}{\hat m_i}, \qquad w_{0i} = \bar w_i \frac{1-D_i}{1-\hat m_i},

while for ATTE this implementation uses

.. math::

    w_{1i} = D_i, \qquad w_{0i} = (1-D_i)\frac{\hat m_i}{1-\hat m_i}.

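The two weighting schemes above can be sketched in plain Python. This is only an illustration of the formulas, not the library's implementation: the helper name `pseudo_population_weights`, its arguments, and the choice to default the w-bar factor to 1 are all assumptions made for this example.

```python
from typing import List, Optional, Sequence, Tuple


def pseudo_population_weights(
    d: Sequence[int],
    m_hat: Sequence[float],
    score: str = "ATE",
    base: Optional[Sequence[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Return (w1, w0) for the treated and control pseudo-populations.

    `base` stands in for the w-bar factor in the ATE formula; defaulting
    it to 1.0 is an assumption of this sketch. Propensities equal to
    0 or 1 would divide by zero and are not handled here.
    """
    if base is None:
        base = [1.0] * len(d)
    if score == "ATE":
        # Inverse-propensity weights for both arms.
        w1 = [b * di / mi for b, di, mi in zip(base, d, m_hat)]
        w0 = [b * (1 - di) / (1 - mi) for b, di, mi in zip(base, d, m_hat)]
    elif score == "ATTE":
        # Treated units keep unit weight; controls are reweighted by the odds.
        w1 = [float(di) for di in d]
        w0 = [(1 - di) * mi / (1 - mi) for di, mi in zip(d, m_hat)]
    else:
        raise ValueError(f"unknown score: {score!r}")
    return w1, w0


d = [1, 0, 1, 0]                # treatment indicators (toy values)
m_hat = [0.8, 0.2, 0.5, 0.5]    # estimated propensities (toy values)
w1_ate, w0_ate = pseudo_population_weights(d, m_hat, "ATE")
w1_att, w0_att = pseudo_population_weights(d, m_hat, "ATTE")
```

In both schemes a unit contributes to exactly one arm: treated units have zero control weight and vice versa.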
For each confounder :math:`X_j`, the weighted standardized mean difference is

.. math::

    \mathrm{SMD}_j = \frac{|\mu_{1j}^{(w)} - \mu_{0j}^{(w)}|}{\sqrt{(s_{1j}^{2,(w)} + s_{0j}^{2,(w)}) / 2}}.

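A minimal sketch of the weighted SMD for a single covariate follows. The helper names are hypothetical, and the weighted variance here divides by the total weight with no small-sample correction, which is an assumption; the library may compute it differently.

```python
import math
from typing import Sequence, Tuple


def weighted_mean_var(x: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Weighted mean and variance (variance normalized by total weight)."""
    total = sum(w)
    if total <= 0:
        raise RuntimeError("weights have zero total mass")
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / total
    return mean, var


def weighted_smd(x: Sequence[float], w1: Sequence[float], w0: Sequence[float]) -> float:
    """Absolute weighted mean difference over the pooled weighted SD."""
    mu1, var1 = weighted_mean_var(x, w1)
    mu0, var0 = weighted_mean_var(x, w0)
    pooled_sd = math.sqrt((var1 + var0) / 2.0)
    if pooled_sd == 0.0:
        # Degenerate case: no spread in either arm.
        return 0.0 if mu1 == mu0 else math.inf
    return abs(mu1 - mu0) / pooled_sd


x = [1.0, 2.0, 3.0, 4.0]   # one covariate (toy values)
w1 = [1.0, 0.0, 1.0, 0.0]  # treated-arm weights
w0 = [0.0, 1.0, 0.0, 1.0]  # control-arm weights
smd = weighted_smd(x, w1, w0)
```

With these toy inputs the weighted means are 2 and 3 and both weighted variances are 1, so the SMD is 1.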
Smaller weighted SMDs are better. A common rule of thumb is to aim for :math:`|\mathrm{SMD}| < 0.10`.

Parameters
----------
data : CausalData
    Dataset used to fit the estimator.
estimate : CausalEstimate
    Effect estimate with ``diagnostic_data`` containing propensity and, when available, weight information.
threshold : float, default 0.10
    SMD threshold used for warnings and pass/fail summaries.
normalize : bool, optional
    Override whether pseudo-population weights are mean-normalized.
return_summary : bool, default True
    Include a compact summary table in the returned payload.

Returns
-------
Dict[str, Any]
    Diagnostic report with weighted balance tables, severity flags, and an optional summary DataFrame.

Raises
------
ValueError
    If required diagnostic arrays are missing or have incompatible shapes.
RuntimeError
    If balance weights collapse to zero total mass.

Examples
--------
>>> from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
>>> from causalis.dgp import obs_linear_26_dataset
>>> from causalis.scenarios.unconfoundedness.model import IRM
>>> from causalis.scenarios.unconfoundedness.refutation.unconfoundedness.unconfoundedness_validation import (
...     run_unconfoundedness_diagnostics,
... )
>>> data = obs_linear_26_dataset(
...     n=1000,
...     seed=3141,
...     include_oracle=False,
...     return_causal_data=True,
... )
>>> irm = IRM(
...     data=data,
...     ml_g=RandomForestRegressor(
...         n_estimators=200,
...         max_depth=6,
...         min_samples_leaf=5,
...         random_state=3141,
...     ),
...     ml_m=RandomForestClassifier(
...         n_estimators=200,
...         max_depth=6,
...         min_samples_leaf=5,
...         random_state=3141,
...     ),
...     n_folds=3,
...     random_state=3141,
... )
>>> estimate = irm.fit().estimate(score="ATE")
>>> report = run_unconfoundedness_diagnostics(data, estimate)
>>> report["balance"]["smd_max"]  # doctest: +SKIP
>>> report["balance"]["worst_features"].head()  # doctest: +SKIP
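To illustrate how a threshold such as the ``threshold`` parameter can turn per-covariate SMDs into a pass/fail view, here is a small standalone sketch. The function `flag_imbalanced`, the feature names, and the SMD values are made up for illustration. Treating an SMD exactly at the threshold as a failure is an assumption, and none of this reflects the structure of the report returned by the library.

```python
from typing import Dict, List


def flag_imbalanced(smd_by_feature: Dict[str, float], threshold: float = 0.1) -> List[str]:
    """Return feature names whose SMD is at or above the threshold, worst first."""
    flagged = [name for name, smd in smd_by_feature.items() if smd >= threshold]
    return sorted(flagged, key=lambda name: smd_by_feature[name], reverse=True)


# Made-up SMD values for three hypothetical covariates.
smds = {"age": 0.04, "income": 0.17, "tenure": 0.12}
flagged = flag_imbalanced(smds)
balanced = len(flagged) == 0
```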
- causalis.scenarios.unconfoundedness.refutation.unconfoundedness.unconfoundedness_validation.__all__
['run_unconfoundedness_diagnostics']