causalis.scenarios.unconfoundedness.refutation.score.score_validation¶
Score diagnostics focused on orthogonality and EIF stability.
Module Contents¶
Functions¶
- run_score_diagnostics: Run orthogonality and influence diagnostics for ATE or ATTE scores.
Data¶
API¶
- causalis.scenarios.unconfoundedness.refutation.score.score_validation.run_score_diagnostics(data: causalis.dgp.causaldata.CausalData, estimate: causalis.data_contracts.causal_estimate.CausalEstimate, *, trimming_threshold: Optional[float] = None, n_basis_funcs: Optional[int] = None, return_summary: bool = True) → Dict[str, Any]¶
Run orthogonality and influence diagnostics for ATE or ATTE scores.
The main object is the per-observation score contribution. For ATE, this diagnostic uses

.. math::

   \hat\psi_i = w_i\left(\hat g_1(X_i) - \hat g_0(X_i)\right) + \bar w_i \left[ (Y_i - \hat g_1(X_i))\frac{D_i}{\hat m_i} - (Y_i - \hat g_0(X_i)) \frac{1-D_i}{1-\hat m_i} \right] - \hat\theta.
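As a minimal, self-contained sketch of this score, the snippet below computes the per-observation ATE contributions with unit weights (:math:`w_i = \bar w_i = 1`) from plain NumPy arrays. The nuisance arrays here are synthetic stand-ins for fitted predictions, not output of the causalis API; the point is only that when :math:`\hat\theta` solves the empirical moment equation, the score average is zero by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Synthetic stand-ins for fitted nuisance predictions (illustration only).
X = rng.normal(size=(n, 2))
m_hat = np.clip(1 / (1 + np.exp(-X[:, 0])), 0.01, 0.99)  # propensity \hat m_i
D = rng.binomial(1, m_hat)                               # treatment indicator
g1_hat = X.sum(axis=1) + 1.0                             # outcome model under D=1
g0_hat = X.sum(axis=1)                                   # outcome model under D=0
Y = np.where(D == 1, g1_hat, g0_hat) + rng.normal(size=n)

# AIPW moment with ATE weights w_i = \bar w_i = 1.
moment = (g1_hat - g0_hat
          + D / m_hat * (Y - g1_hat)
          - (1 - D) / (1 - m_hat) * (Y - g0_hat))
theta_hat = moment.mean()                                # solves the moment equation
psi = moment - theta_hat                                 # per-observation score \hat\psi_i

print(abs(psi.mean()))  # ~0 by construction
```

Because `theta_hat` is the sample mean of the moment, `psi.mean()` vanishes up to floating-point error; the diagnostic becomes informative on held-out folds or after perturbing the nuisances, where exact cancellation no longer holds.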
Good score behavior means:

- the empirical score average is close to zero,
- finite-basis derivatives with respect to nuisance parts are small,
- the influence distribution is not driven by a tiny number of very large :math:`|\hat\psi_i|`.

Parameters
----------
data : CausalData
    Dataset used to fit the estimator.
estimate : CausalEstimate
    Effect estimate with ``diagnostic_data`` containing nuisance predictions
    and optionally cached score arrays.
trimming_threshold : float, optional
    Propensity clipping threshold. If omitted, the value is inferred from
    diagnostic or model metadata.
n_basis_funcs : int, optional
    Number of simple basis functions used in orthogonality checks. Defaults
    to one intercept plus all available confounders.
return_summary : bool, default True
    Include a compact summary table in the returned payload.

Returns
-------
Dict[str, Any]
    Diagnostic report with orthogonality checks, influence summaries,
    optional out-of-sample tests, and a summary table.

Raises
------
ValueError
    If required diagnostic arrays are missing or have incompatible shapes.

Examples
--------
>>> from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
>>> from causalis.dgp import obs_linear_26_dataset
>>> from causalis.scenarios.unconfoundedness.model import IRM
>>> data = obs_linear_26_dataset(
...     n=1000,
...     seed=3141,
...     include_oracle=False,
...     return_causal_data=True,
... )
>>> irm = IRM(
...     data=data,
...     ml_g=RandomForestRegressor(
...         n_estimators=200,
...         max_depth=6,
...         min_samples_leaf=5,
...         random_state=3141,
...     ),
...     ml_m=RandomForestClassifier(
...         n_estimators=200,
...         max_depth=6,
...         min_samples_leaf=5,
...         random_state=3141,
...     ),
...     n_folds=3,
...     random_state=3141,
... )
>>> estimate = irm.fit().estimate(score="ATE")
>>> report = run_score_diagnostics(data, estimate)
>>> report["summary"]  # doctest: +SKIP
>>> report["influence"]["top_influential"].head()  # doctest: +SKIP
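The two checks described above can be illustrated outside the library. The sketch below uses a synthetic score vector (a heavy-tailed stand-in for :math:`\hat\psi_i`, not output of `run_score_diagnostics`) and two made-up confounder columns: it projects the scores onto an intercept-plus-confounders basis, where near-zero coefficients indicate the scores are not systematically explained by the basis, and it measures how much of the total absolute influence is carried by the top 1% of observations.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
psi = rng.standard_t(df=5, size=n)   # stand-in for per-observation scores
X = rng.normal(size=(n, 2))          # stand-in confounders used as basis functions

# Orthogonality-style check: OLS of psi on [1, X]. Coefficients near zero
# suggest no residual dependence of the score on the basis directions.
B = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(B, psi, rcond=None)

# Influence concentration: share of total |psi| carried by the top 1%.
abs_psi = np.sort(np.abs(psi))[::-1]
k = max(1, n // 100)
top_share = abs_psi[:k].sum() / abs_psi.sum()

print(coef)       # all entries small for a well-behaved score
print(top_share)  # small share => influence not dominated by a few points
```

A heavy-tailed `psi` (as simulated here with a t-distribution) inflates `top_share` relative to a Gaussian benchmark, which is exactly the failure mode the influence summary in the returned report is meant to flag.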
- causalis.scenarios.unconfoundedness.refutation.score.score_validation.__all__¶
['run_score_diagnostics']