causalis.scenarios.cuped.diagnostics.regression_checks

Module Contents

Classes

RegressionChecks

Lightweight OLS/regression health checks for CUPED diagnostics.

Functions

design_matrix_checks

Return rank/conditioning diagnostics for a numeric design matrix.

near_duplicate_corr_pairs

Find pairs with absolute correlation very close to one.

vif_from_corr

Approximate VIF from inverse correlation matrix of standardized covariates.

leverage_and_cooks

Compute leverage, Cook’s distance, and internally studentized residuals.

winsor_fit_tau

Refit OLS on winsorized outcome and return treatment coefficient.

run_regression_checks

Build a compact payload with design, residual, and influence diagnostics.

assumption_design_rank

Check that the design matrix is full rank.

assumption_condition_number

Check global collinearity via condition number.

assumption_near_duplicates

Check near-duplicate centered covariate pairs.

assumption_vif

Check VIF from centered main-effect covariates.

assumption_ate_gap

Check adjusted-vs-naive ATE gap relative to naive SE.

assumption_residual_tails

Check residual extremes using max standardized residual only.

assumption_leverage

Check leverage concentration.

assumption_cooks

Check Cook’s distance influence diagnostics.

assumption_hc23_stability

Check HC2/HC3 stability when leverage terms approach one.

assumption_winsor_sensitivity

Check sensitivity of adjusted ATE to winsorized-outcome refit.

regression_assumption_rows_from_checks

Run all CUPED regression assumption tests and return row payloads.

regression_assumptions_table_from_checks

Return a table of GREEN/YELLOW/RED assumption flags from checks payload.

regression_assumptions_table_from_diagnostic_data

Build assumption table from CUPEDDiagnosticData payload.

regression_assumptions_table_from_estimate

Build assumptions table from a CUPED estimate.

regression_assumptions_table_from_data

Fit CUPED on CausalData and return the assumptions flag table.

overall_assumption_flag

Return overall GREEN/YELLOW/RED status from an assumptions table.

style_regression_assumptions_table

Return pandas Styler with colored flag cells for notebook display.

Data

FLAG_GREEN

FLAG_YELLOW

FLAG_RED

FLAG_LEVEL

FLAG_COLOR

API

causalis.scenarios.cuped.diagnostics.regression_checks.FLAG_GREEN

‘GREEN’

causalis.scenarios.cuped.diagnostics.regression_checks.FLAG_YELLOW

‘YELLOW’

causalis.scenarios.cuped.diagnostics.regression_checks.FLAG_RED

‘RED’

causalis.scenarios.cuped.diagnostics.regression_checks.FLAG_LEVEL

None

causalis.scenarios.cuped.diagnostics.regression_checks.FLAG_COLOR

None

class causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks(/, **data: Any)

Bases: pydantic.BaseModel

Lightweight OLS/regression health checks for CUPED diagnostics.

Initialization

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

ate_naive: float

None

ate_adj: float

None

ate_gap: float

None

ate_gap_over_se_naive: Optional[float]

None

k: int

None

rank: int

None

full_rank: bool

None

condition_number: float

None

p_main_covariates: int

None

near_duplicate_pairs: list[tuple[str, str, float]]

‘Field(…)’

vif: Optional[Dict[str, float]]

None

resid_scale_mad: float

None

n_std_resid_gt_3: int

None

n_std_resid_gt_4: int

None

max_abs_std_resid: float

None

max_leverage: float

None

leverage_cutoff: float

None

n_high_leverage: int

None

max_cooks: float

None

cooks_cutoff: float

None

n_high_cooks: int

None

min_one_minus_h: float

None

n_tiny_one_minus_h: int

None

winsor_q: Optional[float]

None

ate_adj_winsor: Optional[float]

None

ate_adj_winsor_gap: Optional[float]

None

causalis.scenarios.cuped.diagnostics.regression_checks.design_matrix_checks(design: pandas.DataFrame) -> tuple[int, int, bool, float]

Return rank/conditioning diagnostics for a numeric design matrix.
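As an illustration of what rank and conditioning diagnostics typically involve, here is a minimal NumPy sketch. It is not the library's implementation: the function name `design_matrix_checks_sketch` is hypothetical, and the reading of the returned tuple as (number of columns, rank, full-rank flag, condition number) is an assumption based on the `k`, `rank`, `full_rank`, and `condition_number` fields of `RegressionChecks`.

```python
import numpy as np
import pandas as pd


def design_matrix_checks_sketch(design: pd.DataFrame) -> tuple[int, int, bool, float]:
    """Hypothetical sketch: (k, rank, full_rank, condition_number)."""
    z = design.to_numpy(dtype=float)
    k = z.shape[1]                          # number of design columns
    rank = int(np.linalg.matrix_rank(z))    # numerical rank from the SVD
    full_rank = rank == k
    cond = float(np.linalg.cond(z))         # largest / smallest singular value
    return k, rank, full_rank, cond


rng = np.random.default_rng(0)
good = pd.DataFrame(
    {
        "const": 1.0,
        "d": rng.integers(0, 2, size=200),
        "x": rng.normal(size=200),
    }
)
# Adding an exact multiple of an existing column makes the design rank deficient.
bad = good.assign(x_copy=good["x"] * 2.0)

res_good = design_matrix_checks_sketch(good)
res_bad = design_matrix_checks_sketch(bad)
print(res_good)
print(res_bad)
```

A rank-deficient design shows up both as `rank < k` and as a very large (possibly infinite) condition number, which is why the two diagnostics are usually reported together.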

causalis.scenarios.cuped.diagnostics.regression_checks.near_duplicate_corr_pairs(x: pandas.DataFrame, tol: float, max_pairs: int = 50) -> list[tuple[str, str, float]]

Find pairs with absolute correlation very close to one.
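The following sketch shows one way such a scan could work. It assumes that `tol` means "flag a pair when |corr| >= 1 - tol" and that `max_pairs` caps the length of the returned list; both readings are assumptions, and `near_duplicate_pairs_sketch` is a hypothetical name, not the library function.

```python
import numpy as np
import pandas as pd


def near_duplicate_pairs_sketch(x: pd.DataFrame, tol: float, max_pairs: int = 50):
    """Hypothetical sketch: list (col_a, col_b, corr) with |corr| >= 1 - tol."""
    corr = x.corr().to_numpy()
    cols = list(x.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):   # upper triangle only, no self-pairs
            r = corr[i, j]
            if np.isfinite(r) and abs(r) >= 1.0 - tol:
                pairs.append((cols[i], cols[j], float(r)))
                if len(pairs) >= max_pairs:
                    return pairs
    return pairs


rng = np.random.default_rng(0)
a = rng.normal(size=500)
x = pd.DataFrame(
    {
        "a": a,
        "b": -a + 1e-6 * rng.normal(size=500),  # almost exactly -a
        "c": rng.normal(size=500),              # unrelated column
    }
)
pairs = near_duplicate_pairs_sketch(x, tol=1e-6)
print(pairs)
```

Using the absolute correlation means a sign-flipped copy of a covariate is caught as well as a plain copy.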

causalis.scenarios.cuped.diagnostics.regression_checks.vif_from_corr(x: pandas.DataFrame) -> Optional[Dict[str, float]]

Approximate VIF from inverse correlation matrix of standardized covariates.
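The identity behind this approach is that, for standardized covariates with correlation matrix R, the variance inflation factor of covariate j equals the j-th diagonal entry of R^{-1}. A minimal sketch under that identity follows; the name `vif_from_corr_sketch` is hypothetical, and returning `None` when the correlation matrix is non-finite or singular is an assumed reading of the `Optional` return type.

```python
import numpy as np
import pandas as pd


def vif_from_corr_sketch(x: pd.DataFrame):
    """Hypothetical sketch: VIF_j = diag(inv(corr(x)))_j, or None if not computable."""
    corr = x.corr().to_numpy()
    if not np.all(np.isfinite(corr)):
        return None                      # e.g. a constant column gives NaN correlations
    try:
        inv = np.linalg.inv(corr)
    except np.linalg.LinAlgError:
        return None                      # exactly singular correlation matrix
    return {str(c): float(v) for c, v in zip(x.columns, np.diag(inv))}


rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x = pd.DataFrame(
    {
        "x1": x1,
        "x2": x1 + 0.1 * rng.normal(size=1000),  # strongly collinear with x1
        "x3": rng.normal(size=1000),             # roughly independent
    }
)
vif = vif_from_corr_sketch(x)
print(vif)
```

In this example `x1` and `x2` get large VIFs because each is almost a linear function of the other, while `x3` stays close to 1.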

causalis.scenarios.cuped.diagnostics.regression_checks.leverage_and_cooks(y: numpy.ndarray, z: numpy.ndarray, params: numpy.ndarray) -> tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

Compute leverage, Cook’s distance, and internally studentized residuals.
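For reference, the textbook OLS formulas are: leverage h_ii is the i-th diagonal of the hat matrix Z (Z'Z)^{-1} Z'; the internally studentized residual is r_i = e_i / sqrt(s^2 (1 - h_ii)); and Cook's distance is D_i = r_i^2 h_ii / (k (1 - h_ii)), with k the number of columns of Z. The sketch below applies these formulas with NumPy. It is a hypothetical stand-in (`leverage_and_cooks_sketch`), and the return order (leverage, Cook's distance, studentized residuals) is assumed from the order in the description.

```python
import numpy as np


def leverage_and_cooks_sketch(y: np.ndarray, z: np.ndarray, params: np.ndarray):
    """Hypothetical sketch: (leverage, Cook's distance, internally studentized residuals)."""
    n, k = z.shape
    resid = y - z @ params
    # Hat diagonal via thin QR: H = Q Q^T, so h_ii is the squared row norm of Q.
    q, _ = np.linalg.qr(z)
    h = np.sum(q * q, axis=1)
    s2 = float(resid @ resid) / (n - k)             # residual variance estimate
    one_minus_h = np.clip(1.0 - h, 1e-12, None)     # guard against division by ~0
    r_student = resid / np.sqrt(s2 * one_minus_h)
    cooks = r_student**2 * h / (k * one_minus_h)
    return h, cooks, r_student


rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
x[0] = 10.0                                         # extreme covariate value
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[0] += 30.0                                        # and an outlying outcome
z = np.column_stack([np.ones(n), x])
params, *_ = np.linalg.lstsq(z, y, rcond=None)

h, cooks, r_student = leverage_and_cooks_sketch(y, z, params)
print(int(np.argmax(h)), int(np.argmax(cooks)))
```

The planted observation at index 0 combines high leverage with a large residual, so it dominates both the leverage and Cook's distance rankings.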

causalis.scenarios.cuped.diagnostics.regression_checks.winsor_fit_tau(y: pandas.Series, design: pandas.DataFrame, cov_type: str, use_t_fit: bool, winsor_q: Optional[float]) -> Optional[float]

Refit OLS on winsorized outcome and return treatment coefficient.
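A rough sketch of a winsorized refit is shown below. Several details are assumptions rather than documented behavior: that `winsor_q` clips both tails symmetrically at the `winsor_q` and `1 - winsor_q` quantiles, that `None` disables the check, and that the treatment column is named `"D"` (a hypothetical name). The sketch uses plain least squares and ignores `cov_type` and `use_t_fit`, since only the point estimate is returned.

```python
import numpy as np
import pandas as pd


def winsor_fit_tau_sketch(y: pd.Series, design: pd.DataFrame, winsor_q, treatment_col: str = "D"):
    """Hypothetical sketch: treatment coefficient after clipping y at symmetric quantiles."""
    if winsor_q is None:
        return None                                  # assumed: check disabled
    lo, hi = y.quantile(winsor_q), y.quantile(1.0 - winsor_q)
    y_w = y.clip(lower=lo, upper=hi)                 # winsorize the outcome only
    beta, *_ = np.linalg.lstsq(
        design.to_numpy(dtype=float), y_w.to_numpy(dtype=float), rcond=None
    )
    return float(beta[list(design.columns).index(treatment_col)])


rng = np.random.default_rng(3)
n = 400
d = (np.arange(n) + 1) % 2                           # alternating treatment, d[0] == 1
x = rng.normal(size=n)
y = pd.Series(2.0 + 1.0 * d + 0.5 * x + 0.5 * rng.normal(size=n))
y.iloc[0] += 500.0                                   # one extreme treated outcome
design = pd.DataFrame({"const": 1.0, "D": d, "x": x})

beta_raw, *_ = np.linalg.lstsq(design.to_numpy(dtype=float), y.to_numpy(), rcond=None)
tau_raw = float(beta_raw[1])
tau_winsor = winsor_fit_tau_sketch(y, design, winsor_q=0.01)
print(tau_raw, tau_winsor)
```

Here the true effect is 1.0. The single extreme outcome pulls the raw coefficient far from it, while the winsorized refit stays close, which is the kind of gap a winsor sensitivity check is meant to surface.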

causalis.scenarios.cuped.diagnostics.regression_checks.run_regression_checks(y: pandas.Series, design: pandas.DataFrame, result: Any, result_naive: Any, cov_type: str, use_t_fit: bool, corr_near_one_tol: float, tiny_one_minus_h_tol: float, winsor_q: Optional[float]) -> causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks

Build a compact payload with design, residual, and influence diagnostics.

causalis.scenarios.cuped.diagnostics.regression_checks.assumption_design_rank(checks: causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks) -> Dict[str, Any]

Check that the design matrix is full rank.

causalis.scenarios.cuped.diagnostics.regression_checks.assumption_condition_number(checks: causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks, warn_threshold: float = 100000000.0, red_multiplier: float = 100.0) -> Dict[str, Any]

Check global collinearity via condition number.
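The parameter names suggest a two-level threshold rule. The sketch below shows one plausible mapping, in which the flag is YELLOW above `warn_threshold` and RED above `warn_threshold * red_multiplier`. This mapping is an assumption inferred from the signature, not confirmed behavior, and `condition_number_flag_sketch` is a hypothetical name.

```python
import math


def condition_number_flag_sketch(
    condition_number: float,
    warn_threshold: float = 1e8,
    red_multiplier: float = 100.0,
) -> str:
    """Hypothetical sketch of a GREEN/YELLOW/RED rule for the condition number."""
    # Assumed: a non-finite condition number is treated as the worst case.
    if not math.isfinite(condition_number):
        return "RED"
    if condition_number > warn_threshold * red_multiplier:
        return "RED"
    if condition_number > warn_threshold:
        return "YELLOW"
    return "GREEN"


for cond in (1e3, 1e9, 1e11, float("inf")):
    print(cond, condition_number_flag_sketch(cond))
```

With the default values this places the YELLOW band between 1e8 and 1e10.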

causalis.scenarios.cuped.diagnostics.regression_checks.assumption_near_duplicates(checks: causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks, red_pairs_threshold: int = 3) -> Dict[str, Any]

Check near-duplicate centered covariate pairs.

causalis.scenarios.cuped.diagnostics.regression_checks.assumption_vif(checks: causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks, warn_threshold: float = 20.0, red_multiplier: float = 2.0) -> Dict[str, Any]

Check VIF from centered main-effect covariates.

causalis.scenarios.cuped.diagnostics.regression_checks.assumption_ate_gap(checks: causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks, yellow_threshold: float = 2.0, red_threshold: float = 2.5) -> Dict[str, Any]

Check adjusted-vs-naive ATE gap relative to naive SE.
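To make the quantity concrete, the sketch below computes the gap ratio as |ate_adj - ate_naive| / se_naive and maps it to a flag using the default thresholds from the signature. Whether the library uses the absolute gap, inclusive comparisons, or a `None` ratio when the naive SE is unusable are all assumptions here; both function names are hypothetical.

```python
import math
from typing import Optional


def gap_over_se_sketch(ate_naive: float, ate_adj: float, se_naive: Optional[float]) -> Optional[float]:
    """Hypothetical sketch: |adjusted - naive| in units of the naive standard error."""
    if se_naive is None or not math.isfinite(se_naive) or se_naive <= 0:
        return None                       # assumed: ratio undefined without a usable SE
    return abs(ate_adj - ate_naive) / se_naive


def ate_gap_flag_sketch(ratio: float, yellow_threshold: float = 2.0, red_threshold: float = 2.5) -> str:
    """Hypothetical sketch of the threshold rule on the gap ratio."""
    if ratio >= red_threshold:
        return "RED"
    if ratio >= yellow_threshold:
        return "YELLOW"
    return "GREEN"


for ate_adj in (1.10, 1.45, 1.60):
    ratio = gap_over_se_sketch(ate_naive=1.0, ate_adj=ate_adj, se_naive=0.2)
    print(ate_adj, round(ratio, 2), ate_gap_flag_sketch(ratio))
```

Under randomization the naive and covariate-adjusted estimates target the same effect, so a gap of several naive standard errors is a reason to look more closely at the data or the specification.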

causalis.scenarios.cuped.diagnostics.regression_checks.assumption_residual_tails(checks: causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks, yellow_abs_std_resid: float = 7.0, red_abs_std_resid: float = 10.0) -> Dict[str, Any]

Check residual extremes using max standardized residual only.

causalis.scenarios.cuped.diagnostics.regression_checks.assumption_leverage(checks: causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks, yellow_multiplier: float = 5.0, red_multiplier: float = 10.0, red_floor: float = 0.5) -> Dict[str, Any]

Check leverage concentration.

causalis.scenarios.cuped.diagnostics.regression_checks.assumption_cooks(checks: causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks, yellow_threshold: float = 0.1, red_threshold: float = 1.0) -> Dict[str, Any]

Check Cook’s distance influence diagnostics.

causalis.scenarios.cuped.diagnostics.regression_checks.assumption_hc23_stability(checks: causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks, cov_type: str, tiny_one_minus_h_tol: float = 1e-08) -> Dict[str, Any]

Check HC2/HC3 stability when leverage terms approach one.
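The reason leverage matters here is that HC2 rescales each squared residual by 1 / (1 - h_ii) and HC3 by 1 / (1 - h_ii)^2, so an observation with h_ii near one makes these weights blow up. The sketch below computes the two summary quantities that the `min_one_minus_h` and `n_tiny_one_minus_h` fields appear to describe. The function name is hypothetical, and counting values strictly below the tolerance is an assumption.

```python
import numpy as np


def one_minus_h_summary_sketch(z: np.ndarray, tiny_tol: float = 1e-8):
    """Hypothetical sketch: (min of 1 - h_ii, count of 1 - h_ii below tiny_tol)."""
    q, _ = np.linalg.qr(z)                 # thin QR; hat matrix is Q Q^T
    h = np.sum(q * q, axis=1)
    one_minus_h = 1.0 - h
    return float(one_minus_h.min()), int(np.sum(one_minus_h < tiny_tol))


rng = np.random.default_rng(2)
n = 50
dummy = np.zeros(n)
dummy[7] = 1.0                             # indicator that is 1 for a single row
z = np.column_stack([np.ones(n), rng.normal(size=n), dummy])

min_omh, n_tiny = one_minus_h_summary_sketch(z)
print(min_omh, n_tiny)
```

A single-observation indicator column is fitted exactly, so that row has leverage one and a zero residual; the HC2/HC3 weight for it is a ratio of two numbers near zero and is numerically unreliable.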

causalis.scenarios.cuped.diagnostics.regression_checks.assumption_winsor_sensitivity(checks: causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks, winsor_reference_se: Optional[float] = None, yellow_sigma: float = 1.0, red_sigma: float = 2.0, yellow_ratio: float = 0.1, red_ratio: float = 0.25) -> Dict[str, Any]

Check sensitivity of adjusted ATE to winsorized-outcome refit.

causalis.scenarios.cuped.diagnostics.regression_checks.regression_assumption_rows_from_checks(checks: causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks, cov_type: str = 'HC2', condition_number_warn_threshold: float = 100000000.0, vif_warn_threshold: float = 20.0, tiny_one_minus_h_tol: float = 1e-08, winsor_reference_se: Optional[float] = None) -> list[Dict[str, Any]]

Run all CUPED regression assumption tests and return row payloads.

causalis.scenarios.cuped.diagnostics.regression_checks.regression_assumptions_table_from_checks(checks: causalis.scenarios.cuped.diagnostics.regression_checks.RegressionChecks, cov_type: str = 'HC2', condition_number_warn_threshold: float = 100000000.0, vif_warn_threshold: float = 20.0, tiny_one_minus_h_tol: float = 1e-08, winsor_reference_se: Optional[float] = None) -> pandas.DataFrame

Return a table of GREEN/YELLOW/RED assumption flags from checks payload.

causalis.scenarios.cuped.diagnostics.regression_checks.regression_assumptions_table_from_diagnostic_data(diagnostic_data: causalis.data_contracts.causal_diagnostic_data.CUPEDDiagnosticData, cov_type: str = 'HC2', condition_number_warn_threshold: float = 100000000.0, vif_warn_threshold: float = 20.0, tiny_one_minus_h_tol: float = 1e-08, winsor_reference_se: Optional[float] = None) -> pandas.DataFrame

Build assumption table from CUPEDDiagnosticData payload.

causalis.scenarios.cuped.diagnostics.regression_checks.regression_assumptions_table_from_estimate(data_or_estimate: causalis.dgp.causaldata.CausalData | causalis.data_contracts.causal_estimate.CausalEstimate, estimate: Optional[causalis.data_contracts.causal_estimate.CausalEstimate] = None, style_regression_assumptions_table: Optional[Callable[[pandas.DataFrame], Any]] = None, cov_type: Optional[str] = None, condition_number_warn_threshold: float = 100000000.0, vif_warn_threshold: float = 20.0, tiny_one_minus_h_tol: float = 1e-08) -> Any

Build assumptions table from a CUPED estimate.

Supports both call styles:

  1. regression_assumptions_table_from_estimate(estimate, ...)

  2. regression_assumptions_table_from_estimate(data, estimate, ...)
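The two call styles listed above imply that the first positional argument may be either the data or the estimate. A common way to support this is to normalize the arguments up front, as in the sketch below. The classes `DataStandIn` and `EstimateStandIn` and the helper `resolve_estimate_sketch` are hypothetical stand-ins for illustration only; they are not part of causalis, and the actual dispatch logic may differ.

```python
from typing import Any, Optional


class DataStandIn:
    """Hypothetical stand-in for a CausalData object."""


class EstimateStandIn:
    """Hypothetical stand-in for a CausalEstimate object."""


def resolve_estimate_sketch(data_or_estimate: Any, estimate: Optional[EstimateStandIn] = None) -> EstimateStandIn:
    """Hypothetical sketch: return the estimate regardless of which call style was used."""
    if estimate is not None:
        return estimate                      # style 2: (data, estimate, ...)
    if isinstance(data_or_estimate, EstimateStandIn):
        return data_or_estimate              # style 1: (estimate, ...)
    raise TypeError("expected an estimate as the first or second argument")


est = EstimateStandIn()
same_from_style_1 = resolve_estimate_sketch(est)
same_from_style_2 = resolve_estimate_sketch(DataStandIn(), est)
print(same_from_style_1 is est, same_from_style_2 is est)
```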

causalis.scenarios.cuped.diagnostics.regression_checks.regression_assumptions_table_from_data(data: causalis.dgp.causaldata.CausalData, covariates: Sequence[str], model_kwargs: Optional[Dict[str, Any]] = None, fit_kwargs: Optional[Dict[str, Any]] = None) -> pandas.DataFrame

Fit CUPED on CausalData and return the assumptions flag table.

causalis.scenarios.cuped.diagnostics.regression_checks.overall_assumption_flag(table: pandas.DataFrame) -> str

Return overall GREEN/YELLOW/RED status from an assumptions table.
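A natural way to aggregate per-check flags is to take the most severe one. The sketch below does this with a severity mapping. The mapping values, the `"flag"` column name, and the GREEN result for an empty table are all assumptions: the documented values of `FLAG_LEVEL` and the table's column names are not shown in this reference.

```python
import pandas as pd

# Hypothetical severity ordering; the real FLAG_LEVEL contents are not documented here.
FLAG_LEVEL_SKETCH = {"GREEN": 0, "YELLOW": 1, "RED": 2}


def overall_flag_sketch(table: pd.DataFrame, flag_col: str = "flag") -> str:
    """Hypothetical sketch: return the most severe flag found in the table."""
    if table.empty:
        return "GREEN"                       # assumed: nothing to complain about
    return max(table[flag_col], key=FLAG_LEVEL_SKETCH.__getitem__)


table = pd.DataFrame(
    {
        "test": ["Design rank", "Condition number", "VIF"],
        "flag": ["GREEN", "YELLOW", "GREEN"],
    }
)
print(overall_flag_sketch(table))

table_with_red = pd.concat(
    [table, pd.DataFrame({"test": ["Cook's distance"], "flag": ["RED"]})],
    ignore_index=True,
)
print(overall_flag_sketch(table_with_red))
```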

causalis.scenarios.cuped.diagnostics.regression_checks.style_regression_assumptions_table(table: pandas.DataFrame)

Return pandas Styler with colored flag cells for notebook display.