causalis.scenarios.unconfoundedness.dgp

Synthetic observational datasets for unconfoundedness experiments.

Module Contents

Functions

obs_linear_26_dataset

A pre-configured observational linear dataset with 5 standard confounders. Based on the scenario in docs/cases/dml_ate.ipynb.

generate_obs_hte_26

Observational dataset with nonlinear outcome model, nonlinear treatment assignment, and a heterogeneous (nonlinear) treatment effect tau(X). Based on the scenario in notebooks/cases/dml_atte.ipynb.

generate_obs_hte_26_rich

Observational dataset with richer confounding, nonlinear outcome model, nonlinear treatment assignment, and heterogeneous treatment effects. Adds additional realistic covariates and dependencies to mimic real data.

generate_obs_hte_binary_26

Observational binary-outcome dataset with nonlinear confounding and heterogeneous treatment effects.

API

causalis.scenarios.unconfoundedness.dgp.obs_linear_26_dataset(n: int = 10000, seed: int = 42, include_oracle: bool = True, return_causal_data: bool = True)

A pre-configured observational linear dataset with 5 standard confounders. Based on the scenario in docs/cases/dml_ate.ipynb.

Parameters

n : int, default=10000
    Number of samples.
seed : int, default=42
    Random seed.
include_oracle : bool, default=True
    Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
return_causal_data : bool, default=True
    If True, returns a CausalData object. If False, returns a pandas DataFrame.

Returns

pandas.DataFrame or CausalData
    Generated observational sample, optionally wrapped as CausalData.
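To illustrate what a linear observational scenario with five confounders looks like, here is a minimal standard-library sketch. It is not the causalis implementation: the function names `simulate_linear_obs` and `naive_difference`, the coefficient values, and the constant effect `theta` are all assumptions chosen for illustration. It shows why oracle columns such as 'cate' and 'propensity' are useful: the naive treated-minus-control difference is biased when the same covariates drive both assignment and outcome.

```python
import math
import random


def simulate_linear_obs(n=5000, seed=42, theta=1.0):
    """Toy confounded linear DGP (illustrative only, not the library's code)."""
    rng = random.Random(seed)
    beta_y = [0.5, -0.3, 0.8, 0.2, -0.6]  # hypothetical outcome coefficients
    beta_d = [0.6, -0.4, 0.7, 0.1, -0.5]  # hypothetical assignment coefficients
    rows = []
    for _ in range(n):
        # Five standard-normal confounders.
        x = [rng.gauss(0.0, 1.0) for _ in range(5)]
        # Logistic treatment assignment driven by the confounders.
        logit = sum(b * xi for b, xi in zip(beta_d, x))
        propensity = 1.0 / (1.0 + math.exp(-logit))
        d = 1 if rng.random() < propensity else 0
        # Linear outcome with a constant treatment effect theta.
        y = theta * d + sum(b * xi for b, xi in zip(beta_y, x)) + rng.gauss(0.0, 1.0)
        rows.append({"x": x, "d": d, "y": y, "propensity": propensity, "cate": theta})
    return rows


def naive_difference(rows):
    """Unadjusted difference in mean outcomes between treated and control."""
    treated = [r["y"] for r in rows if r["d"] == 1]
    control = [r["y"] for r in rows if r["d"] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rows = simulate_linear_obs()
true_ate = sum(r["cate"] for r in rows) / len(rows)  # oracle ground truth
naive = naive_difference(rows)
print(f"oracle ATE: {true_ate:.3f}  naive difference: {naive:.3f}")
```

Because the assumed assignment and outcome coefficients share signs, the naive difference overstates the effect; an estimator that adjusts for the confounders is expected to land closer to the oracle value.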

causalis.scenarios.unconfoundedness.dgp.generate_obs_hte_26(n: int = 10000, seed: int = 42, include_oracle: bool = True, return_causal_data: bool = True) -> Union[pandas.DataFrame, causalis.dgp.causaldata.CausalData]

Observational dataset with nonlinear outcome model, nonlinear treatment assignment, and a heterogeneous (nonlinear) treatment effect tau(X). Based on the scenario in notebooks/cases/dml_atte.ipynb.

Parameters

n : int, default=10000
    Number of samples.
seed : int, default=42
    Random seed.
include_oracle : bool, default=True
    Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
return_causal_data : bool, default=True
    If True, returns a CausalData object. If False, returns a pandas DataFrame.

Returns

pandas.DataFrame or CausalData
    Generated heterogeneous-treatment-effect sample, optionally wrapped as CausalData.
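With a heterogeneous effect tau(X) and confounded assignment, the average effect over everyone (ATE) and the average effect among the treated (ATTE) generally differ. The sketch below uses a toy DGP to show how both can be read off oracle columns. The functional forms, the two covariates, and the name `simulate_hte` are assumptions for illustration and do not reproduce the library's scenario.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_hte(n=5000, seed=42):
    """Toy nonlinear DGP with heterogeneous effect (illustrative only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        tau = 1.0 + math.tanh(x1)  # hypothetical nonlinear tau(X), in (0, 2)
        propensity = sigmoid(0.8 * x1 + 0.5 * math.sin(x2))  # nonlinear assignment
        d = 1 if rng.random() < propensity else 0
        g0 = math.sin(x1) + 0.5 * x2 ** 2  # nonlinear baseline outcome
        y = g0 + tau * d + rng.gauss(0.0, 1.0)
        rows.append({"d": d, "y": y, "cate": tau, "propensity": propensity})
    return rows


rows = simulate_hte()
# ATE averages the oracle CATE over all units.
ate = sum(r["cate"] for r in rows) / len(rows)
# ATTE averages the oracle CATE over treated units only.
treated = [r for r in rows if r["d"] == 1]
atte = sum(r["cate"] for r in treated) / len(treated)
print(f"oracle ATE: {ate:.3f}  oracle ATTE: {atte:.3f}")
```

In this toy setup units with larger x1 are both more likely to be treated and benefit more, so the ATTE exceeds the ATE. Estimators targeting one quantity should be benchmarked against the matching oracle average.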

causalis.scenarios.unconfoundedness.dgp.generate_obs_hte_26_rich(n: int = 100000, seed: int = 42, include_oracle: bool = True, return_causal_data: bool = True) -> Union[pandas.DataFrame, causalis.dgp.causaldata.CausalData]

Observational dataset with richer confounding, nonlinear outcome model, nonlinear treatment assignment, and heterogeneous treatment effects. Adds additional realistic covariates and dependencies to mimic real data.

Parameters

n : int, default=100000
    Number of samples.
seed : int, default=42
    Random seed.
include_oracle : bool, default=True
    Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
return_causal_data : bool, default=True
    If True, returns a CausalData object. If False, returns a pandas DataFrame.

Returns

pandas.DataFrame or CausalData
    Generated rich observational sample, optionally wrapped as CausalData.
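The default n here is 100000, so for quick experiments it may be convenient to request a smaller sample as a plain DataFrame. The helpers below are a hedged sketch: `draw_frame` and `oracle_columns` are hypothetical names, not part of causalis, and they rely only on the keyword arguments documented above. The generator is passed in as a callable, so the block runs without the package installed.

```python
def draw_frame(generator, n=2_000, seed=42, include_oracle=True):
    """Call a documented generator and request a pandas DataFrame.

    `generator` is assumed to accept the keyword arguments listed in
    this reference (n, seed, include_oracle, return_causal_data).
    """
    return generator(
        n=n,
        seed=seed,
        include_oracle=include_oracle,
        return_causal_data=False,  # documented: False returns a DataFrame
    )


def oracle_columns(frame, candidates=("cate", "propensity")):
    """List which candidate oracle columns are present in the frame."""
    return [name for name in candidates if name in frame.columns]


# Usage, assuming causalis is installed:
#
#   from causalis.scenarios.unconfoundedness.dgp import generate_obs_hte_26_rich
#   df = draw_frame(generate_obs_hte_26_rich, n=2_000, seed=0)
#   print(oracle_columns(df))
```

Passing the generator as an argument keeps the same helper usable with any of the four functions in this module, since they share one parameter list.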

causalis.scenarios.unconfoundedness.dgp.generate_obs_hte_binary_26(n: int = 100000, seed: int = 42, include_oracle: bool = True, return_causal_data: bool = True) -> Union[pandas.DataFrame, causalis.dgp.causaldata.CausalData]

Observational binary-outcome dataset with nonlinear confounding and heterogeneous treatment effects.

This scenario follows the structure of generate_obs_hte_26_rich, but uses a binary outcome model and a modified confounder set.

Parameters

n : int, default=100000
    Number of samples.
seed : int, default=42
    Random seed.
include_oracle : bool, default=True
    Whether to include oracle columns like 'cate', 'propensity', etc.
return_causal_data : bool, default=True
    If True, returns a CausalData object. If False, returns a pandas DataFrame.

Returns

pandas.DataFrame or CausalData
    Generated binary-outcome sample, optionally wrapped as CausalData.
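For a binary outcome, one common convention is to define the conditional effect on the probability scale, as the difference between the treated and untreated success probabilities. The sketch below assumes that convention along with a logistic outcome model. Whether causalis defines 'cate' this way is not stated in this reference, and `simulate_binary` with its coefficients is purely illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_binary(n=5000, seed=42):
    """Toy binary-outcome DGP with heterogeneous effect (illustrative only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        base = -0.5 + 0.8 * math.sin(x)   # hypothetical baseline log-odds
        lift = 0.6 + 0.4 * math.tanh(x)   # hypothetical log-odds lift, in (0.2, 1.0)
        p0 = sigmoid(base)                # P(Y=1 | X, D=0)
        p1 = sigmoid(base + lift)         # P(Y=1 | X, D=1)
        propensity = sigmoid(0.7 * x)     # confounded assignment
        d = 1 if rng.random() < propensity else 0
        y = 1 if rng.random() < (p1 if d else p0) else 0
        # Effect on the probability scale (assumed convention).
        rows.append({"d": d, "y": y, "cate": p1 - p0, "propensity": propensity})
    return rows


rows = simulate_binary()
ate = sum(r["cate"] for r in rows) / len(rows)
print(f"oracle ATE on the probability scale: {ate:.3f}")
```

Note that a constant lift in log-odds still yields a covariate-dependent effect on the probability scale, because the logistic link is nonlinear.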