causalis.dgp¶
Notes¶
Across the package, generators follow a few conventions:
- Outcomes are typically stored in y.
- Binary treatment indicators are typically stored in d.
- Confounders are returned as x1, x2, … unless explicit names are provided in the generator configuration.
When include_oracle=True, many generators also expose ground-truth columns such as propensities or potential-outcome means. These oracle columns are intended for benchmarking, diagnostics, and unit tests rather than as inputs to downstream estimators.
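The conventions above can be illustrated with a small, self-contained sketch. It does not call causalis: toy_observational_rows is a hypothetical stand-in written with the standard library only, and the coefficients are arbitrary. The oracle column names m, g0, g1, and cate are borrowed from the example on this page; reading m as the propensity and g0/g1 as potential-outcome means is an assumption, not something this page states.

```python
import math
import random


def toy_observational_rows(n, seed=0, include_oracle=False):
    """Hypothetical stand-in generator; not part of causalis."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Confounders use the default x1, x2 naming.
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        # Assumed propensity: logistic function of the confounders.
        m = 1.0 / (1.0 + math.exp(-(0.5 * x1 - 0.25 * x2)))
        d = 1 if rng.random() < m else 0
        # Assumed potential-outcome means under control (g0) and treatment (g1).
        g0 = 1.0 + x1 + 0.5 * x2
        g1 = g0 + 2.0
        y = (g1 if d == 1 else g0) + rng.gauss(0.0, 1.0)
        row = {"y": y, "d": d, "x1": x1, "x2": x2}
        if include_oracle:
            # Ground-truth columns, kept separate from estimator inputs.
            row.update({"m": m, "g0": g0, "g1": g1, "cate": g1 - g0})
        rows.append(row)
    return rows


plain = toy_observational_rows(500, seed=1)
oracle = toy_observational_rows(500, seed=1, include_oracle=True)

# Columns a downstream estimator would see.
estimator_inputs = sorted(plain[0])
# Columns that exist only for benchmarking and tests.
oracle_only = sorted(set(oracle[0]) - set(plain[0]))
# The oracle CATE gives the true average effect to benchmark against.
true_ate = sum(r["cate"] for r in oracle) / len(oracle)
```

In this sketch the oracle flag only adds columns; it does not change the simulated y, d, or confounders, so a benchmark can compare an estimate computed from estimator_inputs against true_ate without leaking ground truth into the estimator.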
Examples¶
>>> from causalis.dgp import generate_rct, generate_scm_data, obs_linear_26_dataset
>>> rct = generate_rct(
...     n=1000,
...     outcome_type="binary",
...     outcome_params={"p": {"A": 0.10, "B": 0.12}},
...     return_causal_data=True,
... )
>>> rct.treatment, rct.outcome
('d', 'y')
>>> obs = obs_linear_26_dataset(n=1000, seed=3141, return_causal_data=True)
>>> sorted(col for col in ["m", "g0", "g1", "cate"] if col in obs.df.columns)
['cate', 'g0', 'g1', 'm']
>>> panel = generate_scm_data(n_donors=5, n_pre_periods=24, n_post_periods=6)
>>> panel.df.shape[0] > 0  # doctest: +SKIP
True
Synthetic data-generating processes (DGPs) for causal benchmarking and examples.
This package collects the public dataset builders used across causalis for
single-treatment tabular data, multi-treatment data, instrumental-variable
placeholders, and synthetic-control style panel data.
The most commonly used entry points are:
- generate_rct for randomized experiments with optional oracle columns and optional pre-period covariates.
- obs_linear_effect and obs_linear_26_dataset for observational data with confounding and ground-truth nuisance objects.
- generate_scm_data for synthetic panel data with one treated unit and a donor pool.
- generate_multitreatment for one-hot multi-arm treatment assignment.
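As a rough illustration of what "one-hot multi-arm treatment assignment" means, the sketch below draws one arm per row and encodes it as indicator columns. It is not the generate_multitreatment implementation: the helper toy_one_hot_assignment, the uniform assignment rule, and the column names d_0, d_1, d_2 are all hypothetical and chosen only for this example.

```python
import random


def toy_one_hot_assignment(n, n_arms=3, seed=0):
    """Hypothetical illustration of one-hot arm encoding; not causalis code."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Draw one arm uniformly at random (assumed assignment rule).
        arm = rng.randrange(n_arms)
        # One indicator column per arm; d_0, d_1, ... are hypothetical names.
        rows.append({f"d_{k}": int(k == arm) for k in range(n_arms)})
    return rows


assignments = toy_one_hot_assignment(200, n_arms=3, seed=7)

# Every row has exactly one active indicator.
exactly_one_arm = all(sum(row.values()) == 1 for row in assignments)
# Number of units assigned to each arm.
arm_counts = {name: sum(row[name] for row in assignments) for name in assignments[0]}
```

The defining property is that each unit's indicators sum to one, so the arm counts add up to the sample size.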
Data¶
API¶
- causalis.dgp.__all__¶
['CausalDatasetGenerator', 'generate_rct', 'generate_classic_rct', 'classic_rct_gamma', 'obs_linear_…