causalis.scenarios.multi_unconfoundedness.dgp¶
Module Contents¶
Functions¶

- generate_multitreatment_gamma_26: Pre-configured multi-treatment dataset with Gamma-distributed outcome.
- generate_multitreatment_binary_26: Pre-configured multi-treatment dataset with Binary outcome.
- generate_multi_dml_cx_26: Resolves the notebook's overlapping contact and repeat actions into a mutually exclusive one-hot treatment.

Data¶
API¶
- causalis.scenarios.multi_unconfoundedness.dgp.generate_multitreatment_gamma_26(n: int = 100000, seed: int = 42, include_oracle: bool = False, return_causal_data: bool = True) -> Union[pandas.DataFrame, causalis.data_contracts.multicausaldata.MultiCausalData]¶
Pre-configured multi-treatment dataset with Gamma-distributed outcome.
- 3 treatment classes: d_0 (control), d_1, d_2
- 8 confounders with realistic marginals sampled through a Gaussian copula
- Gamma outcome with log-link confounding and heterogeneous arm effects
Examples
>>> df = generate_multitreatment_gamma_26(n=256, seed=7, return_causal_data=False)
>>> bool(df[["d_0", "d_1", "d_2"]].sum(axis=1).eq(1).all())
True
>>> {"tenure_months", "credit_utilization", "y"}.issubset(df.columns)
True
Notes
Let :math:`X = (\text{tenure}, \text{sessions}, \text{spend}, \text{premium}, \text{urban}, \text{tickets}, \text{discount}, \text{credit})` denote the 8 observed confounders. The treatment assignment mechanism is a multinomial logit with calibrated marginal arm rates near :math:`(0.50, 0.25, 0.25)`:

.. math::

   s_k(X) = \alpha_{d,k} + \beta_{d,k}^{\top} X, \qquad \Pr(D = k \mid X) = \frac{\exp(s_k(X))}{\sum_{j=0}^{2} \exp(s_j(X))}.

The confounders are jointly sampled through a Toeplitz copula with :math:`\mathrm{Corr}(X_i, X_j) = 0.3^{|i-j|}`. The outcome uses a log link. For arm :math:`k`,

.. math::

   \log \mu_k(X) = \alpha_y + \beta_y^{\top} X + \theta_k + \tau_k(X), \qquad Y(k) \mid X \sim \Gamma(\text{shape} = 2, \text{scale} = \mu_k(X)/2).

This scenario fixes :math:`\theta = (0, -0.05, 0.10)` and uses the heterogeneous shifts

.. math::

   \tau_1(X) = \min \left\{ -0.22 - 0.0010 \, \text{tenure} - 0.006 \, \text{sessions} - 0.05 \, \text{premium} - 0.04 \, \text{discount} - 0.10 \, (\text{credit} - 0.45),\; -0.02 \right\},

.. math::

   \tau_2(X) = \max \left\{ 0.16 + 0.014 \, \text{sessions} + 0.030 \, \log(1 + \text{spend}) + 0.06 \, \text{urban} - 0.006 \, \text{tickets} + 0.12 \, (\text{credit} - 0.45),\; 0.02 \right\}.

So d_1 is always weakly worse than control on the log-mean scale, while d_2 is always weakly better than control.
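The softmax assignment and the clipped shift can be sketched in NumPy. The tau_1 coefficients below come from the Notes; the arm intercepts/slopes and the confounder draws are illustrative placeholders, since the calibrated alpha_{d,k} and beta_{d,k} are internal to the generator and not documented here:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5

# Illustrative confounders (the real DGP samples 8 features via a Gaussian copula).
tenure = rng.uniform(1, 60, n)
sessions = rng.uniform(0, 20, n)
premium = rng.integers(0, 2, n)
discount = rng.integers(0, 2, n)
credit = rng.uniform(0, 1, n)

# Hypothetical arm scores s_k(X) = alpha_{d,k} + beta_{d,k} @ X; coefficients
# here are placeholders, with arm 0 (control) as the reference class.
scores = np.column_stack([
    np.zeros(n),
    -0.7 + 0.01 * tenure,
    -0.7 + 0.02 * sessions,
])
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
assert np.allclose(probs.sum(axis=1), 1.0)  # valid multinomial probabilities

# Clipped heterogeneous shift for arm 1, coefficients taken from the Notes.
tau_1 = np.minimum(
    -0.22 - 0.0010 * tenure - 0.006 * sessions
    - 0.05 * premium - 0.04 * discount - 0.10 * (credit - 0.45),
    -0.02,
)
# The min(..., -0.02) clip is what makes d_1 weakly worse than control.
assert (tau_1 <= -0.02).all()
```

The clip guarantees the sign of the arm effect regardless of where X falls, which is what makes the oracle ordering of arms deterministic.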
- causalis.scenarios.multi_unconfoundedness.dgp.generate_multitreatment_binary_26(n: int = 100000, seed: int = 42, include_oracle: bool = False, return_causal_data: bool = True) -> Union[pandas.DataFrame, causalis.data_contracts.multicausaldata.MultiCausalData]¶
Pre-configured multi-treatment dataset with Binary outcome.
- 3 treatment classes: d_0 (control), d_1, d_2
- 8 confounders with realistic marginals sampled through a Gaussian copula
- Binary outcome with a logistic baseline and heterogeneous arm effects
Examples
>>> df = generate_multitreatment_binary_26(n=256, seed=7, return_causal_data=False)
>>> bool(df[["d_0", "d_1", "d_2"]].sum(axis=1).eq(1).all())
True
>>> {"weekly_active_days", "engagement_score", "y"}.issubset(df.columns)
True
Notes
Let :math:`X = (\text{tenure}, \text{active days}, \text{income}, \text{premium}, \text{family}, \text{complaints}, \text{discount}, \text{engagement})` denote the 8 confounders. Treatment assignment again follows a calibrated multinomial logit with target arm rates near :math:`(0.50, 0.25, 0.25)`:

.. math::

   s_k(X) = \alpha_{d,k} + \beta_{d,k}^{\top} X, \qquad \Pr(D = k \mid X) = \frac{\exp(s_k(X))}{\sum_{j=0}^{2} \exp(s_j(X))}.

The outcome uses a logistic link with :math:`\alpha_y = -1.1`:

.. math::

   \operatorname{logit} \Pr(Y(k) = 1 \mid X) = -1.1 + \beta_y^{\top} X + \theta_k + \tau_k(X).

This scenario fixes :math:`\theta = (0, -0.18, 0.26)` and uses

.. math::

   \tau_1(X) = \min \left\{ -0.16 - 0.0008 \, \text{tenure} - 0.020 \, \text{active days} - 0.08 \, \text{premium} - 0.03 \, \text{complaints} - 0.10 \, (\text{engagement} - 0.60),\; -0.02 \right\},

.. math::

   \tau_2(X) = \max \left\{ 0.14 + 0.020 \, \text{active days} + 0.028 \, \log(1 + \text{income}) + 0.05 \, \text{family} - 0.010 \, \text{complaints} + 0.12 \, (\text{engagement} - 0.60),\; 0.02 \right\}.

The clipping keeps d_1 uniformly below control and d_2 uniformly above control on the log-odds scale, while the Gaussian copula with :math:`\mathrm{Corr}(X_i, X_j) = 0.3^{|i-j|}` induces cross-feature dependence.
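The Toeplitz correlation and the Gaussian-copula sampling step used by both scenarios can be sketched as follows. The realistic marginal quantile transforms applied by the actual generator are omitted, so this reproduces only the dependence structure:

```python
import numpy as np
from scipy.stats import norm

# Toeplitz target: Corr(X_i, X_j) = 0.3 ** |i - j| for the 8 confounders.
p = 8
idx = np.arange(p)
corr = 0.3 ** np.abs(idx[:, None] - idx[None, :])

# Gaussian-copula sketch: draw correlated normals via a Cholesky factor,
# then push each margin through the standard normal CDF to get uniforms.
# The packaged DGP would then apply its own marginal quantile functions.
rng = np.random.default_rng(42)
L = np.linalg.cholesky(corr)
z = rng.standard_normal((200_000, p)) @ L.T  # latent N(0, corr)
u = norm.cdf(z)                              # uniform margins, same copula

# The empirical latent correlation matches the Toeplitz target closely.
emp = np.corrcoef(z, rowvar=False)
assert np.abs(emp - corr).max() < 0.02
assert ((0 < u) & (u < 1)).all()
```

Because the copula only fixes the latent correlation, each confounder can keep an arbitrary realistic marginal (counts, rates, indicators) without breaking the stated cross-feature dependence.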
- causalis.scenarios.multi_unconfoundedness.dgp.generate_multitreatment_irm_26(n: int = 100000, seed: int = 42, include_oracle: bool = False, return_causal_data: bool = True) -> Union[pandas.DataFrame, causalis.data_contracts.multicausaldata.MultiCausalData]¶
- causalis.scenarios.multi_unconfoundedness.dgp.generate_multi_dml_cx_26(n: int = 100000, seed: int = 42, include_oracle: bool = False, return_causal_data: bool = True) -> Union[pandas.DataFrame, causalis.data_contracts.multicausaldata.MultiCausalData]¶
The notebook simulates overlapping contact and repeat actions. This packaged DGP resolves them into a mutually exclusive one-hot treatment: control, neg_contact_flg, error_flg, neg_contact_flg_error_flg.

Treatment assignment exactly matches the notebook's independent Bernoulli contact and repeat mechanisms after overlap resolution, but it is exposed through the shared multi-treatment generator so it integrates with MultiCausalData and the scenario tooling.

Examples
>>> df = generate_multi_dml_cx_26(n=256, seed=7, return_causal_data=False)
>>> treatment_cols = ["control", "neg_contact_flg", "error_flg", "neg_contact_flg_error_flg"]
>>> bool(df[treatment_cols].sum(axis=1).eq(1).all())
True
>>> {"age", "prev_apps", "csat_prev", "y"}.issubset(df.columns)
True
Notes
Write :math:`a(X)` for the contact logit and :math:`b(X)` for the repeat logit. The notebook first draws two conditionally independent Bernoulli actions,

.. math::

   C \mid X \sim \operatorname{Bernoulli}(\sigma(a(X))), \qquad R \mid X \sim \operatorname{Bernoulli}(\sigma(b(X))),

where :math:`\sigma(z) = 1 / (1 + e^{-z})`. In this packaged benchmark the pair :math:`(C, R)` is re-encoded as a one-hot treatment:

.. math::

   D = \begin{cases} \text{control} & (C, R) = (0, 0), \\ \text{neg\_contact\_flg} & (C, R) = (1, 0), \\ \text{error\_flg} & (C, R) = (0, 1), \\ \text{neg\_contact\_flg\_error\_flg} & (C, R) = (1, 1). \end{cases}

Let :math:`p_c = \sigma(a(X))` and :math:`p_r = \sigma(b(X))`. Then the arm probabilities are

.. math::

   \Pr(D = \text{control} \mid X) = (1 - p_c)(1 - p_r),

.. math::

   \Pr(D = \text{neg\_contact\_flg} \mid X) = p_c (1 - p_r),

.. math::

   \Pr(D = \text{error\_flg} \mid X) = (1 - p_c) p_r,

.. math::

   \Pr(D = \text{neg\_contact\_flg\_error\_flg} \mid X) = p_c p_r.

Equivalently, this is exactly the softmax model with class scores :math:`(0, a(X), b(X), a(X) + b(X))`, which is why the implementation passes g_d=[None, _cx_contact_logit, _cx_repeat_logit, lambda x: _cx_contact_logit(x) + _cx_repeat_logit(x)].

The observed outcome uses a binary logit baseline :math:`g_y(X)` plus a class effect

.. math::

   \operatorname{logit} \Pr(Y = 1 \mid X, D) = g_y(X) + \theta(D),

with :math:`\theta(\text{control}) = \theta(\text{neg\_contact\_flg}) = 0` and :math:`\theta(\text{error\_flg}) = \theta(\text{neg\_contact\_flg\_error\_flg}) = -0.65`.

Worked overlap example: if :math:`a(X) = 0.8` and :math:`b(X) = -0.2`, then :math:`p_c \approx 0.690` and :math:`p_r \approx 0.450`, giving arm probabilities approximately :math:`(0.170, 0.379, 0.140, 0.311)` for (control, neg_contact_flg, error_flg, neg_contact_flg_error_flg).
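The equivalence between the product-of-Bernoullis form and the softmax with scores :math:`(0, a, b, a+b)` is easy to verify numerically at the worked example's logits. This is a standalone check, not the library's code:

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

a, b = 0.8, -0.2  # contact and repeat logits from the worked example
p_c, p_r = sigmoid(a), sigmoid(b)

# Product form: arm probabilities from two independent Bernoulli draws.
product = np.array([
    (1 - p_c) * (1 - p_r),  # control
    p_c * (1 - p_r),        # neg_contact_flg
    (1 - p_c) * p_r,        # error_flg
    p_c * p_r,              # neg_contact_flg_error_flg
])

# Softmax form with class scores (0, a, b, a + b); the denominator
# factorises as (1 + e^a)(1 + e^b), which is why the two forms agree.
scores = np.array([0.0, a, b, a + b])
softmax = np.exp(scores) / np.exp(scores).sum()

assert np.allclose(product, softmax)
# Matches the worked example: roughly (0.170, 0.379, 0.140, 0.311).
assert np.allclose(product, [0.1705, 0.3794, 0.1396, 0.3106], atol=1e-3)
```

The factorisation :math:`1 + e^a + e^b + e^{a+b} = (1 + e^a)(1 + e^b)` is the whole argument: independence of the two Bernoulli actions is what lets the four-class softmax decompose into two binary logits.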
- causalis.scenarios.multi_unconfoundedness.dgp.multi_dml_cx_26¶
None