causalis.dgp.multicausaldata.base

Module Contents

Classes

MultiCausalDatasetGenerator

Generate synthetic causal datasets with multi-class (one-hot) treatments.

API

class causalis.dgp.multicausaldata.base.MultiCausalDatasetGenerator

Generate synthetic causal datasets with multi-class (one-hot) treatments.

Treatment assignment is modeled via a multinomial logistic (softmax) model:

    P(D=k | X, U) = softmax_k(alpha_d[k] + f_k(X) + u_strength_d[k] * U)
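The softmax assignment above can be sketched in plain NumPy. This is an illustration only, not the library's implementation: it assumes a purely linear score f_k(X) = X @ beta_d[k], and every value chosen below (coefficients, sample size, seed) is hypothetical.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max keeps exp() stable."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n, k, K = 1000, 5, 3  # rows, confounders, treatment classes

X = rng.normal(size=(n, k))  # observed confounders
U = rng.normal(size=n)       # unobserved confounder

# Hypothetical parameters; the control class (index 0) has zero score terms.
alpha_d = np.array([0.0, 0.2, -0.1])
beta_d = np.vstack([np.zeros(k), rng.normal(scale=0.5, size=(K - 1, k))])
u_strength_d = np.array([0.0, 0.3, 0.3])

# Score for class k: alpha_d[k] + f_k(X) + u_strength_d[k] * U
scores = alpha_d + X @ beta_d.T + np.outer(U, u_strength_d)
probs = softmax(scores)

# Draw one class per row by inverting the cumulative probabilities.
cum = probs.cumsum(axis=1)
classes = (rng.random(n)[:, None] > cum).sum(axis=1)
classes = np.minimum(classes, K - 1)  # guard against round-off in cumsum

# Full one-hot encoding: each row sums to 1.
D = np.eye(K, dtype=int)[classes]
```

Because softmax is invariant to adding the same constant to every class score, only differences relative to the control class affect the probabilities.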

Outcome depends on confounders and the assigned treatment class:

    outcome_type = "continuous":
        Y = alpha_y + f_y(X) + u_strength_y * U + sum_k D_k * (theta_k + tau_k(X)) + eps
    outcome_type = "binary":
        logit P(Y=1|X,D,U) = alpha_y + f_y(X) + u_strength_y * U + sum_k D_k * (theta_k + tau_k(X))
    outcome_type = "poisson":
        log E[Y|X,D,U] = alpha_y + f_y(X) + u_strength_y * U + sum_k D_k * (theta_k + tau_k(X))
    outcome_type = "gamma":
        log E[Y|X,D,U] = alpha_y + f_y(X) + u_strength_y * U + sum_k D_k * (theta_k + tau_k(X))
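The four outcome families share one linear predictor and differ only in the link and noise model. The sketch below illustrates that structure with NumPy. It is not the library's code: `sample_outcome` is a hypothetical helper, tau_k(X) is set to zero, f_y(X) is assumed linear, and the gamma draw assumes a mean parameterization (scale = mean / shape), which the source does not confirm.

```python
import numpy as np


def sample_outcome(eta, outcome_type, rng, sigma_y=1.0, gamma_shape=2.0):
    """Draw Y from the linear predictor eta (hypothetical helper)."""
    if outcome_type == "continuous":
        # Identity link with additive Gaussian noise eps.
        return eta + rng.normal(scale=sigma_y, size=eta.shape)
    if outcome_type == "binary":
        # Logit link: P(Y=1) = sigmoid(eta).
        return rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    if outcome_type == "poisson":
        # Log link: E[Y] = exp(eta).
        return rng.poisson(np.exp(eta))
    if outcome_type == "gamma":
        # Log link; assumed parameterization so that E[Y] = shape * scale = exp(eta).
        return rng.gamma(shape=gamma_shape, scale=np.exp(eta) / gamma_shape)
    raise ValueError(f"unknown outcome_type: {outcome_type!r}")


rng = np.random.default_rng(0)
n, k, K = 500, 5, 3
X = rng.normal(size=(n, k))
U = rng.normal(size=n)
D = np.eye(K, dtype=int)[rng.integers(0, K, size=n)]  # random one-hot treatment

# Hypothetical parameter values.
alpha_y, u_strength_y = 0.0, 0.0
beta_y = rng.normal(scale=0.3, size=k)
theta = np.array([0.0, 1.0, 1.0])  # control effect is 0

# eta = alpha_y + f_y(X) + u_strength_y * U + sum_k D_k * theta_k   (tau_k = 0)
eta = alpha_y + X @ beta_y + u_strength_y * U + D @ theta

samples = {
    name: sample_outcome(eta, name, rng)
    for name in ("continuous", "binary", "poisson", "gamma")
}
```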

Parameters

n_treatments : int, default=3
    Number of treatment classes (including control). Column 0 is treated as control. Generated treatment columns are a full one-hot encoding that sums to 1.
d_names : list of str, optional
    Names of treatment columns. If None, uses ["d_0", "d_1", ...].
theta : float or array-like, optional
    Constant treatment effects on the link scale for each class. If scalar, applied to all non-control classes (control effect = 0). If length K-1, prepends 0 for control. If length K, uses as provided.
tau : callable or list of callables, optional
    Heterogeneous effects for each class. If callable, applied to non-control classes. Effects are additive with theta on the link scale: tau_link_k(X) = theta_k + tau_k(X).
beta_y : array-like, optional
    Linear coefficients for baseline outcome f_y(X).
g_y : callable, optional
    Nonlinear baseline outcome function g_y(X).
alpha_y : float, default=0.0
    Outcome intercept on link scale.
sigma_y : float, default=1.0
    Std dev for continuous outcomes.
outcome_type : {"continuous", "binary", "poisson", "gamma"}, default="continuous"
    Outcome family.
gamma_shape : float, default=2.0
    Shape parameter for gamma outcomes.
u_strength_y : float, default=0.0
    Strength of unobserved confounder in outcome.
confounder_specs : list of dict, optional
    Schema for generating confounders (same format as CausalDatasetGenerator).
k : int, default=5
    Number of confounders if confounder_specs is None.
x_sampler : callable, optional
    Custom sampler (n, k, seed) -> X ndarray.
use_copula : bool, default=False
    If True and confounder_specs provided, use Gaussian copula for X.
copula_corr : array-like, optional
    Correlation matrix for copula.
beta_d : array-like or list, optional
    Linear coefficients for treatment assignment. If array of shape (k,), applies to all non-control classes. If shape (K,k), uses per class.
g_d : callable or list of callables, optional
    Nonlinear treatment score per class. If callable, applies to non-control classes.
alpha_d : float or array-like, optional
    Intercepts for treatment scores. If scalar, applies to non-control classes.
u_strength_d : float or array-like, default=0.0
    Unobserved confounder strength in treatment assignment. If scalar, interpreted as [0, c, c, ...] so latent U perturbs non-control classes relative to control (and does not cancel in softmax).
propensity_sharpness : float, default=1.0
    Scales treatment scores to adjust overlap.
target_d_rate : array-like, optional
    Target marginal class probabilities (length K). Calibrates alpha_d using iterative scaling (approximate when u_strength_d != 0).
include_oracle : bool, default=True
    Whether to include oracle columns for propensities and potential outcomes.
seed : int, optional
    Random seed.
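The scalar, length K-1, and length K conventions described for theta (and the scalar-to-[0, c, c, ...] rule for u_strength_d) can be illustrated with a small helper. `expand_per_class` is a hypothetical name used only for this sketch; it mirrors the documented behavior but is not taken from the library.

```python
import numpy as np


def expand_per_class(value, n_treatments):
    """Expand a per-class parameter to length K with control at index 0.

    Hypothetical helper mirroring the documented rules:
    - scalar      -> [0, c, c, ...]
    - length K-1  -> 0 is prepended for control
    - length K    -> used as provided
    """
    K = n_treatments
    if np.ndim(value) == 0:
        return np.concatenate([[0.0], np.full(K - 1, float(value))])
    arr = np.asarray(value, dtype=float)
    if arr.size == K - 1:
        return np.concatenate([[0.0], arr])
    if arr.size == K:
        return arr
    raise ValueError(f"expected a scalar or length {K - 1} or {K}, got {arr.size}")


theta_scalar = expand_per_class(1.0, 3)
theta_short = expand_per_class([0.5, 2.0], 3)
theta_full = expand_per_class([0.1, 0.5, 2.0], 3)
```

Pinning the control entry to 0 for scalars matters in the treatment model: a common shift added to every class score would cancel inside the softmax.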

n_treatments: int

3

d_names: Optional[List[str]]

None

theta: Optional[Union[float, List[float], numpy.ndarray]]

1.0

tau: Optional[Union[Callable[[numpy.ndarray], numpy.ndarray], List[Optional[Callable[[numpy.ndarray], numpy.ndarray]]]]]

None

beta_y: Optional[numpy.ndarray]

None

g_y: Optional[Callable[[numpy.ndarray], numpy.ndarray]]

None

alpha_y: float

0.0

sigma_y: float

1.0

outcome_type: str

'continuous'

gamma_shape: float

2.0

u_strength_y: float

0.0

confounder_specs: Optional[List[Dict[str, Any]]]

None

k: int

5

x_sampler: Optional[Callable[[int, int, int], numpy.ndarray]]

None

use_copula: bool

False

copula_corr: Optional[numpy.ndarray]

None

beta_d: Optional[Union[numpy.ndarray, List[Optional[numpy.ndarray]]]]

None

g_d: Optional[Union[Callable[[numpy.ndarray], numpy.ndarray], List[Optional[Callable[[numpy.ndarray], numpy.ndarray]]]]]

None

alpha_d: Optional[Union[float, List[float], numpy.ndarray]]

None

u_strength_d: Union[float, List[float], numpy.ndarray]

0.0

propensity_sharpness: float

1.0

target_d_rate: Optional[Union[List[float], numpy.ndarray]]

None
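The parameter description says target_d_rate calibrates alpha_d by iterative scaling. One way such a calibration could work is sketched below, under loud assumptions: `calibrate_intercepts` is a hypothetical function, the update rule (add log target minus log current marginal to each intercept) is a standard iterative-scaling step rather than the library's confirmed algorithm, and the scores exclude the latent U term, which is one reason a calibration of this kind would only be approximate when u_strength_d != 0.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=1, keepdims=True)


def calibrate_intercepts(base_scores, target, n_iter=500, tol=1e-12):
    """Find intercepts so mean class probabilities match `target`.

    Hypothetical sketch: each pass shifts every intercept by
    log(target_k) - log(current marginal_k).
    """
    target = np.asarray(target, dtype=float)
    alpha = np.zeros(base_scores.shape[1])
    for _ in range(n_iter):
        marginal = softmax(base_scores + alpha).mean(axis=0)
        step = np.log(target) - np.log(marginal)
        alpha += step
        if np.max(np.abs(step)) < tol:
            break
    # Softmax ignores a common shift, so normalize control to 0.
    return alpha - alpha[0]


rng = np.random.default_rng(0)
n, k, K = 2000, 5, 3
X = rng.normal(size=(n, k))
beta_d = np.vstack([np.zeros(k), rng.normal(scale=0.5, size=(K - 1, k))])
base_scores = X @ beta_d.T  # scores without intercepts or U

target_d_rate = np.array([0.5, 0.3, 0.2])
alpha_d = calibrate_intercepts(base_scores, target_d_rate)
achieved = softmax(base_scores + alpha_d).mean(axis=0)
```

Here `achieved` is the average propensity over the sample, so realized class shares in a finite draw will still fluctuate around the target.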

include_oracle: bool

True

seed: Optional[int]

None

rng: numpy.random.Generator

'field(...)'

confounder_names_: List[str]

'field(...)'

__post_init__() -> None
generate(n: int, U: Optional[numpy.ndarray] = None) -> pandas.DataFrame
to_multicausal_data(n: int, confounders: Optional[Union[str, List[str]]] = None) -> causalis.data_contracts.multicausaldata.MultiCausalData