causalis.data_contracts.multicausaldata

Causalis data class for storing a cross-sectional DataFrame and its column metadata for causal inference with multiple treatments.

Module Contents

Classes

MultiCausalData

Data contract for cross-sectional causal data with multi-class one-hot treatments.

API

class causalis.data_contracts.multicausaldata.MultiCausalData(/, **data: Any)

Bases: pydantic.BaseModel

Data contract for cross-sectional causal data with multi-class one-hot treatments.

Parameters

df : pd.DataFrame
    The DataFrame containing the causal data.
outcome : str
    The name of the outcome column.
treatment_names : List[str]
    The names of the treatment columns.
confounders : List[str], optional
    The names of the confounder columns, by default [].
user_id : Optional[str], optional
    The name of the user ID column, by default None.
control_treatment : str
    Name of the control/baseline treatment column.

Notes

This class enforces several constraints on the data, including:

  • At most MAX_TREATMENTS entries in treatment_names (default 15).

  • No duplicate column names in the input DataFrame.

  • Disjoint roles for columns (outcome, treatment_names, confounders, user_id).

  • Non-empty normalized names for outcome and user_id (if provided).

  • Existence of all specified columns in the DataFrame.

  • Numeric or boolean types for outcome and confounders.

  • Finite values for outcome, confounders, and treatment_names.

  • Non-constant values for outcome, treatment_names, and confounders.

  • No NaN values in used columns.

  • Binary (0/1) encoding for treatment columns.

  • One-hot treatment assignment (exactly one active treatment per row).

  • A stable control treatment in position 0.

  • No two distinct columns with identical values.

  • Unique values for user_id (if specified).
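The binary-encoding and one-hot constraints above can be illustrated with a small pandas sketch. This is not the validation code used by MultiCausalData; the helper name check_one_hot and the column names t_control, t_a, and t_b are hypothetical and chosen only for illustration.

```python
from typing import List

import pandas as pd


def check_one_hot(df: pd.DataFrame, treatment_names: List[str]) -> None:
    """Hypothetical helper: raise ValueError if the treatment columns
    are not a valid one-hot encoding."""
    t = df[treatment_names]
    # Every entry must be 0 or 1 (NaN fails this check as well).
    if not t.isin([0, 1]).all().all():
        raise ValueError("treatment columns must be binary (0/1)")
    # Exactly one treatment must be active in each row.
    if not (t.sum(axis=1) == 1).all():
        raise ValueError("each row must have exactly one active treatment")


treatment_cols = ["t_control", "t_a", "t_b"]
df = pd.DataFrame(
    {
        "t_control": [1, 0, 0, 1],
        "t_a": [0, 1, 0, 0],
        "t_b": [0, 0, 1, 0],
    }
)
check_one_hot(df, treatment_cols)  # valid one-hot data passes silently

bad = df.copy()
bad.loc[0, "t_a"] = 1  # row 0 now has two active treatments
try:
    check_one_hot(bad, treatment_cols)
except ValueError as exc:
    print(exc)
```

A frame that fails a check like this one would be expected to fail validation when passed to MultiCausalData as well, although the exact error raised by the class may differ.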

Initialization

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_config

‘ConfigDict(…)’

MAX_TREATMENTS: ClassVar[int]

15

FLOAT_TOL: ClassVar[float]

1e-12

df: pandas.DataFrame

None

outcome: str

None

treatment_names: List[str]

None

confounders: List[str]

‘Field(…)’

user_id: Optional[str]

None

control_treatment: str

None

classmethod from_df(df: pandas.DataFrame, *, outcome: str, treatment_names: Union[str, List[str]], confounders: Optional[Union[str, List[str]]] = None, user_id: Optional[str] = None, control_treatment: str, **kwargs: Any) -> causalis.data_contracts.multicausaldata.MultiCausalData

Create a MultiCausalData instance from a pandas DataFrame.

Parameters

df : pd.DataFrame
    The input DataFrame.
outcome : str
    The name of the outcome column.
treatment_names : Union[str, List[str]]
    The name(s) of the treatment column(s).
confounders : Union[str, List[str]], optional
    The name(s) of the confounder column(s), by default None.
user_id : str, optional
    The name of the user ID column, by default None.
control_treatment : str
    Name of the control treatment column.
**kwargs : Any
    Additional keyword arguments passed to the constructor.

Returns

MultiCausalData
    An instance of MultiCausalData.
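A minimal usage sketch, based only on the signature documented above. The column names (user_id, y, age, t_control, t_a, t_b) and the toy values are assumptions made for illustration, and the call has not been checked against a particular causalis release, so additional validation rules not listed in the Notes may apply. The import is guarded so that the DataFrame construction still runs when causalis is not installed.

```python
import pandas as pd

# Toy cross-sectional data: one outcome, one confounder, a unique user ID,
# and three one-hot treatment columns (exactly one active per row).
df = pd.DataFrame(
    {
        "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
        "y": [1.2, 0.7, 2.5, 1.9, 0.3, 2.1],
        "age": [34, 51, 27, 45, 38, 62],
        "t_control": [1, 0, 0, 1, 0, 0],
        "t_a": [0, 1, 0, 0, 1, 0],
        "t_b": [0, 0, 1, 0, 0, 1],
    }
)

try:
    from causalis.data_contracts.multicausaldata import MultiCausalData
except ImportError:
    MultiCausalData = None  # causalis not installed; the call below is skipped

if MultiCausalData is not None:
    data = MultiCausalData.from_df(
        df,
        outcome="y",
        treatment_names=["t_control", "t_a", "t_b"],
        confounders="age",  # a single name is accepted as well as a list
        user_id="user_id",
        control_treatment="t_control",
    )
    print(data.treatments.head())  # treatment columns as a DataFrame
    print(data.X.head())  # confounder columns as a DataFrame
```

The control column is listed first in treatment_names here so that the example is consistent with the "control treatment in position 0" constraint regardless of whether the class reorders the columns itself.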

property treatments: pandas.DataFrame

Return the treatment columns as a pandas DataFrame.

Returns

pd.DataFrame
    The treatment columns.

property treatment: pandas.Series

Return the single treatment column as a pandas Series.

Returns

pd.Series
    The treatment column.

Raises

AttributeError
    If there is more than one treatment column.

property X: pandas.DataFrame

Return the confounder columns as a pandas DataFrame.

Returns

pd.DataFrame
    The confounder columns.

get_df(columns: Optional[List[str]] = None, include_outcome: bool = True, include_confounders: bool = True, include_treatments: bool = True, include_user_id: bool = False) -> pandas.DataFrame

Get a subset of the underlying DataFrame.

Parameters

columns : List[str], optional
    Specific columns to include, by default None.
include_outcome : bool, optional
    Whether to include the outcome column, by default True.
include_confounders : bool, optional
    Whether to include confounder columns, by default True.
include_treatments : bool, optional
    Whether to include treatment columns, by default True.
include_user_id : bool, optional
    Whether to include the user ID column, by default False.

Returns

pd.DataFrame
    A copy of the requested DataFrame subset.

Raises

ValueError
    If any of the requested columns do not exist.

__repr__() -> str
__str__() -> str