causalis.data_contracts.causaldata

Causalis data class for storing a cross-sectional DataFrame and column metadata for causal inference.

Module Contents

Classes

CausalData

Container for causal inference datasets.

API

class causalis.data_contracts.causaldata.CausalData(/, **data: Any)

Bases: pydantic.BaseModel

Container for causal inference datasets.

Wraps a pandas DataFrame and stores the names of the treatment, outcome, and optional confounder columns. The stored DataFrame is restricted to those columns. Uses Pydantic for validation and serves as a data contract.

Attributes

df : pd.DataFrame
    The DataFrame containing the data, restricted to the outcome, treatment, and confounder columns. NaN values are not allowed in the used columns.
treatment_name : str
    Column name representing the treatment variable.
outcome_name : str
    Column name representing the outcome variable.
confounders_names : List[str]
    Names of the confounder columns (may be empty).
user_id_name : str, optional
    Column name representing the unique identifier for each observation/user.

Initialization

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_config

‘ConfigDict(…)’

df: pandas.DataFrame

None

treatment_name: str

‘Field(…)’

outcome_name: str

‘Field(…)’

confounders_names: List[str]

‘Field(…)’

user_id_name: Optional[str]

‘Field(…)’

classmethod from_df(df: pandas.DataFrame, treatment: str, outcome: str, confounders: Optional[Union[str, List[str]]] = None, user_id: Optional[str] = None, **kwargs: Any) -> causalis.data_contracts.causaldata.CausalData

Friendly constructor for CausalData.

Parameters

df : pd.DataFrame
    The DataFrame containing the data.
treatment : str
    Column name representing the treatment variable.
outcome : str
    Column name representing the outcome variable.
confounders : Union[str, List[str]], optional
    Column name(s) representing the confounders/covariates.
user_id : str, optional
    Column name representing the unique identifier for each observation/user.
**kwargs : Any
    Additional arguments passed to the Pydantic model constructor.

Returns

CausalData
    A validated CausalData instance.
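The `confounders` argument accepts a single name, a list of names, or nothing, while the `confounders_names` attribute is always a list that may be empty. The sketch below shows one way such an argument could be normalized. `normalize_confounders` is a hypothetical helper written for illustration; it is not part of causalis, and the library's actual handling may differ.

```python
from typing import List, Optional, Union


def normalize_confounders(
    confounders: Optional[Union[str, List[str]]] = None,
) -> List[str]:
    """Hypothetical helper: coerce a `confounders` argument to a list of names."""
    if confounders is None:
        # No confounders given: use an empty list.
        return []
    if isinstance(confounders, str):
        # A single column name becomes a one-element list.
        return [confounders]
    # Copy so the caller's list is not shared with the result.
    return list(confounders)


print(normalize_confounders())                   # []
print(normalize_confounders("age"))              # ['age']
print(normalize_confounders(["age", "income"]))  # ['age', 'income']
```

Checking for `str` before treating the value as a sequence matters here, because iterating over a string would split a column name into single characters.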

property outcome: pandas.Series

Outcome column as a Series.

Returns

pd.Series
    The outcome column.

property treatment: pandas.Series

Treatment column as a Series.

Returns

pd.Series
    The treatment column.

property confounders: List[str]

List of confounder column names.

Returns

List[str]
    Names of the confounder columns.

property user_id: pandas.Series

user_id column as a Series.

Returns

pd.Series
    The user_id column.

property X: pandas.DataFrame

Design matrix of confounders.

Returns

pd.DataFrame
    The DataFrame containing only confounder columns.

get_df(columns: Optional[List[str]] = None, include_treatment: bool = True, include_outcome: bool = True, include_confounders: bool = True, include_user_id: bool = False) -> pandas.DataFrame

Get a DataFrame with specified columns.

Parameters

columns : List[str], optional
    Specific column names to include.
include_treatment : bool, default True
    Whether to include the treatment column.
include_outcome : bool, default True
    Whether to include the outcome column.
include_confounders : bool, default True
    Whether to include confounder columns.
include_user_id : bool, default False
    Whether to include the user_id column.

Returns

pd.DataFrame
    A copy of the internal DataFrame with selected columns.

Raises

ValueError
    If any specified columns do not exist.
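The reference above does not say how `columns` interacts with the `include_*` flags. The sketch below illustrates one plausible reading, in which the explicit names and the flagged columns are combined, duplicates are dropped, and unknown names raise `ValueError`. `resolve_columns` and its `available` parameter are hypothetical and operate on column names only; the combination rule and the column order are assumptions, not confirmed causalis behavior.

```python
from typing import List, Optional


def resolve_columns(
    available: List[str],
    treatment_name: str,
    outcome_name: str,
    confounders_names: List[str],
    user_id_name: Optional[str] = None,
    columns: Optional[List[str]] = None,
    include_treatment: bool = True,
    include_outcome: bool = True,
    include_confounders: bool = True,
    include_user_id: bool = False,
) -> List[str]:
    """Hypothetical sketch: pick the column names a get_df-style call selects."""
    # Assumption: explicit names come first, then the flagged columns.
    selected: List[str] = list(columns or [])
    if include_treatment:
        selected.append(treatment_name)
    if include_outcome:
        selected.append(outcome_name)
    if include_confounders:
        selected.extend(confounders_names)
    if include_user_id and user_id_name is not None:
        selected.append(user_id_name)

    # Drop duplicates while keeping first-seen order.
    selected = list(dict.fromkeys(selected))

    # Mirror the documented ValueError for names that do not exist.
    missing = [name for name in selected if name not in available]
    if missing:
        raise ValueError(f"Column(s) not found: {missing}")
    return selected


available = ["user", "d", "y", "age", "income"]
print(resolve_columns(available, "d", "y", ["age", "income"]))
# ['d', 'y', 'age', 'income']
print(resolve_columns(available, "d", "y", ["age", "income"], include_outcome=False))
# ['d', 'age', 'income']
```

With a resolved list of names, a pandas selection such as `df[names].copy()` would return an independent copy, which matches the documented return value of `get_df`.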

__repr__() -> str