causalis.shared.rct_design.split
Split (assignment) utilities for randomized controlled experiments.
This module provides deterministic assignment of variants to entities based on hashing a composite key (salt | layer_id | experiment_id | entity_id) into the unit interval and mapping it to cumulative variant weights.
The implementation mirrors the reference notebook in docs/cases/rct_design.ipynb.
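The hashing step described above can be sketched as follows. This is only an illustration: the source does not say which hash function or how many bits the module uses, so SHA-256 and the 53-bit truncation are assumptions, and `hash_to_unit` is a hypothetical helper name, not part of the causalis API.

```python
import hashlib


def hash_to_unit(entity_id, experiment_id, salt="global_ab_salt", layer_id="default"):
    """Map a composite key to a float in [0, 1).

    Hypothetical helper for illustration; SHA-256 is an assumed choice.
    """
    # Composite key in the order given by the module description.
    key = f"{salt}|{layer_id}|{experiment_id}|{entity_id}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep the top 53 bits of the first 8 bytes so the quotient is an
    # exactly representable float strictly below 1.0.
    top53 = int.from_bytes(digest[:8], "big") >> 11
    return top53 / 2**53


u = hash_to_unit("user_1", "exp_v1")
```

Because the value depends only on the key, the same entity always lands at the same point of the unit interval for a given salt, layer, and experiment, while changing any of those components reshuffles it.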
Module Contents
Functions
assign_variants_df | Deterministically assign variants for each row in df based on id_col.
API
- causalis.shared.rct_design.split.assign_variants_df(df: pandas.DataFrame, id_col: str, experiment_id: str, variants: Dict[str, float], *, salt: str = 'global_ab_salt', layer_id: str = 'default', variant_col: str = 'variant') → pandas.DataFrame
Deterministically assign variants for each row in df based on id_col.
Parameters
df : pd.DataFrame
    Input DataFrame with an identifier column.
id_col : str
    Column name in df containing entity identifiers (user_id, session_id, etc.).
experiment_id : str
    Unique identifier for the experiment (versioned for reruns).
variants : Dict[str, float]
    Mapping from variant name to weight (coverage). Weights must be non-negative and their sum must be in (0, 1]. If the sum is < 1, the remaining mass corresponds to "not in experiment" and the assignment will be None.
salt : str, default "global_ab_salt"
    Secret string that de-correlates assignments from other uses of the hash and makes them non-gameable.
layer_id : str, default "default"
    Identifier for a mutual-exclusivity layer or surface. Because it is part of the hashed key, it acts like an additional salt.
variant_col : str, default "variant"
    Name of the output column that stores the assigned variant labels.
Returns
pd.DataFrame
    A copy of df with an extra column variant_col. Entities outside experiment coverage will have None in the variant column.
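The weight rules and the None result for entities outside coverage can be illustrated with a minimal sketch. It assumes a point `u` in [0, 1) has already been derived from the hashed key; `pick_variant` is a hypothetical helper, and the exact validation and error messages in causalis may differ.

```python
from typing import Dict, Optional


def pick_variant(u: float, variants: Dict[str, float]) -> Optional[str]:
    """Map u in [0, 1) to a variant via cumulative weights (illustrative only)."""
    weights = list(variants.values())
    if any(w < 0 for w in weights):
        raise ValueError("variant weights must be non-negative")
    total = sum(weights)
    # Small tolerance for floating-point sums such as 0.1 * 10 (an assumption).
    if total <= 0 or total > 1.0 + 1e-9:
        raise ValueError("sum of variant weights must be in (0, 1]")

    cumulative = 0.0
    for name, weight in variants.items():
        cumulative += weight
        if u < cumulative:
            return name
    # u fell into the remaining mass: the entity is not in the experiment.
    return None


variants = {"control": 0.25, "treatment": 0.25}
labels = [pick_variant(u, variants) for u in (0.10, 0.30, 0.75)]
```

With 50% total coverage, points below 0.25 map to control, points in [0.25, 0.5) map to treatment, and everything from 0.5 upward is left unassigned, which corresponds to None in the variant column.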