causalis.shared.rct_design.split

Split (assignment) utilities for randomized controlled experiments.

This module provides deterministic assignment of variants to entities based on hashing a composite key (salt | layer_id | experiment_id | entity_id) into the unit interval and mapping it to cumulative variant weights.
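The hashing scheme described above can be sketched in a few lines. This is an illustrative sketch only, not the module's actual code: the choice of SHA-256, the `|` separator, the use of the top 53 bits of the digest, and the helper names `hash_to_unit_interval`, `pick_variant`, and `assign` are all assumptions made for the example.

```python
import hashlib
from typing import Dict, Optional


def hash_to_unit_interval(salt: str, layer_id: str, experiment_id: str, entity_id: str) -> float:
    """Map the composite key to a float in [0, 1). SHA-256 is an assumed hash."""
    key = f"{salt}|{layer_id}|{experiment_id}|{entity_id}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep the top 53 bits of the first 8 bytes so the ratio is exactly
    # representable as a float and always strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def pick_variant(u: float, variants: Dict[str, float]) -> Optional[str]:
    """Return the first variant whose cumulative weight exceeds u, else None."""
    cumulative = 0.0
    for name, weight in variants.items():
        cumulative += weight
        if u < cumulative:
            return name
    # u fell into the mass not covered by any variant: not in the experiment.
    return None


def assign(entity_id: str, experiment_id: str, variants: Dict[str, float],
           salt: str = "global_ab_salt", layer_id: str = "default") -> Optional[str]:
    """Hypothetical per-entity helper combining the two steps above."""
    u = hash_to_unit_interval(salt, layer_id, experiment_id, entity_id)
    return pick_variant(u, variants)
```

Because the result depends only on the key components and the weights, calling `assign` twice with the same arguments yields the same variant, which is what makes the assignment reproducible across runs.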

The implementation mirrors the reference notebook in docs/cases/rct_design.ipynb.

Module Contents

Functions

assign_variants_df

Deterministically assign variants for each row in df based on id_col.

API

causalis.shared.rct_design.split.assign_variants_df(df: pandas.DataFrame, id_col: str, experiment_id: str, variants: Dict[str, float], *, salt: str = 'global_ab_salt', layer_id: str = 'default', variant_col: str = 'variant') → pandas.DataFrame

Deterministically assign variants for each row in df based on id_col.

Parameters

df : pd.DataFrame
    Input DataFrame with an identifier column.
id_col : str
    Name of the column in df that contains entity identifiers (user_id, session_id, etc.).
experiment_id : str
    Unique identifier for the experiment (versioned for reruns).
variants : Dict[str, float]
    Mapping from variant name to weight (coverage). Weights must be non-negative and their sum must be in (0, 1]. If the sum is < 1, the remaining mass corresponds to "not in experiment" and the assignment will be None.
salt : str, default "global_ab_salt"
    Secret string that de-correlates assignments from other uses of the hash and makes them non-gameable.
layer_id : str, default "default"
    Identifier for a mutual exclusivity layer or surface. Because it is part of the hashed composite key, changing it reshuffles assignments much as a different salt would.
variant_col : str, default "variant"
    Name of the output column that stores the assigned variant labels.

Returns

pd.DataFrame
    A copy of df with an extra column variant_col. Entities outside experiment coverage will have None in the variant column.
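
The documented constraints on variants (non-negative weights whose sum lies in (0, 1]) can be checked before calling the function. The sketch below is a hypothetical pre-check, not part of the module: the helper name `validate_variants`, the tolerance `tol`, and the choice to raise `ValueError` are assumptions, and how the library itself reacts to invalid weights is not stated here.

```python
import math
from typing import Dict


def validate_variants(variants: Dict[str, float], tol: float = 1e-9) -> float:
    """Check the documented weight constraints and return the uncovered mass.

    The returned value is the share of entities expected to receive None,
    that is, 1 minus the sum of the weights.
    """
    if any(w < 0 for w in variants.values()):
        raise ValueError("variant weights must be non-negative")
    # fsum avoids accumulating rounding error over many small weights.
    total = math.fsum(variants.values())
    # The small tolerance (an assumption) forgives rounding just above 1.
    if not (0 < total <= 1 + tol):
        raise ValueError(f"variant weights must sum to a value in (0, 1], got {total}")
    return max(0.0, 1.0 - total)
```

For example, `validate_variants({"control": 0.25, "treatment": 0.25})` returns 0.5, meaning roughly half of the entities would fall outside the experiment and receive None.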