modularml.core.splitters
- class modularml.core.splitters.BaseSplitter
Bases: ABC
- class modularml.core.splitters.ConditionSplitter(**conditions: Dict[str, Dict[str, Any | List[Any] | Callable]])
Bases: BaseSplitter
Splits samples into subsets based on user-defined filtering conditions.
- Parameters:
conditions (Dict[str, Dict[str, Union[Any, List[Any], Callable]]]) – The keys of the outer dict are the new subset names. For each subset, an inner dict defines the filtering conditions used to construct it. The inner dict uses the same format as the FeatureSet.filter() method.
Examples: The call below defines three subsets ('low_temp', 'high_temp', and 'cell_5'). The 'low_temp' subset contains all samples with a temperature below 20, the 'high_temp' subset contains all samples with a temperature of 20 or above, and the 'cell_5' subset contains all samples where cell_id is 5. Note that subsets can contain overlapping samples if the split conditions are not carefully defined (here, any sample in 'cell_5' also falls into one of the two temperature subsets). A UserWarning is raised when this happens.
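```python
from modularml.core.splitters import ConditionSplitter

ConditionSplitter(
    low_temp={"temperature": lambda x: x < 20},   # temperature below 20
    high_temp={"temperature": lambda x: x >= 20}, # temperature of 20 or above
    cell_5={"cell_id": 5},                        # cell_id equal to 5
)
```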
- __init__(**conditions: Dict[str, Dict[str, Any | List[Any] | Callable]])
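To make the filtering and overlap behaviour described above concrete, here is a minimal standalone sketch in plain Python. It does not use or reproduce the modularml implementation: `matches` and `split_by_conditions` are hypothetical helpers, samples are assumed to be plain dicts, and treating a list condition as "value is any of these" is an assumption about the FeatureSet.filter() format.

```python
import warnings
from typing import Any, Callable, Dict, List, Union

# A condition is a literal value, a list of allowed values, or a predicate.
Condition = Union[Any, List[Any], Callable[[Any], bool]]


def matches(value: Any, condition: Condition) -> bool:
    """Return True if `value` satisfies a single condition (hypothetical helper)."""
    if callable(condition):
        return bool(condition(value))
    if isinstance(condition, (list, tuple, set)):
        # Assumption: a list means "value is any of these".
        return value in condition
    return value == condition


def split_by_conditions(
    samples: List[Dict[str, Any]],
    **conditions: Dict[str, Condition],
) -> Dict[str, List[int]]:
    """Map each subset name to the indices of the samples that match it."""
    subsets: Dict[str, List[int]] = {}
    for name, rules in conditions.items():
        # A sample belongs to a subset only if it satisfies every rule.
        subsets[name] = [
            i for i, sample in enumerate(samples)
            if all(matches(sample[key], cond) for key, cond in rules.items())
        ]

    # Warn if any sample index lands in more than one subset.
    seen: set = set()
    overlapping: set = set()
    for indices in subsets.values():
        overlapping |= seen & set(indices)
        seen |= set(indices)
    if overlapping:
        warnings.warn(
            f"{len(overlapping)} sample(s) appear in more than one subset.",
            UserWarning,
        )
    return subsets


samples = [
    {"temperature": 15, "cell_id": 5},
    {"temperature": 25, "cell_id": 3},
    {"temperature": 20, "cell_id": 5},
]
subsets = split_by_conditions(
    samples,
    low_temp={"temperature": lambda x: x < 20},
    high_temp={"temperature": lambda x: x >= 20},
    cell_5={"cell_id": 5},
)
```

With these three samples, the two temperature subsets are disjoint, but both samples with cell_id 5 also appear in a temperature subset, so the sketch emits a UserWarning.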
- class modularml.core.splitters.RandomSplitter(ratios: Dict[str, float], seed: int = 42)
Bases: BaseSplitter
Creates a random splitter based on sample ratios.
- Parameters:
ratios (Dict[str, float]) – A mapping from subset name to the fraction of samples assigned to that subset. Following the signature above, it is passed as a single dict, e.g., RandomSplitter(ratios={"train": 0.5, "test": 0.5}). All values must add to exactly 1.0.
seed (int) – The seed for the random number generator.
- __init__(ratios: Dict[str, float], seed: int = 42)
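As an illustration of what a seeded, ratio-based split involves, here is a minimal standalone sketch in plain Python. It is not the modularml implementation: `random_split` is a hypothetical helper that works on sample indices, and checking the ratio sum with a floating-point tolerance, as well as giving the remainder to the last subset, are choices made for this sketch.

```python
import math
import random
from typing import Dict, List


def random_split(
    n_samples: int,
    ratios: Dict[str, float],
    seed: int = 42,
) -> Dict[str, List[int]]:
    """Randomly assign sample indices to named subsets (hypothetical helper)."""
    # Assumption: tolerate floating-point error when checking the sum.
    if not math.isclose(sum(ratios.values()), 1.0):
        raise ValueError("ratios must sum to 1.0")

    # A dedicated generator keeps the shuffle reproducible for a given seed.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    subsets: Dict[str, List[int]] = {}
    names = list(ratios)
    start = 0
    for k, name in enumerate(names):
        if k == len(names) - 1:
            # The last subset takes the remainder so every sample is assigned.
            end = n_samples
        else:
            end = min(start + round(ratios[name] * n_samples), n_samples)
        subsets[name] = indices[start:end]
        start = end
    return subsets


split = random_split(10, {"train": 0.7, "val": 0.2, "test": 0.1})
```

Calling `random_split` twice with the same seed returns the same assignment, which is the property a fixed default seed is meant to provide.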