modularml.core.splitters.conditon_splitter

Classes

ConditionSplitter(**conditions)

Splits samples into subsets based on user-defined filtering conditions.

class modularml.core.splitters.conditon_splitter.ConditionSplitter(**conditions: Dict[str, Dict[str, Any | List[Any] | Callable]])

Bases: BaseSplitter

Splits samples into subsets based on user-defined filtering conditions.

Parameters:

conditions (Dict[str, Dict[str, Union[Any, List[Any], Callable]]]) – The keys of the outer dict are the names of the new subsets. For each subset, an inner dict defines the filtering conditions used to construct it. The inner dict uses the same format as the FeatureSet.filter() method.

Examples: The example below defines three subsets (‘low_temp’, ‘high_temp’, and ‘cell_5’). The ‘low_temp’ subset contains all samples with temperature below 20, the ‘high_temp’ subset contains all samples with temperature of 20 or above, and the ‘cell_5’ subset contains all samples where cell_id is 5. Note that subsets can contain overlapping samples if the split conditions are not carefully defined. A UserWarning is raised when this happens.

``` python
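from modularml.core.splitters.conditon_splitter import ConditionSplitter

ConditionSplitter(
    low_temp={'temperature': lambda x: x < 20},    # callable condition
    high_temp={'temperature': lambda x: x >= 20},  # callable condition
    cell_5={'cell_id': 5},                         # exact-value condition
)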
```
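The matching and overlap behaviour described above can be illustrated with a small standalone sketch. This is not ModularML's implementation: `matches`, `split_by_conditions`, and the plain-dict records are hypothetical stand-ins, and the sketch assumes that a sample must satisfy every key in the inner dict and that a list condition means "value is one of these".

```python
import warnings
from collections import Counter
from typing import Any, Callable, Dict, List, Union

Condition = Union[Any, List[Any], Callable[[Any], bool]]


def matches(value: Any, condition: Condition) -> bool:
    """Check one value against an exact value, a list of values, or a callable."""
    if callable(condition):
        return bool(condition(value))
    if isinstance(condition, (list, tuple, set)):
        return value in condition
    return value == condition


def split_by_conditions(
    records: List[Dict[str, Any]],
    **conditions: Dict[str, Condition],
) -> Dict[str, List[str]]:
    """Hypothetical helper: map each subset name to the uuids of matching records."""
    subsets: Dict[str, List[str]] = {name: [] for name in conditions}
    for rec in records:
        for name, inner in conditions.items():
            # Assumption: all conditions of the inner dict must hold (logical AND).
            if all(k in rec and matches(rec[k], c) for k, c in inner.items()):
                subsets[name].append(rec["uuid"])

    # Warn if any uuid ended up in more than one subset.
    counts = Counter(u for ids in subsets.values() for u in ids)
    overlapping = sorted(u for u, n in counts.items() if n > 1)
    if overlapping:
        warnings.warn(
            f"Subsets overlap on samples: {overlapping}", UserWarning, stacklevel=2
        )
    return subsets


records = [
    {"uuid": "a", "temperature": 15, "cell_id": 1},
    {"uuid": "b", "temperature": 20, "cell_id": 5},
    {"uuid": "c", "temperature": 30, "cell_id": 2},
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = split_by_conditions(
        records,
        low_temp={"temperature": lambda x: x < 20},
        high_temp={"temperature": lambda x: x >= 20},
        cell_5={"cell_id": 5},
    )
```

In this sketch, record "b" lands in both ‘high_temp’ and ‘cell_5’, which is the kind of overlap the UserWarning is meant to flag. Making the conditions mutually exclusive (for example, adding a cell_id condition to the temperature subsets) avoids it.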

__init__(**conditions: Dict[str, Dict[str, Any | List[Any] | Callable]])

Splits samples into subsets based on user-defined filtering conditions.


split(samples: List[Sample]) → Dict[str, List[str]]

Applies the condition-based split on the given samples.

Parameters:

samples (List[Sample]) – The list of samples to split.

Returns:

Dictionary mapping each subset name to the list of Sample.uuid values of the samples in that subset.

Return type:

Dict[str, List[str]]
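Because the returned dictionary holds uuids rather than Sample objects, callers typically need to resolve them back to samples. The following is a minimal sketch of that lookup, assuming only that each sample exposes a `uuid` attribute; `FakeSample` and the hard-coded `split_result` are hypothetical stand-ins for real Sample objects and an actual `split()` return value.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeSample:
    """Hypothetical stand-in for a Sample; only `uuid` is assumed to exist."""
    uuid: str
    temperature: float


samples = [FakeSample("a", 15.0), FakeSample("b", 25.0), FakeSample("c", 30.0)]

# Same shape as the documented return type: Dict[str, List[str]].
split_result: Dict[str, List[str]] = {
    "low_temp": ["a"],
    "high_temp": ["b", "c"],
}

# Build a uuid index once, then resolve each subset's uuids to sample objects.
by_uuid = {s.uuid: s for s in samples}
resolved: Dict[str, List[FakeSample]] = {
    name: [by_uuid[u] for u in uuids] for name, uuids in split_result.items()
}
```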