API for module: spikeinterface.curation

Class: CurationSorting
  Docstring:
    Class that handles curation of a Sorting object.
    
    Parameters
    ----------
    sorting : BaseSorting
        The sorting object
    properties_policy : "keep" | "remove", default: "keep"
        Policy used to propagate properties after split and merge operations. If "keep" the properties will be
        passed to the new units (if the original units have the same value). If "remove" the new units will have
        an empty value for all the properties
    make_graph : bool
        True to keep a Networkx graph instance with the curation history
    
    Returns
    -------
    sorting : Sorting
        Sorting object with the selected units merged
  __init__(self, sorting, make_graph=False, properties_policy='keep')
  Method: draw_graph(self, **kwargs)
    Docstring:
      Draw the curation graph.
      
      Parameters
      ----------
      **kwargs : dict
          Keyword arguments for Networkx draw function
  Method: merge(self, units_to_merge, new_unit_id=None, delta_time_ms=0.4)
    Docstring:
      Merge a list of units into a new unit.
      
      Parameters
      ----------
      units_to_merge : list[str|int]
          List of unit ids to merge
      new_unit_id : int or str
          The new unit id. If None, a new unit id is automatically selected
      delta_time_ms : float or None, default: 0.4
          Time window, in ms, within which spikes are considered duplicates. If None, duplicates are not checked
  Method: redo(self)
    Docstring:
      Redo the last operation.
  Method: redo_available(self)
    Docstring:
      Check if redo is available.
      
      Returns
      -------
      bool
          True if redo is available
  Method: remove_empty_units(self)
    Docstring:
      Remove empty units.
  Method: remove_unit(self, unit_id)
    Docstring:
      Remove a unit.
      
      Parameters
      ----------
      unit_id : int or str
          The unit id to remove
  Method: remove_units(self, unit_ids)
    Docstring:
      Remove a list of units.
      
      Parameters
      ----------
      unit_ids : list[str|int]
          List of unit ids to remove
  Method: rename(self, renamed_unit_ids)
    Docstring:
      Rename a list of units.
      
      Parameters
      ----------
      renamed_unit_ids : list[str|int]
          List of new unit ids used to rename the existing units
  Method: select_units(self, unit_ids, renamed_unit_ids=None)
    Docstring:
      Select a list of units.
      
      Parameters
      ----------
      unit_ids : list[str|int]
          List of unit ids to select
      renamed_unit_ids : list or None, default: None
          List of new unit ids to rename the selected units
  Method: split(self, split_unit_id, indices_list, new_unit_ids=None)
    Docstring:
      Split a unit into multiple units.
      
      Parameters
      ----------
      split_unit_id : int or str
          The unit to split
      indices_list : list or np.array
          A list of index arrays selecting the spikes to split in each segment.
          Each array can contain more than 2 indices (e.g. for splitting in 3 or more units) and it should
          be the same length as the spike train (for each segment).
          If the sorting has only one segment, indices_list can be a single array
      new_unit_ids : list[str|int] or None
          List of new unit ids. If None, new unit ids are automatically selected
  Method: undo(self)
    Docstring:
      Undo the last operation.
  Method: undo_available(self)
    Docstring:
      Check if undo is available.
      
      Returns
      -------
      bool
          True if undo is available
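
  The undo/redo methods above suggest a linear history with a movable cursor. The sketch
  below shows that idea in plain Python. The `History` class and its attribute names are
  hypothetical and are not taken from the spikeinterface source; it only illustrates how
  `undo_available` and `redo_available` can follow from a cursor position.

```python
class History:
    """Hypothetical linear history with a cursor (not the library's code)."""

    def __init__(self, initial_state):
        self._states = [initial_state]
        self._index = 0  # position of the current state in the history

    def push(self, state):
        # A new operation discards any states that could have been redone.
        self._states = self._states[: self._index + 1]
        self._states.append(state)
        self._index += 1

    def undo_available(self):
        return self._index > 0

    def redo_available(self):
        return self._index < len(self._states) - 1

    def undo(self):
        if not self.undo_available():
            raise ValueError("No more undo")
        self._index -= 1

    def redo(self):
        if not self.redo_available():
            raise ValueError("No more redo")
        self._index += 1

    @property
    def current(self):
        return self._states[self._index]


# Toy states: tuples of unit ids before and after a merge.
history = History(initial_state=("u1", "u2", "u3"))
history.push(("u1", "u4"))  # e.g. after merging u2 and u3 into u4
history.undo()              # back to the three original units
```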

Class: MergeUnitsSorting
  Docstring:
    Class that handles several merges of units from a Sorting object based on a list of lists of unit_ids.
    
    Parameters
    ----------
    sorting : BaseSorting
        The sorting object
    units_to_merge : list/tuple of lists/tuples
        A list of lists for every merge group. Each element needs to have at least two elements (two units to merge),
        but it can also have more (merge multiple units at once).
    new_unit_ids : None or list
        New unit ids for the merged units. If given, it needs to have the same length as `units_to_merge`
    properties_policy : "keep" | "remove", default: "keep"
        Policy used to propagate properties. If "keep" the properties will be passed to the new units
        (if the units_to_merge have the same value). If "remove" the new units will have an empty
        value for all the properties of the new unit.
    delta_time_ms : float or None
        Number of ms to consider for duplicated spikes. None won't check for duplications
    
    Returns
    -------
    sorting : Sorting
        Sorting object with the selected units merged
  __init__(self, sorting, units_to_merge, new_unit_ids=None, properties_policy='keep', delta_time_ms=0.4)
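
  To make the role of `delta_time_ms` concrete, here is a minimal plain-Python sketch of
  merging spike trains with a duplicate check. It is an illustration under assumptions, not
  the library's implementation: spike times are given in ms (the library works on sample
  frames), and a spike is dropped when it falls within `delta_time_ms` of the previously
  kept spike. The function name `merge_spike_times` is hypothetical.

```python
def merge_spike_times(trains_ms, delta_time_ms=0.4):
    """Pool spike times (in ms) and drop near-coincident spikes (illustrative)."""
    pooled = sorted(t for train in trains_ms for t in train)
    if delta_time_ms is None:
        return pooled  # None: no duplicate check at all
    kept = []
    for t in pooled:
        # Assumed rule: skip a spike too close to the last kept spike.
        if kept and t - kept[-1] <= delta_time_ms:
            continue
        kept.append(t)
    return kept


unit_a = [10.0, 25.0, 40.0]
unit_b = [10.25, 30.0, 40.5]
merged = merge_spike_times([unit_a, unit_b], delta_time_ms=0.4)
merged_unchecked = merge_spike_times([unit_a, unit_b], delta_time_ms=None)
```

  In this toy case the spike at 10.25 ms is treated as a duplicate of the one at 10.0 ms,
  while 40.5 ms is kept because it is 0.5 ms away from 40.0 ms.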

Class: SplitUnitSorting
  Docstring:
    Class that handles splitting of a unit. It creates a new Sorting object linked to parent_sorting.
    
    Parameters
    ----------
    sorting : BaseSorting
        The sorting object
    split_unit_id : int
        Unit id of the unit to split
    indices_list : list or np.array
        A list of index arrays selecting the spikes to split in each segment.
        Each array can contain more than 2 indices (e.g. for splitting in 3 or more units) and it should
        be the same length as the spike train (for each segment).
        If the sorting has only one segment, indices_list can be a single array
    new_unit_ids : list[str|int] or None
        Unit ids of the new units to be created
    properties_policy : "keep" | "remove", default: "keep"
        Policy used to propagate properties. If "keep" the properties will be passed to the new units
        (if the units_to_merge have the same value). If "remove" the new units will have an empty
        value for all the properties of the new unit
    
    Returns
    -------
    sorting : Sorting
        Sorting object with the selected units split
  __init__(self, sorting, split_unit_id, indices_list, new_unit_ids=None, properties_policy='keep')
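
  The `indices_list` argument is easiest to read as one integer label per spike. The
  sketch below, for a single segment, groups a spike train by such labels. It is a
  plain-Python illustration only; `split_spike_train` is a hypothetical helper and the
  ordering of the resulting sub-trains by label value is an assumption.

```python
def split_spike_train(spike_train, indices):
    """Group spikes by their split label (illustrative, single segment)."""
    if len(indices) != len(spike_train):
        raise ValueError("indices must contain one label per spike")
    groups = {}
    for spike, label in zip(spike_train, indices):
        groups.setdefault(label, []).append(spike)
    # One sub-train per distinct label, ordered by label value.
    return [groups[label] for label in sorted(groups)]


spike_train = [12, 48, 95, 130, 210, 260]  # spike frames of the unit to split
indices = [0, 1, 0, 2, 1, 0]               # three labels give three new units
sub_trains = split_spike_train(spike_train, indices)
```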

Function: apply_curation(sorting_or_analyzer, curation_dict, censor_ms=None, new_id_strategy='append', merging_mode='soft', sparsity_overlap=0.75, verbose=False, **job_kwargs)
  Docstring:
    Apply curation dict to a Sorting or a SortingAnalyzer.
    
    Steps are done in this order:
      1. Apply removal using curation_dict["removed_units"]
      2. Apply merges using curation_dict["merge_unit_groups"]
      3. Set labels using curation_dict["manual_labels"]
    
    A new Sorting or SortingAnalyzer (in memory) is returned.
    The user (an adult) has the responsibility to save it somewhere (or not).
    
    Parameters
    ----------
    sorting_or_analyzer : Sorting | SortingAnalyzer
        The Sorting or SortingAnalyzer object to curate.
    curation_dict : dict
        The curation dict.
    censor_ms : float | None, default: None
        When applying the merges, any consecutive spikes within the `censor_ms` are removed. This can be thought of
        as the desired refractory period. If `censor_ms=None`, no spikes are discarded.
    new_id_strategy : "append" | "take_first", default: "append"
        The strategy that should be used, if `new_unit_ids` is None, to create new unit_ids.
    
            * "append" : new_unit_ids will be added at the end of max(sorting.unit_ids)
            * "take_first" : new_unit_ids will be the first unit_id of every list of merges
    merging_mode : "soft" | "hard", default: "soft"
        How merges are performed for SortingAnalyzer. If `merging_mode` is "soft", merges will be approximated, with no
        reloading of the waveforms. If `merging_mode` is "hard", recomputations are accurately
        performed, reloading waveforms if needed
    sparsity_overlap : float, default: 0.75
        The percentage of overlap that units should share in order to accept merges. If this criterion is not
        achieved, soft merging will not be possible and an error will be raised. This is for use with a SortingAnalyzer input.
    verbose : bool, default: False
        If True, output is verbose
    **job_kwargs : dict
        Job keyword arguments for `merge_units`
    
    Returns
    -------
    sorting_or_analyzer : Sorting | SortingAnalyzer
        The curated object.
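
  The documented order of operations (removal, then merges, then labels) and the two
  `new_id_strategy` options can be sketched on a plain list of unit ids. This is an
  illustration under assumptions, not the library's code: unit ids are integers, the
  "append" strategy is read as `max(unit_ids) + 1`, and each `manual_labels` entry is
  assumed to look like `{"unit_id": ..., "quality": ...}`. The function name
  `apply_curation_sketch` is hypothetical.

```python
def apply_curation_sketch(unit_ids, curation_dict, new_id_strategy="append"):
    """Illustrate the documented order: remove, then merge, then label."""
    # 1. Removal
    removed = set(curation_dict.get("removed_units", []))
    units = [u for u in unit_ids if u not in removed]

    # 2. Merges
    next_id = max(unit_ids) + 1  # assumes integer unit ids
    for group in curation_dict.get("merge_unit_groups", []):
        if new_id_strategy == "append":
            units = [u for u in units if u not in group] + [next_id]
            next_id += 1
        elif new_id_strategy == "take_first":
            # The merged unit reuses the first id of the group.
            units = [u for u in units if u not in group[1:]]
        else:
            raise ValueError(f"Unknown new_id_strategy: {new_id_strategy}")

    # 3. Labels (the entry layout is an assumption of this sketch)
    labels = {
        entry["unit_id"]: entry["quality"]
        for entry in curation_dict.get("manual_labels", [])
        if entry["unit_id"] in units
    }
    return units, labels


curation = {
    "removed_units": [3],
    "merge_unit_groups": [[1, 2]],
    "manual_labels": [{"unit_id": 4, "quality": "good"}],
}
units_append, labels_append = apply_curation_sketch([1, 2, 3, 4], curation)
units_first, _ = apply_curation_sketch([1, 2, 3, 4], curation, "take_first")
```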

Function: apply_sortingview_curation(sorting_or_analyzer, uri_or_json, exclude_labels=None, include_labels=None, skip_merge=False, verbose=None)
  Docstring:
    Apply curation from SortingView manual legacy curation format (before the official "curation_format")
    
    First, merges (if present) are applied. Then labels are loaded and units
    are optionally filtered based on exclude_labels and include_labels.
    
    Parameters
    ----------
    sorting_or_analyzer : Sorting | SortingAnalyzer
        The sorting or analyzer to be curated
    uri_or_json : str or Path
        The URI curation link from SortingView or the path to the curation json file
    exclude_labels : list, default: None
        Optional list of labels to exclude (e.g. ["reject", "noise"]).
        Mutually exclusive with include_labels
    include_labels : list, default: None
        Optional list of labels to include (e.g. ["accept"]).
        Mutually exclusive with exclude_labels
    skip_merge : bool, default: False
        If True, merges are not applied (only labels)
    verbose : None
        Deprecated
    
    
    Returns
    -------
    sorting_or_analyzer_curated : BaseSorting
        The curated sorting or analyzer
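
  The interplay of `include_labels` and `exclude_labels` can be sketched as a filter over
  a mapping from unit id to labels. This is a plain-Python illustration with assumed
  semantics: a unit is kept by `include_labels` if it carries at least one of the listed
  labels, and dropped by `exclude_labels` if it carries any of them. The helper name
  `filter_units_by_labels` is hypothetical.

```python
def filter_units_by_labels(unit_labels, include_labels=None, exclude_labels=None):
    """Select unit ids by label (illustrative semantics, see lead-in)."""
    if include_labels is not None and exclude_labels is not None:
        raise ValueError("include_labels and exclude_labels are mutually exclusive")
    kept = []
    for unit_id, labels in unit_labels.items():
        if include_labels is not None and not any(l in include_labels for l in labels):
            continue  # no included label on this unit
        if exclude_labels is not None and any(l in exclude_labels for l in labels):
            continue  # at least one excluded label on this unit
        kept.append(unit_id)
    return kept


unit_labels = {"a": ["accept"], "b": ["reject"], "c": ["accept", "noise"], "d": []}
accepted = filter_units_by_labels(unit_labels, include_labels=["accept"])
not_rejected = filter_units_by_labels(unit_labels, exclude_labels=["reject", "noise"])
```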

Function: auto_label_units(sorting_analyzer: spikeinterface.core.sortinganalyzer.SortingAnalyzer, model_folder=None, model_name=None, repo_id=None, label_conversion=None, trust_model=False, trusted=None, export_to_phy=False, enforce_metric_params=False)
  Docstring:
    Automatically labels units based on a model-based classification, either from a model
    hosted on HuggingFaceHub or one available in a local folder.
    
    This function returns the predicted labels and the prediction probabilities, and populates
    the sorting object with the predicted labels and probabilities in the 'classifier_label' and
    'classifier_probability' properties.
    
    Parameters
    ----------
    sorting_analyzer : SortingAnalyzer
        The sorting analyzer object containing the spike sorting results.
    model_folder : str or Path, default: None
        The path to the folder containing the model
    repo_id : str | Path, default: None
        Hugging face repo id which contains the model e.g. 'username/model'
    model_name : str | Path, default: None
        Filename of model e.g. 'my_model.skops'. If None, uses first model found.
    label_conversion : dict | None, default: None
        A dictionary for converting the predicted labels (which are integers) to custom labels. If None,
        tries to extract from `model_info.json` file. The dictionary should have the format {old_label: new_label}.
    export_to_phy : bool, default: False
        Whether to export the results to Phy format. Default is False.
    trust_model : bool, default: False
        Whether to trust the model. If True, the `trusted` parameter that is passed to `skops.load` to load the model will be
        automatically inferred. If False, the `trusted` parameter must be provided to indicate the trusted objects.
    trusted : list of str, default: None
        Passed to skops.load. The object will be loaded only if there are only trusted objects and objects of types listed in trusted in the dumped file.
    enforce_metric_params : bool, default: False
        If True and the parameters used to compute the metrics in `sorting_analyzer` are different from the parameters
        used to compute the metrics used to train the model, this function will raise an error. Otherwise, a warning is raised.
    
    
    Returns
    -------
    classified_units : pd.DataFrame
        A dataframe containing the classified units, indexed by the `unit_ids`, containing the predicted label
        and confidence probability of each labelled unit.
    
    Raises
    ------
    ValueError
        If the pipeline is not an instance of sklearn.pipeline.Pipeline.

Function: auto_merge_units(sorting_analyzer: 'SortingAnalyzer', presets: 'list | None' = ['similarity_correlograms'], steps_params: 'dict' = None, steps: 'list[str] | None' = None, recursive: 'bool' = False, censor_ms=None, sparsity_overlap=0.75, merging_mode='soft', new_id_strategy='append', raise_error: 'bool' = False, extra_outputs: 'bool' = False, force_copy: 'bool' = True, **job_kwargs) -> 'SortingAnalyzer'
  Docstring:
    Automatically finds and applies merges.
    This function enables one to launch several merging presets in sequence and also to apply each
    step recursively.
    Merges are applied one preset at a time, either once per preset or (if `recursive=True`) until no more
    merges are found, and extensions are not recomputed, because the merge itself updates them.
    Internally, the function calls _auto_merge_units_single_iteration() for every preset and/or combination of steps
    
    Parameters
    ----------
    sorting_analyzer : SortingAnalyzer
        The SortingAnalyzer
    presets : str or list, default: "similarity_correlograms"
        A single preset or a list of presets that should be applied iteratively to the data
    steps_params : dict or list of dict, default: None
        The params that should be used for the steps or presets. Should be a single dict if there is only one step,
        or a list of dicts if there are multiple steps (same size as presets)
    steps : list or list of list, default: None
        The list of steps that should be applied. If a list of lists is provided, then these lists will be applied
        iteratively. Mutually exclusive with presets
    recursive : bool, default: False
        If True, then each preset in the list is applied until no further merges can be done, before trying
        the next one
    censor_ms : None or float, default: None
        When merging units, any spikes violating this refractory period will be discarded.
    merging_mode : "soft" | "hard", default: "soft"
        How merges are performed. In the "soft" mode, merges will be approximated, with no smart merging
        of the extension data.
    sparsity_overlap : float, default: 0.75
        The percentage of overlap that units should share in order to accept merges. If this criterion is not
        achieved, soft merging will not be performed.
    new_id_strategy : "append" | "take_first", default: "append"
        The strategy that should be used, if `new_unit_ids` is None, to create new unit_ids.
    
            * "append" : new_unit_ids will be added at the end of max(sorting.unit_ids)
            * "take_first" : new_unit_ids will be the first unit_id of every list of merges
    raise_error : bool, default: False
        If True, an error is raised if the merges cannot be done. Otherwise, warnings are displayed
    extra_outputs : bool, default: False
        If True, additional list of merges applied at every preset, and dictionary (`outs`) with processed data are returned.
    force_copy : boolean, default: True
        When new extensions are computed, the default is to make a copy of the analyzer, to avoid overwriting
        already computed extensions. False if you want to overwrite
    
    IMPORTANT: internally, all computations rely on extensions of the analyzer, which are computed
    with default parameters if not present (e.g. correlograms, template_similarity, ...). If you want
    finer control over these values, please precompute the extensions before applying the auto_merge
    
    If you have errors on sparsity_overlap, this is because you are trying to perform soft_merges for units
    that are barely overlapping. While in theory this should not happen, if this is the case, it means that either
    you are trying to perform too aggressive merges (and thus check params), and/or that you should switch to hard merges.
    
    Returns
    -------
    sorting_analyzer:
        The new sorting analyzer where all the merges from all the presets have been applied
    merges, outs:
        Returned only when extra_outputs=True
        A list with all the merges performed at every steps, and dictionaries that contains data for debugging and plotting.

Function: compute_merge_unit_groups(sorting_analyzer: 'SortingAnalyzer', preset: 'str | None' = 'similarity_correlograms', resolve_graph: 'bool' = True, steps_params: 'dict' = None, compute_needed_extensions: 'bool' = True, extra_outputs: 'bool' = False, steps: 'list[str] | None' = None, force_copy: 'bool' = True, **job_kwargs) -> 'list[tuple[int | str, int | str]] | Tuple[list[tuple[int | str, int | str]], dict]'
  Docstring:
    Algorithm to find and check potential merges between units.
    
    The merges are proposed based on a series of steps with different criteria:
    
        * "num_spikes": enough spikes are found in each unit for computing the correlogram (`min_spikes`)
        * "snr": the SNR of the units is above a threshold (`min_snr`)
        * "remove_contaminated": each unit is not contaminated (by checking auto-correlogram - `contamination_thresh`)
        * "unit_locations": estimated unit locations are close enough (`max_distance_um`)
        * "correlogram": the cross-correlograms of the two units are similar to each auto-correlogram (`corr_diff_thresh`)
        * "template_similarity": the templates of the two units are similar (`template_diff_thresh`)
        * "presence_distance": the presence of the units is complementary in time (`presence_distance_thresh`)
        * "cross_contamination": the cross-contamination is not significant (`cc_thresh` and `p_value`)
        * "knn": the two units are close in the feature space
        * "quality_score": the unit "quality score" is increased after the merge
    
    The "quality score" factors in the increase in firing rate (**f**) due to the merge and a possible increase in
    contamination (**C**), weighted by a factor **k** (`firing_contamination_balance`).
    
    .. math::
    
        Q = f(1 - (k + 1)C)
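
    As a worked instance of the formula above, the sketch below evaluates Q before and after a
    hypothetical merge. The firing rates and contamination values are made-up numbers, and the
    default k = 1.5 is the `firing_contamination_balance` default shown in the
    `get_potential_auto_merge` signature. The function name `quality_score` is illustrative.

```python
def quality_score(firing_rate, contamination, firing_contamination_balance=1.5):
    """Evaluate Q = f * (1 - (k + 1) * C) for one unit."""
    k = firing_contamination_balance
    return firing_rate * (1.0 - (k + 1.0) * contamination)


# Made-up values: the merge raises the firing rate but also the contamination.
q_before = quality_score(firing_rate=5.0, contamination=0.04)
q_after = quality_score(firing_rate=8.0, contamination=0.1)
merge_improves_score = q_after > q_before
```

    With these numbers Q goes from about 4.5 to about 6.0, which is the kind of increase the
    "quality_score" step looks for.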
    
    IMPORTANT: internally, all computations rely on extensions of the analyzer, which are computed
    with default parameters if not present (e.g. correlograms, template_similarity, ...). If you want
    finer control over these values, please precompute the extensions before applying the auto_merge
    
    Parameters
    ----------
    sorting_analyzer : SortingAnalyzer
        The SortingAnalyzer
    preset : "similarity_correlograms" | "x_contaminations" | "temporal_splits" | "feature_neighbors" | None, default: "similarity_correlograms"
        The preset to use for the auto-merge. Presets combine different steps into a recipe and focus on:
    
        * | "similarity_correlograms": mainly focused on template similarity and correlograms.
          | It uses the following steps: "num_spikes", "remove_contaminated", "unit_locations",
          | "template_similarity", "correlogram", "quality_score"
        * | "x_contaminations": similar to "similarity_correlograms", but checks for cross-contamination instead of correlograms.
          | It uses the following steps: "num_spikes", "remove_contaminated", "unit_locations",
          | "template_similarity", "cross_contamination", "quality_score"
        * | "temporal_splits": focused on finding temporal splits using presence distance.
          | It uses the following steps: "num_spikes", "remove_contaminated", "unit_locations",
          | "template_similarity", "presence_distance", "quality_score"
        * | "feature_neighbors": focused on finding unit pairs whose spikes are close in the feature space using kNN.
          | It uses the following steps: "num_spikes", "snr", "remove_contaminated", "unit_locations",
          | "knn", "quality_score"
    
        If `preset` is None, you can specify the steps manually with the `steps` parameter.
    resolve_graph : bool, default: True
        If True, the function resolves the potential unit pairs to be merged into multiple-unit merges.
    compute_needed_extensions : bool, default: True
        If True, the needed extensions are computed if they are not already available.
    extra_outputs : bool, default: False
        If True, an additional dictionary (`outs`) with processed data is returned.
    steps : None or list of str, default: None
        Which steps to run, if no preset is used.
        Potential steps : "num_spikes", "snr", "remove_contaminated", "unit_locations", "correlogram",
        "template_similarity", "presence_distance", "cross_contamination", "knn", "quality_score"
        Please check steps explanations above!
    steps_params : dict
        A dictionary whose keys are the steps and whose values are the step parameters.
    force_copy : boolean, default: True
        When new extensions are computed, the default is to make a copy of the analyzer, to avoid overwriting
        already computed extensions. False if you want to overwrite
    
    Returns
    -------
    merge_unit_groups:
        List of groups that need to be merged.
        When `resolve_graph` is True (default), a list of tuples of 2+ elements is returned.
        If `resolve_graph` is False, a list of tuples of 2 elements is returned instead.
    outs:
        Returned only when extra_outputs=True
        A dictionary that contains data for debugging and plotting.
    
    References
    ----------
    This function was originally inspired by and built upon similar functions from Lussac [Llobet]_,
    done by Aurelien Wyngaard and Victor Llobet.
    https://github.com/BarbourLab/lussac/blob/v1.0.0/postprocessing/merge_units.py
    
    However, it has been greatly consolidated and refined depending on the presets.
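
  The effect of `resolve_graph` (turning pairwise candidates into multi-unit groups) can be
  pictured as finding connected components in a graph whose edges are the candidate pairs.
  The sketch below does this in plain Python. Treating resolution as connected components
  is an assumption of this example, and `resolve_pairs` is a hypothetical helper, not a
  spikeinterface function.

```python
def resolve_pairs(pairs):
    """Group pairwise merge candidates into connected components (illustrative)."""
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    groups, seen = [], set()
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal collects every unit reachable from `start`.
        stack, component = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(neighbours[node] - seen)
        groups.append(tuple(sorted(component)))
    return groups


pairs = [(1, 2), (2, 3), (7, 8)]  # what resolve_graph=False might look like
groups = resolve_pairs(pairs)     # units 1, 2, 3 end up in a single group
```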

Function: curation_label_to_dataframe(curation_dict)
  Docstring:
    Transform the curation dict into a pandas dataframe.
    For a label category with exclusive=True: a single column is created and its values are the unique label.
    For a label category with exclusive=False: one column per possible label is created and its values are boolean.
    
    If exclusive=False and the same label appears several times then an error is raised.
    
    Parameters
    ----------
    curation_dict : dict
        A curation dictionary
    
    Returns
    -------
    labels : pd.DataFrame
        dataframe with labels.
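
  The column layout described above can be sketched without pandas, using one dict per
  unit as a stand-in for a dataframe row. The input layout is an assumption of this
  example: `label_definitions` maps each category to `{"label_options": [...],
  "exclusive": bool}` and each `manual_labels` entry holds a `"unit_id"` plus a list of
  labels per category. Error handling is omitted and `labels_to_columns` is a
  hypothetical helper.

```python
def labels_to_columns(label_definitions, manual_labels):
    """Build one row (dict) per unit following the exclusive / non-exclusive rule."""
    rows = {}
    for entry in manual_labels:
        row = {}
        for category, definition in label_definitions.items():
            values = entry.get(category, [])
            if definition["exclusive"]:
                # Exclusive category: a single column holding the label.
                row[category] = values[0] if values else None
            else:
                # Non-exclusive category: one boolean column per possible label.
                for option in definition["label_options"]:
                    row[option] = option in values
        rows[entry["unit_id"]] = row
    return rows


label_definitions = {
    "quality": {"label_options": ["good", "noise"], "exclusive": True},
    "tags": {"label_options": ["burst", "drift"], "exclusive": False},
}
manual_labels = [
    {"unit_id": 1, "quality": ["good"], "tags": ["burst"]},
    {"unit_id": 2, "tags": ["burst", "drift"]},
]
table = labels_to_columns(label_definitions, manual_labels)
```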

Class: curation_sorting
  Docstring:
    Class that handles curation of a Sorting object.
    
    Parameters
    ----------
    sorting : BaseSorting
        The sorting object
    properties_policy : "keep" | "remove", default: "keep"
        Policy used to propagate properties after split and merge operations. If "keep" the properties will be
        passed to the new units (if the original units have the same value). If "remove" the new units will have
        an empty value for all the properties
    make_graph : bool
        True to keep a Networkx graph instance with the curation history
    
    Returns
    -------
    sorting : Sorting
        Sorting object with the selected units merged
  __init__(self, sorting, make_graph=False, properties_policy='keep')
  Method: draw_graph(self, **kwargs)
    Docstring:
      Draw the curation graph.
      
      Parameters
      ----------
      **kwargs : dict
          Keyword arguments for Networkx draw function
  Method: merge(self, units_to_merge, new_unit_id=None, delta_time_ms=0.4)
    Docstring:
      Merge a list of units into a new unit.
      
      Parameters
      ----------
      units_to_merge : list[str|int]
          List of unit ids to merge
      new_unit_id : int or str
          The new unit id. If None, a new unit id is automatically selected
      delta_time_ms : float or None, default: 0.4
          Time window, in ms, within which spikes are considered duplicates. If None, duplicates are not checked
  Method: redo(self)
    Docstring:
      Redo the last operation.
  Method: redo_available(self)
    Docstring:
      Check if redo is available.
      
      Returns
      -------
      bool
          True if redo is available
  Method: remove_empty_units(self)
    Docstring:
      Remove empty units.
  Method: remove_unit(self, unit_id)
    Docstring:
      Remove a unit.
      
      Parameters
      ----------
      unit_id : int or str
          The unit id to remove
  Method: remove_units(self, unit_ids)
    Docstring:
      Remove a list of units.
      
      Parameters
      ----------
      unit_ids : list[str|int]
          List of unit ids to remove
  Method: rename(self, renamed_unit_ids)
    Docstring:
      Rename a list of units.
      
      Parameters
      ----------
      renamed_unit_ids : list[str|int]
          List of new unit ids used to rename the existing units
  Method: select_units(self, unit_ids, renamed_unit_ids=None)
    Docstring:
      Select a list of units.
      
      Parameters
      ----------
      unit_ids : list[str|int]
          List of unit ids to select
      renamed_unit_ids : list or None, default: None
          List of new unit ids to rename the selected units
  Method: split(self, split_unit_id, indices_list, new_unit_ids=None)
    Docstring:
      Split a unit into multiple units.
      
      Parameters
      ----------
      split_unit_id : int or str
          The unit to split
      indices_list : list or np.array
          A list of index arrays selecting the spikes to split in each segment.
          Each array can contain more than 2 indices (e.g. for splitting in 3 or more units) and it should
          be the same length as the spike train (for each segment).
          If the sorting has only one segment, indices_list can be a single array
      new_unit_ids : list[str|int] or None
          List of new unit ids. If None, new unit ids are automatically selected
  Method: undo(self)
    Docstring:
      Undo the last operation.
  Method: undo_available(self)
    Docstring:
      Check if undo is available.
      
      Returns
      -------
      bool
          True if undo is available

Function: find_duplicated_spikes(spike_train, censored_period: 'int', method: "'keep_first' | 'keep_last' | 'keep_first_iterative' | 'keep_last_iterative' | 'random'" = 'random', seed: 'Optional[int]' = None) -> 'np.ndarray'
  Docstring:
    Finds the indices where spikes should be considered duplicates.
    When two spikes are closer together than the censored period,
    one of them is taken out based on the method provided.
    
    Parameters
    ----------
    spike_train : np.ndarray
        The spike train on which to look for duplicated spikes.
    censored_period : int
        The censored period for duplicates (in sample time).
    method : "keep_first" | "keep_last" | "keep_first_iterative" | "keep_last_iterative" | "random", default: "random"
        Method used to remove the duplicated spikes.
    seed : int | None
        The seed to use if method="random".
    
    Returns
    -------
    indices_of_duplicates : np.ndarray
        The indices of spikes considered to be duplicates.
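
  One plausible reading of the "keep_first" method is sketched below in plain Python: for
  every pair of consecutive spikes that are too close, the later one is flagged. This is
  an illustration, not the library's implementation; treating a gap equal to
  `censored_period` as a duplicate is an assumption, and `find_duplicates_keep_first` is a
  hypothetical name.

```python
def find_duplicates_keep_first(spike_train, censored_period):
    """Return indices of spikes that follow their predecessor too closely."""
    return [
        i + 1  # flag the later spike, so the first of each pair is kept
        for i in range(len(spike_train) - 1)
        if spike_train[i + 1] - spike_train[i] <= censored_period
    ]


spike_train = [100, 103, 250, 251, 253, 400]  # sorted spike frames
duplicates = find_duplicates_keep_first(spike_train, censored_period=5)
```

  Note that the spikes at 251 and 253 are both flagged here because each is compared with
  its immediate predecessor; the iterative variants named in `method` presumably handle
  such runs differently.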

Function: find_redundant_units(sorting, delta_time: 'float' = 0.4, agreement_threshold=0.2, duplicate_threshold=0.8)
  Docstring:
    Finds redundant or duplicate units by comparing the sorting output with itself.
    
    Parameters
    ----------
    sorting : BaseSorting
        The input sorting object
    delta_time : float, default: 0.4
        The time in ms to consider matching spikes
    agreement_threshold : float, default: 0.2
        Threshold on the agreement scores to flag possible redundant/duplicate units
    duplicate_threshold : float, default: 0.8
        Final threshold on the portion of coincident events over the number of spikes above which the
        unit is flagged as duplicate/redundant
    
    Returns
    -------
    list
        The list of duplicate units
    list of 2-element lists
        The list of duplicate pairs

Function: get_default_classifier_search_spaces()
  Docstring:
    None

Function: get_potential_auto_merge(sorting_analyzer: 'SortingAnalyzer', preset: 'str | None' = 'similarity_correlograms', resolve_graph: 'bool' = False, min_spikes: 'int' = 100, min_snr: 'float' = 2, max_distance_um: 'float' = 150.0, corr_diff_thresh: 'float' = 0.16, template_diff_thresh: 'float' = 0.25, contamination_thresh: 'float' = 0.2, presence_distance_thresh: 'float' = 100, p_value: 'float' = 0.2, cc_thresh: 'float' = 0.1, censored_period_ms: 'float' = 0.3, refractory_period_ms: 'float' = 1.0, sigma_smooth_ms: 'float' = 0.6, adaptative_window_thresh: 'float' = 0.5, censor_correlograms_ms: 'float' = 0.15, firing_contamination_balance: 'float' = 1.5, k_nn: 'int' = 10, knn_kwargs: 'dict | None' = None, presence_distance_kwargs: 'dict | None' = None, extra_outputs: 'bool' = False, steps: 'list[str] | None' = None) -> 'list[tuple[int | str, int | str]] | Tuple[tuple[int | str, int | str], dict]'
  Docstring:
    This function is deprecated. Use compute_merge_unit_groups() instead.
    This will be removed in 0.103.0
    
    Algorithm to find and check potential merges between units.
    
    The merges are proposed based on a series of steps with different criteria:
    
        * "num_spikes": enough spikes are found in each unit for computing the correlogram (`min_spikes`)
        * "snr": the SNR of the units is above a threshold (`min_snr`)
        * "remove_contaminated": each unit is not contaminated (by checking auto-correlogram - `contamination_thresh`)
        * "unit_locations": estimated unit locations are close enough (`max_distance_um`)
        * "correlogram": the cross-correlograms of the two units are similar to each auto-correlogram (`corr_diff_thresh`)
        * "template_similarity": the templates of the two units are similar (`template_diff_thresh`)
        * "presence_distance": the presence of the units is complementary in time (`presence_distance_thresh`)
        * "cross_contamination": the cross-contamination is not significant (`cc_thresh` and `p_value`)
        * "knn": the two units are close in the feature space
        * "quality_score": the unit "quality score" is increased after the merge
    
    The "quality score" factors in the increase in firing rate (**f**) due to the merge and a possible increase in
    contamination (**C**), weighted by a factor **k** (`firing_contamination_balance`).
    
    .. math::
    
        Q = f(1 - (k + 1)C)
    
    IMPORTANT: internally, all computations rely on extensions of the analyzer, which are computed
    with default parameters if not present (e.g. correlograms, template_similarity, ...). If you want
    finer control over these values, please precompute the extensions before applying the auto_merge
    
    Parameters
    ----------
    sorting_analyzer : SortingAnalyzer
        The SortingAnalyzer
    preset : "similarity_correlograms" | "x_contaminations" | "temporal_splits" | "feature_neighbors" | None, default: "similarity_correlograms"
        The preset to use for the auto-merge. Presets combine different steps into a recipe and focus on:
    
        * | "similarity_correlograms": mainly focused on template similarity and correlograms.
          | It uses the following steps: "num_spikes", "remove_contaminated", "unit_locations",
          | "template_similarity", "correlogram", "quality_score"
        * | "x_contaminations": similar to "similarity_correlograms", but checks for cross-contamination instead of correlograms.
          | It uses the following steps: "num_spikes", "remove_contaminated", "unit_locations",
          | "template_similarity", "cross_contamination", "quality_score"
        * | "temporal_splits": focused on finding temporal splits using presence distance.
          | It uses the following steps: "num_spikes", "remove_contaminated", "unit_locations",
          | "template_similarity", "presence_distance", "quality_score"
        * | "feature_neighbors": focused on finding unit pairs whose spikes are close in the feature space using kNN.
          | It uses the following steps: "num_spikes", "snr", "remove_contaminated", "unit_locations",
          | "knn", "quality_score"
    
        If `preset` is None, you can specify the steps manually with the `steps` parameter.
    resolve_graph : bool, default: False
        If True, the function resolves the potential unit pairs to be merged into multiple-unit merges.
    min_spikes : int, default: 100
        Minimum number of spikes for each unit to consider a potential merge.
        Enough spikes are needed to estimate the correlogram
    min_snr : float, default: 2
        Minimum Signal to Noise ratio for templates to be considered while merging
    max_distance_um : float, default: 150
        Maximum distance between units for considering a merge
    corr_diff_thresh : float, default: 0.16
        The threshold on the "correlogram distance metric" for considering a merge.
        It needs to be between 0 and 1
    template_diff_thresh : float, default: 0.25
        The threshold on the "template distance metric" for considering a merge.
        It needs to be between 0 and 1
    contamination_thresh : float, default: 0.2
        Contamination threshold above which a unit is not taken into account.
    presence_distance_thresh : float, default: 100
        Parameter to control how present two units should be simultaneously.
    p_value : float, default: 0.2
        The p-value threshold for the cross-contamination test.
    cc_thresh : float, default: 0.1
        The threshold on the cross-contamination for considering a merge.
    censored_period_ms : float, default: 0.3
        Used to compute the refractory period violations aka "contamination".
    refractory_period_ms : float, default: 1
        Used to compute the refractory period violations aka "contamination".
    sigma_smooth_ms : float, default: 0.6
        Parameters to smooth the correlogram estimation.
    adaptative_window_thresh : float, default: 0.5
        Parameter to detect the window size in correlogram estimation.
    censor_correlograms_ms : float, default: 0.15
        The period to censor on the auto and cross-correlograms.
    firing_contamination_balance : float, default: 1.5
        Parameter to control the balance between firing rate and contamination in computing unit "quality score".
    k_nn : int, default: 5
        The number of neighbors to consider for every spike in the recording.
    knn_kwargs : dict, default: None
        The dict of extra params to be passed to knn.
    extra_outputs : bool, default: False
        If True, an additional dictionary (`outs`) with processed data is returned.
    steps : None or list of str, default: None
        Which steps to run, if no preset is used.
        Potential steps: "num_spikes", "snr", "remove_contaminated", "unit_locations", "correlogram",
        "template_similarity", "presence_distance", "cross_contamination", "knn", "quality_score"
        Please check steps explanations above!
    presence_distance_kwargs : None|dict, default: None
        A dictionary of kwargs to be passed to compute_presence_distance().
    
    Returns
    -------
    potential_merges:
        A list of tuples of 2 elements (if `resolve_graph` is false) or 2+ elements (if `resolve_graph` is true).
        List of pairs that could be merged.
    outs:
        Returned only when extra_outputs=True
        A dictionary that contains data for debugging and plotting.
    
    References
    ----------
    This function is inspired by and built upon similar functions from Lussac [Llobet]_,
    done by Aurelien Wyngaard and Victor Llobet.
    https://github.com/BarbourLab/lussac/blob/v1.0.0/postprocessing/merge_units.py
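
The "quality score" formula and the `resolve_graph` option described in the docstring above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper names `quality_score` and `resolve_pairs` are hypothetical, the numbers are made up, and two behaviours are assumptions. These are that a merge counts as an improvement when the merged score exceeds both original scores, and that resolving the graph means grouping pairs that share a unit.

```python
def quality_score(firing_rate, contamination, k=1.5):
    """Q = f * (1 - (k + 1) * C), with k playing the role of firing_contamination_balance."""
    return firing_rate * (1.0 - (k + 1.0) * contamination)


def resolve_pairs(pairs):
    """Group pairwise merges that share a unit (connected components of the pair graph)."""
    groups = []
    for a, b in pairs:
        # Fuse every existing group touched by this pair into a single group.
        touched = [g for g in groups if a in g or b in g]
        fused = {a, b}.union(*touched)
        groups = [g for g in groups if g not in touched] + [fused]
    return [tuple(sorted(g)) for g in groups]


# Hypothetical firing rates (Hz) and contamination values, for illustration only.
q_unit_a = quality_score(4.0, 0.05)
q_unit_b = quality_score(3.0, 0.04)
q_merged = quality_score(7.0, 0.06)

# Assumption: the merge is kept only if the merged score beats both original scores.
merge_improves_score = q_merged > max(q_unit_a, q_unit_b)

# Pairs (1, 2) and (2, 3) share unit 2, so they resolve to one three-unit group.
merge_groups = resolve_pairs([(1, 2), (2, 3), (7, 8)])
print(merge_improves_score, merge_groups)
```

With `resolve_graph=False` the result would stay a list of 2-element tuples; the grouped form corresponds to the "2+ elements" case in the Returns section.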

Function: load_model(model_folder=None, repo_id=None, model_name=None, trust_model=False, trusted=None)
  Docstring:
    Loads a model and model_info from a HuggingFaceHub repo or a local folder.
    
    Parameters
    ----------
    model_folder : str or Path, default: None
        The path to the folder containing the model
    repo_id : str | Path, default: None
        Hugging face repo id which contains the model e.g. 'username/model'
    model_name: str | Path, default: None
        Filename of model e.g. 'my_model.skops'. If None, uses first model found.
    trust_model : bool, default: False
        Whether to trust the model. If True, the `trusted` parameter that is passed to `skops.load` to load the model will be
        automatically inferred. If False, the `trusted` parameter must be provided to indicate the trusted objects.
    trusted : list of str, default: None
        Passed to skops.load. The object will be loaded only if there are only trusted objects and objects of types listed in trusted in the dumped file.
    
    
    Returns
    -------
    model, model_info
        A model and metadata about the model

Class: merge_units_sorting
  Docstring:
    Class that handles several merges of units from a Sorting object based on a list of lists of unit_ids.
    
    Parameters
    ----------
    sorting : BaseSorting
        The sorting object
    units_to_merge : list/tuple of lists/tuples
        A list of lists for every merge group. Each element needs to have at least two elements (two units to merge),
        but it can also have more (merge multiple units at once).
    new_unit_ids : None or list
        New unit ids for the merged units. If given, it needs to have the same length as `units_to_merge`
    properties_policy : "keep" | "remove", default: "keep"
        Policy used to propagate properties. If "keep" the properties will be passed to the new units
        (if the units_to_merge have the same value). If "remove" the new units will have an empty
        value for all the properties of the new unit.
    delta_time_ms : float or None
        Number of ms to consider for duplicated spikes. None won't check for duplications
    
    Returns
    -------
    sorting : Sorting
        Sorting object with the selected units merged
  __init__(self, sorting, units_to_merge, new_unit_ids=None, properties_policy='keep', delta_time_ms=0.4)
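
The role of `delta_time_ms` can be sketched in pure Python, working on spike times expressed in samples. This is an illustration only, not the class's actual code: `merge_spike_trains` is a hypothetical helper, and the tie-breaking rule (keep the earlier spike, drop spikes closer than or equal to the window) is an assumption.

```python
import heapq


def merge_spike_trains(trains, sampling_frequency, delta_time_ms=0.4):
    """Merge sorted spike trains (in samples), dropping near-coincident spikes."""
    merged = list(heapq.merge(*trains))
    if delta_time_ms is None:
        # None means no duplicate check, as stated in the docstring.
        return merged
    delta_samples = int(delta_time_ms * sampling_frequency / 1000.0)
    kept = []
    for t in merged:
        # Assumption: keep the earlier spike and drop any spike too close to it.
        if not kept or t - kept[-1] > delta_samples:
            kept.append(t)
    return kept


unit_a = [100, 5000, 9000]
unit_b = [104, 7000, 9020]

# At 30 kHz, 0.4 ms is 12 samples: 104 is dropped (4 samples after 100),
# while 9020 is kept (20 samples after 9000).
merged_checked = merge_spike_trains([unit_a, unit_b], sampling_frequency=30000)
merged_unchecked = merge_spike_trains([unit_a, unit_b], 30000, delta_time_ms=None)
print(merged_checked, merged_unchecked)
```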

Class: remove_duplicated_spikes
  Docstring:
    Class to remove duplicated spikes from the spike trains.
    Spikes are considered duplicated if they are less than x
    ms apart where x is the censored period.
    
    Parameters
    ----------
    sorting : BaseSorting
        The parent sorting.
    censored_period_ms : float
        The censored period to consider 2 spikes to be duplicated (in ms).
    method : "keep_first" | "keep_last" | "keep_first_iterative" | "keep_last_iterative" | "random", default: "keep_first"
        Method used to remove the duplicated spikes.
        If method = "random", will randomly choose to remove the first or last spike.
        If method = "keep_first", for each ISI violation, will remove the second spike.
        If method = "keep_last", for each ISI violation, will remove the first spike.
        If method = "keep_first_iterative", will iteratively keep the first spike and remove the following violations.
        If method = "keep_last_iterative", does the same as "keep_first_iterative" but starting from the end.
        In the iterative methods, if there is a triplet A, B, C where (A, B) and (B, C) are in the censored period
        (but not (A, C)), then only B is removed. In the non-iterative methods, however, only one spike remains.
    
    Returns
    -------
    sorting_without_duplicated_spikes : RemoveDuplicatedSpikesSorting
        The sorting without any duplicated spikes.
  __init__(self, sorting: 'BaseSorting', censored_period_ms: 'float' = 0.3, method: 'str' = 'keep_first') -> 'None'
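
The difference between the non-iterative and iterative methods on the A, B, C triplet described above can be sketched as follows. The two helpers are hypothetical pure-Python stand-ins written from the docstring, not the library's code, and they work on spike times in samples.

```python
def keep_first(spikes, censored_samples):
    """Non-iterative: for each ISI violation, drop the second spike of the pair."""
    return [
        t
        for i, t in enumerate(spikes)
        # A spike is dropped when it is less than the censored period after its predecessor.
        if i == 0 or t - spikes[i - 1] >= censored_samples
    ]


def keep_first_iterative(spikes, censored_samples):
    """Iterative: compare each spike with the last spike that was actually kept."""
    kept = []
    for t in spikes:
        if not kept or t - kept[-1] >= censored_samples:
            kept.append(t)
    return kept


# Triplet A=0, B=6, C=12 with a censored period of 9 samples (0.3 ms at 30 kHz):
# (A, B) and (B, C) are violations, (A, C) is not. The spike at 100 is unrelated.
spikes = [0, 6, 12, 100]
non_iterative = keep_first(spikes, 9)
iterative = keep_first_iterative(spikes, 9)
print(non_iterative, iterative)
```

In the non-iterative case only A survives from the triplet, while the iterative variant removes only B.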

Function: remove_excess_spikes(sorting: 'BaseSorting', recording: 'BaseRecording')
  Docstring:
    Remove excess spikes from the spike trains.
    Excess spikes are those whose sample index exceeds the number of samples of the recording, for each segment.
    
    Parameters
    ----------
    sorting : BaseSorting
        The parent sorting.
    recording : BaseRecording
        The recording to use to get the number of samples.
    
    Returns
    -------
    sorting_without_excess_spikes : Sorting
        The sorting without any excess spikes.
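
A minimal sketch of the idea, using plain lists instead of sorting and recording objects. The helper `drop_excess_spikes` and the sample counts are hypothetical; the only assumption about behaviour is that valid sample indices run from 0 to the segment's number of samples minus one.

```python
def drop_excess_spikes(spikes_per_segment, num_samples_per_segment):
    """Keep, for each segment, only the spikes that fall inside the recording."""
    cleaned = []
    for spikes, num_samples in zip(spikes_per_segment, num_samples_per_segment):
        # Valid sample indices run from 0 to num_samples - 1.
        cleaned.append([t for t in spikes if t < num_samples])
    return cleaned


# Two segments of 3000 and 2000 samples; 3000 and 3050 exceed the first segment.
spikes_per_segment = [[10, 2999, 3000, 3050], [5, 1999]]
cleaned = drop_excess_spikes(spikes_per_segment, [3000, 2000])
print(cleaned)
```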

Function: remove_redundant_units(sorting_or_sorting_analyzer, align=True, unit_peak_shifts=None, delta_time=0.4, agreement_threshold=0.2, duplicate_threshold=0.8, remove_strategy='minimum_shift', peak_sign='neg', extra_outputs=False) -> 'BaseSorting'
  Docstring:
    Removes redundant or duplicate units by comparing the sorting output with itself.
    
    When a redundant pair is found, there are several strategies to choose which unit is the best:
    
       * "minimum_shift"
       * "highest_amplitude"
       * "max_spikes"
    
    
    Parameters
    ----------
    sorting_or_sorting_analyzer : BaseSorting or SortingAnalyzer
        If SortingAnalyzer, the spike trains can be optionally realigned using the peak shift in the
        template to improve the matching procedure.
        If BaseSorting, the spike trains are not aligned.
    align : bool, default: True
        If True, spike trains are aligned (if a SortingAnalyzer is used)
    delta_time : float, default: 0.4
        The time in ms to consider matching spikes
    agreement_threshold : float, default: 0.2
        Threshold on the agreement scores to flag possible redundant/duplicate units
    duplicate_threshold : float, default: 0.8
        Final threshold on the portion of coincident events over the number of spikes above which the
        unit is removed
    remove_strategy : "minimum_shift" | "highest_amplitude" | "max_spikes", default: "minimum_shift"
        Which strategy to remove one of the two duplicated units:
    
            * "minimum_shift" : keep the unit with best peak alignment (minimum shift)
                             If shifts are equal then the "highest_amplitude" is used
            * "highest_amplitude" : keep the unit with the best amplitude on unshifted max.
            * "max_spikes" : keep the unit with more spikes
    
    peak_sign : "neg" | "pos" | "both", default: "neg"
        Used when remove_strategy="highest_amplitude"
    extra_outputs : bool, default: False
        If True, will return the redundant pairs.
    unit_peak_shifts : dict
        Dictionary mapping the unit_id to the unit's shift (in number of samples).
        A positive shift means the spike train is shifted back in time, while
        a negative shift means the spike train is shifted forward.
    
    Returns
    -------
    BaseSorting
        Sorting object without redundant units
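
How the two thresholds interact can be illustrated with a pure-Python sketch on spike times in samples. Everything here is an assumption made for illustration: the helper `count_coincidences`, the agreement definition (matches divided by the size of the union of both trains), and the use of the larger of the two coincidence fractions for the final test. The library's own matching procedure may differ.

```python
def count_coincidences(train_a, train_b, delta_samples):
    """Count spikes of two sorted trains that match within delta_samples (one-to-one)."""
    i = j = matches = 0
    while i < len(train_a) and j < len(train_b):
        diff = train_a[i] - train_b[j]
        if abs(diff) <= delta_samples:
            matches += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # spike in train_a is too early to match anything left in train_b
        else:
            j += 1  # spike in train_b is too early to match anything left in train_a
    return matches


train_a = [100, 500, 900, 1300]
train_b = [102, 498, 905, 2000, 2500]
matches = count_coincidences(train_a, train_b, delta_samples=12)

# Assumed agreement score: matches over the union of both trains.
agreement = matches / (len(train_a) + len(train_b) - matches)
# Portion of coincident events over the number of spikes, taken per unit.
coincident_fraction = max(matches / len(train_a), matches / len(train_b))

flagged_as_possible_duplicate = agreement > 0.2   # agreement_threshold
removed_as_duplicate = coincident_fraction > 0.8  # duplicate_threshold
print(matches, agreement, coincident_fraction)
```

In this made-up case the pair is flagged by the agreement threshold but stays below the duplicate threshold, so neither unit would be removed.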

Class: split_unit_sorting
  Docstring:
    Class that handles splitting of a unit. It creates a new Sorting object linked to parent_sorting.
    
    Parameters
    ----------
    sorting : BaseSorting
        The sorting object
    split_unit_id : int
        Unit id of the unit to split
    indices_list : list or np.array
        A list of index arrays selecting the spikes to split in each segment.
        Each array can contain more than 2 indices (e.g. for splitting in 3 or more units) and it should
        be the same length as the spike train (for each segment).
        If the sorting has only one segment, indices_list can be a single array
    new_unit_ids : list or None, default: None
        Unit ids of the new units to be created
    properties_policy : "keep" | "remove", default: "keep"
        Policy used to propagate properties. If "keep" the properties of the split unit will be passed
        to the new units. If "remove" the new units will have an empty value for all the properties
    
    Returns
    -------
    sorting : Sorting
        Sorting object with the selected units split
  __init__(self, sorting, split_unit_id, indices_list, new_unit_ids=None, properties_policy='keep')
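
The shape of `indices_list` for a single segment can be sketched like this: one label per spike, with as many distinct labels as new units. The helper `split_spike_train` is hypothetical and only mirrors the description in the docstring; it does not build a Sorting object.

```python
def split_spike_train(spike_train, labels):
    """Split one spike train into sub-trains according to a per-spike label array."""
    if len(labels) != len(spike_train):
        raise ValueError("labels must have the same length as the spike train")
    parts = {}
    for t, label in zip(spike_train, labels):
        parts.setdefault(label, []).append(t)
    # Return the sub-trains ordered by label value.
    return [parts[label] for label in sorted(parts)]


spike_train = [10, 20, 30, 40, 50]
labels = [0, 1, 0, 2, 1]  # three distinct labels: split into three new units
sub_trains = split_spike_train(spike_train, labels)
print(sub_trains)
```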

Function: train_model(mode='analyzers', labels=None, analyzers=None, metrics_paths=None, folder=None, metric_names=None, imputation_strategies=None, scaling_techniques=None, classifiers=None, test_size=0.2, overwrite=False, seed=None, search_kwargs=None, verbose=True, enforce_metric_params=False, **job_kwargs)
  Docstring:
    Trains and evaluates machine learning models for spike sorting curation.
    
    This function initializes a ``CurationModelTrainer`` object, loads and preprocesses the data,
    and evaluates the specified combinations of imputation strategies, scaling techniques, and classifiers.
    The evaluation results, including the best model and its parameters, are saved to the output folder.
    
    Parameters
    ----------
    mode : ``"analyzers"`` | ``"csv"``, default: ``"analyzers"``
        Mode to use for training.
    analyzers : list of ``SortingAnalyzer`` | None, default: None
        List of ``SortingAnalyzer`` objects containing the quality metrics and labels to use for training,
        if using ``"analyzers"`` mode.
    labels : list of list | None, default: None
        List of curated labels for each unit; must be in the same order as the metrics data.
    metrics_paths : list of str or None, default: None
        List of paths to the CSV files containing the metrics data if using ``"csv"`` mode.
    folder : str | None, default: None
        The folder where outputs such as models and evaluation metrics will be saved.
    metric_names : list of str | None, default: None
        A list of metrics to use for training. If None, default metrics will be used.
    imputation_strategies : list of str | None, default: None
        A list of imputation strategies to try. Can be ``"knn"``, ``"iterative"``, or any allowed
        strategy passable to the ``sklearn.SimpleImputer``. If None, the default strategies
        ``["median", "most_frequent", "knn", "iterative"]`` will be used.
    scaling_techniques : list of str | None, default: None
        A list of scaling techniques to try. Can be ``"standard_scaler"``, ``"min_max_scaler"``,
        or ``"robust_scaler"``. If None, all techniques will be used.
    classifiers : list of str | dict | None, default: None
        A list of classifiers to evaluate. Optionally, a dictionary of classifiers and their
        hyperparameter search spaces can be provided. If None, default classifiers will be used.
        Check the ``get_classifier_search_space`` method for the default search spaces & format for custom spaces.
    test_size : float, default: 0.2
        Proportion of the dataset to include in the test split, passed to ``train_test_split`` from ``sklearn``.
    overwrite : bool, default: False
        Overwrites the ``folder`` if it already exists.
    seed : int | None, default: None
        Random seed for reproducibility. If None, a random seed will be generated.
    search_kwargs : dict or None, default: None
        Keyword arguments passed to ``BayesSearchCV`` or ``RandomizedSearchCV`` from ``sklearn``. If None, use
        ``search_kwargs = {'cv': 3, 'scoring': 'balanced_accuracy', 'n_iter': 25}``.
    verbose : bool, default: True
        If True, useful information is printed during training.
    enforce_metric_params : bool, default: False
        If True and metric parameters used to calculate metrics for different ``sorting_analyzer`` objects are
        different, an error will be raised.
    
    Returns
    -------
    CurationModelTrainer
        The ``CurationModelTrainer`` object used for training and evaluation.
    
    Notes
    -----
    This function handles the entire workflow of initializing the trainer, loading and preprocessing the data,
    and evaluating the models. The evaluation results are saved to the specified output folder.
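
To give a sense of how many pipelines are evaluated, the combinations of imputation strategy, scaling technique, and classifier can be enumerated with `itertools.product`. The imputation and scaling names below are the defaults listed in the docstring; the classifier names are hypothetical placeholders, since the default classifiers are not listed here, and the trainer's internal loop may be organised differently.

```python
from itertools import product

imputation_strategies = ["median", "most_frequent", "knn", "iterative"]
scaling_techniques = ["standard_scaler", "min_max_scaler", "robust_scaler"]
classifiers = ["classifier_a", "classifier_b"]  # hypothetical placeholder names

# One candidate pipeline per (imputation, scaling, classifier) triple.
combinations = list(product(imputation_strategies, scaling_techniques, classifiers))
print(len(combinations))
```

Restricting any of the three lists shrinks the search proportionally, which matters because each combination also runs a hyperparameter search.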

Function: validate_curation_dict(curation_dict)
  Docstring:
    Validate that the curation dictionary given as parameter complies with the format.
    
    The function does not return anything. It raises an error if something is wrong in the format.
    
    Parameters
    ----------
    curation_dict : dict
