dandelion.tools.define_clones
dandelion.tools.define_clones(self, dist=None, action='set', model='ham', norm='len', doublets='drop', fileformat='airr', ncpu=None, dirs=None, outFilePrefix=None, key_added=None, verbose=False)
Find clones using changeo’s DefineClones.py.
- Parameters
self (Dandelion, DataFrame, str) – Dandelion object, pandas DataFrame in changeo/airr format, or file path to changeo/airr file after clones have been determined.
dist (float, optional) – The distance threshold for clonal grouping. If None, the value will be retrieved from the Dandelion class .threshold slot.
action (str) – Specifies how to handle multiple V(D)J assignments for initial grouping. Default is ‘set’. The “first” action will use only the first gene listed. The “set” action will use all gene assignments and construct a larger gene grouping composed of any sequences sharing an assignment or linked to another sequence by a common assignment (similar to single-linkage).
model (str) – Specifies which substitution model to use for calculating distance between sequences. Default is ‘ham’. The “ham” model is nucleotide Hamming distance and “aa” is amino acid Hamming distance. The “hh_s1f” and “hh_s5f” models are human-specific single nucleotide and 5-mer content models, respectively, from Yaari et al., 2013. The “mk_rs1nf” and “mk_rs5nf” models are mouse-specific single nucleotide and 5-mer content models, respectively, from Cui et al., 2016. The “m1n_compat” and “hs1f_compat” models are deprecated models provided for backwards compatibility with the “m1n” and “hs1f” models in Change-O v0.3.3 and SHazaM v0.1.4. Both 5-mer models should be considered experimental.
norm (str) – Specifies how to normalize distances. Default is ‘len’. ‘none’ (do not normalize), ‘len’ (normalize by length), or ‘mut’ (normalize by number of mutations between sequences).
doublets (str) – Option to control behaviour when dealing with heavy chain ‘doublets’. Default is ‘drop’. ‘drop’ will filter out the doublets while ‘count’ will retain only the contig with the highest UMI count.
fileformat (str) – Format of V(D)J file/objects. Default is ‘airr’. Also accepts ‘changeo’.
ncpu (int, optional) – Number of CPUs for parallelization. Default is all available CPUs.
dirs (str, optional) – If specified, the output file will be written to this location.
outFilePrefix (str, optional) – If specified, the output file name will have this prefix. None defaults to ‘dandelion_define_clones’.
verbose (bool) – Whether or not to print the command used in terminal to call DefineClones.py. Default is False.
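The dist, model and norm parameters above work together: a pairwise distance is computed under the chosen model, optionally normalized, and compared against the threshold. The following is a minimal sketch of that idea for model='ham' with norm='none' and norm='len' only. The helper names hamming and normalized_distance, the example sequences and the threshold value are hypothetical and are not part of dandelion or Change-O; the actual DefineClones.py implementation may differ in detail, and norm='mut' is deliberately left out.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def normalized_distance(a: str, b: str, norm: str = "len") -> float:
    """Hamming distance, raw ('none') or divided by sequence length ('len')."""
    d = hamming(a, b)
    if norm == "none":
        return float(d)
    if norm == "len":
        # Two empty sequences are treated as identical to avoid dividing by zero.
        return d / len(a) if a else 0.0
    raise ValueError(f"norm not covered by this sketch: {norm!r}")


# Two made-up nucleotide sequences of length 12 that differ at 3 positions.
seq1 = "TGTGCGAGAGAT"
seq2 = "TGTGCAAGGGAC"

dist = 0.3  # illustrative threshold, not a recommended value

# Assumption: sequences at or below the threshold are grouped together.
same_clone = normalized_distance(seq1, seq2, norm="len") <= dist
```

With norm='len' the threshold is a fraction of the sequence length, so the same dist value can be applied to sequences of different lengths; with norm='none' it is an absolute number of mismatches.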
- Returns
Dandelion object with clone_id annotated in .data slot and .metadata initialized.
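The difference between the “first” and “set” values of the action parameter can be illustrated with a small single-linkage sketch. This is only an approximation of the behaviour described in the parameter list, written in plain Python with a union-find structure. The function group_by_gene, the sequence IDs and the gene calls are hypothetical, and the sketch groups on a single gene field for brevity rather than reproducing the full initial grouping done by DefineClones.py.

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_by_gene(calls: Dict[str, str], action: str = "set") -> List[Set[str]]:
    """Group sequence IDs by gene call.

    calls maps a sequence ID to a comma-separated (possibly ambiguous) gene call.
    action='first' uses only the first gene listed; action='set' links any
    sequences that share at least one gene, directly or through a chain.
    """
    if action not in ("first", "set"):
        raise ValueError(f"unknown action: {action!r}")

    parent = {seq_id: seq_id for seq_id in calls}

    def find(x: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen: Dict[str, str] = {}  # gene -> first sequence ID carrying it
    for seq_id, call in calls.items():
        genes = [g.strip() for g in call.split(",")]
        if action == "first":
            genes = genes[:1]
        for gene in genes:
            if gene in first_seen:
                # Merge this sequence's group with the group that has the gene.
                parent[find(seq_id)] = find(first_seen[gene])
            else:
                first_seen[gene] = seq_id

    groups: Dict[str, Set[str]] = defaultdict(set)
    for seq_id in calls:
        groups[find(seq_id)].add(seq_id)
    return sorted(groups.values(), key=sorted)


# Made-up calls: s1 is ambiguous between two genes.
calls = {
    "s1": "IGHV1-2,IGHV1-3",
    "s2": "IGHV1-3",
    "s3": "IGHV1-2",
    "s4": "IGHV3-23",
}

groups_first = group_by_gene(calls, action="first")
groups_set = group_by_gene(calls, action="set")
```

In this example “first” keeps s2 apart from s1 and s3 because only IGHV1-2 is considered for s1, whereas “set” merges s1, s2 and s3 into one group because s1 shares an assignment with each of the other two.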