dandelion.preprocessing.reassign_alleles

dandelion.preprocessing.reassign_alleles(data, combined_folder, v_germline=None, germline=None, org='human', v_field='v_call_genotyped', germ_types='dmask', novel=True, cloned=False, plot=True, figsize=(4, 3), sample_id_dictionary=None, verbose=False)

Correct allele calls based on a personalized genotype using tigger-reassignAlleles. It uses a subject-specific genotype to correct the preliminary allele assignments of a set of sequences derived from a single subject.

Parameters
  • data (Sequence) – list of data folders containing the .tsv files. If provided as a single string, it is first converted to a list, so the function can be run on a single sample or on multiple samples.

  • combined_folder (str, PathLike) – name of folder for concatenated data file and genotyped files.

  • v_germline (str, optional) – path to heavy chain V germline fasta. Defaults to the IGHV fasta in the $GERMLINE environment variable.

  • germline (str, optional) – path to germline database folder. Defaults to the $GERMLINE environment variable.

  • org (str) – organism of germline database. Default is ‘human’.

  • v_field (str) – name of column containing the germline V segment call. Default is ‘v_call_genotyped’ (AIRR format), the column to use after running tigger.

  • germ_types (str) – type of germline to use for reconstruction. Accepts one of: ‘full’, ‘dmask’, ‘vonly’, ‘region’. Default is ‘dmask’.

  • novel (bool) – whether or not to run novel allele discovery during tigger-genotyping. Default is True (yes).

  • cloned (bool) – whether or not to run CreateGermlines.py with --cloned. Default is False.

  • plot (bool) – whether or not to plot reassignment summary metrics. Default is True.

  • figsize (Tuple[Union[int,float], Union[int,float]]) – size of figure. Default is (4, 3).

  • sample_id_dictionary (dict, optional) – dictionary for creating a sample_id column in the concatenated file.

  • verbose (bool) – Whether or not to print the command used in the terminal. Default is False.
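
The handling of data and of the germline default described above can be sketched as follows. This is a minimal illustration, not dandelion's actual implementation: normalize_data and resolve_germline are hypothetical helper names, and the error raised when $GERMLINE is unset is an assumption.

```python
import os
from typing import List, Optional, Sequence, Union


def normalize_data(data: Union[str, Sequence[str]]) -> List[str]:
    """Hypothetical helper: wrap a single folder name in a list."""
    if isinstance(data, (str, os.PathLike)):
        return [os.fspath(data)]
    # Any other sequence of folders is copied into a plain list.
    return [os.fspath(d) for d in data]


def resolve_germline(germline: Optional[str] = None) -> str:
    """Hypothetical helper: fall back to $GERMLINE when no path is given."""
    if germline is not None:
        return germline
    try:
        return os.environ["GERMLINE"]
    except KeyError:
        # Assumed behaviour: fail loudly instead of guessing a location.
        raise KeyError(
            "Set the GERMLINE environment variable or pass germline explicitly."
        ) from None


print(normalize_data("sample1"))               # single sample
print(normalize_data(["sample1", "sample2"]))  # multiple samples
```

Passing a single string or a list therefore leads to the same downstream loop over folders.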

Returns

Individual V(D)J data files with a v_call_genotyped column containing the reassigned heavy chain V calls.
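
A call might look like the sketch below, which uses only the parameters listed in the signature above. The folder names sample1, sample2 and combined are placeholders, and the guard that skips the call when the folders or $GERMLINE are missing is an addition for illustration, not part of dandelion.

```python
import os

# Placeholder folder names; replace with the folders holding your .tsv files.
sample_folders = ["sample1", "sample2"]

# Keyword arguments taken from the documented signature.
options = dict(
    combined_folder="combined",  # placeholder output folder
    org="human",
    germ_types="dmask",
    novel=True,
    plot=False,
    verbose=True,
)

VALID_GERM_TYPES = {"full", "dmask", "vonly", "region"}
if options["germ_types"] not in VALID_GERM_TYPES:
    raise ValueError(f"germ_types must be one of {sorted(VALID_GERM_TYPES)}")

# Only attempt the call when the inputs exist and $GERMLINE is set,
# since v_germline and germline are left at their defaults here.
ready = all(os.path.isdir(f) for f in sample_folders) and "GERMLINE" in os.environ

if ready:
    from dandelion.preprocessing import reassign_alleles

    reassign_alleles(sample_folders, **options)
else:
    print("Skipping reassign_alleles: sample folders or $GERMLINE not found.")
```

Setting plot=False suppresses the reassignment summary plot, which can be convenient in non-interactive runs.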