dandelion.preprocessing.create_germlines

dandelion.preprocessing.create_germlines(self, germline=None, org='human', seq_field='sequence_alignment', v_field='v_call', d_field='d_call', j_field='j_call', germ_types='dmask', fileformat='airr', initialize_metadata=False)

Runs CreateGermlines.py to reconstruct the germline V(D)J sequence, from which the Ig lineage and mutations can be inferred.

Parameters
  • self (Dandelion, pd.DataFrame, str) – Dandelion object, pandas DataFrame in changeo/airr format, or file path to changeo/airr file after clones have been determined.

  • germline (str, optional) – path to germline database folder. Defaults to the $GERMLINE environment variable.

  • org (str) – organism of germline database. Default is ‘human’.

  • seq_field (str) – name of column containing the aligned sequence. Default is ‘sequence_alignment’ (airr).

  • v_field (str) – name of column containing the germline V segment call. Default is ‘v_call’ (airr).

  • d_field (str) – name of column containing the germline D segment call. Default is ‘d_call’ (airr).

  • j_field (str) – name of column containing the germline J segment call. Default is ‘j_call’ (airr).

  • germ_types (str) – type(s) of germline to include: the full germline, the germline with the D segment masked, or the germline for the V segment only. Default is ‘dmask’.

  • fileformat (str) – format of V(D)J file/objects. Default is ‘airr’. Also accepts ‘changeo’.
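
The fallback described for the germline parameter can be sketched as below. This is a minimal illustration, not dandelion's own implementation: resolve_germline_dir is a hypothetical helper, and raising an error when neither the argument nor $GERMLINE is set is an assumption.

```python
import os


def resolve_germline_dir(germline=None):
    """Hypothetical helper: choose the germline database folder.

    An explicit path wins; otherwise fall back to the $GERMLINE
    environment variable, as the parameter description states.
    """
    if germline is not None:
        return germline
    try:
        return os.environ["GERMLINE"]
    except KeyError:
        # Assumption: fail loudly when no database location is available.
        raise OSError(
            "No germline database: pass `germline` or set $GERMLINE."
        ) from None
```

Calling resolve_germline_dir() with no argument returns whatever $GERMLINE points to, while resolve_germline_dir('/path/to/db') ignores the environment entirely.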

Returns

V(D)J data file with reconstructed germline sequences.
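
A call might look like the sketch below. It assumes dandelion is installed and that changeo's CreateGermlines.py is available on the PATH. build_germline_kwargs and run_create_germlines are illustrative helpers, not part of dandelion; the keyword values are simply the documented defaults, and the check on fileformat is an assumption based on the two formats listed above.

```python
def build_germline_kwargs(fileformat="airr", germ_types="dmask", org="human"):
    """Hypothetical helper: collect keyword arguments from the documented defaults."""
    # Assumption: only the two formats named in the parameter list are valid.
    if fileformat not in ("airr", "changeo"):
        raise ValueError("fileformat must be 'airr' or 'changeo'")
    return {
        "germline": None,  # None defers to the $GERMLINE environment variable
        "org": org,
        # The field names below are the documented airr defaults; override
        # them if your columns are named differently.
        "seq_field": "sequence_alignment",
        "v_field": "v_call",
        "d_field": "d_call",
        "j_field": "j_call",
        "germ_types": germ_types,
        "fileformat": fileformat,
    }


def run_create_germlines(vdj, **overrides):
    """Hypothetical wrapper: call create_germlines on clone-assigned data."""
    # Imported lazily so this sketch can be loaded without dandelion installed.
    import dandelion as ddl

    kwargs = build_germline_kwargs()
    kwargs.update(overrides)
    # `vdj` fills the first parameter: a Dandelion object, a DataFrame,
    # or a file path, as described under Parameters.
    return ddl.pp.create_germlines(vdj, **kwargs)
```

Here run_create_germlines(vdj, germline='/path/to/db') would override only the database location and leave the other defaults untouched.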