Interoperability with scirpy

It is now possible to convert data between dandelion (>=0.1.0) and scirpy (>=0.6.2.dev104), which makes it easier to use the two analysis toolkits together.
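
Since the conversion relies on minimum versions of both packages, it can help to compare the installed version strings against those minimums before starting. The sketch below is a simplified comparison written for this tutorial; parse_version and meets_minimum are hypothetical helper names, and only plain release numbers with an optional .devN suffix are handled. For anything more general, the packaging library's Version class is the more robust choice.

```python
import re


def parse_version(version):
    """Split e.g. '0.6.2.dev105' into ((0, 6, 2), 105).

    A final release (no .devN suffix) gets dev = infinity so that it
    sorts after every development build of the same release number.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:\.dev(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    dev = int(match.group(2)) if match.group(2) is not None else float("inf")
    return release, dev


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Versions printed later in this tutorial, checked against the stated minimums.
print(meets_minimum("0.1.0", "0.1.0"))                # True
print(meets_minimum("0.6.2.dev105", "0.6.2.dev104"))  # True
print(meets_minimum("0.6.1", "0.6.2.dev104"))         # False
```

In a live session the installed strings would come from ddl.__version__ or ir.__version__ rather than being typed in by hand.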

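We will download the B-cell airr_rearrangement.tsv file from 10x Genomics. The cells below read it from a folder named sc5p_v2_hs_PBMC_10k, so place the downloaded file there.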

# bash
wget https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv
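
If wget is not available, the same download can be sketched in Python with the standard library. The helper names destination_for and download are hypothetical, and the target folder is an assumption based on the file_location used in the cells below. The actual download call is left commented out so that nothing is fetched by accident.

```python
import os
import urllib.request

URL = (
    "https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/"
    "sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv"
)


def destination_for(url, folder):
    """Build the local path: the folder plus the last component of the URL."""
    return os.path.join(folder, url.rsplit("/", 1)[-1])


def download(url, folder):
    """Download url into folder unless the file is already present."""
    os.makedirs(folder, exist_ok=True)
    target = destination_for(url, folder)
    if not os.path.exists(target):
        urllib.request.urlretrieve(url, target)
    return target


# Uncomment to fetch the file (requires network access):
# download(URL, "sc5p_v2_hs_PBMC_10k")
print(destination_for(URL, "sc5p_v2_hs_PBMC_10k"))
```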

Import the dandelion module

[1]:
import os
import dandelion as ddl
# change directory to somewhere more workable
os.chdir(os.path.expanduser('/Users/kt16/Documents/scripts/data/dandelion_tutorial/'))
ddl.logging.print_versions()
dandelion==0.1.0 pandas==1.2.3 numpy==1.20.1 matplotlib==3.3.4 networkx==2.5 scipy==1.6.1 skbio==0.5.6
[2]:
import scirpy as ir
ir.__version__
[2]:
'0.6.2.dev105'

dandelion

[3]:
# read in the airr_rearrangement.tsv file
file_location = 'sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv'
vdj = ddl.read_10x_airr(file_location)
vdj
[3]:
Dandelion class object with n_obs = 978 and n_contigs = 2093
    data: 'cell_id', 'clone_id', 'sequence_id', 'sequence', 'sequence_aa', 'productive', 'rev_comp', 'v_call', 'v_cigar', 'd_call', 'd_cigar', 'j_call', 'j_cigar', 'c_call', 'c_cigar', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'junction_length', 'junction_aa_length', 'v_sequence_start', 'v_sequence_end', 'd_sequence_start', 'd_sequence_end', 'j_sequence_start', 'j_sequence_end', 'c_sequence_start', 'c_sequence_end', 'consensus_count', 'duplicate_count', 'is_cell', 'locus'
    metadata: 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'duplicate_count_heavy_0', 'duplicate_count_heavy_1', 'duplicate_count_light_0', 'duplicate_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'
    distance: None
    edges: None
    layout: None
    graph: None

The test file contains a blank clone_id column, so we run find_clones to populate it first.

[4]:
ddl.tl.find_clones(vdj)
Finding clones based on heavy chains : 100%|██████████| 157/157 [00:00<00:00, 960.80it/s]
Refining clone assignment based on light chain pairing : 100%|██████████| 978/978 [00:00<00:00, 295983.07it/s]
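
Whether the clone_id column of an AIRR file really is blank can be checked directly on the TSV before deciding to run find_clones. The following is a minimal sketch using only the csv module; blank_fraction is a hypothetical helper, and the two example rows are made up for illustration (only the barcodes are taken from this dataset).

```python
import csv
import io


def blank_fraction(handle, column):
    """Return the fraction of rows whose value in column is empty or missing."""
    reader = csv.DictReader(handle, delimiter="\t")
    total = 0
    blank = 0
    for row in reader:
        total += 1
        if not (row.get(column) or "").strip():
            blank += 1
    return blank / total if total else 0.0


# Tiny stand-in for the real file: the clone_id field is present but empty.
example = io.StringIO(
    "cell_id\tclone_id\tsequence_id\n"
    "AAACCTGTCCGTTGTC-1\t\tAAACCTGTCCGTTGTC-1_contig_1\n"
    "AAACCTGTCGAGAACG-1\t\tAAACCTGTCGAGAACG-1_contig_1\n"
)
print(blank_fraction(example, "clone_id"))  # 1.0
```

For the real file, the StringIO object would be replaced by open(file_location, newline="").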

dandelion : Converting dandelion to scirpy

[5]:
irdata = ddl.to_scirpy(vdj)
irdata
/Users/kt16/miniconda3/envs/dandelion/lib/python3.7/site-packages/anndata/_core/anndata.py:1192: FutureWarning: is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead
... storing 'extra_chains' as categorical
... storing 'IR_VJ_1_c_cigar' as categorical
... storing 'IR_VJ_2_c_cigar' as categorical
... storing 'IR_VDJ_1_c_cigar' as categorical
... storing 'IR_VDJ_2_c_cigar' as categorical
... storing 'IR_VJ_1_clone_id' as categorical
... storing 'IR_VJ_2_clone_id' as categorical
... storing 'IR_VDJ_1_clone_id' as categorical
... storing 'IR_VDJ_2_clone_id' as categorical
... storing 'IR_VJ_1_d_cigar' as categorical
... storing 'IR_VJ_2_d_cigar' as categorical
... storing 'IR_VDJ_1_d_cigar' as categorical
... storing 'IR_VDJ_2_d_cigar' as categorical
... storing 'IR_VJ_1_d_sequence_end' as categorical
... storing 'IR_VJ_2_d_sequence_end' as categorical
... storing 'IR_VJ_1_d_sequence_start' as categorical
... storing 'IR_VJ_2_d_sequence_start' as categorical
... storing 'IR_VJ_1_germline_alignment' as categorical
... storing 'IR_VJ_2_germline_alignment' as categorical
... storing 'IR_VDJ_1_germline_alignment' as categorical
... storing 'IR_VDJ_2_germline_alignment' as categorical
... storing 'IR_VJ_1_is_cell' as categorical
... storing 'IR_VJ_2_is_cell' as categorical
... storing 'IR_VDJ_1_is_cell' as categorical
... storing 'IR_VDJ_2_is_cell' as categorical
... storing 'IR_VJ_1_j_cigar' as categorical
... storing 'IR_VJ_2_j_cigar' as categorical
... storing 'IR_VDJ_1_j_cigar' as categorical
... storing 'IR_VDJ_2_j_cigar' as categorical
... storing 'IR_VJ_1_junction' as categorical
... storing 'IR_VJ_2_junction' as categorical
... storing 'IR_VDJ_1_junction' as categorical
... storing 'IR_VDJ_2_junction' as categorical
... storing 'IR_VJ_1_junction_aa' as categorical
... storing 'IR_VJ_2_junction_aa' as categorical
... storing 'IR_VDJ_1_junction_aa' as categorical
... storing 'IR_VDJ_2_junction_aa' as categorical
... storing 'IR_VJ_1_productive' as categorical
... storing 'IR_VJ_2_productive' as categorical
... storing 'IR_VDJ_1_productive' as categorical
... storing 'IR_VDJ_2_productive' as categorical
... storing 'IR_VJ_1_rev_comp' as categorical
... storing 'IR_VJ_2_rev_comp' as categorical
... storing 'IR_VDJ_1_rev_comp' as categorical
... storing 'IR_VDJ_2_rev_comp' as categorical
... storing 'IR_VJ_1_sequence' as categorical
... storing 'IR_VJ_2_sequence' as categorical
... storing 'IR_VDJ_1_sequence' as categorical
... storing 'IR_VDJ_2_sequence' as categorical
... storing 'IR_VJ_1_sequence_aa' as categorical
... storing 'IR_VJ_2_sequence_aa' as categorical
... storing 'IR_VDJ_1_sequence_aa' as categorical
... storing 'IR_VDJ_2_sequence_aa' as categorical
... storing 'IR_VJ_1_sequence_alignment' as categorical
... storing 'IR_VJ_2_sequence_alignment' as categorical
... storing 'IR_VDJ_1_sequence_alignment' as categorical
... storing 'IR_VDJ_2_sequence_alignment' as categorical
... storing 'IR_VJ_1_sequence_id' as categorical
... storing 'IR_VJ_2_sequence_id' as categorical
... storing 'IR_VDJ_1_sequence_id' as categorical
... storing 'IR_VDJ_2_sequence_id' as categorical
... storing 'IR_VJ_1_v_cigar' as categorical
... storing 'IR_VJ_2_v_cigar' as categorical
... storing 'IR_VDJ_1_v_cigar' as categorical
... storing 'IR_VDJ_2_v_cigar' as categorical
[5]:
AnnData object with n_obs × n_vars = 994 × 0
    obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_2_d_sequence_start', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_germline_alignment', 'IR_VJ_2_germline_alignment', 'IR_VDJ_1_germline_alignment', 'IR_VDJ_2_germline_alignment', 'IR_VJ_1_is_cell', 'IR_VJ_2_is_cell', 'IR_VDJ_1_is_cell', 'IR_VDJ_2_is_cell', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_j_cigar', 'IR_VJ_2_j_cigar', 'IR_VDJ_1_j_cigar', 'IR_VDJ_2_j_cigar', 'IR_VJ_1_j_sequence_end', 'IR_VJ_2_j_sequence_end', 'IR_VDJ_1_j_sequence_end', 'IR_VDJ_2_j_sequence_end', 'IR_VJ_1_j_sequence_start', 'IR_VJ_2_j_sequence_start', 'IR_VDJ_1_j_sequence_start', 'IR_VDJ_2_j_sequence_start', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_junction_aa_length', 'IR_VJ_2_junction_aa_length', 'IR_VDJ_1_junction_aa_length', 'IR_VDJ_2_junction_aa_length', 'IR_VJ_1_junction_length', 
'IR_VJ_2_junction_length', 'IR_VDJ_1_junction_length', 'IR_VDJ_2_junction_length', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_rev_comp', 'IR_VJ_2_rev_comp', 'IR_VDJ_1_rev_comp', 'IR_VDJ_2_rev_comp', 'IR_VJ_1_sequence', 'IR_VJ_2_sequence', 'IR_VDJ_1_sequence', 'IR_VDJ_2_sequence', 'IR_VJ_1_sequence_aa', 'IR_VJ_2_sequence_aa', 'IR_VDJ_1_sequence_aa', 'IR_VDJ_2_sequence_aa', 'IR_VJ_1_sequence_alignment', 'IR_VJ_2_sequence_alignment', 'IR_VDJ_1_sequence_alignment', 'IR_VDJ_2_sequence_alignment', 'IR_VJ_1_sequence_id', 'IR_VJ_2_sequence_id', 'IR_VDJ_1_sequence_id', 'IR_VDJ_2_sequence_id', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'IR_VJ_1_v_cigar', 'IR_VJ_2_v_cigar', 'IR_VDJ_1_v_cigar', 'IR_VDJ_2_v_cigar', 'IR_VJ_1_v_sequence_end', 'IR_VJ_2_v_sequence_end', 'IR_VDJ_1_v_sequence_end', 'IR_VDJ_2_v_sequence_end', 'IR_VJ_1_v_sequence_start', 'IR_VJ_2_v_sequence_start', 'IR_VDJ_1_v_sequence_start', 'IR_VDJ_2_v_sequence_start', 'has_ir'

The clone_id is mapped to the IR_VJ_1_clone_id column.
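
The obs column names in the output above follow a regular pattern: each AIRR field appears four times, once per chain slot, as IR_VJ_1_, IR_VJ_2_, IR_VDJ_1_ and IR_VDJ_2_ followed by the field name. In scirpy's convention, VJ refers to chains from V-J recombined loci (IGK and IGL for B cells) and VDJ to V-D-J recombined loci (IGH). The sketch below only reproduces the naming pattern seen in the output; scirpy_obs_columns and parse_obs_column are hypothetical helpers, not part of either package.

```python
def scirpy_obs_columns(field):
    """List the four obs column names that hold one AIRR field."""
    return [
        f"IR_{arm}_{slot}_{field}"
        for arm in ("VJ", "VDJ")
        for slot in (1, 2)
    ]


def parse_obs_column(name):
    """Split a column such as 'IR_VDJ_1_junction_aa' into (arm, slot, field)."""
    prefix, arm, slot, field = name.split("_", 3)
    if prefix != "IR" or arm not in ("VJ", "VDJ"):
        raise ValueError(f"not a chain column: {name!r}")
    return arm, int(slot), field


print(scirpy_obs_columns("clone_id"))
# ['IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id']
print(parse_obs_column("IR_VDJ_1_junction_aa"))
# ('VDJ', 1, 'junction_aa')
```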

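Setting transfer = True additionally runs dandelion’s tl.transfer, which adds the dandelion metadata columns (clone_id, locus_heavy, isotype and so on) to the obs of the returned AnnData.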

[6]:
irdatax = ddl.to_scirpy(vdj, transfer = True)
irdatax
... storing 'extra_chains' as categorical
... storing 'IR_VJ_1_c_cigar' as categorical
... storing 'IR_VJ_2_c_cigar' as categorical
... storing 'IR_VDJ_1_c_cigar' as categorical
... storing 'IR_VDJ_2_c_cigar' as categorical
... storing 'IR_VJ_1_clone_id' as categorical
... storing 'IR_VJ_2_clone_id' as categorical
... storing 'IR_VDJ_1_clone_id' as categorical
... storing 'IR_VDJ_2_clone_id' as categorical
... storing 'IR_VJ_1_d_cigar' as categorical
... storing 'IR_VJ_2_d_cigar' as categorical
... storing 'IR_VDJ_1_d_cigar' as categorical
... storing 'IR_VDJ_2_d_cigar' as categorical
... storing 'IR_VJ_1_d_sequence_end' as categorical
... storing 'IR_VJ_2_d_sequence_end' as categorical
... storing 'IR_VJ_1_d_sequence_start' as categorical
... storing 'IR_VJ_2_d_sequence_start' as categorical
... storing 'IR_VJ_1_germline_alignment' as categorical
... storing 'IR_VJ_2_germline_alignment' as categorical
... storing 'IR_VDJ_1_germline_alignment' as categorical
... storing 'IR_VDJ_2_germline_alignment' as categorical
... storing 'IR_VJ_1_is_cell' as categorical
... storing 'IR_VJ_2_is_cell' as categorical
... storing 'IR_VDJ_1_is_cell' as categorical
... storing 'IR_VDJ_2_is_cell' as categorical
... storing 'IR_VJ_1_j_cigar' as categorical
... storing 'IR_VJ_2_j_cigar' as categorical
... storing 'IR_VDJ_1_j_cigar' as categorical
... storing 'IR_VDJ_2_j_cigar' as categorical
... storing 'IR_VJ_1_junction' as categorical
... storing 'IR_VJ_2_junction' as categorical
... storing 'IR_VDJ_1_junction' as categorical
... storing 'IR_VDJ_2_junction' as categorical
... storing 'IR_VJ_1_junction_aa' as categorical
... storing 'IR_VJ_2_junction_aa' as categorical
... storing 'IR_VDJ_1_junction_aa' as categorical
... storing 'IR_VDJ_2_junction_aa' as categorical
... storing 'IR_VJ_1_productive' as categorical
... storing 'IR_VJ_2_productive' as categorical
... storing 'IR_VDJ_1_productive' as categorical
... storing 'IR_VDJ_2_productive' as categorical
... storing 'IR_VJ_1_rev_comp' as categorical
... storing 'IR_VJ_2_rev_comp' as categorical
... storing 'IR_VDJ_1_rev_comp' as categorical
... storing 'IR_VDJ_2_rev_comp' as categorical
... storing 'IR_VJ_1_sequence' as categorical
... storing 'IR_VJ_2_sequence' as categorical
... storing 'IR_VDJ_1_sequence' as categorical
... storing 'IR_VDJ_2_sequence' as categorical
... storing 'IR_VJ_1_sequence_aa' as categorical
... storing 'IR_VJ_2_sequence_aa' as categorical
... storing 'IR_VDJ_1_sequence_aa' as categorical
... storing 'IR_VDJ_2_sequence_aa' as categorical
... storing 'IR_VJ_1_sequence_alignment' as categorical
... storing 'IR_VJ_2_sequence_alignment' as categorical
... storing 'IR_VDJ_1_sequence_alignment' as categorical
... storing 'IR_VDJ_2_sequence_alignment' as categorical
... storing 'IR_VJ_1_sequence_id' as categorical
... storing 'IR_VJ_2_sequence_id' as categorical
... storing 'IR_VDJ_1_sequence_id' as categorical
... storing 'IR_VDJ_2_sequence_id' as categorical
... storing 'IR_VJ_1_v_cigar' as categorical
... storing 'IR_VJ_2_v_cigar' as categorical
... storing 'IR_VDJ_1_v_cigar' as categorical
... storing 'IR_VDJ_2_v_cigar' as categorical
[6]:
AnnData object with n_obs × n_vars = 994 × 0
    obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_2_d_sequence_start', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_germline_alignment', 'IR_VJ_2_germline_alignment', 'IR_VDJ_1_germline_alignment', 'IR_VDJ_2_germline_alignment', 'IR_VJ_1_is_cell', 'IR_VJ_2_is_cell', 'IR_VDJ_1_is_cell', 'IR_VDJ_2_is_cell', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_j_cigar', 'IR_VJ_2_j_cigar', 'IR_VDJ_1_j_cigar', 'IR_VDJ_2_j_cigar', 'IR_VJ_1_j_sequence_end', 'IR_VJ_2_j_sequence_end', 'IR_VDJ_1_j_sequence_end', 'IR_VDJ_2_j_sequence_end', 'IR_VJ_1_j_sequence_start', 'IR_VJ_2_j_sequence_start', 'IR_VDJ_1_j_sequence_start', 'IR_VDJ_2_j_sequence_start', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_junction_aa_length', 'IR_VJ_2_junction_aa_length', 'IR_VDJ_1_junction_aa_length', 'IR_VDJ_2_junction_aa_length', 'IR_VJ_1_junction_length', 
'IR_VJ_2_junction_length', 'IR_VDJ_1_junction_length', 'IR_VDJ_2_junction_length', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_rev_comp', 'IR_VJ_2_rev_comp', 'IR_VDJ_1_rev_comp', 'IR_VDJ_2_rev_comp', 'IR_VJ_1_sequence', 'IR_VJ_2_sequence', 'IR_VDJ_1_sequence', 'IR_VDJ_2_sequence', 'IR_VJ_1_sequence_aa', 'IR_VJ_2_sequence_aa', 'IR_VDJ_1_sequence_aa', 'IR_VDJ_2_sequence_aa', 'IR_VJ_1_sequence_alignment', 'IR_VJ_2_sequence_alignment', 'IR_VDJ_1_sequence_alignment', 'IR_VDJ_2_sequence_alignment', 'IR_VJ_1_sequence_id', 'IR_VJ_2_sequence_id', 'IR_VDJ_1_sequence_id', 'IR_VDJ_2_sequence_id', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'IR_VJ_1_v_cigar', 'IR_VJ_2_v_cigar', 'IR_VDJ_1_v_cigar', 'IR_VDJ_2_v_cigar', 'IR_VJ_1_v_sequence_end', 'IR_VJ_2_v_sequence_end', 'IR_VDJ_1_v_sequence_end', 'IR_VDJ_2_v_sequence_end', 'IR_VJ_1_v_sequence_start', 'IR_VJ_2_v_sequence_start', 'IR_VDJ_1_v_sequence_start', 'IR_VDJ_2_v_sequence_start', 'has_ir', 'clone_id', 'clone_id_by_size', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'duplicate_count_heavy_0', 'duplicate_count_heavy_1', 'duplicate_count_light_0', 'duplicate_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'
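
Conceptually, the extra obs columns in this output are the per-cell dandelion metadata joined onto the AnnData obs by cell barcode. The sketch below illustrates that idea with plain dictionaries. It is not dandelion's implementation: transfer_metadata is a hypothetical function, the second barcode is invented, and filling unmatched cells with None is an assumption about how missing metadata might be represented.

```python
def transfer_metadata(obs, metadata):
    """Left-join per-cell metadata onto obs, keyed by cell barcode."""
    # Every metadata column is added to every cell so the table stays rectangular.
    columns = sorted({name for row in metadata.values() for name in row})
    merged = {}
    for barcode, row in obs.items():
        extra = metadata.get(barcode, {})
        merged[barcode] = {**row, **{name: extra.get(name) for name in columns}}
    return merged


obs = {
    "AAACCTGTCCGTTGTC-1": {"has_ir": True},
    "HYPOTHETICAL-CELL-1": {"has_ir": True},  # invented barcode without metadata
}
metadata = {
    "AAACCTGTCCGTTGTC-1": {"clone_id": "148_3_1_266", "isotype": "IgM"},
}

merged = transfer_metadata(obs, metadata)
print(merged["AAACCTGTCCGTTGTC-1"])
print(merged["HYPOTHETICAL-CELL-1"])
```

With real objects, a pandas left join of the obs table with vdj.metadata would be the closest everyday analogue.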

dandelion : Converting scirpy to dandelion

[7]:
vdjx = ddl.from_scirpy(irdata)
vdjx
[7]:
Dandelion class object with n_obs = 978 and n_contigs = 2093
    data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id', 'umi_count'
    metadata: 'clone_id', 'clone_id_by_size', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_light_0', 'umi_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'
    distance: None
    edges: None
    layout: None
    graph: None
[8]:
vdjx.metadata
[8]:
                    clone_id     clone_id_by_size  locus_heavy  locus_light  productive_heavy  productive_light  v_call_heavy  v_call_light  j_call_heavy  j_call_light  ...  junction_aa_light  status     status_summary  productive  productive_summary  isotype  isotype_summary  vdj_status       vdj_status_summary  heavychain_status_summary
AAACCTGTCCGTTGTC-1  148_3_1_266  1066              IGH          IGK          T                 T                 IGHV1-69D     IGKV1-8       IGHJ3         IGKJ1         ...  CQQYYSYPRTF        IGH + IGK  IGH + IGK       T + T       T + T               IgM      IgM              Single + Single  Single              Single
AAACCTGTCGAGAACG-1  92_4_1_47    1065              IGH          IGL          T                 T                 IGHV1-2       IGLV5-45      IGHJ3         IGLJ3         ...  CMIWHSSAWVV        IGH + IGL  IGH + IGL       T + T       T + T               IgM      IgM              Single + Single  Single              Single
AAACCTGTCTTGAGAC-1  149_1_2_419  166               IGH          IGK          T                 T                 IGHV5-51      IGKV1D-8      IGHJ3         IGKJ2         ...  CQQYYSFPYTF        IGH + IGK  IGH + IGK       T + T       T + T               IgM      IgM              Single + Single  Single              Single
AAACGGGAGCGACGTA-1  82_1_2_2     600               IGH          IGL          T                 T                 IGHV4-59      IGLV3-19      IGHJ3         IGLJ2         ...  CNSRDSSGNHVVF      IGH + IGL  IGH + IGL       T + T       T + T               IgM      IgM              Single + Single  Single              Single
AAACGGGCACTGTTAG-1  70_1_1_92    1075              IGH          IGL          T                 T                 IGHV4-39      IGLV3-21      IGHJ3         IGLJ2         ...  CQVWDSSSDHVVF      IGH + IGL  IGH + IGL       T + T       T + T               IgM      IgM              Single + Single  Single              Single
...                 ...          ...               ...          ...          ...               ...               ...           ...           ...           ...           ...  ...                ...        ...             ...         ...                 ...      ...              ...              ...                 ...
TTTGCGCTCTAACGGT-1  138_3_1_217  156               IGH          IGL          T                 T                 IGHV3-43      IGLV2-8       IGHJ6         IGLJ3         ...  CGSFAGSNIWVF       IGH + IGL  IGH + IGL       T + T       T + T               IgM      IgM              Single + Single  Single              Single
TTTGGTTGTAGCCTAT-1  85_1_1_240   506               IGH          IGK          T                 T                 IGHV4-39      IGKV6-21      IGHJ2         IGKJ4         ...  CHQSSSLPLTF        IGH + IGK  IGH + IGK       T + T       T + T               IgM      IgM              Single + Single  Single              Single
TTTGGTTTCAGAGCTT-1  117_5_2_232  563               IGH          IGK          T                 T                 IGHV7-4-1     IGKV3-11      IGHJ4         IGKJ5         ...  CQQRSNWLTF         IGH + IGK  IGH + IGK       T + T       T + T               IgM      IgM              Single + Single  Single              Single
TTTGGTTTCAGTGTTG-1  12_1_1_329   820               IGH          IGL          T                 T                 IGHV2-5       IGLV2-23      IGHJ4         IGLJ2         ...  CCSYAGSSTFEVF      IGH + IGL  IGH + IGL       T + T       T + T               IgM      IgM              Single + Single  Single              Single
TTTGGTTTCGGTGTCG-1  140_1_1_82   210               IGH          IGK          T                 T                 IGHV3-21      IGKV3-11      IGHJ2         IGKJ4         ...  CQQRSNWPRLTF       IGH + IGK  IGH + IGK       T + T       T + T               IgM      IgM              Single + Single  Single              Single

978 rows × 27 columns
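
A round trip through scirpy keeps the number of cells and contigs, but it is worth checking whether the column names survive unchanged. The sketch below compares the data columns printed for the original vdj object with those printed for the round-tripped vdjx object; both lists are copied from the outputs in this tutorial, and the variable names are only illustrative.

```python
# Data columns of the original Dandelion object (output of cell [3]).
original = """
cell_id clone_id sequence_id sequence sequence_aa productive rev_comp
v_call v_cigar d_call d_cigar j_call j_cigar c_call c_cigar
sequence_alignment germline_alignment junction junction_aa
junction_length junction_aa_length v_sequence_start v_sequence_end
d_sequence_start d_sequence_end j_sequence_start j_sequence_end
c_sequence_start c_sequence_end consensus_count duplicate_count
is_cell locus
""".split()

# Data columns after dandelion -> scirpy -> dandelion (output of cell [7]).
round_trip = """
c_call c_cigar c_sequence_end c_sequence_start clone_id consensus_count
d_call d_cigar d_sequence_end d_sequence_start germline_alignment is_cell
j_call j_cigar j_sequence_end j_sequence_start junction junction_aa
junction_aa_length junction_length locus productive rev_comp sequence
sequence_aa sequence_alignment sequence_id v_call v_cigar v_sequence_end
v_sequence_start cell_id umi_count
""".split()

only_original = sorted(set(original) - set(round_trip))
only_round_trip = sorted(set(round_trip) - set(original))

print(only_original)    # ['duplicate_count']
print(only_round_trip)  # ['umi_count']
```

The only difference is that duplicate_count comes back as umi_count. The metadata listing shows the same pattern (duplicate_count_heavy_0 becomes umi_count_heavy_0, and so on), so downstream code that refers to these columns by name may need adjusting after a round trip.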

scirpy

scirpy : Converting dandelion to scirpy

[9]:
irdata2 = ir.io.from_dandelion(vdj)
irdata2
/Users/kt16/miniconda3/envs/dandelion/lib/python3.7/site-packages/anndata/_core/anndata.py:1192: FutureWarning: is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead
... storing 'extra_chains' as categorical
... storing 'IR_VJ_1_c_cigar' as categorical
... storing 'IR_VJ_2_c_cigar' as categorical
... storing 'IR_VDJ_1_c_cigar' as categorical
... storing 'IR_VDJ_2_c_cigar' as categorical
... storing 'IR_VJ_1_clone_id' as categorical
... storing 'IR_VJ_2_clone_id' as categorical
... storing 'IR_VDJ_1_clone_id' as categorical
... storing 'IR_VDJ_2_clone_id' as categorical
... storing 'IR_VJ_1_d_cigar' as categorical
... storing 'IR_VJ_2_d_cigar' as categorical
... storing 'IR_VDJ_1_d_cigar' as categorical
... storing 'IR_VDJ_2_d_cigar' as categorical
... storing 'IR_VJ_1_d_sequence_end' as categorical
... storing 'IR_VJ_2_d_sequence_end' as categorical
... storing 'IR_VJ_1_d_sequence_start' as categorical
... storing 'IR_VJ_2_d_sequence_start' as categorical
... storing 'IR_VJ_1_germline_alignment' as categorical
... storing 'IR_VJ_2_germline_alignment' as categorical
... storing 'IR_VDJ_1_germline_alignment' as categorical
... storing 'IR_VDJ_2_germline_alignment' as categorical
... storing 'IR_VJ_1_is_cell' as categorical
... storing 'IR_VJ_2_is_cell' as categorical
... storing 'IR_VDJ_1_is_cell' as categorical
... storing 'IR_VDJ_2_is_cell' as categorical
... storing 'IR_VJ_1_j_cigar' as categorical
... storing 'IR_VJ_2_j_cigar' as categorical
... storing 'IR_VDJ_1_j_cigar' as categorical
... storing 'IR_VDJ_2_j_cigar' as categorical
... storing 'IR_VJ_1_junction' as categorical
... storing 'IR_VJ_2_junction' as categorical
... storing 'IR_VDJ_1_junction' as categorical
... storing 'IR_VDJ_2_junction' as categorical
... storing 'IR_VJ_1_junction_aa' as categorical
... storing 'IR_VJ_2_junction_aa' as categorical
... storing 'IR_VDJ_1_junction_aa' as categorical
... storing 'IR_VDJ_2_junction_aa' as categorical
... storing 'IR_VJ_1_productive' as categorical
... storing 'IR_VJ_2_productive' as categorical
... storing 'IR_VDJ_1_productive' as categorical
... storing 'IR_VDJ_2_productive' as categorical
... storing 'IR_VJ_1_rev_comp' as categorical
... storing 'IR_VJ_2_rev_comp' as categorical
... storing 'IR_VDJ_1_rev_comp' as categorical
... storing 'IR_VDJ_2_rev_comp' as categorical
... storing 'IR_VJ_1_sequence' as categorical
... storing 'IR_VJ_2_sequence' as categorical
... storing 'IR_VDJ_1_sequence' as categorical
... storing 'IR_VDJ_2_sequence' as categorical
... storing 'IR_VJ_1_sequence_aa' as categorical
... storing 'IR_VJ_2_sequence_aa' as categorical
... storing 'IR_VDJ_1_sequence_aa' as categorical
... storing 'IR_VDJ_2_sequence_aa' as categorical
... storing 'IR_VJ_1_sequence_alignment' as categorical
... storing 'IR_VJ_2_sequence_alignment' as categorical
... storing 'IR_VDJ_1_sequence_alignment' as categorical
... storing 'IR_VDJ_2_sequence_alignment' as categorical
... storing 'IR_VJ_1_sequence_id' as categorical
... storing 'IR_VJ_2_sequence_id' as categorical
... storing 'IR_VDJ_1_sequence_id' as categorical
... storing 'IR_VDJ_2_sequence_id' as categorical
... storing 'IR_VJ_1_v_cigar' as categorical
... storing 'IR_VJ_2_v_cigar' as categorical
... storing 'IR_VDJ_1_v_cigar' as categorical
... storing 'IR_VDJ_2_v_cigar' as categorical
[9]:
AnnData object with n_obs × n_vars = 994 × 0
    obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_2_d_sequence_start', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_germline_alignment', 'IR_VJ_2_germline_alignment', 'IR_VDJ_1_germline_alignment', 'IR_VDJ_2_germline_alignment', 'IR_VJ_1_is_cell', 'IR_VJ_2_is_cell', 'IR_VDJ_1_is_cell', 'IR_VDJ_2_is_cell', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_j_cigar', 'IR_VJ_2_j_cigar', 'IR_VDJ_1_j_cigar', 'IR_VDJ_2_j_cigar', 'IR_VJ_1_j_sequence_end', 'IR_VJ_2_j_sequence_end', 'IR_VDJ_1_j_sequence_end', 'IR_VDJ_2_j_sequence_end', 'IR_VJ_1_j_sequence_start', 'IR_VJ_2_j_sequence_start', 'IR_VDJ_1_j_sequence_start', 'IR_VDJ_2_j_sequence_start', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_junction_aa_length', 'IR_VJ_2_junction_aa_length', 'IR_VDJ_1_junction_aa_length', 'IR_VDJ_2_junction_aa_length', 'IR_VJ_1_junction_length', 
'IR_VJ_2_junction_length', 'IR_VDJ_1_junction_length', 'IR_VDJ_2_junction_length', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_rev_comp', 'IR_VJ_2_rev_comp', 'IR_VDJ_1_rev_comp', 'IR_VDJ_2_rev_comp', 'IR_VJ_1_sequence', 'IR_VJ_2_sequence', 'IR_VDJ_1_sequence', 'IR_VDJ_2_sequence', 'IR_VJ_1_sequence_aa', 'IR_VJ_2_sequence_aa', 'IR_VDJ_1_sequence_aa', 'IR_VDJ_2_sequence_aa', 'IR_VJ_1_sequence_alignment', 'IR_VJ_2_sequence_alignment', 'IR_VDJ_1_sequence_alignment', 'IR_VDJ_2_sequence_alignment', 'IR_VJ_1_sequence_id', 'IR_VJ_2_sequence_id', 'IR_VDJ_1_sequence_id', 'IR_VDJ_2_sequence_id', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'IR_VJ_1_v_cigar', 'IR_VJ_2_v_cigar', 'IR_VDJ_1_v_cigar', 'IR_VDJ_2_v_cigar', 'IR_VJ_1_v_sequence_end', 'IR_VJ_2_v_sequence_end', 'IR_VDJ_1_v_sequence_end', 'IR_VDJ_2_v_sequence_end', 'IR_VJ_1_v_sequence_start', 'IR_VJ_2_v_sequence_start', 'IR_VDJ_1_v_sequence_start', 'IR_VDJ_2_v_sequence_start', 'has_ir'

Likewise, setting transfer = True will run dandelion’s tl.transfer.

[10]:
irdata2x = ir.io.from_dandelion(vdj, transfer = True)
irdata2x
... storing 'extra_chains' as categorical
... storing 'IR_VJ_1_c_cigar' as categorical
... storing 'IR_VJ_2_c_cigar' as categorical
... storing 'IR_VDJ_1_c_cigar' as categorical
... storing 'IR_VDJ_2_c_cigar' as categorical
... storing 'IR_VJ_1_clone_id' as categorical
... storing 'IR_VJ_2_clone_id' as categorical
... storing 'IR_VDJ_1_clone_id' as categorical
... storing 'IR_VDJ_2_clone_id' as categorical
... storing 'IR_VJ_1_d_cigar' as categorical
... storing 'IR_VJ_2_d_cigar' as categorical
... storing 'IR_VDJ_1_d_cigar' as categorical
... storing 'IR_VDJ_2_d_cigar' as categorical
... storing 'IR_VJ_1_d_sequence_end' as categorical
... storing 'IR_VJ_2_d_sequence_end' as categorical
... storing 'IR_VJ_1_d_sequence_start' as categorical
... storing 'IR_VJ_2_d_sequence_start' as categorical
... storing 'IR_VJ_1_germline_alignment' as categorical
... storing 'IR_VJ_2_germline_alignment' as categorical
... storing 'IR_VDJ_1_germline_alignment' as categorical
... storing 'IR_VDJ_2_germline_alignment' as categorical
... storing 'IR_VJ_1_is_cell' as categorical
... storing 'IR_VJ_2_is_cell' as categorical
... storing 'IR_VDJ_1_is_cell' as categorical
... storing 'IR_VDJ_2_is_cell' as categorical
... storing 'IR_VJ_1_j_cigar' as categorical
... storing 'IR_VJ_2_j_cigar' as categorical
... storing 'IR_VDJ_1_j_cigar' as categorical
... storing 'IR_VDJ_2_j_cigar' as categorical
... storing 'IR_VJ_1_junction' as categorical
... storing 'IR_VJ_2_junction' as categorical
... storing 'IR_VDJ_1_junction' as categorical
... storing 'IR_VDJ_2_junction' as categorical
... storing 'IR_VJ_1_junction_aa' as categorical
... storing 'IR_VJ_2_junction_aa' as categorical
... storing 'IR_VDJ_1_junction_aa' as categorical
... storing 'IR_VDJ_2_junction_aa' as categorical
... storing 'IR_VJ_1_productive' as categorical
... storing 'IR_VJ_2_productive' as categorical
... storing 'IR_VDJ_1_productive' as categorical
... storing 'IR_VDJ_2_productive' as categorical
... storing 'IR_VJ_1_rev_comp' as categorical
... storing 'IR_VJ_2_rev_comp' as categorical
... storing 'IR_VDJ_1_rev_comp' as categorical
... storing 'IR_VDJ_2_rev_comp' as categorical
... storing 'IR_VJ_1_sequence' as categorical
... storing 'IR_VJ_2_sequence' as categorical
... storing 'IR_VDJ_1_sequence' as categorical
... storing 'IR_VDJ_2_sequence' as categorical
... storing 'IR_VJ_1_sequence_aa' as categorical
... storing 'IR_VJ_2_sequence_aa' as categorical
... storing 'IR_VDJ_1_sequence_aa' as categorical
... storing 'IR_VDJ_2_sequence_aa' as categorical
... storing 'IR_VJ_1_sequence_alignment' as categorical
... storing 'IR_VJ_2_sequence_alignment' as categorical
... storing 'IR_VDJ_1_sequence_alignment' as categorical
... storing 'IR_VDJ_2_sequence_alignment' as categorical
... storing 'IR_VJ_1_sequence_id' as categorical
... storing 'IR_VJ_2_sequence_id' as categorical
... storing 'IR_VDJ_1_sequence_id' as categorical
... storing 'IR_VDJ_2_sequence_id' as categorical
... storing 'IR_VJ_1_v_cigar' as categorical
... storing 'IR_VJ_2_v_cigar' as categorical
... storing 'IR_VDJ_1_v_cigar' as categorical
... storing 'IR_VDJ_2_v_cigar' as categorical
[10]:
AnnData object with n_obs × n_vars = 994 × 0
    obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_2_d_sequence_start', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_germline_alignment', 'IR_VJ_2_germline_alignment', 'IR_VDJ_1_germline_alignment', 'IR_VDJ_2_germline_alignment', 'IR_VJ_1_is_cell', 'IR_VJ_2_is_cell', 'IR_VDJ_1_is_cell', 'IR_VDJ_2_is_cell', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_j_cigar', 'IR_VJ_2_j_cigar', 'IR_VDJ_1_j_cigar', 'IR_VDJ_2_j_cigar', 'IR_VJ_1_j_sequence_end', 'IR_VJ_2_j_sequence_end', 'IR_VDJ_1_j_sequence_end', 'IR_VDJ_2_j_sequence_end', 'IR_VJ_1_j_sequence_start', 'IR_VJ_2_j_sequence_start', 'IR_VDJ_1_j_sequence_start', 'IR_VDJ_2_j_sequence_start', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_junction_aa_length', 'IR_VJ_2_junction_aa_length', 'IR_VDJ_1_junction_aa_length', 'IR_VDJ_2_junction_aa_length', 'IR_VJ_1_junction_length', 
'IR_VJ_2_junction_length', 'IR_VDJ_1_junction_length', 'IR_VDJ_2_junction_length', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_rev_comp', 'IR_VJ_2_rev_comp', 'IR_VDJ_1_rev_comp', 'IR_VDJ_2_rev_comp', 'IR_VJ_1_sequence', 'IR_VJ_2_sequence', 'IR_VDJ_1_sequence', 'IR_VDJ_2_sequence', 'IR_VJ_1_sequence_aa', 'IR_VJ_2_sequence_aa', 'IR_VDJ_1_sequence_aa', 'IR_VDJ_2_sequence_aa', 'IR_VJ_1_sequence_alignment', 'IR_VJ_2_sequence_alignment', 'IR_VDJ_1_sequence_alignment', 'IR_VDJ_2_sequence_alignment', 'IR_VJ_1_sequence_id', 'IR_VJ_2_sequence_id', 'IR_VDJ_1_sequence_id', 'IR_VDJ_2_sequence_id', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'IR_VJ_1_v_cigar', 'IR_VJ_2_v_cigar', 'IR_VDJ_1_v_cigar', 'IR_VDJ_2_v_cigar', 'IR_VJ_1_v_sequence_end', 'IR_VJ_2_v_sequence_end', 'IR_VDJ_1_v_sequence_end', 'IR_VDJ_2_v_sequence_end', 'IR_VJ_1_v_sequence_start', 'IR_VJ_2_v_sequence_start', 'IR_VDJ_1_v_sequence_start', 'IR_VDJ_2_v_sequence_start', 'has_ir', 'clone_id', 'clone_id_by_size', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'duplicate_count_heavy_0', 'duplicate_count_heavy_1', 'duplicate_count_light_0', 'duplicate_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'

scirpy : Converting scirpy to dandelion

[11]:
vdj3 = ir.io.to_dandelion(irdata2)
vdj3
[11]:
Dandelion class object with n_obs = 978 and n_contigs = 2093
    data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id', 'umi_count'
    metadata: 'clone_id', 'clone_id_by_size', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_light_0', 'umi_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'
    distance: None
    edges: None
    layout: None
    graph: None

scirpy : Read from scirpy, convert to dandelion

[12]:
# read in the airr_rearrangement.tsv file
file_location = 'sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv'
irdata_s = ir.io.read_airr(file_location)
irdata_s
/Users/kt16/miniconda3/envs/dandelion/lib/python3.7/site-packages/anndata/_core/anndata.py:1192: FutureWarning: is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead
... storing 'extra_chains' as categorical
... storing 'IR_VJ_1_c_cigar' as categorical
... storing 'IR_VJ_2_c_cigar' as categorical
... storing 'IR_VDJ_1_c_cigar' as categorical
... storing 'IR_VDJ_2_c_cigar' as categorical
... storing 'IR_VJ_1_c_sequence_end' as categorical
... storing 'IR_VJ_2_c_sequence_end' as categorical
... storing 'IR_VDJ_1_c_sequence_end' as categorical
... storing 'IR_VDJ_2_c_sequence_end' as categorical
... storing 'IR_VJ_1_c_sequence_start' as categorical
... storing 'IR_VJ_2_c_sequence_start' as categorical
... storing 'IR_VDJ_1_c_sequence_start' as categorical
... storing 'IR_VDJ_2_c_sequence_start' as categorical
... storing 'IR_VJ_1_clone_id' as categorical
... storing 'IR_VJ_2_clone_id' as categorical
... storing 'IR_VDJ_1_clone_id' as categorical
... storing 'IR_VDJ_2_clone_id' as categorical
... storing 'IR_VJ_1_d_cigar' as categorical
... storing 'IR_VJ_2_d_cigar' as categorical
... storing 'IR_VDJ_1_d_cigar' as categorical
... storing 'IR_VDJ_2_d_cigar' as categorical
... storing 'IR_VJ_1_d_sequence_end' as categorical
... storing 'IR_VJ_2_d_sequence_end' as categorical
... storing 'IR_VJ_1_d_sequence_start' as categorical
... storing 'IR_VJ_2_d_sequence_start' as categorical
... storing 'IR_VJ_1_germline_alignment' as categorical
... storing 'IR_VJ_2_germline_alignment' as categorical
... storing 'IR_VDJ_1_germline_alignment' as categorical
... storing 'IR_VDJ_2_germline_alignment' as categorical
... storing 'IR_VJ_1_is_cell' as categorical
... storing 'IR_VJ_2_is_cell' as categorical
... storing 'IR_VDJ_1_is_cell' as categorical
... storing 'IR_VDJ_2_is_cell' as categorical
... storing 'IR_VJ_1_j_cigar' as categorical
... storing 'IR_VJ_2_j_cigar' as categorical
... storing 'IR_VDJ_1_j_cigar' as categorical
... storing 'IR_VDJ_2_j_cigar' as categorical
... storing 'IR_VJ_1_junction' as categorical
... storing 'IR_VJ_2_junction' as categorical
... storing 'IR_VDJ_1_junction' as categorical
... storing 'IR_VDJ_2_junction' as categorical
... storing 'IR_VJ_1_junction_aa' as categorical
... storing 'IR_VJ_2_junction_aa' as categorical
... storing 'IR_VDJ_1_junction_aa' as categorical
... storing 'IR_VDJ_2_junction_aa' as categorical
... storing 'IR_VJ_1_junction_aa_length' as categorical
... storing 'IR_VJ_2_junction_aa_length' as categorical
... storing 'IR_VDJ_1_junction_aa_length' as categorical
... storing 'IR_VDJ_2_junction_aa_length' as categorical
... storing 'IR_VJ_1_productive' as categorical
... storing 'IR_VJ_2_productive' as categorical
... storing 'IR_VDJ_1_productive' as categorical
... storing 'IR_VDJ_2_productive' as categorical
... storing 'IR_VJ_1_rev_comp' as categorical
... storing 'IR_VJ_2_rev_comp' as categorical
... storing 'IR_VDJ_1_rev_comp' as categorical
... storing 'IR_VDJ_2_rev_comp' as categorical
... storing 'IR_VJ_1_sequence' as categorical
... storing 'IR_VJ_2_sequence' as categorical
... storing 'IR_VDJ_1_sequence' as categorical
... storing 'IR_VDJ_2_sequence' as categorical
... storing 'IR_VJ_1_sequence_aa' as categorical
... storing 'IR_VJ_2_sequence_aa' as categorical
... storing 'IR_VDJ_1_sequence_aa' as categorical
... storing 'IR_VDJ_2_sequence_aa' as categorical
... storing 'IR_VJ_1_sequence_alignment' as categorical
... storing 'IR_VJ_2_sequence_alignment' as categorical
... storing 'IR_VDJ_1_sequence_alignment' as categorical
... storing 'IR_VDJ_2_sequence_alignment' as categorical
... storing 'IR_VJ_1_sequence_id' as categorical
... storing 'IR_VJ_2_sequence_id' as categorical
... storing 'IR_VDJ_1_sequence_id' as categorical
... storing 'IR_VDJ_2_sequence_id' as categorical
... storing 'IR_VJ_1_v_cigar' as categorical
... storing 'IR_VJ_2_v_cigar' as categorical
... storing 'IR_VDJ_1_v_cigar' as categorical
... storing 'IR_VDJ_2_v_cigar' as categorical
[12]:
AnnData object with n_obs × n_vars = 994 × 0
    obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_2_d_sequence_start', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_germline_alignment', 'IR_VJ_2_germline_alignment', 'IR_VDJ_1_germline_alignment', 'IR_VDJ_2_germline_alignment', 'IR_VJ_1_is_cell', 'IR_VJ_2_is_cell', 'IR_VDJ_1_is_cell', 'IR_VDJ_2_is_cell', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_j_cigar', 'IR_VJ_2_j_cigar', 'IR_VDJ_1_j_cigar', 'IR_VDJ_2_j_cigar', 'IR_VJ_1_j_sequence_end', 'IR_VJ_2_j_sequence_end', 'IR_VDJ_1_j_sequence_end', 'IR_VDJ_2_j_sequence_end', 'IR_VJ_1_j_sequence_start', 'IR_VJ_2_j_sequence_start', 'IR_VDJ_1_j_sequence_start', 'IR_VDJ_2_j_sequence_start', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_junction_aa_length', 'IR_VJ_2_junction_aa_length', 'IR_VDJ_1_junction_aa_length', 'IR_VDJ_2_junction_aa_length', 'IR_VJ_1_junction_length', 
'IR_VJ_2_junction_length', 'IR_VDJ_1_junction_length', 'IR_VDJ_2_junction_length', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_rev_comp', 'IR_VJ_2_rev_comp', 'IR_VDJ_1_rev_comp', 'IR_VDJ_2_rev_comp', 'IR_VJ_1_sequence', 'IR_VJ_2_sequence', 'IR_VDJ_1_sequence', 'IR_VDJ_2_sequence', 'IR_VJ_1_sequence_aa', 'IR_VJ_2_sequence_aa', 'IR_VDJ_1_sequence_aa', 'IR_VDJ_2_sequence_aa', 'IR_VJ_1_sequence_alignment', 'IR_VJ_2_sequence_alignment', 'IR_VDJ_1_sequence_alignment', 'IR_VDJ_2_sequence_alignment', 'IR_VJ_1_sequence_id', 'IR_VJ_2_sequence_id', 'IR_VDJ_1_sequence_id', 'IR_VDJ_2_sequence_id', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'IR_VJ_1_v_cigar', 'IR_VJ_2_v_cigar', 'IR_VDJ_1_v_cigar', 'IR_VDJ_2_v_cigar', 'IR_VJ_1_v_sequence_end', 'IR_VJ_2_v_sequence_end', 'IR_VDJ_1_v_sequence_end', 'IR_VDJ_2_v_sequence_end', 'IR_VJ_1_v_sequence_start', 'IR_VJ_2_v_sequence_start', 'IR_VDJ_1_v_sequence_start', 'IR_VDJ_2_v_sequence_start', 'has_ir'

This time, define the clonotypes with scirpy’s own method.

[13]:
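# annotate receptor type, receptor subtype and chain pairing
ir.tl.chain_qc(irdata_s)
# compute pairwise sequence distances (amino acid, Hamming)
ir.pp.ir_dist(irdata_s, metric='hamming', sequence='aa')
# assign clonotypes; adds 'clonotype' and 'clonotype_size' to obs
ir.tl.define_clonotypes(irdata_s)
irdata_s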
[13]:
AnnData object with n_obs × n_vars = 994 × 0
    obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_2_d_sequence_start', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_germline_alignment', 'IR_VJ_2_germline_alignment', 'IR_VDJ_1_germline_alignment', 'IR_VDJ_2_germline_alignment', 'IR_VJ_1_is_cell', 'IR_VJ_2_is_cell', 'IR_VDJ_1_is_cell', 'IR_VDJ_2_is_cell', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_j_cigar', 'IR_VJ_2_j_cigar', 'IR_VDJ_1_j_cigar', 'IR_VDJ_2_j_cigar', 'IR_VJ_1_j_sequence_end', 'IR_VJ_2_j_sequence_end', 'IR_VDJ_1_j_sequence_end', 'IR_VDJ_2_j_sequence_end', 'IR_VJ_1_j_sequence_start', 'IR_VJ_2_j_sequence_start', 'IR_VDJ_1_j_sequence_start', 'IR_VDJ_2_j_sequence_start', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_junction_aa_length', 'IR_VJ_2_junction_aa_length', 'IR_VDJ_1_junction_aa_length', 'IR_VDJ_2_junction_aa_length', 'IR_VJ_1_junction_length', 
'IR_VJ_2_junction_length', 'IR_VDJ_1_junction_length', 'IR_VDJ_2_junction_length', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_rev_comp', 'IR_VJ_2_rev_comp', 'IR_VDJ_1_rev_comp', 'IR_VDJ_2_rev_comp', 'IR_VJ_1_sequence', 'IR_VJ_2_sequence', 'IR_VDJ_1_sequence', 'IR_VDJ_2_sequence', 'IR_VJ_1_sequence_aa', 'IR_VJ_2_sequence_aa', 'IR_VDJ_1_sequence_aa', 'IR_VDJ_2_sequence_aa', 'IR_VJ_1_sequence_alignment', 'IR_VJ_2_sequence_alignment', 'IR_VDJ_1_sequence_alignment', 'IR_VDJ_2_sequence_alignment', 'IR_VJ_1_sequence_id', 'IR_VJ_2_sequence_id', 'IR_VDJ_1_sequence_id', 'IR_VDJ_2_sequence_id', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'IR_VJ_1_v_cigar', 'IR_VJ_2_v_cigar', 'IR_VDJ_1_v_cigar', 'IR_VDJ_2_v_cigar', 'IR_VJ_1_v_sequence_end', 'IR_VJ_2_v_sequence_end', 'IR_VDJ_1_v_sequence_end', 'IR_VDJ_2_v_sequence_end', 'IR_VJ_1_v_sequence_start', 'IR_VJ_2_v_sequence_start', 'IR_VDJ_1_v_sequence_start', 'IR_VDJ_2_v_sequence_start', 'has_ir', 'receptor_type', 'receptor_subtype', 'chain_pairing', 'clonotype', 'clonotype_size'
    uns: 'ir_dist_aa_hamming', 'ir_dist_nt_identity', 'clonotype'
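
For readers unfamiliar with the metric named in the `ir_dist` call, the following is a minimal sketch of Hamming distance on two amino-acid strings. It is not scirpy's implementation; the helper name `hamming` and the example sequences are made up for illustration, and the sketch simply raises an error for sequences of unequal length.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


# hypothetical junction amino-acid sequences, for illustration only
print(hamming("CARDYW", "CARDFW"))  # the two strings differ only at Y/F
```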
[14]:
vdj4 = ir.io.to_dandelion(irdata_s)
vdj4
[14]:
Dandelion class object with n_obs = 978 and n_contigs = 2093
    data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id', 'umi_count'
    metadata: 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_light_0', 'umi_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'
    distance: None
    edges: None
    layout: None
    graph: None

Note that the clone_id column is missing from the metadata this time.

As [@grst](https://github.com/grst) has noted, ‘Currently only the chain-specific attributes get exported (i.e. all scirpy columns that start with IR_). In principle, it probably makes sense to write out the clonotype column to clone_id. But then again, scirpy allows different versions of clonal assignments…’. In other words, the clonal assignment is transferred only if the IR_*_clone_id columns are populated; the clonotype column on its own is not carried over.
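
To see what this implies in practice, here is a small pandas-only sketch that lists the chain-specific columns (those starting with IR_) and checks whether any per-chain clone_id column holds a value. The `obs` table below is a toy stand-in for `irdata_s.obs` with invented values; real objects may encode missing values differently (for example as strings), in which case the check would need adjusting.

```python
import pandas as pd

# toy stand-in for irdata_s.obs (hypothetical values, far fewer columns)
obs = pd.DataFrame(
    {
        "IR_VDJ_1_clone_id": [None, None, None],
        "IR_VJ_1_clone_id": [None, None, None],
        "IR_VDJ_1_junction_aa": ["CARDYW", "CARDFW", "CAKGGW"],
        "clonotype": ["0", "0", "1"],
    },
    index=["cell_1", "cell_2", "cell_3"],
)

# chain-specific columns: names that start with 'IR_'
chain_cols = [c for c in obs.columns if c.startswith("IR_")]

# per-chain clone_id columns, and whether any of them holds a value
clone_cols = [c for c in chain_cols if c.endswith("_clone_id")]
clone_populated = bool(obs[clone_cols].notna().any().any())

print(chain_cols)       # 'clonotype' is not among them
print(clone_populated)  # nothing to transfer in this toy table
```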

You can copy the assignment over manually, or use ddl.from_scirpy, which takes the clonotype column (or the clone_id column, if already present) in the scirpy-initialized AnnData.obs as the default clone_id. The clone_key and key_added options adjust this behavior.

[15]:
vdj5 = ddl.from_scirpy(irdata_s)
vdj5
[15]:
Dandelion class object with n_obs = 978 and n_contigs = 2093
    data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id', 'umi_count'
    metadata: 'clone_id', 'clone_id_by_size', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_light_0', 'umi_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'
    distance: None
    edges: None
    layout: None
    graph: None
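
For comparison, the manual route mentioned earlier could look roughly like the sketch below. It uses toy pandas tables as stand-ins for `irdata_s.obs`, the per-cell `metadata` table and the per-contig `data` table; all values and the `sequence_id` labels are invented. The point it illustrates is alignment on cell barcodes, since the two objects above do not hold the same number of cells (994 versus 978).

```python
import pandas as pd

# stand-in for irdata_s.obs: one row per cell, indexed by barcode
obs = pd.DataFrame(
    {"clonotype": ["0", "0", "1", "2"]},
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)

# stand-in for the per-cell metadata table: fewer cells, different order
metadata = pd.DataFrame(
    {"isotype": ["IgM", "IgG", "IgA"]},
    index=["cell_3", "cell_1", "cell_2"],
)

# stand-in for the per-contig data table, linked to cells via 'cell_id'
data = pd.DataFrame(
    {"cell_id": ["cell_1", "cell_1", "cell_2", "cell_3"]},
    index=["contig_1", "contig_2", "contig_3", "contig_4"],
)

# align on barcodes rather than on row position
metadata["clone_id"] = obs["clonotype"].reindex(metadata.index)
data["clone_id"] = data["cell_id"].map(obs["clonotype"])

print(metadata)
print(data)
```

Aligning by barcode rather than by position avoids silently mislabelling cells when the row order or cell count differs between the two objects.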