Interoperability with scirpy

It is now possible to convert between the data formats of dandelion (>=0.1.0) and scirpy (>=0.6.2.dev104), which makes it easier to use the two analysis toolkits together.
We will download the airr_rearrangement.tsv file from here:
# bash
wget https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv
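Once the download finishes, a quick header check can confirm that the file looks like an AIRR rearrangement table before either toolkit reads it. The sketch below uses only the standard library. The column list is an illustrative subset chosen for this example (it is not the full AIRR specification), and `missing_columns` is a hypothetical helper, not part of dandelion or scirpy.

```python
import csv
import io

# Illustrative subset of columns this tutorial relies on (an assumption,
# not the complete AIRR rearrangement schema).
EXPECTED_COLUMNS = [
    "cell_id", "clone_id", "sequence_id", "v_call",
    "j_call", "junction_aa", "productive", "locus",
]


def missing_columns(handle, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from a TSV header."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # empty file -> empty header
    return [name for name in expected if name not in header]


# Made-up, header-only stand-in for the downloaded file.
demo = io.StringIO("cell_id\tclone_id\tsequence_id\tv_call\tj_call\n")
print(missing_columns(demo))
```

For the real file, the same function could be called inside `with open(path, newline="") as handle:`, where `path` points at the downloaded `.tsv`.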
Import the dandelion module:
[1]:
import os
import dandelion as ddl
# change directory to somewhere more workable
os.chdir(os.path.expanduser('/Users/kt16/Documents/scripts/data/dandelion_tutorial/'))
ddl.logging.print_versions()
dandelion==0.1.0 pandas==1.2.3 numpy==1.20.1 matplotlib==3.3.4 networkx==2.5 scipy==1.6.1 skbio==0.5.6
[2]:
import scirpy as ir
ir.__version__
[2]:
'0.6.2.dev105'
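Because the conversion functions need minimum versions of both packages, it can help to compare the installed versions against those minimums programmatically. The following is a minimal sketch under a simplifying assumption: it only understands version strings shaped like `0.6.2` or `0.6.2.dev105`. The helpers `version_key` and `meets_minimum` are hypothetical names introduced here; for general version strings a dedicated version-parsing library is the more robust choice.

```python
import re


def version_key(version):
    """Sort key for versions shaped like '0.6.2' or '0.6.2.dev105'.

    Simplified on purpose: only numeric release parts and an optional
    '.devN' suffix are understood. A dev build sorts before its release.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:\.dev(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    dev = match.group(2)
    # (release, 0, N) for dev builds sorts below (release, 1, 0) for finals.
    return (release, 0, int(dev)) if dev is not None else (release, 1, 0)


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum` under the key above."""
    return version_key(installed) >= version_key(minimum)


print(meets_minimum("0.1.0", "0.1.0"))                # dandelion -> True
print(meets_minimum("0.6.2.dev105", "0.6.2.dev104"))  # scirpy -> True
```

In a live session the first argument would come from `ir.__version__` and from the version string that `ddl.logging.print_versions()` reports.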
dandelion
[3]:
# read in the airr_rearrangement.tsv file
file_location = 'sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv'
vdj = ddl.read_10x_airr(file_location)
vdj
[3]:
Dandelion class object with n_obs = 978 and n_contigs = 2093
data: 'cell_id', 'clone_id', 'sequence_id', 'sequence', 'sequence_aa', 'productive', 'rev_comp', 'v_call', 'v_cigar', 'd_call', 'd_cigar', 'j_call', 'j_cigar', 'c_call', 'c_cigar', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'junction_length', 'junction_aa_length', 'v_sequence_start', 'v_sequence_end', 'd_sequence_start', 'd_sequence_end', 'j_sequence_start', 'j_sequence_end', 'c_sequence_start', 'c_sequence_end', 'consensus_count', 'duplicate_count', 'is_cell', 'locus'
metadata: 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'duplicate_count_heavy_0', 'duplicate_count_heavy_1', 'duplicate_count_light_0', 'duplicate_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'
distance: None
edges: None
layout: None
graph: None
The test file contains a blank clone_id column, so we run find_clones to populate it first.
[4]:
ddl.tl.find_clones(vdj)
Finding clones based on heavy chains : 100%|██████████| 157/157 [00:00<00:00, 960.80it/s]
Refining clone assignment based on light chain pairing : 100%|██████████| 978/978 [00:00<00:00, 295983.07it/s]
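Whether a given file actually has a blank clone_id column can be checked directly on the TSV before deciding to run find_clones. This is a minimal standard-library sketch; `column_is_blank` is a hypothetical helper and the demo rows are made up for illustration.

```python
import csv
import io


def column_is_blank(handle, column):
    """True if every row has an empty (or whitespace-only) value in `column`."""
    reader = csv.DictReader(handle, delimiter="\t")
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found")
    # Short rows yield None for missing fields, so fall back to "".
    return all(not (row[column] or "").strip() for row in reader)


# Made-up rows: clone_id is present as a column but holds no values.
demo = io.StringIO(
    "cell_id\tclone_id\tlocus\n"
    "AAACCTGTCCGTTGTC-1\t\tIGH\n"
    "AAACCTGTCCGTTGTC-1\t\tIGK\n"
)
print(column_is_blank(demo, "clone_id"))  # True
```

On the tutorial file this would be called with an open handle to `file_location` and the column name `"clone_id"`.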
dandelion: Converting dandelion to scirpy
[5]:
irdata = ddl.to_scirpy(vdj)
irdata
/Users/kt16/miniconda3/envs/dandelion/lib/python3.7/site-packages/anndata/_core/anndata.py:1192: FutureWarning: is_categorical is deprecated and will be removed in a future version. Use is_categorical_dtype instead
... storing 'extra_chains' as categorical
... storing 'IR_VJ_1_c_cigar' as categorical
... storing 'IR_VJ_2_c_cigar' as categorical
... storing 'IR_VDJ_1_c_cigar' as categorical
... storing 'IR_VDJ_2_c_cigar' as categorical
... storing 'IR_VJ_1_clone_id' as categorical
... storing 'IR_VJ_2_clone_id' as categorical
... storing 'IR_VDJ_1_clone_id' as categorical
... storing 'IR_VDJ_2_clone_id' as categorical
... storing 'IR_VJ_1_d_cigar' as categorical
... storing 'IR_VJ_2_d_cigar' as categorical
... storing 'IR_VDJ_1_d_cigar' as categorical
... storing 'IR_VDJ_2_d_cigar' as categorical
... storing 'IR_VJ_1_d_sequence_end' as categorical
... storing 'IR_VJ_2_d_sequence_end' as categorical
... storing 'IR_VJ_1_d_sequence_start' as categorical
... storing 'IR_VJ_2_d_sequence_start' as categorical
... storing 'IR_VJ_1_germline_alignment' as categorical
... storing 'IR_VJ_2_germline_alignment' as categorical
... storing 'IR_VDJ_1_germline_alignment' as categorical
... storing 'IR_VDJ_2_germline_alignment' as categorical
... storing 'IR_VJ_1_is_cell' as categorical
... storing 'IR_VJ_2_is_cell' as categorical
... storing 'IR_VDJ_1_is_cell' as categorical
... storing 'IR_VDJ_2_is_cell' as categorical
... storing 'IR_VJ_1_j_cigar' as categorical
... storing 'IR_VJ_2_j_cigar' as categorical
... storing 'IR_VDJ_1_j_cigar' as categorical
... storing 'IR_VDJ_2_j_cigar' as categorical
... storing 'IR_VJ_1_junction' as categorical
... storing 'IR_VJ_2_junction' as categorical
... storing 'IR_VDJ_1_junction' as categorical
... storing 'IR_VDJ_2_junction' as categorical
... storing 'IR_VJ_1_junction_aa' as categorical
... storing 'IR_VJ_2_junction_aa' as categorical
... storing 'IR_VDJ_1_junction_aa' as categorical
... storing 'IR_VDJ_2_junction_aa' as categorical
... storing 'IR_VJ_1_productive' as categorical
... storing 'IR_VJ_2_productive' as categorical
... storing 'IR_VDJ_1_productive' as categorical
... storing 'IR_VDJ_2_productive' as categorical
... storing 'IR_VJ_1_rev_comp' as categorical
... storing 'IR_VJ_2_rev_comp' as categorical
... storing 'IR_VDJ_1_rev_comp' as categorical
... storing 'IR_VDJ_2_rev_comp' as categorical
... storing 'IR_VJ_1_sequence' as categorical
... storing 'IR_VJ_2_sequence' as categorical
... storing 'IR_VDJ_1_sequence' as categorical
... storing 'IR_VDJ_2_sequence' as categorical
... storing 'IR_VJ_1_sequence_aa' as categorical
... storing 'IR_VJ_2_sequence_aa' as categorical
... storing 'IR_VDJ_1_sequence_aa' as categorical
... storing 'IR_VDJ_2_sequence_aa' as categorical
... storing 'IR_VJ_1_sequence_alignment' as categorical
... storing 'IR_VJ_2_sequence_alignment' as categorical
... storing 'IR_VDJ_1_sequence_alignment' as categorical
... storing 'IR_VDJ_2_sequence_alignment' as categorical
... storing 'IR_VJ_1_sequence_id' as categorical
... storing 'IR_VJ_2_sequence_id' as categorical
... storing 'IR_VDJ_1_sequence_id' as categorical
... storing 'IR_VDJ_2_sequence_id' as categorical
... storing 'IR_VJ_1_v_cigar' as categorical
... storing 'IR_VJ_2_v_cigar' as categorical
... storing 'IR_VDJ_1_v_cigar' as categorical
... storing 'IR_VDJ_2_v_cigar' as categorical
[5]:
AnnData object with n_obs × n_vars = 994 × 0
obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_2_d_sequence_start', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_germline_alignment', 'IR_VJ_2_germline_alignment', 'IR_VDJ_1_germline_alignment', 'IR_VDJ_2_germline_alignment', 'IR_VJ_1_is_cell', 'IR_VJ_2_is_cell', 'IR_VDJ_1_is_cell', 'IR_VDJ_2_is_cell', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_j_cigar', 'IR_VJ_2_j_cigar', 'IR_VDJ_1_j_cigar', 'IR_VDJ_2_j_cigar', 'IR_VJ_1_j_sequence_end', 'IR_VJ_2_j_sequence_end', 'IR_VDJ_1_j_sequence_end', 'IR_VDJ_2_j_sequence_end', 'IR_VJ_1_j_sequence_start', 'IR_VJ_2_j_sequence_start', 'IR_VDJ_1_j_sequence_start', 'IR_VDJ_2_j_sequence_start', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_junction_aa_length', 'IR_VJ_2_junction_aa_length', 'IR_VDJ_1_junction_aa_length', 'IR_VDJ_2_junction_aa_length', 'IR_VJ_1_junction_length', 
'IR_VJ_2_junction_length', 'IR_VDJ_1_junction_length', 'IR_VDJ_2_junction_length', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_rev_comp', 'IR_VJ_2_rev_comp', 'IR_VDJ_1_rev_comp', 'IR_VDJ_2_rev_comp', 'IR_VJ_1_sequence', 'IR_VJ_2_sequence', 'IR_VDJ_1_sequence', 'IR_VDJ_2_sequence', 'IR_VJ_1_sequence_aa', 'IR_VJ_2_sequence_aa', 'IR_VDJ_1_sequence_aa', 'IR_VDJ_2_sequence_aa', 'IR_VJ_1_sequence_alignment', 'IR_VJ_2_sequence_alignment', 'IR_VDJ_1_sequence_alignment', 'IR_VDJ_2_sequence_alignment', 'IR_VJ_1_sequence_id', 'IR_VJ_2_sequence_id', 'IR_VDJ_1_sequence_id', 'IR_VDJ_2_sequence_id', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'IR_VJ_1_v_cigar', 'IR_VJ_2_v_cigar', 'IR_VDJ_1_v_cigar', 'IR_VDJ_2_v_cigar', 'IR_VJ_1_v_sequence_end', 'IR_VJ_2_v_sequence_end', 'IR_VDJ_1_v_sequence_end', 'IR_VDJ_2_v_sequence_end', 'IR_VJ_1_v_sequence_start', 'IR_VJ_2_v_sequence_start', 'IR_VDJ_1_v_sequence_start', 'IR_VDJ_2_v_sequence_start', 'has_ir'
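The obs column names printed above appear to follow a regular pattern: an `IR_` prefix, a chain type (`VJ` or `VDJ`), a chain slot (`1` or `2`), and then the original field name. The sketch below splits a column name into those parts. The pattern is inferred from this output only, not taken from scirpy's documentation, and `split_ir_column` is a hypothetical helper.

```python
import re

# Pattern inferred from the printed obs columns (an assumption):
# IR_<VJ|VDJ>_<1|2>_<field>
IR_COLUMN = re.compile(r"IR_(VJ|VDJ)_([12])_(.+)")


def split_ir_column(name):
    """Return (chain_type, chain_slot, field), or None for other columns."""
    match = IR_COLUMN.fullmatch(name)
    if match is None:
        return None
    chain_type, slot, field = match.groups()
    return chain_type, int(slot), field


for name in ["IR_VJ_1_clone_id", "IR_VDJ_2_junction_aa", "has_ir"]:
    print(name, "->", split_ir_column(name))
# IR_VJ_1_clone_id -> ('VJ', 1, 'clone_id')
# IR_VDJ_2_junction_aa -> ('VDJ', 2, 'junction_aa')
# has_ir -> None
```

Applied to `irdata.obs.columns`, a filter on the third element would list every column derived from a given field, for example all four `clone_id` columns.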
The clone_id is mapped to the IR_VJ_1_clone_id column. Setting transfer = True additionally runs dandelion's tl.transfer, so the Dandelion metadata columns are also copied into obs.
[6]:
irdatax = ddl.to_scirpy(vdj, transfer = True)
irdatax
[6]:
AnnData object with n_obs × n_vars = 994 × 0
obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_2_d_sequence_start', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_germline_alignment', 'IR_VJ_2_germline_alignment', 'IR_VDJ_1_germline_alignment', 'IR_VDJ_2_germline_alignment', 'IR_VJ_1_is_cell', 'IR_VJ_2_is_cell', 'IR_VDJ_1_is_cell', 'IR_VDJ_2_is_cell', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_j_cigar', 'IR_VJ_2_j_cigar', 'IR_VDJ_1_j_cigar', 'IR_VDJ_2_j_cigar', 'IR_VJ_1_j_sequence_end', 'IR_VJ_2_j_sequence_end', 'IR_VDJ_1_j_sequence_end', 'IR_VDJ_2_j_sequence_end', 'IR_VJ_1_j_sequence_start', 'IR_VJ_2_j_sequence_start', 'IR_VDJ_1_j_sequence_start', 'IR_VDJ_2_j_sequence_start', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_junction_aa_length', 'IR_VJ_2_junction_aa_length', 'IR_VDJ_1_junction_aa_length', 'IR_VDJ_2_junction_aa_length', 'IR_VJ_1_junction_length', 
'IR_VJ_2_junction_length', 'IR_VDJ_1_junction_length', 'IR_VDJ_2_junction_length', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_rev_comp', 'IR_VJ_2_rev_comp', 'IR_VDJ_1_rev_comp', 'IR_VDJ_2_rev_comp', 'IR_VJ_1_sequence', 'IR_VJ_2_sequence', 'IR_VDJ_1_sequence', 'IR_VDJ_2_sequence', 'IR_VJ_1_sequence_aa', 'IR_VJ_2_sequence_aa', 'IR_VDJ_1_sequence_aa', 'IR_VDJ_2_sequence_aa', 'IR_VJ_1_sequence_alignment', 'IR_VJ_2_sequence_alignment', 'IR_VDJ_1_sequence_alignment', 'IR_VDJ_2_sequence_alignment', 'IR_VJ_1_sequence_id', 'IR_VJ_2_sequence_id', 'IR_VDJ_1_sequence_id', 'IR_VDJ_2_sequence_id', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'IR_VJ_1_v_cigar', 'IR_VJ_2_v_cigar', 'IR_VDJ_1_v_cigar', 'IR_VDJ_2_v_cigar', 'IR_VJ_1_v_sequence_end', 'IR_VJ_2_v_sequence_end', 'IR_VDJ_1_v_sequence_end', 'IR_VDJ_2_v_sequence_end', 'IR_VJ_1_v_sequence_start', 'IR_VJ_2_v_sequence_start', 'IR_VDJ_1_v_sequence_start', 'IR_VDJ_2_v_sequence_start', 'has_ir', 'clone_id', 'clone_id_by_size', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'duplicate_count_heavy_0', 'duplicate_count_heavy_1', 'duplicate_count_light_0', 'duplicate_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'
dandelion: Converting scirpy to dandelion
[7]:
vdjx = ddl.from_scirpy(irdata)
vdjx
[7]:
Dandelion class object with n_obs = 978 and n_contigs = 2093
data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id', 'umi_count'
metadata: 'clone_id', 'clone_id_by_size', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_light_0', 'umi_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'
distance: None
edges: None
layout: None
graph: None
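To see what changes in a round trip, the `data` columns printed for `vdj` and for `vdjx` can be compared. The sketch below does this with plain lists copied from the two printouts; `compare_columns` is a hypothetical helper introduced for this example.

```python
def compare_columns(before, after):
    """Return (dropped, gained) column names, keeping their original order."""
    before_set, after_set = set(before), set(after)
    dropped = [name for name in before if name not in after_set]
    gained = [name for name in after if name not in before_set]
    return dropped, gained


# Column names copied from the `data` line printed for `vdj`.
vdj_columns = """
cell_id clone_id sequence_id sequence sequence_aa productive rev_comp
v_call v_cigar d_call d_cigar j_call j_cigar c_call c_cigar
sequence_alignment germline_alignment junction junction_aa junction_length
junction_aa_length v_sequence_start v_sequence_end d_sequence_start
d_sequence_end j_sequence_start j_sequence_end c_sequence_start
c_sequence_end consensus_count duplicate_count is_cell locus
""".split()

# Column names copied from the `data` line printed for `vdjx`.
vdjx_columns = """
c_call c_cigar c_sequence_end c_sequence_start clone_id consensus_count
d_call d_cigar d_sequence_end d_sequence_start germline_alignment is_cell
j_call j_cigar j_sequence_end j_sequence_start junction junction_aa
junction_aa_length junction_length locus productive rev_comp sequence
sequence_aa sequence_alignment sequence_id v_call v_cigar v_sequence_end
v_sequence_start cell_id umi_count
""".split()

print(compare_columns(vdj_columns, vdjx_columns))
# (['duplicate_count'], ['umi_count'])
```

The only difference is that `duplicate_count` is absent after the round trip and `umi_count` is present, which suggests the count column is renamed along the way; that reading is based solely on the printed output. With live objects, and assuming `.data` is a pandas DataFrame, the inputs would be `list(vdj.data.columns)` and `list(vdjx.data.columns)`.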
[8]:
vdjx.metadata
[8]:
| | clone_id | clone_id_by_size | locus_heavy | locus_light | productive_heavy | productive_light | v_call_heavy | v_call_light | j_call_heavy | j_call_light | ... | junction_aa_light | status | status_summary | productive | productive_summary | isotype | isotype_summary | vdj_status | vdj_status_summary | heavychain_status_summary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AAACCTGTCCGTTGTC-1 | 148_3_1_266 | 1066 | IGH | IGK | T | T | IGHV1-69D | IGKV1-8 | IGHJ3 | IGKJ1 | ... | CQQYYSYPRTF | IGH + IGK | IGH + IGK | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
AAACCTGTCGAGAACG-1 | 92_4_1_47 | 1065 | IGH | IGL | T | T | IGHV1-2 | IGLV5-45 | IGHJ3 | IGLJ3 | ... | CMIWHSSAWVV | IGH + IGL | IGH + IGL | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
AAACCTGTCTTGAGAC-1 | 149_1_2_419 | 166 | IGH | IGK | T | T | IGHV5-51 | IGKV1D-8 | IGHJ3 | IGKJ2 | ... | CQQYYSFPYTF | IGH + IGK | IGH + IGK | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
AAACGGGAGCGACGTA-1 | 82_1_2_2 | 600 | IGH | IGL | T | T | IGHV4-59 | IGLV3-19 | IGHJ3 | IGLJ2 | ... | CNSRDSSGNHVVF | IGH + IGL | IGH + IGL | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
AAACGGGCACTGTTAG-1 | 70_1_1_92 | 1075 | IGH | IGL | T | T | IGHV4-39 | IGLV3-21 | IGHJ3 | IGLJ2 | ... | CQVWDSSSDHVVF | IGH + IGL | IGH + IGL | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
TTTGCGCTCTAACGGT-1 | 138_3_1_217 | 156 | IGH | IGL | T | T | IGHV3-43 | IGLV2-8 | IGHJ6 | IGLJ3 | ... | CGSFAGSNIWVF | IGH + IGL | IGH + IGL | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
TTTGGTTGTAGCCTAT-1 | 85_1_1_240 | 506 | IGH | IGK | T | T | IGHV4-39 | IGKV6-21 | IGHJ2 | IGKJ4 | ... | CHQSSSLPLTF | IGH + IGK | IGH + IGK | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
TTTGGTTTCAGAGCTT-1 | 117_5_2_232 | 563 | IGH | IGK | T | T | IGHV7-4-1 | IGKV3-11 | IGHJ4 | IGKJ5 | ... | CQQRSNWLTF | IGH + IGK | IGH + IGK | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
TTTGGTTTCAGTGTTG-1 | 12_1_1_329 | 820 | IGH | IGL | T | T | IGHV2-5 | IGLV2-23 | IGHJ4 | IGLJ2 | ... | CCSYAGSSTFEVF | IGH + IGL | IGH + IGL | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
TTTGGTTTCGGTGTCG-1 | 140_1_1_82 | 210 | IGH | IGK | T | T | IGHV3-21 | IGKV3-11 | IGHJ2 | IGKJ4 | ... | CQQRSNWPRLTF | IGH + IGK | IGH + IGK | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
978 rows × 27 columns
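Since the metadata is a per-cell table, simple tallies give a quick view of the chain pairings. The sketch below counts the `status` values of the ten rows visible in the preview above, so the numbers describe the preview only and not the full 978-row table.

```python
from collections import Counter

# `status` values of the ten rows shown in the preview, in display order.
preview_status = [
    "IGH + IGK", "IGH + IGL", "IGH + IGK", "IGH + IGL", "IGH + IGL",
    "IGH + IGL", "IGH + IGK", "IGH + IGK", "IGH + IGL", "IGH + IGK",
]

counts = Counter(preview_status)
print(dict(counts))  # {'IGH + IGK': 5, 'IGH + IGL': 5}
```

On the full table, assuming `vdjx.metadata` is a pandas DataFrame, `vdjx.metadata["status"].value_counts()` would give the equivalent tally for all cells.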
scirpy
scirpy: Converting dandelion to scirpy
[9]:
irdata2 = ir.io.from_dandelion(vdj)
irdata2
[9]:
AnnData object with n_obs × n_vars = 994 × 0
obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_2_d_sequence_start', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_germline_alignment', 'IR_VJ_2_germline_alignment', 'IR_VDJ_1_germline_alignment', 'IR_VDJ_2_germline_alignment', 'IR_VJ_1_is_cell', 'IR_VJ_2_is_cell', 'IR_VDJ_1_is_cell', 'IR_VDJ_2_is_cell', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_j_cigar', 'IR_VJ_2_j_cigar', 'IR_VDJ_1_j_cigar', 'IR_VDJ_2_j_cigar', 'IR_VJ_1_j_sequence_end', 'IR_VJ_2_j_sequence_end', 'IR_VDJ_1_j_sequence_end', 'IR_VDJ_2_j_sequence_end', 'IR_VJ_1_j_sequence_start', 'IR_VJ_2_j_sequence_start', 'IR_VDJ_1_j_sequence_start', 'IR_VDJ_2_j_sequence_start', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_junction_aa_length', 'IR_VJ_2_junction_aa_length', 'IR_VDJ_1_junction_aa_length', 'IR_VDJ_2_junction_aa_length', 'IR_VJ_1_junction_length', 
'IR_VJ_2_junction_length', 'IR_VDJ_1_junction_length', 'IR_VDJ_2_junction_length', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_rev_comp', 'IR_VJ_2_rev_comp', 'IR_VDJ_1_rev_comp', 'IR_VDJ_2_rev_comp', 'IR_VJ_1_sequence', 'IR_VJ_2_sequence', 'IR_VDJ_1_sequence', 'IR_VDJ_2_sequence', 'IR_VJ_1_sequence_aa', 'IR_VJ_2_sequence_aa', 'IR_VDJ_1_sequence_aa', 'IR_VDJ_2_sequence_aa', 'IR_VJ_1_sequence_alignment', 'IR_VJ_2_sequence_alignment', 'IR_VDJ_1_sequence_alignment', 'IR_VDJ_2_sequence_alignment', 'IR_VJ_1_sequence_id', 'IR_VJ_2_sequence_id', 'IR_VDJ_1_sequence_id', 'IR_VDJ_2_sequence_id', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'IR_VJ_1_v_cigar', 'IR_VJ_2_v_cigar', 'IR_VDJ_1_v_cigar', 'IR_VDJ_2_v_cigar', 'IR_VJ_1_v_sequence_end', 'IR_VJ_2_v_sequence_end', 'IR_VDJ_1_v_sequence_end', 'IR_VDJ_2_v_sequence_end', 'IR_VJ_1_v_sequence_start', 'IR_VJ_2_v_sequence_start', 'IR_VDJ_1_v_sequence_start', 'IR_VDJ_2_v_sequence_start', 'has_ir'
Likewise, transfer = True will perform dandelion's tl.transfer.
[10]:
irdata2x = ir.io.from_dandelion(vdj, transfer = True)
irdata2x
[10]:
AnnData object with n_obs × n_vars = 994 × 0
obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_2_d_sequence_start', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_germline_alignment', 'IR_VJ_2_germline_alignment', 'IR_VDJ_1_germline_alignment', 'IR_VDJ_2_germline_alignment', 'IR_VJ_1_is_cell', 'IR_VJ_2_is_cell', 'IR_VDJ_1_is_cell', 'IR_VDJ_2_is_cell', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_j_cigar', 'IR_VJ_2_j_cigar', 'IR_VDJ_1_j_cigar', 'IR_VDJ_2_j_cigar', 'IR_VJ_1_j_sequence_end', 'IR_VJ_2_j_sequence_end', 'IR_VDJ_1_j_sequence_end', 'IR_VDJ_2_j_sequence_end', 'IR_VJ_1_j_sequence_start', 'IR_VJ_2_j_sequence_start', 'IR_VDJ_1_j_sequence_start', 'IR_VDJ_2_j_sequence_start', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_junction_aa_length', 'IR_VJ_2_junction_aa_length', 'IR_VDJ_1_junction_aa_length', 'IR_VDJ_2_junction_aa_length', 'IR_VJ_1_junction_length', 
'IR_VJ_2_junction_length', 'IR_VDJ_1_junction_length', 'IR_VDJ_2_junction_length', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_rev_comp', 'IR_VJ_2_rev_comp', 'IR_VDJ_1_rev_comp', 'IR_VDJ_2_rev_comp', 'IR_VJ_1_sequence', 'IR_VJ_2_sequence', 'IR_VDJ_1_sequence', 'IR_VDJ_2_sequence', 'IR_VJ_1_sequence_aa', 'IR_VJ_2_sequence_aa', 'IR_VDJ_1_sequence_aa', 'IR_VDJ_2_sequence_aa', 'IR_VJ_1_sequence_alignment', 'IR_VJ_2_sequence_alignment', 'IR_VDJ_1_sequence_alignment', 'IR_VDJ_2_sequence_alignment', 'IR_VJ_1_sequence_id', 'IR_VJ_2_sequence_id', 'IR_VDJ_1_sequence_id', 'IR_VDJ_2_sequence_id', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'IR_VJ_1_v_cigar', 'IR_VJ_2_v_cigar', 'IR_VDJ_1_v_cigar', 'IR_VDJ_2_v_cigar', 'IR_VJ_1_v_sequence_end', 'IR_VJ_2_v_sequence_end', 'IR_VDJ_1_v_sequence_end', 'IR_VDJ_2_v_sequence_end', 'IR_VJ_1_v_sequence_start', 'IR_VJ_2_v_sequence_start', 'IR_VDJ_1_v_sequence_start', 'IR_VDJ_2_v_sequence_start', 'has_ir', 'clone_id', 'clone_id_by_size', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'duplicate_count_heavy_0', 'duplicate_count_heavy_1', 'duplicate_count_light_0', 'duplicate_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'
scirpy: Converting scirpy to dandelion
¶
[11]:
vdj3 = ir.io.to_dandelion(irdata2)
vdj3
[11]:
Dandelion class object with n_obs = 978 and n_contigs = 2093
data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id', 'umi_count'
metadata: 'clone_id', 'clone_id_by_size', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_light_0', 'umi_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'
distance: None
edges: None
layout: None
graph: None
scirpy: Read from scirpy, convert to dandelion
¶
[12]:
# read in the airr_rearrangement.tsv file
file_location = 'sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv'
irdata_s = ir.io.read_airr(file_location)
irdata_s
/Users/kt16/miniconda3/envs/dandelion/lib/python3.7/site-packages/anndata/_core/anndata.py:1192: FutureWarning: is_categorical is deprecated and will be removed in a future version. Use is_categorical_dtype instead
... storing 'extra_chains' as categorical
... storing 'IR_VJ_1_c_cigar' as categorical
... storing 'IR_VJ_2_c_cigar' as categorical
... storing 'IR_VDJ_1_c_cigar' as categorical
... storing 'IR_VDJ_2_c_cigar' as categorical
... storing 'IR_VJ_1_c_sequence_end' as categorical
... storing 'IR_VJ_2_c_sequence_end' as categorical
... storing 'IR_VDJ_1_c_sequence_end' as categorical
... storing 'IR_VDJ_2_c_sequence_end' as categorical
... storing 'IR_VJ_1_c_sequence_start' as categorical
... storing 'IR_VJ_2_c_sequence_start' as categorical
... storing 'IR_VDJ_1_c_sequence_start' as categorical
... storing 'IR_VDJ_2_c_sequence_start' as categorical
... storing 'IR_VJ_1_clone_id' as categorical
... storing 'IR_VJ_2_clone_id' as categorical
... storing 'IR_VDJ_1_clone_id' as categorical
... storing 'IR_VDJ_2_clone_id' as categorical
... storing 'IR_VJ_1_d_cigar' as categorical
... storing 'IR_VJ_2_d_cigar' as categorical
... storing 'IR_VDJ_1_d_cigar' as categorical
... storing 'IR_VDJ_2_d_cigar' as categorical
... storing 'IR_VJ_1_d_sequence_end' as categorical
... storing 'IR_VJ_2_d_sequence_end' as categorical
... storing 'IR_VJ_1_d_sequence_start' as categorical
... storing 'IR_VJ_2_d_sequence_start' as categorical
... storing 'IR_VJ_1_germline_alignment' as categorical
... storing 'IR_VJ_2_germline_alignment' as categorical
... storing 'IR_VDJ_1_germline_alignment' as categorical
... storing 'IR_VDJ_2_germline_alignment' as categorical
... storing 'IR_VJ_1_is_cell' as categorical
... storing 'IR_VJ_2_is_cell' as categorical
... storing 'IR_VDJ_1_is_cell' as categorical
... storing 'IR_VDJ_2_is_cell' as categorical
... storing 'IR_VJ_1_j_cigar' as categorical
... storing 'IR_VJ_2_j_cigar' as categorical
... storing 'IR_VDJ_1_j_cigar' as categorical
... storing 'IR_VDJ_2_j_cigar' as categorical
... storing 'IR_VJ_1_junction' as categorical
... storing 'IR_VJ_2_junction' as categorical
... storing 'IR_VDJ_1_junction' as categorical
... storing 'IR_VDJ_2_junction' as categorical
... storing 'IR_VJ_1_junction_aa' as categorical
... storing 'IR_VJ_2_junction_aa' as categorical
... storing 'IR_VDJ_1_junction_aa' as categorical
... storing 'IR_VDJ_2_junction_aa' as categorical
... storing 'IR_VJ_1_junction_aa_length' as categorical
... storing 'IR_VJ_2_junction_aa_length' as categorical
... storing 'IR_VDJ_1_junction_aa_length' as categorical
... storing 'IR_VDJ_2_junction_aa_length' as categorical
... storing 'IR_VJ_1_productive' as categorical
... storing 'IR_VJ_2_productive' as categorical
... storing 'IR_VDJ_1_productive' as categorical
... storing 'IR_VDJ_2_productive' as categorical
... storing 'IR_VJ_1_rev_comp' as categorical
... storing 'IR_VJ_2_rev_comp' as categorical
... storing 'IR_VDJ_1_rev_comp' as categorical
... storing 'IR_VDJ_2_rev_comp' as categorical
... storing 'IR_VJ_1_sequence' as categorical
... storing 'IR_VJ_2_sequence' as categorical
... storing 'IR_VDJ_1_sequence' as categorical
... storing 'IR_VDJ_2_sequence' as categorical
... storing 'IR_VJ_1_sequence_aa' as categorical
... storing 'IR_VJ_2_sequence_aa' as categorical
... storing 'IR_VDJ_1_sequence_aa' as categorical
... storing 'IR_VDJ_2_sequence_aa' as categorical
... storing 'IR_VJ_1_sequence_alignment' as categorical
... storing 'IR_VJ_2_sequence_alignment' as categorical
... storing 'IR_VDJ_1_sequence_alignment' as categorical
... storing 'IR_VDJ_2_sequence_alignment' as categorical
... storing 'IR_VJ_1_sequence_id' as categorical
... storing 'IR_VJ_2_sequence_id' as categorical
... storing 'IR_VDJ_1_sequence_id' as categorical
... storing 'IR_VDJ_2_sequence_id' as categorical
... storing 'IR_VJ_1_v_cigar' as categorical
... storing 'IR_VJ_2_v_cigar' as categorical
... storing 'IR_VDJ_1_v_cigar' as categorical
... storing 'IR_VDJ_2_v_cigar' as categorical
[12]:
AnnData object with n_obs × n_vars = 994 × 0
obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_2_d_sequence_start', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_germline_alignment', 'IR_VJ_2_germline_alignment', 'IR_VDJ_1_germline_alignment', 'IR_VDJ_2_germline_alignment', 'IR_VJ_1_is_cell', 'IR_VJ_2_is_cell', 'IR_VDJ_1_is_cell', 'IR_VDJ_2_is_cell', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_j_cigar', 'IR_VJ_2_j_cigar', 'IR_VDJ_1_j_cigar', 'IR_VDJ_2_j_cigar', 'IR_VJ_1_j_sequence_end', 'IR_VJ_2_j_sequence_end', 'IR_VDJ_1_j_sequence_end', 'IR_VDJ_2_j_sequence_end', 'IR_VJ_1_j_sequence_start', 'IR_VJ_2_j_sequence_start', 'IR_VDJ_1_j_sequence_start', 'IR_VDJ_2_j_sequence_start', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_junction_aa_length', 'IR_VJ_2_junction_aa_length', 'IR_VDJ_1_junction_aa_length', 'IR_VDJ_2_junction_aa_length', 'IR_VJ_1_junction_length', 
'IR_VJ_2_junction_length', 'IR_VDJ_1_junction_length', 'IR_VDJ_2_junction_length', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_rev_comp', 'IR_VJ_2_rev_comp', 'IR_VDJ_1_rev_comp', 'IR_VDJ_2_rev_comp', 'IR_VJ_1_sequence', 'IR_VJ_2_sequence', 'IR_VDJ_1_sequence', 'IR_VDJ_2_sequence', 'IR_VJ_1_sequence_aa', 'IR_VJ_2_sequence_aa', 'IR_VDJ_1_sequence_aa', 'IR_VDJ_2_sequence_aa', 'IR_VJ_1_sequence_alignment', 'IR_VJ_2_sequence_alignment', 'IR_VDJ_1_sequence_alignment', 'IR_VDJ_2_sequence_alignment', 'IR_VJ_1_sequence_id', 'IR_VJ_2_sequence_id', 'IR_VDJ_1_sequence_id', 'IR_VDJ_2_sequence_id', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'IR_VJ_1_v_cigar', 'IR_VJ_2_v_cigar', 'IR_VDJ_1_v_cigar', 'IR_VDJ_2_v_cigar', 'IR_VJ_1_v_sequence_end', 'IR_VJ_2_v_sequence_end', 'IR_VDJ_1_v_sequence_end', 'IR_VDJ_2_v_sequence_end', 'IR_VJ_1_v_sequence_start', 'IR_VJ_2_v_sequence_start', 'IR_VDJ_1_v_sequence_start', 'IR_VDJ_2_v_sequence_start', 'has_ir'
This time, find clones with scirpy’s method.
[13]:
ir.tl.chain_qc(irdata_s)
ir.pp.ir_dist(irdata_s, metric='hamming', sequence='aa')
ir.tl.define_clonotypes(irdata_s)
irdata_s
[13]:
AnnData object with n_obs × n_vars = 994 × 0
obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_2_d_sequence_start', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_germline_alignment', 'IR_VJ_2_germline_alignment', 'IR_VDJ_1_germline_alignment', 'IR_VDJ_2_germline_alignment', 'IR_VJ_1_is_cell', 'IR_VJ_2_is_cell', 'IR_VDJ_1_is_cell', 'IR_VDJ_2_is_cell', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_j_cigar', 'IR_VJ_2_j_cigar', 'IR_VDJ_1_j_cigar', 'IR_VDJ_2_j_cigar', 'IR_VJ_1_j_sequence_end', 'IR_VJ_2_j_sequence_end', 'IR_VDJ_1_j_sequence_end', 'IR_VDJ_2_j_sequence_end', 'IR_VJ_1_j_sequence_start', 'IR_VJ_2_j_sequence_start', 'IR_VDJ_1_j_sequence_start', 'IR_VDJ_2_j_sequence_start', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_junction_aa_length', 'IR_VJ_2_junction_aa_length', 'IR_VDJ_1_junction_aa_length', 'IR_VDJ_2_junction_aa_length', 'IR_VJ_1_junction_length', 
'IR_VJ_2_junction_length', 'IR_VDJ_1_junction_length', 'IR_VDJ_2_junction_length', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_rev_comp', 'IR_VJ_2_rev_comp', 'IR_VDJ_1_rev_comp', 'IR_VDJ_2_rev_comp', 'IR_VJ_1_sequence', 'IR_VJ_2_sequence', 'IR_VDJ_1_sequence', 'IR_VDJ_2_sequence', 'IR_VJ_1_sequence_aa', 'IR_VJ_2_sequence_aa', 'IR_VDJ_1_sequence_aa', 'IR_VDJ_2_sequence_aa', 'IR_VJ_1_sequence_alignment', 'IR_VJ_2_sequence_alignment', 'IR_VDJ_1_sequence_alignment', 'IR_VDJ_2_sequence_alignment', 'IR_VJ_1_sequence_id', 'IR_VJ_2_sequence_id', 'IR_VDJ_1_sequence_id', 'IR_VDJ_2_sequence_id', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'IR_VJ_1_v_cigar', 'IR_VJ_2_v_cigar', 'IR_VDJ_1_v_cigar', 'IR_VDJ_2_v_cigar', 'IR_VJ_1_v_sequence_end', 'IR_VJ_2_v_sequence_end', 'IR_VDJ_1_v_sequence_end', 'IR_VDJ_2_v_sequence_end', 'IR_VJ_1_v_sequence_start', 'IR_VJ_2_v_sequence_start', 'IR_VDJ_1_v_sequence_start', 'IR_VDJ_2_v_sequence_start', 'has_ir', 'receptor_type', 'receptor_subtype', 'chain_pairing', 'clonotype', 'clonotype_size'
uns: 'ir_dist_aa_hamming', 'ir_dist_nt_identity', 'clonotype'
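For intuition about the `metric='hamming'` and `sequence="aa"` settings used in the cell above: the Hamming distance between two amino-acid sequences counts the positions at which they differ, and it is only defined for sequences of equal length. The sketch below is a plain-Python illustration, not scirpy's implementation. The function name `hamming_aa` and the example junction sequences are hypothetical, and returning `None` for unequal lengths is an assumption of this sketch.

```python
from typing import Optional


def hamming_aa(seq_a: str, seq_b: str) -> Optional[int]:
    """Count mismatched positions between two amino-acid sequences.

    Returns None when the lengths differ (assumption of this sketch:
    such pairs are treated as not comparable).
    """
    if len(seq_a) != len(seq_b):
        return None
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Hypothetical junction_aa sequences, for illustration only.
print(hamming_aa("CARDYW", "CARDYW"))  # identical sequences
print(hamming_aa("CARDYW", "CAKDFW"))  # differs at two positions
print(hamming_aa("CARDYW", "CARW"))    # different lengths
```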
[14]:
vdj4 = ir.io.to_dandelion(irdata_s)
vdj4
[14]:
Dandelion class object with n_obs = 978 and n_contigs = 2093
data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id', 'umi_count'
metadata: 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_light_0', 'umi_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'
distance: None
edges: None
layout: None
graph: None
Note that the clone_id column is missing from the metadata.
As [@grst](https://github.com/grst) has noted, ‘Currently only the chain-specific attributes get exported (i.e. all scirpy columns that start with IR_). In principle, it probably makes sense to write out the clonotype column to clone_id. But then again, scirpy allows different versions of clonal assignments…’. In other words, the clonal assignment is not transferred unless the IR_*_clone_id columns are populated.
You can transfer it manually, or use ddl.from_scirpy, whose conversion uses the clonotype column (or the clone_id column, if already present) in the .obs of the scirpy-initialized AnnData as the default clone_id. The clone_key and key_added options adjust this behavior.
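The manual route amounts to looking up each contig's cell in the cell-level clonotype assignment and writing the result to a contig-level `clone_id` field. A minimal sketch of that lookup with plain Python containers follows. The cell barcodes, contig ids, clonotype labels, and the helper name `assign_clone_id` are all hypothetical, and this is not a description of what `ddl.from_scirpy` does internally.

```python
from typing import Dict, List


def assign_clone_id(
    contigs: List[Dict[str, str]],
    clonotype_by_cell: Dict[str, str],
    key_added: str = "clone_id",
) -> List[Dict[str, str]]:
    """Copy each cell's clonotype label onto its contigs under `key_added`.

    Contigs whose cell has no clonotype get an empty string.
    The input rows are not modified; new dictionaries are returned.
    """
    return [
        {**contig, key_added: clonotype_by_cell.get(contig["cell_id"], "")}
        for contig in contigs
    ]


# Hypothetical cell-level assignment (stands in for the `clonotype` column).
clonotype_by_cell = {"cell_A": "0", "cell_B": "1"}

# Hypothetical contig-level rows (stand in for rows of the contig table).
contigs = [
    {"sequence_id": "cell_A_contig_1", "cell_id": "cell_A"},
    {"sequence_id": "cell_A_contig_2", "cell_id": "cell_A"},
    {"sequence_id": "cell_B_contig_1", "cell_id": "cell_B"},
    {"sequence_id": "cell_C_contig_1", "cell_id": "cell_C"},
]

annotated = assign_clone_id(contigs, clonotype_by_cell)
for row in annotated:
    print(row["sequence_id"], repr(row["clone_id"]))
```

With the real objects, the same per-cell lookup would be applied to the pandas tables rather than to dictionaries.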
[15]:
vdj5 = ddl.from_scirpy(irdata_s)
vdj5
[15]:
Dandelion class object with n_obs = 978 and n_contigs = 2093
data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id', 'umi_count'
metadata: 'clone_id', 'clone_id_by_size', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_light_0', 'umi_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'heavychain_status_summary'
distance: None
edges: None
layout: None
graph: None