Dandelion class

Much of the functionality of the dandelion package revolves around the Dandelion class object. The class acts as an intermediary object for storage and flexible interaction with other tools. This notebook is a quick primer on the Dandelion class.
Import modules
[1]:
import os
os.chdir(os.path.expanduser('/Users/kt16/Downloads/dandelion_tutorial/'))
import dandelion as ddl
ddl.logging.print_versions()
dandelion==0.1.0 pandas==1.1.4 numpy==1.19.4 matplotlib==3.3.3 networkx==2.5 scipy==1.5.3 skbio==0.5.6
[2]:
vdj = ddl.read_h5('dandelion_results.h5')
vdj
[2]:
Dandelion class object with n_obs = 838 and n_contigs = 1700
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id', 'clone_id_heavy_only'
metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status', 'changeo_clone_id', 'd_call_heavy', 'd_call_light', 'clone_id_heavy_only'
distance: 'heavy', 'light_0', 'light_1', 'light_2'
edges: 'source', 'target', 'weight'
layout: layout for 838 vertices, layout for 24 vertices
graph: networkx graph of 838 vertices, networkx graph of 24 vertices
Basically, the object can be summarized in the following illustration:

Essentially, the .data slot holds the AIRR contig table, while the .metadata slot holds a collapsed version that can be combined with AnnData's .obs slot. You can retrieve these slots as you would any attribute of a class object; for example, to get the metadata:
[3]:
vdj.metadata
[3]:
clone_id | clone_id_by_size | sample_id | locus_heavy | locus_light | productive_heavy | productive_light | v_call_genotyped_heavy | v_call_genotyped_light | j_call_heavy | ... | junction_aa_light | status | productive | isotype | vdj_status_detail | vdj_status | changeo_clone_id | d_call_heavy | d_call_light | clone_id_heavy_only | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC | 102_3_1 | 563 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV1-69 | IGKV1-8 | IGHJ3 | ... | CQQYYSYPRTF | IGH + IGK | T + T | IgM | Single + Single | Single | 110_33 | IGHD3-22 | 102_3_1 | |
sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG | 141_4_1 | 658 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV1-2 | IGLV5-45 | IGHJ3 | ... | CMIWHSSAWVV | IGH + IGL | T + T | IgM | Single + Single | Single | 467_34 | IGHD3-16|IGHD4-17 | 141_4_1 | |
sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC | 26_2_2 | 670 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV5-51 | IGKV1D-8 | IGHJ3 | ... | CQQYYSFPYTF | IGH + IGK | T + T | IgM | Single + Single | Single | 306_35 | IGHD1/OR15-1a|IGHD1/OR15-1b|IGHD1-26 | 26_2_2 | |
sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT | 66_8_3 | 527 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV3-15 | IGLV6-57 | IGHJ4 | ... | CQSYDSSNVVF | IGH + IGL | T + T | IgM | Single + Multi_light_j | Single | 56_36 | IGHD1-26 | 66_8_3 | |
sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT | 18_4_1 | 244 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV3-33 | IGLV2-14 | IGHJ6 | ... | CSSYTSSSTRVF | IGH + IGL | T + T | IgM | Single + Single | Single | 125_37 | IGHD3-10 | 18_4_1 | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG | 15_8_1 | 653 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV4-59 | IGKV1-12 | IGHJ4 | ... | CQQANSFPLTF | IGH + IGK | T + T | IgM | Single + Single | Single | 348_483 | IGHD6-19 | 15_8_1 | |
vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT | 69_8_1 | 189 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV3-21 | IGKV3-20 | IGHJ6 | ... | CQQYGSSPLFTF | IGH + IGK | T + T | IgM | Single + Single | Single | 731_484 | IGHD3-3 | 69_8_1 | |
vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG | 90_7_2 | 713 | vdj_v1_hs_pbmc3 | IGH | IGL | T | T | IGHV3-48 | IGLV2-14 | IGHJ4 | ... | CSSYTSSSTRVF | IGH + IGL | T + T | IgM | Single + Single | Single | 229_485 | IGHD3-3 | 90_7_2 | |
vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT | 172_4_2 | 372 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV4-34 | IGKV1D-39|IGKV1-39 | IGHJ3 | ... | CQQSYSTPRTF | IGH + IGK | T + T | IgM | Single + Multi_light_v | Single | 702_486 | IGHD3-22 | 172_4_2 | |
vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC | 48_4_1_1|48_4_1_2 | 699|28 | vdj_v1_hs_pbmc3 | IGH | IGL|IGL | T | T|F | IGHV4-4 | IGLV1-51|IGLV1-40 | IGHJ5 | ... | CQSYDRSLGGHYVF|CGTWDSSLSAGCA | IGH + IGL|IGL | T + T|F | IgM | Single + Multi_light_j|Multi_light_v | Multi | 155_487 | IGHD4-17|IGHD4-23 | 48_4_1 |
838 rows × 28 columns
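To make the relationship between a per-contig table and a per-cell table more concrete, here is a minimal pandas sketch. It is not dandelion's implementation: the toy values, the `contigs` and `per_cell` names, and the heavy/light split logic are assumptions chosen only to mimic the `_heavy`/`_light` column naming seen in the output above.

```python
import pandas as pd

# Toy contig table: one row per contig, several contigs per cell.
# Column names follow the AIRR-style fields shown above; the values are made up.
contigs = pd.DataFrame(
    {
        "sequence_id": ["cellA_contig_1", "cellA_contig_2", "cellB_contig_1", "cellB_contig_2"],
        "cell_id": ["cellA", "cellA", "cellB", "cellB"],
        "locus": ["IGH", "IGK", "IGH", "IGL"],
        "v_call": ["IGHV1-69", "IGKV1-8", "IGHV1-2", "IGLV5-45"],
    }
)

# Split by chain type, then index each half by cell barcode.
is_heavy = contigs["locus"] == "IGH"
heavy = contigs[is_heavy].set_index("cell_id")[["locus", "v_call"]].add_suffix("_heavy")
light = contigs[~is_heavy].set_index("cell_id")[["locus", "v_call"]].add_suffix("_light")

# One row per cell, similar in spirit to the .metadata slot.
per_cell = heavy.join(light)
print(per_cell)
```

A table indexed by cell barcode in this way lines up row for row with a per-cell annotation table, which is what makes the collapsed form convenient to combine with other single-cell tools.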
copy
You can deep copy the Dandelion object to another variable, which will inherit all slots:
[4]:
vdj2 = vdj.copy()
vdj2.metadata
[4]:
clone_id | clone_id_by_size | sample_id | locus_heavy | locus_light | productive_heavy | productive_light | v_call_genotyped_heavy | v_call_genotyped_light | j_call_heavy | ... | junction_aa_light | status | productive | isotype | vdj_status_detail | vdj_status | changeo_clone_id | d_call_heavy | d_call_light | clone_id_heavy_only | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC | 102_3_1 | 563 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV1-69 | IGKV1-8 | IGHJ3 | ... | CQQYYSYPRTF | IGH + IGK | T + T | IgM | Single + Single | Single | 110_33 | IGHD3-22 | 102_3_1 | |
sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG | 141_4_1 | 658 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV1-2 | IGLV5-45 | IGHJ3 | ... | CMIWHSSAWVV | IGH + IGL | T + T | IgM | Single + Single | Single | 467_34 | IGHD3-16|IGHD4-17 | 141_4_1 | |
sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC | 26_2_2 | 670 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV5-51 | IGKV1D-8 | IGHJ3 | ... | CQQYYSFPYTF | IGH + IGK | T + T | IgM | Single + Single | Single | 306_35 | IGHD1/OR15-1a|IGHD1/OR15-1b|IGHD1-26 | 26_2_2 | |
sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT | 66_8_3 | 527 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV3-15 | IGLV6-57 | IGHJ4 | ... | CQSYDSSNVVF | IGH + IGL | T + T | IgM | Single + Multi_light_j | Single | 56_36 | IGHD1-26 | 66_8_3 | |
sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT | 18_4_1 | 244 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV3-33 | IGLV2-14 | IGHJ6 | ... | CSSYTSSSTRVF | IGH + IGL | T + T | IgM | Single + Single | Single | 125_37 | IGHD3-10 | 18_4_1 | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG | 15_8_1 | 653 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV4-59 | IGKV1-12 | IGHJ4 | ... | CQQANSFPLTF | IGH + IGK | T + T | IgM | Single + Single | Single | 348_483 | IGHD6-19 | 15_8_1 | |
vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT | 69_8_1 | 189 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV3-21 | IGKV3-20 | IGHJ6 | ... | CQQYGSSPLFTF | IGH + IGK | T + T | IgM | Single + Single | Single | 731_484 | IGHD3-3 | 69_8_1 | |
vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG | 90_7_2 | 713 | vdj_v1_hs_pbmc3 | IGH | IGL | T | T | IGHV3-48 | IGLV2-14 | IGHJ4 | ... | CSSYTSSSTRVF | IGH + IGL | T + T | IgM | Single + Single | Single | 229_485 | IGHD3-3 | 90_7_2 | |
vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT | 172_4_2 | 372 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV4-34 | IGKV1D-39|IGKV1-39 | IGHJ3 | ... | CQQSYSTPRTF | IGH + IGK | T + T | IgM | Single + Multi_light_v | Single | 702_486 | IGHD3-22 | 172_4_2 | |
vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC | 48_4_1_1|48_4_1_2 | 699|28 | vdj_v1_hs_pbmc3 | IGH | IGL|IGL | T | T|F | IGHV4-4 | IGLV1-51|IGLV1-40 | IGHJ5 | ... | CQSYDRSLGGHYVF|CGTWDSSLSAGCA | IGH + IGL|IGL | T + T|F | IgM | Single + Multi_light_j|Multi_light_v | Multi | 155_487 | IGHD4-17|IGHD4-23 | 48_4_1 |
838 rows × 28 columns
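As a reminder of what a deep copy implies, the sketch below uses Python's standard `copy.deepcopy` on a hypothetical `ToyContainer` class (not part of dandelion) that holds a single table. Changing the duplicate leaves the original untouched.

```python
import copy

import pandas as pd


class ToyContainer:
    """Hypothetical stand-in for an object that stores a table in a slot."""

    def __init__(self, metadata):
        self.metadata = metadata


original = ToyContainer(
    pd.DataFrame({"isotype": ["IgM", "IgM"]}, index=["cellA", "cellB"])
)
duplicate = copy.deepcopy(original)

# Edit only the duplicate.
duplicate.metadata.loc["cellA", "isotype"] = "IgG"

print(original.metadata.loc["cellA", "isotype"])   # IgM
print(duplicate.metadata.loc["cellA", "isotype"])  # IgG
```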
Retrieving entries with update_metadata
The .metadata slot of the Dandelion class is initialized automatically whenever the .data slot is filled. However, it only returns a standard, pre-specified set of columns. To retrieve other columns from the .data slot, update the metadata with ddl.update_metadata and specify the retrieve option.
The following options determine how the retrieval is completed:
split - splits the retrieval into heavy and light chain calls.
split_locus - similar to split, but splits the retrieval into IGH/IGK/IGL.
collapse - adds a | to separate every element.
combine - similar to collapse, but only retains unique elements (separated by a | if multiple are found).
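The difference between collapse and combine can be illustrated with plain Python. The two helper functions below are hypothetical and only mirror the descriptions above; in particular, whether dandelion preserves the order of elements is an assumption here.

```python
# Toy list of light chain V calls for one cell; the first call appears twice.
calls = ["IGLV1-51", "IGLV1-40", "IGLV1-51"]


def collapse(values):
    """Join every element with '|', duplicates included."""
    return "|".join(values)


def combine(values):
    """Join only the unique elements with '|', keeping first-seen order."""
    return "|".join(dict.fromkeys(values))


print(collapse(calls))  # IGLV1-51|IGLV1-40|IGLV1-51
print(combine(calls))   # IGLV1-51|IGLV1-40
```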
Example 1: retrieving D gene calls (d_call)
[5]:
ddl.update_metadata(vdj, retrieve = 'd_call')
vdj
[5]:
Dandelion class object with n_obs = 838 and n_contigs = 1700
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id', 'clone_id_heavy_only'
metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status', 'changeo_clone_id', 'd_call_heavy', 'd_call_light', 'clone_id_heavy_only'
distance: 'heavy', 'light_0', 'light_1', 'light_2'
edges: 'source', 'target', 'weight'
layout: layout for 838 vertices, layout for 24 vertices
graph: networkx graph of 838 vertices, networkx graph of 24 vertices
Note the additional d_call heavy and light columns in the metadata slot.
By default, dandelion will not try to merge numerical columns, as doing so can create mixed-dtype columns.
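The mixed-dtype concern is a general pandas behaviour rather than anything specific to dandelion. A small sketch with made-up UMI counts shows how joining several numbers into one `|`-separated entry turns a numeric column into a non-numeric one:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

cells = ["cellA", "cellB", "cellC"]

# One count per cell: the column stays numeric.
umi_counts = pd.Series([12, 7, 30], index=cells)

# If cellB's two contigs were merged as '7|3', the column mixes ints and strings.
merged = pd.Series([12, "7|3", 30], index=cells)

print(is_numeric_dtype(umi_counts))  # True
print(is_numeric_dtype(merged))      # False
```

Once a column is no longer numeric, operations such as sums or means stop working on it directly, which is a reasonable motivation for keeping numerical columns separate.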
Example 2: editing the clone_id column
Perhaps you want more control over how clones are called. You can edit this directly in the .data slot and retrieve accordingly.
[6]:
# if we only want to keep the heavy chain clone assignment
clones = []
for clone in vdj.data['clone_id']:
    if '|' in clone:  # clones were merged into the same column if they have different pairings of BCR combinations
        clone_list = clone.split('|')
        clones.append('|'.join(set(clone_2.rsplit('_', 1)[0] if clone_2.count('_') == 3 else clone_2 for clone_2 in clone_list)))
    else:
        if clone.count('_') == 3:  # this means it's looking for X_X_X_X, i.e. 3 underscores
            clones.append(clone.rsplit('_', 1)[0])  # split at the 3rd underscore and keep only the first part
        else:
            clones.append(clone)
vdj.data['clone_id_heavy_only'] = clones
ddl.update_metadata(vdj, retrieve = 'clone_id_heavy_only', split = False, collapse = True)
vdj.metadata[['clone_id', 'clone_id_heavy_only']]
[6]:
clone_id | clone_id_heavy_only | |
---|---|---|
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC | 102_3_1 | 102_3_1 |
sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG | 141_4_1 | 141_4_1 |
sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC | 26_2_2 | 26_2_2 |
sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT | 66_8_3 | 66_8_3 |
sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT | 18_4_1 | 18_4_1 |
... | ... | ... |
vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG | 15_8_1 | 15_8_1 |
vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT | 69_8_1 | 69_8_1 |
vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG | 90_7_2 | 90_7_2 |
vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT | 172_4_2 | 172_4_2 |
vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC | 48_4_1_1|48_4_1_2 | 48_4_1 |
838 rows × 2 columns
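The string handling in the loop above can also be written as a small standalone function, which makes it easy to check on a few clone IDs without a Dandelion object. The function name `drop_last_suffix` is hypothetical; unlike the `set`-based version above, this sketch keeps the first-seen order of the remaining IDs.

```python
def drop_last_suffix(clone_id):
    """Trim the 4th field from X_X_X_X style clone IDs, keeping unique results.

    Multiple IDs separated by '|' are handled one by one.
    """
    trimmed = []
    for single in clone_id.split("|"):
        if single.count("_") == 3:
            # Drop everything after the last underscore.
            single = single.rsplit("_", 1)[0]
        if single not in trimmed:
            trimmed.append(single)
    return "|".join(trimmed)


print(drop_last_suffix("102_3_1"))            # 102_3_1
print(drop_last_suffix("48_4_1_1|48_4_1_2"))  # 48_4_1
```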
concatenating multiple objects
This is a simple function to concatenate (append) two or more Dandelion objects or pandas DataFrames.
[7]:
# for example, the original dandelion class has 838 unique cell barcodes and 1700 contigs
vdj
[7]:
Dandelion class object with n_obs = 838 and n_contigs = 1700
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id', 'clone_id_heavy_only'
metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status', 'changeo_clone_id', 'd_call_heavy', 'd_call_light', 'clone_id_heavy_only'
distance: 'heavy', 'light_0', 'light_1', 'light_2'
edges: 'source', 'target', 'weight'
layout: layout for 838 vertices, layout for 24 vertices
graph: networkx graph of 838 vertices, networkx graph of 24 vertices
[8]:
# now it has 5100 contigs instead, and the metadata should also be properly populated
vdj_concat = ddl.concat([vdj, vdj, vdj])
vdj_concat
[8]:
Dandelion class object with n_obs = 838 and n_contigs = 5100
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id', 'clone_id_heavy_only'
metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_heavy_2', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'umi_count_light_3', 'umi_count_light_4', 'umi_count_light_5', 'umi_count_light_6', 'umi_count_light_7', 'umi_count_light_8', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status'
distance: None
edges: None
layout: None
graph: None
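The counts above (the same 838 cells but three times as many contigs) are what you would expect from stacking the same contig table three times. The toy pandas sketch below shows the same effect; it does not claim to reproduce what ddl.concat does internally, and the table contents are made up.

```python
import pandas as pd

# Toy contig table with 4 contigs from 2 cells.
contigs = pd.DataFrame(
    {
        "sequence_id": ["cellA_contig_1", "cellA_contig_2", "cellB_contig_1", "cellB_contig_2"],
        "cell_id": ["cellA", "cellA", "cellB", "cellB"],
    }
)

# Stack three copies on top of each other.
stacked = pd.concat([contigs, contigs, contigs], ignore_index=True)

print(len(contigs), len(stacked))      # 4 12
print(stacked["cell_id"].nunique())    # 2
```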
read/write
A Dandelion object can be saved using the .write_h5 and .write_pkl functions, with accompanying compression methods. write_h5 primarily uses pandas' to_hdf method, and write_pkl just uses pickle. The read_h5 and read_pkl functions read the respective file formats.
[9]:
%time vdj.write_h5('dandelion_results.h5', complib = 'bzip2')
CPU times: user 1.53 s, sys: 65.7 ms, total: 1.59 s
Wall time: 1.64 s
[10]:
%time vdj_1 = ddl.read_h5('dandelion_results.h5')
vdj_1
CPU times: user 564 ms, sys: 54.6 ms, total: 619 ms
Wall time: 631 ms
[10]:
Dandelion class object with n_obs = 838 and n_contigs = 1700
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id', 'clone_id_heavy_only'
metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status', 'changeo_clone_id', 'd_call_heavy', 'd_call_light', 'clone_id_heavy_only'
distance: 'heavy', 'light_0', 'light_1', 'light_2'
edges: 'source', 'target', 'weight'
layout: layout for 838 vertices, layout for 24 vertices
graph: networkx graph of 838 vertices, networkx graph of 24 vertices
Read/write times using pickle can be faster or slower depending on the situation, and file sizes can likewise be smaller or larger, depending on which compression is used.
[11]:
%time vdj.write_pkl('dandelion_results.pkl.gz')
CPU times: user 9.14 s, sys: 68 ms, total: 9.21 s
Wall time: 9.41 s
[12]:
%time vdj_2 = ddl.read_pkl('dandelion_results.pkl.gz')
vdj_2
CPU times: user 89.9 ms, sys: 9.16 ms, total: 99.1 ms
Wall time: 106 ms
[12]:
Dandelion class object with n_obs = 838 and n_contigs = 1700
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id', 'clone_id_heavy_only'
metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status', 'changeo_clone_id', 'd_call_heavy', 'd_call_light', 'clone_id_heavy_only'
distance: 'heavy', 'light_0', 'light_1', 'light_2'
edges: 'source', 'target', 'weight'
layout: layout for 838 vertices, layout for 24 vertices
graph: networkx graph of 838 vertices, networkx graph of 24 vertices
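For readers unfamiliar with pickle-based storage, the sketch below shows a gzip-compressed pickle round trip using only the standard library and pandas. That the .pkl.gz extension corresponds to gzip compression is an assumption here, and this is not necessarily how write_pkl and read_pkl are implemented; the file name and table are made up.

```python
import gzip
import os
import pickle
import tempfile

import pandas as pd

table = pd.DataFrame({"cell_id": ["cellA", "cellB"], "isotype": ["IgM", "IgM"]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy_results.pkl.gz")

    # Write: pickle the object into a gzip-compressed file.
    with gzip.open(path, "wb") as handle:
        pickle.dump(table, handle)

    # Read: unpickle from the same compressed file.
    with gzip.open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored.equals(table))  # True
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.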