BCR clustering

There are many ways to cluster BCRs into clones/clonotypes, almost all of which rely on some measure of sequence similarity. The BCR community also maintains a number of well-established guidelines and criteria. For example, immcantation uses several model-based methods to group clones based on the distribution of length-normalised junctional hamming distance, while others use the whole BCR V(D)J sequence to define clones, as shown in this recent paper.
Import modules
[1]:
import os
import pandas as pd
import dandelion as ddl
ddl.logging.print_header()
dandelion==0.1.0 pandas==1.1.4 numpy==1.19.4 matplotlib==3.3.3 networkx==2.5 scipy==1.5.3 skbio==0.5.6
[2]:
# change directory to somewhere more workable
os.chdir(os.path.expanduser('/Users/kt16/Downloads/dandelion_tutorial/'))
# I'm importing scanpy here to make use of its logging module.
import scanpy as sc
sc.settings.verbosity = 3
import warnings
warnings.filterwarnings('ignore')
sc.logging.print_header()
scanpy==1.6.0 anndata==0.7.4 umap==0.4.6 numpy==1.19.4 scipy==1.5.3 pandas==1.1.4 scikit-learn==0.23.2 statsmodels==0.12.1 python-igraph==0.8.3 leidenalg==0.8.3
Read in the previously saved files
I will work with the same example from the previous notebook since I have the filtered V(D)J data stored in a Dandelion class.
[3]:
vdj = ddl.read_h5('dandelion_results.h5')
vdj
[3]:
Dandelion class object with n_obs = 838 and n_contigs = 1700
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count'
metadata: 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status'
distance: None
edges: None
layout: None
graph: None
Finding clones
The following is dandelion’s implementation of a rather conventional method to define clones, tl.find_clones.
Clone definition is based on the following criteria:
(1) Identical IGH V-J gene usage.
(2) Identical CDR3 junctional sequence length.
(3) CDR3 junctional sequences attain a minimum % sequence similarity, based on hamming distance. The similarity cut-off is tunable (default is 85%).
(4) Light chain usage. If cells within a clone use different light chains, the clone will be split following the same conditions as for heavy chains in (1-3) above.
The ‘clone_id’ name follows a {A}_{B}_{C}_{D} format and largely reflects the conditions above, where:
{A} indicates if the contigs use the same IGH V/J genes.
{B} indicates if IGH junctional sequences are equal in length.
{C} indicates if clones are split based on the junctional hamming distance threshold.
{D} indicates light chain pairing.
The last position will not be annotated if there is only one group of light chain usage detected in the clone.
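Criteria (1) to (3) can be illustrated with a small pure-Python sketch. This is not dandelion's implementation: the helper names (`hamming_similarity`, `group_heavy_chains`), the greedy grouping rule and the toy sequences are all assumptions made for illustration, and the light chain refinement in (4) is left out.

```python
from collections import defaultdict


def hamming_similarity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_heavy_chains(records, min_similarity=0.85):
    """Group toy heavy chain records following criteria (1)-(3)."""
    # (1) + (2): bucket by V gene, J gene and junction length
    buckets = defaultdict(list)
    for rec in records:
        key = (rec["v_call"], rec["j_call"], len(rec["junction_aa"]))
        buckets[key].append(rec)

    clones = []
    for members in buckets.values():
        # (3): greedily add each record to the first group that already
        # contains a sufficiently similar junction (illustrative rule only)
        groups = []
        for rec in members:
            for group in groups:
                if any(
                    hamming_similarity(rec["junction_aa"], other["junction_aa"])
                    >= min_similarity
                    for other in group
                ):
                    group.append(rec)
                    break
            else:
                groups.append([rec])
        clones.extend(groups)
    return clones


# Hypothetical records; real data would come from the .data slot
records = [
    {"cell_id": "cell_1", "v_call": "IGHV1-69", "j_call": "IGHJ3", "junction_aa": "CARDYYGSGSYW"},
    {"cell_id": "cell_2", "v_call": "IGHV1-69", "j_call": "IGHJ3", "junction_aa": "CARDYYGSGSFW"},
    {"cell_id": "cell_3", "v_call": "IGHV1-69", "j_call": "IGHJ3", "junction_aa": "CARGHHGTGSYW"},
    {"cell_id": "cell_4", "v_call": "IGHV3-33", "j_call": "IGHJ6", "junction_aa": "CARDYYGSGSYW"},
]

clone_members = [[rec["cell_id"] for rec in clone] for clone in group_heavy_chains(records)]
print(clone_members)
```

In this toy example, cell_1 and cell_2 differ at one of twelve junction positions (about 92% similarity) and fall into the same group, cell_3 shares the V-J genes and junction length but is too dissimilar, and cell_4 uses different V-J genes.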
Running tl.find_clones
The function will take a file path, a pandas DataFrame (for example if you’ve used pandas to read in the filtered file already), or a Dandelion class object. By default, the junctional hamming distance is calculated on the CDR3 junction amino acid sequences, specified via the key option (None defaults to junction_aa). You can switch to the CDR3 junction nucleotide sequences (key = 'junction'), or even the full V(D)J amino acid sequence (key = 'sequence_alignment_aa'), as long as the column name exists in the .data slot.
If you want to use alleles for defining V-J gene usage, specify:
by_alleles = True
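To make the gene versus allele distinction concrete, here is a minimal sketch. It assumes calls written in the usual `GENE*ALLELE` notation (for example `IGHV1-69*01`); the helpers `strip_allele` and `vj_key` are hypothetical and only illustrate the idea, not how dandelion handles the calls internally.

```python
import re


def strip_allele(call):
    """Drop the allele suffix (everything from '*' up to the next ',' or '|')."""
    return re.sub(r"\*[^,|]*", "", call)


def vj_key(v_call, j_call, by_alleles=False):
    """Return the V-J key used to compare two contigs in this sketch."""
    if by_alleles:
        return (v_call, j_call)
    return (strip_allele(v_call), strip_allele(j_call))


# Two contigs using different alleles of the same V gene
contig_a = ("IGHV1-69*01", "IGHJ3*02")
contig_b = ("IGHV1-69*06", "IGHJ3*02")

same_gene = vj_key(*contig_a) == vj_key(*contig_b)
same_allele = vj_key(*contig_a, by_alleles=True) == vj_key(*contig_b, by_alleles=True)
print(same_gene, same_allele)
```

At the gene level the two contigs share V-J usage, whereas at the allele level they do not, so allele-level matching is the stricter of the two.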
[4]:
ddl.tl.find_clones(vdj)
vdj
Finding clonotypes
Finding clones based on heavy chains : 100%|██████████| 176/176 [00:00<00:00, 2821.65it/s]
Refining clone assignment based on light chain pairing : 100%|██████████| 819/819 [00:00<00:00, 1216.02it/s]
finished: Updated Dandelion object:
'data', contig-indexed clone table
'metadata', cell-indexed clone table
(0:00:01)
[4]:
Dandelion class object with n_obs = 838 and n_contigs = 1700
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id'
metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status'
distance: None
edges: None
layout: None
graph: None
This will return a new column with the column name 'clone_id' as per convention. If a file path is provided as input, it will also save the file automatically into the base directory of the file name. Otherwise, a Dandelion object will be returned.
[5]:
vdj.metadata
[5]:
| | clone_id | clone_id_by_size | sample_id | locus_heavy | locus_light | productive_heavy | productive_light | v_call_genotyped_heavy | v_call_genotyped_light | j_call_heavy | ... | umi_count_light_0 | umi_count_light_1 | umi_count_light_2 | junction_aa_heavy | junction_aa_light | status | productive | isotype | vdj_status_detail | vdj_status |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC | 140_3_1 | 362 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV1-69 | IGKV1-8 | IGHJ3 | ... | 43.0 | NaN | NaN | CATTYYYDSSGYYQNDAFDIW | CQQYYSYPRTF | IGH + IGK | T + T | IgM | Single + Single | Single |
sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG | 88_4_1 | 278 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV1-2 | IGLV5-45 | IGHJ3 | ... | 90.0 | NaN | NaN | CAREIEGDGVFEIW | CMIWHSSAWVV | IGH + IGL | T + T | IgM | Single + Single | Single |
sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC | 98_2_2 | 726 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV5-51 | IGKV1D-8 | IGHJ3 | ... | 22.0 | NaN | NaN | CARHIRGNRFGNDAFDIW | CQQYYSFPYTF | IGH + IGK | T + T | IgM | Single + Single | Single |
sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT | 64_8_1 | 645 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV3-15 | IGLV6-57 | IGHJ4 | ... | 40.0 | NaN | NaN | CTTDDEKRPYSGSYLPFDYW | CQSYDSSNVVF | IGH + IGL | T + T | IgM | Single + Multi_light_j | Single |
sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT | 9_4_1 | 677 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV3-33 | IGLV2-14 | IGHJ6 | ... | 36.0 | NaN | NaN | CARDWVRGVNDMDVW | CSSYTSSSTRVF | IGH + IGL | T + T | IgM | Single + Single | Single |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG | 63_8_2 | 181 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV4-59 | IGKV1-12 | IGHJ4 | ... | 33.0 | NaN | NaN | CARVNVGGIAVAGYFDYW | CQQANSFPLTF | IGH + IGK | T + T | IgM | Single + Single | Single |
vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT | 128_8_1 | 723 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV3-21 | IGKV3-20 | IGHJ6 | ... | 18.0 | NaN | NaN | CARVRQEYYDFWSGYPAEVYYYMDVW | CQQYGSSPLFTF | IGH + IGK | T + T | IgM | Single + Single | Single |
vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG | 146_7_2 | 799 | vdj_v1_hs_pbmc3 | IGH | IGL | T | T | IGHV3-48 | IGLV2-14 | IGHJ4 | ... | 80.0 | NaN | NaN | CAREKYDFWSGDSYYFDYW | CSSYTSSSTRVF | IGH + IGL | T + T | IgM | Single + Single | Single |
vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT | 145_4_2 | 276 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV4-34 | IGKV1-39|IGKV1D-39 | IGHJ3 | ... | 5.0 | NaN | NaN | CARRRLTYYYDSSGPLSAFDIW | CQQSYSTPRTF | IGH + IGK | T + T | IgM | Single + Multi_light_v | Single |
vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC | 124_4_1_1|124_4_1_2 | 635|368 | vdj_v1_hs_pbmc3 | IGH | IGL|IGL | T | T|F | IGHV4-4 | IGLV1-40|IGLV1-51 | IGHJ5 | ... | 58.0 | 15.0 | NaN | CARGGVSTAFWFDPW | CQSYDRSLGGHYVF|CGTWDSSLSAGCA | IGH + IGL|IGL | T + T|F | IgM | Single + Multi_light_v|Multi_light_j | Multi |
838 rows × 24 columns
Alternative: Running tl.define_clones
Alternatively, a wrapper to call changeo’s DefineClones.py is also included. To run it, you need to choose the distance threshold for clonal assignment. To facilitate this, the function pp.calculate_threshold will run shazam’s distToNearest function and return a plot showing the length-normalised hamming distance distribution and the automated threshold value.
Again, pp.calculate_threshold will take a file path, pandas DataFrame or Dandelion object as input. If a Dandelion object is provided, the threshold value will be inserted into the .threshold slot. For finer control, please use DefineClones.py directly.
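As a rough illustration of the quantity being plotted, the sketch below computes a length-normalised hamming distance from each junction to its nearest neighbour of the same length. It is a conceptual sketch only and not shazam's implementation: the function names and toy nucleotide junctions are assumptions, identical sequences are skipped by assumption, and grouping by V-J gene usage is omitted for brevity.

```python
def normalised_hamming(a, b):
    """Number of mismatches divided by sequence length."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dist_to_nearest(junctions):
    """Smallest non-zero distance to another junction of the same length.

    Returns None for junctions that have no comparable neighbour.
    """
    nearest = []
    for i, query in enumerate(junctions):
        candidates = [
            normalised_hamming(query, other)
            for j, other in enumerate(junctions)
            if i != j and len(other) == len(query) and other != query
        ]
        nearest.append(min(candidates) if candidates else None)
    return nearest


# Hypothetical junction nucleotide sequences
junctions = ["TGTGCGAGAGAT", "TGTGCGAGAGAC", "TGTGCAAAAGGT", "TGTGCG"]
nearest = dist_to_nearest(junctions)
print(nearest)
```

A threshold is then chosen from the distribution of these values, ideally in the gap between a mode of small distances (sequences that have close relatives) and a mode of larger ones (sequences that do not).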
[6]:
ddl.pp.calculate_threshold(vdj)
Calculating threshold
Threshold method 'density' did not return with any values. Switching to method = 'gmm'.

<ggplot: (360081101)>
finished: Updated Dandelion object:
'threshold', threshold value for tuning clonal assignment
(0:00:43)
[7]:
# see the actual value in .threshold slot
vdj.threshold
[7]:
0.21354295894548617
You can also manually select a value as the threshold if you wish.
[8]:
ddl.pp.calculate_threshold(vdj, manual_threshold = 0.1)
Calculating threshold
Threshold method 'density' did not return with any values. Switching to method = 'gmm'.

<ggplot: (360720905)>
finished: Updated Dandelion object:
'threshold', threshold value for tuning clonal assignment
(0:00:26)
[9]:
# see the updated .threshold slot
vdj.threshold
[9]:
0.1
We can run tl.define_clones to call changeo’s DefineClones.py; see here for more info. Note that if a pandas.DataFrame or file path is provided as the input, the value for the dist option (which corresponds to the threshold value) needs to be supplied manually. If a Dandelion object is provided, the value will be retrieved automatically from the .threshold slot.
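To see how a distance threshold translates into clone assignments, here is a minimal sketch of threshold-based single-linkage grouping, one common way of applying such a cut-off. Whether it matches the exact behaviour of DefineClones.py under your settings is an assumption; the function name, the toy sequences and the use of a union-find structure are illustrative choices only.

```python
def single_linkage_clusters(seqs, threshold):
    """Label sequences so that any pair within `threshold` shares a cluster.

    Distances are length-normalised hamming distances; sequences of
    different length are never linked.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if len(seqs[i]) != len(seqs[j]):
                continue
            dist = sum(x != y for x, y in zip(seqs[i], seqs[j])) / len(seqs[i])
            if dist <= threshold:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive integers in order of first appearance
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(seqs))]


# Hypothetical sequences of length 10
seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "TTTTTTTTTT"]
labels_loose = single_linkage_clusters(seqs, threshold=0.1)
labels_strict = single_linkage_clusters(seqs, threshold=0.05)
print(labels_loose, labels_strict)
```

With a threshold of 0.1 the first three sequences are chained into one cluster even though the first and third are 0.2 apart, which is the characteristic behaviour of single linkage; with a stricter threshold every sequence ends up on its own.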
[10]:
ddl.tl.define_clones(vdj, key_added = 'changeo_clone_id')
vdj
Finding clones
finished: Updated Dandelion object:
'data', contig-indexed clone table
'metadata', cell-indexed clone table
(0:00:09)
[10]:
Dandelion class object with n_obs = 838 and n_contigs = 1700
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id'
metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status', 'changeo_clone_id'
distance: None
edges: None
layout: None
graph: None
Note that I specified the option key_added, which adds the output from tl.define_clones into a separate column. If left as default (None), it will write into the clone_id column. The same option can be specified in tl.find_clones earlier.
Generation of BCR network
dandelion generates a network to facilitate visualisation of results. This uses the full V(D)J contig sequences instead of just the junctional sequences to chart a tree-like network for each clone. The actual visualisation will be achieved through scanpy later.
tl.generate_network
First we need to generate the network. tl.generate_network will take a V(D)J table that has clones defined, specifically under the 'clone_id' column. The default mode is to use amino acid sequences for constructing Levenshtein distance matrices, but this can be toggled using the key option.
If you have a pre-processed table parsed from immcantation’s method, or any other method, it can be used as well, as long as it’s in an AIRR format.
You can specify the clone_key option to generate the network for the clone id definition of choice, as long as it exists as a column in the .data slot.
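For readers unfamiliar with the metric, the sketch below builds a small pairwise Levenshtein distance matrix in pure Python. Unlike hamming distance, Levenshtein distance also counts insertions and deletions, so it can compare sequences of different lengths. The function names and toy amino acid sequences are assumptions for illustration and do not reflect how tl.generate_network computes its matrices.

```python
def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    """Symmetric matrix of pairwise Levenshtein distances."""
    return [[levenshtein(a, b) for b in seqs] for a in seqs]


# Hypothetical amino acid sequences of unequal length
seqs = ["CARDYW", "CARDYYW", "CAKDYW"]
matrix = distance_matrix(seqs)
for row in matrix:
    print(row)
```

A dense matrix like this grows with the square of the number of sequences, which is consistent with the longer run times and larger files noted for bigger datasets.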
[11]:
ddl.tl.generate_network(vdj)
Generating network
Calculating distances... : 100%|██████████| 4/4 [00:03<00:00, 1.15it/s]
Generating edge list : 100%|██████████| 7/7 [00:00<00:00, 848.90it/s]
Linking edges : 100%|██████████| 821/821 [00:00<00:00, 5324.08it/s]
generating network layout
finished: Updated Dandelion object:
'data', contig-indexed clone table
'metadata', cell-indexed clone table
'distance', heavy and light chain distance matrices
'edges', network edges
'layout', network layout
'graph', network (0:00:10)
This step works reasonably fast here but will take quite a while when a lot of contigs are provided.
You can also downsample the number of cells. This will return a new object as a downsampled copy of the original with its own distance matrix.
[12]:
vdj_downsample = ddl.tl.generate_network(vdj, downsample = 500)
vdj_downsample
Generating network
Downsampling to 500 cells.
Calculating distances... : 100%|██████████| 4/4 [00:01<00:00, 3.40it/s]
Generating edge list : 100%|██████████| 2/2 [00:00<00:00, 757.03it/s]
Linking edges : 100%|██████████| 492/492 [00:00<00:00, 6924.50it/s]
generating network layout
finished: Updated Dandelion object:
'data', contig-indexed clone table
'metadata', cell-indexed clone table
'distance', heavy and light chain distance matrices
'edges', network edges
'layout', network layout
'graph', network (0:00:08)
[12]:
Dandelion class object with n_obs = 500 and n_contigs = 1016
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id'
metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status'
distance: 'heavy_0', 'light_0', 'light_1', 'light_2'
edges: 'source', 'target', 'weight'
layout: layout for 500 vertices, layout for 10 vertices
graph: networkx graph of 500 vertices, networkx graph of 10 vertices
Check the newly re-initialized Dandelion object:
[13]:
vdj
[13]:
Dandelion class object with n_obs = 838 and n_contigs = 1700
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id'
metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status', 'changeo_clone_id'
distance: 'heavy_0', 'light_0', 'light_1', 'light_2'
edges: 'source', 'target', 'weight'
layout: layout for 838 vertices, layout for 24 vertices
graph: networkx graph of 838 vertices, networkx graph of 24 vertices
The graph/networks can be accessed through the .graph slot as a networkx graph object if you want to extract the data for network statistics or make any changes to the network.
At this point, we can save the dandelion object; the file can be quite big because the distance matrix is not sparse. I recommend some form of compression (I use bzip2 below, but that can significantly impact read/write times). See here for compression options.
[14]:
vdj.write_h5('dandelion_results.h5', complib = 'bzip2')