Running from R


Foreword

dandelion is written in python==3.7.6 and is primarily a single-cell BCR-seq analysis package. It implements a workflow for the pre-processing and exploratory stages, integrating tools from the fantastic immcantation suite for the BCR side of things with analysis tools from scanpy for the RNA-seq side of things. I hope to introduce some new single-cell BCR-seq exploratory tools through dandelion down the road.

dandelion can be run in R through reticulate. This notebook will try to replicate the examples in notebooks 1-3 entirely in R.

There are some issues with the conversion of dataframes between python and R, so I would not recommend saving the final AnnData object as the final output file; only use this notebook to help generate the intermediate files from the BCR processing and the plots. I would also skip the quantify mutations step due to conflicts between rpy2 and reticulate.

For more details, please refer to the original notebooks 1-3.

Let’s start!

First, install reticulate if you don’t already have it:

install.packages('reticulate')

Because we are managing the packages through a conda virtual environment, we will need to point reticulate to the right python paths.

[1]:
library(reticulate)
use_condaenv('dandelion')
# or use Sys.setenv(RETICULATE_PYTHON = conda_python(envname='dandelion'))

You can check if the python config is set up properly with py_config()

[2]:
py_config()
python:         /Users/kt16/miniconda3/envs/dandelion/bin/python
libpython:      /Users/kt16/miniconda3/envs/dandelion/lib/libpython3.7m.dylib
pythonhome:     /Users/kt16/miniconda3/envs/dandelion:/Users/kt16/miniconda3/envs/dandelion
version:        3.7.8 | packaged by conda-forge | (default, Nov 17 2020, 23:22:07)  [Clang 10.0.1 ]
numpy:          /Users/kt16/miniconda3/envs/dandelion/lib/python3.7/site-packages/numpy
numpy_version:  1.19.4

NOTE: Python version was forced by RETICULATE_PYTHON

To proceed with the analyses, we first change the working directory and also import the dandelion module.

[3]:
# change directory to somewhere more workable
setwd('/Users/kt16/Downloads/dandelion_tutorial_R/')
# ddl = import_from_path('dandelion', path = '/Users/kt16/Documents/Github/dandelion')
ddl = import('dandelion')

As per reticulate convention, python . operators are to be swapped with $ in R.
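
As a small illustration of this convention, the helper below rewrites a Python-style dotted path into the form you would type in R. `py_path_to_r` is a hypothetical name, not part of reticulate or dandelion; it only manipulates strings and is meant as a mnemonic rather than something the workflow needs.

```r
# Hypothetical helper: translate a Python attribute path into reticulate's `$` syntax.
py_path_to_r <- function(path) {
    # fixed = TRUE treats "." as a literal dot rather than a regex wildcard
    gsub(".", "$", path, fixed = TRUE)
}

py_path_to_r("ddl.pp.format_fastas")
```

So a call written as ddl.pp.format_fastas(...) in the python notebooks becomes ddl$pp$format_fastas(...) here.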

Pre-processing

To begin the BCR pre-processing, the minimum files you require are the following:

filtered_contig.fasta
filtered_contig_annotations.csv
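
Before starting, it can help to confirm that both files are present for every sample. The sketch below uses only base R and assumes a layout of one folder per sample, each containing the two files directly; that layout and the helper name `find_missing_inputs` are assumptions made for illustration, so adjust the paths to match your own data.

```r
# Assumed layout (not stated in the tutorial): <base_dir>/<sample>/<file>
required_files <- c("filtered_contig.fasta", "filtered_contig_annotations.csv")

find_missing_inputs <- function(samples, base_dir = ".") {
    # Build every sample/file combination and keep the paths that do not exist
    paths <- file.path(base_dir, rep(samples, each = length(required_files)), required_files)
    paths[!file.exists(paths)]
}

# Demo in a temporary directory with one complete and one incomplete sample
demo_dir <- tempfile("ddl_demo_")
dir.create(file.path(demo_dir, "sampleA"), recursive = TRUE)
dir.create(file.path(demo_dir, "sampleB"), recursive = TRUE)
file.create(file.path(demo_dir, "sampleA", required_files))
file.create(file.path(demo_dir, "sampleB", "filtered_contig.fasta"))

missing <- find_missing_inputs(c("sampleA", "sampleB"), base_dir = demo_dir)
basename(missing)
```

An empty result means every sample has both inputs; here only the annotation file of the second demo sample is reported.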

Step 1: Formatting the headers of the cellranger fasta file

This will do the following:

  1. add the prefix provided to every sequence header

  2. add the prefix provided to every contig_id in the annotation.csv file

  3. create a folder called dandelion/data (if left as default) and save a copy of these files in that directory.

[4]:
# the first option is a list of fasta files to format and the second option is the list of prefixes to add to each file.
samples = c('sc5p_v2_hs_PBMC_1k', 'sc5p_v2_hs_PBMC_10k', 'vdj_v1_hs_pbmc3', 'vdj_nextgem_hs_pbmc3')
ddl$pp$format_fastas(samples, prefix = samples)
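
To make the effect of the prefix concrete, here is a minimal base R sketch of the renaming applied to a single FASTA header. It is not dandelion’s implementation: `add_prefix` is a hypothetical helper, and the underscore separator is inferred from the prefixed contig names that appear in the output tables later in this notebook.

```r
add_prefix <- function(headers, prefix) {
    # Insert "<prefix>_" right after the leading ">" of each FASTA header
    sub("^>", paste0(">", prefix, "_"), headers)
}

add_prefix(">AACTCCCAGGCTAGGT-1_contig_1", "sc5p_v2_hs_PBMC_1k")
```

Prefixing keeps contig names unique once several samples are combined, since the same cell barcode can occur in more than one sample.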

Step 2: Reannotate the V/D/J genes with igblastn.

ddl$pp$reannotate_genes uses changeo’s scripts to call igblastn to reannotate the fasta files. Depending on the fileformat option, it will parse the output as either an airr (default) or changeo-legacy TSV file. Importantly, with the recent update to changeo v1.0.0, all the column headers, including those in the changeo format, now adhere to the AIRR standard (lowercase and some column name changes). Specifying extended = TRUE will return the additional 10x annotation of V/D/J genes, but this is unnecessary at this stage.

[5]:
ddl$pp$reannotate_genes(samples)

You should see something like this in the terminal:

Assigning genes :   0%|                                        | 0/4 [00:00<?, ?it/s]
START> AssignGenes
 COMMAND> igblast
 VERSION> 1.15.0
    FILE> sc5p_v2_hs_PBMC_10k_b_filtered_contig.fasta
ORGANISM> human
    LOCI> ig
   NPROC> 4

PROGRESS> 12:30:37 |Running IgBLAST          | 0.0 min

Step 3: Reassigning heavy chain V gene alleles (optional but recommended)

Next, we use immcantation’s TIgGER method to reassign allelic calls for heavy chain V genes with pp.reassign_alleles. As stated in TIgGER’s website and manuscript, ‘TIgGER is a computational method that significantly improves V(D)J allele assignments by first determining the complete set of gene segments carried by an individual (including novel alleles) from V(D)J-rearranged sequences. TIgGER can then infer a subject’s genotype from these sequences, and use this genotype to correct the initial V(D)J allele assignments.’

This affects how contigs are chosen for finding clones later, so it is highly recommended to run it. It is also important if you are considering doing mutational analysis.

However, the main caveat is that this needs to be run on multiple samples from the same subject so that more information is available to confidently assign a genotyped v_call. In this tutorial, I’m assuming the four samples can be split into two sets of two, with each set corresponding to a single, different individual. So while important, this step can be skipped if you don’t have the samples to do this.

pp.reassign_alleles requires the combined_folder option to be specified so that a merged/concatenated file can be produced for running TIgGER. The function also runs pp.create_germlines using the germline database updated with the germline corrections from TIgGER. The default behavior is that it will return a germline_alignment_d_mask column in the final output. This can be changed by specifying germ_types option; see here for other options.

Specifying fileformat = 'changeo' will run on changeo formatted files if this was run earlier; but it’s probably best to stick to airr’s standard format.

[6]:
# reassigning alleles on the first set of samples
ddl$pp$reassign_alleles(samples[1:2], combined_folder = 'tutorial_scgp1')
[7]:
# reassigning alleles on the second set of samples
ddl$pp$reassign_alleles(samples[3:4], combined_folder = 'tutorial_scgp2')
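
If you have more than a couple of individuals, the grouping can be written down once and looped over. The sketch below is one way to do that in base R; the `individual` vector simply mirrors the two-sets-of-two assumption used in this tutorial, and the actual dandelion call is left as a comment so the snippet runs without the package.

```r
samples <- c('sc5p_v2_hs_PBMC_1k', 'sc5p_v2_hs_PBMC_10k', 'vdj_v1_hs_pbmc3', 'vdj_nextgem_hs_pbmc3')
# Assumed grouping, mirroring the two calls above: one combined folder per individual
individual <- c('tutorial_scgp1', 'tutorial_scgp1', 'tutorial_scgp2', 'tutorial_scgp2')
sample_sets <- split(samples, individual)

for (folder in names(sample_sets)) {
    message(folder, ': ', paste(sample_sets[[folder]], collapse = ', '))
    # With dandelion imported as `ddl`, the call for each set would be:
    # ddl$pp$reassign_alleles(sample_sets[[folder]], combined_folder = folder)
}
```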

We can see that most of the original ambiguous V calls have now been corrected and only a few remain. These will be flagged as multi later on and can probably be excluded from detailed analyses. For now, leaving them in the data will not affect subsequent analyses.

Step 4: Assigning constant region calls

10X’s annotation.csv files provide a c_gene column, but rather than simply relying on 10x’s annotation, hk6 recommended using immcantation-presto’s MaskPrimers.py with his custom primer list. I tested that and it worked well, but it took 20 min for the first file (~6k contigs). It also only calls the constant region for the heavy chains. MaskPrimers can be sped up by using the filtered file.

Anyway, as an alternative, I wrote a pre-processing function, ddl$pp$assign_isotypes, that uses blast to annotate constant region calls for all contigs, retrieves the calls and merges them with the tsv files. This function will simply overwrite the output from previous steps and add a c_call column at the end, or replace the existing column if it already exists.

To deal with incorrect constant gene calls due to insufficient length, an internal subfunction will run a pairwise alignment against hk6’s curated sequences that were deemed to be highly specific in distinguishing IGHA1-2 and IGHG1-4. I have also curated sets of sequences that should help deal with IGLC3/6/7, as these are problematic too. If there’s insufficient info, the c_call will be returned as a combination of the most aligned sets of sequences. Because of how similar the light chains are, extremely ambiguous calls (only able to map to a common sequence across the light chains) will be returned as IGLC. This typically occurs when the constant sequence is very short. Those that have equal alignment scores between IGLC3/6/7 sequences and the common sequence will be returned as a concatenated call; for example, a contig initially annotated as IGLC3 will be returned as IGLC,IGLC3. If you do not want this subfunction to run, toggle:

correct_c_call = FALSE

Before running, there is a need to set up a database with IMGT constant gene fasta sequences using makeblastdb, basically following the instructions from https://www.ncbi.nlm.nih.gov/books/NBK279688/. This only needs to be done once.

The fasta files were downloaded from IMGT and only sequences corresponding to CH1 region for each constant gene/allele were retained. The headers were trimmed to only keep the gene and allele information. Links to find the sequences can be found here : human and mouse.

The database file is provided in the repository and I’ve written a utility function ddl$utl$makeblastdb to prep new fasta files/databases if you need to.

ddl$utl$makeblastdb('/path/to/folder/containing/database/blast/human/human_BCR_C.fasta')

Again, we really only need to do it once; the file path can be added as an environment variable after running:

echo "export BLASTDB=/path/to/folder/containing/database/blast/" >> ~/.bash_profile
source ~/.bash_profile

I’ve set it up so that if the default option for blastdb is left as None, the function will retrieve a relative path from the environment variable $BLASTDB and then, depending on which organism was specified (default = human), point to the correct fasta file. If you choose not to add it to your environment, you can provide a string specifying a path to the fasta file for the blastdb option. The string has to point directly to the fasta file, i.e. end with .fasta.
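
The lookup described above can be sketched in base R as follows. This is an illustration only and not dandelion’s own code: `resolve_blastdb` and its argument names are hypothetical, and the `<organism>/<organism>_BCR_C.fasta` naming is extrapolated from the human example path shown earlier.

```r
# Illustration only: mimics the lookup described above, not dandelion's own code.
resolve_blastdb <- function(blastdb = NULL, organism = "human") {
    if (is.null(blastdb)) {
        root <- Sys.getenv("BLASTDB")
        if (!nzchar(root)) stop("BLASTDB is not set; pass `blastdb` explicitly")
        # Drop trailing slashes so the joined path has single separators
        root <- sub("/+$", "", root)
        blastdb <- file.path(root, organism, paste0(organism, "_BCR_C.fasta"))
    }
    if (!grepl("\\.fasta$", blastdb)) stop("`blastdb` must point to a .fasta file")
    blastdb
}

Sys.setenv(BLASTDB = "/path/to/folder/containing/database/blast/")
resolve_blastdb()
```

Checking Sys.getenv("BLASTDB") from within R is also a quick way to confirm that the variable exported in ~/.bash_profile is actually visible to the R session, which is not guaranteed when R is launched from a GUI.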


[9]:
# ddl$utl$makeblastdb('/Users/kt16/Documents/Github/dandelion/database/blast/human/human_BCR_C.fasta')
[8]:
ddl$pp$assign_isotypes(samples)

This still takes a while when dealing with large files; the relationship between the number of cpus and the size of the file isn’t exactly linear. Nevertheless, I have enabled parallelization by default because there were noticeable improvements in processing speed with the smaller files. Maybe it will work better on a cluster with more cpus, rather than just a standard laptop. Other than a couple of samples that took about 10-40 min, most ran within 2-5 min. I expect that this should run faster with filtered files too.

The default option will return a summary plot that can be disabled with plot = FALSE. In R, if you leave plot = TRUE, it will wait for you to close the plots before continuing.

Also, the function can be run with fileformat = 'changeo' if preferred.

It’s worthwhile to manually check the sequences for constant calls returned as IGHA1-2, IGHG1-4 and the light chains, and to manually correct them if necessary.
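
One way to shortlist contigs for this manual check is to flag the ambiguous calls. The snippet below is a minimal sketch on made-up c_call values; it assumes that ambiguous calls are comma-separated (as in the IGLC,IGLC3 example above) and that the generic light chain call is exactly IGLC.

```r
# Toy c_call values; in practice these would come from the c_call column of the .tsv output
c_call <- c("IGHM", "IGHA1,IGHA2", "IGLC", "IGLC,IGLC3", "IGKC")

# Flag concatenated calls (contain a comma) and the generic light-chain call "IGLC"
needs_review <- grepl(",", c_call, fixed = TRUE) | c_call == "IGLC"
c_call[needs_review]
```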

Step 5: Quantify mutations (optional).

In my original notebook, at this stage, I quantified the basic mutational load with ddl.pp.quantify_mutations before subsequent analyses. This would not run properly within this R session due to the rpy2/reticulate conflict. Instead, I’d recommend running this separately on the *_igblast_gap_genotyped.tsv file that was generated in the previous step, by following Shazam’s tutorial on basic mutational analysis, before continuing to step 6.

Filtering

Create a Seurat object from the transcriptome data

Let’s first import the gene expression data, starting from a Seurat object.

[ ]:
# setwd('/Users/kt16/Downloads/dandelion_tutorial_R/')
# library(reticulate)
# ddl = import('dandelion')
[10]:
library(Seurat)
samples = c('sc5p_v2_hs_PBMC_1k', 'sc5p_v2_hs_PBMC_10k', 'vdj_v1_hs_pbmc3', 'vdj_nextgem_hs_pbmc3')
seurat_objects = list()
for (i in seq_along(samples)){
    filename = paste0(samples[i], '/', samples[i], "_filtered_feature_bc_matrix.h5")
    data <- Read10X_h5(filename)
    seurat_objects[[i]] <- CreateSeuratObject(counts = data$`Gene Expression`)
}
Genome matrix has multiple modalities, returning a list of matrices for this genome

Genome matrix has multiple modalities, returning a list of matrices for this genome

Genome matrix has multiple modalities, returning a list of matrices for this genome

Genome matrix has multiple modalities, returning a list of matrices for this genome

[11]:
# merge them, there's probably better ways
merged1 = merge(seurat_objects[[1]], seurat_objects[[2]], add.cell.ids = samples[1:2])
merged2 = merge(seurat_objects[[3]], seurat_objects[[4]], add.cell.ids = samples[3:4])
merged = merge(merged1, merged2)
[12]:
head(merged@meta.data)
A data.frame: 6 × 3
                                       orig.ident     nCount_RNA  nFeature_RNA
                                       <chr>          <dbl>       <int>
sc5p_v2_hs_PBMC_1k_AAACCTGCAGCCTGTG-1  SeuratProject  5029        1819
sc5p_v2_hs_PBMC_1k_AAACGGGTCGCTGATA-1  SeuratProject  4343        1738
sc5p_v2_hs_PBMC_1k_AAAGATGTCCTCGCAT-1  SeuratProject   898         554
sc5p_v2_hs_PBMC_1k_AAAGCAAAGTATGACA-1  SeuratProject  4999        1796
sc5p_v2_hs_PBMC_1k_AAAGCAATCCATTCTA-1  SeuratProject   601         416
sc5p_v2_hs_PBMC_1k_AAATGCCCACTTAAGC-1  SeuratProject  4998        1530

dandelion removes the hyphen and trailing number from the cell barcodes by default, so we will do the same for the Seurat object.

[13]:
# remove the -# from the end of each cell name
merged <- RenameCells(merged, new.names = gsub('-.*', '', row.names(merged@meta.data)))
merged
An object of class Seurat
38224 features across 30471 samples within 1 assay
Active assay: RNA (38224 features, 0 variable features)
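
To see what the pattern does in isolation, the sketch below applies the same gsub call to two example cell names and checks that no duplicates are created. The first name is taken from the table above; the second combines a sample prefix and a barcode purely for illustration.

```r
barcodes <- c("sc5p_v2_hs_PBMC_1k_AAACCTGCAGCCTGTG-1", "vdj_v1_hs_pbmc3_AAACGGGTCGCTGATA-1")
stripped <- gsub('-.*', '', barcodes)
stripped
# Stripping must not create duplicated cell names; 0 means all names are unique
anyDuplicated(stripped)
```

Note that '-.*' removes everything from the first hyphen onwards, which is fine here because the sample prefixes contain no hyphens. If your prefixes do, a stricter pattern such as sub('-[0-9]+$', '', x) would only remove the trailing suffix.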

Standard Seurat pre-processing workflow

[14]:
merged[["percent.mt"]] <- PercentageFeatureSet(merged, pattern = "^MT-")
merged <- subset(merged, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
merged <- NormalizeData(merged)
merged <- FindVariableFeatures(merged, selection.method = "vst", nfeatures = 2000)
merged <- ScaleData(merged)
merged <- RunPCA(merged, features = VariableFeatures(object = merged))
merged <- FindNeighbors(merged, dims = 1:50)
merged <- FindClusters(merged)
merged <- RunUMAP(merged, dims = 1:50)
Centering and scaling data matrix

PC_ 1
Positive:  LYZ, S100A9, FCN1, S100A8, CST3, VCAN, MNDA, IFI30, SPI1, SERPINA1
           S100A12, CSTA, TYMP, CTSS, CD14, LST1, CYBB, FTL, MS4A6A, CSF3R
           TYROBP, CD68, TNFAIP2, CFD, NCF2, CD36, CEBPD, AIF1, GRN, LILRB2
Negative:  IFITM1, RPS27, IL32, LTB, RPS12, RPS18, IL7R, TCF7, CD7, TRBC2
           NOSIP, RPS29, LEF1, LINC00861, ETS1, CD247, CCR7, LIME1, CD2, MAL
           SELL, GZMM, TRBC1, AQP3, OXNAD1, AES, CD69, ARL4C, PIM2, TRAC
PC_ 2
Positive:  MS4A1, CD79A, BANK1, LINC00926, IGHM, HLA-DQA1, FCER2, HLA-DQB1, TCL1A, CD79B
           FCRLA, CD19, VPREB3, RALGPS2, CD22, HLA-DRA, FCRL1, HLA-DOB, AFF3, NIBAN3
           IGHD, POU2AF1, BLNK, BLK, HLA-DPB1, SPIB, IGKC, ADAM28, CD24, HVCN1
Negative:  NKG7, HCST, CST7, PRF1, GZMA, GNLY, CTSW, KLRD1, IFITM1, FGFBP2
           FCGR3A, KLRF1, SPON2, S100A4, HOPX, GZMB, SRGN, IFITM2, CCL5, CLIC3
           CD7, MATK, ITGB2, TBX21, IL2RB, CD247, KLRB1, GZMM, SH2D1B, S1PR5
PC_ 3
Positive:  NKG7, GNLY, GZMB, CST7, PRF1, KLRD1, GZMA, FCGR3A, FGFBP2, KLRF1
           SPON2, CLIC3, HOPX, SH2D1B, TBX21, CCL4, ADGRG1, S1PR5, MATK, PTGDR
           MYOM2, IL2RB, TTC38, CCL5, PRSS23, GZMH, CTSW, XCL2, CXXC5, PLEK
Negative:  TCF7, LEF1, TPT1, IL7R, VIM, RPS12, CCR7, RPL36A, MAL, NOSIP
           RGS10, SNHG29, PRKCQ-AS1, TRABD2A, RPL17, EEF1G, GAS5, OXNAD1, GIMAP5, RPS4Y1
           RGCC, BCL11B, ACTN1, TRAT1, RPS29, LTB, AL138963.4, GIMAP1, PCED1B-AS1, NELL2
PC_ 4
Positive:  CAVIN2, PPBP, TUBB1, PF4, GNG11, SPARC, GP9, CLU, MPIG6B, TREML1
           PRKAR2B, NRGN, ITGA2B, CMTM5, PTGS1, MMD, MYL9, AC147651.1, TRIM58, ACRBP
           MAP3K7CL, AL731557.1, DMTN, C2orf88, RGS18, CD9, TSC22D1, MTURN, THBS1, TMEM40
Negative:  AES, RPS12, SEPT9, RARRES3, SEPT6, JUNB, SEPT7, SEPT1, RPS18, AL138963.3
           DUSP1, JUN, KIAA1551, FOS, VIM, CD74, TPT1, S100A4, S100A10, S100A6
           HLA-DQA2, CD69, HLA-DRB5, NFKBIA, CD37, RPS27, DUSP2, HLA-DRA, RETN, ATP6V0B
PC_ 5
Positive:  RPS4Y1, TLE5, EEF1G, SNHG29, GAS5, SEPTIN9, SNHG6, PCED1B-AS1, IL2RG, RPL17
           PLAAT4, SEPTIN7, SEPTIN1, LINC01578, JUND, AL138963.4, TOMM6, SNHG5, CD81, SEPTIN6
           KLF2, RESF1, GIMAP5, ATP5PO, MICOS10, RPS17, RBIS, RNASEK, PRKCQ-AS1, RPL36A
Negative:  AES, SEPT9, SEPT6, SEPT1, SEPT7, RARRES3, AL138963.3, KIAA1551, GNG11, CLU
           CAVIN2, TUBB1, SPARC, PF4, PPBP, JUNB, GP9, CDKN1A, MPIG6B, TREML1
           LMNA, ITGA2B, PTGS1, CMTM5, MAP3K7CL, CD9, DUSP1, PRKAR2B, ACRBP, FTH1

Computing nearest neighbor graph

Computing SNN

Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 18634
Number of edges: 807300

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9222
Number of communities: 24
Elapsed time: 3 seconds
Warning message:
“The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session”
18:21:20 UMAP embedding parameters a = 0.9922 b = 1.112

18:21:20 Read 18634 rows and found 50 numeric columns

18:21:20 Using Annoy for neighbor search, n_neighbors = 30

18:21:20 Building Annoy index with metric = cosine, n_trees = 50

0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|

18:21:23 Writing NN index file to temp file /var/folders/nb/wrd6px6171j52lqpmkljt6vw000l2l/T//Rtmpd2RY9r/file171982e100cbf

18:21:23 Searching Annoy index using 1 thread, search_k = 3000

18:21:28 Annoy recall = 100%

18:21:29 Commencing smooth kNN distance calibration using 1 thread

18:21:30 Initializing from normalized Laplacian + noise

18:21:34 Commencing optimization for 200 epochs, with 832510 positive edges

18:21:44 Optimization finished

[15]:
DimPlot(merged, reduction = "umap")
[Figure: UMAP plot of the merged Seurat object]
[16]:
merged
An object of class Seurat
38224 features across 18634 samples within 1 assay
Active assay: RNA (38224 features, 2000 variable features)
 2 dimensional reductions calculated: pca, umap

In order for dandelion to do the next step (marking/filtering of poor quality BCRs and BCR doublets), we need to include a column called filter_rna in the metadata. This column tells dandelion whether or not a cell has passed transcriptomic QC. Since the object has already been subset to cells that passed QC, we will set the column to FALSE for every cell, i.e. every cell passed QC. This is important because we want to remove as many doublets and poor quality contigs as possible to save time on computation and construction of the final data tables.

[17]:
# create a column called filter_rna and set it to FALSE
merged@meta.data$filter_rna = FALSE
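
If you would rather keep every cell in the object and flag the QC failures instead of subsetting beforehand, the column can be derived from the QC metrics. The sketch below does this on a toy data frame with made-up values, reusing the thresholds from the Seurat workflow above; with a real object the same expression would be applied to merged@meta.data.

```r
# Toy metadata; the values are made up for illustration
meta <- data.frame(
    nFeature_RNA = c(1819, 150, 3000, 554),
    percent.mt   = c(2.1, 1.0, 3.5, 7.2),
    row.names    = c("cell1", "cell2", "cell3", "cell4")
)
passed_qc <- meta$nFeature_RNA > 200 & meta$nFeature_RNA < 2500 & meta$percent.mt < 5
# filter_rna is TRUE for cells that should be filtered out, FALSE for cells that passed
meta$filter_rna <- !passed_qc
meta
```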

Convert to AnnData

We will need to convert the Seurat object to AnnData to be able to continue. The AnnData .obs slot is essentially the same as @meta.data in Seurat.

[18]:
sc = import("scanpy")
# convert the meta.data slot to a python friendly object
obs = r_to_py(merged@meta.data)
normcounts = r_to_py(Matrix::t(GetAssayData(merged)))
[19]:
adata = sc$AnnData(X = normcounts, obs = obs)
adata
AnnData object with n_obs × n_vars = 18634 × 38224
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'RNA_snn_res.0.8', 'seurat_clusters', 'filter_rna'

We need to populate the .neighbors slots via scanpy for smooth transfer later.

[20]:
sc$pp$neighbors(adata)
adata
AnnData object with n_obs × n_vars = 18634 × 38224
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'RNA_snn_res.0.8', 'seurat_clusters', 'filter_rna'
    uns: 'neighbors'
    obsm: 'X_pca'
    obsp: 'distances', 'connectivities'

Read in the BCR files and merge them

[21]:
files = list()
for (i in seq_along(samples)){
    filename = paste0(samples[i], '/dandelion/data/', samples[i], "_b_filtered_contig_igblast_gap_genotyped.tsv")
    files[[i]] <- readr::read_tsv(filename)
}
combined_bcr = do.call(rbind, files)
head(combined_bcr)
Registered S3 method overwritten by 'cli':
  method     from
  print.boxx spatstat

── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  sequence_id = col_character(),
  sequence = col_character(),
  rev_comp = col_logical(),
  productive = col_logical(),
  v_call = col_character(),
  d_call = col_character(),
  j_call = col_character(),
  sequence_alignment = col_character(),
  germline_alignment = col_character(),
  junction = col_character(),
  junction_aa = col_character(),
  v_cigar = col_character(),
  d_cigar = col_character(),
  j_cigar = col_character(),
  locus = col_character(),
  stop_codon = col_logical(),
  vj_in_frame = col_logical(),
  sequence_alignment_aa = col_character(),
  germline_alignment_aa = col_character(),
  v_sequence_alignment = col_character()
  # ... with 36 more columns
)
 Use `spec()` for the full column specifications.





A tibble: 6 × 106
sequence_id  sequence  rev_comp  productive  v_call  d_call  j_call  sequence_alignment  germline_alignment  junction  c_sequence_alignment  c_germline_alignment  c_sequence_start  c_sequence_end  c_score  c_identity  c_support  sample_id  v_call_genotyped  germline_alignment_d_mask
<chr>  <chr>  <lgl>  <lgl>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <chr>  <chr>  <chr>
sc5p_v2_hs_PBMC_1k_AACTCCCAGGCTAGGT-1_contig_1ACTGCGGGGGTAAGAGGTTGTGTCCACCATGGCCTGGACTCCTCTCCTCCTCCTGTTCCTCTCTCACTGCACAGGTTCCCTCTCGCAGGCTGTGCTGACTCAGCCGTCTTCCCTCTCTGCATCTCCTGGAGCATCAGCCAGTCTCACCTGCACCTTGCGCAGTGGCATCAATGTTGGTACCTACAGGATATACTGGTACCAGCAGAAGCCAGGGAGTCCTCCCCAGTATCTCCTGAGGTACAAATCAGACTCAGATAAGCAGCAGGGCTCTGGAGTCCCCAGCCGCTTCTCTGGATCCAAAGATGCTTCGGCCAATGCAGGGATTTTACTCATCTCTGGGCTCCAGTCTGAGGATGAGGCTGACTATTACTGTATGATTTGGCACAGCAGCGCTTGGGTGTTCGGCGGAGGGACCAAGCTGACCGTCCTAGGTCAGCCCAAGGCTGCCCCCTCGGTCACTCTGTTCCCGCCCTCCTCTGAGGAGCTTCAAGCCAACAAGGCCACACTGGTGTGTCTCATAAGTGACTTCTACCCGGGAGCCGTGACAGTGGCCTGGAAGGCAGATAGCAGCCCCGTCAAGGCGGGAGTGGAGACCACCACACCCTCCAAACAAAGCAACAACAAGTACGCGGCCAGCAGCTA FALSE TRUEIGLV5-45*03NA IGLJ3*02CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTGGAGCATCAGCCAGTCTCACCTGCACCTTGCGCAGTGGCATCAATGTT.........GGTACCTACAGGATATACTGGTACCAGCAGAAGCCAGGGAGTCCTCCCCAGTATCTCCTGAGGTACAAATCAGAC.........TCAGATAAGCAGCAGGGCTCTGGAGTCCCC...AGCCGCTTCTCTGGATCCAAAGATGCTTCGGCCAATGCAGGGATTTTACTCATCTCTGGGCTCCAGTCTGAGGATGAGGCTGACTATTACTGTATGATTTGGCACAGCAGCGCTTGGGTGTTCGGCGGAGGGACCAAGCTGACCGTCCTAG CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTGGAGCATCAGCCAGTCTCACCTGCACCTTGCGCAGTGGCATCAATGTT.........GGTACCTACAGGATATACTGGTACCAGCAGAAGCCAGGGAGTCCTCCCCAGTATCTCCTGAGGTACAAATCAGAC.........TCAGATAAGCAGCAGGGCTCTGGAGTCCCC...AGCCGCTTCTCTGGATCCAAAGATGCTTCGGCCAATGCAGGGATTTTACTCATCTCTGGGCTCCAGTCTGAGGATGAGGCTGACTATTACTGGGTGTTCGGCGGAGGGACCAAGCTGACCGTCCTAG TGTATGATTTGGCACAGCAGCGCTTGGGTGTTC GGTCAGCCCAAGGCTGCCCCCTCGGTCACTCTGTTCCCGCCCTCCTCTGAGGAGCTTCAAGCCAACAAGGCCACACTGGTGTGTCTCATAAGTGACTTCTACCCGGGAGCCGTGACAGTGGCCTGGAAGGCAGATAGCAGCCCCGTCAAGGCGGGAGTGGAGACCACCACACCCTCCAAACAAAGCAACAACAAGTACGCGGCCAGCAGCTAGGTCAGCCCAAGGCTGCCCCCTCGGTCACTCTGTTCCCGCCCTCCTCTGAGGAGCTTCAAGCCAACAAGGCCACACTGGTGTGTCTCATAAGTGACTTCTACCCGGGAGCCGTGACAGTGGCCTGGAAGGCAGATAGCAGCCCCGTCAAGGCGGGAGTGGAGACCACCACACCCTCCAAACAAAGCAACAACAAGTACGCGGCCAGCAGCTA 
431642392.6101001e-111sc5p_v2_hs_PBMC_1kIGLV5-45*03CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTGGAGCATCAGCCAGTCTCACCTGCACCTTGCGCAGTGGCATCAATGTT.........GGTACCTACAGGATATACTGGTACCAGCAGAAGCCAGGGAGTCCTCCCCAGTATCTCCTGAGGTACAAATCAGAC.........TCAGATAAGCAGCAGGGCTCTGGAGTCCCC...AGCCGCTTCTCTGGATCCAAAGATGCTTCGGCCAATGCAGGGATTTTACTCATCTCTGGGCTCCAGTCTGAGGATGAGGCTGACTATTACTGTATGATTTGGCACAGCAGCGCTTGGGTGTTCGGCGGAGGGACCAAGCTGACCGTCCTAG
sc5p_v2_hs_PBMC_1k_AACTCCCAGGCTAGGT-1_contig_2ATACTTTCTGAGAGTCCTGGACCTCCTGTGCAAGAACATGAAACATCTGTGGTTCTTCCTCCTCCTGGTGGCAGCTCCCAGATGGGTCCTGTCCCAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCACAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTGGTAGTTACTACTGGAGCTGGATCCGGCAGCCCGCCGGGAAGGGACTGGAGTGGATTGGGCGTATCTATACCAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCCGTGTATTACTGTGCGAGAGAAAATTACGATTTTTGGAGTGGTTATTACCACGGTGCGGACGTCTGGGGCCAAGGGACCACGGTCACCGTCTCCTCAGGGAGTGCATCCGCCCCAACCCTTTTCCCCCTCGTCTCCTGTGAGAATTCCCCGTCGGATACGAGCAGCGTG FALSE TRUEIGHV4-61*02IGHD3-3*01 IGHJ6*02CAGGTGCAGCTGCAGGAGTCGGGCCCA...GGACTGGTGAAGCCTTCACAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGC......AGTGGTAGTTACTACTGGAGCTGGATCCGGCAGCCCGCCGGGAAGGGACTGGAGTGGATTGGGCGTATCTATACCAGT.........GGGAGCACCAACTACAACCCCTCCCTCAAG...AGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCCGTGTATTACTGTGCGAGAGAAAATTACGATTTTTGGAGTGGTTATTACCACGGTGCGGACGTCTGGGGCCAAGGGACCACGGTCACCGTCTCCTCACAGGTGCAGCTGCAGGAGTCGGGCCCA...GGACTGGTGAAGCCTTCACAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGC......AGTGGTAGTTACTACTGGAGCTGGATCCGGCAGCCCGCCGGGAAGGGACTGGAGTGGATTGGGCGTATCTATACCAGT.........GGGAGCACCAACTACAACCCCTCCCTCAAG...AGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCNNATTACGATTTTTGGAGTGGTTATTACTACGGTATGGACGTCTGGGGCCAAGGGACCACGGTCACCGTCTCCTCATGTGCGAGAGAAAATTACGATTTTTGGAGTGGTTATTACCACGGTGCGGACGTCTGGGGGAGTGCATCCGCCCCAACCCTTTTCCCCCTCGTCTCCTGTGAGAATTCCCCGTCGGATACGAGCAGCGTG GGGAGTGCATCCGCCCCAACCCTTTTCCCCCTCGTCTCCTGTGAGAATTCCCCGTCGGATACGAGCAGCGTG 470541134.079100 
7e-34sc5p_v2_hs_PBMC_1kIGHV4-61*02CAGGTGCAGCTGCAGGAGTCGGGCCCA...GGACTGGTGAAGCCTTCACAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGC......AGTGGTAGTTACTACTGGAGCTGGATCCGGCAGCCCGCCGGGAAGGGACTGGAGTGGATTGGGCGTATCTATACCAGT.........GGGAGCACCAACTACAACCCCTCCCTCAAG...AGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCCGTGTATTACTGTGCGAGAGANNNNNNNNNNNNNNNNNNNNNNNNNTACTACGGTATGGACGTCTGGGGCCAAGGGACCACGGTCACCGTCTCCTCA
sc5p_v2_hs_PBMC_1k_AACTCCCAGGCTAGGT-1_contig_3GGCTGGGGTCTCAGGAGGCAGCGCTCTGGGGACGTCTCCACCATGGCCTGGGCTCTGCTCCTCCTCACCTCCTCACTCAGGGCACAGGCTCTTGGGCCCAGTCTGCCCTGATTCAGCCTCCCTCCGTGTCCGGGTCTCCTGGACAGTCAGTCACCATCTCCTGCACTGGAACCAGCAGTGATGTTGGGAGTTATGACTATGTCTCCTGGTACCAACAGCACCCAGGCACAGTCCCCAAACCCATGATCTACAATGTCAATACTCAGCCCTCAGGGGTCCCTGATCGTTTCTCTGGCTCCAAGTCTGGCAATACGGCCTCCATGACCATCTCTGGACTCCAGGCTGAGGACGAGGCTGATTATTAGTGCTGCTCATATACAAGCAGTGCCACTTTCTTGGGTGTTCGGCGGAGGGACCAAGCTGACCGTCCTAGGTCAGCCCAAGGCTGCCCCCTCGGTCACTCTGTTCCCACCCTCCTCTGAGGAGCTTCAAGCCAACAAGGCCACACTGGTGTGTCTCATAAGTGACTTCTACCCGGGAGCCGTGACAGTGGCCTGGAAGGCAGATAGCAGCCCCGTCAAGGCGGGAGTGGAGACCACCACACCCTCCAAACAAAGCAACAACAAGTACGCGGCCAGCAGCTA FALSEFALSEIGLV2-5*01 NA IGLJ3*02CAGTCTGCCCTGATTCAGCCTCCCTCC...GTGTCCGGGTCTCCTGGACAGTCAGTCACCATCTCCTGCACTGGAACCAGCAGTGATGTTGGG.........AGTTATGACTATGTCTCCTGGTACCAACAGCACCCAGGCACAGTCCCCAAACCCATGATCTACAATGTC.....................AATACTCAGCCCTCAGGGGTCCCT...GATCGTTTCTCTGGCTCCAAG......TCTGGCAATACGGCCTCCATGACCATCTCTGGACTCCAGGCTGAGGACGAGGCTGATTATTAGTGCTGCTCATATACAAGCAGTGCCACTTTCTTGGGTGTTCGGCGGAGGGACCAAGCTGACCGTCCTAG CAGTCTGCCCTGATTCAGCCTCCCTCC...GTGTCCGGGTCTCCTGGACAGTCAGTCACCATCTCCTGCACTGGAACCAGCAGTGATGTTGGG.........AGTTATGACTATGTCTCCTGGTACCAACAGCACCCAGGCACAGTCCCCAAACCCATGATCTACAATGTC.....................AATACTCAGCCCTCAGGGGTCCCT...GATCGTTTCTCTGGCTCCAAG......TCTGGCAATACGGCCTCCATGACCATCTCTGGACTCCAGGCTGAGGACGNNTTGGGTGTTCGGCGGAGGGACCAAGCTGACCGTCCTAG TGCTGCTCATATACAAGCAGTGCCACTTTCTTGGGTGTTC GGTCAGCCCAAGGCTGCCCCCTCGGTCACTCTGTTCCCACCCTCCTCTGAGGAGCTTCAAGCCAACAAGGCCACACTGGTGTGTCTCATAAGTGACTTCTACCCGGGAGCCGTGACAGTGGCCTGGAAGGCAGATAGCAGCCCCGTCAAGGCGGGAGTGGAGACCACCACACCCTCCAAACAAAGCAACAACAAGTACGCGGCCAGCAGCTAGGTCAGCCCAAGGCTGCCCCCTCGGTCACTCTGTTCCCACCCTCCTCTGAGGAGCTTCAAGCCAACAAGGCCACACTGGTGTGTCTCATAAGTGACTTCTACCCGGGAGCCGTGACAGTGGCCTGGAAGGCAGATAGCAGCCCCGTCAAGGCGGGAGTGGAGACCACCACACCCTCCAAACAAAGCAACAACAAGTACGCGGCCAGCAGCTA 433644392.6101001e-111sc5p_v2_hs_PBMC_1kIGLV2-5*01 
CAGTCTGCCCTGATTCAGCCTCCCTCC...GTGTCCGGGTCTCCTGGACAGTCAGTCACCATCTCCTGCACTGGAACCAGCAGTGATGTTGGG.........AGTTATGACTATGTCTCCTGGTACCAACAGCACCCAGGCACAGTCCCCAAACCCATGATCTACAATGTC.....................AATACTCAGCCCTCAGGGGTCCCT...GATCGTTTCTCTGGCTCCAAG......TCTGGCAATACGGCCTCCATGACCATCTCTGGACTCCAGGCTGAGGACGAGGCTGATTATTAGTGCTGCTCATATACAAGCAGTGCCACTTNNTTGGGTGTTCGGCGGAGGGACCAAGCTGACCGTCCTAG
sc5p_v2_hs_PBMC_1k_AACTCTTGTCATCGGC-1_contig_1AGAGCTCTGGGGAGTCTGCACCATGGCTTGGACCCCACTCCTCTTCCTCACCCTCCTCCTCCACTGCACAGGGTCTCTCTCCCAGCTTGTGCTGACTCAATCGCCCTCTGCCTCTGCCTCCCTGGGAGCCTCGGTCAAGCTCACCTGCACTCTGAGCAGTGGGCACAGCAGCTACGCCATCGCATGGCATCAGCAGCAGCCAGAGAAGGGCCCTCGGTACTTGATGAAGCTTAACAGTGATGGCAGCCACAGCAAGGGGGACGGGATCCCTGATCGCTTCTCAGGCTCCAGCTCTGGGGCTGAGCGCTACCTCACCATCTCCAGCCTCCAGTCTGAGGATGAGGCTGACTATTACTGTCAGACCTGGGGCACTGGCATTTATGTCTTCGGAACTGGGACCAAGGTCACCGTCCTAGGTCAGCCCAAGGCCAACCCCACTGTCACTCTGTTCCCGCCCTCCTCTGAGGAGCTCCAAGCCAACAAGGCCACACTAGTGTGTCTGATCAGTGACTTCTACCCGGGAGCTGTGACAGTGGCCTGGAAGGCAGATGGCAGCCCCGTCAAGGCGGGAGTGGAGACCACCAAACCCTCCAAACAGAGCAACAACAAGTACGCGGCCAGCAGCTA FALSE TRUEIGLV4-69*01NA IGLJ1*01CAGCTTGTGCTGACTCAATCGCCCTCT...GCCTCTGCCTCCCTGGGAGCCTCGGTCAAGCTCACCTGCACTCTGAGCAGTGGGCACAGC...............AGCTACGCCATCGCATGGCATCAGCAGCAGCCAGAGAAGGGCCCTCGGTACTTGATGAAGCTTAACAGTGAT.........GGCAGCCACAGCAAGGGGGACGGGATCCCT...GATCGCTTCTCAGGCTCCAGC......TCTGGGGCTGAGCGCTACCTCACCATCTCCAGCCTCCAGTCTGAGGATGAGGCTGACTATTACTGTCAGACCTGGGGCACTGGCATTTATGTCTTCGGAACTGGGACCAAGGTCACCGTCCTAG CAGCTTGTGCTGACTCAATCGCCCTCT...GCCTCTGCCTCCCTGGGAGCCTCGGTCAAGCTCACCTGCACTCTGAGCAGTGGGCACAGC...............AGCTACGCCATCGCATGGCATCAGCAGCAGCCAGAGAAGGGCCCTCGGTACTTGATGAAGCTTAACAGTGAT.........GGCAGCCACAGCAAGGGGGACGGGATCCCT...GATCGCTTCTCAGGCTCCAGC......TCTGGGGCTGAGCGCTACCTCACCATCTCCAGCCTCCAGTCTGAGGATGAGTATGTCTTCGGAACTGGGACCAAGGTCACCGTCCTAG TGTCAGACCTGGGGCACTGGCATTTATGTCTTC GGTCAGCCCAAGGCCAACCCCACTGTCACTCTGTTCCCGCCCTCCTCTGAGGAGCTCCAAGCCAACAAGGCCACACTAGTGTGTCTGATCAGTGACTTCTACCCGGGAGCTGTGACAGTGGCCTGGAAGGCAGATGGCAGCCCCGTCAAGGCGGGAGTGGAGACCACCAAACCCTCCAAACAGAGCAACAACAAGTACGCGGCCAGCAGCTAGGTCAGCCCAAGGCCAACCCCACTGTCACTCTGTTCCCGCCCTCCTCTGAGGAGCTCCAAGCCAACAAGGCCACACTAGTGTGTCTGATCAGTGACTTCTACCCGGGAGCTGTGACAGTGGCCTGGAAGGCAGATGGCAGCCCCGTCAAGGCGGGAGTGGAGACCACCAAACCCTCCAAACAGAGCAACAACAAGTACGCGGCCAGCAGCTA 
416627392.6101001e-111sc5p_v2_hs_PBMC_1kIGLV4-69*01CAGCTTGTGCTGACTCAATCGCCCTCT...GCCTCTGCCTCCCTGGGAGCCTCGGTCAAGCTCACCTGCACTCTGAGCAGTGGGCACAGC...............AGCTACGCCATCGCATGGCATCAGCAGCAGCCAGAGAAGGGCCCTCGGTACTTGATGAAGCTTAACAGTGAT.........GGCAGCCACAGCAAGGGGGACGGGATCCCT...GATCGCTTCTCAGGCTCCAGC......TCTGGGGCTGAGCGCTACCTCACCATCTCCAGCCTCCAGTCTGAGGATGAGGCTGACTATTACTGTCAGACCTGGGGCACTGGCATTTATGTCTTCGGAACTGGGACCAAGGTCACCGTCCTAG
sc5p_v2_hs_PBMC_1k_AACTCTTGTCATCGGC-1_contig_2AGCTCTGAGAGAGGAGCCTTAGCCCTGGATTCCAAGGCCTATCCACTTGGTGATCAGCACTGAGCACCGAGGATTCACCATGGAACTGGGGCTCCGCTGGGTTTTCCTTGTTGCTATTTTAGAAGGTGTCCAGTGTGAGGTGCAGCTGGTGGAGTCTGGGGGAGGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGTAGTAGTTACATATACTACGCAGACTCAGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGACGTTACTATGATAGTAGTGGTTATTCCGCAAACTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGGGAGTGCATCCGCCCCAACCCTTTTCCCCCTCGTCTCCTGTGAGAATTCCCCGTCGGATACGAGCAGCGTG FALSE TRUEIGHV3-21*01IGHD3-22*01IGHJ4*02GAGGTGCAGCTGGTGGAGTCTGGGGGA...GGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTC............AGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGT......AGTAGTTACATATACTACGCAGACTCAGTGAAG...GGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGACGTTACTATGATAGTAGTGGTTATTCCGCAAACTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG GAGGTGCAGCTGGTGGAGTCTGGGGGA...GGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTC............AGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGT......AGTAGTTACATATACTACGCAGACTCAGTGAAG...GGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACNNTTACTATGATAGTAGTGGTTATTNNNNNNACTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG TGTGCGAGACGTTACTATGATAGTAGTGGTTATTCCGCAAACTTTGACTACTGG GGGAGTGCATCCGCCCCAACCCTTTTCCCCCTCGTCTCCTGTGAGAATTCCCCGTCGGATACGAGCAGCGTG GGGAGTGCATCCGCCCCAACCCTTTTCCCCCTCGTCTCCTGTGAGAATTCCCCGTCGGATACGAGCAGCGTG 506577134.079100 
7e-34sc5p_v2_hs_PBMC_1kIGHV3-21*01GAGGTGCAGCTGGTGGAGTCTGGGGGA...GGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTC............AGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGT......AGTAGTTACATATACTACGCAGACTCAGTGAAG...GGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG
sc5p_v2_hs_PBMC_1k_AACTCTTGTCATCGGC-1_contig_3AGCAGAGCTCTGGGGAGTCTGCACCATGGCTTGGACCCCACTCCTCTTCCTCACCCTCCTCCTCCACTGCACAGGTCAGGATGGCCCTCAGCACCCTGACCTCCAGCTCACTGATACCACCTCCCAAACTTATGCCAGGAATGTCCTTCCCTCTTTTCTTGACTCCAGCCGGTAATGGGTGTCTGTGTTTTCAGGGTCTCTCTCCCAGCTTGTGCTGACTCAATCGCCCTCTGCCTCTGCCTCCCTGGGAGCCTCGGTCAAGCTCACCTGCACTCTGAGCAGTGGGCACAGCAGCTACGCCATCGCATGGCATCAGCAGCAGCCAGAGAAGGGCCCTCGGTACTTGATGAAGCTTAACAGTGATGGCAGCCACAGCAAGGGGGACGGGATCCCTGATCGCTTCTCAGGCTCCAGCTCTGGGGCTGAGCGCTACCTCACCATCTCCAGCCTCCAGTCTGAGGATGAGGCTGACTATTACTGTCAGACCTGGGGCACTGGCATTCTTGGGTGTTCGGCGGAGGGACCAAGCTGACCGTCCTAGGTCAGCCCAAGGCTGCCCCCTCGGTCACTCTGTTCCCACCCTCCTCTGAGGAGCTTCAAGCCAACAAGGCCACACTGGTGTGTCTCATAAGTGACTTCTACCCGGGAGCCGTGACAGTGGCCTGGAAGGCAGATAGCAGCCCCGTCAAGGCGGGAGTGGAGACCACCACACCCTCCAAACAAAGCAACAACAAGTACGCGGCCAGCAGCTAFALSEFALSEIGLV4-69*01NA IGLJ3*02CAGCTTGTGCTGACTCAATCGCCCTCT...GCCTCTGCCTCCCTGGGAGCCTCGGTCAAGCTCACCTGCACTCTGAGCAGTGGGCACAGC...............AGCTACGCCATCGCATGGCATCAGCAGCAGCCAGAGAAGGGCCCTCGGTACTTGATGAAGCTTAACAGTGAT.........GGCAGCCACAGCAAGGGGGACGGGATCCCT...GATCGCTTCTCAGGCTCCAGC......TCTGGGGCTGAGCGCTACCTCACCATCTCCAGCCTCCAGTCTGAGGATGAGGCTGACTATTACTGTCAGACCTGGGGCACTGGCATTCTTGGGTGTTCGGCGGAGGGACCAAGCTGACCGTCCTAG CAGCTTGTGCTGACTCAATCGCCCTCT...GCCTCTGCCTCCCTGGGAGCCTCGGTCAAGCTCACCTGCACTCTGAGCAGTGGGCACAGC...............AGCTACGCCATCGCATGGCATCAGCAGCAGCCAGAGAAGGGCCCTCGGTACTTGATGAAGCTTAACAGTGAT.........GGCAGCCACAGCAAGGGGGACGGGATCCCT...GATCGCTTCTCAGGCTCCAGC......TCTGGGGCTGAGCGCTACCTCACCATCTCCAGCCTCCAGTCTGAGGATGAGGTTGGGTGTTCGGCGGAGGGACCAAGCTGACCGTCCTAG TGTCAGACCTGGGGCACTGGCATTCTTGGGTGTTC 
GGTCAGCCCAAGGCTGCCCCCTCGGTCACTCTGTTCCCACCCTCCTCTGAGGAGCTTCAAGCCAACAAGGCCACACTGGTGTGTCTCATAAGTGACTTCTACCCGGGAGCCGTGACAGTGGCCTGGAAGGCAGATAGCAGCCCCGTCAAGGCGGGAGTGGAGACCACCACACCCTCCAAACAAAGCAACAACAAGTACGCGGCCAGCAGCTAGGTCAGCCCAAGGCTGCCCCCTCGGTCACTCTGTTCCCACCCTCCTCTGAGGAGCTTCAAGCCAACAAGGCCACACTGGTGTGTCTCATAAGTGACTTCTACCCGGGAGCCGTGACAGTGGCCTGGAAGGCAGATAGCAGCCCCGTCAAGGCGGGAGTGGAGACCACCACACCCTCCAAACAAAGCAACAACAAGTACGCGGCCAGCAGCTA 541752392.6101001e-111sc5p_v2_hs_PBMC_1kIGLV4-69*01CAGCTTGTGCTGACTCAATCGCCCTCT...GCCTCTGCCTCCCTGGGAGCCTCGGTCAAGCTCACCTGCACTCTGAGCAGTGGGCACAGC...............AGCTACGCCATCGCATGGCATCAGCAGCAGCCAGAGAAGGGCCCTCGGTACTTGATGAAGCTTAACAGTGAT.........GGCAGCCACAGCAAGGGGGACGGGATCCCT...GATCGCTTCTCAGGCTCCAGC......TCTGGGGCTGAGCGCTACCTCACCATCTCCAGCCTCCAGTCTGAGGATGAGGCTGACTATTACTGTCAGACCTGGGGCACTGGCATTCTTGGGTGTTCGGCGGAGGGACCAAGCTGACCGTCCTAG
[22]:
head(combined_bcr)
A tibble: 6 × 106
(the long sequence, sequence_alignment, germline_alignment, junction, c_sequence_alignment, c_germline_alignment and germline_alignment_d_mask columns are not shown)

sequence_id                                     rev_comp  productive  v_call       d_call       j_call    c_sequence_start  c_sequence_end  c_score  c_identity  c_support  sample_id           v_call_genotyped
<chr>                                           <lgl>     <lgl>       <chr>        <chr>        <chr>     <dbl>             <dbl>           <dbl>    <dbl>       <dbl>      <chr>               <chr>
sc5p_v2_hs_PBMC_1k_AACTCCCAGGCTAGGT-1_contig_1  FALSE     TRUE        IGLV5-45*03  NA           IGLJ3*02  431               642             392.610  100         1e-111     sc5p_v2_hs_PBMC_1k  IGLV5-45*03
sc5p_v2_hs_PBMC_1k_AACTCCCAGGCTAGGT-1_contig_2  FALSE     TRUE        IGHV4-61*02  IGHD3-3*01   IGHJ6*02  470               541             134.079  100         7e-34      sc5p_v2_hs_PBMC_1k  IGHV4-61*02
sc5p_v2_hs_PBMC_1k_AACTCCCAGGCTAGGT-1_contig_3  FALSE     FALSE       IGLV2-5*01   NA           IGLJ3*02  433               644             392.610  100         1e-111     sc5p_v2_hs_PBMC_1k  IGLV2-5*01
sc5p_v2_hs_PBMC_1k_AACTCTTGTCATCGGC-1_contig_1  FALSE     TRUE        IGLV4-69*01  NA           IGLJ1*01  416               627             392.610  100         1e-111     sc5p_v2_hs_PBMC_1k  IGLV4-69*01
sc5p_v2_hs_PBMC_1k_AACTCTTGTCATCGGC-1_contig_2  FALSE     TRUE        IGHV3-21*01  IGHD3-22*01  IGHJ4*02  506               577             134.079  100         7e-34      sc5p_v2_hs_PBMC_1k  IGHV3-21*01
sc5p_v2_hs_PBMC_1k_AACTCTTGTCATCGGC-1_contig_3  FALSE     FALSE       IGLV4-69*01  NA           IGLJ3*02  541               752             392.610  100         1e-111     sc5p_v2_hs_PBMC_1k  IGLV4-69*01

Run ddl$pp$filter_bcr

[23]:
vdj_results_list = ddl$pp$filter_bcr(combined_bcr, adata)
vdj_results_list
[[1]]
Dandelion class object with n_obs = 971 and n_contigs = 1959
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'locus', 'stop_codon', 'vj_in_frame', 'sequence_alignment_aa', 'germline_alignment_aa', 'v_alignment_start', 'v_alignment_end', 'd_alignment_start', 'd_alignment_end', 'j_alignment_start', 'j_alignment_end', 'v_sequence_alignment', 'v_sequence_alignment_aa', 'v_germline_alignment', 'v_germline_alignment_aa', 'd_sequence_alignment', 'd_sequence_alignment_aa', 'd_germline_alignment', 'd_germline_alignment_aa', 'j_sequence_alignment', 'j_sequence_alignment_aa', 'j_germline_alignment', 'j_germline_alignment_aa', 'fwr1', 'fwr1_aa', 'cdr1', 'cdr1_aa', 'fwr2', 'fwr2_aa', 'cdr2', 'cdr2_aa', 'fwr3', 'fwr3_aa', 'fwr4', 'fwr4_aa', 'cdr3', 'cdr3_aa', 'junction_length', 'v_score', 'd_score', 'j_score', 'v_support', 'd_support', 'j_support', 'v_identity', 'd_identity', 'j_identity', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'fwr1_start', 'fwr1_end', 'cdr1_start', 'cdr1_end', 'fwr2_start', 'fwr2_end', 'cdr2_start', 'cdr2_end', 'fwr3_start', 'fwr3_end', 'fwr4_start', 'fwr4_end', 'cdr3_start', 'cdr3_end', 'np1', 'np1_length', 'np2', 'np2_length', 'junction_aa_length', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'sample_id', 'v_call_genotyped', 'germline_alignment_d_mask', 'duplicate_count'
    metadata: 'sample_id', 'isotype', 'lightchain', 'status', 'vdj_status', 'productive', 'umi_counts_heavy', 'umi_counts_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light'
    distance: None
    edges: None
    layout: None
    graph: None

[[2]]
View of AnnData object with n_obs × n_vars = 18413 × 38224
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'RNA_snn_res.0.8', 'seurat_clusters', 'filter_rna', 'has_bcr', 'filter_bcr_quality', 'filter_bcr_heavy', 'filter_bcr_light', 'bcr_QC_pass', 'filter_bcr'
    uns: 'neighbors'
    obsm: 'X_pca'
    obsp: 'distances', 'connectivities'

This returns a two-element list in R. The first element is the VDJ results, stored as a Dandelion python-class object, and the second element is the accompanying AnnData object.

The Dandelion class is structured like a multi-slot object, and the two data frames below are linked:

  1. data <- BCR table with row names as individual VDJ contigs

  2. metadata <- BCR table collapsed to cell barcodes as row names
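
To make the link between the two tables concrete, here is a minimal sketch in base R using made-up contigs. It is not how dandelion builds its metadata slot internally; the toy table, the cell names and the heavy/light split on `locus` are assumptions for illustration only.

```r
# Toy contig-level table (analogue of the "data" slot): one row per contig.
contigs <- data.frame(
  sequence_id = c("cellA_contig_1", "cellA_contig_2", "cellB_contig_1"),
  cell_id     = c("cellA", "cellA", "cellB"),
  locus       = c("IGH", "IGL", "IGH"),
  v_call      = c("IGHV3-21*01", "IGLV4-69*01", "IGHV4-61*02"),
  stringsAsFactors = FALSE
)
rownames(contigs) <- contigs$sequence_id

# Collapse to one row per cell barcode (analogue of the "metadata" slot).
heavy <- contigs[contigs$locus == "IGH", ]
light <- contigs[contigs$locus != "IGH", ]
cells <- unique(contigs$cell_id)
cell_table <- data.frame(
  v_call_heavy = heavy$v_call[match(cells, heavy$cell_id)],
  v_call_light = light$v_call[match(cells, light$cell_id)],
  row.names    = cells,
  stringsAsFactors = FALSE
)
cell_table
```

Cells without a light chain contig simply end up with NA in the light chain column, which is the behaviour you would want to see in a per-cell table.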

More details on the Dandelion class are in my python notebooks. The most important slot for now is the metadata slot within Dandelion.

In order for the metadata to form properly, there must not be any duplicate barcodes or incorrectly retrieved information from the data slot. If you end up with a Dandelion object in which only the data slot is filled, it means one of those two conditions occurred. In that situation, I recommend sending me a copy of the file so I can check why it’s failing; it is usually due to coding errors that arise from string and float incompatibilities when constructing the object.
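
Before sending anything over, a quick check for duplicated identifiers can save some time. This is only a sketch on a toy vector; I am assuming here that the identifiers worth checking are the contig IDs (the `sequence_id` column), which may not be the only cause of a failure.

```r
# Toy contig IDs; in practice this would be the sequence_id column of the
# table you pass in. The last entry is a deliberate duplicate.
sequence_id <- c("cellA_contig_1", "cellA_contig_2",
                 "cellB_contig_1", "cellA_contig_2")

# duplicated() flags the second and later occurrences of each value.
dup_ids <- unique(sequence_id[duplicated(sequence_id)])
dup_ids
```

An empty result means every identifier is unique.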

To save the Dandelion object, you can do the following:

[24]:
vdj_results_list[[1]]$write_pkl('vdj_save.pkl.pbz2')

The .pkl.pbz2 extension denotes a bzip2-compressed pickle file from python.

You can also save using $write_h5.

[25]:
vdj_results_list[[1]]$write_h5('vdj_save.h5')

To read the file back into R, you can do the following:

[26]:
vdj_data <- ddl$read_pkl('vdj_save.pkl.pbz2')
vdj_data
Dandelion class object with n_obs = 971 and n_contigs = 1959
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'locus', 'stop_codon', 'vj_in_frame', 'sequence_alignment_aa', 'germline_alignment_aa', 'v_alignment_start', 'v_alignment_end', 'd_alignment_start', 'd_alignment_end', 'j_alignment_start', 'j_alignment_end', 'v_sequence_alignment', 'v_sequence_alignment_aa', 'v_germline_alignment', 'v_germline_alignment_aa', 'd_sequence_alignment', 'd_sequence_alignment_aa', 'd_germline_alignment', 'd_germline_alignment_aa', 'j_sequence_alignment', 'j_sequence_alignment_aa', 'j_germline_alignment', 'j_germline_alignment_aa', 'fwr1', 'fwr1_aa', 'cdr1', 'cdr1_aa', 'fwr2', 'fwr2_aa', 'cdr2', 'cdr2_aa', 'fwr3', 'fwr3_aa', 'fwr4', 'fwr4_aa', 'cdr3', 'cdr3_aa', 'junction_length', 'v_score', 'd_score', 'j_score', 'v_support', 'd_support', 'j_support', 'v_identity', 'd_identity', 'j_identity', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'fwr1_start', 'fwr1_end', 'cdr1_start', 'cdr1_end', 'fwr2_start', 'fwr2_end', 'cdr2_start', 'cdr2_end', 'fwr3_start', 'fwr3_end', 'fwr4_start', 'fwr4_end', 'cdr3_start', 'cdr3_end', 'np1', 'np1_length', 'np2', 'np2_length', 'junction_aa_length', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'sample_id', 'v_call_genotyped', 'germline_alignment_d_mask', 'duplicate_count'
    metadata: 'sample_id', 'isotype', 'lightchain', 'status', 'vdj_status', 'productive', 'umi_counts_heavy', 'umi_counts_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light'
    distance: None
    edges: None
    layout: None
    graph: None

or

[27]:
vdj_data2 <- ddl$read_h5('vdj_save.h5')
vdj_data2
Dandelion class object with n_obs = 971 and n_contigs = 1959
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'locus', 'stop_codon', 'vj_in_frame', 'sequence_alignment_aa', 'germline_alignment_aa', 'v_alignment_start', 'v_alignment_end', 'd_alignment_start', 'd_alignment_end', 'j_alignment_start', 'j_alignment_end', 'v_sequence_alignment', 'v_sequence_alignment_aa', 'v_germline_alignment', 'v_germline_alignment_aa', 'd_sequence_alignment', 'd_sequence_alignment_aa', 'd_germline_alignment', 'd_germline_alignment_aa', 'j_sequence_alignment', 'j_sequence_alignment_aa', 'j_germline_alignment', 'j_germline_alignment_aa', 'fwr1', 'fwr1_aa', 'cdr1', 'cdr1_aa', 'fwr2', 'fwr2_aa', 'cdr2', 'cdr2_aa', 'fwr3', 'fwr3_aa', 'fwr4', 'fwr4_aa', 'cdr3', 'cdr3_aa', 'junction_length', 'v_score', 'd_score', 'j_score', 'v_support', 'd_support', 'j_support', 'v_identity', 'd_identity', 'j_identity', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'fwr1_start', 'fwr1_end', 'cdr1_start', 'cdr1_end', 'fwr2_start', 'fwr2_end', 'cdr2_start', 'cdr2_end', 'fwr3_start', 'fwr3_end', 'fwr4_start', 'fwr4_end', 'cdr3_start', 'cdr3_end', 'np1', 'np1_length', 'np2', 'np2_length', 'junction_aa_length', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'sample_id', 'v_call_genotyped', 'germline_alignment_d_mask', 'duplicate_count'
    metadata: 'sample_id', 'isotype', 'lightchain', 'status', 'vdj_status', 'productive', 'umi_counts_heavy', 'umi_counts_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light'
    distance: None
    edges: None
    layout: None
    graph: None

Finding clones

Dandelion comes with a method to define clones based on V-J gene usage and CDR3 junction similarity, but you can always run Immcantation/Change-O’s DefineClones with the filtered file from earlier, following their tutorial. To use dandelion’s method, you just need to do the following:

[28]:
ddl$tl$find_clones(vdj_data)
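
To give a rough idea of what "V-J gene usage and junction similarity" means in practice, here is a toy sketch in base R. It is not dandelion's implementation: the table, the Hamming distance, the single-linkage clustering and the 0.2 threshold are all assumptions chosen to keep the example small.

```r
# Toy heavy-chain table; all values are made up for illustration.
bcr <- data.frame(
  cell_id  = c("c1", "c2", "c3", "c4"),
  v_gene   = c("IGHV3-21", "IGHV3-21", "IGHV3-21", "IGHV4-61"),
  j_gene   = c("IGHJ4", "IGHJ4", "IGHJ4", "IGHJ6"),
  junction = c("CARDYW", "CARDFW", "CTTTTW", "CARDYW"),
  stringsAsFactors = FALSE
)

# Fraction of mismatching positions between two equal-length strings.
hamming_frac <- function(a, b) {
  mean(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

assign_clones <- function(df, max_dist = 0.2) {
  # Candidate groups share V gene, J gene and junction length.
  key <- paste(df$v_gene, df$j_gene, nchar(df$junction), sep = "|")
  clone_id <- character(nrow(df))
  for (k in unique(key)) {
    idx <- which(key == k)
    if (length(idx) == 1) {
      sub <- 1L
    } else {
      d <- outer(idx, idx, Vectorize(function(i, j)
        hamming_frac(df$junction[i], df$junction[j])))
      # Single-linkage clustering, cut at the distance threshold.
      sub <- cutree(hclust(as.dist(d), method = "single"), h = max_dist)
    }
    clone_id[idx] <- paste(k, sub, sep = "_")
  }
  clone_id
}

bcr$clone_id <- assign_clones(bcr)
bcr
```

In this toy table c1 and c2 differ at one junction position out of six and land in the same clone, c3 shares the genes but has a very different junction, and c4 uses different genes altogether.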

Calculating size of clones

Sometimes it’s useful to evaluate the size of the clone. Here ddl$tl$clone_size does a simple calculation to enable that.

[29]:
ddl$tl$clone_size(vdj_data)

You can also specify max_size to clip off the calculation at a fixed value.

[30]:
ddl$tl$clone_size(vdj_data, max_size = 3)
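
The arithmetic behind a clipped clone size is simple enough to sketch in base R. The clone labels below are made up, and how dandelion itself labels the clipped values is not shown here, so treat this purely as an illustration of the idea.

```r
# One clone label per cell (toy values).
clone_id <- c("A", "A", "A", "A", "B", "B", "C")

# Size of the clone each cell belongs to.
clone_size <- as.integer(table(clone_id)[clone_id])

# Clip at a fixed maximum, analogous to max_size = 3.
clipped <- pmin(clone_size, 3L)

data.frame(clone_id, clone_size, clipped)
```

Clipping is mostly useful for plotting, where a handful of very large clones would otherwise dominate the colour scale.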

I have blitzed through the last three functions without showing you the output, but don’t worry: the results are all stashed in the Dandelion object.

[31]:
# compare the column names in the metadata slot of the object below with the one above.
vdj_data
Dandelion class object with n_obs = 971 and n_contigs = 1959
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'locus', 'stop_codon', 'vj_in_frame', 'sequence_alignment_aa', 'germline_alignment_aa', 'v_alignment_start', 'v_alignment_end', 'd_alignment_start', 'd_alignment_end', 'j_alignment_start', 'j_alignment_end', 'v_sequence_alignment', 'v_sequence_alignment_aa', 'v_germline_alignment', 'v_germline_alignment_aa', 'd_sequence_alignment', 'd_sequence_alignment_aa', 'd_germline_alignment', 'd_germline_alignment_aa', 'j_sequence_alignment', 'j_sequence_alignment_aa', 'j_germline_alignment', 'j_germline_alignment_aa', 'fwr1', 'fwr1_aa', 'cdr1', 'cdr1_aa', 'fwr2', 'fwr2_aa', 'cdr2', 'cdr2_aa', 'fwr3', 'fwr3_aa', 'fwr4', 'fwr4_aa', 'cdr3', 'cdr3_aa', 'junction_length', 'v_score', 'd_score', 'j_score', 'v_support', 'd_support', 'j_support', 'v_identity', 'd_identity', 'j_identity', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'fwr1_start', 'fwr1_end', 'cdr1_start', 'cdr1_end', 'fwr2_start', 'fwr2_end', 'cdr2_start', 'cdr2_end', 'fwr3_start', 'fwr3_end', 'fwr4_start', 'fwr4_end', 'cdr3_start', 'cdr3_end', 'np1', 'np1_length', 'np2', 'np2_length', 'junction_aa_length', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'sample_id', 'v_call_genotyped', 'germline_alignment_d_mask', 'duplicate_count', 'clone_id'
    metadata: 'sample_id', 'clone_id', 'clone_id_by_size', 'isotype', 'lightchain', 'status', 'vdj_status', 'productive', 'umi_counts_heavy', 'umi_counts_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'clone_id_size', 'clone_id_size_max_3.0'
    distance: None
    edges: None
    layout: None
    graph: None

Generate BCR network visualization

The name Dandelion comes from the way it visualizes BCRs as networks that look like dandelions in a field. We first need to generate the network. We will hopefully visualize it in Seurat later.

[32]:
ddl$tl$generate_network(vdj_data)
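
If you are wondering what a sequence-similarity network looks like as data, here is a toy sketch in base R that links sequences by edit distance. It is not dandelion's algorithm; the sequences and the threshold of one edit are made up, and dandelion may well use a different distance and layout.

```r
# Made-up junction amino-acid sequences, named by cell.
junctions <- c(c1 = "CARDYW", c2 = "CARDFW", c3 = "CARDFFW", c4 = "CTTTTTW")

# Pairwise Levenshtein (edit) distances; adist() ships with R (utils).
d <- adist(junctions)
dimnames(d) <- list(names(junctions), names(junctions))

# Keep an edge between any two different cells within the threshold.
max_edit <- 1
pairs <- which(d <= max_edit & upper.tri(d), arr.ind = TRUE)
edges <- data.frame(
  from   = rownames(d)[pairs[, "row"]],
  to     = colnames(d)[pairs[, "col"]],
  weight = d[pairs],
  stringsAsFactors = FALSE
)
edges
```

An edge list like this is all a graph package needs to draw the network; cells with no close neighbour (c4 here) stay as isolated nodes.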

Integrating with Seurat

At this point, you might want to transfer the metadata slot back to Seurat so you can visualise some things. You can do that column by column directly from the Dandelion object, as follows:

isotype = unlist(vdj_data$metadata$isotype) # because of the python to R conversion, it thinks it's a list rather than a vector. we can correct this with unlist
names(isotype) <- row.names(vdj_data$metadata)
merged <- AddMetaData(merged, isotype, 'isotype')
DimPlot(merged, group.by = 'isotype')
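
The unlist step is easy to try on a stand-in before touching the real object. In this sketch the list and the barcodes are placeholders, not values pulled from a Dandelion object.

```r
# Stand-in for a metadata column that arrived from python as a list.
isotype_list <- list("IgM", "IgG", "IgA")
barcodes <- c("cellA", "cellB", "cellC")  # hypothetical row names

isotype <- unlist(isotype_list)  # list -> character vector

# unlist() silently drops NULL entries, so check the length before naming.
stopifnot(length(isotype) == length(barcodes))
names(isotype) <- barcodes       # name by barcode so cells can be matched
str(isotype)
```

The named vector is the shape that the AddMetaData call above expects.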

I will demonstrate how to do this via the AnnData object.

[33]:
adata2 = vdj_results_list[[2]]
adata2
View of AnnData object with n_obs × n_vars = 18413 × 38224
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'RNA_snn_res.0.8', 'seurat_clusters', 'filter_rna', 'has_bcr', 'filter_bcr_quality', 'filter_bcr_heavy', 'filter_bcr_light', 'bcr_QC_pass', 'filter_bcr'
    uns: 'neighbors'
    obsm: 'X_pca'
    obsp: 'distances', 'connectivities'

Transfer Dandelion to AnnData

ddl$tl$transfer transfers the metadata and graph slots from the Dandelion object to AnnData.

[34]:
ddl$tl$transfer(adata2, vdj_data) # switch expanded_only to TRUE if you only want to get the coordinates for expanded clones

This will populate the adata .obs slot with the metadata from the Dandelion object.

[35]:
adata2
AnnData object with n_obs × n_vars = 18413 × 38224
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'RNA_snn_res.0.8', 'seurat_clusters', 'filter_rna', 'has_bcr', 'filter_bcr_quality', 'filter_bcr_heavy', 'filter_bcr_light', 'bcr_QC_pass', 'filter_bcr', 'sample_id', 'clone_id', 'clone_id_by_size', 'isotype', 'lightchain', 'status', 'vdj_status', 'productive', 'umi_counts_heavy', 'umi_counts_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'clone_id_size', 'clone_id_size_max_3.0'
    uns: 'neighbors', 'rna_neighbors'
    obsm: 'X_pca', 'X_bcr'
    obsp: 'distances', 'connectivities', 'rna_connectivities', 'rna_distances', 'bcr_connectivities', 'bcr_distances'

Saving

Conversions between python and R (in either direction) can be unreliable, so my recommendation at this stage is to save the three objects separately and load them in a fresh session. There’s a high chance your session will crash if you skip this.

[36]:
adata2$write('adata_test.h5ad', compression = 'gzip')
[37]:
saveRDS(merged, 'merged.RDS')
[38]:
vdj_data$write_pkl('vdj_save.pkl.pbz2')

New Session: Transfer ``AnnData`` to ``Seurat``

We start a new session and read in the files.

[1]:
setwd('/Users/kt16/Downloads/dandelion_tutorial_R/')
library(reticulate)
ddl = import_from_path('dandelion', path = '/Users/kt16/Documents/Github/dandelion')
# ddl = import('dandelion')
sc = import('scanpy')
[2]:
library(Seurat)
samples = c('sc5p_v2_hs_PBMC_1k', 'sc5p_v2_hs_PBMC_10k', 'vdj_v1_hs_pbmc3', 'vdj_nextgem_hs_pbmc3')
adata = sc$read_h5ad('adata_test.h5ad')
# vdj = ddl$read('vdj_save.pkl.pbz2') # don't need this at this stage
merged = readRDS('merged.RDS')
[3]:
adata
AnnData object with n_obs × n_vars = 18413 × 38224
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'RNA_snn_res.0.8', 'seurat_clusters', 'filter_rna', 'has_bcr', 'filter_bcr_quality', 'filter_bcr_heavy', 'filter_bcr_light', 'bcr_QC_pass', 'filter_bcr', 'sample_id', 'clone_id', 'clone_id_by_size', 'isotype', 'lightchain', 'status', 'vdj_status', 'productive', 'umi_counts_heavy', 'umi_counts_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'clone_id_size', 'clone_id_size_max_3.0'
    uns: 'neighbors', 'rna_neighbors'
    obsm: 'X_bcr', 'X_pca'
    obsp: 'bcr_connectivities', 'bcr_distances', 'connectivities', 'distances', 'rna_connectivities', 'rna_distances'
[4]:
merged
An object of class Seurat
38224 features across 18634 samples within 1 assay
Active assay: RNA (38224 features, 2000 variable features)
 2 dimensional reductions calculated: pca, umap

The AnnData object has fewer cells (18413) than the Seurat object (18634) because some cells were filtered away. Let’s do a simple merge to populate the Seurat object’s meta.data slot.

[5]:
merged_meta = merged@meta.data
head(merged_meta)
A data.frame: 6 × 7
                                     orig.ident     nCount_RNA  nFeature_RNA  percent.mt  RNA_snn_res.0.8  seurat_clusters  filter_rna
                                     <chr>          <dbl>       <int>         <dbl>       <fct>            <fct>            <lgl>
sc5p_v2_hs_PBMC_1k_AAACCTGCAGCCTGTG  SeuratProject  5029        1819          3.579240    19               19               FALSE
sc5p_v2_hs_PBMC_1k_AAACGGGTCGCTGATA  SeuratProject  4343        1738          2.440709    5                5                FALSE
sc5p_v2_hs_PBMC_1k_AAAGCAAAGTATGACA  SeuratProject  4999        1796          2.880576    5                5                FALSE
sc5p_v2_hs_PBMC_1k_AAATGCCCACTTAAGC  SeuratProject  4998        1530          3.281313    2                2                FALSE
sc5p_v2_hs_PBMC_1k_AACACGTCAACAACCT  SeuratProject  4818        1579          2.283105    2                2                FALSE
sc5p_v2_hs_PBMC_1k_AACACGTGTTATCCGA  SeuratProject  7564        2220          2.525119    5                5                FALSE
[6]:
# extract the metadata from the anndata object
adata_meta = adata$obs
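Before merging, it can help to confirm which cells are missing from the AnnData metadata. The sketch below is a minimal illustration on made-up barcodes; seurat_cells and anndata_cells are hypothetical stand-ins for row.names(merged_meta) and row.names(adata_meta). With the object sizes printed above (18634 and 18413 cells), one would expect 221 missing cells, assuming every AnnData cell is also present in the Seurat object.

```r
# Hypothetical barcodes standing in for row.names(merged_meta) and row.names(adata_meta)
seurat_cells <- c("s1_AAAC", "s1_AAAG", "s1_AACA", "s1_AACC")
anndata_cells <- c("s1_AAAC", "s1_AACA", "s1_AACC")

# cells present in the Seurat metadata but filtered out of the AnnData object
missing_cells <- setdiff(seurat_cells, anndata_cells)

# every AnnData cell should exist in Seurat for the merge to line up
all_found <- all(anndata_cells %in% seurat_cells)

length(missing_cells)
all_found
```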

If you run into issues with the conversion, unfortunately there’s not much I can do about it. One alternative is to transfer the columns of the Dandelion object's metadata slot into the Seurat object one by one, as shown above.

[7]:
# merge into a data frame and then make sure the format is correct
merged_data <- as.data.frame(merge(merged_meta, adata_meta, by=0, all=TRUE))
rownames(merged_data) <- merged_data[,1]
merged_data[,1] <- NULL
merged_data <- as.data.frame(apply(merged_data, 2, function(x) plyr::revalue(x, c("NaN"=NA, 'nan' = NA)))) # just replace the NAs manually
head(merged_data)
The following `from` values were not present in `x`: NaN, nan

A data.frame: 6 × 38
                                      orig.ident.x   nCount_RNA.x  nFeature_RNA.x  percent.mt.x  RNA_snn_res.0.8.x  seurat_clusters.x  filter_rna.x  orig.ident.y   nCount_RNA.y  nFeature_RNA.y  ⋯  umi_counts_heavy  umi_counts_light  v_call_heavy  v_call_light  j_call_heavy  j_call_light  c_call_heavy  c_call_light  clone_id_size  clone_id_size_max_3.0
                                      <chr>          <chr>         <chr>           <chr>         <chr>              <chr>              <chr>         <chr>          <chr>         <chr>           ⋯  <chr>             <chr>             <chr>         <chr>         <chr>         <chr>         <chr>         <chr>         <chr>          <chr>
sc5p_v2_hs_PBMC_10k_AAACCTGAGACAGACC  SeuratProject  4751          1982            4.25173648    3                  3                  FALSE         SeuratProject  4751          1982            ⋯  NA                NA                NA            NA            NA            NA            NA            NA            NA             1
sc5p_v2_hs_PBMC_10k_AAACCTGAGCGATAGC  SeuratProject  5767          1761            1.92474423    2                  2                  FALSE         SeuratProject  5767          1761            ⋯  NA                NA                NA            NA            NA            NA            NA            NA            NA             1
sc5p_v2_hs_PBMC_10k_AAACCTGAGCGGCTTC  SeuratProject  4991          1957            2.50450811    3                  3                  FALSE         SeuratProject  4991          1957            ⋯  NA                NA                NA            NA            NA            NA            NA            NA            NA             1
sc5p_v2_hs_PBMC_10k_AAACCTGAGGATCGCA  SeuratProject  6012          2310            3.69261477    3                  3                  FALSE         SeuratProject  6012          2310            ⋯  NA                NA                NA            NA            NA            NA            NA            NA            NA             NA
sc5p_v2_hs_PBMC_10k_AAACCTGAGTCACGCC  SeuratProject  7267          2315            2.17421219    5                  5                  FALSE         SeuratProject  7267          2315            ⋯  NA                NA                NA            NA            NA            NA            NA            NA            NA             1
sc5p_v2_hs_PBMC_10k_AAACCTGCACGTCAGC  SeuratProject  5671          1950            3.82648563    16                 16                 FALSE         SeuratProject  5671          1950            ⋯  NA                NA                NA            NA            NA            NA            NA            NA            NA             NA
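Note that the merge in cell [7] keeps the shared columns twice (with .x and .y suffixes) and that apply() turns every column into character, as the <chr> types in the output show. If that is a problem, one possible alternative is sketched below on toy data. This is not the approach used in this notebook; seurat_meta and anndata_meta are made-up stand-ins for merged@meta.data and adata$obs, and all values are invented.

```r
# Toy stand-ins for merged@meta.data and adata$obs (all values are made up)
seurat_meta <- data.frame(
  nCount_RNA = c(5029, 4343, 4999),
  row.names = c("cell_1", "cell_2", "cell_3")
)
anndata_meta <- data.frame(
  nCount_RNA = c(5029, 4999),
  isotype = c("IgM", "nan"),
  umi_counts_heavy = c(12, NaN),
  row.names = c("cell_1", "cell_3"),
  stringsAsFactors = FALSE
)

# keep only the columns Seurat does not already have, to avoid .x/.y duplicates
new_cols <- setdiff(colnames(anndata_meta), colnames(seurat_meta))
combined <- merge(seurat_meta, anndata_meta[, new_cols, drop = FALSE],
                  by = 0, all.x = TRUE)

# merge(by = 0) stores the row names in a "Row.names" column; move them back
rownames(combined) <- combined$Row.names
combined$Row.names <- NULL

# replace missing-value markers column by column so numeric columns stay numeric
combined[] <- lapply(combined, function(x) {
  if (is.character(x)) x[x %in% c("NaN", "nan")] <- NA
  if (is.numeric(x)) x[is.nan(x)] <- NA
  x
})

str(combined)
```

Because lapply() works on one column at a time, each column keeps its own type, whereas apply() first converts the whole data frame to a character matrix.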
[8]:
# now just replace the current Seurat@meta.data
merged@meta.data = merged_data
[9]:
options(repr.plot.width = 6, repr.plot.height = 12, repr.plot.res = 200)
DimPlot(merged, group.by = c('bcr_QC_pass', 'isotype', 'status'), ncol = 1)
../_images/notebooks_5_dandelion_running_from_R_89_0.png

If you want to visualise the BCR network, you will have to subset to cells that contain BCR.

[10]:
merged_bcr = subset(merged, subset = bcr_QC_pass == 'True')
merged_bcr
An object of class Seurat
38224 features across 971 samples within 1 assay
Active assay: RNA (38224 features, 2000 variable features)
 2 dimensional reductions calculated: pca, umap
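The comparison against the string 'True' is needed here presumably because the python booleans ended up as character strings after the merge. The toy sketch below shows how such a column behaves in a comparison; bcr_QC_pass is a made-up vector, not the real metadata column.

```r
# Hypothetical character column as it may look after the python-to-R round trip
bcr_QC_pass <- c("True", "False", NA, "True")

# comparing with the python-style string gives an R logical; NA stays NA
bcr_QC_pass_lgl <- bcr_QC_pass == "True"

# which() skips NA, so cells without a BCR call are not selected
keep <- which(bcr_QC_pass_lgl)
keep
```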

Plotting BCR network

[11]:
X_bcr = adata$obsm['X_bcr']
X_bcr <- apply(X_bcr, 2, function(x) gsub('NaN', NA, x)) # convert python NAs to R NAs
X_bcr <- apply(X_bcr, 2, function(x) as.numeric(x)) # Make sure they are actually numbers
X_bcr <- as.matrix(X_bcr)
row.names(X_bcr) <- row.names(adata$obs) # will not work if the anndata .obs slot is malformed. the row.names for adata$obs and row.names(merged_bcr@meta.data) should be identical anyway, so just replace with row.names(merged_bcr@meta.data)
X_bcr <- X_bcr[!is.na(X_bcr[,1]), ]
colnames(X_bcr) <- c('BCR_1', 'BCR_2')
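The filtering logic above can be checked on a small matrix. The sketch below assumes the coordinates arrive as a numeric matrix in which NaN marks cells without a BCR; in that case is.na() already catches them and no string replacement is needed. The matrix X, the barcodes and seurat_cells are all hypothetical.

```r
# Toy stand-in for the X_bcr coordinates: rows are cells, NaN marks cells without BCR
X <- matrix(c(0.5, NaN, -1.2,
              2.0, NaN,  0.3), ncol = 2)
rownames(X) <- c("cell_1", "cell_2", "cell_3")  # hypothetical barcodes

# is.na() is TRUE for NaN as well as NA; drop = FALSE keeps the matrix shape
X_clean <- X[!is.na(X[, 1]), , drop = FALSE]
colnames(X_clean) <- c("BCR_1", "BCR_2")

# the embedding rows should all be cells of the subsetted Seurat object
seurat_cells <- c("cell_1", "cell_3")  # hypothetical colnames(merged_bcr)
stopifnot(all(rownames(X_clean) %in% seurat_cells))

dim(X_clean)
```

Checking the row names before building the reduction makes a mismatch between the embedding and the Seurat cells fail early, rather than inside the plotting call.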
[12]:
merged_bcr[["bcr"]] <- CreateDimReducObject(embeddings = X_bcr, key = "BCR_", assay = DefaultAssay(merged_bcr))
[13]:
options(repr.plot.width = 6, repr.plot.height = 5, repr.plot.res = 200)
DimPlot(merged_bcr, reduction = 'bcr')
../_images/notebooks_5_dandelion_running_from_R_95_0.png
[14]:
DimPlot(merged_bcr, reduction = 'bcr', group.by = 'isotype')
../_images/notebooks_5_dandelion_running_from_R_96_0.png

This concludes a quick primer on how to use dandelion from R. It may be a bit buggy from time to time because of how reticulate works, but hopefully it will work well enough overall.

The rest of the functions could potentially be run from R (just by changing . to $, for example), but I haven’t tested them. It might be easier to run them through python, or maybe with the new RStudio 1.4?

[15]:
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Seurat_3.9.9.9010 reticulate_1.18

loaded via a namespace (and not attached):
  [1] nlme_3.1-148          matrixStats_0.57.0    RcppAnnoy_0.0.17
  [4] RColorBrewer_1.1-2    httr_1.4.2            repr_1.1.0
  [7] sctransform_0.3.1     tools_4.0.2           R6_2.5.0
 [10] irlba_2.3.3           rpart_4.1-15          KernSmooth_2.23-17
 [13] uwot_0.1.9            mgcv_1.8-31           lazyeval_0.2.2
 [16] colorspace_2.0-0      tidyselect_1.1.0      gridExtra_2.3
 [19] compiler_4.0.2        plotly_4.9.2.1        labeling_0.4.2
 [22] scales_1.1.1          lmtest_0.9-38         spatstat.data_1.5-2
 [25] ggridges_0.5.2        pbapply_1.4-3         goftest_1.2-2
 [28] spatstat_1.64-1       pbdZMQ_0.3-4          stringr_1.4.0
 [31] digest_0.6.27         spatstat.utils_1.17-0 base64enc_0.1-3
 [34] pkgconfig_2.0.3       htmltools_0.5.0       parallelly_1.21.0
 [37] fastmap_1.0.1         htmlwidgets_1.5.2     rlang_0.4.9
 [40] shiny_1.5.0           farver_2.0.3          generics_0.1.0
 [43] zoo_1.8-8             jsonlite_1.7.1        ica_1.0-2
 [46] dplyr_1.0.2           magrittr_2.0.1        patchwork_0.0.1.9000
 [49] Matrix_1.2-18         Rcpp_1.0.5            IRkernel_1.1.1
 [52] munsell_0.5.0         abind_1.4-5           lifecycle_0.2.0
 [55] stringi_1.5.3         MASS_7.3-51.6         Rtsne_0.15
 [58] plyr_1.8.6            grid_4.0.2            parallel_4.0.2
 [61] listenv_0.8.0         promises_1.1.1        ggrepel_0.8.2
 [64] crayon_1.3.4          deldir_0.2-3          miniUI_0.1.1.1
 [67] lattice_0.20-41       IRdisplay_0.7.0       cowplot_1.1.0
 [70] splines_4.0.2         tensor_1.5            pillar_1.4.7
 [73] igraph_1.2.6          uuid_0.1-4            future.apply_1.6.0
 [76] reshape2_1.4.4        codetools_0.2-16      leiden_0.3.6
 [79] glue_1.4.2            evaluate_0.14         data.table_1.13.2
 [82] vctrs_0.3.5           png_0.1-7             httpuv_1.5.4
 [85] polyclip_1.10-0       gtable_0.3.0          RANN_2.6.1
 [88] purrr_0.3.4           tidyr_1.1.2           future_1.20.1
 [91] ggplot2_3.3.2         rsvd_1.0.3            mime_0.9
 [94] xtable_1.8-4          later_1.1.0.1         survival_3.1-12
 [97] viridisLite_0.3.0     tibble_3.0.4          cluster_2.1.0
[100] globals_0.14.0        fitdistrplus_1.1-3    ellipsis_0.3.1
[103] ROCR_1.0-11