Detect CRISPR-Cas9 editing effects

Configuration

gRNA_region_coordinates_ori.txt

  • Column 1: chromosome (same name as in the BAM file)

  • Column 2: start (1-based)

  • Column 3: end (1-based)

  • Column 4: name (unique), e.g. Region2, Region3 …

  • Column 5: strand (+/-) that the sgRNA targets

  • Column 6: designed protospacer sequence. By default, the PAM is included immediately after the protospacer sequence.

  • Column 7: target gene (gene symbol)
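As an illustration of the column layout above, here is a minimal sketch of a parser for this file. It assumes the file is tab-delimited (the delimiter is not stated here), and the names GuideRegion and parse_regions, as well as the example row and gene symbol GENE1, are hypothetical and not part of the script.

```python
import csv
import io
from typing import NamedTuple


class GuideRegion(NamedTuple):
    chrom: str        # column 1: chromosome name, as in the BAM file
    start: int        # column 2: 1-based start
    end: int          # column 3: 1-based end
    name: str         # column 4: unique region name
    strand: str       # column 5: "+" or "-"
    protospacer: str  # column 6: protospacer, followed by the PAM by default
    gene: str         # column 7: gene symbol


def parse_regions(handle):
    """Parse the region file; assumes tab-delimited text (an assumption)."""
    regions = []
    seen_names = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 7:
            raise ValueError(f"expected 7 columns, got {len(row)}: {row}")
        chrom, start, end, name, strand, seq, gene = row
        if strand not in ("+", "-"):
            raise ValueError(f"{name}: strand must be '+' or '-'")
        if name in seen_names:
            raise ValueError(f"duplicate region name: {name}")
        seen_names.add(name)
        regions.append(
            GuideRegion(chrom, int(start), int(end), name, strand, seq.upper(), gene)
        )
    return regions


# Hypothetical example row: 20 nt protospacer followed by a 3 nt PAM.
example = "chr1\t1000\t1022\tRegion2\t+\tACGTACGTACGTACGTACGTAGG\tGENE1\n"
regions = parse_regions(io.StringIO(example))
```

Checking for unique names at parse time catches a duplicated Column 4 value before any downstream per-region files are written.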

By default, the cut site is defined as 3 nt to 4 nt upstream (5’) of the Protospacer Adjacent Motif (PAM), i.e. between the third and fourth nucleotides 5’ of the PAM.
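The cut-site rule can be turned into genomic coordinates with a little arithmetic. The sketch below assumes that the start and end columns cover the protospacer plus a 3 nt PAM, and that coordinates are given on the forward strand; neither is confirmed here, and the function name cut_site is hypothetical.

```python
def cut_site(start, end, strand, pam_len=3, offset=3):
    """Return (left, right): the 1-based positions flanking the cut.

    Assumes start..end spans protospacer + PAM on the forward strand
    (an assumption, not confirmed by the script).
    """
    if strand == "+":
        # PAM occupies the last pam_len bases; the cut lies `offset`
        # bases 5' of it, i.e. towards lower coordinates.
        right = end - pam_len - offset + 1
        return right - 1, right
    if strand == "-":
        # On the minus strand the PAM sits at the lowest coordinates,
        # so 5' of the PAM means towards higher coordinates.
        left = start + pam_len + offset - 1
        return left, left + 1
    raise ValueError("strand must be '+' or '-'")


plus_cut = cut_site(1000, 1022, "+")   # PAM at 1020-1022
minus_cut = cut_site(1000, 1022, "-")  # PAM at 1000-1002
```

For the plus-strand example the cut falls between positions 1016 and 1017, which are bases 17 and 18 of the 20 nt protospacer.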

Requirements

  • 2bit genome downloaded from UCSC

  • umi_tools

Simplified process

The script first generates window coordinates from the input window size and the reference sequence. Samtools then uses these coordinates to split the alignment file (BAM) into per-region BAM files containing the reads aligned to each window. featureCounts (part of the Subread package, used here alongside UMI-tools) filters out reads that do not align properly to the target gene. Consensus sequences are generated by taking the union of reads within the detection window that share the same UMI and cell barcode. The reference sequence for each region is compared with the consensus sequence to identify mutations. Deletions that cross the cut site are defined as editing effects.
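Two of these steps, building a window around the cut site and testing whether a deletion reaches it, can be sketched as follows. This is not the script's own code: it assumes the window is a fixed number of bases on each side of the cut, reads deletions directly from a CIGAR string rather than from a consensus sequence, and counts a deletion as an edit when it removes at least one of the two bases flanking the cut. All function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def window_region(chrom, cut, flank):
    """Region string (chr:start-end, 1-based) with `flank` bases per side.

    Centring the window on the cut site is an assumption.
    """
    left, right = cut
    start = max(1, left - flank + 1)
    end = right + flank - 1
    return f"{chrom}:{start}-{end}"


def deletions(pos, cigar):
    """Yield (first, last) 1-based reference positions of each deletion."""
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "D":
            yield ref, ref + n - 1
        if op in "MDN=X":  # operations that consume the reference
            ref += n


def crosses_cut_site(deletion, cut):
    """True if the deletion removes a base flanking the cut (assumed rule)."""
    first, last = deletion
    left, right = cut
    return first <= right and last >= left


cut = (1016, 1017)
region = window_region("chr1", cut, 20)
edited = [d for d in deletions(990, "25M5D30M") if crosses_cut_site(d, cut)]
unedited = [d for d in deletions(990, "10M2D40M") if crosses_cut_site(d, cut)]
```

The region string uses the chr:start-end form that samtools view accepts for indexed BAM files. A stricter rule would require the deletion to remove bases on both sides of the cut; which rule the script applies is not stated here.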

Output files

See section