Metadata-Version: 2.4
Name: naapam
Version: 0.1.13
Summary: Chip-based CRISPR analysis
Author-Email: ljw <ljw2017@sjtu.edu.cn>
License-Expression: MIT
License-File: LICENSE.md
Requires-Python: >=3.13
Requires-Dist: biopython>=1.86
Requires-Dist: pandas[feather]>=3.0.0
Requires-Dist: plotnine>=0.15.2
Requires-Dist: pysam>=0.23.3
Requires-Dist: pyyaml>=6.0.3
Requires-Dist: statsmodels>=0.14.6
Description-Content-Type: text/markdown

# Introduction

This package analyzes CRISPR editing data from chip synthesis. It is based on the following assumptions:
  - The synthesized plasmids can differ substantially from the designed plasmids. We classify the synthesized plasmids as functional or non-functional. Only functional plasmids can induce CRISPR editing.
  - The most common synthesized functional plasmids, rather than the designed plasmids, should be used as the editing reference.
  - If a treat read has more than one reference functional plasmid (based on barcode), we distribute the read count as follows.
    - Normalize the reference counts to get the prior distribution of the read count.
    - Divide the alignment scores by a temperature and apply softmax to calculate the conditional probability.
    - Combine the prior distribution and the conditional probability to get the posterior distribution of the read count across all references.
  - Both functional and non-functional plasmids are transferred to treat samples.
  - With the functional plasmids as reference, the mutants called in treat samples come either from edited functional plasmids or from non-functional plasmids.
  - The abundance of non-functional plasmids is similar in treat and control samples. Therefore, one may subtract the mutant frequency in control from that in treat. The remaining mutants are expected to be edited reference functional plasmids.
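
The read-count distribution described in the list above can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's actual code: the function name `distribute_read_count` and its parameters are hypothetical, and the prior and the conditional probability are assumed to be combined by multiplying them and renormalizing (Bayes' rule).

```python
import math


def distribute_read_count(read_count, ref_counts, align_scores, temperature=1.0):
    """Split one treat read count across its candidate references.

    Hypothetical helper for illustration; not part of the naapam API.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if len(ref_counts) != len(align_scores) or not ref_counts:
        raise ValueError("ref_counts and align_scores must be non-empty and equal in length")

    # Prior: reference counts normalized to sum to 1.
    total = sum(ref_counts)
    prior = [count / total for count in ref_counts]

    # Conditional probability: softmax of score / temperature.
    # Subtracting the maximum score keeps exp() numerically stable.
    max_score = max(align_scores)
    weights = [math.exp((score - max_score) / temperature) for score in align_scores]
    weight_sum = sum(weights)
    conditional = [weight / weight_sum for weight in weights]

    # Posterior (assumption): prior times conditional, renormalized.
    unnormalized = [p * c for p, c in zip(prior, conditional)]
    norm = sum(unnormalized)
    posterior = [value / norm for value in unnormalized]

    # Fractional read counts assigned to each reference.
    return [read_count * p for p in posterior]
```

With equal alignment scores the split follows the reference counts alone; a higher temperature flattens the influence of the alignment score, and a lower temperature sharpens it.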

Previous analysis pipelines either use the designed plasmids as reference or do not subtract non-functional plasmids from the total mutants. We combine both approaches: the synthesized functional plasmids serve as reference, and the non-functional plasmids are subtracted. We also treat the alignment score as an energy and apply an energy-based method to distribute the treat read count across multiple references. We use the bioconda package rearr, backed by an efficient and accurate chimeric alignment engine, to call mutants from treat reads. Rearr allows us to discriminate templated insertions.
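
The subtraction of non-functional plasmids mentioned above can be illustrated with a short sketch. It is not the package's implementation: the function name `subtract_control` and the dictionary layout (mutant label mapped to frequency) are hypothetical, and clipping negative differences to zero is an assumption made here for illustration.

```python
def subtract_control(treat_freq, control_freq):
    """Subtract control mutant frequencies from treat mutant frequencies.

    Hypothetical helper for illustration; not part of the naapam API.
    Both arguments map a mutant label to its frequency for one reference.
    """
    corrected = {}
    for mutant, freq in treat_freq.items():
        # Mutants absent from the control have no background to remove.
        background = control_freq.get(mutant, 0.0)
        # Assumption: a frequency cannot be negative, so clip at zero.
        corrected[mutant] = max(freq - background, 0.0)
    return corrected


treat = {"del_3bp": 0.30, "ins_1bp": 0.05}
control = {"del_3bp": 0.10, "ins_1bp": 0.08}
edited = subtract_control(treat, control)
```

In this made-up example, `del_3bp` keeps the part of its frequency that exceeds the control background, while `ins_1bp` is fully attributed to non-functional plasmids.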


# Install

```shell
$ pip install naapam
```

# TODO

- [ ] try modifying the score threshold
- [ ] do not filter bad refs; keep them and see whether bad refs lead to more mutations
- [ ] check non-designed ref
- [ ] check filtered ref
- [ ] check each sample with low frequency
- [ ] check alg file
- [ ] check all filter steps
- [ ] agg kim correct

- [ ] Document
  - Usage
  - Example
  - mermaid diagram

# Dependencies

- bowtie2
- gawk
