Metadata-Version: 2.4
Name: isolaser
Version: 0.0.2
Summary: Exonic Part centered allele-specific splicing analysis
Home-page: https://github.com/gxiaolab/isoLASER
Author: Giovanni Quinones Valdez
Author-email: giovas@ucla.edu
Project-URL: Bug Tracker, https://github.com/gxiaolab/isoLASER/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: multiprocess
Requires-Dist: numpy
Requires-Dist: scipy==1.10.1
Requires-Dist: pandas
Requires-Dist: pysam
Requires-Dist: pytabix
Requires-Dist: pyfaidx
Requires-Dist: biopython
Requires-Dist: HTSeq
Requires-Dist: networkx
Requires-Dist: scikit-learn
Dynamic: license-file

# **isoLASER**
[![](https://img.shields.io/badge/isoLASER-v0.0.0.1-blue)](https://test.pypi.org/project/isoLASER/)

[github](https://github.com/gxiaolab/isoLASER/)



## **About**

isoLASER performs gene-level variant calls, phasing, and splicing linkage analysis using third-generation RNA sequencing data.

## **Table of contents**
- [Requirements](#requirements)
- [Installation](#installation)
- [Preprocessing](#preprocessing)
  - [Transcript identification](#transcript-identification)
  - [Annotate bam file](#annotate-bam-file)
  - [Extract exonic parts from GTF](#extract-exonic-parts-from-gtf)
- [Run isoLASER](#run-isolaser)
- [Run isoLASER joint](#run-isolaser-joint)
- [Make a nigiri plot](#make-a-nigiri-plot)
- [Demo](#demo)
- [Output](#output)
- [Debug](#debug)


## **Requirements**

isoLASER is written in Python 3.8 and requires the following packages (with the tested versions):

- multiprocess
- numpy==1.24.4
- scipy==1.10.1        
- pandas==2.0.3
- pysam==0.16.0.1
- pytabix==0.1
- pyfaidx==0.5.8
- biopython==1.76
- vcfpy==1.0.3
- HTSeq==0.12.4
- networkx==3.1
- scikit-learn==1.3.2

isoLASER has been tested in the following operating systems:
- CentOS Linux 7

External software requirements:
- [GATK](https://gatk.broadinstitute.org/hc/en-us) 
- [samtools](http://www.htslib.org/)
- [minimap2](https://github.com/lh3/minimap2)
- [tabix](http://www.htslib.org/doc/tabix.html)

## **Installation**

You can clone this **GitHub** repository:
```
git clone git@github.com:gxiaolab/isoLASER.git 
cd isoLASER
python -m build
pip install .
```

It is recommended to install isoLASER in a virtual environment.

```
conda create -n isoLASER_env python=3.8
```

You can also download the **Singularity** container:

```
singularity pull library://giovas/collection/s6
singularity exec s5_latest.sif isoLASER
```

If successful, the program is ready to use. The installation incorporates console script entry points to directly call isoLASER:

```
isoLASER --help
```

Installation time varies depending on the number of dependencies that need to be installed. 
Assuming all library dependencies are installed already, the installation of isoLASER should only take a few seconds.  

## *Preprocessing* 

Long-read RNA sequencing is notorious for its high base-calling error rate. As such, it is important to clean and preprocess the data to discard false transcripts resulting from misalignment, bad consensus, truncation, and other technical artifacts.   



### Transcript identification

isoLASER needs a GTF file as input to annotate individual reads with their isoform membership. Ideally, this GTF file is built using a long-read annotation software such as Talon, Clair, Bambu, Espresso, or similar. 

Details of the Talon pipeline can be found on their GitHub repository.[TALON](https://github.com/mortazavilab/TALON).

- The pipeline consists of correcting alignments around splice junctions using `TranscriptClean`, and labeling the reads for internal priming using `talon_label_reads`. 
- The processed bam files are then ready to be annotated by first creating a database with `talon_initialize_database` and then annotating the individual reads with `talon`. 
- Finally, the transcripts are filtered with `talon_filter_transcripts` and a GTF file is constructed with the retained transcripts with the command `talon_create_GTF`.     


### Annotate bam file

From the annotation used in the previous step, use the GTF file to generate a transcriptome reference for alignment.   
This step serves to assign transcript ids to every read of the target bam file.

```
# generate transcriptome reference from GTF
isolaser_convert_gtf_to_fasta -g {annot.gtf} -f {reference.fa} -o {transcriptome}

# convert bam file to fastq for re-alignment
samtools fastq {input.bam} > {input.fq}

# re-align against newly generated transcriptome reference
minimap2 -t 16 -ax splice:hq -uf --MD {transcriptome.fa} {input.fq} > {transcriptome.sam}
```

The output is a sam file where the contigs are transcript ids. 
Next, filter for secondary, supplementary and trans-gene reads whilst annotating with transcript ids. 

```
isolaser_filter_and_annotate -b {input.bam} -t {transcriptome.sam} -g {annot.gtf} -o {input.annot}
```
The output is a bam file: `input.annot.bam` after some basic filtering (secondary and supplemental reads), trans-gene filtering and the `ZG` and `ZT` tags with the name of the corresponding gene and transcript id for each read.  

### Extract exonic parts from GTF

isoLASER uses an exon-centric approach to analyze splicing and exonic-parts are a great granular approach to understand local splicing changes. 

```
isolaser_extract_exonic_parts -g {annot.gtf} -o {transcript.db}
```

The output is a new directory `transcript.db` that contains a pickle file per gene encapsulating all the exonic parts and transcripts associated with them.   

## *Run isoLASER*

isoLASER requires the annotated bam file (with ZG and ZT tags), the transcriptome database with exonic parts, and a reference annotation (e.g. hg38.fa).  

```
isoLASER -b {input.annot.bam} -o {output.prefix} -t {transcript.db} -f {reference.fa}

# output:
{output.prefix}.vcf
{output.prefix}.mi_summary.tsv

```

The output is very extensive and includes information that is only relevant for the joint analysis or plotting. 
To obtain the significant allele-specific events (cis-directed splicing events) use the filter function: 

```
isolaser_filter -m {output.prefix.mi_summary.tsv} -o {output.prefix.mi_summary.filtered.tsv}
```

## *Run isoLASER joint*

The first step is a wrapper of GATK functions to merge the variant calls from different samples.

The input file `fofn` contains informaiton of the individual samples
```
# fofn.tsv
SM1 SM1.bam SM1.mi_summary.tsv
SM2 SM2.bam SM2.mi_summary.tsv
```
Run the combine vcf step:
```
isoLASER_combine_vcf -f {fofn.tsv} -o {output.prefix}

# output:
{output.prefix}.combined.vcf
{output.prefix}.genotyped.vcf
```
Perform joint analysis
```
isoLASER_joint -f {fofn.tsv} -o {output.prefix} -t {transcript.db}

# output:
{output.prefix}.merged.mi_summary.tsv
```

## *Make a nigiri plot*

Parse the `.mi_summary.tsv` file to obtain the list of events to plot

```
nigiri_parse --mi {output.prefix.mi_summary.tsv} -o {output.plot} -t {transcriptome.db} 

# output:
{output.plot}.cis_events.bed
{output.plot}.cis_genes.tsv
```
Split the bam file

```
nigiri_split -b {input.annot.bam} -v {var.string} -o {fofn} 
```
We use an adapted version of [ggsashimi](https://github.com/guigolab/ggsashimi)
```
nigiri_plot -b {fofn} -c {region} -o {output.plot} 
```

<img src="nigiri.FAM221A.png" alt="FAM221A" width="600" heigth="600" class="center" />

## **Demo**

For a complete demo please check the [test_pipeline](test_pipeline) repository.

## **Debug**
If you experience any issues please submit your question to the *Issues* tab on this website. 



