Metadata-Version: 2.4
Name: dynamont
Version: 0.7.2
Summary: Segmentation/resquiggling tool for ONT signals.
Project-URL: repository, https://github.com/rnajena/dynamont
Author-email: Jannes Spangenberg <jannes.spangenberg@uni-jena.de>
License: GNUv3
License-File: LICENSE
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Requires-Dist: matplotlib>=3.2.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: pandas>=2.2.0
Requires-Dist: psutil>=6.0.0
Requires-Dist: pysam>=0.22.0
Requires-Dist: read5-ont>=1.2.8
Requires-Dist: seaborn>=0.13.0
Provides-Extra: test
Requires-Dist: mypy; extra == 'test'
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-cov; extra == 'test'
Requires-Dist: pytest-mock; extra == 'test'
Description-Content-Type: text/markdown

![Dynamont](figures/logo.png)

A **Dynam**ic Programming Approach to Segment **ONT** Signals. 
Dynamont is a segmentation/resquiggling tool for ONT signals.
Dynamont was tested on
* RNA002
* RNA004
* DNA R10.4.1 5kHz (the DNA R10 models currently use the transition parameters trained for the RNA004 model; these parameters should still be fine-tuned for DNA)

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/dynamont)
[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-teal.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![PyPI](https://img.shields.io/pypi/v/dynamont) ![PyPI - Downloads](https://img.shields.io/pypi/dm/dynamont)](https://pypi.org/project/dynamont/)
[![Anaconda-Server Badge](https://anaconda.org/jannessp/dynamont/badges/version.svg)](https://anaconda.org/jannessp/dynamont) ![Conda](https://img.shields.io/conda/dn/jannessp/dynamont) [![Conda package](https://anaconda.org/jannessp/dynamont/badges/latest_release_date.svg)](https://anaconda.org/jannessp/dynamont) [![Conda package](https://anaconda.org/jannessp/dynamont/badges/platforms.svg)](https://anaconda.org/jannessp/dynamont)

[![DOI](https://zenodo.org/badge/608215683.svg)](https://zenodo.org/badge/latestdoi/608215683)

---

- [Installation](#installation)
  - [PyPI/pip](#pypipip)
  - [Conda](#conda)
- [Usage](#usage)
- [Default models](#default-models)
- [Output](#output)
  - [Example Output](#example-output)
- [Exit-Codes](#exit-codes)

---

# Installation

## PyPI/pip

```bash
pip install dynamont
```

## Conda

```bash
conda config --add channels jannessp # to install all dependencies from the correct channel
conda create -n dynamont jannessp::dynamont
conda activate dynamont
```

# Usage

```bash
# segment a dataset
dynamont-resquiggle -r <path/to/pod5/dataset/> -b <basecalls.bam> --mode basic -o <output.csv> -p <pore>

# train model
dynamont-train -r <path/to/pod5/dataset/> -b <basecalls.bam> --mode basic -o <output/path> -p <pore>

# Choosing a pore automatically loads the default model for that pore.
# To use a custom model instead, pass --pore_model <model/path>.
```
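When Dynamont is called from a Python pipeline, the command line above can be assembled programmatically. The following is a minimal sketch, not part of Dynamont itself: the helper `build_resquiggle_cmd`, the file paths, and the pore name `rna_rp4` (taken from the default model list, and assumed here to be a valid value for `-p`) are illustrative assumptions. Only the flags shown above are used.

```python
import shlex

def build_resquiggle_cmd(raw_dir, bam, out_csv, pore, mode="basic", pore_model=None):
    """Assemble a dynamont-resquiggle argument list from the flags shown above.

    Hypothetical helper for illustration; it only builds the command.
    """
    cmd = [
        "dynamont-resquiggle",
        "-r", str(raw_dir),   # path to the pod5 dataset
        "-b", str(bam),       # basecalls in BAM format
        "--mode", mode,
        "-o", str(out_csv),   # output CSV
        "-p", pore,           # pore type, selects the default model
    ]
    if pore_model is not None:
        # Use a custom model instead of the default one for the pore.
        cmd += ["--pore_model", str(pore_model)]
    return cmd

# Hypothetical paths and pore name, for illustration only.
cmd = build_resquiggle_cmd("data/pod5", "basecalls.bam", "segments.csv", "rna_rp4")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)`. Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.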

# Default models

- [rna_r9](models/rna/r9.4.1/rna002_5mer.model) (tested)
- [rna_rp4](models/rna/rp4/rna004_9mer.model) (tested)
- dna_r9 not available
- [dna_r10.4.1 260 bps](models/dna/r10.4.1/dna_r10.4.1_e8.2_260bps.model) (not tested)
- [dna_r10.4.1 400 bps](models/dna/r10.4.1/dna_r10.4.1_e8.2_400bps.model) (tested)

# Output

Dynamont produces a tabular output with the following columns:  

| Column Name             | Description |
|-------------------------|-------------|
| **readid**             | Unique identifier for the read. |
| **signalid**           | Identifier for the signal corresponding to the read. |
| **start**              | Start position of the signal segment in the read. |
| **end**                | End position of the signal segment in the read. |
| **basepos**            | Reference base position in the genomic sequence. |
| **base**               | The detected base at this position. |
| **motif**              | The surrounding sequence motif in which the base appears. |
| **state**              | The methylation state (or modification state) of the base. |
| **posterior_probability** | Probability assigned to the predicted segment. |
| **polish**             | Polished kmer, only available in resquiggle mode. |

## Example Output  

Below is an example of the output generated by Dynamont:  

```csv
readid,signalid,start,end,basepos,base,motif,state,posterior_probability,polish
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12762,12777,53,A,AAAAAAAAA,M,0.12434,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12777,12791,52,A,AAAAAAAAA,M,0.12146,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12791,12806,51,A,AAAAAAAAA,M,0.11881,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12806,12820,50,A,AAAAAAAAA,M,0.11665,NA
```
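The output can be read with any CSV parser. The sketch below uses only the Python standard library and the example rows shown above. It rests on two assumptions that this README does not state: that `start` and `end` are sample indices with `end` exclusive (the example segments are contiguous, which is consistent with this), and that a literal `NA` marks a missing `polish` value. The helper name `load_segments` is hypothetical.

```python
import csv
import io

# Example rows copied from the output above.
# In practice, open the file that was passed to -o instead.
EXAMPLE = """\
readid,signalid,start,end,basepos,base,motif,state,posterior_probability,polish
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12762,12777,53,A,AAAAAAAAA,M,0.12434,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12777,12791,52,A,AAAAAAAAA,M,0.12146,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12791,12806,51,A,AAAAAAAAA,M,0.11881,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12806,12820,50,A,AAAAAAAAA,M,0.11665,NA
"""

def load_segments(handle):
    """Parse Dynamont CSV output into a list of dicts with typed numeric fields."""
    segments = []
    for row in csv.DictReader(handle):
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        row["basepos"] = int(row["basepos"])
        row["posterior_probability"] = float(row["posterior_probability"])
        # Assumption: a literal "NA" marks a missing polished kmer.
        row["polish"] = None if row["polish"] == "NA" else row["polish"]
        # Assumption: end is exclusive, so the segment spans end - start samples.
        row["length"] = row["end"] - row["start"]
        segments.append(row)
    return segments

segments = load_segments(io.StringIO(EXAMPLE))
for seg in segments:
    print(seg["basepos"], seg["base"], seg["length"], seg["posterior_probability"])
```

For a real run, replace `io.StringIO(EXAMPLE)` with `open("<output.csv>", newline="")`.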

# Exit-Codes

- -11: Segmentation fault
- -9: Out of Memory error. Decrease the number of processes or move to a system with more memory.
- -6: std::bad_alloc
- 1: `resquiggle mode` specific: alignment score (Z) does not match between forward and backward run in preprocessing on signal (T) and read (N).
- 2: `resquiggle mode` specific: alignment score (Z) does not match between forward and backward run in preprocessing on signal (T) and error correction (C).
- 3: Alignment score (Z) does not match between forward and backward pass or is -Infinity
- 4: Input signal is missing or not found in stdin stream
- 5: Input read is missing or not found in stdin stream
- 6: Raw file does not exist
- 7: Invalid model path was provided
- 8: Provided ONT signal is too short
- 9: Read is too short
- 10: Signal is smaller than read
- 11: Read is smaller than `kmerSize` of provided pore model
- 20: Terminated using KeyboardInterrupt (Ctrl + C)
