Metadata-Version: 2.4
Name: andro
Version: 0.3
Summary: Reference-guided CLI tool for finding and annotating human rDNA units in FASTA sequences
Project-URL: Homepage, https://github.com/dmelkovic/andro
Project-URL: Issues, https://github.com/dmelkovic/andro/issues
Author: D. Melkovic
License-Expression: MIT
License-File: LICENSE
Keywords: BED,FASTA,annotation,bioinformatics,rDNA
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Requires-Dist: biopython<2,>=1.78
Requires-Dist: mappy<3,>=2.28
Requires-Dist: matplotlib<4,>=3.7
Requires-Dist: numpy<3,>=1.24
Requires-Dist: pyahocorasick<3,>=2
Requires-Dist: scipy<2,>=1.10
Description-Content-Type: text/markdown

# andro

`andro` is a reference-guided command-line tool for finding and annotating
human ribosomal DNA (rDNA) units in FASTA sequences. It is built around the
`KY962518-ROT` reference, reports annotations in BED format, and can optionally
generate dotplots for detected units.

The tool was developed as part of a bachelor's thesis project and is intended
for exploratory analysis of assemblies that contain human rDNA.

## Installation

Install the latest release from PyPI:

```bash
pip install andro
```

Or install from a local clone:

```bash
git clone https://github.com/dmelkovic/andro.git
cd andro
pip install .
```

## Basic usage

Run `andro` with a FASTA file:

```bash
andro example.fa
```

By default, results are written to standard output in BED format. To write the
annotations to a file:

```bash
andro example.fa -o annotations.bed
```
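
For downstream work, the BED output can be loaded with a few lines of Python.
The sketch below assumes only the three columns that the BED format requires
(sequence name, 0-based start, exclusive end) and keeps any further columns as
uninterpreted strings. The `read_bed` helper and the sample lines are
hypothetical and not part of `andro`; real output may carry different columns.

```python
from typing import Iterable, NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    extra: tuple[str, ...]  # any further columns, left uninterpreted


def read_bed(lines: Iterable[str]) -> list[BedInterval]:
    """Parse BED lines, skipping blanks, comments, and track/browser headers."""
    intervals = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(("#", "track ", "browser ")):
            continue
        fields = line.split("\t")
        intervals.append(
            BedInterval(fields[0], int(fields[1]), int(fields[2]), tuple(fields[3:]))
        )
    return intervals


# Made-up example lines, used only to show the call pattern.
sample = [
    "chr_example\t100\t13450\tfeature_a\n",
    "chr_example\t13450\t45000\tfeature_b\n",
]
for iv in read_bed(sample):
    # Interval length is end - start because BED intervals are half-open.
    print(iv.chrom, iv.start, iv.end, iv.end - iv.start)
```

To read a file written with `-o`, pass an open file handle instead of the
sample list, for example `read_bed(open("annotations.bed"))`.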

To generate a dotplot for each reported unit:

```bash
andro example.fa --plot ref --dir plots
```

Display all available options with:

```bash
andro --help
```

## What andro reports

Given a FASTA file with one or more records, `andro` will:

- find 5.8S rDNA candidates and extend them to full 45S regions when possible
- find rDNA units anchored by detected 45S regions
- extend detected 45S regions to full rDNA units when the surrounding sequence
  supports it
- annotate major rDNA features
- write annotations in BED format
- optionally generate dotplots for each reported unit

## Design choices and limitations

### Forward orientation

`andro` searches for rDNA units in the forward orientation relative to the
`KY962518-ROT` reference. This keeps annotation coordinates consistent with the
ordered 45S and IGS model used internally.

If an assembly contains rDNA arrays in the reverse-complemented orientation,
run `andro` separately on a reverse-complemented copy of that FASTA record.
Coordinates reported for that run refer to the reverse-complemented sequence,
not to the original record.
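
A reverse-complemented copy can be prepared, and reported coordinates mapped
back to the original record, along the following lines. This is a minimal
sketch in plain Python: the helper names `reverse_complement` and
`to_original_coords` and the toy sequence are hypothetical and not part of
`andro`, and the coordinate conversion assumes 0-based, half-open BED
intervals.

```python
# Complement table covering upper- and lower-case IUPAC nucleotide codes.
# Codes that are their own complement (S, W) are left unchanged.
_COMPLEMENT = str.maketrans(
    "ACGTURYKMBDHVNacgturykmbdhvn",
    "TGCAAYRMKVHDBNtgcaayrmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_original_coords(start: int, end: int, seq_len: int) -> tuple[int, int]:
    """Map a half-open interval on the reverse-complemented sequence
    back to coordinates on the original record of length seq_len."""
    return seq_len - end, seq_len - start


seq = "AACCGT"  # toy sequence, for illustration only
rc = reverse_complement(seq)
print(rc)  # ACGGTT
print(to_original_coords(0, 2, len(seq)))  # (4, 6)
```

Biopython, which `andro` already depends on, provides the same operation
through `Seq.reverse_complement()` and may be more convenient when the records
are read with `Bio.SeqIO`.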

### Complete 45S regions by default

By default, `andro` reports only units where a complete 45S region is found. If
a substantial part of the 45S region is missing, the sequence is not reported
as an rDNA unit in the default mode.

The `--partial` option enables reporting of incomplete units. Partial-unit
annotation is experimental: regions shorter than approximately 2500 bp are not
reported, and incomplete annotations should be reviewed manually before being
used downstream.

## Reference

`andro` includes the `KY962518-ROT` reference sequence used for detection and
annotation. Results should be interpreted relative to that reference and the
feature model encoded in the package.

## License

`andro` is distributed under the MIT License. See `LICENSE` for details.

## Issues

Please report bugs and unexpected results at:

https://github.com/dmelkovic/andro/issues
