Metadata-Version: 2.1
Name: gffkit
Version: 0.1.0
Summary: Region-aware GFF annotation integration toolkit
Author: Qunjie Zhang
License: MIT
Project-URL: Homepage, https://github.com/qunjie-zhang/gffkit
Project-URL: Repository, https://github.com/qunjie-zhang/gffkit
Project-URL: Issues, https://github.com/qunjie-zhang/gffkit/issues
Keywords: GFF3,GTF,genome annotation,bioinformatics,UTR,gene annotation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE

# gffkit

`gffkit` is a lightweight toolkit for region-aware GFF/GTF annotation integration.
It combines three utilities, which the `integrate` command chains into a single pipeline:

1. `detect-bridge`: detect suspicious merged-gene artifacts caused by bridge transcripts.
2. `complement`: merge a reference annotation with a supplementary one, with an optional region-swap mode.
3. `add-utr`: reconstruct `five_prime_UTR` and `three_prime_UTR` features from exon/CDS coordinates.

## Installation

```bash
pip install gffkit
```


## Quick start

### Full integration pipeline

```bash
gffkit integrate \
  --annotation-a EviAnn.gff3 \
  --annotation-b ANNEVO.gff3 \
  --outdir gffkit_out \
  --prefix sample
```

Outputs:

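- `gffkit_out/sample.suspicious.tsv`: suspicious merged-gene regions detected in Annotation A
- `gffkit_out/sample.merged.gff3`: merged annotation after the region swap
- `gffkit_out/sample.final.withUTR.gff3`: merged annotation with UTR features added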

### Step-by-step usage

```bash
# 1. Detect suspicious merged genes in Annotation A
gffkit detect-bridge -i EviAnn.gff3 -o suspicious.tsv

# 2. Use A as the global reference, but switch to B in suspicious regions
gffkit complement \
  --ref EviAnn.gff3 \
  --add ANNEVO.gff3 \
  --swap_region_tsv suspicious.tsv \
  --swap_region_flank 100 \
  --output merged.gff3

# 3. Add UTR features
gffkit add-utr -i merged.gff3 -o final.annotation.withUTR.gff3
```
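
The region-swap idea in step 2 can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not gffkit's actual implementation: it assumes genes and suspicious regions are plain `(seqid, start, end)` tuples, that "inside a region" means overlapping the region once it is widened by the flank, and it ignores any B genes that `complement` might add elsewhere. The names `in_swap_region` and `merge_with_swap` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical record type: (seqid, start, end), 1-based inclusive as in GFF3.
Region = Tuple[str, int, int]


def in_swap_region(gene: Region, regions: List[Region], flank: int = 100) -> bool:
    """True if the gene overlaps any suspicious region widened by `flank` bp."""
    seqid, start, end = gene
    return any(
        seqid == r_seqid and start <= r_end + flank and end >= r_start - flank
        for r_seqid, r_start, r_end in regions
    )


def merge_with_swap(
    genes_a: List[Region],
    genes_b: List[Region],
    regions: List[Region],
    flank: int = 100,
) -> List[Region]:
    """Keep A outside suspicious regions and B inside them (assumed behaviour)."""
    kept_a = [g for g in genes_a if not in_swap_region(g, regions, flank)]
    kept_b = [g for g in genes_b if in_swap_region(g, regions, flank)]
    return sorted(kept_a + kept_b)


# Toy data: one over-merged gene in A that B splits into two genes.
genes_a = [("chr1", 100, 500), ("chr1", 1000, 9000)]
genes_b = [("chr1", 1200, 4000), ("chr1", 5000, 8800), ("chr1", 20000, 21000)]
suspicious = [("chr1", 1000, 9000)]

merged = merge_with_swap(genes_a, genes_b, suspicious, flank=100)
print(merged)
```

In this toy case the over-merged A gene is dropped, the two B genes inside the flagged region take its place, and the B gene far outside the region is left out because A remains the reference there.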

## Command overview

```bash
gffkit --help
gffkit detect-bridge --help
gffkit complement --help
gffkit add-utr --help
gffkit integrate --help
```

## Annotation integration strategy

- Annotation A (for example, an RNA-seq-supported EviAnn GFF) serves as the global primary reference.
- Annotation B (for example, a deep-learning ANNEVO GFF) serves as the local primary reference, replacing A only in suspicious merged-gene regions.
- UTR features are reconstructed after merging using an exon-minus-CDS strategy: exon segments that lie outside the CDS become UTRs.
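
The exon-minus-CDS strategy in the last bullet can be sketched in a few lines of Python. This is a minimal illustration, assuming a single transcript given as lists of 1-based inclusive `(start, end)` intervals and a strand symbol. The function name `split_utrs` is hypothetical and does not reflect how `add-utr` is implemented internally.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coordinates, as in GFF3


def split_utrs(
    exons: List[Interval], cds: List[Interval], strand: str
) -> Tuple[List[Interval], List[Interval]]:
    """Return (five_prime_UTR, three_prime_UTR) intervals for one transcript."""
    if not cds:
        return [], []  # non-coding transcript: nothing to derive

    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)

    left, right = [], []  # exon pieces before / after the coding span
    for start, end in sorted(exons):
        if start < cds_start:
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            right.append((max(start, cds_end + 1), end))

    # On the minus strand the genomic left side is the 3' end.
    return (left, right) if strand == "+" else (right, left)


exons = [(100, 200), (300, 400), (500, 600)]
cds = [(150, 200), (300, 400), (500, 550)]

five_plus, three_plus = split_utrs(exons, cds, "+")
five_minus, three_minus = split_utrs(exons, cds, "-")
print(five_plus, three_plus)
print(five_minus, three_minus)
```

The only strand-specific step is the final swap: the subtraction itself is done in genomic coordinates, and the 5'/3' labels are assigned afterwards.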

## License

MIT License.
