Metadata-Version: 2.4
Name: gpathex
Version: 1.0.0
Summary: Genbinesia Pathway Extractor
Home-page: https://gitlab.com/biomikalab/gpathex
Author: Maulana Malik Nashrulloh
Author-email: maulana@genbinesia.or.id
Maintainer: Maulana Malik Nashrulloh
Maintainer-email: maulana@genbinesia.or.id
License: GPLv3
Project-URL: Homepage, https://gitlab.com/biomikalab/gpathex
Project-URL: Documentation, https://gitlab.com/biomikalab/gpathex/blob/main/README.md
Project-URL: Repository, https://gitlab.com/biomikalab/gpathex
Project-URL: Issues, https://gitlab.com/biomikalab/gpathex/issues
Project-URL: Changelog, https://gitlab.com/biomikalab/gpathex/releases
Project-URL: Download, https://pypi.org/project/gpathex/
Keywords: bioinformatics,kegg,genomics,proteomics,metabolomics,pathway,orthology,database,cli,polars,biopython,fasta,annotation,visualization
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Operating System :: POSIX :: Linux
Classifier: Natural Language :: English
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: polars>=1.37.1
Requires-Dist: biopython>=1.86
Provides-Extra: fast
Requires-Dist: polars-bio>=0.19.0; extra == "fast"
Provides-Extra: viz
Requires-Dist: matplotlib>=3.10.8; extra == "viz"
Requires-Dist: seaborn>=0.13.2; extra == "viz"
Provides-Extra: cli
Requires-Dist: rich>=14.2.0; extra == "cli"
Requires-Dist: tqdm>=4.67.1; extra == "cli"
Requires-Dist: colorama>=0.4.6; extra == "cli"
Requires-Dist: click>=8.3.1; extra == "cli"
Provides-Extra: formats
Requires-Dist: pyarrow>=21.0.0; extra == "formats"
Requires-Dist: openpyxl>=3.1.5; extra == "formats"
Requires-Dist: orjson>=3.11.5; extra == "formats"
Provides-Extra: network
Requires-Dist: aiohttp>=3.13.3; extra == "network"
Requires-Dist: httpx>=0.28.1; extra == "network"
Requires-Dist: backoff>=2.2.1; extra == "network"
Provides-Extra: dev
Requires-Dist: pytest>=9.0.2; extra == "dev"
Requires-Dist: pytest-cov>=7.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.15.1; extra == "dev"
Requires-Dist: pytest-xdist>=3.8.0; extra == "dev"
Requires-Dist: black>=26.1.0; extra == "dev"
Requires-Dist: mypy>=1.19.1; extra == "dev"
Requires-Dist: ruff>=0.14.14; extra == "dev"
Requires-Dist: pre-commit>=4.5.1; extra == "dev"
Requires-Dist: twine>=6.2.0; extra == "dev"
Requires-Dist: build>=1.4.0; extra == "dev"
Requires-Dist: sphinx>=7.0.0; extra == "dev"
Requires-Dist: sphinx-rtd-theme>=3.1.0; extra == "dev"
Requires-Dist: sphinx-autodoc-typehints>=3.6.2; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest>=9.0.2; extra == "test"
Requires-Dist: pytest-cov>=7.0.0; extra == "test"
Requires-Dist: pytest-mock>=3.15.1; extra == "test"
Requires-Dist: pytest-xdist>=3.8.0; extra == "test"
Provides-Extra: docs
Requires-Dist: sphinx>=9.1.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=3.1.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=3.6.2; extra == "docs"
Requires-Dist: sphinx-copybutton>=0.5.2; extra == "docs"
Requires-Dist: myst-parser>=5.0.0; extra == "docs"
Provides-Extra: all
Requires-Dist: polars-bio>=0.19.0; extra == "all"
Requires-Dist: matplotlib>=3.10.8; extra == "all"
Requires-Dist: seaborn>=0.13.2; extra == "all"
Requires-Dist: rich>=14.2.0; extra == "all"
Requires-Dist: tqdm>=4.67.1; extra == "all"
Requires-Dist: colorama>=0.4.6; extra == "all"
Requires-Dist: pyarrow>=21.0.0; extra == "all"
Requires-Dist: openpyxl>=3.1.5; extra == "all"
Requires-Dist: orjson>=3.11.5; extra == "all"
Requires-Dist: aiohttp>=3.13.3; extra == "all"
Requires-Dist: httpx>=0.28.1; extra == "all"
Requires-Dist: backoff>=2.2.1; extra == "all"
Provides-Extra: production
Requires-Dist: polars-bio>=0.19.0; extra == "production"
Requires-Dist: rich>=14.2.0; extra == "production"
Requires-Dist: pyarrow>=21.0.0; extra == "production"
Requires-Dist: backoff>=2.2.1; extra == "production"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: maintainer
Dynamic: maintainer-email
Dynamic: platform
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# GPATHEX (Genbinesia Pathway Extractor)

![Python Version](https://img.shields.io/badge/python-3.11+-blue.svg)
![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)
![Version](https://img.shields.io/badge/version-1.0.0-green.svg)

**GPATHEX** is a simple biological pathway and taxonomic information extractor from the KEGG Database.

# ⚠️ Derivative Work Notice

GPATHEX is a **derivative work** based on `KEGGTools` by Junpeng Fan.
This version includes substantial modifications and enhancements while maintaining MIT License compliance.
A copy of the original KEGGTools MIT License is also included.

**Original work:** Fan, J. (2018). KEGGTools. https://github.com/FlyPythons/KEGGTools

# Major Enhancements from Original
- ✅ Complete migration to Python 3.11, with support for Python 3.11 and later
- ✅ Support for large biological datasets (via Polars and Polars-bio) and efficient data storage in the Apache Parquet format (via Polars)
- ✅ Improved communication with NCBI Entrez through Biopython
- ✅ Improved processing of KEGG information that follows the current KEGG file formats

# Author
- Maulana Malik Nashrulloh (Division of Biomics Research, Department of Sciences, Generasi Biologi Indonesia Foundation)

# Quick Start

## Dependencies

Make sure that your system has Python >=3.11 and the following packages installed:

- biopython>=1.86
- polars-bio>=0.19.0
- polars>=1.37.1
- matplotlib>=3.10.8
- seaborn>=0.13.2
- rich>=14.2.0
- tqdm>=4.67.1
- colorama>=0.4.6
- pyarrow>=21.0.0 
- openpyxl>=3.1.5 
- orjson>=3.11.5
- aiohttp>=3.13.3
- httpx>=0.28.1
- backoff>=2.2.1

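The `gpathex info` command shows system information and dependencies. If you want to check your environment before installing or running GPATHEX, the following is a minimal, standard-library-only Python sketch that compares installed versions against the minimums listed above. The helper names (`check_requirements`, `at_least`, `numeric_version`) are hypothetical and not part of GPATHEX, and the comparison only looks at the numeric release part of each version (pre-release tags such as `rc1` are ignored).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list above.
REQUIREMENTS = {
    "biopython": "1.86", "polars-bio": "0.19.0", "polars": "1.37.1",
    "matplotlib": "3.10.8", "seaborn": "0.13.2", "rich": "14.2.0",
    "tqdm": "4.67.1", "colorama": "0.4.6", "pyarrow": "21.0.0",
    "openpyxl": "3.1.5", "orjson": "3.11.5", "aiohttp": "3.13.3",
    "httpx": "0.28.1", "backoff": "2.2.1",
}


def numeric_version(text: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '1.2.3rc1' -> (1, 2, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def at_least(installed: str, minimum: str) -> bool:
    """Compare release segments, padding with zeros so '2.0' equals '2.0.0'."""
    a, b = numeric_version(installed), numeric_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_requirements(requirements: dict[str, str]) -> dict[str, str]:
    """Map each package name to 'ok', 'missing' or 'too old: <version>'."""
    status = {}
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            status[name] = "missing"
        else:
            status[name] = "ok" if at_least(installed, minimum) else f"too old: {installed}"
    return status


if __name__ == "__main__":
    for name, state in check_requirements(REQUIREMENTS).items():
        print(f"{name:<12} {state}")
```

Running it as a script prints one status line per package.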
## Installation
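Currently, GPATHEX can only be installed with `pip`. A plain install pulls in only the required dependencies (`polars` and `biopython`); to also install all of the optional dependencies listed above, use the `all` extra. Narrower extras (`fast`, `viz`, `cli`, `formats`, `network`) are also available.

```bash
# Minimal installation
pip install gpathex

# With all optional dependencies
pip install "gpathex[all]"
```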

## Available commands
GPATHEX is run with the `gpathex` command. The following subcommands are currently available, each invoked as `gpathex <command>`:

- `download-org`: Download KEGG organism information
- `download-ko`: Download KEGG Orthology (KO) files
- `download-proteins`: Download protein sequences from NCBI
- `process-proteins`: Process proteins with KO annotations
- `make-db`: Create a custom KEGG database
- `make-keg`: Create a `.keg` file from annotations
- `plot-keg-kos`: Plot the KEGG KO hierarchy (ko00001.keg format)
- `plot-keg-genes`: Plot KEGG gene annotations (output of the `make-keg` command)
- `plot-keg`: Legacy command for plotting KEGG annotation results; use `plot-keg-kos` or `plot-keg-genes` instead
- `get-ranks`: Get KEGG organism classification ranks
- `config`: Show or modify the configuration
- `info`: Show system information and dependencies
- `download-taxonomy`: Download the NCBI taxonomy database

## Usage


Get the list of KEGG organisms:

```bash
gpathex download-org \
    --out /path/to/your/organisms.tsv
```
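
Before downloading KO files and proteins for every organism, it can help to see how the list breaks down. The following minimal Python sketch counts organisms per top-level lineage group. It assumes (unverified for GPATHEX) that `organisms.tsv` uses the same four tab-separated columns as KEGG's `list/organism` REST output: T number, organism code, species name, and a `;`-separated lineage. Check your file before relying on it; the function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_top_lineage(lines: Iterable[str]) -> Counter:
    """Count organisms per top-level lineage group (e.g. 'Eukaryotes').

    ASSUMPTION: rows look like KEGG's `list/organism` output:
    T number, organism code, species name, ';'-separated lineage.
    Rows with fewer than four columns are skipped; a four-column header
    row, if present, would be counted under its own label.
    """
    counts: Counter = Counter()
    # QUOTE_NONE: treat quote characters in species names literally.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 4:
            counts[row[3].split(";")[0]] += 1
    return counts


def summarize_organisms(path: str) -> list[tuple[str, int]]:
    """Open an organisms file and return (group, count) pairs, largest first."""
    with open(path, newline="", encoding="utf-8") as handle:
        return count_by_top_lineage(handle).most_common()
```

For example, `summarize_organisms("/path/to/your/organisms.tsv")` returns `(group, count)` pairs, largest group first.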

Download KO files:    
```bash
gpathex download-ko \
    --org /path/to/your/organisms.tsv \
    --out /path/to/your/ko_files_dir/
```

Get protein sequences: 
```bash
gpathex download-proteins \
    --org /path/to/your/organisms.tsv \
    --out /path/to/your/proteins_dir/
```

Get NCBI taxonomy lineages:
```bash
gpathex download-taxonomy \
    --out /path/to/your/ncbi_taxonomy.tsv
```

Create a database:
```bash
gpathex make-db \
    --org /path/to/your/organisms.tsv \
    --keg /path/to/your/ko_files_dir/ \
    --pep /path/to/your/proteins_dir/ \
    --out /path/to/your/my_kegg_db/
```
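
`download-org`, `download-ko`, `download-proteins` and `make-db` build on each other's outputs: the organism list feeds the next two commands, and `make-db` consumes all three. If you want to script this, the following is a minimal Python sketch that chains the commands with `subprocess`, using only the flags shown in this README. The helper names `build_pipeline` and `run_pipeline` and the directory layout under the base path are hypothetical; by default it performs a dry run that only prints the commands.

```python
import shlex
import subprocess


def build_pipeline(base: str) -> list[list[str]]:
    """Return the gpathex commands from the Usage section in dependency order.

    The file and directory names under `base` mirror the README examples
    and are placeholders; adjust them to your own layout.
    """
    org = f"{base}/organisms.tsv"
    kos = f"{base}/ko_files_dir/"
    pep = f"{base}/proteins_dir/"
    return [
        ["gpathex", "download-org", "--out", org],
        ["gpathex", "download-ko", "--org", org, "--out", kos],
        ["gpathex", "download-proteins", "--org", org, "--out", pep],
        ["gpathex", "make-db", "--org", org, "--keg", kos, "--pep", pep,
         "--out", f"{base}/my_kegg_db/"],
    ]


def run_pipeline(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print every command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failed step.
            subprocess.run(command, check=True)
```

For example, `run_pipeline(build_pipeline("/path/to/your"))` prints the four commands; passing `dry_run=False` runs them in order and stops at the first step that exits with an error.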

The resulting database at `/path/to/your/my_kegg_db/` is structured as follows:

```text
my_kegg_db/
├── my_kegg_db.pep.fasta.gz          # Compressed protein sequences 
├── my_kegg_db.sequences.parquet     # Sequence metadata
├── my_kegg_db.annotations.parquet   # KO annotations
├── my_kegg_db.pep2ko.tsv            # Protein-to-KO mapping
├── my_kegg_db.stats.json            # Statistics
└── my_kegg_db.summary.txt           # Human-readable summary
```
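
To sanity-check a freshly built database, the following minimal Python sketch (standard library only) counts the sequences in the compressed FASTA file, tallies proteins per KO in the `pep2ko.tsv` mapping, and lists the top-level keys of `stats.json`. The column layout of `pep2ko.tsv` (protein ID, then KO ID, tab-separated) and the file-name prefix are assumptions based on the tree above, not a documented format, and the helper names are hypothetical.

```python
import csv
import gzip
import json
import re
from collections import Counter
from pathlib import Path
from typing import Iterable

KO_ID = re.compile(r"K\d{5}")


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records: each record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


def proteins_per_ko(lines: Iterable[str]) -> Counter:
    """Count proteins per KO ID in a protein-to-KO mapping.

    ASSUMPTION: tab-separated, protein ID in column 1, KO ID in column 2.
    Rows whose second column is not a KO ID (e.g. a header) are skipped.
    """
    counts: Counter = Counter()
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2 and KO_ID.fullmatch(row[1].strip()):
            counts[row[1].strip()] += 1
    return counts


def summarize_db(db_dir: str) -> dict:
    """Gather quick numbers from a `make-db` output directory.

    ASSUMPTION: files are prefixed with the directory name, as in the
    example tree (my_kegg_db/my_kegg_db.*).
    """
    root = Path(db_dir)
    prefix = root / root.name
    with gzip.open(f"{prefix}.pep.fasta.gz", "rt", encoding="utf-8") as handle:
        n_sequences = count_fasta_records(handle)
    with open(f"{prefix}.pep2ko.tsv", newline="", encoding="utf-8") as handle:
        ko_counts = proteins_per_ko(handle)
    with open(f"{prefix}.stats.json", encoding="utf-8") as handle:
        stats = json.load(handle)
    return {
        "sequences": n_sequences,
        "distinct_kos": len(ko_counts),
        "top_kos": ko_counts.most_common(5),
        "stats_keys": sorted(stats) if isinstance(stats, dict) else None,
    }
```

For example, `summarize_db("/path/to/your/my_kegg_db/")`. The `.parquet` files can be loaded with `polars.read_parquet`; since their columns are not described here, inspecting the resulting DataFrame's `.schema` first is a sensible start.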

Create a `.keg` file:
```bash
gpathex make-keg \
    --keg /path/to/your/ko00001.keg \
    --in /path/to/your/annotations.tsv \
    --out /path/to/your/results.keg
```
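
Both `make-keg` and `plot-keg-kos` work with KEGG BRITE hierarchy files such as `ko00001.keg`. In this text format, the first character of each line gives its level in the hierarchy (A, B, C, D), and in `ko00001.keg` the D lines hold the individual KO entries. For a quick numeric look at such a file without plotting, here is a minimal Python sketch that counts distinct KO IDs per category. It is an independent illustration, not GPATHEX's own parser; the function names are hypothetical, it assumes the KO ID is the first token on each D line, and it has not been checked against the `.keg` files written by `make-keg`.

```python
from collections import defaultdict
from typing import Iterable


def kos_per_category(lines: Iterable[str], level: str = "B") -> dict[str, int]:
    """Count distinct KO IDs under each category at `level` ('A', 'B' or 'C').

    The first character of each line is its hierarchy level; other lines
    (for example '+', '#' and '!' header and separator lines) are ignored.
    ASSUMPTION: the KO ID is the first token of each 'D' line.
    """
    if level not in ("A", "B", "C"):
        raise ValueError("level must be 'A', 'B' or 'C'")
    current = None
    kos: defaultdict[str, set[str]] = defaultdict(set)
    for line in lines:
        tag, text = line[:1], line[1:].strip()
        if tag == level:
            current = text or None  # a bare level letter is only a spacer line
        elif tag in ("A", "B", "C") and tag < level:
            current = None  # entered a new higher-level section
        elif tag == "D" and current and text:
            kos[current].add(text.split()[0])
    return {name: len(ids) for name, ids in kos.items()}


def top_categories(path: str, level: str = "B", n: int = 10) -> list[tuple[str, int]]:
    """Return the n categories with the most distinct KOs in a .keg file."""
    with open(path, encoding="utf-8") as handle:
        counts = kos_per_category(handle, level)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]
```

For example, `top_categories("/path/to/your/ko00001.keg", level="A")` ranks the top-level categories by how many distinct KOs they contain. Counting distinct IDs (a set per category) avoids double-counting KOs that appear in several pathways under the same category.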

Plot KO hierarchy:    
```bash
gpathex plot-keg-kos \
    --keg /path/to/your/ko00001.keg \
    --out /path/to/your/kos_plot
```

By default, the plot is saved in PNG format.

Plot gene annotations: 
```bash
gpathex plot-keg-genes \
    --keg /path/to/your/results.keg \
    --out /path/to/your/genes_plot
```

By default, the plot is saved in PNG format.

Get organism ranks:   
```bash
gpathex get-ranks \
    --keg /path/to/your/br08610.keg \
    --taxon /path/to/your/ncbi_taxonomy.tsv \
    --out /path/to/your/ranks.tsv
```

## Help
To show the general help message, use:

```bash
gpathex -h
```

To show the help for a specific command, use:

```bash
gpathex <command> -h
```


# Acknowledgments
- This program is based on KEGGTools by Junpeng Fan (https://github.com/FlyPythons/KEGGTools)
- This program was made as part of the research mini-project "PyTax4Fun2: A Python Tool for Functional Profiling and Redundancy Analysis of Bacterial Communities via 16S rRNA Gene Sequences, Featuring Polars for Efficient Processing of Large Genomic Datasets" (Project #BIOMIKA-02), Subproject #BIOMIKA-02.1, funded internally by the Generasi Biologi Indonesia Foundation.

# Citation
A dedicated publication for this program is not yet available. For citation purposes, please refer to the following technical report:

Nashrulloh, M.M. (2026). *GPATHEX: A simple biological pathway and taxonomic information extractor from the KEGG Database* (Technical Report No. GBR-TR-BIOMIKA-03/Genbinesia/I/2026). Generasi Biologi Indonesia Foundation. Gresik, Indonesia.

If you wish to cite this repository, you may use the following APA-style reference entry:

Nashrulloh, M.M. (2026). GPATHEX: A simple biological pathway and taxonomic information extractor from the KEGG Database (Version 1.0.0) [Computer software]. https://gitlab.com/biomikalab/gpathex

# License
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
