Metadata-Version: 2.4
Name: gtdb-gtranslate
Version: 0.0.3
Summary: Software tool designed to accurately identify the genetic translation table (GTT) used in prokaryotic organisms.
Author: Donovan Parks
Author-email: Pierre-Alain Chaumeil <pch@bio.aau.dk>
License-Expression: GPL-3.0-only
Project-URL: repository, https://github.com/cmc-aau/gTranslate
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: joblib~=1.3
Requires-Dist: numpy~=1.26
Requires-Dist: pandas~=2.2
Requires-Dist: scikit-learn==1.6.1
Requires-Dist: scipy~=1.12
Requires-Dist: tqdm~=4.67
Requires-Dist: mlxtend~=0.23
Requires-Dist: plotly~=5.15
Requires-Dist: xgboost==2.1.4
Requires-Dist: lightgbm==4.6.0
Requires-Dist: requests>=2.32.3
Dynamic: license-file

# gTranslate

**gTranslate** is a machine learning-based command-line tool for predicting the translation table (TT) used by prokaryotic genomes. By analyzing specific sequence features, such as coding density differences, Trp ratios, and Gly ratios, `gTranslate` can accurately distinguish translation table 11 (the standard prokaryotic code, in which UGA is a stop codon) from the two codes that reassign the UGA stop codon: translation tables 4 (UGA=Trp) and 25 (UGA=Gly).

## Features

* **Automated Table Detection:** Rapidly predict the correct translation table for a single genome or large batches of genomes.
* **Interactive Visualizations:** Generate dynamic HTML dashboards to explore the feature space used by the classifiers.

## Installation

gTranslate requires **Python >= 3.12**, and **Prodigal >= 2.6.2** must be available on your system `PATH`.
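Before installing, it can help to confirm both prerequisites from a shell. The following is a minimal sketch; `check_prereqs` is a hypothetical helper name, not part of gTranslate. It only checks that `python3` reports a version of at least 3.12 and that a `prodigal` binary is on `PATH`. It does not parse Prodigal's version, which `prodigal -v` prints for a manual check.

```bash
# Hypothetical helper (not part of gTranslate): check prerequisites.
# Returns 0 if python3 >= 3.12 and a prodigal binary are both found, else 1.
check_prereqs() {
  local status=0

  # Python: let the interpreter compare its own version against 3.12.
  if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 12) else 1)' 2>/dev/null; then
    echo "OK: $(python3 --version 2>&1)"
  else
    echo "MISSING: python3 >= 3.12" >&2
    status=1
  fi

  # Prodigal: only checks that it is on PATH; the version (>= 2.6.2)
  # still has to be confirmed by running `prodigal -v`.
  if command -v prodigal >/dev/null 2>&1; then
    echo "OK: prodigal found at $(command -v prodigal); run 'prodigal -v' to confirm the version"
  else
    echo "MISSING: prodigal on PATH" >&2
    status=1
  fi

  return "$status"
}

check_prereqs || echo "Install the missing prerequisites before continuing." >&2
```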

### Option 1: Bioconda (Recommended)
We recommend using **Mamba** for a faster setup. Run `mamba create` in place of `conda create` in the first command below:
```bash
conda create -n gtranslate -c conda-forge -c bioconda gtdb-gtranslate
conda activate gtranslate
```

### Option 2: pip
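It is recommended to use a [virtual environment](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). Note that pip does not install Prodigal, which must be installed separately. Because the package is installed from TestPyPI, PyPI is added as an extra index so that dependencies such as NumPy and scikit-learn can be resolved:
```bash
# Install
python -m pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ gtdb-gtranslate

# Upgrade
python -m pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ --upgrade gtdb-gtranslate
```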

## Usage

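gTranslate's main functionality is provided by two subcommands, `detect_table` and `generate_plot`. Two further subcommands, `check_install` and `test`, are described in the Testing section below.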

You can view the general help menu at any time:
```bash
gtranslate -h
```

### Detect table
Runs the core pipeline for detecting the translation table used by each input genome. Input genomes must be provided either as a directory of FASTA files (`--genome_dir`) or as a batch file (`--batchfile`).

**Basic Usage:**
```bash
# Process a directory of genomic FASTA files
gtranslate detect_table --genome_dir /path/to/genomes --out_dir /path/to/output

# Process genomic FASTA files defined in a batch file
gtranslate detect_table --batchfile genomes.tsv --out_dir /path/to/output
```

The file provided to `--batchfile` is a two-column, tab-separated values (TSV) file. Each line gives the path to a genomic FASTA file and the identifier to assign to that genome, e.g.:
```
/example/path/GCF_001729785.1_ASM172978v1/GCF_001729785.1.fna.gz      G001729785
/a/different/path/GCF_043834535.1.fna.gz    G043834535
...
```
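For large collections, the batch file can be generated rather than written by hand. The sketch below makes two assumptions: genomes end in `.fna` or `.fna.gz`, and the file name without its extension is an acceptable identifier. Both `make_batchfile` and `check_batchfile` are hypothetical helpers, not gTranslate commands. The second helper flags any line that does not have exactly two tab-separated columns, which catches columns accidentally separated by spaces.

```bash
# Hypothetical helper: write "absolute_path<TAB>identifier" for every
# *.fna / *.fna.gz file in a directory. The identifier is the file name
# with .gz and .fna stripped; edit the file afterwards for custom IDs.
make_batchfile() {
  local genome_dir=$1 out_file=$2 f id
  : > "$out_file"                        # start with an empty batch file
  for f in "$genome_dir"/*.fna "$genome_dir"/*.fna.gz; do
    [ -e "$f" ] || continue              # skip patterns that matched nothing
    id=$(basename "$f")
    id=${id%.gz}
    id=${id%.fna}
    printf '%s\t%s\n' "$(cd "$(dirname "$f")" && pwd)/$(basename "$f")" "$id" >> "$out_file"
  done
}

# Hypothetical helper: report lines without exactly two tab-separated
# columns; exits non-zero if any are found.
check_batchfile() {
  awk -F'\t' 'NF != 2 { printf "line %d: expected 2 tab-separated columns, found %d\n", NR, NF; bad = 1 }
              END { exit bad }' "$1"
}
```

For example, `make_batchfile /path/to/genomes genomes.tsv && check_batchfile genomes.tsv` produces a file that can then be passed to `gtranslate detect_table --batchfile genomes.tsv`.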

### Generate plot (optional)
Generates an interactive HTML dashboard to visually explore the features (e.g., coding density difference, amino acid ratios) used by gTranslate to predict the translation table. 

**Basic Usage:**
```bash
gtranslate generate_plot --feature_file features.tsv --output_file dashboard.html
```

The file `features.tsv` provided to `--feature_file` is generated by the `detect_table` command.
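The two commands can be chained in a script. This sketch assumes `features.tsv` is written somewhere under the `--out_dir` directory. Its exact location is not documented here, so the script searches for it with `find` instead of hard-coding a path. `find_feature_file` and `detect_and_plot` are hypothetical helper names, and only the flags shown above are used.

```bash
# Hypothetical helper: print the first features.tsv found under a directory
# (prints nothing if none exists).
find_feature_file() {
  find "$1" -type f -name 'features.tsv' -print -quit
}

# Hypothetical wrapper: run detect_table, then build the dashboard from the
# resulting features.tsv.
detect_and_plot() {
  local genome_dir=$1 out_dir=$2 feature_file
  gtranslate detect_table --genome_dir "$genome_dir" --out_dir "$out_dir" || return
  feature_file=$(find_feature_file "$out_dir")
  if [ -z "$feature_file" ]; then
    echo "features.tsv not found under $out_dir" >&2
    return 1
  fi
  gtranslate generate_plot --feature_file "$feature_file" --output_file "$out_dir/dashboard.html"
}
```

Usage: `detect_and_plot /path/to/genomes /path/to/output`.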

## Testing

Use the `check_install` and `test` subcommands to confirm that gTranslate is installed correctly and operating as expected.

### Check installation
Verifies that all required dependencies and reference data files are present and correctly configured.

**Basic Usage:**
```bash
gtranslate check_install
```

### Test operation of gTranslate
Runs the built-in test suite on bundled genomic data to validate that gTranslate is functioning correctly on your system.

**Basic Usage:**
```bash
gtranslate test --cpus 4
```
*Note: If `--out_dir` is not specified, the test runs in a temporary directory.*
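Rather than hard-coding `--cpus 4`, the value can be set from the machine's core count. This is a small sketch. `detect_cpus` is a hypothetical helper, and it assumes `nproc` (GNU coreutils, Linux) or `sysctl -n hw.ncpu` (macOS) is available, falling back to a single CPU otherwise.

```bash
# Hypothetical helper: report the number of CPU cores, falling back to 1.
detect_cpus() {
  nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1
}

# Example invocation (commented out so the helper can be sourced on its own):
# gtranslate test --cpus "$(detect_cpus)" --out_dir gtranslate_test_out
```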
