Metadata-Version: 2.4
Name: nfixplanet
Version: 0.1.0
Summary: Nitrogen Fixer detection pipeline
Author: Shahriyar Mahdi Robbani
Author-email: Shahriyar Mahdi Robbani <shahriyar.robbani@embl.de>
License-Expression: MIT
License-File: LICENSE
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: MIT License
Requires-Dist: pandas>=1.3,<2.2
Requires-Dist: defopt<7
Requires-Dist: pytest>=8.0 ; extra == 'dev'
Requires-Python: >=3.11
Project-URL: Homepage, https://github.com/grp-bork/nfixplanet
Project-URL: Issues, https://github.com/grp-bork/nfixplanet/issues
Project-URL: Repository, https://github.com/grp-bork/nfixplanet
Provides-Extra: dev
Description-Content-Type: text/markdown

# NFixPlanet

Python package for detection of nitrogen fixers.

## Description
TODO: need to update description

Note: the pipeline only works for short-read sequences (coverm step).

## Installation
- TODO: install via pip
- TODO: install via conda

### Local installation

```bash
git clone git@git.embl.org:grp-bork/nfixplanet.git
cd nfixplanet
conda create -c bioconda -n nfixdev python=3.11 prodigal=2.6.3 hmmer=3.4 "defopt<7" wget hostile=2.0.0 fastp=0.24.0 minimap2=2.28 coverm=0.7.0
conda activate nfixdev
pip install -e ".[dev]"
```

Requirements:
- Prodigal V2.6.3: February, 2016
- HMMER 3.4 (Aug 2023); http://hmmer.org/
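
Before running the pipeline, it can help to confirm that the external tools from the conda environment are available on `PATH`. The snippet below is a minimal sketch and not part of nfixplanet: the helper `find_missing_tools` is hypothetical, and the executable names are assumed from the conda packages listed above.

```python
import shutil


def find_missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Executable names assumed from the conda package list in this README.
    required = ["prodigal", "hmmscan", "hostile", "fastp", "minimap2", "coverm"]
    missing = find_missing_tools(required)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

Run it inside the activated `nfixdev` environment; an empty result means every listed executable was found.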

## Usage
### nfixplanet annotate

Pipeline for annotating genomes as nitrogen fixers (N-fixers).

TODO: update description

Basic command:

```bash
nfixplanet annotate --input_fasta /path/to/fasta --output_directory /path/to/output
```
Optional arguments:

- `--genomic_context_range <int>`: Maximum number of genes upstream or downstream to consider for operon context (default: 10).
- `--cpus <int>`: Number of CPUs to use for `hmmscan` (default: 2, max recommended: 4).
- `--verbose`: Enable verbose (DEBUG) logging.
- `--version`: Print version number and exit.
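
To illustrate the idea behind `--genomic_context_range`, the sketch below selects the genes that lie within a fixed number of positions on either side of a hit. It is an assumption about the general concept, not the package's actual implementation; `context_window` and the gene identifiers are hypothetical.

```python
def context_window(genes, hit_index, context_range=10):
    """Return genes within `context_range` positions of the hit, hit included.

    `genes` is an ordered list of gene identifiers along one contig.
    """
    if not 0 <= hit_index < len(genes):
        raise IndexError("hit_index is outside the gene list")
    # Clamp the window so it never runs past either end of the contig.
    start = max(0, hit_index - context_range)
    end = min(len(genes), hit_index + context_range + 1)
    return genes[start:end]


# Hypothetical contig with 8 genes; the hit is the gene at index 3.
genes = [f"gene_{i}" for i in range(8)]
print(context_window(genes, hit_index=3, context_range=2))
```

With a range of 2, the window covers `gene_1` through `gene_5`; near a contig edge the window is simply truncated.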

### nfixplanet map
Pipeline for mapping metagenomes.

TODO: update description

Basic command:

```bash
nfixplanet map \
  --sample_id SAMPLE_NAME \
  --read_1 /path/to/read_1.fastq \
  --read_2 /path/to/read_2.fastq \
  --single /path/to/reads.fastq \
  --output_directory /path/to/output
```

Required arguments:
- `--sample_id <str>`: Name of the FASTA/FASTQ sample.
- `--output_directory <str>`: Path to output directory.

Input read options (provide paired-end reads, single-end reads, or both):
- `--read_1 <str>`: Path to FASTA/FASTQ file for paired-end read 1 (R1).
- `--read_2 <str>`: Path to FASTA/FASTQ file for paired-end read 2 (R2). Must be provided together with `--read_1`.
- `--single <str>`: Path to FASTA/FASTQ file for single-end reads. Can be used on its own or with `--read_1` and `--read_2`.
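
The combination rules above can be expressed as a small check. This is a sketch under stated assumptions rather than the code nfixplanet runs: `validate_read_inputs` and the returned mode labels are hypothetical.

```python
def validate_read_inputs(read_1=None, read_2=None, single=None):
    """Return the input mode, or raise ValueError for an invalid combination."""
    # Paired-end reads only make sense when both mates are given.
    if (read_1 is None) != (read_2 is None):
        raise ValueError("--read_1 and --read_2 must be provided together")
    paired = read_1 is not None
    if paired and single is not None:
        return "paired+single"
    if paired:
        return "paired"
    if single is not None:
        return "single"
    raise ValueError("provide --read_1/--read_2, --single, or both")


# Hypothetical file names, used only to show the three valid modes.
print(validate_read_inputs(read_1="r1.fastq", read_2="r2.fastq"))
print(validate_read_inputs(single="reads.fastq"))
print(validate_read_inputs("r1.fastq", "r2.fastq", "reads.fastq"))
```

Passing only one of `read_1` or `read_2`, or no reads at all, raises `ValueError`.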

Optional arguments:
- `--work_directory <str>`: Path to directory for temporary files (default: tmp).
- `--cpus <int>`: Number of CPUs used by processes (default: 8).
- `--verbose`: Enable verbose logging.

## Authors and acknowledgment
- [Mahdi Robbani](https://github.com/mahdi-robbani)
- Lucas Ustick
- [Anthony Fullam](https://github.com/fullama)

## License
This project is licensed under the MIT License. See the LICENSE file for details.
