Metadata-Version: 2.4
Name: streptofile
Version: 0.5.0
Summary: Add your description here
License-Expression: MIT
Requires-Dist: polars
Requires-Dist: python-dotenv
Requires-Python: >=3.14
Description-Content-Type: text/markdown

# Streptofile

### Group A Streptococcus (Streptococcus pyogenes) profiling from whole genome sequencing data

The current version of Streptofile performs:
- **emm typing** based on the emm nucleotide sequence database curated by the U.S. Centers for Disease Control and Prevention (https://ftp.cdc.gov/pub/infectious_diseases/biotech/tsemm/)
- **Multilocus sequence typing (MLST)** based on the *S. pyogenes* scheme curated by PubMLST (https://pubmlst.org/bigsdb?db=pubmlst_spyogenes_seqdef)
- **Virulence gene profiling** based on 66 known virulence factors



## Setup

### Conda install

```
conda install thej-ssi::streptofile
```


### pip install
```
pip install streptofile
```
Non-Python dependencies that need to be installed:
 - blast
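
A quick way to confirm that BLAST is reachable before running Streptofile is to look for its executables on the `PATH`. The sketch below assumes that Streptofile relies on the BLAST+ command-line tools `blastn` and `makeblastdb`; this README does not name the exact executables, so treat that tuple as a guess and adjust it as needed. `find_tools` is a hypothetical helper, not part of Streptofile.

```python
import shutil

# Assumption: Streptofile calls the BLAST+ command-line tools. The exact
# executables it needs are not listed in this README, so edit this tuple
# if your installation differs.
ASSUMED_BLAST_TOOLS = ("blastn", "makeblastdb")


def find_tools(tools=ASSUMED_BLAST_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

If a tool is reported as not found, install BLAST with your package manager of choice and rerun the check.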


## Usage

To run emm typing, MLST and virulence gene detection on a batch of assembly files:
```
streptofile -o <output_folder> *.fasta
```

To run only a subset of the analyses, specify them as a comma-separated list with the `--analyses` parameter:
```
streptofile -o <output_folder> --analyses emm,mlst,virulence *.fasta
```
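
When Streptofile is called from a Python pipeline rather than an interactive shell, the `*.fasta` wildcard is not expanded automatically. The following is a minimal sketch of one way to handle that, using only the `-o` and `--analyses` options shown above. `build_command` and `run_streptofile` are hypothetical helper names, and the sketch assumes the `streptofile` executable is on the `PATH`.

```python
import glob
import subprocess


def build_command(output_folder, fasta_files, analyses=None):
    """Build the argument list for a streptofile run."""
    cmd = ["streptofile", "-o", output_folder]
    if analyses:
        # The analyses are passed as a single comma-separated value.
        cmd += ["--analyses", ",".join(analyses)]
    return cmd + list(fasta_files)


def run_streptofile(output_folder, pattern="*.fasta", analyses=None):
    """Expand the file pattern ourselves, then run streptofile on the matches."""
    fasta_files = sorted(glob.glob(pattern))
    if not fasta_files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # check=True raises CalledProcessError if streptofile exits with an error.
    subprocess.run(build_command(output_folder, fasta_files, analyses), check=True)
```

For example, `run_streptofile("results", "assemblies/*.fasta", ["emm", "mlst"])` would run only emm typing and MLST on every matching file.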


### Input

Streptofile is designed for genome assemblies, but any nucleotide FASTA input will work, so CDS files and similar are fine as well.


### Outputs
- **`<output_folder>/results.tsv`**: All analysis results in a wide table format. Includes emm typing results, MLST results and one column for each of the 66 virulence genes indicating presence or absence of that gene.
- **`<output_folder>/virulence_details.tsv`**: All identified virulence genes in long table format. Includes information on coverage, identity, sequence reference and the identified sequence.
- **`<output_folder>/<sample_name>`**: One folder for each input FASTA file, containing BLAST results and other intermediate files.
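
Both result tables are tab-separated, so they can be loaded with standard tooling. The sketch below uses Python's built-in `csv` module and makes no assumption about column names, since the exact headers are not listed in this README. `read_tsv` is a hypothetical helper, not part of Streptofile.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Return the header and the rows (as dicts) of a tab-separated file."""
    with Path(path).open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        return reader.fieldnames or [], rows
```

Calling `header, rows = read_tsv("<output_folder>/results.tsv")` with your own output folder gives the column names in `header` and one dictionary per sample in `rows`. All values are read as strings.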
