Metadata-Version: 2.4
Name: EFMCalculator
Version: 0.0.post1
Summary: A webapp and command line utility for identifying mutational hotspots
Author-email: croots <croots@utexas.edu>, coolbears <sld3379@utexas.edu>, kevin99111 <kevinyang260@gmail.com>, avkatre <avyaykatre@gmail.com>
Maintainer-email: croots <croots@utexas.edu>
License: GPL-3.0
Project-URL: Homepage, https://github.com/barricklab/efmcalculator2
Project-URL: Documentation, https://github.com/barricklab/efmcalculator2
Project-URL: Issues, https://github.com/barricklab/efmcalculator2/issues
Keywords: science,biology,evolution,synthetic biology,synbio,bioinformatics,bioengineering
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Environment :: Web Environment
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: pandas
Requires-Dist: polars==1.19
Requires-Dist: progress
Requires-Dist: biopython
Requires-Dist: statsmodels
Requires-Dist: rich
Requires-Dist: pyarrow
Requires-Dist: bokeh==2.4.3
Requires-Dist: numpy<2
Requires-Dist: streamlit==1.43.0
Requires-Dist: streamlit-extras
Requires-Dist: streamlit-aggrid
Requires-Dist: streamlit_javascript
Dynamic: license-file

[![Status](https://github.com/barricklab/efm-calculator2/actions/workflows/package_and_test.yml/badge.svg)](https://github.com/barricklab/efm-calculator2/actions/workflows/package_and_test.yml)

`efmcalculator` is a Python package and web tool for detecting mutational hotspots. It predicts the mutation rate associated with each hotspot and combines these rates into a relative instability score. The hotspots it detects include simple sequence repeats, repeat-mediated deletions, and short repeat sequences. This code updates and improves upon the previous version of the [EFM calculator](https://github.com/barricklab/efm-calculator).

`efmcalculator` accepts multi-FASTA, GenBank, or CSV files as input and takes its parameters from the command line. It can scan both linear and circular sequences. By default it uses a pairwise comparison strategy (all occurrences of a repeat are compared with all other occurrences), but it also offers a linear comparison strategy (each occurrence of a repeat is compared only with the next occurrence in the sequence) to accelerate the analysis of large sequences.
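
The difference between the two strategies can be illustrated with a minimal sketch. This is not the package's implementation; the function names and the example positions are hypothetical, and the code only shows which occurrences of a single repeat would be compared under each strategy as described above.

```python
from itertools import combinations


def pairwise_pairs(positions):
    """All-vs-all: pair every occurrence with every other occurrence."""
    return list(combinations(sorted(positions), 2))


def linear_pairs(positions):
    """Next-neighbor: pair each occurrence only with the one that follows it."""
    ordered = sorted(positions)
    return list(zip(ordered, ordered[1:]))


# Hypothetical start positions of one repeat within a sequence.
positions = [10, 250, 900, 4000]

print(len(pairwise_pairs(positions)), "pairwise comparisons")
print(len(linear_pairs(positions)), "linear comparisons")
```

For n occurrences of a repeat, the pairwise sketch produces n(n-1)/2 comparisons while the linear sketch produces n-1, which is why the linear strategy scales better on large sequences.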


# Installation
The EFM Calculator can be accessed as a free web tool at efm2-beta.streamlit.app. The web tool is limited to 50,000 bases so that it remains performant for other users.
It can also be installed and run locally, as described below, without this limit.

## From pip:
`pip install efmcalculator`, or clone this repository and run `pip install ./` from its root.

# Command Line Usage
- -h: help
- -i: inpath
- -o: outpath
- -s: strategy. Either "linear" or "pairwise"
- -c: circular inputs
- -f: output filetype for tables, either csv or parquet
- -j: threads
- -t: tall. Parallelizes across inputs rather than within.
- -v: verbose. 0 (silent), 1 (basic information), 2 (debug)
- --summary: saves only aggregate results, useful for very tall inputs

Print efmcalculator help:
```
efmcalculator -h
```

Run efmcalculator on all sequences in a FASTA file using the default pairwise strategy and write the output to CSV files in an output folder:
```
efmcalculator -i "input.fasta" -o "output_folder"
```

Run efmcalculator on all sequences in a FASTA file, writing the output to the folder output_folder, while treating the input as circular, searching with the linear strategy, and printing debug information:
```
efmcalculator -i "input.fasta" -o "output_folder" -c -s "linear" -v 2
```
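
To call the command line tool from a Python script, one option is to assemble the same arguments and pass them to `subprocess`. The sketch below is an illustration only: `build_efm_command` is a hypothetical helper, not part of `efmcalculator`, and it uses only the flags listed above. It assumes the `efmcalculator` executable is on your `PATH`.

```python
import os
import shutil
import subprocess


def build_efm_command(inpath, outpath, strategy="pairwise", circular=False, verbosity=1):
    """Assemble an efmcalculator argument list (hypothetical helper)."""
    if strategy not in ("pairwise", "linear"):
        raise ValueError("strategy must be 'pairwise' or 'linear'")
    if verbosity not in (0, 1, 2):
        raise ValueError("verbosity must be 0, 1, or 2")
    cmd = ["efmcalculator", "-i", inpath, "-o", outpath,
           "-s", strategy, "-v", str(verbosity)]
    if circular:
        cmd.append("-c")  # treat the inputs as circular sequences
    return cmd


cmd = build_efm_command("input.fasta", "output_folder",
                        strategy="linear", circular=True, verbosity=2)
print(" ".join(cmd))

# Run only when the executable and the input file are actually present.
if shutil.which("efmcalculator") and os.path.exists("input.fasta"):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.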
