Metadata-Version: 2.4
Name: chloroscan
Version: 0.1.6
Summary: A computational workflow designed to recover plastid genomes from metagenomes.
License: Apache-2.0
License-File: LICENSE
Author: Andy Tong, Robert Turnbull, Vanessa Rossetto Marcelino, Heroen Verbruggen
Requires-Python: >=3.9,<4.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: click (<8)
Requires-Dist: gitpython (>=3.1.32,<4.0.0)
Requires-Dist: numpy (>1.24.4,<2)
Requires-Dist: pandas (>=2.0)
Requires-Dist: snk-cli (>=0.1.2,<0.2.0)
Description-Content-Type: text/x-rst


==================================================================
ChloroScan: A metagenomic workflow to recover chloroplast genomes
==================================================================

.. start-badges

|testing badge| |docs badge|

.. |testing badge| image:: https://github.com/Andyargueasae/chloroscan/actions/workflows/testing.yml/badge.svg
    :target: https://github.com/Andyargueasae/chloroscan/actions

.. |docs badge| image:: https://github.com/Andyargueasae/chloroscan/actions/workflows/docs.yml/badge.svg
    :target: https://Andyargueasae.github.io/chloroscan
    
.. end-badges


.. image:: docs/source/_static/images/new_ChloroScan_workflow.drawio.png

This workflow is designed to recover chloroplast genomes from metagenomic datasets.

Installation
============

To install the workflow, use pip3. The environment you install into (for example, a virtual environment) must use Python >=3.9 and <4.0.

.. code-block:: bash

    pip3 install chloroscan==0.1.6
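
You can run a check like the following before installing to confirm that the active ``python3`` is within the supported range (>=3.9, <4.0). This is a minimal sketch. ``python_version_supported`` is a hypothetical helper and is not part of ChloroScan.

.. code-block:: bash

    # Hypothetical helper: exit status 0 if the given interpreter satisfies
    # ChloroScan's requirement (>=3.9 and <4.0), non-zero otherwise.
    python_version_supported() {
        local python_bin="${1:-python3}"
        "$python_bin" -c 'import sys; sys.exit(0 if (3, 9) <= sys.version_info[:2] < (4, 0) else 1)'
    }

    if python_version_supported python3; then
        echo "python3 $(python3 -c 'import platform; print(platform.python_version())') is supported"
    else
        echo "python3 is outside the supported range (>=3.9, <4.0)" >&2
    fi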

Detailed workflow instructions are available at: https://andyargueasae.github.io/chloroscan/index.html.
The website also provides a Chinese version of the documentation with identical content.

Machine/OS Requirements
=======================
ChloroScan has only been tested on Linux (x86_64); running it on IOS systems is not recommended.
ChloroScan can be installed on servers and HPC clusters, and we recommend using a GPU to speed up runs.

``Note``: In our testing, the current version of ChloroScan does not support NVIDIA H100 GPUs because of CUDA version incompatibilities. We are working on an update to add support and improve performance.
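
If you are not sure which GPUs a node exposes, a quick check before a run can flag an H100. This sketch assumes that the NVIDIA driver's ``nvidia-smi`` tool is on ``PATH`` and uses only its standard ``--query-gpu`` and ``--format`` options. ``check_gpus_for_h100`` is a hypothetical helper and is not part of ChloroScan. It treats a missing ``nvidia-smi`` as a CPU-only machine.

.. code-block:: bash

    # Hypothetical helper: list visible NVIDIA GPUs and return 1 if any is an H100,
    # which the note above reports as unsupported by the current release.
    check_gpus_for_h100() {
        if ! command -v nvidia-smi >/dev/null 2>&1; then
            echo "nvidia-smi not found; no NVIDIA GPU detected"
            return 0
        fi
        if ! gpu_names=$(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null); then
            echo "nvidia-smi could not query the GPUs" >&2
            return 0
        fi
        echo "Detected GPU(s):"
        printf '%s\n' "$gpu_names"
        # Case-insensitive search for "H100" in any reported GPU name.
        if printf '%s\n' "$gpu_names" | grep -qi 'h100'; then
            echo "Warning: H100 detected; the current ChloroScan release does not support it." >&2
            return 1
        fi
        return 0
    }

    check_gpus_for_h100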

Configuration databases
=======================
Before running ChloroScan, you need to install some packages and datasets so that CAT taxonomy prediction runs properly.
The marker gene database used during binning ships with ChloroScan and is loaded automatically when the conda environments are built, so no action is needed for it.
Our curated Uniref90-algae plastid protein database can be downloaded from: https://doi.org/10.26188/27990278.

To avoid authentication issues, we recommend downloading with the pyfigshare command-line tool. Information about this tool is available at: https://pypi.org/project/pyfigshare/.

* **Python > 3.0** is required to install pyfigshare.

Before downloading the files, create a figshare account and save an API token in the file ``~/.figshare/token``.
Then run:

.. code-block:: bash

    figshare download -o CAT_db.tar.gz 27990278
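
The token step described above can be scripted as shown below. This is a minimal sketch. ``save_figshare_token`` is a hypothetical helper and ``YOUR_FIGSHARE_API_TOKEN`` is a placeholder. The file holds only the token on a single line, and we assume this is the format pyfigshare reads.

.. code-block:: bash

    # Hypothetical helper: write a figshare API token to ~/.figshare/token
    # and make the file readable only by the current user.
    save_figshare_token() {
        if [ -z "$1" ]; then
            echo "usage: save_figshare_token <api-token>" >&2
            return 1
        fi
        mkdir -p "${HOME}/.figshare"
        (umask 077 && printf '%s\n' "$1" > "${HOME}/.figshare/token")
        chmod 600 "${HOME}/.figshare/token"
    }

    # Example call with a placeholder value (replace it with your own token):
    # save_figshare_token "YOUR_FIGSHARE_API_TOKEN"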


``Note``: The compressed CAT database (tar.gz) is 47 GB and expands to nearly 85 GB after extraction, so please make sure you have enough disk space. Setting up the conda environments requires a further 15 GB.
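
The sizes in the note above translate into a rough check you can run before downloading. This sketch assumes three things: the archive, its extracted contents and the conda environments share one filesystem, and the archive is kept until extraction finishes. Under those assumptions, peak usage is about 47 + 85 + 15 = 147 GB. ``free_gb`` is a hypothetical helper. If the conda environments live on another filesystem, lower ``required_gb`` accordingly.

.. code-block:: bash

    # Assumed peak usage on one filesystem: 47 (archive) + 85 (extracted) + 15 (conda).
    required_gb=147

    # Hypothetical helper: free space on the filesystem holding a directory,
    # rounded down to whole units of 1024^3 bytes.
    free_gb() {
        # -P gives one POSIX-format line per filesystem; -k reports 1024-byte blocks.
        df -Pk "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
    }

    target_dir="."   # change to the directory that will hold the CAT database
    available=$(free_gb "$target_dir")
    if [ "$available" -ge "$required_gb" ]; then
        echo "OK: ${available} GB free in ${target_dir} (need about ${required_gb} GB)"
    else
        echo "Warning: only ${available} GB free in ${target_dir}; about ${required_gb} GB is needed" >&2
    fi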

Sample data to try
==================
To try ChloroScan, we recommend downloading our synthetic metagenome data with the command:

.. code-block:: bash

    figshare download -o simulated_metagenomes.tar.gz 28748540

There are also some real metagenome datasets (modified to keep them lightweight) available at: https://figshare.unimelb.edu.au/articles/dataset/ChloroScan_test_data/30218614.

To download:

.. code-block:: bash

    figshare download -o real_test_samples.tar.gz 30218614
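
After downloading, unpack the archives. The sketch below assumes both archives are in the current directory and extracts each one into a directory named after it (for example ``real_test_samples/``). ``extract_archive`` is a hypothetical helper. The internal layout of the archives is not described here, so run ``tar -tzf <archive> | head`` first if you need to know it.

.. code-block:: bash

    # Hypothetical helper: extract NAME.tar.gz into a directory called NAME/.
    extract_archive() {
        local archive="$1"
        local dest="${archive%.tar.gz}"
        if [ ! -f "$archive" ]; then
            echo "Skipping ${archive}: file not found" >&2
            return 1
        fi
        mkdir -p "$dest"
        tar -xzf "$archive" -C "$dest"
        echo "Extracted ${archive} into ${dest}/"
    }

    # Unpack whichever of the two test datasets has been downloaded.
    for archive in simulated_metagenomes.tar.gz real_test_samples.tar.gz; do
        extract_archive "$archive" || true
    done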

Credit
============

ChloroScan is developed by:

.. start-credits

- Yuhao Tong 童禹皓 (University of Melbourne)
- `Dr Robert Turnbull <https://findanexpert.unimelb.edu.au/profile/877006-robert-turnbull>`_ 
- `Dr Vanessa Rossetto Marcelino <https://findanexpert.unimelb.edu.au/profile/532755-vanessa-rossetto-marcelino>`_ 
- `A/Prof Heroen Verbruggen <https://hverbruggen.github.io/>`_

.. end-credits

Yuhao Tong is the primary developer. To contact us, please email:

.. code-block:: text
    
    yuhtong@student.unimelb.edu.au


