Metadata-Version: 2.4
Name: picnic_bio
Version: 1.2.0
Summary: PICNIC (Proteins Involved in CoNdensates In Cells) is a machine learning-based model that predicts proteins involved in biomolecular condensates.
Author-email: Anna Hadarovich <hadarovi@mpi-cbg.de>, Soumyadeep Ghosh <soumyadeep11194@gmail.com>, Maxim Scheremetjew <schereme@mpi-cbg.de>
Maintainer-email: Maxim Scheremetjew <schereme@mpi-cbg.de>
License-Expression: BSD-3-Clause
Project-URL: Source, https://git.mpi-cbg.de/tothpetroczylab/picnic
Project-URL: Documentation, https://git.mpi-cbg.de/tothpetroczylab/picnic/-/blob/main/README.md
Project-URL: Funding, https://picnic.cd-code.org/about-us
Project-URL: Homepage, https://picnic.cd-code.org/
Project-URL: Tracker, https://git.mpi-cbg.de/tothpetroczylab/picnic/-/issues
Keywords: Biomolecular condensate,Scientific Annotation Tool,condensate,machine learning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS
Requires-Python: <3.13,>=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests~=2.32.0
Requires-Dist: catboost~=1.2.7
Requires-Dist: matplotlib~=3.9.4
Requires-Dist: pandas~=2.2.3
Requires-Dist: Bio~=1.6.2
Requires-Dist: numpy~=1.26.4
Dynamic: license-file

<h1 align="center">
<img src="https://git.mpi-cbg.de/tothpetroczylab/picnic/-/raw/main/branding/logo/logo_picnic_v1.96113169.png" width="300">
</h1><br>

# PICNIC (Proteins Involved in CoNdensates In Cells)

[![Build Status](https://git.mpi-cbg.de/tothpetroczylab/picnic/badges/main/pipeline.svg)](https://git.mpi-cbg.de/tothpetroczylab/picnic/-/pipelines)
[![Coverage Status](https://git.mpi-cbg.de/tothpetroczylab/picnic/badges/main/coverage.svg)](https://git.mpi-cbg.de/tothpetroczylab/picnic/-/pipelines)
[![PyPI Version](https://img.shields.io/pypi/v/picnic-bio.svg)](https://pypi.org/project/picnic-bio/#history)
[![PyPI Downloads](https://img.shields.io/pypi/dm/picnic-bio.svg?label=PyPI%20downloads)](
https://pypi.org/project/picnic-bio/#files)
[![Nat Commun 15, 10668 (2024)](https://img.shields.io/badge/DOI-10.1038%2Fs41467_024_55089_x-blue)](
https://doi.org/10.1038/s41467-024-55089-x)
[![Python Versions](https://img.shields.io/pypi/pyversions/picnic-bio.svg)](https://pypi.org/project/picnic-bio/#description)
[![License](https://img.shields.io/pypi/l/picnic-bio.svg)](https://git.mpi-cbg.de/tothpetroczylab/picnic/-/blob/main/LICENSE)

PICNIC (Proteins Involved in CoNdensates In Cells) is a machine learning-based model that predicts proteins involved in biomolecular condensates. The first model (PICNIC) uses sequence-based features and structure-based features derived from AlphaFold2 models. A second model (PICNIC-GO) adds an extended set of features based on Gene Ontology terms. Although PICNIC-GO is biased by the annotations already available for a protein, it provides useful insights into the specific protein properties that are enriched in proteins of biomolecular condensates. Overall, we recommend PICNIC, which is an unbiased predictor, for general use, and PICNIC-GO for specific cases such as experimental hypothesis generation.

- [External software](#external-software)
- [Installation instructions](#installation-instructions)
  - [Requirements](#requirements)
  - [Install external requirements](#install-external-requirements)
  - [PICNIC is available on PyPI](#picnic-is-available-on-pypi)
  - [PICNIC is also installable from source](#picnic-is-also-installable-from-source)
  - [How to install PICNIC using Conda?](#how-to-install-picnic-using-conda)
- [How to use?](#how-to-use)
  - [Usage - Using PICNIC from command line](#usage---using-picnic-from-command-line)
  - [Examples](#examples)
  - [How to run the provided Jupyter notebook?](#how-to-run-the-provided-jupyter-notebook)
- [Publication](#publication)

## External software

*IUPred2A*

IUPred2A is a tool that predicts disordered protein regions. It can be downloaded from https://iupred2a.elte.hu/download_new.
Unpack the downloaded archive into the `src/files/` directory.

*STRIDE*

STRIDE is a program for protein secondary structure assignment.
An installation guide can be found [here](https://github.com/heiniglab/stride).

## Installation instructions

A binary installer for the latest released version is available at the Python Package Index (PyPI).

### Requirements

* Python versions >=3.9,<3.13
* Download and unpack IUPred2A
  * Add IUPred2A to PYTHONPATH
* Download and unpack STRIDE
  * Add STRIDE binary to your system PATH
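
The requirements above can be verified before installing PICNIC. The following is a minimal sketch, not part of PICNIC itself: the helper name `check_picnic_prerequisites` is hypothetical, and it assumes that the STRIDE binary is called `stride` and that the unpacked IUPred2A directory provides a module named `iupred2a`.

```python
import importlib.util
import shutil
import sys


def check_picnic_prerequisites(version_info=sys.version_info):
    """Return a dict mapping each prerequisite name to True or False."""
    return {
        # Supported interpreter range listed above: >=3.9,<3.13
        "python": (3, 9) <= tuple(version_info[:2]) < (3, 13),
        # Assumption: the STRIDE executable on PATH is named "stride"
        "stride": shutil.which("stride") is not None,
        # Assumption: IUPred2A on PYTHONPATH is importable as "iupred2a"
        "iupred2a": importlib.util.find_spec("iupred2a") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_picnic_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Running the script prints one line per prerequisite, so a missing `PATH` or `PYTHONPATH` entry shows up before any prediction is attempted.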


### Install external requirements

#### How to install STRIDE?

A complete installation guide can be found [here](https://github.com/heiniglab/stride) or simply
run the following commands:

```shell
git clone https://github.com/heiniglab/stride
cd stride
make stride
export PATH="$PATH:$PWD"
```

#### How to install IUPred2A?

IUPred2A is free for academic users only and cannot be used for commercial purposes.
If you are an academic user, you can download IUPred2A by filling out the form [here](https://iupred2a.elte.hu/download_new).

```shell
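# Step 1: Fill out the form above and download the IUPred2A tarball
# Step 2: Unpack it and add the IUPred2A directory to PYTHONPATH,
#         keeping any entries that are already set
tar -zxf iupred2a.tar.gz
cd iupred2a
export PYTHONPATH="$PWD${PYTHONPATH:+:$PYTHONPATH}"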
```

### PICNIC is available on PyPI

PICNIC officially supports Python versions >=3.9,<3.13.

```shell
python3 --version
Python 3.11.5

python3 -m venv picnic-env
source picnic-env/bin/activate
(picnic-env) % python -m pip install --upgrade pip
(picnic-env) % python -m pip install picnic_bio
```
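
To confirm that the installation succeeded inside the active virtual environment, the installed version can be queried with the standard library. This is a small sketch; the helper name `installed_version` is hypothetical, and the distribution name `picnic_bio` is taken from the install command above.

```python
from importlib import metadata


def installed_version(distribution="picnic_bio"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "picnic_bio is not installed in this environment")
```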

### PICNIC is also installable from source

```shell
git clone git@git.mpi-cbg.de:tothpetroczylab/picnic.git
```

Once you have a copy of the source, you can embed it in your own Python package or install it into your site-packages:

```shell
cd picnic
python3 -m venv picnic-env
source picnic-env/bin/activate
(picnic-env) % python -m pip install --upgrade pip
(picnic-env) % python -m pip install .
```

### How to install PICNIC using Conda?

No binary installer is available on Conda yet, but PICNIC can be installed inside a Conda environment.

Please note that in a Conda environment you have to install catboost before installing picnic-bio itself; otherwise the installation fails when pip tries to compile the catboost package from source. We also recommend setting up [conda-forge](https://conda-forge.org/docs/user/introduction.html) to fetch pre-compiled versions of catboost.

The following commands work around the catboost installation issue:

```shell
conda config --add channels conda-forge
conda config --set channel_priority strict

# Choose one of the supported Python versions, when creating the Conda environment: >=3.9,<3.13
# conda create -n myenv python=[3.9, 3.10, 3.11, 3.12] catboost
# e.g.
conda create -n myenv python=3.11 catboost
conda activate myenv
(myenv) % python -m pip install picnic_bio
```

## How to use?

### Usage - Using PICNIC from command line

PICNIC now exposes two subcommands, **auto** and **manual**, instead of the previous positional `is_automated` flag.

```
usage: PICNIC [-h] {auto,manual} ...

PICNIC (Proteins Involved in CoNdensates In Cells) is a machine learning-based
model that predicts proteins involved in biomolecular condensates.

subcommands:
  {auto,manual}
    auto              Automated pipeline: downloads the AlphaFold2 model and
                      FASTA sequence from public APIs for a given UniProt
                      accession (works for proteins < 1400 aa deposited to
                      UniProtKB).
    manual            Manual pipeline: provide a local AlphaFold2 PDB model
                      file. The FASTA sequence is extracted directly from the
                      PDB file using BioPython.
```

#### `auto` subcommand

```
usage: PICNIC auto [-h] [--with-go] [-o DIR] uniprot_id

positional arguments:
  uniprot_id    UniProt accession of the target protein (e.g. P12345).

options:
  -h, --help    show this help message and exit
  --with-go     Calculate the PICNIC_GO score (model variant that includes
                Gene Ontology features). GO terms are retrieved from
                UniProtKB via the uniprot_id.
  -o DIR        Path to the output folder. Default: $CWD/picnic-bio-output
```
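
The help text states that the automated pipeline works for proteins shorter than 1400 aa. If a FASTA sequence is already at hand, its length can be checked before calling `auto`. This is a minimal sketch: both function names are hypothetical, and how PICNIC itself handles longer proteins is not described here.

```python
def fasta_sequence_length(fasta_text):
    """Return the number of residues in the first record of a FASTA string."""
    length = 0
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # a second header starts the next record; stop here
            seen_header = True
        elif line:
            length += len(line)
    return length


def fits_auto_pipeline(fasta_text, max_length=1400):
    """True if the first record is non-empty and shorter than max_length.

    The default of 1400 mirrors the limit quoted in the help text above.
    """
    return 0 < fasta_sequence_length(fasta_text) < max_length
```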

#### `manual` subcommand

```
usage: PICNIC manual [-h] [--with-go] [-o DIR] pdb_file

positional arguments:
  pdb_file      Path to the AlphaFold2 PDB file. The filename must follow
                the AlphaFold naming convention:
                AF-<uniprot_id>-F<i>-v<j>.pdb (e.g. AF-P12345-F1-v4.pdb).

options:
  -h, --help    show this help message and exit
  --with-go     Calculate the PICNIC_GO score (model variant that includes
                Gene Ontology features). GO terms are retrieved from
                UniProtKB via the uniprot_id parsed from the filename.
  -o DIR        Path to the output folder. Default: $CWD/picnic-bio-output
```
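
Because the manual pipeline relies on the AlphaFold naming convention, it can help to validate a filename before passing it to `manual`. The sketch below is illustrative only: `parse_alphafold_filename` is a hypothetical helper, and the pattern assumes that the accession consists of letters and digits.

```python
import re
from pathlib import Path

# Pattern for AF-<uniprot_id>-F<i>-v<j>.pdb, as described in the help text.
# Assumption: the UniProt accession contains only letters and digits.
_AF_NAME = re.compile(
    r"^AF-(?P<uniprot_id>[A-Za-z0-9]+)-F(?P<fragment>\d+)-v(?P<version>\d+)\.pdb$"
)


def parse_alphafold_filename(path):
    """Return (uniprot_id, fragment, version) or raise ValueError."""
    match = _AF_NAME.match(Path(path).name)
    if match is None:
        raise ValueError(f"{path!r} does not follow AF-<uniprot_id>-F<i>-v<j>.pdb")
    return match["uniprot_id"], int(match["fragment"]), int(match["version"])
```

For the example file shipped in the repository, `parse_alphafold_filename("notebooks/test_files/O95613/AF-O95613-F1-v4.pdb")` yields the accession `O95613`, fragment 1 and model version 4.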

### Examples

Run the automated pipeline for a given UniProt accession:
```shell
picnic auto Q99720
```
Run the automated pipeline and also compute the PICNIC-GO score:
```shell
picnic auto Q99720 --with-go
```
Run the automated pipeline and write results to a custom output directory:
```shell
picnic auto Q99720 -o /path/to/output
```
Run the manual pipeline by providing a local AlphaFold2 PDB file:
```shell
picnic manual notebooks/test_files/O95613/AF-O95613-F1-v4.pdb
```
Run the manual pipeline with PICNIC-GO and a custom output path:
```shell
picnic manual notebooks/test_files/O95613/AF-O95613-F1-v4.pdb --with-go -o /path/to/output
```
More examples are shown in the Jupyter notebook in the `notebooks` folder.
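
To run the command line examples above for several proteins, the calls can be scripted. The following sketch assumes that the `picnic` executable from the examples is on `PATH`; `build_picnic_command` and `run_batch` are hypothetical helpers that only assemble and launch the documented command line.

```python
import subprocess


def build_picnic_command(mode, target, with_go=False, output_dir=None):
    """Assemble the argument list for one `picnic auto|manual` call."""
    if mode not in ("auto", "manual"):
        raise ValueError("mode must be 'auto' or 'manual'")
    command = ["picnic", mode, str(target)]
    if with_go:
        command.append("--with-go")
    if output_dir is not None:
        command += ["-o", str(output_dir)]
    return command


def run_batch(accessions, **options):
    """Run the auto pipeline once per accession; return {accession: exit code}."""
    return {
        accession: subprocess.run(
            build_picnic_command("auto", accession, **options)
        ).returncode
        for accession in accessions
    }


if __name__ == "__main__":
    # Accessions taken from the examples above.
    print(run_batch(["Q99720", "O95613"], with_go=True))
```

Building the argument list separately from running it makes the command easy to inspect or log before any job is started.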

### How to run the provided Jupyter notebook?

Examples of how to use and run PICNIC are shown in a provided Jupyter notebook. The notebook can be found under the
**notebooks** folder.

#### What is Jupyter Notebook?

Please read the documentation [here](https://saturncloud.io/blog/how-to-launch-jupyter-notebook-from-your-terminal/#what-is-jupyter-notebook).


#### How to create a virtual environment and install all required Python packages?

Create a virtual environment with Python's built-in `venv` module:
```shell
python -m venv /path/to/new/virtual/environment
# e.g.
python -m venv my_jupyter_env
```

Then install the classic Jupyter Notebook with:
```shell
source my_jupyter_env/bin/activate

pip install notebook
```
Also install picnic-bio from source in the same virtual environment (run this from the root of the cloned repository):
```shell
pip install .
```

#### How to Launch Jupyter Notebook from Your Terminal?

In your terminal, activate the previously created virtual environment:
```shell
source my_jupyter_env/bin/activate
```
Launch Jupyter Notebook:
```shell
jupyter notebook
```
Open the example notebook called `picnic_examples.ipynb` under the notebooks folder.

## Publication
***PICNIC accurately predicts condensate-forming proteins regardless of their structural disorder across organisms.***
Anna Hadarovich, Hari Raj Singh, Soumyadeep Ghosh, Maxim Scheremetjew, Nadia Rostam, Anthony A. Hyman & Agnes Toth-Petroczy. 
Nature Communications volume 15, Article number: 10668 (2024). doi: [10.1038/s41467-024-55089-x](https://doi.org/10.1038/s41467-024-55089-x). PMID: [39663388](https://pubmed.ncbi.nlm.nih.gov/39663388/).

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
