Metadata-Version: 2.4
Name: af-analysis
Version: 0.2.1
Summary: `AF analysis` is a python library allowing analysis of Alphafold results.
Author-email: Samuel Murail <samuel.murail@u-paris.fr>
License:  GPL-2.0
Project-URL: Homepage, https://github.com/samuelmurail/af_analysis
Keywords: AlphaFold2,ColabFold,Python,af_analysis
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2)
Classifier: Natural Language :: English
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Software Development
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pdb_cpp>=0.0.2
Requires-Dist: pandas>=1.3.4
Requires-Dist: numpy>=1.21
Requires-Dist: tqdm>=4.0
Requires-Dist: seaborn>=0.11
Requires-Dist: cmcrameri>=1.7
Requires-Dist: nglview>=3.0
Requires-Dist: ipywidgets>=7.6
Requires-Dist: mdanalysis>=2.4
Requires-Dist: scikit-learn
Provides-Extra: gui
Requires-Dist: plotly>=5.0; extra == "gui"
Requires-Dist: flask>=2.3; extra == "gui"
Dynamic: license-file

[![Documentation Status](https://readthedocs.org/projects/af-analysis/badge/?version=latest)](https://af-analysis.readthedocs.io/en/latest/?badge=latest)
[![codecov](https://codecov.io/gh/samuelmurail/af_analysis/graph/badge.svg?token=WOJYQKKOP7)](https://codecov.io/gh/samuelmurail/af_analysis)
[![Build Status](https://dev.azure.com/samuelmurailRPBS/af_analysis/_apis/build/status%2Fsamuelmurail.af_analysis?branchName=main)](https://dev.azure.com/samuelmurailRPBS/af_analysis/_build/latest?definitionId=2&branchName=main)
[![PyPI - Version](https://img.shields.io/pypi/v/af-analysis)](https://pypi.org/project/af-analysis/)
[![Downloads](https://static.pepy.tech/badge/af2-analysis)](https://pepy.tech/project/af2-analysis)
[![status](https://joss.theoj.org/papers/0c359e32dc2f159688848361530239f5/status.svg)](https://joss.theoj.org/papers/0c359e32dc2f159688848361530239f5)
[![License: GPL v2](https://img.shields.io/badge/License-GPL%20v2-blue.svg)](https://www.gnu.org/licenses/old-licenses/gpl-2.0.html)
[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/samuelmurail/af_analysis/blob/main/basic_example_colab.ipynb)
[![doi](https://zenodo.org/badge/DOI/10.5281/zenodo.14859764.svg)](https://doi.org/10.5281/zenodo.14859764)


# About AlphaFold Analysis

<img src="https://raw.githubusercontent.com/samuelmurail/af_analysis/master/docs/source/logo.jpeg" alt="AF Analysis Logo" width="300" style="display: block; margin: auto;"/>

`af-analysis` is a Python package for the analysis of AlphaFold protein structure predictions.
This package is designed to simplify and streamline the process of working with protein structures
generated by:

* [AlphaFold 2][AF2]
* [AlphaFold 3][AF3]
* [ColabFold][ColabFold]
* [AlphaFold-Multimer][AF2-M]
* [AlphaPulldown][AlphaPulldown]
* [Boltz1][Boltz1]
* [Chai-1][Chai1]
* [MassiveFold][MassiveFold]


Source code repository:
   [https://github.com/samuelmurail/af_analysis](https://github.com/samuelmurail/af_analysis)

## Statement of Need

AlphaFold 2 and its derivatives have revolutionized protein structure prediction, achieving remarkable accuracy.
Analyzing the abundance of resulting structural models can be challenging and time-consuming.
Existing tools often require separate scripts for calculating various quality metrics (pDockQ, pDockQ2, LIS score) and assessing model diversity.
`af-analysis` addresses these challenges by providing a unified and user-friendly framework for in-depth analysis of AlphaFold 2 results.

## Main features

* Import AlphaFold or ColabFold prediction directories as pandas DataFrames for efficient data handling.
* Calculate and add additional structural quality metrics to the DataFrame, including:
  * pDockQ
  * pDockQ2
  * LIS score (cLIS and iLIS scores)
  * ipSAE (and the ipTM matrix derived from PAE)
* Visualize predicted protein models.
* Cluster generated models to identify diverse conformations.
* Select the best models based on defined criteria.
* Add your custom metrics to the DataFrame for further analysis.

## Installation

* `af-analysis` is available on PyPI and can be installed using `pip`:

```bash
pip install af_analysis
```

* You can install the latest version from the GitHub repository:

```bash
pip install git+https://github.com/samuelmurail/af_analysis.git@main
```

* `af-analysis` can also be installed from a local clone of the GitHub repository:

```bash
git clone https://github.com/samuelmurail/af_analysis
cd af_analysis
pip install .
```

* For developers, you can install the package in editable mode:

```bash
git clone https://github.com/samuelmurail/af_analysis
cd af_analysis
pip install -e .
```

### Optional GUI (Flask)

You can install and launch the GUI with:

```bash
pip install "af-analysis[gui]"
af_analysis_gui
```

Then open `http://127.0.0.1:5000` in your browser. The GUI allows loading result folders, viewing tables, selecting models, and plotting pLDDT/PAE.

## Conda environment

A conda environment file is provided to create an environment with all dependencies:

```bash
conda env create -f environment.yml
conda activate af_analysis
```

## Documentation

The complete documentation is available at [ReadTheDocs](https://af-analysis.readthedocs.io/en/latest/).

* A notebook showing the basic usage of the `af_analysis` library can be found [here](https://af-analysis.readthedocs.io/en/latest/notebooks/basic_example.html).

* Alternatively, you can test it directly on Google Colab:

    [![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/samuelmurail/af_analysis/blob/main/basic_example_colab.ipynb)

## Usage

### Importing data

Create the `Data` object by giving the path of the directory that contains the results of the AlphaFold 2/ColabFold run.

```python
import af_analysis
my_data = af_analysis.Data('MY_AF_RESULTS_DIR')
```

In most cases, the `Data` object will automatically detect the format of the results (AlphaFold 2, AlphaFold 3, ColabFold). If needed, you can specify the format using the `format` argument:

```python
my_data = af_analysis.Data('MY_AF_RESULTS_DIR', format='afpulldown')
```
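To give an intuition of what format detection can look like, here is a minimal sketch based only on output file names. It is an illustration, not the logic used by `af_analysis`: the function `guess_format` and the labels it returns are hypothetical (they are not valid values for the `format` argument), and the file-name patterns are assumptions based on the usual output layout of each predictor.

```python
import tempfile
from pathlib import Path


def guess_format(result_dir):
    """Guess which predictor wrote `result_dir` from its file names.

    Hypothetical heuristic for illustration only; the returned labels
    are descriptive strings, not `format` values of `af_analysis.Data`.
    """
    names = [p.name for p in Path(result_dir).rglob("*") if p.is_file()]
    # Assumption: AlphaFold 3 writes "*summary_confidences*.json" files.
    if any("summary_confidences" in n for n in names):
        return "AlphaFold 3"
    # Assumption: ColabFold writes "*_scores_rank_XXX_*.json" files.
    if any("_scores_rank_" in n and n.endswith(".json") for n in names):
        return "ColabFold"
    # Assumption: AlphaFold 2 writes a "ranking_debug.json" file.
    if "ranking_debug.json" in names:
        return "AlphaFold 2"
    return None


# Small demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ranking_debug.json").write_text("{}")
    print(guess_format(tmp))
```

When such a heuristic is ambiguous or fails, passing the `format` argument explicitly, as shown above, is the safer choice.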

Extracted data are available in the `df` attribute of the `Data` object. 

```python
my_data.df
```


### Analysis

* The `analysis` module contains several functions to add metrics such as [pDockQ][pdockq] and [pDockQ2][pdockq2]:

```python
from af_analysis import analysis
analysis.pdockq(my_data)
analysis.pdockq2(my_data)
```
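For readers unfamiliar with pDockQ, the sketch below shows the sigmoid that maps interface properties to a score. It is not the code of `af_analysis`: the function name `pdockq_from_interface` is hypothetical, the parameter values are those we understand to be reported by Bryant et al. (2022), and the interface (residues of different chains with Cβ atoms within 8 Å) is assumed to have been computed beforehand.

```python
import math

# Sigmoid parameters as reported in Bryant et al. (2022), to our reading.
L, X0, K, B = 0.724, 152.611, 0.052, 0.018


def pdockq_from_interface(mean_interface_plddt, n_contacts):
    """Return a pDockQ-like score from precomputed interface statistics.

    mean_interface_plddt: average pLDDT of the interface residues.
    n_contacts: number of inter-chain contacts.
    Assumption: a model without any contact gets a score of 0.
    """
    if n_contacts <= 0:
        return 0.0
    x = mean_interface_plddt * math.log10(n_contacts)
    return L / (1.0 + math.exp(-K * (x - X0))) + B


# Example with made-up interface values.
print(round(pdockq_from_interface(90.0, 100), 3))
```

The score increases with both the interface pLDDT and the number of contacts, and saturates near `L + B`.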

### Docking Analysis

* The `docking` module contains several functions to add metrics such as the [LIS score][LIS] and [ipSAE][ipsae]:

```python
from af_analysis import docking
docking.LIS_pep(my_data)
docking.ipSAE(my_data)
```
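As a rough illustration of the idea behind the LIS score, the sketch below rescales the inter-chain PAE values that fall under a 12 Å cutoff to the 0-1 range and averages them, following our reading of Kim et al. (2024). The function `lis_score` is hypothetical and is not the implementation behind `docking.LIS_pep`; the PAE values are made up.

```python
def lis_score(inter_chain_pae, cutoff=12.0):
    """Average of (cutoff - PAE) / cutoff over PAE values below the cutoff.

    inter_chain_pae: rows of PAE values (in Å) between residues of two chains.
    Assumption: the score is 0 when no value is below the cutoff.
    """
    values = [pae for row in inter_chain_pae for pae in row if pae < cutoff]
    if not values:
        return 0.0
    return sum((cutoff - pae) / cutoff for pae in values) / len(values)


# Made-up 2 x 2 inter-chain PAE block: only 0.0 and 6.0 are below 12 Å.
pae_block = [[0.0, 6.0], [12.0, 20.0]]
print(lis_score(pae_block))
```

A low inter-chain PAE therefore gives a value close to 1, while pairs with a PAE of 12 Å or more do not contribute.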

### Plots

* As a first step, you can visualize the pLDDT, the PAE matrix, and the model scores. The `show_info()` function displays the model scores together with the pLDDT plot and the PAE matrix interactively.

<img src="https://raw.githubusercontent.com/samuelmurail/af_analysis/master/docs/source/_static/show_info.gif" alt="Interactive Visualization" width="100%" style="display: block; margin: auto;"/>

* Plot the MSA, pLDDT, and PAE:

```python
my_data.plot_msa()
my_data.plot_plddt([0,1])
best_model_index = my_data.df['ranking_confidence'].idxmax()
my_data.plot_pae(best_model_index)
```

* Show the 3D structure (`nglview` package required):

```python
my_data.show_3d(my_data.df['ranking_confidence'].idxmax())
```
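The examples above pick the model with the highest `ranking_confidence`. As a reminder of what this score usually means for complexes, here is a small sketch assuming the AlphaFold-Multimer convention, where the ranking confidence is `0.8 * ipTM + 0.2 * pTM`. The model names and score values are made up for illustration, and the helper `ranking_confidence` is hypothetical.

```python
def ranking_confidence(ptm, iptm):
    """Weighted score assumed from the AlphaFold-Multimer convention."""
    return 0.8 * iptm + 0.2 * ptm


# Made-up scores for three models.
models = [
    {"model": "model_1", "ptm": 0.62, "iptm": 0.48},
    {"model": "model_2", "ptm": 0.55, "iptm": 0.71},
    {"model": "model_3", "ptm": 0.80, "iptm": 0.30},
]

# Equivalent in spirit to `df['ranking_confidence'].idxmax()`.
best = max(models, key=lambda m: ranking_confidence(m["ptm"], m["iptm"]))
print(best["model"])
```

Because ipTM carries most of the weight, a model with a good interface can outrank one with a better overall pTM.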

## GUI

`af-analysis` includes an optional web-based graphical user interface (GUI) built with Flask. It allows you to load result folders, browse model tables, select models, and plot pLDDT and PAE interactively — without writing any code.

Model structures can also be visualized in 3D using `Mol*` directly in the browser. The GUI is designed to be user-friendly and accessible to researchers who may not be comfortable with command-line tools.

pDockQ2, the LIS score, and ipSAE can also be calculated directly from the GUI, so you can quickly assess model quality and decide which models to analyze further.


<img src="https://raw.githubusercontent.com/samuelmurail/af_analysis/master/docs/source/af_analysis_GUI.png" alt="AF Analysis GUI" width="100%" style="display: block; margin: auto;"/>

Install and launch the GUI with:

```bash
pip install "af-analysis[gui]"
af_analysis_gui
```

Then open `http://127.0.0.1:5000` in your browser.

## Dependencies

`af_analysis` requires the following dependencies:

* `pdb_cpp`
* `pandas`
* `numpy`
* `tqdm`
* `seaborn`
* `cmcrameri`
* `nglview`
* `ipywidgets`
* `mdanalysis`
* `scikit-learn`

as well as the optional dependencies for the GUI:

* `Flask`
* `plotly`

## Contributing

`af-analysis` is an open-source project and contributions are welcome. If
you find a bug or have a feature request, please open an issue on the GitHub
repository at https://github.com/samuelmurail/af_analysis. If you would like
to contribute code, please fork the repository and submit a pull request.

## Authors

* Alaa Reguei, Graduate Student - [Université Paris Cité](https://u-paris.fr).
* [Samuel Murail](https://samuelmurail.github.io/PersonalPage/), Associate Professor - [Université Paris Cité](https://u-paris.fr), [CMPLI](http://bfa.univ-paris-diderot.fr/equipe-8/), [RPBS platform](https://bioserv.rpbs.univ-paris-diderot.fr/).

See also the list of [contributors](https://github.com/samuelmurail/af_analysis/contributors) who participated in this project.

## Release a new package version - Only for maintainers

To release a new version of the package, follow these steps:

1. Commit the changes and push to GitHub:

```bash
git add .
git commit -m "Update of ..."
git push origin main
```

2. Update the version number using [bump-my-version](https://pypi.org/project/bump-my-version/):

```bash
bump-my-version bump <part>
```

where `<part>` is one of `major`, `minor`, or `patch` depending on the type of release.
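
To make the effect of each `<part>` concrete, the sketch below mimics the usual semantic-versioning rule for a plain `x.y.z` version string. It is an illustration only: the `bump` function is hypothetical, and the actual behavior of `bump-my-version` depends on the project configuration.

```python
def bump(version, part):
    """Return the next `x.y.z` version after bumping `part`.

    Assumption: bumping a part resets all lower parts to 0.
    """
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


for part in ("major", "minor", "patch"):
    print(part, bump("0.2.1", part))
```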

3. Commit the changes and push to GitHub:

```bash
git add .
git commit -m "Bump version to x.y.z"
git push origin main
```

4. Create the PyPI package and upload it:

```bash
make release
```

Remember that a valid `.pypirc` file must be present in your home directory with the correct credentials.

## Citing this work

If you use the code of this package, please cite:

- Reguei A and Murail S. Af-analysis: a Python package for Alphafold analysis. <br>
  Journal of Open Source Software (2025) doi: [10.21105/joss.07577](https://joss.theoj.org/papers/10.21105/joss.07577)

```bibtex
@Article{reguei_af-analysis_2025,
	title = {Af-analysis: a {Python} package for {Alphafold} analysis},
	volume = {10},
	issn = {2475-9066},
	shorttitle = {Af-analysis},
	url = {https://joss.theoj.org/papers/10.21105/joss.07577},
	doi = {10.21105/joss.07577},
	language = {en},
	number = {107},
	urldate = {2025-03-14},
	journal = {Journal of Open Source Software},
	author = {Reguei, Alaa and Murail, Samuel},
	month = mar,
	year = {2025},
	pages = {7577},
}
```

## License

This project is licensed under the GNU General Public License version 2 - see the `LICENSE` file for details.

# References

* Jumper et al. Nature (2021) doi: [10.1038/s41586-021-03819-2][AF2]
* Abramson et al. Nature (2024) doi: [10.1038/s41586-024-07487-w][AF3]
* Mirdita et al. Nature Methods (2022) doi: [10.1038/s41592-022-01488-1][ColabFold]
* Evans et al. bioRxiv (2021) doi: [10.1101/2021.10.04.463034][AF2-M]
* Bryant et al. Nat. Commun. (2022) doi: [10.1038/s41467-022-28865-w][pdockq]
* Zhu et al. Bioinformatics (2023) doi: [10.1093/bioinformatics/btad424][pdockq2]
* Kim et al. bioRxiv (2024) doi: [10.1101/2024.02.19.580970][LIS]
* Yu et al. Bioinformatics (2023) doi: [10.1093/bioinformatics/btac749][AlphaPulldown]
* Wohlwend et al. bioRxiv (2024) doi: [10.1101/2024.11.19.624167][Boltz1]
* Chai Discovery et al. bioRxiv (2024) doi: [10.1101/2024.10.10.615955v2][Chai1]
* Raouraoua et al. Nat. Comput. Sci. (2024) doi: [10.1038/s43588-024-00714-4][MassiveFold]
* Dunbrack. bioRxiv (2025) doi: [10.1101/2025.02.10.637595][ipsae]

[AF2]: https://www.nature.com/articles/s41586-021-03819-2 "Jumper et al. Nature (2021) doi: 10.1038/s41586-021-03819-2"
[AF3]: https://www.nature.com/articles/s41586-024-07487-w "Abramson et al. Nature (2024) doi: 10.1038/s41586-024-07487-w"
[ColabFold]: https://www.nature.com/articles/s41592-022-01488-1 "Mirdita et al. Nat Methods (2022) doi: 10.1038/s41592-022-01488-1"
[AF2-M]: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2 "Evans et al. bioRxiv (2021) doi: 10.1101/2021.10.04.463034"
[pdockq]: https://www.nature.com/articles/s41467-022-28865-w "Bryant et al. Nat Commun (2022) doi: 10.1038/s41467-022-28865-w"
[pdockq2]: https://academic.oup.com/bioinformatics/article/39/7/btad424/7219714 "Zhu et al. Bioinformatics (2023) doi: 10.1093/bioinformatics/btad424"
[LIS]: https://www.biorxiv.org/content/10.1101/2024.02.19.580970v1 "Kim et al. bioRxiv (2024) doi: 10.1101/2024.02.19.580970"
[AlphaPulldown]: https://doi.org/10.1093/bioinformatics/btac749 "Yu et al. Bioinformatics (2023) doi: 10.1093/bioinformatics/btac749"
[Boltz1]: https://doi.org/10.1101/2024.11.19.624167 "Wohlwend et al. bioRxiv (2024) doi: 10.1101/2024.11.19.624167"
[Chai1]: https://doi.org/10.1101/2024.10.10.615955v2 "Chai Discovery et al. bioRxiv (2024) doi: 10.1101/2024.10.10.615955v2"
[MassiveFold]: https://doi.org/10.1038/s43588-024-00714-4 "Raouraoua et al. Nat Comput Sci (2024) doi: 10.1038/s43588-024-00714-4"
[ipsae]: https://www.biorxiv.org/content/10.1101/2025.02.10.637595v1 "Dunbrack. bioRxiv (2025) doi: 10.1101/2025.02.10.637595"


## TODO

- Check that the pDockQ2 results match those of the original implementation.
- Do the same for LIS/LIA.
