Metadata-Version: 2.4
Name: af-analysis
Version: 0.1.5
Summary: `AF analysis` is a python library allowing analysis of Alphafold results.
Author-email: Samuel Murail <samuel.murail@u-paris.fr>
License: GPL-2.0
Project-URL: Homepage, https://github.com/samuelmurail/af_analysis
Keywords: AlphaFold2,ColabFold,Python,af_analysis
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2)
Classifier: Natural Language :: English
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Software Development
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pdb_numpy>=0.0.12
Requires-Dist: pandas>=1.3.4
Requires-Dist: numpy>=1.21
Requires-Dist: tqdm>=4.0
Requires-Dist: seaborn>=0.11
Requires-Dist: cmcrameri>=1.7
Requires-Dist: nglview>=3.0
Requires-Dist: ipywidgets>=7.6
Requires-Dist: mdanalysis>=2.4
Requires-Dist: scikit-learn
Dynamic: license-file

[![Documentation Status](https://readthedocs.org/projects/af-analysis/badge/?version=latest)](https://af-analysis.readthedocs.io/en/latest/?badge=latest)
[![codecov](https://codecov.io/gh/samuelmurail/af_analysis/graph/badge.svg?token=WOJYQKKOP7)](https://codecov.io/gh/samuelmurail/af_analysis)
[![Build Status](https://dev.azure.com/samuelmurailRPBS/af_analysis/_apis/build/status%2Fsamuelmurail.af_analysis?branchName=main)](https://dev.azure.com/samuelmurailRPBS/af_analysis/_build/latest?definitionId=2&branchName=main)
[![PyPI - Version](https://img.shields.io/pypi/v/af-analysis)](https://pypi.org/project/af-analysis/)
[![Downloads](https://static.pepy.tech/badge/af2-analysis)](https://pepy.tech/project/af2-analysis)
[![status](https://joss.theoj.org/papers/0c359e32dc2f159688848361530239f5/status.svg)](https://joss.theoj.org/papers/0c359e32dc2f159688848361530239f5)
[![License: GPL v2](https://img.shields.io/badge/License-GPL%20v2-blue.svg)](https://www.gnu.org/licenses/old-licenses/gpl-2.0.html)
[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/samuelmurail/af_analysis/blob/main/basic_example_colab.ipynb)
[![doi](https://zenodo.org/badge/DOI/10.5281/zenodo.14859764.svg)](https://doi.org/10.5281/zenodo.14859764)


# About Alphafold Analysis

<img src="https://raw.githubusercontent.com/samuelmurail/af_analysis/master/docs/source/logo.jpeg" alt="AF Analysis Logo" width="300" style="display: block; margin: auto;"/>

`af-analysis` is a Python package for the analysis of AlphaFold protein structure predictions.
This package is designed to simplify and streamline the process of working with protein structures
generated by:

* [AlphaFold 2][AF2]
* [AlphaFold 3][AF3]
* [ColabFold][ColabFold]
* [AlphaFold-Multimer][AF2-M]
* [AlphaPulldown][AlphaPulldown]
* [Boltz1][Boltz1]
* [Chai-1][Chai1]
* [MassiveFold][MassiveFold]


Source code repository:
   [https://github.com/samuelmurail/af_analysis](https://github.com/samuelmurail/af_analysis)

## Statement of Need

AlphaFold 2 and its derivatives have revolutionized protein structure prediction, achieving remarkable accuracy.
Analyzing the abundance of resulting structural models can be challenging and time-consuming.
Existing tools often require separate scripts for calculating various quality metrics (pDockQ, pDockQ2, LIS score) and assessing model diversity.
`af-analysis` addresses these challenges by providing a unified and user-friendly framework for in-depth analysis of AlphaFold 2 results.

## Main features

* Import AlphaFold or ColabFold prediction directories as pandas DataFrames for efficient data handling.
* Calculate and add additional structural quality metrics to the DataFrame, including:
  * pDockQ
  * pDockQ2
  * LIS score
* Visualize predicted protein models.
* Cluster generated models to identify diverse conformations.
* Select the best models based on defined criteria.
* Add your custom metrics to the DataFrame for further analysis.

## Installation

* `af-analysis` is available on PyPI and can be installed using `pip`:

```bash
pip install af_analysis
```

* You can install the latest version from the GitHub repository:

```bash
pip install git+https://github.com/samuelmurail/af_analysis.git@main
```

* `af-analysis` can also be installed from a local clone of the GitHub repository:

```bash
git clone https://github.com/samuelmurail/af_analysis
cd af_analysis
pip install .
```
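
After any of the installation routes above, a quick check from Python can confirm that the distribution is visible to the interpreter. This is a minimal sketch using only the standard library; the `installed_version` helper is a hypothetical name introduced for illustration and is not part of `af-analysis`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="af-analysis"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("af-analysis is not installed in this environment")
else:
    print(f"af-analysis version: {found}")
```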

## Documentation

The complete documentation is available at [ReadTheDocs](https://af-analysis.readthedocs.io/en/latest/).

* A notebook showing the basic usage of the `af_analysis` library can be found [here](https://af-analysis.readthedocs.io/en/latest/notebooks/basic_example.html).

* Alternatively, you can test it directly on Google Colab:

    [![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/samuelmurail/af_analysis/blob/main/basic_example_colab.ipynb)

## Usage

### Importing data

Create the `Data` object by giving the path of the directory that contains the results of the AlphaFold 2/ColabFold run.

```python
import af_analysis
my_data = af_analysis.Data('MY_AF_RESULTS_DIR')
```

Extracted data are available in the `df` attribute of the `Data` object. 

```python
my_data.df
```
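
Because `df` is a pandas DataFrame, the usual pandas operations can be used to rank or filter models. The sketch below uses a small hand-made table as a stand-in for `my_data.df`. Only the `ranking_confidence` column is named in this README; the `query` and `model` columns, the values, and the 0.6 threshold are assumptions chosen for illustration.

```python
import pandas as pd

# Toy stand-in for `my_data.df`. Only `ranking_confidence` appears in this
# README; the `query` and `model` columns are assumptions for illustration.
df = pd.DataFrame(
    {
        "query": ["complex_A"] * 3 + ["complex_B"] * 2,
        "model": [1, 2, 3, 1, 2],
        "ranking_confidence": [0.71, 0.84, 0.65, 0.42, 0.58],
    }
)

# Rank all models from most to least confident.
ranked = df.sort_values("ranking_confidence", ascending=False)

# Keep the single best-scoring row for each query.
best_rows = df.loc[df.groupby("query")["ranking_confidence"].idxmax()]

# Keep only the models above an arbitrary confidence threshold.
confident = df[df["ranking_confidence"] >= 0.6]

print(ranked)
print(best_rows)
print(confident)
```

The same pattern applies to any metric column added later, by replacing `ranking_confidence` with the name of that column.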

### Analysis

* The `analysis` module contains several functions to add metrics such as [pDockQ][pdockq] and [pDockQ2][pdockq2]:

```python
from af_analysis import analysis
analysis.pdockq(my_data)
analysis.pdockq2(my_data)
```
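
For intuition about what the pDockQ metric measures, the following sketch evaluates the sigmoid described by Bryant et al. on two precomputed inputs: the mean pLDDT of the interface residues and the number of interface contacts. It is not the implementation used by `af-analysis`. The function name `pdockq_sketch`, the handling of the no-contact case, and the constants (quoted from the paper as recalled here) are assumptions and should be checked against the publication.

```python
import math

# Sigmoid parameters as reported by Bryant et al. (2022). Treat them as an
# assumption and verify them against the paper before relying on them.
L_MAX, X0, K, B = 0.724, 152.611, 0.052, 0.018


def pdockq_sketch(mean_interface_plddt, n_interface_contacts):
    """Illustrative pDockQ from interface pLDDT and interface contact count."""
    if n_interface_contacts <= 0:
        # No interface detected: this sketch returns 0 by convention.
        return 0.0
    # Combine confidence and interface size into a single variable.
    x = mean_interface_plddt * math.log10(n_interface_contacts)
    # Map that variable to a bounded score with a fitted sigmoid.
    return L_MAX / (1.0 + math.exp(-K * (x - X0))) + B


# A confident, large interface scores higher than an uncertain one.
print(pdockq_sketch(90.0, 100))
print(pdockq_sketch(50.0, 100))
```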

### Docking Analysis

* The `docking` module contains several functions to add metrics such as the [LIS score][LIS]:

```python
from af_analysis import docking
docking.LIS_pep(my_data)
```

### Plots

* As a first step, you can visualize the pLDDT, the PAE matrix and the model scores. The `show_info()` function displays the model scores together with the pLDDT plot and the PAE matrix in an interactive way.

<img src="https://raw.githubusercontent.com/samuelmurail/af_analysis/master/docs/source/_static/show_info.gif" alt="Interactive Visualization" width="100%" style="display: block; margin: auto;"/>

* Plot the MSA, the pLDDT and the PAE matrix:

```python
my_data.plot_msa()
my_data.plot_plddt([0,1])
best_model_index = my_data.df['ranking_confidence'].idxmax()
my_data.plot_pae(best_model_index)
```

* Show the 3D structure (`nglview` package required):

```python
my_data.show_3d(my_data.df['ranking_confidence'].idxmax())
```

## Dependencies

`af_analysis` requires the following dependencies:

* `pdb_numpy`
* `pandas`
* `numpy`
* `tqdm`
* `seaborn`
* `cmcrameri`
* `nglview`
* `ipywidgets`
* `mdanalysis`
* `scikit-learn`

## Contributing

`af-analysis` is an open-source project and contributions are welcome. If
you find a bug or have a feature request, please open an issue on the GitHub
repository at https://github.com/samuelmurail/af_analysis. If you would like
to contribute code, please fork the repository and submit a pull request.

## Authors

* Alaa Reguei, Graduate Student - [Université Paris Cité](https://u-paris.fr).
* [Samuel Murail](https://samuelmurail.github.io/PersonalPage/), Associate Professor - [Université Paris Cité](https://u-paris.fr), [CMPLI](http://bfa.univ-paris-diderot.fr/equipe-8/), [RPBS platform](https://bioserv.rpbs.univ-paris-diderot.fr/).

See also the list of [contributors](https://github.com/samuelmurail/af_analysis/contributors) who participated in this project.

## Release a new package version

To release a new version of the package, follow these steps:

1. Commit the changes and push to GitHub:

```bash
git add .
git commit -m "Update of ..."
git push origin main
```

2. Update the version number using [bump-my-version](https://pypi.org/project/bump-my-version/):

```bash
bump-my-version bump <part>
```

where `<part>` is one of `major`, `minor`, or `patch` depending on the type of release.

3. Commit the changes and push to GitHub:

```bash
git add .
git commit -m "Bump version to x.y.z"
git push origin main
```

4. Build the PyPI package and upload it:

```bash
make release
```
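
For step 2, the effect of each `<part>` value can be pictured with the sketch below. It assumes a plain `major.minor.patch` version string and only illustrates the numbering convention; it is not how `bump-my-version` is implemented, and the `bump` helper is a hypothetical name.

```python
def bump(version, part):
    """Return `version` with the requested part incremented."""
    major, minor, patch = (int(number) for number in version.split("."))
    if part == "major":
        # A major bump resets the minor and patch numbers.
        return f"{major + 1}.0.0"
    if part == "minor":
        # A minor bump resets the patch number.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


for part in ("major", "minor", "patch"):
    print(part, bump("0.1.5", part))
```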

## Citing this work

If you use the code of this package, please cite:

- Reguei A and Murail S. Af-analysis: a Python package for Alphafold analysis. <br>
  Journal of Open Source Software (2025) doi: [10.21105/joss.07577](https://joss.theoj.org/papers/10.21105/joss.07577)

```bibtex
@Article{reguei_af-analysis_2025,
	title = {Af-analysis: a {Python} package for {Alphafold} analysis},
	volume = {10},
	issn = {2475-9066},
	shorttitle = {Af-analysis},
	url = {https://joss.theoj.org/papers/10.21105/joss.07577},
	doi = {10.21105/joss.07577},
	language = {en},
	number = {107},
	urldate = {2025-03-14},
	journal = {Journal of Open Source Software},
	author = {Reguei, Alaa and Murail, Samuel},
	month = mar,
	year = {2025},
	pages = {7577},
}
```

## License

This project is licensed under the GNU General Public License version 2 - see the `LICENSE` file for details.

# References

* Jumper et al. Nature (2021) doi: [10.1038/s41586-021-03819-2][AF2]
* Abramson et al. Nature (2024) doi: [10.1038/s41586-024-07487-w][AF3]
* Mirdita et al. Nature Methods (2022) doi: [10.1038/s41592-022-01488-1][ColabFold]
* Evans et al. bioRxiv (2021) doi: [10.1101/2021.10.04.463034][AF2-M]
* Bryant et al. Nat. Commun. (2022) doi: [10.1038/s41467-022-28865-w][pdockq]
* Zhu et al. Bioinformatics (2023) doi: [10.1093/bioinformatics/btad424][pdockq2]
* Kim et al. bioRxiv (2024) doi: [10.1101/2024.02.19.580970][LIS]
* Yu et al. Bioinformatics (2023) doi: [10.1093/bioinformatics/btac749][AlphaPulldown]
* Wohlwend et al. bioRxiv (2024) doi: [10.1101/2024.11.19.624167][Boltz1]
* Chai Discovery et al. bioRxiv (2024) doi: [10.1101/2024.10.10.615955v2][Chai1]
* Raouraoua et al. Nat. Comput. Sci. (2024) doi: [10.1038/s43588-024-00714-4][MassiveFold]

[AF2]: https://www.nature.com/articles/s41586-021-03819-2 "Jumper et al. Nature (2021) doi: 10.1038/s41586-021-03819-2"
[AF3]: https://www.nature.com/articles/s41586-024-07487-w "Abramson et al. Nature (2024) doi: 10.1038/s41586-024-07487-w"
[ColabFold]: https://www.nature.com/articles/s41592-022-01488-1 "Mirdita et al. Nat Methods (2022) doi: 10.1038/s41592-022-01488-1"
[AF2-M]: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2 "Evans et al. bioRxiv (2021) doi: 10.1101/2021.10.04.463034"
[pdockq]: https://www.nature.com/articles/s41467-022-28865-w "Bryant et al. Nat Commun (2022) doi: 10.1038/s41467-022-28865-w"
[pdockq2]: https://academic.oup.com/bioinformatics/article/39/7/btad424/7219714 "Zhu et al. Bioinformatics (2023) doi: 10.1093/bioinformatics/btad424"
[LIS]: https://www.biorxiv.org/content/10.1101/2024.02.19.580970v1 "Kim et al. bioRxiv (2024) doi: 10.1101/2024.02.19.580970"
[AlphaPulldown]: https://doi.org/10.1093/bioinformatics/btac749 "Yu et al. Bioinformatics (2023) doi: 10.1093/bioinformatics/btac749"
[Boltz1]: https://doi.org/10.1101/2024.11.19.624167 "Wohlwend et al. bioRxiv (2024) doi: 10.1101/2024.11.19.624167"
[Chai1]: https://doi.org/10.1101/2024.10.10.615955v2 "Chai Discovery et al. bioRxiv (2024) doi: 10.1101/2024.10.10.615955v2"
[MassiveFold]: https://doi.org/10.1038/s43588-024-00714-4 "Raouraoua et al. Nat Comput Sci (2024) doi: 10.1038/s43588-024-00714-4"
