Metadata-Version: 2.4
Name: pyXLMS
Version: 1.8.8
Summary: A python package to process protein cross-linking data.
Author-email: Micha Johannes Birklbauer <micha.birklbauer@fh-hagenberg.at>
Maintainer-email: Micha Johannes Birklbauer <micha.birklbauer@fh-hagenberg.at>
License: MIT License
        
        Copyright (c) 2024 Micha Johannes Birklbauer
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://hgb-bin-proteomics.github.io/pyXLMS
Project-URL: Documentation, https://hgb-bin-proteomics.github.io/pyXLMS-docs
Project-URL: Repository, https://github.com/hgb-bin-proteomics/pyXLMS.git
Project-URL: Issues, https://github.com/hgb-bin-proteomics/pyXLMS/issues
Keywords: crosslink,crosslinker,crosslinking,mass spectrometry,proteomics
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: openpyxl
Requires-Dist: tqdm
Requires-Dist: pyteomics[XML]
Requires-Dist: biopython
Requires-Dist: biopandas
Requires-Dist: matplotlib
Requires-Dist: matplotlib-venn
Requires-Dist: pyarrow
Requires-Dist: lxml
Requires-Dist: requests
Provides-Extra: gui
Requires-Dist: streamlit>=1.50.0; extra == "gui"
Requires-Dist: xlsxwriter; extra == "gui"
Requires-Dist: uv; extra == "gui"
Provides-Extra: docs
Requires-Dist: sphinx; extra == "docs"
Requires-Dist: pydata-sphinx-theme; extra == "docs"
Requires-Dist: myst-parser; extra == "docs"
Requires-Dist: sphinx-copybutton; extra == "docs"
Provides-Extra: dev
Requires-Dist: streamlit>=1.50.0; extra == "dev"
Requires-Dist: xlsxwriter; extra == "dev"
Requires-Dist: jupyterlab; extra == "dev"
Requires-Dist: sphinx; extra == "dev"
Requires-Dist: pydata-sphinx-theme; extra == "dev"
Requires-Dist: myst-parser; extra == "dev"
Requires-Dist: sphinx-copybutton; extra == "dev"
Requires-Dist: uv; extra == "dev"
Requires-Dist: ty; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: pyright; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Dynamic: license-file

# pyXLMS
_a python package to process protein cross-linking data_

**pyXLMS** is a python package and web application with a graphical user interface that aims to simplify and streamline the intermediate step of
connecting crosslink search engine results with down-stream analysis tools. It enables researchers, even those without bioinformatics knowledge, to
conduct in-depth crosslink analyses, shifting the focus from data transformation to data interpretation and biological
insight. Currently pyXLMS supports input from several crosslink search engines, including:
[MaxLynx (part of MaxQuant)](https://www.maxquant.org/),
[MeroX](https://www.stavrox.com/),
[MS Annika](https://github.com/hgb-bin-proteomics/MSAnnika),
[pLink 2 and pLink 3](http://pfind.ict.ac.cn/se/plink/),
[Scout](https://github.com/diogobor/Scout),
[xiSearch](https://www.rappsilberlab.org/software/xisearch/) and [xiFDR](https://www.rappsilberlab.org/software/xifdr/),
[XlinkX](https://docs.thermofisher.com/r/XlinkX-3.2-Quick-Start-Guide/),
as well as the [mzIdentML format](https://www.psidev.info/mzidentml)
of the HUPO Proteomics Standards Initiative, and a well-documented and
[human-readable custom tabular format](https://github.com/hgb-bin-proteomics/pyXLMS/blob/master/docs/format.md).
pyXLMS directly supports down-stream analysis with functionality such as validation, annotation, aggregation, filtering, and visualization of crosslink-spectrum-matches and crosslinks, and [much more](https://hgb-bin-proteomics.github.io/pyXLMS/modules.html). In addition, the data can easily be exported to the formats required by down-stream analysis tools such as
[AlphaLink2](https://github.com/Rappsilber-Laboratory/AlphaLink2),
[ProXL](https://www.yeastrc.org/proxl_public/),
[xiNET](https://crosslinkviewer.org/index.php),
[xiVIEW](https://www.xiview.org/index.php),
[xiFDR](https://www.rappsilberlab.org/software/xifdr/),
[XlinkDB](https://xlinkdb.gs.washington.edu/xlinkdb/),
[xlms-tools](https://gitlab.com/topf-lab/xlms-tools),
PyMOL (via [PyXlinkViewer](https://github.com/BobSchiffrin/PyXlinkViewer)),
ChimeraX (via [XMAS](https://github.com/ScheltemaLab/ChimeraX_XMAS_bundle)),
or [IMP-X-FDR](https://github.com/vbc-proteomics-org/imp-x-fdr).

## Installation

pyXLMS supports python **version 3.7 and greater**!

pyXLMS can easily be installed via pip:
```
pip install pyxlms
```
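
To confirm which version ended up in the active environment, the standard library's `importlib.metadata` can be queried. This is only a minimal sketch and not part of pyXLMS: the helper name `installed_version` is made up for illustration, and `importlib.metadata` requires python 3.8 or later (on python 3.7 the `importlib-metadata` backport would be needed instead).

```python
from importlib import metadata  # standard library, python 3.8+


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    version = installed_version("pyxlms")
    print("pyXLMS is not installed" if version is None else f"pyXLMS {version}")
```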

## Quick Start

After installation you can use pyXLMS in python like this:

_This example shows how to read MS Annika crosslink-spectrum-matches and export_
_them to xiFDR format for external validation._

```python
>>> import pyXLMS
>>> pr = pyXLMS.parser.read(
...     "data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx",
...     engine="MS Annika",
...     crosslinker="DSS"
... )
Reading MS Annika CSMs...: 100%|████████████████| 826/826 [00:00<00:00, 20731.70it/s]
>>> _ = pyXLMS.transform.summary(pr)
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.11
Maximum CSM score: 452.99
>>> _ = pyXLMS.exporter.to_xifdr(
...     pr["crosslink-spectrum-matches"],
...     filename="msannika_CSMs_for_xiFDR.csv"
... )
```
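
The same steps could be wrapped in a small reusable function, for example when several result files need to be converted. The following is a sketch under assumptions: it uses only the calls shown in the example above (`pyXLMS.parser.read` and `pyXLMS.exporter.to_xifdr`), while the helper names `xifdr_output_name` and `export_csms_for_xifdr` and the output naming scheme are hypothetical and not part of pyXLMS.

```python
from pathlib import Path


def xifdr_output_name(input_path):
    """Derive an output file name from the input file name (hypothetical naming scheme)."""
    return Path(input_path).stem + "_for_xiFDR.csv"


def export_csms_for_xifdr(input_path, engine, crosslinker):
    """Read crosslink-spectrum-matches and export them to xiFDR format."""
    import pyXLMS  # imported here so the naming helper works without pyXLMS installed

    # Same calls as in the quick start example above.
    pr = pyXLMS.parser.read(input_path, engine=engine, crosslinker=crosslinker)
    out_name = xifdr_output_name(input_path)
    pyXLMS.exporter.to_xifdr(pr["crosslink-spectrum-matches"], filename=out_name)
    return out_name
```

Calling `export_csms_for_xifdr("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx", engine="MS Annika", crosslinker="DSS")` would then write the export to the current working directory.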

_For python projects using pyXLMS we also provide a project [template](https://github.com/hgb-bin-proteomics/pyXLMS-template)!_

## Web App

The web app is publicly accessible for free via [hgb-bin-proteomics.github.io/pyXLMS-app](https://hgb-bin-proteomics.github.io/pyXLMS-app).

Additionally, it can be run locally or self-hosted as described here: [pyXLMS Web Application](https://github.com/hgb-bin-proteomics/pyXLMS/blob/master/gui/README.md).

## User Guide, Examples and Documentation

- A user guide that documents all available functionality is available via [hgb-bin-proteomics.github.io/pyXLMS-docs](https://hgb-bin-proteomics.github.io/pyXLMS-docs).
- Example jupyter notebooks can be found in `/examples`.
- Full documentation of the python package is available via [hgb-bin-proteomics.github.io/pyXLMS](https://hgb-bin-proteomics.github.io/pyXLMS).

## FAQ

Not sure if pyXLMS is what you are looking for? You can find a collection of common questions and answers about pyXLMS
[here](https://github.com/hgb-bin-proteomics/pyXLMS/discussions/categories/q-a) or in the
[user guide](https://hgb-bin-proteomics.github.io/pyXLMS-docs) under `Documentation` ➡️ `FAQ`.

## Limitations

Despite our best efforts, pyXLMS still comes with some limitations that are a direct result of the differences in the output formats of crosslink search engines.
Many crosslink search engines do not report any decoy matches, which makes validation in pyXLMS or export to xiFDR impossible. For this reason **we recommend**
**using validated results for pyXLMS**. Validation within pyXLMS is currently only supported for MaxLynx, MS Annika, and xiSearch.

Furthermore, the different down-stream
analysis tools require varying input information, which might not be consistently available from all crosslink search engines. Some of this can be mitigated by functionality
in pyXLMS such as annotation, or by passing additional information to pyXLMS that is needed for a successful export. Generally, the export to all down-stream analysis tools
should work for all crosslink search engines and input formats, with the exception of the export to xiFDR, which is limited to MaxLynx, MS Annika, and xiSearch for the reasons above.
As a safeguard, pyXLMS checks before the export that all required information is available and raises an error otherwise. For more information please check the specific
export pages in the [user guide](https://hgb-bin-proteomics.github.io/pyXLMS-docs) and the [documentation](https://hgb-bin-proteomics.github.io/pyXLMS).
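
In scripts that handle results from several search engines, the xiFDR restriction described above can be checked up front before attempting an export. The snippet below is a minimal sketch and not part of pyXLMS: the names `XIFDR_EXPORT_ENGINES` and `supports_xifdr_export` are hypothetical, and the engine names are taken from this README, so they are not necessarily the exact strings expected by the `engine` parameter of the parser.

```python
# Engines for which this README states that validation and xiFDR export are
# supported, stored in lower case for case-insensitive comparison.
XIFDR_EXPORT_ENGINES = frozenset({"maxlynx", "ms annika", "xisearch"})


def supports_xifdr_export(engine_name):
    """Return True if the README lists the engine as supported for xiFDR export."""
    return engine_name.strip().lower() in XIFDR_EXPORT_ENGINES


if __name__ == "__main__":
    for name in ("MS Annika", "MeroX"):
        print(name, supports_xifdr_export(name))
```

Such a check only mirrors the documented restriction. pyXLMS still performs its own validation of the required information at export time.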

The web app supports most of the features of the python package. Features that are not supported in the web app are listed in the [user guide](https://hgb-bin-proteomics.github.io/pyXLMS-docs)
under `Documentation` ➡️ `Web Application` ➡️ `Feature Support`.

Interacting with [STRING](https://string-db.org/) requires an active internet connection and depends on the service availability of STRING.

## Citing

If you are using pyXLMS please cite the following publication:

- Manuscript in preparation
  ```
  (wip)
  ```

## Acknowledgements

We thank Melanie Birklbauer for designing the logo.

## Contact

- [proteomics@fh-hagenberg.at](mailto:proteomics@fh-hagenberg.at)
- [micha.birklbauer@fh-hagenberg.at](mailto:micha.birklbauer@fh-hagenberg.at) (primary developer)
