Metadata-Version: 2.3
Name: MLinvitroTox
Version: 0.2.2
Summary: MLinvitroTox performs high-throughput hazard-based prioritization of high-resolution mass spectrometry data.
Author-email: Katarzyna Arturi <kasia.arturi@eawag.ch>, Lilian Gasser <lilian.gasser@sdsc.ethz.ch>, Matthias Meyer <matthias.meyer@sdsc.ethz.ch>, Eliza Harris <eliza.harris@sdsc.ethz.ch>
License-File: LICENSE
Requires-Python: >=3.8
Requires-Dist: cdk-pywrapper>=0.1.0
Requires-Dist: click>=8.1.7
Requires-Dist: filetype>=1.2.0
Requires-Dist: imbalanced-learn>=0.12.3
Requires-Dist: matplotlib>=3.9.0
Requires-Dist: numpy>=1.26.4
Requires-Dist: openbabel-wheel>=3.1.1.19
Requires-Dist: pandas>=2.2.2
Requires-Dist: plotly>=5.22.0
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: rdkit>=2023.9.6
Requires-Dist: scikit-learn>=1.5.0
Requires-Dist: scipy>=1.13.1
Requires-Dist: seaborn>=0.13.2
Requires-Dist: streamlit>=1.35.0
Requires-Dist: tqdm>=4.66.4
Requires-Dist: xgboost>=2.0.3
Description-Content-Type: text/markdown

# MLinvitroTox

MLinvitroTox performs high-throughput hazard-based prioritization of high-resolution mass spectrometry data.


## A. Project description

MLinvitroTox is an open-source Python package that provides a fully automated, high-throughput pipeline for hazard-driven prioritization of toxicologically relevant signals among the tens of thousands of signals commonly detected in complex environmental samples through nontarget high-resolution mass spectrometry (NTS HRMS/MS). It is a machine learning (ML) framework comprising 490 independent XGBoost classifiers trained on molecular fingerprints from chemical structures and target-specific endpoints from the ToxCast/Tox21 [invitroDBv4.1 database](https://www.epa.gov/comptox-tools/exploring-toxcast-data).

In contrast to classical approaches for ML-based toxicity prediction, MLinvitroTox predicts a bioactivity fingerprint for each unidentified HRMS feature (a distinct m/z ion) based on the molecular fingerprints derived from its MS2 fragmentation spectra, rather than from its chemical structure. The 490-bit binary bioactivity fingerprints serve as the basis for prioritizing HRMS features for further elucidation and analytical confirmation. This approach adds toxicological relevance to environmental analysis by focusing the time-consuming molecular identification efforts on the features most likely to cause adverse effects instead of the most intense ones.

MLinvitroTox enhances interpretability by providing the applicability domain, prediction probabilities, model accuracy, and the cumulative contribution of endpoints for mechanistic targets, as well as feature importance analysis. In addition to its core functionality of predicting bioactivity from molecular fingerprints derived from MS2 data, the full release of MLinvitroTox will also support:

- standardization of custom molecular structures
- generation of molecular fingerprints for custom molecular structures
- prediction of bioactivity from structures given as [SMILES](https://archive.epa.gov/med/med_archive_03/web/html/smiles.html)
- validation of SIRIUS' accuracy in predicting molecular fingerprints
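
To make the prioritization idea from the project description concrete, the following is a minimal sketch that ranks features by the number of active bits in their bioactivity fingerprint. It is an illustration under stated assumptions, not the package's API: the function `rank_features`, the feature names, and the toy data are hypothetical, and the actual pipeline may also weigh prediction probabilities, the applicability domain, or endpoint groupings.

```python
import numpy as np


def rank_features(bioactivity, feature_ids):
    """Rank features by their number of predicted active endpoints.

    bioactivity: array of shape (n_features, n_endpoints) holding 0/1 values.
    feature_ids: one label per row of `bioactivity`.
    Hypothetical helper for illustration; not part of MLinvitroTox.
    """
    active_counts = bioactivity.sum(axis=1)
    # Sort descending by count; a stable sort keeps ties in input order.
    order = np.argsort(-active_counts, kind="stable")
    return [(feature_ids[i], int(active_counts[i])) for i in order]


# Toy data (assumption): 3 features x 490 endpoints, filled by hand.
fingerprints = np.zeros((3, 490), dtype=int)
fingerprints[0, :5] = 1      # 5 active endpoints
fingerprints[1, :120] = 1    # 120 active endpoints
fingerprints[2, :40] = 1     # 40 active endpoints

ranking = rank_features(fingerprints, ["feat_a", "feat_b", "feat_c"])
print(ranking)  # [('feat_b', 120), ('feat_c', 40), ('feat_a', 5)]
```

Counting active bits is the simplest possible score; it treats all 490 endpoints as equally important, which a real prioritization would likely refine.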


## B. Getting started

Currently, the package is available only on PyPI and can be installed as follows:

```
pip install mlinvitrotox
```
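
After installing, one way to confirm that the package is visible in the active environment is to query its installed version with the standard library. This is a small sketch; the helper `installed_version` is a hypothetical name introduced here, and it relies only on `importlib.metadata`, which ships with Python 3.8 and later.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("mlinvitrotox")
if version is None:
    print("mlinvitrotox is not installed in this environment")
else:
    print(f"mlinvitrotox {version} is installed")
```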


## C. Example / Usage

Have a look at the [tutorial](https://renkulab.io/projects/expectmine/mlinvitrotox-tutorial). 

MLinvitroTox works with SIRIUS output up to [v5.8.6](https://github.com/bright-giant/sirius/releases/tag/v5.8.6). Output from the latest SIRIUS release, v6.0.4, is not yet supported (work in progress).
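
If you want to guard a workflow against unsupported SIRIUS output, a simple version comparison along these lines may help. This is a sketch only: `is_supported_sirius` and `parse_version` are hypothetical helpers, not part of MLinvitroTox, and how you obtain the SIRIUS version string for your data is left to you.

```python
# Highest SIRIUS version stated as supported in this README.
MAX_SUPPORTED_SIRIUS = (5, 8, 6)


def parse_version(text):
    """Turn a string such as 'v5.8.6' or '6.0.4' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_supported_sirius(version):
    """Return True if `version` is at or below the supported maximum."""
    return parse_version(version) <= MAX_SUPPORTED_SIRIUS


print(is_supported_sirius("v5.8.6"))  # True
print(is_supported_sirius("6.0.4"))   # False
```

The parser assumes purely numeric, dot-separated version strings; pre-release suffixes would need extra handling.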


## D. Development

If you are interested in the project and the package, please reach out to <lilian.gasser@sdsc.ethz.ch>.


## References
- Arturi et al. (2024) "MLinvitroTox reloaded for high-throughput hazard-based prioritization of HRMS data." (In preparation).
- Arturi, Katarzyna, and Juliane Hollender. "Machine learning-based hazard-driven prioritization of features in nontarget screening of environmental high-resolution mass spectrometry data." Environmental Science & Technology 57, no. 46 (2023): 18067-18079.
- Dührkop, Kai, Markus Fleischauer, Marcus Ludwig, Alexander A. Aksenov, Alexey V. Melnik, Marvin Meusel, Pieter C. Dorrestein, Juho Rousu, and Sebastian Böcker. "SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information." Nature Methods 16, no. 4 (2019): 299-302.
- Abedini, Jaleh, Bethany Cook, Shannon Bell, Xiaoqing Chang, Neepa Choksi, Amber B. Daniel, David Hines et al. "Application of new approach methodologies: ICE tools to support chemical evaluations." Computational Toxicology 20 (2021): 100184.
- Richard, Ann M., Richard S. Judson, Keith A. Houck, Christopher M. Grulke, Patra Volarath, Inthirany Thillainadarajah, Chihae Yang et al. "ToxCast chemical landscape: paving the road to 21st century toxicology." Chemical Research in Toxicology 29, no. 8 (2016): 1225-1251.
