Metadata-Version: 2.4
Name: deephalo
Version: 1.0.0
Summary: DeepHalo: A deep learning-integrated workflow for high-throughput discovery of halogenated metabolites from HRMS data.
Author-email: Yunying Xie <xieyy@imb.pumc.edu.cn>, Xin Qi <xq75@163.com>
License-Expression: MIT
Classifier: Programming Language :: Python :: 3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas==2.0.3
Requires-Dist: numpy==1.22.0
Requires-Dist: molmass==2023.8.30
Requires-Dist: scikit-learn==1.3.1
Requires-Dist: tensorflow==2.10.1
Requires-Dist: keras==2.10.0
Requires-Dist: keras_tuner==1.4.6
Requires-Dist: matplotlib==3.8.0
Requires-Dist: pyopenms==3.1.0
Requires-Dist: scipy==1.11.4
Requires-Dist: tomli==2.0.1
Requires-Dist: tomli-w==1.0.0
Requires-Dist: importlib_resources==6.4.0
Requires-Dist: mzml2gnps==1.0.3
Requires-Dist: networkx==3.4.2
Requires-Dist: typer==0.15.1
Dynamic: license-file

# DeepHalo

**A deep learning-integrated workflow for high-throughput discovery of halogenated metabolites from HRMS data.**

---

## Core Features

### 1. Halogen Prediction
- **Element Prediction Model (EPM)**
  - Dual-branch Isotope Neural Network (IsoNN) architecture
  - High-accuracy Cl/Br detection (>99.9% precision in benchmark results)
  - Wide mass range coverage (50-2000 Da)
  - Robust to interference from B/Se/Fe and dehydro isomers
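
Cl/Br detection from isotope patterns is possible because the heavy isotopes 37Cl and 81Br are unusually abundant, so halogenated ions show strong M+2 peaks. The sketch below is not DeepHalo's EPM. It only computes the idealized halogen contribution to an isotope pattern with a binomial expansion, assuming approximate natural abundances and ignoring contributions from C, H, N, O, and S. The function name `halogen_pattern` is hypothetical.

```python
from math import comb

# Approximate natural abundances of the heavy isotopes (37Cl, 81Br).
HEAVY_ABUNDANCE = {"Cl": 0.2424, "Br": 0.4931}


def halogen_pattern(element: str, count: int) -> list[float]:
    """Relative intensities of M, M+2, M+4, ... for `count` atoms of one halogen.

    Intensities are normalized so that the most intense peak equals 1.0.
    """
    p = HEAVY_ABUNDANCE[element]
    raw = [comb(count, k) * p**k * (1 - p) ** (count - k) for k in range(count + 1)]
    top = max(raw)
    return [x / top for x in raw]


if __name__ == "__main__":
    for element, count in [("Cl", 1), ("Cl", 2), ("Br", 1), ("Br", 2)]:
        pattern = ", ".join(f"{x:.2f}" for x in halogen_pattern(element, count))
        print(f"{element}{count}: {pattern}")
```

With these abundances, a single Cl gives an M+2 peak at roughly one third of M, while a single Br gives M and M+2 peaks of nearly equal height.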

### 2. Isotope Pattern Validation
- **Dual Validation System**
  - Mass Dimension: Statistical rule-based correction.
  - Intensity Dimension: Autoencoder-based Anomaly Detection Model (ADM).

### 3. Multi-Level Halogen Confidence Scoring (H-score)
- **Dual Levels**
  - Prediction based on centroid-level isotope patterns
  - Prediction based on scan-level isotope patterns
  - H-score integration for a comprehensive assessment across both levels

### 4. Enhanced Dereplication
- **Dual-Strategy Approach**
  - MS1-Based Dereplication Using Custom Database Matching
    - Exact mass analysis
    - Halogen presence verification
    - Isotope intensity similarity scoring
  - MS2-Based Dereplication by Integrating GNPS
    - MS2 molecular networking
    - Halogenated compound annotation
    - GraphML file enhancement
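
The three MS1 checks listed above (exact mass, halogen presence, isotope intensity similarity) can be sketched as a simple filter. This is an illustration under stated assumptions, not DeepHalo's implementation: the dictionary keys (`mz`, `has_halogen`, `isotopes`, `name`), the 5 ppm tolerance, the 0.9 similarity cutoff, and the use of cosine similarity are all hypothetical choices.

```python
from math import sqrt


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equally long intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_candidates(feature, database, tol_ppm=5.0, min_similarity=0.9):
    """Return (name, score) pairs for database entries consistent with one MS1 feature.

    `feature` and each database entry are dicts with hypothetical keys:
    "mz", "has_halogen", and "isotopes" (relative intensities of M, M+1, ...).
    Database entries additionally carry a "name".
    """
    hits = []
    for entry in database:
        if abs(ppm_error(feature["mz"], entry["mz"])) > tol_ppm:
            continue  # exact mass does not match within tolerance
        if entry["has_halogen"] != feature["has_halogen"]:
            continue  # halogen presence disagrees with the prediction
        score = cosine_similarity(feature["isotopes"], entry["isotopes"])
        if score >= min_similarity:
            hits.append((entry["name"], round(score, 3)))
    # Best-scoring candidates first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```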
---

## Technical Advantages

- **High Throughput**
  - End-to-end automated analysis
  - Batch processing of any number of LC-MS/MS datasets
  - Rapid processing (several to dozens of seconds per sample) on standard hardware (Core i9, 16 GB RAM)

- **High Accuracy**
  - >98.3% precision in halogen detection
  - Comprehensive validation across both simulated and experimental LC-MS datasets

- **Comprehensive Integration**
  - Input: Supports `.mzML` format
  - Output: Cytoscape-compatible network files
  - Seamless integration with GNPS molecular networking

- **Enhanced Dereplication**
  - Embeds halogen prediction results into GNPS output GraphML files
  - Significantly higher dereplication rate than molecular networking alone
---

## Target Applications
- Natural product discovery  
- Halogenated metabolite annotation  

---

## Key Differentiators
1. Deep learning-based halogen prediction resistant to interference from Fe/dehydro isomers
2. First isotope pattern validation strategies specific to halogenated molecules
3. Hierarchical halogen scoring system (H-score)
4. Comprehensive dereplication workflow
5. Enhanced GNPS molecular networking

---

*For methodology details and validation datasets, see [Methods](#).*  

## Where to get it?
The source code is hosted on GitHub at: https://github.com/xieyying/DeepHalo

Binary installers for DeepHalo are available from the Python Package Index (PyPI).

## Dependencies
- pandas == 2.0.3
- numpy == 1.22.0
- molmass == 2023.8.30
- scikit-learn == 1.3.1
- tensorflow == 2.10.1
- keras == 2.10.0
- keras_tuner == 1.4.6
- matplotlib == 3.8.0
- pyopenms == 3.1.0
- scipy == 1.11.4
- tomli == 2.0.1
- tomli-w == 1.0.0
- importlib_resources == 6.4.0
- mzml2gnps == 1.0.3
- networkx == 3.4.2
- typer == 0.15.1

## Installation


**Note**  
Python 3.10 is required. Verify your Python version with:  
```bash
python --version
```

### Install from PyPI
```bash
pip install DeepHalo
```

### Install from Local Wheel
```bash
pip install path/to/DeepHalo-xxx.whl
```

### Install from Source
```bash
git clone https://github.com/xieyying/DeepHalo.git
cd DeepHalo
pip install -e .
```

## Quickstart
### High-throughput Detection of Halogenated Compounds
```bash
halo detect -i /path/to/mzml_files -o /path/to/output_directory -ms2
```
### Dereplication
```bash
halo dereplicate -o /path/to/output_directory -g /path/to/GNPS_results -ud /path/to/custom_database.csv
```
## Full Usage Guide
### Get help
```bash
halo --help                 # Show all commands
halo detect --help    # Detailed parameters for the subcommand 'detect'
halo dereplicate --help  # Detailed parameters for the subcommand 'dereplicate'
```
### Main Functions

- **Analyze mzML file:**
    ```bash
    halo detect -i <input_path> -o <project_path> [-c <config_file>] [-b <blank_samples_dir>] [-ob] [-ms2]
    ```
- **Dereplication:** 
    ```bash
    halo dereplicate -o <project_path> [-g <GNPS_folder>] [-ud <user_database.csv>]
    ```
- **Create training dataset:** 
    ```bash
    halo create-ds <project_path> [-c <config_file>]
    ```
- **Train model:** 
    ```bash
    halo train <project_path> [-c <config_file>] [-m search]
    ```
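
When many datasets need to be processed from a script, the `halo detect` invocation shown above can be assembled programmatically. The sketch below uses only the options listed in this guide (`-i`, `-o`, `-c`, `-b`, `-ob`, `-ms2`). The helper names, the parameter names, and the one-subdirectory-per-dataset layout are hypothetical; see `halo detect --help` for what each flag does.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_detect_command(
    input_path: str,
    project_path: str,
    config_file: Optional[str] = None,
    blank_samples_dir: Optional[str] = None,
    use_ob: bool = False,
    use_ms2: bool = False,
) -> list[str]:
    """Assemble the argument list for `halo detect` from the documented options."""
    cmd = ["halo", "detect", "-i", input_path, "-o", project_path]
    if config_file is not None:
        cmd += ["-c", config_file]
    if blank_samples_dir is not None:
        cmd += ["-b", blank_samples_dir]
    if use_ob:
        cmd.append("-ob")
    if use_ms2:
        cmd.append("-ms2")
    return cmd


def run_detect_for_each(dataset_root: str, output_root: str, **options) -> None:
    """Run `halo detect` once per subdirectory of `dataset_root` (hypothetical layout)."""
    for dataset in sorted(Path(dataset_root).iterdir()):
        if not dataset.is_dir():
            continue
        project = str(Path(output_root) / dataset.name)
        cmd = build_detect_command(str(dataset), project, **options)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.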

To modify configuration parameters, edit the config file ([download it here](https://github.com/xieyying/DeepHalo/tree/main/DeepHalo/config.toml)) and override the default settings by passing it with the `-c` option:
```bash
-c <user_config_file>
```
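
Before running with an edited config, it can help to confirm which settings actually differ from the defaults. The sketch below is a generic TOML comparison, not part of DeepHalo: the helper names are hypothetical and nothing is assumed about the keys inside `config.toml`. It uses `tomllib` on Python 3.11+ and falls back to `tomli`, which is among the listed dependencies, on Python 3.10.

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib  # standard library from Python 3.11
else:
    import tomli as tomllib  # backport with the same API


def load_config(path: str) -> dict:
    """Parse a TOML config file into a nested dict."""
    with open(path, "rb") as handle:  # TOML parsers require binary mode
        return tomllib.load(handle)


def flatten(config: dict, prefix: str = "") -> dict:
    """Flatten nested tables into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def changed_settings(default_path: str, user_path: str) -> dict:
    """Map each dotted key whose value differs to a (default, user) pair."""
    default = flatten(load_config(default_path))
    user = flatten(load_config(user_path))
    return {k: (default.get(k), v) for k, v in user.items() if default.get(k) != v}
```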
*See documentation for more applications.*

## License
This code repository is licensed under the [MIT License](LICENSE).
