Metadata-Version: 2.4
Name: filterzyme
Version: 0.0.6
Home-page: https://github.com/HelenSchmid/Filterzyme
Author: Helen Schmid
Author-email: schmid.helen2@gmail.com
License: GPL3
Project-URL: Bug Tracker, https://github.com/HelenSchmid/Filterzyme/issues
Project-URL: Documentation, https://github.com/HelenSchmid/Filterzyme
Project-URL: Source Code, https://github.com/HelenSchmid/Filterzyme
Keywords: util
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: tqdm
Requires-Dist: biopython
Requires-Dist: biotite
Requires-Dist: matplotlib
Requires-Dist: seaborn
Requires-Dist: rdkit
Requires-Dist: freesasa
Requires-Dist: plip
Requires-Dist: enzymetk
Requires-Dist: docko
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-dist

# Filterzyme

Structural filtering pipeline using docking and active site heuristics to prioritze ML-predicted enzyme variants for experimental validation. 
This tool processes superimposed ligand poses and filters them using geometric criteria such as distances, angles, and optionally, esterase-specific filters or nucleophilic proximity.

---

## Features

- Analysis of enzyme-ligand docking using multiple docking tools (ML- and physics-based).
- Optional catalytic nucleophile-focused analysis for esterases or other enzymes with nucleophilic catalytic residues. 
- User-friendly pipeline only using a .pkl file as input and ligand smile strings.

---

## Installation

## Environment Setup
### Using conda
```bash
conda env create -f environment.yml
conda activate filterpipeline
```

### Clone the repository
```bash
git clone https://github.com/HelenSchmid/EnzymeStructuralFiltering.git
cd EnzymeStructuralFiltering
pip install .
```

### Coming soon: Install via pip
```bash
pip install enzyme-filtering-pipline
```

## Usage Example

The input pandas **DataFrame** must include:  
- `Entry` – unique identifier for each enzyme  
- `Sequence` – amino acid sequence of the enzyme
- `substrate_name` – name of the substrate
- `substrate_smiles` – SMILES string of substrate e.g. MEHP "CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)O"
- `substrate_moiety` – SMARTS pattern to define chemical moiety of interest within substrate e.g. general ester SMARTS "[C](=O)(O)(O)"

If cofactors are included, add:
- `cofactor_name` – name of the cofactor
- `cofactor_smiles` – SMILES string of cofactor e.g. PLP "CC1=NC=C(C(=C1O)C=O)COP(=O)(O)O" 
- `cofactor_moiety` – SMARTS pattern to define chemical moiety of interest within the cofactor 


```python
from filterzyme.pipeline import Pipeline
import pandas as pd

df = pd.read_pickle("example_df.pkl")

pipeline = Pipeline(
        df = df,
        ligand_name="TPP",
        max_matches=1000,                # number of matches during substructure SEARCH
        esterase=0,                      # 1 if interested specifically in esterases
        find_closest_nuc=1,
        num_threads=1,
        skip_catalytic_residue_prediction = False,
        alternative_structure_for_vina = 'Chai', 
        squidly_dir='/nvme2/helen/EnzymeStructuralFiltering/filterzyme/squidly_final_models/',
        base_output_dir="pipeline_output"
    )

pipeline.run()
```
