Metadata-Version: 2.3
Name: chemged
Version: 0.1.1
Summary: Package calculating the approximate graph edit distance between chemicals
License: MIT
Author: James Wellnitz
Author-email: wellnitz.james@gmail.com
Maintainer: James Wellnitz
Maintainer-email: wellnitz.james@gmail.com
Requires-Python: >=3.11,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: networkx (>=3.5,<4.0)
Requires-Dist: numpy (>=2.0.0,<3.0.0)
Requires-Dist: rdkit (>=2024.9.5,<2025.0.0)
Requires-Dist: scipy (>=1.15.3,<2.0.0)
Requires-Dist: tqdm (>=4.66.4,<5.0.0)
Project-URL: Repository, https://github.com/molecularmodelinglab/ChemGED
Description-Content-Type: text/markdown

# ChemGED

ChemGED is a Python package for enabling the appoximate graph edit distance (GED) computation between
chemicals. Normally, GED is a NP-hard problem, but ChemGED uses heuristics to approximate the GED in
a reasonable time.

## Installation
You can install ChemGED using pip:

```bash
pip install chemged
```

## Usage
To use ChemGED, you just create an ``ApproximateChemicalGED`` object, and then call the
``compute_ged`` method with two chemical structures. They can be SMILES *or*
[RDKit](https://www.rdkit.org/docs/index.html) Mol objects.

```python
from chemged import ApproximateChemicalGED
ged_calc = ApproximateChemicalGED()

chemical1 = "CCO"  # SMILES of the first chemical
chemical2 = "CCN"  # SMILES of the second chemical

# you can use SMILES strings directly
approx_ged = ged_calc.compute_ged(chemical1, chemical2)

# you can also use RDKit Mol objects
from rdkit.Chem import MolFromSmiles
approx_ged = ged_calc.compute_ged(MolFromSmiles(chemical1), MolFromSmiles(chemical2))
```
ChemGED also implements ``pdist`` and ``cdist`` functions to compute pairwise distances between
sets of chemicals. These will return as Numpy arrays.

> [!NOTE]
> ``pdist`` will return the vector-form distance vector, while ``cdist`` will return a
> square-form distance matrix. ``scipy.spatial.distance.squareform`` can be used to convert
> the vector-form distance vector to a square-form distance matrix.

```python
from chemged import pdist, cdist

# Create a list of chemicals
chemicals = ["CCO", "CCN", "CC", "C"]

# Compute pairwise distances
distances_vector = pdist(chemicals)
print(distances_vector)  # Vector form

# Compute all-vs-all distances
distances_matrix = cdist(chemicals, chemicals)
print(distances_matrix)  # Square form
```

## Documentation
You can read more detailed documentation in the docs folder.

## Implementation
The approach used here uses bipartite graph matching[[1]](#1), and most of its implementation in python
is based off scripts from https://github.com/priba/aproximated_ged/tree/master.
ChemGED uses [RDKit](https://www.rdkit.org/docs/index.html) to handle chemicals inside python.

## References
<a id="1">[1]</a>
Riesen, Kaspar, and Horst Bunke.
"Approximate graph edit distance computation by means of bipartite graph matching."
Image and Vision computing 27.7 (2009): 950-959

