Metadata-Version: 2.4
Name: oefp
Version: 0.2.4
Summary: Improved fingerprints for the OpenEye Toolkits
Keywords: chemistry,openeye,swig
Author-Email: Scott Arne Johnson <scott.johnson@bms.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: C++
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Requires-Dist: openeye-toolkits>=2025.2
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: mypy>=1.8; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: rdkit>=2024.9; extra == "dev"
Requires-Dist: ruff>=0.6; extra == "dev"
Requires-Dist: vrzn>=0.1.0; extra == "dev"
Description-Content-Type: text/markdown

# OEFP

High-performance molecular fingerprints for the [OpenEye Toolkits](https://www.eyesopen.com/).

OEFP generates RDKit-compatible Morgan and Atom Pair fingerprints from OpenEye
molecules, stores them in compact C++ containers, and compares them with fast
scalar and batch kernels. Python bindings are built with SWIG, so
`openeye.oechem` molecules pass directly into C++ without serialization.

OEFP currently supports dense binary, sparse binary, and sparse counted
fingerprint containers; scalar comparison; query-to-batch comparison; `cdist`;
and SciPy-compatible condensed `pdist`.

Try it out:

```bash
pip install oefp
```

## Usage

The examples below cover fingerprint generation, comparison, and OpenEye
interoperability from Python, followed by a minimal C++ program.

### Python

```python
from openeye import oechem
import oefp

mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "CC(=O)OC1=CC=CC=C1C(=O)O")  # aspirin

# Generate an RDKit-compatible Morgan fingerprint.
fp = oefp.morgan_fingerprint(mol, radius=2, num_bits=2048)
print(fp.popcount)
print(fp.words[:4])

# Compare fingerprints.
score = oefp.compare(fp, fp, oefp.Metric.tanimoto())
print(score)
```
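
For reference, the Tanimoto coefficient on binary fingerprints is the number of
bits set in both fingerprints divided by the number set in either. The sketch
below computes it in pure Python over lists of integer words. It illustrates the
metric only and is not OEFP's kernel; the helper names (`popcount`, `tanimoto`)
and the example word values are hypothetical. Scoring two empty fingerprints as
0.0 is also an assumption, since libraries differ on that convention.

```python
def popcount(words):
    """Count the set bits across a sequence of non-negative integer words."""
    return sum(bin(w).count("1") for w in words)


def tanimoto(words_a, words_b):
    """Tanimoto similarity |A & B| / |A | B| for equal-length word sequences."""
    if len(words_a) != len(words_b):
        raise ValueError("fingerprints must have the same number of words")
    both = sum(bin(a & b).count("1") for a, b in zip(words_a, words_b))
    either = sum(bin(a | b).count("1") for a, b in zip(words_a, words_b))
    # Assumption: two empty fingerprints score 0.0; conventions vary.
    return both / either if either else 0.0


# Hypothetical two-word fingerprints, not real OEFP output.
fp_a = [0b1011, 0b0001]
fp_b = [0b0011, 0b0101]
print(tanimoto(fp_a, fp_b))  # 3 shared bits out of 5 set in either
```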

Use reusable generators when applying the same options to many molecules:

```python
from openeye import oechem
import oefp

smiles = ["c1ccccc1", "c1ccc(O)cc1", "CC(=O)O"]
mols = []
for smi in smiles:
    mol = oechem.OEGraphMol()
    oechem.OESmilesToMol(mol, smi)
    mols.append(mol)

generator = oefp.MorganGenerator(radius=2, num_bits=2048)
fps = [generator.fingerprint(mol) for mol in mols]

batch = oefp.OEFPBatch.from_fingerprints(fps)
distances = oefp.pdist(batch, oefp.Metric.jaccard())
```
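
SciPy's condensed form stores only the upper triangle of the distance matrix,
row by row, so `n` fingerprints produce `n * (n - 1) / 2` values. The helper
below is a small sketch of how to locate a pair in that layout, following the
index formula in SciPy's `pdist` documentation. `condensed_index` and
`condensed_length` are hypothetical names, not part of `oefp`, and the sketch
assumes `oefp.pdist` orders its output exactly as SciPy does.

```python
def condensed_length(n):
    """Number of entries in a condensed distance vector for n items."""
    return n * (n - 1) // 2


def condensed_index(i, j, n):
    """Position of the (i, j) distance in a SciPy-style condensed vector."""
    if i == j:
        raise ValueError("the diagonal is not stored in condensed form")
    if not (0 <= i < n and 0 <= j < n):
        raise IndexError("pair index out of range")
    if i > j:
        i, j = j, i  # distances are symmetric; use the upper triangle
    return n * i + j - ((i + 2) * (i + 1)) // 2


# Three fingerprints give three pairwise distances: (0,1), (0,2), (1,2).
n = 3
print(condensed_length(n))
print([condensed_index(i, j, n) for i, j in [(0, 1), (0, 2), (1, 2)]])
```

When SciPy is installed, `scipy.spatial.distance.squareform` converts a
condensed array into the full symmetric matrix.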

Generate sparse and counted fingerprints:

```python
from openeye import oechem
import oefp

mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "CC(=O)OC1=CC=CC=C1C(=O)O")  # aspirin

folded_count = oefp.morgan_count_fingerprint(mol)
sparse_binary = oefp.morgan_sparse_fingerprint(mol)
atom_pair_count = oefp.atom_pair_sparse_count_fingerprint(mol)

print(folded_count.indices[:5])
print(folded_count.counts[:5])
print(sparse_binary.indices[:5])
print(atom_pair_count.total_count)
```
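
The difference between the sparse and folded outputs can be pictured as
follows. This is a conceptual sketch only: it assumes folding maps each sparse
feature identifier to `identifier % num_bits` and sums the counts of
identifiers that collide, which is how RDKit-style folding is commonly
described. OEFP's actual hashing and folding are not shown here, and the
function `fold_counts` and the feature identifiers are hypothetical.

```python
from collections import Counter


def fold_counts(sparse_counts, num_bits):
    """Fold {feature_id: count} into {bit_index: count} by modulo."""
    folded = Counter()
    for feature_id, count in sparse_counts.items():
        folded[feature_id % num_bits] += count
    return dict(sorted(folded.items()))


# Hypothetical sparse counted fingerprint: feature id -> occurrence count.
sparse = {5: 2, 2053: 1, 4100: 3}
folded = fold_counts(sparse, num_bits=2048)
print(folded)          # ids 5 and 2053 collide on bit 5, so their counts add
print(sorted(folded))  # on-bit indices of the corresponding folded binary form
```

Folding trades a fixed, compact width for occasional collisions; the sparse
outputs keep every feature identifier distinct.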

Inspect Morgan bit provenance:

```python
# `mol` is any parsed OEGraphMol, such as the ones built in the examples above.
result = oefp.morgan_fingerprint_with_mapping(mol)
print(result.fingerprint.popcount)
print(result.mapping.bit_info())
```

Import and export OpenEye fingerprints:

```python
from openeye import oechem, oegraphsim
import oefp

mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "CCO")

oe_fp = oegraphsim.OEFingerPrint()
oegraphsim.OEMakeCircularFP(oe_fp, mol)

fp = oefp.from_openeye_fingerprint(oe_fp)
round_tripped = oefp.to_openeye_fingerprint(fp)
print(oegraphsim.OETanimoto(oe_fp, round_tripped))
```

### C++

```cpp
#include <oefp/oefp.h>
#include <oechem.h>
#include <iostream>

int main() {
    OEChem::OEGraphMol mol_a;
    OEChem::OEGraphMol mol_b;
    OEChem::OESmilesToMol(mol_a, "c1ccccc1");
    OEChem::OESmilesToMol(mol_b, "c1ccc(O)cc1");

    OEFP::MorganGenerator generator;
    OEFP::OEFP fp_a = generator.Fingerprint(mol_a);
    OEFP::OEFP fp_b = generator.Fingerprint(mol_b);

    double score = OEFP::Compare(fp_a, fp_b, OEFP::Metric::Tanimoto());
    std::cout << score << "\n";

    return 0;
}
```

## Supported Fingerprints

| Family | Outputs | Notes |
|--------|---------|-------|
| Morgan | Folded binary, folded count, sparse binary, sparse count | Bit mapping is available for all Morgan outputs |
| Atom Pair | Folded binary, folded count, sparse binary, sparse count | Count simulation is enabled by default for binary output |
| OpenEye | `OEFingerPrint` import/export | Numeric type metadata is preserved when available |

The current conformance scope is deliberately narrow: requesting Morgan
chirality, Atom Pair chirality, or Atom Pair 3D-distance generation raises
`ValueError` until those paths have dedicated RDKit parity coverage.

## Installation

Install OpenEye Toolkits first:

```bash
pip install --extra-index-url https://pypi.anaconda.org/openeye/simple openeye-toolkits
```

Install OEFP:

```bash
pip install oefp
```

## Build from Source

Set the OpenEye C++ SDK path:

```bash
export OPENEYE_ROOT=/path/to/openeye/sdk
```

Build the C++ library and Python bindings:

```bash
cmake --preset debug
cmake --build build-debug
```

Install the Python package in editable mode:

```bash
pip install --config-settings editable_mode=compat -e python/
```

The `editable_mode=compat` flag keeps the package on a traditional editable
path that works with compiled SWIG extension modules.

## Tests

C++ tests:

```bash
cmake --build build-debug --target oefp_tests
ctest --test-dir build-debug --output-on-failure
```

Python tests:

```bash
PYTHONPATH=python python -m pytest tests/python -q
```

RDKit is required only for the conformance tests (it is installed by the `dev`
extra) and is not a runtime dependency.

## Documentation

Build the Sphinx documentation:

```bash
python -m pip install -r docs/requirements.txt
make -C docs html
```

Open the local build (`open` is the macOS command; on Linux, use `xdg-open`):

```bash
open docs/_build/html/index.html
```

The documentation includes installation, quickstart, Python API notes, C++ API
reference generation through Doxygen, and release build guidance.

## Benchmarks

Run the RDKit generation and dense `pdist` benchmark:

```bash
PYTHONPATH=python python benchmarks/benchmark_rdkit_generation.py \
  --max-mols 1500 \
  --trials 7 \
  --warmup 1 \
  --pdist-size 400 \
  --generation-max-ratio 1.10 \
  --atom-pair-generation-max-ratio 1.10
```

Run the optional C++ guardrail against a local `oecluster` checkout:

```bash
cmake -S . -B build-bench \
  -DOEFP_BUILD_BENCHMARKS=ON \
  -DOEFP_OECLUSTER_SOURCE_DIR=/path/to/oecluster
cmake --build build-bench --target oefp_oecluster_fingerprint_benchmark
./build-bench/benchmarks/oefp_oecluster_fingerprint_benchmark 512 0 256
```

## Tools

| Tool | Purpose |
|------|---------|
| [CMake](https://cmake.org/) | C++ build system |
| [SWIG](https://www.swig.org/) | Python bindings |
| [scikit-build-core](https://scikit-build-core.readthedocs.io/) | Python wheel build backend |
| [cmake-openeye](https://github.com/scott-arne/cmake-openeye) | OpenEye CMake discovery and SWIG helpers |
| [vrzn](https://github.com/scott-arne/vrzn) | Version synchronization |
| [pytest](https://docs.pytest.org/) | Python tests |
| [Sphinx](https://www.sphinx-doc.org/) | Documentation |

## License

MIT. See the `LICENSE` file.
