Metadata-Version: 2.4
Name: chemrecon
Version: 0.1.2
Summary: The ChemRecon library for integration and exploration of interconnected biochemical databases.
Keywords: bioinformatics
Author: Casper Asbjørn Eriksen
Author-email: Casper Asbjørn Eriksen <casbjorn@imada.sdu.dk>
License-Expression: GPL-3.0-only
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Chemistry
Requires-Dist: psycopg[binary]~=3.3.2
Requires-Dist: rustworkx~=0.17.1
Requires-Dist: networkx~=3.6.1
Requires-Dist: matplotlib~=3.10
Requires-Dist: rdkit
Requires-Dist: sphinx==8.3.0 ; extra == 'docs'
Requires-Dist: myst-parser ; extra == 'docs'
Requires-Dist: sphinx-autobuild ; extra == 'docs'
Requires-Dist: enum-tools[sphinx]==0.12.0 ; extra == 'docs'
Requires-Dist: sphinx-toolbox ; extra == 'docs'
Requires-Dist: nbsphinx ; extra == 'docs'
Requires-Dist: ipykernel>=7.1.0 ; extra == 'docs'
Requires-Dist: furo ; extra == 'docs'
Requires-Dist: sphinxext-opengraph ; extra == 'docs'
Maintainer: Casper Asbjørn Eriksen
Maintainer-email: Casper Asbjørn Eriksen <casbjorn@imada.sdu.dk>
Requires-Python: >=3.12
Provides-Extra: docs
Description-Content-Type: text/markdown

# ChemRecon
*v. 0.1.2*

ChemRecon is a Python library and consolidated meta-database designed to simplify the integration and exploration of 
biochemical data from a range of sources. 
It is built from full-database downloads of compounds, reactions, enzymes, molecular structures, and atom-to-atom maps 
from the following source databases: BiGG, BRENDA, ChEBI, ECMDB, M-CSA, MetaMDB, MetaNetX, and PubChem.

Heterogeneous data formats were standardized, and relationships within and between these databases were reconstructed in 
a consistent format. 
The resulting meta-database is freely accessible online and is complemented by a Python library which allows for easy 
integration into existing workflows.
This enables unified querying of entries from all the source databases, and discovery and visualization of 
relationships between these entries.

![entrygraph](docs/source/resources/eg.svg)

ChemRecon was developed at the 
    [Algorithmic Cheminformatics Group](https://cheminf.imada.sdu.dk/),
    [Department of Mathematics and Computer Science](https://cheminf.imada.sdu.dk/),
    [University of Southern Denmark](https://sdu.dk).

## Paper
If ChemRecon is useful in your research, please consider citing the following paper.
 * **Title**
    
    C. A. Eriksen, J. L. Andersen, R. Fagerberg, D. Merkle

    arXiv preprint, submitted to *Bioinformatics*.

    TODO more

## Availability and Installation
ChemRecon is available from the Python Package Index (PyPI) as
[chemrecon](https://pypi.org/project/chemrecon/) and can be installed using pip:

`pip install chemrecon`

Visualizing entry graphs requires [GraphViz](https://www.graphviz.org/) to be installed, with the `dot` executable
(which renders the graphs) available on your system's `PATH`.
See the [GraphViz Python package](https://pypi.org/project/graphviz/) for installation instructions.
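Before drawing an entry graph, it can help to confirm that `dot` is actually reachable. The following is a minimal sketch using only the Python standard library (`shutil.which` searches `PATH`); the helper name `graphviz_dot_available` is hypothetical and is not part of ChemRecon.

```python
import shutil
import subprocess


def graphviz_dot_available() -> bool:
    """Return True if the GraphViz `dot` executable is on PATH and runs."""
    dot_path = shutil.which("dot")  # None when `dot` is not found on PATH
    if dot_path is None:
        return False
    try:
        # `dot -V` prints the GraphViz version and exits with status 0.
        result = subprocess.run([dot_path, "-V"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("GraphViz dot available:", graphviz_dot_available())
```

If this prints `False`, install GraphViz and make sure the directory containing `dot` is on your `PATH` before calling `eg.show(...)`.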

***

## Documentation
The documentation, including usage instructions, tutorials, and a complete description of the supported entry and
relation types, is available on the [ChemRecon homepage](https://chemrecon.org).

## Usage
The following example shows a typical ChemRecon workflow; it produces the graph shown above.
For more detailed examples, see the tutorial section of the documentation.

```python
from chemrecon import *

connect_public()

# Perform a database query to find the 'citrate' entry in BiGG.
citrate_entry = find_entry(id_type = C_BIGG, source_id = 'M_cit')

# Define a protocol to find related entries and molecular structures (predefined protocols like this one are included)
compound_structure_protocol = ExplorationProtocol(
    relation_types = {CompoundReference, CompoundHasMolStructure, MolStructureStandardization}
)

# Create and expand an entry graph, according to this protocol, by traversing the database.
eg = EntryGraph(initial_entries = {citrate_entry})
explore(eg, compound_structure_protocol, steps = 5)

# Score the molecular structures in the graph according to their 'connectedness'
scorer = Scorer(score_entry_type = MolStructure)
scores = scorer(citrate_entry)  # Result is an OrderedDict

# Draw the graph with these scores, producing the image seen on this page
eg.show(scores = scores)
```
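Conceptually, exploration grows the entry graph outward from the initial entries, following only the relation types listed in the protocol, for a bounded number of steps. Scoring then ranks the molecular structures by how well connected they are. The sketch below illustrates both ideas as a bounded breadth-first search over a small, hypothetical in-memory list of relations, with "connectedness" approximated as the number of distinct relations pointing at a structure. It is an illustration under these assumptions, not ChemRecon's implementation or its scoring method. The functions `explore_sketch` and `score_by_connectedness`, the `bigg:`/`compound:`/`structure:` prefixes, and all data except `M_cit` are made up.

```python
from collections import Counter, deque

# Hypothetical toy data: (relation_type, from_entry, to_entry) triples.
RELATIONS = [
    ("CompoundReference", "bigg:M_cit", "compound:X"),
    ("CompoundReference", "bigg:M_cit", "compound:Y"),
    ("CompoundHasMolStructure", "compound:X", "structure:S1"),
    ("CompoundHasMolStructure", "compound:Y", "structure:S1"),
    ("CompoundHasMolStructure", "compound:Y", "structure:S2"),
    ("ReactionHasCompound", "reaction:R1", "bigg:M_cit"),  # not followed below
]


def explore_sketch(initial, relation_types, steps):
    """Breadth-first expansion from `initial`, at most `steps` hops deep.

    Only relations whose type is in `relation_types` are followed, in either
    direction. Returns (reached entries, traversed relations).
    """
    allowed = [rel for rel in RELATIONS if rel[0] in relation_types]
    depth = {entry: 0 for entry in initial}
    queue = deque(initial)
    traversed = set()
    while queue:
        entry = queue.popleft()
        if depth[entry] >= steps:
            continue  # step budget used up on this path
        for rel in allowed:
            _, src, dst = rel
            if entry not in (src, dst):
                continue
            traversed.add(rel)
            neighbour = dst if entry == src else src
            if neighbour not in depth:
                depth[neighbour] = depth[entry] + 1
                queue.append(neighbour)
    return set(depth), traversed


def score_by_connectedness(traversed, prefix="structure:"):
    """Score each structure by the number of distinct relations pointing to it."""
    counts = Counter(dst for _, _, dst in traversed if dst.startswith(prefix))
    return dict(counts.most_common())  # highest score first


entries, traversed = explore_sketch(
    {"bigg:M_cit"}, {"CompoundReference", "CompoundHasMolStructure"}, steps=5
)
print(score_by_connectedness(traversed))  # {'structure:S1': 2, 'structure:S2': 1}
```

Restricting the relation types is what keeps the graph focused: here `reaction:R1` is never reached because `ReactionHasCompound` is not part of the protocol.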

***

## Database
ChemRecon needs to be connected to a database to function.
The easiest option is to connect to the public database hosted by [SDU](https://sdu.dk):
```python
connect_public()
```
Alternatively, a local instance of the database can be hosted via Docker.
Instructions are given in the [documentation](https://chemrecon.org).
A local instance has lower latency, which makes queries and entry graph construction faster, and it allows custom
data sources to be added.

## Source Databases
ChemRecon contains compound, molecular structure, reaction, atom-to-atom map, and enzyme entries from the following
databases.

| Source   | Compound | Molecular structure | Reaction | Atom-to-atom map | Enzyme | Version |
|----------|---------:|--------------------:|---------:|-----------------:|-------:|---------|
| BiGG     | 20428    | -                   | 33942    | -                | 5705   | 1.6     |
| BRENDA   | -        | -                   | 61129    | -                | 8697   | 2025_1  |
| ChEBI    | 224485   | 330207              | -        | -                | -      | 2024-05 |
| ECMDB    | 3760     | 7517                | -        | -                | -      | 2.0     |
| M-CSA    | -        | -                   | 1003     | 342              | 1003   | 2024-11 |
| MetaMDB  | 80815    | 4392                | 74520    | 1003             | -      | 2025-02 |
| MetaNetX | 2601834  | 2297518             | 143880   | -                | 48175  | 4.4     |
| PubChem  | 9031498  | 5000000             | -        | -                | -      | 2024-09 |

In addition to the source databases, ChemRecon can make use of a larger number of *auxiliary* databases, including 
MetaCyc and KEGG. Data from these sources are not included directly, because they are proprietary or difficult to access. 
However, the source databases contain references to the auxiliary databases, so ChemRecon creates entries for them that
contain only the identifier and no additional information. This lets users run ChemRecon workflows starting from
identifiers in many databases, not only the source databases.
