Metadata-Version: 2.4
Name: gandalf-csr
Version: 0.1.0
Summary: Fast path finding in large knowledge graphs
Home-page: https://github.com/ranking-agent/gandalf
Author: Max Wang
Author-email: Max Wang <max@covar.com>
License: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: bmt>=1.4.6
Requires-Dist: numpy>=1.20.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0; extra == "dev"
Requires-Dist: black>=22.0; extra == "dev"
Provides-Extra: server
Requires-Dist: fastapi>=0.100.0; extra == "server"
Requires-Dist: httpx>=0.24.0; extra == "server"
Requires-Dist: uvicorn>=0.20.0; extra == "server"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# GANDALF

Graph Analysis Navigator for Discovery And Link Finding

## Features
- **Compressed Sparse Row (CSR)** graph representation for memory efficiency
- **Bidirectional search** that expands from both endpoints to keep the search frontier small
- **O(1) property lookups** via hash indexing
- **Predicate filtering** to reduce path explosion
- **Batch property enrichment** for fast results
- **Diagnostic tools** to understand path counts
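
To make the first and third features concrete, here is a minimal sketch of a CSR adjacency structure with a hash index from node identifiers to integer positions. It is an illustration of the general technique, not GANDALF's internal layout: the helper names `build_csr` and `neighbors` are hypothetical, and `EX:gene1` is a placeholder identifier.

```python
import numpy as np

def build_csr(num_nodes, edges):
    """Build CSR arrays from (source, target, predicate_id) integer triples."""
    edges = sorted(edges)  # group edges by source node
    src = np.array([e[0] for e in edges], dtype=np.int64)
    indices = np.array([e[1] for e in edges], dtype=np.int64)
    preds = np.array([e[2] for e in edges], dtype=np.int64)
    # indptr[i]:indptr[i + 1] is the slice of `indices` holding node i's targets
    indptr = np.zeros(num_nodes + 1, dtype=np.int64)
    np.cumsum(np.bincount(src, minlength=num_nodes), out=indptr[1:])
    return indptr, indices, preds

def neighbors(indptr, indices, node):
    """Return the targets of `node` as a view into the shared array."""
    return indices[indptr[node]:indptr[node + 1]]

# Hash index: string identifier -> integer position (O(1) average lookup).
# "EX:gene1" is a placeholder, not a real identifier.
node_ids = ["CHEBI:45783", "EX:gene1", "MONDO:0004979"]
id_to_idx = {node_id: i for i, node_id in enumerate(node_ids)}

indptr, indices, preds = build_csr(
    num_nodes=3,
    edges=[(0, 1, 0), (1, 2, 1), (0, 2, 0)],
)
start = id_to_idx["CHEBI:45783"]
print([node_ids[i] for i in neighbors(indptr, indices, start)])
```

Three flat arrays replace one Python object per edge, which is where the memory saving comes from; the per-node neighbor list is a slice, so no copying is needed during traversal.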

## Installation

**Recommended: Use a virtual environment**

Some transitive dependencies (e.g., `stringcase`, `pytest-logging`) require modern pip/setuptools to build correctly. Using a virtual environment ensures you have updated tools.

```bash
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Upgrade pip and setuptools (important for building dependencies)
pip install --upgrade pip setuptools wheel

# Install the package
pip install -e .
```

**Alternative: Direct install (may fail on some systems)**

If you have a recent pip/setuptools already, you can try:
```bash
pip install -e .
```

## Quick Start

### Extracting a full Translator KGX archive
- `tar -xvf translator_kg.tar.zst`

This produces two files, `nodes.jsonl` and `edges.jsonl`. Extraction requires a `tar` with zstd support.

### Build a graph from JSONL
```python
from gandalf import build_graph_from_jsonl

# Build the graph, dropping ontology edges (subclass_of) at load time
graph = build_graph_from_jsonl(
    edges_path="data/raw/edges.jsonl",
    nodes_path="data/raw/nodes.jsonl",
    excluded_predicates={'biolink:subclass_of'}
)

# Save for fast loading
graph.save("data/processed/graph_filtered.pkl")
```

### Query paths
```python
from gandalf import CSRGraph, find_paths

# Load graph (takes ~1-2 seconds)
graph = CSRGraph.load("data/processed/graph_filtered.pkl")

# Find paths
paths = find_paths(
    graph,
    start_id="CHEBI:45783",
    end_id="MONDO:0004979"
)

print(f"Found {len(paths)} paths")
```
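
For intuition about why searching from both endpoints helps, the sketch below finds two-hop paths by expanding one hop forward from the start, one hop backward from the end, and intersecting the two frontiers. It is not the implementation behind `find_paths`; the function `two_hop_paths` and the toy node names are hypothetical, and edges are assumed to be directed.

```python
from collections import defaultdict

def two_hop_paths(edges, start, end):
    """Meet in the middle: intersect start's successors with end's predecessors."""
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)
        predecessors[dst].add(src)
    forward = successors.get(start, set())    # one hop out from start
    backward = predecessors.get(end, set())   # one hop in to end
    return sorted((start, mid, end) for mid in forward & backward)

# Toy directed edges; "a" and "z" stand in for the start and end nodes
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "z"), ("c", "z")]
toy_paths = two_hop_paths(toy_edges, "a", "z")
print(toy_paths)
```

Each side only explores half the path length, so neither frontier has to grow to the size a one-directional search would reach before touching the target.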

### Filter by predicates
```python
from gandalf import find_paths_filtered

# Only mechanistic relationships
paths = find_paths_filtered(
    graph,
    start_id="CHEBI:45783",
    end_id="MONDO:0004979",
    allowed_predicates={
        'biolink:treats',
        'biolink:affects',
        'biolink:has_metabolite'
    }
)
```

## Architecture

The package uses a three-stage pipeline:

1. **Topology Search** (fast) - Find all paths using indices only
2. **Filtering** (medium) - Apply filtering rules, reading only the node or edge properties those rules need
3. **Enrichment** (batch) - Load all properties for final paths only

This separation allows filtering millions of paths before expensive property lookups.
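
The three stages can be sketched on a toy graph as follows. This is a simplified illustration under stated assumptions, not the package's code: the function names (`topology_search`, `filter_paths`, `enrich`), the `EX:gene1` identifier, and the property values are all hypothetical.

```python
from collections import defaultdict

# Toy data: (subject, predicate, object). "EX:gene1" is a placeholder.
edges = [
    ("CHEBI:45783", "biolink:affects", "EX:gene1"),
    ("EX:gene1", "biolink:related_to", "MONDO:0004979"),
    ("CHEBI:45783", "biolink:treats", "MONDO:0004979"),
]
node_props = {
    "CHEBI:45783": {"name": "placeholder chemical"},
    "EX:gene1": {"name": "placeholder gene"},
    "MONDO:0004979": {"name": "placeholder disease"},
}

# Index only: node -> [(edge index, target node)]
adj = defaultdict(list)
for i, (subj, _pred, obj) in enumerate(edges):
    adj[subj].append((i, obj))

def topology_search(adj, start, end, max_hops):
    """Stage 1: enumerate simple paths as tuples of edge indices."""
    paths, stack = [], [(start, (), frozenset([start]))]
    while stack:
        node, path, seen = stack.pop()
        if node == end and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue
        for edge_idx, nxt in adj.get(node, ()):
            if nxt not in seen:
                stack.append((nxt, path + (edge_idx,), seen | {nxt}))
    return paths

def filter_paths(paths, edges, allowed_predicates):
    """Stage 2: keep paths whose every edge has an allowed predicate."""
    return [p for p in paths if all(edges[i][1] in allowed_predicates for i in p)]

def enrich(paths, edges, node_props):
    """Stage 3: look up properties once per node that survives filtering."""
    needed = {n for p in paths for i in p for n in (edges[i][0], edges[i][2])}
    props = {n: node_props[n] for n in needed}
    return [
        [
            {"subject": props[edges[i][0]], "predicate": edges[i][1],
             "object": props[edges[i][2]]}
            for i in p
        ]
        for p in paths
    ]

all_paths = topology_search(adj, "CHEBI:45783", "MONDO:0004979", max_hops=3)
kept = filter_paths(all_paths, edges, {"biolink:treats", "biolink:affects"})
results = enrich(kept, edges, node_props)
print(len(all_paths), len(kept), results)
```

Stage 1 carries only integer edge indices, stage 2 reads a single field per edge, and stage 3 touches the property store only for nodes on surviving paths.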
