Metadata-Version: 2.4
Name: reasongraph
Version: 0.3.0
Summary: A graph-based reasoning library with embedding search and multi-hop traversal
Project-URL: Homepage, https://github.com/bgokden/reasongraph
Project-URL: Repository, https://github.com/bgokden/reasongraph
Project-URL: Issues, https://github.com/bgokden/reasongraph/issues
Author: Berk
License-Expression: MIT
License-File: LICENSE
Keywords: agent,causal-reasoning,embeddings,knowledge-graph,multi-hop,ner,nlp,rag,reasoning,semantic-search
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: sentence-transformers>=2.2.0
Provides-Extra: all
Requires-Dist: aiosqlite>=0.19.0; extra == 'all'
Requires-Dist: gliner2>=1.2.0; extra == 'all'
Requires-Dist: pgvector>=0.2.0; extra == 'all'
Requires-Dist: psycopg-pool>=3.1.0; extra == 'all'
Requires-Dist: psycopg[binary]>=3.1.0; extra == 'all'
Requires-Dist: requests>=2.28.0; extra == 'all'
Requires-Dist: sqlite-vec>=0.1.6; extra == 'all'
Requires-Dist: urllib3>=2.0.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: aiosqlite>=0.19.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: sqlite-vec>=0.1.6; extra == 'dev'
Provides-Extra: gliner2
Requires-Dist: gliner2>=1.2.0; extra == 'gliner2'
Requires-Dist: requests>=2.28.0; extra == 'gliner2'
Requires-Dist: urllib3>=2.0.0; extra == 'gliner2'
Provides-Extra: postgres
Requires-Dist: pgvector>=0.2.0; extra == 'postgres'
Requires-Dist: psycopg-pool>=3.1.0; extra == 'postgres'
Requires-Dist: psycopg[binary]>=3.1.0; extra == 'postgres'
Provides-Extra: sqlite
Requires-Dist: aiosqlite>=0.19.0; extra == 'sqlite'
Requires-Dist: sqlite-vec>=0.1.6; extra == 'sqlite'
Description-Content-Type: text/markdown

# ReasonGraph

A graph-based reasoning library with embedding search, multi-hop traversal, and automatic entity/causal extraction.

[![PyPI version](https://img.shields.io/pypi/v/reasongraph?color=blue)](https://pypi.org/project/reasongraph/)
[![Python 3.11+](https://img.shields.io/pypi/pyversions/reasongraph?color=blue)](https://pypi.org/project/reasongraph/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

## Installation

```bash
pip install reasongraph[all]        # everything included
```

Or install only what you need:

```bash
pip install reasongraph             # core: in-memory backend, NER extraction, embeddings
pip install reasongraph[sqlite]     # + SQLite backend with sqlite-vec
pip install reasongraph[gliner2]    # + GLiNER2 entity + causal extraction (recommended)
pip install reasongraph[postgres]   # + PostgreSQL + pgvector backend
```

## Quick Start

### Using a built-in dataset

```python
from reasongraph import ReasonGraph

graph = ReasonGraph()
graph.initialize_sync()
graph.load_dataset_sync("financial")

results = graph.query_sync("What caused the 2008 financial crisis?")
for i, text in enumerate(results, 1):
    print(f"{i}. {text}")

graph.close_sync()
```

Output -- a connected reasoning chain, not just keyword matches:

```
1. Lehman Brothers filed for bankruptcy in September 2008 after massive MBS losses.
2. Loose lending standards fueled a housing price bubble across the United States.
3. Lehman's collapse triggered a global credit freeze as interbank lending stopped.
4. Mortgage-backed securities built on subprime loans collapsed when defaults surged.
5. The U.S. government enacted TARP, a $700 billion bailout to stabilize the financial system.
6. Banks issued subprime mortgages to borrowers with poor credit histories.
```

### Parsing free-form text

For free-form text, install with `pip install reasongraph[gliner2]` to get automatic entity and causal relation extraction.

```python
from reasongraph import ReasonGraph

graph = ReasonGraph()
graph.initialize_sync()

graph.add_text_sync("Lehman Brothers filed for bankruptcy in September 2008 after massive MBS losses.")
graph.add_text_sync("The U.S. government enacted TARP, a $700 billion bailout to stabilize the financial system.")
graph.add_text_sync("The Federal Reserve cut interest rates to near zero after the 2008 crisis.")

results = graph.query_sync("What happened after Lehman collapsed?")
for i, text in enumerate(results, 1):
    print(f"{i}. {text}")

graph.close_sync()
```

Output:

```
1. Lehman Brothers filed for bankruptcy in September 2008 after massive MBS losses.
2. The Federal Reserve cut interest rates to near zero after the 2008 crisis.
3. The U.S. government enacted TARP, a $700 billion bailout to stabilize the financial system.
```

### Async API

```python
import asyncio
from reasongraph import ReasonGraph

async def main():
    async with ReasonGraph() as graph:
        await graph.load_dataset("financial")
        results = await graph.query("What caused the 2008 crisis?")
        for text in results:
            print(text)

asyncio.run(main())
```

## Features

- **Automatic extraction** -- GLiNER2 extracts entities and causal relations in one pass (falls back to BERT NER when gliner2 is not installed)
- **Hybrid search** -- search by embedding similarity, keyword (trigram) matching, or a fusion of both
- **Multi-hop traversal** -- follow graph edges to discover connected reasoning chains
- **Cross-encoder reranking** -- rerank results at each hop with `ms-marco-MiniLM-L-6-v2`
- **Built-in datasets** -- load curated reasoning graphs for immediate use
- **Async-first** -- native async API with sync convenience wrappers
- **Pluggable backends** -- in-memory (zero-config default), SQLite, or PostgreSQL with pgvector

## Built-in Datasets

| Dataset | Description |
|---------|-------------|
| `syllogisms` | Classical syllogistic reasoning chains |
| `causal` | Cause-effect reasoning with entity annotations |
| `taxonomy` | Hierarchical concept taxonomy |
| `financial` | Financial crisis causal chains (2008 crisis, dot-com, inflation, eurozone) |
| `medical` | Medical causal chains (heart disease, diabetes, infectious disease, cancer) |
| `analysis_patterns` | Data analysis reasoning: scenario detection, technique selection, implementation patterns |

```python
graph.load_dataset_sync("financial")
```

## Search Modes

```python
# Pure embedding similarity (default)
results = graph.query_sync("credit freeze", search_mode="embedding")

# Pure keyword/trigram matching
results = graph.query_sync("credit freeze", search_mode="keyword")

# Hybrid: Reciprocal Rank Fusion of embedding + trigram rankings
results = graph.query_sync("credit freeze", search_mode="hybrid")

# Tune the RRF smoothing constant (default 60, lower = more weight to top ranks)
results = graph.query_sync("credit freeze", search_mode="hybrid", rrf_k=30)
```
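
For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the standard formula: each document scores the sum of `1 / (k + rank)` over the rankings it appears in. This is the textbook version, not necessarily how ReasonGraph implements it internally; `reciprocal_rank_fusion` and the document ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the best item in a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to stay deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


embedding_ranking = ["a", "b", "c"]
trigram_ranking = ["c", "a", "d"]
fused = reciprocal_rank_fusion([embedding_ranking, trigram_ranking], k=60)
print(fused)  # ['a', 'c', 'b', 'd']
```

Documents that appear in both rankings ("a" and "c") rise above those that appear in only one, even when neither list ranks them first.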

## Entity and Causal Extraction

When `gliner2` is installed, `add_text()` / `add_texts()` automatically use GLiNER2 for both entity extraction and causal relation detection. Without `gliner2`, they fall back to BERT NER (entities only).

```python
from reasongraph import ReasonGraph, NERExtractor, GLiNER2Extractor

graph = ReasonGraph()
graph.initialize_sync()

# Default: GLiNER2 (entities + causal relations) if installed, else BERT NER
entities = graph.add_text_sync("Apple released the iPhone in 2007.")
print(entities)  # ['Apple', 'iPhone']

# Explicit: force BERT NER even if GLiNER2 is installed
entities = graph.add_text_sync("Apple released the iPhone in 2007.", extractor=NERExtractor())

# Explicit: GLiNER2 with custom entity types
gliner = GLiNER2Extractor(entity_types=["company", "product", "date"])
entities = graph.add_text_sync("Apple released the iPhone in 2007.", extractor=gliner)

# Any callable works
entities = graph.add_text_sync("some text", extractor=lambda t: ["custom"])
```
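
As a slightly fuller illustration of the "any callable" option, here is a small rule-based extractor. It assumes, based on the lambda above, that an extractor takes the text and returns a list of entity strings; `uppercase_token_extractor` is a hypothetical helper, not part of the library.

```python
import re


def uppercase_token_extractor(text: str) -> list[str]:
    """Return unique tokens containing an uppercase letter, in order of appearance."""
    tokens = re.findall(r"\b\w*[A-Z]\w*\b", text)
    # dict.fromkeys de-duplicates while keeping first-seen order.
    return list(dict.fromkeys(tokens))


print(uppercase_token_extractor("Apple released the iPhone in 2007."))
# ['Apple', 'iPhone']

# Assumed usage, mirroring the lambda example:
# graph.add_text_sync("Apple released the iPhone in 2007.", extractor=uppercase_token_extractor)
```

A heuristic like this is crude compared with a trained NER model, but it can be handy for tests or for domains with predictable naming.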

## Backends

By default, `ReasonGraph()` uses a pure-Python in-memory backend (`MemoryBackend`). The backend itself needs nothing beyond numpy, so it works with the core install. For persistence, pass a file path; the graph is then loaded from and saved to that file as JSON:

```python
from reasongraph import ReasonGraph, MemoryBackend

# In-memory only (default)
graph = ReasonGraph()

# In-memory with JSON file persistence (loads on init, saves on close)
graph = ReasonGraph(backend=MemoryBackend(file_path="graph.json"))
```

### SQLite Backend

For larger graphs or concurrent access, use the SQLite backend with `sqlite-vec` for vector search. Requires `pip install reasongraph[sqlite]`.

```python
from reasongraph import ReasonGraph
from reasongraph.backends import SqliteBackend

graph = ReasonGraph(backend=SqliteBackend(db_path="graph.db"))
```

### PostgreSQL Backend

```python
from reasongraph import ReasonGraph
from reasongraph.backends import PostgresBackend

graph = ReasonGraph(backend=PostgresBackend(database_url="postgresql://user:pass@localhost/db"))
```

Requires `pip install reasongraph[postgres]` and the `pgvector` + `pg_trgm` extensions enabled on your database.

## Evaluation: Mixed-Domain Reasoning

We evaluate reasoning quality by loading all 6 built-in datasets into a single graph (~130 text nodes, ~104 entity nodes, ~280 edges) and testing whether the library can trace the correct causal chains, syllogistic proofs, taxonomic hierarchies, and data analysis patterns -- without being distracted by unrelated facts from other domains.

32 test cases simulate agent-style queries like *"I need to understand what caused the 2008 financial crisis"*, *"How does insulin resistance lead to kidney failure?"*, or *"I have two numeric columns, check if related"* and check whether the returned reasoning chain matches the expected ground truth.

**Per-domain results (hybrid search, `top_k=5`, `hops=4`, `rerank_top_k=4`):**

| Domain | Cases | Chain Completeness | Recall@5 | Precision@5 | Domain Accuracy |
|--------|------:|--------------------|----------|-------------|-----------------|
| Causal | 5 | 100% | 100% | 92% | 100% |
| Financial | 6 | 100% | 82% | 60% | 100% |
| Medical | 5 | 100% | 92% | 76% | 92% |
| Syllogisms | 5 | 100% | 100% | 92% | 85% |
| Taxonomy | 3 | 100% | 83% | 53% | 92% |
| Analysis Patterns | 8 | 96% | 75% | 45% | 96% |
| **Overall** | **32** | **99%** | **88%** | **68%** | **95%** |

All 32 cases pass, where a pass means at least 50% chain completeness. Split reranking gives chain continuations (text-to-text edges) priority over bridge discoveries (entity-to-text edges), which keeps traversal focused.
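
For reference, the Recall@5 and Precision@5 columns can be read with the textbook definitions sketched below. The eval script's exact definitions are not shown here, so treat this as an assumption; the function names and sample ids are hypothetical.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k slots filled by relevant items."""
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / len(relevant)


retrieved = ["a", "b", "x", "c", "y"]   # ranked results returned for one query
relevant = {"a", "b", "c", "d"}         # ground-truth chain for that query

print(precision_at_k(retrieved, relevant, k=5))  # 0.6
print(recall_at_k(retrieved, relevant, k=5))     # 0.75
```

Under these definitions, a domain can show high recall with lower precision when the expected chain is shorter than `k`, since the remaining slots are filled with extra results.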

**Search mode comparison:**

| Mode | Chain Completeness | Recall@5 | Precision@5 | Domain Accuracy |
|------|-------------------|----------|-------------|-----------------|
| Embedding | 99% | 88% | 68% | 95% |
| Keyword | 0% | 0% | 0% | 0% |
| Hybrid | 99% | 88% | 68% | 95% |

Keyword-only mode scores 0% because the eval queries are natural language questions that don't substring-match the dataset's declarative statements. This is expected -- keyword search is designed for known-term lookups, not question answering.
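
The effect is easy to reproduce with a plain substring check. This is only a sketch of the idea, assuming case-insensitive whole-query matching; the library's actual keyword scoring may differ, and `keyword_hits` is a hypothetical helper.

```python
def keyword_hits(query: str, documents: list[str]) -> list[str]:
    """Return documents that contain the whole query as a case-insensitive substring."""
    needle = query.lower()
    return [doc for doc in documents if needle in doc.lower()]


documents = [
    "Lehman's collapse triggered a global credit freeze as interbank lending stopped.",
    "Banks issued subprime mortgages to borrowers with poor credit histories.",
]

# A known term matches the statement that mentions it.
print(keyword_hits("credit freeze", documents))

# A full question is not a substring of any declarative statement.
print(keyword_hits("What caused the 2008 financial crisis?", documents))  # []
```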

Reproduce: `uv run python tests/eval_financial_reasoning.py`

## API Reference

### `ReasonGraph(backend=None, embed_model=None, rerank_model=None, forget_after=30)`

| Method | Description |
|--------|-------------|
| `add_nodes(nodes)` | Add `(content, type)` tuples to the graph |
| `add_edges(edges)` | Add edges as `(from, to)` pairs of node content |
| `add_text(text, extractor=None)` | Add text with automatic entity extraction |
| `add_texts(texts, extractor=None, causal_extractor=None)` | Batch add with entity + causal extraction (auto-enabled with GLiNER2) |
| `query(query, top_k=5, hops=4, rerank_top_k=4, search_mode="embedding", rrf_k=60)` | Search and traverse the graph |
| `load_dataset(name)` | Load a built-in dataset |
| `delete_stale()` | Remove nodes not accessed within `forget_after` days |
| `get_all_nodes()` / `get_all_edges()` | Inspect graph contents |

All methods are async. Sync variants are available with a `_sync` suffix (e.g. `query_sync`).
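
The async/sync pairing can be pictured with the pattern below, in which each `_sync` method runs its coroutine to completion with `asyncio.run`. This is a sketch of one common approach, not ReasonGraph's actual mechanism; `TinyGraph` is a hypothetical stand-in class with a toy word-matching query.

```python
import asyncio


class TinyGraph:
    """Hypothetical stand-in for illustration; not the real ReasonGraph class."""

    def __init__(self):
        self._texts = []

    async def add_text(self, text):
        await asyncio.sleep(0)  # placeholder for real async I/O
        self._texts.append(text)

    async def query(self, query, top_k=5):
        await asyncio.sleep(0)  # placeholder for real async I/O
        words = query.lower().split()
        hits = [t for t in self._texts if any(w in t.lower() for w in words)]
        return hits[:top_k]

    # Sync wrappers: each call drives the coroutine on a fresh event loop.
    def add_text_sync(self, text):
        return asyncio.run(self.add_text(text))

    def query_sync(self, query, top_k=5):
        return asyncio.run(self.query(query, top_k=top_k))


graph = TinyGraph()
graph.add_text_sync("TARP was a $700 billion bailout.")
print(graph.query_sync("bailout"))  # ['TARP was a $700 billion bailout.']
```

Note that `asyncio.run` raises an error when called from inside a running event loop, so in wrappers built this way the awaited methods are the ones to use within async code.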

## License

MIT
