Metadata-Version: 2.4
Name: mathesis-ai
Version: 0.1.4
Summary: Interdisciplinary GraphRAG -- find cross-domain bridges in scientific literature
Project-URL: Homepage, https://github.com/nathanpi8/mathesis-ai
Project-URL: Repository, https://github.com/nathanpi8/mathesis-ai
Project-URL: Issues, https://github.com/nathanpi8/mathesis-ai/issues
Project-URL: PyPI, https://pypi.org/project/mathesis-ai/
License: MIT
License-File: LICENSE
Keywords: arxiv,embeddings,graphrag,rag,research,scientific
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: aiosqlite==0.20.0
Requires-Dist: apscheduler==3.10.4
Requires-Dist: chromadb==0.5.0
Requires-Dist: fastapi==0.111.0
Requires-Dist: feedparser==6.0.11
Requires-Dist: groq>=0.9.0
Requires-Dist: httpx==0.27.0
Requires-Dist: numpy==1.26.4
Requires-Dist: pdfplumber==0.11.0
Requires-Dist: pydantic-settings==2.2.1
Requires-Dist: pydantic==2.7.1
Requires-Dist: pypdf==4.2.0
Requires-Dist: python-dotenv==1.0.1
Requires-Dist: rich==13.7.1
Requires-Dist: scipy==1.13.0
Requires-Dist: sentence-transformers==3.0.1
Requires-Dist: sqlalchemy==2.0.30
Requires-Dist: structlog==24.1.0
Requires-Dist: tenacity==8.3.0
Requires-Dist: uvicorn[standard]==0.29.0
Provides-Extra: cpu
Requires-Dist: torch>=2.3.0; (sys_platform == 'win32' or sys_platform == 'linux' or sys_platform == 'darwin') and extra == 'cpu'
Provides-Extra: dev
Requires-Dist: httpx==0.27.0; extra == 'dev'
Requires-Dist: mypy==1.10.0; extra == 'dev'
Requires-Dist: pytest-asyncio==0.23.6; extra == 'dev'
Requires-Dist: pytest-cov==5.0.0; extra == 'dev'
Requires-Dist: pytest==8.2.0; extra == 'dev'
Requires-Dist: ruff==0.4.4; extra == 'dev'
Provides-Extra: gpu
Requires-Dist: torch>=2.3.0; extra == 'gpu'
Description-Content-Type: text/markdown

# mathesis-ai

Find cross-domain bridges in scientific literature. Give it a research question; it returns papers from unrelated fields that solved the same underlying problem -- expressed in different vocabulary.

Built on SPECTER embeddings, recursive semantic graph expansion, PDF RAG, and Groq LLM summaries. No server required.
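Below is a minimal, self-contained sketch of the general idea behind recursive semantic graph expansion and cross-domain bridges. It is an illustration, not mathesis's actual implementation. Papers are nodes, and two papers are linked when the cosine similarity of their embeddings clears a threshold. Expansion proceeds hop by hop from a seed paper. A candidate bridge is an edge whose two papers sit in different arXiv categories. The toy 3-d vectors stand in for SPECTER embeddings. The helpers `expand` and `cross_domain`, the 0.8 threshold, and the 2-hop limit are assumptions chosen for illustration.

```python
import math
from collections import deque


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand(seed: str, vectors: dict[str, list[float]],
           threshold: float = 0.8, max_hops: int = 2) -> set[tuple[str, str]]:
    """Hop-by-hop (breadth-first) expansion from `seed`.

    Every paper whose similarity to a frontier paper is >= threshold gets an
    edge; newly reached papers join the frontier until max_hops is reached.
    """
    edges: set[tuple[str, str]] = set()
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for other, vec in vectors.items():
            if other != node and cosine(vectors[node], vec) >= threshold:
                edges.add((min(node, other), max(node, other)))  # undirected edge
                if other not in seen:
                    seen.add(other)
                    frontier.append((other, hops + 1))
    return edges


def cross_domain(edges: set[tuple[str, str]],
                 domain_of: dict[str, str]) -> list[tuple[str, str]]:
    """Candidate bridges: edges whose two papers come from different categories."""
    return [(a, b) for a, b in sorted(edges) if domain_of[a] != domain_of[b]]


# Toy 3-d vectors standing in for SPECTER embeddings.
vectors = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.9, 0.1, 0.0],
    "p3": [0.0, 0.0, 1.0],
    "p4": [0.77, 0.64, 0.0],
}
domain_of = {"p1": "cs.LG", "p2": "q-bio.NC", "p3": "cs.LG", "p4": "q-bio.NC"}

edges = expand("p1", vectors)
print(sorted(edges))                    # [('p1', 'p2'), ('p2', 'p4')]
print(cross_domain(edges, domain_of))   # [('p1', 'p2')]
```

The example reaches `p4` only through `p2`, which is the multi-hop part. The `p2`/`p4` edge is then dropped because both papers share a category.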

---

## Install

```bash
# Install CPU-only torch first (on Linux, PyPI's default torch wheel is a ~4 GB CUDA build)
pip install torch --index-url https://download.pytorch.org/whl/cpu

pip install mathesis-ai
```

GPU:
```bash
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install mathesis-ai
```

Requires Python 3.11+. Needs ~4 GB RAM for the embedding model.

---

## Usage

```python
from mathesis import Mathesis

m = Mathesis(groq_api_key="gsk_...")  # free key at console.groq.com
m.seed()                               # fetch & embed 250 papers, ~10 min, run once

results = m.query("vanishing gradients in deep neural network optimization")

for bridge in results.bridges:
    print(bridge.domain_jump)   # "q-bio.NC -> cs.LG"
    print(bridge.confidence)    # 0.796
    print(bridge.llm_summary)   # one-paragraph analogy explanation
```

No `.env` file, no server to start. Data is stored in `~/.mathesis` by default.

---

## Configuration

```python
m = Mathesis(
    groq_api_key="gsk_...",          # required for LLM summaries
    data_dir="~/my-mathesis-data",   # default: ~/.mathesis
    domains=["cs.LG", "q-bio.NC"],   # default: 5 arXiv categories
    llm_enabled=True,                # default: True
    device="cpu",                    # "cpu" or "cuda"
)
```
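If the same script has to run on machines with and without a GPU, you can choose `device` at runtime instead of hard-coding it. Here is a minimal sketch. `pick_device` is a hypothetical helper, not part of mathesis. It relies only on `torch.cuda.is_available()`, which reports whether the installed torch build can see a CUDA device, and it falls back to `"cpu"` when torch cannot be imported.

```python
def pick_device() -> str:
    """Return "cuda" if the installed torch build sees a GPU, else "cpu"."""
    try:
        import torch  # provided by whichever torch wheel you installed
    except ImportError:
        return "cpu"  # no torch at all: CPU is the only sensible choice
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
# m = Mathesis(groq_api_key="gsk_...", device=device)
```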

---

## Query response

```python
results.bridges_found          # bool
results.bridges                # list of BridgeResult
results.top_nodes              # top 5 papers by graph rank
results.graph_stats            # {"nodes": 310, "edges": 1548, "domains": [...]}

bridge.domain_jump             # "q-bio.NC -> cs.LG"
bridge.confidence              # 0.0 - 1.0
bridge.similarity              # raw cosine similarity
bridge.rag_confirmed           # bool: True if methods-section text confirmed the bridge
bridge.rag_excerpt             # passage extracted from the methods section
bridge.llm_summary             # Groq-generated analogy explanation
bridge.source_paper.title
bridge.source_paper.id         # arXiv ID
bridge.target_paper.title
```
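Because `bridges` is a plain list, filtering and ranking it is ordinary Python. The sketch below uses a hypothetical `Bridge` dataclass with three of the fields listed above, so it runs without an index. The helpers `split_jump` and `strongest`, the sample values, and the 0.7 cut-off are illustrative assumptions. `split_jump` also assumes `domain_jump` always has the `"source -> target"` shape shown in the examples.

```python
from dataclasses import dataclass


@dataclass
class Bridge:
    """Hypothetical stand-in exposing a few of the BridgeResult fields above."""
    domain_jump: str
    confidence: float
    rag_confirmed: bool


def split_jump(domain_jump: str) -> tuple[str, str]:
    """Turn "q-bio.NC -> cs.LG" into ("q-bio.NC", "cs.LG")."""
    source, target = (part.strip() for part in domain_jump.split("->"))
    return source, target


def strongest(bridges, min_confidence: float = 0.7):
    """RAG-confirmed bridges at or above min_confidence, highest first."""
    kept = [b for b in bridges if b.rag_confirmed and b.confidence >= min_confidence]
    return sorted(kept, key=lambda b: b.confidence, reverse=True)


bridges = [
    Bridge("q-bio.NC -> cs.LG", 0.796, True),
    Bridge("physics.bio-ph -> cs.LG", 0.91, False),  # dropped: not RAG-confirmed
    Bridge("math.OC -> cs.LG", 0.74, True),
]
for b in strongest(bridges):
    print(split_jump(b.domain_jump), b.confidence)
# ('q-bio.NC', 'cs.LG') 0.796
# ('math.OC', 'cs.LG') 0.74
```

With real results, `strongest(results.bridges)` should work the same way, assuming the attributes behave as documented above.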

---

## Async support

```python
results = await m.aquery("attention mechanisms in transformers")
```
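A bare `await` only works where an event loop is already running, for example in a Jupyter notebook. In a plain script, it must sit inside a coroutine started with `asyncio.run`. The sketch below shows that pattern and uses `asyncio.gather` to run two queries concurrently. `StubMathesis` is a hypothetical stand-in so the snippet runs on its own. This README does not say whether concurrent `aquery` calls on one instance are supported, so await them one at a time if unsure.

```python
import asyncio


class StubMathesis:
    """Hypothetical stand-in for Mathesis; only mimics an awaitable aquery()."""

    async def aquery(self, question: str) -> str:
        await asyncio.sleep(0.01)  # pretend to wait on I/O
        return f"results for: {question}"


async def main(m) -> list:
    questions = [
        "attention mechanisms in transformers",
        "vanishing gradients in deep neural network optimization",
    ]
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(m.aquery(q) for q in questions))


if __name__ == "__main__":
    for r in asyncio.run(main(StubMathesis())):
        print(r)
```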

---

## Status

```python
print(m)
# Mathesis(papers_indexed=581, domains=5, llm=enabled)

m.status()
# {"papers_indexed": 581, "domains": [...], "data_dir": "~/.mathesis"}
```
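`status()` reports where the index lives but not how much disk space it uses. Here is a minimal sketch for checking that yourself. `data_dir_size_mb` is a hypothetical helper, not part of mathesis. It simply sums the sizes of all files under the directory and assumes nothing about their layout.

```python
from pathlib import Path


def data_dir_size_mb(data_dir: str = "~/.mathesis") -> float:
    """Total size of all files under data_dir in megabytes (0.0 if it doesn't exist)."""
    root = Path(data_dir).expanduser()  # resolve "~" the same way a shell would
    if not root.is_dir():
        return 0.0
    total = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return total / 1_000_000


print(f"{data_dir_size_mb():.1f} MB")
```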

---

## Getting more data

```python
m.seed(papers_per_domain=50)    # quick start -- 50 x 5 default domains = 250 papers
m.ingest()                      # full corpus -- 500 papers/domain (2,500 with the 5 defaults), ~25 min
```

---

## Source & docs

[https://github.com/nathanpi8/mathesis-ai](https://github.com/nathanpi8/mathesis-ai)
