Metadata-Version: 2.4
Name: rag-ai-scientist
Version: 0.1.3
Summary: Installable RAG + MCP skills framework with a reliability-loop workflow.
Author: Cursor AI Infrastructure contributors
License: AGPL-3.0-or-later
Project-URL: Homepage, https://github.com/uzzielperez/rag-ai-scientist
Project-URL: Documentation, https://github.com/uzzielperez/rag-ai-scientist/blob/dev/docs/GETTING_STARTED.md
Project-URL: Repository, https://github.com/uzzielperez/rag-ai-scientist
Project-URL: Issues, https://github.com/uzzielperez/rag-ai-scientist/issues
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: LICENSE-COMMERCIAL.md
Requires-Dist: mcp>=1.0.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: numpy>=1.25
Requires-Dist: scikit-learn>=1.3
Requires-Dist: langchain-core>=0.3
Requires-Dist: langchain-community>=0.3
Requires-Dist: langchain-chroma>=0.2
Requires-Dist: langchain-huggingface>=0.1
Requires-Dist: langchain-groq>=0.1
Requires-Dist: langchain-openai>=0.1
Requires-Dist: langchain-text-splitters>=0.3
Requires-Dist: chromadb>=0.5
Requires-Dist: sentence-transformers>=3.0
Requires-Dist: pymupdf>=1.23
Requires-Dist: pymupdf4llm>=0.0.5
Requires-Dist: pdfplumber>=0.10
Requires-Dist: pylatexenc>=2.10
Requires-Dist: ftfy>=6.1
Requires-Dist: regex>=2023.0.0
Requires-Dist: tiktoken>=0.5
Requires-Dist: unidecode>=1.3
Requires-Dist: python-dotenv>=1.0
Requires-Dist: pysqlite3-binary>=0.5; sys_platform == "linux"
Requires-Dist: umap-learn>=0.5
Requires-Dist: matplotlib>=3.7
Dynamic: license-file

# rag-ai-scientist

Installable toolkit for local RAG indexing + MCP serving in scientific workflows.

[![PyPI](https://img.shields.io/badge/package-installable-blue)](#installation)
[![Python](https://img.shields.io/badge/python-3.10%2B-informational)](https://www.python.org/)
[![License](https://img.shields.io/badge/license-AGPL--3.0--or--later-green)](https://github.com/uzzielperez/rag-ai-scientist/blob/dev/LICENSE)

`rag-ai-scientist` gives you:

- a CLI to initialize and build a **local vector database** from **your** papers and notes,
- an MCP server entrypoint for Cursor / agent integrations,
- **packaged skills** under `rag_ai_scientist/skills/` (workflow checklists—no Git clone needed).

On **PyPI**, only this README is shown; detailed guides live on **GitHub** (absolute links below).

---

## End-user workflow (pip only — **no clone required**)

Install from PyPI, create a project folder **anywhere**, put your research materials in it, build the index once, then connect Cursor.

**Full step-by-step:** **[Getting started](https://github.com/uzzielperez/rag-ai-scientist/blob/dev/docs/GETTING_STARTED.md)** — install → `references/` → `init-references` → `setup-rag` → MCP → update notes and rebuild.

Minimal command sequence (after `pip install rag-ai-scientist`):

```bash
mkdir -p ~/my-ai-scientist/references
cd ~/my-ai-scientist
# Add your own .md / .pdf files under references/

rag-ai-scientist init-references --project-root . --references-dir ./references
rag-ai-scientist setup-rag --project-root . --force
rag-ai-scientist mcp --project-root .    # usually configured once inside Cursor — see Getting started link above
```

- **`query_analysis_knowledge`** answers from **your indexed files**.
- **`get_skill`** loads packaged skills (e.g. **`cms-higgs-opendata`**) **without** indexing anything extra.

To update what your AI scientist knows, edit the files under **`references/`** (and **`configs/references.yaml`** if paths change), then run **`setup-rag --force`** again.
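
If you would rather rebuild only when something has changed, a modification-time check is one possible approach. The sketch below is illustrative: `needs_rebuild` is a hypothetical helper, not part of the package, and it assumes the default `references/`, `configs/references.yaml`, and `.cursor/rag_db` locations described in this README. It compares timestamps only and does not inspect the index contents.

```python
from pathlib import Path


def needs_rebuild(project_root: str) -> bool:
    """Return True if the index is missing or older than any source file.

    Hypothetical helper: assumes sources live under references/ and the
    index under .cursor/rag_db, as described in this README.
    """
    root = Path(project_root)
    db_dir = root / ".cursor" / "rag_db"
    if not db_dir.is_dir():
        return True  # nothing has been indexed yet

    # Newest modification time of the index directory and everything in it.
    db_mtime = max(p.stat().st_mtime for p in [db_dir, *db_dir.rglob("*")])

    # Source files: everything under references/, plus the config if present.
    sources = []
    refs_dir = root / "references"
    if refs_dir.is_dir():
        sources.extend(p for p in refs_dir.rglob("*") if p.is_file())
    config = root / "configs" / "references.yaml"
    if config.is_file():
        sources.append(config)

    return any(p.stat().st_mtime > db_mtime for p in sources)
```

If `needs_rebuild(".")` returns `True`, run `rag-ai-scientist setup-rag --project-root . --force` as shown above.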

---

## Installation

### From PyPI (recommended)

```bash
python -m pip install rag-ai-scientist
```

Pinned example:

```bash
python -m pip install rag-ai-scientist==0.1.3
```

PyPI project page: [rag-ai-scientist](https://pypi.org/project/rag-ai-scientist/)

### Verify

```bash
rag-ai-scientist --help
python -c "import rag_ai_scientist; print(rag_ai_scientist.__version__)"
```
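
The same check can be made from a script without importing the package, by reading the installed distribution's metadata through the standard library. This is a small sketch; `installed_version` is a hypothetical helper name, and it returns `None` when the distribution is not installed in the current environment.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "rag-ai-scientist") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

This can be handy when the MCP server runs in a different environment from your shell and you want to confirm which environment actually has the package.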

### From source (maintainers / contributors only)

```bash
git clone https://github.com/uzzielperez/rag-ai-scientist.git
cd rag-ai-scientist
git checkout dev   # or your working branch
python3 -m venv .venv && source .venv/bin/activate
python -m pip install -e .
```

Isolation tip: use a dedicated venv (e.g. `~/venvs/rag-ai-scientist`) instead of installing into an environment that already holds a heavy analysis stack.

---

## CLI commands

| Command | Purpose |
|---------|---------|
| **`init-references`** | Writes **`configs/references.yaml`** pointing at your references directory. |
| **`setup-rag`** | Indexes sources into **`.cursor/rag_db`**. |
| **`mcp`** | Starts the stdio MCP server — point **`--project-root`** at the same folder you indexed. |

Common flags: **`--project-root`**, **`--force`** (rebuild index), **`--references-dir`** (with `init-references`).
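
To drive the two indexing commands from a script, one option is a thin `subprocess` wrapper. The sketch below uses only the commands and flags listed in this README; `index_commands` and `run_indexing` are hypothetical helper names, not part of the package's Python API.

```python
import subprocess
from typing import List


def index_commands(project_root: str, references_dir: str,
                   force: bool = True) -> List[List[str]]:
    """Build the argv lists for init-references followed by setup-rag."""
    init_cmd = [
        "rag-ai-scientist", "init-references",
        "--project-root", project_root,
        "--references-dir", references_dir,
    ]
    setup_cmd = ["rag-ai-scientist", "setup-rag", "--project-root", project_root]
    if force:
        setup_cmd.append("--force")  # rebuild the index
    return [init_cmd, setup_cmd]


def run_indexing(project_root: str, references_dir: str,
                 force: bool = True) -> None:
    """Run both commands in order, stopping at the first failure."""
    for cmd in index_commands(project_root, references_dir, force):
        subprocess.run(cmd, check=True)
```

For example, `run_indexing(".", "./references")` mirrors the minimal command sequence, assuming the `rag-ai-scientist` executable is on your `PATH`.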

---

## Cursor MCP configuration

Register the server so Cursor runs it with **your** project path:

```json
{
  "mcpServers": {
    "rag-ai-scientist": {
      "command": "rag-ai-scientist",
      "args": ["mcp", "--project-root", "/absolute/path/to/my-ai-scientist"]
    }
  }
}
```

See **[Getting started](https://github.com/uzzielperez/rag-ai-scientist/blob/dev/docs/GETTING_STARTED.md)** for optional **`.cursor/.env`** (LLM keys).
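
Because the configuration needs an absolute path, it can help to generate the JSON instead of typing it by hand. The following is a minimal sketch: `cursor_mcp_config` is a hypothetical helper that only produces the text, and where you paste or save it depends on how your Cursor installation reads MCP settings.

```python
import json
from pathlib import Path


def cursor_mcp_config(project_root: str) -> str:
    """Return the MCP server entry as JSON, with project_root made absolute."""
    root = Path(project_root).expanduser().resolve()
    config = {
        "mcpServers": {
            "rag-ai-scientist": {
                "command": "rag-ai-scientist",
                "args": ["mcp", "--project-root", str(root)],
            }
        }
    }
    return json.dumps(config, indent=2)
```

Calling `print(cursor_mcp_config("~/my-ai-scientist"))` prints an entry with the same shape as the example in this section, with `~` expanded to your home directory.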

---

## Packaged skills and examples

- Skills ship **inside the installed package**. Access via MCP **`get_skill`** (e.g. **`cms-higgs-opendata`**). No clone required.
- **[Examples / MCP access](https://github.com/uzzielperez/rag-ai-scientist/blob/dev/docs/examples/README.md)** explains **`get_skill`**, Cursor wiring, and optional curated markdown **for maintainers** who ship a full docs tree. End users normally only need their own files under **`references/`**.

---

## Running agents beside a separate lab environment

If your training jobs run in a different conda environment or venv than `rag-ai-scientist`:

1. Install **`rag-ai-scientist`** in its own small venv.
2. Keep **`--project-root`** pointed at your research folder.
3. Run heavy jobs via explicit wrappers (`conda run`, scripts) from the agent — see **[Runbook](https://github.com/uzzielperez/rag-ai-scientist/blob/dev/docs/RUNBOOK.md)** for patterns.
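
As an illustration of step 3, a wrapper can build an explicit `conda run` invocation so that the heavy job executes in the lab environment while the agent stays in its own venv. This is a sketch under assumptions: the environment name `lab-env` and the script `train.py` are placeholders, and `conda_run_command` is a hypothetical helper, not something the package provides.

```python
import shlex
from typing import List


def conda_run_command(env_name: str, script: str, *script_args: str) -> List[str]:
    """Build an argv list that runs a Python script inside a named conda env."""
    return ["conda", "run", "-n", env_name, "python", script, *script_args]


def as_shell(cmd: List[str]) -> str:
    """Render an argv list as a safely quoted shell command string."""
    return shlex.join(cmd)
```

For example, `as_shell(conda_run_command("lab-env", "train.py", "--epochs", "3"))` gives a command line you can log or review before passing the list form to `subprocess.run`.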

---

## Repository layout (when developing from source)

```text
rag_ai_scientist/
  cli.py                  # CLI entrypoint
  mcp_server.py           # MCP server
  skills/                 # Packaged skills (ship in wheel)
rag/
  index_documents.py      # Indexer used by setup-rag
configs/
  references.example.yaml # Example only — users run init-references instead
docs/
  GETTING_STARTED.md      # Primary user guide (pip-only path)
  examples/               # Maintainer docs / optional narratives
```

Browse on GitHub: [docs/](https://github.com/uzzielperez/rag-ai-scientist/tree/dev/docs).

---

## Development & PyPI releases

Contributor workflow and release steps: **[DEV_README.md](https://github.com/uzzielperez/rag-ai-scientist/blob/dev/DEV_README.md)**.

---

## License

- Open-source: AGPL-3.0-or-later ([`LICENSE`](https://github.com/uzzielperez/rag-ai-scientist/blob/dev/LICENSE))
- Commercial: see [`LICENSE-COMMERCIAL.md`](https://github.com/uzzielperez/rag-ai-scientist/blob/dev/LICENSE-COMMERCIAL.md)

---

## Security notes

- Never commit secrets (`.env`, API keys).
- Treat **`.cursor/rag_db`** as sensitive if your indexed documents are sensitive, since the index is built from their contents.
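
If your project folder is a Git repository, a quick check of `.gitignore` can catch the most common slip. The sketch below is deliberately simple: `missing_gitignore_entries` is a hypothetical helper that looks for literal entries only and does not implement Git's full pattern matching, so treat an empty result as a hint and not a guarantee.

```python
from pathlib import Path
from typing import Iterable, List

# Paths this README suggests keeping out of version control.
SENSITIVE_PATHS = (".env", ".cursor/rag_db")


def missing_gitignore_entries(project_root: str,
                              required: Iterable[str] = SENSITIVE_PATHS) -> List[str]:
    """Return the required paths that have no literal entry in .gitignore."""
    gitignore = Path(project_root) / ".gitignore"
    present = set()
    if gitignore.is_file():
        for line in gitignore.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                # Ignore leading/trailing slashes so "/.env" matches ".env".
                present.add(line.strip("/"))
    return [p for p in required if p.strip("/") not in present]
```

Any path returned by `missing_gitignore_entries(".")` is a candidate to add to `.gitignore` before your first commit.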
