Metadata-Version: 2.4
Name: researchbot
Version: 0.1.2
Summary: Git-based AI research assistant with literature analysis tools
Project-URL: Homepage, https://github.com/juankost/researchbot
Project-URL: Repository, https://github.com/juankost/researchbot
Project-URL: Issues, https://github.com/juankost/researchbot/issues
Author: Jiawei Xu
License-Expression: MIT
License-File: LICENSE
Keywords: academic,claude,mcp,papers,research,semantic-scholar
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.12
Requires-Dist: click>=8.3.1
Requires-Dist: httpx>=0.28.1
Requires-Dist: mcp>=1.26.0
Requires-Dist: mistralai<2.0,>=1.11.1
Requires-Dist: openai>=2.16.0
Requires-Dist: python-dotenv>=1.2.1
Description-Content-Type: text/markdown

# Researchbot

**A Git-based AI research assistant that turns your local file system into an active research lab.**

Dump your ideas, paper references, and notes into a workspace file, then tell the agent what to do: survey the landscape, check novelty, or deep-read specific papers. It orchestrates the work by searching APIs, analyzing papers in parallel, and writing structured output to your project files.

Built on Claude Code with custom MCP servers, using VS Code/Cursor as the interface and Markdown as the document format.

## Installation

### From PyPI

```bash
pip install researchbot
```

### From source

```bash
git clone https://github.com/juankost/researchbot.git
cd researchbot
uv sync
```

### Configuration

Researchbot looks for API keys in this order:

1. **Shell environment variables** (highest priority)
2. **`~/.config/researchbot/.env`** (user-level config)
3. **`<project_root>/.env`** (local dev fallback)
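
The lookup order above can be sketched in a few lines of standard-library Python. This is an illustration only, not the code in `researchbot/config.py`: the helper names `parse_env_file` and `resolve_setting` are hypothetical, and the sketch assumes that an empty value (as in a freshly created template) counts as unset.

```python
import os
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_setting(name, project_root, user_env=None, environ=None):
    """Return the first non-empty value in priority order, or None."""
    environ = os.environ if environ is None else environ
    if user_env is None:
        user_env = Path.home() / ".config" / "researchbot" / ".env"
    sources = [
        environ,                                      # 1. shell environment
        parse_env_file(Path(user_env)),               # 2. user-level config
        parse_env_file(Path(project_root) / ".env"),  # 3. local dev fallback
    ]
    for source in sources:
        value = source.get(name)
        if value:  # assumption: an empty string counts as "not set"
            return value
    return None
```

Passing `environ` and `user_env` explicitly keeps the function easy to test without touching the real shell environment or home directory.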

Set up your API keys:

```bash
mkdir -p ~/.config/researchbot
cat > ~/.config/researchbot/.env << 'EOF'
# Semantic Scholar (optional — increases rate limits)
SEMANTIC_SCHOLAR_API_KEY=

# Mistral OCR (required if using mistral as OCR provider)
MISTRAL_API_KEY=

# Deepseek OCR (required if using deepseek as OCR provider)
DEEPSEEK_API_KEY=

# OCR provider: "mistral" (default) or "deepseek"
RESEARCHBOT_OCR_PROVIDER=mistral
EOF
```
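
Before running an OCR job, it can help to confirm that the key matching the selected provider is present. The check below is a hedged sketch based only on the comments in the template above; `check_ocr_config` is a hypothetical helper, and lowercasing the provider name is an assumption rather than documented behavior.

```python
import os

# Provider-to-key mapping, taken from the comments in the .env template.
REQUIRED_KEY = {"mistral": "MISTRAL_API_KEY", "deepseek": "DEEPSEEK_API_KEY"}


def check_ocr_config(environ=None):
    """Return (provider, key_name); raise if the provider or its key is missing."""
    environ = os.environ if environ is None else environ
    # "mistral" is the documented default; an empty value falls back to it too.
    raw = environ.get("RESEARCHBOT_OCR_PROVIDER", "")
    provider = raw.strip().lower() or "mistral"  # assumption: case-insensitive
    if provider not in REQUIRED_KEY:
        raise ValueError(f"Unknown OCR provider: {provider!r}")
    key_name = REQUIRED_KEY[provider]
    if not environ.get(key_name):
        raise RuntimeError(
            f"{key_name} must be set when the OCR provider is {provider!r}"
        )
    return provider, key_name
```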

### Claude Code Integration

To use researchbot as an MCP server in Claude Code globally, see [claude-config](https://github.com/juankost/claude-config) for the install script that sets up skills and MCP tools.

## Current Status

**v1 — In Development**

The project is in early development. The Semantic Scholar API client, PDF download + OCR pipeline, CLI commands, and MCP server are implemented. The agent skills are being built next.

### What's implemented
- Python package structure (`researchbot/`) with module stubs
- Dependencies: `httpx`, `click`, `mistralai`, `openai`, `mcp`, `python-dotenv`
- Configuration with an OCR provider toggle (Mistral by default, Deepseek optional) and two-tier `.env` loading (user-level file first, then the project-level file)
- **Semantic Scholar API client** (`researchbot/scholar.py`) — `Paper` dataclass, `SemanticScholarClient` class with search, get_paper, citations, references, and similar paper methods. Includes `resolve_paper()` for flexible input: accepts IDs, URLs, or paper names.
- **PDF download + OCR pipeline** (`researchbot/pdf.py`, `researchbot/ocr.py`) — resolve PDF URLs via Semantic Scholar, download with caching, Mistral OCR with image extraction, per-paper cache (`text.md` + `images/`)
- **CLI commands** — `search`, `paper`, `citations`, `references`, `similar`, `read` with JSON output, `--pretty` flag, and `--include-images` for the `read` command
- **MCP server** (`researchbot/mcp_server.py`) — FastMCP server exposing 7 tools (`search_papers`, `get_paper`, `get_citations`, `get_references`, `search_similar`, `read_paper`, `ocr_local_pdf`) over stdio transport, auto-started by Claude Code via `.mcp.json`
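
The per-paper cache mentioned above (`text.md` plus an `images/` directory) could be laid out roughly as follows. This is a sketch under stated assumptions, not the logic in `researchbot/pdf.py` or `researchbot/ocr.py`: the function names, the caller-supplied cache root, and the rule for turning a paper ID into a directory name are all hypothetical.

```python
import re
from pathlib import Path


def paper_cache_dir(cache_root, paper_id: str) -> Path:
    """Map a paper ID to a directory name that is safe on common filesystems."""
    # Assumption: any character outside [A-Za-z0-9._-] is replaced with "_".
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", paper_id)
    return Path(cache_root) / safe


def read_cached_text(cache_root, paper_id: str):
    """Return cached Markdown for a paper, or None on a cache miss."""
    text_file = paper_cache_dir(cache_root, paper_id) / "text.md"
    return text_file.read_text(encoding="utf-8") if text_file.is_file() else None


def write_cache(cache_root, paper_id: str, markdown: str, images: dict) -> Path:
    """Store OCR output: text.md plus one file per image under images/."""
    paper_dir = paper_cache_dir(cache_root, paper_id)
    image_dir = paper_dir / "images"
    image_dir.mkdir(parents=True, exist_ok=True)
    (paper_dir / "text.md").write_text(markdown, encoding="utf-8")
    for name, data in images.items():  # images maps file name -> raw bytes
        (image_dir / name).write_bytes(data)
    return paper_dir
```

A caller would check `read_cached_text` first and only download and OCR the PDF when it returns `None`.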


## Usage

### CLI

```bash
# Search for papers
researchbot search "state space models" --limit 5 --pretty

# Get details for a specific paper
researchbot paper "ARXIV:1706.03762" --pretty

# Get papers that cite a paper
researchbot citations "ARXIV:1706.03762" --limit 5 --pretty

# Get papers referenced by a paper
researchbot references "ARXIV:1706.03762" --limit 5 --pretty

# Find similar papers
researchbot similar "ARXIV:1706.03762" --limit 5 --pretty

# Download + OCR a paper (markdown text output)
researchbot read "ARXIV:2106.15928"

# Download + OCR with image paths (JSON output)
researchbot read "ARXIV:2106.15928" --include-images --pretty
```

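Every command prints JSON except plain `read`, which prints the paper's Markdown text; with `--include-images`, `read` prints JSON as well. Add `--pretty` for indented JSON, and `--limit N` to control the result count on list-style commands such as `search` and `citations`.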
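
Because the search and lookup commands print JSON, they are easy to drive from a script. The wrapper below is a minimal sketch that assumes `researchbot` is installed and on `PATH`; `build_command` and `run_json` are hypothetical helper names, and only the subcommands and the `--limit` flag shown in the examples above are used.

```python
import json
import subprocess


def build_command(subcommand: str, query: str, limit=None) -> list[str]:
    """Assemble the argv list for a researchbot CLI call."""
    argv = ["researchbot", subcommand, query]
    if limit is not None:
        argv += ["--limit", str(limit)]
    return argv


def run_json(subcommand: str, query: str, limit=None):
    """Run the CLI and parse its stdout as JSON.

    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        build_command(subcommand, query, limit),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

For example, `run_json("search", "state space models", limit=5)` returns the parsed output of the first CLI example. The shape of that JSON is not documented here, so inspect it before relying on specific fields.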

Supported paper ID formats: `ARXIV:xxx`, `DOI:xxx`, `CorpusId:xxx`, Semantic Scholar hash, or `URL:xxx`.

The `read` command (and the MCP `read_paper` tool) also accepts paper names — they are resolved via Semantic Scholar search:

```bash
researchbot read "Attention is All You Need"
```
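
One way to picture how a reference string might be sorted into "identifier" versus "paper name" is shown below. This is an illustrative sketch, not the logic of `resolve_paper()`: the function name `classify_reference` is hypothetical, and both the case-insensitive prefix match and the 40-character hexadecimal form of a Semantic Scholar hash are assumptions.

```python
import re

# Prefixes listed under "Supported paper ID formats".
ID_PREFIXES = ("ARXIV:", "DOI:", "CorpusId:", "URL:")
# Assumption: a bare Semantic Scholar ID is a 40-character hex string.
S2_HASH = re.compile(r"[0-9a-fA-F]{40}")


def classify_reference(text: str) -> str:
    """Return "prefixed_id", "s2_hash", or "title" for a reference string."""
    value = text.strip()
    # Assumption: prefixes are matched case-insensitively.
    if value.upper().startswith(tuple(p.upper() for p in ID_PREFIXES)):
        return "prefixed_id"
    if S2_HASH.fullmatch(value):
        return "s2_hash"
    # Anything else is treated as a paper name to be resolved by search.
    return "title"
```

With this sketch, `"ARXIV:1706.03762"` is classified as `"prefixed_id"` and `"Attention is All You Need"` as `"title"`.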


## Project Structure

```
researchbot/
  __init__.py
  config.py       # API keys, OCR provider toggle, cache paths
  scholar.py      # Semantic Scholar API client
  pdf.py          # PDF download utilities
  ocr.py          # OCR pipeline (Mistral / Deepseek)
  cli.py          # CLI entry point (Click)
  mcp_server.py   # MCP server for Claude Code (FastMCP, stdio)
```

## License

MIT
