Metadata-Version: 2.4
Name: researchbot
Version: 0.1.0
Summary: Git-based AI research assistant with literature analysis tools
Project-URL: Homepage, https://github.com/juankost/researchbot
Project-URL: Repository, https://github.com/juankost/researchbot
Project-URL: Issues, https://github.com/juankost/researchbot/issues
Author: Jiawei Xu
License-Expression: MIT
License-File: LICENSE
Keywords: academic,claude,mcp,papers,research,semantic-scholar
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.12
Requires-Dist: click>=8.3.1
Requires-Dist: httpx>=0.28.1
Requires-Dist: mcp>=1.26.0
Requires-Dist: mistralai>=1.11.1
Requires-Dist: openai>=2.16.0
Requires-Dist: python-dotenv>=1.2.1
Description-Content-Type: text/markdown

# Researchbot

**A Git-based AI research assistant that turns your local file system into an active research lab.**

Dump your ideas, paper references, and notes into a workspace file, then tell the agent what to do: survey the landscape, check novelty, or deep-read specific papers. The agent orchestrates the work by searching APIs, analyzing papers in parallel, and writing structured output to your project files.

Built on Claude Code with custom MCP servers, using VS Code/Cursor as the interface and Markdown as the document format.

## Installation

### From PyPI

```bash
pip install researchbot
```

### From source

```bash
git clone https://github.com/juankost/researchbot.git
cd researchbot
uv sync
```

### Configuration

Researchbot looks for API keys in this order:

1. **Shell environment variables** (highest priority)
2. **`~/.config/researchbot/.env`** (user-level config)
3. **`<project_root>/.env`** (local dev fallback)

Set up your API keys:

```bash
mkdir -p ~/.config/researchbot
cat > ~/.config/researchbot/.env << 'EOF'
# Semantic Scholar (optional — increases rate limits)
SEMANTIC_SCHOLAR_API_KEY=

# Mistral OCR (required if using mistral as OCR provider)
MISTRAL_API_KEY=

# Deepseek OCR (required if using deepseek as OCR provider)
DEEPSEEK_API_KEY=

# OCR provider: "mistral" (default) or "deepseek"
RESEARCHBOT_OCR_PROVIDER=mistral
EOF
```
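
The lookup order above (shell environment, then the user-level file, then the project file) can be sketched in plain Python. This is a minimal illustration, not the code in `researchbot/config.py`: `parse_env_file` and `resolve_setting` are hypothetical names, the parser handles only simple `KEY=VALUE` lines, and treating an empty value as "not set" is an assumption.

```python
import os
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and empty values."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip().strip("'\"")
        if value:  # assumption: an empty value counts as unset
            values[key.strip()] = value
    return values


def resolve_setting(name, environ, user_env, project_env, default=None):
    """Return the first non-empty value: shell, then user file, then project file."""
    for source in (environ, user_env, project_env):
        value = source.get(name)
        if value:
            return value
    return default


# Example wiring, using the paths named in this section:
user_env = parse_env_file(Path.home() / ".config" / "researchbot" / ".env")
project_env = parse_env_file(Path(".env"))
provider = resolve_setting(
    "RESEARCHBOT_OCR_PROVIDER", os.environ, user_env, project_env, default="mistral"
)
```

Passing the three sources in as plain mappings keeps the precedence rule easy to test without touching the real environment.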

### Claude Code Integration

To use researchbot as an MCP server in Claude Code globally, see [claude-config](https://github.com/juankost/claude-config) for the install script that sets up skills and MCP tools.

## Current Status

**v1 — In Development**

The project is in early development. The Semantic Scholar API client, PDF download + OCR pipeline, CLI commands, MCP server, and the v1 agent skills are implemented.

### What's implemented
- Python package structure (`researchbot/`) with module stubs
- Dependencies: `httpx`, `click`, `mistralai`, `openai`, `mcp`, `python-dotenv`
- Configuration with OCR provider toggle (Mistral default, Deepseek configurable), two-tier `.env` loading
- **Semantic Scholar API client** (`researchbot/scholar.py`) — `Paper` dataclass, `SemanticScholarClient` class with search, get_paper, citations, references, and similar paper methods. Includes `resolve_paper()` for flexible input: accepts IDs, URLs, or paper names.
- **PDF download + OCR pipeline** (`researchbot/pdf.py`, `researchbot/ocr.py`) — resolve PDF URLs via Semantic Scholar, download with caching, Mistral OCR with image extraction, per-paper cache (`text.md` + `images/`)
- **CLI commands** — `search`, `paper`, `citations`, `references`, `similar`, `read` with JSON output, `--pretty` flag, and `--include-images` for the `read` command
- **MCP server** (`researchbot/mcp_server.py`) — FastMCP server exposing 7 tools (`search_papers`, `get_paper`, `get_citations`, `get_references`, `search_similar`, `read_paper`, `ocr_local_pdf`) over stdio transport, auto-started by Claude Code via `.mcp.json`
- **Skills:**
  - `/analyze` — In-depth structured analysis of a single paper (contribution, methodology, results, limitations, future work)
  - `/compare` — Compare two papers on problem formulation and methodology
  - `/expand` — Find papers solving the same problem as a seed paper, with parallel subagent analysis
  - `/gaps` — Identify research gaps and open questions from a related works analysis
  - `/verify_gaps` — Verify which gaps are genuinely open by searching for papers that address them
  - `/paper_review` — Constructive conference-style review with literature verification via parallel subagents

### What's next
- v1 skills complete. See [docs/Vision.md](docs/Vision.md) for future directions.

See [docs/plan.md](docs/plan.md) for the full implementation plan.

## Usage

### CLI

```bash
# Search for papers
researchbot search "state space models" --limit 5 --pretty

# Get details for a specific paper
researchbot paper "ARXIV:1706.03762" --pretty

# Get papers that cite a paper
researchbot citations "ARXIV:1706.03762" --limit 5 --pretty

# Get papers referenced by a paper
researchbot references "ARXIV:1706.03762" --limit 5 --pretty

# Find similar papers
researchbot similar "ARXIV:1706.03762" --limit 5 --pretty

# Download + OCR a paper (markdown text output)
researchbot read "ARXIV:2106.15928"

# Download + OCR with image paths (JSON output)
researchbot read "ARXIV:2106.15928" --include-images --pretty
```

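Commands output JSON by default. The exception is `read`, which prints Markdown text unless `--include-images` is given. Add `--pretty` for indented JSON, and `--limit N` to control the result count on the list-style commands shown above (`search`, `citations`, `references`, `similar`).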

Supported paper ID formats: `ARXIV:xxx`, `DOI:xxx`, `CorpusId:xxx`, Semantic Scholar hash, or `URL:xxx`.

The `read` command (and the MCP `read_paper` tool) also accepts paper names — they are resolved via Semantic Scholar search:

```bash
researchbot read "Attention is All You Need"
```
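
To illustrate how one input field can accept IDs, URLs, or titles, here is a rough sketch of a classifier that decides which lookup path to take. It is not the logic of `resolve_paper()`: `classify_reference` and its category labels are hypothetical, and the 40-character hexadecimal pattern for a Semantic Scholar hash is an assumption.

```python
import re

# Prefixes taken from the supported ID formats listed above (compared case-insensitively).
ID_PREFIXES = ("ARXIV:", "DOI:", "CORPUSID:", "URL:")

# Assumption: a Semantic Scholar paper hash is 40 hexadecimal characters.
S2_HASH = re.compile(r"[0-9a-f]{40}")


def classify_reference(text: str) -> str:
    """Return 'prefixed_id', 's2_hash', 'url', or 'title' for a user-supplied reference."""
    value = text.strip()
    if value.upper().startswith(ID_PREFIXES):
        return "prefixed_id"
    if S2_HASH.fullmatch(value.lower()):
        return "s2_hash"
    if value.lower().startswith(("http://", "https://")):
        return "url"
    # Anything else is treated as a paper name and would go through search.
    return "title"
```

In this sketch, only the `title` case would need a search round trip. The other three forms could be passed to a lookup directly.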

### Skills (Claude Code slash commands)

Skills are invoked as slash commands inside Claude Code.

```
# Deep-read a paper by arXiv ID
/analyze ARXIV:1706.03762

# Deep-read by paper name
/analyze Attention is All You Need

# Deep-read by URL
/analyze https://arxiv.org/abs/2106.15928

# Compare two papers
/compare Mamba S4

# Compare papers by arXiv ID
/compare ARXIV:2312.00752 ARXIV:2111.00396

# Find related works for a seed paper
/expand Mamba

# Expand from a specific paper
/expand ARXIV:2312.00752

# Identify gaps from a related works analysis
/gaps workspace/efficient-sequence-modeling/

# Verify which gaps are genuinely open
/verify_gaps workspace/efficient-sequence-modeling/

# Review a paper (published)
/paper_review ARXIV:2312.00752

# Review a local PDF draft
/paper_review ~/drafts/my-paper.pdf

# Review with specific focus
/paper_review ARXIV:2312.00752 Focus on the theoretical claims
```

## Documentation

| File | Purpose |
|------|---------|
| [docs/plan.md](docs/plan.md) | Implementation plan — remaining tasks and their specs |
| [docs/next_task.md](docs/next_task.md) | Detailed spec for the next task to implement |
| [docs/wip.md](docs/wip.md) | Work-in-progress notes for the current task |
| [docs/Vision.md](docs/Vision.md) | Long-term vision and design for the full research lab |
| [docs/project_thoughts.md](docs/project_thoughts.md) | Brain dumps on vision and project direction |

## Project Structure

```
researchbot/
  __init__.py
  config.py       # API keys, OCR provider toggle, cache paths
  scholar.py      # Semantic Scholar API client
  pdf.py          # PDF download utilities
  ocr.py          # OCR pipeline (Mistral / Deepseek)
  cli.py          # CLI entry point (Click)
  mcp_server.py   # MCP server for Claude Code (FastMCP, stdio)
.claude/
  commands/
    analyze.md      # /analyze skill
    compare.md      # /compare skill
    expand.md       # /expand skill
    gaps.md         # /gaps skill
    verify_gaps.md  # /verify_gaps skill
    paper_review.md # /paper_review skill
.mcp.json         # MCP server config (auto-starts researchbot server)
docs/
  plan.md             # Implementation plan
  next_task.md        # Next task spec
  wip.md              # Work in progress
  Vision.md           # Long-term vision and design
  project_thoughts.md # Brain dumps on project direction
```

## License

MIT
