Metadata-Version: 2.4
Name: arxiv-research-mcp
Version: 1.0.0
Summary: An MCP server exposing arXiv research tools (search, abstracts, author lookup, trending) to LLM agents.
Project-URL: Homepage, https://github.com/JananiV07/arxiv-mcp-server
Project-URL: Repository, https://github.com/JananiV07/arxiv-mcp-server
Project-URL: Issues, https://github.com/JananiV07/arxiv-mcp-server/issues
Author-email: JananiV07 <jananiv1207@gmail.com>
License: MIT
License-File: LICENSE
Keywords: ai-agent,arxiv,fastmcp,llm,mcp,model-context-protocol,research
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: arxiv>=2.1.0
Requires-Dist: mcp[cli]>=1.2.0
Provides-Extra: dev
Requires-Dist: pyright>=1.1.380; extra == 'dev'
Requires-Dist: ruff>=0.6.0; extra == 'dev'
Description-Content-Type: text/markdown

<div align="center">

# 📚 arXiv Research MCP Server

**Give any LLM agent a research librarian for [arXiv](https://arxiv.org).**

Search 2.4M+ papers, pull full abstracts, track a researcher's latest work, and
surface what a field is publishing right now — all over the
[Model Context Protocol](https://modelcontextprotocol.io).

[![CI](https://github.com/JananiV07/arxiv-mcp-server/actions/workflows/ci.yml/badge.svg)](https://github.com/JananiV07/arxiv-mcp-server/actions/workflows/ci.yml)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/)
[![MCP](https://img.shields.io/badge/MCP-FastMCP-7c3aed.svg)](https://modelcontextprotocol.io)
[![arXiv API](https://img.shields.io/badge/data-arXiv%20API-b31b1b.svg)](https://info.arxiv.org/help/api/index.html)
[![Type checked: pyright strict](https://img.shields.io/badge/types-pyright%20strict-2563eb.svg)](https://github.com/microsoft/pyright)
[![Lint: ruff](https://img.shields.io/badge/lint-ruff-261230.svg)](https://github.com/astral-sh/ruff)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](#-license)

</div>

---

## 🎬 Demo

<!--
  A short demo GIF goes here. To add it:
    1. Record an MCP client (e.g. Claude Desktop) calling the tools — search a
       topic, then fetch an abstract — and trim the clip to ~30 seconds.
    2. Save it as docs/demo.gif in this repo.
    3. Replace this comment block with:  ![Demo](docs/demo.gif)
-->

> ▶️ _Demo GIF coming soon — a 30-second walkthrough of an agent searching arXiv
> and reading an abstract through these tools._

---

## ✨ Why this server

Large language models are great at *reasoning about* papers but have no live
access to the literature. This server closes that gap with four focused,
**read-only** tools that an agent can call to discover, read, and monitor
research on arXiv — with output shaped specifically for an LLM's context window.

- 🧠 **Agent-first tool design** — every tool carries a detailed docstring the
  host shows to the model, so it knows *when* and *how* to call each one.
- 📦 **Structured, validated output** — each tool returns a typed Pydantic model,
  surfaced as MCP `structuredContent` (not just a blob of text).
- 🎚️ **Context-aware verbosity** — `concise` mode (default) trims abstracts and
  caps author lists; `detailed` returns everything. You never blow the window by
  accident.
- 🛟 **Honest by design** — `trending_topics` refuses to fake popularity metrics
  arXiv doesn't expose, and says so in every response.
- ✅ **Actually verified**: ruff, pyright in strict mode, an in-process smoke test,
  **and** a real end-to-end stdio MCP client test, all green against the live API.

---

## 🧰 The four tools

| Tool | What it's for | Parameters (defaults) |
| --- | --- | --- |
| 🔍 `search_papers` | Keyword discovery across all of arXiv. Supports field prefixes (`ti:`, `au:`, `abs:`, `cat:`) and boolean `AND` / `OR` / `ANDNOT`. | `query`, `max_results=10`, `sort_by="relevance"`, `response_format="concise"` |
| 📄 `get_abstract` | Full record for one paper by ID — untruncated abstract, every author, all categories, DOI / journal ref / comment, PDF + abstract URLs. | `arxiv_id` |
| 👤 `find_by_author` | A researcher's most recent papers, newest first. | `author_name`, `max_results=10`, `response_format="concise"` |
| 📈 `trending_topics` | Recent submissions in a category within a time window, plus the sub-topics that dominate them. | `category`, `days=7`, `max_results=10`, `response_format="concise"` |
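
The field prefixes and boolean operators in the `search_papers` row combine into a single query string. As a minimal sketch, assuming a hypothetical helper named `build_query` (it is not part of this server, and the phrase-quoting rule is an illustrative choice), an agent-side caller might compose queries like this:

```python
def build_query(*, title=None, author=None, abstract=None,
                category=None, exclude_category=None) -> str:
    """Compose an arXiv query string from optional fielded terms.

    Hypothetical helper for illustration; the server itself accepts the
    finished string as the `query` argument of `search_papers`.
    """
    def term(prefix: str, value: str) -> str:
        # Quote multi-word values so they are treated as one phrase.
        return f'{prefix}:"{value}"' if " " in value else f"{prefix}:{value}"

    fields = (("ti", title), ("au", author), ("abs", abstract), ("cat", category))
    parts = [term(prefix, value) for prefix, value in fields if value]
    if not parts:
        raise ValueError("Provide at least one of title, author, abstract, or category.")

    query = " AND ".join(parts)
    if exclude_category:
        query += f" ANDNOT cat:{exclude_category}"
    return query
```

For example, `build_query(title="diffusion", category="cs.CV")` yields `ti:diffusion AND cat:cs.CV`, which can be passed straight to `search_papers`.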

**Shared conventions**

- `response_format`: `"concise"` (default) shortens the abstract to ~280 chars and
  caps the author list at 8 names; `abstract_truncated` and `author_count` always
  tell the agent what was elided. `"detailed"` returns full text and all authors.
- `sort_by` (search only): `"relevance"`, `"newest"`, or `"last_updated"`.
- **Safety caps** (auto-applied, and reported back in a `note` field): `max_results`
  is clamped to **50**, `trending_topics` scans at most **200** recent papers and
  honors a window of **1–90** days.
- `arxiv_id` is forgiving — it accepts bare (`2401.01234`), versioned
  (`2401.01234v2`), legacy (`math.GT/0309136`), and full-URL forms.

### A deliberate note on "trending"

> The arXiv API exposes **no citation, download, or view counts** — so genuine
> popularity *cannot* be measured. `trending_topics` therefore defines "trending"
> as **recency of submission** within the window, and ranks the sub-categories
> those recent papers co-occur in. Every response restates this in its `note`
> field so the agent never overclaims. Honesty over vanity metrics.

---

## 🚀 Quick start

Install from PyPI:

```bash
pip install arxiv-research-mcp
```

…then point your MCP client at the `arxiv-research-mcp` command (see
[Connect it to an MCP host](#-connect-it-to-an-mcp-host)).

<details>
<summary><b>Or install from source</b></summary>

```bash
git clone https://github.com/JananiV07/arxiv-mcp-server.git
cd arxiv-mcp-server

python -m venv .venv
# Windows (PowerShell):
.venv\Scripts\Activate.ps1
# macOS / Linux:
source .venv/bin/activate

pip install -r requirements.txt
python src/server.py
```

</details>

> Requires **Python 3.10+**. Runtime deps are just `mcp[cli]` and `arxiv`.
> The PyPI package is named **`arxiv-research-mcp`** (the name `arxiv-mcp-server`
> was already taken by an unrelated project).

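You can also run the server directly. It speaks MCP over **stdio**, so normally a
host launches it. From a source checkout (after a PyPI install, run
`arxiv-research-mcp` instead):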

```bash
python src/server.py
```

---

## 🔌 Connect it to an MCP host

### Configure your client

Add an entry to your client's MCP config file (for example, Claude Desktop uses
`claude_desktop_config.json`; other clients expose an equivalent).

**If you installed from PyPI** (`pip install arxiv-research-mcp`), just reference
the installed command:

```jsonc
{
  "mcpServers": {
    "arxiv-research": {
      "command": "arxiv-research-mcp"
    }
  }
}
```

**If you installed from source**, point at the Python interpreter from your
virtual environment:

```jsonc
{
  "mcpServers": {
    "arxiv-research": {
      "command": "/absolute/path/to/arxiv-mcp-server/.venv/bin/python",
      "args": ["/absolute/path/to/arxiv-mcp-server/src/server.py"]
    }
  }
}
```

On **Windows** (from source), use the `.exe` and forward slashes — e.g.
`C:/path/to/arxiv-mcp-server/.venv/Scripts/python.exe`.

Restart the host, and the four tools appear under the **arxiv-research** server.
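
If you would rather generate the from-source entry than type absolute paths by hand, a small script can build it. This is a sketch only: `source_config` is a hypothetical helper, and it assumes the virtual environment lives at `.venv` inside the checkout, as in the install steps above.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath


def source_config(repo_root: str, windows: bool = False) -> dict:
    """Build the from-source `mcpServers` entry for a given checkout path."""
    if windows:
        root = PureWindowsPath(repo_root)
        python = root / ".venv" / "Scripts" / "python.exe"
    else:
        root = PurePosixPath(repo_root)
        python = root / ".venv" / "bin" / "python"
    script = root / "src" / "server.py"
    # as_posix() emits forward slashes on both platforms, which avoids
    # backslash escaping inside JSON strings.
    return {
        "mcpServers": {
            "arxiv-research": {
                "command": python.as_posix(),
                "args": [script.as_posix()],
            }
        }
    }
```

Printing `json.dumps(source_config("/absolute/path/to/arxiv-mcp-server"), indent=2)` produces a block you can paste into the client's config file.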

### Try it with the MCP Inspector

```bash
npx @modelcontextprotocol/inspector python src/server.py
```

---

## 💬 What an agent can do with it

Once connected, natural-language requests map cleanly onto the tools:

| You ask… | The agent calls… |
| --- | --- |
| *"Find recent papers on diffusion models for video."* | `search_papers("ti:diffusion AND abs:video AND cat:cs.CV", sort_by="newest")` |
| *"Summarize 'Attention Is All You Need'."* | `get_abstract("1706.03762")` |
| *"What has Yoshua Bengio published lately?"* | `find_by_author("Yoshua Bengio")` |
| *"What's hot in machine learning this week?"* | `trending_topics("cs.LG", days=7)` |

### Example output (`get_abstract`, abridged)

```jsonc
{
  "arxiv_id": "1706.03762v7",
  "title": "Attention Is All You Need",
  "authors": ["Ashish Vaswani", "Noam Shazeer", "..."],
  "author_count": 8,
  "published": "2017-06-12",
  "updated": "2023-08-02",
  "primary_category": "cs.CL",
  "categories": ["cs.CL", "cs.LG"],
  "abstract": "The dominant sequence transduction models ...",
  "abstract_truncated": false,
  "abstract_url": "http://arxiv.org/abs/1706.03762v7",
  "pdf_url": "https://arxiv.org/pdf/1706.03762v7"
}
```

---

## 🏗️ Architecture & design choices

```
arxiv-mcp-server/
├── src/
│   └── server.py          # FastMCP server: 4 tools + Pydantic models + helpers
├── scripts/
│   ├── smoke_test.py      # in-process tests (import the tool fns directly)
│   └── client_test.py     # end-to-end test over the real stdio MCP protocol
├── pyproject.toml         # packaging + ruff + pyright config
├── requirements.txt       # runtime deps
└── README.md
```

- **FastMCP** registers each tool via `@mcp.tool()`; type hints + `pydantic.Field`
  descriptions become the JSON input schema the host advertises to the model.
- **Typed output models** — `Paper`, `SearchResults`, `AuthorResults`,
  `TopicCount`, `TrendingResults` — give the host structured, machine-readable
  results.
- **Read-only annotations** — all four tools set `readOnlyHint=True` /
  `destructiveHint=False`, so hosts can treat them as safe to call freely.
- **One shared `arxiv.Client`** with a polite delay + retries, respecting arXiv's
  fair-use guidance; its chatty INFO logging is silenced so stdout stays a clean
  MCP channel.
- **Actionable errors** — bad input or a failed request raises a `ValueError`
  whose message tells the agent how to fix the call (correct ID format, valid
  category code, query-prefix syntax, …).
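
To make the "actionable errors" idea concrete, here is a minimal sketch of forgiving ID parsing that fails with a fix-it message. The function name `normalize_arxiv_id` and the regular expressions are assumptions for illustration; the server's real validation may differ.

```python
import re

# New-style IDs: YYMM.NNNN or YYMM.NNNNN, with an optional version suffix.
_NEW_STYLE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Legacy IDs: archive[.SUBJECT]/YYMMNNN, with an optional version suffix.
_LEGACY = re.compile(r"^[a-z\-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")
# Leading part of an arxiv.org abstract or PDF URL.
_URL_PREFIX = re.compile(r"^https?://(www\.|export\.)?arxiv\.org/(abs|pdf)/",
                         re.IGNORECASE)


def normalize_arxiv_id(raw: str) -> str:
    """Reduce any accepted form to a plain ID, or raise an actionable error."""
    candidate = _URL_PREFIX.sub("", raw.strip())
    if candidate.lower().endswith(".pdf"):
        candidate = candidate[:-4]  # drop a trailing ".pdf" from PDF links
    if _NEW_STYLE.match(candidate) or _LEGACY.match(candidate):
        return candidate
    # The message names every accepted form so the agent can retry correctly.
    raise ValueError(
        f"'{raw}' is not a recognizable arXiv ID. "
        "Use a bare ID like '2401.01234', a versioned ID like '2401.01234v2', "
        "a legacy ID like 'math.GT/0309136', or a full arxiv.org URL."
    )
```

The error text matters as much as the check: an agent that receives it has enough information to correct the argument and call the tool again without human help.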

---

## 🧪 Development & testing

```bash
pip install -e ".[dev]"          # ruff + pyright

ruff check .                     # lint
pyright                          # type check (strict on our own code)
python scripts/smoke_test.py     # in-process checks vs the live arXiv API
python scripts/client_test.py    # full stdio MCP protocol round-trip
```

Two complementary test layers:

- **`smoke_test.py`** imports the tool functions directly — fast feedback on tool
  logic, the concise/detailed split, `max_results`/`days` clamping, missing-field
  handling, and error paths.
- **`client_test.py`** is a true MCP client: it **spawns `src/server.py` as a
  subprocess** and exercises `initialize → list_tools → call_tool` over stdio —
  the same path any MCP host uses. This is what proves the server
  works *as an MCP server*: input schemas, `structuredContent`, tool annotations,
  and protocol-level error reporting (`isError`).

---

## 📋 Requirements

- Python **3.10+**
- [`mcp[cli]`](https://pypi.org/project/mcp/) — the MCP Python SDK (FastMCP)
- [`arxiv`](https://pypi.org/project/arxiv/) — Python wrapper for the arXiv API
- Network access to `export.arxiv.org`

---

## 🙏 Acknowledgements

- Paper data from the [arXiv API](https://info.arxiv.org/help/api/index.html).
  Thank you to arXiv for the open API — please use it within their
  [Terms of Use](https://info.arxiv.org/help/api/tou.html).
- Built on the [Model Context Protocol](https://modelcontextprotocol.io).

> arXiv is a trademark of Cornell University. This project is an independent,
> unofficial integration and is not affiliated with or endorsed by arXiv.

---

## 📄 License

Released under the **MIT License** — see [`LICENSE`](LICENSE).

<div align="center">
<sub>Built for the agentic era — so your LLM can read the literature, not just guess about it.</sub>
</div>
