Metadata-Version: 2.1
Name: deepxiv-sdk
Version: 0.2.3
Summary: A Python package for arXiv paper access with CLI and MCP server support
Home-page: https://github.com/qhjqhj00/deepxiv-sdk
Author: Hongjin Qian
License: MIT
Project-URL: Homepage, https://1stauthor.com/
Project-URL: Documentation, https://github.com/qhjqhj00/deepxiv-sdk#readme
Project-URL: Repository, https://github.com/qhjqhj00/deepxiv-sdk
Project-URL: Bug Tracker, https://github.com/qhjqhj00/deepxiv-sdk/issues
Project-URL: Demo, https://1stauthor.com/
Project-URL: API Documentation, https://data.rag.ac.cn/api/docs
Keywords: arxiv,research,papers,agent,llm,react,mcp,cli
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31.0
Requires-Dist: click>=8.0.0
Requires-Dist: python-dotenv>=0.19.0
Provides-Extra: agent
Requires-Dist: openai>=1.0.0; extra == "agent"
Requires-Dist: langgraph>=0.0.20; extra == "agent"
Requires-Dist: langchain-core>=0.1.0; extra == "agent"
Provides-Extra: all
Requires-Dist: requests>=2.31.0; extra == "all"
Requires-Dist: click>=8.0.0; extra == "all"
Requires-Dist: python-dotenv>=0.19.0; extra == "all"
Requires-Dist: mcp[cli]>=1.2.0; extra == "all"
Requires-Dist: openai>=1.0.0; extra == "all"
Requires-Dist: langgraph>=0.0.20; extra == "all"
Requires-Dist: langchain-core>=0.1.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: isort>=5.0.0; extra == "dev"
Provides-Extra: mcp
Requires-Dist: mcp[cli]>=1.2.0; extra == "mcp"

# deepxiv-sdk

**Agent-first academic paper interface for CLI, MCP, and Python.** deepxiv gives OpenClaw, Claude Code, Codex, and other coding agents a fast, structured way to search papers, inspect metadata, read only the right sections, and reason over open-access literature without wasting tokens.

- **📚 API Documentation**: [https://data.rag.ac.cn/api/docs](https://data.rag.ac.cn/api/docs)
- **🎥 Demo Video**: [![Watch Demo](https://img.shields.io/badge/YouTube-Watch%20Demo-red)](https://youtu.be/atr71CbQybM)
- **📄 Technical Report**: [![arxiv](https://img.shields.io/badge/arXiv-2603.00084-b31b1b)](https://arxiv.org/abs/2603.00084)
- **📖 中文文档**: [README.zh.md](README.zh.md)

## Why deepxiv for agents?

| Feature | deepxiv | Standard arXiv API |
|---------|---------|-------------------|
| **Hybrid Search** (BM25 + Vector) | ✅ | ❌ |
| **AI-Generated Summaries** (TLDR) | ✅ | ❌ |
| **Section-by-Section Access** | ✅ | ❌ |
| **GitHub Link Extraction** | ✅ | ❌ |
| **MCP Protocol Support** | ✅ | ❌ |
| **Biomedical Papers** (PMC) | ✅ | ❌ |
| **Agent-Oriented CLI** | ✅ | ❌ |
| **Free Daily Requests** | 10,000 | ∞* |

*The arXiv API has no daily quota, but it enforces strict rate limiting.

## Core Features

- 🔍 **Hybrid Search**: BM25 + vector search for better retrieval quality
- 📄 **Section-Based Access**: load only the sections an agent actually needs
- ✨ **Brief Views**: title, TLDR, keywords, citations, PDF, and GitHub link when available
- 💻 **Three Interfaces**: CLI / MCP Server / Python SDK
- 🤖 **Agent-Friendly by Default**: works well inside OpenClaw, Claude Code, Codex, and similar agent loops
- 📚 **PMC Support**: access biomedical literature alongside arXiv
- 🔥 **Trending + Social Impact**: discover papers getting attention online

## Agent Integration

deepxiv is designed to be the paper interface layer for coding and research agents.

- **Codex**: install the CLI skill and let Codex call `deepxiv search`, `deepxiv paper`, and `deepxiv pmc` directly
- **Claude Code**: load the same CLI skill or use the MCP server for tool-based access
- **OpenClaw**: use the CLI as a stable shell interface, or wire the MCP server into your agent runtime
- **Other agents**: use the CLI for predictable terminal workflows, the MCP server for tool calling, or the Python SDK for direct integration

The key design goal is simple: give agents a comprehensive and token-efficient academic paper interface instead of forcing them to scrape raw PDFs or overfetch entire papers.

## 🌐 Open Access Literature Support

### Current Support
- ✅ **arXiv** - Computer Science, Physics, Math, and more
- ✅ **PubMed Central (PMC)** - Biomedical and life sciences

### Coming Soon (Roadmap)
- 🔄 **bioRxiv** - Preprints in biology
- 🔄 **medRxiv** - Preprints in medicine
- 🔄 **Other OA Sources** - Additional open access repositories
- 🔄 **Full OA Literature Coverage** - Comprehensive open access ecosystem

> **Why OA Literature?** By focusing on open access papers, deepxiv ensures that researchers and AI systems have unrestricted access to knowledge without subscription barriers.

## Quick Start

### 1. Installation

```bash
# Basic install (Reader + CLI)
pip install deepxiv-sdk

# Full install (MCP + Agent)
pip install "deepxiv-sdk[all]"
```

### 2. First Use

On first use, deepxiv automatically registers a free token and saves it to `~/.env`:

```bash
deepxiv search "agent memory" --limit 5
```

### 3. CLI Usage

The CLI is the fastest way to plug deepxiv into agent workflows.

```bash
# Search papers
deepxiv search "transformer" --limit 10

# Quick paper understanding
deepxiv paper 2409.05591 --brief

# Paper structure and targeted reading
deepxiv paper 2409.05591 --head
deepxiv paper 2409.05591 --section Introduction
deepxiv paper 2409.05591 --preview
deepxiv paper 2409.05591

# Social/trending signals
deepxiv paper 2409.05591 --popularity
deepxiv trending --days 14 --limit 10

# Biomedical papers
deepxiv pmc PMC544940 --head
```
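
For agents or scripts that drive the CLI from Python, a thin `subprocess` wrapper keeps the command construction in one place. The sketch below is illustrative only: the helper names `build_paper_command` and `run_deepxiv` are hypothetical, and it covers only the `deepxiv paper` flags shown in the examples above.

```python
import shutil
import subprocess
from typing import List, Optional

# Flags copied from the CLI examples above; other flags are outside this sketch.
PAPER_VIEWS = {
    "brief": "--brief",
    "head": "--head",
    "preview": "--preview",
    "popularity": "--popularity",
}


def build_paper_command(
    arxiv_id: str, view: Optional[str] = None, section: Optional[str] = None
) -> List[str]:
    """Build the argv list for `deepxiv paper` without running it."""
    if view is not None and section is not None:
        raise ValueError("choose either a view or a section, not both")
    cmd = ["deepxiv", "paper", arxiv_id]
    if section is not None:
        cmd += ["--section", section]
    elif view is not None:
        cmd.append(PAPER_VIEWS[view])  # KeyError for views this sketch does not know
    return cmd


def run_deepxiv(cmd: List[str], timeout: float = 60.0) -> str:
    """Run a deepxiv command and return its stdout as text."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("deepxiv CLI not found on PATH")
    done = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return done.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems with section names that contain spaces.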

### 4. Use with OpenClaw, Claude Code, and Codex

#### Codex skill

```bash
# Run from the repository root
mkdir -p "$CODEX_HOME/skills"
ln -s "$(pwd)/skills/deepxiv-cli" "$CODEX_HOME/skills/deepxiv-cli"
```

The included skill teaches agents when to use:
- `deepxiv search` for literature discovery
- `deepxiv paper --brief` for quick filtering
- `deepxiv paper --section` for focused reading
- `deepxiv pmc` for biomedical papers
- `deepxiv agent` for deeper multi-turn reasoning

#### Claude Code / OpenClaw / custom agents

If your framework supports reusable operating instructions, load [skills/deepxiv-cli/SKILL.md](skills/deepxiv-cli/SKILL.md) directly. This gives agents a clean command selection guide instead of relying on ad hoc shell usage.

### 5. MCP Server

Use MCP when you want tool-based integration rather than shell execution.

Add the following to the Claude Desktop MCP config file:

**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`

**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`

**Linux**: `~/.config/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "deepxiv": {
      "command": "deepxiv",
      "args": ["serve"],
      "env": {
        "DEEPXIV_TOKEN": "your_token_here"
      }
    }
  }
}
```
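
If the config file already lists other MCP servers, the `deepxiv` entry has to be merged in rather than pasted over the file. A minimal sketch of that merge follows. The function names are hypothetical, and it assumes the file, when present, contains valid JSON.

```python
import json
from pathlib import Path
from typing import Any, Dict


def add_deepxiv_server(config: Dict[str, Any], token: str) -> Dict[str, Any]:
    """Return a copy of a Claude Desktop config with the deepxiv entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep servers already configured
    servers["deepxiv"] = {
        "command": "deepxiv",
        "args": ["serve"],
        "env": {"DEEPXIV_TOKEN": token},
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, token: str) -> None:
    """Read the config at `path` (if any), merge the entry, and write it back."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    merged = add_deepxiv_server(existing, token)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

For example, on Linux this could be called as `update_config_file(Path.home() / ".config/Claude/claude_desktop_config.json", "your_token_here")`.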

Available MCP tools:

| Tool | Description |
|------|-------------|
| `search_papers` | Search arXiv papers |
| `get_paper_brief` | Quick summary |
| `get_paper_metadata` | Full metadata |
| `get_paper_section` | Read specific section |
| `get_full_paper` | Complete paper |
| `get_paper_preview` | Paper preview |
| `get_pmc_metadata` | PMC paper metadata |
| `get_pmc_full` | Complete PMC paper |

### 6. Python Usage

```python
from deepxiv_sdk import Reader

reader = Reader()

# Search papers
results = reader.search("agent memory", size=5)
for paper in results.get("results", []):
    print(f"{paper['title']} ({paper['arxiv_id']})")

# Get paper info
brief = reader.brief("2409.05591")
print(f"Title: {brief['title']}")
print(f"TLDR: {brief.get('tldr', 'N/A')}")
print(f"GitHub: {brief.get('github_url', 'N/A')}")

# Read specific section
intro = reader.section("2409.05591", "Introduction")
print(intro[:500])

# Get trending papers (no token required)
trending = reader.trending(days=7, limit=5)
for paper in trending['papers']:
    print(f"#{paper['rank']}: {paper['arxiv_id']}")
    print(f"  Views: {paper['stats']['total_views']}")

# Get social impact metrics (requires token)
reader_with_token = Reader(token="your_token_here")
impact = reader_with_token.social_impact("2409.05591")
if impact:
    print(f"Views: {impact['total_views']}")
    print(f"Tweets: {impact['total_tweets']}")
```

## Complete API Reference

### Search and Query

```python
reader.search(query, size=10, search_mode="hybrid", categories=None, min_citation=None)
reader.head(arxiv_id)              # Paper metadata and sections overview
reader.brief(arxiv_id)             # Quick summary (title, TLDR, keywords, citations, GitHub URL)
reader.section(arxiv_id, section)  # Read specific section
reader.raw(arxiv_id)               # Full paper
reader.preview(arxiv_id)           # Paper preview (~10k characters)
reader.json(arxiv_id)              # Complete structured JSON
```
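
One token-efficient way to combine these calls is to check each paper's brief first and fetch a section only for the papers that look relevant. The sketch below shows that pattern. The helper `triage_then_read` is hypothetical, it relies only on `brief()` returning a dict with `title` and `tldr` and on `section()` returning text (as in the usage example above), and the keyword filter is a deliberately naive stand-in for whatever relevance check an agent would apply.

```python
from typing import Any, Dict, Iterable, List


def triage_then_read(
    reader: Any,
    arxiv_ids: Iterable[str],
    keyword: str,
    section: str = "Introduction",
    max_chars: int = 500,
) -> List[Dict[str, str]]:
    """Check each brief first; fetch a section only when `keyword` appears."""
    needle = keyword.lower()
    picked: List[Dict[str, str]] = []
    for arxiv_id in arxiv_ids:
        brief = reader.brief(arxiv_id)
        title = brief.get("title") or ""
        tldr = brief.get("tldr") or ""
        if needle not in f"{title} {tldr}".lower():
            continue  # skip the larger section fetch for this paper
        text = reader.section(arxiv_id, section)
        picked.append(
            {"arxiv_id": arxiv_id, "title": title, "excerpt": text[:max_chars]}
        )
    return picked
```

With the SDK installed, `reader` would be a `Reader()` instance, for example `triage_then_read(Reader(), ["2409.05591", "2504.21776"], "memory")`.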

### PMC (Biomedical Papers)

```python
reader.pmc_head(pmc_id)            # PMC paper metadata
reader.pmc_full(pmc_id)            # Complete PMC paper JSON
```

### Agent (Optional)

```python
from deepxiv_sdk import Agent

agent = Agent(api_key="your_openai_key", model="gpt-4")
answer = agent.query("What are the latest papers about agent memory?")
print(answer)
```

## Token Management

deepxiv supports 4 ways to configure tokens:

**1. Auto-registration (Recommended)** - A token is created and saved automatically on first use
```bash
deepxiv search "agent"
```

**2. Using config command**
```bash
deepxiv config --token YOUR_TOKEN
```

**3. Environment variable**
```bash
export DEEPXIV_TOKEN="your_token"
```

**4. Command-line option**
```bash
deepxiv paper 2409.05591 --token YOUR_TOKEN
```
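
In Python code, the token can be read from the `DEEPXIV_TOKEN` environment variable and passed to `Reader(token=...)`. The helper below is a hypothetical sketch, and the "explicit value wins over the environment" order is this sketch's own choice, not documented SDK behavior.

```python
import os
from typing import Mapping, Optional


def pick_token(
    explicit: Optional[str] = None, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return an explicit token if given, else DEEPXIV_TOKEN from env, else None."""
    if explicit and explicit.strip():
        return explicit.strip()
    value = env.get("DEEPXIV_TOKEN", "").strip()
    return value or None  # None means no token was found by this helper
```

A caller could then write `token = pick_token()` and construct `Reader(token=token)` only when a token was found, falling back to `Reader()` otherwise.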

**Increase daily limit**: The default is 10,000 requests/day. For a higher limit, send your name, email address, and phone number to `tommy@chien.io`.

### Free Test Papers

These papers can be accessed without a token:

**arXiv**: `2409.05591`, `2504.21776`
**PMC**: `PMC544940`, `PMC514704`

## Agent Usage (Optional)

The built-in ReAct agent can automatically search papers, read content, and perform multi-turn reasoning:

```python
from deepxiv_sdk import Agent

agent = Agent(
    api_key="your_deepseek_key",
    base_url="https://api.deepseek.com/v1",
    model="deepseek-chat"
)

answer = agent.query("Compare key ideas in transformers and attention mechanisms")
print(answer)
```

Or via CLI:

```bash
deepxiv agent config  # Configure LLM API
deepxiv agent query "What are the latest papers about agent memory?" --verbose
```

## Error Handling

deepxiv provides specific exception types:

```python
from deepxiv_sdk import (
    Reader,
    AuthenticationError,  # 401 - Invalid or expired token
    RateLimitError,       # 429 - Daily limit reached
    NotFoundError,        # 404 - Paper not found
    ServerError,          # 5xx - Server error
    APIError              # Other API errors
)

reader = Reader()

try:
    paper = reader.brief("2409.05591")
except AuthenticationError:
    print("Please update your token")
except RateLimitError:
    print("Daily limit reached")
except NotFoundError:
    print("Paper not found")
except ServerError:
    print("Server error, try again later")
except APIError as e:
    print(f"API error: {e}")

## Troubleshooting

**Q: Do I need a token to use deepxiv?**
A: Not for everything. The free test papers can be accessed without a token. Search and some content require one, but it is created automatically on first use.

**Q: What's the maximum number of search results?**
A: 100 per request. Use the `offset` parameter for pagination.
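
A pagination loop built on `offset` might look like the sketch below. It takes a callable instead of calling the SDK directly, because the exact way `offset` is passed is an assumption here: the `search` signature in the API reference does not list it. The name `iter_results` is hypothetical, and the `"results"` key follows the Python usage example.

```python
from typing import Any, Callable, Dict, Iterator


def iter_results(
    search_page: Callable[[int, int], Dict[str, Any]],
    page_size: int = 100,
    max_results: int = 300,
) -> Iterator[Dict[str, Any]]:
    """Yield results page by page until a short page or `max_results` is reached."""
    offset = 0
    while offset < max_results:
        size = min(page_size, max_results - offset)
        page = search_page(offset, size).get("results", [])
        yield from page
        if len(page) < size:
            break  # a short (or empty) page means there is nothing left
        offset += len(page)
```

Assuming `reader.search` accepts an `offset` keyword, usage would be `iter_results(lambda offset, size: reader.search("agent memory", size=size, offset=offset))`.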

**Q: How do I handle timeouts?**
A: `Reader` retries automatically (up to 3 times) with exponential backoff. You can customize the timeout and the retry count:
```python
reader = Reader(timeout=120, max_retries=5)
```
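
For readers unfamiliar with the pattern, exponential backoff doubles the wait after each failed attempt. The sketch below is a generic illustration and not the SDK's internal implementation. The function name, the default exception types, and the 1-second base delay are all assumptions made for the example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on `retry_on` with delays of base_delay * 2**attempt."""
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

The same wrapper can guard calls that the SDK does not retry on its own, such as writes to a local cache or requests to another service.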

**Q: Can I cache paper content?**
A: Yes. Once you have fetched content with `Reader`, store it locally in a database or on the file system.
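
A file-system cache can be as small as the sketch below. The helper `cached_text` and the cache key format are hypothetical. The sketch assumes the cached content is text and never expires, which suits paper sections but not trending or popularity data.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_text(cache_dir: Path, key: str, fetch: Callable[[], str]) -> str:
    """Return cached text for `key`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key so that any string maps to a safe file name.
    name = hashlib.sha256(key.encode("utf-8")).hexdigest() + ".txt"
    path = cache_dir / name
    if path.exists():
        return path.read_text(encoding="utf-8")
    text = fetch()
    path.write_text(text, encoding="utf-8")
    return text
```

With the SDK installed, a call could look like `cached_text(Path("cache"), "2409.05591/Introduction", lambda: reader.section("2409.05591", "Introduction"))`.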

**Q: Which LLMs does the agent support?**
A: Any OpenAI-compatible API (OpenAI, DeepSeek, OpenRouter, local Ollama, etc.).

## Examples

See the [examples/](examples/) directory:

- `quickstart.py` - 5-minute quick start
- `example_reader.py` - Basic Reader usage
- `example_agent.py` - Agent usage
- `example_advanced.py` - Advanced patterns
- `example_error_handling.py` - Error handling examples

## License

MIT License - see [LICENSE](LICENSE) file

## Support

- 🐛 **GitHub Issues**: [https://github.com/qhjqhj00/deepxiv-sdk/issues](https://github.com/qhjqhj00/deepxiv-sdk/issues)
- 📚 **API Documentation**: [https://data.rag.ac.cn/api/docs](https://data.rag.ac.cn/api/docs)
- 📧 **Higher Limits**: Email your name, email address, and phone number to `tommy@chien.io`
