Metadata-Version: 2.4
Name: literature-discovery-mcp
Version: 0.1.0
Summary: MCP server for scholarly literature discovery via arXiv and Semantic Scholar
Keywords: mcp,arxiv,semantic-scholar,literature-search,research
Author: Pidgeos Agent
Author-email: Pidgeos Agent <pidgeos@agent.damien.lu>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Dist: arxiv>=4
Requires-Dist: fastmcp>=3
Requires-Dist: httpx>=0.28.1
Requires-Dist: pydantic-settings>=2
Requires-Dist: semanticscholar>=0.8
Requires-Python: >=3.13
Description-Content-Type: text/markdown

# literature-discovery-mcp

Private MCP server for scholarly literature discovery. It covers the discovery and metadata side of the LIST literature-search workflow. Retrieving web pages and PDFs is intentionally delegated to a separate MCP server that provides a `fetch_url` tool.

## Capabilities

- arXiv discovery:
  - `search_arxiv_papers`
  - `get_arxiv_paper`
  - `search_arxiv_by_author`
  - `search_arxiv_by_category`
  - `get_arxiv_urls`
- Semantic Scholar discovery:
  - `search_semantic_scholar_papers`
  - `get_paper_metadata`
  - `get_paper_citations`
  - `get_paper_references`
  - `get_related_papers`
  - `search_authors`
  - `get_author`
- Citation export:
  - `get_bibtex`

Returned arXiv and Semantic Scholar search records use a normalized schema with fields such as:

```json
{
  "id": "arXiv:2402.03300",
  "source": "arxiv",
  "title": "...",
  "authors": ["..."],
  "year": 2024,
  "abstract": "...",
  "url": "https://arxiv.org/abs/...",
  "html_url": "https://arxiv.org/html/...",
  "pdf_url": "https://arxiv.org/pdf/...",
  "doi": "...",
  "citation_count": 42,
  "access": "abstract_only",
  "caveats": []
}
```

To retrieve a full page or PDF, pass a record's `url`, `html_url`, or `pdf_url` value to your external `fetch_url` MCP.

## Install and run

```bash
uv sync
uv run literature-discovery-mcp
```

MCP client configuration example:

```yaml
mcp_servers:
  literature_discovery:
    command: "uvx"
    args: ["--from", "git+https://github.com/pidgeos/literature-discovery-mcp.git", "literature-discovery-mcp"]
    env:
      SEMANTIC_SCHOLAR_API_KEY: "optional-key"
```

For local development:

```yaml
mcp_servers:
  literature_discovery:
    command: "uv"
    args: ["--directory", "/path/to/literature-discovery-mcp", "run", "literature-discovery-mcp"]
```

## Configuration

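- `SEMANTIC_SCHOLAR_API_KEY` / `S2_API_KEY`: optional Semantic Scholar API key.
- `ARXIV_RATE_LIMIT_RPS`: requests per second sent to arXiv; default `0.33` (about one request every three seconds) to respect arXiv's courtesy limit.
- `SEMANTIC_SCHOLAR_RATE_LIMIT_RPS` / `S2_RATE_LIMIT_RPS`: requests per second sent to Semantic Scholar; default `1.0`.
- `ARXIV_PAGE_SIZE`: results requested per arXiv API page; default `100`.
- `SEMANTIC_SCHOLAR_TIMEOUT` / `S2_TIMEOUT`: Semantic Scholar request timeout; default `30` seconds.
- `LITERATURE_MAX_ATTEMPTS`: maximum attempts per request; default `3`.
- `LITERATURE_RETRY_BACKOFF_SECONDS`: backoff delay between retries; default `1.0` seconds.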

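Rate limits are enforced per process and are not coordinated across multiple MCP server processes, so running several instances against the same API multiplies the effective request rate.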
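Because limits are per process, a limiter can be as simple as an in-memory object that spaces requests `1 / rps` seconds apart. The hypothetical `MinIntervalLimiter` below sketches that idea for asyncio code. It is not the package's implementation, and the even-spacing strategy (rather than, say, a token bucket that allows short bursts) is an assumption.

```python
import asyncio
import time
from collections.abc import Callable


class MinIntervalLimiter:
    """Hypothetical process-local limiter that spaces requests 1/rps seconds apart.

    State lives in this object only, so every process gets its own full budget.
    """

    def __init__(self, rps: float, clock: Callable[[], float] = time.monotonic) -> None:
        if rps <= 0:
            raise ValueError("rps must be positive")
        self.interval = 1.0 / rps  # 0.33 rps -> roughly 3.03 s between requests
        self._clock = clock
        self._next_slot = float("-inf")  # earliest time the next request may start

    def reserve(self) -> float:
        """Claim the next free slot and return how many seconds to wait for it."""
        now = self._clock()
        start = max(now, self._next_slot)
        self._next_slot = start + self.interval
        return start - now

    async def acquire(self) -> None:
        # reserve() contains no await, so coroutines on one event loop cannot
        # interleave inside it. It is not thread-safe, however.
        delay = self.reserve()
        if delay > 0:
            await asyncio.sleep(delay)
```

Call `await limiter.acquire()` before each upstream request. Since each process holds its own `_next_slot`, two server instances at the default `0.33` would together send roughly `0.66` requests per second.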

## Local quality gates

```bash
uv lock --check
uv sync --locked --dev
uv run ruff format --check .
uv run ruff check .
uv run coverage run -m pytest -q
uv run coverage report --fail-under=95
uv build
```
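To run the gates above in one step (for example from a pre-push hook), a small wrapper can execute them in order and stop at the first failure. `run_gates` and `GATES` are a hypothetical helper, not a script shipped with the repository.

```python
import shlex
import subprocess
import sys
from collections.abc import Sequence

# The local quality gates listed in this README, in order.
GATES = [
    "uv lock --check",
    "uv sync --locked --dev",
    "uv run ruff format --check .",
    "uv run ruff check .",
    "uv run coverage run -m pytest -q",
    "uv run coverage report --fail-under=95",
    "uv build",
]


def run_gates(commands: Sequence[str]) -> int:
    """Run commands in order; return 0 if all pass, else the first non-zero exit code."""
    for command in commands:
        print(f"$ {command}", file=sys.stderr, flush=True)
        result = subprocess.run(shlex.split(command), check=False)
        if result.returncode != 0:
            return result.returncode  # stop at the first failing gate
    return 0
```

For example, ending a hook script with `raise SystemExit(run_gates(GATES))` propagates the first failing exit code.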
