Metadata-Version: 2.4
Name: mcarchive-org
Version: 2026.4.21
Summary: MCP server for searching and downloading files from the Internet Archive (archive.org)
Project-URL: Homepage, https://git.supported.systems/rsp2k/mcarchive-org
Project-URL: Repository, https://git.supported.systems/rsp2k/mcarchive-org
Project-URL: Bug Tracker, https://git.supported.systems/rsp2k/mcarchive-org/issues
Project-URL: Changelog, https://git.supported.systems/rsp2k/mcarchive-org/src/branch/main/CHANGELOG.md
Project-URL: Archive.org API docs, https://archive.org/developers/
Author-email: Ryan Malloy <ryan@supported.systems>
License: MIT
Keywords: archive.org,fastmcp,internet-archive,llm,mcp
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Requires-Python: >=3.10
Requires-Dist: fastmcp>=3.2.4
Requires-Dist: httpx>=0.28.1
Description-Content-Type: text/markdown

# mcarchive-org

An MCP (Model Context Protocol) server that lets an LLM search, inspect, and download content from the [Internet Archive](https://archive.org).

Built on [FastMCP](https://gofastmcp.com) + [httpx](https://www.python-httpx.org/). No API key required — archive.org's read endpoints are public.

## Tools

| Tool | Purpose |
|------|---------|
| `search_items` | Solr-style search for small result sets via `advancedsearch.php` (1–200 rows, paginated) |
| `scrape_items` | Bulk cursor-paginated search via the Scrape API (count ≥ 100) |
| `get_item_metadata` | Metadata for one item; skips the (possibly huge) files list by default |
| `list_files` | Files array with optional format / glob filtering — includes `download_url` per file |
| `get_file_url` | Build a canonical download URL without hitting the network |
| `download_file` | Stream a file to disk with resume support and optional MD5 verification |

Also exposes an MCP resource template: `archive://item/{identifier}`.
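
To illustrate what `get_file_url` describes: archive.org serves item files under `https://archive.org/download/<identifier>/<filename>`, so a URL can be assembled without any network call. The sketch below is only an illustration; the function name `build_download_url` and the percent-encoding choices are assumptions, not this server's actual implementation.

```python
from urllib.parse import quote


def build_download_url(identifier: str, filename: str) -> str:
    """Assemble a download URL for one file of an item (hypothetical helper)."""
    # Assumption: "/" stays unescaped in the filename so that files stored in
    # subdirectories of an item still resolve; everything else is percent-encoded.
    return (
        "https://archive.org/download/"
        f"{quote(identifier, safe='')}/{quote(filename, safe='/')}"
    )


print(build_download_url("gd77-05-08.sbd.hicks.4982.sbeok.shnf", "gd1977-05-08d1t01.mp3"))
```

Because the URL is derived purely from the two strings, building it does not confirm that the file exists; `list_files` is the tool that reports what an item actually contains.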

## Install & run

```bash
# From a checkout:
uv sync
uv run mcarchive-org

# Or from PyPI (once published):
uvx mcarchive-org
```

Register with Claude Code:

```bash
claude mcp add archive-org -- uvx mcarchive-org
# or, from a local checkout:
claude mcp add archive-org -- uv run --directory /path/to/mcarchive-org mcarchive-org
```

## Environment

| Variable | Default | Purpose |
|----------|---------|---------|
| `MCARCHIVE_DOWNLOAD_ROOT` | `./downloads` | Base directory for `download_file` |
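
As a minimal sketch of how such a variable might be consumed, the snippet below reads `MCARCHIVE_DOWNLOAD_ROOT` with the documented `./downloads` default and derives a target path. The `<root>/<identifier>/<filename>` layout is inferred from the example flow in this README, and both the helper name `resolve_download_path` and the path-escape check are assumptions for illustration.

```python
import os
from pathlib import Path


def resolve_download_path(identifier: str, filename: str) -> Path:
    """Map an item file to a local path under the download root (hypothetical helper)."""
    # Fall back to the documented default when the variable is unset.
    root = Path(os.environ.get("MCARCHIVE_DOWNLOAD_ROOT", "./downloads")).resolve()
    target = (root / identifier / filename).resolve()
    # Assumed safety check: reject names such as "../../x" that leave the root.
    if not target.is_relative_to(root):
        raise ValueError(f"{filename!r} resolves outside {root}")
    return target
```

For example, with `MCARCHIVE_DOWNLOAD_ROOT=/data/ia` set in the environment, `resolve_download_path("some-item", "track01.mp3")` would yield `/data/ia/some-item/track01.mp3`.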

## Example flow

```
search_items(query='mediatype:audio AND creator:"Grateful Dead"', sort=['downloads desc'])
  → identifier 'gd77-05-08.sbd.hicks.4982.sbeok.shnf' (among others)

list_files(identifier='gd77-05-08.sbd.hicks.4982.sbeok.shnf', formats=['VBR MP3'])
  → [{ name: 'gd1977-05-08d1t01.mp3', size: 6342912, md5: '…', download_url: '…' }, …]

download_file(identifier='gd77-…', filename='gd1977-05-08d1t01.mp3', verify_md5='…')
  → { path: './downloads/gd77-…/gd1977-…mp3', bytes: 6342912, md5_ok: True }
```
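
The `verify_md5` and `md5_ok` fields in the flow above, together with the resume support listed for `download_file`, could be implemented roughly as follows. This is a sketch under stated assumptions: `md5_matches` and `resume_headers` are hypothetical helper names, and the only protocol detail relied on is the standard HTTP `Range` request header.

```python
import hashlib
from pathlib import Path


def md5_matches(path: Path, expected_md5: str, chunk_size: int = 1 << 20) -> bool:
    """Hash a file in chunks and compare against an expected hex digest."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


def resume_headers(path: Path) -> dict[str, str]:
    """Return a Range header requesting the bytes after a partial file, if any."""
    have = path.stat().st_size if path.exists() else 0
    return {"Range": f"bytes={have}-"} if have else {}
```

A client would pass the result of `resume_headers` with its GET request, append the response body to the partial file when the server answers `206 Partial Content`, and run `md5_matches` once the transfer completes.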

## Query syntax notes

archive.org uses a Solr/Lucene dialect:

- `mediatype:(audio OR movies)` — restrict to media types
- `collection:etree` — items in a specific collection
- `date:[1977-01-01 TO 1977-12-31]` — date ranges
- `creator:"Grateful Dead"` — phrase match
- `-subject:bootleg` — exclusion
- Sort by `downloads desc`, `date asc`, `addeddate desc`, etc.

See [archive.org's search docs](https://archive.org/advancedsearch.php) for the full grammar.
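
To make the clauses above concrete, the sketch below turns a query string into an `advancedsearch.php` request URL using that endpoint's `q`, `fl[]`, `sort[]`, `rows`, `page`, and `output` parameters. The helper name `build_search_url` and its defaults are assumptions for illustration; the 1–200 row bound mirrors the limit stated for `search_items`, and nothing here is taken from this server's source.

```python
from urllib.parse import urlencode


def build_search_url(
    query: str,
    fields: tuple[str, ...] = ("identifier", "title"),
    sort: tuple[str, ...] = ("downloads desc",),
    rows: int = 50,
    page: int = 1,
) -> str:
    """Build an advancedsearch.php URL for a Lucene-style query (hypothetical helper)."""
    if not 1 <= rows <= 200:
        raise ValueError("rows must be between 1 and 200")
    params = [("q", query), ("rows", rows), ("page", page), ("output", "json")]
    # Repeated keys: one fl[] entry per returned field, one sort[] entry per sort key.
    params += [("fl[]", name) for name in fields]
    params += [("sort[]", key) for key in sort]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


print(build_search_url('collection:etree AND creator:"Grateful Dead"', rows=10))
```

`urlencode` handles the escaping of quotes, brackets, and spaces, so the query can be written exactly as it appears in the list above.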

## License

MIT
