Metadata-Version: 2.4
Name: perseus-mcp
Version: 1.0.2
Summary: MCP server for Perseus Greek and Latin text research
Author: Tony Jurg
License-Expression: MIT
Project-URL: Homepage, https://github.com/tonyjurg/Perseus-mcp
Project-URL: Documentation, https://tonyjurg.github.io/Perseus-mcp/
Project-URL: Repository, https://github.com/tonyjurg/Perseus-mcp
Project-URL: Issues, https://github.com/tonyjurg/Perseus-mcp/issues
Keywords: mcp,model-context-protocol,perseus,classics,ancient-greek,latin,cts
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Education
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: defusedxml>=0.7.1
Requires-Dist: fastmcp>=2.12.0
Requires-Dist: httpx>=0.27.0
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.15; extra == "dev"
Requires-Dist: twine>=6.0; extra == "dev"
Dynamic: license-file

[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) [![Docs](https://img.shields.io/badge/docs-%F0%9F%93%96-success.svg)](https://tonyjurg.github.io/Perseus-mcp/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![DOI](https://img.shields.io/badge/DOI-10.5281%2Fzenodo.20708960-007ec6.svg)](https://doi.org/10.5281/zenodo.20708960) [![Python](https://img.shields.io/badge/Python-3.x-blue?logo=python)](https://www.python.org/) ![Jupyter](https://img.shields.io/badge/Jupyter-F37626?logo=jupyter&logoColor=white) [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/tonyjurg/Perseus-mcp)

# Perseus-mcp

*Give Claude / Cursor / Windsurf direct access to the Perseus Digital Library* — ancient Greek and Latin texts, precise CTS navigation, plaintext, search, and more.

An MCP server for Classical Greek and Latin literature. It runs as a local FastMCP server so MCP-capable applications can attach these Perseus tools to the LLM/model provider of your choice.

## Features

This server exposes twenty-three MCP tools. Every tool returns a text payload: some
are raw Perseus CTS XML or Scaife JSON, while the discovery and plaintext
helpers return locally shaped JSON or readable text.

- `get_passage(urn)` — fetch a CTS passage by URN.
- `get_passage_plus(urn)` — fetch passage text plus contextual metadata.
- `get_passage_plaintext(urn)` — fetch a CTS passage as plain readable text.
- `get_valid_references(urn, level=None)` — retrieve navigable citation references for a work or edition.
- `get_valid_references_json(urn, level=None, limit=100, offset=0)` — retrieve paged citation references as JSON (`limit`: 1–500).
- `count_valid_references(urn, level=None)` — count valid references without returning the full list.
- `get_capabilities()` — list available texts/editions from Perseus CTS.
- `get_cache_status()` — inspect local metadata cache state.
- `refresh_metadata_cache()` — refresh cached CTS and Scaife library metadata.
- `clear_metadata_cache()` — clear in-memory and disk metadata cache entries.
- `list_text_groups(language=None, query=None, limit=100, offset=0)` — list matching authors/textgroups and works with pagination metadata (`limit`: 1–500).
- `get_author_resources(author, language=None)` — list works, editions, and translations for a matching author name or CTS textgroup URN.
- `find_author_names(query, language=None, limit=100, offset=0)` — find author/textgroup names by partial name match with pagination metadata (`limit`: 1–500).
- `get_work_resources(urn_or_title, language=None)` — list editions, translations, and resources for a work, optionally filtered by original language.
- `get_label(urn)` — fetch human-readable metadata labels for a URN.
- `get_first_urn(urn)` — get the first navigable URN under a work/edition.
- `get_prev_next_urn(urn)` — get neighboring passage URNs for navigation.
- `search_perseus(query, language="greek", query_format="auto", author=None, search_kind="form", preserve_operators=False, page_num=1, text_group=None, work=None, result_format="instances")` — search texts via Scaife search API. Greek queries may be entered as Unicode Greek (for example `μῆνιν`) or Beta Code (for example `mh=nin`).
- `search_within_text(query, text_urn, ..., size=10, offset=0)` — search within one Scaife text/edition URN (`size`: 1–500).
- `get_passage_highlights(query, passage_urn, ...)` — get Scaife token highlight positions for one passage.
- `get_scaife_library_metadata(urn)` — get Scaife JSON metadata for a library URN.
- `get_scaife_passage_json(urn)` — get Scaife JSON for a passage URN.
- `get_scaife_passage_text(urn)` — get Scaife plaintext for a passage URN.
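
The paged tools above accept `limit` (1–500) and `offset`. The following is a minimal sketch of those paging semantics over an in-memory list. The `paginate` helper, the response field names, and the rejection of negative offsets are assumptions made for illustration; they are not the server's actual JSON shape or validation code.

```python
from typing import Any

MAX_LIMIT = 500  # upper bound documented for `limit`


def paginate(items: list[Any], limit: int = 100, offset: int = 0) -> dict[str, Any]:
    """Return one page of `items` plus simple pagination metadata (illustrative shape)."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if offset < 0:
        # Assumption: negative offsets are rejected rather than wrapped around.
        raise ValueError("offset must be non-negative")
    page = items[offset : offset + limit]
    return {
        "total": len(items),
        "limit": limit,
        "offset": offset,
        "items": page,
        "has_more": offset + len(page) < len(items),
    }


refs = [f"1.{n}" for n in range(1, 251)]  # 250 made-up citation references
first = paginate(refs, limit=100, offset=0)
last = paginate(refs, limit=100, offset=200)
```

A client would typically keep requesting pages, advancing `offset` by the number of items received, until the metadata indicates that nothing is left.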

## Greek Search Input

`search_perseus` normalizes Greek search terms before sending them to Scaife.
You can pass Unicode Greek directly, or use Beta Code such as `mh=nin a)/eide`.
Search queries must contain at least one non-whitespace character.
The default `query_format="auto"` detects explicit Beta Code marks like `=`, `/`, `(`, `)`, and `*`, and also treats short unaccented Greek-looking queries such as `logos` as Beta Code.
If an ASCII query is ambiguous, set `query_format="betacode"` to force conversion or `query_format="unicode"` to preserve it exactly.
Search queries are normalized to composed Greek Unicode (NFC), matching sampled Perseus Greek text.
The tool uses Scaife's JSON search route and returns the JSON response as text.
The `language` argument accepts recognized Greek aliases (`greek`, `grc`, `gr`)
or Latin aliases (`latin`, `lat`, `la`); blank input defaults to Greek and
unrecognized values raise an error. It controls query normalization and is not
currently sent to Scaife as a corpus language filter.
For inventory discovery, `list_text_groups`, `find_author_names`,
`get_author_resources`, and `get_work_resources` accept `language="greek"` or
`language="latin"` (and common codes such as `grc` or `lat`) as an actual work
language filter. `find_author_names` merges the legacy CTS inventory with the
Scaife library catalog, so Scaife-only authors such as Philo Judaeus remain
discoverable. Passage and navigation tools use CTS URNs, whose
`greekLit`/`latinLit` namespace and edition identifier already select the text.
Pass `author` to resolve a CTS author/textgroup name or URN. When it resolves
to exactly one textgroup, Scaife receives a server-side `text_group` filter;
ambiguous matches fall back to local CTS URN-prefix filtering of the current
result page.
Use `search_kind="lemma"` for lemma search; the default `search_kind="form"`
keeps existing form-search behavior. For Scaife operator queries such as
quoted phrases, `-`, `|`, `*`, or `~`, set `preserve_operators=True` so Beta
Code auto-detection does not consume operator characters. For example:
`search_perseus('"μῆνιν ἄειδε"', query_format="unicode", preserve_operators=True)`,
`search_perseus("μῆνιν -ἄειδε", query_format="unicode", preserve_operators=True)`,
or `search_perseus("λόγος | ἀνήρ", search_kind="lemma", query_format="unicode", preserve_operators=True)`.
Use `page_num` for pagination and pass `text_group` or `work` to use Scaife's
server-side scope filters.
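
The normalization rules described in this section can be sketched with the standard library. This is an illustration under stated assumptions, not the server's implementation: the helper names (`has_beta_code_marks`, `to_nfc`, `normalize_language`) are hypothetical, alias matching is assumed to be case-insensitive, and the sketch covers only the explicit Beta Code marks, not the heuristic for short unaccented queries such as `logos`.

```python
import unicodedata
from typing import Optional

BETA_CODE_MARKS = set("=/()*")  # explicit marks named in the text above

LANGUAGE_ALIASES = {
    "greek": "greek", "grc": "greek", "gr": "greek",
    "latin": "latin", "lat": "latin", "la": "latin",
}


def has_beta_code_marks(query: str) -> bool:
    """True when the query contains at least one explicit Beta Code mark."""
    return any(ch in BETA_CODE_MARKS for ch in query)


def to_nfc(query: str) -> str:
    """Normalize a query to composed Unicode (NFC)."""
    return unicodedata.normalize("NFC", query)


def normalize_language(value: Optional[str]) -> str:
    """Map an alias to 'greek' or 'latin'; blank input defaults to Greek."""
    key = (value or "").strip().lower()  # assumption: aliases are case-insensitive
    if not key:
        return "greek"
    if key not in LANGUAGE_ALIASES:
        raise ValueError(f"unrecognized language: {value!r}")
    return LANGUAGE_ALIASES[key]


# A decomposed (NFD) spelling uses more code points than its NFC form.
decomposed = unicodedata.normalize("NFD", "μῆνιν")
composed = to_nfc(decomposed)
```

Normalizing to NFC matters because a decomposed and a composed spelling look identical on screen but compare as different strings.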

## Local Metadata Cache

Discovery and navigation tools cache stable CTS metadata locally to avoid
repeated multi-megabyte `GetCapabilities` and `GetValidReff` requests. The
default disk cache lives in `.cache/perseus-mcp` under the current working
directory; the running server process also keeps an in-memory cache.
Configure it with:

- `PERSEUS_MCP_CACHE_DIR` — override the disk cache directory.
- `PERSEUS_MCP_CACHE_TTL_SECONDS` — set cache TTL; default is 86400 seconds.
- `PERSEUS_MCP_DISABLE_CACHE=1` — disable both memory and disk cache reads/writes.

The current working directory is the directory from which the Python process is
started. Running the MCP server from the repository root uses
`.cache/perseus-mcp`, whereas a notebook started from `examples/` uses
`examples/.cache/perseus-mcp` unless `PERSEUS_MCP_CACHE_DIR` is set. That is not
a second server instance, only a second cache location for a separate Python
process. To keep one cache location
across notebooks and MCP clients, set `PERSEUS_MCP_CACHE_DIR` to an absolute
path such as `/path/to/Perseus-mcp/.cache/perseus-mcp`.
Disk entries are written to unique sibling temporary files and atomically
replaced, so multiple local processes can safely share that directory without
exposing partially written cache files. Disk-cache write failures emit a
`MetadataCacheWarning` but do not discard a successfully fetched upstream
response.
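
The cache-location rule and the write-then-replace pattern described above can be sketched as follows. This is a minimal illustration, not the project's cache module: `resolve_cache_dir`, `atomic_write_text`, and the entry name `capabilities.xml` are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def resolve_cache_dir(env: dict[str, str], cwd: Path) -> Path:
    """Use the override directory if set, else `.cache/perseus-mcp` under cwd."""
    override = env.get("PERSEUS_MCP_CACHE_DIR")
    return Path(override) if override else cwd / ".cache" / "perseus-mcp"


def atomic_write_text(target: Path, content: str) -> None:
    """Write to a unique sibling temp file, then atomically replace `target`."""
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temp file if anything failed before the replace.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


with tempfile.TemporaryDirectory() as root:
    cache_dir = resolve_cache_dir({}, Path(root))
    entry = cache_dir / "capabilities.xml"  # hypothetical cache entry name
    atomic_write_text(entry, "<first/>")
    atomic_write_text(entry, "<second/>")
    final_text = entry.read_text(encoding="utf-8")
    leftovers = [p.name for p in cache_dir.iterdir() if p.name.endswith(".tmp")]
```

Creating the temporary file in the same directory as the target keeps both on one filesystem, which is what allows `os.replace` to swap them atomically.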

## URN Discovery

Available edition URNs can differ between Perseus CTS and Scaife search results,
and the live inventory can change. Use `get_author_resources`,
`get_work_resources`, or `list_text_groups` before constructing
edition-specific CTS passage URNs. The notebooks select advertised CTS editions
from discovery results instead of assuming that a Scaife edition URN is valid
for Perseus CTS.

The live Perseus CTS implementation may return malformed HTML for
`GetFirstUrn` and `GetPrevNextUrn`. The MCP tools detect that response and
derive valid XML results from `GetValidReff`.
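
This README does not describe how the fallback is implemented. As a rough sketch of the idea, neighbors can be derived from an ordered list of valid references. The `neighbors` helper and the sample references are hypothetical.

```python
from typing import Optional


def neighbors(refs: list[str], current: str) -> tuple[Optional[str], Optional[str]]:
    """Return the (previous, next) references around `current` in an ordered list."""
    index = refs.index(current)  # raises ValueError for an unknown reference
    prev_ref = refs[index - 1] if index > 0 else None
    next_ref = refs[index + 1] if index + 1 < len(refs) else None
    return prev_ref, next_ref


refs = ["1.1", "1.2", "1.3"]  # made-up, already in document order
middle = neighbors(refs, "1.2")
start = neighbors(refs, "1.1")
end = neighbors(refs, "1.3")
```

Under the same assumption, the first navigable reference is simply the first element of that list.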

Perseus may also return `429 Too Many Requests` when a workflow sends many CTS
requests in a short period. Pause before retrying, reduce concurrency, and add
delays to passage-processing loops. The server currently exposes the upstream
HTTP error instead of retrying automatically.
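
Because the server does not retry on its own, a calling workflow can add its own pause-and-retry loop. The sketch below uses only the standard library. `RateLimited` is a hypothetical stand-in for whatever error your client raises on HTTP 429, and `call_with_backoff` is not part of this package.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an upstream `429 Too Many Requests` error."""


def call_with_backoff(
    fetch: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, doubling the pause after each rate-limit error."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == retries:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demo with a fake fetch that is rate-limited twice before succeeding.
attempts = {"count": 0}


def flaky_fetch() -> str:
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimited("429 Too Many Requests")
    return "<passage/>"


delays: list[float] = []
result = call_with_backoff(flaky_fetch, sleep=delays.append)  # record delays, no real sleep
```

Passing `sleep` as a parameter keeps the demo instantaneous; in a real loop the default `time.sleep` applies the pauses.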

## Setup

### 1) Install dependencies

Using `uv`:

```bash
uv sync
```

Or with `pip`:

```bash
pip install -e .
```

Once a release is published to PyPI, users can install it without cloning the
repository:

```bash
pip install perseus-mcp
```

For development and tests:

```bash
pip install -e ".[dev]"
```

### 2) Run tests

```bash
pytest
```

With `uv`, use:

```bash
uv run --extra dev pytest
```

### 3) Run locally

```bash
uv run perseus-mcp
```

The installed console command and module entry point are equivalent:

```bash
perseus-mcp
python -m perseus_mcp
```

### 4) Inspect tools (optional)

```bash
npx @modelcontextprotocol/inspector uv run perseus-mcp
```

## Test strategy and automation

Perseus MCP uses layered checks rather than relying on one end-to-end test.
Most behavior is covered by deterministic pytest tests with local XML/JSON
fixtures and mocked asynchronous HTTP calls. GitHub Actions separately verifies
the supported Python and operating-system matrix, package artifacts, secrets,
release tags, and publication.

The workflow files under `.github/workflows/` are the executable source of
truth.

### Test suite organization

Pytest is configured in `pyproject.toml` to import from `src/` and discover
tests under `tests/`.

| Test module | Main responsibility |
| --- | --- |
| `test_author_resources.py` | CTS author, work, and resource parsing; merged-author behavior |
| `test_disk_cache.py` | Atomic cache writes, cache disabling, cleanup, and concurrent writers |
| `test_exploration_tools.py` | Discovery, navigation, cache tools, author scope, and structured responses |
| `test_greek_query_normalization.py` | Unicode Greek, Beta Code, Scaife parameters, and search operators |
| `test_limits_and_language.py` | Result limits, paging bounds, and language aliases |
| `test_packaging.py` | Metadata, dependencies, documentation assets, notebooks, and workflow expectations |
| `test_scaife_urls.py` | Safe URL construction and CTS URN percent encoding |
| `test_shared_http_client.py` | Connection reuse, event-loop changes, shutdown, and HTTP errors |
| `test_xml_hardening.py` | Safe XML parsing and rejection of entity-based XML attacks |

A regression fix should include a focused test that fails for the original
problem. Tests should assert observable behavior and cover failure paths and
boundary values as well as successful calls.

### Isolation from Perseus and Scaife

Routine tests do not depend on live upstream services. HTTP helpers are
monkeypatched with asynchronous test doubles, while representative CTS XML and
Scaife JSON are stored in test fixtures. This keeps CI deterministic when
catalogs change or an upstream service is unavailable, avoids unnecessary
traffic to public scholarly infrastructure, and makes malformed-response tests
safe.

Live read-only probes may be used during manual review for endpoint
compatibility or connection-lifecycle changes, but they supplement rather than
replace the automated suite.
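
The isolation approach can be illustrated with an asynchronous test double. This sketch uses `unittest.mock.AsyncMock` and a hypothetical `http_helpers` namespace in place of the project's real helper module. In the actual suite the replacement is done with pytest's `monkeypatch`, and the helper names differ.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock

# Hypothetical stand-in for a module that performs upstream HTTP requests.
http_helpers = SimpleNamespace(fetch_text=None)


async def toy_get_passage(urn: str) -> str:
    """Toy tool: delegates to the HTTP helper and returns its text unchanged."""
    return await http_helpers.fetch_text(f"https://example.invalid/cts?urn={urn}")


# Replace the network call with a double that returns fixture content.
fixture_xml = "<passage>fixture text</passage>"
http_helpers.fetch_text = AsyncMock(return_value=fixture_xml)

result = asyncio.run(toy_get_passage("urn:cts:greekLit:tlg0012.tlg001:1.1"))
```

No network traffic occurs, and the double records how it was awaited, so a test can assert on both the returned text and the request that would have been sent.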

### Async test cleanup

Several tests invoke tools with `asyncio.run()`, which creates a new event loop
for each call. The server uses a process-wide shared `httpx.AsyncClient`, so the
autouse fixture in `tests/conftest.py` closes and resets that client after every
test. Tests that manipulate shared client state must also leave it reset.
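
The reason for the reset can be seen with the standard library alone: each `asyncio.run()` call creates its own event loop and closes it on exit, so an object bound to an earlier loop would outlive it. A minimal demonstration:

```python
import asyncio


async def running_loop() -> asyncio.AbstractEventLoop:
    """Return the event loop that is executing this coroutine."""
    return asyncio.get_running_loop()


loop_a = asyncio.run(running_loop())
loop_b = asyncio.run(running_loop())

different_loops = loop_a is not loop_b
both_closed = loop_a.is_closed() and loop_b.is_closed()
```

Both loops are distinct and already closed once `asyncio.run()` returns, which is why a client created during one test should not be reused in the next.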

### Local test commands

Run the complete suite:

```bash
python -m pytest
```

Run a module or one test:

```bash
python -m pytest tests/test_disk_cache.py
python -m pytest tests/test_disk_cache.py::test_disk_cache_set_writes_readable_content
```

Show skipped tests, the slowest tests, and local variables on failure:

```bash
python -m pytest -ra --durations=10 -l
```

Disable metadata-cache reads and writes during a test run:

```bash
PERSEUS_MCP_DISABLE_CACHE=1 python -m pytest
```

PowerShell equivalent:

```powershell
$env:PERSEUS_MCP_DISABLE_CACHE = "1"
python -m pytest
```

### GitHub Actions test matrix

`.github/workflows/tests.yml` installs the editable project with development
dependencies and runs `python -m pytest` on:

- Ubuntu and Windows;
- Python 3.11, 3.12, and 3.13.

The matrix uses `fail-fast: false`, so every platform/version job finishes even
when one fails. This makes version-specific and Windows-specific regressions
visible in one run. Documentation-only changes under `docs/**` are excluded
from the Python test workflow; exact event and branch filters remain defined in
the workflow file.

### Package validation

`.github/workflows/package.yml` checks that the repository produces a valid
source distribution and wheel. It installs Python 3.12, then runs:

```bash
python -m build
python -m twine check dist/*
```

and uploads `dist/` as the `python-package` workflow artifact. The workflow is
path-filtered to package-relevant files and supports manual dispatch.

`tests/test_packaging.py` complements this build by checking repository-level
expectations such as metadata, dependencies, documentation files, notebook
JSON, and workflow configuration. Both layers matter: metadata tests can pass
while an isolated build fails, and a package can build while required
repository assets are missing.

### Secret scanning

`.github/workflows/secret-scan.yml` rejects tracked OpenRouter keys matching:

```text
sk-or-v1-[A-Za-z0-9_-]{20,}
```

The workflow reports affected files without printing the matching secret. A
detected key must be removed and rotated; the check should never be bypassed.
This focused scan does not replace normal credential hygiene: do not commit
`.env` files, tokens, private MCP configuration, or notebook outputs containing
credentials.

### Release and publication gates

`.github/workflows/release.yml` runs for `v*` tags or manual dispatch. For tag
runs it verifies that the tag equals `v<project.version>`, builds the wheel and
source archive, validates both with Twine, attaches them to a generated GitHub
release, and dispatches the PyPI workflow using the same tag.

`.github/workflows/publish.yml` requires a tag reference, repeats the
tag/version check, rebuilds and revalidates the artifacts, and publishes through
PyPI trusted publishing. The protected `pypi` GitHub environment uses OIDC
(`id-token: write`), so no long-lived PyPI API token is stored.

Rebuilding during publication avoids trusting an unrelated workflow artifact,
while the repeated tag check prevents publishing from a branch or mismatched
release tag.

### Documentation deployment

`.github/workflows/pages.yml` builds `docs/` with Jekyll and deploys the
generated artifact to GitHub Pages after documentation changes reach `main` or
`master`. This Pages site is intended primarily for end users; development and
test-strategy documentation lives in this repository README.

### Interpreting failures

- Failures on every matrix job usually indicate a general regression.
- A single Python-version failure suggests version-specific syntax,
  dependencies, or standard-library behavior.
- Windows-only failures commonly involve paths, permissions, read-only
  attributes, or event-loop lifecycle.
- A package failure with green pytest jobs usually concerns metadata,
  manifests, README rendering, or build isolation.
- A secret-scan failure requires credential removal and rotation.
- A release failure before publication commonly means the tag and
  `project.version` do not match.

Understand the failure before rerunning a job, and preserve useful workflow
logs or tracebacks in the pull request when the cause is not obvious.

Maintainer-level conventions for extending the suite are also kept beside the
tests in [`tests/testing.md`](tests/testing.md).


## Example notebooks

The `examples/` directory includes Jupyter notebooks that demonstrate both direct endpoint calls and MCP client usage with real Greek and Latin data:

- `examples/00_install_and_run_perseus_mcp.ipynb` — installation and launch guide covering PyPI, pip, uv, local repository development, MCP client configuration, verification, upgrades, and troubleshooting.
- `examples/01_basic_cts_workflow.ipynb` — minimal direct CTS requests.
- `examples/02_search_and_navigation.ipynb` — direct Scaife JSON search and CTS navigation from valid references.
- `examples/03_mcp_connection_homer_iliad.ipynb` — FastMCP client connection, Homer resource discovery, and *Iliad* Greek passage analysis.
- `examples/04_mcp_greek_search_and_navigation.ipynb` — MCP Greek search with Unicode/Beta Code, valid references, and passage navigation.
- `examples/05_mcp_all_tools.ipynb` — complete MCP tool catalog with descriptions and input schemas.
- `examples/06_openrouter_llm_mcp_interaction.ipynb` — optional OpenRouter LLM tool-calling loop over the local MCP tools, using OpenRouter's Free Models Router by default.
- `examples/07_mcp_advanced_search_options.ipynb` — MCP form/lemma search, Scaife operator queries, and author-scoped search examples.
- `examples/08_mcp_cache_and_search_tools.ipynb` — advanced demonstration of cache tools, paged references, scoped search, reader search, highlights, and Scaife metadata/text retrieval.
- `examples/09_openrouter_philo_politeia_analysis.ipynb` — OpenRouter-assisted, evidence-first analysis of `πολιτεία` in Philo of Alexandria using scoped MCP search results and cited passages.
- `examples/10_mcp_latin_augustine_workflow.ipynb` — Latin-language discovery, CTS navigation, passage retrieval, and a small text analysis using Augustine's *Epistulae* selections.

Run them after installing the project dependencies. The MCP notebooks use
FastMCP's in-process client transport and call the same tools exposed to
external MCP clients. The optional OpenRouter notebook also requires an
OpenRouter API key; the MCP server itself does not.
Notebook setup cells install notebook-only helpers such as `python-dotenv`
directly. Those helpers are not core runtime dependencies of `perseus-mcp`.

### Configure the OpenRouter API key

For `examples/06_openrouter_llm_mcp_interaction.ipynb` and
`examples/09_openrouter_philo_politeia_analysis.ipynb`, copy `.env.example` to
`.env` in the project root and replace the placeholder:

```dotenv
OPENROUTER_API_KEY=sk-or-v1-...
```

Get your API key at [openrouter.ai](https://openrouter.ai/settings/keys). See
[OpenRouter's API key documentation](https://openrouter.ai/docs/api-keys) for
authentication details.
The `.env` file is ignored by Git. You can also set `OPENROUTER_API_KEY` in your
environment or enter it securely when the notebook prompts.

Both OpenRouter notebooks default to `openrouter/free`. This router selects
among free models currently available on OpenRouter and filters for capabilities
required by the request, such as tool calling or structured output. It avoids
binding the examples to one free model that may later be removed or temporarily
unavailable. The tradeoff is reduced reproducibility: separate runs may use
different concrete models, so the notebooks record the resolved model returned
by OpenRouter. Set `OPENROUTER_MODEL` to a fixed model slug when exact model
selection matters.

Notebook `06_` can be saved and committed with its LLM and tool-call outputs so
they render on GitHub. Python variables and kernel memory are not stored in an
`.ipynb` file, and the notebook does not print the API key. Before committing a
credentialed run, review the visible outputs and scan for a full OpenRouter key:

```bash
rg "sk-or-v1-[A-Za-z0-9_-]{20,}" examples/06_openrouter_llm_mcp_interaction.ipynb
```

The command should produce no output. It does not match the documented
`sk-or-v1-...` placeholder.
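
The same pattern can be checked in Python. The key below is made up and assembled at runtime, so the literal never appears in this file; treat the snippet as an illustration of the regular expression, not as the workflow's own code.

```python
import re

OPENROUTER_KEY_PATTERN = re.compile(r"sk-or-v1-[A-Za-z0-9_-]{20,}")

placeholder = "OPENROUTER_API_KEY=sk-or-v1-..."
fake_key = "sk-or-v1-" + "a1B2" * 8  # 32 made-up characters, not a real key

placeholder_match = OPENROUTER_KEY_PATTERN.search(placeholder)
fake_key_match = OPENROUTER_KEY_PATTERN.search(fake_key)
```

The placeholder does not match because `.` is outside the character class and the pattern requires at least 20 allowed characters after the prefix.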

## Using with any MCP-capable LLM client

This project does not require a specific LLM. Configure your client to launch the local MCP server with:

```bash
uv --directory /full/path/to/Perseus-mcp run perseus-mcp
```

Most MCP clients need the same pieces: server name `perseus`, command `uv`, args `--directory /full/path/to/Perseus-mcp run perseus-mcp`, and an empty environment unless you have local customizations. See `docs/enduser.md` for generic client guidance and `docs/architecture.md` for the architecture choices, including why FastMCP is used.

### Claude Desktop and Claude Code

Claude connects to the server over stdio. No API key is required; OpenRouter is needed only for the optional demo client.

**Claude Desktop** — add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "perseus": {
      "command": "uv",
      "args": ["--directory", "/full/path/to/Perseus-mcp", "run", "perseus-mcp"]
    }
  }
}
```

Restart Claude Desktop; the Perseus tools appear in the tools list.

**Claude Code** — one line:

```bash
claude mcp add perseus -- uv --directory /full/path/to/Perseus-mcp run perseus-mcp
```

Verified against a stdio MCP handshake: all 23 tools register, and live calls to `search_perseus` and `list_text_groups` return.

## Build a PyPI distribution

Install the development dependencies, then build and validate both distribution
formats:

```bash
python -m pip install -e ".[dev]"
python -m build
python -m twine check dist/*
```

The build creates a wheel and source archive under `dist/`. Test the wheel in a
clean virtual environment before publishing. Upload to TestPyPI first:

```bash
python -m twine upload --repository testpypi dist/*
```

After verifying installation from TestPyPI, upload the same artifacts to PyPI:

```bash
python -m twine upload dist/*
```

PyPI does not allow replacing an existing release. Update `project.version` in
`pyproject.toml`, rebuild from a clean `dist/` directory, and publish each
version only once. The package build workflow also builds and checks artifacts
in CI without publishing them.

### Automated GitHub release and PyPI publishing

The release automation follows the same trusted-publishing pattern as
MorphKit:

1. Set the release version in `pyproject.toml`, for example `1.2.3`.
2. Merge the version change to the commit that should be released.
3. Create and push the matching tag, for example `v1.2.3`.
4. The `Build release artifacts` workflow verifies the tag/version match,
   builds and validates both distributions, and attaches them to a generated
   GitHub release.
5. That workflow dispatches `Publish to PyPI`, which rebuilds and validates the
   package before publishing through PyPI trusted publishing.

Configure the repository once before the first automated upload:

- Create a GitHub Actions environment named `pypi`.
- In the existing PyPI project settings, or as a pending publisher before the
  first upload, add a trusted publisher for owner `tonyjurg`, repository
  `Perseus-mcp`, workflow `publish.yml`, and environment `pypi`.
- Do not add a PyPI API token; the workflow uses GitHub OIDC with
  `id-token: write`.

The workflows reject a tag such as `v1.2.4` when `project.version` is still
`1.2.3`. PyPI versions are immutable, so increment the version before retrying
a release that was already uploaded.
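
The tag/version gate amounts to a string comparison. The sketch below is a hypothetical helper, not the workflow's actual script; in the workflows the version comes from `project.version` in `pyproject.toml`.

```python
def tag_matches_version(tag: str, version: str) -> bool:
    """True when the Git tag equals `v<project.version>`."""
    return tag == f"v{version}"


matching = tag_matches_version("v1.2.3", "1.2.3")
mismatched = tag_matches_version("v1.2.4", "1.2.3")
missing_prefix = tag_matches_version("1.2.3", "1.2.3")
```

An exact comparison is deliberately strict: a tag without the `v` prefix is rejected along with a mismatched version.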

## Contributing and reporting issues

Bug reports, documentation fixes, focused feature requests, and pull requests
are welcome. Please report problems through the GitHub issue tracker and include
the command, Python version, MCP client, tool arguments, traceback, and any
relevant CTS URN or Greek search query when possible.

See `docs/contributing.md` for contribution guidance.

## Responsible disclosure

This project was created with assistance from OpenAI Codex. The human
maintainer remains responsible for reviewing, testing, and accepting all code
and documentation changes.

## License

This project is released under the MIT License. See `LICENSE` for details.
