Metadata-Version: 2.4
Name: ptm-mcp
Version: 0.14.0
Summary: MCP stdio server for the Prompt Test Manager API
Author: PTM Team
License: Proprietary
Project-URL: Homepage, https://github.com/15five/prompt-test-manager
Project-URL: Repository, https://github.com/15five/prompt-test-manager
Project-URL: Changelog, https://github.com/15five/prompt-test-manager/blob/main/packages/ptm-mcp/CHANGELOG.md
Project-URL: Issues, https://github.com/15five/prompt-test-manager/issues
Classifier: Development Status :: 3 - Alpha
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Intended Audience :: Developers
Classifier: Typing :: Typed
Requires-Python: >=3.12
Description-Content-Type: text/markdown
Requires-Dist: mcp<2.0,>=1.2
Requires-Dist: ptm-client<1.0,>=0.15.0
Requires-Dist: pydantic<3.0,>=2.8
Requires-Dist: packaging<26.0,>=24.0
Provides-Extra: dev
Requires-Dist: pytest<9.0,>=8.2; extra == "dev"
Requires-Dist: pytest-asyncio<1.0,>=0.23; extra == "dev"
Requires-Dist: pytest-cov<7.0,>=5.0; extra == "dev"
Requires-Dist: responses<1.0,>=0.25; extra == "dev"
Requires-Dist: ruff<1.0,>=0.6; extra == "dev"

# ptm-mcp

MCP (Model Context Protocol) stdio server for the Prompt Test Manager API.

Lets agents built on top of MCP-capable clients (Claude Desktop, Claude Code, Codex, etc.) call PTM as first-class tools: list prompts, run evaluations, update prompt content, submit optimizations. All traffic is tagged with `X-PTM-Client: ptm-mcp/<version>` + a per-process `X-PTM-MCP-Session` UUID so the PTM backend can rate-limit, budget, and audit agent traffic separately from humans and service accounts.

## Prereqs

- Python >= 3.12.
- A reachable PTM backend (>= 1.9.0) and a personal access token or service-account token with the scopes your flow needs.

## Install

```
pip install ptm-mcp
```

Or zero-install via `uvx`:

```
uvx ptm-mcp
```

`ptm-mcp` pulls in `ptm-client` and the `mcp` SDK automatically.

## Configure your MCP client

### Claude Desktop

Config file:

| OS | Path |
|---|---|
| macOS | `~/Library/Application Support/Claude/claude_desktop_config.json` |
| Windows | `%APPDATA%\Claude\claude_desktop_config.json` |

Merge into `mcpServers` (create the key if it doesn't exist):

```json
{
  "mcpServers": {
    "ptm": {
      "command": "uvx",
      "args": ["ptm-mcp"],
      "env": {
        "PTM_API_BASE_URL": "https://ptm.example.com",
        "PTM_API_TOKEN": "ptm_u_PASTE_HERE",
        "PTM_MCP_READ_ONLY": "true"
      }
    }
  }
}
```

Fully quit + reopen Claude Desktop after editing. stderr lands in `~/Library/Logs/Claude/mcp-server-ptm.log` (macOS) or `%APPDATA%\Claude\logs\mcp-server-ptm.log` (Windows).

### Claude Code

```bash
# macOS / Linux
claude mcp add --transport stdio --scope user ptm \
  --env PTM_API_BASE_URL=https://ptm.example.com \
  --env PTM_API_TOKEN=ptm_u_PASTE_HERE \
  --env PTM_MCP_READ_ONLY=true \
  -- uvx ptm-mcp

# Windows (PowerShell / cmd) - needs cmd /c wrapper
claude mcp add --transport stdio --scope user ptm `
  --env PTM_API_BASE_URL=https://ptm.example.com `
  --env PTM_API_TOKEN=ptm_u_PASTE_HERE `
  --env PTM_MCP_READ_ONLY=true `
  -- cmd /c uvx ptm-mcp
```

### Codex

`~/.codex/config.toml` (macOS / Linux) or `%USERPROFILE%\.codex\config.toml` (Windows):

```toml
[mcp_servers.ptm]
command = "uvx"
args = ["ptm-mcp"]
env = { PTM_API_BASE_URL = "https://ptm.example.com", PTM_API_TOKEN = "ptm_u_PASTE_HERE", PTM_MCP_READ_ONLY = "true" }
startup_timeout_sec = 10
tool_timeout_sec = 60
```

Or CLI-first: `codex mcp add ptm -- uvx ptm-mcp` (with `--env KEY=VAL` per var).

## Environment variables

Consumed at startup. Missing required values fail fast with a descriptive error.

| Variable | Required | Default | Notes |
|---|---|---|---|
| `PTM_API_BASE_URL` | yes | - | e.g. `https://ptm.example.com` |
| `PTM_API_TOKEN` | yes | - | PTM bearer. Service-account tokens preferred for long-running agent sessions. |
| `PTM_MCP_READ_ONLY` | no | `true` | Flip to `false` to unlock write tools. |
| `PTM_MCP_TIMEOUT_SECONDS` | no | `30` | Per-request timeout (1..600). |
| `PTM_MCP_LOG_LEVEL` | no | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` / `CRITICAL`. |
| `PTM_SSL_VERIFY` | no | `true` | Set to `false` to disable TLS certificate verification (allows self-signed / invalid certs against a homelab PTM). Local dev only - never disable in production. |
| `CF_ACCESS_CLIENT_ID` | no | - | Cloudflare Access service-token Client ID. Paired with `CF_ACCESS_CLIENT_SECRET`. |
| `CF_ACCESS_CLIENT_SECRET` | no | - | Cloudflare Access service-token Client Secret. Paired with `CF_ACCESS_CLIENT_ID`. |
| `CF_ACCESS_JWT` | no | - | Cloudflare Access user JWT (alternative to the service-token pair). |
| `PTM_CF_AUTO_DISCOVER` | no | `true` | Falsy value opts out of auto-discovery via `cloudflared`. |

Startup scrubs every env var outside a narrow allow-list (cloud creds, GitHub tokens, etc. get dropped).

## Cloudflare Access

If your PTM deployment sits behind Cloudflare Access:

- **Default path** (recommended): install `cloudflared` (`brew install cloudflared` or equivalent) and run `cloudflared access login https://your-ptm-host` once. ptm-mcp auto-detects CF challenges and injects the cached JWT on request.
- **Service token** (CI / headless): ask an admin to mint a service token for the PTM app in Cloudflare Zero Trust -> Access -> Service Auth. Set `CF_ACCESS_CLIENT_ID` + `CF_ACCESS_CLIENT_SECRET` in the MCP env block. Explicit config disables auto-discovery.
- **Direct access** (no CF Access): skip this section; no CF env vars needed.

On a Cloudflare Access block, ptm-mcp surfaces a `CloudflareAccessError` with the exact next step rather than a raw JSON decode crash.

## Tool inventory

53 tools total (24 read + 29 write) + 5 resource URI patterns. Write tools are gated by `PTM_MCP_READ_ONLY=false`; the backend enforces per-prompt ownership, group membership, admin role, and the `manage_tool_registry` / `manage_skill_catalog` permissions on top of the PAT's scopes. ptm-mcp relays backend errors verbatim with actionable hints; it does not re-implement permission checks.

### Read (24)

`list_providers`, `list_prompts`, `get_prompt`, `get_prompt_tests`, `list_prompt_versions`, `get_prompt_version`, `compare_prompt_versions`, `list_runs`, `get_run`, `get_run_report`, `get_optimization_status`, `get_optimization_history`, `get_optimization_detail`, `list_skills`, `get_skill`, `search_skills`, `load_prompt`, `load_skill`, `list_groups`, `list_tools`, `get_tool_dependents`, `list_triage`, `get_skill_contents`, `get_prompt_contents`.

**`get_skill_contents`** - Fetch skill bundle in memory (no disk write). Returns parsed prompt.md + manifest + files. **No runs row, no LLM call.** Increments inline-usage counter.

**`get_prompt_contents`** - Same for a library prompt: prompt_text, version, tests, deepeval_metrics, tool_definitions. Active version by default; pass `version=N` for a specific version.

### Write (29, gated by `PTM_MCP_READ_ONLY`)

**Eval / optimization (4):** `run_manual_eval` (QA eval with custom prompt + test cases), `run_prompt_eval` (QA eval against stored test suite), `submit_optimization`, `cancel_optimization`.

`submit_optimization` accepts variance-aware fields: `stability_samples`, `validation_samples`, `flakiness_threshold`, `min_consistent_improvement`, `variance_aware_mutator`, `variance_signal`, `enforce_target_score`, `chained_baseline_mode`. Omit any to inherit admin defaults.

**Ad-hoc execution (2):** `run_prompt` (run a library prompt against your inputs through the full Promptfoo + DeepEval pipeline - real runs row), `run_skill` (same, resolved by skill ID or qualified name). Both accept an `amend` flag: `false` (default) replaces matching vars per test; `true` only fills missing vars.

**Library mutation (9):** `create_prompt`, `update_prompt`, `activate_prompt_version`, `share_prompt`, `unshare_prompt`, `add_prompt_to_group`, `remove_prompt_from_group`, `transfer_prompt_ownership`, `update_prompt_sampling`.

**Skill Library (6):** `install_skill`, `publish_skill`, `update_skill`, `deprecate_skill`, `run_skill`, `run_prompt`.

**Tool Registry (5):** `create_tool`, `update_tool`, `deprecate_tool`, `delete_mock_profile`, `export_tool_registry`.

**Triage (4):** `promote_run_to_golden`, `resolve_triage_item`, `reopen_triage_item`, `bulk_resolve_triage`.

**Runs (1):** `submit_run_feedback`.

### Permissions model

| Action | Required |
|---|---|
| Read tools | Valid PAT (results filtered by visibility scope) |
| `update_prompt`, `activate_prompt_version` | Prompt owner OR prompt-overwrite role |
| `share_prompt`, group tools | Admin OR group-manager role |
| `transfer_prompt_ownership` | Admin ONLY |
| Eval + run tools | `run_evaluations` scope |
| `get_skill_contents`, `get_prompt_contents` | `install_skill` scope |

Backend error codes surface as actionable tool errors:

- **403** -> "Permission denied. Check prompt ownership or required role."
- **404** -> "Not found. Verify prompt_id / skill_id and PAT visibility scope."
- **409** -> "Conflict. Repository-backed prompt (edit source files) OR concurrent writer. Re-fetch and retry."
- **422** -> "Validation failed. Check prompt_text length (max 500k), test / metric shapes, field types."

### Resources (5 URI patterns)

- `ptm://prompts/{prompt_id}` - active version's `prompt_text` (text/plain)
- `ptm://prompts/{prompt_id}/v{N}` - that version's `prompt_text` (text/plain)
- `ptm://runs/{run_key}/report.md` - markdown report (text/markdown)
- `ptm://runs/{run_key}/report.html` - HTML report (text/html)
- `ptm://optimizations/{optimization_id}/report.md` - markdown summary (text/markdown)

Dynamic segments are allow-list validated (`^[a-zA-Z0-9_.-]+$` plus explicit `.`/`..` rejection).

## Security defaults

- `PTM_MCP_READ_ONLY=true` blocks every write tool at call time.
- `X-PTM-Client` + `X-PTM-MCP-Session` on every outbound request so the backend can classify and audit agent traffic.
- Env scrub at startup drops anything outside the allow-list.
- Startup preflight (`/healthz` + `/auth/me` + `/meta`) with exponential backoff on transient failures and dedicated exit codes per failed layer.

## Exit codes

| Code | Meaning |
|---|---|
| `0` | clean shutdown |
| `1` | unhandled exception |
| `2` | `/healthz` unreachable after 31s of backoff |
| `3` | `/auth/me` rejected the token |
| `4` | backend version < 1.9.0 or unparseable |
| `130` | interrupted (SIGINT) |

## Status

0.12.0. 53 tools: full PTM surface (prompts, evals, optimization, triage, runs, skills, tool registry, ad-hoc execution, inline-contents fetch) + 5 resource URI patterns, read-only gate, Cloudflare Access auto-discovery. See `CHANGELOG.md` for release notes.
