Metadata-Version: 2.4
Name: openclaw-upgrade-orchestrator-mcp
Version: 1.0.1
Summary: MCP server for safe AI agent runtime upgrades — version-aware regression catalog, pre-upgrade snapshot capture, post-upgrade verification via diff against baseline, rollback guide. Read-only advisor: never executes the upgrade itself; operator retains full agency. Ships with OpenClaw regression catalog; bring your own catalog for other runtimes.
Project-URL: Homepage, https://github.com/temurkhan13/openclaw-upgrade-orchestrator-mcp
Project-URL: Documentation, https://github.com/temurkhan13/openclaw-upgrade-orchestrator-mcp/blob/main/SPEC.md
Project-URL: Bug Tracker, https://github.com/temurkhan13/openclaw-upgrade-orchestrator-mcp/issues
Project-URL: Custom MCP Build, https://github.com/temurkhan13/openclaw-upgrade-orchestrator-mcp#need-this-adapted-to-your-stack
Project-URL: Changelog, https://github.com/temurkhan13/openclaw-upgrade-orchestrator-mcp/blob/main/CHANGELOG.md
Author-email: Temur Khan <temur@pixelette.tech>
License: MIT License
        
        Copyright (c) 2026 Temur Khan
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: agent-ops,ai-agent,claude,deployment-safety,mcp,model-context-protocol,openclaw,production-ai,regression-detection,rollback,upgrade-management,version-management
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: System :: Installation/Setup
Classifier: Topic :: System :: Systems Administration
Requires-Python: >=3.11
Requires-Dist: mcp>=1.0.0
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Description-Content-Type: text/markdown

# openclaw-upgrade-orchestrator-mcp

<!-- mcp-name: io.github.temurkhan13/openclaw-upgrade-orchestrator-mcp -->

> **MCP server for safe AI agent runtime upgrades**: version-aware regression catalog, pre/post snapshot diffing, step-by-step upgrade and rollback guides. It captures deployment state before the upgrade, re-runs the detection checks afterwards, and surfaces `new_failures` (caused by the upgrade) separately from `unchanged_failures` (pre-existing) and `recovered` (fixed by the upgrade). **Read-only by design**: it never executes the upgrade itself, so the operator retains full agency. v1.0 ships with the OpenClaw regression catalog (8 entries grounded in real field reports); the same machinery accepts a custom catalog for any AI runtime via Custom MCP Build adapters.

[![Status: v1.0.0](https://img.shields.io/badge/status-v1.0.0-brightgreen)](https://github.com/temurkhan13/openclaw-upgrade-orchestrator-mcp) [![License: MIT](https://img.shields.io/badge/license-MIT-blue)](./LICENSE) [![MCP](https://img.shields.io/badge/protocol-MCP-purple)](https://modelcontextprotocol.io/) [![PyPI](https://img.shields.io/pypi/v/openclaw-upgrade-orchestrator-mcp)](https://pypi.org/project/openclaw-upgrade-orchestrator-mcp/)

---

## What it does

Production AI runtime upgrades — OpenClaw, Claude Code, agent harnesses, runtime servers — carry recurring regressions that you only find *after* upgrading. Recent examples:

- **The "claw tax."** Anthropic announced standard Claude subscriptions would no longer cover usage through external "claw" harnesses like OpenClaw — forcing those workloads onto metered API billing — without a clean upgrade path. [r/AI_Agents thread on the OpenClaw creator suspension (201 pts, April 12 2026)](https://old.reddit.com/r/AI_Agents/comments/1sjqxat/anthropic_suspended_the_openclaw_creators_claude/) walks through the structural conflict.
- **OpenClaw 2026.4.8** brought a CPU-spike bug. **2026.4.23-26** broke Discord `on_message`. **2026.4.30+** surfaced OOM under sustained 200k-token contexts. The pattern: upgrade on Friday, hit a new failure mode on Tuesday, spend Wednesday-Thursday excavating release notes + field reports.
- **Claude Code itself** has documented compaction failures at 48% context usage ([GitHub Issue #23751](https://github.com/anthropics/claude-code/issues/23751)) and the "1M context window does not work as marketed" complaint ([GitHub Issue #35296](https://github.com/anthropics/claude-code/issues/35296)) — recurring upgrade regressions in the underlying runtime, not just OpenClaw.
- **HERMES.md detection** (the [billing routing bug](https://old.reddit.com/r/ClaudeAI/comments/1svdm1w/psa_the_string_hermesmd_in_your_git_commit/)) is itself an upgrade regression — Anthropic added detection logic that fires false positives, and operators only discover it when they're already $200 deep in unintended API charges.

This MCP server moves the regression excavation **upfront — before the upgrade, not after** — and verifies the post-upgrade state by diffing against a snapshot you took beforehand. **Read-only by design**: never executes the upgrade itself; the operator retains full agency.

```
> claude: should I upgrade my 2026.4.23 deployment?
[MCP tools: current_version + available_upgrades]

Current: 2026.4.23
Recommended target: 2026.5.2 (no CRITICAL regressions in path)

Available upgrades:
  2026.4.24-.26  HIGH    R-73421 Discord-receive breakage
  2026.4.27       —      clean
  2026.4.30       HIGH    R-OOM-DURING-LARGE-CONTEXT (unfixed)
  2026.5.1-.2     HIGH    R-OOM + R-LOG-ROTATION-DROP (unfixed)
```

```
> claude: walk me through upgrading to 2026.4.27.
[MCP tool: upgrade_guide]

2026.4.23 → 2026.4.27 — proceed with mitigations applied.

Applicable known regressions:
  R-41372 (HIGH)  — Cron --session web-search silent fail.
                    Mitigation: silentwatch-mcp covers detection until upgrade.
  R-73421 (HIGH)  — Discord-receive callbacks not firing.
                    Mitigation: `openclaw skill reload discord` after upgrade.

Pre-upgrade steps:
  1. Capture pre-upgrade snapshot (call pre_upgrade_snapshot)
  2. Verify backups: cp -r ~/.openclaw ~/.openclaw.backup-$(date +%Y%m%d)

Upgrade steps:
  1. openclaw gateway stop
  2. openclaw upgrade --to 2026.4.27
  3. openclaw gateway start

Post-upgrade steps:
  1. Run post_upgrade_verify(snapshot_id=<your-pre-upgrade-id>)
  2. openclaw skill reload discord  (R-73421 mitigation)

Rollback steps: stop → openclaw upgrade --to 2026.4.23 → restore backup → start.

Confidence: Path includes 2 HIGH regressions but no CRITICAL.
```

```
> claude: I just upgraded. Verify it.
[MCP tool: post_upgrade_verify(pre_snapshot_id="snap-...")]

Upgrade 2026.4.23 → 2026.4.27: SUCCESS.
0 new failures, 1 recovered (skills.discord_receive_registered),
0 unchanged failures.
```

---

## Why `openclaw-upgrade-orchestrator-mcp`

Three things existing tools (vendor changelogs, internal runbooks, generic CI/CD orchestrators) don't do:

1. **Catalog-grounded regression awareness.** A generic upgrade tool tells you the version exists. This server tells you which versions have known issues, which fix versions remediate them, and which mitigations apply if you have to use the affected version.

2. **Pre/post snapshot diffing tied to the catalog.** The same checks run before + after the upgrade. The diff highlights `new_failures` (caused by the upgrade) separately from `unchanged_failures` (pre-existing) and `recovered` (fixed by the upgrade). No more "did this break in 2026.4.27 or was it already broken?"

3. **Read-only by design.** Never runs `openclaw upgrade --to ...` for you and never modifies the deployment; the only thing it writes is its own snapshot data. Operators retain full agency over the actual upgrade: this server gives them the information to make the decision, then verifies the result after they execute.

Built for the **production-AI operator** who owns OpenClaw deployments and has been through enough upgrade-day fire drills.
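
The pre/post diff from point 2 can be pictured as a three-way classification over check results. The sketch below is illustrative only: it assumes each snapshot reduces to a mapping from check ID to a pass/fail boolean, and `SnapshotDiff`, `diff_snapshots`, and the `gateway.port_free` check ID are hypothetical names, not the server's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotDiff:
    """Three-way classification of checks between two snapshots."""
    new_failures: list[str] = field(default_factory=list)        # passed before, fail now
    unchanged_failures: list[str] = field(default_factory=list)  # failed before and after
    recovered: list[str] = field(default_factory=list)           # failed before, pass now


def diff_snapshots(pre: dict[str, bool], post: dict[str, bool]) -> SnapshotDiff:
    """Compare check results (True = pass) captured before and after an upgrade.

    Assumption: checks present in only one of the two snapshots are ignored.
    """
    diff = SnapshotDiff()
    for check_id in sorted(pre.keys() & post.keys()):
        before, after = pre[check_id], post[check_id]
        if before and not after:
            diff.new_failures.append(check_id)
        elif not before and not after:
            diff.unchanged_failures.append(check_id)
        elif not before and after:
            diff.recovered.append(check_id)
    return diff


# Hypothetical check results mirroring the 2026.4.23 -> 2026.4.27 example above.
pre = {"skills.discord_receive_registered": False, "gateway.port_free": True}
post = {"skills.discord_receive_registered": True, "gateway.port_free": True}
result = diff_snapshots(pre, post)
print(result.recovered)
```

Checks that pass both before and after land in none of the three buckets, which keeps the report focused on what changed or was already broken.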

---

## Tool surface

| Tool | What it returns |
|------|-----------------|
| `current_version` | Currently-installed version + detection method |
| `available_upgrades` | Newer versions with regression-count flags + recommended target |
| `pre_upgrade_snapshot` | Captures every check's pass/fail state and persists it under a `snapshot_id` |
| `upgrade_guide` | Step-by-step plan: pre / upgrade / post / rollback steps + applicable regressions + confidence note |
| `post_upgrade_verify` | Diffs post-upgrade state against a stored pre-upgrade snapshot: `new_failures` / `recovered` / `unchanged_failures` |
| `rollback_guide` | Recovery plan for a given snapshot — downgrade command + state-restore steps + risk note |
| `regression_catalog` | Full known-regression catalog, optionally filtered to one version |
| `list_snapshots` | All stored snapshots (id + version + summary) |

Resources:
- `upgrade://current` — current version info
- `upgrade://snapshots` — every stored snapshot
- `upgrade://catalog` — full regression catalog

Prompts:
- `plan-upgrade(target_version)` — walks through the upgrade decision
- `verify-upgrade(pre_snapshot_id)` — walks through post-upgrade verification

---

## Quickstart

### Install

```bash
pip install openclaw-upgrade-orchestrator-mcp
```

### Configure for Claude Desktop

```json
{
  "mcpServers": {
    "openclaw-upgrade": {
      "command": "python",
      "args": ["-m", "openclaw_upgrade_orchestrator_mcp"],
      "env": {
        "OPENCLAW_UPGRADE_BACKEND": "mock"
      }
    }
  }
}
```

### Backends

| Backend | Status | Description |
|---------|--------|-------------|
| `mock` | ✅ v1.0 | 2026.4.23 deployment with active R-73421 Discord-receive breakage; in-memory snapshots; suitable for protocol verification + bundle demos |
| `openclaw-system` | ✅ v1.0 | Reads `~/.openclaw/version` + `~/.openclaw/gateway.yaml`; persists snapshots as JSON in `~/.openclaw/upgrades/snapshots/`. Override via `OPENCLAW_VERSION_FILE`, `OPENCLAW_GATEWAY_CONFIG`, `OPENCLAW_UPGRADE_SNAPSHOT_DIR` |
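
Switching from `mock` to `openclaw-system` should mostly be a matter of changing the `env` block. As a rough sketch, the script below builds a Claude Desktop entry using the override variables from the table. The `/srv/openclaw` paths are made up for illustration, and it assumes `OPENCLAW_UPGRADE_BACKEND` accepts the backend name exactly as written in the table.

```python
import json
from pathlib import PurePosixPath

# Hypothetical non-default location; drop any override you do not need.
state_dir = PurePosixPath("/srv/openclaw")

server_entry = {
    "command": "python",
    "args": ["-m", "openclaw_upgrade_orchestrator_mcp"],
    "env": {
        # Assumption: the value matches the backend name shown in the table.
        "OPENCLAW_UPGRADE_BACKEND": "openclaw-system",
        "OPENCLAW_VERSION_FILE": str(state_dir / "version"),
        "OPENCLAW_GATEWAY_CONFIG": str(state_dir / "gateway.yaml"),
        "OPENCLAW_UPGRADE_SNAPSHOT_DIR": str(state_dir / "upgrades" / "snapshots"),
    },
}

config = {"mcpServers": {"openclaw-upgrade": server_entry}}
print(json.dumps(config, indent=2))
```

The printed JSON can be merged into the existing Claude Desktop configuration in place of the `mock` entry.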

### Regression catalog (v1.0)

8 hand-curated entries covering documented OpenClaw regressions, each listed with its severity and affected version range:

- `R-41372-CRON-WEB-SEARCH-SILENT-FAIL` (HIGH, 2026.4.20–2026.5.1)
- `R-63002-POST-UPGRADE-CPU-SPIKE` (CRITICAL, 2026.4.8–2026.4.10)
- `R-73421-DISCORD-RECEIVE-BREAKAGE` (HIGH, 2026.4.23–2026.4.27)
- `R-GATEWAY-PORT-CONFLICT-2026.4.15` (MEDIUM, 2026.4.15–2026.4.18)
- `R-OOM-DURING-LARGE-CONTEXT-2026.4.30` (HIGH, 2026.4.30–unfixed)
- `R-STATUS-RECONCILIATION-DRIFT-2026.4.5` (LOW, 2026.4.5–2026.4.10)
- `R-CLAWHUB-CACHE-POISONING-2026.3.28` (HIGH, 2026.3.28–2026.4.2)
- `R-LOG-ROTATION-DROP-2026.5.1` (MEDIUM, 2026.5.1–unfixed)

Use `regression_catalog` for the full, queryable list.
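
To make the version ranges concrete, here is a minimal sketch of range membership. It rests on two assumptions that the real catalog may not share: versions are dot-separated integers compared numerically, and the upper bound of a range is exclusive (the first version carrying the fix), which matches the demo above where 2026.4.27 reports Discord receive as recovered. `Regression`, `parse_version`, and `regressions_for` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2026.4.23' into (2026, 4, 23) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Regression:
    regression_id: str
    severity: str
    introduced_in: str
    fixed_in: Optional[str]  # None means unfixed

    def affects(self, version: str) -> bool:
        v = parse_version(version)
        if v < parse_version(self.introduced_in):
            return False
        # Assumption: the upper bound is exclusive (first version carrying the fix).
        return self.fixed_in is None or v < parse_version(self.fixed_in)


# Three entries copied from the list above; the field names are illustrative.
CATALOG = [
    Regression("R-63002-POST-UPGRADE-CPU-SPIKE", "CRITICAL", "2026.4.8", "2026.4.10"),
    Regression("R-73421-DISCORD-RECEIVE-BREAKAGE", "HIGH", "2026.4.23", "2026.4.27"),
    Regression("R-OOM-DURING-LARGE-CONTEXT-2026.4.30", "HIGH", "2026.4.30", None),
]


def regressions_for(version: str) -> list[str]:
    """IDs of catalog entries whose range covers the given version."""
    return [r.regression_id for r in CATALOG if r.affects(version)]


print(regressions_for("2026.4.23"))
print(regressions_for("2026.5.2"))
```

Parsing into integer tuples matters here: compared as plain strings, `"2026.4.10"` would sort before `"2026.4.8"`.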

---

## Risk-aware recommendation logic

`available_upgrades` flags every version reachable from current and computes a `recommended_target`:

```
For each available version V > current:
  applicable_regressions = regressions_in_path(current, V)
  has_known_critical = any(r.severity == CRITICAL for r in applicable_regressions)

recommended_target = highest V with has_known_critical == False
```

`regressions_in_path(current, target)` includes a regression if:
- The target version is in the regression's range (post-upgrade deployment will be affected), OR
- The current version is in the regression's range (current deployment is already affected — the operator should know whether the upgrade fixes it)

OpenClaw upgrades atomically (nothing executes on intermediate versions), so a regression whose range lies strictly between current and target, affecting neither endpoint, is NOT included. This avoids over-conservative recommendations.
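
A runnable rendering of the rule above might look like the following. It is a sketch under stated assumptions rather than the server's code: versions are compared as integer tuples, the upper bound of a range is treated as exclusive, and `parse`, `Regression`, and the function signatures are hypothetical.

```python
from typing import NamedTuple, Optional


def parse(version: str) -> tuple[int, ...]:
    return tuple(int(p) for p in version.split("."))


class Regression(NamedTuple):
    regression_id: str
    severity: str
    first_affected: str
    fixed_in: Optional[str]  # None = unfixed; assumed exclusive upper bound

    def affects(self, version: str) -> bool:
        v = parse(version)
        upper_ok = self.fixed_in is None or v < parse(self.fixed_in)
        return parse(self.first_affected) <= v and upper_ok


def regressions_in_path(catalog: list[Regression], current: str, target: str) -> list[Regression]:
    """Endpoint-only rule: include a regression if it affects current OR target."""
    return [r for r in catalog if r.affects(current) or r.affects(target)]


def recommended_target(catalog: list[Regression], current: str, available: list[str]) -> Optional[str]:
    """Highest available version above current with no CRITICAL regression in path."""
    newer = sorted((v for v in available if parse(v) > parse(current)), key=parse)
    safe = [
        v for v in newer
        if not any(r.severity == "CRITICAL" for r in regressions_in_path(catalog, current, v))
    ]
    return safe[-1] if safe else None


catalog = [
    Regression("R-63002-POST-UPGRADE-CPU-SPIKE", "CRITICAL", "2026.4.8", "2026.4.10"),
    Regression("R-73421-DISCORD-RECEIVE-BREAKAGE", "HIGH", "2026.4.23", "2026.4.27"),
    Regression("R-OOM-DURING-LARGE-CONTEXT-2026.4.30", "HIGH", "2026.4.30", None),
]

# HIGH regressions do not block a recommendation; only CRITICAL ones do.
print(recommended_target(catalog, "2026.4.23", ["2026.4.27", "2026.4.30", "2026.5.2"]))
# The CPU-spike range sits strictly between these endpoints, so nothing is reported.
print(regressions_in_path(catalog, "2026.4.5", "2026.4.15"))
```

Note one consequence of the endpoint rule as written: if the current version is itself inside a CRITICAL range, every candidate target carries that regression in its path, so this sketch returns `None`.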

---

## Roadmap

| Version | Scope | Status |
|---------|-------|--------|
| v1.0 | mock + openclaw-system backends, 8 tools / 3 resources / 2 prompts, 8-entry regression catalog, 6 detection checks, GitHub Actions CI matrix, PyPI Trusted Publishing | ✅ |
| v1.1 | Catalog auto-fetch from upstream changelog feed; richer detection checks tied to OpenClaw's `/healthz` endpoint; multi-step upgrade pathing | ⏳ |
| v1.2 | Custom catalog packs (operator can ship internal-only regression entries alongside the canonical catalog); rule-overrides | ⏳ |
| v1.x | Webhook emit on detected regression; integration with CI to gate merges of OpenClaw-version bumps | ⏳ |

---

## Need this adapted to your stack?

If your AI deployment uses a different runtime (custom agent harness, internal fork of OpenClaw, vendor-locked deployment) and you want the same regression-aware upgrade discipline, that's a **Custom MCP Build** engagement.

| Tier | Scope | Investment | Timeline |
|------|-------|------------|----------|
| Simple | Single backend adapter for your existing version-source | **$8,000–$12,000** | 1–2 weeks |
| Standard | Custom backend + custom regression catalog (initial 10-15 entries from your incident history) + integration with your alerting | **$15,000–$25,000** | 2–4 weeks |
| Complex | Multi-deployment fleet view + auto-catalog ingestion from internal changelog + per-environment recommendation tuning | **$30,000–$45,000** | 4–8 weeks |

**To engage:**
1. Email **temur@pixelette.tech** with subject `Custom MCP Build inquiry — upgrade orchestration`
2. Include: 1-paragraph description of your runtime + which tier
3. Expect a reply within 2 business days offering a 30-minute discovery call slot

This server is part of a **production-AI infrastructure MCP suite** — companion to [silentwatch-mcp](https://github.com/temurkhan13/silentwatch-mcp), [openclaw-health-mcp](https://github.com/temurkhan13/openclaw-health-mcp), [openclaw-cost-tracker-mcp](https://github.com/temurkhan13/openclaw-cost-tracker-mcp), and [openclaw-skill-vetter-mcp](https://github.com/temurkhan13/openclaw-skill-vetter-mcp). Install all five for full operational visibility.

---

## Production AI audits

If you're running production AI and want an outside practitioner to score readiness, find the failure patterns already present (upgrade regression cycles being one of the most damaging), and write the corrective-action plan:

| Tier | Scope | Investment | Timeline |
|------|-------|------------|----------|
| Audit Lite | One system, top-5 findings, written report | **$1,500** | 1 week |
| Audit Standard | Full audit, all 14 patterns, 5 Cs findings, 90-day follow-up | **$3,000** | 2–3 weeks |
| Audit + Workshop | Standard audit + 2-day team workshop + first monthly audit included | **$7,500** | 3–4 weeks |

Same email channel: **temur@pixelette.tech** with subject `AI audit inquiry`.

---

## Contributing

PRs welcome. Detection checks are pluggable — see `src/openclaw_upgrade_orchestrator_mcp/checks/__init__.py` for the contract.

To add a check:

1. Define `def run(state: DeploymentState) -> CheckResult` in the checks module
2. Register it in `CHECKS: dict[str, callable]`
3. Reference its `check_id` from a regression's `detection_check_id` in `catalog.py`
4. Add tests in `tests/test_checks.py`
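
As a rough illustration of steps 1 and 2, the sketch below defines a check and registers it. The real `DeploymentState` and `CheckResult` live in the package and their fields are not documented here, so the dataclasses below are stand-ins with assumed fields, and the check logic itself is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


# Stand-ins for the project's real types; the fields here are assumptions.
@dataclass
class DeploymentState:
    version: str
    registered_skills: set[str] = field(default_factory=set)


@dataclass
class CheckResult:
    check_id: str
    passed: bool
    detail: str = ""


def run(state: DeploymentState) -> CheckResult:
    """Pass when the Discord skill is registered (hypothetical check logic)."""
    ok = "discord" in state.registered_skills
    detail = "discord skill registered" if ok else "discord skill missing"
    return CheckResult("skills.discord_receive_registered", ok, detail)


# Registry keyed by check_id, mirroring the CHECKS mapping named in step 2.
CHECKS: dict[str, Callable[[DeploymentState], CheckResult]] = {
    "skills.discord_receive_registered": run,
}

state = DeploymentState(version="2026.4.23")
print(CHECKS["skills.discord_receive_registered"](state).passed)
```

Keeping checks as pure functions of the collected state means the same function can run against both the pre-upgrade and post-upgrade snapshots.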

To add a backend:

1. Subclass `UpgradeBackend` in `backends/<your_backend>.py`
2. Implement `collect_state`, `save_snapshot`, `load_snapshot`, `list_snapshots`
3. Register in `backends/__init__.py`
4. Add tests in `tests/test_backends.py`
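
The four methods in step 2 could be satisfied by something as small as the in-memory backend sketched below. The `UpgradeBackend` class here is a stand-in: the method names come from the list above, but the signatures, the dict-based snapshot shape, and the synchronous style are all assumptions (the real interface may well be async).

```python
from abc import ABC, abstractmethod
from typing import Any


class UpgradeBackend(ABC):
    """Stand-in for the project's base class; method signatures are assumptions."""

    @abstractmethod
    def collect_state(self) -> dict[str, Any]: ...

    @abstractmethod
    def save_snapshot(self, snapshot_id: str, snapshot: dict[str, Any]) -> None: ...

    @abstractmethod
    def load_snapshot(self, snapshot_id: str) -> dict[str, Any]: ...

    @abstractmethod
    def list_snapshots(self) -> list[str]: ...


class InMemoryBackend(UpgradeBackend):
    """Keeps snapshots in a dict; nothing survives a restart."""

    def __init__(self, version: str) -> None:
        self._version = version
        self._snapshots: dict[str, dict[str, Any]] = {}

    def collect_state(self) -> dict[str, Any]:
        return {"version": self._version}

    def save_snapshot(self, snapshot_id: str, snapshot: dict[str, Any]) -> None:
        self._snapshots[snapshot_id] = dict(snapshot)  # copy so later edits don't leak in

    def load_snapshot(self, snapshot_id: str) -> dict[str, Any]:
        return dict(self._snapshots[snapshot_id])  # raises KeyError for unknown IDs

    def list_snapshots(self) -> list[str]:
        return sorted(self._snapshots)


backend = InMemoryBackend("2026.4.23")
backend.save_snapshot("snap-001", backend.collect_state())
print(backend.list_snapshots())
```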

To add a regression entry:

1. Append to `CATALOG` in `catalog.py` with stable `regression_id`
2. Reference an existing or new `detection_check_id` (or set it to `None` for advisory-only entries)
3. Add a test confirming version-range membership in `tests/test_catalog.py`

Bug reports + feature requests: open a GitHub issue.

---

## License

MIT — see [LICENSE](./LICENSE).

---

## Related

- [Production-AI MCP Suite (Gumroad bundle)](https://temurah.gumroad.com/l/production-ai-mcp-suite) — this server plus 4 others in one curated bundle with a decision tree, day-one drill, and Custom MCP Build CTA. $99, or $49 with `LAUNCH50` for the first 30 days.
- [silentwatch-mcp](https://github.com/temurkhan13/silentwatch-mcp) — cron silent-failure detection
- [openclaw-health-mcp](https://github.com/temurkhan13/openclaw-health-mcp) — deployment health
- [openclaw-cost-tracker-mcp](https://github.com/temurkhan13/openclaw-cost-tracker-mcp) — token-cost telemetry
- [openclaw-skill-vetter-mcp](https://github.com/temurkhan13/openclaw-skill-vetter-mcp) — skill security vetting
- [AI Production Discipline Framework](https://temurah.gumroad.com/l/ai-production-discipline-framework) — Notion template, $29 — methodology these MCPs implement
- [SPEC.md](./SPEC.md) — full server design

---

Built by [Temur Khan](https://www.notion.so/@temurkhan) — independent practitioner on production AI systems.
Contact: **temur@pixelette.tech**
