# Agents Shipgate · Long-Form Agent Reference (llms-full.txt)

> Single-fetch concatenation of the canonical agent-facing reference
> material. AI search engines and coding agents that prefer one document
> over chasing links should fetch this file. The short index is at
> [`llms.txt`](llms.txt); machine-readable triggers are at
> [`docs/triggers.json`](docs/triggers.json).
>
> Generated by `scripts/build-llms-full.py` from the source files below.
> Do not edit by hand — re-run the script to update.

## Sources (in order)

- [`AGENTS.md`](AGENTS.md)
- [`docs/agent-recipes.md`](docs/agent-recipes.md)
- [`docs/agent-contract-current.md`](docs/agent-contract-current.md)
- [`docs/checks.md`](docs/checks.md)
- [`docs/concepts.md`](docs/concepts.md)
- [`docs/autofix-policy.md`](docs/autofix-policy.md)

---


<!-- ===== source: AGENTS.md ===== -->

# Agents Shipgate · Agent Instructions

Authoritative instructions for AI coding agents (Claude Code, Codex, Cursor, Aider) working **with** this repository or a project that uses Agents Shipgate.

> If you are a human, the README and the [wiki](https://github.com/ThreeMoonsLab/agents-shipgate/wiki) are the right places to start. This file is optimized for agent ingest: short, copy-pasteable, machine-friendly.

---

## What this project is

The deterministic merge gate for AI-generated agent capability changes. Reads `shipgate.yaml` plus tool sources (MCP exports, OpenAPI specs, OpenAI Agents SDK Python files, Anthropic Messages API tool/prompt artifacts, Google ADK Python/config files, LangChain/LangGraph Python files, CrewAI Python files, OpenAI API artifacts, Codex plugin packages and marketplaces, n8n workflow JSON/stubs) and produces deterministic findings. Local-first and static by default — no agent execution, tool calls, LLM calls, or network access.

- **Inputs:** MCP · OpenAPI · OpenAI Agents SDK · Anthropic Messages API · Google ADK · LangChain/LangGraph · CrewAI · OpenAI API · Codex plugin · n8n
- **Outputs:** Markdown · JSON · SARIF
- **Trust:** Static-by-default. No agent execution, tool calls, LLM calls, or network access.
- **Marketing site:** [threemoonslab.com](https://threemoonslab.com/) — canonical brand URL with human-readable companion pages: [/quickstart/](https://threemoonslab.com/quickstart/), [/glossary/](https://threemoonslab.com/glossary/), [/checks/](https://threemoonslab.com/checks/), [/design-partners/](https://threemoonslab.com/design-partners/). The site also serves a [/.well-known/agents-shipgate.json](https://threemoonslab.com/.well-known/agents-shipgate.json) discovery file **pinned to the latest released tag** for external consumers and AI search. **If you are an agent working inside this repo, use the in-tree [`.well-known/agents-shipgate.json`](.well-known/agents-shipgate.json) (current `main` contract, may be ahead of the released file) for schema-version and gating-signal decisions.**

---

## Naming (canonical)

Use exactly one form depending on context. Mixing them in user-visible copy is an adoption cost.

| Form | When to use |
|---|---|
| **Agents Shipgate** | Display name. Prose, headings, marketing copy, social cards, slide titles, blog posts. |
| **`agents-shipgate`** | Package, CLI binary, repo, GitHub Action, PyPI distribution name, env-var prefix (`AGENTS_SHIPGATE_*`), import path (`agents_shipgate`). Always lowercase, kebab-case. |
| **`shipgate`** | Short alias for the CLI binary only. Acceptable in shell snippets where brevity helps; never as the project name. |

Do **not** use any of: `Agent Shipgate` (singular), `Agent Shipcheck`, `agents shipgate` (display lowercase), `Agents-Shipgate` (display kebab). When in doubt: prose → `Agents Shipgate`; code → `agents-shipgate`.
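
A copy check for the forbidden forms can be sketched as below. This is a minimal illustration only: `find_forbidden_names` is a hypothetical helper, not something the project ships, and it matches case-sensitively so the two canonical forms pass.

```python
import re

# Forbidden display forms listed above (matched case-sensitively).
FORBIDDEN_FORMS = [
    "Agent Shipgate",
    "Agent Shipcheck",
    "agents shipgate",
    "Agents-Shipgate",
]

_PATTERN = re.compile("|".join(rf"\b{re.escape(form)}\b" for form in FORBIDDEN_FORMS))


def find_forbidden_names(text: str) -> list:
    """Return every forbidden form found in `text`, in order of appearance."""
    return _PATTERN.findall(text)
```

An empty result means the copy uses only `Agents Shipgate` or `agents-shipgate` (or names the project not at all).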

The canonical tagline is:

> The deterministic merge gate for AI-generated agent capability changes.

This single sentence is the source of truth for the GitHub repo description, [README.md](README.md), the [wiki Home page](https://github.com/ThreeMoonsLab/agents-shipgate/wiki/Home), and the [marketing site](https://threemoonslab.com/) `<meta name="description">`. Keep them in sync; the website's `.well-known` discovery file is pinned to the latest released tag and refreshes at each release.

Use **Tool-Use Readiness** in Title Case when naming the product/category or
the **Tool-Use Readiness Report** artifact. Use **tool-use readiness** in
sentence case when describing the general concept.

---

## Install (canonical)

```bash
pipx install agents-shipgate
```

Alternatives if `pipx` is unavailable:

```bash
python -m pip install agents-shipgate                   # global pip
uv tool install agents-shipgate                          # via uv
python -m agents_shipgate --help                         # run from a pip install without PATH
```

The CLI binary is `agents-shipgate`. A short alias `shipgate` is also installed.

---

## Run (canonical)

**First-time setup** — in a repo that contains an agent and its tools:

```bash
agents-shipgate init --workspace . --write
agents-shipgate scan -c shipgate.yaml
```

Reports land at `agents-shipgate-reports/report.{md,json}`.

**Before reporting an agent-capability change complete** — once `shipgate.yaml`
exists, run the deterministic verifier on the diff and read its JSON first:

```bash
AGENTS_SHIPGATE_AGENT_MODE=1 agents-shipgate verify \
  --workspace . --config shipgate.yaml \
  --ci-mode advisory --format json
```

Omit `--base`/`--head` for local pre-commit work so uncommitted edits are
scanned; add `--base origin/main --head HEAD` only for a committed PR/CI ref
after making the base ref available. Read
`agents-shipgate-reports/verifier.json` first and lead with `merge_verdict`
(`mergeable | human_review_required | insufficient_evidence | blocked |
unknown`), `can_merge_without_human`, `first_next_action`, `fix_task`, and
`capability_review.top_changes[]`. Then read
`agents-shipgate-reports/report.json.release_decision.decision`
(`blocked | review_required | insufficient_evidence | passed`), which remains
the release gate. Do not report completion while `merge_verdict` is `blocked`,
`insufficient_evidence`, or `human_review_required` unless the user explicitly
accepts human review.
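
The completion rule above can be sketched in Python. This is a minimal illustration under stated assumptions: `merge_verdict` is taken to be a top-level key of `verifier.json` (the text names the field but not its nesting), the human-review exception is read narrowly as applying only to `human_review_required`, and `unknown` or unrecognized verdicts are treated conservatively. `may_report_complete` and `load_verifier` are hypothetical helpers, not part of the CLI.

```python
import json
from pathlib import Path


def load_verifier(reports_dir: str = "agents-shipgate-reports") -> dict:
    """Read verifier.json from the reports directory named in the text."""
    path = Path(reports_dir) / "verifier.json"
    return json.loads(path.read_text(encoding="utf-8"))


def may_report_complete(verifier: dict, user_accepts_human_review: bool = False) -> bool:
    """Return True only when the verdict allows reporting completion."""
    verdict = verifier.get("merge_verdict", "unknown")
    if verdict == "mergeable":
        return True
    if verdict == "human_review_required":
        # Assumption: only this verdict can be unlocked by explicit user acceptance.
        return user_accepts_human_review
    # blocked, insufficient_evidence, unknown, or any unrecognized value.
    return False
```

Even when this returns `True`, `report.json.release_decision.decision` remains the release gate and should be read next.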

Do not bypass the verifier by suppressing findings, lowering severity,
expanding baselines or waivers, removing Shipgate CI, or weakening agent
instructions. Verify-mode `SHIP-VERIFY-*` checks make those trust-root edits
release-visible and route them to human review.

To reproduce the verify-native blocked refund PR demo without writing YAML:

```bash
agents-shipgate fixture run ai_generated_refund_pr
```

To verify your install on the older static scan fixture:

```bash
agents-shipgate fixture run support_refund_agent
```

---

## First-adoption helper flow (v0.6+)

For coding agents adopting Shipgate end-to-end in one turn:

```bash
agents-shipgate detect --json
agents-shipgate init --write --ci --json
agents-shipgate scan -c shipgate.yaml --suggest-patches --format json
agents-shipgate apply-patches --from agents-shipgate-reports/report.json \
    --confidence high --apply
```

Or chain all four in one call:

```bash
agents-shipgate bootstrap --json
```

`bootstrap` runs `detect → init --write --ci → scan --suggest-patches → apply-patches --confidence high` against the current workspace, stopping on the first non-recoverable error and emitting a structured per-step summary. Use it for first-time adoption; for ongoing CI keep using the GitHub Action. Flags: `--workspace`, `--confidence`, `--no-ci`, `--no-apply`, `--json`.

- **`detect`** — read-only; classifies the workspace. `is_agent_project: false`
  means stop early.
- **`init`** — auto-detects by default. `--ci` writes
  `.github/workflows/agents-shipgate.yml`; orthogonal to `--write`. Use
  `--minimal` for the pre-v0.6 CHANGE_ME-heavy template.
  `--agent-instructions=all` (or a comma-separated subset of
  `agents-md,codex-skill,claude-code-skill,claude-md,cursor,pr-template`)
  renders agent-facing snippets to stdout; combined with `--write` it commits
  them to the target repo via managed `<!-- agents-shipgate:start -->` markers
  (idempotent for managed-block hosts; full-file and skill-bundle targets use
  safe-update checks). The `codex-skill` and `claude-code-skill` targets write
  multi-file skill bundles under `.agents/skills/agents-shipgate/` and
  `.claude/skills/agents-shipgate/` respectively. Strict CI and baselines
  remain opt-in human decisions; the flag emits advisory guidance only.
- **`scan --suggest-patches`** — attaches Patch objects to every active
  finding. `Finding.patches` is absent without the flag.
- **`apply-patches`** — file-grouped, dry-run by default.
  Containment-checked against `report.manifest_dir`. v0.6 default `--confidence high`
  applies only manifest stale-removals; scope-coverage appends require
  `--confidence medium`. Trace approval/confirmation findings are
  always `ManualPatch` — never auto-applied (flipping the trace patches
  the evidence, not the agent's runtime gate).

---

## Agent mode

Every command supports JSON output for programmatic consumption:

```bash
agents-shipgate detect --workspace . --json
agents-shipgate init --workspace . --write --json
agents-shipgate scan -c shipgate.yaml                    # already produces report.json
agents-shipgate apply-patches --from agents-shipgate-reports/report.json --json
agents-shipgate doctor --json
agents-shipgate contract --json
agents-shipgate explain SHIP-POLICY-APPROVAL-MISSING --json
agents-shipgate list-checks --json
agents-shipgate self-check --json
agents-shipgate fixture list --json
```

When a command runs with `AGENTS_SHIPGATE_AGENT_MODE=1`, errors carry a structured `next_action` (single string, back-compat) and `next_actions` (ranked list):

```bash
$ AGENTS_SHIPGATE_AGENT_MODE=1 agents-shipgate scan -c missing.yaml
Config error: Config file not found: missing.yaml
{"error": "config_error", "message": "...", "next_action": "agents-shipgate detect --workspace . --json", "next_actions": [{"kind": "command", "command": "agents-shipgate detect --workspace . --json", "why": "..."}, {"kind": "command", "command": "agents-shipgate init --workspace . --write", "why": "..."}]}
```

The full set of error kinds emitted in agent mode: `config_error`, `config_already_exists`, `input_parse_error`, `unknown_check_id`, `unknown_fingerprint`, `other_error`, `internal_error`, `malformed_patch`. `unknown_fingerprint` is emitted by `explain-finding` when the fingerprint doesn't match any entry in the supplied report; the payload includes `suggestion` (a close-match fingerprint, when one exists) and `source_report`.

The machine-readable catalog of error kinds — exit codes, typical causes, additional fields per kind, recovery hints — lives at [`docs/errors.json`](docs/errors.json). Pre-fetch it once and pattern-match the `error` field instead of re-deriving the recovery vocabulary from this prose.

`detect --json` and each `doctor --json` payload also carry `diagnostics: [...]` and `next_actions: [...]` fields. `next_action` (single string) remains the rank-1 action projected to a string; `next_actions` is the ranked list with `kind`, `command|path`, `why`, and `expects` fields. See [docs/diagnostics.md](docs/diagnostics.md) for the full catalog and schema.
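
A consumer of these payloads might extract the rank-1 action as sketched below. The sketch assumes the JSON object appears on its own line in the captured output, as in the example above; which stream carries it is not stated here, so the caller passes in whatever text it captured. `parse_agent_error` and `rank_one_action` are hypothetical helper names.

```python
import json
from typing import Optional


def parse_agent_error(output_text: str) -> Optional[dict]:
    """Return the first line that parses as a JSON object with an "error" key."""
    for line in output_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict) and "error" in payload:
            return payload
    return None


def rank_one_action(payload: dict) -> Optional[str]:
    """Prefer the ranked list; fall back to the back-compat single string."""
    actions = payload.get("next_actions") or []
    if actions:
        first = actions[0]
        # An entry carries either `command` or `path`, depending on `kind`.
        return first.get("command") or first.get("path")
    return payload.get("next_action")
```

Pattern-matching on `payload["error"]` against `docs/errors.json` is still the documented route for choosing a recovery; this only pulls out the suggested next step.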

### Doctor behavior change for unresolved tool_sources

When a required `tool_sources[].path` does not resolve under the manifest directory (file missing OR resolves outside the manifest dir):

- `agents-shipgate doctor --json` exits **0** with a `SHIP-DIAG-MISSING-SOURCE-FILE` diagnostic and an `unresolved_sources: [{id, declared_path, line, reason}]` field in the payload, so an agent can route to a fix without parsing the error message. `reason` is `"missing"` or `"outside_manifest_dir"`.
- `agents-shipgate doctor` (no `--json`) prints the same `unresolved_sources` + diagnostic block in human-readable form and **exits 3**, preserving the pre-feature loud failure for interactive users.
- `agents-shipgate scan` is unchanged — it still raises `InputParseError(3)` regardless of `--json`. Once you're past doctor, missing sources are real scan failures.
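
An agent can route on the `unresolved_sources` field without parsing messages. The sketch below assumes the `doctor --json` payload has already been loaded into a dict and that `line` refers to the manifest line of the declaration (the text lists the key but does not define it); `plan_source_fixes` and the hint wording are hypothetical.

```python
from typing import List


def plan_source_fixes(doctor_payload: dict) -> List[str]:
    """Turn each unresolved source into a short fix hint, keyed on `reason`."""
    hints = []
    for src in doctor_payload.get("unresolved_sources", []):
        where = f"{src['declared_path']} (line {src['line']})"
        if src["reason"] == "missing":
            hints.append(f"{src['id']}: create the file or correct the path {where}")
        elif src["reason"] == "outside_manifest_dir":
            hints.append(f"{src['id']}: move {where} under the manifest directory")
    return hints
```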

### Missing vs invalid manifests

`config_error` covers two distinct shapes — agent-mode emits a different rank-1 action for each:

- **Missing**: file does not exist → `SHIP-DIAG-MISSING-MANIFEST`, rank-1 is `agents-shipgate detect --workspace <dir> --json` (then `init --write`).
- **Invalid**: file exists but the loader rejected it (invalid YAML, schema validation, unsupported version) → `SHIP-DIAG-INVALID-MANIFEST`, rank-1 is `edit <path>` with the loader error in `why`. Do **not** re-run `init` — it refuses to overwrite an existing file.

---

## When to use this tool

Use it when the user is:

- Writing or modifying an AI agent that calls tools (MCP servers, OpenAPI APIs, `@function_tool`-decorated Python, Google ADK, LangChain/LangGraph, CrewAI, n8n)
- Publishing or modifying a Codex plugin package or marketplace entry that declares skills, apps, MCP servers, or hooks
- Adding a new tool, scope, or policy to an existing agent
- Preparing an agent for production promotion
- Wiring a release gate into PR CI

Do NOT use it for:

- General code linting (ruff, eslint, etc.)
- Runtime monitoring (use observability/tracing)
- Behavior testing (use evals)
- Stopping a tool call in flight (use a gateway or guardrail)

### Should I run Shipgate on this PR?

| Trigger in this PR | Run Shipgate? |
|---|---|
| Adds/changes MCP exports, OpenAPI specs, or `tools/*openai*tools*.json` | Yes |
| Adds/changes Codex plugin manifests, marketplace files, `.app.json`, `.mcp.json`, or `SKILL.md` files | Yes |
| Adds/changes `@function_tool`/`@tool` decorators (LangChain, CrewAI, OpenAI Agents SDK) | Yes |
| Adds/changes n8n workflow JSON, credential stubs, or n8n tool inventories | Yes |
| Edits `prompts/`, `policies/`, or `permissions.scopes` in `shipgate.yaml` | Yes |
| Adds/edits `.github/workflows/agents-shipgate.yml` or related CI | Yes |
| Pure read-only doc/test changes with no manifest impact | Skip |
| Refactor with no behavior change to tools or policies | Skip (or dry-run only) |

Two implicit triggers also fire even when no row above matches:

- **Repo already opted in (shipgate.yaml present in the workspace)** — run on every PR; the manifest's existence is the opt-in.
- **(Optional) Refactor or framework upgrade that may shift the extracted tool surface** — dry-run only; bumping `openai-agents`, `langchain`, `crewai`, or `google-adk` can change AST extraction even without app-code edits.

A machine-readable mirror of these triggers lives at [`docs/triggers.json`](docs/triggers.json). Coding agents that have not yet adopted Shipgate can fetch the file (raw URL: `https://raw.githubusercontent.com/ThreeMoonsLab/agents-shipgate/main/docs/triggers.json`), apply the rules to a PR diff, and decide whether to propose `agents-shipgate detect`. The catalog is stable for `0.x` and pinned by the public-surface contract test against this prose table — if you change a row above, update `triggers.json` in the same commit. To evaluate a diff locally, use the first-class `trigger` subcommand:

```bash
# From a list of changed paths (and optional diff body for diff_contains rules):
agents-shipgate trigger --changed-files changed.txt --diff pr.diff --json
# Or straight from git (the ONLY mode that shells out to git):
agents-shipgate trigger --base origin/main --head HEAD --json
agents-shipgate trigger --list-rules --json
```

The command emits a stable JSON verdict: `should_run` (alias of `run_shipgate`), `force_run`, `dry_run_recommended`, `skip_reason`, `matched_rules`, `changed_files`, and `diff_tokens`. The developer entry point `python -m agents_shipgate.triggers shipgate.yaml prompts/refund.md` is preserved.
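
One way to act on the verdict is sketched below. The field names come from the list above, but how they combine is an assumption of this sketch: it treats `force_run` or `should_run` as a full run, `dry_run_recommended` as a dry run when neither is set, and everything else as a skip. `trigger_action` is a hypothetical helper.

```python
def trigger_action(verdict: dict) -> str:
    """Map a `trigger --json` verdict to "run", "dry-run", or "skip: <reason>"."""
    # `should_run` is documented as an alias of `run_shipgate`; accept either.
    should_run = verdict.get("should_run", verdict.get("run_shipgate", False))
    if verdict.get("force_run") or should_run:
        return "run"
    if verdict.get("dry_run_recommended"):
        return "dry-run"
    return f"skip: {verdict.get('skip_reason') or 'no rule matched'}"
```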

**Stop conditions.** Stop and do not run `init` only when **all** of these hold:

- `agents-shipgate detect --json` returns `is_agent_project: false`, AND
- `suggested_sources` is empty (no MCP/OpenAPI hits flowing in as `mcp` or `openapi`), AND
- `codex_plugin_candidates` is empty (no Codex plugin package or marketplace hits), AND
- no `shipgate.yaml` already exists in the workspace, AND
- the user did not explicitly request a scan.

Otherwise proceed to `init`. MCP/OpenAPI tool-surface repos and Codex plugin package repos register as `is_agent_project: false` because they have no Python framework imports — but they are valid Shipgate targets. MCP/OpenAPI hits surface as `suggested_sources`; Codex plugin hits surface as `codex_plugin_candidates`. The trigger table above is the authoritative go/no-go.
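
The five-way conjunction above can be written as a single predicate. This is a minimal sketch, assuming the `detect --json` payload is already loaded as a dict; `should_stop_before_init` is a hypothetical helper, and the two boolean arguments are supplied by the caller.

```python
def should_stop_before_init(
    detect: dict, manifest_exists: bool, user_requested_scan: bool
) -> bool:
    """Stop only when ALL stop conditions hold; otherwise proceed to `init`."""
    return (
        detect.get("is_agent_project") is False
        and not detect.get("suggested_sources")
        and not detect.get("codex_plugin_candidates")
        and not manifest_exists
        and not user_requested_scan
    )
```

An MCP-only repo shows why the conjunction matters: `is_agent_project` is `false`, but a non-empty `suggested_sources` makes the predicate return `False`, so the agent proceeds.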

---

## Five common agent tasks

### Task 1 · Add the gate to an existing repo

```bash
pipx install agents-shipgate
agents-shipgate init --workspace . --write
# edit shipgate.yaml to replace any CHANGE_ME values
agents-shipgate scan -c shipgate.yaml
```

`init` writes a manifest with `CHANGE_ME` placeholders for `agent.name` and `agent.declared_purpose`. Replace them by reading the agent's prompt or main file.

### Task 2 · Read findings programmatically

Always parse `agents-shipgate-reports/report.json`, not the markdown.

The canonical field list — `release_decision`, `capability_facts` / `declared_intentions` / `misalignments` / `release_consequence` / `suggested_scenarios`, `tool_surface_facts` / `tool_surface_diff`, and `action_surface_facts` / `action_surface_diff` — lives in [`docs/agent-contract-current.md`](docs/agent-contract-current.md#read-these-first-for-release-gating). It updates first when the contract bumps; this file links to it instead of restating the field set.

Other stable top-level fields:

- `summary.{critical_count, high_count, medium_count, status}` (status preserved for v0.7 compat — see note below)
- `findings[].{id, fingerprint, check_id, severity, tool_name, evidence, recommendation, suppressed}`
- `findings[].{autofix_safe, requires_human_review, suggested_patch_kind, docs_url}` (v0.7+)
- `findings[].patches[]` (v0.6+, only when scan ran with `--suggest-patches`)
- `baseline.{matched_count, new_count, resolved_count}`
- `tool_inventory[]`
- `codex_plugin_surface` (v0.13+, static Codex plugin package/marketplace facts)
- `findings[].provenance_kind` (v0.15+, per-finding rule provenance — `static_declaration | ast_extraction | keyword_heuristic | regex_heuristic | policy_pack`; independent of `confidence`, useful for reviewer filtering via `agents-shipgate findings`; never a release-gate input)
- `findings[].blocks_release` (v0.16+, explicit release-policy blockers from Action Surface Diff policies)
- `action_surface_facts` / `action_surface_diff` (v0.16+, deterministic action snapshot and base/head action delta)
- `release_decision.contribution_rules[]` (v0.17+, per-finding audit of how each finding contributed to the decision; one row per `report.findings` entry, with `category` ∈ `{blocker, review_item, excluded}` and `rule` ∈ `{policy_block_new, severity_block_new, policy_baseline_accepted, severity_baseline_accepted, review_required, sub_threshold, suppressed}`)
- `policy_audit.severity_overrides_applied[]` (v0.17+, top-of-report audit envelope listing every manifest-driven severity override with `{check_id, default_severity, applied_severity, manifest_path, reason, tier_crossed, direction, expires}`)
- `privacy_audit` (v0.18+, top-level audit proving default redaction ran before public artifacts were written; `redacted_paths[]` contains counts and structural paths only, never raw values or raw hashes)
- `heuristics_filter` (v0.21+, top-level audit envelope describing the `--no-heuristics` CLI filter pass; `enabled` is `False` and counts are zero when the flag is unset, so the field shape is stable across runs. When enabled, findings whose `provenance_kind` is `keyword_heuristic` or `regex_heuristic` are marked `suppressed=True` with `suppression_reason="filtered by --no-heuristics"` before the release decision is built — they remain in `findings[]` for transparency but do not gate release.)

The full schema is at [`docs/report-schema.v0.22.json`](docs/report-schema.v0.22.json) (current; emitted reports carry `report_schema_version: "0.22"`). Recent schema additions, newest first:

- v0.22 adds the verifier-cycle top-level blocks `capability_change`, `protected_surface_changes`, `effective_policy`, `human_ack`, and `verifier_summary`. These are reviewer-facing projections that never gate independently.
- v0.21 (frozen at [`docs/report-schema.v0.21.json`](docs/report-schema.v0.21.json)) added the `heuristics_filter` audit envelope.
- v0.20 added `reviewer_summary`, a deterministic projection of reviewer-lens surfaces and audit envelopes.
- v0.19 added `Finding.policy_evidence_source` and `ReleaseDecisionItem.{source, policy_evidence_source}` for reviewer-grade dual-source provenance.
- v0.18 added `privacy_audit`.
- v0.17 added `policy_audit` and the `release_decision.contribution_rules[]` audit fields.

What is stable is documented in [STABILITY.md](STABILITY.md).

**Release gating signal**: prefer `release_decision.decision` (`"blocked" | "review_required" | "insufficient_evidence" | "passed"`) over `summary.status`. The new field is **baseline-aware** — a baseline-matched critical surfaces in `release_decision.review_items` (accepted debt), not `release_decision.blockers`. `summary.status` stays baseline-blind for v0.7 compatibility, so a baseline-matched-only critical produces both `summary.status = "release_blockers_detected"` AND `release_decision.decision = "review_required"` (intentional divergence — see [STABILITY.md](STABILITY.md#release_decisiondecision-vs-summarystatus)). `insufficient_evidence` (added v0.14) signals that the scan saw too many low-confidence tools or source-loader warnings to be trustworthy; consumers that switch on the enum must fall back to `review_required` for unknown future values.
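
A consumer that switches on the enum could apply the required fallback as sketched below. `gating_decision` is a hypothetical helper; treating a missing `release_decision` block the same as an unknown value is an assumption of this sketch, not something the contract states.

```python
KNOWN_DECISIONS = {"blocked", "review_required", "insufficient_evidence", "passed"}


def gating_decision(report: dict) -> str:
    """Read release_decision.decision; unknown values fall back to review_required."""
    decision = (report.get("release_decision") or {}).get("decision")
    # `summary.status` is deliberately ignored: it is baseline-blind.
    return decision if decision in KNOWN_DECISIONS else "review_required"
```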

For a step-by-step reader's primer with anti-patterns and concrete code rewrites, see [`docs/report-reading-for-agents.md`](docs/report-reading-for-agents.md).

### Task 3 · Suppress a finding with a reason

```yaml
# shipgate.yaml
checks:
  ignore:
    - check_id: SHIP-DOC-MISSING-DESCRIPTION
      tool: legacy_search
      reason: tool deprecated 2026-Q2
```

`reason` is required and non-empty; the manifest fails validation otherwise.
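
Before committing a suppression, an agent could pre-check the rule locally. The sketch below operates on a manifest already parsed into a dict (YAML loading is left to the caller) and treats a whitespace-only `reason` as empty, which is an assumption; the manifest loader remains the authority. `ignore_entries_missing_reason` is a hypothetical helper.

```python
def ignore_entries_missing_reason(manifest: dict) -> list:
    """Return the `checks.ignore` entries whose `reason` is absent or blank."""
    entries = (manifest.get("checks") or {}).get("ignore") or []
    return [e for e in entries if not str(e.get("reason") or "").strip()]
```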

### Task 4 · Save a baseline before enabling strict CI

```bash
agents-shipgate baseline save -c shipgate.yaml --out .agents-shipgate/baseline.json
```

Then in CI:

```bash
agents-shipgate scan -c shipgate.yaml \
  --baseline .agents-shipgate/baseline.json \
  --ci-mode strict --fail-on critical,high
```

With `--baseline`, strict mode fails CI only on **new** findings (those not in the baseline) at the `--fail-on` severities.
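
The idea of gating only on new findings can be illustrated as follows. This is a conceptual sketch, not the CLI's matching logic: it assumes baseline membership is decided by `fingerprint`, takes the baseline as a plain set of fingerprints (the on-disk `baseline.json` layout is not described here), and excludes suppressed findings. `new_gating_findings` is a hypothetical helper.

```python
def new_gating_findings(findings, baseline_fingerprints, fail_on=("critical", "high")):
    """Return findings that are new, unsuppressed, and at a gating severity."""
    return [
        f
        for f in findings
        if f["fingerprint"] not in baseline_fingerprints
        and f["severity"] in fail_on
        and not f.get("suppressed", False)
    ]
```

With this model, a critical finding already in the baseline does not fail the build, while a new `high` finding does.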

### Task 5 · Explain a check or a specific finding

For static catalog metadata about a check ID (rationale, fires-when, recommendation):

```bash
agents-shipgate explain SHIP-POLICY-APPROVAL-MISSING --json
```

Returns the full `CheckMetadata` with `id`, `category`, `default_severity`, `description`, `rationale`, `fires_when`, `evidence_fields`, `recommendation`.

For a contextual explanation tied to a specific finding from a real scan (catalog metadata + the finding's evidence + a 3–5 sentence templated prose summary):

```bash
agents-shipgate explain-finding fp_<fingerprint> \
    --from agents-shipgate-reports/report.json --json
```

Returns the canonical Finding fields plus `metadata` (CheckMetadata for the check_id) and `explanation` — a deterministic prose summary suitable for direct quotation in a PR comment or chat reply. The companion prompt is [`prompts/explain-finding-to-user.md`](prompts/explain-finding-to-user.md).

---

## Agent FAQ

### Where is the manifest schema?

Use [`docs/manifest-v0.1.json`](docs/manifest-v0.1.json) for machine
validation and [`docs/manifest-v0.1.md`](docs/manifest-v0.1.md) for prose.

### Where is the report schema?

Parse `agents-shipgate-reports/report.json` and validate against
[`docs/report-schema.v0.22.json`](docs/report-schema.v0.22.json) (current).
Older reports validate against the frozen schema that matches their
`report_schema_version`; for example, `report_schema_version: "0.10"`
validates against [`docs/report-schema.v0.10.json`](docs/report-schema.v0.10.json).
Do not scrape Markdown when JSON is available.
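
Choosing the schema file from a report can be sketched like this. The path pattern and the 0.6 to 0.22 range mirror the files listed in the Schemas table; `schema_path_for` is a hypothetical helper, and the actual validation step (for example with a JSON Schema library) is left to the caller.

```python
# Versions with a schema file listed in the Schemas table: 0.6 through 0.22.
LISTED_VERSIONS = {f"0.{minor}" for minor in range(6, 23)}


def schema_path_for(report: dict) -> str:
    """Map a report's `report_schema_version` to its schema file path."""
    version = report.get("report_schema_version")
    if version not in LISTED_VERSIONS:
        raise ValueError(f"no schema listed for report_schema_version={version!r}")
    return f"docs/report-schema.v{version}.json"
```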

### How do I add a new check?

Follow [`docs/architecture.md`](docs/architecture.md) and update the check
registry, tests, [`docs/checks.md`](docs/checks.md), and
[`docs/checks.json`](docs/checks.json). Check IDs must not change after
publication.

### How do I add a new framework adapter?

Start with [`docs/framework-adapter-checklist.md`](docs/framework-adapter-checklist.md).
Adapters must be static by default: no user-code import, no network access, no
agent execution.

### Where are runnable examples?

Use [`samples/README.md`](samples/README.md) for sample agents and
[`docs/examples.md`](docs/examples.md) for a narrative overview. The fastest
fixture is `agents-shipgate fixture run support_refund_agent`.

### What vocabulary should I use in user-facing copy?

Use the [canonical naming](#naming-canonical) table above and the website
glossary: https://threemoonslab.com/glossary/.

---

## Schemas

For the short, current statement of "which fields to read", see [`docs/agent-contract-current.md`](docs/agent-contract-current.md). It is the single file that updates first when the contract bumps; the table below lists the underlying schemas.

| What | Path | Stable |
|---|---|---|
| Manifest schema | [`docs/manifest-v0.1.json`](docs/manifest-v0.1.json) | `0.1` |
| Report schema (current) | [`docs/report-schema.v0.22.json`](docs/report-schema.v0.22.json) | `0.22` |
| Report schema (v0.21 frozen reference) | [`docs/report-schema.v0.21.json`](docs/report-schema.v0.21.json) | `0.21` |
| Report schema (v0.20 frozen reference) | [`docs/report-schema.v0.20.json`](docs/report-schema.v0.20.json) | `0.20` |
| Report schema (v0.19 frozen reference) | [`docs/report-schema.v0.19.json`](docs/report-schema.v0.19.json) | `0.19` |
| Report schema (v0.18 frozen reference) | [`docs/report-schema.v0.18.json`](docs/report-schema.v0.18.json) | `0.18` |
| Report schema (v0.17 frozen reference) | [`docs/report-schema.v0.17.json`](docs/report-schema.v0.17.json) | `0.17` |
| Report schema (v0.16 frozen reference) | [`docs/report-schema.v0.16.json`](docs/report-schema.v0.16.json) | `0.16` |
| Report schema (v0.15 frozen reference) | [`docs/report-schema.v0.15.json`](docs/report-schema.v0.15.json) | `0.15` |
| Report schema (v0.14 frozen reference) | [`docs/report-schema.v0.14.json`](docs/report-schema.v0.14.json) | `0.14` |
| Report schema (v0.13 frozen reference) | [`docs/report-schema.v0.13.json`](docs/report-schema.v0.13.json) | `0.13` |
| Report schema (v0.12 frozen reference) | [`docs/report-schema.v0.12.json`](docs/report-schema.v0.12.json) | `0.12` |
| Report schema (v0.11 frozen reference) | [`docs/report-schema.v0.11.json`](docs/report-schema.v0.11.json) | `0.11` |
| Report schema (v0.10 frozen reference) | [`docs/report-schema.v0.10.json`](docs/report-schema.v0.10.json) | `0.10` |
| Report schema (v0.9 frozen reference) | [`docs/report-schema.v0.9.json`](docs/report-schema.v0.9.json) | `0.9` |
| Report schema (v0.8 frozen reference) | [`docs/report-schema.v0.8.json`](docs/report-schema.v0.8.json) | `0.8` |
| Report schema (v0.7 frozen reference) | [`docs/report-schema.v0.7.json`](docs/report-schema.v0.7.json) | `0.7` |
| Report schema (v0.6 frozen reference) | [`docs/report-schema.v0.6.json`](docs/report-schema.v0.6.json) | `0.6` |
| Packet schema (Release Evidence Packet, latest) | [`docs/packet-schema.v0.6.json`](docs/packet-schema.v0.6.json) | `0.6` |
| Packet schema (v0.5 frozen reference) | [`docs/packet-schema.v0.5.json`](docs/packet-schema.v0.5.json) | `0.5` |
| Check catalog | [`docs/checks.json`](docs/checks.json) | regenerated each release |
| Anti-patterns (what NOT to write) | [`samples/_anti_patterns/`](samples/_anti_patterns/) | reference |
| Minimal manifest example | [`docs/manifest-v0.1.example.minimal.yaml`](docs/manifest-v0.1.example.minimal.yaml) | reference |

For VS Code / Cursor live YAML validation, every manifest produced by `init` includes:

```yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/ThreeMoonsLab/agents-shipgate/main/docs/manifest-v0.1.json
```

---

## Stable command surface

Promised not to break in `0.x` minor versions. See [STABILITY.md](STABILITY.md) for the full contract.

| Command | Stable flags |
|---|---|
| `agents-shipgate scan` | `-c`, `--out`, `--format`, `--ci-mode`, `--fail-on`, `--baseline`, `--diff-from`, `--changed-files`, `--no-plugins`, `--no-heuristics`, `--verbose`, `--packet`/`--no-packet`, `--packet-format` |
| `agents-shipgate evidence-packet` | `--from`, `--out`, `--format`, `--json` |
| `agents-shipgate init` | `--workspace`, `--write`, `--json` |
| `agents-shipgate doctor` | `-c`, `--workspace`, `--json`, `--verbose` |
| `agents-shipgate contract` | `--json` |
| `agents-shipgate explain` | `<check_id>`, `--no-plugins`, `--json` |
| `agents-shipgate explain-finding` | `<fingerprint>`, `--from`, `--no-plugins`, `--json` |
| `agents-shipgate findings` | `--from`, `--provenance-kind`, `--include-suppressed`, `--json` |
| `agents-shipgate trigger` | `--workspace`, `--changed-files`, `--diff`, `--base`, `--head`, `--manifest-present`/`--no-manifest-present`, `--user-requested`, `--list-rules`, `--json` |
| `agents-shipgate bootstrap` | `--workspace`, `--confidence`, `--no-ci`, `--no-apply`, `--json` |
| `agents-shipgate list-checks` | `--json`, `--no-plugins` |
| `agents-shipgate baseline save` | `-c`, `--out` |
| `agents-shipgate fixture` | `list`, `run`, `copy`, `verify` |
| `agents-shipgate self-check` | `--json` |

### Release Evidence Packet (v0.6)

`scan` emits a reviewer-shaped Release Evidence Packet alongside `report.{md,json}` by default. The packet is a curated synthesis: fixed reviewer sections plus a compact evidence matrix derived from public `report.json`. The packet (v0.6) also carries the same `Finding.source` / `Finding.policy_evidence_source` dual-source provenance pointers on `ReleaseDecisionItem`, so packet §1 / §2 cite the originating evidence inline. Outputs land at `agents-shipgate-reports/packet.{md,json,html}` (and `packet.pdf` when the optional `[pdf]` extras are installed). For the field-level packet contract, see [`docs/agent-contract-current.md`](docs/agent-contract-current.md#read-these-for-release-review) and [STABILITY.md §Release Evidence Packet](STABILITY.md#release-evidence-packet-v06).

```bash
pipx install agents-shipgate                  # md, json, html packet outputs
pipx install 'agents-shipgate[pdf]'           # adds packet.pdf via weasyprint
agents-shipgate scan -c shipgate.yaml         # default: emit packet
agents-shipgate scan -c shipgate.yaml --no-packet                    # skip
agents-shipgate scan -c shipgate.yaml --packet-format md,json,html,pdf
# Re-render from the existing packet (full fidelity):
agents-shipgate evidence-packet --from agents-shipgate-reports/packet.json --format html,pdf
# Or rebuild from a CI-archived report.json (degraded — see §10 of the output):
agents-shipgate evidence-packet --from agents-shipgate-reports/report.json --format md,html
```

Rules of the packet contract (do not break in 0.x):
- The packet is **derived from JSON** (the in-memory scan) and is a **local artifact only** — no hosted/SaaS view.
- §10 ("What this packet did NOT prove") **always** lists the four canonical disclaimers verbatim — prompt robustness, runtime behavior, model correctness, adversarial resistance — regardless of run state.
- All reviewer sections are **always present** in `packet.json`, including `evidence_matrix`, `tool_surface_diff`, and `action_surface_diff`. Sections that have no evidence render with `status: "not_declared"` (or `"informational"`) and refer the reviewer to §10.
- §1A (`evidence_matrix`) is a compact packet-only review aid. It never contributes to `release_decision`, CI exit behavior, severity, suppression, baseline matching, or `agent_summary`; blocker/review-item cells are copied from `release_decision`.
- §8 (`human_in_the_loop`) always carries `runtime_control_disclaimer`. When local validation artifacts are available, `source_provenance[]` traces approval traces, override logs, high-risk exclusions, promotion criteria, and manifest requirements.
- §1 verdict (`PASSED` / `REVIEW REQUIRED` / `INSUFFICIENT EVIDENCE` / `BLOCKED`) derives from `release_decision.decision` only (with `INSUFFICIENT EVIDENCE` mirroring the v0.14 `insufficient_evidence` decision value). CI behavior (`fail_policy`) is rendered separately as metadata, not as the verdict source.
- The current manifest schema does **not** model `agent.memory`. §7 always renders "not declared, see §10" until a future schema bump adds the field.

Exit codes (stable):

| Code | Meaning |
|---|---|
| `0` | Pass (advisory or strict-no-blockers) |
| `2` | Manifest config error |
| `3` | Input parse error (file missing, malformed, path traversal blocked, file too large) |
| `4` | Other Agents Shipgate error |
| `20` | Strict-mode gate failure |
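
A wrapper script can branch on these codes instead of parsing output. The following is a minimal sketch; the dictionary contents come from the table above, while the names `EXIT_CODE_MEANINGS`, `describe_exit_code`, and `is_gate_failure` are illustrative assumptions and not part of the CLI.

```python
# Exit codes copied from the table above; all names here are hypothetical.
EXIT_CODE_MEANINGS = {
    0: "pass (advisory or strict with no blockers)",
    2: "manifest config error",
    3: "input parse error",
    4: "other Agents Shipgate error",
    20: "strict-mode gate failure",
}


def describe_exit_code(code: int) -> str:
    """Return a short label for a documented exit code."""
    return EXIT_CODE_MEANINGS.get(code, f"undocumented exit code {code}")


def is_gate_failure(code: int) -> bool:
    """Only code 20 means the strict-mode gate itself failed."""
    return code == 20
```

Keeping the gate failure (`20`) separate from configuration and parse errors (`2`, `3`, `4`) lets a caller tell "the gate said no" apart from "the scan could not run".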

---

## What you can't do (intentionally)

This section is the **CLI's** invariants. For the **agent's** behavioral boundary — what an agent driving Shipgate may assert in PR comments and review summaries — see [`docs/agent-autofix-boundary.md`](docs/agent-autofix-boundary.md).

- The CLI does not modify user code; it only reads.
- The CLI does not connect to MCP servers; it reads exported JSON only.
- Tool sources outside the manifest directory are rejected (path traversal containment).
- Files larger than 10 MB are rejected.
- Plugins are off by default (`AGENTS_SHIPGATE_ENABLE_PLUGINS=1` to enable; `--no-plugins` to force off).

---

## When you make changes to this repo

- Run `python -m ruff check .` and `python -m pytest` before committing.
- Changing a check's behavior requires updating the test suite and any golden fixtures under `samples/*/expected/`.
- New checks must include: code in `src/agents_shipgate/checks/<category>.py` plus a `BUILTIN_CHECKS` entry in `checks/registry.py`, metadata in `docs/checks/<category>.yaml` (loaded into `CHECK_METADATA` at registry import time by `agents_shipgate.checks._metadata_loader`), a test in `tests/`, and a row in `docs/checks.md`. After editing YAML, regenerate `docs/checks.json` with `python scripts/generate_schemas.py`.
- Do not change check IDs in published versions; always add new ones.
- To regenerate the JSON schemas, run `python scripts/generate_schemas.py`, then commit `docs/manifest-v0.1.json` and `docs/checks.json`.

---

## Reusable prompts

Prebuilt prompts for common workflows live in [`prompts/`](prompts/):

- [`decide-shipgate-relevance.md`](prompts/decide-shipgate-relevance.md) — apply [`docs/triggers.json`](docs/triggers.json) to decide whether Shipgate should run at all
- [`add-shipgate-to-repo.md`](prompts/add-shipgate-to-repo.md) — bootstrap a repo
- [`fix-top-finding.md`](prompts/fix-top-finding.md) — iterate on a single finding
- [`recommend-fixes.md`](prompts/recommend-fixes.md) — walk all active findings and surface targeted fix recommendations across the four autofix-policy classes
- [`explain-finding-to-user.md`](prompts/explain-finding-to-user.md) — translate one finding into 3–5 sentences of user-facing prose; companion to `agents-shipgate explain-finding`
- [`stabilize-strict-mode.md`](prompts/stabilize-strict-mode.md) — tune → baseline → promote
- [`triage-false-positive.md`](prompts/triage-false-positive.md) — override vs suppress decision
- [`upgrade-shipgate-version.md`](prompts/upgrade-shipgate-version.md) — bump agents-shipgate version safely (regenerate baseline if needed)

For downstream repos, use [`docs/target-repo-agent-snippets.md`](docs/target-repo-agent-snippets.md)
to copy Shipgate trigger rules into `AGENTS.md`, `CLAUDE.md`, Cursor rules,
PR templates, and advisory CI. Use
[`docs/agent-adoption-harness.md`](docs/agent-adoption-harness.md) to evaluate
whether coding agents discover and use Shipgate without being prompted by name.

### Editor / agent integrations

Per-agent install guides for dropping Shipgate into your own agent project:

- [`docs/agents/use-with-claude-code.md`](docs/agents/use-with-claude-code.md) — install the `/shipgate` slash command and `agents-shipgate` auto-discoverable skill. Source surfaces ship at [`.claude/commands/shipgate.md`](.claude/commands/shipgate.md) and [`skills/agents-shipgate/`](skills/agents-shipgate/) (named `agents-shipgate` to avoid colliding with the slash command — Claude Code lets a same-named skill preempt a command). The skill bundles the recipes in [`skills/agents-shipgate/prompts/`](skills/agents-shipgate/prompts/) and a starter advisory CI workflow at [`skills/agents-shipgate/ci-recipes/advisory-pr-comment.yml`](skills/agents-shipgate/ci-recipes/advisory-pr-comment.yml); when you change anything in [`prompts/`](prompts/) or `examples/github-actions/01-advisory-pr-comment.yml`, sync the bundled copy.
- [`docs/agents/use-with-codex.md`](docs/agents/use-with-codex.md) — install the canonical `AGENTS.md` snippet plus repo-scoped Codex skill. Source surfaces ship at [`.agents/skills/agents-shipgate/`](.agents/skills/agents-shipgate/) and are generated into downstream repos with `agents-shipgate init --write --agent-instructions=agents-md,codex-skill` (or `all`). The skill is Codex-optimized: concise `SKILL.md`, on-demand references, and an advisory CI template.
- [`docs/agents/use-with-cursor.md`](docs/agents/use-with-cursor.md) — drop the canonical `.cursor/rules/agents-shipgate.mdc` auto-attach rule (from [`docs/target-repo-agent-snippets.md`](docs/target-repo-agent-snippets.md)) into your repo. The rule fires whenever a chat touches `shipgate.yaml`, an MCP/OpenAPI spec, a tool JSON, or a `.py` file.

---

## Verification

After you (the agent) complete a task involving Agents Shipgate, verify:

1. `agents-shipgate self-check --json` returns `"ready": true`.
2. `agents-shipgate contract --json` matches the installed CLI contract you expect.
3. The user's `shipgate.yaml` has no `CHANGE_ME` placeholders.
4. A scan completes with exit code 0 (advisory mode) and writes `report.json`.
5. The user's repo `.gitignore` includes `agents-shipgate-reports/` (do not commit reports).
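
Items 3 and 5 of the checklist can be checked mechanically on file contents. This is a rough sketch under stated assumptions: the helper names are hypothetical, and the `.gitignore` check is a literal line match, not an implementation of full gitignore pattern semantics.

```python
def has_placeholders(manifest_text: str) -> bool:
    """True when the manifest text still contains a CHANGE_ME placeholder."""
    return "CHANGE_ME" in manifest_text


def gitignore_covers_reports(gitignore_text: str) -> bool:
    """Literal check for an agents-shipgate-reports entry (simplified)."""
    wanted = "agents-shipgate-reports"
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.strip("/") == wanted:
            return True
    return False
```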


<!-- ===== source: docs/agent-recipes.md ===== -->

# Agent recipes

Copy-pasteable workflows for AI coding agents (Claude Code, Codex, Cursor,
Aider) that need to drive `agents-shipgate` end-to-end without prompting
the user. Every command is read-only or schema-validated;
static-by-default, with audited exceptions pinned in
[`tests/test_adapter_static_only.py::ALLOWED_EXCEPTIONS`](../tests/test_adapter_static_only.py).

> If you are a human, [`quickstart.md`](quickstart.md) is the friendlier
> entry point. This page is structured for agents that consume `--json`.

---

## Recipe 0 · Verify an agent-related PR

Use this before claiming completion on a PR or local diff that changes tools,
MCP/OpenAPI surfaces, prompts, permissions, policies, release gates, or
`shipgate.yaml`.

```bash
agents-shipgate verify --preview --json
agents-shipgate verify --workspace . --config shipgate.yaml \
  --ci-mode advisory --format json
```

For committed PR/CI refs, add `--base origin/main --head HEAD` after making the
base ref available. Read `agents-shipgate-reports/verifier.json` first and lead
with `merge_verdict`, then inspect `capability_review.top_changes[]`,
`first_next_action.actor`, and `fix_task.safe_to_attempt`. Then read
`report.json.release_decision.decision`, which remains the release gate.

Do not claim completion when `merge_verdict` is `blocked`,
`insufficient_evidence`, or `human_review_required` unless the user explicitly
accepts human review.
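
The completion rule above could be encoded as a small guard. This sketch reads the "unless" clause conservatively, applying the user-acceptance exception only to `human_review_required`, and treats `unknown` or unrecognized verdicts as not complete; both choices are assumptions, as is the function name.

```python
def may_claim_completion(verifier: dict, user_accepts_human_review: bool = False) -> bool:
    """Decide whether an agent may report the task as complete."""
    verdict = verifier.get("merge_verdict", "unknown")
    if verdict == "mergeable":
        return True
    if verdict == "human_review_required":
        # Only an explicit user acceptance unlocks this case.
        return user_accepts_human_review
    # blocked, insufficient_evidence, unknown, or a future value.
    return False
```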

## Recipe 1 · First adoption helper

Use this when a repo doesn't yet have `shipgate.yaml`. Four calls in one user
turn take it from "looks like an agent project" to "Shipgate is integrated,
scan green or with safe trivial findings auto-applied, CI workflow optionally
drafted." This is an adoption helper; ongoing PR work should use Recipe 0.

```bash
agents-shipgate detect --json
agents-shipgate init --write --ci --json
agents-shipgate scan -c shipgate.yaml --suggest-patches --format json
agents-shipgate apply-patches \
    --from agents-shipgate-reports/report.json \
    --confidence high --apply
```

### Step 1 · `detect --json` (read-only)

Consume the response to decide whether to proceed. Key fields:

- `is_agent_project` — `true` when at least one Python framework
  scored ≥ 2.0 with a strong signal.
- `frameworks[]` — per-framework scores + evidence + candidate file
  paths.
- `agent_name_candidates[]` — ranked `{value, source}`. Source values:
  `Agent_name_literal` (highest), `ADK_name_field`, `workspace_dir`
  (lowest).
- `project_name_candidates[]` — same shape; `pyproject` source seeds
  `project.name` only.
- `suggested_sources[]` — MCP/OpenAPI files matched by glob. These do
  NOT bump `is_agent_project` on their own.
- `codex_plugin_candidates[]` — Codex plugin package or marketplace
  artifacts matched by convention. These also do NOT bump
  `is_agent_project` on their own.

**Stop condition.** Stop and skip `init` only when ALL of the following hold:

- `is_agent_project` is `false`, AND
- `suggested_sources` is empty, AND
- `codex_plugin_candidates` is empty, AND
- no `shipgate.yaml` already exists, AND
- the user did not explicitly request a scan.

Otherwise proceed. MCP/OpenAPI-only tool-surface repos and Codex plugin
package repos surface as `is_agent_project: false` but should still be
onboarded — their sources will land in `tool_sources` during `init`.
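
The stop condition can be written as a single conjunction over the `detect --json` response. A minimal sketch follows; the function name and the two boolean parameters are illustrative, while the field names are the ones listed above.

```python
def should_skip_init(detect: dict, manifest_exists: bool, user_requested_scan: bool) -> bool:
    """True only when every clause of the stop condition holds."""
    return (
        not detect.get("is_agent_project", False)
        and not detect.get("suggested_sources")
        and not detect.get("codex_plugin_candidates")
        and not manifest_exists
        and not user_requested_scan
    )
```

An MCP-only repo therefore proceeds to `init`: `is_agent_project` is `false`, but a non-empty `suggested_sources` breaks the conjunction.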

### Step 2 · `init --write --ci --json`

Auto-detection runs again inside `init` and writes:

- `shipgate.yaml` with `tool_sources` populated per detected framework
  candidate file.
- `.github/workflows/agents-shipgate.yml` (if `--ci` is set; refuses
  to overwrite an existing workflow file or one that already calls
  `ThreeMoonsLab/agents-shipgate@*` from a sibling workflow).

Key response fields:

- `manifest_status`: `"written"` | `"skipped_existing"` | `"not_attempted"`.
- `workflow.status` (when `--ci`): `"written"` | `"skipped_existing_target"`
  | `"skipped_cross_reference"`.
- `placeholders[]` — entries the template intentionally leaves as
  `CHANGE_ME` because no high-confidence signal was available. Each has
  a `path` (YAML-pointer-ish location) and `current` value. Replace
  these before scanning.
- `auto_detected.agent_name` — the value the manifest carries
  (`null` when the template fell back to `CHANGE_ME`; matches the YAML
  exactly).

`--ci` is orthogonal to `--write`: each gets its own overwrite-refusal.
The exit code is the maximum of the per-action outcomes; a manifest
error and a workflow skip can co-occur.

### Step 3 · `scan -c shipgate.yaml --suggest-patches --format json`

`scan` writes to `agents-shipgate-reports/report.json`. Read it and walk
`findings[]`, skipping entries where `suppressed` is `true`. Per-finding
fields you can rely on today:

- `check_id`, `title`, `severity`, `category`, `evidence`,
  `confidence`, `recommendation`.
- `patches[]` (only when `--suggest-patches` is set) — list of
  patch objects with `kind` ∈ `{set_pointer, append_pointer,
  remove_pointer, manual}`. Non-manual patches additionally carry
  `confidence` ∈ `{low, medium, high}`, `target_file`, `pointer`,
  `target_format`, `rationale`, `target_sha256`.
- `manifest_dir` (top-level on the report) — absolute path to the
  directory containing `shipgate.yaml`. `apply-patches` enforces a
  containment check against this.

When `--suggest-patches` is set, every active (unsuppressed) finding
has at least one patch. Manual-only findings (e.g. trace approval
flips, per-check policy decisions) carry a single `ManualPatch` with
`instructions` instead of a machine-applicable patch.
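
Walking the report as described might look like the sketch below. It assumes `report` is the parsed `report.json` and uses only the fields named above; the helper names are hypothetical.

```python
def active_findings(report: dict) -> list[dict]:
    """Findings that are not suppressed."""
    return [f for f in report.get("findings", []) if not f.get("suppressed", False)]


def split_patches(finding: dict) -> tuple[list[dict], list[dict]]:
    """Separate machine-applicable patches from manual instructions."""
    patches = finding.get("patches", [])
    machine = [p for p in patches if p.get("kind") != "manual"]
    manual = [p for p in patches if p.get("kind") == "manual"]
    return machine, manual
```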

Optional dynamic-validation handoff:

```bash
agents-shipgate scenario suggest \
    --from agents-shipgate-reports/report.json \
    --out agents-shipgate-reports/suggested-scenarios.yaml
```

This YAML is a concrete per-finding/per-tool fan-out of
`report.json.suggested_scenarios[]`, not a separate scenario engine.
Suppressed findings are omitted; baseline-matched findings remain because
they are accepted debt, not resolved risk.

### Step 4 · `apply-patches --confidence high --apply`

The default, `--confidence high`, auto-applies only patches whose
`confidence` field is `"high"`. Today that's the three stale-manifest
removals (`SHIP-MANIFEST-STALE-{SUPPRESSION,POLICY,RISK-OVERRIDE}`).
Scope coverage appends ship at `medium` and require an explicit
`--confidence medium` to apply.

`apply-patches` is dry-run by default; `--apply` is required to
mutate files. It is also containment-checked: any `target_file` outside
`report.manifest_dir` aborts with exit code 5 before SHA verification.
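
An agent can pre-screen patches before calling `apply-patches`. The sketch below illustrates the idea of a containment check and a high-confidence filter; it is not the CLI's implementation, and it assumes relative `target_file` values are interpreted against `manifest_dir`. Both function names are hypothetical.

```python
from pathlib import Path


def is_contained(target_file: str, manifest_dir: str) -> bool:
    """True when target_file resolves to a path inside manifest_dir."""
    root = Path(manifest_dir).resolve()
    # Assumption: relative targets are joined onto manifest_dir.
    # An absolute target replaces the root when joined.
    target = (root / target_file).resolve()
    return target.is_relative_to(root)


def high_confidence_patches(finding: dict) -> list[dict]:
    """Patches whose confidence field is exactly 'high'."""
    return [p for p in finding.get("patches", []) if p.get("confidence") == "high"]
```

Resolving both paths before comparing them is what defeats `..` segments; a plain string-prefix comparison would not.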

### Step 5 (optional) · Summarize for the user

When the flow completes, summarize `report.json`:

- `release_decision.decision` (`"blocked" | "review_required" | "insufficient_evidence" | "passed"`)
  — the v0.8+ release-gate signal (`insufficient_evidence` added v0.14).
  Prefer this over `summary.status`, which stays baseline-blind for
  backwards compat. Switch on the value with a `review_required`
  fallback for unknown future values.
- `release_decision.reason` (one-sentence explanation).
- Top 3 active critical/high findings with their `check_id`,
  `tool_name` (when present), and `recommendation`.
- Whether any patches were applied (count from
  `apply-patches --json` output's `files`).

Link findings back to [`docs/checks.md#<id>`](checks.md) so the user
can read full check rationale.
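
The summary step can be sketched as a pure function over the parsed report. It assumes severity values are the lowercase strings `critical` and `high`, and it applies the `review_required` fallback for unknown decision values; the function and constant names are illustrative.

```python
KNOWN_DECISIONS = {"blocked", "review_required", "insufficient_evidence", "passed"}
SEVERITY_RANK = {"critical": 0, "high": 1}  # assumed lowercase severity values


def summarize(report: dict, top_n: int = 3) -> dict:
    """Build a short user-facing summary from report.json."""
    release = report.get("release_decision", {})
    decision = release.get("decision")
    if decision not in KNOWN_DECISIONS:
        decision = "review_required"  # fallback for unknown future values
    active = [
        f for f in report.get("findings", [])
        if not f.get("suppressed", False) and f.get("severity") in SEVERITY_RANK
    ]
    active.sort(key=lambda f: SEVERITY_RANK[f["severity"]])  # critical first
    top = [
        {
            "check_id": f.get("check_id"),
            "tool_name": f.get("tool_name"),  # None when absent
            "recommendation": f.get("recommendation"),
        }
        for f in active[:top_n]
    ]
    return {"decision": decision, "reason": release.get("reason"), "top_findings": top}
```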

---

## Recipe 2 · Add Shipgate to a repo that already has tool surfaces

Same as Recipe 1, but `detect` may report `is_agent_project: false`
when the repo only ships MCP exports or OpenAPI specs. Per the soft
stop rule above, proceed anyway when `suggested_sources` is non-empty.

`init` will populate `tool_sources` from those globs. The rest of the
flow (steps 2-5) is identical.

### First-real-repo recovery rules

When the first repo scan does not produce useful tools, follow these
rules before changing code:

- If `detect --json` has MCP/OpenAPI `suggested_sources`, continue to
  `init` even when `is_agent_project` is `false`.
- If `doctor` shows zero tools, inspect `tool_sources[].path`, MCP
  `tools[]`, OpenAPI `paths`, optional source warnings, and dynamic
  ADK/MCP warnings.
- If tools are created by factories, wrappers, runtime imports, or
  dynamic ADK/MCP toolsets, provide an explicit MCP export, OpenAPI
  spec, or local tool inventory artifact.
- Replace every `CHANGE_ME` value in `shipgate.yaml` before scanning;
  use the prompt, main agent file, README, or owner-provided context.
- Agents Shipgate requires Python 3.12+. If the project runtime is
  older, install the CLI outside the project env with `pipx` or `uv`.
- Ensure `agents-shipgate-reports/` is listed in `.gitignore`.

---

## Recipe 3 · Re-scan after editing the manifest

When the user has already replaced `CHANGE_ME` placeholders or added
policies:

```bash
agents-shipgate scan -c shipgate.yaml --suggest-patches --format json
agents-shipgate apply-patches \
    --from agents-shipgate-reports/report.json \
    --confidence high --apply
```

`run_id` is deterministic for the same input — if the report's
`run_id` is unchanged from the previous run, nothing semantic about
the manifest+tool-surface changed.

---

## Recipe 4 · Suppress a check or finding

When a finding is a known false positive, edit `shipgate.yaml`:

```yaml
checks:
  ignore:
    - check_id: SHIP-DOC-MISSING-DESCRIPTION
      tool: support_lookup_v2  # optional; omit to suppress for ALL tools
      reason: "Tool description matches the upstream OpenAPI summary."
```

`reason` is required — empty reasons fail manifest validation. Re-run
`scan` to confirm the finding is gone (it will appear in `findings[]`
with `suppressed: true` rather than disappearing from the report).

If you suppress a check that no longer fires, the next scan emits
`SHIP-MANIFEST-STALE-SUPPRESSION` — auto-removable via
`apply-patches`.

---

## Recipe 5 · Add Shipgate to CI without changing existing workflows

```bash
agents-shipgate init --workspace . --ci  # no --write
```

Without `--write`, the manifest is printed to stdout and no new file is
written. `--ci` is orthogonal, so the workflow file is still written
unless an existing workflow already references the action; in that
case the response reports `workflow.status: "skipped_cross_reference"`
and the path of the existing workflow in `cross_reference_path`.

---

## Output handling

- Always pass `--json` (where supported) and parse the result. The
  human-readable stdout is unstable; the JSON shape is the contract.
- `scan` does not have `--json`; instead pass `--format json` and read
  `agents-shipgate-reports/report.json`.
- Errors emit a structured `next_action` JSON line on stderr when
  `AGENTS_SHIPGATE_AGENT_MODE=1` is set. Surface that path to the user
  rather than scraping prose.
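
Picking the structured line out of stderr could look like the sketch below. The exact payload shape is not specified here, so this assumes the line is a JSON object with a top-level `next_action` key; treat that shape and the function name as assumptions to verify against real output.

```python
import json


def extract_next_action(stderr_text: str):
    """Return the first next_action payload found on stderr, else None."""
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # human-readable prose, ignore
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue
        # Assumed shape: {"next_action": ...}
        if isinstance(payload, dict) and "next_action" in payload:
            return payload["next_action"]
    return None
```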

## Pre-flight reminder

`agents-shipgate-reports/` is a local artifact directory. Before
committing, ensure it's listed in `.gitignore`:

```gitignore
agents-shipgate-reports/
```

`init` does not touch `.gitignore` — leave that to the user or follow
up with an explicit edit.

---

## Reference

- [`docs/agent-autofix-boundary.md`](agent-autofix-boundary.md) — what
  an agent may do mechanically vs. what must defer to a human reviewer.
- [`docs/report-reading-for-agents.md`](report-reading-for-agents.md) —
  reader's primer for `agents-shipgate-reports/report.json`.
- [`docs/checks.md`](checks.md) — full check catalog with rationale
- [`docs/autofix-policy.md`](autofix-policy.md) — which findings are
  safe to apply, which need review, and how `apply-patches --confidence`
  filters them
- [`docs/minimal-real-configs.md`](minimal-real-configs.md) —
  framework-specific minimal manifests
- [`AGENTS.md`](../AGENTS.md) — top-level agent instructions, install,
  trigger table


<!-- ===== source: docs/agent-contract-current.md ===== -->

# Current Agent Contract

The single, current statement of what AI coding agents and CI integrations should read from Agents Shipgate output. When the contract changes, update [STABILITY.md](../STABILITY.md) first, then this file. Other agent-facing surfaces (`AGENTS.md`, `llms.txt`, `.well-known/agents-shipgate.json`, the slash command, the skill, the FAQ) link here instead of restating field lists.

## Current versions

Verify the installed CLI contract locally before relying on hard-coded docs:

```bash
agents-shipgate contract --json
```

- Latest release: `v0.11.0` (see [pyproject.toml](../pyproject.toml) for the in-tree version)
- Runtime contract: `1`
- Current report schema: `0.22` — [`docs/report-schema.v0.22.json`](report-schema.v0.22.json)
- Current packet schema: `0.6` — [`docs/packet-schema.v0.6.json`](packet-schema.v0.6.json)
- Current verifier schema: `0.1` — [`docs/verifier-schema.v0.1.json`](verifier-schema.v0.1.json)
- Frozen-reference report schemas: [`v0.21`](report-schema.v0.21.json), [`v0.20`](report-schema.v0.20.json), [`v0.19`](report-schema.v0.19.json), [`v0.18`](report-schema.v0.18.json), [`v0.17`](report-schema.v0.17.json), [`v0.16`](report-schema.v0.16.json), [`v0.15`](report-schema.v0.15.json), [`v0.14`](report-schema.v0.14.json), [`v0.13`](report-schema.v0.13.json), [`v0.12`](report-schema.v0.12.json), [`v0.11`](report-schema.v0.11.json), [`v0.10`](report-schema.v0.10.json), [`v0.9`](report-schema.v0.9.json), [`v0.8`](report-schema.v0.8.json), [`v0.7`](report-schema.v0.7.json), [`v0.6`](report-schema.v0.6.json), older
- Frozen-reference packet schemas live in [`docs/INDEX.md`](INDEX.md#reference).

## Read these first for release gating

In `agents-shipgate-reports/report.json`:

- `release_decision.decision` — `"blocked"` / `"review_required"` / `"insufficient_evidence"` / `"passed"`. Baseline-aware. **This is the gating signal.** Blockers take precedence. If there are no blockers, `insufficient_evidence` (added v0.14) fires when evidence coverage is degraded past threshold: low-confidence tools are at least `max(1, ceil(tool_count × 0.5))`, or source-loader warnings exceed `3`. One to three source warnings without blockers route to `review_required`. `insufficient_evidence` means the scan cannot confidently gate release from the available static evidence; it does not prove the agent is unsafe. Switch on the enum with a `review_required` fallback for unknown future values.
- `release_decision.blockers[]` — items that block release on this run.
- `release_decision.review_items[]` — items the human reviewer should look at; includes baseline-matched accepted debt.
- `release_decision.fail_policy.would_fail_ci` — `true`/`false`. Matches what the CI process will exit with.
- `release_decision.reason` — one-sentence explanation suitable for a PR comment.
- `release_decision.contribution_rules[]` (v0.17+) — deterministic per-finding audit explaining how each `report.findings` entry was classified. Exactly one row per finding (including suppressed). Each row carries `{finding_id, fingerprint, check_id, category, rule, rationale}`. `category` ∈ `{blocker, review_item, excluded}`; `rule` ∈ `{policy_block_new, severity_block_new, policy_baseline_accepted, severity_baseline_accepted, review_required, sub_threshold, suppressed}`. Reading the contribution rule is sufficient to predict the gate outcome for that finding without re-deriving the decision logic — the closed grammar of `(rule, category)` pairs is documented in [STABILITY.md "Release decision truth table"](../STABILITY.md#release-decision-truth-table). The audit cannot disagree with `blockers[]` / `review_items[]` (the same classification powers both).
- `privacy_audit` (v0.18+) — confirms the default redaction pass ran before public artifacts were written. Read `enabled`, `rules_version`, `sensitive_field_inventory_version`, `redacted_occurrence_count`, `redacted_paths[]`, and `output_surfaces[]`. `redacted_paths[]` contains structural paths and counts only, never raw values or raw hashes.
- `reviewer_summary` (v0.20+) — deterministic projection of the reviewer lens surfaces and audit envelopes; the reviewer-side parallel to `agent_summary`. Read this block first when triaging a scan for a human reviewer. Carries `verdict` (mirrors `release_decision.decision`), `headline` (≤200 chars, PR-comment-friendly), per-lens activity counts (`tool_surface_changes`, `capability_misalignments`, `action_surface_changes`, `evidence_matrix_gaps`), per-audit-envelope counts (`severity_overrides_applied`, `severity_overrides_tier_crossed`, `privacy_redactions`, `baseline_integrity_issues`), and `first_recommended_surface: ReviewerSurfacePointer | None` — a deterministic pointer naming which lens/audit to open first (`{kind, name, path, why}` where `kind` ∈ `{release_decision, lens, audit, evidence_matrix}` and `name` ∈ `{tool_surface_diff, capability_intent_diff, action_surface_diff, evidence_matrix, policy_audit, privacy_audit, baseline_integrity, release_decision}`). Same inputs always produce the same output; this block cannot disagree with the underlying lens/audit data.
- `heuristics_filter` (v0.21+) — top-level audit envelope describing the `--no-heuristics` CLI filter pass. Always present, even when the flag is unset (`enabled: False` with zero counts), so the report shape is stable. Carries `enabled: bool`, `excluded_provenance_kinds: list[str]` (`["keyword_heuristic", "regex_heuristic"]`), `filtered_finding_count: int`, and `filtered_by_kind: dict[str, int]` (per-kind breakdown). When `enabled: True`, findings whose `provenance_kind` is in the excluded list have been marked `suppressed=True` with `suppression_reason="filtered by --no-heuristics"` BEFORE the release decision was built — they remain in `findings[]` for transparency but no longer gate release. The filter never un-suppresses a finding; manifest-driven suppression reasons are preserved when they overlap with the filter. Useful for security/GRC reviewers who want declared-only findings.
- `verifier_summary` (v0.22+) — top-level **composition** for one-fetch controller consumption (the AI-coding-workflow verifier surface). It derives **no independent verdict**: `verdict` mirrors `release_decision.decision` exactly (Principle: one decision engine). Carries `by_severity: dict[str,int]` and `by_reason_code: dict[str,int]` (active-finding histograms — the complete per-code map), `capability_delta_summary: {added, removed, broadened, narrowed}` (equal by construction to the `capability_change` member-list lengths), `protected_surface_touched: bool`, `policy_weakened: bool`, `human_ack_required: bool`, `human_ack_satisfied: bool`, and `top_reason_codes: list[{reason_code, count}]` — the ranked top-five highlight (severity desc → count desc → code asc; the full set stays in `by_reason_code`). This block cannot introduce a finding-independent blocker.
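
The evidence-coverage routing described for `release_decision.decision` can be worked through numerically. The sketch below encodes only the thresholds stated above and returns `None` when neither coverage rule applies, leaving the outcome to the rest of the decision logic; it is an illustration, not the decision engine, and the function names are assumptions.

```python
import math


def low_confidence_threshold(tool_count: int) -> int:
    """max(1, ceil(tool_count * 0.5)) from the contract text."""
    return max(1, math.ceil(tool_count * 0.5))


def coverage_route(tool_count: int, low_confidence_tools: int,
                   source_warnings: int, blocker_count: int = 0):
    """Route a run by evidence coverage; None means 'decided elsewhere'."""
    if blocker_count > 0:
        return "blocked"  # blockers take precedence
    if low_confidence_tools >= low_confidence_threshold(tool_count) or source_warnings > 3:
        return "insufficient_evidence"
    if 1 <= source_warnings <= 3:
        return "review_required"
    return None
```

For five tools the threshold is `ceil(2.5) = 3`, so two low-confidence tools do not trip the rule but three do.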

The remaining v0.22 verifier blocks are reviewer-facing projections / declared inputs — none gates independently (`release_decision.decision` stays the only gate). They populate with real values only under `verify` mode (a `VerificationContext` from `agents-shipgate verify` or an equivalent scan context); a plain `scan` emits their stable empty shape:

- `capability_change` (v0.22+) — the diff-derived capability delta, grouped into `{enabled, added, removed, broadened, narrowed}` member lists over `action_surface_diff` / `tool_surface_diff`. Each `CapabilityChangeMember` carries `{id, direction, subject_kind, tool, action, scope, before_scope, after_scope, risk_tags, release_impact, provenance_kind, confidence, rationale, related_finding_ids}`. `broadened` = more effective capability (wider scope, escalated effect, removed control); `narrowed` = less (removed scope, added control). `enabled: false` when no base diff is available.
- `protected_surface_changes` (v0.22+) — list of touched release trust roots, each `{path, kind, glob, related_finding_ids}`. Derived from the active `SHIP-VERIFY-*` findings, so every row's `related_finding_ids` resolves to a real `findings[]` entry and the rollup can never disagree with the gate. A row means "a protected file was touched"; purely-semantic weakenings with no file path stay in `findings[]` and surface via `verifier_summary` flags.
- `effective_policy` (v0.22+) — normalized (not text-diff) snapshot of the release-policy surface for base-vs-head weakening comparison: `{ci_mode, fail_on[], suppressed_check_ids[], waiver_scopes[], severity_overrides{}, baseline_integrity_mode, baseline_fingerprints[], ci_gate_present}`. Every list/dict is sorted for byte-stable output; derived purely from the manifest (plus accepted-debt fingerprints).
- `human_ack` (v0.22+) — declared human-acknowledgement state, `{required, satisfied, acks[], outstanding[]}`. Within the static boundary, acknowledgement is **declared evidence only — never inferred** (human authority cannot be synthesized). A trust-root weakening (`SHIP-VERIFY-POLICY-WEAKENED`, `-CI-GATE-REMOVED`, `-BASELINE-OR-WAIVER-EXPANDED`) makes a surface `required`; it is `satisfied` only by a matching `human_ack` entry in `shipgate.yaml` (owner + reason + affected surface, optional expiry). `required == (acks-covering-required) + outstanding`. The acknowledgement section lives in `shipgate.yaml` — itself a trust root — so a coding agent cannot add its own ack without tripping `SHIP-VERIFY-TRUST-ROOT-TOUCHED`.

New `SHIP-VERIFY-*` reason codes (v0.22+, category `verify` — suppression-immune and floor-protected; emit only under `verify` mode): `SHIP-VERIFY-POLICY-WEAKENED` (base-vs-head policy weakened; fail-safe to review when the base is unavailable), `SHIP-VERIFY-BASELINE-OR-WAIVER-EXPANDED` (suppression/waiver/baseline broadened), `SHIP-VERIFY-CI-GATE-REMOVED` (Shipgate CI workflow deleted), `SHIP-VERIFY-AGENT-INSTRUCTIONS-WEAKENED` (agent-instruction trust root changed; routed to human review), `SHIP-VERIFY-TRIGGER-CATALOG-DRIFT` (trigger catalog changed). They are ordinary `Finding`s routed through `release_decision` — never a second verdict.

The action exposes these as outputs `decision`, `blocker_count`, `review_item_count`, `ci_would_fail` (v0.8+).
For verifier-cycle PR workflows it also exposes additive outputs
`should_run`, `trigger_action`, `trigger_rule_ids`, `verifier_verdict`,
`merge_verdict`, `can_merge_without_human`, `trust_root_touched`,
`policy_weakened`, `capability_changes_added`,
`capability_changes_modified`, and `capability_changes_removed`. These are
review and routing aids only. `trust_root_touched` and `policy_weakened`
mirror `verifier_summary`; the capability counts mirror
`capability_change` (`modified` is `broadened + narrowed`). Keep using
`decision` as the preferred gating output.
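
The relationship between the capability count outputs and `capability_change` can be sketched as follows, assuming `capability_change` is the parsed block with its member lists; the function name is hypothetical.

```python
def capability_counts(capability_change: dict) -> dict:
    """Derive the three count outputs from the member lists."""
    def count(key: str) -> int:
        return len(capability_change.get(key, []))

    return {
        "capability_changes_added": count("added"),
        "capability_changes_removed": count("removed"),
        # modified is broadened + narrowed
        "capability_changes_modified": count("broadened") + count("narrowed"),
    }
```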

For ongoing PR workflows, prefer:

```bash
agents-shipgate verify --workspace . --config shipgate.yaml \
  --base origin/main --head HEAD --ci-mode advisory --format json
```

`verify` writes `verifier.json` and `pr-comment.md` alongside the head scan
artifacts. The packet artifact is intentionally `packet.json` only; use
`scan` for manifest-driven packet Markdown/HTML/PDF rendering. Read
`verifier.json.base_status` to understand whether base diff enrichment ran;
do not use it as a release verdict. The release gate is still
`report.json.release_decision.decision`. `verify` never fetches, so CI callers
must make the base ref available before invocation. Supplying `--head` makes
verify scan an isolated archive of that ref; omitting it scans the checked-out
workspace. If an explicit `--base` ref or PR diff cannot be inspected, verify
does not run a head-only scan; `verifier.json.merge_verdict` is `unknown` and the
command exits 2.

`agents-shipgate verify --preview --json` is a lightweight relevance check — no
scan, no manifest required, exits 0. It emits a `verifier.json` with
`mode: "preview"` and a `first_next_action` carrying the next recommended action:
`none` for irrelevant diffs, `detect`/`init` for relevant unconfigured repos, or
`verify` for configured repos. Use it as the first touch before a full scan. To
evaluate just the run/skip trigger, run
`agents-shipgate trigger --base origin/main --head HEAD --json`.

In `agents-shipgate-reports/verifier.json`, read these additive fields
(`verifier_schema_version` stays `"0.1"`; full schema
[`docs/verifier-schema.v0.1.json`](verifier-schema.v0.1.json)). **Lead with
`merge_verdict`.** Every field below is a mirror or deterministic projection of
`report.json`; `release_decision.decision` remains the gate.

- `merge_verdict` — `"mergeable"` / `"human_review_required"` /
  `"insufficient_evidence"` / `"blocked"` / `"unknown"`. Deterministic projection
  of `release_decision.decision` (`passed`→`mergeable`,
  `review_required`→`human_review_required`,
  `insufficient_evidence`→`insufficient_evidence`, `blocked`→`blocked`, missing
  decision→`unknown`). It cannot disagree with the gate; switch on the enum with
  an `unknown`/`human_review_required` fallback for future values.
- `can_merge_without_human` — `bool`.
- `decision` — mirror of `release_decision.decision` (or `null` when no scan ran).
- `headline` — single-sentence, PR-comment-friendly summary (or `null`).
- `human_review` — `{required: bool, why: str|null}`.
- `first_next_action` — `{actor: "coding_agent"|"human", kind, command, why}`.
  The `actor` separates mechanical coding-agent work from human-only decisions.
- `fix_task` — `{actor, safe_to_attempt, instructions[], forbidden_shortcuts[],
  verification_command}` or `null`. This is the deterministic repair boundary:
  `actor: coding_agent` with `safe_to_attempt: true` means the agent may attempt
  the listed mechanical fix and rerun `verification_command`; `actor: human`
  means the agent must not invent approval, idempotency, policy, waiver,
  baseline, or trust-root evidence to make the gate pass.
- `trust_root_touched` — `bool`; `true` when the PR changed a release-gate trust
  root (`shipgate.yaml`, the Shipgate CI workflow, `AGENTS.md`/`CLAUDE.md`,
  policy packs, prompts, baselines, waivers, etc.). Backed by the
  `SHIP-VERIFY-TRUST-ROOT-TOUCHED` check.
- `capability_review` — reviewer-facing projection of `capability_change` with
  `{trust_root_touched, policy_weakened, capability_changes_added,
  capability_changes_removed, capability_changes_modified, top_changes[]}`.
  `top_changes[]` carries the highest-signal capability deltas with
  `{id, title, impact, rationale, related_finding_ids}`. `impact` mirrors the
  gate (`blocks_release`, `review_required`, `insufficient_evidence`, or
  informational values) and never introduces a finding-independent blocker.
- `mode` — `"advisory"` / `"strict"` / `"skipped"` / `"preview"`.
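
A minimal consumer-side sketch of the `merge_verdict` projection and the fallback for future enum values. The mapping is copied from the list above; the helper names `project_verdict` and `handle_verdict` and the `proceed`/`stop`/`ask_human` labels are hypothetical, and treating an unrecognized decision string as `unknown` is an assumption.

```python
# Mapping copied from the merge_verdict description; helper names are hypothetical.
DECISION_TO_VERDICT = {
    "passed": "mergeable",
    "review_required": "human_review_required",
    "insufficient_evidence": "insufficient_evidence",
    "blocked": "blocked",
}
KNOWN_VERDICTS = set(DECISION_TO_VERDICT.values()) | {"unknown"}


def project_verdict(decision):
    """Project release_decision.decision onto merge_verdict.

    A missing decision maps to "unknown" (documented); an unrecognized
    decision string also maps to "unknown" (assumption).
    """
    if decision is None:
        return "unknown"
    return DECISION_TO_VERDICT.get(decision, "unknown")


def handle_verdict(verifier):
    """Return a hypothetical consumer action for a verifier.json payload."""
    verdict = verifier.get("merge_verdict", "unknown")
    if verdict not in KNOWN_VERDICTS:
        # Future enum value: take the conservative human-review branch.
        verdict = "human_review_required"
    if verdict == "mergeable":
        return "proceed"
    if verdict == "blocked":
        return "stop"
    # human_review_required, insufficient_evidence, and unknown all need a person.
    return "ask_human"
```

The fallback branch keeps a consumer safe when a newer CLI emits a verdict the consumer has never seen.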

`verifier.json` also carries `trigger`, `base_status`, `head_status`, `base_ref`,
`head_ref`, `changed_files`, `base_notes`, the embedded `release_decision`, and an
`artifacts` map. The matching GitHub Action outputs are `merge_verdict`,
`can_merge_without_human`, `trust_root_touched`, and
`capability_changes_{added,modified,removed}` (the original `decision`,
`blocker_count`, `review_item_count`, `ci_would_fail` outputs are preserved). See
[STABILITY.md §Verify Orchestrator](../STABILITY.md#verify-orchestrator) for the
authoritative contract.

The default Action PR comment style for the verifier-cycle minor release is
`capability-review`: decision first, then the top capability changes,
trust-root warnings, required next steps, and artifact links. Existing adopters
that need the v1 findings-oriented comment during migration can set
`pr_comment_style: findings` for one minor release cycle.

## Read these for release review

`agents-shipgate contract --json` exposes `manual_review_signals[]` as the
installed CLI's stable list of report/packet fields to inspect for human review
work. `findings[].provenance_kind` is included there as a filter/review signal
only; it never changes the release decision, severity, fingerprints, baselines,
or CI exit behavior.

The capability/intent diff fields (v0.9+), used by reviewers to spot misalignment between declared agent intent and actual tool surface:

- `capability_facts[]` — every capability surfaced from the tool inventory.
- `declared_intentions[]` — what the manifest says the agent is supposed to do.
- `misalignments[]` — where capabilities exceed (or fall short of) declared intent.
- `release_consequence` — capability-aware roll-up of the release decision.
- `suggested_scenarios[]` — dynamic-validation scenarios derived from misalignments and findings.

The Action Surface Diff fields (v0.16+), reviewer-facing PR/release delta:

- `action_surface_facts.actions[]` — deterministic snapshot of the current agent action surface: action id, operation, effect, normalized risk tags, scopes, approval policy, safeguards, evidence, and hashes.
- `action_surface_diff.{enabled, base, summary, added, removed, modified, notes}` — what changed vs. a base report or v0.4 baseline. Policy findings generated from this diff can set `findings[].blocks_release=true` and appear in `release_decision.blockers`.
- `findings[].blocks_release` and `release_decision.{blockers,review_items}[].blocks_release` — explicit release-policy blockers from Action Surface Diff policies and policy-pack rules with `block: true`. Advisory CI may still exit 0; strict CI exits nonzero when an active unbaselined release blocker is present.

The tool-surface diff fields (v0.10+), lower-level explanatory data:

- `tool_surface_facts.{tools, scopes, controls, policies}` — current static facts about the tool surface.
- `tool_surface_diff.{enabled, base, summary, tools, high_risk_effects, scopes, controls, metadata_changes, policy_drift, finding_deltas, notes}` — what changed vs. a base ref. Disabled diffs render as `enabled: false` with a `notes` reason.

Source provenance fields on `findings[].source` (v0.11+), additive and optional:

- `path`, `start_line`, `end_line`, `start_column`, `pointer` — manifest-relative file path, 1-based line/column, and RFC 6901 JSON pointer for the offending tool. Populated for OpenAPI, MCP, OpenAI tool artifacts, and Anthropic tool artifacts when the source is YAML. JSON inputs carry `path` and `pointer` but no line in v0.11.

Per-finding `agent_action` enum (v0.12+), deterministic projection — read this **first** when deciding what to do with a finding so you don't have to synthesize an action from `patches`/`autofix_safe`/`requires_human_review`/`suggested_patch_kind`:

- `auto_apply` — `apply-patches --confidence high` will resolve cleanly. Every patch is non-manual and high-confidence.
- `propose_patch_for_review` — at least one non-manual patch is attached and machine-applicable, but the full patch set is not auto-safe. Two shapes land here: (a) every non-manual patch is medium- or low-confidence, and (b) a high-confidence non-manual patch sits alongside one or more `ManualPatch` siblings (the non-manual is safe to apply, but the manual instructions still need a human). In both cases the agent should ask the user before `--apply` and surface any manual instructions verbatim.
- `escalate_to_human` — no machine-applicable patch. Either every patch is `ManualPatch`, or `patches` is empty/absent and the check requires human review.
- `suppress_with_reason` — reserved for future check classes that explicitly mark themselves as suppressible. Not emitted by the v0.12 deterministic projection; the schema accepts it so callers can extend.
- `informational` — no action required (suppressed finding or non-actionable advisory).
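
One way a consumer might route findings by `agent_action` is sketched here. The enum values come from the list above; the function name is hypothetical, keying on `check_id` is an assumption about what a caller wants to display, and sending unrecognized or missing values to `escalate_to_human` is a conservative assumption rather than documented behavior.

```python
from collections import defaultdict

# Enum values listed for findings[].agent_action.
KNOWN_ACTIONS = {
    "auto_apply",
    "propose_patch_for_review",
    "escalate_to_human",
    "suppress_with_reason",
    "informational",
}


def bucket_by_agent_action(findings):
    """Group check IDs by agent_action, escalating anything unrecognized."""
    buckets = defaultdict(list)
    for finding in findings:
        action = finding.get("agent_action")
        if action not in KNOWN_ACTIONS:
            # Assumption: unknown or missing values are routed to a human.
            action = "escalate_to_human"
        buckets[action].append(finding["check_id"])
    return dict(buckets)
```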

Top-level `agent_summary` block (v0.12+), one-fetch summary shaped for direct agent consumption — read this when you want the headline numbers without traversing arrays:

- `verdict` — mirrors `release_decision.decision`.
- `headline` — single-sentence verdict + counts; suitable for a PR comment lead. The headline uses `needs_human_review` (action-driven) for "require human review" wording, so a `review_required` verdict with only auto-applicable findings reads honestly as "auto-applicable; none require human input" rather than falsely claiming N findings need review.
- `blocker_count` — mirrors `len(release_decision.blockers)`.
- `review_item_count` — mirrors `len(release_decision.review_items)`; **severity-driven** (medium-and-up severity findings that aren't blockers, plus baseline-matched accepted debt). Use this when reporting release-review debt to the human reviewer.
- `auto_appliable_patches` — number of active findings with `agent_action == "auto_apply"`.
- `needs_human_review` — **action-driven**: number of active findings with `agent_action ∈ {"escalate_to_human", "propose_patch_for_review"}`. Both kinds need explicit human attention before any change applies — full escalations have no machine path, and proposed patches ship at medium/low confidence and require an explicit `--apply` after the user confirms. Use this when reasoning about what work an agent must do.
- **`review_item_count` and `needs_human_review` track different populations and can diverge.** A medium-severity stale-suppression finding lands in `release_decision.review_items` (severity rule) but its `agent_action` is `auto_apply` (high-confidence patch attached), so it's counted in `review_item_count` and `auto_appliable_patches` but **not** in `needs_human_review`.
- `first_recommended_action` — `{kind, command|null, why}`; deterministic next step. `kind: "command"` carries an actual CLI invocation; `kind: "info"` is a "surface this to the user" hint with no command. The agent_summary block is a deterministic projection — same inputs, same output, no agent-side aggregation needed.
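
The divergence between `review_item_count` and `needs_human_review` can be reproduced with a small cross-check. This is a sketch for test fixtures, not the CLI's implementation: it assumes "active" means `suppressed` is not `true`, and the function name is hypothetical.

```python
# Actions that count toward needs_human_review, per the field description.
HUMAN_ACTIONS = {"escalate_to_human", "propose_patch_for_review"}


def action_counts(findings):
    """Recompute the two action-driven counters from findings[]."""
    # Assumption: a finding is active unless it carries suppressed: true.
    active = [f for f in findings if not f.get("suppressed", False)]
    return {
        "auto_appliable_patches": sum(
            f.get("agent_action") == "auto_apply" for f in active
        ),
        "needs_human_review": sum(
            f.get("agent_action") in HUMAN_ACTIONS for f in active
        ),
    }


# A medium-severity stale-suppression finding with a high-confidence patch.
stale = {
    "check_id": "SHIP-MANIFEST-STALE-SUPPRESSION",
    "severity": "medium",
    "agent_action": "auto_apply",
}
review_items = [stale]  # severity rule places it in release_decision.review_items
counts = action_counts([stale])
```

Here `len(review_items)` is 1 while `counts["needs_human_review"]` is 0, which is the case the bullet above describes.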

Codex plugin surface block (v0.13+), explanatory only — never a release-gate
input by itself:

- `codex_plugin_surface.{plugins, marketplaces, skills, apps, mcp_server_stubs, hook_stubs, mcp_inventory_files, component_path_issues, warnings}` — local static plugin package and marketplace facts.
- Only explicit MCP inventory tools from `codex_plugins.mcp_tool_inventories` appear in `tool_inventory[]`; apps, hooks, skills, and MCP server declarations stay in `codex_plugin_surface`.

Per-finding `provenance_kind` enum (v0.15+), additive classification — read this when you want to filter findings by the kind of rule that fired, independent of `confidence` (sureness):

- `static_declaration` — declared metadata: manifest, MCP export, OpenAPI schema, ADK YAML agent config, LangChain/CrewAI inventory JSON. High-trust structural facts.
- `ast_extraction` — Tool parsed from user Python source by a framework extractor (LangChain function/structured tools, CrewAI function/class tools, ADK Python toolsets). Subject to extraction errors; agents that distrust AST quality may filter these as a class.
- `keyword_heuristic` — matched a keyword list (broad-scope tokens, read-only/approval prompt terms, free-text parameter names). Higher false-positive risk than declarative facts.
- `regex_heuristic` — matched a regex (secret-like values in descriptions, prompt-injection patterns). Highest false-positive risk; pair with the recommendation before acting.
- `policy_pack` — emitted by an external policy pack rule. The rule's own confidence applies — Shipgate does not second-guess the pack.

Provenance generally follows the rule's own trigger (e.g., a rule that checks for a declared manifest field is `static_declaration` even when the underlying Tool was AST-extracted). For framework checks that fire across both AST and declarative tool sources (ADK's per-tool checks against `google_adk_function` AND `google_adk_config` tools), the label tracks the underlying tool's source. Third-party plugin checks that don't yet set the field land at `static_declaration` by default — pre-v0.15 plugins continue to validate against the v0.15 wire schema. Use `findings[].source.type` for the precise underlying tool source.

To filter operationally, use:

```bash
agents-shipgate findings --from agents-shipgate-reports/report.json \
  --provenance-kind keyword_heuristic,regex_heuristic --json
```

The command reads active findings by default; add `--include-suppressed` when a
reviewer needs suppressed entries in the same provenance summary.

For reviewer-shaped output, also read the **Release Evidence Packet** at `agents-shipgate-reports/packet.{md,json,html}` (and `packet.pdf` when the `[pdf]` extras are installed). Packet outputs are redacted by the same default privacy layer as the report. The packet has fixed reviewer sections governed by [`docs/packet-schema.v0.6.json`](packet-schema.v0.6.json) — see [STABILITY.md §Release Evidence Packet](../STABILITY.md#release-evidence-packet-v06).
Packet schema `0.6` preserves the v0.5 `action_surface_diff` section and
adds two independent additive extensions:

- `evidence_matrix` (PR #104) — a compact packet-only review aid
  derived from public `report.json` fields. The matrix never contributes
  to `release_decision`, CI exit behavior, severity, suppression,
  baseline matching, or `agent_summary`; its blocker and review-item
  cells are copied from `release_decision`.
- `ReleaseDecisionItem.source` and `ReleaseDecisionItem.policy_evidence_source`
  (PR #103) — packet §1 / §2 re-renders carry the same dual-source
  provenance that `Finding.source` / `Finding.policy_evidence_source`
  expose in the report.

Packet schema `0.6` preserves every v0.5 field
(`human_in_the_loop.runtime_control_disclaimer`,
`human_in_the_loop.source_provenance[]`, `action_surface_diff`). The
`release_decision.verdict` label includes `INSUFFICIENT EVIDENCE` when
the report decision is insufficient evidence.

## Don't use for new gating

- `summary.status` — preserved for v0.7 callers, **baseline-blind**. A baseline-matched critical flips this to `release_blockers_detected` even though `release_decision.decision` correctly classifies it as `review_required`. New consumers should not gate on `summary.status`. See [STABILITY.md §`release_decision.decision` vs `summary.status`](../STABILITY.md#release_decisiondecision-vs-summarystatus).
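
A short sketch of gating on the right field. The report fragment is a hand-written illustration of the baseline-matched case described above, and `gating_decision` is a hypothetical helper name.

```python
def gating_decision(report):
    """Return release_decision.decision and ignore the legacy summary.status."""
    release_decision = report.get("release_decision") or {}
    return release_decision.get("decision")


# Illustrative fragment: the two fields disagree for a baseline-matched critical.
report = {
    "summary": {"status": "release_blockers_detected"},
    "release_decision": {"decision": "review_required"},
}
decision = gating_decision(report)
```

A consumer that switched on `summary.status` here would treat accepted debt as a new blocker.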

## Per-finding contextual explanation (v0.12+)

For prose summaries of a single finding (PR comments, chat replies, commit messages), use:

```bash
agents-shipgate explain-finding <FINGERPRINT> \
    --from agents-shipgate-reports/report.json --json
```

The payload is the full `Finding` shape (every field on `findings[]` in `report.json`, including `source`, `patches`, `confidence`, `agent_id`, etc.) overlaid with three derived fields:

- `metadata` — full `CheckMetadata` for the check_id (rationale, fires_when, evidence_fields, docs_url, `mvp_tier`) when the check is in the catalog; null for unknown ids (third-party plugins, future checks). `mvp_tier` is display/triage metadata only and never affects gating.
- `explanation` — a deterministic 3–5 sentence prose summary suitable for direct quotation. Names the affected tool, the severity, the recommended fix, and an action-aware closing sentence keyed to `agent_action`. Same inputs always produce the same output.
- `source_report` — **absolute** path (always; relative `--from` values are resolved before serialization) to the report file the explanation was sourced from; round-trippable for caching and audit.

`explain-finding` requires `report_schema_version >= 0.12` because the action-aware explanation depends on per-finding `agent_action`. Pre-v0.12 reports are rejected with `input_parse_error` and a `next_action` pointing at the canonical scan command. The Pydantic `ReadinessReport` model is intentionally looser than this command's contract (so test fixtures can construct minimal findings); the version gate is what enforces v0.12 semantics on emitted reports.
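
A caller that wants to pre-check the version gate before invoking `explain-finding` should compare versions numerically, since a plain string comparison ranks `"0.9"` above `"0.12"`. The sketch below assumes `report_schema_version` is a dotted string of integers; both helper names are hypothetical.

```python
def parse_version(value):
    """Turn a dotted version such as "0.12" into a comparable tuple (0, 12)."""
    return tuple(int(part) for part in str(value).split("."))


def supports_explain_finding(report):
    """True when the report meets the documented minimum of 0.12."""
    return parse_version(report["report_schema_version"]) >= (0, 12)
```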

Companion prompt: [`prompts/explain-finding-to-user.md`](../prompts/explain-finding-to-user.md). Use it when you need to translate a finding for a human who has never read the Shipgate docs. Keep `agents-shipgate explain <CHECK_ID>` for static catalog metadata (no specific finding); use `explain-finding` whenever you have a fingerprint and want the evidence-tied prose.

## Authoritative references

- [STABILITY.md](../STABILITY.md) — full 0.x stability contract. Source of truth for everything above.
- [AGENTS.md](../AGENTS.md) — agent-facing instructions: install, run, single-turn flow, error semantics.
- [`docs/report-schema.v0.22.json`](report-schema.v0.22.json) — machine-validatable JSON Schema for the current report.
- [`docs/privacy.md`](privacy.md) and [`docs/report-sensitive-fields.json`](report-sensitive-fields.json) — default redaction behavior and sensitive-field inventory.
- [`docs/packet-schema.v0.6.json`](packet-schema.v0.6.json) — machine-validatable JSON Schema for the current packet.
- [`docs/checks.json`](checks.json) — check catalog, including `mvp_tier` for MVP/readiness triage.

## See also

- [`report-reading-for-agents.md`](report-reading-for-agents.md) — reader's primer that walks the JSON in the order a new consumer should read it; complements this field index.
- [`agent-autofix-boundary.md`](agent-autofix-boundary.md) — what an agent may assert mechanically vs. what must defer to a human reviewer when surfacing findings from `report.json`.


<!-- ===== source: docs/checks.md ===== -->

# Check Catalog

Agents Shipgate checks are deterministic static checks. They do not certify safety, run agents, call tools, call LLMs, or verify runtime routing.

## Severity Contract

- `critical`: strict CI exits `20` unless the finding is explicitly suppressed with a reason.
- `high`: requires human review but does not fail CI by default.
- `medium`: review during release hardening.
- `low` and `info`: informational.

Only unsuppressed `critical` findings block strict mode. Suppressed findings remain in JSON with `suppressed: true` and are excluded from active severity counts.
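
The severity contract can be sketched as follows. This is a simplified illustration, not the CLI's logic: it ignores baselines and explicit `blocks_release` policy blockers, and it assumes `0` is the passing exit code. The function names are hypothetical.

```python
from collections import Counter

STRICT_FAILURE_EXIT = 20  # exit code named in the severity contract


def active_severity_counts(findings):
    """Count severities, excluding findings marked suppressed: true."""
    return Counter(
        f["severity"] for f in findings if not f.get("suppressed", False)
    )


def strict_exit_code(findings):
    """Return 20 when any unsuppressed critical finding exists, else 0 (assumed)."""
    if active_severity_counts(findings)["critical"] > 0:
        return STRICT_FAILURE_EXIT
    return 0
```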

## Evidence Coverage

- `static`: all enumerated tools came from high-confidence static sources.
- `mixed`: at least one enumerated tool came from lower-confidence enrichment, such as SDK AST extraction.

Suppressions do not change evidence coverage.

## Baselines

v0.2 adds local baseline gating. `agents-shipgate baseline save` writes active,
unsuppressed findings to `.agents-shipgate/baseline.json`. A later
`agents-shipgate scan --baseline .agents-shipgate/baseline.json --ci-mode strict`
marks findings as `matched` or `new` and fails only on new findings that match
the active fail policy. Resolved baseline findings are counted in the report
baseline summary and do not fail CI.
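
The `matched` / `new` / resolved split reduces to set arithmetic over finding fingerprints. The sketch below works on plain fingerprint collections because the on-disk layout of `baseline.json` is not described here; the function name and the `resolved` key are hypothetical.

```python
def classify_against_baseline(current, baseline):
    """Split fingerprints into matched, new, and resolved groups."""
    current_set, baseline_set = set(current), set(baseline)
    return {
        # Present now and already accepted in the baseline.
        "matched": sorted(current_set & baseline_set),
        # Present now but absent from the baseline; only these can fail CI.
        "new": sorted(current_set - baseline_set),
        # In the baseline but no longer reported.
        "resolved": sorted(baseline_set - current_set),
    }
```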

## Checks

| Check ID | Severity | Meaning |
| --- | --- | --- |
| `SHIP-INVENTORY-NOT-ENUMERABLE` | high | No tool surface could be enumerated from the manifest inputs. |
| `SHIP-INVENTORY-WILDCARD-TOOLS` | high | A source exposes wildcard/all tools instead of an explicit allowlist. |
| `SHIP-INVENTORY-TOOL-SURFACE-TOO-LARGE` | medium | The normalized tool count exceeds the MVP review threshold. |
| `SHIP-DOC-MISSING-DESCRIPTION` | medium | A tool has no description or a description too short for reliable review. |
| `SHIP-DOC-INJECTION-RISK` | medium/high | A tool description contains instruction-override style language. High only when multiple patterns match on a write/high-risk tool. |
| `SHIP-DOC-SECRET-IN-DESCRIPTION` | medium/high | A tool description contains a secret-like token or credential value. High only when multiple patterns match on a write/high-risk tool. |
| `SHIP-SCHEMA-BROAD-FREE-TEXT` | high | A write/action-like tool accepts broad `action`, `body`, `command`, `updates`, or similar free-form input. |
| `SHIP-SCHEMA-MISSING-BOUNDS` | high | A risky numeric parameter such as `amount`, `count`, or `quantity` lacks a maximum. |
| `SHIP-SCHEMA-FREEFORM-OUTPUT` | medium | A tool returns free-form string output that may later be placed in model context. |
| `SHIP-AUTH-MISSING-SCOPE` | high | A write-like tool has no declared auth scope metadata. |
| `SHIP-AUTH-MANIFEST-BROAD-SCOPE` | high | The manifest declares broad scopes such as `*`, `admin`, or `service:*`. |
| `SHIP-AUTH-TOOL-BROAD-SCOPE` | high | A tool declares broad scopes such as `*`, `admin`, or `service:*`. |
| `SHIP-AUTH-SCOPE-COVERAGE-MISSING` | high | A tool requires scopes that are not covered by `permissions.scopes`. |
| `SHIP-SCOPE-TOOL-OUTSIDE-PURPOSE` | high | A write-capable tool contradicts a read-only declared purpose. |
| `SHIP-SCOPE-PROHIBITED-TOOL-PRESENT` | high | A tool appears to overlap with a manifest `prohibited_actions` entry. |
| `SHIP-POLICY-APPROVAL-MISSING` | critical | A high-risk tool lacks a manifest approval policy. |
| `SHIP-POLICY-CONFIRMATION-MISSING` | high | A destructive, external-write, or customer-communication tool lacks a confirmation policy. |
| `SHIP-ACTION-UNDECLARED` | high | A loaded tool lacks explicit action-surface metadata when explicit actions are required. |
| `SHIP-ACTION-POLICY-VIOLATION` | high | A user-declared action-surface policy requirement is not satisfied. |
| `SHIP-ACTION-FINANCIAL-WRITE-CONTROL-MISSING` | critical | A newly added financial write action lacks approval, audit, or idempotency controls. |
| `SHIP-ACTION-DESTRUCTIVE-ROLLBACK-MISSING` | critical | A newly added destructive action lacks approval or rollback controls. |
| `SHIP-ACTION-EXTERNAL-COMMUNICATION-AUDIT-MISSING` | high | A newly added external communication action lacks audit evidence. |
| `SHIP-ACTION-WILDCARD-SCOPE` | critical | An action declares or expands into a wildcard/admin-like scope. |
| `SHIP-ACTION-EFFECT-ESCALATED` | critical | An action effect escalated compared with the base surface. |
| `SHIP-ACTION-EFFECT-DOWNGRADE-DECLARED` | high | An action declaration weakens the effect inferred from the loaded tool surface. |
| `SHIP-ACTION-CONTROL-DOWNGRADE` | high | An action declaration weakens an inherited approval or safeguard control. |
| `SHIP-ACTION-APPROVAL-REMOVED` | critical | An existing action approval policy was removed. |
| `SHIP-ACTION-SAFEGUARD-REMOVED` | high | An existing action safeguard was removed. |
| `SHIP-EVIDENCE-APPROVAL-TRACE-MISSING` | high | Local HITL approval trace evidence is missing or incomplete for an approval-required tool. |
| `SHIP-EVIDENCE-OVERRIDE-REASON-MISSING` | high | Local HITL override reason evidence is missing or incomplete. |
| `SHIP-EVIDENCE-HIGH-RISK-EXCLUSION-MISSING` | high | Local high-risk auto-approval exclusion evidence is missing or incomplete. |
| `SHIP-EVIDENCE-HITL-PROMOTION-CRITERIA-MISSING` | high | Local HITL promotion criteria evidence is missing or incomplete. |
| `SHIP-SIDEFX-IDEMPOTENCY-MISSING` | critical/high | A risky write tool lacks idempotency evidence. Critical only when retry behavior is known. |
| `SHIP-API-FUNCTION-SCHEMA-STRICTNESS` | high/medium | An OpenAI API function schema is missing strictness, required fields, or bounded risky fields. |
| `SHIP-API-STRUCTURED-OUTPUT-READINESS` | high/medium | An OpenAI API response format is missing or too broad for downstream decisions. |
| `SHIP-API-PROMPT-TOOL-SCOPE-MISMATCH` | high/medium | Prompt language contradicts the enabled OpenAI API tool surface or lacks approval/confirmation instructions. |
| `SHIP-API-RETRY-POLICY-MISSING` | medium | High-risk OpenAI API tools are enabled without retry policy metadata. |
| `SHIP-API-TIMEOUT-MISSING` | medium | High-risk OpenAI API tools are enabled without timeout metadata. |
| `SHIP-API-TEST-CASES-MISSING` | medium | High-risk OpenAI API tools are enabled without declared test cases. |
| `SHIP-API-TOOL-OUTPUT-SCHEMA-MISSING` | medium | A high-risk OpenAI API tool lacks success/failure output modeling. |
| `SHIP-API-RETRY-WITHOUT-IDEMPOTENCY` | high | A risky OpenAI API write tool may be retried without idempotency evidence. |
| `SHIP-API-TRACE-APPROVAL-MISSING` | medium | A trace sample shows a policy-controlled tool call without approval. |
| `SHIP-API-TRACE-CONFIRMATION-MISSING` | medium | A trace sample shows a policy-controlled tool call without confirmation. |
| `SHIP-API-OPERATIONAL-READINESS` | medium | Deprecated v0.3 compatibility alias for the v0.4 atomic OpenAI API operational readiness checks. |
| `SHIP-ADK-DYNAMIC-TOOLSET-NOT-ENUMERABLE` | high | A Google ADK toolset cannot be statically enumerated and no explicit inventory is declared. |
| `SHIP-ADK-MCP-TOOLSET-UNFILTERED` | high/medium | A Google ADK `McpToolset` has no static `tool_filter`. |
| `SHIP-ADK-FUNCTION-TOOL-METADATA-MISSING` | medium | A Google ADK function/config tool lacks static description or parameter metadata. |
| `SHIP-ADK-LONGRUNNING-CONTRACT-MISSING` | high | A Google ADK long-running tool lacks operation-id and status/progress contract evidence. |
| `SHIP-ADK-GUARDRAIL-EVIDENCE-MISSING` | high | High-risk Google ADK tools lack callback/plugin or policy guardrail evidence. |
| `SHIP-ADK-EVAL-COVERAGE-MISSING` | medium | Production-like Google ADK inputs are present without declared eval files. |
| `SHIP-LANGCHAIN-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE` | high | A LangChain/LangGraph tool surface cannot be statically enumerated and no explicit inventory is declared. |
| `SHIP-LANGCHAIN-FUNCTION-TOOL-METADATA-MISSING` | medium | A LangChain/LangGraph function tool lacks static description or parameter metadata. |
| `SHIP-CREWAI-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE` | high | A CrewAI tool surface cannot be statically enumerated and no explicit inventory is declared. |
| `SHIP-CREWAI-FUNCTION-TOOL-METADATA-MISSING` | medium | A CrewAI function/class tool lacks static description or parameter metadata. |
| `SHIP-CODEX-PLUGIN-METADATA-MISSING` | medium | A Codex plugin package has incomplete or ambiguous identity metadata. |
| `SHIP-CODEX-PLUGIN-COMPONENT-PATH-MISSING` | high | A declared Codex plugin component path is missing or outside the package/workspace. |
| `SHIP-CODEX-PLUGIN-MARKETPLACE-POLICY-MISSING` | medium | A Codex plugin marketplace entry lacks installation/authentication policy metadata. |
| `SHIP-CODEX-PLUGIN-MCP-SERVER-NOT-ENUMERABLE` | high | A Codex plugin MCP server is declared without a local enumerable tool inventory. |
| `SHIP-CODEX-PLUGIN-APP-SURFACE-NOT-ENUMERABLE` | medium | A Codex plugin connector app surface is not statically enumerable from local metadata. |
| `SHIP-CODEX-PLUGIN-SKILL-METADATA-MISSING` | medium | A Codex plugin skill lacks unique name/description frontmatter. |
| `SHIP-N8N-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE` | high | An n8n tool surface uses runtime, unresolved, wildcard, or uninventoried custom exposure. |
| `SHIP-N8N-MCP-CLIENT-TOOLSET-UNFILTERED` | high/medium | An n8n MCP Client Tool exposes `All` or `All Except` tools without an explicit inventory. |
| `SHIP-N8N-AI-TOOL-METADATA-MISSING` | medium | An n8n AI-exposed tool lacks static description or parameter metadata. |
| `SHIP-N8N-CREDENTIAL-EVIDENCE-MISSING` | high | Production-like n8n workflows reference credentials without declared credential stubs. |
| `SHIP-N8N-EVAL-COVERAGE-MISSING` | medium | Production-like n8n workflows are present without declared eval files. |
| `SHIP-N8N-SECRET-IN-WORKFLOW-PARAMETER` | high | n8n workflow JSON contains a secret-like value; evidence is redacted. |
| `SHIP-MANIFEST-STALE-SUPPRESSION` | medium | A suppression references a missing check ID or missing tool. |
| `SHIP-MANIFEST-STALE-POLICY` | medium | An approval, confirmation, or idempotency policy references a missing tool. |
| `SHIP-MANIFEST-STALE-RISK-OVERRIDE` | medium | A risk override references a missing tool. |
| `SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING` | high | A high-risk production or production-like tool lacks owner metadata. |
| `SHIP-MANIFEST-UNUSED-SCOPE` | medium/high | `permissions.scopes` contains a scope unused by any loaded tool; broad unused scopes are high. |
| `SHIP-VERIFY-TRUST-ROOT-TOUCHED` | medium | A PR changed a release trust-root file; emitted only when a verification context (changed files) is supplied. |
| `SHIP-VERIFY-POLICY-WEAKENED` | high | Base-vs-head effective policy weakened (CI mode downgraded, fail-on loosened, or a severity override lowered across a tier); fail-safe to review when the base is unavailable. |
| `SHIP-VERIFY-BASELINE-OR-WAIVER-EXPANDED` | high | The PR broadens what the gate forgives — a new suppression, a widened waiver scope, or a larger accepted-debt baseline — versus the base. |
| `SHIP-VERIFY-CI-GATE-REMOVED` | critical | The PR deletes the Shipgate CI workflow from an opted-in repo, which would stop the release gate from running. |
| `SHIP-VERIFY-AGENT-INSTRUCTIONS-WEAKENED` | medium | The PR edits agent-instruction trust roots and weakening cannot be statically disproven; routed to human review. |
| `SHIP-VERIFY-TRIGGER-CATALOG-DRIFT` | medium | The PR changes the trigger catalog that decides when Shipgate runs; routed to human review to rule out gate evasion. |

## Check Details

### SHIP-INVENTORY-NOT-ENUMERABLE

The scanner could not enumerate any tools from required manifest inputs. Add a local MCP JSON or OpenAPI source before relying on the report.

### SHIP-INVENTORY-WILDCARD-TOOLS

A source exposes wildcard or all-tools access. Replace it with an explicit allowlist so review can reason about the actual release surface.

### SHIP-INVENTORY-TOOL-SURFACE-TOO-LARGE

The normalized tool count exceeds the MVP review threshold. Split or reduce the surface when the report becomes too broad to review.

### SHIP-INVENTORY-LOW-CONFIDENCE-PRODUCTION-SURFACE

A production target depends on lower-confidence extraction, such as SDK AST enrichment. Declare the tools through manifest, MCP, or OpenAPI inputs.

### SHIP-DOC-MISSING-DESCRIPTION

A tool has no description or a description too short for reliable review. Add a concise capability description.

### SHIP-DOC-INJECTION-RISK

A tool description contains instruction-override-like language. Rewrite it as neutral metadata.
Purely heuristic matches default to `medium`; multiple matches on write/high-risk tools are `high`.

### SHIP-DOC-SECRET-IN-DESCRIPTION

A tool description contains a secret-like token or credential value. Remove it and rotate the exposed secret.
Purely heuristic matches default to `medium`; multiple matches on write/high-risk tools are `high`.

### SHIP-SCHEMA-BROAD-FREE-TEXT

A write/action-like tool accepts broad free-form input. Constrain the field with structured schema or enums.

### SHIP-SCHEMA-MISSING-BOUNDS

A risky numeric parameter lacks a maximum. Add a maximum or equivalent policy limit.

### SHIP-SCHEMA-FREEFORM-OUTPUT

A tool returns free-form string output that may later be placed in model context. Prefer structured output for model-consumed tool results.

### SHIP-AUTH-MISSING-SCOPE

A write or sensitive-data tool has no auth scope metadata. Declare scopes in OpenAPI, MCP, or manifest metadata.

### SHIP-AUTH-MANIFEST-BROAD-SCOPE

The manifest declares broad permission scopes such as wildcard or admin scopes. Replace them with operation-specific scopes.

### SHIP-AUTH-TOOL-BROAD-SCOPE

A tool declares broad auth scopes. Use narrower tool scopes where possible.

### SHIP-AUTH-SCOPE-COVERAGE-MISSING

A tool requires scopes that are not covered by `permissions.scopes`. Reconcile the manifest with the tool requirements.

### SHIP-SCOPE-TOOL-OUTSIDE-PURPOSE

A write-capable tool contradicts a read-only declared purpose. Remove the tool or update the declared release scope.

### SHIP-SCOPE-PROHIBITED-TOOL-PRESENT

A tool appears to overlap with a manifest `prohibited_actions` entry. Remove or narrow the tool, or revise policy/scope text.

### SHIP-POLICY-APPROVAL-MISSING

A high-risk tool lacks a declared approval policy. Add an approval policy or remove the tool from the release.

### SHIP-POLICY-CONFIRMATION-MISSING

A destructive, external-write, or customer-communication tool lacks a confirmation policy. Add a confirmation policy or remove the tool.

### SHIP-ACTION-UNDECLARED

`action_surface.require_explicit_actions` is true, but a loaded tool has no
matching `action_surface.actions[]` declaration. Add action metadata for the
tool or disable the explicit-action requirement.

### SHIP-ACTION-POLICY-VIOLATION

A user-declared `action_surface.policies[]` rule matched an action, and one or
more required dot-path values were absent or different. Satisfy the policy
requirements or narrow/remove the action.
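
A dot-path requirement check might look like the sketch below. The shape of the `required` mapping and both helper names are hypothetical; only the idea of comparing required dot-path values against an action comes from the description above.

```python
_MISSING = object()  # sentinel so an absent key never equals an expected value


def get_dot_path(obj, path):
    """Walk nested dicts along a dot path, returning _MISSING when absent."""
    current = obj
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def unmet_requirements(action, required):
    """List dot paths whose value is absent or differs from the requirement."""
    return sorted(
        path
        for path, expected in required.items()
        if get_dot_path(action, path) != expected
    )
```

For an action with `approval.required: true` and `safeguards.audit_log: false`, a policy requiring both to be `true` would report `safeguards.audit_log` only.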

### SHIP-ACTION-FINANCIAL-WRITE-CONTROL-MISSING

A newly added action is classified as `financial_write` and is missing
`approval.required`, `safeguards.audit_log`, or `safeguards.idempotency`.
Declare the required controls before releasing the action.

### SHIP-ACTION-DESTRUCTIVE-ROLLBACK-MISSING

A newly added destructive action is missing `approval.required` or
`safeguards.rollback`. Declare the approval and rollback controls, or remove
the destructive action from the release surface.

### SHIP-ACTION-EXTERNAL-COMMUNICATION-AUDIT-MISSING

A newly added external communication action lacks `safeguards.audit_log`.
Declare audit evidence so reviewers can trace outbound side effects.

### SHIP-ACTION-WILDCARD-SCOPE

An added action declares a broad scope, or a modified action expands into a
broad scope such as wildcard/admin access. Replace it with operation-specific
scopes.

### SHIP-ACTION-EFFECT-ESCALATED

An action changed to a higher-risk effect, such as read to write or write to
destructive. Add reviewer approval for the escalation or reduce the effect.
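
The two escalations named above imply an ordering of effects. The sketch below assumes exactly three effects ranked `read` < `write` < `destructive`; the real effect vocabulary may be larger, and the names here are hypothetical.

```python
# Assumed ordering, derived only from the two examples in the text.
EFFECT_RANK = {"read": 0, "write": 1, "destructive": 2}


def effect_escalated(base_effect, head_effect):
    """True when the head effect ranks higher than the base effect.

    Raises KeyError for effects outside the assumed three-value ordering.
    """
    return EFFECT_RANK[head_effect] > EFFECT_RANK[base_effect]
```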

### SHIP-ACTION-EFFECT-DOWNGRADE-DECLARED

An `action_surface.actions[]` declaration sets a lower-risk effect than
Shipgate inferred from the loaded tool metadata. Align the declared effect
with the inferred operation or remove the weaker declaration.

### SHIP-ACTION-CONTROL-DOWNGRADE

An `action_surface.actions[]` declaration sets an inherited approval or
safeguard control from `true` to `false`. Keep the inherited control enabled
or remove the weakening declaration.

### SHIP-ACTION-APPROVAL-REMOVED

The base action required approval, but the current action no longer does.
Restore `approval.required` or document a reviewed override.

### SHIP-ACTION-SAFEGUARD-REMOVED

An existing action lost a safeguard such as audit logging, idempotency,
rollback, or dry-run support. Restore the safeguard or document a reviewed
override.

### SHIP-EVIDENCE-APPROVAL-TRACE-MISSING

`validation.required_evidence.approval_trace_required` is true, but local
validation evidence does not show `approved: true` for an approval-required
tool. Add local approval trace evidence produced by runtime middleware or
change the declared review posture. Agents Shipgate reads this evidence; it
does not produce or certify it. Missing local evidence does not prove the
runtime approval control is absent.

### SHIP-EVIDENCE-OVERRIDE-REASON-MISSING

`validation.required_evidence.override_reason_required` is true, but override
logs are absent, empty, or include normalized `override`, `bypass`, or
`auto_approve` events without a non-empty `reason`. Record reviewer-visible
reasons in the local override log. Missing local evidence does not prove the
runtime override control is absent.
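
A minimal sketch of the rule described above, assuming the override log has already been parsed into a list of dictionaries. The `event` key name and the helper are assumptions made for illustration; only the three normalized event kinds and the `reason` field come from the text.

```python
# Event kinds named in the check description.
OVERRIDE_EVENTS = {"override", "bypass", "auto_approve"}


def override_log_problems(events: list[dict]) -> list[str]:
    """List problems in a parsed override log (hypothetical `event` key)."""
    if not events:
        return ["override log is absent or empty"]
    problems = []
    for index, event in enumerate(events):
        # Treat missing, None, and whitespace-only reasons as empty.
        reason = str(event.get("reason") or "").strip()
        if event.get("event") in OVERRIDE_EVENTS and not reason:
            problems.append(f"event {index} ({event['event']}) has no reason")
    return problems
```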

### SHIP-EVIDENCE-HIGH-RISK-EXCLUSION-MISSING

`validation.required_evidence.high_risk_auto_approval_exclusion_required` is
true, and a high-risk tool with declared approval policy is not listed under
`high_risk_auto_approval_exclusions`. This is separate from
`SHIP-POLICY-APPROVAL-MISSING`: it only fires after approval policy is already
declared, because it checks the local evidence that the tool is excluded from
auto-approval review posture. Missing local evidence does not prove the
runtime exclusion control is absent.

### SHIP-EVIDENCE-HITL-PROMOTION-CRITERIA-MISSING

`validation.target_review_posture` is `limited_auto_approval`, but local
promotion criteria evidence is missing or the canonical required-evidence
flags are not true in the manifest and criteria file. Finding evidence includes
`reason: file_missing` or `reason: flags_missing` so reviewers can distinguish
an absent local source from incomplete criteria. Missing local evidence does
not prove runtime controls are absent.

### SHIP-SIDEFX-IDEMPOTENCY-MISSING

A risky write tool lacks idempotency evidence. Add an idempotency key, idempotent annotation, or declared idempotency policy.

### SHIP-API-FUNCTION-SCHEMA-STRICTNESS

An OpenAI API function schema is not strict enough for reliable tool calls. The check flags missing `strict: true`, missing object parameters, `additionalProperties` not set to `false`, properties omitted from `required`, broad free-text action fields, and risky numeric fields without bounds or enums.
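
The sketch below illustrates a subset of these conditions (strict flag, object parameters, `additionalProperties`, and `required` coverage) against a flat function-tool dictionary. It is not the check's implementation; the helper name and the `issue_refund` example tool are hypothetical, and the free-text and numeric-bound rules are left out.

```python
def strictness_problems(tool: dict) -> list[str]:
    """List strictness gaps for one function-tool definition (illustrative subset)."""
    problems = []
    if tool.get("strict") is not True:
        problems.append("strict is not true")
    params = tool.get("parameters")
    if not isinstance(params, dict) or params.get("type") != "object":
        problems.append("parameters is not an object schema")
        return problems
    if params.get("additionalProperties") is not False:
        problems.append("additionalProperties is not false")
    # Every declared property should also appear in `required`.
    missing = sorted(set(params.get("properties", {})) - set(params.get("required", [])))
    if missing:
        problems.append("properties omitted from required: " + ", ".join(missing))
    return problems


# Hypothetical tool definition with three gaps.
loose_tool = {
    "type": "function",
    "name": "issue_refund",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}, "amount": {"type": "number"}},
        "required": ["order_id"],
    },
}
```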

### SHIP-API-STRUCTURED-OUTPUT-READINESS

An OpenAI API response format is missing or under-specified. The check flags missing response schemas for high-risk API tools, broad response objects, decision/status fields without enums, missing `refusal` / `needs_review` / `error` modeling, and missing `downstream_critical_fields`.

### SHIP-API-PROMPT-TOOL-SCOPE-MISMATCH

Prompt files contradict the enabled API tool surface. The check flags prompts that say "advise only" or "read-only" while write/high-risk tools are enabled, and high-risk tools whose prompts do not mention approval and confirmation expectations.

### OpenAI API Operational Readiness Checks

v0.4 splits the former `SHIP-API-OPERATIONAL-READINESS` bundle into atomic
check IDs so suppressions, severity overrides, SARIF rules, and baselines can
target one missing contract at a time. The split checks use `model_config`,
`policy_rules`, simple test cases, and trace samples to flag missing retry
policy, missing timeouts, missing test cases, non-idempotent high-risk tools
with retry evidence, missing success/failure tool-output modeling, and trace
samples that show required approval or confirmation missing.

The old bundled check ID remains as a deprecated compatibility alias through at
least one minor release. v0.4 does not emit new findings with
`SHIP-API-OPERATIONAL-READINESS`, but existing suppressions, severity overrides,
baseline entries, `explain`, `list-checks`, and stale-suppression validation
continue to recognize it. New configs should use the specific v0.4 ID that
represents the condition.

### SHIP-API-OPERATIONAL-READINESS

Deprecated compatibility alias for the v0.3 OpenAI API operational readiness
bundle. Migrate suppressions, severity overrides, and baselines to the specific
v0.4 `SHIP-API-*` readiness checks when you touch the config.

### SHIP-API-RETRY-POLICY-MISSING

A high-risk OpenAI API tool flow runs without declared retry policy metadata.
Reviewers cannot reason about duplicate side effects when retry behavior is
unspecified. Declare `retry_policy` in `openai_api.policy_rules` or
`openai_api.model_config`.

### SHIP-API-TIMEOUT-MISSING

A high-risk OpenAI API tool flow runs without declared timeout metadata.
Without an explicit timeout, failure behavior and tool-call continuation
become ambiguous. Declare a tool-call timeout in policy rules or model
config.

### SHIP-API-TEST-CASES-MISSING

High-risk OpenAI API tools exist with no declared test cases. Tool-call flows
that approve refunds, send mail, or modify state should ship with simple test
cases as release evidence. Add cases under `openai_api.test_cases`.

### SHIP-API-TOOL-OUTPUT-SCHEMA-MISSING

A high-risk OpenAI API tool lacks declared success/failure output modeling.
Reviewers depend on `success_fields` and `failure_fields` to reason about
downstream failure handling. Declare them in policy rules.

### SHIP-API-RETRY-WITHOUT-IDEMPOTENCY

A retry policy is declared and a risky write tool lacks idempotency evidence.
Retries against non-idempotent writes can duplicate financial, destructive, or
external side effects. Either add idempotency evidence or remove the retry
policy for this tool.

### SHIP-API-TRACE-APPROVAL-MISSING

A trace sample shows a policy-controlled tool call with `approved: false` for
a tool that has approval policy evidence elsewhere in the manifest. Implement
the runtime approval gate; **do not edit the trace recording** to flip
`approved` — that patches the evidence, not the agent's behavior.

### SHIP-API-TRACE-CONFIRMATION-MISSING

A trace sample shows a policy-controlled tool call with `confirmed: false`
for a tool that has confirmation policy evidence. Implement the runtime
confirmation gate; **do not edit the trace recording** to flip `confirmed`
— same anti-pattern as the approval-missing finding above.

### SHIP-ADK-DYNAMIC-TOOLSET-NOT-ENUMERABLE

A Google ADK `OpenAPIToolset`, `McpToolset`, or dynamic tools expression could
not be enumerated statically. Provide explicit local OpenAPI, MCP, or ADK tool
inventory inputs before relying on the release report.

### SHIP-ADK-MCP-TOOLSET-UNFILTERED

An ADK `McpToolset` has no static `tool_filter`. Add a narrow filter and an
explicit inventory file so reviewers can see the intended runtime surface.

### SHIP-ADK-FUNCTION-TOOL-METADATA-MISSING

An ADK function or Agent Config tool reference lacks description or parameter
metadata. Add docstrings, type annotations, or explicit local inventory
metadata.

### SHIP-ADK-LONGRUNNING-CONTRACT-MISSING

An ADK `LongRunningFunctionTool` lacks static evidence for operation id and
status/progress fields. Google-style `name` plus `done`, `state`, `phase`,
`metadata`, or `result` fields count as contract evidence; tools may also carry
`annotations.long_running_contract: true` in explicit inventory metadata.
Document the handoff and completion contract before promotion.

### SHIP-ADK-GUARDRAIL-EVIDENCE-MISSING

High-risk ADK tools are present without static callback/plugin or manifest
policy evidence. ADK callbacks and plugins count only as static evidence of
intent; they are not proof that runtime enforcement works.

### SHIP-ADK-EVAL-COVERAGE-MISSING

Google ADK inputs target `production_like` or `production` without declared eval
files. Add eval artifacts that cover expected responses and tool-use
trajectories.

### SHIP-LANGCHAIN-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE

A LangChain/LangGraph tool list, binding, or graph node could not be enumerated
statically. Provide an explicit local inventory when tools are produced by
factories, comprehensions, loop-built lists, unresolved imports, or other
runtime-only code. This ID uses `TOOL-SURFACE` instead of ADK's `TOOLSET`
because LangChain exposes ad hoc tool lists and model/graph bindings rather
than a consistent toolset abstraction.

### SHIP-LANGCHAIN-FUNCTION-TOOL-METADATA-MISSING

A LangChain/LangGraph `@tool` function or `StructuredTool.from_function(...)`
surface lacks a static description or parameter metadata. Add docstrings,
function annotations, or same-file Pydantic `args_schema` metadata.

### SHIP-CREWAI-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE

A CrewAI agent or crew tool surface could not be enumerated statically. Provide
an explicit local inventory when tools are produced by factories,
comprehensions, loop-built lists, unresolved imports, or other runtime-only
code. This ID uses `TOOL-SURFACE` instead of ADK's `TOOLSET` because CrewAI
agents bind ad hoc tool lists rather than a consistent toolset abstraction.

### SHIP-CREWAI-FUNCTION-TOOL-METADATA-MISSING

A CrewAI `@tool` function or `BaseTool` subclass lacks a static description or
parameter metadata. Add descriptions, `_run` annotations, or same-file Pydantic
`args_schema` metadata.

### SHIP-CODEX-PLUGIN-METADATA-MISSING

A Codex plugin package has incomplete or ambiguous identity metadata. Fill
`name`, `version`, and `description`; keep the plugin name aligned with the
package root; and avoid duplicate plugin names across scanned package roots.

### SHIP-CODEX-PLUGIN-COMPONENT-PATH-MISSING

A Codex plugin component path for skills, MCP servers, apps, or hooks could not
be loaded. Paths must resolve inside both the plugin package and the manifest
directory.
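
One way to picture the containment rule is a resolved-path check like the sketch below. It assumes component paths are resolved relative to the manifest directory, which the text does not state; the helper name and the directory layout used in any call are hypothetical.

```python
from pathlib import Path


def resolves_inside(component: str, plugin_root: Path, manifest_dir: Path) -> bool:
    """Return True when the component path stays inside both directories.

    Assumption: the component path is relative to the manifest directory.
    """
    target = (manifest_dir / component).resolve()
    return all(
        target.is_relative_to(root.resolve())
        for root in (plugin_root, manifest_dir)
    )
```

Under this sketch a component such as `../../other` fails because the resolved path leaves both directories.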

### SHIP-CODEX-PLUGIN-MARKETPLACE-POLICY-MISSING

A marketplace entry lacks `policy.installation`, `policy.authentication`, or
`category`. Add those fields so coding agents can see installation and
authentication posture before adoption.

### SHIP-CODEX-PLUGIN-MCP-SERVER-NOT-ENUMERABLE

A plugin declares an MCP server in `.mcp.json`, but Agents Shipgate does not
execute MCP commands to discover tools. Provide a local MCP tools inventory via
`codex_plugins.mcp_tool_inventories`.

### SHIP-CODEX-PLUGIN-APP-SURFACE-NOT-ENUMERABLE

A plugin declares a connector app in `.app.json`. Connector-backed capabilities
are externally mediated and are review items unless a local inventory or policy
artifact documents the effective surface.

### SHIP-CODEX-PLUGIN-SKILL-METADATA-MISSING

A `skills/**/SKILL.md` file is missing parseable `name` or `description`
frontmatter, or duplicates another skill name in the same plugin. Give every
skill a unique routing name and clear description.

### SHIP-N8N-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE

An n8n workflow uses a runtime expression in a tool name, an unresolved
Call-Workflow target, wildcard MCP Server/Client exposure, or an uninventoried
community/custom tool node. Provide a local n8n/MCP inventory or replace the
dynamic exposure with a static allowlist. This is high severity in every
environment because static release evidence cannot prove the actual tool
inventory.

### SHIP-N8N-MCP-CLIENT-TOOLSET-UNFILTERED

An n8n MCP Client Tool exposes `All` or `All Except` tools without a local
inventory. Select explicit MCP tools or provide a local MCP inventory for
release review. The severity is environment-sensitive because the selector is
easy to narrow before production, while production-like use increases blast
radius.

### SHIP-N8N-AI-TOOL-METADATA-MISSING

An n8n AI-exposed tool lacks a static description or parameter metadata. Add
tool descriptions, `$fromAI()` metadata, workflow input schemas, or explicit
inventory metadata.

### SHIP-N8N-CREDENTIAL-EVIDENCE-MISSING

Production-like n8n workflows reference credentials but no local credential
stubs are declared. Declare source-control credential stubs so reviewers can
see credential types without seeing secret values.

### SHIP-N8N-EVAL-COVERAGE-MISSING

n8n workflows target `production_like` or `production` without declared eval
files. Add eval artifacts that cover expected responses and tool-use
trajectories.

### SHIP-N8N-SECRET-IN-WORKFLOW-PARAMETER

An n8n workflow parameter, node note, `pinData` entry, or `staticData` entry
contains a secret-like value. Evidence includes only the source reference,
stable pointer, and secret kind; it never includes the matched secret value or
a verifier hash for that value.

### SHIP-MANIFEST-STALE-SUPPRESSION

A suppression references an unknown check ID or a tool that is not loaded in the
current scan. Remove stale suppressions so reviewers can trust the suppression
list as current release intent.

### SHIP-MANIFEST-STALE-POLICY

A policy entry references a tool that is not loaded. Remove or update stale
approval, confirmation, or idempotency policies so release policy matches the
actual tool surface.

### SHIP-MANIFEST-STALE-RISK-OVERRIDE

`risk_overrides.tools` references a tool that is not loaded. Remove stale
overrides or update them to the current tool names.

### SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING

A high-risk tool in `production_like` or `production` has no owner metadata.
Declare an owner in the tool source or `risk_overrides.tools` so reviewers know
who is accountable for remediation.

### SHIP-MANIFEST-UNUSED-SCOPE

`permissions.scopes` includes a scope not required by any loaded tool. Remove
unused scopes or add tool metadata showing why the permission is needed. Broad
unused write/admin scopes are `high`; other unused scopes are `medium`.

### SHIP-BASELINE-INTEGRITY-MISMATCH

Baseline file integrity check failed. Emitted when the baseline JSON has been
edited outside `agents-shipgate baseline save` (hash mismatch against the
audit log), when the audit log is missing or empty for a non-empty baseline,
when the audit log is malformed, when an entry's `provenance.run_id` is not
present in the audit log, or when an entry pre-dates the v0.5 provenance
contract. In `baseline.integrity_mode: strict` the finding carries
`blocks_release=true` and `agents-shipgate baseline verify --strict` exits
with code 6.
Re-run `agents-shipgate baseline save` to refresh the baseline alongside its
audit row; investigate the diff before accepting.

### SHIP-BASELINE-ENTRY-EXPIRED

A baseline entry's reviewer-set `provenance.expires` date has passed.
Renewable consent is a deliberate choice: accepted technical debt should
need re-review on a schedule, not a silent extension. Re-review the entry
and either remove it, fix the underlying finding, or extend
`provenance.expires` with a new `reason`.
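
The expiry comparison can be sketched as follows, assuming `provenance.expires` is an ISO `YYYY-MM-DD` string (the storage format is an assumption here, and `is_expired` is a hypothetical helper).

```python
from datetime import date


def is_expired(expires: str, today: date) -> bool:
    """Return True when the ISO-formatted expiry date is strictly before today."""
    return date.fromisoformat(expires) < today
```

Passing `today` explicitly keeps the comparison deterministic, which matters for a gate whose output should not depend on hidden state.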

### SHIP-BASELINE-ENTRY-STALE

A baseline entry no longer corresponds to an active finding or check ID.
Two sub-kinds, both `low` severity:

- `deprecated_check_id` — entry references an alias in `LEGACY_CHECK_ID_ALIASES`.
  Update the entry to the canonical replacement check IDs (re-running
  `baseline save` does not rewrite check IDs).
- `resolved_not_pruned` — entry matched no active scan finding. Re-run
  `agents-shipgate baseline save` to drop the entry from the baseline.

### SHIP-VERIFY-TRUST-ROOT-TOUCHED

A PR changed a file that defines the release gate's trust spine — the
manifest (`shipgate.yaml`), `.agents-shipgate/` state (baselines,
waivers), `policies/`, `prompts/`, the Shipgate CI gate
(`.github/workflows/agents-shipgate.yml`), agent instructions
(`AGENTS.md`, `CLAUDE.md`, `.claude/`, `.cursor/rules/`,
`.agents/skills/`, `.codex/`), Codex plugin packages (`.codex-plugin/`),
or tool-surface declarations (`.app.json`, `.mcp.json`, `SKILL.md`).

This is Tier A trust-root protection: pure path/glob classification of
the changed files. It is the cheap half of the reward-hacking guard — a
coding agent told to make CI pass can weaken the gate instead of fixing
the readiness issue, so touching a trust root must require human review.
The finding fires only when a verification context (changed files) is
supplied (`agents-shipgate scan --changed-files ...` or, later, `verify`);
a plain `scan` emits nothing. It is one ordinary `Finding` at `medium`
severity routed through `release_decision` — never a second verdict.
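
A minimal sketch of path/glob classification over changed files, using the trust roots listed above. The exact patterns Shipgate uses are not given in this document, so the glob list, the basename set, and the helper name are assumptions; in particular, the sketch anchors directory patterns at the repository root.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative patterns derived from the prose list above (assumed, not canonical).
TRUST_ROOT_GLOBS = [
    "shipgate.yaml",
    ".agents-shipgate/*",
    "policies/*",
    "prompts/*",
    ".github/workflows/agents-shipgate.yml",
    "AGENTS.md",
    "CLAUDE.md",
    ".claude/*",
    ".cursor/rules/*",
    ".agents/skills/*",
    ".codex/*",
    ".codex-plugin/*",
]
# Tool-surface declarations matched by file name wherever they live.
TRUST_ROOT_BASENAMES = {".app.json", ".mcp.json", "SKILL.md"}


def touched_trust_roots(changed_files: list[str]) -> list[str]:
    """Return the changed paths that match a trust-root pattern, sorted."""
    hits = []
    for path in changed_files:
        by_name = PurePosixPath(path).name in TRUST_ROOT_BASENAMES
        by_glob = any(fnmatchcase(path, glob) for glob in TRUST_ROOT_GLOBS)
        if by_name or by_glob:
            hits.append(path)
    return sorted(hits)
```

Note that `fnmatchcase` lets `*` match across `/`, so `.claude/*` also covers nested files.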

### SHIP-VERIFY-POLICY-WEAKENED

Tier B trust-root protection: instead of classifying *which* files
changed, it compares the normalized effective-policy snapshot of the base
report (supplied via `--diff-from`) against the head manifest and fires
when the gate moved toward *less* review or *less* blocking — CI mode
downgraded (e.g. `strict` → `advisory`), the fail-on severity set lost a
tier, or a check's severity override dropped across a tier boundary. The
comparison is semantic, not a text diff, so it is robust to reformatting.

When no base snapshot is available (no `--diff-from`, or a pre-v0.22 base)
but the PR touched a policy/manifest trust root, the check fails safe to a
single `medium` review-required finding rather than passing silently — a
reward-hacker must not be able to dodge review by breaking the base scan.
Category `verify` (suppression-immune, floor `high`); never a second
verdict.
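
To make the semantic comparison concrete, the sketch below compares two simplified policy snapshots. The snapshot keys (`ci_mode`, `fail_on`), the mode ranking, and the helper are hypothetical; the real snapshot format is not described in this document, and severity-override comparison is omitted.

```python
# Hypothetical ranking of CI modes; only strict -> advisory is named in the text.
MODE_RANK = {"advisory": 0, "strict": 1}


def weakening_reasons(base: dict, head: dict) -> list[str]:
    """List ways the head snapshot blocks less than the base snapshot."""
    reasons = []
    if MODE_RANK[head["ci_mode"]] < MODE_RANK[base["ci_mode"]]:
        reasons.append(f"ci_mode downgraded: {base['ci_mode']} -> {head['ci_mode']}")
    # A severity present in the base fail-on set but absent in head is a lost tier.
    lost = sorted(set(base["fail_on"]) - set(head["fail_on"]))
    if lost:
        reasons.append("fail_on lost: " + ", ".join(lost))
    return reasons


base = {"ci_mode": "strict", "fail_on": ["critical", "high"]}
head = {"ci_mode": "advisory", "fail_on": ["critical"]}
```

Because the comparison works on parsed values, reordering or reformatting the manifest does not change the result.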

### SHIP-VERIFY-BASELINE-OR-WAIVER-EXPANDED

Tier B: detects a PR that broadens what the gate forgives — a new entry in
`checks.ignore`, a widened waiver scope (e.g. one tool widened to `*`), or
a larger accepted-debt baseline — by a base-vs-head superset comparison of
the effective-policy snapshot. Suppressing or baselining a finding instead
of fixing it is a classic reward hack; this makes the expansion
release-visible. Requires a base snapshot (touching the files alone is
already covered by `SHIP-VERIFY-TRUST-ROOT-TOUCHED`). Category `verify`,
floor `high`.
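
The base-vs-head comparison for `checks.ignore` can be pictured as a set difference. This is a sketch only: `newly_forgiven` is a hypothetical helper, and waiver-scope widening and baseline growth would need their own comparisons.

```python
def newly_forgiven(base_ignore: list[str], head_ignore: list[str]) -> list[str]:
    """Return check IDs ignored in head that were not ignored in base."""
    return sorted(set(head_ignore) - set(base_ignore))
```

An empty result means head forgives nothing that base did not already forgive; removing entries never triggers the expansion signal in this sketch.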

### SHIP-VERIFY-CI-GATE-REMOVED

Tier B: fires when, in verify mode, a Shipgate CI workflow path
(`.github/workflows/agents-shipgate.yml`/`.yaml`) appears in the changed
files **and** that file no longer exists on disk — i.e. the PR deleted the
gate. Detectable without a base snapshot. Emitted at `critical` (floor
`high`): removing CI enforcement from an opted-in repo is the strongest
weakening signal in the family.
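
The two conditions (the path is in the changed files, and the file is gone from disk) can be sketched like this. The helper name is hypothetical; the two workflow paths come from the text above.

```python
from pathlib import Path

# Workflow paths named in the check description.
GATE_PATHS = (
    ".github/workflows/agents-shipgate.yml",
    ".github/workflows/agents-shipgate.yaml",
)


def removed_gate_paths(changed_files: list[str], repo_root: Path) -> list[str]:
    """Return gate workflow paths that changed and no longer exist under repo_root."""
    return [
        path
        for path in changed_files
        if path in GATE_PATHS and not (repo_root / path).exists()
    ]
```

No base snapshot is needed because both inputs are available at head: the changed-file list and the working tree.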

### SHIP-VERIFY-AGENT-INSTRUCTIONS-WEAKENED

Tier B: agent-instruction files (`AGENTS.md`, `CLAUDE.md`, `.claude/`,
`.cursor/rules/`, `.agents/skills/`, `.codex/`, `SKILL.md`) tell coding
agents how to behave around the gate. Shipgate is static and makes no NLP
judgement, so it cannot prove semantic weakening from text — per Principle
3 ("prompts are not controls"), any verify-mode change to these trust
roots is routed to human review at `medium`. Deterministic on changed-file
membership; the human confirms no gate-protecting instruction was removed.

### SHIP-VERIFY-TRIGGER-CATALOG-DRIFT

Tier B: the trigger catalog (`docs/triggers.json` or an
`.agents-shipgate` trigger config) decides *when* Shipgate runs. Editing
it can carve out paths so the gate stops firing — a gate-evasion one level
up from suppressing findings. Fires on changed-file membership in verify
mode at `medium`; the human confirms the change does not create a path
that evades the release gate.

Risk tags are hints, not findings by themselves. Checks consume tags with confidence thresholds.

Common tags:

- `read_only`
- `write`
- `destructive`
- `external_write`
- `financial_action`
- `customer_communication`
- `sensitive_data_access`
- `infrastructure_change`
- `code_execution`

Manual `risk_overrides` in `shipgate.yaml` are treated as high-confidence evidence. Use `remove_tags` to subtract heuristic tags that are known to be wrong for a specific tool.

## Listing Checks

Use the CLI to inspect the built-in catalog:

```bash
agents-shipgate list-checks
agents-shipgate list-checks --json
agents-shipgate explain SHIP-POLICY-APPROVAL-MISSING
```

The JSON catalog includes `mvp_tier` for display and triage:

| `mvp_tier` | Meaning |
|---|---|
| `core` | Core Tool-Use Readiness MVP signal. |
| `adapter` | Framework or provider-specific readiness signal. |
| `evidence` | Validation, trace, or HITL evidence signal. |
| `lifecycle` | Baseline, diff, or action-surface evolution signal. |
| `hygiene` | Useful quality or maintenance signal, not core MVP positioning. |

`mvp_tier` never changes check execution, severity, fingerprints, baselines,
`release_decision`, or CI exit behavior.

Third-party packages can register checks through the `agents_shipgate.checks` Python entry-point group. Plugins are disabled by default because loading them imports third-party Python modules. Set `AGENTS_SHIPGATE_ENABLE_PLUGINS=1` to opt in, or pass `--no-plugins` to force them off for a scan or catalog command. Reports include `loaded_plugins` provenance for every third-party check entry point Shipgate discovered, including ones that failed validation.

A plugin check should expose a callable with the same `ScanContext -> list[Finding]` shape as built-ins and attach `AGENTS_SHIPGATE_METADATA` as either a `CheckMetadata` instance or a compatible dictionary. Adapter artifacts are available through `context.framework_artifacts` or `context.artifact("openai_api", OpenAIApiArtifacts)`. Legacy `context.*_artifacts` read-only properties remain available for v0.11 plugin compatibility, raise `TypeError` on artifact type mismatch, and are scheduled for removal in v0.12.

**Plugin validation (v0.17+; six gates v0.18+).** Shipgate runs six load-time gates against every entry point — load, signature, metadata, dynamic-default-not-supported (v0.18+), ID-collision, and floor-consistency — before letting it produce findings. Metadata may use either `id` or `check_id` as the identifier key (the alias is symmetric with `Finding.check_id`); both names map to `CheckMetadata.id`. The `dynamic_default_not_supported` gate (v0.18+) rejects plugins declaring `AGENTS_SHIPGATE_METADATA.dynamic_default=True`: plugins have no path to wire into `core/dynamic_defaults.py:dynamic_check_defaults`, so a swing check would never receive a manifest-effective default and would be silently bypassable. This gate runs **before** `_coerce_metadata` so a plugin declaring `dynamic_default=True` without `floor_severity` lands here under a precise status rather than being mis-classified as `bad_floor`. Plugins that fail validation surface in `loaded_plugins[]` with a non-`valid` `validation_status` and human-readable `validation_errors`, and they do not run. At runtime, findings whose `check_id` does not match the declared plugin metadata are dropped and recorded under `loaded_plugins[].runtime_errors` — a plugin cannot smuggle findings under another check ID. Default behavior is lenient (record failures, continue scanning). Pass `--strict-plugins` to exit non-zero (code 4) when any plugin has a non-`valid` status or non-empty `runtime_errors`. See [STABILITY.md § Trust-model invariants](../STABILITY.md#trust-model-invariants) and [STABILITY.md § Severity-override floor](../STABILITY.md#severity-override-floor) (for the dynamic-default contract) for the full contracts.
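
The runtime rule that drops findings emitted under an undeclared check ID can be sketched as a simple partition. This is an illustration with plain dictionaries, not Shipgate's code: `partition_findings`, the error message wording, and the `ORG-DEMO-CHECK` ID are hypothetical.

```python
def partition_findings(
    declared_id: str, findings: list[dict]
) -> tuple[list[dict], list[str]]:
    """Split plugin findings into kept findings and runtime error messages."""
    kept, runtime_errors = [], []
    for finding in findings:
        if finding.get("check_id") == declared_id:
            kept.append(finding)
        else:
            # Mismatched IDs are dropped and recorded rather than reported.
            runtime_errors.append(
                f"dropped finding with undeclared check_id {finding.get('check_id')!r}"
            )
    return kept, runtime_errors
```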

## Declarative Policy Packs

v0.4 adds local YAML policy packs for organization-specific release rules.
Policy packs are static data and are safe to enable by default when declared in
`checks.policy_packs` or passed with `scan --policy-pack`. External rule IDs
must use a non-`SHIP-*` namespace such as `ORG-*`; `SHIP-*` is reserved for
built-in checks. Pack findings behave like built-ins for suppressions, severity
overrides, baselines, Markdown, JSON, and SARIF. Python plugins remain a
separate opt-in extension mechanism.

## OpenAI Agents SDK Static Extraction

SDK extraction is optional enrichment. Agents Shipgate detects Python functions decorated directly with `@function_tool`, `@function_tool(...)`, `@agents.function_tool`, `@openai_agents.function_tool`, or simple import aliases such as `from agents import function_tool as ft`, for example:

```python
@function_tool
def search_customer(customer_id: str) -> str:
    ...
```

When `tool_sources[].path` points at a directory, the extractor scans immediate
`*.py` files in sorted order; it does not recurse into nested packages. The
static extractor does not execute user code and intentionally does not detect
dynamic wrappers, factory-created tools, `Tool.from_fn()` style objects, runtime
imports, or dynamic tool lists. Declare those tools through MCP/OpenAPI inputs or
manifest metadata.
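
To show what static detection of this kind can look like, here is a minimal sketch using Python's standard `ast` module. It is not Shipgate's extractor: the helper names are hypothetical, it only inspects top-level functions in one source string, and it matches any attribute named `function_tool` rather than checking the module prefix.

```python
import ast

SOURCE = '''
from agents import function_tool
from agents import function_tool as ft

@ft
def search_customer(customer_id: str) -> str:
    ...

@function_tool()
def lookup_order(order_id: str) -> str:
    ...

def helper():
    ...
'''


def _decorator_name(node: ast.expr) -> str:
    """Return the bare name of a decorator, unwrapping calls and attributes."""
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return ""


def find_function_tools(source: str) -> list[str]:
    """Return names of top-level functions decorated with function_tool."""
    tree = ast.parse(source)  # parses only; user code is never executed
    aliases = {"function_tool"}
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name == "function_tool" and alias.asname:
                    aliases.add(alias.asname)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and any(_decorator_name(d) in aliases for d in node.decorator_list)
    ]
```

Because the source is parsed rather than imported, tools created by factories or at runtime never appear in the result, which matches the limitation described above.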

## Google ADK Static Extraction

Google ADK extraction is optional static enrichment. Agents Shipgate detects
Python `Agent` / `LlmAgent` definitions, literal function tools,
`FunctionTool`, `LongRunningFunctionTool`, `OpenAPIToolset`, `McpToolset`,
callbacks, plugins, sub-agents, and Agent Config YAML references where those
values are statically knowable.

The ADK extractor does not import user modules, run `adk`, connect to MCP
servers, fetch OpenAPI specs over the network, call tools, or call models.
Dynamic ADK toolsets produce source warnings and one ADK finding per unresolved
toolset unless explicit local MCP/OpenAPI/tool inventory inputs are provided.

## LangChain And CrewAI Static Extraction

LangChain/LangGraph and CrewAI extraction are optional static enrichment.
Agents Shipgate detects supported Python tool definitions, wrappers, agent
bindings, and local inventory files where those values are statically knowable.
CrewAI `BaseTool` class metadata may use literal strings or Pydantic-style
`Field(default="...")` assignments for `name` and `description`.

The extractors do not import user modules, import framework packages, run
agents, run graphs, run crews, connect to MCP servers, fetch specs over the
network, call tools, call models, or execute framework subprocesses. Dynamic
tool surfaces produce source warnings and framework findings unless explicit
local tool inventory inputs are provided. CrewAI prebuilt `crewai_tools.*Tool()`
references are emitted as low-confidence stubs and warnings; they do not by
themselves produce the dynamic-tools finding.

## n8n Static Extraction

n8n extraction reads only local workflow JSON exports/source-control files and
optional local stubs or evidence artifacts declared under `n8n:`. It does not
call a live n8n instance, run `n8n`, execute workflows, decrypt credentials,
connect to MCP endpoints, execute code nodes, or fetch network resources.

The adapter enumerates AI Agent tool sub-nodes, MCP Client Tool selections,
MCP Server Trigger exposed tools, Call n8n Workflow Tool entrypoints, Custom
Code Tool nodes, HTTP Request Tool nodes, and explicit inventories when those
surfaces are statically visible. Workflow triggers such as Webhook and Chat
Trigger are recorded as ingress evidence, not as tools.

Inactive workflows (`active: false`) are recorded as workflow evidence but are
not normalized as live tool or ingress surfaces; their workflow JSON is still
scanned for secret-like values. Workflow tags, error-workflow settings, and
node execution controls such as retry/continue-on-fail are preserved as
review metadata when present.

Credential names, workflow/node names, code bodies, request bodies, headers,
pinned data, static data, node notes, variable values, execution payloads, and
detected secrets are redacted or omitted from reports. Credential types and
credential IDs may be preserved as local release evidence.


<!-- ===== source: docs/concepts.md ===== -->

# Concepts

The mental model behind Agents Shipgate, the deterministic merge gate for
AI-generated agent capability changes — a local-first, static Tool-Use
Readiness review.

For the product-level definition of a Tool-Use Readiness release gate, see
[`category.md`](category.md). For the agent-facing
walkthrough, see [`AGENTS.md`](../AGENTS.md).

## Tool-use readiness

**Tool-use readiness** is the static check that an agent's tool surface
is ready for promotion. It is *not* "did the tool call succeed" (a
runtime concern) or "did the model pick the right tool" (an eval
concern). It is the question a release reviewer answers at PR time:

> Given the tool surface declared in this PR, do we have explicit
> approval policies, scope coverage, idempotency evidence, and review
> readiness for every action — *before* promotion?

Tool-use readiness has seven dimensions. agents-shipgate produces
findings against each one.

| Dimension | What it asks | Evidence in the manifest |
|---|---|---|
| **Inventory** | What tools can the agent call? | A complete, named list — no wildcards, no "whatever this MCP server returns" |
| **Schema** | What inputs does each tool accept? | Strict JSON schema — `additionalProperties: false`, complete `required`, bounded numeric fields |
| **Auth** | What scopes does each tool need? | Declared per-tool or in `permissions.scopes` — narrower than the service account's actual scopes |
| **Approval** | Who reviews destructive actions before they fire? | `policies.require_approval_for_tools: [...]` for every write/destructive/financial action |
| **Side effects** | What does this tool change in the world? | Risk tags on the tool: `write`, `destructive`, `external_write`, `financial_action`, `customer_communication` |
| **Idempotency** | Can it be retried safely? | Idempotency key in the schema, documented retry policy, or explicit "do not retry" |
| **Blast radius** | If this tool fires unexpectedly, how bad is it? | Owner declared, prohibited actions enumerated, scope of resources bounded |

## Tool surface

The **tool surface** is the set of named, schemaed actions an agent can
invoke at runtime. It is declared via:

- Model Context Protocol (MCP) exports
- OpenAPI specs
- Framework-specific code (OpenAI Agents SDK Python, Google ADK, LangChain/LangGraph, CrewAI)
- API-specific artifacts (Anthropic Messages API `tools.json`, OpenAI API
  function schemas)

The tool surface is a **release artifact** in the same sense as a
service deployment's binary or an API contract: it's a checked-in,
diff-able statement of what the agent can do, and it should be reviewed
on every PR.

## Manifest-first

agents-shipgate is **manifest-first**: the canonical claim about an
agent's surface lives in a single `shipgate.yaml` checked into the
repo. Every tool source the manifest references is reviewed at scan
time. There is one place to look for "what does this agent ship with."

This is intentional. Implicit configurations (e.g. "use whatever the
MCP registry returns") fail the inventory dimension above. The manifest
is what makes the release gate reviewable.

## Static vs dynamic

agents-shipgate is **static**. It does not run the agent, invoke the
model, call MCP servers, or make any network calls by default. Every
finding is derived from the artifact diff alone.

Static analysis covers the Tool-Use Readiness release slice. Dynamic concerns —
behavior under unusual inputs, runtime tool routing, latency,
hallucination — belong in evals, observability, and runtime guardrails.
agents-shipgate is additive to those, not a replacement.

## Where this fits in the wider stack

| Guard | When it runs | What it catches |
|---|---|---|
| Tests | CI on every PR | Code paths in the agent's *code* |
| Evals | On a schedule or per release | Model behavior on curated inputs |
| **agents-shipgate** | CI on every PR | Tool surface, scopes, policies, prompt/surface alignment |
| Runtime guardrails / gateway | At call time | Per-call policy enforcement |
| Observability | Runtime | What actually happened in production |

Each catches something the others can't. Removing any of them is a
regression.

## Related reading

- [`category.md`](category.md) — the product-level "what is an agent release gate"
- [`checks.md`](checks.md) — every check the scanner runs
- [`manifest-v0.1.md`](manifest-v0.1.md) — full manifest schema
- [`trust-model.md`](trust-model.md) — local-only guarantees and disclosure process
- [`glossary.md`](glossary.md) — category vocabulary


<!-- ===== source: docs/autofix-policy.md ===== -->

# Autofix policy

Which Agents Shipgate findings are safe to apply automatically, which
need human review, and how the per-finding metadata in `report.json`
maps to `apply-patches --confidence` flag semantics.

> **Audience.** AI coding agents driving verify-first PR checks or first-adoption helper flows
> (see [`agent-recipes.md`](agent-recipes.md)) and CI integrators
> deciding what to gate on.

---

## The four classes

Every active finding falls into one of four classes. The class is
encoded by the `autofix_safe` and `requires_human_review` fields on
each Finding, plus the `kind` and `confidence` fields on each
attached Patch.

| Class | Finding fields | Patch shape | v0.7 examples |
|---|---|---|---|
| **Safe auto-fix** | `autofix_safe: true`, `requires_human_review: false` | All patches non-manual AND high confidence | The 3 stale-manifest removals (`SHIP-MANIFEST-STALE-{SUPPRESSION,POLICY,RISK-OVERRIDE}`) when the match is unique |
| **Medium-confidence config fix** | `autofix_safe: false`, `requires_human_review: true`, `suggested_patch_kind: append_pointer/set_pointer` | Non-manual patch but at `medium` confidence | `SHIP-AUTH-SCOPE-COVERAGE-MISSING` scope appends |
| **Manual source/policy fix** | `autofix_safe: false`, `requires_human_review: true`, `suggested_patch_kind: manual` | `ManualPatch` with curated `instructions` | All other ~30 active checks (documentation, schema bounds, owner gaps, ADK/LangChain/CrewAI metadata, …) |
| **Never auto-fix** | `autofix_safe: false`, `requires_human_review: true`, `suggested_patch_kind: manual` | `ManualPatch` with explicit anti-pattern language | `SHIP-API-TRACE-{APPROVAL,CONFIRMATION}-MISSING` (flipping the trace patches the *evidence*, not the agent's runtime gate) |

Class four is a deliberate subset of class three — the distinction is
that an agent must NEVER attempt to "auto-fix" a trace finding by
editing the trace recording, even if the user asks. The
`ManualPatch.instructions` for these checks spell out the
anti-pattern in prose so even a curious operator gets the message.
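
As a rough illustration of the table above, the sketch below sorts a finding into one of the four classes. It assumes a finding is a plain dict carrying `check_id`, `autofix_safe`, and `suggested_patch_kind`; the `classify` helper and the class labels are hypothetical, and the never-auto-fix set contains only the two trace check IDs named in the table, which may not be exhaustive.

```python
# Hypothetical helper: only the two trace checks named in the table are listed.
NEVER_AUTOFIX_IDS = {
    "SHIP-API-TRACE-APPROVAL-MISSING",
    "SHIP-API-TRACE-CONFIRMATION-MISSING",
}


def classify(finding: dict) -> str:
    """Map a finding dict to one of the four classes described above."""
    if finding["autofix_safe"]:
        return "safe_auto_fix"
    if finding["suggested_patch_kind"] in ("append_pointer", "set_pointer"):
        return "medium_confidence_config_fix"
    if finding["check_id"] in NEVER_AUTOFIX_IDS:
        # Subset of the manual class: never edit the trace recording.
        return "never_auto_fix"
    return "manual_fix"
```

Checking `autofix_safe` first keeps the helper aligned with the per-Finding fields rather than with the check ID alone.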

---

## Catalog vs. Finding (the dual-source contract)

Two sources describe per-check remediation policy, and they answer
different questions:

| Source | Endpoint | What it answers |
|---|---|---|
| **CheckMetadata** | `agents-shipgate list-checks --json`, `agents-shipgate explain <ID> --json`, `docs/checks.json` | What an agent should *assume* when it has only the catalog and no scan output. Conservative across the board. |
| **Finding** | `agents-shipgate-reports/report.json` (per-finding) | What this *specific* instance produced. Can be more permissive than the catalog when the generator emitted clean high-confidence patches. |

**Catalog `autofix_safe` and `requires_human_review` describe the
worst-case per-check outcome.** A check whose generator USUALLY emits
a safe non-manual patch but falls back to `ManualPatch` in edge
cases (e.g. ambiguous duplicate matches in the stale-manifest
generators) keeps the safe-closed defaults at the catalog level. The
per-Finding fields tell the truth for that instance.

`suggested_patch_kind` at the catalog level is **informational** —
it documents the kind the generator *targets* when conditions are
clean, not what the report carries. An agent that sees
`suggested_patch_kind: "remove_pointer"` in `list-checks --json`
should still consult `Finding.patches` (or the per-Finding
`suggested_patch_kind`) to know whether this particular instance
actually produced one.

When in doubt, **trust the per-Finding fields over the catalog**
for any specific finding. The catalog is for static planning
("which check IDs *might* yield safe fixes"); the report is for
acting on a specific scan.

---

## Strict derivation rule

When a scan runs with `--suggest-patches`, every active finding
gets a `patches` list. When that list is non-empty, the four
per-Finding fields (`autofix_safe`, `requires_human_review`,
`suggested_patch_kind`, `docs_url`) are derived as follows:

```text
autofix_safe = True iff EVERY patch is non-manual AND has confidence == "high"
```

That is: a single `ManualPatch` mixed in, or a single `medium`/`low`
confidence patch mixed in, drops the entire finding to safe-closed.
The earlier "at least one safe patch wins" rule was unsafe — it
would have marked a `[high_remove, manual]` combination
auto-fixable while a ManualPatch still required review.

`suggested_patch_kind` is the kind of the **first non-manual patch**
even when ManualPatches are also present. (If ALL patches are
manual: `"manual"`. If the patches list is empty: `"none"`.)

`requires_human_review` is always the inverse of `autofix_safe`.

`docs_url` always comes from `CheckMetadata.docs_url`. Patches
don't carry per-instance documentation URLs.
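
The derivation above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the `Patch` dataclass and `derive_fields` function are hypothetical stand-ins, and the sketch assumes manual patches are identified by `kind == "manual"`. Note that an empty list is treated as safe-closed rather than vacuously safe, matching the `"none"` case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patch:
    """Hypothetical stand-in for a report patch; only two fields are modeled."""
    kind: str        # e.g. "remove_pointer", "append_pointer", "manual"
    confidence: str  # "high", "medium", or "low"


def derive_fields(patches: List[Patch]) -> dict:
    """Derive per-Finding fields from a (possibly empty) patches list."""
    # Strict rule: EVERY patch must be non-manual AND high confidence.
    # bool(patches) keeps the empty list safe-closed.
    autofix_safe = bool(patches) and all(
        p.kind != "manual" and p.confidence == "high" for p in patches
    )
    non_manual = [p for p in patches if p.kind != "manual"]
    if not patches:
        kind = "none"
    elif non_manual:
        kind = non_manual[0].kind  # first non-manual patch wins
    else:
        kind = "manual"
    return {
        "autofix_safe": autofix_safe,
        "requires_human_review": not autofix_safe,
        "suggested_patch_kind": kind,
    }
```

For example, `derive_fields([Patch("remove_pointer", "high"), Patch("manual", "high")])` reports `suggested_patch_kind` as `"remove_pointer"` but leaves `autofix_safe` false, which is the `[high_remove, manual]` case described above.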

### Catalog-driven escalation override

The strict derivation rule above can be **forced safe-closed** by a
per-check policy flag. When `CheckMetadata.requires_human_review_regardless_of_patch`
is `True`, `annotate_remediation` sets `autofix_safe=False` and
`requires_human_review=True` regardless of the per-patch derivation,
so `agent_action` lands at `propose_patch_for_review` (the patch is
still surfaced) instead of `auto_apply`. This catches the
approval/confirmation/idempotency, broad-scope, prohibited-action,
runtime-trace and HITL-evidence categories listed in
[`agent-autofix-boundary.md`](agent-autofix-boundary.md) §"Check-ID
mapping"; even a third-party patch generator emitting a clean
high-confidence non-manual patch on one of those check IDs cannot
auto-apply. The catalog is the contract: those check IDs always
escalate, regardless of how the patches were derived.
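
A small sketch of how the override layers on top of already-derived fields, assuming a finding that carries a non-manual patch. The function name `annotate_with_override` and its `always_review` parameter are hypothetical (the parameter stands in for `CheckMetadata.requires_human_review_regardless_of_patch`), and only the two `agent_action` values named above are modeled.

```python
def annotate_with_override(derived: dict, always_review: bool) -> dict:
    """Apply the catalog escalation flag to per-patch derived fields."""
    result = dict(derived)  # leave the caller's dict untouched
    if always_review:
        # Forced safe-closed, regardless of how clean the patches are.
        result["autofix_safe"] = False
        result["requires_human_review"] = True
    # The patch itself stays attached; only the action changes.
    result["agent_action"] = (
        "auto_apply" if result["autofix_safe"] else "propose_patch_for_review"
    )
    return result
```

The point of the design is visible in the last assignment: escalation changes what the agent may do with the patch, not whether the patch is surfaced.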

### Three patch states

| `Finding.patches` | Source of derived fields |
|---|---|
| `None` (scan ran without `--suggest-patches`) | CheckMetadata, with safe-closed fallback for unknown check IDs |
| `[]` (scan ran WITH `--suggest-patches` but generator emitted nothing) | Safe-closed shape, `suggested_patch_kind: "none"`. Does NOT fall back to catalog — the report carries no patches, so reporting a catalog-level kind would mislead. |
| Non-empty | Strict derivation rule above |
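
The three states can be told apart with a simple dispatch on `Finding.patches`. The sketch below is illustrative only: `field_source` and its return labels are hypothetical, and `in_catalog` stands in for whether the finding's `check_id` is present in the loaded catalog.

```python
from typing import Optional


def field_source(patches: Optional[list], in_catalog: bool) -> str:
    """Name the source of a finding's derived remediation fields."""
    if patches is None:
        # Scan ran without --suggest-patches.
        return "catalog" if in_catalog else "safe_closed_fallback"
    if len(patches) == 0:
        # Generator emitted nothing: safe-closed with kind "none",
        # and no fallback to the catalog-level kind.
        return "safe_closed_none"
    return "strict_derivation"
```

Testing `patches is None` before the length check matters, because `None` and `[]` are both falsy in Python yet mean different things here.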

### Unknown check IDs (policy packs and third-party plugins)

A finding whose `check_id` isn't in the loaded catalog (for example,
one from a policy pack rule, or from a third-party plugin while
plugins are disabled) gets the safe-closed fallback when patches are
absent:

```text
autofix_safe: false
requires_human_review: true
suggested_patch_kind: "manual"
docs_url: null
```

The fallback only applies when patches are absent. A high-confidence
non-manual patch from a policy pack still derives correctly.

---

## How `apply-patches --confidence` filters

`apply-patches` reads the report, filters patches by `--confidence`
and `--kinds`, and applies the survivors. Default flags:

```bash
agents-shipgate apply-patches \
    --from agents-shipgate-reports/report.json \
    --confidence high \
    --kinds set_pointer,append_pointer,remove_pointer \
    --apply
```

| Flag | Default | What it accepts |
|---|---|---|
| `--confidence` | `high` | Minimum patch confidence. Patches below this are skipped. |
| `--kinds` | `set_pointer,append_pointer,remove_pointer` | Patch kinds to include. ManualPatch is filtered out unconditionally — even with `--kinds manual`. |
| `--apply` | (off) | Without this, dry-run only. Always preview before mutating. |

So in v0.7 with the default flags:

- The 3 stale-manifest removals (when unambiguous) auto-apply.
- `SHIP-AUTH-SCOPE-COVERAGE-MISSING` scope appends are **skipped**
  (medium confidence). Pass `--confidence medium` to opt in — but
  read the appended scopes before merging, since adding scopes can
  encode policy choices.
- Trace approval/confirmation findings are **never** applied —
  ManualPatch is filtered out.
- Everything else with a ManualPatch is **never** applied.
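
The filtering behavior described above can be approximated as follows. This is a sketch under assumptions, not the CLI's code: patches are modeled as dicts with `kind` and `confidence` keys, `select_patches` is a hypothetical name, and the `low < medium < high` ordering is assumed from the phrase "minimum patch confidence".

```python
# Assumed ordering of confidence levels, lowest to highest.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}
DEFAULT_KINDS = frozenset({"set_pointer", "append_pointer", "remove_pointer"})


def select_patches(patches, min_confidence="high", kinds=DEFAULT_KINDS):
    """Return the patches that would survive the confidence and kind filters."""
    floor = CONFIDENCE_RANK[min_confidence]
    return [
        p for p in patches
        if p["kind"] != "manual"  # manual is dropped even if listed in kinds
        and p["kind"] in kinds
        and CONFIDENCE_RANK[p["confidence"]] >= floor
    ]
```

With the defaults, a medium-confidence `append_pointer` patch is skipped; passing `min_confidence="medium"` opts it in, mirroring `--confidence medium`.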

`apply-patches` enforces a **containment check**: every patch's
`target_file` must resolve under `report.manifest_dir`. Anything
outside aborts with exit code 5 before any SHA verification.
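
A containment check of this kind is commonly written with `pathlib`. The sketch below is one plausible shape, not the tool's actual code: `is_contained` is a hypothetical helper, and it assumes a relative `target_file` is interpreted relative to the manifest directory.

```python
from pathlib import Path


def is_contained(target_file: str, manifest_dir: str) -> bool:
    """Return True if target_file resolves to a path under manifest_dir."""
    root = Path(manifest_dir).resolve()
    # Joining with an absolute target_file yields that absolute path,
    # so absolute paths outside the root are rejected as well.
    target = (root / target_file).resolve()
    return target == root or root in target.parents
```

Resolving both paths before comparing is what defeats `../` traversal and symlinked prefixes; a caller following the behavior described above would abort with exit code 5 when this returns `False`.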

---

## Decision tree for agents

When walking `findings[]` from a `--suggest-patches` report:

```text
for finding in active_findings:
    if finding.suggested_patch_kind == "manual":
        # Manual source/policy fix or never-auto-fix.
        # Read finding.patches[0].instructions and surface to user.
        # Do NOT attempt to auto-edit, especially for trace findings.
        surface_to_user(finding)
        continue

    if finding.suggested_patch_kind == "none":
        # Scan ran with --suggest-patches but the generator emitted
        # nothing for this finding (empty patches list — see "Three
        # patch states" above). There's nothing to apply via
        # apply-patches at any confidence level. Surface for human
        # triage instead.
        surface_to_user(finding)
        continue

    if finding.autofix_safe is True:
        # Safe to include in the next `apply-patches --confidence high`.
        plan_to_apply(finding)
        continue

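    # Non-manual patch that is not auto-fix safe: medium/low confidence
    # (e.g. scope coverage), mixed with a ManualPatch, or escalated by
    # the catalog override. Surface for review (for medium confidence:
    # "review and run apply-patches --confidence medium") but do not
    # auto-apply on the high-confidence path.
    surface_for_medium_review(finding)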
```
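
For agents that prefer something executable, the tree can be expressed as a small Python function. This is a sketch: `triage` and its bucket names are hypothetical, findings are modeled as dicts with `check_id`, `autofix_safe`, and `suggested_patch_kind`, and the caller is assumed to have already filtered to active findings.

```python
def triage(active_findings):
    """Bucket findings by what an agent may do with them."""
    plan = {"apply_high": [], "review_patch": [], "surface": []}
    for finding in active_findings:
        kind = finding["suggested_patch_kind"]
        if kind in ("manual", "none"):
            # Nothing apply-patches can run; hand to a human.
            plan["surface"].append(finding["check_id"])
        elif finding["autofix_safe"] is True:
            # Eligible for `apply-patches --confidence high`.
            plan["apply_high"].append(finding["check_id"])
        else:
            # Non-manual patch that still needs review before applying.
            plan["review_patch"].append(finding["check_id"])
    return plan
```

Only the `apply_high` bucket should feed an unattended `apply-patches` run; the other two buckets are reporting output.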

After running `apply-patches --apply`, re-run `scan` to confirm the
fixed findings are gone. The `run_id` will only change if the
manifest or tool surface actually changed — patches are excluded
from the hash so toggling `--suggest-patches` doesn't shift it.

---

## See also

- [`agent-autofix-boundary.md`](agent-autofix-boundary.md) — the
  *behavioral* counterpart to this *mechanical* page. What an agent may
  assert in a PR comment or review summary, beyond which patches
  `apply-patches` will run.
- [`agent-recipes.md`](agent-recipes.md) — copy-pasteable AI-agent
  workflows, including the soft-stop rule for `detect`.
- [`report-reading-for-agents.md`](report-reading-for-agents.md) —
  reader's primer for `report.json`.
- [`checks.md`](checks.md) — full check catalog with rationale.
- [`minimal-real-configs.md`](minimal-real-configs.md) — per-framework
  minimal manifests to build from.
- [`report-schema.v0.22.json`](report-schema.v0.22.json) — current JSON
  Schema for `report.json`.
- [`AGENTS.md`](../AGENTS.md) — top-level agent instructions, install,
  trigger table.
- [`STABILITY.md`](../STABILITY.md) — what won't break across `0.x`.
