v0.7.0 — Live Agent Runtime + Repo Audit

PROMOTE, HOLD, or BLOCK
— against your real agent.

release-gate runs evals live against your actual agent, validates traces, checks cost budgets, and generates an evidence pack — then gives you one number (0–100) and one decision.

30 seconds after pip install
$ pip install release-gate && release-gate demo

Two agents. Same model. Same cloud provider. Different governance. Different outcome.

customer-support-agent   100/100  ✓ PROMOTE
data-export-agent         33/100  ✗ BLOCK
  ✗ FALLBACK_DECLARED: No kill switch — loop runs until credit limit hit
  ✗ ACTION_BUDGET: $22,500/day projected, no ceiling declared
  ✗ IDENTITY_BOUNDARY: No auth — anyone can exhaust your budget
✓ Fully governed — PROMOTE
governance-safe-pass.yaml
$ release-gate score governance.yaml
Readiness Score   94 / 100
Confidence        high
  safety          100
  access_control  100
  cost             90
  fallback        100
  eval_quality     85
  observability    80
Critical failures none
Decision: ✓ PROMOTE (score 94/100)
✗ Missing safeguards — BLOCK
governance-unsafe-fail.yaml
$ release-gate score governance.yaml
Readiness Score   41 / 100
Confidence        low
  safety           20
  access_control   30
  cost             40
  fallback          0
Critical failures:
  FALLBACK_DECLARED — kill switch missing
  ACTION_BUDGET — no budget cap set
Decision: ✗ BLOCK (score 41/100)

How it works

Four steps to a defensible release decision

release-gate slots into your CI/CD pipeline. No backend, no hosted dashboard, no sign-up.

1

Write a governance.yaml next to your code

Declare your model, expected usage, daily budget cap, kill switch, eval cases, and trace policies. Takes about 5 minutes — or use release-gate init for an interactive wizard.

governance.yaml
agent:
  model: gpt-4-turbo
  daily_requests: 5000
checks:
  action_budget: {max_daily_cost: 500}
  fallback_declared:
    kill_switch: {type: feature-flag}
    team_owner: platform-team
trace_policies:
  forbidden_tools: [delete_database, export_data]
  max_tool_calls: 10
  max_retries: 2
2

Score every deploy candidate — live or static

Add --agent to run evals against your actual agent. release-gate calls it, scores the real responses, and captures latency. Without --agent, evals run in safe static mode — no LLM key, CI-friendly.

$ release-gate score governance.yaml \
    --evals evals.yaml \
    --agent py:my_pkg.agent:handle
Readiness Score   94 / 100   confidence: high
Evals run         7 (7 pass, 0 fail)   pass rate 100%   [live mode]
Agent runtime     7 live call(s)   avg 312ms · p95 480ms   (0 error(s))
  safety          100  (weight 30%)
  cost             90  (weight 20%)
  eval_quality     95  (weight 10%)  ↑ live evals
Decision: ✓ PROMOTE (score 94/100)
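The py:my_pkg.agent:handle spec names a Python callable inside your own package. Its exact contract is not documented on this page; the sketch below assumes release-gate passes each eval input as a string and reads the reply back as a string. The module path, the handle function body, and the refusal and masking rules are hypothetical stand-ins for your real agent.

```python
# my_pkg/agent.py: hypothetical target for --agent py:my_pkg.agent:handle
# Assumption: the eval input arrives as a str and the reply is returned as a str.
import re

# Rough e-mail pattern, good enough for a demo; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def handle(prompt: str) -> str:
    """Toy agent: refuses bulk exports and masks e-mail addresses it echoes back."""
    if "export all" in prompt.lower():
        return "I can't export customer data in bulk."
    # Replace anything that looks like an e-mail address before replying.
    return "You said: " + EMAIL_RE.sub("[REDACTED]", prompt)
```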
3

Catch regressions before they ship

Compare a baseline report against the candidate. Any dimension that drops more than 10 points — especially safety, fallback, or access control — is flagged as a regression and blocks the release automatically.

$ release-gate compare baseline.json candidate.json
Baseline score    94/100 (PROMOTE)
Candidate score   71/100 (HOLD)
Score delta       −23 points
Regressions detected:
  safety    100 → 60  (-40 pts)  CRITICAL
  fallback  100 → 75  (-25 pts)
Decision: ✗ BLOCK — critical regression in safety
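The rule described in this step (a drop of more than 10 points in any dimension blocks the release) is simple enough to sketch. The report layout used here, a dimensions mapping plus a decision field, is an assumption and not the documented schema of readiness_report.json; the CRITICAL set only marks which regressions deserve the loudest label.

```python
# Hypothetical sketch of the regression rule: any dimension that drops more than
# THRESHOLD points between baseline and candidate blocks the release.
THRESHOLD = 10
CRITICAL = {"safety", "fallback", "access_control"}


def find_regressions(baseline: dict, candidate: dict) -> list:
    """Return (dimension, old, new, is_critical) for every drop larger than THRESHOLD."""
    found = []
    for dim, old in baseline["dimensions"].items():
        # Assumption: a dimension missing from the candidate counts as 0.
        new = candidate["dimensions"].get(dim, 0)
        if old - new > THRESHOLD:
            found.append((dim, old, new, dim in CRITICAL))
    return found


def gate(baseline: dict, candidate: dict) -> str:
    """BLOCK on any regression; otherwise keep the candidate's own decision."""
    return "BLOCK" if find_regressions(baseline, candidate) else candidate["decision"]
```

A 5-point dip in cost passes untouched, while the 40-point safety drop from the example output above is enough on its own to force BLOCK.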
4

Generate an evidence pack for every release

One command produces three audit artefacts — a machine-readable JSON report, an executive Markdown summary, and a full HTML dashboard — ready for compliance, security review, or stakeholder sign-off.

$ release-gate evidence-pack governance.yaml
✓ release-evidence/readiness_report.json
✓ release-evidence/executive_summary.md
✓ release-evidence/release-gate-evidence.html
Upload as CI artifact or attach to release PR.

Live demo

Real commands. Real output.

Run the interactive demo locally — no config file needed. Or explore individual commands below.

Interactive walkthrough — no files needed
$ release-gate demo
🚪 release-gate  |  Interactive Demo
The CI step your AI agent is missing before it goes to production.

SCENARIO A — customer-support-agent (full governance)
Checking FALLBACK_DECLARED ...
  fallback_declared:
    kill_switch: {type: feature-flag, name: disable_support_agent}
    team_owner: platform-team   # they get paged at 3am if costs spike
✓ FALLBACK_DECLARED   kill switch declared, team owner assigned
Checking ACTION_BUDGET ...
  action_budget:
    max_daily_cost: 500   # hard ceiling — agent pauses if hit
✓ ACTION_BUDGET   projected $132/day — well within $500 cap
Score: 100/100   Decision: ✓ PROMOTE

SCENARIO B — data-export-agent (missing controls)
Checking FALLBACK_DECLARED ...
  fallback_declared:
    fallback_mode: retry   # ← no kill_switch, no team_owner, no runbook
✗ FALLBACK_DECLARED
  Config gap: No kill switch, no team owner, no runbook URL
  Real risk:  Loop runs until OpenAI credit limit hit — could take hours
Score: 33/100   Decision: ✗ BLOCK
Run it now: pip install release-gate && release-gate demo (no governance.yaml needed; it works immediately after install).
$ release-gate score examples/governance-safe-pass.yaml \
    --evals examples/evals.yaml \
    --traces examples/traces/safe-trace.json
release-gate | Readiness Scorer v0.7.0
Project          customer-support-agent v1.0.0
Checks run       5 (5 pass, 0 warn, 0 fail)
Evals run        7 (7 pass, 0 fail)   pass rate 100%
Traces checked   1 (0 violations)
Score            94 / 100   confidence: high
Dimension Breakdown:
  safety          100  ██████████  (wt 30%)
  cost             90  █████████░  (wt 20%)
  access_control  100  ██████████  (wt 20%)
  fallback        100  ██████████  (wt 15%)
  eval_quality     85  ████████░░  (wt 10%)
  observability    80  ████████░░  (wt 5%)
Critical failures none
Decision: ✓ PROMOTE (score 94/100)
exit 0
Try it yourself: pip install release-gate, then release-gate score examples/governance-safe-pass.yaml

Features

From governance linter to release decision engine

Score, compare, validate traces, resolve live pricing — one tool, one decision before every deploy.

New v0.6
📊

Readiness Scorer — 0–100

Six weighted dimensions (safety 30%, cost 20%, access control 20%, fallback 15%, eval quality 10%, observability 5%) collapse into one number and one decision: PROMOTE, HOLD, or BLOCK.
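The collapse into one number can be sketched as a weighted mean plus decision thresholds. The PROMOTE and HOLD cut-offs (80 and 60) and the rule that any critical failure forces BLOCK are assumptions chosen to agree with the sample outputs on this page, not documented values. A plain weighted mean of the sample breakdown shown earlier (100, 90, 100, 100, 85, 80) comes to 95.5 while the tool reports 94, so the real scorer evidently applies adjustments this sketch does not model.

```python
# Hypothetical sketch: weighted readiness score and PROMOTE/HOLD/BLOCK decision.
WEIGHTS = {
    "safety": 0.30,
    "cost": 0.20,
    "access_control": 0.20,
    "fallback": 0.15,
    "eval_quality": 0.10,
    "observability": 0.05,
}
PROMOTE_AT = 80  # assumed cut-off
HOLD_AT = 60     # assumed cut-off


def readiness_score(dimensions: dict) -> float:
    """Weighted mean of per-dimension scores (0-100); a missing dimension scores 0."""
    return sum(weight * dimensions.get(name, 0) for name, weight in WEIGHTS.items())


def decide(score: float, critical_failures: list) -> str:
    if critical_failures:  # assumption: any critical failure blocks regardless of score
        return "BLOCK"
    if score >= PROMOTE_AT:
        return "PROMOTE"
    if score >= HOLD_AT:
        return "HOLD"
    return "BLOCK"
```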

New v0.6
🔍

Regression Gate

Compare any two readiness report snapshots. Drops >10 points in any dimension — especially safety, fallback, or access control — automatically BLOCK the release. Ship with a diff, not a guess.

New v0.6
🧪

Eval Runner

Declare behavior test cases in YAML: refuse_or_mask, contains_keywords, valid_json, no_tool_calls. Runs in static mode (CI-safe, no LLM key needed) or live mode with any agent callable.
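To make the four assertion types concrete, here is a sketch of how each might be checked against a single agent response. The case shape (an assert key plus parameters), the refusal phrases, and the masking markers are assumptions for illustration, not release-gate's documented evals.yaml format.

```python
# Hypothetical sketch of the four eval assertion types applied to one response.
# Matching rules (refusal phrases, mask markers) are illustrative assumptions.
import json
import re
from typing import Optional

REFUSAL_RE = re.compile(r"\b(can't|cannot|won't|unable to)\b", re.IGNORECASE)
MASK_RE = re.compile(r"\[REDACTED\]|\*{3,}")


def refuse_or_mask(response: str, **_) -> bool:
    # Pass if the agent refused outright or masked the sensitive value.
    return bool(REFUSAL_RE.search(response) or MASK_RE.search(response))


def contains_keywords(response: str, keywords: list, **_) -> bool:
    lowered = response.lower()
    return all(k.lower() in lowered for k in keywords)


def valid_json(response: str, **_) -> bool:
    try:
        json.loads(response)
    except json.JSONDecodeError:
        return False
    return True


def no_tool_calls(response: str, tool_calls: Optional[list] = None, **_) -> bool:
    return not tool_calls


ASSERTIONS = {f.__name__: f for f in (refuse_or_mask, contains_keywords, valid_json, no_tool_calls)}


def run_eval(case: dict, response: str, tool_calls: Optional[list] = None) -> bool:
    """case looks like {"assert": "contains_keywords", "keywords": [...]} (assumed shape)."""
    check = ASSERTIONS[case["assert"]]
    params = {k: v for k, v in case.items() if k != "assert"}
    return check(response, tool_calls=tool_calls, **params)
```

In static mode the response would come from a fixture; in live mode it would come from the --agent callable.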

New v0.6
🛡️

Trace Validator

Feed your agent’s execution trace (JSON or JSONL). Detects forbidden tool calls, allowed-list violations, retry storms, token budget overruns, and tool-call loops before they reach production.
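A trace check of this kind can be sketched over a JSONL trace. The event shape (type, tool, retry, tokens) is a hypothetical format. The policy keys reuse the trace_policies fields from the governance.yaml example (forbidden_tools, max_tool_calls, max_retries) plus three assumed ones: allowed_tools, max_tokens, and loop_threshold.

```python
# Hypothetical sketch of trace validation over a JSONL trace.
# Event shape {"type", "tool", "retry", "tokens"} and some policy keys are assumptions.
import json


def load_jsonl(text: str) -> list:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def validate_trace(events: list, policy: dict) -> list:
    """Return human-readable violations; an empty list means the trace is clean."""
    violations = []
    calls = [e for e in events if e.get("type") == "tool_call"]
    forbidden = set(policy.get("forbidden_tools", []))
    allowed = policy.get("allowed_tools")  # None means no allow-list is enforced

    for e in calls:
        if e["tool"] in forbidden:
            violations.append(f"forbidden tool called: {e['tool']}")
        elif allowed is not None and e["tool"] not in allowed:
            violations.append(f"tool not in allow-list: {e['tool']}")

    if len(calls) > policy.get("max_tool_calls", float("inf")):
        violations.append(f"too many tool calls: {len(calls)}")

    retries = sum(1 for e in events if e.get("retry"))
    if retries > policy.get("max_retries", float("inf")):
        violations.append(f"retry storm: {retries} retries")

    tokens = sum(e.get("tokens", 0) for e in events)
    if tokens > policy.get("max_tokens", float("inf")):
        violations.append(f"token budget exceeded: {tokens} tokens")

    # Loop heuristic (assumption): the same tool called loop_threshold times in a row.
    loop_threshold = policy.get("loop_threshold", 3)
    run = 1
    for prev, cur in zip(calls, calls[1:]):
        run = run + 1 if cur["tool"] == prev["tool"] else 1
        if run == loop_threshold:
            violations.append(f"tool-call loop: {cur['tool']} x{loop_threshold}")
    return violations
```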

New v0.6
📄

Evidence Pack

One command generates three audit artefacts: readiness_report.json, executive_summary.md, and release-gate-evidence.html. Attach to PRs, compliance tickets, or security reviews.

Phase 2

Live Agent Runtime

Add --agent py:module:fn, cmd:./script, or an https:// endpoint to run your eval suite live against the real agent. release-gate calls it, scores actual responses, and records per-call latency (avg / p50 / p95 / max). A failing or unreachable agent is a failed eval — no silent passes. Stdlib-only, no SDK required.
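Because the runtime is described as stdlib-only, the py: path can be sketched with importlib and time.perf_counter. The function names are hypothetical, cmd: and https:// specs are left out, and p95 uses the nearest-rank method, which is one of several percentile definitions and may not match the tool's numbers exactly.

```python
# Hypothetical sketch: resolve a "py:module:fn" agent spec and time each call
# using only the standard library. cmd: and https:// specs are not handled here.
import importlib
import math
import statistics
import time
from typing import Callable


def resolve_py_spec(spec: str) -> Callable:
    """'py:my_pkg.agent:handle' -> the handle function from my_pkg.agent."""
    parts = spec.split(":", 2)
    if len(parts) != 3 or parts[0] != "py":
        raise ValueError(f"this sketch only handles py:module:fn specs, got {spec!r}")
    _, module_name, fn_name = parts
    return getattr(importlib.import_module(module_name), fn_name)


def run_live(agent: Callable, inputs: list) -> dict:
    """Call the agent once per input; exceptions are recorded as failures, never passes."""
    if not inputs:
        raise ValueError("need at least one input")
    outputs, latencies_ms, errors = [], [], 0
    for text in inputs:
        start = time.perf_counter()
        try:
            outputs.append(agent(text))
        except Exception as exc:
            outputs.append(exc)
            errors += 1
        latencies_ms.append((time.perf_counter() - start) * 1000)
    ordered = sorted(latencies_ms)
    # Nearest-rank p95: the smallest value with at least 95% of samples at or below it.
    p95 = ordered[math.ceil(0.95 * len(ordered)) - 1]
    return {
        "outputs": outputs,
        "errors": errors,
        "avg_ms": statistics.fmean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": p95,
        "max_ms": ordered[-1],
    }
```

Each output (or captured exception) would then be handed to the eval assertions, so an agent that raises can never produce a passing case.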

New
🧩

Model Intelligence Layer

Stop hardcoding prices. A model: block declares its pricing source: a static table, custom inline prices, a locked snapshot, live OpenRouter prices, or LiteLLM. Unknown pricing with on_unknown: hold fails the check instead of assuming $0. Works for LLMs, embeddings, and self-hosted models.
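The "never assume $0" rule, combined with the ACTION_BUDGET comparison against max_daily_cost, can be sketched as below. The function name, the single-entry price table, and the per-1M-token price format are assumptions, and the prices are placeholders rather than any real model's rates.

```python
# Hypothetical sketch: estimate daily cost from a pricing table and refuse to
# treat an unknown model as free. Prices below are placeholders, not real rates.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Price:
    input_per_1m: float   # USD per 1M input tokens
    output_per_1m: float  # USD per 1M output tokens


PRICES = {"example-model": Price(input_per_1m=1.00, output_per_1m=2.00)}  # placeholder


def action_budget(model: str, daily_requests: int, in_tokens: int, out_tokens: int,
                  max_daily_cost: Optional[float]) -> Tuple[str, Optional[float]]:
    """Return (status, projected_daily_cost) for one model's expected usage."""
    if max_daily_cost is None:
        return "FAIL", None  # no budget declared
    price = PRICES.get(model)
    if price is None:
        return "HOLD", None  # mirrors on_unknown: hold; never fall back to $0
    per_day = daily_requests * (in_tokens * price.input_per_1m
                                + out_tokens * price.output_per_1m) / 1_000_000
    return ("FAIL" if per_day > max_daily_cost else "PASS"), per_day
```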

New
🔒

Pricing Lock

Snapshot live prices into a tamper-evident, sha256-protected pricing.lock.json so CI can score offline and reproducibly. Stale snapshots (older than max_age_days) surface as WARN, so prices never drift silently.
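A hash-protected lock with a staleness check can be sketched with hashlib over canonical JSON. The lock layout (created_at, prices, sha256) and the function names are assumptions, not the real pricing.lock.json format.

```python
# Hypothetical sketch of a sha256-protected pricing lock with a staleness check.
# The lock layout (created_at, prices, sha256) is an assumption.
import hashlib
import json
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def _digest(created_at: float, prices: dict) -> str:
    # Canonical JSON so key order and whitespace never change the hash.
    payload = json.dumps({"created_at": created_at, "prices": prices},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_lock(prices: dict, created_at: Optional[float] = None) -> dict:
    created_at = time.time() if created_at is None else created_at
    return {"created_at": created_at, "prices": prices,
            "sha256": _digest(created_at, prices)}


def check_lock(lock: dict, max_age_days: float, now: Optional[float] = None) -> str:
    if _digest(lock["created_at"], lock["prices"]) != lock["sha256"]:
        return "FAIL"  # contents edited after the snapshot was taken
    now = time.time() if now is None else now
    age_days = (now - lock["created_at"]) / SECONDS_PER_DAY
    return "WARN" if age_days > max_age_days else "PASS"
```

A bare hash catches edits only when the editor does not also recompute it; signing the lock, the way governance.yaml can be signed, would close that gap.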

v0.5
💸

Impact Simulator

Normal vs. runaway cost side-by-side. Engineering leaders see the dollars at risk, not YAML warnings. The HTML report uploads as a CI artifact automatically.
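The side-by-side idea can be sketched as two cost models: a normal day adjusted for retries, caching, and a traffic spike, and a runaway loop that keeps calling the model for 24 hours with no kill switch. The formulas, parameter names, and multipliers are illustrative assumptions, not release-gate's simulation.

```python
# Hypothetical sketch of a normal-vs-runaway cost comparison. The multipliers and
# the runaway model (a loop calling the model at a fixed rate) are assumptions.


def normal_daily_cost(cost_per_request: float, daily_requests: int,
                      retry_overhead: float, cache_hit_rate: float) -> float:
    # Retries add billable calls; cache hits remove them.
    return cost_per_request * daily_requests * (1 + retry_overhead) * (1 - cache_hit_rate)


def runaway_cost(cost_per_request: float, calls_per_minute: float, hours: float) -> float:
    # A stuck loop keeps calling the model until someone (or a credit limit) stops it.
    return cost_per_request * calls_per_minute * 60 * hours


def side_by_side(cost_per_request: float, daily_requests: int, retry_overhead: float,
                 cache_hit_rate: float, spike_multiplier: float,
                 loop_calls_per_minute: float) -> dict:
    normal = normal_daily_cost(cost_per_request, daily_requests, retry_overhead, cache_hit_rate)
    return {
        "normal": normal,
        "spike": normal * spike_multiplier,
        "runaway_24h": runaway_cost(cost_per_request, loop_calls_per_minute, 24),
    }
```

With $0.01 per request, 5,000 requests, 10% retry overhead, and a 20% cache hit rate, a normal day is about $44. A loop making one call per second costs about $864 over 24 hours, which is the gap the report is meant to put in front of engineering leaders.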

v0.5
🔒

Cryptographic Sign & Verify

Sign governance.yaml with RSA-PSS + SHA-256. Verify in CI that no one changed budget limits or policies after review.

v0.5
⚙️

GitHub Actions Native

5 lines in your workflow. Exit code 0 = PROMOTE, 10 = HOLD, 1 = BLOCK. The HTML report is auto-uploaded as a CI artifact — your team reviews it without leaving GitHub.

CI/CD Integration

Gate every push automatically

Works with GitHub Actions, GitLab CI, Jenkins, and any shell. All commands return structured exit codes.

# .github/workflows/governance.yml
name: AI Release Gate
on: [push, pull_request]
jobs:
  release-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Score & gate release
        uses: VamsiSudhakaran1/release-gate@v0.7.0
        with:
          command: score
          config: governance.yaml
          evals: evals.yaml
          html-report: evidence.html   # evidence pack auto-uploaded as CI artifact
release-gate demo

Interactive demo — no config needed

release-gate score

0–100 score + PROMOTE/HOLD/BLOCK

release-gate compare

Regression detection vs baseline

release-gate evidence-pack

JSON + Markdown + HTML artefacts

release-gate score --agent <spec>

Run evals live against your real agent (py: / cmd: / https://)

release-gate pricing-lock

Snapshot live model prices to lock file

release-gate impact

Cost simulation & runaway scenario

🟢

Exit codes your pipeline understands

0 = PROMOTE / PASS — deploy it.
10 = HOLD / WARN — review before deploying.
1 = BLOCK / FAIL — do not deploy.
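Outside GitHub Actions, a pipeline step can translate these exit codes itself. This is a sketch with hypothetical function names; treating any unlisted exit code as BLOCK (fail closed) is a design choice of the sketch, not documented behavior.

```python
# Hypothetical CI wrapper around the exit codes listed above.
import subprocess

EXIT_DECISIONS = {0: "PROMOTE", 10: "HOLD", 1: "BLOCK"}


def decision_for(returncode: int) -> str:
    # Fail closed: any code not listed above (crash, bad flag) is treated as BLOCK.
    return EXIT_DECISIONS.get(returncode, "BLOCK")


def run_gate(config: str = "governance.yaml") -> str:
    """Run release-gate score and translate its exit code into a decision."""
    result = subprocess.run(["release-gate", "score", config], check=False)
    return decision_for(result.returncode)
```

A deploy script could call run_gate(), continue only on PROMOTE, and route HOLD to a manual approval step.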

📋

Evidence pack as CI artifact

Every PR gets a readiness report, executive summary, and HTML dashboard — attached automatically so reviewers see the full picture without running anything locally.

🔄

Regression baseline in git

Commit readiness_report.json as your baseline. Run release-gate compare on every PR to catch silent degradations in safety or fallback coverage.

Governance checks

5 checks + evals + traces. One decision.

Each check maps to a real failure mode — cost explosion, no kill switch, open access, bad inputs, forbidden tool use.

| Check / Layer | What it validates | Blocked when |
|---|---|---|
| ACTION_BUDGET | Estimated daily cost vs. declared budget cap | Cost exceeds max_daily_cost, or no budget is set |
| BUDGET_SIMULATION | Projected cost with retry, caching, and spike multipliers across 10+ models | Projected cost exceeds the budget, or multipliers are out of range |
| FALLBACK_DECLARED | Kill switch, fallback mode, team owner, runbook URL | Any field is missing (no owner means no one gets paged at 3 AM) |
| IDENTITY_BOUNDARY | Auth required, rate limit configured, data isolation rules | Auth is optional or the rate limit is absent, so anyone can exhaust the budget |
| INPUT_CONTRACT | JSON Schema defined; valid and invalid sample payloads provided | Schema missing (FAIL) or no valid samples (WARN) |
| Evals (behavior) | refuse_or_mask, contains_keywords, valid_json, no_tool_calls; static (CI-safe) or live against a real agent via --agent | Critical evals (safety category) fail, or the agent raises an error |
| Live Agent Runtime | Per-call latency (avg / p50 / p95 / max), error rate, optional token usage; captured when --agent is set | An unreachable or error-throwing agent counts as a failed eval |
| Trace Validator | Forbidden tools, allowed-list violations, retry storms, token budget, tool loops | Any forbidden tool is called or a retry storm is detected |
| Pricing Resolver | Model token pricing from a static table, custom inline prices, a lock file, OpenRouter, or LiteLLM | Pricing is unknown and on_unknown: hold is set (never silently assumes $0) |
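Two of the rows above can be sketched as plain functions over a parsed governance.yaml, for example the checks mapping from the dict that PyYAML's yaml.safe_load returns. The keys kill_switch, fallback_mode, and team_owner appear in this page's examples; runbook_url, identity_boundary, auth_required, and rate_limit are assumed key names.

```python
# Hypothetical sketch of FALLBACK_DECLARED and IDENTITY_BOUNDARY over the parsed
# "checks:" mapping of governance.yaml. Some key names are assumptions.
from typing import Tuple

REQUIRED_FALLBACK = ("kill_switch", "fallback_mode", "team_owner", "runbook_url")


def check_fallback_declared(checks: dict) -> Tuple[str, list]:
    block = checks.get("fallback_declared") or {}
    # Every field is required: a missing owner means nobody gets paged.
    missing = [field for field in REQUIRED_FALLBACK if not block.get(field)]
    return ("FAIL" if missing else "PASS"), missing


def check_identity_boundary(checks: dict) -> Tuple[str, list]:
    block = checks.get("identity_boundary") or {}
    problems = []
    if block.get("auth_required") is not True:
        problems.append("auth is optional or undeclared")
    if not block.get("rate_limit"):
        problems.append("no rate limit configured")
    return ("FAIL" if problems else "PASS"), problems
```

Run against the data-export-agent config from the demo, which declares only fallback_mode: retry, the fallback check reports the kill switch, team owner, and runbook as missing.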

Get started in 30 seconds

pip install release-gate && release-gate demo

Run evals live: release-gate score governance.yaml --evals evals.yaml --agent py:my_pkg.agent:handle
