v0.6.0 — AI release decision engine

PROMOTE, HOLD, or BLOCK, with a score to prove it.

release-gate runs evals, validates agent traces, checks cost budgets, and generates an evidence pack — then gives you one number (0–100) and one decision.

✓ Fully governed — PROMOTE
governance-safe-pass.yaml
$ release-gate score governance.yaml
Readiness Score    94 / 100
Confidence         high
  safety           100
  access_control   100
  cost              90
  fallback         100
  eval_quality      85
  observability     80
Critical failures  none
Decision: ✓ PROMOTE (score 94/100)
✗ Missing safeguards — BLOCK
governance-unsafe-fail.yaml
$ release-gate score governance.yaml
Readiness Score    41 / 100
Confidence         low
  safety            20
  access_control    30
  cost              40
  fallback           0
Critical failures:
  FALLBACK_DECLARED — kill switch missing
  ACTION_BUDGET — no budget cap set
Decision: ✗ BLOCK (score 41/100)

How it works

Four steps to a defensible release decision

release-gate slots into your CI/CD pipeline. No backend, no dashboard, no sign-up.

1. Write a governance.yaml next to your code

Declare your model, expected usage, daily budget cap, kill switch, eval cases, and trace policies. Takes about 5 minutes — or use release-gate init for an interactive wizard.

governance.yaml
agent:
  model: gpt-4-turbo
  daily_requests: 5000
checks:
  action_budget: {max_daily_cost: 500}
  fallback_declared:
    kill_switch: {type: feature-flag}
    team_owner: platform-team
trace_policies:
  forbidden_tools: [delete_database, export_data]
  max_tool_calls: 10
  max_retries: 2
2. Score every deploy candidate

One command evaluates safety, cost, access control, fallback, eval quality, and observability. You get a 0–100 readiness score and a PROMOTE / HOLD / BLOCK decision — not just pass/fail YAML warnings.

$ release-gate score governance.yaml --evals evals.yaml
Readiness Score    94 / 100   confidence: high
Dimension Breakdown:
  safety           100  (weight 30%)
  cost              90  (weight 20%)
  access_control   100  (weight 20%)
  fallback         100  (weight 15%)
  eval_quality      85  (weight 10%)
  observability     80  (weight 5%)
Decision: ✓ PROMOTE (score 94/100)
3. Catch regressions before they ship

Compare a baseline report against the candidate. Any dimension that drops more than 10 points — especially safety, fallback, or access control — is flagged as a regression and blocks the release automatically.

$ release-gate compare baseline.json candidate.json
Baseline score     94/100 (PROMOTE)
Candidate score    71/100 (HOLD)
Score delta        −23 points
Regressions detected:
  safety     100 → 60  (-40 pts)  CRITICAL
  fallback   100 → 75  (-25 pts)
Decision: ✗ BLOCK — critical regression in safety
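The comparison rule described in this step can be sketched in a few lines of Python. This is an illustrative approximation, not release-gate's implementation: the function name `find_regressions`, the report shape (a flat mapping of dimension name to score), and the candidate `cost` value are assumptions made for the example.

```python
DROP_THRESHOLD = 10  # "drops more than 10 points", as stated in the text


def find_regressions(baseline: dict[str, int], candidate: dict[str, int]) -> list[dict]:
    """Return one entry per dimension whose score fell by more than the threshold.

    Assumption: both reports are flat {dimension: score} mappings. The real
    readiness_report.json layout is not documented here.
    """
    regressions = []
    for name, before in baseline.items():
        after = candidate.get(name)
        if after is None:
            continue  # how a missing dimension is treated is unspecified
        drop = before - after
        if drop > DROP_THRESHOLD:
            regressions.append(
                {"dimension": name, "before": before, "after": after, "drop": drop}
            )
    return regressions


# Safety and fallback values mirror the sample output; cost values are hypothetical.
baseline = {"safety": 100, "fallback": 100, "cost": 90}
candidate = {"safety": 60, "fallback": 75, "cost": 85}

for item in find_regressions(baseline, candidate):
    print(f"{item['dimension']}: {item['before']} -> {item['after']} (-{item['drop']} pts)")
```

A drop of exactly 10 points is not flagged here, because the text says "more than 10".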
4. Generate an evidence pack for every release

One command produces three audit artefacts — a machine-readable JSON report, an executive Markdown summary, and a full HTML dashboard — ready for compliance, security review, or stakeholder sign-off.

$ release-gate evidence-pack governance.yaml
✓ release-evidence/readiness_report.json
✓ release-evidence/executive_summary.md
✓ release-evidence/release-gate-evidence.html
Upload as CI artifact or attach to release PR.

Live demo

Real commands. Real output.

Four scenarios showing score, regression detection, trace violation, and the impact simulator.

$ release-gate score examples/governance-safe-pass.yaml \
    --evals examples/evals.yaml \
    --traces examples/traces/safe-trace.json
release-gate | Readiness Scorer v0.6.0
Project            customer-support-agent v1.0.0
Checks run         5 (5 pass, 0 warn, 0 fail)
Evals run          7 (7 pass, 0 fail)  pass rate 100%
Traces checked     1 (0 violations)
Score              94 / 100   confidence: high
Dimension Breakdown:
  safety           100  ██████████  (wt 30%)
  cost              90  █████████░  (wt 20%)
  access_control   100  ██████████  (wt 20%)
  fallback         100  ██████████  (wt 15%)
  eval_quality      85  ████████░░  (wt 10%)
  observability     80  ████████░░  (wt 5%)
Critical failures  none
Decision: ✓ PROMOTE (score 94/100)
exit 0
Try it yourself: pip install release-gate, then release-gate score examples/governance-safe-pass.yaml

What’s new in v0.6

From governance linter to release decision engine

Five new capabilities turn static YAML checks into a real AI deployment gate.

New v0.6
📊

Readiness Scorer — 0–100

Six weighted dimensions (safety 30%, cost 20%, access control 20%, fallback 15%, eval quality 10%, observability 5%) collapse into one number and one decision: PROMOTE, HOLD, or BLOCK.
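To make the weighting concrete, here is a minimal Python sketch that takes a plain weighted average using the six weights above. It is an assumption, not release-gate's actual formula: applied to the dimension scores from the PROMOTE example it gives 95.5, while the tool reports 94, so the real scorer presumably applies further adjustments. The names `WEIGHTS` and `weighted_score` are hypothetical.

```python
# Weights as listed in the Readiness Scorer description (they sum to 1.0).
WEIGHTS = {
    "safety": 0.30,
    "cost": 0.20,
    "access_control": 0.20,
    "fallback": 0.15,
    "eval_quality": 0.10,
    "observability": 0.05,
}


def weighted_score(dimensions: dict[str, float]) -> float:
    """Plain weighted average of per-dimension scores, each on a 0-100 scale.

    Assumption: the real formula may differ, for example by penalising
    critical failures or low confidence.
    """
    missing = WEIGHTS.keys() - dimensions.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)


# Dimension scores taken from the PROMOTE example output.
EXAMPLE = {
    "safety": 100,
    "cost": 90,
    "access_control": 100,
    "fallback": 100,
    "eval_quality": 85,
    "observability": 80,
}

print(round(weighted_score(EXAMPLE), 1))
```

Because safety carries 30% of the weight, a 40-point drop in safety alone moves this average by 12 points, which is why it dominates the decision.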

New v0.6
🔍

Regression Gate

Compare any two readiness report snapshots. Drops >10 points in any dimension — especially safety, fallback, or access control — automatically BLOCK the release. Ship with a diff, not a guess.

New v0.6
🧪

Eval Runner

Declare behavior test cases in YAML: refuse_or_mask, contains_keywords, valid_json, no_tool_calls. Runs in static mode (CI-safe, no LLM key needed) or live mode with any agent callable.
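For a sense of what static checks of this kind might look like, the Python sketch below implements three of the named assertion types against an already-captured agent output. The function signatures and the exact semantics (for instance, that contains_keywords requires every keyword, case-insensitively) are assumptions for illustration; refuse_or_mask is omitted because its rules are not described here.

```python
import json


def contains_keywords(output: str, keywords: list[str]) -> bool:
    """True if every keyword appears in the output (case-insensitive).

    Assumption: "all keywords" rather than "any keyword".
    """
    lowered = output.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def valid_json(output: str) -> bool:
    """True if the output parses as JSON."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


def no_tool_calls(tool_calls: list) -> bool:
    """True if the agent made no tool calls for this case."""
    return len(tool_calls) == 0


# Hypothetical captured outputs for two eval cases.
print(contains_keywords("Your refund was issued today.", ["refund", "issued"]))
print(valid_json('{"status": "ok"}'))
```

Checks like these need no model access, which is what makes a static mode safe to run in CI without an LLM key.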

New v0.6
🛡️

Trace Validator

Feed your agent’s execution trace (JSON or JSONL). Detects forbidden tool calls, allowed-list violations, retry storms, token budget overruns, and tool-call loops before they reach production.
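The sketch below shows how a few of these trace checks could be expressed in Python. The trace format is an assumption: one JSON object per line with a `tool` field and an optional boolean `retry` field. The policy keys reuse the names from the governance.yaml example (forbidden_tools, max_tool_calls, max_retries); the function `validate_trace` is hypothetical and covers only three of the listed checks.

```python
import json

# Policy values copied from the governance.yaml example.
POLICY = {
    "forbidden_tools": ["delete_database", "export_data"],
    "max_tool_calls": 10,
    "max_retries": 2,
}


def validate_trace(jsonl_text: str, policy: dict) -> list[str]:
    """Return human-readable violations found in a JSONL trace.

    Assumption: each non-empty line is a JSON object; tool-call events carry
    a "tool" key and retried events carry "retry": true.
    """
    events = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    violations = []

    forbidden = set(policy["forbidden_tools"])
    for index, event in enumerate(events):
        tool = event.get("tool")
        if tool in forbidden:
            violations.append(f"event {index}: forbidden tool {tool}")

    tool_calls = sum(1 for event in events if "tool" in event)
    if tool_calls > policy["max_tool_calls"]:
        violations.append(
            f"{tool_calls} tool calls exceeds max_tool_calls={policy['max_tool_calls']}"
        )

    retries = sum(1 for event in events if event.get("retry"))
    if retries > policy["max_retries"]:
        violations.append(f"{retries} retries exceeds max_retries={policy['max_retries']}")

    return violations


trace = '{"tool": "search"}\n{"tool": "delete_database"}\n'
for violation in validate_trace(trace, POLICY):
    print(violation)
```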

New v0.6
📄

Evidence Pack

One command generates three audit artefacts: readiness_report.json, executive_summary.md, and release-gate-evidence.html. Attach to PRs, compliance tickets, or security reviews.

v0.5
💸

Impact Simulator

Normal vs. runaway cost side-by-side. Engineering leaders see the dollars at risk, not YAML warnings. The HTML report uploads as a CI artifact automatically.

v0.5
🔒

Cryptographic Sign & Verify

Sign governance.yaml with RSA-PSS + SHA-256. Verify in CI that no one changed budget limits or policies after review.

v0.5
⚙️

GitHub Actions Native

5 lines in your workflow. Exit code 0 = PROMOTE, 10 = HOLD, 1 = BLOCK. The HTML report is auto-uploaded as a CI artifact — your team reviews it without leaving GitHub.

CI/CD Integration

Gate every push automatically

Works with GitHub Actions, GitLab CI, Jenkins, and any shell. All commands return structured exit codes.

# .github/workflows/governance.yml
name: AI Release Gate
on: [push, pull_request]
jobs:
  release-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Score & gate release
        uses: VamsiSudhakaran1/release-gate@v0.6.0
        with:
          command: score
          config: governance.yaml
          evals: evals.yaml
          html-report: evidence.html
          # evidence pack auto-uploaded as CI artifact
release-gate score

0–100 score + PROMOTE/HOLD/BLOCK

release-gate compare

Regression detection vs baseline

release-gate evidence-pack

JSON + Markdown + HTML artefacts

release-gate impact

Cost simulation (v0.5)

🟢

Exit codes your pipeline understands

0 = PROMOTE / PASS — deploy it.
10 = HOLD / WARN — review before deploying.
1 = BLOCK / FAIL — do not deploy.
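For pipelines driven by a script rather than a native CI step, the mapping above can be captured in a small Python helper. The exit codes come from the list above; the helper names `decision_from_exit_code` and `should_deploy` are hypothetical, and treating any other code as an error is an assumption.

```python
# Exit codes as documented above: 0 = PROMOTE, 10 = HOLD, 1 = BLOCK.
DECISIONS = {0: "PROMOTE", 10: "HOLD", 1: "BLOCK"}


def decision_from_exit_code(code: int) -> str:
    """Translate a release-gate exit code into its decision label."""
    try:
        return DECISIONS[code]
    except KeyError:
        # Assumption: anything else is a tool or usage error, not a decision.
        raise ValueError(f"unexpected exit code: {code}") from None


def should_deploy(code: int) -> bool:
    """Only PROMOTE deploys automatically; HOLD needs a human review."""
    return decision_from_exit_code(code) == "PROMOTE"


for code in (0, 10, 1):
    print(code, decision_from_exit_code(code), should_deploy(code))
```

In a wrapper script the code would typically come from `subprocess.run(["release-gate", "score", "governance.yaml"]).returncode`.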

📋

Evidence pack as CI artifact

Every PR gets a readiness report, executive summary, and HTML dashboard — attached automatically so reviewers see the full picture without running anything locally.

🔄

Regression baseline in git

Commit readiness_report.json as your baseline. Run release-gate compare on every PR to catch silent degradations in safety or fallback coverage.

Governance checks

5 checks + evals + traces. One decision.

Each check maps to a real failure mode — cost explosion, no kill switch, open access, bad inputs, forbidden tool use.

Check / Layer | What it validates | Blocked when
ACTION_BUDGET | Estimated daily cost vs. declared budget cap | Cost exceeds max_daily_cost or no budget set
BUDGET_SIMULATION | Projected cost with retries, caching & spike multipliers across 10+ models | Projected cost exceeds budget or multipliers are out of range
FALLBACK_DECLARED | Kill switch, fallback mode, team owner, runbook URL | Any field missing; no owner means no one gets paged at 3 AM
IDENTITY_BOUNDARY | Auth required, rate limit configured, data isolation rules | Auth is optional or rate limit absent; anyone can exhaust the budget
INPUT_CONTRACT | JSON Schema defined, valid & invalid sample payloads provided | Schema missing (FAIL) or no valid samples (WARN)
Evals (behavior) | refuse_or_mask, contains_keywords, valid_json, no_tool_calls (declared in YAML) | Critical evals fail (safety category)
Trace Validator | Forbidden tools, allowed-list violations, retry storms, token budget, tool loops | Any forbidden tool called or retry storm detected
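As an illustration of the ACTION_BUDGET row, the Python sketch below compares an estimated daily cost against the declared cap. The per-request cost is a hypothetical input (release-gate's own pricing model is not described here), and the function name `action_budget_check` is an assumption; daily_requests and max_daily_cost reuse the values from the governance.yaml example.

```python
from typing import Optional


def action_budget_check(
    daily_requests: int,
    cost_per_request: float,
    max_daily_cost: Optional[float],
) -> tuple[bool, str]:
    """Return (passed, reason) for a simple daily budget check.

    Assumption: estimated cost is requests times a flat per-request cost.
    """
    if max_daily_cost is None:
        return False, "no budget cap set"
    estimated = daily_requests * cost_per_request
    if estimated > max_daily_cost:
        return False, f"estimated ${estimated:.2f}/day exceeds cap ${max_daily_cost:.2f}"
    return True, f"estimated ${estimated:.2f}/day within cap ${max_daily_cost:.2f}"


# 5000 requests/day and a 500 cap come from the example config;
# the 0.03 per-request cost is hypothetical.
print(action_budget_check(5000, 0.03, 500))
print(action_budget_check(5000, 0.03, None))
```

A missing cap fails outright rather than passing by default, matching the "no budget set" condition in the table.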

Get started in 60 seconds

pip install release-gate && release-gate init
View on GitHub →