Metadata-Version: 2.4
Name: quorum-gate
Version: 0.1.0
Summary: Quorum: a verification gate that runs isolated, fallible checks against a throwaway copy of a codebase and promotes a change only if it truly passes.
Author-email: Douglas Gregor <1dg618@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/1dg618/quorum-gate
Project-URL: Repository, https://github.com/1dg618/quorum-gate
Project-URL: Issues, https://github.com/1dg618/quorum-gate/issues
Keywords: testing,verification,ci,code-review,quality-gate,sandbox,isolation
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Provides-Extra: yaml
Requires-Dist: PyYAML>=5.1; extra == "yaml"
Provides-Extra: dev
Requires-Dist: PyYAML>=5.1; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"

# Quorum — a verification gate for code changes

> Distributed on PyPI as **`quorum-gate`**; imported and invoked as **`quorum`**.

Quorum decides whether a code change is safe to keep. It is built around one
idea: **"it compiled" and "it passed all checks" are different claims, and only
the second — backed by checks that can actually fail — is worth trusting.**

You give it a list of independent check functions. Each one takes a *throwaway
copy* of your codebase and returns pass/fail with a reason. Some checks are
static ("does every file still compile?"), but the useful ones are
**behavioral**: they spin up a subprocess, import the candidate's modified code
in isolation, and actually run it — e.g. launch 16 threads at a spend tracker to
check for lost updates, or feed a `tool_use`/`tool_result` pair through a message
trimmer to confirm it's never split.

A change is promoted to your live files only if a quorum of checks agrees: in
pass/fail mode it must pass every check, and in scored mode it must strictly
improve the score without breaking any check that was already passing.
Otherwise it is discarded, and since it only ever touched a copy, there's
nothing to roll back.

## Why subprocesses

Two layers of isolation, and the second is the one that matters:

1. **Filesystem isolation**: every check sees a fresh disposable copy. Nothing
   it writes survives or affects your real files.
2. **Process isolation**: each check runs in its own subprocess with a
   wall-clock timeout. A broken candidate that loops forever, deadlocks 16
   threads, segfaults, runs out of memory, or hard-exits takes down a
   *disposable child* and is reported back as `TIMEOUT` / `CRASHED`. It cannot
   crash or pollute the verifier that is judging it.

A naive test runner imports candidate code into its own process — one bad
candidate can then hang or corrupt the judge. Quorum never does this.
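
To make the process-isolation layer concrete, here is a rough sketch of running untrusted code in a child interpreter with a wall-clock timeout. It is an illustration, not Quorum's code: `run_isolated` is a hypothetical name, the outcome strings reuse the names from this README, and mapping "exit status 0" to `PASSED` is a simplifying assumption (a real gate would read a verdict back from the child).

```python
import subprocess
import sys


def run_isolated(code, timeout_s):
    """Run `code` in a child Python process; return (outcome, detail).

    Illustrative only: the exit-status-to-outcome mapping is an assumption.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so a hang
        # in the candidate never reaches the parent process.
        return "TIMEOUT", f"exceeded {timeout_s}s"
    if proc.returncode != 0:
        # Covers hard exits and signals (negative return codes on POSIX).
        return "CRASHED", f"child exited with status {proc.returncode}"
    return "PASSED", proc.stdout.strip()
```

The parent only ever sees an exit status and captured output, so an infinite loop or a hard exit in the child becomes a value to report rather than a failure of the judge.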

## Install

```bash
pip install quorum-gate            # core install
pip install "quorum-gate[yaml]"    # adds PyYAML, needed only if you use --config
# or, from a checkout:
pip install -e .
```

The installed command is `quorum`; the import package is `quorum`.

## Define checks

A **function check** takes the path to the throwaway copy and returns a
`CheckResult`, a `bool`, or a `(passed, reason[, score])` tuple. Register checks
on a module-level `gate` object:

```python
# checks.py
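import importlib
import sys
import threading

from quorum import Gate, CheckResult, Outcome

gate = Gate()


@gate.check(name="spend_tracker_no_lost_updates", timeout_s=20)
def spend_tracker(codebase_path):
    # Make sure `myapp` resolves to the throwaway copy. This is a precaution
    # and is redundant if the check subprocess already sets up the import path.
    sys.path.insert(0, str(codebase_path))
    core = importlib.import_module("myapp.core")
    tracker = core.SpendTracker()

    def worker():
        for _ in range(5000):
            tracker.add(1)

    # Hammer the tracker from 16 threads; a missing lock shows up as lost updates.
    threads = [threading.Thread(target=worker) for _ in range(16)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()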
    expected = 16 * 5000
    ok = tracker.total == expected
    return CheckResult("spend_tracker_no_lost_updates", ok,
                       "all increments recorded" if ok else f"lost {expected - tracker.total}",
                       Outcome.PASSED if ok else Outcome.FAILED,
                       score=tracker.total / expected)
```
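
Not every check needs a full `CheckResult`. As a sketch of the `(passed, reason)` tuple form, here is a static "does every file still compile?" check built on the standard library's `py_compile`. The function name is illustrative, and the commented-out registration line assumes the `gate.check` decorator behaves as shown above.

```python
import py_compile
from pathlib import Path


def every_file_compiles(codebase_path):
    """Return (passed, reason) after byte-compiling every .py file."""
    failures = []
    for source in sorted(Path(codebase_path).rglob("*.py")):
        try:
            # doraise=True raises PyCompileError on a syntax error
            # instead of printing the error to stderr.
            py_compile.compile(str(source), doraise=True)
        except py_compile.PyCompileError:
            failures.append(source.name)
    if failures:
        return False, "does not compile: " + ", ".join(failures)
    return True, "every file compiles"


# To register it, assuming the `gate` object from checks.py:
# gate.check(name="compiles", timeout_s=30)(every_file_compiles)
```

The byte-compiled files land in the throwaway copy, so the check leaves nothing behind in your real tree.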

A **shell check** lets you reuse tools you already have (`pytest`, `mypy`,
`ruff`, a compile step), expressed in YAML:

```yaml
# quorum.yaml
checks:
  - name: types
    command: "mypy myapp/"
    timeout_s: 60
  - name: unit_tests
    command: "pytest -q && echo passed=1"
    timeout_s: 120
    score_from: passed     # parses `passed=<number>` from stdout as the score
```
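
The `score_from` key pulls a `passed=<number>` token out of the command's stdout. One plausible way to do that parsing is sketched below. `parse_score` is a hypothetical helper, and details such as taking the last match and accepting decimals are assumptions rather than documented behavior.

```python
import re


def parse_score(stdout, key):
    """Return the last `key=<number>` value in stdout as a float, or None."""
    pattern = re.compile(rf"\b{re.escape(key)}=(-?\d+(?:\.\d+)?)")
    matches = pattern.findall(stdout)
    # Assumption: if the key appears several times, the final one wins.
    return float(matches[-1]) if matches else None
```

Requiring the `=` right after the key means ordinary pytest output such as `12 passed in 0.3s` is not mistaken for a score line.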

## Run

```bash
# pass/fail mode: promote iff every check passes
quorum verify --candidate ./candidate --checks checks.py --config quorum.yaml

# scored mode: promote iff candidate strictly improves the total score
# without regressing any check that was passing on the baseline
quorum scored --candidate ./candidate --baseline ./live --checks checks.py

# actually copy a passing candidate over your live files
quorum verify --candidate ./candidate --checks checks.py --promote --live ./live
```

(`python -m quorum.cli ...` works identically if you prefer not to rely on the
console script.)

Exit code is `0` if promoted, `1` otherwise — so it drops straight into CI or a
patch-proposing loop.
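
As a sketch of how a patch-proposing loop might consume that exit code, the helper below shells out to `quorum verify` with the flags shown above and treats a zero exit status as acceptance. `candidate_accepted` is a hypothetical name, and the injectable `run` parameter exists only so the function can be exercised without Quorum installed.

```python
import subprocess


def candidate_accepted(candidate, checks, live=None, run=subprocess.run):
    """Invoke `quorum verify` and report whether it exited with code 0."""
    cmd = ["quorum", "verify", "--candidate", candidate, "--checks", checks]
    if live is not None:
        # --promote and --live are the flags shown in the Run section.
        cmd += ["--promote", "--live", live]
    return run(cmd).returncode == 0
```

A loop would call this once per proposed patch and keep generating candidates until it returns `True` or a retry budget runs out.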

## The two modes

**pass/fail** — promote iff every check passes. Simple and strict.

**scored** — each check can return a numeric `score`. Quorum runs the checks
against your baseline (the current live code) first to learn which checks were
already passing and what the baseline score was, then runs them against the
candidate. It promotes only if the candidate's total score is **strictly
greater** and **no previously-passing check now fails**. This is the mode for
"make it better without making anything worse" — e.g. an optimizer or an
agent proposing patches.
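
The scored-mode rule can be written down in a few lines. This is a sketch of the decision as described above, not Quorum's code: `should_promote` is a hypothetical name, and representing each run as a mapping from check name to `(passed, score)` is an assumption made for the example.

```python
def should_promote(baseline, candidate):
    """Decide scored-mode promotion.

    Both arguments map check name -> (passed, score). This data shape is
    an assumption made for the sketch, not Quorum's report format.
    """
    # A regression is any check that passed on the baseline
    # but fails (or is missing) on the candidate.
    regressions = [
        name
        for name, (passed, _score) in baseline.items()
        if passed and not candidate.get(name, (False, 0.0))[0]
    ]
    baseline_total = sum(score for _passed, score in baseline.values())
    candidate_total = sum(score for _passed, score in candidate.values())
    return candidate_total > baseline_total and not regressions
```

Note that the two conditions are independent: a candidate with a much higher total is still rejected if it breaks a single check that used to pass, and a candidate that ties the baseline is rejected even with no regressions.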

## Use as a library

```python
from quorum import Gate

g = Gate()
g.add_function(my_check)
g.add_shell("tests", "pytest -q")

report = g.verify("./candidate")
print(report.summary())
if g.promote("./candidate", "./live", report):
    print("shipped")
```

## What a result tells you

Every check returns a reason, not just a verdict — that's the point. Outcomes
are `PASSED`, `FAILED`, `TIMEOUT` (exceeded its budget), `CRASHED` (process died
without a verdict), or `ERROR` (the check function itself raised).

## Try the example

```bash
quorum verify --candidate examples/candidate_fixed  --checks examples/checks.py  # promoted
quorum verify --candidate examples/live_code         --checks examples/checks.py  # rejected: real bugs
quorum verify --candidate examples/candidate_broken  --checks examples/checks.py  # one check times out, Quorum survives
```
