Metadata-Version: 2.4
Name: ghostjury
Version: 0.3.0
Summary: Multi-model deliberation engine for security code review. Empanels a jury of LLMs, returns a hash-chained verdict.
Author-email: Adam Thomas <adam@ghostlogic.dev>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/adam-scott-thomas/ghostjury
Project-URL: Source, https://github.com/adam-scott-thomas/ghostjury
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3.12
Classifier: Intended Audience :: Developers
Classifier: Topic :: Security
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.6
Requires-Dist: httpx>=0.27
Requires-Dist: typing-extensions>=4.10
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Provides-Extra: spine
Requires-Dist: maelspine>=0.1.0; extra == "spine"
Dynamic: license-file

# GhostJury

**Multi-model deliberation engine for security code review.**

Submit code. GhostJury empanels a jury of 5 independent LLMs, each returns a binary verdict with reasoning, and the aggregated result is hash-chained so votes and appeals bind to the exact analysis that produced them.

```
         ┌─────────────┐
         │  Submission │
         └──────┬──────┘
                ▼
         ┌─────────────┐
         │    Triage   │  Haiku 4.5 — "worth empaneling a jury?"
         └──────┬──────┘
                ▼
    ┌───────┬───┴───┬────────┬───────┐
    ▼       ▼       ▼        ▼       ▼
 Sonnet  Opus   GPT-5.2  o4-mini  Grok 4
    │       │       │        │       │
    └───────┴───┬───┴────────┴───────┘
                ▼
         ┌─────────────┐
         │  Synthesis  │  rule-based aggregation
         └──────┬──────┘
                ▼
         ┌─────────────┐
         │   Verdict   │  APPROVED │ REVIEW │ REJECTED
         └─────────────┘
```

## Cost per audit (verified 2026-04-14)

Per-audit assumptions: ~3,500 input tokens / ~200 output tokens per juror, plus a Haiku triage call. **Prompt caching is enabled by default on Claude jurors** — the ~800-token system prompt becomes a cache hit after the first call, so subsequent jurors and the triage all read it at a 90% discount.

| Juror | Model | $/M in | $/M out | Per call (cached) |
|---|---|---|---|---|
| 1 | claude-sonnet-4-6 | $3.00 | $15.00 | ~$0.011 |
| 2 | claude-opus-4-6 | $15.00 | $75.00 | ~$0.054 |
| 3 | gpt-5.2 | $5.00* | $20.00* | ~$0.022 |
| 4 | o4-mini | $3.00* | $12.00* | ~$0.022 |
| 5 | grok-4.20-0309-reasoning | $2.00 | $6.00 | ~$0.008 |
| Triage | claude-haiku-4-5 | $0.80 | $4.00 | ~$0.001 |
| | | **Per audit** | **~$0.118** | (cached path) |

\* Estimate, see `pricing.py` for confidence tier and source.

At volume, with a Haiku triage that skips ~90% of trivial submissions:

| Audits/day | Cost/day | Cost/month |
|---|---|---|
| 100 | ~$1.20 | ~$36 |
| 1,000 | ~$12 | ~$360 |
| 10,000 | ~$120 | ~$3,600 |

Without the triage gate the numbers are roughly 10× higher — the gate is the primary cost lever.

## Voting rule

Each juror returns `{ malicious: bool, reason: str }` on a submission. GhostJury aggregates:

| Yes votes | Tier | Action |
|---|---|---|
| 0 | `APPROVED` | auto-approve |
| 1 | `REVIEW` | human review queue — dissenting juror's reason is visible |
| 2+ | `REJECTED` | auto-reject — appeal allowed |

Fail-closed posture. Any juror flag bumps the submission out of auto-approve.

## Hash chain

Every verdict carries a `verdict_hash` derived from:

```
sha256(
  submission_sha      ||
  triage_result_json  ||
  jurors_json         ||  # all 5 juror verdicts, in roster order
  synthesis_json      ||
  roster_json            # model versions + juror order
)
```

This matters because when users sign votes on the public page, their signature binds to the *specific* analysis, not the submission id. Editing the analysis invalidates every existing signature — which is the point.

## Install

```bash
pip install ghostjury
```

## Quick start

**No network — safe demo path:**

```bash
ghostjury audit path/to/script.py --mock
```

**Real jury (requires API keys):**

```bash
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export XAI_API_KEY=xai-...

ghostjury audit path/to/suspicious_script.py
```

The CLI refuses to start a real jury with partial keys — it checks every
provider in the default roster first and lists exactly which env vars
are missing. No half-juries, no wasted tokens.

## Architecture

- `schema.py` — Pydantic models for every artifact
- `hashing.py` — content-hash computation
- `prompts.py` — triage + juror system prompts
- `pricing.py` — per-model cost estimates (audit and update before production)
- `router.py` — backend abstraction: `MockBackend` for tests, `HttpxBackend` for real calls
- `triage.py` — Haiku 4.5 gate
- `jury.py` — juror orchestration (currently sequential; becomes parallel once ghostrouter ships `execute_parallel`)
- `synthesis.py` — rule-based verdict aggregation
- `cli.py` — command-line entry

## License

Apache 2.0.
