Metadata-Version: 2.4
Name: rc-verdict
Version: 0.1.0a1
Summary: Modular Python library for multi-agent collective decision-making via LLM expert panels
Project-URL: Homepage, https://github.com/spidey99/rc-verdict
Project-URL: Repository, https://github.com/spidey99/rc-verdict
Project-URL: Issues, https://github.com/spidey99/rc-verdict/issues
Project-URL: Documentation, https://github.com/spidey99/rc-verdict/blob/main/docs/PROTOCOL.md
Author: Robert C. Jones
License-Expression: MIT
License-File: LICENSE
Keywords: decision-making,deliberation,expert-panel,llm,multi-agent
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: structlog>=23.0
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: anthropic>=0.20; extra == 'dev'
Requires-Dist: fastapi>=0.100; extra == 'dev'
Requires-Dist: httpx>=0.27; extra == 'dev'
Requires-Dist: huggingface-hub>=0.23; extra == 'dev'
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: openai>=1.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff<0.14,>=0.13; extra == 'dev'
Provides-Extra: huggingface
Requires-Dist: huggingface-hub>=0.23; extra == 'huggingface'
Provides-Extra: litellm
Requires-Dist: litellm>=1.0; extra == 'litellm'
Provides-Extra: ollama
Requires-Dist: openai>=1.0; extra == 'ollama'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Description-Content-Type: text/markdown

# rc-verdict

[![CI](https://github.com/spidey99/rc-verdict/actions/workflows/ci.yml/badge.svg)](https://github.com/spidey99/rc-verdict/actions/workflows/ci.yml)

**Multi-agent collective decision-making via LLM expert panels.**

rc-verdict assembles a panel of LLM "experts" — each wearing a distinct expertise hat with its own behavioral disposition — and drives them through a structured **vote → debate → converge** protocol mediated by a single overseer agent. Instead of trusting one model's first answer, you get a verdict with calibrated confidence, recorded dissent, and a fully auditable deliberation trace. It is a modular Python library (not a framework): plug in OpenAI, Anthropic, HuggingFace, Ollama — or the built-in MockBackend — and call it from any project.

📖 **[How the protocol works → docs/PROTOCOL.md](docs/PROTOCOL.md)**

---

## Installation

```bash
pip install rc-verdict                  # base package (includes MockBackend)
pip install "rc-verdict[openai]"        # or [anthropic], [huggingface], [litellm], [ollama]
pip install "rc-verdict[dev]"           # development: testing, linting, coverage
```

> **Requires Python ≥ 3.11.** Until the first PyPI release lands, install from source: `pip install git+https://github.com/spidey99/rc-verdict`

---

## Quickstart

Run a full deliberation locally — no API keys needed:

```python
import asyncio
import json
from rc_verdict import Panel, VerdictConfig
from rc_verdict.backends.mock import MockBackend

vote = json.dumps({"position": "approve", "conviction": "high",
                   "evidence_quality": "high", "reasoning": "Analysis complete.",
                   "dissent_points": []})
config = VerdictConfig(default_backend="mock", min_panel_size=3, max_panel_size=3,
                       max_token_budget=50000, pool_experts_ratio=1.0)
panel = Panel(config, backend_factory=lambda provider="mock", **kw: MockBackend(responses=[vote] * 3))

result = asyncio.run(panel.deliberate("A caching PR with 92% test coverage.",
                                      "Should we merge this PR?"))
print(f"Decision: {result.decision}  Conviction: {result.confidence.conviction.value}")
```
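
The mock vote in the snippet above is a JSON object with five fields. As a rough sketch of how a caller might sanity-check such a payload before feeding it to `MockBackend`, the following uses only the standard library. The `Vote` dataclass, the `parse_vote` helper, and the assumed `low`/`medium`/`high` levels are hypothetical illustrations, not part of rc-verdict's API (the README itself only shows `"high"`).

```python
import json
from dataclasses import dataclass, field

# Assumption: conviction and evidence quality use these three levels.
LEVELS = {"low", "medium", "high"}


@dataclass(frozen=True)
class Vote:
    """Hypothetical mirror of the vote payload shown in the Quickstart."""
    position: str
    conviction: str
    evidence_quality: str
    reasoning: str
    dissent_points: list[str] = field(default_factory=list)


def parse_vote(raw: str) -> Vote:
    """Parse a JSON vote; raise on missing keys or unknown levels."""
    vote = Vote(**json.loads(raw))  # TypeError on missing or unexpected keys
    for name in ("conviction", "evidence_quality"):
        if getattr(vote, name) not in LEVELS:
            raise ValueError(f"{name} must be one of {sorted(LEVELS)}")
    return vote


vote = parse_vote(json.dumps({
    "position": "approve", "conviction": "high",
    "evidence_quality": "high", "reasoning": "Analysis complete.",
    "dissent_points": [],
}))
print(vote.position, vote.conviction)  # prints: approve high
```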

With a real backend it is one call — experts are selected and generated for your question automatically:

```python
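import asyncio
from rc_verdict import deliberate

async def main() -> None:
    result = await deliberate("Should we open-source our ETL tool?")  # uses OPENAI_API_KEY
    print(result.decision)
    print(result.trace.to_markdown(verdict=result.verdict))  # full deliberation trace

asyncio.run(main())  # in a notebook or async REPL, `await main()` instead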
```

See [examples/](examples/) for more, including custom experts and multi-backend routing.

---

## Core Architecture

```
┌─────────────────────────────────────────┐
│              CALLER / HOST APP          │
│  verdict = await panel.deliberate(input)│
└────────────────┬────────────────────────┘
                 │
        ┌────────▼────────┐
        │    OVERSEER     │  Mediates all rounds
        │  (single agent) │  Controls escalation
        └────────┬────────┘
                 │
     ┌───────────┼───────────┐
     │           │           │
 ┌───▼───┐  ┌───▼───┐  ┌───▼───┐
 │Expert │  │Expert │  │Expert │   3-7 panelists
 │  #1   │  │  #2   │  │  #3   │   Diverse hats
 │ "hat" │  │ "hat" │  │ "hat" │   + dispositions
 └───────┘  └───────┘  └───────┘
```
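
To make the diagram concrete, here is a minimal, standard-library sketch of the fan-out it describes: one overseer querying several experts concurrently and collecting their positions. The `Expert` and `Overseer` classes below are hypothetical stand-ins written for this illustration and are not rc-verdict's own classes; the stubbed `vote` method replaces what would be an LLM backend call.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Expert:
    hat: str          # area of expertise
    disposition: str  # behavioral stance

    async def vote(self, question: str) -> str:
        # Stub: a real expert would call an LLM backend here.
        await asyncio.sleep(0)
        return "reject" if self.disposition == "skeptical" else "approve"


@dataclass
class Overseer:
    experts: list[Expert]

    async def collect_votes(self, question: str) -> dict[str, str]:
        # Experts vote independently, so the calls can run concurrently.
        positions = await asyncio.gather(*(e.vote(question) for e in self.experts))
        return {e.hat: p for e, p in zip(self.experts, positions)}


overseer = Overseer([
    Expert("security", "skeptical"),
    Expert("performance", "pragmatic"),
    Expert("maintainability", "optimistic"),
])
votes = asyncio.run(overseer.collect_votes("Should we merge this PR?"))
print(votes)
```

`asyncio.gather` returns results in the order the awaitables were passed, which is why zipping them back against `self.experts` is safe.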

---

## Protocol Flow

1. **Expert Selection** — Overseer analyzes input context and selects relevant experts from:
   - A pool of **pre-canned experts** (YAML-defined domain specialists)
   - **Dynamically generated experts** — model-derived hats + dispositions tailored to the input

2. **Vote Round** — Each expert independently casts a confidence-weighted vote

3. **Unanimity Check** — All agree → verdict returned immediately (cheap path)

4. **Debate Round** — If not unanimous:
   - Each expert's reasoning is shared with all others
   - Experts process peer reasoning and update positions
   - Overseer may inject clarifying questions

5. **Re-Vote** — Experts vote again with updated reasoning

6. **Convergence** — If still stuck after N rounds:
   - **Elimination** — Remove the expert with the lowest confidence
   - **Supermajority** — Accept at ≥80% agreement threshold
   - **Overseer Override** — Synthesize a verdict from all reasoning
   - **Deadlock** — Return structured disagreement with all positions

---

## Key Differentiators

| Feature | rc-verdict | Typical MAD | Why it matters |
|---------|-----------|-------------|----------------|
| Expert pool (pre-canned + dynamic) | ✅ | ❌ | Right experts for the problem |
| Escalation ladder (vote→debate→eliminate) | ✅ | ❌ | Cheap when agreement is easy |
| Overseer-mediated rounds | ✅ | Partial | Prevents drift, controls cost |
| Confidence-gated elimination | ✅ | ❌ | Novel convergence mechanism |
| Modular library (not a framework) | ✅ | ❌ | Call from any project |
| Heterogeneous model backends | ✅ | Rare | Model diversity matters more than agent count (see Research Foundation) |

---

## Research Foundation

Design informed by 30+ papers (2023–2026) on multi-agent deliberation:

- **Debate-or-Vote (NeurIPS 2025):** Voting alone captures most MAD gains; debate is a martingale without bias correction
- **Diversity of Thought (2024):** Diverse model families outperform N copies of the same model
- **MARS (2025):** Meta-reviewer pattern achieves MAD accuracy at 50% token cost
- **Demystifying MAD (Jan 2026):** Confidence-modulated debate + diversity-aware initialization are the two interventions that actually work
- **MachineSoM (EMNLP 2023):** LLM agents exhibit human-like social dynamics; personality traits affect collaboration

---

## More Resources

- [examples/](examples/) -- Runnable scripts for common use cases
- [docs/PROTOCOL.md](docs/PROTOCOL.md) -- Protocol state machine and convergence strategies
- [docs/](docs/) -- Signal capture protocol and design documentation
- [LICENSE](LICENSE) -- MIT

---

## Status

🚧 **Alpha** — Core protocol working. All backends (OpenAI, Anthropic, HuggingFace, Ollama, Mock) implemented. See [KICKOFF_PROMPT.md](KICKOFF_PROMPT.md) for the full implementation spec and [RELEASE_CHECKLIST.md](RELEASE_CHECKLIST.md) for the path to PyPI.

## License

MIT
