Metadata-Version: 2.4
Name: agent-council
Version: 0.2.0
Summary: Config-driven multi-agent debate council — add a reliability layer to any LLM pipeline
Author: Supreeth Ravi
License: MIT
Project-URL: Homepage, https://github.com/supreethravi/agent-council
Project-URL: Repository, https://github.com/supreethravi/agent-council
Project-URL: Bug Tracker, https://github.com/supreethravi/agent-council/issues
Keywords: llm,agents,multi-agent,debate,council,anthropic,openai,ollama,reliability
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: anthropic>=0.28
Requires-Dist: openai>=1.30
Requires-Dist: httpx>=0.27
Requires-Dist: pydantic>=2.7
Requires-Dist: pyyaml>=6.0
Requires-Dist: typer>=0.12
Requires-Dist: rich>=13.7
Provides-Extra: server
Requires-Dist: fastapi>=0.111; extra == "server"
Requires-Dist: uvicorn[standard]>=0.30; extra == "server"

# agent-council

A reliability layer for LLM pipelines. Multiple agents with distinct personas debate a topic across iterative rounds; a judge synthesizes a final verdict.

Drop it into any agent to pressure-test a decision before committing to it.

Inspired by [mshumer/llmcouncil](https://github.com/mshumer/llmcouncil) and the [Mixture of Agents](https://arxiv.org/abs/2406.04692) research.

---

## Install

```bash
pip install agent-council
```

Requires Python ≥ 3.11.

---

## Programmatic usage

This is the primary interface — use it inside your own agents and pipelines.

```python
import asyncio

from agent_council import CouncilOrchestrator, MemberConfig, JudgeConfig

orchestrator = CouncilOrchestrator(
    members=[
        MemberConfig(
            id="analyst",
            name="The Analyst",
            provider="anthropic",
            model="claude-sonnet-4-6",
            persona="Rigorous analytical thinker. Evidence-based, structured.",
        ),
        MemberConfig(
            id="skeptic",
            name="The Skeptic",
            provider="openai",
            model="gpt-4o",
            persona="Challenge every assumption. Surface hidden risks.",
        ),
    ],
    judge=JudgeConfig(provider="anthropic", model="claude-opus-4-6"),
    rounds=3,
    early_exit_threshold=0.85,
)


async def main() -> None:
    # run() is a coroutine: call it from async code (or a notebook with top-level await)
    session, verdict = await orchestrator.run("Should we adopt microservices?")

    print(verdict.verdict)           # synthesized conclusion
    print(verdict.consensus_level)   # ConsensusLevel.STRONG / MODERATE / WEAK / NONE
    print(verdict.consensus_score)   # float 0–1
    print(verdict.key_agreements)    # list[str]
    print(verdict.dissenting_views)  # list[str]


asyncio.run(main())
```

API keys are read from environment variables (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`) by default.

### Streaming callbacks

Callbacks fire as each member finishes — no waiting for a full round to complete.
Both sync and async callbacks are supported.

```python
def on_member(resp):
    print(f"[{resp.member_name}] {resp.stance} ({resp.confidence:.0%} confident)")

async def on_round(round_):
    await db.save_round(round_)   # async works too; `db` stands in for your own store

session, verdict = await orchestrator.run(
    "Should we rewrite in Rust?",
    on_member_response=on_member,
    on_round_complete=on_round,
)
```

### Provider overrides

Override temperature and token limits per provider, or point Ollama at a custom endpoint:

```python
from agent_council import (
    CouncilOrchestrator, MemberConfig, JudgeConfig,
    ProvidersConfig, AnthropicProviderConfig, OllamaProviderConfig,
)

orchestrator = CouncilOrchestrator(
    members=[...],
    judge=JudgeConfig(provider="anthropic", model="claude-opus-4-6"),
    provider_configs=ProvidersConfig(
        anthropic=AnthropicProviderConfig(temperature=0.5, max_tokens=1024),
        ollama=OllamaProviderConfig(base_url="http://my-ollama:11434"),
    ),
)
```

### Result types

```python
# Field reference (simplified from agent_council/types.py)
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

# ConsensusLevel and DebateRound are further result types (not shown here)


@dataclass
class FinalVerdict:
    topic: str
    verdict: str
    consensus_level: ConsensusLevel      # STRONG / MODERATE / WEAK / NONE
    consensus_score: float               # 0–1
    key_agreements: list[str]
    dissenting_views: list[str]
    rounds_completed: int
    early_exit: bool
    total_duration_seconds: float


@dataclass
class CouncilSession:
    session_id: str
    topic: str
    started_at: datetime
    rounds: list[DebateRound]            # full transcript
    verdict: FinalVerdict | None


@dataclass
class MemberResponse:
    member_id: str
    member_name: str
    round_number: int
    content: str                         # full response text
    stance: str                          # one-line position summary
    confidence: float                    # 0–1
    changed_position: bool
```
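
To use the council as a reliability gate before committing to a decision, branch on the verdict fields. A sketch that works with any object exposing `consensus_score` and `dissenting_views`; `gate`, `Decision`, and the default thresholds are hypothetical choices, not library API:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Decision:
    proceed: bool
    reasons: list[str] = field(default_factory=list)


def gate(verdict: Any, min_score: float = 0.7, max_dissent: int = 1) -> Decision:
    """Allow a pipeline step only if consensus is high and dissent is limited."""
    reasons: list[str] = []
    if verdict.consensus_score < min_score:
        reasons.append(f"consensus {verdict.consensus_score:.2f} < {min_score:.2f}")
    if len(verdict.dissenting_views) > max_dissent:
        reasons.append(f"{len(verdict.dissenting_views)} dissenting views")
    return Decision(proceed=not reasons, reasons=reasons)
```

When `proceed` is false, `reasons` says why, which is a natural point to escalate to a human reviewer.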

---
## Why Reliability

LLMs can be confident and wrong. A single-shot response hides uncertainty and failure modes such as hallucinations, missed trade-offs, and overfitting to the prompt. Agent Council creates informed disagreement and then reconciles it:
- Independent agents surface blind spots and competing views.
- Iterative rounds favor confident positions that stay stable after members read each other's arguments (measured by the consensus score).
- A neutral judge synthesizes agreements and dissent, so the reasoning behind a decision stays visible.

What improves:
- Decision quality: fewer unexamined assumptions, clearer trade-offs.
- Traceability: a full transcript and a structured verdict for audits and reviews.
- Safety: a configurable early-exit threshold, so a debate ends early only once consensus is strong rather than premature.
- Extensibility: mix providers/models (Anthropic, OpenAI, OpenRouter, Ollama).

---
## Examples & Use Cases

- Architecture choices: “Monolith vs microservices for product X?”
- Launch reviews: “Are we production-ready? What risks remain?”
- AI safety checks: “Could this prompt produce unsafe output?”
- Product strategy: “Should pricing move to usage-based?”
- Code migration: “Rewrite in Rust? What are the costs and benefits?”

Programmatic snippet (minimal):

```python
session, verdict = await CouncilOrchestrator(
    members=[
        MemberConfig(id="analyst", name="Analyst", provider="openrouter", model="anthropic/claude-3.5-sonnet"),
        MemberConfig(id="skeptic", name="Skeptic", provider="openrouter", model="openai/gpt-4o"),
    ],
    judge=JudgeConfig(provider="openrouter", model="anthropic/claude-3.5-sonnet"),
).run("Should we adopt microservices?")
print(verdict.verdict)
```

CLI:

```bash
export OPENROUTER_API_KEY=sk-or-...
council review "Remote work vs office?" -c config/council.yaml
```

HTTP (server):

```bash
pip install "agent-council[server]"
council serve
# POST http://127.0.0.1:8000/review {"topic":"Should we rewrite in Rust?"}
```

---

## How the debate works

```
Round 1 — all members respond independently to the topic
Round 2..N — each member reads all peers' responses and may revise
             → early exit if consensus score ≥ threshold
Judge — reads full transcript, synthesizes final verdict
```

**Consensus score** = `avg_confidence × (1 − changed_fraction)`
Rewards both high confidence *and* stability across rounds.
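
The formula and the round loop above can be sketched in a few lines. This is a simplified model, not the code in `debate.py`: `Response` stands in for `MemberResponse`, `changed_fraction` is assumed to be the share of a round's members whose `changed_position` is true, and `ask_round` is a hypothetical callable returning one round of responses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    """Stand-in for MemberResponse: only the two fields the score needs."""
    confidence: float        # 0-1
    changed_position: bool


def consensus_score(responses: list[Response]) -> float:
    """avg_confidence * (1 - changed_fraction) over one round's responses."""
    if not responses:
        return 0.0
    avg_conf = sum(r.confidence for r in responses) / len(responses)
    changed_fraction = sum(r.changed_position for r in responses) / len(responses)
    return avg_conf * (1 - changed_fraction)


def run_debate(
    ask_round: Callable[[int], list[Response]], rounds: int, threshold: float
) -> tuple[list[float], bool]:
    """Run rounds 1..N; from round 2 on, stop early once the score reaches threshold."""
    history: list[float] = []
    for n in range(1, rounds + 1):
        score = consensus_score(ask_round(n))
        history.append(score)
        if n >= 2 and score >= threshold:  # round 1 is independent, never exits
            return history, True
    return history, False
```

For example, two members at confidence 0.9 and 0.8, one of whom switched position, score 0.85 × (1 − 0.5) = 0.425.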

---

## Adding a provider

Implement `BaseModelAdapter` (one method) and register it in the factory:

```python
# agent_council/adapters/my_provider.py
from agent_council.adapters.base import BaseModelAdapter

class MyProviderAdapter(BaseModelAdapter):
    async def complete(self, system: str, user: str) -> str:
        # call your model here
        ...
```

```python
# agent_council/adapters/__init__.py — add to build_adapter()
case "myprovider":
    return MyProviderAdapter(member_cfg, provider_cfg)
```
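
Since every adapter exposes the same one-method interface, cross-cutting behaviour such as retries can be layered on by wrapping one adapter in another. A sketch, assuming `complete(system, user)` is the only method the orchestrator calls; `CompletionAdapter` is a local `Protocol` stand-in for `BaseModelAdapter`, and `RetryingAdapter` is hypothetical, not part of agent-council:

```python
import asyncio
from typing import Protocol


class CompletionAdapter(Protocol):
    """Structural stand-in for BaseModelAdapter's single method."""

    async def complete(self, system: str, user: str) -> str: ...


class RetryingAdapter:
    """Wrap any adapter and retry `complete` with exponential backoff."""

    def __init__(self, inner: CompletionAdapter, attempts: int = 3, base_delay: float = 0.5):
        if attempts < 1:
            raise ValueError("attempts must be >= 1")
        self.inner = inner
        self.attempts = attempts
        self.base_delay = base_delay

    async def complete(self, system: str, user: str) -> str:
        for attempt in range(self.attempts):
            try:
                return await self.inner.complete(system, user)
            except Exception:
                if attempt == self.attempts - 1:
                    raise  # out of retries: surface the last error
                await asyncio.sleep(self.base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
        raise AssertionError("unreachable: the loop always returns or raises")
```

Catching every `Exception` also retries permanent failures such as a bad API key; in practice you would narrow it to your provider's transient errors (timeouts, rate limits).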

---

## Config file (optional)

For teams who prefer YAML over code:

```yaml
# config/council.yaml
council:
  debate_rounds: 3
  early_exit_threshold: 0.85

members:
  - id: "analyst"
    name: "The Analyst"
    provider: "anthropic"
    model: "claude-sonnet-4-6"
    persona: "Rigorous analytical thinker."

  - id: "skeptic"
    name: "The Skeptic"
    provider: "openai"
    model: "gpt-4o"
    persona: "Challenge every assumption."

judge:
  provider: "anthropic"
  model: "claude-opus-4-6"

providers:
  # Optional: use OpenRouter with OpenAI-compatible models
  openrouter:
    api_key_env: "OPENROUTER_API_KEY"
    base_url: "https://openrouter.ai/api/v1"
    max_tokens: 2048
    temperature: 0.7
```

```python
orchestrator = CouncilOrchestrator.from_config_file("config/council.yaml")
```
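
Hand-edited YAML makes a few mistakes easy: duplicate member ids, a missing model, a threshold outside 0 to 1. A sketch of a pre-flight check on the dict returned by `yaml.safe_load`; `config_problems` is hypothetical, and its rules (for instance, at least two members) are assumptions rather than the Pydantic schema in `config.py`:

```python
from typing import Any


def config_problems(cfg: dict[str, Any]) -> list[str]:
    """Return readable problems found in a dict loaded from council.yaml."""
    problems: list[str] = []
    council = cfg.get("council") or {}
    if "debate_rounds" in council:
        rounds = council["debate_rounds"]
        if not isinstance(rounds, int) or rounds < 1:
            problems.append("council.debate_rounds must be an integer >= 1")
    if "early_exit_threshold" in council:
        t = council["early_exit_threshold"]
        if not isinstance(t, (int, float)) or not 0 <= t <= 1:
            problems.append("council.early_exit_threshold must be between 0 and 1")

    members = cfg.get("members") or []
    if len(members) < 2:  # assumption: a debate needs at least two voices
        problems.append("at least two members are needed for a debate")
    seen: set[str] = set()
    for i, member in enumerate(members):
        for key in ("id", "provider", "model"):
            if not member.get(key):
                problems.append(f"members[{i}] is missing '{key}'")
        member_id = member.get("id")
        if member_id and member_id in seen:
            problems.append(f"duplicate member id {member_id!r}")
        if member_id:
            seen.add(member_id)

    judge = cfg.get("judge") or {}
    for key in ("provider", "model"):
        if not judge.get(key):
            problems.append(f"judge is missing '{key}'")
    return problems


# Usage: import yaml; cfg = yaml.safe_load(open("config/council.yaml")); print(config_problems(cfg))
```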

---

## CLI (convenience)

```bash
# Install with CLI support (included by default)
pip install agent-council

export ANTHROPIC_API_KEY=sk-ant-...
council review "Is Python the best language for data science?"
council review "Should we rewrite in Rust?" --config config/council.yaml
council review "Remote work vs office?" --no-rounds
```

---

## HTTP server (optional)

```bash
pip install "agent-council[server]"
council serve                       # http://127.0.0.1:8000
```

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Liveness check |
| `POST` | `/review` | Run debate, return full JSON result |
| `POST` | `/review/stream` | Run debate, stream events via SSE |

Interactive docs at `http://localhost:8000/docs`.
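
Any HTTP client works against these endpoints. A stdlib-only sketch for `POST /review`, assuming the request body is `{"topic": ...}` as in the example earlier in this README and that the response is one JSON object; `build_review_request`, `review`, and the timeout value are hypothetical:

```python
import json
import urllib.request


def build_review_request(topic: str, base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build a POST /review request with a JSON body of the form {"topic": ...}."""
    body = json.dumps({"topic": topic}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/review",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def review(topic: str, base_url: str = "http://127.0.0.1:8000", timeout: float = 300.0) -> dict:
    """Send the request and decode the JSON result (a full debate can take minutes)."""
    with urllib.request.urlopen(build_review_request(topic, base_url), timeout=timeout) as resp:
        return json.load(resp)
```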

**SSE event stream** (`POST /review/stream`):
```
data: {"event": "member_response", "data": {...}}
data: {"event": "round_complete",  "data": {"round_number": 1, "consensus_score": 0.62}}
data: {"event": "verdict",         "data": {...}}
```
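
A client has to split the stream into events before decoding each payload. A sketch of a parser for standard SSE framing, assuming each event is terminated by a blank line (the listing above omits those separators) and that every `data:` payload is JSON; `parse_sse` is hypothetical:

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from the `data:` lines of an SSE stream.

    Consecutive `data:` lines form one event and a blank line dispatches it;
    comment lines (starting with ':') and other fields are ignored.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buffer:
                yield json.loads("\n".join(buffer))
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            buffer.append(value[1:] if value.startswith(" ") else value)
    if buffer:  # lenient: the SSE spec would discard an unterminated final event
        yield json.loads("\n".join(buffer))
```

With `urllib`, feed it decoded lines from the response, e.g. `parse_sse(line.decode("utf-8") for line in resp)`, then dispatch on each payload's `"event"` key.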

---
## Tracing

Enable JSON traces to inspect every response, round, and the final verdict. Two ways:

1) Programmatic recorder

```python
from agent_council.tracing import TraceRecorder

rec = TraceRecorder()
session, verdict = await orchestrator.run("Your topic", trace=rec)
path = rec.save("traces/")
print("Trace saved to", path)
```

2) Build a trace from a finished session

```python
from agent_council.tracing import TraceRecorder
rec = TraceRecorder.from_session(session, verdict)
rec.save("traces/")
```

Each trace is a single JSON file containing per-member responses, round summaries, and the final verdict.

---
## Budget Awareness (optional)

Control spend with a simple per-call budget guard. Set these environment variables:

- `COUNCIL_COST_PER_CALL_USD`: estimated cost charged per model call.
- `COUNCIL_MAX_BUDGET_USD`: cap for the whole run. If it is exceeded and `COUNCIL_BUDGET_HARD_STOP` is true (the default), the run stops.
- `COUNCIL_BUDGET_HARD_STOP`: set to `0` or `false` for a soft cap that does not stop the run.

This wraps each provider adapter and enforces the budget before every model call. Because it charges a flat estimate per call, the total is approximate; for precise accounting, plug in your own adapter with exact token-based pricing.

---

## Project layout

```
agent_council/
├── __init__.py          # public API
├── orchestrator.py      # CouncilOrchestrator — primary entry point
├── member.py            # prompt construction + JSON parsing
├── debate.py            # round loop + consensus scoring
├── judge.py             # synthesis + FinalVerdict
├── config.py            # Pydantic config schema
├── types.py             # result dataclasses
├── server.py            # FastAPI app (optional)
└── adapters/
    ├── base.py
    ├── anthropic_adapter.py
    ├── openai_adapter.py
    └── ollama_adapter.py
```

---

## License

MIT
