Metadata-Version: 2.4
Name: consilium
Version: 0.3.0
Summary: Multi-model deliberation CLI. Four frontier LLMs debate with a rotating challenger, then Claude judges.
Project-URL: Homepage, https://github.com/terry-li-hm/consilium
Project-URL: Repository, https://github.com/terry-li-hm/consilium
Project-URL: Issues, https://github.com/terry-li-hm/consilium/issues
Author-email: Terry Li <terry.li.hm@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: ai,consilium,council,debate,deliberation,llm,multi-model,openrouter
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: httpx>=0.25.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Description-Content-Type: text/markdown

# Consilium

Multi-model deliberation CLI. Four frontier LLMs debate a question, then Claude judges and synthesizes.

> *consilium* (Latin): counsel, deliberation, plan

Inspired by [Andrej Karpathy's LLM Council](https://github.com/karpathy/llm-council), with an added blind phase (anti-anchoring), explicit engagement requirements, a rotating challenger role, and a social calibration mode.

## Models

**Council (deliberators):**
- GPT (gpt-5.2-pro)
- Gemini (gemini-3-pro-preview)
- Grok (grok-4)
- Kimi (kimi-k2.5)

**Judge:** Claude Opus 4.5 (synthesizes + adds own perspective)

## Installation

```bash
pip install consilium
```

Or with uv:
```bash
uv tool install consilium
```

## Setup

Set your OpenRouter API key:
```bash
export OPENROUTER_API_KEY=sk-or-v1-...
```

Optional fallback keys (for flaky models):
```bash
export GOOGLE_API_KEY=AIza...      # Gemini fallback
export MOONSHOT_API_KEY=sk-...     # Kimi fallback
```

## Usage

```bash
# Basic question
consilium "Should we use microservices or monolith?"

# With social calibration (for interview/networking questions)
consilium "What questions should I ask in the interview?" --social

# With persona context
consilium "Should I take the job?" --persona "builder who hates process work"

# Multiple rounds
consilium "Architecture decision" --rounds 3

# Save transcript
consilium "Career question" --output transcript.md

# Share via GitHub Gist
consilium "Important decision" --share

# List past sessions
consilium --sessions
```

All sessions are auto-saved to `~/.consilium/sessions/` for later review.

## Options

| Flag | Description |
|------|-------------|
| `--rounds N` | Number of deliberation rounds (default: 1, exits early on consensus) |
| `--output FILE` | Save transcript to file |
| `--named` | Let models see real names during deliberation (may increase bias) |
| `--no-blind` | Skip the blind first-pass (faster, but the first speaker anchors the others) |
| `--context TEXT` | Context hint for judge (e.g., "architecture decision") |
| `--share` | Upload transcript to secret GitHub Gist |
| `--social` | Enable social calibration mode (auto-detected for interview/networking) |
| `--persona TEXT` | Context about the person asking |
| `--challenger MODEL` | Which model starts as challenger (gpt/gemini/grok/kimi). Rotates each round. |
| `--domain DOMAIN` | Regulatory domain context (banking, healthcare, eu, fintech, bio) |
| `--followup` | Enable interactive drill-down after judge synthesis |
| `--practical` | Actionable rules only, no philosophy |
| `--quiet` | Suppress progress output |
| `--sessions` | List recent saved sessions |
| `--no-save` | Don't auto-save transcript to `~/.consilium/sessions/` |

## How It Works

**Blind First-Pass (Anti-Anchoring):**
1. All models generate short "claim sketches" independently and in parallel
2. This prevents the "first speaker lottery" where whoever speaks first anchors the debate
3. Each model commits to an initial position before seeing any other responses
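The parallel blind pass can be sketched in a few lines. This is an illustrative sketch, not Consilium's actual implementation; `ask_model` is a hypothetical stand-in for a single OpenRouter chat call.

```python
from concurrent.futures import ThreadPoolExecutor

def ask_model(model: str, question: str) -> str:
    """Stand-in for one OpenRouter request (hypothetical)."""
    return f"[{model}] claim sketch for: {question}"

def blind_pass(models: list[str], question: str) -> dict[str, str]:
    # Every model answers the bare question with no peer context,
    # so no single response can anchor the others.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {m: pool.submit(ask_model, m, question) for m in models}
        return {m: f.result() for m, f in futures.items()}

claims = blind_pass(["gpt", "gemini", "grok", "kimi"],
                    "Microservices or monolith?")
```

Because each future is submitted before any result is read, the four claims are generated concurrently and independently.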

**Deliberation Protocol:**
1. All models see everyone's blind claims, then deliberate
2. Each model MUST explicitly AGREE, DISAGREE, or BUILD ON previous speakers by name
3. After each round, the system checks for consensus (3/4 non-challengers agreeing triggers early exit)
4. Judge synthesizes the full deliberation
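The early-exit check in step 3 might look like the sketch below, assuming each model's stance is reduced to an AGREE/DISAGREE label and three agreeing non-challengers trigger the exit; the function name and stance labels are illustrative, not the package's API.

```python
def has_consensus(stances: dict[str, str], challenger: str,
                  threshold: int = 3) -> bool:
    # The challenger's forced disagreement is ignored (see below);
    # exit early once enough of the remaining models converge.
    peers = [s for model, s in stances.items() if model != challenger]
    return sum(s == "AGREE" for s in peers) >= threshold

stances = {"gpt": "DISAGREE", "gemini": "AGREE",
           "grok": "AGREE", "kimi": "AGREE"}
has_consensus(stances, challenger="gpt")     # challenger dissents: still exits
has_consensus(stances, challenger="gemini")  # a real dissent blocks the exit
```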

**Rotating Challenger:**
- One model each round is assigned the "challenger" role
- The challenger MUST argue the contrarian position and identify weaknesses in emerging consensus
- Role rotates each round (GPT R1 → Gemini R2 → Grok R3 → Kimi R4...) to ensure sustained disagreement
- Challenger is excluded from consensus detection (forced disagreement shouldn't block early exit)
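The rotation itself is just modular indexing over the council order; a minimal sketch (function name assumed, not the package's API):

```python
ROTATION = ["gpt", "gemini", "grok", "kimi"]

def challenger_for(round_num: int, start: str = "gpt") -> str:
    # Round 1 gets the starting model (set via --challenger),
    # then the role advances one slot per round, wrapping around.
    offset = ROTATION.index(start)
    return ROTATION[(offset + round_num - 1) % len(ROTATION)]
```

So with the default start, rounds 1-4 assign GPT, Gemini, Grok, Kimi, and round 5 wraps back to GPT.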

**Anonymous Deliberation:**
- Models see each other as "Speaker 1", "Speaker 2", etc. during deliberation
- Prevents models from playing favorites based on vendor reputation
- Output transcript shows real model names for readability
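Conceptually, anonymization is a two-way alias map: speaker labels go into the prompts, and the inverse map restores real names when the transcript is rendered. A sketch under that assumption (not the package's actual internals):

```python
def anonymize(models: list[str]) -> tuple[dict[str, str], dict[str, str]]:
    # alias: real name -> "Speaker N" (used inside deliberation prompts)
    # reveal: "Speaker N" -> real name (used when writing the transcript)
    alias = {m: f"Speaker {i}" for i, m in enumerate(models, start=1)}
    reveal = {label: m for m, label in alias.items()}
    return alias, reveal

alias, reveal = anonymize(["gpt", "gemini", "grok", "kimi"])
```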

## When to Use

Use the council when:
- Making an important decision that benefits from diverse perspectives
- You want models to actually debate, not just answer in parallel
- You need a synthesized recommendation, not raw comparison
- Exploring trade-offs where different viewpoints matter

Skip the council when:
- You're just thinking out loud (exploratory discussions)
- The answer depends on personal preference more than objective trade-offs
- Speed matters (a council run takes 60-90 seconds)

## Python API

```python
from consilium import run_council, COUNCIL
import os

api_key = os.environ["OPENROUTER_API_KEY"]

transcript, failed_models = run_council(
    question="Should we use microservices or monolith?",
    council_config=COUNCIL,
    api_key=api_key,
    rounds=2,
    verbose=True,
    social_mode=False,
)

print(transcript)
```

## License

MIT
