Metadata-Version: 2.4
Name: consult-mcp-server
Version: 0.2.0
Summary: Multi-model panel orchestration engine with an MCP adapter. The engine (consult.*) is MCP-free and usable as a library; the MCP adapter (consult.mcp.*) is an optional extra.
Project-URL: Homepage, https://github.com/irwin-r/consult-mcp-server
Project-URL: Repository, https://github.com/irwin-r/consult-mcp-server
Project-URL: Issues, https://github.com/irwin-r/consult-mcp-server/issues
Project-URL: Changelog, https://github.com/irwin-r/consult-mcp-server/blob/main/CHANGELOG.md
Author-email: Irwin Razaghi <irwin@razaghi.com.au>
License: MIT
License-File: LICENSE
Keywords: consensus,consultation,litellm,llm,mcp,model-context-protocol,panel
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.11
Requires-Dist: litellm<1.57.0,>=1.55.0
Requires-Dist: pydantic<3,>=2.6.0
Requires-Dist: python-dotenv<2,>=1.0.0
Provides-Extra: dev
Requires-Dist: mcp<2,>=1.2.0; extra == 'dev'
Requires-Dist: pip-audit>=2.7; extra == 'dev'
Requires-Dist: pre-commit>=3.7; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff<0.16,>=0.15; extra == 'dev'
Provides-Extra: mcp
Requires-Dist: mcp<2,>=1.2.0; extra == 'mcp'
Provides-Extra: otel
Requires-Dist: opentelemetry-api<2,>=1.20; extra == 'otel'
Requires-Dist: opentelemetry-exporter-otlp<2,>=1.20; extra == 'otel'
Requires-Dist: opentelemetry-sdk<2,>=1.20; extra == 'otel'
Description-Content-Type: text/markdown

# consult-mcp-server

> **Get a second opinion from a parallel panel of LLMs — without bloating your agent's context window.**

[![CI](https://github.com/irwin-r/consult-mcp-server/actions/workflows/tests.yml/badge.svg)](https://github.com/irwin-r/consult-mcp-server/actions/workflows/tests.yml)
[![PyPI](https://img.shields.io/pypi/v/consult-mcp-server.svg)](https://pypi.org/project/consult-mcp-server/)
[![Python](https://img.shields.io/pypi/pyversions/consult-mcp-server.svg)](https://pypi.org/project/consult-mcp-server/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Smithery](https://smithery.ai/badge/consult-mcp-server)](https://smithery.ai/server/consult-mcp-server)

`consult` is an MCP server that lets your agent (Claude Desktop, Cursor,
Claude Code, etc.) fan a single prompt out to many LLMs in parallel, then
return either the synthesised answer or a manifest of structured
~200-token capsules — so panel breadth doesn't cost parent-context tokens.

```
┌────────────┐    consult tool call     ┌──────────────────┐    parallel    ┌──────────┐
│ Your agent │ ───────────────────────▶ │  consult-mcp     │ ─────────────▶ │ Claude   │
│ (Claude    │   "what's your take?"    │  (this server)   │                │ GPT      │
│  Desktop / │ ◀─────────────────────── │                  │ ◀───────────── │ Gemini   │
│  Cursor /  │  synthesis + manifest    │  capsules ~200t  │   capsules     │ Grok     │
│  …)        │                          │  + resources     │                │ DeepSeek │
└────────────┘                          └──────────────────┘                │ …        │
                                                                            └──────────┘
```

## Why this exists

If your agent already calls `claude` once, you might wonder why you'd want to
ask 8 more models the same question. Three reasons:

1. **One pass, many perspectives.** Different families catch different things.
   Anthropic finds different bugs than OpenAI; Gemini calls out different
   risks; DeepSeek often surfaces the contrarian take.
2. **Cheap structured second opinion.** The manifest's per-panellist capsule
   is ~200 tokens — your agent can synthesise it in-band without paying for
   another flagship round-trip.
3. **No context-window bloat.** Full panellist bodies live as MCP resources
   at `consult://runs/<id>/responses/<slug>`; your agent only fetches them
   when it needs depth.

Alternatives fall short: PAL `consensus` serialises calls, so total latency
is the sum of every call; `multi_mcp` parallelises but offers no escape hatch
from server-side synthesis; and skill-only fan-outs assembled by the LLM via
bash are brittle (token traps, key handling, endpoint drift).
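
To make the latency point concrete, here is a minimal sketch (not this
server's runner) comparing a serial fan-out with a parallel one. The
`fake_panellist` helper and the latencies are made up for illustration and
only sleep; no provider is called.

```python
import asyncio
import time


async def fake_panellist(name: str, latency_s: float) -> str:
    """Hypothetical stand-in for a provider call: sleeps, then answers."""
    await asyncio.sleep(latency_s)
    return f"{name}: done"


async def serial(panel: dict[str, float]) -> list[str]:
    # One call after another: wall time is roughly the sum of latencies.
    return [await fake_panellist(n, s) for n, s in panel.items()]


async def parallel(panel: dict[str, float]) -> list[str]:
    # All calls in flight at once: wall time is roughly the slowest call.
    return await asyncio.gather(*(fake_panellist(n, s) for n, s in panel.items()))


def timed(coro) -> tuple[list[str], float]:
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return list(result), time.perf_counter() - start


PANEL = {"model-a": 0.05, "model-b": 0.10, "model-c": 0.15}

if __name__ == "__main__":
    _, t_serial = timed(serial(PANEL))
    _, t_parallel = timed(parallel(PANEL))
    print(f"serial {t_serial:.2f}s, parallel {t_parallel:.2f}s")
```

With these made-up latencies the serial path takes about as long as all three
sleeps combined, while the parallel path takes about as long as the slowest one.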

---

## Install

### Claude Desktop

> Claude Desktop does **not** inherit your shell's `PATH` or environment
> variables, so you must give it the absolute path to the launcher (`uvx` in
> the example below) and declare API keys inside the `env` block.

Tip: run `consult-doctor --config` after install to print a ready-to-paste
JSON block populated with the absolute binary path and whichever keys are
present in your shell environment.

Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS)
or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):

```json
{
  "mcpServers": {
    "consult": {
      "command": "/Users/you/.local/bin/uvx",
      "args": ["--from", "consult-mcp-server[mcp]", "consult-mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-…",
        "OPENAI_API_KEY": "sk-…",
        "GEMINI_API_KEY": "AIza…",
        "OPENROUTER_API_KEY": "sk-or-…"
      }
    }
  }
}
```

Restart Claude Desktop, then ask: *"use the consult tool to ask 3 models
which Python package manager I should use."*

### Cursor

Edit `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "consult": {
      "command": "/Users/you/.local/bin/uvx",
      "args": ["--from", "consult-mcp-server[mcp]", "consult-mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-…",
        "OPENAI_API_KEY": "sk-…"
      }
    }
  }
}
```

Same caveat as Claude Desktop: absolute path to `uvx`, env keys in the block.
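
If you would rather generate the block than hand-edit it, the following is a
rough sketch of the idea. It is not the implementation of
`consult-doctor --config`; `build_client_config` and `KEY_NAMES` are
hypothetical names, and the key list is simply the four variables shown in the
Claude Desktop example.

```python
import json
import os
import shutil

# The four key names used in the Claude Desktop example above.
KEY_NAMES = (
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "GEMINI_API_KEY",
    "OPENROUTER_API_KEY",
)


def build_client_config(uvx_path: str, env: dict[str, str]) -> dict:
    """Build an mcpServers entry shaped like the JSON examples above."""
    if not os.path.isabs(uvx_path):
        raise ValueError(f"need an absolute path, got {uvx_path!r}")
    # Only include keys that are actually set and non-empty.
    present = {k: env[k] for k in KEY_NAMES if env.get(k)}
    return {
        "mcpServers": {
            "consult": {
                "command": uvx_path,
                "args": ["--from", "consult-mcp-server[mcp]", "consult-mcp"],
                "env": present,
            }
        }
    }


if __name__ == "__main__":
    found = shutil.which("uvx")
    if found is None:
        print("uvx not found on PATH")
    else:
        config = build_client_config(os.path.abspath(found), dict(os.environ))
        print(json.dumps(config, indent=2))
```

The printed JSON contains your real keys, so treat the output like any other
secret.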

### Claude Code CLI

```sh
claude mcp add consult -- uvx --from "consult-mcp-server[mcp]" consult-mcp
```

The CLI inherits your shell env, so the keys you already have in `.env` /
your shell rc will be visible.

### Docker

```sh
docker run -i --rm \
  -e ANTHROPIC_API_KEY -e OPENAI_API_KEY -e GEMINI_API_KEY -e OPENROUTER_API_KEY \
  -v ~/.consult:/root/.consult \
  ghcr.io/irwin-r/consult-mcp-server:latest
```

Stdio in / stdio out, just like the local binary. The image is published to
GHCR on each release (multi-stage build on a Python 3.12-slim base, ~150MB).

### Smithery

```
https://smithery.ai/server/consult-mcp-server
```

Smithery's hosted UI prompts for keys; the same `smithery.yaml` config-schema
applies.

### From source (development)

```sh
git clone https://github.com/irwin-r/consult-mcp-server
cd consult-mcp-server
uv venv
uv pip install -e ".[dev]"
cp .env.example .env   # fill in keys
uv run pytest -v
```

### Verify the install

```sh
consult-doctor          # offline: config + paths + key presence
consult-doctor --ping   # also fires a 1-token call per provider (~$0.0001)
consult-doctor --config # print copy-paste-ready MCP client JSON
```

---

## The five tools

| Tool | What it does | Use when |
|---|---|---|
| **`consult`** | Parallel panel + server-side synthesis. Hero. | "Just give me the answer." |
| **`panel`** | Parallel panel, returns raw manifest (no synth). | You want to synthesise yourself. |
| **`refine`** | Iterative consortium with arbiter scoring (≤3 rounds). | High-stakes; disagreement-heavy. |
| **`sequence`** | Chained multi-step where step N depends on N-1. | Decompose-then-answer; plan-then-execute. |
| **`synthesise`** | Re-collapse an existing run via a flagship model. | Different rubric/synthesiser on a prior `run_id`. |

Tool descriptions are intentionally written as **prompts for the calling
agent** (verb-first, explicit "use when…/don't use for…") so the agent
reliably picks the right one without you having to spell it out.

---

## Tiers & cost

Aliases are `<family>-<tier>` — version-neutral. The registry maps each alias
to the current best model; the resolved LiteLLM ID is captured per run in
`registry_snapshot.json` for reproducibility.
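
As a rough illustration of alias resolution and snapshotting, assuming the
registry behaves like a plain alias-to-ID mapping (the real `models.json`
schema may differ): `REGISTRY`, `resolve` and `snapshot_json` are hypothetical
names, and only the opus ID is taken from this README.

```python
import json

# Hypothetical registry contents. The opus ID is the one shown in this
# README's capsule example; the second entry is a placeholder.
REGISTRY = {
    "claude-opus": "anthropic/claude-opus-4-7",
    "example-alias": "example-provider/example-model-v1",
}


def resolve(aliases: list[str], registry: dict[str, str]) -> dict[str, str]:
    """Map version-neutral aliases to concrete model IDs, failing on unknowns."""
    unknown = [a for a in aliases if a not in registry]
    if unknown:
        raise KeyError(f"unknown aliases: {unknown}")
    return {a: registry[a] for a in aliases}


def snapshot_json(resolved: dict[str, str]) -> str:
    """Serialise what was resolved so a run can be reproduced later."""
    return json.dumps(resolved, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(snapshot_json(resolve(["claude-opus"], REGISTRY)))
```

Recording the resolved ID per run is what keeps an old run reproducible after
an alias is later pointed at a newer model.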

| Tier | Models | Typical run cost | Use |
|---|---|---|---|
| `nano` (3) | claude-haiku, gemini-flash, gpt-nano | < $0.01 | smoke tests / trivia |
| `quick` (5) | claude-haiku, gemini-pro, grok, qwen-max, kimi | ~$0.05 | snap second opinions |
| `standard` (10) | opus, sonnet, gpt-pro, gpt, gemini-pro, grok, qwen-max, kimi, glm, llama | $0.30–0.60 | normal decisions |
| `wide` (10) | as standard, openrouter-routed where possible | $0.20–0.50 | maximum diversity |
| `deep` (14) | standard + mistral, deepseek, mimo, sonar-pro | $0.50–1.00 | high-stakes, includes web search |
| `code` (5) | opus, gpt-codex, gpt-mini, gemini-pro, deepseek | $0.20–0.40 | code-heavy questions |
| `review` (6) | opus, gpt-codex, gpt-pro, gemini-pro, deepseek, grok | $0.30–0.60 | PR / code review |

A per-run cap (`max_run_usd`, default `$5.00`) refuses panels whose estimated
cost exceeds the limit before any provider is called.
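
A minimal sketch of such a pre-flight check, assuming per-panellist cost
estimates are already available as floats. `check_budget` and
`BudgetExceeded` are hypothetical names, not the server's API; only
`max_run_usd` and its `$5.00` default come from the text above.

```python
class BudgetExceeded(RuntimeError):
    """Raised before any provider call when the estimate is over the cap."""


def check_budget(estimates_usd: list[float], max_run_usd: float = 5.00) -> float:
    """Return the estimated total, or refuse the run if it exceeds the cap."""
    total = sum(estimates_usd)
    if total > max_run_usd:
        raise BudgetExceeded(
            f"estimated ${total:.2f} exceeds max_run_usd ${max_run_usd:.2f}"
        )
    return total


if __name__ == "__main__":
    print(f"ok, estimated ${check_budget([0.04, 0.06]):.2f}")
```

The point of checking the estimate first is that a refused run costs nothing.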

---

## The manifest capsule

Each panellist returns a ~200-token structured extract (decision shape shown
below; `review` and `research` kinds also supported):

```json
{
  "slug": "claude-opus-1",
  "model_id": "anthropic/claude-opus-4-7",
  "status": "OK",
  "capsule": {
    "kind": "decision",
    "position": "supports B with caveats",
    "recommendation": "Use B with fallback to A",
    "key_points": ["…"],
    "unique_claims": ["Only model to flag cold-start regression"],
    "caveats": ["Assumes >100 RPS steady-state"],
    "confidence": 0.85
  },
  "resource_uri": "consult://runs/abc/responses/claude-opus-1",
  "latency_ms": 3420,
  "cost_usd": 0.04
}
```

Your agent can synthesise from this alone in most cases. Read the full body
via the resource URI only when depth is needed.
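
One way an agent-side script might decide when depth is needed is sketched
below. The heuristic (fetch when confidence is low or a panellist made a
unique claim) and the name `uris_worth_reading` are assumptions for
illustration, not behaviour of this server.

```python
# First entry is trimmed from the capsule example above; the other two are
# made-up entries ("ERROR" is an assumed status value).
MANIFEST = [
    {
        "slug": "claude-opus-1",
        "status": "OK",
        "capsule": {
            "confidence": 0.85,
            "unique_claims": ["Only model to flag cold-start regression"],
        },
        "resource_uri": "consult://runs/abc/responses/claude-opus-1",
    },
    {
        "slug": "example-model-1",
        "status": "OK",
        "capsule": {"confidence": 0.9, "unique_claims": []},
        "resource_uri": "consult://runs/abc/responses/example-model-1",
    },
    {"slug": "example-model-2", "status": "ERROR", "capsule": None, "resource_uri": None},
]


def uris_worth_reading(manifest: list[dict], min_confidence: float = 0.7) -> list[str]:
    """Pick resource URIs whose full body is probably worth fetching."""
    uris = []
    for entry in manifest:
        if entry.get("status") != "OK":
            continue  # no usable capsule in this sketch
        capsule = entry["capsule"]
        if capsule["confidence"] < min_confidence or capsule["unique_claims"]:
            uris.append(entry["resource_uri"])
    return uris


if __name__ == "__main__":
    print(uris_worth_reading(MANIFEST))
```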

---

## Quickstart

After installing, from any connected agent:

```text
> consult: prompt="Polars vs DuckDB for a 10GB Parquet timeseries?", tier="code"
```

Returns `{run_id, synthesis, manifest, cost_usd, synthesiser}`. The synthesis
is markdown, ready to drop into your conversation.

For iterative consensus:

```text
> refine:
    prompt="Should we migrate from REST to gRPC for the internal mesh?",
    models=[{model:"claude-opus"},{model:"gpt-pro"},{model:"gemini-pro"},{model:"deepseek"}],
    threshold=0.85
```

For chained reasoning:

```text
> sequence:
    prompts=[
      "Decompose 'how should we scale our event pipeline?' into 4 sub-questions",
      "Answer sub-question 1: throughput requirements",
      "Answer sub-question 2: ordering guarantees",
      "Synthesise the final recommendation across the prior steps"
    ],
    models=[{model:"claude-opus"},{model:"gpt-pro"}]
```

---

## End-to-end walkthrough

A full tour. Assumes the install above and at least one provider key in
`.env`.

### 1. Smoke-test the install (no API spend)

```sh
.venv/bin/python -c "
import asyncio
from consult import panel, ModelSpec
async def go():
    h = await panel('hello', [ModelSpec(model='claude-haiku')], dry_run=True)
    print('partial:', h.partial, '| reason:', h.partial_reason)
asyncio.run(go())
"
# partial: True | reason: dry_run: estimated cost $0.0001
```

### 2. First real consult (~$0.20 on the `code` tier)

```text
> consult: prompt="Polars vs DuckDB for 10GB Parquet timeseries?", tier="code"
```

### 3. Inspect a panellist's full body

```text
> read resource: consult://runs/<run_id>/responses/claude-opus-1
```

### 4. Tail progress in real time

```sh
tail -f ~/.consult/runs/<run_id>/_progress.log
```

Agents that send a `progressToken` get the same events as
`notifications/progress`.

### 5. Follow-up via `continuation_id`

```text
> refine: prompt="OK now what about Iceberg vs Delta on top of that?",
          continuation_id="<prior run_id>",
          models=[{model:"claude-opus"},{model:"deepseek"}]
```

The prior run's synthesis is prepended as "Prior consultation summary".

### 6. Stochastic averaging with `model:N`

```text
> panel: models=[{model:"claude-haiku:3"},{model:"gpt-mini:3"}], prompt="…"
```

Six panellists total — three runs each of two cheap models.
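
The expansion can be pictured as follows. This is a sketch, assuming slugs are
numbered `<alias>-<n>` as in the `claude-opus-1` example elsewhere in this
README; `expand` is a hypothetical helper, not the engine's parser.

```python
def expand(specs: list[str]) -> list[str]:
    """Expand 'alias:N' specs into N numbered panellist slugs each."""
    slugs: list[str] = []
    for spec in specs:
        alias, sep, count = spec.rpartition(":")
        if not sep:
            # No ':N' suffix, so the whole spec is the alias and N is 1.
            alias, n = spec, 1
        else:
            n = int(count)
        if n < 1:
            raise ValueError(f"repeat count must be >= 1 in {spec!r}")
        slugs.extend(f"{alias}-{i}" for i in range(1, n + 1))
    return slugs


if __name__ == "__main__":
    print(expand(["claude-haiku:3", "gpt-mini:3"]))
```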

### 7. Check today's spend

```sh
consult-ledger today
# {"date":"2026-05-21","total_usd":2.36,"total_known":false,"runs":[…]}
```

`total_known: false` means at least one panellist had pricing missing from
the LiteLLM table.

### 8. View a run as a rich HTML page

```sh
consult-view <run_id>          # writes ~/.consult/runs/<run_id>/feed.html
consult-view <run_id> --open   # also opens in default browser
```

Self-contained HTML — header pills, prompt, synthesis (markdown), per-round
arbiter verdicts (refine), per-panellist cards with capsule + full body, and
a chronological timeline from `_progress.log`. No external assets, no JS.

---

## Driving the engine without MCP

The engine package (`consult.*`) is MCP-free and reusable as a library:

```python
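import asyncio

from consult import consult, panel, refine, ModelSpec


async def main() -> None:
    # Hero tool: parallel panel plus server-side synthesis
    result = await consult("question?", tier="standard")
    print(result.synthesis, result.cost_usd)

    # Lower-level: raw panel, no synthesis
    handle = await panel(
        "question?",
        [ModelSpec(model="claude-opus"), ModelSpec(model="gpt-pro")],
    )

    # Iterative: arbiter-scored refinement
    verdict = await refine(
        "tough decision?",
        [ModelSpec(model="claude-opus"), ModelSpec(model="deepseek")],
        threshold=0.85,
    )


# The engine functions are coroutines, so drive them from an event loop.
asyncio.run(main())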
```

Swap the URI scheme for a non-MCP transport:

```python
from consult import artifacts
artifacts.set_resource_uri_formatter(
    lambda run_id, slug: f"https://api.example.com/runs/{run_id}/{slug}"
)
```

---

## Security

Read [`SECURITY.md`](SECURITY.md) for the full threat model. Short version:

- **File attachments and `git_diff`** must resolve under
  `CONSULT_TRUSTED_REPO_ROOTS` (defaults to CWD). Symlinks resolved with
  `strict=True`; escape attempts fail closed.
- **Run artefacts are `chmod 0o700`** — per-run prompts (often containing
  pasted credentials or code) are not world-readable on shared hosts.
- **`git diff` runs with global/system git config neutralised** so a
  malicious `.gitattributes` filter can't execute.
- **LiteLLM exception strings are scrubbed** for `sk-…`, `AIza…`,
  `Bearer …`, `x-api-key:` and similar before anything hits disk or the
  manifest.
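
Two of these controls can be illustrated in a few lines. This is a sketch of
the general techniques only: the regexes, the `[REDACTED]` marker and the
function names are assumptions, and the server's own patterns and checks
(documented in `SECURITY.md`) may differ.

```python
import re
from pathlib import Path

# Illustrative patterns only; not the server's actual scrub list.
_SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{8,}"),
    re.compile(r"AIza[A-Za-z0-9_-]{8,}"),
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/-]+=*"),
    re.compile(r"(?i)x-api-key:\s*\S+"),
]


def scrub(text: str) -> str:
    """Replace anything that looks like a credential before it is stored."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def resolve_under_roots(path: str, roots: list[str]) -> Path:
    """Resolve symlinks strictly, then require the result to sit under a root."""
    resolved = Path(path).resolve(strict=True)  # raises if the path is missing
    for root in roots:
        if resolved.is_relative_to(Path(root).resolve(strict=True)):
            return resolved
    # Fail closed: anything not provably inside a trusted root is refused.
    raise PermissionError(f"{resolved} is outside the trusted roots")


if __name__ == "__main__":
    print(scrub("auth failed for sk-ant-abc123DEF456"))
```

Resolving before comparing matters: checking the unresolved string would let
a symlink inside a trusted root point at a file outside it.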

### Privacy note

The model registry tags each entry with a `privacy_tier`:

- `first_party` — direct API to Anthropic / OpenAI / Google.
- `aggregator` — routed via OpenRouter (Grok, Kimi, Qwen, DeepSeek, Llama,
  Mistral, GLM, MiMo, Sonar-Pro).

Mixing tiers in one panel broadcasts the **same prompt** to providers with
**different data-retention policies**. For prompts containing sensitive
material, prefer `tier="standard"` (mostly first-party) over `tier="wide"`
or `tier="deep"` (heavily aggregator-routed).
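
For sensitive prompts you could check a custom panel before sending it. The
sketch below assumes a flat alias-to-`privacy_tier` mapping (the real
`models.json` layout may differ); the tier assignments follow the two bullets
above, and the function names are hypothetical.

```python
# Assumed flat view of the registry's privacy_tier annotations.
PRIVACY_TIER = {
    "claude-opus": "first_party",
    "gpt-pro": "first_party",
    "gemini-pro": "first_party",
    "grok": "aggregator",
    "deepseek": "aggregator",
}


def aggregator_routed(panel: list[str], tiers: dict[str, str]) -> list[str]:
    """Return the panel members whose traffic goes through an aggregator."""
    return [alias for alias in panel if tiers[alias] == "aggregator"]


def require_first_party(panel: list[str], tiers: dict[str, str]) -> None:
    """Refuse a panel that would send the prompt beyond first-party APIs."""
    routed = aggregator_routed(panel, tiers)
    if routed:
        raise ValueError(f"aggregator-routed models in panel: {routed}")


if __name__ == "__main__":
    print(aggregator_routed(["claude-opus", "deepseek", "grok"], PRIVACY_TIER))
```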

---

## Repo layout

```
consult/                # ENGINE — no mcp.* imports
  runner.py             # async fanout + LiteLLM + progress log
  capsule.py            # post-fanout structured extraction
  synth.py              # flagship synthesiser
  refine.py             # arbiter-driven loop (max 3 rounds) + continuation
  sequence.py           # chained multi-step
  orchestrate.py        # consult() hero
  ledger.py             # daily cost ledger (consult-ledger)
  viewer.py             # static HTML run renderer (consult-view)
  doctor.py             # diagnostic CLI (consult-doctor)
  registry.py           # models.json + stances.json loader
  artifacts.py          # ~/.consult/runs/<id>/ layout + URI formatter
  attachments.py        # file/diff inlining + trusted-roots enforcement
  sources.py            # git_diff resolver (hardened subprocess)
  context.py            # per-run bundle + blinding
  progress.py           # typed ProgressEvent union
  status.py             # LiteLLM response → Status
  types.py              # Pydantic models (StrictModel base)
  mcp/                  # MCP ADAPTER — only thing that imports mcp.*
    server.py, handlers.py, schemas.py, errors.py, __main__.py
  config/
    models.json         # registry with privacy_tier annotations
    stances.json        # persona prompts
tests/                  # pytest (offline + live, gated on keys)
.github/workflows/      # CI: ruff + pytest on Py 3.11/3.12/3.13
FRICTION.md             # internal dogfooding log (kept for transparency)
SECURITY.md             # threat model + disclosure path
CONTRIBUTING.md         # dev setup + style
```

---

## Contributing

See [`CONTRIBUTING.md`](CONTRIBUTING.md). Issues and PRs welcome; please open
an issue first for non-trivial changes so we can agree on shape.

## License

MIT. See [`LICENSE`](LICENSE).
