jeevesagent.architecture.debate¶
Multi-Agent Debate: N debaters argue, judge synthesizes.
Du et al. 2023 — Improving Factuality and Reasoning in Language Models through Multiagent Debate. Liang et al. 2023 (divergent thinking via debate). Production patterns in AutoGen GroupChat, CAMEL.
Reserve for high-stakes contested questions where a wrong answer is expensive. The 2026 production literature is cautious — debate adds 3-5× cost over single-agent and works best on a narrow set of decision-style questions where blind-spot triangulation matters.
Pattern¶
Round 0 (independent). All debaters answer the original question simultaneously, with no awareness of each other. Run in parallel.
Rounds 1..K (debate). Each debater receives the original question + the full transcript so far. They defend or update their position. All debaters in a round run in parallel.
Optional convergence check. If all debaters in a round produce exactly matching answers (after whitespace normalization), terminate early. Enabled by default; disable for adversarial-only debates where you want the full round count.
Judge synthesis. A separate judge Agent receives the full transcript and produces the final answer. judge=None falls back to majority vote (modal answer across the final round; tie-broken by first appearance).
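The convergence check and the majority-vote fallback described above can be sketched as plain functions. This is an illustration of the documented behavior, not the library's internals; the function names are hypothetical:

```python
from collections import Counter

def normalize(answer: str) -> str:
    # Collapse all whitespace so formatting differences don't block convergence.
    return " ".join(answer.split())

def converged(round_answers: list[str]) -> bool:
    # Exact match, after whitespace normalization, across every debater.
    return len({normalize(a) for a in round_answers}) == 1

def majority_vote(final_round_answers: list[str]) -> str:
    # Modal answer across the final round. max() returns the first maximum
    # in insertion order, which implements "tie-broken by first appearance".
    counts = Counter(normalize(a) for a in final_round_answers)
    return max(counts, key=counts.get)
```

Note that the class signature below also exposes a convergence_similarity threshold, so the shipped check may be fuzzier than the strict equality sketched here.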
Deterministic session ids¶
Each debater invocation uses {parent}__debater_<i>_round_<r>; the judge uses {parent}__judge. Replays of the parent journal cache the sub-results and don't re-execute debaters.
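The session-id scheme is deterministic, so replays always regenerate the same child ids. A sketch with hypothetical helper names:

```python
def debater_session_id(parent: str, debater_index: int, round_index: int) -> str:
    # Deterministic: the same parent session always yields the same child id,
    # so a journal replay hits the cached sub-result instead of re-running.
    return f"{parent}__debater_{debater_index}_round_{round_index}"

def judge_session_id(parent: str) -> str:
    return f"{parent}__judge"
```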
Strengths¶
Surfaces blind spots through disagreement — different priors produce different errors, and the debate transcript exposes them.
Strong on factuality. TruthfulQA improvement over single-agent is well-documented (Du et al. 2023, +12% on 3 debaters / 2 rounds).
Heterogeneous-model friendly. Use Claude + GPT + Llama for genuine prior diversity rather than 3× of the same model.
Weaknesses¶
3-5× cost. N debaters × K rounds + judge. Real money.
Sequential rounds. Even with parallel debaters per round, you still wait round-by-round.
Groupthink risk. Same model → same priors → no real disagreement. Differentiate models or personas.
Judge quality is critical. Bad judge = bad final answer.
Attributes¶
DEFAULT_DEBATER_INSTRUCTIONS
DEFAULT_JUDGE_INSTRUCTIONS
Classes¶
MultiAgentDebate: N debaters + optional judge orchestration.
Module Contents¶
- class jeevesagent.architecture.debate.MultiAgentDebate(*, debaters: list[jeevesagent.agent.api.Agent], judge: jeevesagent.agent.api.Agent | None = None, rounds: int = 2, convergence_check: bool = True, convergence_similarity: float = 0.85, debater_instructions: str | None = None, judge_instructions: str | None = None)[source]¶
N debaters + optional judge orchestration.
- declared_workers() dict[str, jeevesagent.agent.api.Agent][source]¶
- async run(session: jeevesagent.architecture.base.AgentSession, deps: jeevesagent.architecture.base.Dependencies, prompt: str) collections.abc.AsyncIterator[jeevesagent.core.types.Event][source]¶
- name = 'debate'¶
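A minimal construction sketch based on the signature above. How Agent instances are built is an assumption here; consult jeevesagent.agent.api for the real factory:

```
from jeevesagent.agent.api import Agent
from jeevesagent.architecture.debate import MultiAgentDebate

# Hypothetical Agent construction -- adapt to your actual Agent API.
debaters = [Agent(model=m) for m in ("claude", "gpt", "llama")]

debate = MultiAgentDebate(
    debaters=debaters,        # heterogeneous models for genuine prior diversity
    judge=Agent(model="claude"),  # judge=None falls back to majority vote
    rounds=2,                 # K debate rounds after the independent round 0
    convergence_check=True,   # terminate early when answers converge
)
```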
- jeevesagent.architecture.debate.DEFAULT_DEBATER_INSTRUCTIONS = Multiline-String¶
"""You are participating in a structured debate. Other debaters have proposed answers (shown below). Your task on this round:
1. State your position clearly.
2. Address each other debater's argument: where you agree, where you disagree, and why.
3. If a counter-argument is convincing, update your position openly — don't be stubborn for its own sake.
4. Cite specifics; avoid hand-waving.
End with a clear final answer for THIS round.
"""
- jeevesagent.architecture.debate.DEFAULT_JUDGE_INSTRUCTIONS = Multiline-String¶
"""You are an impartial judge synthesizing a multi-agent debate. Read the original question and the full debate transcript below, then output the answer you believe is best supported by the arguments. Output the final answer as plain text. Be decisive — pick the strongest position even when debaters didn't fully agree.
"""