jeevesagent.architecture.tree_of_thoughts

Tree of Thoughts: branching exploration with per-node evaluation.

Yao et al. 2023 — Tree of Thoughts: Deliberate Problem Solving with Large Language Models. Useful for combinatorial reasoning, multi-step planning, math (Game of 24), puzzle solving — anywhere a single straight-shot ReAct trajectory would commit too early.

Cost

branch_factor × beam_width × max_depth × 2 model calls (one proposer + one evaluator per candidate). With defaults (3, 2, 3) that’s 36 calls. Reserve ToT for problems where the search structure earns the cost — math/planning tasks where ReAct visibly meanders.
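The formula above can be checked directly. A minimal sketch (the helper name is illustrative, not part of the module):

```python
def tot_call_count(branch_factor: int, beam_width: int, max_depth: int) -> int:
    # Each depth level expands up to beam_width surviving nodes into
    # branch_factor candidates; every candidate costs one proposer call
    # plus one evaluator call.
    return branch_factor * beam_width * max_depth * 2

print(tot_call_count(3, 2, 3))  # → 36, matching the default settings
```

Note this is an upper bound: the first level expands only the root, and early termination at solved_threshold can cut the count further.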

Strengths

  • Explicit search tree. Every candidate, score, and decision is observable through architecture_event events.

  • Composable. Wrap inside Reflexion to learn which evaluation patterns predict real success.

  • Replay-correct. Each proposer / evaluator call is a named runtime.step, so journaled runtimes replay deterministically.

Weaknesses

  • Expensive. 30-50× a single ReAct turn for typical settings.

  • Evaluator-quality bound. A weak evaluator picks weak branches and the search wastes budget on dead ends.

  • Domain-specific. Branch-and-evaluate makes sense for combinatorial problems; for open-ended writing tasks, use Self-Refine or Actor-Critic.

Attributes

Classes

ThoughtNode

One node in the Tree-of-Thoughts search tree.

TreeOfThoughts

Branch + evaluate + prune. BFS beam search over thoughts.

Module Contents

class jeevesagent.architecture.tree_of_thoughts.ThoughtNode(/, **data: Any)[source]

Bases: pydantic.BaseModel

One node in the Tree-of-Thoughts search tree.

Children are stored implicitly (each node has a parent_id). The full tree is reconstructable from the node list ToT keeps in its session metadata.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

content: str
depth: int
id: str
parent_id: str | None
score: float = 0.0
class jeevesagent.architecture.tree_of_thoughts.TreeOfThoughts(*, branch_factor: int = 3, max_depth: int = 3, beam_width: int = 2, solved_threshold: float = 1.0, min_score: float = 0.0, parallel: bool = True, proposer_prompt: str | None = None, evaluator_prompt: str | None = None)[source]

Branch + evaluate + prune. BFS beam search over thoughts.

declared_workers() dict[str, jeevesagent.agent.api.Agent][source]
async run(session: jeevesagent.architecture.base.AgentSession, deps: jeevesagent.architecture.base.Dependencies, prompt: str) collections.abc.AsyncIterator[jeevesagent.core.types.Event][source]
name = 'tree-of-thoughts'
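The branch + evaluate + prune loop that run() performs can be sketched as plain BFS beam search. This is an illustration of the control flow only — propose and evaluate stand in for the proposer/evaluator worker calls, and the loop body is not the module's actual code:

```python
def beam_search(problem, propose, evaluate, *,
                branch_factor=3, max_depth=3, beam_width=2,
                solved_threshold=1.0, min_score=0.0):
    beam = [("", 0.0)]  # (partial reasoning path, score)
    for _ in range(max_depth):
        candidates = []
        for path, _ in beam:
            for _ in range(branch_factor):
                thought = propose(problem, path)    # one proposer call
                score = evaluate(problem, thought)  # one evaluator call
                candidates.append((path + "\n" + thought, score))
        # Prune: drop candidates below min_score, keep the top beam_width.
        candidates = [c for c in candidates if c[1] >= min_score]
        candidates.sort(key=lambda c: c[1], reverse=True)
        if candidates and candidates[0][1] >= solved_threshold:
            return candidates[0][0]  # early exit on a solved branch
        beam = candidates[:beam_width]
    return beam[0][0] if beam else None
```

The keyword parameters mirror the TreeOfThoughts constructor; in the real architecture each propose/evaluate pair is a named runtime.step and results stream out as architecture_event events.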
jeevesagent.architecture.tree_of_thoughts.DEFAULT_EVALUATOR_PROMPT = Multiline-String
"""You evaluate a candidate reasoning step. Given the original problem
and the proposed thought, score how promising this thought is for
arriving at the correct solution.

Output exactly one line:
score: <number between 0 and 1>

Then optionally one line of brief justification. The first line
must match the score format exactly so it can be parsed.

- 1.0 = this thought is correct and final / will obviously lead to a
  correct answer
- 0.7-0.9 = strong direction, likely correct
- 0.4-0.6 = plausible but uncertain
- 0.0-0.3 = wrong direction or contradicts the problem
"""
jeevesagent.architecture.tree_of_thoughts.DEFAULT_PROPOSER_PROMPT = Multiline-String
"""You are exploring possible reasoning paths to solve a problem.

Given the problem and any prior steps, propose ONE next step (a
"thought") toward a solution. A thought can be a sub-step,
intermediate calculation, sub-decision, or partial answer.

Output only the thought itself — concise, one paragraph at most.
Do not number it; do not preface with "Thought:".
"""