jeevesagent.architecture.helpers

Cross-architecture helpers.

Small utilities that multiple architectures need. Collecting them here keeps each architecture’s module focused on its strategy and avoids sibling architectures re-implementing the same plumbing or importing it from one another, which invites circular imports:

  • text_only_model_call() — run a single model call with no tools, collecting the response text and usage. Used by Self-Refine (critic / refiner), Reflexion (evaluator / reflector), Plan-and-Execute (planner / replanner), Router (classifier), and any other architecture that needs a one-shot structured LLM call.

  • add_usage() — sum two Usage records.

  • parse_score() — extract a 0-1 confidence number from free-form evaluator output. Used by Reflexion and Tree of Thoughts, and by any other architecture with an evaluator step.

  • SubagentInvocation — run a sub-Agent and stream its events through to the parent’s generator while capturing the final RunResult separately. Used by Swarm, Supervisor, Router, ActorCritic, Debate, and Blackboard so the inner agent’s MODEL_CHUNK / TOOL_CALL / TOOL_RESULT events surface in the outermost agent.stream(...) consumer.

Classes

SubagentInvocation

Run a sub-Agent and stream its events to the parent.

Functions

add_usage(→ jeevesagent.core.types.Usage)

Sum two Usage records.

parse_score(→ float)

Extract a 0-1 score from free-form evaluator output.

text_only_model_call(→ tuple[str, jeevesagent.core.types.Usage])

Run a single text-only model call through runtime.step.

Module Contents

class jeevesagent.architecture.helpers.SubagentInvocation(agent: jeevesagent.agent.api.Agent, prompt: str, *, session_id: str | None = None, context: jeevesagent.core.context.RunContext | None = None, extra_tools: list[jeevesagent.tools.registry.Tool] | None = None, buffer_size: int = 128)[source]

Run a sub-Agent and stream its events to the parent.

Use this from any multi-agent architecture instead of calling await worker.run(prompt, ...) directly. The plain run() drops events on the floor; this helper forwards them to the parent generator so token-level streaming works end-to-end.

Usage from inside an architecture’s run() async generator:

invocation = SubagentInvocation(
    worker, prompt, session_id="...", extra_tools=[...]
)
async for event in invocation.events():
    yield event
result = invocation.result   # dict version of RunResult

Filtering policy:

  • Suppressed in the parent stream: STARTED and COMPLETED from the sub-agent — those are internal framing events; the parent owns its own STARTED/COMPLETED. The sub-agent’s RunResult (carried in its COMPLETED event payload) is captured into self.result for the architecture to read.

  • Forwarded as-is: MODEL_CHUNK (token-level streaming), TOOL_CALL / TOOL_RESULT (with full args / output), BUDGET_WARNING / BUDGET_EXCEEDED, ERROR, ARCHITECTURE_EVENT (so a nested architecture’s progress events bubble up too).
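
The policy above can be sketched with plain dicts standing in for jeevesagent’s Event objects (the real Event type is richer; this is illustrative only, not the library’s source):

```python
# Hypothetical sketch of SubagentInvocation's filtering policy.
SUPPRESSED = {"STARTED", "COMPLETED"}  # the parent owns its own framing events

def filter_events(sub_events):
    """Split a sub-agent's event stream into (forwarded events, RunResult)."""
    forwarded, result = [], None
    for event in sub_events:
        if event["type"] == "COMPLETED":
            # Capture the sub-agent's RunResult instead of forwarding it.
            result = event["payload"]
        elif event["type"] not in SUPPRESSED:
            forwarded.append(event)  # MODEL_CHUNK, TOOL_CALL, ERROR, ...
    return forwarded, result
```

The real helper does this incrementally inside events() rather than over a materialized list, so token-level chunks reach the parent as they happen.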

async events() collections.abc.AsyncIterator[jeevesagent.core.types.Event][source]

Yield the sub-agent’s events (filtered) as they happen.

After the iterator drains, self.result contains the sub-agent’s RunResult as a dict (with output, turns, tokens_in, tokens_out, cost_usd, interrupted, interruption_reason).

result: dict[str, Any]

jeevesagent.architecture.helpers.add_usage(a: jeevesagent.core.types.Usage, b: jeevesagent.core.types.Usage) jeevesagent.core.types.Usage[source]
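
add_usage() sums two Usage records field by field. A minimal sketch of the idea, with the field names borrowed from the RunResult dict documented above (the real jeevesagent.core.types.Usage may carry different or additional fields):

```python
from dataclasses import dataclass

# Hypothetical Usage shape -- field names are assumptions, not the real type.
@dataclass(frozen=True)
class Usage:
    tokens_in: int = 0
    tokens_out: int = 0
    cost_usd: float = 0.0

def add_usage(a: Usage, b: Usage) -> Usage:
    """Sum two Usage records field by field."""
    return Usage(
        tokens_in=a.tokens_in + b.tokens_in,
        tokens_out=a.tokens_out + b.tokens_out,
        cost_usd=a.cost_usd + b.cost_usd,
    )
```
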
jeevesagent.architecture.helpers.parse_score(text: str) float[source]

Extract a 0-1 score from free-form evaluator output.

Prefers the score: X (or score=X) pattern; falls back to any plausible number in the text. Clamps to [0.0, 1.0]. Returns 0.0 on parse failure (treated as a failed evaluation — let the caller decide what that means).

Used by Reflexion (attempt score) and TreeOfThoughts (per-thought evaluation).
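
The documented behavior can be sketched with two regular expressions; this is an illustrative reimplementation, not the library’s actual code:

```python
import re

def parse_score(text: str) -> float:
    """Extract a 0-1 score from free-form evaluator output."""
    # Prefer an explicit "score: X" / "score=X" marker.
    match = re.search(r"score\s*[:=]\s*(\d*\.?\d+)", text, re.IGNORECASE)
    if match is None:
        # Fall back to the first plausible number anywhere in the text.
        match = re.search(r"(\d*\.?\d+)", text)
    if match is None:
        return 0.0  # parse failure -> treated as a failed evaluation
    return max(0.0, min(1.0, float(match.group(1))))  # clamp to [0.0, 1.0]
```
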

async jeevesagent.architecture.helpers.text_only_model_call(deps: jeevesagent.architecture.base.Dependencies, step_name: str, messages: list[jeevesagent.core.types.Message]) tuple[str, jeevesagent.core.types.Usage][source]

Run a single text-only model call through runtime.step.

Returns (text, usage). The call is journaled so replays are deterministic, but no tools are exposed — used for one-shot structured prompts (critique, evaluation, classification, planning).
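
To make the call shape concrete, here is a self-contained sketch with toy stand-ins for Dependencies, Message, and Usage; FakeRuntime and its step() method are hypothetical, not the real jeevesagent runtime (which also journals the call for deterministic replay):

```python
import asyncio
from dataclasses import dataclass

# Toy stand-ins for jeevesagent.core.types; the real types carry more fields.
@dataclass
class Message:
    role: str
    content: str

@dataclass
class Usage:
    tokens_in: int = 0
    tokens_out: int = 0

class FakeRuntime:
    """Hypothetical runtime whose step() echoes the last message upper-cased."""
    async def step(self, step_name, messages):
        return messages[-1].content.upper(), Usage(tokens_in=10, tokens_out=5)

@dataclass
class FakeDeps:
    runtime: FakeRuntime

async def text_only_model_call(deps, step_name, messages):
    # Sketch: delegate to runtime.step with no tools exposed and return
    # (text, usage), as the real helper does.
    return await deps.runtime.step(step_name, messages)

text, usage = asyncio.run(
    text_only_model_call(
        FakeDeps(FakeRuntime()), "critic", [Message("user", "rate this draft")]
    )
)
```

An architecture would then typically fold usage into a running total with add_usage() and feed text to the next step (e.g. a refiner or planner).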