Disclaimer: This chatbot is for educational and informational purposes only. It does not provide medical advice, diagnosis, or treatment. Always consult a qualified healthcare professional before starting any exercise program or making dietary changes.
Part 1: Understanding the Foundations
What is an LLM?
Imagine you had a friend who had read every book, article, and webpage ever written. You could ask them anything — "What's a good stretching routine?" or "How many calories are in a banana?" — and they'd give you a thoughtful answer based on everything they'd absorbed.
That's roughly what a Large Language Model (LLM) is. It's a computer program that has been trained on massive amounts of text. During training, it learned patterns: how sentences flow, how ideas connect, how questions relate to answers. When you give it a prompt, it predicts what text should come next, one token (roughly a word or word fragment) at a time, drawing on those patterns.
But here's the key insight: predicting text well turns out to require much more than word association. To complete "The capital of France is ___," the model must have picked up facts about the world. The same training that teaches it those facts also lets it answer questions, write code, analyze data, and work through problems.
Modern LLMs can do three things that matter for us:
- Understand natural language — You can talk to them like you'd talk to a person. No special syntax required.
- Follow instructions — You can give them a role ("You are a nutrition expert") and constraints ("Only recommend foods under 500 calories"), and they'll generally stay in character.
- Use tools — You can give them access to functions (search the web, query a database, do math) and they'll decide when and how to use them.
That third capability — tool use — is what makes agents possible.
What are AI Agents?
A basic chatbot is just an LLM with a text box. You type something, it responds. There's no memory, no tools, no ability to take actions. It's like talking to someone who forgets you exist between every message.
An AI agent is an LLM wrapped in a system that gives it superpowers:
┌─────────────────────────────────┐
│            AI Agent             │
│                                 │
│  ┌───────────────────────────┐  │
│  │       Instructions        │  │ ← "You are a nutrition expert..."
│  │      (System Prompt)      │  │
│  └───────────────────────────┘  │
│                                 │
│  ┌───────────────────────────┐  │
│  │         LLM Brain         │  │ ← Reads, thinks, decides
│  └───────────────────────────┘  │
│                                 │
│  ┌───────────────────────────┐  │
│  │           Tools           │  │ ← Search web, calculate calories, ...
│  └───────────────────────────┘  │
│                                 │
│  ┌───────────────────────────┐  │
│  │          Memory           │  │ ← "User is allergic to peanuts"
│  └───────────────────────────┘  │
└─────────────────────────────────┘
- Instructions tell the agent who it is, what it should do, and what it should avoid.
- Tools let the agent take actions beyond just generating text.
- Memory lets the agent remember things across conversations.
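The three components above can be sketched in plain Python. This is a toy illustration, not llm-orchestrator's API. The `Agent` dataclass, the `remember`/`call_tool`/`build_prompt` methods, and the `bmi` tool are all hypothetical names. In a real agent the LLM would decide which tool to call; here the caller does it directly.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Agent:
    """Toy agent bundling instructions, tools, and memory (hypothetical names)."""

    instructions: str  # system prompt: who the agent is and what to avoid
    tools: dict[str, Callable[..., Any]] = field(default_factory=dict)
    memory: dict[str, str] = field(default_factory=dict)

    def remember(self, key: str, value: str) -> None:
        # Stored facts outlive a single request, unlike a bare LLM call
        self.memory[key] = value

    def call_tool(self, name: str, **kwargs: Any) -> Any:
        # In a real agent the LLM picks the tool; here the caller does
        if name not in self.tools:
            return f"unknown tool: {name}"
        return self.tools[name](**kwargs)

    def build_prompt(self, user_message: str) -> str:
        # Instructions and remembered facts are sent along with every message
        facts = "\n".join(f"- {k}: {v}" for k, v in self.memory.items())
        return f"{self.instructions}\n\nKnown facts:\n{facts}\n\nUser: {user_message}"


def bmi(weight_kg: float, height_m: float) -> float:
    # Body-mass index: weight in kilograms divided by height in metres squared
    return round(weight_kg / height_m**2, 1)


coach = Agent(instructions="You are a fitness coach.", tools={"bmi": bmi})
coach.remember("allergy", "peanuts")
print(coach.call_tool("bmi", weight_kg=70, height_m=1.75))  # 22.9
print(coach.build_prompt("Suggest a snack."))
```

The memory lives on the agent object rather than in any single prompt. That is what lets a fact like the peanut allergy reappear in every later request.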
A single agent can handle simple tasks well. But what happens when you need an agent that knows about exercise, nutrition, recipes, calorie counting, and proper form all at once? You'd need an impossibly long system prompt, dozens of tools, and the agent would constantly get confused about which hat to wear.
That's where multi-agent systems come in.
Why Multi-Agent?
Think about how a hospital works. You don't have one doctor who does everything — surgery, psychiatry, radiology, pediatrics. Instead, you have specialists who each excel at one thing, coordinated by a primary care physician who knows when to refer you to whom.
Multi-agent systems work the same way:
| Hospital | Multi-Agent System |
|---|---|
| Primary care physician | Supervisor agent (routes requests) |
| Cardiologist | Specialist agent (heart/exercise) |
| Nutritionist | Specialist agent (diet/food) |
| Lab technician | Utility agent (runs tests/searches) |
| Medical records | Session manager (remembers history) |
The benefits are the same as in the real world:
- Specialization — Each agent has a focused system prompt and limited tools. A nutrition agent doesn't need to know about exercise form.
- Separation of concerns — You can develop, test, and update each agent independently.
- Maintainability — When you need to update calorie data, you change one agent, not the entire system.
- Reliability — A bug in the recipe agent is much less likely to break the workout planner.
Routing Patterns
How do agents talk to each other? There are three main patterns:
Supervisor-Led (Hub and Spoke)
One agent (the supervisor) receives every user message and decides which specialist should handle it. Specialists always report back to the supervisor.
                     ┌──── Agent A
                     │
User ──► Supervisor ─┼──── Agent B
                     │
                     └──── Agent C
When to use: When you need centralized control, clear accountability, and predictable conversation flow. Good for customer service, general assistants.
Mesh (Peer-to-Peer)
Agents talk directly to each other. Agent A can hand off to Agent B, who can hand off to Agent C, without going through a central coordinator.
Agent A ◄──────► Agent B
   ▲                ▲
   │                │
   └──► Agent C ◄───┘
When to use: When specialists need to collaborate closely. Good for research pipelines, data processing chains.
Hybrid (What We'll Build)
The hybrid pattern is often the most practical choice, and it is common in production systems. A supervisor handles top-level routing, but underneath it, groups of agents form mesh clusters where they can talk freely to each other.
Here's the abstract pattern:
                       User
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│                  Supervisor S                   │
│               (top-level routing)               │
│                                                 │
│    ┌──────────┐               ┌──────────┐      │
│    │ Agent A  │               │ Agent B  │      │
│    │          │               │          │      │
│    │ ┌─────┐  │               │          │      │
│    │ │  E  │  │               │          │      │
│    │ └──┬──┘  │               └──────────┘      │
│    │    │     │                                 │
│    │ ┌──▼──┐  │                                 │
│    │ │  F  │  │                                 │
│    │ └─────┘  │                                 │
│    └──────────┘                                 │
└─────────────────────────────────────────────────┘
The key rules:
- Supervisor S controls the top level. It routes user requests to Agent A or Agent B. Both A and B report back to S.
- Agent A has sub-agents E and F underneath it. Agent A can delegate tasks to E or F.
- E, F, and A form a mesh — they can all talk to each other directly. E can hand off to F, F can hand off back to A, A can re-delegate to E, and so on. They don't need to go through the supervisor for internal collaboration.
- Agent B is a standalone specialist. It receives work from S and returns results to S. No sub-agents.
Let's trace a request through this topology:
Step 1: User asks a question
Step 2: Supervisor S reads the question, decides Agent A should handle it
Step 3: Agent A reads the question, realizes it needs detail from E
Step 4: Agent A hands off to E (mesh routing, no supervisor involved)
Step 5: E does its work, decides F should add its perspective
Step 6: E hands off to F (mesh routing between peers)
Step 7: F does its work, hands back to A (mesh routing)
Step 8: A synthesizes the results from E and F
Step 9: A returns the final answer to Supervisor S
Step 10: S delivers the response to the user
Notice that steps 4–7 happen entirely within the mesh cluster. The supervisor doesn't see or manage those handoffs. This is the power of hybrid routing: the supervisor handles high-level orchestration while mesh clusters handle low-level collaboration.
The topology edges that make this work:
# Supervisor routes (hub-and-spoke)
S → A (supervisor delegates to agent A)
S → B (supervisor delegates to agent B)
A → S (agent A reports back to supervisor)
B → S (agent B reports back to supervisor)
# Mesh routes (A, E, F can all talk to each other)
A → E (agent A delegates to sub-agent E)
A → F (agent A delegates to sub-agent F)
E → A (sub-agent E returns to agent A)
E → F (sub-agent E hands off to peer F)
F → A (sub-agent F returns to agent A)
F → E (sub-agent F hands off to peer E)
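You can see why the edge list matters by treating it as a set of allowed hops and checking a trace against it. This is a minimal plain-Python sketch. `EDGES` and `invalid_hops` are illustrative names, not part of any library.

```python
# Edges from the hybrid topology above, as (from, to) pairs
EDGES = {
    ("S", "A"), ("S", "B"), ("A", "S"), ("B", "S"),  # hub-and-spoke
    ("A", "E"), ("A", "F"), ("E", "A"),              # mesh cluster A/E/F
    ("E", "F"), ("F", "A"), ("F", "E"),
}


def invalid_hops(path: list[str]) -> list[tuple[str, str]]:
    """Return every consecutive hop in `path` that is not a declared edge."""
    return [(a, b) for a, b in zip(path, path[1:]) if (a, b) not in EDGES]


# Steps 2-9 of the trace above: S -> A -> E -> F -> A -> S
trace = ["S", "A", "E", "F", "A", "S"]
print(invalid_hops(trace))            # []
print(invalid_hops(["S", "E"]))       # [('S', 'E')]: the supervisor cannot skip A
print(invalid_hops(["B", "E", "S"]))  # [('B', 'E'), ('E', 'S')]
```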
Why this matters:
- The supervisor stays simple — it only knows about A and B, not the internal details of A's cluster.
- Agent A can coordinate E and F for complex tasks without round-tripping through the supervisor.
- E and F can collaborate directly (e.g., E produces data, F validates it) without unnecessary hops.
- You can add more sub-agents to A's cluster (G, H, ...) without changing the supervisor at all.
Now let's map this to the health assistant we're building:
| Abstract Pattern | Health Assistant |
|---|---|
| Supervisor S | Coach |
| Agent A | Trainer |
| Agent E | Exercise Agent |
| Agent F | Form Guide Agent |
| Agent B | Nutrition Agent (with its own mesh: Recipe + Calorie) |
                ┌──► Trainer (A) ◄──► Exercise Agent (E) ◄──► Form Guide (F)
                │    (full mesh: Trainer, Exercise Agent, and Form Guide all connect)
User ──► Coach ─┤
                │
                └──► Nutrition (B*) ◄──► Recipe Agent (E') ◄──► Calorie Agent (F')
                     (full mesh: Nutrition, Recipe Agent, and Calorie Agent all connect)
Nutrition is like a second "Agent A" — it has its own mesh cluster underneath it.
The Coach (supervisor) decides: "This is a workout question, send it to Trainer." The Trainer then uses mesh routing to collaborate with Exercise Agent and Form Guide without going back through the Coach for every sub-task. Same for Nutrition with Recipe and Calorie agents.
When to use: When you have natural groupings of specialists that need internal collaboration. It is a common choice for production systems because it balances centralized control (the supervisor knows the big picture) with decentralized efficiency (mesh clusters handle the details).
Part 2: Meet the Stack
LangGraph
LangGraph is a framework for building stateful, multi-step applications with LLMs. Think of it as a state machine for agents.
In LangGraph, your application is a graph:
- Nodes are functions that do work (usually agents)
- Edges connect nodes and control the flow
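- State is a shared dictionary (typically declared as a TypedDict) that flows through the graph. Each node returns updates that are merged into it, so results accumulate as the graph runs.
[Node A] ──edge──► [Node B] ──edge──► [Node C]
    │                                    │
    └─────── state flows through ────────┘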
LangGraph handles the hard parts: running nodes in the right order, passing state between them, and managing the conversation history. You just define the nodes and edges.
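To make the node, edge, and state vocabulary concrete, here is a toy, dependency-free runner. It is not LangGraph's API (LangGraph itself uses `StateGraph`, `add_node`, `add_edge`, and `compile`). It is only a sketch of the same idea: each node reads the shared state and returns a partial update that gets merged in, and edges decide what runs next. `run_graph` and the three node functions are hypothetical.

```python
from typing import Any, Callable

State = dict[str, Any]
Node = Callable[[State], State]


def run_graph(nodes: dict[str, Node], edges: dict[str, str], start: str, state: State) -> State:
    """Run nodes from `start`, following one outgoing edge each, until none is left."""
    current: str | None = start
    while current is not None:
        update = nodes[current](state)  # a node returns only the keys it changes
        state = {**state, **update}     # merge the partial update into shared state
        state["trace"] = [*state.get("trace", []), current]
        current = edges.get(current)    # no outgoing edge means the run ends
    return state


def clean(state: State) -> State:
    return {"question": state["question"].strip()}


def classify(state: State) -> State:
    topic = "nutrition" if "calorie" in state["question"].lower() else "exercise"
    return {"topic": topic}


def answer(state: State) -> State:
    return {"answer": f"Routing to the {state['topic']} specialist"}


final = run_graph(
    nodes={"clean": clean, "classify": classify, "answer": answer},
    edges={"clean": "classify", "classify": "answer"},
    start="clean",
    state={"question": "  How many calories are in oats? "},
)
print(final["answer"])  # Routing to the nutrition specialist
print(final["trace"])   # ['clean', 'classify', 'answer']
```

Real LangGraph adds the features this toy lacks: conditional edges, reducers that control how updates merge, and checkpointing for conversation history.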
llm-orchestrator
llm-orchestrator is a Python library built on top of LangGraph. While LangGraph gives you the raw execution engine, llm-orchestrator adds the structure you need for production multi-agent systems:
| What LangGraph gives you | What llm-orchestrator adds |
|---|---|
| State machine execution | Topology validation (catch routing bugs before runtime) |
| Nodes and edges | BaseAgent ABC (consistent agent interface) |
| Shared state dict | SlotManager (tiered token budgets, auto-eviction, persistence) |
| Manual wiring | YAML configuration (declarative agent topology) |
| — | Session management interface (conversation memory) |
| — | HandoffManager (validated agent-to-agent handoffs) |
| — | Knowledge module system (per-agent knowledge from filesystem) |
| — | ReAct execution engine (multi-pass reasoning loop) |
| — | Built-in Supervisor (simple + ReAct modes) and SessionManager agents |
In short: LangGraph is the engine, llm-orchestrator is the chassis.
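The table mentions tiered token budgets with auto-eviction, and the Step 1 configuration sets eviction_strategy: demotion. Here is one hypothetical reading of "demotion". When a tier exceeds its budget, its oldest entries move down to the next tier, and anything that overflows the last tier is dropped. This is not SlotManager's implementation. `TieredSlots` is an invented class, and token counts are approximated by whitespace-separated words.

```python
from collections import OrderedDict

TIERS = ["turn", "session", "long_term"]


class TieredSlots:
    """Hypothetical tiered store: overflow in one tier demotes to the next."""

    def __init__(self, budgets: dict[str, int]) -> None:
        self.budgets = budgets
        self.tiers: dict[str, OrderedDict[str, str]] = {t: OrderedDict() for t in TIERS}

    @staticmethod
    def _tokens(text: str) -> int:
        # Crude stand-in for a real tokenizer
        return len(text.split())

    def _size(self, tier: str) -> int:
        return sum(self._tokens(v) for v in self.tiers[tier].values())

    def put(self, key: str, value: str, tier: str = "turn") -> None:
        self.tiers[tier][key] = value
        self._rebalance(TIERS.index(tier))

    def _rebalance(self, start: int) -> None:
        for i in range(start, len(TIERS)):
            name = TIERS[i]
            while self._size(name) > self.budgets[name] and self.tiers[name]:
                key, value = self.tiers[name].popitem(last=False)  # oldest first
                if i + 1 < len(TIERS):
                    self.tiers[TIERS[i + 1]][key] = value  # demote one tier down
                # otherwise the entry is evicted entirely


slots = TieredSlots({"turn": 4, "session": 4, "long_term": 4})
slots.put("goal", "lose five kilograms")         # 3 tokens
slots.put("allergy", "peanuts")                  # 1 token, turn is now full
slots.put("question", "leg day routine please")  # 4 tokens: older entries demoted
print({t: list(slots.tiers[t]) for t in TIERS})
# {'turn': ['question'], 'session': ['goal', 'allergy'], 'long_term': []}
```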
Prerequisites and Installation
You'll need:
- Python 3.11+
- An OpenAI API key (or any LangChain-compatible LLM provider)
- uv (recommended) or pip for package management
bash
# Create a new project
mkdir health-assistant && cd health-assistant
# Install dependencies
pip install llm-orchestrator langchain-openai
# Set your API key
export OPENAI_API_KEY="sk-..."
Create the following project structure:
health-assistant/
├── config.yaml          # Agent topology configuration
├── agents/
│   ├── __init__.py
│   ├── search.py        # Search agent (shared utility)
│   ├── trainer.py       # Trainer specialist + ExerciseAgent, FormGuideAgent
│   └── nutrition.py     # Nutrition specialist + RecipeAgent, CalorieAgent
├── graph.py             # LangGraph wiring
└── app.py               # CLI entry point
Note: For clarity, each agent class is shown separately in this tutorial, but in the project we co-locate mesh group agents in one file: trainer.py contains TrainerAgent, ExerciseAgent, and FormGuideAgent; nutrition.py contains NutritionAgent, RecipeAgent, and CalorieAgent.
Part 3: Building the Health Assistant
Architecture Overview
Here's the full architecture of what we're building:
                             User
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                     Coach (Supervisor)                      │
│            Routes user questions to specialists             │
├─────────────────────────────────────────────────────────────┤
│         │                  │                 │         │    │
│         ▼                  ▼                 ▼         │    │
│  ┌──────────────┐   ┌──────────────┐   ┌────────────┐  │    │
│  │   Trainer    │   │  Nutrition   │   │   Search   │  │    │
│  │    Agent     │   │    Agent     │   │   Agent    │  │    │
│  │              │   │              │   │  (shared)  │  │    │
│  │ ┌──────────┐ │   │ ┌──────────┐ │   └────────────┘  │    │
│  │ │ Exercise │ │   │ │  Recipe  │ │                   │    │
│  │ │  Agent   │ │   │ │  Agent   │ │                   │    │
│  │ └──────────┘ │   │ └──────────┘ │                   │    │
│  │ ┌──────────┐ │   │ ┌──────────┐ │                   │    │
│  │ │   Form   │ │   │ │ Calorie  │ │                   │    │
│  │ │  Guide   │ │   │ │  Agent   │ │                   │    │
│  │ └──────────┘ │   │ └──────────┘ │                   │    │
│  └──────────────┘   └──────────────┘                   │    │
│                                                        │    │
│                     Session Manager ◄──────────────────┘    │
│                  (conversation memory)                      │
└─────────────────────────────────────────────────────────────┘
Nine agents, combining two routing patterns (hub-and-spoke and mesh):
| Agent | Role | Routing |
|---|---|---|
| Coach | Supervisor — routes user questions to the right specialist | Supervisor (hub) |
| Trainer | Exercise plans, workout suggestions | Supervisor target + mesh coordinator |
| Exercise Agent | Specific exercises, routines | Mesh (under Trainer) |
| Form Guide | Proper form and technique | Mesh (under Trainer) |
| Nutrition | Food and diet advice | Supervisor target + mesh coordinator |
| Recipe Agent | Finds and suggests recipes | Mesh (under Nutrition) |
| Calorie Agent | Calorie counting, macro breakdowns | Mesh (under Nutrition) |
| Search Agent | Internet search — all specialists can use it | Shared utility |
| Session Manager | Remembers user preferences, goals, allergies | Infrastructure |
Knowledge Module System
Before building any agents, you need to understand how llm-orchestrator externalizes agent prompts and domain knowledge into the filesystem. Instead of hardcoding system prompts as Python string constants, each agent gets a knowledge folder containing markdown files and a YAML configuration.
Folder Structure
Every agent that uses the ReAct execution engine has a folder under a shared llm_resources/ directory:
llm_resources/
└── <agent_name>/
    ├── react_config.yaml   # Slot definitions, max passes, metadata
    ├── react_core.md       # Core prompt template (the agent's "system prompt")
    └── slots/
        ├── topic_a.md      # On-demand knowledge loaded via request_slot
        └── topic_b.md      # Another knowledge module
react_config.yaml
This file defines what knowledge slots the agent can load and how many ReAct passes it is allowed:
yaml
max_passes: 5
slots:
  exercises:
    description: "Exercise database with sets, reps, and muscle groups"
    modules: ["slots/exercises.md"]
    triggers: "When the user asks about specific exercises or routines"
  safety:
    description: "Safety guidelines and contraindications"
    modules: ["slots/safety.md"]
    triggers: "When recommending intense exercises or discussing injuries"
Each slot has a description (shown to the LLM so it knows what is available), modules (the markdown files to load), and triggers (hints for when to load it).
react_core.md
This is the agent's main prompt template. It replaces hardcoded SYSTEM_PROMPT Python constants. The special {available_slots} placeholder is replaced at runtime with a formatted list of slots the agent can request:
markdown
You are an exercise specialist. Your role is to provide specific exercise
details -- muscle groups, sets, reps, rest periods, equipment needed.
## Available Knowledge
{available_slots}
To load knowledge, use: <action type="request_slot">{"slot": "..."}</action>
## Response Format
Use <thought>...</thought> to reason about what the user needs.
Use <action type="...">...</action> to take actions.
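Here is roughly how a template manager could fill in {available_slots}: format each slot's description and trigger as a list, then substitute the result for the placeholder. This sketch does not reproduce ReactTemplateManager's real output format, and `render_core_prompt` is a hypothetical helper. It uses `str.replace` rather than `str.format` on purpose. The template also contains literal JSON braces such as `{"slot": "..."}`, and `str.format` would try to interpret those as fields and raise an error.

```python
SLOTS = {
    "exercises": {
        "description": "Exercise database with sets, reps, and muscle groups",
        "triggers": "When the user asks about specific exercises or routines",
    },
    "safety": {
        "description": "Safety guidelines and contraindications",
        "triggers": "When recommending intense exercises or discussing injuries",
    },
}

CORE = """You are an exercise specialist.
## Available Knowledge
{available_slots}
To load knowledge, use: <action type="request_slot">{"slot": "..."}</action>"""


def render_core_prompt(core: str, slots: dict[str, dict[str, str]]) -> str:
    lines = [
        f"- {name}: {meta['description']} (load {meta['triggers'].lower()})"
        for name, meta in slots.items()
    ]
    # str.replace leaves the literal {"slot": "..."} JSON untouched
    return core.replace("{available_slots}", "\n".join(lines))


print(render_core_prompt(CORE, SLOTS))
```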
The ReAct XML Format
The ReactExecutor parses structured XML tags from the LLM's response:
- <thought>...</thought> — The agent's internal reasoning (logged but not shown to users).
- <action type="action_name">{"param": "value"}</action> — An action to execute. The JSON body contains parameters for the action handler.
Terminal vs Non-Terminal Actions
Actions fall into two categories:
- Non-terminal actions produce an observation and continue the loop. Example: request_slot loads knowledge and returns its content as an observation for the next pass.
- Terminal actions end the ReAct loop and return a result. They work by raising TerminalActionResult inside the handler. Examples: respond (return content to the user), route (hand off to another agent), complete (finish session management).
A typical ReAct pass sequence looks like:
Pass 1: <thought>User wants a leg workout. Let me load exercise data.</thought>
        <action type="request_slot">{"slot": "exercises"}</action>
        → Observation: loaded exercises.md content

Pass 2: <thought>Now I have the exercise data. I can recommend squats and lunges.</thought>
        <action type="respond">{"content": "Here is your leg workout..."}</action>
        → TerminalActionResult → loop ends, content returned
The ReactTemplateManager handles loading react_config.yaml, resolving slot file paths, formatting {available_slots}, and rendering the core prompt template. You create one per agent using the async factory await ReactTemplateManager.create(...) and pass it to the agent's constructor.
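You can reproduce the pass sequence above with a scripted stand-in for the LLM. This is a minimal sketch, and its names are hypothetical rather than ReactExecutor internals. `Terminal` plays the role of TerminalActionResult, the two handlers mirror request_slot and respond, and `react_loop` and `scripted_llm` are invented for illustration. Non-terminal handlers return an observation that is fed into the next pass. Terminal handlers raise to end the loop, and max_passes caps the number of iterations.

```python
import json
import re
from typing import Any, Callable

ACTION_RE = re.compile(r'<action type="([^"]+)">(.*?)</action>', re.DOTALL)
KNOWLEDGE = {"exercises": "Bodyweight Squats 3x15-20; Lunges 3x10-12 each leg"}


class Terminal(Exception):
    """Raised by a terminal action handler to stop the loop with a result."""

    def __init__(self, result: dict[str, Any]) -> None:
        super().__init__(result)
        self.result = result


def request_slot(params: dict[str, Any]) -> str:
    # Non-terminal: return an observation for the next pass
    return f"Loaded slot '{params['slot']}':\n{KNOWLEDGE[params['slot']]}"


def respond(params: dict[str, Any]) -> str:
    # Terminal: end the loop with the final content
    raise Terminal({"content": params["content"]})


HANDLERS: dict[str, Callable[[dict[str, Any]], str]] = {
    "request_slot": request_slot,
    "respond": respond,
}


def react_loop(llm: Callable[[list[str]], str], user_input: str, max_passes: int = 3) -> dict[str, Any]:
    transcript = [user_input]
    for _ in range(max_passes):
        reply = llm(transcript)
        match = ACTION_RE.search(reply)
        if match is None:
            return {"content": reply}  # no action: treat the text as the answer
        action, params = match.group(1), json.loads(match.group(2))
        try:
            observation = HANDLERS[action](params)
        except Terminal as done:
            return done.result
        transcript += [reply, f"Observation: {observation}"]
    return {"content": "", "error": "max_passes reached"}


def scripted_llm(transcript: list[str]) -> str:
    # Pass 1 loads knowledge; once an observation exists, pass 2 responds
    if not any(t.startswith("Observation:") for t in transcript):
        return '<thought>Load exercise data.</thought><action type="request_slot">{"slot": "exercises"}</action>'
    return '<thought>Ready.</thought><action type="respond">{"content": "Try squats and lunges."}</action>'


print(react_loop(scripted_llm, "Give me a leg workout"))  # {'content': 'Try squats and lunges.'}
```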
Project Structure with Knowledge Folders
Here is the complete project layout, including llm_resources/ folders for the eight ReAct-based agents (the Search Agent has no knowledge folder):
health-assistant/
├── config.yaml
├── agents/
│   ├── __init__.py
│   ├── search.py
│   ├── trainer.py          # + ExerciseAgent, FormGuideAgent
│   └── nutrition.py        # + RecipeAgent, CalorieAgent
├── llm_resources/
│   ├── coach/
│   │   ├── react_config.yaml
│   │   ├── react_core.md
│   │   └── slots/
│   │       ├── routing_rules.md
│   │       └── safety_guidelines.md
│   ├── trainer/
│   │   ├── react_config.yaml
│   │   ├── react_core.md
│   │   └── slots/
│   │       └── training.md
│   ├── nutrition/
│   │   ├── react_config.yaml
│   │   ├── react_core.md
│   │   └── slots/
│   │       └── dietary_guidelines.md
│   ├── exercise_agent/
│   │   ├── react_config.yaml
│   │   ├── react_core.md
│   │   └── slots/
│   │       ├── exercises.md
│   │       └── safety.md
│   ├── form_guide/
│   │   ├── react_config.yaml
│   │   ├── react_core.md
│   │   └── slots/
│   │       └── form_cues.md
│   ├── recipe_agent/
│   │   ├── react_config.yaml
│   │   ├── react_core.md
│   │   └── slots/
│   │       └── recipes.md
│   ├── calorie_agent/
│   │   ├── react_config.yaml
│   │   ├── react_core.md
│   │   └── slots/
│   │       └── nutrition_data.md
│   └── session_manager/
│       ├── react_config.yaml
│       └── react_core.md
├── graph.py
└── app.py
Step 1: Configuration (YAML)
The YAML config defines every agent and the edges between them. This is the single source of truth for your topology — llm-orchestrator validates it at startup.
Create config.yaml:
yaml
agents:
  # === Supervisor ===
  - name: coach
    type: supervisor
    knowledge_enabled: true
    description: >
      Health and fitness coach. Routes user questions to the appropriate
      specialist (trainer or nutrition). Synthesizes final responses.

  # === Specialist agents (supervisor targets) ===
  - name: trainer
    type: custom
    knowledge_enabled: true
    description: >
      Exercise and fitness specialist. Creates workout plans,
      suggests exercises, delegates to exercise_agent and form_guide.

  - name: nutrition
    type: custom
    knowledge_enabled: true
    description: >
      Nutrition and diet specialist. Provides dietary advice,
      delegates to recipe_agent and calorie_agent.

  # === Mesh agents (under Trainer) ===
  - name: exercise_agent
    type: custom
    knowledge_enabled: true
    description: >
      Exercise database. Provides specific exercise details,
      muscle groups targeted, sets/reps recommendations.

  - name: form_guide
    type: custom
    knowledge_enabled: true
    description: >
      Exercise form and technique expert. Explains proper form,
      common mistakes, and injury prevention.

  # === Mesh agents (under Nutrition) ===
  - name: recipe_agent
    type: custom
    knowledge_enabled: true
    description: >
      Recipe finder. Suggests healthy recipes based on dietary
      preferences, restrictions, and calorie targets.

  - name: calorie_agent
    type: custom
    knowledge_enabled: true
    description: >
      Calorie and macro calculator. Breaks down nutritional
      content of foods and meals.

  # === Shared utility ===
  - name: search_agent
    type: custom
    description: >
      Internet search utility. Finds current information about
      exercises, nutrition facts, and health topics.

  # === Session management ===
  - name: session_manager
    type: session_manager
    knowledge_enabled: true
    description: >
      Manages conversation memory. Stores user preferences,
      health goals, allergies, and conversation history.

topology:
  # --- Supervisor routes ---
  edges:
    # Coach can route to specialists and shared agents
    - from_agent: coach
      to_agent: trainer
    - from_agent: coach
      to_agent: nutrition
    - from_agent: coach
      to_agent: search_agent
    - from_agent: coach
      to_agent: session_manager

    # Specialists return to coach
    - from_agent: trainer
      to_agent: coach
    - from_agent: nutrition
      to_agent: coach

    # --- Mesh routes (Trainer group) ---
    - from_agent: trainer
      to_agent: exercise_agent
    - from_agent: trainer
      to_agent: form_guide
    - from_agent: exercise_agent
      to_agent: trainer
    - from_agent: exercise_agent
      to_agent: form_guide
    - from_agent: form_guide
      to_agent: trainer
    - from_agent: form_guide
      to_agent: exercise_agent

    # --- Mesh routes (Nutrition group) ---
    - from_agent: nutrition
      to_agent: recipe_agent
    - from_agent: nutrition
      to_agent: calorie_agent
    - from_agent: recipe_agent
      to_agent: nutrition
    - from_agent: recipe_agent
      to_agent: calorie_agent
    - from_agent: calorie_agent
      to_agent: nutrition
    - from_agent: calorie_agent
      to_agent: recipe_agent

    # --- Shared utility routes (all specialists can use search) ---
    - from_agent: trainer
      to_agent: search_agent
    - from_agent: nutrition
      to_agent: search_agent
    - from_agent: exercise_agent
      to_agent: search_agent
    - from_agent: form_guide
      to_agent: search_agent
    - from_agent: recipe_agent
      to_agent: search_agent
    - from_agent: calorie_agent
      to_agent: search_agent

    # Search returns to the agent that called it
    # (handled by storing the caller in context slots)
    - from_agent: search_agent
      to_agent: coach
    - from_agent: search_agent
      to_agent: trainer
    - from_agent: search_agent
      to_agent: nutrition
    - from_agent: search_agent
      to_agent: exercise_agent
    - from_agent: search_agent
      to_agent: form_guide
    - from_agent: search_agent
      to_agent: recipe_agent
    - from_agent: search_agent
      to_agent: calorie_agent

    # Session manager returns to coach
    - from_agent: session_manager
      to_agent: coach

  default_return: coach

context:
  shared_slots:
    max_tokens: 8192
  tiered_slots:
    turn:
      max_tokens: 4096
    session:
      max_tokens: 4096
    long_term:
      max_tokens: 4096
  eviction_strategy: demotion

session:
  enabled: true

knowledge:
  enabled: true
  base_dir: llm_resources
  max_total_tokens: 8192
Notice the hybrid topology: the Coach (supervisor) edges form a hub-and-spoke pattern, while the Trainer/Exercise/FormGuide and Nutrition/Recipe/Calorie groups have full mesh connectivity. The Search Agent has edges from and to every specialist, making it a shared utility.
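Here is a sketch of the kind of startup check this could involve, with the config already loaded as a plain dict (for example by a YAML parser). This is a hypothetical validator, not llm-orchestrator's implementation. It flags edges that mention undeclared agents and agents the supervisor can never reach. The example config is a trimmed subset of the one above with a deliberate typo (from_guide).

```python
from collections import deque
from typing import Any


def validate_topology(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in an agents/topology config."""
    names = {agent["name"] for agent in config["agents"]}
    edges = [(e["from_agent"], e["to_agent"]) for e in config["topology"]["edges"]]
    problems = [
        f"edge {src} -> {dst} references unknown agent"
        for src, dst in edges
        if src not in names or dst not in names
    ]
    # Breadth-first search from the supervisor to find unreachable agents
    supervisor = next(a["name"] for a in config["agents"] if a["type"] == "supervisor")
    seen, queue = {supervisor}, deque([supervisor])
    while queue:
        node = queue.popleft()
        for src, dst in edges:
            if src == node and dst in names and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    problems += [f"agent {n} is unreachable from {supervisor}" for n in sorted(names - seen)]
    return problems


config = {
    "agents": [
        {"name": "coach", "type": "supervisor"},
        {"name": "trainer", "type": "custom"},
        {"name": "form_guide", "type": "custom"},
    ],
    "topology": {
        "edges": [
            {"from_agent": "coach", "to_agent": "trainer"},
            {"from_agent": "trainer", "to_agent": "coach"},
            {"from_agent": "trainer", "to_agent": "from_guide"},  # typo
        ]
    },
}
for problem in validate_topology(config):
    print(problem)
# edge trainer -> from_guide references unknown agent
# agent form_guide is unreachable from coach
```

A single typo produces two symptoms here: a dangling edge and an orphaned agent. Catching both before the graph is built is much cheaper than discovering them mid-conversation.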
Step 2: The Search Agent
The Search Agent is our shared utility — any specialist can route to it when they need current information from the internet. It demonstrates the tool use pattern: giving an LLM access to external functions.
Create agents/search.py:
python
"""Search agent — shared utility for internet lookups."""

from __future__ import annotations

from typing import Any

from langchain_core.language_models import BaseLanguageModel
from langchain_core.messages import HumanMessage
from langchain_core.runnables import RunnableConfig

from llm_orchestrator import AgentConfig, BaseAgent, AgentResponse, OrchestratorState


class SearchAgent(BaseAgent):
    """Searches the internet and returns results to the calling agent.

    Uses context slots to track which agent requested the search,
    so it can route back to the correct caller.
    """

    def __init__(
        self,
        config: AgentConfig,
        llm: BaseLanguageModel[Any],
        search_fn: Any | None = None,
    ) -> None:
        super().__init__(config=config)
        self._llm = llm
        self._search_fn = search_fn  # Injected search tool

    async def process(self, state: OrchestratorState, config: RunnableConfig | None = None) -> AgentResponse:
        context = state.get("context_slots", {})
        caller = context.get("search_caller", "coach")
        last_message = state["messages"][-1].content if state["messages"] else ""

        # Extract the search query from the conversation
        query = await self._extract_query(last_message)

        # Perform the search
        if self._search_fn is not None:
            results = await self._search_fn(query)
        else:
            results = f"[Search results for: {query}] (no search tool configured)"

        # Summarize results using the LLM
        summary = await self._summarize_results(query, results)

        return AgentResponse(
            agent_name=self.name,
            content=summary,
            next_agent=caller,  # Return to whoever asked for the search
            slots_update={"last_search_query": query, "last_search_results": summary},
        )

    async def _extract_query(self, message: str) -> str:
        """Use the LLM to extract a clean search query from the message."""
        prompt = (
            "Extract a concise search query from the following message. "
            "Return ONLY the search query, nothing else.\n\n"
            f"Message: {message}"
        )
        response = await self._llm.ainvoke([HumanMessage(content=prompt)])
        return response.content.strip()

    async def _summarize_results(self, query: str, results: str) -> str:
        """Summarize search results into a useful response."""
        prompt = (
            f"Summarize the following search results for the query '{query}'. "
            "Be concise and factual. Include sources where available.\n\n"
            f"Results:\n{results}"
        )
        response = await self._llm.ainvoke([HumanMessage(content=prompt)])
        return response.content.strip()
Key patterns to notice:
- Caller tracking — The Search Agent reads search_caller from context slots to know who to return results to. When a specialist routes to Search, it sets this slot.
- Tool injection — The search_fn is passed in at construction time, not hardcoded. This makes testing easy (pass a mock) and keeps the agent decoupled from any specific search API.
- LLM-powered extraction — The agent uses its LLM to extract a clean search query from the conversational message, then to summarize the results.
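The caller-tracking and tool-injection patterns can be exercised without an LLM or the library. This is a self-contained sketch under stated assumptions. `fake_search` and `search_step` are hypothetical stand-ins for an injected search tool and SearchAgent.process, and a plain dict stands in for OrchestratorState. The query-extraction and summarization steps are skipped.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

SearchFn = Callable[[str], Awaitable[str]]


async def fake_search(query: str) -> str:
    # A test double: no network access, deterministic output
    return f"3 results for '{query}'"


async def search_step(state: dict[str, Any], search_fn: SearchFn | None) -> dict[str, Any]:
    """Mimic SearchAgent.process: search, then route back to the caller."""
    caller = state.get("context_slots", {}).get("search_caller", "coach")
    query = state["messages"][-1]
    if search_fn is not None:
        results = await search_fn(query)
    else:
        results = f"[Search results for: {query}] (no search tool configured)"
    return {"content": results, "next_agent": caller}


# A specialist (here: nutrition) records itself as the caller before routing to search
state = {"messages": ["oat nutrition facts"], "context_slots": {"search_caller": "nutrition"}}
print(asyncio.run(search_step(state, fake_search)))
# {'content': "3 results for 'oat nutrition facts'", 'next_agent': 'nutrition'}
print(asyncio.run(search_step({"messages": ["x"]}, None))["next_agent"])  # coach
```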
Step 3: Leaf Agents
Leaf agents are the workers at the bottom of the hierarchy. They do specialized tasks and hand results back to their coordinator. Each leaf agent uses the ReactExecutor to reason about the user's request, load domain knowledge on demand, and produce a response.
Exercise Agent
First, create the knowledge folder for the exercise agent. This replaces what would have been a hardcoded SYSTEM_PROMPT Python constant.
llm_resources/exercise_agent/react_config.yaml:
yaml
max_passes: 3
slots:
  exercises:
    description: "Exercise database with muscle groups, sets, reps, equipment, and difficulty levels"
    modules: ["slots/exercises.md"]
    triggers: "When the user asks about specific exercises, routines, or workout details"
  safety:
    description: "Exercise safety guidelines and contraindications"
    modules: ["slots/safety.md"]
    triggers: "When recommending exercises to beginners or when injuries are mentioned"
llm_resources/exercise_agent/react_core.md:
markdown
You are an exercise specialist in a health assistant system.
## Your Role
- Provide specific exercise details: muscle groups, sets, reps, rest periods,
equipment needed, and difficulty level.
- Check context for the user's fitness level and goals.
- Structure responses with clear sections: Exercise Name, Target Muscles,
Instructions, Sets/Reps, Modifications.
- Always mention when an exercise requires a spotter or has injury risks.
- You provide fitness information, not medical advice.
## Available Knowledge
{available_slots}
## Instructions
1. Use <thought>...</thought> to reason about the user's request.
2. If you need exercise data, load it: <action type="request_slot">{"slot": "exercises"}</action>
3. When ready to respond, use: <action type="respond">{"content": "Your response here"}</action>
llm_resources/exercise_agent/slots/exercises.md:
markdown
# Exercise Database
## Compound Movements
- **Barbell Squat**: Quads, glutes, hamstrings, core. 3-4x8-12. Requires rack.
- **Deadlift**: Posterior chain, back, grip. 3x5-8. Barbell required.
- **Bench Press**: Chest, shoulders, triceps. 3-4x8-12. Bench + barbell.
- **Overhead Press**: Shoulders, triceps, core. 3x8-10. Barbell or dumbbells.
- **Barbell Row**: Back, biceps, rear delts. 3x8-12. Barbell required.
## Bodyweight Movements
- **Push-ups**: Chest, shoulders, triceps. 3x10-20. No equipment.
- **Bodyweight Squats**: Quads, glutes. 3x15-20. No equipment.
- **Lunges**: Quads, glutes, balance. 3x10-12 each leg. No equipment.
- **Plank**: Core stabilization. 3x20-60 seconds. No equipment.
- **Glute Bridges**: Glutes, hamstrings. 3x12-15. No equipment.
## Difficulty Scaling
- Beginner: Bodyweight movements, machines, light dumbbells
- Intermediate: Free weights, compound movements, supersets
- Advanced: Heavy compounds, plyometrics, advanced programming
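Before moving on to the Python class, a quick pre-flight check can confirm that every module listed in react_config.yaml exists on disk. A missing slot file might otherwise only surface when the agent first requests that slot. This sketch assumes the config has already been loaded into a dict (for instance with PyYAML's `yaml.safe_load`). `missing_modules` is a hypothetical helper, and the demo builds a throwaway folder that deliberately lacks slots/safety.md.

```python
import tempfile
from pathlib import Path
from typing import Any


def missing_modules(agent_dir: Path, react_config: dict[str, Any]) -> list[str]:
    """List module paths referenced by react_config that don't exist on disk."""
    return [
        module
        for slot in react_config.get("slots", {}).values()
        for module in slot.get("modules", [])
        if not (agent_dir / module).is_file()
    ]


react_config = {
    "max_passes": 3,
    "slots": {
        "exercises": {"modules": ["slots/exercises.md"]},
        "safety": {"modules": ["slots/safety.md"]},
    },
}

with tempfile.TemporaryDirectory() as tmp:
    agent_dir = Path(tmp) / "exercise_agent"
    (agent_dir / "slots").mkdir(parents=True)
    (agent_dir / "slots" / "exercises.md").write_text("# Exercise Database\n")
    print(missing_modules(agent_dir, react_config))  # ['slots/safety.md']
```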
Now the Python class. Notice there is no SYSTEM_PROMPT constant — the prompt lives in react_core.md:
python
"""Exercise Agent — provides specific exercise information via ReAct."""

from __future__ import annotations

from typing import Any

from langchain_core.language_models import BaseLanguageModel
from langchain_core.runnables import RunnableConfig

from llm_orchestrator import (
    AgentConfig,
    AgentResponse,
    BaseAgent,
    OrchestratorState,
    ReactExecutor,
    ReactTemplateManager,
    TerminalActionResult,
)


class ExerciseAgent(BaseAgent):
    """Provides specific exercise details, routines, and recommendations.

    Uses ReactExecutor to load exercise knowledge on demand and
    reason about the best response.
    """

    def __init__(
        self,
        config: AgentConfig,
        llm: BaseLanguageModel[Any],
        template_manager: ReactTemplateManager,
    ) -> None:
        self._llm = llm
        action_handlers = {
            "request_slot": self._create_request_slot_handler(template_manager),
            "respond": self._create_respond_handler(),
        }
        react_executor = ReactExecutor(
            agent_name=config.name,
            llm=llm,
            template_manager=template_manager,
            action_handlers=action_handlers,
        )
        super().__init__(
            config=config,
            template_manager=template_manager,
            react_executor=react_executor,
        )

    async def process(self, state: OrchestratorState, config: RunnableConfig | None = None) -> AgentResponse:
        context = state.get("context_slots", {})
        last_message = state["messages"][-1].content if state["messages"] else ""
        context_vars = {k: str(v) for k, v in context.items()}

        # Reset clears loaded slot caches between invocations
        await self._template_manager.reset()
        result = await self._react_executor.execute(
            user_input=last_message,
            context=context_vars,
            config=config,
        )
        return AgentResponse(
            agent_name=self.name,
            content=result.get("content", ""),
            next_agent="trainer",  # Always return to coordinator
        )

    def _create_request_slot_handler(
        self, tm: ReactTemplateManager
    ) -> Any:
        async def handler(
            params: dict[str, Any], state: dict[str, Any], config: RunnableConfig | None = None,
        ) -> tuple[str, dict[str, Any]]:
            slot = params.get("slot", "")
            result = await tm.load_slot(slot)
            state.setdefault("_loaded_slots", []).append(slot)
            state.setdefault("_loaded_slot_contents", {})[slot] = result.content
            return f"Loaded slot '{slot}':\n{result.content}", {}

        return handler

    def _create_respond_handler(self) -> Any:
        async def handler(
            params: dict[str, Any], state: dict[str, Any], config: RunnableConfig | None = None,
        ) -> tuple[str, dict[str, Any]]:
            raise TerminalActionResult({
                "content": params.get("content", ""),
                "next_agent": "trainer",
            })

        return handler
Pattern: Always call await self._template_manager.reset() before self._react_executor.execute(). This clears loaded slot caches between invocations. The SupervisorAgent does this internally; custom agents must do it explicitly. All other agents in this tutorial follow the same pattern.
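The reset matters because loaded-slot state lives on a long-lived object, so without it that state leaks from one request into the next. This is a hypothetical illustration. `SlotCache` and `handle_request` are toys, not ReactTemplateManager, whose internals may differ.

```python
class SlotCache:
    """Toy per-agent cache of loaded knowledge slots (hypothetical)."""

    def __init__(self) -> None:
        self.loaded: list[str] = []

    def load_slot(self, name: str) -> None:
        self.loaded.append(name)

    def reset(self) -> None:
        self.loaded.clear()


def handle_request(cache: SlotCache, slots_needed: list[str], reset_first: bool) -> list[str]:
    if reset_first:
        cache.reset()  # start each invocation with a clean slate
    for name in slots_needed:
        cache.load_slot(name)
    return list(cache.loaded)  # what the next prompt would be built from


cache = SlotCache()
handle_request(cache, ["exercises"], reset_first=False)
print(handle_request(cache, ["safety"], reset_first=False))  # ['exercises', 'safety'] (stale slot leaked)

cache = SlotCache()
handle_request(cache, ["exercises"], reset_first=True)
print(handle_request(cache, ["safety"], reset_first=True))   # ['safety']
```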
Form Guide Agent
The Form Guide Agent follows the same pattern. Its knowledge folder (llm_resources/form_guide/) contains react_config.yaml, react_core.md, and slots/form_cues.md with form instructions, common mistakes, and safety cues. The Python class is nearly identical:
```python
"""Form Guide Agent — exercise form and technique expert via ReAct."""
from __future__ import annotations

from typing import Any

from langchain_core.language_models import BaseLanguageModel
from langchain_core.runnables import RunnableConfig
from llm_orchestrator import (
    AgentConfig,
    AgentResponse,
    BaseAgent,
    OrchestratorState,
    ReactExecutor,
    ReactTemplateManager,
    TerminalActionResult,
)


class FormGuideAgent(BaseAgent):
    """Explains proper exercise form, common mistakes, and safety cues.

    Uses ReactExecutor to load form knowledge on demand.
    """

    def __init__(
        self,
        config: AgentConfig,
        llm: BaseLanguageModel[Any],
        template_manager: ReactTemplateManager,
    ) -> None:
        self._llm = llm
        action_handlers = {
            "request_slot": self._create_request_slot_handler(template_manager),
            "respond": self._create_respond_handler(),
        }
        react_executor = ReactExecutor(
            agent_name=config.name,
            llm=llm,
            template_manager=template_manager,
            action_handlers=action_handlers,
        )
        super().__init__(
            config=config,
            template_manager=template_manager,
            react_executor=react_executor,
        )

    async def process(self, state: OrchestratorState, config: RunnableConfig | None = None) -> AgentResponse:
        context = state.get("context_slots", {})
        last_message = state["messages"][-1].content if state["messages"] else ""
        context_vars = {k: str(v) for k, v in context.items()}
        await self._template_manager.reset()
        result = await self._react_executor.execute(
            user_input=last_message,
            context=context_vars,
            config=config,
        )
        return AgentResponse(
            agent_name=self.name,
            content=result.get("content", ""),
            next_agent="trainer",  # Always return to coordinator
        )

    def _create_request_slot_handler(
        self, tm: ReactTemplateManager
    ) -> Any:
        async def handler(
            params: dict[str, Any], state: dict[str, Any], config: RunnableConfig | None = None,
        ) -> tuple[str, dict[str, Any]]:
            slot = params.get("slot", "")
            result = await tm.load_slot(slot)
            state.setdefault("_loaded_slots", []).append(slot)
            state.setdefault("_loaded_slot_contents", {})[slot] = result.content
            return f"Loaded slot '{slot}':\n{result.content}", {}
        return handler

    def _create_respond_handler(self) -> Any:
        async def handler(
            params: dict[str, Any], state: dict[str, Any], config: RunnableConfig | None = None,
        ) -> tuple[str, dict[str, Any]]:
            raise TerminalActionResult({
                "content": params.get("content", ""),
                "next_agent": "trainer",
            })
        return handler
```
Recipe Agent and Calorie Agent
The Recipe Agent and Calorie Agent follow the same pattern under the Nutrition group. Their knowledge folders (llm_resources/recipe_agent/ and llm_resources/calorie_agent/) contain domain-specific react_core.md prompts and slot files for recipes and nutritional data respectively.
Here is the Recipe Agent:
```python
"""Recipe Agent — finds and suggests healthy recipes via ReAct."""
from __future__ import annotations

from typing import Any

from langchain_core.language_models import BaseLanguageModel
from langchain_core.runnables import RunnableConfig
from llm_orchestrator import (
    AgentConfig,
    AgentResponse,
    BaseAgent,
    OrchestratorState,
    ReactExecutor,
    ReactTemplateManager,
    TerminalActionResult,
)


class RecipeAgent(BaseAgent):
    """Finds and suggests healthy recipes based on user preferences.

    Uses ReactExecutor to load recipe knowledge on demand.
    """

    def __init__(
        self,
        config: AgentConfig,
        llm: BaseLanguageModel[Any],
        template_manager: ReactTemplateManager,
    ) -> None:
        self._llm = llm
        action_handlers = {
            "request_slot": self._create_request_slot_handler(template_manager),
            "respond": self._create_respond_handler(),
        }
        react_executor = ReactExecutor(
            agent_name=config.name,
            llm=llm,
            template_manager=template_manager,
            action_handlers=action_handlers,
        )
        super().__init__(
            config=config,
            template_manager=template_manager,
            react_executor=react_executor,
        )

    async def process(self, state: OrchestratorState, config: RunnableConfig | None = None) -> AgentResponse:
        context = state.get("context_slots", {})
        last_message = state["messages"][-1].content if state["messages"] else ""
        context_vars = {k: str(v) for k, v in context.items()}
        await self._template_manager.reset()
        result = await self._react_executor.execute(
            user_input=last_message,
            context=context_vars,
            config=config,
        )
        return AgentResponse(
            agent_name=self.name,
            content=result.get("content", ""),
            next_agent="nutrition",  # Always return to coordinator
        )

    def _create_request_slot_handler(
        self, tm: ReactTemplateManager
    ) -> Any:
        async def handler(
            params: dict[str, Any], state: dict[str, Any], config: RunnableConfig | None = None,
        ) -> tuple[str, dict[str, Any]]:
            slot = params.get("slot", "")
            result = await tm.load_slot(slot)
            state.setdefault("_loaded_slots", []).append(slot)
            state.setdefault("_loaded_slot_contents", {})[slot] = result.content
            return f"Loaded slot '{slot}':\n{result.content}", {}
        return handler

    def _create_respond_handler(self) -> Any:
        async def handler(
            params: dict[str, Any], state: dict[str, Any], config: RunnableConfig | None = None,
        ) -> tuple[str, dict[str, Any]]:
            raise TerminalActionResult({
                "content": params.get("content", ""),
                "next_agent": "nutrition",
            })
        return handler
```
The Calorie Agent follows the same structure — it takes food items and returns macro breakdowns:
```python
"""Calorie Agent — calorie counting and macro breakdowns via ReAct."""
from __future__ import annotations

from typing import Any

from langchain_core.language_models import BaseLanguageModel
from langchain_core.runnables import RunnableConfig
from llm_orchestrator import (
    AgentConfig,
    AgentResponse,
    BaseAgent,
    OrchestratorState,
    ReactExecutor,
    ReactTemplateManager,
    TerminalActionResult,
)


class CalorieAgent(BaseAgent):
    """Provides calorie counts and macronutrient breakdowns.

    Uses ReactExecutor to load nutritional data on demand.
    """

    def __init__(
        self,
        config: AgentConfig,
        llm: BaseLanguageModel[Any],
        template_manager: ReactTemplateManager,
    ) -> None:
        self._llm = llm
        action_handlers = {
            "request_slot": self._create_request_slot_handler(template_manager),
            "respond": self._create_respond_handler(),
        }
        react_executor = ReactExecutor(
            agent_name=config.name,
            llm=llm,
            template_manager=template_manager,
            action_handlers=action_handlers,
        )
        super().__init__(
            config=config,
            template_manager=template_manager,
            react_executor=react_executor,
        )

    async def process(self, state: OrchestratorState, config: RunnableConfig | None = None) -> AgentResponse:
        context = state.get("context_slots", {})
        last_message = state["messages"][-1].content if state["messages"] else ""
        context_vars = {k: str(v) for k, v in context.items()}
        await self._template_manager.reset()
        result = await self._react_executor.execute(
            user_input=last_message,
            context=context_vars,
            config=config,
        )
        return AgentResponse(
            agent_name=self.name,
            content=result.get("content", ""),
            next_agent="nutrition",  # Always return to coordinator
        )

    def _create_request_slot_handler(
        self, tm: ReactTemplateManager
    ) -> Any:
        async def handler(
            params: dict[str, Any], state: dict[str, Any], config: RunnableConfig | None = None,
        ) -> tuple[str, dict[str, Any]]:
            slot = params.get("slot", "")
            result = await tm.load_slot(slot)
            state.setdefault("_loaded_slots", []).append(slot)
            state.setdefault("_loaded_slot_contents", {})[slot] = result.content
            return f"Loaded slot '{slot}':\n{result.content}", {}
        return handler

    def _create_respond_handler(self) -> Any:
        async def handler(
            params: dict[str, Any], state: dict[str, Any], config: RunnableConfig | None = None,
        ) -> tuple[str, dict[str, Any]]:
            raise TerminalActionResult({
                "content": params.get("content", ""),
                "next_agent": "nutrition",
            })
        return handler
```
Pattern: Clean Agent Design
Notice the pattern every leaf agent follows:
- Externalized prompts: Domain knowledge lives in react_core.md, not in Python string constants. To update a prompt, edit the markdown file; no code changes are needed.
- ReAct reasoning: The LLM explicitly reasons in `<thought>` tags before acting. This produces auditable logs and more reliable behavior than implicit single-call reasoning.
- Consistent handlers: Every leaf agent registers two handlers: `request_slot` (non-terminal, loads knowledge) and `respond` (terminal, returns content). This consistency makes the codebase predictable; once you understand one leaf agent, you understand them all.
- Explicit tier assignments via `slots_tier`: When agents set context slots, they declare which tier each slot belongs to.
This separation of concerns means the LLM focuses on generating domain-specific content via the ReAct loop, while Python handles routing (hardcoded next_agent), data structure, and tier assignment.
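The handler contract (a non-terminal handler returns an observation; a terminal handler raises to end the loop) can be sketched as a tiny loop that feeds scripted actions to handlers. This is an illustration under stated assumptions: TerminalAction, run_actions, and the scripted actions are hypothetical and leave out what ReactExecutor really does (LLM calls, XML parsing, prompt rendering).

```python
"""Sketch of the two-handler contract: non-terminal vs terminal actions.

TerminalAction and run_actions are hypothetical; they mimic the pattern
(observation tuple vs raised TerminalActionResult) and are not the
llm_orchestrator ReactExecutor.
"""
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

Handler = Callable[[dict[str, Any], dict[str, Any]], Awaitable[tuple[str, dict[str, Any]]]]


class TerminalAction(Exception):
    """Raised by a terminal handler to end the loop with a final payload."""

    def __init__(self, payload: dict[str, Any]) -> None:
        super().__init__(payload)
        self.payload = payload


async def request_slot(params: dict[str, Any], state: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    # Non-terminal: record the slot and return an observation for the next pass.
    slot = params.get("slot", "")
    state.setdefault("_loaded_slots", []).append(slot)
    return f"Loaded slot '{slot}'", {}


async def respond(params: dict[str, Any], state: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    # Terminal: stop the loop; routing is hardcoded, as in the leaf agents.
    raise TerminalAction({"content": params.get("content", ""), "next_agent": "trainer"})


async def run_actions(
    actions: list[tuple[str, dict[str, Any]]],
    handlers: dict[str, Handler],
    max_passes: int = 5,
) -> tuple[dict[str, Any], list[str]]:
    """Feed pre-scripted actions to handlers until one of them is terminal."""
    state: dict[str, Any] = {}
    observations: list[str] = []
    for action_type, params in actions[:max_passes]:
        try:
            observation, _ = await handlers[action_type](params, state)
            observations.append(observation)
        except TerminalAction as done:
            return done.payload, observations
    # No terminal action within max_passes: return an empty answer.
    return {"content": "", "next_agent": "trainer"}, observations


handlers: dict[str, Handler] = {"request_slot": request_slot, "respond": respond}
scripted = [
    ("request_slot", {"slot": "form_cues"}),
    ("respond", {"content": "Keep your spine neutral."}),
]
result, seen = asyncio.run(run_actions(scripted, handlers))
print(seen)
print(result)
```

Because every leaf agent hardcodes its return target inside the terminal handler, the loop itself never needs to know anything about routing.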
Step 4: Specialist Agents
Specialist agents (Trainer and Nutrition) sit between the Coach and the leaf agents. They receive tasks from the Coach, decide which leaf agents to involve, coordinate the work, and return a consolidated response.
Unlike leaf agents that always return to their coordinator, specialist agents use the ReAct loop to decide routing — the LLM reasons about whether to delegate to a sub-agent or return to the supervisor. This replaces the old _determine_routing() method that used Python keyword matching.
Trainer Agent
First, the knowledge folder.
llm_resources/trainer/react_config.yaml:
```yaml
max_passes: 5
slots:
  training:
    description: "Training program design principles, periodization, and workout structures"
    modules: ["slots/training.md"]
    triggers: "When designing workout plans or programs"
```
llm_resources/trainer/react_core.md:
```markdown
You are a certified personal trainer and fitness specialist in a health
assistant system.

## Your Role
- Create workout plans, suggest exercises, design training programs,
  and answer fitness questions.
- Check context for: fitness_level, fitness_goals, injuries, equipment.
- Include disclaimers for intense workouts. Recommend warm-ups and cool-downs.
- Never prescribe exercises for medical conditions.

## Available Knowledge
{available_slots}

## Routing
You coordinate two sub-agents:
- **exercise_agent**: Knows specific exercises, muscle groups, sets/reps.
  Route here when the user needs exercise details or a routine.
- **form_guide**: Knows proper form, technique, and injury prevention.
  Route here when the user asks about form or technique.

If you can answer directly, respond. Otherwise, route to the appropriate
sub-agent. When you have a complete answer (possibly after receiving
sub-agent results), route back to **coach**.

## Instructions
1. Use <thought>...</thought> to reason about the request and routing.
2. Load knowledge if needed: <action type="request_slot">{"slot": "training"}</action>
3. To delegate to a sub-agent: <action type="route">{"next_agent": "exercise_agent", "content": "optional context"}</action>
4. To return a final answer: <action type="route">{"next_agent": "coach", "content": "Your complete response"}</action>
```
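One detail worth noticing: this prompt mixes a {available_slots} placeholder with literal JSON such as {"slot": "training"}, so Python's str.format() would raise KeyError on it. The sketch below shows one way a renderer could substitute only known placeholder names and leave the JSON alone. render_prompt() is a hypothetical helper; how ReactTemplateManager actually fills placeholders is not shown in this tutorial.

```python
"""Sketch: filling {available_slots} without tripping over JSON braces.

render_prompt() is hypothetical and only substitutes {name} tokens whose
name is a known placeholder; it is not the library's renderer.
"""
from __future__ import annotations

import re

# Matches {identifier} but not {"slot": ...}, because a quote is not an identifier char.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Replace {name} tokens whose name is in `values`; leave everything else untouched."""
    def substitute(match: re.Match[str]) -> str:
        name = match.group(1)
        return values.get(name, match.group(0))
    return PLACEHOLDER.sub(substitute, template)


template = (
    "## Available Knowledge\n"
    "{available_slots}\n"
    '2. Load knowledge if needed: <action type="request_slot">{"slot": "training"}</action>'
)
slots = {"training": "Training program design principles, periodization, and workout structures"}
listing = "\n".join(f"- {name}: {desc}" for name, desc in slots.items())

try:
    template.format(available_slots=listing)
except KeyError as err:
    print("str.format() fails on the JSON example:", err)

print(render_prompt(template, {"available_slots": listing}))
```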
llm_resources/trainer/slots/training.md:
```markdown
# Training Program Design

## Beginner Programs (0-6 months)
- Full body 3x/week, 48h rest between sessions
- Focus on compound movements with bodyweight or light weights
- Progressive overload: add reps before adding weight
- Target: learn movement patterns, build base fitness

## Intermediate Programs (6-18 months)
- Upper/Lower split 4x/week or Push/Pull/Legs
- Mix compound and isolation exercises
- Periodization: 3 weeks progressive, 1 week deload
- Target: hypertrophy and strength development

## Advanced Programs (18+ months)
- Specialized splits, 5-6x/week
- Block periodization, advanced techniques (supersets, drop sets)
- Target: specific goals (powerlifting, bodybuilding, sport-specific)

## Safety Principles
- Always warm up 5-10 minutes before lifting
- Cool down and stretch after every session
- Never train through sharp pain -- distinguish from muscle fatigue
- Beginners should avoid 1RM attempts
```
Now the Python class. Notice there is no _determine_routing() method — the LLM decides routing through the ReAct route action:
Create agents/trainer.py:
```python
"""Trainer Agent — exercise and fitness specialist with ReAct routing."""
from __future__ import annotations

from typing import Any

from langchain_core.language_models import BaseLanguageModel
from langchain_core.runnables import RunnableConfig
from llm_orchestrator import (
    AgentConfig,
    AgentResponse,
    BaseAgent,
    OrchestratorState,
    ReactExecutor,
    ReactTemplateManager,
    TerminalActionResult,
)


class TrainerAgent(BaseAgent):
    """Exercise and fitness specialist that coordinates mesh sub-agents.

    Uses ReactExecutor with a `route` terminal action to decide whether
    to delegate to exercise_agent, form_guide, or return to coach.
    """

    def __init__(
        self,
        config: AgentConfig,
        llm: BaseLanguageModel[Any],
        template_manager: ReactTemplateManager,
    ) -> None:
        self._llm = llm
        action_handlers = {
            "request_slot": self._create_request_slot_handler(template_manager),
            "route": self._create_route_handler(),
        }
        react_executor = ReactExecutor(
            agent_name=config.name,
            llm=llm,
            template_manager=template_manager,
            action_handlers=action_handlers,
        )
        super().__init__(
            config=config,
            template_manager=template_manager,
            react_executor=react_executor,
        )

    async def process(self, state: OrchestratorState, config: RunnableConfig | None = None) -> AgentResponse:
        context = state.get("context_slots", {})
        last_message = state["messages"][-1].content if state["messages"] else ""
        context_vars = {k: str(v) for k, v in context.items()}
        await self._template_manager.reset()
        result = await self._react_executor.execute(
            user_input=last_message,
            context=context_vars,
            config=config,
        )
        return AgentResponse(
            agent_name=self.name,
            content=result.get("content", ""),
            next_agent=result.get("next_agent", "coach"),
            slots_update=result.get("slots_update", {}),
        )

    def _create_request_slot_handler(
        self, tm: ReactTemplateManager
    ) -> Any:
        async def handler(
            params: dict[str, Any], state: dict[str, Any], config: RunnableConfig | None = None,
        ) -> tuple[str, dict[str, Any]]:
            slot = params.get("slot", "")
            result = await tm.load_slot(slot)
            state.setdefault("_loaded_slots", []).append(slot)
            state.setdefault("_loaded_slot_contents", {})[slot] = result.content
            return f"Loaded slot '{slot}':\n{result.content}", {}
        return handler

    def _create_route_handler(self) -> Any:
        async def handler(
            params: dict[str, Any], state: dict[str, Any], config: RunnableConfig | None = None,
        ) -> tuple[str, dict[str, Any]]:
            next_agent = params.get("next_agent", "coach")
            content = params.get("content", "")
            slots_update = params.get("slots_update", {})
            raise TerminalActionResult({
                "content": content,
                "next_agent": next_agent,
                "slots_update": slots_update,
            })
        return handler
```
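The route action travels inside the LLM's text output, so something has to pull the action type and JSON parameters out of each pass. A hypothetical parser for the format shown in react_core.md could look like this; ReactExecutor's real parsing is not shown in this tutorial and may differ, for example in how it treats malformed JSON or multiple actions.

```python
"""Sketch: extracting the route decision from one ReAct pass.

parse_action() is hypothetical. It assumes the
<thought>...</thought><action type="...">{json}</action> format from
react_core.md and applies the same "coach" default as _create_route_handler.
"""
from __future__ import annotations

import json
import re

ACTION_RE = re.compile(r'<action type="([a-z_]+)">(.*?)</action>', re.DOTALL)


def parse_action(llm_output: str) -> tuple[str, dict]:
    """Return (action_type, params) for the first <action> tag in the output."""
    match = ACTION_RE.search(llm_output)
    if match is None:
        raise ValueError("no <action> tag found")
    action_type, body = match.group(1), match.group(2).strip()
    params = json.loads(body) if body else {}
    if not isinstance(params, dict):
        raise ValueError("action parameters must be a JSON object")
    if action_type == "route":
        params.setdefault("next_agent", "coach")  # mirrors the handler default
    return action_type, params


output = (
    "<thought>User wants exercise details. I should delegate.</thought>\n"
    '<action type="route">{"next_agent": "exercise_agent"}</action>'
)
print(parse_action(output))
```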
Nutrition Agent
Create agents/nutrition.py. The knowledge folder (llm_resources/nutrition/) follows the same structure as the trainer, with react_core.md containing routing instructions for recipe_agent and calorie_agent:
```python
"""Nutrition Agent — food and diet specialist with ReAct routing."""
from __future__ import annotations

from typing import Any

from langchain_core.language_models import BaseLanguageModel
from langchain_core.runnables import RunnableConfig
from llm_orchestrator import (
    AgentConfig,
    AgentResponse,
    BaseAgent,
    OrchestratorState,
    ReactExecutor,
    ReactTemplateManager,
    TerminalActionResult,
)


class NutritionAgent(BaseAgent):
    """Nutrition and diet specialist that coordinates mesh sub-agents.

    Uses ReactExecutor with a `route` terminal action to decide whether
    to delegate to recipe_agent, calorie_agent, or return to coach.
    """

    def __init__(
        self,
        config: AgentConfig,
        llm: BaseLanguageModel[Any],
        template_manager: ReactTemplateManager,
    ) -> None:
        self._llm = llm
        action_handlers = {
            "request_slot": self._create_request_slot_handler(template_manager),
            "route": self._create_route_handler(),
        }
        react_executor = ReactExecutor(
            agent_name=config.name,
            llm=llm,
            template_manager=template_manager,
            action_handlers=action_handlers,
        )
        super().__init__(
            config=config,
            template_manager=template_manager,
            react_executor=react_executor,
        )

    async def process(self, state: OrchestratorState, config: RunnableConfig | None = None) -> AgentResponse:
        context = state.get("context_slots", {})
        last_message = state["messages"][-1].content if state["messages"] else ""
        context_vars = {k: str(v) for k, v in context.items()}
        await self._template_manager.reset()
        result = await self._react_executor.execute(
            user_input=last_message,
            context=context_vars,
            config=config,
        )
        return AgentResponse(
            agent_name=self.name,
            content=result.get("content", ""),
            next_agent=result.get("next_agent", "coach"),
            slots_update=result.get("slots_update", {}),
        )

    def _create_request_slot_handler(
        self, tm: ReactTemplateManager
    ) -> Any:
        async def handler(
            params: dict[str, Any], state: dict[str, Any], config: RunnableConfig | None = None,
        ) -> tuple[str, dict[str, Any]]:
            slot = params.get("slot", "")
            result = await tm.load_slot(slot)
            state.setdefault("_loaded_slots", []).append(slot)
            state.setdefault("_loaded_slot_contents", {})[slot] = result.content
            return f"Loaded slot '{slot}':\n{result.content}", {}
        return handler

    def _create_route_handler(self) -> Any:
        async def handler(
            params: dict[str, Any], state: dict[str, Any], config: RunnableConfig | None = None,
        ) -> tuple[str, dict[str, Any]]:
            next_agent = params.get("next_agent", "coach")
            content = params.get("content", "")
            slots_update = params.get("slots_update", {})
            raise TerminalActionResult({
                "content": content,
                "next_agent": next_agent,
                "slots_update": slots_update,
            })
        return handler
```
How Specialist Delegation Works
When the Coach routes to the Trainer, here's what happens:
1. Coach sets current_agent = "trainer"
2. Trainer enters its ReAct loop
3. Pass 1: `<thought>User wants exercise details. I should delegate.</thought>`
   `<action type="route">{"next_agent": "exercise_agent"}</action>`
4. Graph routes to Exercise Agent
5. Exercise Agent runs its own ReAct loop, loads knowledge, and responds
6. Exercise Agent returns AgentResponse(next_agent="trainer")
7. Trainer enters its ReAct loop again with the sub-agent results
8. Pass 1: `<thought>I have the exercise details. I can compile the answer.</thought>`
   `<action type="route">{"next_agent": "coach", "content": "Complete response..."}</action>`
9. Coach delivers the final response to the user
The mesh routing between Trainer and its sub-agents happens naturally — the topology config allows these edges, and the MessageRouter validates each handoff at runtime. The LLM reasons about routing in its <thought> tags, producing auditable decision traces.
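The handoff sequence above can be simulated with plain data. This sketch assumes a hand-written ALLOWED_EDGES set standing in for the topology config and a validate_handoff() function standing in for MessageRouter's runtime check; both are hypothetical.

```python
"""Sketch: the Coach -> Trainer -> exercise_agent -> Trainer -> Coach handoff.

ALLOWED_EDGES and validate_handoff() are hypothetical stand-ins for the
topology config and MessageRouter's runtime validation.
"""
from __future__ import annotations

ALLOWED_EDGES = {
    ("coach", "trainer"),
    ("trainer", "exercise_agent"),
    ("trainer", "form_guide"),
    ("exercise_agent", "trainer"),
    ("form_guide", "trainer"),
    ("trainer", "coach"),
}


def validate_handoff(source: str, target: str) -> None:
    """Reject any handoff the topology does not allow."""
    if (source, target) not in ALLOWED_EDGES:
        raise ValueError(f"handoff {source} -> {target} is not allowed")


def simulate(decisions: list[tuple[str, str]]) -> list[str]:
    """Walk a non-empty list of (agent, next_agent) decisions, checking each edge."""
    path = [decisions[0][0]]
    for source, target in decisions:
        if source != path[-1]:
            raise ValueError(f"{source} is not the active agent ({path[-1]})")
        validate_handoff(source, target)
        path.append(target)
    return path


trace = simulate([
    ("coach", "trainer"),           # step 1: Coach delegates
    ("trainer", "exercise_agent"),  # step 3: Trainer routes to a sub-agent
    ("exercise_agent", "trainer"),  # step 6: leaf returns to its coordinator
    ("trainer", "coach"),           # step 8: Trainer returns the compiled answer
])
print(" -> ".join(trace))
```

A shortcut such as coach -> exercise_agent fails validation, which is the point of declaring edges up front: the LLM can only choose among handoffs the topology already permits.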
Step 5: The Coach (Supervisor)
The Coach is our supervisor agent. It's the only agent that talks to the user directly. It receives every message, decides which specialist to involve, and synthesizes the final response.
We use the library's built-in SupervisorAgent, which handles the routing logic for us. The supervisor operates in two modes:
Simple Mode
When no template_manager is provided, the supervisor makes a single LLM call. You configure the system prompt via the LLM's .bind() method or a wrapper. The supervisor scans the response for valid target agent names and routes accordingly.
```python
"""Coach setup — using SupervisorAgent in simple mode."""
from llm_orchestrator import SupervisorAgent, TopologyResolver

# Assumes agent_configs, llm, and topology (a TopologyResolver) are
# created as in build_graph() in Step 7.
coach = SupervisorAgent(
    config=agent_configs["coach"],
    llm=llm,
    topology=topology,
    # No template_manager = simple mode (single LLM call)
)
```
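As an illustration of "scans the response for valid target agent names", here is a hypothetical matcher. The whole-word matching and the earliest-mention tie-break are assumptions made for this sketch, not documented SupervisorAgent behavior.

```python
"""Sketch: picking a routing target from a single free-text LLM response.

pick_target() is hypothetical; SupervisorAgent's actual matching rules may differ.
"""
from __future__ import annotations

import re


def pick_target(response: str, valid_targets: list[str], default: str) -> str:
    """Return the earliest valid agent name mentioned in `response`, else `default`."""
    best: tuple[int, str] | None = None
    for name in valid_targets:
        # Whole-word match, so "trainer" does not match inside "trainers_note".
        match = re.search(rf"\b{re.escape(name)}\b", response)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), name)
    return best[1] if best else default


targets = ["trainer", "nutrition", "search_agent", "session_manager"]
print(pick_target("This is a diet question; route to nutrition.", targets, default="coach"))
print(pick_target("Hello! Happy to help.", targets, default="coach"))
```

This fragility (an incidental mention of an agent name can trigger a route) is one reason the tutorial recommends ReAct mode, with its structured action, for production.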
ReAct Mode (Production)
When a ReactTemplateManager is provided, the supervisor runs a multi-pass thought/action/observation loop. It loads knowledge slots on demand and uses structured <thought> and <action> XML to reason about routing decisions. This is the recommended approach for production systems.
Create the knowledge folder for the coach.
llm_resources/coach/react_config.yaml:
```yaml
max_passes: 5
actions:
  request_slot:
    description: "Load a knowledge slot into context"
    parameters:
      slot: "Name of the slot to load (routing_rules or safety_guidelines)"
    required: [slot]
  decide:
    description: "Route to a specialist or respond directly to the user"
    parameters:
      next_agent: "Target agent name. Omit (or null) to respond directly."
      content: "Context for the specialist, or the response text for the user"
    required: [content]
slots:
  routing_rules:
    description: "Rules for routing user requests to the correct specialist agent"
    modules: ["slots/routing_rules.md"]
    triggers: "When deciding which specialist should handle the user's request"
  safety_guidelines:
    description: "Health and safety guidelines for all responses"
    modules: ["slots/safety_guidelines.md"]
    triggers: "When responding to health-related questions or before delivering final answers"
```
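The required lists in this config imply a simple well-formedness check on each action call. The sketch mirrors the coach's two actions as a Python dict, so it needs no YAML parser; validate_action() is hypothetical and not part of llm_orchestrator.

```python
"""Sketch: checking an action call against `required` lists like the coach's.

ACTIONS hand-copies the coach's react_config.yaml; validate_action() is hypothetical.
"""
from __future__ import annotations

from typing import Any

ACTIONS: dict[str, dict[str, Any]] = {
    "request_slot": {"required": ["slot"]},
    "decide": {"required": ["content"]},
}


def validate_action(action_type: str, params: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call is well-formed."""
    spec = ACTIONS.get(action_type)
    if spec is None:
        return [f"unknown action '{action_type}'"]
    return [f"missing required parameter '{p}'" for p in spec["required"] if p not in params]


# next_agent is optional for decide: omitting it means "respond directly".
print(validate_action("decide", {"content": "Hi there!"}))
print(validate_action("decide", {"next_agent": "trainer"}))
print(validate_action("route", {"next_agent": "trainer"}))
```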
llm_resources/coach/react_core.md:
```markdown
You are the Coach, the primary health and fitness assistant. You are the
only agent that talks directly to the user.

## Your Role
- Receive user questions and route them to the appropriate specialist.
- Synthesize specialist responses into clear, helpful answers.
- Maintain a friendly, encouraging tone.
- Always include health disclaimers where appropriate.

## Available Actions
{available_actions}

## Available Knowledge
{available_slots}

## Your Team
- **trainer**: Exercise and fitness specialist. Route workout, exercise, and training questions here.
- **nutrition**: Food and diet specialist. Route food, diet, calorie, and recipe questions here.
- **search_agent**: Internet search. Route when current/external data is needed.
- **session_manager**: Memory storage. Route when user states preferences, goals, or allergies that should be remembered.

## Instructions
1. Use <thought>...</thought> to reason about which specialist to involve.
2. Load routing rules or safety guidelines using request_slot if needed.
3. Use decide to route to a specialist, or to respond directly to the user.
   To respond directly, omit next_agent (or pass null) and put your answer in content.
```
llm_resources/coach/slots/routing_rules.md:
```markdown
# Routing Rules

## Route to Trainer
- Workout plans, exercise suggestions, training programs
- Questions about sets, reps, rest periods
- Fitness level assessments
- Exercise modifications and progressions
- Keywords: workout, exercise, training, gym, strength, cardio, flexibility

## Route to Nutrition
- Diet advice, meal planning, food choices
- Calorie counting, macro breakdowns
- Recipe requests
- Supplement questions
- Allergy-aware meal suggestions
- Keywords: food, diet, calories, nutrition, recipe, meal, eat

## Route to Search Agent
- Current research or trends
- Specific product information
- External data not in knowledge base

## Route to Session Manager
- User states personal info (allergies, goals, fitness level)
- User asks to remember or recall preferences
- Beginning of conversation (load user profile)

## Handle Directly
- Greetings, small talk, general encouragement
- Clarifying questions
- Synthesizing responses from multiple specialists
```
llm_resources/coach/slots/safety_guidelines.md:
```markdown
# Health and Safety Guidelines

## Required Disclaimers
- Always recommend consulting a healthcare professional for medical concerns
- Never diagnose conditions or prescribe treatments
- Include disclaimers on exercise intensity and dietary changes
- State when nutritional values are estimates

## Absolute Constraints
- Respect stated allergies -- never suggest foods containing allergens
- Never recommend exercises beyond the user's stated fitness level
- Flag when advice may not apply to pregnant women, elderly, or children

## Escalation Triggers
- User reports pain, injury, or medical symptoms -> recommend seeing a doctor
- User mentions eating disorder symptoms -> respond with care, suggest professional help
- User asks about supplements for medical conditions -> recommend consulting a doctor
```
Wire it up:
```python
"""Coach setup — using SupervisorAgent in ReAct mode."""
from llm_orchestrator import (
    SupervisorAgent, TopologyResolver,
    ReactTemplateManager, FileSystemKnowledgeLoader,
)

# Run inside an async function (build_graph() in Step 7 does the same);
# agent_configs, llm, and topology are assumed to exist already.
loader = FileSystemKnowledgeLoader(base_dir="llm_resources")
coach_tm = await ReactTemplateManager.create(
    agent_name="coach",
    base_dir="llm_resources",
    loader=loader,
)
coach = SupervisorAgent(
    config=agent_configs["coach"],
    llm=llm,
    topology=topology,
    template_manager=coach_tm,  # Enables ReAct mode
)
```
| | Simple Mode | ReAct Mode |
|---|---|---|
| LLM calls | Single call | Multi-pass loop |
| Knowledge | None | On-demand slot loading |
| Reasoning | Implicit | Explicit `<thought>` / `<action>` |
| Use when | Prototyping, simple routing | Production, complex decisions |
To respond directly to the user without routing to another agent, set next_agent=None (or omit it) in the decide action. BaseAgent.invoke() then sets current_agent = "coach" (the agent's own name), and the routing function returns END when it detects this self-name sentinel — completing the conversation turn cleanly.
The SupervisorAgent automatically validates routing decisions against the topology's allowed edges. If the LLM suggests an invalid target, the supervisor falls back to the default_return agent (typically itself).
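The two rules above (the supervisor's own name means the turn is over; an invalid target falls back to default_return) compose neatly: when default_return is the supervisor itself, an invalid suggestion also ends the turn. A minimal sketch, assuming hypothetical resolve_target() and route_after() functions and a plain string in place of LangGraph's END constant:

```python
"""Sketch: supervisor fallback plus the self-name END sentinel.

resolve_target() and route_after() are hypothetical; END is a plain string
standing in for LangGraph's own END constant.
"""
from __future__ import annotations

END = "__end__"


def resolve_target(
    suggested: str | None, allowed: set[str], supervisor_name: str, default_return: str
) -> str:
    """Supervisor side: None means respond directly; invalid targets fall back."""
    if suggested is None:
        return supervisor_name
    return suggested if suggested in allowed else default_return


def route_after(current_agent: str, supervisor_name: str) -> str:
    """Graph side: the supervisor's own name signals the end of the turn."""
    return END if current_agent == supervisor_name else current_agent


allowed = {"trainer", "nutrition", "search_agent", "session_manager"}
for suggestion in ["nutrition", None, "recipe_agent"]:
    current = resolve_target(suggestion, allowed, "coach", default_return="coach")
    print(suggestion, "->", route_after(current, "coach"))
```

Here recipe_agent is rejected because, in this sketch, it is only reachable through nutrition, not directly from the coach.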
Step 6: Session Manager
The library provides two pieces for session management:
- `SessionManager`: an ABC interface for storage backends (messages, summaries, memories)
- `SessionManagerAgent`: a ReAct agent that uses `SessionManager` methods as action handlers
The SessionManagerAgent is a full AI agent with an LLM brain. It reasons about which session operations to perform — loading history, storing memories, summarizing conversations — using a ReAct loop. Each SessionManager method becomes an action the agent can invoke.
Implementing the Storage Backend
The SessionManager interface covers three data types (messages, summaries, and memories) plus four newer methods for pagination and memory management. The session models include rich metadata fields:
- `SessionMessage`: has `id`, `role`, `content`, `session_id`, `timestamp`, `agent_name`, `metadata`, and `token_count` fields.
- `Summary`: has `id`, `session_id`, `content`, `turn_start`, `turn_end`, `timestamp`, `topics`, `updated_at`, and `summary_type` fields.
- `Memory`: has `id`, `session_id`, `key`, `value`, `importance`, `timestamp`, `source_agent`, `categories`, `metadata`, `updated_at`, and `expires_at` fields.
- `PaginatedResult[T]`: a generic wrapper with `items`, `next_cursor`, and `has_more` for cursor-based pagination.
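Cursor-based pagination with a PaginatedResult-shaped wrapper can be sketched as follows. Only the three field names come from the list above; the paginate() helper and the use of a stringified offset as the cursor are assumptions for illustration, not how InMemorySessionManager encodes cursors.

```python
"""Sketch: cursor-based pagination in the shape of PaginatedResult[T].

This PaginatedResult mirrors only the fields named above; paginate() and the
offset-as-cursor encoding are hypothetical.
"""
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class PaginatedResult(Generic[T]):
    items: list[T]
    next_cursor: str | None
    has_more: bool


def paginate(records: Sequence[T], limit: int, cursor: str | None = None) -> PaginatedResult[T]:
    """Return up to `limit` records starting at `cursor` (an opaque offset string)."""
    if limit < 1:
        raise ValueError("limit must be >= 1")  # matches the library's documented rule
    start = int(cursor) if cursor else 0
    page = list(records[start:start + limit])
    end = start + len(page)
    has_more = end < len(records)
    return PaginatedResult(items=page, next_cursor=str(end) if has_more else None, has_more=has_more)


# Walk every page until has_more is False.
messages = [f"msg {i}" for i in range(5)]
cursor: str | None = None
pages: list[list[str]] = []
while True:
    result = paginate(messages, limit=2, cursor=cursor)
    pages.append(result.items)
    if not result.has_more:
        break
    cursor = result.next_cursor
print(pages)
```

Callers treat next_cursor as opaque, which lets a database backend swap the offset for a keyset token without changing the loop.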
The library ships InMemorySessionManager for dev/testing, so you don't need to write your own:
```python
# No custom implementation needed — use the library's built-in:
from llm_orchestrator import InMemorySessionManager

session_impl = InMemorySessionManager()
# InMemorySessionManager includes per-session asyncio.Lock concurrency
# safety and implements all 12 SessionManager methods.
# For production, implement SessionManager with a database backend.
```
Why not write your own? The library's InMemorySessionManager (session/in_memory.py) already handles all 12 methods with per-session asyncio.Lock concurrency safety. Writing a custom in-memory version loses these guarantees and adds maintenance burden. Use the built-in for dev/testing, and implement SessionManager against a real database for production.
Behavior notes:
- `get_messages()`, `get_messages_paginated()`, `search_memories()`, `search_memories_by_user()`, and `list_memories()` all raise `ValueError` when `limit < 1`.
- `recall_memory()` returns `None` for expired memories (`Memory.expires_at` in the past).
- `update_memory()` and `delete_memory()` accept an optional `user_id` parameter. `update_memory()` returns `None` and `delete_memory()` returns `False` when the caller's `user_id` does not match the stored memory's `user_id` (ownership check).
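The expiry and ownership rules can be illustrated with a deliberately small store. MiniMemoryStore and StoredMemory are hypothetical simplifications (synchronous, no locking, only the fields these rules need), and treating an omitted user_id as "skip the check" is an assumption.

```python
"""Sketch of two behavior rules: expired memories read as None, and
update/delete refuse a mismatched user_id.

MiniMemoryStore and StoredMemory are hypothetical, not the library's models.
"""
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StoredMemory:
    value: str
    user_id: str | None = None
    expires_at: datetime | None = None


class MiniMemoryStore:
    def __init__(self) -> None:
        self._memories: dict[str, StoredMemory] = {}

    def store(self, key: str, memory: StoredMemory) -> None:
        self._memories[key] = memory

    def recall(self, key: str, now: datetime | None = None) -> StoredMemory | None:
        memory = self._memories.get(key)
        now = now or datetime.now(timezone.utc)
        if memory is None or (memory.expires_at is not None and memory.expires_at < now):
            return None  # missing or expired
        return memory

    def _owned_by(self, memory: StoredMemory, user_id: str | None) -> bool:
        # Assumption: omitting user_id skips the ownership check.
        return user_id is None or memory.user_id == user_id

    def update(self, key: str, value: str, user_id: str | None = None) -> StoredMemory | None:
        memory = self._memories.get(key)
        if memory is None or not self._owned_by(memory, user_id):
            return None
        memory.value = value
        return memory

    def delete(self, key: str, user_id: str | None = None) -> bool:
        memory = self._memories.get(key)
        if memory is None or not self._owned_by(memory, user_id):
            return False
        del self._memories[key]
        return True


store = MiniMemoryStore()
yesterday = datetime.now(timezone.utc) - timedelta(days=1)
store.store("allergies", StoredMemory("peanuts", user_id="u1"))
store.store("old_goal", StoredMemory("5k run", user_id="u1", expires_at=yesterday))
print(store.recall("old_goal"))                    # expired, so None
print(store.delete("allergies", user_id="u2"))     # wrong owner, so False
```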
Capability Discovery
If your implementation only supports a subset of methods (e.g., messages but not memories), override get_capabilities():
```python
# Assumed import path, matching how InMemorySessionManager is imported.
from llm_orchestrator import SessionManager


class MessagesOnlySessionManager(SessionManager):
    """Only implements message storage."""

    def get_capabilities(self) -> set[str]:
        return {"get_messages", "save_messages", "clear_session"}

    # ... implement those three methods, raise NotImplementedError for the rest
```
The SessionManagerAgent reads capabilities at construction time and only creates action handlers for the methods your backend supports. The base SessionManager.get_capabilities() returns only the 3 abstract methods:
```python
# Base SessionManager.get_capabilities() returns only:
{"get_messages", "save_messages", "clear_session"}
# InMemorySessionManager overrides this to return all 12 methods.
# If your custom backend only supports a subset, override
# get_capabilities() to match what you've implemented.
```
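Capability discovery amounts to registering handlers only for advertised method names. The sketch below uses a hypothetical MessagesOnlyBackend and build_handlers() with synchronous methods for brevity; it illustrates the idea rather than SessionManagerAgent's actual construction code, and the method signatures are assumptions.

```python
"""Sketch: building action handlers only for advertised capabilities.

MessagesOnlyBackend and build_handlers() are hypothetical; the real
SessionManager methods and signatures may differ (and are likely async).
"""
from __future__ import annotations

from typing import Any, Callable


class MessagesOnlyBackend:
    """A toy backend that only supports message storage."""

    def __init__(self) -> None:
        self._messages: dict[str, list[dict[str, Any]]] = {}

    def get_capabilities(self) -> set[str]:
        return {"get_messages", "save_messages", "clear_session"}

    def get_messages(self, session_id: str) -> list[dict[str, Any]]:
        return list(self._messages.get(session_id, []))

    def save_messages(self, session_id: str, messages: list[dict[str, Any]]) -> None:
        self._messages.setdefault(session_id, []).extend(messages)

    def clear_session(self, session_id: str) -> None:
        self._messages.pop(session_id, None)

    def store_memory(self, *args: Any, **kwargs: Any) -> None:
        raise NotImplementedError  # not advertised, so never registered


def build_handlers(backend: Any) -> dict[str, Callable[..., Any]]:
    """Map each advertised capability to the backend's bound method."""
    return {name: getattr(backend, name) for name in sorted(backend.get_capabilities())}


handlers = build_handlers(MessagesOnlyBackend())
print(sorted(handlers))
```

Because store_memory never becomes a handler, the LLM cannot pick an action the backend would only reject at runtime.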
Wiring the SessionManagerAgent
The SessionManagerAgent requires an LLM and a ReactTemplateManager — it's a real ReAct agent that reasons about which operations to perform:
```python
from llm_orchestrator import (
    SessionManagerAgent, ReactTemplateManager, FileSystemKnowledgeLoader,
)

# Run inside an async function; agent_configs, llm, and session_impl
# are assumed to exist already (see build_graph() in Step 7).

# Create knowledge loader for session agent prompts
loader = FileSystemKnowledgeLoader(base_dir="llm_resources")
session_tm = await ReactTemplateManager.create(
    agent_name="session_manager",
    base_dir="llm_resources",
    loader=loader,
)
session_manager = SessionManagerAgent(
    config=agent_configs["session_manager"],
    session_manager=session_impl,  # Your SessionManager implementation
    llm=llm,                       # LLM for reasoning
    template_manager=session_tm,   # Knowledge modules for prompts
)
```
Create knowledge modules for the session agent.
llm_resources/session_manager/react_config.yaml:
```yaml
max_passes: 5
actions:
  store_memory:
    description: "Store a long-term memory about the user or conversation"
    parameters:
      key: "Unique identifier for this memory"
      value: "Content to store"
      importance: "Priority score 0.0-1.0 (default 0.5)"
    required: [key, value]
  recall_memory:
    description: "Recall a specific memory by key"
    parameters:
      key: "Memory key to retrieve"
    required: [key]
  search_memories:
    description: "Search memories matching a query within the current session"
    parameters:
      query: "Search query string"
      limit: "Max results (default 5)"
    required: [query]
  search_memories_by_user:
    description: "Search memories across all sessions for the current user"
    parameters:
      query: "Search query string"
      limit: "Max results (default 10)"
    required: [query]
  complete:
    description: "Send final response and return to supervisor"
    parameters:
      content: "Response message to send to the user"
    required: [content]
slots: {}
```
llm_resources/session_manager/react_core.md:
```markdown
You are {agent_name}, the Session Manager for a health assistant. Your job is
to manage conversation memory — storing user preferences, loading history, and
maintaining long-term facts about the user.

## Available Actions
{available_actions}

## Available Knowledge
{available_slots}

## Memories from Previous Sessions
{_injected_memories}

## Instructions
1. Use <thought>...</thought> to reason about what session operations are needed.
2. Use store_memory for allergies, goals, and preferences (importance ≥ 0.7).
3. Use search_memories_by_user to recall context from past sessions.
4. When finished, use the complete action to return to the supervisor.
```
Note: `{_injected_memories}` is populated by MemoryInjectionMiddleware, which is attached to the session manager agent in build_graph() (Step 7). Without the middleware, load_core_prompt() would raise KnowledgeLoadError because the placeholder would remain unresolved.
How the Agent Reasons
When invoked, the SessionManagerAgent enters a ReAct loop:
- Thought: "The user mentioned being allergic to peanuts. I should store this as a long-term memory."
- Action: `store_memory(key="allergies", value="peanuts", importance=0.9, source_agent="coach")`
- Observation: "Stored memory 'allergies' = 'peanuts'"
- Thought: "I should also save the current messages for history."
- Action: `save_messages(messages=[{"role": "user", "content": "...", "agent_name": "coach"}])`
- Observation: "Saved 2 message(s)"
- Thought: "Session management complete. Return to supervisor."
- Action: `complete(content="Session updated", next_agent="coach", slots_tier={"allergies": "long_term"})`
The agent uses slots_tier to declare persistence tiers for slot data — memories get LONG_TERM, message summaries get SESSION, and scratch data gets TURN.
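The tier declarations can be modeled with a small enum and an end-of-turn filter. SlotTier and end_of_turn() are hypothetical, and treating undeclared slots as TURN-scoped is an assumption made for this sketch.

```python
"""Sketch: applying slots_tier declarations at the end of a turn.

SlotTier and end_of_turn() are hypothetical; they only model the three tiers
named above (LONG_TERM, SESSION, TURN).
"""
from __future__ import annotations

from enum import Enum


class SlotTier(str, Enum):
    LONG_TERM = "long_term"
    SESSION = "session"
    TURN = "turn"


def end_of_turn(slots: dict[str, object], slots_tier: dict[str, str]) -> dict[str, object]:
    """Drop TURN-scoped slots; keep SESSION and LONG_TERM ones.

    Slots without a declared tier are treated as TURN (an assumption).
    """
    return {
        name: value
        for name, value in slots.items()
        if SlotTier(slots_tier.get(name, SlotTier.TURN.value)) is not SlotTier.TURN
    }


slots = {"allergies": "peanuts", "history_summary": "Discussed leg day", "scratch": "draft"}
tiers = {"allergies": "long_term", "history_summary": "session", "scratch": "turn"}
print(end_of_turn(slots, tiers))
```

Defaulting undeclared slots to the shortest-lived tier is the conservative choice: forgetting a value is usually cheaper than persisting one the agent never meant to keep.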
Step 7: Building the Graph
Now we wire everything together into a LangGraph graph. This is where the topology config becomes a running application.
Create graph.py:
```python
"""Graph builder: wires all agents into a LangGraph state machine."""
from __future__ import annotations

from typing import Any

from langchain_core.language_models import BaseLanguageModel
from llm_orchestrator import (
    AgentConfig,
    FileSystemKnowledgeLoader,
    InMemorySessionManager,
    MemoryInjectionMiddleware,
    OrchestratorConfig,
    ReactTemplateManager,
    SessionManagerAgent,
    SupervisorAgent,
    TopologyResolver,
)
from llm_orchestrator.routing.graph_builder import build_topology_graph

from agents.nutrition import NutritionAgent, RecipeAgent, CalorieAgent
from agents.search import SearchAgent
from agents.trainer import TrainerAgent, ExerciseAgent, FormGuideAgent


async def build_graph(
    config: OrchestratorConfig,
    llm: BaseLanguageModel[Any],
) -> Any:
    """Build and compile the health assistant LangGraph graph.

    Args:
        config: Validated orchestrator config from YAML.
        llm: LangChain-compatible language model.

    Returns:
        Compiled LangGraph graph ready for .ainvoke().
    """
    # --- 1. Initialize infrastructure ---
    topology = TopologyResolver(config=config.topology)
    base_dir = config.knowledge.base_dir
    loader = FileSystemKnowledgeLoader(base_dir=base_dir)

    # Build agent config lookup
    agent_configs: dict[str, AgentConfig] = {a.name: a for a in config.agents}

    # --- 2. Create template managers for all agents ---
    coach_tm = await ReactTemplateManager.create(
        agent_name="coach", base_dir=base_dir, loader=loader,
    )
    trainer_tm = await ReactTemplateManager.create(
        agent_name="trainer", base_dir=base_dir, loader=loader,
    )
    nutrition_tm = await ReactTemplateManager.create(
        agent_name="nutrition", base_dir=base_dir, loader=loader,
    )
    exercise_tm = await ReactTemplateManager.create(
        agent_name="exercise_agent", base_dir=base_dir, loader=loader,
    )
    form_guide_tm = await ReactTemplateManager.create(
        agent_name="form_guide", base_dir=base_dir, loader=loader,
    )
    recipe_tm = await ReactTemplateManager.create(
        agent_name="recipe_agent", base_dir=base_dir, loader=loader,
    )
    calorie_tm = await ReactTemplateManager.create(
        agent_name="calorie_agent", base_dir=base_dir, loader=loader,
    )
    session_tm = await ReactTemplateManager.create(
        agent_name="session_manager", base_dir=base_dir, loader=loader,
    )

    # --- 3. Create agents ---
    # One session store, shared by the session manager and the memory middleware
    session_impl = InMemorySessionManager()
    memory_middleware = MemoryInjectionMiddleware(session_impl)

    coach = SupervisorAgent(
        config=agent_configs["coach"],
        llm=llm,
        topology=topology,
        template_manager=coach_tm,
        middleware=[memory_middleware],
    )
    trainer = TrainerAgent(
        config=agent_configs["trainer"], llm=llm, template_manager=trainer_tm,
    )
    nutrition = NutritionAgent(
        config=agent_configs["nutrition"], llm=llm, template_manager=nutrition_tm,
    )
    exercise_agent = ExerciseAgent(
        config=agent_configs["exercise_agent"], llm=llm, template_manager=exercise_tm,
    )
    form_guide = FormGuideAgent(
        config=agent_configs["form_guide"], llm=llm, template_manager=form_guide_tm,
    )
    recipe_agent = RecipeAgent(
        config=agent_configs["recipe_agent"], llm=llm, template_manager=recipe_tm,
    )
    calorie_agent = CalorieAgent(
        config=agent_configs["calorie_agent"], llm=llm, template_manager=calorie_tm,
    )
    search_agent = SearchAgent(config=agent_configs["search_agent"], llm=llm)
    session_manager = SessionManagerAgent(
        config=agent_configs["session_manager"],
        session_manager=session_impl,
        llm=llm,
        template_manager=session_tm,
        middleware=[memory_middleware],
    )

    # --- 4. Build the graph ---
    all_agents = {
        a.name: a for a in [
            coach, trainer, nutrition, exercise_agent, form_guide,
            recipe_agent, calorie_agent, search_agent, session_manager,
        ]
    }
    return build_topology_graph(all_agents, topology, entry_point="coach")
```
Let's break down the key concepts:
Nodes
Every agent becomes a graph node via agent.invoke. The invoke method (inherited from BaseAgent) handles the conversion between OrchestratorState and AgentResponse.
Conditional Edges
build_topology_graph() reads the TopologyResolver to wire each agent's conditional edges automatically, so no manual add_conditional_edges() calls are needed. Under the hood, build_conditional_edge(topology, agent_name) returns a routing function that reads state["current_agent"], which the previous agent set through AgentResponse.next_agent. Termination works through a sentinel: when an agent returns next_agent=None, BaseAgent.invoke() sets current_agent to that agent's own name, and the routing function returns END when it sees this self-reference.
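The node-plus-router mechanics can be modeled without LangGraph. The code below is a hypothetical sketch, not the library's code: apply_response mimics what BaseAgent.invoke() is described as doing, make_router mimics a function returned by build_conditional_edge(), AgentResult is a simplified stand-in for AgentResponse, and END is a local stand-in for langgraph.graph.END (whose value is the string "__end__"). The allowed-target check is an assumption added for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable

END = "__end__"  # local stand-in for langgraph.graph.END


@dataclass
class AgentResult:
    """Simplified, hypothetical stand-in for AgentResponse."""
    agent_name: str
    content: str
    next_agent: str | None
    slots_update: dict[str, Any] = field(default_factory=dict)


def apply_response(state: dict[str, Any], response: AgentResult) -> dict[str, Any]:
    """Turn an agent's result into the state update its graph node returns."""
    return {
        "messages": state["messages"] + [{"role": "assistant", "content": response.content}],
        "context_slots": {**state.get("context_slots", {}), **response.slots_update},
        # next_agent=None is encoded as "route to myself", which the router reads as END.
        "current_agent": response.next_agent or response.agent_name,
    }


def make_router(agent_name: str, allowed: set[str]) -> Callable[[dict[str, Any]], str]:
    """Build the conditional-edge function for one agent."""

    def route(state: dict[str, Any]) -> str:
        target = state["current_agent"]
        if target == agent_name:
            return END  # self-name sentinel: the agent chose to stop
        if target not in allowed:
            raise ValueError(f"{agent_name} may not route to {target}")
        return target

    return route
```

One consequence of the sentinel design is that an agent can never route to itself, since that value is reserved to mean "stop".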
The Hybrid Pattern
The topology naturally emerges from the configuration:
- Supervisor edges: Coach -> Trainer/Nutrition (hub and spoke)
- Mesh edges: Trainer <-> Exercise <-> Form Guide (fully connected)
- Utility edges: Any specialist -> Search -> back to caller
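One way to picture how these three edge types combine is to expand them into an adjacency map of allowed next hops. This is a hypothetical sketch, not how TopologyResolver stores its data; build_edges is an illustrative name, and the nutrition sub-agents are left out to mirror the bullet list above.

```python
def build_edges(
    supervisor: str,
    spokes: list[str],
    meshes: list[list[str]],
    utility: str,
    callers: list[str],
) -> dict[str, set[str]]:
    """Expand supervisor, mesh, and utility edges into allowed next hops (hypothetical)."""
    edges: dict[str, set[str]] = {supervisor: set(spokes)}
    # Hub and spoke: each spoke can hand control back to the supervisor.
    for spoke in spokes:
        edges.setdefault(spoke, set()).add(supervisor)
    # Mesh: every member can route to every other member.
    for mesh in meshes:
        for member in mesh:
            edges.setdefault(member, set()).update(m for m in mesh if m != member)
    # Utility: callers may use the utility agent, which may return to any caller.
    for caller in callers:
        edges.setdefault(caller, set()).add(utility)
    edges[utility] = set(callers)
    return edges


edges = build_edges(
    supervisor="coach",
    spokes=["trainer", "nutrition"],
    meshes=[["trainer", "exercise_agent", "form_guide"]],
    utility="search_agent",
    callers=["trainer", "nutrition", "exercise_agent", "form_guide"],
)
```

The map only says which callers search_agent may return to; choosing the one actual caller still needs runtime state, such as a slot recording which agent is waiting for search results.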
Step 8: Running the App
Finally, let's create the CLI entry point that ties everything together.
Create app.py:
```python
"""Health Assistant CLI: interactive chatbot entry point."""
from __future__ import annotations

import asyncio
from pathlib import Path

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from llm_orchestrator import create_initial_state, load_config

from graph import build_graph


async def main() -> None:
    """Run the health assistant chatbot in an interactive CLI loop."""
    # Load and validate configuration
    config = load_config(Path("config.yaml"))

    # Initialize the LLM
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)

    # Build the agent graph (build_graph is a coroutine, so it must be awaited)
    graph = await build_graph(config, llm)

    print("=" * 60)
    print(" Health Assistant Chatbot")
    print(" Type 'quit' to exit")
    print("=" * 60)
    print()
    print(" Disclaimer: This chatbot provides general health and")
    print(" fitness information only. It is not a substitute for")
    print(" professional medical advice.")
    print()

    session_id = "session-001"
    user_id = "user-001"
    result = None  # Track previous result to carry forward state

    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        if user_input.lower() in ("quit", "exit", "q"):
            print("Goodbye! Stay healthy!")
            break

        # First turn: create fresh state; subsequent turns: carry forward
        if result is None:
            state = create_initial_state(
                user_input,
                session_id=session_id,
                user_id=user_id,
            )
        else:
            state = {
                **result,
                "messages": result["messages"] + [HumanMessage(content=user_input)],
            }

        result = await graph.ainvoke(state)

        # Extract the final response
        last_message = result["messages"][-1]
        print(f"\nCoach: {last_message.content}\n")

        # Show active context slots (for debugging)
        slots = result.get("context_slots", {})
        if slots:
            print(f" [context: {list(slots.keys())}]\n")


if __name__ == "__main__":
    asyncio.run(main())
```
Sample Conversation
============================================================
Health Assistant Chatbot
Type 'quit' to exit
============================================================
Disclaimer: This chatbot provides general health and
fitness information only. It is not a substitute for
professional medical advice.
You: I'm a beginner looking to lose weight. I'm allergic to peanuts.
Coach: Welcome! I've noted your profile:
- Fitness level: Beginner
- Goal: Weight loss
- Allergy: Peanuts
I'll make sure all my recommendations account for these. What would
you like to start with — a workout plan or nutrition advice?
[context: ['fitness_level', 'fitness_goals', 'allergies']]
You: Can you suggest a workout plan for this week?
Coach: Here's a beginner-friendly weekly workout plan for weight loss:
**Monday — Full Body Strength**
- Bodyweight squats: 3x12
- Push-ups (knee variation): 3x8
- Dumbbell rows: 3x10
- Plank: 3x20 seconds
**Wednesday — Cardio**
- 30-minute brisk walk or light jog
- Cool-down stretches: 10 minutes
**Friday — Full Body Strength**
- Lunges: 3x10 each leg
- Incline push-ups: 3x10
- Glute bridges: 3x12
- Dead hang: 3x15 seconds
**Tips:**
- Rest 60-90 seconds between sets
- Stay hydrated
- Start light and focus on form over weight
Consult your doctor before starting any new exercise program.
[context: ['fitness_level', 'fitness_goals', 'allergies', 'current_plan']]
You: How many calories should I eat to lose weight?
Coach: Based on general guidelines for beginners with a weight loss goal:
**Estimated Daily Targets:**
- Calories: 1,500-1,800 kcal (moderate deficit)
- Protein: 100-130g (supports muscle retention)
- Carbs: 150-200g (energy for workouts)
- Fats: 45-60g (hormonal health)
**Key Principles:**
- Aim for a 300-500 calorie deficit from maintenance
- Prioritize protein at every meal
- All suggestions are peanut-free
Would you like some recipe ideas that fit these macros?
[context: ['fitness_level', 'fitness_goals', 'allergies', 'daily_calorie_target']]
You: quit
Goodbye! Stay healthy!
Part 4: Production Considerations
Prompt Engineering Best Practices
Good react_core.md files are the foundation of reliable agents. Here is the structure used throughout this project:
The Anatomy of an Agent Prompt (react_core.md)
1. ROLE — Who the agent is ("You are a certified personal trainer")
2. SCOPE — What it handles and what it doesn't
3. AVAILABLE KNOWLEDGE — {available_slots} placeholder, auto-populated at runtime
4. ROUTING — For specialists: which sub-agents are available and when to use them
5. INSTRUCTIONS — The ReAct action format (<thought>, <action type="...">)
6. SAFETY — Guardrails, disclaimers, limitations
Notice what is not in the prompt: Python code, JSON format requirements, or hardcoded routing logic. Routing decisions happen through the ReAct route action where the LLM reasons explicitly in <thought> tags. Domain knowledge lives in separate slot files loaded on demand via request_slot.
The {available_slots} Pattern
Every react_core.md includes {available_slots} which the ReactTemplateManager replaces at runtime with a formatted list of available knowledge slots:
- exercises: Exercise database with muscle groups, sets, reps [LOADED]
Triggers: When the user asks about specific exercises
- safety: Exercise safety guidelines and contraindications
Triggers: When recommending exercises to beginners
The [LOADED] marker tells the LLM which slots have already been loaded in previous passes, so it does not redundantly load them.
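A formatter producing a list like the one above could look like the following sketch. SlotInfo and format_available_slots are hypothetical stand-ins written for illustration; the real ReactTemplateManager.format_available_slots() may take different inputs and lay lines out differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotInfo:
    """Hypothetical description of one knowledge slot."""
    name: str
    description: str
    triggers: str


def format_available_slots(slots: list[SlotInfo], loaded: set[str]) -> str:
    """Render one entry per slot, tagging slots already loaded in earlier passes."""
    lines: list[str] = []
    for slot in slots:
        marker = " [LOADED]" if slot.name in loaded else ""
        lines.append(f"- {slot.name}: {slot.description}{marker}")
        lines.append(f"  Triggers: {slot.triggers}")
    return "\n".join(lines)


slots = [
    SlotInfo("exercises", "Exercise database with muscle groups, sets, reps",
             "When the user asks about specific exercises"),
    SlotInfo("safety", "Exercise safety guidelines and contraindications",
             "When recommending exercises to beginners"),
]
core = "## Available Knowledge\n{available_slots}"
# str.replace rather than str.format: prompts may contain literal JSON braces.
print(core.replace("{available_slots}", format_available_slots(slots, loaded={"exercises"})))
```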
Health-Specific Guardrails
For any health-related chatbot, your react_core.md and safety slot files must include:
- Always recommend consulting a healthcare professional for medical concerns
- Never diagnose conditions or prescribe treatments
- Include disclaimers on exercise intensity and dietary changes
- Respect stated allergies as absolute constraints (never "suggest anyway")
- Flag when advice may not apply to pregnant women, elderly, or children
- State when nutritional values are estimates
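Prompt-level rules can be backed by a deterministic check before a reply is sent. The sketch below is a hypothetical helper, not part of llm-orchestrator: it flags replies that mention a stated allergen as a whole word (singular or plural) while ignoring phrases such as "peanut-free". Keyword matching misses dishes that contain an allergen without naming it (satay, for example), so treat it as an extra safety net on top of the prompt rules, not a replacement for them.

```python
import re


def find_allergen_mentions(reply: str, allergies: list[str]) -> list[str]:
    """Return the stated allergies that the reply mentions as whole words.

    Hypothetical helper. "peanut" and "peanuts" both match, but
    "<allergen>-free" does not, so "peanut-free granola" is not flagged.
    """
    hits: list[str] = []
    for allergen in allergies:
        stem = re.escape(allergen.lower().removesuffix("s"))
        pattern = rf"\b{stem}s?\b(?!-free)"
        if re.search(pattern, reply, flags=re.IGNORECASE):
            hits.append(allergen)
    return hits
```

If the returned list is non-empty, an agent could ask the LLM for a revised answer instead of responding.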
Few-Shot Examples
For agents that handle varied requests, include example ReAct sequences in the react_core.md:
Example:
<thought>The user wants a leg workout for beginners. Let me load exercise data.</thought>
<action type="request_slot">{"slot": "exercises"}</action>
After loading:
<thought>I have the data. Bodyweight squats and lunges are good beginner exercises.</thought>
<action type="respond">{"content": "**Leg Workout — Beginner**\n- Bodyweight squats: 3x12\n- Lunges: 3x10 each leg\n- Calf raises: 3x15"}</action>
This helps the LLM produce consistently structured ReAct output.
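To make the format concrete, here is a minimal parser for one ReAct step. It is a hypothetical sketch, not the ReactExecutor's parser: ReactStep and parse_react_step are illustrative names, and it assumes each <action> body is a JSON object or empty.

```python
import json
import re
from dataclasses import dataclass
from typing import Any

THOUGHT_RE = re.compile(r"<thought>(.*?)</thought>", re.DOTALL)
ACTION_RE = re.compile(r'<action type="([A-Za-z_]+)">(.*?)</action>', re.DOTALL)


@dataclass
class ReactStep:
    thought: str
    action_type: str
    params: dict[str, Any]


def parse_react_step(text: str) -> ReactStep:
    """Extract the first thought and the first action from one LLM turn (hypothetical)."""
    action = ACTION_RE.search(text)
    if action is None:
        raise ValueError("LLM output contains no <action> tag")
    body = action.group(2).strip()
    params = json.loads(body) if body else {}
    if not isinstance(params, dict):
        raise ValueError("action body must be a JSON object")
    thought = THOUGHT_RE.search(text)
    return ReactStep(
        thought=thought.group(1).strip() if thought else "",
        action_type=action.group(1),
        params=params,
    )


step = parse_react_step(
    "<thought>The user wants a leg workout for beginners.</thought>\n"
    '<action type="request_slot">{"slot": "exercises"}</action>'
)
print(step.action_type, step.params)  # request_slot {'slot': 'exercises'}
```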
Error Handling
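llm-orchestrator provides a typed exception hierarchy. Use it to handle errors gracefully. Python checks except clauses from top to bottom and runs the first one that matches, so list subclasses before their base classes (the ReAct-specific errors before ReactError, for example) and keep OrchestratorError as the final catch-all: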
```python
import logging
from typing import Any

from langchain_core.messages import AIMessage
from llm_orchestrator import (
    AgentProcessError,
    ConfigurationError,
    ContextError,
    KnowledgeConfigValidationError,
    KnowledgeError,
    LLMInvocationError,
    LLMRetryExhaustedError,
    OrchestratorError,
    OrchestratorState,
    ReactConsecutiveErrorsError,
    ReactError,
    ReactMaxPassesError,
    SessionManagerError,
)

logger = logging.getLogger(__name__)

FALLBACK_REPLY = "Sorry, something went wrong on my side. Please try again."


async def safe_invoke(graph: Any, state: OrchestratorState) -> dict[str, Any]:
    """Invoke the graph with structured error handling.

    On failure, return the incoming state plus an apology message, so the
    caller still has a reply to print and keeps its context slots
    (allergies included) for the next turn.
    """
    fallback = {
        **state,
        "messages": state["messages"] + [AIMessage(content=FALLBACK_REPLY)],
    }
    try:
        return await graph.ainvoke(state)
    # Python uses the first matching clause, so subclasses come before base classes.
    except LLMRetryExhaustedError as e:
        # All LLM retry attempts exhausted
        logger.error("LLM retry exhausted after %s attempts: %s", e.attempts, e.cause)
    except LLMInvocationError as e:
        # The LLM API call failed (rate limit, network error, etc.)
        logger.error("LLM error: %s", e)
    except AgentProcessError as e:
        # A specific agent's process() method raised an exception
        logger.error("Agent '%s' failed: %s", e.agent_name, e.cause)
    except ReactMaxPassesError as e:
        # ReAct loop exceeded max_passes without a terminal action
        logger.error("ReAct loop timeout: agent '%s' after %s passes", e.agent_name, e.max_passes)
    except ReactConsecutiveErrorsError as e:
        # ReAct loop hit the consecutive-invalid-action circuit breaker
        logger.error("Circuit breaker: agent '%s' after %s errors", e.agent_name, e.max_errors)
    except ReactError as e:
        # Other ReAct execution errors
        logger.error("ReAct error: %s", e)
    except KnowledgeConfigValidationError as e:
        # A react_config.yaml failed schema validation
        logger.error("Bad config for agent '%s': %s", e.agent_name, e.cause)
    except KnowledgeError as e:
        # Knowledge module loading failed
        logger.error("Knowledge error: %s", e)
    except ContextError as e:
        # Slot persistence or context management failed
        logger.error("Context error: %s", e)
    except SessionManagerError as e:
        # Session storage failed
        logger.error("Session error (%s): %s", e.operation, e.cause)
    except ConfigurationError as e:
        # Missing LLM or ReactExecutor configuration
        # (LLMNotConfiguredError, ReactExecutorNotConfiguredError)
        logger.error("Configuration error: %s", e)
    except OrchestratorError as e:
        # Catch-all for any other orchestrator error
        logger.error("Orchestrator error: %s", e)
    return fallback
```
Key principle: degrade gracefully. If the session manager fails, the chatbot should still work — it just won't remember preferences. If a leaf agent fails, the specialist should catch it and provide a partial response rather than crashing.
Context Management
Token Budgets
Context slots consume tokens. When agents store too much data in slots, you'll blow through your LLM's context window. The SlotManager enforces a token budget for shared slots and evicts slots with the configured strategy once that budget is exceeded:
```yaml
# In config.yaml
context:
  shared_slots:
    max_tokens: 8192        # Total budget for shared slots
    eviction_strategy: lru  # Evict least-recently-used slots first
```
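As a rough model of what an LRU budget does, the sketch below keeps slots in access order and drops the least recently used ones once the total, estimated with the len(text) // 4 heuristic that SlotManager uses by default, exceeds max_tokens. LRUSlotBudget is a hypothetical class for illustration, not SlotManager.

```python
from collections import OrderedDict


def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token.
    return len(text) // 4


class LRUSlotBudget:
    """Hypothetical token-bounded slot store that evicts the least recently used slot first."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self._slots: OrderedDict[str, str] = OrderedDict()

    def total_tokens(self) -> int:
        return sum(estimate_tokens(v) for v in self._slots.values())

    def get(self, key: str) -> str:
        self._slots.move_to_end(key)  # reading a slot marks it as recently used
        return self._slots[key]

    def set(self, key: str, value: str) -> list[str]:
        """Store a slot and return the keys evicted to get back under budget."""
        self._slots[key] = value
        self._slots.move_to_end(key)
        evicted: list[str] = []
        # Never evict the slot just written, even if it alone is over budget.
        while self.total_tokens() > self.max_tokens and len(self._slots) > 1:
            old_key, _ = self._slots.popitem(last=False)
            evicted.append(old_key)
        return evicted

    def keys(self) -> list[str]:
        return list(self._slots)
```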
Custom Token Counting
By default, token counting uses a len(text) // 4 heuristic. For accurate counts with your specific model, inject a custom TokenCounter:
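```python
import tiktoken
from llm_orchestrator import SlotManager, TokenCounter


class TiktokenCounter(TokenCounter):
    """Count tokens with the tokenizer that matches the target OpenAI model."""

    def __init__(self, model: str = "gpt-4o") -> None:
        self._enc = tiktoken.encoding_for_model(model)

    def count(self, text: str) -> int:
        # disallowed_special=() counts text such as "<|endoftext|>" as plain
        # text instead of raising, since slot content may come from users.
        return len(self._enc.encode(text, disallowed_special=()))


counter = TiktokenCounter()
# config is the OrchestratorConfig returned by load_config()
slot_manager = SlotManager(config=config.context, token_counter=counter)
```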
The counter is also accepted by ReactTemplateManager for accurate knowledge slot token counts.
What to Store in Slots
| Store | Don't Store |
|---|---|
| User preferences (allergies, goals) | Full conversation history (use messages) |
| Current plan/workout being discussed | Large search results (summarize first) |
| Which agent is waiting for search results | Raw API responses |
| Computed values (daily calorie target) | Duplicate information already in messages |
Tiered Slot Persistence
Slots have three persistence tiers that control their lifecycle:
| Tier | Lifecycle | Use Case |
|---|---|---|
| TURN | Discarded at end of turn | Scratch data, intermediate results |
| SESSION | Persisted for the session | User preferences, conversation context |
| LONG_TERM | Persisted across sessions | User profile, allergies, goals |
```python
from llm_orchestrator import SlotManager, SlotTier

# slot_manager is the SlotManager instance created earlier.
# Set a slot with an explicit tier
slot_manager.set_shared_tiered("allergies", "peanuts", tier=SlotTier.LONG_TERM, priority=100)
slot_manager.set_shared_tiered("current_plan", "...", tier=SlotTier.SESSION)
slot_manager.set_shared_tiered("last_search", "...", tier=SlotTier.TURN)

# Promote a slot to a higher tier
slot_manager.promote_slot("current_plan", SlotTier.LONG_TERM)
```
For slot persistence across restarts, see the Graph-Level Slot Persistence section below.
Eviction Strategies
- LRU (Least Recently Used): Best default. Keeps actively referenced data.
- FIFO (First In, First Out): Good for streaming data where older = less relevant.
- Priority: Good when some slots (allergies) must never be evicted. Set high priority values on critical slots.
- Demotion: Demotes slots one tier at a time (LONG_TERM -> SESSION -> TURN) before evicting them. Best for tiered persistence setups.
```python
# Setting slot priority
slot_manager.set_shared("allergies", "peanuts", priority=100)  # High: keep this
slot_manager.set_shared("last_search", "...", priority=1)      # Low: evict first
```
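Priority eviction can be sketched as "drop the lowest-priority slot until the budget fits", with ties going to the oldest slot. evict_by_priority is a hypothetical helper written for illustration, using the same len(text) // 4 estimate; SlotManager's actual tie-breaking and budgeting rules may differ.

```python
def evict_by_priority(
    slots: dict[str, tuple[str, int]], max_tokens: int
) -> tuple[dict[str, tuple[str, int]], list[str]]:
    """Hypothetical helper. slots maps key -> (value, priority), oldest first.

    Returns (kept, evicted) without mutating the input.
    """
    kept = dict(slots)
    evicted: list[str] = []

    def used() -> int:
        return sum(len(value) // 4 for value, _ in kept.values())

    while kept and used() > max_tokens:
        # min() returns the first of equal-priority keys, i.e. the oldest one.
        victim = min(kept, key=lambda k: kept[k][1])
        del kept[victim]
        evicted.append(victim)
    return kept, evicted
```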
Configure the eviction strategy in YAML:
```yaml
context:
  shared_slots:
    max_tokens: 8192
    tiered_slots:
      turn:
        max_tokens: 4096
      session:
        max_tokens: 4096
      long_term:
        max_tokens: 4096
    eviction_strategy: demotion  # or lru, fifo, priority
```
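With per-tier budgets like these, demotion can be modeled as a cascade: when a tier is over its budget, its oldest slot moves down one tier (LONG_TERM to SESSION to TURN), and only slots already in TURN are evicted. This is a hypothetical sketch of that reading of the demotion strategy, not SlotManager's implementation; enforce_tier_budgets and the oldest-first victim choice are assumptions.

```python
TIERS = ["long_term", "session", "turn"]  # most durable first


def enforce_tier_budgets(
    slots: dict[str, tuple[str, str]], budgets: dict[str, int]
) -> tuple[dict[str, tuple[str, str]], list[str]]:
    """Hypothetical sketch. slots maps key -> (value, tier), oldest first.

    Returns (slots after demotion, evicted keys); budgets must be non-negative.
    """
    slots = dict(slots)
    evicted: list[str] = []

    def tier_tokens(tier: str) -> int:
        return sum(len(value) // 4 for value, t in slots.values() if t == tier)

    for index, tier in enumerate(TIERS):
        while tier_tokens(tier) > budgets[tier]:
            # Oldest slot in this tier (dicts keep insertion order).
            victim = next(k for k, (_, t) in slots.items() if t == tier)
            if tier == "turn":
                del slots[victim]
                evicted.append(victim)
            else:
                value, _ = slots[victim]
                slots[victim] = (value, TIERS[index + 1])  # demote one tier
    return slots, evicted
```

In this sketch the oldest long-term slot is demoted first regardless of how important it is, which is one reason the priority strategy suits slots such as allergies.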
Using Tiered Slots
Agents declare which tier their slot data belongs to using slots_tier on AgentResponse:
```python
# Inside an agent's process() method
return AgentResponse(
    agent_name=self.name,
    content="Noted your allergy to peanuts.",
    next_agent="coach",
    slots_update={"allergies": "peanuts", "last_search": "peanut allergy"},
    slots_tier={"allergies": "long_term", "last_search": "turn"},
)
```
The BaseAgent.invoke() method merges slots_tier into a _slot_tiers metadata key in context_slots. The SlotManager uses this metadata during eviction — TURN data is evicted first, then SESSION, then LONG_TERM. With the demotion eviction strategy, slots are demoted one tier before being evicted entirely.
Graph-Level Slot Persistence
To persist SESSION and LONG_TERM slots across restarts, implement the SlotPersistence ABC:
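```python
from typing import Any

from llm_orchestrator import SlotPersistence
from llm_orchestrator import Slot  # assumed export; import Slot from wherever your version defines it


class PostgresSlotPersistence(SlotPersistence):
    """SlotPersistence backed by PostgreSQL; fill in the method bodies with your driver."""

    def __init__(self, pool: Any) -> None:
        # pool: your database connection pool
        self._pool = pool

    async def load_session_slots(self, session_id: str) -> list[Slot]: ...
    async def save_session_slots(self, session_id: str, slots: list[Slot]) -> None: ...
    async def delete_session_slots(self, session_id: str) -> None: ...
    async def load_long_term_slots(self, user_id: str) -> list[Slot]: ...
    async def save_long_term_slots(self, user_id: str, slots: list[Slot]) -> None: ...
    async def delete_long_term_slots(self, user_id: str) -> None: ...
```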
Use hydrate() and flush() at graph boundaries:
```python
# db_pool: your database connection pool; config: the loaded OrchestratorConfig
persistence = PostgresSlotPersistence(pool=db_pool)
slot_manager = SlotManager(config=config.context, persistence=persistence)

# At graph start: load persisted slots
await slot_manager.hydrate(session_id="abc", user_id="user-1")

# At graph end: persist durable slots
await slot_manager.flush(session_id="abc", user_id="user-1")
```
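For unit tests, a dictionary-backed store with the same six method names can stand in for Postgres. The sketch below is hypothetical: DictSlotPersistence does not subclass the library's SlotPersistence ABC, and SlotRecord is a placeholder for the library's Slot type. It illustrates the save-then-load round trip that flush() and hydrate() rely on across a restart.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotRecord:
    """Hypothetical placeholder for the library's Slot type."""
    key: str
    value: str
    tier: str


class DictSlotPersistence:
    """Hypothetical in-memory stand-in with SlotPersistence-style method names."""

    def __init__(self) -> None:
        self._session: dict[str, list[SlotRecord]] = {}
        self._long_term: dict[str, list[SlotRecord]] = {}

    async def load_session_slots(self, session_id: str) -> list[SlotRecord]:
        return list(self._session.get(session_id, []))

    async def save_session_slots(self, session_id: str, slots: list[SlotRecord]) -> None:
        self._session[session_id] = list(slots)

    async def delete_session_slots(self, session_id: str) -> None:
        self._session.pop(session_id, None)  # deleting a missing session is a no-op

    async def load_long_term_slots(self, user_id: str) -> list[SlotRecord]:
        return list(self._long_term.get(user_id, []))

    async def save_long_term_slots(self, user_id: str, slots: list[SlotRecord]) -> None:
        self._long_term[user_id] = list(slots)

    async def delete_long_term_slots(self, user_id: str) -> None:
        self._long_term.pop(user_id, None)


async def demo() -> list[SlotRecord]:
    store = DictSlotPersistence()
    await store.save_long_term_slots("user-1", [SlotRecord("allergies", "peanuts", "long_term")])
    return await store.load_long_term_slots("user-1")


print(asyncio.run(demo()))
```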
Pattern: Clean Agent Design
Throughout this tutorial, every agent follows the same clean pattern:
- Externalized prompts in react_core.md: Domain knowledge, role description, and instructions live in markdown files on the filesystem, not in hardcoded SYSTEM_PROMPT Python string constants. Update agent behavior by editing a markdown file; no code redeployment is needed.
- ReAct reasoning: Every agent uses the ReactExecutor for explicit multi-pass reasoning. The LLM produces <thought> tags that create auditable decision traces and <action> tags that map to handler functions. This replaces implicit single-call LLM invocations.
- Consistent handlers: Leaf agents register request_slot + respond. Specialist agents register request_slot + route. The session manager uses the full set of session operation handlers + complete. Every handler follows the same (params, state, config=None) -> (observation, state_updates) signature; terminal handlers such as respond and route end the loop by raising TerminalActionResult instead of returning.
- Explicit slot tiers: When agents store context data, they declare tiers via slots_tier: allergies go to LONG_TERM, current plans to SESSION, scratch data to TURN.
This separation of concerns means:
- The LLM focuses on reasoning and content generation through the ReAct loop
- Python handles data structure, agent construction, and graph wiring
- Prompts are editable without code changes (edit react_core.md or slot files)
- ReAct actions are testable by mocking ReactExecutor.execute()
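The shared handler signature is what lets one loop drive every agent. Below is a hypothetical miniature of that loop, not ReactExecutor itself: the LLM is replaced by a scripted list of actions, TerminalAction is a local stand-in for TerminalActionResult, and run_actions, request_slot, and respond are illustrative implementations. Non-terminal handlers return (observation, state_updates); terminal handlers end the loop by raising.

```python
import asyncio
from typing import Any, Awaitable, Callable

State = dict[str, Any]
Handler = Callable[[dict[str, Any], State, Any], Awaitable[tuple[str, State]]]


class TerminalAction(Exception):
    """Local stand-in for TerminalActionResult: carries the agent's final result."""

    def __init__(self, result: dict[str, Any]) -> None:
        super().__init__("terminal action")
        self.result = result


async def request_slot(params: dict[str, Any], state: State, config: Any = None) -> tuple[str, State]:
    slot = params["slot"]
    loaded = [*state.get("loaded_slots", []), slot]
    return f"Loaded slot '{slot}'", {"loaded_slots": loaded}


async def respond(params: dict[str, Any], state: State, config: Any = None) -> tuple[str, State]:
    raise TerminalAction({"content": params["content"], "next_agent": None})


async def run_actions(
    actions: list[tuple[str, dict[str, Any]]],
    handlers: dict[str, Handler],
    state: State,
    max_passes: int = 5,
) -> tuple[dict[str, Any], State, list[str]]:
    """Dispatch scripted (action_type, params) pairs until a handler terminates."""
    observations: list[str] = []  # a real executor feeds these back to the LLM
    for action_type, params in actions[:max_passes]:
        try:
            observation, updates = await handlers[action_type](params, state, None)
        except TerminalAction as done:
            return done.result, state, observations
        observations.append(observation)
        state = {**state, **updates}
    raise RuntimeError(f"no terminal action within {max_passes} passes")
```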
Testing
Test agents in isolation by mocking the ReactExecutor and providing known state:
```python
"""Test the Trainer agent in isolation.

The async tests use the pytest-asyncio plugin (@pytest.mark.asyncio).
"""
from unittest.mock import AsyncMock, MagicMock

import pytest
from langchain_core.messages import HumanMessage
from llm_orchestrator import AgentConfig, OrchestratorState, TerminalActionResult

from agents.trainer import TrainerAgent


@pytest.fixture
def trainer_agent() -> TrainerAgent:
    """Create a TrainerAgent with a mocked LLM and template manager."""
    config = AgentConfig(
        name="trainer",
        type="custom",
        description="Test trainer",
    )
    mock_llm = AsyncMock()
    mock_tm = MagicMock()
    mock_tm.max_passes = 5
    mock_tm.format_available_slots.return_value = "No slots"
    mock_tm.load_core_prompt = AsyncMock(return_value="Core prompt")
    return TrainerAgent(config=config, llm=mock_llm, template_manager=mock_tm)


@pytest.fixture
def sample_state() -> OrchestratorState:
    """Create a sample state for testing."""
    return {
        "messages": [HumanMessage(content="Suggest a leg workout")],
        "context_slots": {
            "fitness_level": "beginner",
            "fitness_goals": "weight loss",
        },
        "current_agent": "trainer",
        "knowledge_context": {},
    }


@pytest.mark.asyncio
async def test_trainer_returns_to_coach(
    trainer_agent: TrainerAgent, sample_state: OrchestratorState
) -> None:
    """Trainer should return to coach when ReAct routes there."""
    # Mock the ReactExecutor to return a route-to-coach result
    trainer_agent._react_executor.execute = AsyncMock(
        return_value={"content": "Here is your leg workout.", "next_agent": "coach"}
    )
    response = await trainer_agent.process(sample_state, config=None)
    assert response.next_agent == "coach"
    assert response.agent_name == "trainer"
    assert "leg workout" in response.content


@pytest.mark.asyncio
async def test_trainer_delegates_to_exercise_agent(
    trainer_agent: TrainerAgent, sample_state: OrchestratorState
) -> None:
    """Trainer delegates to exercise_agent when ReAct decides so."""
    trainer_agent._react_executor.execute = AsyncMock(
        return_value={"content": "", "next_agent": "exercise_agent"}
    )
    response = await trainer_agent.process(sample_state, config=None)
    assert response.next_agent == "exercise_agent"


@pytest.mark.asyncio
async def test_trainer_delegates_to_form_guide(
    trainer_agent: TrainerAgent, sample_state: OrchestratorState
) -> None:
    """Trainer delegates to form_guide when ReAct decides so."""
    trainer_agent._react_executor.execute = AsyncMock(
        return_value={"content": "", "next_agent": "form_guide"}
    )
    response = await trainer_agent.process(sample_state, config=None)
    assert response.next_agent == "form_guide"


@pytest.mark.asyncio
async def test_route_handler_raises_terminal() -> None:
    """The route handler should raise TerminalActionResult."""
    config = AgentConfig(name="test", type="custom", description="Test")
    mock_llm = AsyncMock()
    mock_tm = MagicMock()
    mock_tm.max_passes = 3
    mock_tm.format_available_slots.return_value = ""
    mock_tm.load_core_prompt = AsyncMock(return_value="")
    agent = TrainerAgent(config=config, llm=mock_llm, template_manager=mock_tm)

    route_handler = agent._react_executor._action_handlers["route"]
    with pytest.raises(TerminalActionResult) as exc_info:
        await route_handler(
            {"next_agent": "coach", "content": "Done"},  # params
            {},                                          # state
            None,                                        # config
        )
    assert exc_info.value.result["next_agent"] == "coach"
    assert exc_info.value.result["content"] == "Done"
```
Testing Tips
- Mock the ReactExecutor: Mock react_executor.execute() to return predictable result dicts. This tests the agent's process() logic without calling the LLM.
- Test terminal actions: Call action handlers directly and verify they raise TerminalActionResult with the correct result dict.
- Test context reading: Ensure agents pass context slots into the context_vars dict for the ReactExecutor.
- Test slot tiers: Verify that agents set slots_tier correctly when updating context slots.
- Test topology: Use TopologyResolver.is_allowed() to verify your config allows the routes you expect.
Next Steps
You now have a working multi-agent health assistant. Here's where you could take it next:
More agents:
- Progress Tracker — Log workouts and meals, track trends over time
- Motivation Agent — Provide encouragement based on progress
- Injury Rehab Agent — Safe exercises for recovery (with strong medical disclaimers)
Better memory:
- Replace InMemorySessionManager with a database-backed implementation (PostgreSQL, SQLite, Redis)
- Use PaginatedResult with get_messages_paginated and list_memories for efficient large-history retrieval
- Add semantic search over memories with SemanticSearchMixin: subclass your SessionManager, implement embed_text(), and override search_memories_semantic() for pgvector, Chroma, or Pinecone backends (see examples/custom_session_manager/semantic_session_manager.py)
- Implement conversation summarization to compress old history
Web UI:
- Add a FastAPI/Flask backend that wraps graph.ainvoke()
- Build a chat interface with React, Next.js, or a simple HTML/JS frontend
- Add streaming responses using LangGraph's .astream() method
Deployment:
- Containerize with Docker
- Add structured logging with structlog (already a dependency)
- Set up authentication and rate limiting with SemaphoreRateLimiter (single-tenant) or PerTenantRateLimiter (multi-tenant: each user_id gets its own concurrency semaphore, LRU-evicted at max_tenants)
- Monitor token usage and costs per agent
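The per-tenant idea can be modeled with one asyncio.Semaphore per user_id kept in an LRU map. TenantSemaphores is a hypothetical sketch written for illustration, not the library's PerTenantRateLimiter, and its constructor arguments are assumptions.

```python
import asyncio
from collections import OrderedDict
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


class TenantSemaphores:
    """Hypothetical limiter: one concurrency limit per tenant, LRU-evicted past max_tenants."""

    def __init__(self, per_tenant_limit: int, max_tenants: int) -> None:
        if max_tenants < 1:
            raise ValueError("max_tenants must be at least 1")
        self._limit = per_tenant_limit
        self._max_tenants = max_tenants
        self._semaphores: OrderedDict[str, asyncio.Semaphore] = OrderedDict()

    def _semaphore_for(self, tenant_id: str) -> asyncio.Semaphore:
        sem = self._semaphores.get(tenant_id)
        if sem is None:
            sem = asyncio.Semaphore(self._limit)
            self._semaphores[tenant_id] = sem
            if len(self._semaphores) > self._max_tenants:
                self._semaphores.popitem(last=False)  # drop least recently used tenant
        else:
            self._semaphores.move_to_end(tenant_id)
        return sem

    @asynccontextmanager
    async def slot(self, tenant_id: str) -> AsyncIterator[None]:
        """Hold one of the tenant's concurrency slots for the duration of the block."""
        async with self._semaphore_for(tenant_id):
            yield

    def tenants(self) -> list[str]:
        return list(self._semaphores)
```

In a web backend, each graph.ainvoke(state) call would run inside `async with limiter.slot(user_id):`. One trade-off of LRU eviction is that if a tenant is evicted while a request is still in flight, its next request gets a fresh semaphore, so that tenant can briefly exceed its limit.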
Summary
In this guide, you learned:
- LLM fundamentals: How large language models work, from text prediction to tool use
- Agent architecture: Instructions + LLM + tools + memory = an agent
- Multi-agent patterns: Supervisor, mesh, and hybrid topologies
- llm-orchestrator: Configuration-driven multi-agent orchestration on LangGraph
- Knowledge modules: Externalized prompts in react_core.md, on-demand knowledge via request_slot, and the ReAct XML format
- Building agents: From leaf workers with respond handlers to specialist coordinators with route handlers to the supervisor
- Session management: Giving your chatbot memory with SessionMessage, Memory, PaginatedResult, and the full CRUD interface
- Graph construction: Wiring agents into a LangGraph state machine with conditional routing
- Production practices: Error handling, tiered context management, pluggable token counting, knowledge modules, and testing with mocked ReactExecutors
The health assistant is a starting point. The patterns here — hybrid routing, tiered context slots, session memory, knowledge modules, ReAct execution, PaginatedResult for efficient data retrieval, typed exceptions — apply to any multi-agent system: customer service, code review, data analysis, and beyond.
Happy building.