Metadata-Version: 2.4
Name: graphcrew
Version: 0.1.0
Summary: Production-ready Python library for supervisor-led multi-agent systems with declarative YAML config, pluggable persistence, and ReAct knowledge modules — built on LangGraph
Project-URL: Repository, https://github.com/Atiqul-Islam/swagent
Project-URL: Documentation, https://github.com/Atiqul-Islam/swagent#readme
Project-URL: Bug Tracker, https://github.com/Atiqul-Islam/swagent/issues
Project-URL: Changelog, https://github.com/Atiqul-Islam/swagent/blob/main/CHANGELOG.md
Author-email: Imran Atiq <iatiq@users.noreply.github.com>
License: MIT
License-File: LICENSE
Keywords: langgraph,llm,multi-agent,orchestrator,supervisor
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: langchain-core<2.0,>=1.2.11
Requires-Dist: langgraph-checkpoint<5.0,>=4.0.0
Requires-Dist: langgraph<2.0,>=1.0.10
Requires-Dist: pydantic<3.0,>=2.0
Requires-Dist: pyyaml<7.0,>=6.0
Requires-Dist: structlog<26.0,>=24.0
Requires-Dist: tenacity<10.0,>=8.0
Provides-Extra: otel
Requires-Dist: opentelemetry-api<2.0,>=1.24; extra == 'otel'
Description-Content-Type: text/markdown

# graphcrew

[![CI](https://github.com/Atiqul-Islam/swagent/actions/workflows/ci.yml/badge.svg)](https://github.com/Atiqul-Islam/swagent/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/graphcrew)](https://pypi.org/project/graphcrew/)
[![Python 3.11 | 3.12 | 3.13](https://img.shields.io/badge/python-3.11%20%7C%203.12%20%7C%203.13-blue)](https://github.com/Atiqul-Islam/swagent)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/Atiqul-Islam/swagent/blob/main/LICENSE)

Production-ready Python library for supervisor-led multi-agent systems with declarative YAML config, pluggable persistence, and ReAct knowledge modules — built on LangGraph.

## Features

- **Declarative YAML config** — define agents, topology, and context slots in a single file
- **Constrained dynamic routing** — supervisor picks the next agent at runtime, validated against a declared topology
- **Tiered context slots** — arbitrary key-value state with TURN/SESSION/LONG_TERM persistence tiers (`SlotTier` is a `StrEnum` — compare directly with strings), token budgets, and LRU/FIFO/priority/demotion eviction
- **ReAct execution engine** — multi-pass thought/action/observation loop with on-demand knowledge loading
- **External knowledge modules** — per-agent `react_config.yaml` (Pydantic-validated) + `.md` slot files loaded from filesystem; no hardcoded prompts
- **Pluggable session management** — single ABC with 3 abstract methods (messages + lifecycle); summaries and memories default to `NotImplementedError`; `InMemorySessionManager` ships for dev/testing (async-safe via per-session locks); `SemanticSearchMixin` abstract mix-in for adding vector-based memory search (pgvector, Chroma, Pinecone)
- **Pluggable slot persistence** — `SlotPersistence` ABC for backing tiered slots with any durable storage
- **Pluggable token counting** — `TokenCounter` ABC for model-specific tokenizers; `CharDivisionTokenCounter` ships as the default (`max(1, len // 4)` for non-empty, `0` for empty). *Note: The default estimator is approximate. For production accuracy, use a model-specific counter (see [`examples/custom_token_counter/`](examples/custom_token_counter/))*
- **2 built-in agents** — Supervisor (with optional ReAct mode) and SessionManagerAgent (ReAct agent using SessionManager methods as tools); everything else is yours
- **Async concurrency safety** — `asyncio.Lock` guards on `ReactTemplateManager`, `SlotManager`, and `InMemorySessionManager`; dict-merge reducers on `OrchestratorState` for safe LangGraph parallel fan-in
- **Structured error hierarchy** — typed exceptions for production debugging (`OrchestratorError` → `ConfigurationError`, `ConfigLoadError`, `ConfigValidationError`, `LLMNotConfiguredError`, `ReactExecutorNotConfiguredError`, `AgentProcessError`, `LLMInvocationError`, `LLMRetryExhaustedError`, `ReactConsecutiveErrorsError`, `KnowledgeError`, `KnowledgeConfigValidationError`, `KnowledgePathError`, `ReactError`, etc.)
- **Request correlation** — auto-generated `request_id` (UUID4) on every request; `bind_logging_context()` binds `request_id`, `session_id`, `user_id`, `langgraph_step`, `langgraph_node` into structlog contextvars; AIMessage `response_metadata` carries all five fields for downstream correlation
- **Immutable config models** — all Pydantic config classes use `frozen=True, extra="forbid"` to prevent accidental mutation and reject unknown fields (typos raise `ValidationError` immediately)
- **RunnableConfig threading** — LangGraph `RunnableConfig` (carrying LangSmith/Langfuse/OpenTelemetry callbacks) threaded from `BaseAgent.invoke()` through `process()`, `ReactExecutor.execute()`, to `llm.ainvoke()`
- **Configurable LLM retry** — `RetryConfig` with exponential backoff and jitter (tenacity); `TimeoutConfig` for per-call async timeouts
- **Circuit breaker** — `max_consecutive_errors` on `ReactExecutor` bails out after N consecutive invalid actions to prevent wasted LLM calls
- **History sliding window** — optional `max_history_tokens` on `ReactExecutor` truncates oldest (thought, observation) pairs at render time to stay within a token budget, preventing prompt overflow on long ReAct loops
- **Agent middleware** — `AgentMiddleware` with `before_invoke`, `after_invoke`, and `on_error` hooks (all concrete defaults — override only what you need); onion-model execution; intercept agent calls for auth, metrics, or logging
- **Rate limiter** — pluggable `RateLimiter` ABC with `acquire()`/`release()` lifecycle called around every LLM invocation (`try`/`finally`); built-in `SemaphoreRateLimiter` for single-tenant concurrent call throttling; `PerTenantRateLimiter` for multi-tenant deployments (per-tenant semaphore isolation, LRU eviction at `max_tenants`); exceptions wrapped as `RateLimitExceededError`
- **Streaming support** — opt-in `astream_process()` on agents + `astream_with_timeout()` utility for real-time UX; backward compatible (ReAct stays non-streaming)
- **Structured output** — opt-in `use_structured_output=True` on `ReactExecutor`/`SupervisorAgent` leverages LLM-native JSON schema responses via `with_structured_output()`; graceful fallback to XML parsing
- **OpenTelemetry instrumentation** — optional `[otel]` extra; `OpenTelemetryMiddleware` creates spans for agent invocations with zero overhead when OTel is not installed
- **Token usage tracking** — `AgentResponse.token_usage` captures `usage_metadata` from LLM responses; accumulated across ReAct passes; propagated to `AIMessage.response_metadata`
- **Built-in slot persistence** — `InMemorySlotPersistence` ships for development/testing alongside the `SlotPersistence` ABC
- **Async resource cleanup** — `close()` + `async with` on `SlotManager`, `ReactTemplateManager`, and `SessionManager` for deterministic resource cleanup
- **Coordinated lifecycle** — `OrchestratorContext` manages startup/shutdown of `SlotManager`, `SessionManager`, `ReactTemplateManager`, `RateLimiter`, and extras in safe order
- **Metrics middleware** — `MetricsMiddleware` records invocation duration, token usage, and errors via pluggable `MetricsCollector` protocol; `InMemoryMetricsCollector` ships for dev/testing
- **Tenant isolation** — `TenantIsolationMiddleware` prevents agents from modifying `session_id`/`user_id` via `slots_update`; raises `TenantIsolationError`
- **Circuit breaker middleware** — `CircuitBreakerMiddleware` with CLOSED/OPEN/HALF_OPEN states; configurable failure threshold and recovery timeout; raises `CircuitBreakerOpenError`
- **PII redaction** — `bind_logging_context(redact_pii=True)` hashes `session_id` and `user_id` with truncated SHA-256; `request_id` left untouched
- **Knowledge loading budget** — `ReactTemplateManager` accepts `max_loaded_tokens` to cap total tokens across loaded slots; raises `KnowledgeLoadError` when exceeded
- **Health checks** — `SessionManager.health_check()` and `SlotPersistence.health_check()` for readiness probes
- **Vulnerability scanning** — `pip-audit` in CI lint job
- **ReAct cancellation** — optional `cancel_event` on `ReactExecutor`; raises `ReactCancellationError` when set, enabling callers to abort long-running loops
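
The CLOSED/OPEN/HALF_OPEN behaviour named in the circuit breaker middleware bullet can be pictured as a small state machine. The following is a minimal standard-library sketch, not the code of `CircuitBreakerMiddleware`; the class name `TinyBreaker`, its method names, and the injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


class TinyBreaker:
    """Hypothetical three-state breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED."""

    def __init__(
        self,
        failure_threshold: int = 3,
        recovery_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "CLOSED"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "HALF_OPEN"  # recovery timeout elapsed: allow a trial call
        return "OPEN"

    def allow(self) -> bool:
        """Calls are rejected only while the breaker is OPEN."""
        return self.state != "OPEN"

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        if self.state == "HALF_OPEN":
            # The trial call failed: reopen and restart the recovery timer.
            self._opened_at = self._clock()
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

A caller would check `allow()` before each agent invocation and report the outcome with `record_success()` or `record_failure()`.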

## When to Use graphcrew

| Need | Use |
|------|-----|
| Declarative YAML config + pluggable persistence | **graphcrew** |
| Full control over graph structure | Raw LangGraph |
| Pre-built agent templates (code executor, etc.) | AutoGen / CrewAI |

## Installation

```bash
pip install graphcrew
```

Requires **Python >= 3.11**.

### Prerequisites

- **Python >= 3.11**
- Familiarity with [LangGraph](https://langchain-ai.github.io/langgraph/) (`StateGraph`, conditional edges, `ainvoke`)
- Familiarity with Python `async`/`await`
- An LLM provider SDK (e.g., `langchain-openai`) and its API key

Check the installed version:

```python
import graphcrew
print(graphcrew.__version__)
```

> The project itself uses [uv](https://docs.astral.sh/uv/) as its package manager, but consumers can install with any standard tool.

## LLM Provider Setup

Install your LLM provider package and set API credentials:

```bash
# OpenAI
pip install langchain-openai
export OPENAI_API_KEY=sk-...

# Anthropic
pip install langchain-anthropic
export ANTHROPIC_API_KEY=sk-ant-...

# Other providers: see LangChain integration docs
```

The library itself has no direct LLM provider dependency — it uses LangChain's abstraction layer.

## Quick Start

### 1. Define config (YAML)

```yaml
# config.yaml
agents:
  - name: supervisor
    type: supervisor
    description: Routes user requests to the appropriate agent
  - name: greeter
    type: custom
    description: Greets users and tracks greeting count

topology:
  edges:
    - from_agent: supervisor
      to_agent: greeter
    - from_agent: greeter
      to_agent: supervisor
  default_return: supervisor

context:
  shared_slots:
    max_tokens: 4096
  eviction_strategy: lru

session:
  enabled: false
```

### 2. Write a custom agent

Subclass `BaseAgent` and implement `process()`:

```python
from langchain_core.runnables import RunnableConfig
from graphcrew import BaseAgent, AgentResponse, OrchestratorState

class GreeterAgent(BaseAgent):
    async def process(self, state: OrchestratorState, config: RunnableConfig | None = None) -> AgentResponse:
        count = state.get("context_slots", {}).get("greeting_count", 0) + 1
        last_msg = state["messages"][-1].content if state["messages"] else ""

        content = f"Hello again! This is greeting #{count}. You said: {last_msg}"

        return AgentResponse(
            agent_name=self.name,
            content=content,
            next_agent="supervisor",
            slots_update={"greeting_count": count},
        )
```

### 3. Build the LangGraph graph

```python
from langchain_core.language_models import BaseLanguageModel
from langgraph.graph.state import CompiledStateGraph
from graphcrew import (
    OrchestratorConfig, SupervisorAgent, TopologyResolver,
    build_hub_spoke_graph,
)


def build_graph(config: OrchestratorConfig, llm: BaseLanguageModel) -> CompiledStateGraph:
    topology = TopologyResolver(config=config.topology)
    agent_configs = {a.name: a for a in config.agents}
    supervisor = SupervisorAgent(
        config=agent_configs["supervisor"], llm=llm, topology=topology
    )
    greeter = GreeterAgent(config=agent_configs["greeter"])
    return build_hub_spoke_graph(supervisor, {"greeter": greeter}, topology)
```

### 4. Run it

```python
import asyncio
from langchain_openai import ChatOpenAI
from graphcrew import load_config, create_initial_state

async def main():
    config = load_config("config.yaml")
    llm = ChatOpenAI(model="gpt-4o-mini")
    graph = build_graph(config, llm)

    result = await graph.ainvoke(create_initial_state("Hello!"))
    print(result["messages"][-1].content)

asyncio.run(main())
```

> **Production warning:** `create_initial_state("Hello!")` uses `user_id="anonymous"` and `session_id="default"` by default. In a multi-user deployment, **always** pass explicit values from your authentication layer to prevent cross-user data leakage:
> ```python
> state = create_initial_state("Hello!", session_id=user.session_id, user_id=user.id)
> ```

> **Note:** `InMemorySessionManager` is for development only — data is lost when the process exits. For production, implement `SessionManager` with a database backend (see [`examples/custom_session_manager/`](examples/custom_session_manager/)).

A complete runnable example lives in [`examples/simple_chatbot/`](examples/simple_chatbot/).
For a more advanced example using ReAct mode with knowledge modules, see [`examples/react_knowledge_agent/`](examples/react_knowledge_agent/).

More examples for extending specific interfaces:
- [`examples/custom_middleware/`](examples/custom_middleware/) — Logging and auth middleware + runnable demo
- [`examples/custom_session_manager/`](examples/custom_session_manager/) — PostgreSQL session manager skeleton + `SemanticSearchMixin` stub (`StubSemanticSessionManager`)
- [`examples/custom_persistence/`](examples/custom_persistence/) — File-based slot persistence
- [`examples/custom_eviction/`](examples/custom_eviction/) — TTL-based eviction strategy
- [`examples/custom_token_counter/`](examples/custom_token_counter/) — tiktoken-based token counter
- [`examples/session_chatbot/`](examples/session_chatbot/) — Session-aware chatbot example
- [`examples/fastapi_app/`](examples/fastapi_app/) — FastAPI integration example

## Core Concepts

### OrchestratorState

A `TypedDict` shared across all graph nodes:

| Field | Type | Purpose |
|-------|------|---------|
| `messages` | `list[BaseMessage]` | Conversation history (LangGraph `add_messages` reducer) |
| `context_slots` | `dict[str, Any]` | Schemaless key-value context (LangGraph `_merge_dicts` reducer for parallel fan-in) |
| `current_agent` | `str` | Name of the agent that should run next |
| `knowledge_context` | `dict[str, str]` | Loaded knowledge slot contents, populated by ReAct engine (LangGraph `_merge_dicts` reducer for parallel fan-in) |
| `session_id` | `NotRequired[str]` | Session identifier for slot persistence and session management |
| `user_id` | `NotRequired[str]` | User identifier for long-term slot persistence |
| `request_id` | `NotRequired[str]` | Correlation identifier (auto-generated UUID4) for tracing across agent invocations |

### BaseAgent

Abstract base class every agent extends. Implement `process(state, config=None) → AgentResponse`. The `config` parameter receives LangGraph's `RunnableConfig` for observability callbacks. The inherited `invoke()` method wraps this for LangGraph — it converts `AgentResponse` into a state-update dict (messages, slots, routing), binds logging context via `bind_logging_context()`, and attaches `request_id`, `session_id`, `user_id`, `langgraph_step`, and `langgraph_node` to AIMessage `response_metadata`. `BaseAgent.__init__` accepts optional `middleware: Sequence[AgentMiddleware]` for intercepting agent invocations.
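
To make the onion-model ordering of middleware hooks concrete, here is a minimal sketch that does not import graphcrew. The hook names `before_invoke`, `after_invoke`, and `on_error` come from this README, but their signatures, the `RecordingMiddleware` class, and the `invoke_with_middleware` helper are assumptions for illustration only.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


class RecordingMiddleware:
    """Hypothetical middleware that records the order its hooks run in."""

    def __init__(self, name: str, log: list) -> None:
        self.name = name
        self.log = log

    async def before_invoke(self, state: dict) -> dict:
        self.log.append(f"{self.name}:before")
        return state

    async def after_invoke(self, state: dict, result: Any) -> Any:
        self.log.append(f"{self.name}:after")
        return result

    async def on_error(self, state: dict, exc: Exception) -> None:
        self.log.append(f"{self.name}:error")


async def invoke_with_middleware(
    middleware: Sequence[RecordingMiddleware],
    handler: Callable[[dict], Awaitable[Any]],
    state: dict,
) -> Any:
    entered = []
    try:
        for mw in middleware:  # outermost layer runs first
            state = await mw.before_invoke(state)
            entered.append(mw)
        result = await handler(state)
    except Exception as exc:
        for mw in reversed(entered):  # unwind from the innermost layer
            await mw.on_error(state, exc)
        raise
    for mw in reversed(middleware):  # after-hooks run in reverse order
        result = await mw.after_invoke(state, result)
    return result
```

With two layers, the before-hooks run outer then inner, the agent runs, and the after-hooks run inner then outer.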

### SupervisorAgent

Built-in agent that decides which agent to route to. Operates in two modes:
- **Simple mode** (no `template_manager`) — single-pass LLM call, developer provides prompt externally
- **ReAct mode** (with `template_manager`) — multi-pass thought/action/observation loop using knowledge slots loaded on-demand from the filesystem. Pass `max_history_tokens` to cap the token budget for (thought, observation) history in the prompt

Also accepts `retry_config`, `timeout_config`, and `rate_limiter` for production LLM call handling.

### Topology

Directed edges define allowed agent-to-agent routes. `TopologyResolver` validates that every routing decision respects these edges at runtime.
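
As a rough illustration of edge-constrained routing, the sketch below checks a proposed route against a set of declared edges. It is not `TopologyResolver`'s API: `build_allowed` and `resolve_next` are hypothetical helpers, and raising `ValueError` on a disallowed route is an assumption about one reasonable policy.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def build_allowed(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index directed (from_agent, to_agent) edges by their source agent."""
    allowed: Dict[str, Set[str]] = {}
    for src, dst in edges:
        allowed.setdefault(src, set()).add(dst)
    return allowed


def resolve_next(
    allowed: Dict[str, Set[str]],
    current: str,
    proposed: Optional[str],
    default_return: str = "supervisor",
) -> str:
    """Return the next agent, falling back to default_return when none is proposed."""
    if proposed is None:
        return default_return
    if proposed not in allowed.get(current, set()):
        raise ValueError(f"route {current!r} -> {proposed!r} is not a declared edge")
    return proposed


# The two edges from the Quick Start config.
allowed = build_allowed([("supervisor", "greeter"), ("greeter", "supervisor")])
next_agent = resolve_next(allowed, "supervisor", "greeter")
```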

### Context Slots

Arbitrary key-value state that persists across agent invocations within a run. Each slot has a **persistence tier** (`TURN`, `SESSION`, or `LONG_TERM`) that controls its lifecycle:

| Tier | Lifecycle | Use Case |
|------|-----------|----------|
| `TURN` | Discarded at end of turn | Scratch data, intermediate results |
| `SESSION` | Persisted for the session | User preferences, conversation context |
| `LONG_TERM` | Persisted across sessions | User profile, allergies, goals |

`SlotManager` tracks token usage per slot via a pluggable `TokenCounter` and evicts entries (LRU, FIFO, priority, or demotion) when the budget is exceeded. The default `CharDivisionTokenCounter` estimates tokens as `max(1, len(text) // 4)` for non-empty text (`0` for empty). Provide a custom `TokenCounter` for model-specific tokenizers (e.g., tiktoken for GPT-4, SentencePiece for Llama). The **demotion** strategy demotes slots TURN → SESSION → LONG_TERM before evicting. Plug in a `SlotPersistence` implementation to back SESSION and LONG_TERM slots with durable storage.
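
The LRU case of budget-driven eviction can be sketched with an `OrderedDict`. This is an illustrative stand-in, not `SlotManager`: the `LruSlotStore` class is hypothetical, it stores only string values, and it assumes the most recently written slot is never evicted.

```python
from collections import OrderedDict
from typing import List, Optional


def estimate_tokens(text: str) -> int:
    """The heuristic quoted above: max(1, len // 4) for non-empty text, 0 for empty."""
    return max(1, len(text) // 4) if text else 0


class LruSlotStore:
    """Hypothetical slot store that evicts least-recently-used entries over budget."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self._slots: "OrderedDict[str, str]" = OrderedDict()

    def total_tokens(self) -> int:
        return sum(estimate_tokens(v) for v in self._slots.values())

    def get(self, key: str) -> Optional[str]:
        if key not in self._slots:
            return None
        self._slots.move_to_end(key)  # a read marks the slot as recently used
        return self._slots[key]

    def set(self, key: str, value: str) -> List[str]:
        """Store a slot and return the keys evicted to get back under budget."""
        self._slots[key] = value
        self._slots.move_to_end(key)
        evicted: List[str] = []
        while self.total_tokens() > self.max_tokens and len(self._slots) > 1:
            old_key, _ = self._slots.popitem(last=False)  # oldest entry first
            evicted.append(old_key)
        return evicted
```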

### Token Counting

Token counting is pluggable via the `TokenCounter` ABC. The library ships `CharDivisionTokenCounter` (estimates `max(1, len(text) // 4)` for non-empty text, `0` for empty). To use a model-specific tokenizer:

```python
from graphcrew import TokenCounter, SlotManager

class TiktokenCounter(TokenCounter):
    def __init__(self) -> None:
        import tiktoken
        self._enc = tiktoken.encoding_for_model("gpt-4o")

    def count(self, text: str) -> int:
        return len(self._enc.encode(text))

slot_manager = SlotManager(config=config.context, token_counter=TiktokenCounter())
```

`TokenCounter` is injected into `SlotManager`, `SlotPool`, and `ReactTemplateManager`. All three default to `CharDivisionTokenCounter` when no counter is provided. `ReactTemplateManager` exposes a `token_counter` property so that `ReactExecutor` can reuse the same counter for history truncation budgets.
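
To show how a shared counter can drive a history budget, the sketch below keeps the most recent (thought, observation) pairs that fit within a token limit. It is a standalone approximation and not `ReactExecutor`'s code; `window_history` and `char_division_count` are hypothetical names, and dropping a whole pair when it does not fit is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def char_division_count(text: str) -> int:
    """The heuristic this README gives for the default counter."""
    return max(1, len(text) // 4) if text else 0


def window_history(
    pairs: Sequence[Pair],
    max_history_tokens: int,
    count: Callable[[str], int] = char_division_count,
) -> List[Pair]:
    """Return the newest pairs that fit the budget, in chronological order."""
    kept: List[Pair] = []
    used = 0
    for thought, observation in reversed(pairs):  # walk from newest to oldest
        cost = count(thought) + count(observation)
        if used + cost > max_history_tokens:
            break  # everything older than this pair is dropped
        kept.append((thought, observation))
        used += cost
    kept.reverse()
    return kept
```

The input sequence is never mutated, which mirrors the idea of truncating only at render time.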

### ReAct Engine

The `ReactExecutor` runs a multi-pass reasoning loop: build prompt → call LLM → parse `<thought>` and `<action>` → execute action handler → collect observation. Loops until a terminal action or `max_passes` is reached. Used by `SupervisorAgent` in ReAct mode. Set `max_history_tokens` to cap the token budget for previous (thought, observation) pairs in the prompt — oldest pairs are dropped first while state lists remain intact. `max_consecutive_errors` bails out after N consecutive invalid actions (circuit breaker). `cancel_event` allows external cancellation.
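
The loop described above can be sketched in a few lines with a scripted stand-in for the LLM. This is a simplified illustration, not `ReactExecutor`: the `run_react` function, the `finish` terminal action name, the prompt layout, and raising `RuntimeError` at `max_passes` are all assumptions.

```python
import re
from typing import Callable, Dict, List, Tuple

THOUGHT_RE = re.compile(r"<thought>(.*?)</thought>", re.DOTALL)
ACTION_RE = re.compile(r"<action>(.*?)</action>", re.DOTALL)


def run_react(
    llm: Callable[[str], str],
    handlers: Dict[str, Callable[[], str]],
    task: str,
    max_passes: int = 5,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Run thought/action/observation passes until a terminal action or max_passes."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_passes):
        # Build the prompt from the task plus all previous (thought, observation) pairs.
        prompt = task + "".join(
            f"\nThought: {t}\nObservation: {o}" for t, o in history
        )
        reply = llm(prompt)
        thought = THOUGHT_RE.search(reply)
        action = ACTION_RE.search(reply)
        if thought is None or action is None:
            history.append(("", "invalid response: missing <thought> or <action>"))
            continue
        name = action.group(1).strip()
        if name == "finish":  # hypothetical terminal action
            return thought.group(1).strip(), history
        handler = handlers.get(name)
        observation = handler() if handler else f"unknown action: {name}"
        history.append((thought.group(1).strip(), observation))
    raise RuntimeError(f"no terminal action after {max_passes} passes")
```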

### Knowledge Modules

Per-agent knowledge loaded from the filesystem. Each agent has a `react_config.yaml` (validated at load time by `ReactConfigSchema`) defining available knowledge slots and their `.md` module files. `ReactTemplateManager` handles schema validation, dependency resolution, and prompt assembly. Malformed configs produce clear `KnowledgeConfigValidationError` messages with the agent name and validation details. `KnowledgeLoader` (ABC) abstracts the storage backend — `FileSystemKnowledgeLoader` is the built-in implementation. Path resolution enforces base-directory containment — absolute paths and traversal beyond the knowledge base directory raise `KnowledgePathError`.

### Session Management

`SessionManager` is an ABC with three tiers:

| Tier | Methods | Purpose |
|------|---------|---------|
| Messages | `save_messages`, `get_messages` | Short-term conversation history |
| Summaries | `save_summary`, `get_summary` | Medium-term compressed context |
| Memories | `store_memory`, `recall_memory`, `search_memories` | Long-term persistent facts |

Implement whichever tiers your project needs. The library ships `InMemorySessionManager` for development and testing (async-safe via per-session locks) — bring your own storage backend for production. After `close()`, all operations raise `RuntimeError`.

`SessionManager` also provides optional methods for pagination (`get_messages_paginated`), memory CRUD (`update_memory`, `delete_memory`, `list_memories`), and capability discovery (`get_capabilities`). These raise `NotImplementedError` by default; implement whichever your project needs. Identity fields (`id`, `session_id`, `key`, `timestamp`) passed to `update_memory` are silently ignored; only content fields are applied.
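
The per-session locking and post-`close()` behaviour described above can be approximated as follows. This sketch does not subclass `SessionManager`; the `TinySessionStore` class and its method signatures are assumptions, and real implementations will differ.

```python
import asyncio
from collections import defaultdict
from typing import Dict, List


class TinySessionStore:
    """Hypothetical message store with one asyncio.Lock per session."""

    def __init__(self) -> None:
        self._messages: Dict[str, List[str]] = defaultdict(list)
        self._locks: Dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self._closed = False

    def _ensure_open(self) -> None:
        if self._closed:
            raise RuntimeError("session store is closed")

    async def save_messages(self, session_id: str, messages: List[str]) -> None:
        self._ensure_open()
        async with self._locks[session_id]:  # writers to one session are serialized
            self._messages[session_id].extend(messages)

    async def get_messages(self, session_id: str) -> List[str]:
        self._ensure_open()
        async with self._locks[session_id]:
            return list(self._messages[session_id])  # copy, so callers cannot mutate

    async def close(self) -> None:
        self._closed = True
```

Locks are created lazily on first use, so sessions that never interact do not contend with each other.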

## Error Handling Patterns

All library exceptions inherit from `OrchestratorError`, enabling a single catch-all or fine-grained handling:

```python
import structlog

from graphcrew.core.exceptions import (
    OrchestratorError,
    LLMRetryExhaustedError,
    ReactMaxPassesError,
    KnowledgeError,
)

log = structlog.get_logger()

try:
    result = await graph.ainvoke(initial_state)
except LLMRetryExhaustedError as exc:
    # All retry attempts exhausted — check exc.attempts and exc.cause
    log.error("LLM unreachable", attempts=exc.attempts, cause=exc.cause)
except ReactMaxPassesError as exc:
    # ReAct loop didn't terminate — check exc.agent_name and exc.max_passes
    log.error("ReAct loop stuck", agent=exc.agent_name, passes=exc.max_passes)
except KnowledgeError as exc:
    # Knowledge loading failure (missing slot, path traversal, config error)
    log.error("Knowledge load failed", error=str(exc))
except OrchestratorError as exc:
    # Catch-all for any library error
    log.error("Orchestrator error", error=str(exc))
```

## Production Checklist

- Set `TimeoutConfig` values (2-3x your p99 observed latency)
- Configure `RetryConfig` (default 3 attempts with jitter)
- Inject a real `TokenCounter` (e.g., tiktoken) instead of default `CharDivisionTokenCounter`
- Set `max_history_tokens` on `ReactExecutor` for long conversations
- Set `max_consecutive_errors` as a circuit breaker for invalid actions
- Implement `SlotPersistence` for durable SESSION/LONG_TERM slots
- Use `bind_logging_context()` + `clear_logging_context()` at request boundaries
- Inject `RateLimiter` for LLM call throttling (custom implementations should override `release()` if they hold resources; callers use `try`/`finally`)
- Use `OrchestratorContext` (or `async with`) for coordinated resource cleanup
- Enable `redact_pii=True` in `bind_logging_context()` for multi-tenant deployments
- Add `TenantIsolationMiddleware` for multi-tenant safety
- Add `CircuitBreakerMiddleware` for fault tolerance against LLM outages
- Add `MetricsMiddleware` for observability (implement `MetricsCollector` for your metrics backend)
- Set `max_loaded_tokens` on `ReactTemplateManager` to cap knowledge loading
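
The `acquire()`/`release()` pairing in `try`/`finally` from the rate limiter item can be sketched with a plain `asyncio.Semaphore`. This is not `SemaphoreRateLimiter` itself; `TinySemaphoreLimiter`, `guarded_call`, and the synchronous `release()` are assumptions for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable


class TinySemaphoreLimiter:
    """Hypothetical limiter that caps the number of concurrent calls."""

    def __init__(self, max_concurrent: int) -> None:
        self._sem = asyncio.Semaphore(max_concurrent)

    async def acquire(self) -> None:
        await self._sem.acquire()

    def release(self) -> None:
        self._sem.release()


async def guarded_call(
    limiter: TinySemaphoreLimiter, call: Callable[[], Awaitable[Any]]
) -> Any:
    await limiter.acquire()
    try:
        return await call()
    finally:
        limiter.release()  # always runs, even when the call raises


async def demo(n_calls: int = 5, max_concurrent: int = 2) -> int:
    """Run n_calls fake LLM calls and return the peak concurrency observed."""
    limiter = TinySemaphoreLimiter(max_concurrent)
    active = 0
    peak = 0

    async def fake_llm_call() -> str:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return "ok"

    await asyncio.gather(*(guarded_call(limiter, fake_llm_call) for _ in range(n_calls)))
    return peak
```

Releasing in `finally` matters because a slot leaked on an exception would permanently reduce available concurrency.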

## Observability

The library uses [structlog](https://www.structlog.org/) with contextvars-based correlation:

```python
from graphcrew.core.logging_context import bind_logging_context, clear_logging_context

# At request start — binds request_id, session_id, user_id, langgraph_step, langgraph_node
bind_logging_context(state, config)

# At request end
clear_logging_context()
```

Every `AIMessage.response_metadata` carries `request_id`, `session_id`, `user_id`, `langgraph_step`, and `langgraph_node` for downstream correlation. Add `structlog.contextvars.merge_contextvars` as a processor in your structlog configuration to include these fields in all log output.
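
The feature list notes that `bind_logging_context(redact_pii=True)` hashes `session_id` and `user_id` with truncated SHA-256 and leaves `request_id` untouched. A minimal sketch of that idea follows; the function names and the 12-character truncation length are assumptions, not the library's actual values.

```python
import hashlib
from typing import Dict

REDACTED_KEYS = {"session_id", "user_id"}


def redact(value: str, length: int = 12) -> str:
    """Return a truncated SHA-256 hex digest (length is a hypothetical choice)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


def redact_context(ctx: Dict[str, str]) -> Dict[str, str]:
    """Hash identity fields while leaving every other field as-is."""
    return {k: redact(v) if k in REDACTED_KEYS else v for k, v in ctx.items()}


redacted = redact_context(
    {"request_id": "req-1", "session_id": "sess-1", "user_id": "user-1"}
)
```

Because the hash is deterministic, log lines for the same user still correlate with each other without exposing the raw identifier.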

## Configuration Reference

All configuration is validated by Pydantic models in `src/graphcrew/config/schema.py`.

### `OrchestratorConfig` (top-level)

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `agents` | `list[AgentConfig]` | *(required, min 1)* | Agent definitions |
| `topology` | `TopologyConfig` | `{}` | Communication topology |
| `context` | `ContextConfig` | `{}` | Context management settings |
| `session` | `SessionConfig` | `{}` | Session management settings |
| `knowledge` | `KnowledgeConfig` | `{}` | Knowledge module system settings |
| `retry` | `RetryConfig` | `{}` | LLM retry with exponential backoff |
| `timeout` | `TimeoutConfig` | `{}` | Async timeout settings |

### `AgentConfig`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `name` | `str` | *(required)* | Unique agent identifier. Must start with a letter, `[a-zA-Z][a-zA-Z0-9_-]*`, max 64 chars. `__end__` and `__start__` are reserved. |
| `type` | `"supervisor" \| "session_manager" \| "custom"` | *(required)* | Agent type |
| `description` | `str` | *(required)* | Human-readable description (also used in supervisor prompts) |
| `model` | `str \| null` | `null` | LLM model name override |
| `knowledge_enabled` | `bool` | `false` | Whether this agent uses the knowledge module system |
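
The naming rule for `name` can be checked with a short regular expression. The helper below is a sketch of the rule as stated in the table, not the validator the library uses; `is_valid_agent_name` is a hypothetical name.

```python
import re

NAME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]*")
RESERVED_NAMES = {"__end__", "__start__"}
MAX_NAME_LENGTH = 64


def is_valid_agent_name(name: str) -> bool:
    """Apply the documented rule: letter first, then letters, digits, '_' or '-'."""
    return (
        len(name) <= MAX_NAME_LENGTH
        and name not in RESERVED_NAMES
        and NAME_RE.fullmatch(name) is not None
    )
```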

### `TopologyConfig`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `edges` | `list[TopologyEdgeConfig]` | `[]` | Directed edges (`from_agent` → `to_agent`) |
| `default_return` | `str` | `"supervisor"` | Agent to return to when no explicit route is set |

### `ContextConfig`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `shared_slots.max_tokens` | `int` | `4096` | Token budget for shared context slots |
| `tiered_slots` | `TieredSlotConfig` | `{}` | Per-tier token budgets (turn, session, long_term) |
| `eviction_strategy` | `Literal["lru", "fifo", "priority", "demotion"]` | `"lru"` | Eviction strategy: `lru`, `fifo`, `priority`, or `demotion` |

### `TieredSlotConfig`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `turn.max_tokens` | `int` | `4096` | Token budget for TURN-tier slots |
| `session.max_tokens` | `int` | `4096` | Token budget for SESSION-tier slots |
| `long_term.max_tokens` | `int` | `4096` | Token budget for LONG_TERM-tier slots |

### `KnowledgeConfig`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `enabled` | `bool` | `false` | Whether the knowledge module system is active |
| `base_dir` | `str` | `"llm_resources"` | Base directory for agent knowledge folders |
| `max_total_tokens` | `int` | `8192` | Token budget for loaded knowledge across all slots |

### `SessionConfig`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `enabled` | `bool` | `true` | Whether session management is active |

### `RetryConfig`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `initial_interval` | `float` | `0.5` | Seconds before first retry |
| `backoff_factor` | `float` | `2.0` | Exponential base multiplier |
| `max_interval` | `float` | `128.0` | Maximum seconds between retries |
| `max_attempts` | `int` | `3` | Total attempts including first (1 = no retry) |
| `jitter` | `bool` | `true` | Add random jitter to prevent thundering herd |

> **Note:** Retries target only transient exceptions (`OSError`, `TimeoutError`). Logic errors like `ValueError` and `TypeError` propagate immediately without retry.
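
To get a feel for how these settings interact, the sketch below computes the wait before each retry without jitter. The formula `min(initial_interval * backoff_factor ** n, max_interval)` is an assumption about a typical exponential backoff; the library delegates to tenacity, whose exact timing (especially with jitter enabled) may differ.

```python
from typing import List


def backoff_schedule(
    initial_interval: float = 0.5,
    backoff_factor: float = 2.0,
    max_interval: float = 128.0,
    max_attempts: int = 3,
) -> List[float]:
    """Return the wait in seconds before each retry, capped at max_interval."""
    # max_attempts counts the first call, so there are max_attempts - 1 waits.
    return [
        min(initial_interval * backoff_factor ** n, max_interval)
        for n in range(max_attempts - 1)
    ]


default_waits = backoff_schedule()
```

Under this assumed formula, the default settings wait 0.5 s and then 1.0 s, and `max_attempts=1` produces no waits at all.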

### `TimeoutConfig`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `llm_call_seconds` | `float \| null` | `null` | Per-call LLM timeout in seconds |
| `file_io_seconds` | `float \| null` | `null` | Per-file knowledge load timeout in seconds |
| `persistence_seconds` | `float \| null` | `null` | Per-call slot persistence timeout in seconds |

## Streaming

Agents can stream responses via the `astream_process()` method for real-time UX:

```python
async for chunk_meta, accumulated in supervisor.astream_process(state):
    if chunk_meta["chunk_type"] == "token":
        print(chunk_meta["delta"], end="", flush=True)
    elif chunk_meta["chunk_type"] == "final":
        next_agent = chunk_meta["next_agent"]
```

For lower-level streaming, use `astream_with_timeout()` directly:

```python
from graphcrew import astream_with_timeout

async for chunk in astream_with_timeout(llm, messages, timeout_seconds=30.0):
    print(chunk.content, end="")
```

> **Note:** Streaming is supported in simple mode only. ReAct mode requires full responses for XML parsing.

## Structured Output

Enable LLM-native structured output for more reliable ReAct parsing:

```python
supervisor = SupervisorAgent(
    config=agent_config,
    llm=llm,
    topology=topology,
    template_manager=template_manager,
    use_structured_output=True,  # Uses with_structured_output() if available
)
```

If the LLM doesn't support `with_structured_output()`, it falls back to XML parsing automatically.

## Callbacks and Tracing

Thread `RunnableConfig` through for LangSmith, Langfuse, or OpenTelemetry callbacks:

```python
from langchain_core.runnables import RunnableConfig

# langsmith_handler stands for any LangChain callback handler you have already constructed
config = RunnableConfig(callbacks=[langsmith_handler])
result = await graph.ainvoke(initial_state, config=config)
```

For native OTel spans, install the optional extra and use the middleware:

```bash
pip install graphcrew[otel]
```

```python
from graphcrew.observability import OpenTelemetryMiddleware

supervisor = SupervisorAgent(
    config=agent_config, llm=llm, topology=topology,
    middleware=[OpenTelemetryMiddleware()],
)
```

## Token Usage

Access LLM token usage from agent responses:

```python
result = await graph.ainvoke(initial_state)
last_msg = result["messages"][-1]
token_usage = last_msg.response_metadata.get("token_usage")
# {"input_tokens": 150, "output_tokens": 42, "total_tokens": 192}
```

## Project Structure

```
src/graphcrew/
  config/        # YAML → Pydantic validated config + defaults
  core/          # OrchestratorState, types, exceptions, logging_context, rate_limiting, token counting, llm_utils
  agents/        # BaseAgent ABC + Supervisor + SessionManagerAgent + middleware
  context/       # SlotManager, tiered slots, eviction, persistence (ABC + InMemory)
  session/       # SessionManager ABC + InMemorySessionManager + data models
  routing/       # TopologyResolver
  knowledge/     # KnowledgeLoader, ReactTemplateManager, slot models, react_config schema
  react/         # ReactExecutor, multi-pass thought-action-observation loop, ReactStructuredResponse
  observability/ # Optional OpenTelemetry instrumentation (middleware, tracer helpers)
```

## Development

```bash
uv sync --group dev              # install deps
make all                         # lint + typecheck + test
```

Or individual commands:

```bash
make test       # run tests with coverage
make lint       # ruff linter
make typecheck  # mypy strict
make format     # auto-format with ruff
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed setup and PR process.

See [CHANGELOG.md](CHANGELOG.md) for version history.

## Documentation

| Guide | Description |
|-------|-------------|
| [Architecture](docs/ARCHITECTURE.md) | Design decisions and system diagrams |
| [User Guide](docs/user-guide.md) | Step-by-step setup and usage |
| [Extending](docs/extending.md) | Decision tree for ABCs and middleware |
| [Deployment](docs/deployment.md) | Docker, lifecycle, connection pooling |
| [Performance Tuning](docs/performance-tuning.md) | Production optimization tips |
| [Common Patterns](docs/common-patterns.md) | Production patterns: PostgreSQL, Redis, tiktoken, multi-tenant |
| [Changelog](CHANGELOG.md) | Version history |

The API reference is available through inline docstrings (IDE hover or `help()`).

## License

[MIT](LICENSE)
