Metadata-Version: 2.4
Name: ziro
Version: 0.1.1
Summary: Deep LangGraph agent with memory, RAG, skills, OpenRouter, and Langfuse
Requires-Python: >=3.11.9
Requires-Dist: aiosqlite
Requires-Dist: faiss-cpu
Requires-Dist: httpx
Requires-Dist: langchain-community
Requires-Dist: langchain-core>=0.3
Requires-Dist: langchain-huggingface
Requires-Dist: langchain-mcp-adapters>=0.3.0
Requires-Dist: langchain-openai>=0.2
Requires-Dist: langchain-postgres
Requires-Dist: langchain-text-splitters
Requires-Dist: langchain>=0.3
Requires-Dist: langfuse>=4.12.0
Requires-Dist: langgraph-checkpoint-postgres
Requires-Dist: langgraph-checkpoint-sqlite
Requires-Dist: langgraph>=0.2
Requires-Dist: langsmith>=0.3.0
Requires-Dist: loguru>=0.7.3
Requires-Dist: psycopg[binary]
Requires-Dist: pypdf
Requires-Dist: python-dotenv
Requires-Dist: rich>=13.0
Requires-Dist: sentence-transformers
Requires-Dist: textual>=8.2.7
Requires-Dist: torch>=2.4
Requires-Dist: transformers>=4.44
Provides-Extra: all
Requires-Dist: faster-whisper>=1.0; extra == 'all'
Requires-Dist: llama-cpp-python>=0.3; extra == 'all'
Requires-Dist: piper-tts>=1.2; extra == 'all'
Requires-Dist: presidio-analyzer>=2.2; extra == 'all'
Requires-Dist: sounddevice>=0.4; extra == 'all'
Provides-Extra: guardrails-local
Requires-Dist: llama-cpp-python>=0.3; extra == 'guardrails-local'
Requires-Dist: presidio-analyzer>=2.2; extra == 'guardrails-local'
Provides-Extra: voice-local
Requires-Dist: faster-whisper>=1.0; extra == 'voice-local'
Requires-Dist: piper-tts>=1.2; extra == 'voice-local'
Requires-Dist: sounddevice>=0.4; extra == 'voice-local'
Description-Content-Type: text/markdown

# Ziro

LangGraph conversational agent with persistent memory, RAG, progressive tool loading, MCP support, skills library, configurable guardrails, and automatic context compaction. Backed by OpenRouter (any LLM) and Langfuse (observability). Two runtime modes: SQLite/FAISS for local dev, PostgreSQL/pgvector for production.

## Features

- **Multi-agent** — run as one of several named agents, each with its own persona, model, tools, MCP servers, guardrails, and compaction policy; per-agent on/off flags
- **Long-term memory** — user- and agent-scoped facts persisted across sessions
- **Thread resumption** — pick up any prior conversation by thread ID
- **Context compaction** — older turns are auto-folded into a running summary when a request nears the model's context window; recent turns kept verbatim; YAML-driven, model-aware budgeting
- **RAG** — semantic search over indexed documents (`.txt`, `.pdf`, directories)
- **Skills library** — index `SKILL.md` files; agent retrieves relevant skills and loads reference files on demand
- **Progressive tool loading** — tools are deferred by default; the LLM discovers and activates only what it needs via `search_tools` / `load_tools`
- **Subagents** — delegate a self-contained subtask to a child subagent in isolated context; subagents are single-file `*.agent.md` definitions (persona + scoped tools/skills) with inherited, never-widened permissions
- **Tool permissions (F06)** — per-agent allow/deny/ask policy over tool/namespace globs; `ask` triggers a human-in-the-loop approval, with allow-once / allow-thread / allow-always memory; `shell:*` ships dangerous-default-deny
- **Hooks (F01)** — declarative lifecycle interception (`session_start`, `pre/post_turn`, `pre/post_tool`, `pre/post_model`); Python-dotted or shell callables with glob matchers; powers permission gating, shell audit, memory reflection
- **Human handoff (F07)** — optional `request_handoff` tool that pauses the turn for a human operator and injects their reply back into the conversation
- **Clarifying questions** — core `ask_user_question` tool (mirrors Claude Code's AskUserQuestion): pauses the turn to ask 1-4 structured multiple-choice questions and folds the answers back in; shares the same HITL interrupt path as F06/F07
- **Task / todo list (F08)** — `write_todos` / `update_todo` state tools; the running todo list renders in the TUI side pane
- **Self-improving memory (F03)** — per-agent `memory_policy.yaml` governs how facts are extracted and reflected into long-term store
- **Multimodal input (F02)** — attach images via `/img <path>` or `--attach`; vision-capable models receive image blocks, text-only models degrade gracefully (`attachment_policy.yaml`)
- **Filesystem tools** — cross-platform core tools `read_file`, `grep`, `glob_files` for local file inspection, plus deferred, ask-gated `write_file`/`edit_file` mutating tools (separate `fs_write` namespace); pure stdlib (identical on Windows + POSIX), path-confined to the project root, on by default (`fs_policy.yaml`)
- **Web fetch** — optional SSRF-guarded `web_fetch(url)` tool that returns cleaned, length-capped page text
- **Shell execution** — optional `run_shell` tool that runs real commands on the host or in a Docker sandbox; default-deny + human approval (F06/HITL), denylist backstop, timeout + process-tree kill, output caps, env scrubbing, and full audit.
- **Bash parity** — a cross-platform `bash` interpreter (Git Bash/WSL-probed on Windows) makes pipes/quoting/heredocs behave identically on every OS; optional persistent per-thread bash session keeps `cd`/env vars across calls
- **LLM adapter** — `LLMAdapter` abstraction (`app/llm/`) decouples the graph from OpenRouter specifically, the first step toward multi-provider support; per-agent `provider:` selection in `meta.yaml`
- **Local voice** — optional push-to-talk speech I/O (F11): faster-whisper STT + Piper TTS, fully on-device, no cloud key required (cloud STT/TTS also pluggable)
- **MCP support (F18)** — connect external MCP servers (stdio, SSE, streamable HTTP, WebSocket) with **OAuth**; a live `McpManager` (re)ingests tools into the shared registry, persistent sessions cut per-call latency, failures non-fatal
- **Interactive TUI (F12)** — Textual front-end (transcript, todo/active-tools side pane, status footer, approval/question modal); paints in ~1.6s and is usable while the engine builds on a background thread (F26); slash-commands, **live themes** (carbon/nord/gruvbox), a **context-usage meter** (F16), and an **MCP control panel** (F17, Ctrl+O); session state is a single reactive `UIStore`, with SVG snapshot tests + a `textual serve` browser demo for UI peek
- **Slash commands (F14)** — `/help /agent /thread /model /theme /think /clear /voice /img /mcp /stop /save /quit` dispatched in the driver loop before the LLM (no model call, no transcript pollution)
- **Chat queue (F13)** — in-flight buffer captures mid-turn input; optional background worker pool (`chat_once --submit/--status`) survives restart
- **Guardrails** — configurable input/output guards (regex injection, ML classifier, PII, Llama Guard content safety); YAML-driven, no code changes to add rules
- **Agent persona** — configurable `soul_prompt`, `system_prompt`, and `fallback_messages` via the agent's `agent_config.yaml`
- **Resilient replies** — empty reasoning-only turns are re-prompted once, then served a non-blank fallback rather than a blank message
- **Thinking / reasoning** — optional extended reasoning via `OPENROUTER_REASONING_EFFORT`, switchable live with `/think <high|medium|low|off>` (F15)
- **Fast startup** — embeddings warmup is off the critical path (F24); the TUI shell + graph build are torch-free, torch loads on a background thread (F26)
- **Dual backends** — dev (SQLite + FAISS, zero infra) / prod (PostgreSQL + pgvector)
- **Observability** — Langfuse and LangSmith tracing, both optional

## Quickstart

```bash
# 1. Install dependencies (includes the local guardrail backends:
#    Presidio PII + Llama Guard content safety)
uv sync

# 2. (Optional) local voice: faster-whisper STT + Piper TTS (push-to-talk, no cloud key)
uv sync --group voice-local

# 3. Download the local models the installed backends need (spaCy, Llama Guard GGUF,
#    and, only if the voice group is installed, the default Piper voice).
#    Fetches only what is missing.
python -m app.cli.startup

# 4. Configure the environment
cp .env.example .env
# Set OPENROUTER_API_KEY in .env

# 5. (Production only) start PostgreSQL + pgvector,
#    then set DATABASE_URL in .env to enable the production backends
docker compose up -d

# 6. Run
python -m app.main --user alice
```

Resume a previous session:

```bash
python -m app.main --user alice --thread alice_abc12345
```

Single-shot (JSON output, for programmatic / LLM-driven use):

```bash
python -m app.cli.chat_once --user alice --message "Hello"
```

## Environment Variables

| Variable | Required | Purpose |
|---|---|---|
| `OPENROUTER_API_KEY` | Yes | LLM access via OpenRouter |
| `OPENROUTER_MODEL` | No | Model override (default: `google/gemini-2.5-flash-lite`) |
| `OPENROUTER_REASONING_EFFORT` | No | Enable extended reasoning: `low` / `medium` / `high` (leave empty to disable) |
| `DATABASE_URL` | No | Enables production PostgreSQL + pgvector backends |
| `LANGFUSE_PUBLIC_KEY` / `LANGFUSE_SECRET_KEY` | No | Langfuse observability |
| `LANGFUSE_HOST` | No | Langfuse host (default: `https://cloud.langfuse.com`) |
| `LANGCHAIN_TRACING_V2` / `LANGCHAIN_API_KEY` | No | LangSmith tracing |

## Commands

```bash
uv sync                                                              # install dependencies (incl. local guardrail backends)
uv sync --group voice-local                                          # + faster-whisper STT + Piper TTS (optional local voice)
python -m app.cli.startup                                            # download spaCy + Llama Guard + default Piper voice if absent
python -m app.main --user <USER_ID>                                  # start chat session (interactive agent picker)
python -m app.main --user <USER_ID> --agent <AGENT_ID>              # start with a specific agent
python -m app.main --user <USER_ID> --voice                         # push-to-talk voice I/O (needs voice-local + voice_policy enabled)
python -m app.main --user <USER_ID> --thread <THREAD_ID>            # resume session
python -m app.cli.chat_once --user <USER_ID> --message <MSG>        # single-shot JSON output
python -m app.cli.chat_once --user <USER_ID> --agent <AGENT_ID> --message <MSG>  # single-shot, specific agent
python -m app.cli.manage_agents list                                 # list agents + per-agent on/off state
python -m app.cli.manage_agents add <AGENT_ID> --name NAME          # scaffold a new agent
python -m app.cli.manage_agents remove <AGENT_ID>                   # delete an agent
python -m app.cli.manage_agents enable <AGENT_ID>                   # turn a single agent ON
python -m app.cli.manage_agents disable <AGENT_ID>                  # turn a single agent OFF
python -m app.cli.manage_agents set-default <AGENT_ID>              # set the default agent
python -m app.cli.manage_agents set-model <AGENT_ID> <MODEL>        # set an agent's model (or 'none' to clear)
python -m app.cli.manage_agents add-subagent <AGENT_ID> --tools a,b --namespaces rag --skills x  # scaffold a single-file subagent
python -m app.cli.manage_agents remove-subagent <AGENT_ID>          # delete a subagent definition
python -m app.cli.run_scenarios                                      # replay scenario files in tmp/
python -m app.rag.indexer <path>                                     # index docs (dir, .txt, or .pdf)
python -m skills.loader                                              # index all SKILL.md files
python -m app.tools.indexer                                          # re-index tool descriptions
python -m app.tui.demo                                               # scripted TUI demo (UI peek); textual serve "python -m app.tui.demo" for a browser demo
mypy .                                                               # type check
pytest tests/test_tui_snapshots.py --snapshot-update                 # regenerate TUI SVG snapshot baselines after a UI change
docker compose up -d                                                 # start PostgreSQL + pgvector (prod)
docker compose build pdf-sandbox                                     # build F10 sandbox image (ziro-pdf:latest) for researcher_docker
```

### Visualize the graph

```bash
python -m app.cli.show_graph                          # ASCII to stdout (requires grandalf: uv sync --group dev)
python -m app.cli.show_graph --format mermaid         # Mermaid markdown to stdout
python -m app.cli.show_graph --format mermaid -o g.md # Mermaid to file
python -m app.cli.show_graph --format png -o g.png    # PNG (requires graphviz system package)
```

### Slash commands (in-session)

Typed into the running session (TUI or REPL), dispatched before the LLM — no model call, no transcript pollution (F14):

| Command | Purpose |
|---|---|
| `/help` (`/h`) | List commands |
| `/agent [id]` | Switch agent (rebuilds the session, fresh thread) |
| `/thread [id]` | Switch / resume a thread |
| `/model [id]` | Switch the model (rebuilds the LLM) |
| `/theme [name]` | Switch the UI theme live + persist (carbon / nord / gruvbox) |
| `/think <high\|medium\|low\|off>` | Set reasoning effort live (F15) |
| `/clear` | Start a fresh thread (same agent) |
| `/voice [on\|off]` | Toggle push-to-talk voice I/O (needs `--voice` + `voice_policy`) |
| `/img <path> [text]` | Attach an image to this turn (F02) |
| `/mcp [server]` | Show MCP servers (TUI: open the control panel, Ctrl+O) |
| `/stop` (`/halt`) | Abort the turn currently in flight (Ctrl+S in the TUI) |
| `/save [path]` | Save the transcript to a JSON file |
| `/quit` (`/exit`, `/q`) | Exit the session |

## Architecture

```
User input → [input_guard] → [compaction] → agent node → LLM with tools → DynamicToolNode → back to agent → ... → [output_guard] → final response
```

`input_guard`, `output_guard` (toggled by the agent's `guardrails_policy.yaml`), and `compaction` (toggled by `compaction_policy.yaml`) are conditional nodes. A blocked request returns a refusal message without reaching the LLM. The agent injects long-term user memories, the configured persona, and the running compaction summary into the system prompt before each LLM call. An empty reasoning-only reply is re-prompted once (`reprompt`), then replaced by a non-blank fallback (`reply_fallback`).
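A minimal sketch of the re-prompt-then-fallback rule, written as plain Python rather than as graph nodes. `Reply`, `call_llm`, the nudge text, and `FALLBACK` are hypothetical stand-ins, not Ziro's actual `reprompt` / `reply_fallback` code; Ziro takes its fallback text from `fallback_messages`.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Reply:
    text: str            # visible answer text
    reasoning: str = ""  # reasoning tokens, which the user never sees

# Hypothetical fallback; Ziro reads its text from `fallback_messages`.
FALLBACK = "Sorry, I couldn't produce an answer. Could you rephrase?"

def is_blank(reply: Reply) -> bool:
    """A reasoning-only turn has no visible text, whatever its reasoning holds."""
    return not reply.text.strip()

def respond(call_llm: Callable[[str], Reply], prompt: str) -> str:
    reply = call_llm(prompt)
    if not is_blank(reply):
        return reply.text
    # Re-prompt exactly once, nudging the model to answer in visible text.
    retry = call_llm(prompt + "\n\nPlease state your final answer as plain text.")
    if not is_blank(retry):
        return retry.text
    # Never surface a blank message: serve the fallback instead.
    return FALLBACK
```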

### Multi-Agent System

The agent can run as one of several named agents, each with its own persona, tool policy, MCP servers, and guardrails.

- **Per-agent folders** — each agent lives in `app/agents/<agent_id>/` and is self-contained, with its own `agent_config.yaml`, `tool_policy.yaml`, `mcp_servers.yaml`, `guardrails_policy.yaml`, `compaction_policy.yaml`, and `meta.yaml`. The shipped agents are `default` (life-coach persona), `researcher` (factual research aide with lighter guardrails; `run_shell` enabled in host mode), and `researcher_docker` (same persona with `run_shell` in Docker-sandbox mode).
- **Per-agent model** — `meta.yaml`'s `model` field sets the OpenRouter model for that agent. Omit it (or `model: null`) to use the `OPENROUTER_MODEL` env default. A `provider:` field selects the LLM adapter (default `openrouter` — see [LLM Adapter](#llm-adapter)).
- **Per-agent on/off** — `meta.yaml`'s `enabled` flag controls whether an agent is selectable. Toggle with `manage_agents enable|disable <id>`.
- **Manifest** — `app/agents/registry.yaml` records the default agent. If the default is disabled, the first enabled agent is used.
- **Memory isolation** — long-term memory is keyed per `(user_id, agent_id)`, so each agent keeps its own facts per user.

### Progressive Tool Loading

Tools are split into three tiers:

| Tier | Description |
|---|---|
| **Core tools** | Always bound to the LLM every turn (defined in `tool_policy.yaml` → `core_tools`) |
| **Meta-tools** | Always bound; provide the discovery surface (`search_tools`, `load_tools`, `list_tools`, `unload_tools`) |
| **Deferred tools** | Registered in the `ToolRegistry` but NOT bound until the LLM calls `load_tools([...])` |

Discovery loop: `search_tools("<keyword>")` → pick names → `load_tools([...])` → call the tool. Activated tools persist for the rest of the thread, up to `max_active` at a time; once that limit is reached, loading another tool evicts the least-recently-used one.
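The per-thread active set behaves like an LRU cache. This sketch uses `collections.OrderedDict`; the `ActiveTools` class and its methods are hypothetical, and it assumes that calling a tool counts as a use for eviction purposes.

```python
from collections import OrderedDict

class ActiveTools:
    """Per-thread set of activated deferred tools with LRU eviction (hypothetical)."""

    def __init__(self, max_active: int) -> None:
        self.max_active = max_active
        self._tools: OrderedDict[str, None] = OrderedDict()  # oldest first

    def load(self, names: list[str]) -> list[str]:
        """Activate tools; return any names evicted to stay within max_active."""
        evicted: list[str] = []
        for name in names:
            self._tools[name] = None
            self._tools.move_to_end(name)  # (re)loading counts as a use
            while len(self._tools) > self.max_active:
                oldest, _ = self._tools.popitem(last=False)
                evicted.append(oldest)
        return evicted

    def touch(self, name: str) -> None:
        """Mark a tool as recently used when the LLM calls it."""
        if name in self._tools:
            self._tools.move_to_end(name)

    def unload(self, names: list[str]) -> None:
        for name in names:
            self._tools.pop(name, None)

    def bound(self) -> list[str]:
        """Deferred tools currently bound, least recently used first."""
        return list(self._tools)
```

On each turn, the bound set would be the core tools plus the meta-tools plus `bound()`.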

### MCP Integration

External MCP servers are declared in the agent's `mcp_servers.yaml`. On startup, tool metadata is fetched from all enabled servers and registered into the `ToolRegistry`, with each server's name as the namespace of its tools. MCP tools enter the same deferred-discovery surface as local tools: they are NOT bound to the LLM until activated via `load_tools`.

Supported transports are `stdio`, `sse`, `streamable_http`, and `websocket`, with **OAuth** authentication (F18). A live `McpManager` (`app/tools/mcp_manager.py`) owns per-server state and a `ServerStatus` snapshot, holds a persistent session per server (one owner task keeps it open, so tool calls reuse it instead of reconnecting on every call), and re-ingests tools into the shared registry on every connect so discovery stays current. `connect` / `disconnect` / `reconnect` are driven live from the F17 panel or `/mcp`. `OAuthConfig` picks the flow (auto / authorization_code / device_code; device-code when headless); tokens are cached and refreshed per request. Server failures are non-fatal; the rest of the registry is unaffected.

### Subagents

The parent can delegate a subtask to a **child subagent** via `spawn_subagent` (or `dispatch_subagents` for parallel fan-out). Each child is a fresh graph on a namespaced thread with its own persona, empty history, and a **restricted** toolset; only a compact result is folded back into the parent.

- **Single-file definitions** — subagents live in `app/agents/.subagents/<id>.agent.md`: YAML frontmatter (`name`, `description`, `enabled`, `model`, allowed `tools:`/`namespaces:`, scoped `skills:`, optional `soul`) plus a Markdown body that becomes the system prompt. The `.subagents/` dir is spawn-only — subagents never appear in the interactive picker. Shipped: `scout` (research) and `reflector` (reflection coach).
- **Permission inheritance** — a child's tool rights are the **intersection** of its declared surface and the parent's current rights; never wider. Namespace tokens (`rag`, `rag:*`) expand to the concrete tools available in the (possibly restricted) registry view.
- **Skill scoping** — a subagent's `skills:` allowlist restricts its `search_skills` / `load_skill_ref` to only those skills.
- **Runaway guards** — per-parent `subagent_policy.yaml` sets `enabled`, `allowed_children`/`denylist`, `max_depth` (default 1), `max_spawns_per_thread`, and parallelism limits; each child runs under a `recursion_limit`.

Scaffold one with `manage_agents add-subagent <id> --tools … --namespaces … --skills …`.
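Permission inheritance reduces to a set intersection once namespace tokens are expanded. In this sketch the registry view is assumed to be a plain `{tool_name: namespace}` dict, and `expand` / `child_rights` are hypothetical helpers, not Ziro's API.

```python
def expand(tokens: set[str], registry: dict[str, str]) -> set[str]:
    """Expand namespace tokens (`rag`, `rag:*`) and bare tool names to concrete tools.

    `registry` maps tool name -> namespace for the (possibly restricted) view.
    """
    tools: set[str] = set()
    for token in tokens:
        ns = token[:-2] if token.endswith(":*") else token
        matched = {name for name, tool_ns in registry.items() if tool_ns == ns}
        if matched:
            tools |= matched      # token named a namespace
        elif token in registry:
            tools.add(token)      # token named a single tool
    return tools

def child_rights(declared: set[str], parent_rights: set[str], registry: dict[str, str]) -> set[str]:
    # Never wider than the parent: intersect with what the parent may use right now.
    return expand(declared, registry) & parent_rights
```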

### Web Fetch

An optional `web_fetch(url)` deferred tool (namespace `webfetch`) fetches an http/https page and returns cleaned, length-capped text. Gated by the agent's `webfetch_policy.yaml` — when `enabled: false` the tool is never registered (invisible to discovery). An SSRF guard resolves the host and refuses loopback / private / link-local / reserved targets, re-checking the final URL after redirects.
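A sketch of the SSRF check using only the standard library (`urllib.parse`, `socket`, `ipaddress`). `is_public_url` is a hypothetical name, and the exact set of refused ranges in Ziro may differ.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

def is_public_url(url: str) -> bool:
    """True only for http(s) URLs whose host resolves exclusively to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False  # unresolvable hosts are refused
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        if (ip.is_loopback or ip.is_private or ip.is_link_local
                or ip.is_reserved or ip.is_multicast or ip.is_unspecified):
            return False  # one internal address is enough to refuse the host
    return True
```

Following the text, the same check would run again on the final URL after redirects, since a public URL can redirect to an internal one.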

### Filesystem Tools

Three cross-platform **core** tools (namespace `fs`) give the agent Claude-Code-style local file inspection: `read_file`, `grep`, and `glob_files`. They are pure Python stdlib (`pathlib` / `re` / `os` / `fnmatch`) — no new dependency, no shelling out — so behavior is identical on Windows and POSIX.

- **`read_file(path, offset?, limit?)`** — read a UTF-8 text file as numbered lines; byte-capped (`max_read_bytes`) with an `offset`/`limit` line window (`max_read_lines`); binary files are detected and skipped.
- **`grep(pattern, path, glob?, output_mode?, case_insensitive?)`** — regex search over file contents under a path; `glob` filters filenames; `output_mode` is `content` / `files` / `count`; prunes ignored dirs (`.git`, `node_modules`, …) and skips binaries; capped at `max_grep_matches`.
- **`glob_files(pattern, path?)`** — find files by glob (incl. `**`), returned repo-relative and mtime-sorted (newest first), capped at `max_glob_results`.

Every path is resolved against the project root and refused if it escapes (`FsConfig.confine`, default on). On by default via the agent's `fs_policy.yaml` (missing file → defaults); added to each shipped agent's `core_tools` and inherited by subagents through the rights intersection (like `web_fetch`). No F06 gate — only `shell:*` is dangerous-default-deny.
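Path confinement can be expressed with `pathlib`: resolve the candidate against the root (which also collapses `..` and follows symlinks) and refuse it unless it stays under the root. `confine` here is a hypothetical free function, not `FsConfig.confine` itself.

```python
from pathlib import Path

def confine(path: str, root: Path) -> Path:
    """Resolve `path` against the project root; refuse anything that escapes it."""
    root = root.resolve()
    # An absolute `path` replaces `root` entirely; resolve() normalizes `..` and symlinks.
    candidate = (root / path).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes project root: {path}")
    return candidate
```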

**Mutating tools (`write_file` / `edit_file`)** — deferred counterparts registered under a separate `fs_write` namespace so F06 can `ask` on writes while reads stay `allow`; byte-capped at `max_write_bytes` (512KB). Gated by `FsConfig.allow_write` (default on) — set `false` for a read-only agent.

### Shell Execution

An optional `run_shell(command, cwd?)` deferred tool (namespace `shell`) runs real shell commands — **on the host by default**, or inside a **Docker sandbox** via a config flip. The trust model is Claude-Code-like: a human approves what runs.

- **Default off & default-deny** — gated by the agent's `shell_policy.yaml` (`enabled: false` → never registered). Even when enabled, `shell:*` is dangerous-default-deny in the permission system (F06); an agent must explicitly `allow`-list `shell:run_shell` in its `permissions.yaml`. Recommended recipe = `allow` + `ask` (human approval on every call via HITL). `chat_once` auto-denies an `ask` interrupt (fail-closed), so single-shot runs never hang.
- **`ShellGate` backstop** — a non-removable hardened denylist (`rm -rf`, `sudo`, `dd`, `mkfs`, fork bombs, …) plus optional per-agent regex allow/deny and opt-in `confine` cwd containment.
- **Two executors** — `host` (`create_subprocess_shell`, real shell with pipelines) and `sandbox` (`create_subprocess_exec` with a pluggable `sandbox_launcher` template — Docker/WSL/nsjail). Both behind the same gate, audit, and tool body.
- **Bash parity** — `ShellConfig.interpreter` defaults to `"bash"`: the command runs as one argv element via `create_subprocess_exec` (no shell-launcher involved), so bash tokenizes pipes/quoting/heredocs identically on every OS. On Windows this probes Git Bash then falls back to `wsl bash` — the previous approach (pinning `shell_executable` through `create_subprocess_shell`) was unreliable there, since Windows always runs `{executable or COMSPEC} /c {command}` regardless of the executable. `interpreter: "host"` restores the old cmd.exe/`/bin/sh` behavior. An optional persistent per-thread `bash -l` session (`session: "persistent"`) keeps `cd`/env vars/venv activation across calls.
- **Always-on safety** — `timeout_s` with process-tree kill, output capped twice (at the source and at ingestion), and env scrubbing so secrets (`OPENROUTER_API_KEY`, `DATABASE_URL`, `LANGFUSE_*`) never reach the child.
- **Loop guards** — `pre_tool` hooks (`app/hooks/guards.py`) flag repeated near-identical `run_shell` commands (`shell_loop_guard`) and repeated *identical failure signatures* across different commands (`verification_loop_guard` — pytest/mypy/tsc/panic tracebacks recurring while the fix isn't landing).
- **Audit** — `pre_tool` / `post_tool` hooks log every invocation (intent + outcome, including blocked/timeout) to `data/audit/shell-audit.jsonl` (dev) or a `tool_audit` table (prod). See [`docs/audit-log.md`](docs/audit-log.md).
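The env scrubbing and the denylist backstop are both simple filters. The patterns here are an illustrative subset, not the hardened `ShellGate` list; `scrub_env`, `gate`, `SECRET_PATTERNS`, and `DENY_REGEXES` are hypothetical.

```python
import fnmatch
import re

# Secret variable names/globs taken from the text; a real list may be longer.
SECRET_PATTERNS = ("OPENROUTER_API_KEY", "DATABASE_URL", "LANGFUSE_*")

# Illustrative denylist entries only (hypothetical, far from exhaustive).
DENY_REGEXES = [
    re.compile(r"\brm\s+-\w*(r\w*f|f\w*r)", re.IGNORECASE),  # rm -rf / rm -fr / rm -Rf
    re.compile(r"\bsudo\b"),
    re.compile(r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:"),  # classic bash fork bomb
]

def scrub_env(env: dict[str, str]) -> dict[str, str]:
    """Copy `env`, dropping every variable whose name matches a secret pattern."""
    return {k: v for k, v in env.items()
            if not any(fnmatch.fnmatchcase(k, p) for p in SECRET_PATTERNS)}

def gate(command: str) -> bool:
    """Return True if the command may run (denylist backstop only)."""
    return not any(rx.search(command) for rx in DENY_REGEXES)
```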

Two agents ship the opt-in: `researcher` (host mode; its persona bakes a map-reduce flow for summarizing large PDFs without overflowing context) and `researcher_docker` (`mode: sandbox`, launcher `docker run --rm --network none -v {workdir}:/work -w /work {image}`, image `ziro-pdf:latest` — build with `docker compose build pdf-sandbox`).
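With an exec-style launcher, the template can be split into argv tokens before the placeholders are filled, so a path containing spaces stays a single argument. This sketch assumes the sandboxed command runs as `bash -c <command>` inside the container; `sandbox_argv` is a hypothetical helper.

```python
import shlex

LAUNCHER = "docker run --rm --network none -v {workdir}:/work -w /work {image}"

def sandbox_argv(template: str, workdir: str, image: str, command: str) -> list[str]:
    """Build an exec-style argv: launcher tokens, then `bash -c <command>`."""
    # Split first, then fill placeholders per token, so substituted values are never re-split.
    argv = [tok.format(workdir=workdir, image=image) for tok in shlex.split(template)]
    # Assumption: the container image provides bash; the command stays one argv element.
    return argv + ["bash", "-c", command]
```

The resulting list would be passed as `asyncio.create_subprocess_exec(*argv)`, so no host shell re-parses the command.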

### Local Voice

Optional push-to-talk speech I/O (F11): the turn is bracketed by STT (your speech → the user message) and TTS (the post-output-guard reply → audio). Default backends are **fully local** — faster-whisper for STT, Piper for TTS — so no cloud API key is needed (cloud STT/TTS over `httpx` are also pluggable per agent). Gated by the agent's `voice_policy.yaml` (`enabled: false` by default).

Setup is two commands:

```bash
uv sync --group voice-local   # faster-whisper STT + Piper TTS (+ sounddevice/PortAudio)
python -m app.cli.startup     # downloads the default Piper voice into data/voices/ (gitignored)
```

Then enable it in the agent's `app/agents/<id>/voice_policy.yaml`:

```yaml
enabled: true
tts_backend: local
tts_model: data/voices/en_US-amy-medium.onnx   # path to the downloaded .onnx (config .onnx.json resolved alongside)
```

Run with `python -m app.main --user <id> --voice`, or toggle `/voice on` in the TUI (Ctrl+R to start/stop recording). Piper voices come from the Rhasspy catalog (`<lang>-<name>-<quality>`); download others with `python -m piper.download_voices <voice-id> --download-dir data/voices` and point `tts_model` at the new `.onnx`. The `researcher` agent ships voice-enabled.

### Tool Permissions (F06)

Every tool call passes a per-agent permission gate (`app/permissions/`) wired as a `pre_tool` hook. The agent's `permissions.yaml` declares `allow` / `deny` / `ask` globs over tool/namespace qualified names plus a `default_action`; `dangerous_default_deny` (ships `["shell:*"]`) forces a default-deny for dangerous namespaces. An `ask` decision raises a human-in-the-loop interrupt (the TUI `ApprovalModal`, REPL prompt, or auto-deny in `chat_once`); approvals can be remembered once, for the thread, or always (`remember_default_scope`, persisted per user/agent). Tool args are redacted (secrets masked, long values truncated) before display.
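One plausible way to resolve a decision for a `namespace:tool` name with `fnmatch` globs. The precedence (deny, then dangerous-default-deny unless explicitly allowed, then ask, then allow, then `default_action`) is an assumption chosen to match the behavior described for `shell:*`, where `allow` + `ask` yields approval on every call; `Permissions` and `decide` are hypothetical.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

@dataclass
class Permissions:
    allow: list[str] = field(default_factory=list)
    deny: list[str] = field(default_factory=list)
    ask: list[str] = field(default_factory=list)
    default_action: str = "allow"
    dangerous_default_deny: list[str] = field(default_factory=lambda: ["shell:*"])

def _matches(name: str, globs: list[str]) -> bool:
    return any(fnmatchcase(name, g) for g in globs)

def decide(p: Permissions, qualified_name: str) -> str:
    """Return "allow", "deny", or "ask" for a `namespace:tool` name (assumed precedence)."""
    if _matches(qualified_name, p.deny):
        return "deny"
    explicitly_allowed = _matches(qualified_name, p.allow)
    if _matches(qualified_name, p.dangerous_default_deny) and not explicitly_allowed:
        return "deny"  # dangerous namespaces need an explicit allow entry
    if _matches(qualified_name, p.ask):
        return "ask"   # allow + ask => human approval on every call
    if explicitly_allowed:
        return "allow"
    return p.default_action
```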

### Hooks (F01)

`app/hooks/` is a declarative lifecycle-interception layer. An agent's `hooks.yaml` lists `HookSpec` entries — each binds a `HookEvent` (`session_start`, `pre_turn`, `post_turn`, `pre_tool`, `post_tool`, `pre_model`, `post_model`), an fnmatch `matcher` (over tool qualified names for tool events), and a callable (Python dotted path or shell command). Hooks return a `HookDecision` (`allow` / `deny` / `modify` / `interrupt`); the runner short-circuits on deny/interrupt and folds `modify`. Hooks are ordered once at startup, so a no-hook registry has zero per-turn cost. This path powers the F06 permission gate and the F10 shell audit (`*run_shell` pre/post_tool).

### Human Handoff (F07)

An optional `request_handoff(reason)` core tool (`app/handoff/`) lets the agent escalate to a human. It pauses the turn, surfaces the reason to an operator, accepts a human-authored reply, and resumes with that reply injected as the next message (assistant or tool, per config). The agent's `handoff_policy.yaml` (`HandoffConfig`) sets `enabled`, the operator prompt, a timeout with an `on_timeout` fallback, and `inject_as`.

### Clarifying Questions

A core `ask_user_question(questions)` tool (`app/clarify/`) mirrors Claude Code's AskUserQuestion: pauses mid-turn to ask 1-4 structured clarifying questions (2-4 options each, optional multi-select, implicit free-text "Other"), then folds the human's answers back into the tool result. It shares the same HITL interrupt module as F06/F07 (`app/graph/interrupts.py` — `InterruptKind.ASK_USER_QUESTION`) rather than a bespoke pause mechanism; the TUI renders a `QuestionModal`, the REPL prints numbered options. Gated by `clarify_policy.yaml` (`ClarifyConfig` — `enabled`, `max_questions`, `max_options`; missing file → on by default).
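Validating the 1-4 question / 2-4 option limits and folding the answers back into a tool result might look like this. The question dict shape (`question`, `options`), the answers mapping, and both helpers are hypothetical; in Ziro, `ClarifyConfig` supplies the real `max_questions` / `max_options`.

```python
def validate_questions(questions: list[dict], max_questions: int = 4, max_options: int = 4) -> None:
    """Raise ValueError unless there are 1..max_questions questions with 2..max_options options each."""
    if not 1 <= len(questions) <= max_questions:
        raise ValueError(f"expected 1-{max_questions} questions, got {len(questions)}")
    for q in questions:
        options = q.get("options", [])
        if not 2 <= len(options) <= max_options:
            raise ValueError(f"question {q.get('question')!r} needs 2-{max_options} options")

def fold_answers(questions: list[dict], answers: dict[str, list[str]]) -> str:
    """Render the human's picks (or free-text "Other" answers) as tool-result text."""
    lines: list[str] = []
    for q in questions:
        picked = answers.get(q["question"], [])
        lines.append(q["question"])
        lines.append("-> " + (", ".join(picked) if picked else "(no answer)"))
    return "\n".join(lines)
```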

### Task / Todo List (F08)

`app/tasks/` adds state-based todo tracking via two core tools: `write_todos(todos)` (full rewrite) and `update_todo(id, status|content|result)`. Todos live in `AgentState.todos` (merged by a per-id reducer), survive checkpointing, and render live in the TUI side pane. A todo can carry an `agent_id` to mark subagent delegation.
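A per-id reducer can merge partial updates into existing todos while appending new ids in order. `merge_todos` and the plain-dict todo shape are assumptions, not the actual `AgentState.todos` reducer.

```python
def merge_todos(left: list[dict], right: list[dict]) -> list[dict]:
    """Merge `right` into `left` by `id`: known ids get fields updated, new ids are appended."""
    merged: dict[str, dict] = {t["id"]: dict(t) for t in left}  # copies, so `left` is untouched
    for update in right:
        merged.setdefault(update["id"], {}).update(update)
    return list(merged.values())  # dicts keep insertion order, so the list order is stable
```

In LangGraph, a function like this is attached to a state field with `typing.Annotated`, e.g. `todos: Annotated[list[dict], merge_todos]`, and is then applied to each node's update for that field.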

### Multimodal Input (F02)

Images attach via the `/img <path>` directive or the CLI `--attach` flag (`app/io/attachments.py`). `build_human_message` checks `supports_multimodal(model)`: vision-capable models receive base64 data-URI image blocks in a multimodal `HumanMessage`; text-only models get a graceful notice instead. Bounded by the agent's `attachment_policy.yaml` (`enabled`, `max_image_bytes`, `allowed_image_types`).

### Interactive TUI & Themes (F12 / F16 / F17)

`app/tui/` is a Textual front-end over the shared `TurnRunner`: a transcript, a todo/active-tools side pane, a status footer, and an `ApprovalModal` rendering the shared HITL interrupt. It paints in ~1.6s and is usable immediately while the engine builds on a background thread — any turn typed early is queued and runs in order (F26). Extras:

- **Themes** — `app/tui/theme.py` maps semantic color slots (not literal hex) to Textual + Rich widgets. Carbon (amber + steel on carbon black) is the default; `nord` and `gruvbox` also ship. Palettes are discovered from three roots (bundled `app/tui/themes/`, project `./themes/`, home `~/.ziro/themes/`); `/theme [name]` swaps live and persists to `data/ui_prefs.json`.
- **Context meter (F16)** — a `ContextMeter` widget reveals at ≥50% of the usable input budget and shades muted → amber → red as the request approaches the compaction trigger, with a trip-line marker at the trigger line.
- **MCP control panel (F17)** — `Ctrl+O` (or `/mcp` in the TUI) opens a modal `McpPanel`: a live table of servers (state, transport, auth, tool count) with Connect / Disconnect / Reconnect, tool peek, and an OAuth device-code prompt.
- **Reactive state + UI peek** — session state (identity/lifecycle/content/chrome) lives in one reactive `UIStore` (`app/tui/store.py`) mutated only through typed messages (`app/tui/messages.py`); widgets bind via `watch()` instead of imperative `update_*()` calls. `python -m app.tui.demo` (or `textual serve "python -m app.tui.demo"` for a browser demo) drives a scripted fake runner for visual review; SVG snapshot regression tests live in `tests/test_tui_snapshots.py` (`pytest --snapshot-update` to refresh baselines). See [`docs/tui-peek.md`](docs/tui-peek.md).

### LLM Adapter

`app/llm/` wraps LangChain's `Runnable` behind an `LLMAdapter` ABC (`invoke`/`ainvoke`/`stream`/`astream`, `bind_tools()`, `context_window()`/`supports()`), decoupling the graph from OpenRouter specifically. `app/llm/factory.py` resolves a provider (`meta.yaml`'s `provider:` field, default `openrouter`) to an adapter — only `OpenrouterAdapter` ships today, but unknown providers fail fast rather than silently falling back, and `get_llm()`/`get_agent_llm()`/`get_summary_llm()` in `app/core/config.py` are now thin wrappers over `build_adapter()` rather than constructing `ChatOpenAI` directly. This is step one toward multi-provider support.
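A minimal sketch of the adapter-plus-factory shape with a fail-fast provider lookup. `LLMAdapter` and `build_adapter` borrow names from the text, but their signatures here are hypothetical, and `EchoAdapter` is a toy provider so the example runs without network access.

```python
from abc import ABC, abstractmethod
from typing import Callable

class LLMAdapter(ABC):
    """Hypothetical minimal surface; the real ABC also streams and binds tools."""

    @abstractmethod
    def invoke(self, prompt: str) -> str: ...

    @abstractmethod
    def context_window(self) -> int: ...

class EchoAdapter(LLMAdapter):
    """Toy provider for the example (not part of Ziro)."""

    def __init__(self, model: str) -> None:
        self.model = model

    def invoke(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"

    def context_window(self) -> int:
        return 8192

PROVIDERS: dict[str, Callable[[str], LLMAdapter]] = {"echo": EchoAdapter}

def build_adapter(provider: str, model: str) -> LLMAdapter:
    try:
        factory = PROVIDERS[provider]
    except KeyError:
        # Fail fast instead of silently falling back to a default provider.
        raise ValueError(f"unknown LLM provider: {provider!r}") from None
    return factory(model)
```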

### Context Compaction

Each agent has a `compaction_policy.yaml`. When a turn's request exceeds `trigger_pct` of the model's *usable input budget* (window minus reserved output and schema headroom), the `compaction` node folds older messages into a **running summary** (injected into the system prompt) and drops them from live history via `RemoveMessage` — shrinking both the in-flight request and the persisted checkpoint. The `keep_recent_min` most-recent messages are always kept verbatim, and the split is chosen to never orphan a `ToolMessage` from its parent call.
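The split rule can be sketched on a simplified message list: start `keep_recent_min` messages from the end and walk the boundary earlier while it lands on a tool result, so each kept `ToolMessage` stays with the AI message that issued the call. `Msg` and `split_for_compaction` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Msg:
    role: str     # "system" | "human" | "ai" | "tool"
    content: str

def split_for_compaction(history: list[Msg], keep_recent_min: int) -> tuple[list[Msg], list[Msg]]:
    """Return (to_summarize, to_keep) without separating a tool result from its call."""
    split = max(len(history) - keep_recent_min, 0)
    # A kept tool result needs the AI message that requested it, so move the
    # boundary back until it no longer falls on a tool message.
    while 0 < split < len(history) and history[split].role == "tool":
        split -= 1
    return history[:split], history[split:]
```

With LangGraph's `add_messages` state, the summarized span would then be dropped by returning a `RemoveMessage(id=...)` for each of its messages.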

Budget math is model-aware: `app/llm/openrouter_catalog.py` (formerly `app/core/model_specs.py`) reads the OpenRouter `/models` catalog once per process for each model's `context_length`, `max_completion_tokens`, and `supported_parameters`, so the trigger, the reserved-output carve-out, and the `max_tokens` sent to the LLM stay consistent. Strategy is `hybrid` (summarize the dropped span via a clean tool-free LLM) or `trim` (drop only). Setting `enabled: false` restores un-compacted behavior exactly. A separate `max_tool_message_tokens` / `max_tool_message_pct` cap bounds a single oversized `ToolMessage` at ingestion. See [`docs/chat-compression.md`](docs/chat-compression.md).
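The budget arithmetic in this section can be written as three small functions. The text does not say how the two tool-message caps combine or what the percentage is taken of; this sketch assumes the percentage applies to the usable budget and the tighter cap wins. All function names are hypothetical.

```python
def usable_input_budget(context_length: int, reserved_output: int, schema_headroom: int) -> int:
    """Tokens left for the prompt: window minus reserved output and tool-schema headroom."""
    return max(context_length - reserved_output - schema_headroom, 0)

def should_compact(request_tokens: int, usable: int, trigger_pct: float) -> bool:
    """Compaction fires once the request exceeds trigger_pct of the usable budget."""
    return request_tokens > trigger_pct * usable

def tool_message_cap(usable: int, max_tokens: int, max_pct: float) -> int:
    # Assumption: the percentage is of the usable budget and the tighter cap wins.
    return min(max_tokens, int(max_pct * usable))
```

For example, with a hypothetical 128,000-token window, 8,000 tokens reserved for output, and 4,000 for tool schemas, the usable budget is 116,000 tokens; at `trigger_pct: 0.8`, compaction fires once a request exceeds 92,800 tokens.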

### Tools available to the LLM

| Tool | Tier | Purpose |
|---|---|---|
| `search_rag(query)` | Core | Semantic search over indexed documents |
| `search_skills(query)` | Core | Semantic search over SKILL.md files |
| `load_skill_ref(skill_name, filename)` | Core | Load a reference/script file from a skill directory on demand |
| `save_memory(content)` | Core | Persist a user fact to long-term store |
| `search_tools(query)` | Meta | Find available deferred tools by keyword |
| `load_tools(names)` | Meta | Activate deferred tools for this thread |
| `list_tools(namespace)` | Meta | Browse all tools by namespace |
| `unload_tools(names)` | Meta | Deactivate tools to free context |
| `write_todos(todos)` | Core | Rewrite the turn's todo list (F08); renders in the TUI side pane |
| `update_todo(id, …)` | Core | Update one todo's status/content/result (F08) |
| `request_handoff(reason)` | Core | Pause the turn for a human operator and inject their reply (F07); gated by `handoff_policy.yaml` |
| `ask_user_question(questions)` | Core | Pause the turn to ask 1-4 structured clarifying questions (2-4 options, optional multi-select); gated by `clarify_policy.yaml` |
| `spawn_subagent(agent_id, task)` | Deferred | Delegate a self-contained subtask to a child subagent in isolated context; returns one concise result |
| `dispatch_subagents(tasks)` | Deferred | Parallel fan-out (gated by `enable_parallel`); one result block per child |
| `get_subagent_transcript(run_id)` | Deferred | Pull a past child run's full transcript on demand (capped) |
| `read_file(path, offset?, limit?)` | Core | Read a project text file as numbered lines; byte/line capped, binary-safe, path-confined (`fs`) |
| `grep(pattern, path, glob?, output_mode?, case_insensitive?)` | Core | Regex search over file contents (content/files/count); skips ignored dirs + binaries (`fs`) |
| `glob_files(pattern, path?)` | Core | Find files by glob, mtime-sorted, repo-relative, path-confined (`fs`) |
| `write_file(path, content)` / `edit_file(path, …)` | Deferred | Mutating file writes, byte-capped, path-confined; separate `fs_write` namespace so writes can `ask` while reads stay `allow`; gated by `fs_policy.yaml` `allow_write` |
| `web_fetch(url)` | Deferred | Fetch an http/https page, return cleaned text; SSRF-guarded; gated by `webfetch_policy.yaml` |
| `run_shell(command, cwd?)` | Deferred | Run a shell command on the host or a sandbox container; default-deny + HITL, denylist/timeout/output-cap/env-scrub; gated by `shell_policy.yaml` |
| *deferred tools* | Deferred | Any tool registered in the registry (local or MCP) |

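The core/deferred split behind `load_tools` / `unload_tools` can be pictured as a per-thread set with a size cap. The sketch below is hypothetical: `ActiveTools` is not a Ziro class, and it assumes that `tool_policy.yaml`'s `max_active` counts only deferred tools and that loading past the cap evicts the least recently loaded tool (the README only says `active_tools` uses `ReplaceActiveTools` for LRU). Core tools are always bound and never evicted.

```python
from collections import OrderedDict


class ActiveTools:
    """Hypothetical per-thread view of bound tools with an LRU cap on deferred ones."""

    def __init__(self, core: set[str], max_active: int) -> None:
        self.core = core
        self.max_active = max_active
        self._deferred: OrderedDict[str, None] = OrderedDict()

    def load(self, names: list[str]) -> list[str]:
        """Activate deferred tools; return any tools evicted to respect the cap."""
        evicted: list[str] = []
        for name in names:
            if name in self.core:
                continue  # core tools are always bound already
            self._deferred[name] = None
            self._deferred.move_to_end(name)  # mark as most recently used
            while len(self._deferred) > self.max_active:
                oldest, _ = self._deferred.popitem(last=False)
                evicted.append(oldest)
        return evicted

    def unload(self, names: list[str]) -> None:
        """Deactivate tools to free context; unknown names are ignored."""
        for name in names:
            self._deferred.pop(name, None)

    def bound(self) -> list[str]:
        """Tools that would be bound to the model on the next call."""
        return sorted(self.core) + list(self._deferred)
```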
### Runtime Modes

| Mode | Trigger | Store | Checkpointer | Vector |
|---|---|---|---|---|
| Dev | No `DATABASE_URL` | `SqliteStore` (`./data/memories.db`) | `AsyncSqliteSaver` (`./data/checkpoints.db`) | FAISS |
| Prod | `DATABASE_URL` set | `PostgresStore` | `AsyncPostgresSaver` | pgvector |

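Mode selection reduces to one environment check. The sketch below only returns the backend names from the table and constructs nothing; `select_runtime` is a hypothetical helper, and treating an empty `DATABASE_URL` as unset is an assumption.

```python
import os
from collections.abc import Mapping


def select_runtime(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Pick backend names from the environment (names only; nothing is constructed)."""
    if env.get("DATABASE_URL"):  # assumption: an empty value counts as unset
        return {
            "mode": "prod",
            "store": "PostgresStore",
            "checkpointer": "AsyncPostgresSaver",
            "vector": "pgvector",
        }
    return {
        "mode": "dev",
        "store": "SqliteStore (./data/memories.db)",
        "checkpointer": "AsyncSqliteSaver (./data/checkpoints.db)",
        "vector": "FAISS",
    }
```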
### Key Modules

| Module | Role |
|---|---|
| `app/graph/graph.py` | Builds LangGraph state machine; wires nodes + routing (including guardrail nodes) |
| `app/graph/nodes.py` | `make_agent_node()`, `make_dynamic_tool_node()`, `make_memory_tools()` |
| `app/graph/state.py` | `AgentState` TypedDict — `messages`, `guardrail_*`, `active_tools` (+`ReplaceActiveTools` for LRU), `running_summary`, `last_compaction_index`; `extract_text()` for reasoning-block content |
| `app/core/config.py` | Reads `.env`; LLM factories `get_llm()` / `get_agent_llm()` / `get_summary_llm()` — thin wrappers over `app.llm.factory.build_adapter()` (model-aware `max_tokens`), embeddings, Langfuse handler; `load_*` for agent/tool/mcp/compaction configs |
| `app/core/agent_profiles.py` | `AgentProfile` — resolves per-agent config files + model; `select_profile()`, `list_agent_profiles()`, `all_agent_profiles()`, `get_agent_profile()`; subagent resolution (`SUBAGENTS_DIR`, `*_subagent_profile(s)`) |
| `app/core/agent_md.py` | Parses single-file `*.agent.md` subagent definitions (frontmatter + Markdown body) |
| `app/llm/openrouter_catalog.py` | OpenRouter `/models` catalog cache (renamed from `app/core/model_specs.py`): `context_length`, `max_completion_tokens`, `supported_parameters`, `provider`; `supports_parameter()` gating, `list_model_ids(provider=)`/`list_providers()` |
| `app/llm/adapter.py` | `LLMAdapter` ABC wrapping a LangChain `Runnable` — `invoke`/`ainvoke`/`stream`/`astream`, `bind_tools()`, `build()` classmethod |
| `app/llm/factory.py` | `build_adapter(provider, ...)` — provider registry (only `openrouter` shipped; unknown provider errors) |
| `app/core/paths.py` | `PROJECT_ROOT` + resolved dirs (`AGENTS_DIR`, `DATA_DIR`, `SKILLS_DIR`) anchored to the repo root |
| `app/compaction/` | `node.py` (`make_compaction_node`, `pick_split`), `window.py` (budget/trigger math), `summarizer.py`, `tokenizer.py`, `models.py` (`CompactionConfig` / `CompactionResult`) |
| `app/memory/store.py` | `save_memory` / `load_memories` — user-scoped long-term facts |
| `app/memory/checkpointer.py` | Thread-level checkpoints (enables `--thread` resumption) |
| `app/rag/retriever.py` | `search_rag`, `search_skills`, `load_skill_ref` tools; lazy FAISS/PGVector loading with module-level cache |
| `app/rag/indexer.py` | Chunking (1000 tokens, 200 overlap), embedding (HuggingFace `all-MiniLM-L6-v2`), indexing |
| `app/tools/registry.py` | `ToolRegistry` — unified source of truth for all tools; semantic + keyword search; `ToolDescriptor`; `expand()` (namespace tokens → tools); `view()` restricted facade |
| `app/tools/meta_tools.py` | `search_tools`, `load_tools`, `list_tools`, `unload_tools` — the progressive-loading discovery surface |
| `app/tools/bootstrap.py` | `build_local_registry()` / `ingest_mcp_tools()` — startup wiring of local + MCP tools |
| `app/tools/mcp_client.py` | Connects to MCP servers from agent `mcp_servers.yaml`; fetches + allowlist-filters tools |
| `app/tools/indexer.py` | CLI to re-index tool descriptions into the `tools` vector-store collection |
| `app/guardrails/models.py` | Pydantic models: `PolicyConfig`, `PolicyRule`, `GuardrailDecision`, backend configs |
| `app/guardrails/backends.py` | `RegexInjectionRunnable`, `LocalClassifierRunnable`, `PresidioRunnable`, `LlamaGuardRunnable`, `LLMGuardrailRunnable`; `make_backend()` factory |
| `app/guardrails/evaluator.py` | `GuardrailEvaluator` — groups rules by backend, runs backends concurrently |
| `app/guardrails/nodes.py` | `make_input_guard_node()` / `make_output_guard_node()` LangGraph node factories |
| `app/guardrails/policy_loader.py` | `load_policies()` — reads an agent's `guardrails_policy.yaml` |
| `app/cli/chat_once.py` | Single-shot invocation; outputs one JSON line (for programmatic / LLM-driven use) |
| `app/cli/run_scenarios.py` | Replay scenario JSON files through the agent; writes side-by-side transcripts |
| `app/cli/manage_agents.py` | Scaffold, list, enable/disable, and configure agents |
| `app/cli/show_graph.py` | Visualize the LangGraph state machine (ASCII, Mermaid, PNG) |
| `app/subagents/` | `models.py` (`SpawnPolicy` etc.), `orchestrator.py` (`SpawnContext`, graph cache, rights intersection, skill-scope threading), `tool.py` (`make_subagent_tools`) |
| `app/permissions/` | F06: `models.py` (`PermissionPolicy`/`PermissionDecision`/`PermissionRequest`), `gate.py`/`policy.py` (evaluate + arg redaction), `hook.py` (pre_tool wiring), `store.py` (durable grants) |
| `app/hooks/` | F01: `models.py` (`HookEvent`/`HookSpec`/`HookContext`/`HookDecision`), `registry.py`, `runner.py`, `callables.py` (Python/shell callables), `guards.py` (`shell_loop_guard`/`verification_loop_guard` — repeated-command / repeated-failure loop detection) |
| `app/graph/interrupts.py` | Shared HITL interrupt module for F06/F07/clarify: `InterruptKind`, `raise_interrupt()`, `InterruptRequest`/`InterruptResponse`, `render_interactive()` |
| `app/handoff/` | F07: `models.py` (`HandoffConfig`), `tools.py` (`request_handoff`); gated by `handoff_policy.yaml` |
| `app/clarify/` | `ask_user_question` core tool: `models.py` (`ClarifyConfig`), `tools.py` (`make_clarify_tools`); gated by `clarify_policy.yaml` |
| `app/tasks/` | F08: `models.py` (`Todo`/`TodoStatus`), `tools.py` (`write_todos`/`update_todo`), `reducer.py`, `render.py` |
| `app/io/attachments.py` | F02: `AttachmentConfig`, `parse_attachments`, `to_image_block`, `build_human_message` (multimodal vs text-only) |
| `app/tui/theme.py` + `themes/` | Theme palettes (`Palette`, carbon/nord/gruvbox, `discover_palettes`, live `set_active`); `app/core/ui_prefs.py` persists the choice |
| `app/tui/mcp_panel.py` | F17: `McpPanel` modal + `OAuthPromptModal` (device-code prompt) |
| `app/tui/store.py` | `UIStore` — single non-visual reactive state holder (identity/lifecycle/content/chrome) mounted on the App |
| `app/tui/messages.py` | `SessionSwitched`/`BusyChanged`/`StartupTicked`/`StateChanged`/`UsageChanged`/`ThemeChanged` — the only vocabulary that mutates `UIStore` |
| `app/tui/demo.py` | `ScriptedRunner` (torch-free fake `TurnRunner`) + `build_demo_app()` — UI peek via `python -m app.tui.demo` / `textual serve` |
| `app/tools/mcp_manager.py` + `mcp_models.py` | F18: `McpManager` (connect/disconnect/reconnect, persistent sessions, re-ingest), `ServerStatus`/`ToolSpec`/`OAuthConfig` |
| `app/webfetch/` | `models.py` (`WebFetchConfig`), `tool.py` (`web_fetch`, SSRF guard, HTML→text); gated by `webfetch_policy.yaml` |
| `app/fs/` | `models.py` (`FsConfig` — read caps + `allow_write`/`max_write_bytes`), `tools.py` (`make_fs_tools` — `read_file`/`grep`/`glob_files`; `make_fs_write_tools` — `write_file`/`edit_file` under the separate `fs_write` namespace; `_resolve` path confinement); pure stdlib, cross-platform; gated by `fs_policy.yaml` (on by default) |
| `app/tools/shell.py` + `shell_models.py` | F10/F27 shell: `ShellGate`, host + sandbox executors, `make_shell_tool`, output caps, process-tree kill; `_resolve_bash()` (Git Bash/WSL probing) + `PersistentBashSession`; `ShellConfig`/`GateDecision` |
| `app/tools/shell_audit.py` | F10 audit hooks (`log_intent`/`log_outcome`) → `data/audit/shell-audit.jsonl` / `tool_audit` table |
| `docker/pdf-sandbox/Dockerfile` | F10 sandbox image (`python:3.12-slim` + pypdf + reportlab); built via the `pdf-sandbox` compose service |
| `skills/loader.py` | Walks `./skills/` for `SKILL.md` files; indexes them as the "skills" collection |
| `app/agents/<id>/agent_config.yaml` | Per-agent `system_prompt`, `soul_prompt`, `fallback_messages` |
| `app/agents/<id>/tool_policy.yaml` | Per-agent `core_tools`, `search_k`, `max_active`, `denylist` |
| `app/agents/<id>/mcp_servers.yaml` | Per-agent MCP server declarations (transport, command/url, allowlist, enabled flag) |
| `app/agents/<id>/guardrails_policy.yaml` | Per-agent named backends pool + input/output rules |
| `app/agents/<id>/compaction_policy.yaml` | Per-agent compaction policy (`enabled`, trigger/target band, retention, summarizer) |
| `app/agents/<id>/subagent_policy.yaml` | Per-parent spawn policy (`enabled`, `allowed_children`/`denylist`, `max_depth`, spawn/parallel caps) |
| `app/agents/<id>/webfetch_policy.yaml` | Per-agent web-fetch policy (`enabled` master flag, SSRF/scheme/host controls, content cap) |
| `app/agents/<id>/fs_policy.yaml` | Per-agent filesystem-tools policy (`enabled`, `confine`, read byte/line caps, grep/glob result caps, `ignore_dirs`) |
| `app/agents/<id>/shell_policy.yaml` | Per-agent shell policy (`enabled`, `mode` host/sandbox, denylist/allowlist, timeout, output cap, env passthrough, sandbox launcher/image) |
| `app/agents/<id>/permissions.yaml` | Per-agent F06 tool-permission policy (allow/deny/ask globs; `shell:*` default-deny; remember scope) |
| `app/agents/<id>/hooks.yaml` | Per-agent F01 lifecycle hooks (events, glob matcher, Python/shell callable; e.g. `*run_shell` audit pre/post_tool) |
| `app/agents/<id>/handoff_policy.yaml` | Per-agent F07 human-handoff policy (`enabled`, operator prompt, timeout/fallback, `inject_as`) |
| `app/agents/<id>/attachment_policy.yaml` | Per-agent F02 multimodal policy (`enabled`, `max_image_bytes`, `allowed_image_types`) |
| `app/agents/<id>/memory_policy.yaml` | Per-agent F03 self-improving-memory policy (fact extraction / reflection) |
| `app/agents/<id>/voice_policy.yaml` | Per-agent F11 voice policy (`enabled`, STT/TTS backend + model) |
| `app/agents/<id>/queue_policy.yaml` | Per-agent F13 queue policy (in-flight always on; background worker default off) |
| `app/agents/<id>/meta.yaml` | Agent `name`, `description`, `enabled`, `model`, and optional `provider` (default `openrouter`) |
| `app/agents/.subagents/<id>.agent.md` | Single-file subagent definition (frontmatter persona/tools/skills + Markdown system prompt) |
| `app/agents/registry.yaml` | Records the default agent id |

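Several `fs` entries above mention path confinement (`_resolve` in `app/fs/tools.py`). The general technique is to resolve the requested path against the project root, collapsing `..` segments and symlinks, and to reject anything that lands outside the root. The sketch below is a hypothetical helper, `resolve_confined`, not Ziro's `_resolve`; it relies on `pathlib.Path.resolve()` and `Path.is_relative_to()` (Python 3.9+).

```python
from pathlib import Path


def resolve_confined(root: Path, user_path: str) -> Path:
    """Resolve user_path under root, refusing paths that escape it."""
    root = root.resolve()
    # Joining an absolute user_path replaces root entirely; resolve() then collapses
    # ".." and symlinks, so the containment check below sees the real target.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes project root: {user_path}")
    return candidate
```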
### Guardrails

Policies are declared in the agent's `guardrails_policy.yaml`. Each rule references a named backend from the `backends:` pool. Backends run concurrently; if more than one rule blocks, the first in declaration order wins. A rule can supply a `refusal_templates` list, from which the refusal text is chosen at random.

| Backend type | Description |
|---|---|
| `regex_injection` | Deterministic OWASP prompt-injection patterns + evasion detection (typoglycemia, Base64/hex, char-spacing); zero dependencies |
| `local_classifier` | HuggingFace text-classification (prompt injection detection) |
| `presidio` | Microsoft Presidio PII detection (`presidio-analyzer` ships in the `guardrails-local` / `all` extras; spaCy model via `python -m app.cli.startup`) |
| `llama_guard` | Llama Guard 3 1B GGUF content safety; supports `block_categories` to scope which S-codes block (`llama-cpp-python` ships in the `guardrails-local` / `all` extras; GGUF via `python -m app.cli.startup`) |
| `openai` | Any OpenAI-compatible API for LLM-based policy evaluation |

## Extending

**Add documents:**
```bash
python -m app.rag.indexer ./my-docs/
```

**Add a skill:**
```bash
mkdir -p skills/my-skill/references
# create skills/my-skill/SKILL.md
# (optional) add reference files to skills/my-skill/references/
python -m skills.loader
```
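The discovery step that `skills.loader` performs can be pictured as a recursive walk for `SKILL.md` files keyed by their directory name. The sketch below is a hypothetical helper, `discover_skills`, not the loader's actual code; it stops at discovery and leaves out the indexing into the "skills" collection.

```python
from pathlib import Path


def discover_skills(skills_dir: Path) -> dict[str, Path]:
    """Map each skill directory name to its SKILL.md (hypothetical helper)."""
    found: dict[str, Path] = {}
    # rglob walks nested directories; reference files under references/ are ignored
    # because only files named exactly SKILL.md match.
    for skill_md in sorted(skills_dir.rglob("SKILL.md")):
        found[skill_md.parent.name] = skill_md
    return found
```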

**Add an agent:**
```bash
python -m app.cli.manage_agents add my-agent --name "My Agent"
# Edit app/agents/my-agent/agent_config.yaml, tool_policy.yaml, etc.
```

**Add a subagent:**
```bash
python -m app.cli.manage_agents add-subagent my-scout --tools rag:search_rag --namespaces rag --skills deep-research
# Edit app/agents/.subagents/my-scout.agent.md (frontmatter tools/namespaces/skills + persona body)
```
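A single-file `*.agent.md` definition combines frontmatter with a Markdown body that becomes the system prompt. The stdlib-only sketch below shows the split; `split_agent_md` is hypothetical, the frontmatter keys in the example are illustrative, and it handles only flat `key: value` lines, whereas `app/core/agent_md.py` presumably accepts full YAML (a real parser would hand the frontmatter to something like PyYAML's `yaml.safe_load`).

```python
def split_agent_md(text: str) -> tuple[dict[str, str], str]:
    """Split '---'-delimited frontmatter from the Markdown body (flat key: value only)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter opening '---'")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("missing frontmatter closing '---'") from None
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so values like "rag:search_rag" survive.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"bad frontmatter line: {line!r}")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```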

**Add an MCP server:** add an entry to the agent's `app/agents/<id>/mcp_servers.yaml`, then restart the agent.

**Add a guardrail rule:** edit the agent's `app/agents/<id>/guardrails_policy.yaml` — add a backend entry and a rule referencing it. No code changes needed.

**Change default model:**
```env
OPENROUTER_MODEL=anthropic/claude-sonnet-4-5
```

**Change model for a specific agent:**
```bash
python -m app.cli.manage_agents set-model <AGENT_ID> anthropic/claude-sonnet-4-5
```
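The two settings above imply a precedence: an agent's own `model` in `meta.yaml` overrides the `OPENROUTER_MODEL` default. A minimal sketch of that lookup follows; `resolve_model` is hypothetical, and raising when neither is set is an assumption (Ziro may ship its own fallback).

```python
import os
from collections.abc import Mapping


def resolve_model(agent_meta: Mapping[str, object], env: Mapping[str, str] = os.environ) -> str:
    """Per-agent meta.yaml `model` wins; otherwise use the OPENROUTER_MODEL default."""
    model = agent_meta.get("model") or env.get("OPENROUTER_MODEL")
    if not model:
        # Assumption: no built-in fallback model in this sketch.
        raise ValueError("no model configured: set `model` in meta.yaml or OPENROUTER_MODEL")
    return str(model)
```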

**Enable reasoning:**
```env
OPENROUTER_REASONING_EFFORT=medium
```

**Configure agent persona:** edit `app/agents/<id>/agent_config.yaml` (`system_prompt`, `soul_prompt`, `fallback_messages`).
