Phase 6 — Observability UI

Core change: Each tool/agent gets its own floating window that appears automatically when the tool starts. Windows can be open simultaneously alongside the conversation. No page flipping. The main conversation window becomes leaner — just the chat view and input.

Design Principles

Window Layout

Activity view (default)

search_memory
  → request
     query: eggs preference
  ← result
     You like scrambled eggs [quality: 4/5]
  → request
     query: coffee preference
  ← result
     (no results)

web_search
  → request
     query: weather Tokyo today
  ← result
     1. weather.com/Tokyo…
     2. accuweather.com…

critic
  ● 4/5 respondent=A
     Q: What is the capital…
     Accurate and concise…
  ⟺ pairwise winner=A

Settings view (after tapping ⚙)

search_memory — settings
embed_model: nomic-embed-text
n_results: 5
critic_score_weight: 0.05
Save

← returns to the activity view. Save writes the file and publishes CONFIG_RELOAD so the tool reloads on next invocation.
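
The Save step could look roughly like the sketch below. It is an illustration under assumptions, not the actual handler: `save_settings`, the `publish` callable, and the `"config.reload"` subject string are hypothetical stand-ins for the real publisher and the project's CONFIG_RELOAD constant.

```python
from pathlib import Path
from typing import Callable

# Hypothetical subject string; the real CONFIG_RELOAD constant lives in the project.
CONFIG_RELOAD = "config.reload"


def save_settings(path: Path, yaml_text: str,
                  publish: Callable[[str, dict], None],
                  tool_name: str) -> None:
    """Write the edited YAML to disk, then announce the change on the bus."""
    path.write_text(yaml_text, encoding="utf-8")
    # The tool is expected to pick up the new values on its next invocation.
    publish(CONFIG_RELOAD, {"tool": tool_name})
```

Writing the file before publishing means a tool that reloads immediately still reads the new values.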

MemoryWindow — Browse + Semantic Search

Two modes toggled by a button in the header. Both share the same table; the mode determines what populates it.

| Mode | Source | Sort | Trigger |
|---|---|---|---|
| Browse | memory_service.list_episodic(n=100) | Newest first | Refresh button or on window open |
| Search | memory_service.search_episodic(query) | Ranked by similarity + critic score bias (the same order Gemma sees) | Enter in search field |
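
A minimal sketch of how the active mode could select the data source for the shared table. `load_rows` and the `FakeMemory` stand-in are hypothetical, and returning an empty list for a blank query is an assumption rather than confirmed behaviour.

```python
class FakeMemory:
    """Hypothetical stand-in exposing the two MemoryService calls named above."""

    def list_episodic(self, n: int = 100) -> list[dict]:
        return [{"id": "e2"}, {"id": "e1"}][:n]           # newest first

    def search_episodic(self, query: str) -> list[dict]:
        return [{"id": "e1"}] if "eggs" in query else []  # ranked hits


def load_rows(mode: str, service, query: str = "") -> list[dict]:
    """Return the engrams that should populate the shared table."""
    if mode == "browse":
        return service.list_episodic(n=100)
    if mode == "search":
        # Assumption: a blank query shows an empty table rather than everything.
        return service.search_episodic(query) if query.strip() else []
    raise ValueError(f"unknown mode: {mode!r}")
```

Because both branches return the same list-of-dicts shape, the table and detail pane need no mode-specific code.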

Layout

┌─────────────────────────────────────────────┐
│  🧠 Memory        [Browse] [Search] [Refresh]│   ← header
├─────────────────────────────────────────────┤
│  ┌─ search field (visible in Search mode) ─┐│
│  │  eggs preference              [Search]  ││
│  └─────────────────────────────────────────┘│
├──────┬──────┬───────┬───────┬────────┬──────┤
│ Age  │ Resp │ Score │ Senti │ Winner │ Query│   ← table
├──────┼──────┼───────┼───────┼────────┼──────┤
│ 2m   │  A   │  4/5  │  👍   │  ✓     │ What…│
│ 8m   │  B   │  3/5  │  —    │  ✗     │ What…│
└──────┴──────┴───────┴───────┴────────┴──────┘
┌─────────────────────────────────────────────┐
│  Q: What do you like for breakfast?         │   ← detail pane
│  A: Based on your memory, you like scrambled│   (click a row)
│     eggs…                                  │
└─────────────────────────────────────────────┘
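
The Age column in the mockup uses compact labels such as 2m and 8m. One way to produce them is sketched below; `format_age` is a hypothetical helper, and the s/h/d units are assumptions, since only minutes appear in the mockup.

```python
def format_age(seconds: float) -> str:
    """Render an age in seconds as a compact label such as '2m'."""
    seconds = max(0, int(seconds))  # clamp clock skew to zero
    if seconds < 60:
        return f"{seconds}s"
    if seconds < 3600:
        return f"{seconds // 60}m"
    if seconds < 86400:
        return f"{seconds // 3600}h"
    return f"{seconds // 86400}d"
```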

Reactive Window Spawning

Tools already publish tool.schema on startup. The UI's BusLogger already receives this. The change: when a new tool schema arrives for a tool we haven't seen before, create a ToolWindow and call show().

# In BusLogger / MainWindow._register_tool_schema():
def _register_tool_schema(self, schema: dict) -> None:
    name = schema.get("function", {}).get("name", "")
    if not name:
        return  # malformed schema: nothing to key a window on
    if name not in self._tool_windows:
        # First sighting of this tool: create its window and show it.
        win = ToolWindow(tool_name=name, publisher=self._publisher)
        win.show()
        self._tool_windows[name] = win
    # route future activity events to this window
The hardcoded _TOOL_DEFS list in main_window.py is removed. The QStackedWidget pages for tools are removed. Sidebar tool nav buttons are removed.

Critic window

Critic is a system agent, not a tool — it doesn't publish tool.schema. Its window spawns at app startup unconditionally (same time as the conversation window), since critic is always running when the stack is up. It subscribes to critique.result and pairwise.result via BusLogger signals.

Memory Inspector window

The Memory Inspector (MemoryWindow) is also spawned at startup. It does not react to bus events; it uses a pull model in which the Refresh button queries MemoryService directly. Like the others, it lives in its own floating window.

New / Modified Files

| File | Change |
|---|---|
| src/local/ui/tool_window.py | New. Floating window with activity log + ⚙/← settings toggle (QStackedWidget internal to window: page 0=activity, page 1=YAML editor). Replaces tool_panel.py. |
| src/local/ui/critic_window.py | New. Floating window for critic: activity log of absolute grades + pairwise results; ⚙ opens critic.yaml settings. |
| src/local/ui/memory_window.py | New. Floating window: engram table (Age / Resp / Score / Sentiment / Winner / Query) + Refresh button + detail pane. Requires MemoryService. Needs list_episodic() on MemoryService. |
| src/local/ui/tool_panel.py | Retained for now; ToolWindow extracts the same logic. Can be deleted once ToolWindow is proven. |
| src/local/ui/main_window.py | Remove _TOOL_DEFS, remove tool pages from QStackedWidget, remove tool sidebar nav buttons. Add reactive spawn in _register_tool_schema(). Add memory_service param. Spawn CriticWindow + MemoryWindow at init. |
| src/local/services/memory_service.py | Add list_episodic(n=100). |
| run_local.py | Pass shared_memory to MainWindow. |

ToolWindow internal layout

ToolWindow (QWidget, top-level window)
└── QVBoxLayout
    ├── Header bar (QHBoxLayout)
    │     ├── ← back button  [hidden on activity view]
    │     ├── title label
    │     └── ⚙ gear button  [hidden on settings view]
    └── QStackedWidget
          ├── page 0: activity log (QScrollArea → QVBoxLayout of QLabel rows)
          └── page 1: settings (QPlainTextEdit YAML + Save button)

Gear button → switch to page 1, show ←, hide ⚙.
Back button → switch to page 0, hide ←, show ⚙.

Activity log entry format

Each tool request/result pair is appended as two timestamped entries, each with an indented detail line:

[21:07:04.3]  → request
   query: eggs preference

[21:07:04.8]  ← result
   You like scrambled eggs [quality: 4/5]

Critic window uses the same pattern, different content:

[21:07:18.1]  ● 4/5  respondent=A
   Q: What is the capital of Japan?
   Feedback: Accurate and concise answer…

[21:07:35.0]  ⟺ pairwise  winner=A
   A: 46868de7  B: 8a1c2f39
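
A small formatter for these entries might look like the sketch below. `format_entry` is a hypothetical helper; it assumes the timestamp shows tenths of a second, truncated rather than rounded, and that the detail line is indented by three spaces as in the samples above.

```python
from datetime import datetime


def format_entry(when: datetime, header: str, detail: str) -> str:
    """Build one activity-log entry: a timestamped header plus an indented detail line."""
    tenths = when.microsecond // 100_000        # truncate to one decimal place
    stamp = f"{when:%H:%M:%S}.{tenths}"
    return f"[{stamp}]  {header}\n   {detail}"


entry = format_entry(datetime(2024, 1, 1, 21, 7, 4, 300_000),
                     "→ request", "query: eggs preference")
```

The same function covers critic entries by passing a header such as `● 4/5  respondent=A`.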

MemoryService — list_episodic()

def list_episodic(self, n: int = 100) -> list[dict]:
    """Return up to n episodic engrams, newest first."""
    result = self._collection.get(
        where={"type": "episodic"},
        include=["metadatas", "documents"],
    )
    ids   = result.get("ids") or []
    docs  = result.get("documents") or []
    metas = result.get("metadatas") or []
    # Sort client-side; a missing metadata dict or timestamp is treated as 0.
    items = sorted(
        zip(ids, docs, metas),
        key=lambda x: (x[2] or {}).get("timestamp", 0),
        reverse=True,
    )
    return [{"id": id_, "content": doc, "metadata": meta or {}}
            for id_, doc, meta in items[:n]]
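
For the unit test, the sorting and truncation can be exercised without a real store. The sketch below is an assumption-laden illustration: `FakeCollection` is a hypothetical in-memory stand-in that only mimics the get() result shape used above, and the free function `list_episodic` is a standalone copy of the method's logic.

```python
class FakeCollection:
    """Hypothetical stand-in: holds (id, document, metadata) rows in memory."""

    def __init__(self, rows):
        self._rows = rows

    def get(self, where=None, include=None):
        # Keep rows whose metadata matches every key/value in `where`.
        rows = [r for r in self._rows
                if all(r[2].get(k) == v for k, v in (where or {}).items())]
        return {"ids": [r[0] for r in rows],
                "documents": [r[1] for r in rows],
                "metadatas": [r[2] for r in rows]}


def list_episodic(collection, n=100):
    """Standalone copy of the method's logic, testable without MemoryService."""
    result = collection.get(where={"type": "episodic"},
                            include=["metadatas", "documents"])
    items = sorted(zip(result["ids"], result["documents"], result["metadatas"]),
                   key=lambda x: x[2].get("timestamp", 0), reverse=True)
    return [{"id": i, "content": d, "metadata": m} for i, d, m in items[:n]]


rows = [("a", "old", {"type": "episodic", "timestamp": 1}),
        ("b", "new", {"type": "episodic", "timestamp": 3}),
        ("c", "fact", {"type": "semantic", "timestamp": 2})]
latest = list_episodic(FakeCollection(rows), n=1)
```

The three rows cover the cases worth asserting: ordering, the n cap, and exclusion of non-episodic entries.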

State Transition Visibility

Agents have explicit state machines (Generator, Critic, MemoryAgent) that cycle through states on every message. Currently those transitions are silent. Surfacing them in the agent's window gives a live picture of what the agent is doing — especially useful for seeing Critic's grading latency and MemoryAgent's ingest pipeline.

Mechanism — agent.transition bus subject

Each agent's _StateMachine accepts an optional on_transition callback. When set, the callback fires after every valid transition with (from_state, action, to_state). The agent passes a method that adds its own agent ID and publishes an agent.transition envelope.

# subjects.py
AGENT_TRANSITION = "agent.transition"

# In _StateMachine.transition():
def transition(self, action):
    from_state = self._state  # capture before the state is overwritten
    ...
    self._state = next_state
    if self._on_transition:
        self._on_transition(from_state, action, next_state)

# In CriticAgent.__init__():
self._sm = _StateMachine(on_transition=self._publish_transition)

def _publish_transition(self, from_state, action, to_state):
    self._pub.publish(MessageEnvelope.create(
        subject=AGENT_TRANSITION,
        payload={"agent": self.AGENT_ID,
                 "from": from_state.value,
                 "action": action.value,
                 "to": to_state.value},
        ...
    ))

Same pattern for GeneratorAgent and MemoryAgent. Tools (WebSearchTool etc.) are synchronous and don't have state machines — their activity log already shows the full request/result pair, which is sufficient.
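
A self-contained sketch of the callback pattern, runnable without the bus. The state names follow the critic log format used in this document; the `Action` values, the transition table, the `"critic"` agent ID, and the `published` list (standing in for the publisher) are all hypothetical.

```python
from enum import Enum
from typing import Callable, Optional


class State(Enum):
    IDLE = "IDLE"
    RECEIVING = "RECEIVING"
    GRADING = "GRADING"


class Action(Enum):          # hypothetical action names
    RECEIVE = "receive"
    GRADE = "grade"


# Hypothetical transition table: (current state, action) -> next state.
_TABLE = {
    (State.IDLE, Action.RECEIVE): State.RECEIVING,
    (State.RECEIVING, Action.GRADE): State.GRADING,
}

OnTransition = Callable[[State, Action, State], None]


class StateMachine:
    def __init__(self, on_transition: Optional[OnTransition] = None) -> None:
        self._state = State.IDLE
        self._on_transition = on_transition

    def transition(self, action: Action) -> State:
        from_state = self._state
        key = (from_state, action)
        if key not in _TABLE:
            # Invalid transitions raise and never reach the callback.
            raise ValueError(f"invalid transition: {from_state.name} + {action.name}")
        self._state = _TABLE[key]
        if self._on_transition:
            self._on_transition(from_state, action, self._state)
        return self._state


published: list[dict] = []   # stands in for the bus publisher


def publish_transition(from_state: State, action: Action, to_state: State) -> None:
    published.append({"agent": "critic", "from": from_state.value,
                      "action": action.value, "to": to_state.value})


sm = StateMachine(on_transition=publish_transition)
sm.transition(Action.RECEIVE)
sm.transition(Action.GRADE)
```

Keeping the callback optional means existing tests that construct the state machine without a publisher keep working unchanged.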

Display in agent windows

Transition lines appear in the activity log interleaved with request/result entries, in a dimmer colour so they don't dominate:

[21:07:04.0]  IDLE → RECEIVING           (dim grey)
[21:07:04.1]  RECEIVING → GRADING
[21:07:18.1]  ● 4/5  respondent=A        (bright: grade result)
[21:07:18.1]  GRADING → PUBLISHING
[21:07:18.2]  PUBLISHING → IDLE

The gap between RECEIVING → GRADING and the grade result line shows Prometheus latency directly — no need for separate timing instrumentation.
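
As a worked example of reading latency off the log, take a transition logged at 21:07:04.1 and a grade line logged at 21:07:18.1. `gap_seconds` is a hypothetical helper; it assumes both stamps fall on the same day and does not handle a run that crosses midnight.

```python
from datetime import datetime


def gap_seconds(start: str, end: str) -> float:
    """Seconds between two log stamps of the form HH:MM:SS.f (same day assumed)."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


latency = gap_seconds("21:07:04.1", "21:07:18.1")   # 14.0 seconds
```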

BusLogger change

BusLogger subscribes to AGENT_TRANSITION. When a transition message arrives, it is routed to the correct window using payload["agent"].
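
The routing step could be sketched as below. This is not the actual BusLogger code: `route_transition`, the `RecordingWindow` stand-in, its `append_transition` method, and the `"critic"` agent ID are hypothetical; only the payload keys come from the envelope shown earlier.

```python
class RecordingWindow:
    """Hypothetical stand-in for an agent window; it just collects log lines."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def append_transition(self, from_state: str, to_state: str) -> None:
        self.lines.append(f"{from_state} → {to_state}")


def route_transition(payload: dict, windows: dict) -> bool:
    """Send one agent.transition payload to the matching window, if any."""
    window = windows.get(payload.get("agent", ""))
    if window is None:
        return False  # no window for this agent: drop the event
    window.append_transition(payload["from"], payload["to"])
    return True


windows = {"critic": RecordingWindow()}
handled = route_transition(
    {"agent": "critic", "from": "IDLE", "action": "receive", "to": "RECEIVING"},
    windows,
)
```

Dropping events for unknown agents keeps the logger tolerant of agents that publish transitions before a window exists for them.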

Files added for transitions

| File | Change |
|---|---|
| src/local/protocol/subjects.py | Add AGENT_TRANSITION = "agent.transition" |
| src/local/agents/critic_agent.py | Pass on_transition callback to _StateMachine; publish AGENT_TRANSITION |
| src/local/agents/memory_agent.py | Same pattern |
| src/local/agents/generator_agent.py | Same pattern (optional: generator transitions are fast; include if useful) |

Activity routing after reactive spawn

Currently _TOOL_ACTIVITY_SUBJECTS and _tool_panels dict route bus activity events to the right panel. After the change, _tool_windows dict serves the same purpose. The routing logic in _on_tool_activity() stays the same — swap dict name only.

Activity subjects (tool.activity.search_memory etc.) are still published by tools. The bus monitor subscription list must include them — no change there. The only difference is that events are routed to a ToolWindow instead of a ToolPanel.
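
For illustration only, routing by subject could work as in the sketch below. How the existing _on_tool_activity() actually derives the tool name is not shown here, so `tool_name_from_subject` and `route_activity` are hypothetical, and plain lists stand in for ToolWindow instances.

```python
from typing import Optional

ACTIVITY_PREFIX = "tool.activity."  # prefix seen in tool.activity.search_memory


def tool_name_from_subject(subject: str) -> Optional[str]:
    """Extract the tool name from an activity subject, or None if it is not one."""
    if not subject.startswith(ACTIVITY_PREFIX):
        return None
    return subject[len(ACTIVITY_PREFIX):] or None


def route_activity(subject: str, payload: dict, tool_windows: dict) -> bool:
    """Append the payload to the window registered for this tool, if any."""
    name = tool_name_from_subject(subject)
    window = tool_windows.get(name) if name else None
    if window is None:
        return False
    window.append(payload)  # a real ToolWindow would add a log entry here
    return True


tool_windows = {"search_memory": []}
routed = route_activity("tool.activity.search_memory",
                        {"query": "eggs preference"}, tool_windows)
```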

Build Order

| Step | File | What |
|---|---|---|
| 1 | memory_service.py | Add list_episodic(n=100) + unit test |
| 2 | subjects.py + agents | Add AGENT_TRANSITION; instrument _StateMachine in CriticAgent + MemoryAgent with on_transition callback → publish agent.transition |
| 3 | src/local/ui/tool_window.py | New ToolWindow: activity log + ⚙/← settings toggle + YAML save |
| 4 | src/local/ui/critic_window.py | New CriticWindow: grade/pairwise activity log + state transitions + critic.yaml settings |
| 5 | src/local/ui/memory_window.py | New MemoryWindow: Browse mode (list_episodic, newest first) + Search mode (search_episodic, semantic ranked); shared table; detail pane on row click; QThread worker for search to avoid UI block |
| 6 | main_window.py | Remove _TOOL_DEFS + stack pages + sidebar tool btns; reactive spawn on tool.schema; spawn CriticWindow + MemoryWindow at init; pass memory_service; BusLogger subscribes to AGENT_TRANSITION + pairwise.result |
| 7 | run_local.py | Pass shared_memory to MainWindow |

Story S8

  1. Start stack — tool windows for search_memory, web_search, web_fetch appear automatically; CriticWindow and MemoryWindow open at startup
  2. Send a query — activity entries appear in the relevant tool windows in real time
  3. Click ⚙ on search_memory window — YAML settings appear; edit n_results, click Save — tool reloads; click ← to return to activity
  4. After ~30s: CriticWindow shows grade entry with score and feedback snippet
  5. Click Refresh in MemoryWindow — engram appears with Age, Score, Respondent columns populated
  6. Give a 👍 on the response — Refresh MemoryWindow — Sentiment shows 👍