Phase 14 — Conversation Compaction

Long sessions accumulate messages until the silent 20-turn truncation kicks in, losing early context. This phase adds a visible context gauge in the sidebar and a manual compaction action. Pressing the gauge asks Gemma to summarize the session so far, replaces the message history with a compact summary plus a verbatim tail, and reports tokens freed.

1 — Token Tracking (no new dependencies)

GeneratorAgent ConversationService

Ollama's streaming API delivers token counts on the final chunk of each iteration via chunk.prompt_eval_count (tokens sent to the model) and chunk.eval_count (tokens generated). We capture the last iteration's prompt_eval_count — this reflects the true context size for that turn including all tool-call results.

_generate() return signature change

def _generate(...) -> tuple[str, str, list[dict], int]:
    # existing loop, plus:
    last_chunk = None
    for chunk in ollama.chat(..., stream=True):
        last_chunk = chunk
        # ... existing handling unchanged ...

    prompt_tokens = getattr(last_chunk, "prompt_eval_count", 0) or 0
    # return added as 4th element
    return answer, thinking, tool_call_log, prompt_tokens
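
As a minimal sketch of the capture step in isolation: the helper name `prompt_tokens_from` and the fake chunk streams below are hypothetical, and the dict branch assumes a client version that yields plain dicts rather than attribute-style objects.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def prompt_tokens_from(chunks: Iterable[Any]) -> int:
    """Return prompt_eval_count from the last chunk of a stream, or 0."""
    last_chunk = None
    for chunk in chunks:
        last_chunk = chunk  # only the final chunk is expected to carry counts
    if last_chunk is None:
        return 0
    if isinstance(last_chunk, dict):
        value = last_chunk.get("prompt_eval_count")
    else:
        value = getattr(last_chunk, "prompt_eval_count", None)
    return int(value or 0)


# Fake streams standing in for ollama.chat(..., stream=True)
object_stream = [
    SimpleNamespace(done=False),
    SimpleNamespace(done=True, prompt_eval_count=47200),
]
dict_stream = [{"done": False}, {"done": True, "prompt_eval_count": 812}]
```

Falling back to 0 keeps the gauge empty rather than wrong when a stream ends without counts.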

ConversationService additions

# New field in session entry (default 0 on migration / new sessions)
"token_count": int

# New methods:
def set_token_count(self, session_id: str | None, count: int) -> None: ...
def get_token_count(self, session_id: str | None) -> int: ...

After generation, RespondentA stores the count:

self._conv.set_token_count(session_id, prompt_tokens)

token_count is included in the RESPONSE_GENERATION payload so the UI can update the gauge without polling.
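
A rough in-memory sketch of the two accessors follows. The class name, the `"default"` key used when `session_id` is None, and the dict-based storage are all assumptions for illustration; the real ConversationService storage is not described in this plan.

```python
from __future__ import annotations


class ConversationServiceSketch:
    """In-memory stand-in for the token_count part of ConversationService."""

    DEFAULT_SESSION = "default"  # hypothetical key used when session_id is None

    def __init__(self) -> None:
        self._sessions: dict[str, dict] = {}

    def _entry(self, session_id: str | None) -> dict:
        key = session_id or self.DEFAULT_SESSION
        entry = self._sessions.setdefault(key, {"messages": []})
        entry.setdefault("token_count", 0)  # migration: older entries lack the field
        return entry

    def set_token_count(self, session_id: str | None, count: int) -> None:
        self._entry(session_id)["token_count"] = max(0, int(count))

    def get_token_count(self, session_id: str | None) -> int:
        return self._entry(session_id)["token_count"]


svc = ConversationServiceSketch()
svc.set_token_count("s1", 47200)
```

Applying the default lazily in `_entry` means sessions saved before this phase read back as 0 without a separate migration step.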

2 — Context Gauge Widget

UI

A small arc gauge sits in the sidebar between the 💬 (conversations) and ⟳ (new session) buttons. It is a custom QWidget subclass named ContextGauge.

Sidebar (left column, top-to-bottom):

[ 💬 ]  ← open ConversationsWindow
[ ◔ ]   ← ContextGauge (arc fill, 40×40px)
[ ⟳ ]   ← new session
[ ◉ ]   ← toggle critic

ContextGauge

class ContextGauge(QWidget):
    compact_requested = Signal()   # emitted on click

    def __init__(self, num_ctx: int, parent=None): ...
    def set_tokens(self, count: int): ...      # called from MainWindow on RESPONSE_GENERATION
    def paintEvent(self, event): ...           # draws arc + center text
    def mousePressEvent(self, event): ...      # emits compact_requested

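Arc drawing: QPainter.drawArc() with startAngle = 90 * 16 and spanAngle = fill * 360 * 16. Qt measures angles in 1/16th-degree units, with 0 at 3 o'clock and positive values running counter-clockwise, so the arc sweeps counter-clockwise from 12 o'clock. The center shows a compact token count (e.g. 47K).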
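
The arc arithmetic can be sketched without any Qt dependency. The function names, the clamping of fill to [0, 1], and the floor-to-thousands label format are assumptions; only the 1/16th-degree units and the 47K example come from the text above.

```python
def arc_angles(tokens: int, num_ctx: int) -> tuple[int, int]:
    """Return (start_angle, span_angle) in Qt's 1/16-degree units."""
    fill = 0.0 if num_ctx <= 0 else min(max(tokens / num_ctx, 0.0), 1.0)
    start_angle = 90 * 16                # Qt's 0 is at 3 o'clock; 90 degrees is 12 o'clock
    span_angle = round(fill * 360 * 16)  # positive span sweeps counter-clockwise
    return start_angle, span_angle


def compact_label(tokens: int) -> str:
    """Format a token count for the gauge center, e.g. 47200 -> '47K'."""
    if tokens <= 0:
        return ""                        # unknown count: no center text
    if tokens < 1000:
        return str(tokens)
    return f"{tokens // 1000}K"
```

Inside paintEvent the pair would be passed on as `painter.drawArc(rect, start_angle, span_angle)`.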

Color thresholds:

Fill      Color
< 60%     #7ec8a4 (green)
60–85%    #c8a47e (amber)
> 85%     #c87e7e (red)
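
One way to express the thresholds in code is shown here as a sketch. The function name `gauge_color` is hypothetical, and treating exactly 60% and exactly 85% as amber is an assumption about the boundaries.

```python
def gauge_color(fill: float) -> str:
    """Map a fill fraction (0.0 to 1.0) to the gauge's arc color."""
    if fill < 0.60:
        return "#7ec8a4"  # green
    if fill <= 0.85:
        return "#c8a47e"  # amber
    return "#c87e7e"      # red
```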

Tooltip: "47,200 / 128,000 tokens — click to compact"

Decision: the gauge stays empty until the first generation turn. Before any turn the count is unknown, so the arc is drawn as a hollow ring with no center text rather than a literal 0. This is honest: we don't guess.

3 — Compaction Flow (bus-routed)

Bus GeneratorAgent UI

Compaction is routed over the bus so GeneratorAgent owns all Ollama calls — consistent with the architecture invariant.

New subjects (subjects.py)

COMPACTION_REQUEST = "compaction.request"
COMPACTION_RESULT  = "compaction.result"

Sequence

User clicks gauge
  → MainWindow emits COMPACTION_REQUEST {session_id}
  → GeneratorAgent._handle_compaction()
      - gets messages from self._conv.get_history(session_id)
      - calls Ollama (non-streaming, no tools) with compaction prompt
      - builds new message list: [summary_msg] + last N verbatim turns
      - calls self._conv.replace_messages(session_id, new_messages)
      - publishes COMPACTION_RESULT {session_id, tokens_before, tokens_estimated_after, summary}
  → MainWindow._on_compaction_result()
      - appends marker to log
      - resets gauge to estimated post-compaction count
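
The handler side of this sequence might look roughly like the sketch below. The `summarize` and `publish` callables, the `tail_messages` parameter, and the `_FakeConv` stand-in are hypothetical; the real bus API and agent wiring are not shown in this plan.

```python
from typing import Callable

COMPACTION_RESULT = "compaction.result"


def handle_compaction(
    payload: dict,
    conv,                                     # exposes get_history / get_token_count / replace_messages
    summarize: Callable[[list], str],         # wraps the one-shot Ollama call
    publish: Callable[[str, dict], None],     # bus publish (hypothetical signature)
    tail_messages: int = 8,
) -> None:
    session_id = payload.get("session_id")
    history = conv.get_history(session_id)
    tokens_before = conv.get_token_count(session_id)

    summary = summarize(history)              # if this raises, history is left untouched
    new_messages = [{"role": "assistant", "content": f"[SUMMARY] {summary}"}]
    new_messages += history[-tail_messages:]  # verbatim tail

    conv.replace_messages(session_id, new_messages)
    chars = sum(len(m.get("content") or "") for m in new_messages)
    publish(COMPACTION_RESULT, {
        "session_id": session_id,
        "tokens_before": tokens_before,
        "tokens_estimated_after": chars // 4,
        "summary": summary,
    })


class _FakeConv:
    """Minimal stand-in exposing the three service methods the handler uses."""

    def __init__(self, history, token_count):
        self.history, self.token_count = history, token_count

    def get_history(self, session_id):
        return list(self.history)

    def get_token_count(self, session_id):
        return self.token_count

    def replace_messages(self, session_id, messages):
        self.history = list(messages)


history = []
for i in range(10):
    history.append({"role": "user", "content": f"q{i}"})
    history.append({"role": "assistant", "content": f"a{i}"})

published = []
conv = _FakeConv(history, token_count=82400)
handle_compaction(
    {"session_id": "s1"},
    conv,
    summarize=lambda h: "User is testing.",
    publish=lambda subject, body: published.append((subject, body)),
)
```

Passing the summarizer and publisher in as callables keeps the sketch testable without a running Ollama server or bus.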

Compaction prompt

COMPACTION_SYSTEM = (
    "You are a conversation summarizer. "
    "Produce a concise factual summary of the conversation below. "
    "Capture: the user's goals, key facts established, decisions made, "
    "open questions, and any user preferences or constraints stated. "
    "Be terse. Omit pleasantries and filler. "
    "Output only the summary — no preamble, no 'Here is a summary:'."
)

The call is a separate ollama.chat() with stream=False and no tools — a one-shot summarization that does not touch the live conversation history until it succeeds.

replace_messages() (new ConversationService method)

def replace_messages(self, session_id: str | None, messages: list[dict]) -> None:
    """Atomically replace the message list for a session (used by compaction)."""

Tail length

Config key: compaction_tail_turns: 4 in config/generator.yaml. The number of messages kept verbatim after the summary is tail_turns * 2 (user + assistant pairs), so the default keeps 8. Tool call messages that fall inside the tail are included as-is.
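
One possible way to pick the tail so that tool messages travel with their turn is to cut at the Nth most recent user message. This is an assumption about how a "turn" is delimited, and `split_tail` is a hypothetical helper, not part of the plan.

```python
def split_tail(messages: list, tail_turns: int) -> tuple[list, list]:
    """Split history into (older, tail); tail starts at the tail_turns-th most recent user message."""
    if tail_turns <= 0:
        return list(messages), []
    user_indexes = [i for i, m in enumerate(messages) if m.get("role") == "user"]
    if len(user_indexes) < tail_turns:
        return [], list(messages)  # history shorter than the tail: keep everything
    cut = user_indexes[-tail_turns]
    return messages[:cut], messages[cut:]


# Six turns; turn 5 carries an extra tool message
history = []
for n in range(1, 7):
    history.append({"role": "user", "content": f"u{n}"})
    if n == 5:
        history.append({"role": "tool", "content": "tool result"})
    history.append({"role": "assistant", "content": f"a{n}"})

older, tail = split_tail(history, tail_turns=4)
```

With this approach the tail can hold more than tail_turns * 2 messages whenever a kept turn includes tool traffic.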

Result message list shape

[
  {"role": "assistant", "content": "[SUMMARY] The user is building LoCAL2 ..."},
  {"role": "user",      "content": "...turn N-3 user..."},
  {"role": "assistant", "content": "...turn N-3 assistant..."},
  ... (tail_turns pairs)
]

The synthetic summary message uses role: "assistant" because Gemma expects alternating user/assistant turns. A system-role summary would work too, but assistant is safer across model versions.

Tokens-freed estimate

tokens_before = last stored token_count from ConversationService.
tokens_estimated_after = character count of new message list ÷ 4 (heuristic, shown as ~).
Log marker: ── compacted: 82,400 → ~14,200 tokens (68,200 freed) ──
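
The estimate and the marker text can be sketched as two small helpers. The function names are hypothetical, and counting only each message's `content` field is an assumption about what "character count of new message list" covers.

```python
def estimate_tokens(messages: list) -> int:
    """Heuristic token estimate: total content characters divided by 4."""
    chars = sum(len(m.get("content") or "") for m in messages)
    return chars // 4


def compaction_marker(tokens_before: int, tokens_after_est: int) -> str:
    """Build the log separator line with thousands separators."""
    freed = max(0, tokens_before - tokens_after_est)
    return (
        f"── compacted: {tokens_before:,} → ~{tokens_after_est:,} tokens "
        f"({freed:,} freed) ──"
    )
```

For the figures above, 82,400 minus 14,200 gives the 68,200 shown as freed.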

4 — Log Marker

UI

The compaction marker is appended to the main chat log using the same append_log() method used for session rejoins — a centred, dimmed separator line with the token delta.

── compacted: 82,400 → ~14,200 tokens  (68,200 freed) ──

The gauge fill drops visually to the estimated post-compaction level immediately. After the next generation turn the gauge snaps to the exact Ollama-reported count.

5 — Files Changed

File                                         Change
src/local/protocol/subjects.py               Add COMPACTION_REQUEST, COMPACTION_RESULT
src/local/services/conversation_service.py   token_count field; set/get_token_count(); replace_messages()
src/local/agents/generator_agent.py          Capture prompt_eval_count; subscribe to COMPACTION_REQUEST; _handle_compaction(); remove remaining debug print (line 134)
src/local/ui/main_window.py                  ContextGauge widget; sidebar wiring; _on_compaction_result()
config/generator.yaml                        Add compaction_tail_turns: 4
tests/test_conversation_service.py           Tests for set/get_token_count, replace_messages
tests/test_generator_agent.py                Test prompt_eval_count stored after generation

6 — Out of Scope (Phase 14)

7 — Open Questions