Use architect mode

Spend on thinking once per turn, edit on a cheap model.

When to use it

Architect mode is the right fit when an architecture call has to happen but the actual edits are mechanical. Three concrete moments:

How to use it

> /architect
**Architect mode on.**  Architect: `claude/claude-opus-4-7` —
Editor: `claude/claude-haiku-4-5-20251001`.  Each turn now runs as
*propose → edit*; both passes appear separately in `/cost`.

> Add ops-tracing to src/charm.py.

... agent's response is the editor's edits, not the architect
prose — the architect proposal is captured in the
transcript ...

> /cost
| Provider/Model                      | Requests | Prompt | Completion |
|-------------------------------------|----------|--------|------------|
| claude/claude-opus-4-7              | 1        | 2,400  | 410        |
| claude/claude-haiku-4-5-20251001    | 1        | 2,820  | 180        |

> /architect off
**Architect mode off.**  Single-model conversation resumed.

Override the editor with a second token in the slash command:

> /architect on gemini/gemini-3-flash-preview
> /architect on gemini             # provider only; uses the default model
> /architect on claude/claude-haiku-4-5-20251001

At session start, the same options live on the CLI:

$ cantrip run --architect .
$ cantrip run --architect --editor-provider claude --editor-model claude-haiku-4-5-20251001 .
$ cantrip run --print --ralph 5 --architect "add ops-tracing to this charm"

How it works

When architect mode is on, every LLM call inside the conversation loop runs as two passes:

  1. Architect pass. The main provider runs without tools. A short SYSTEM instruction asks for a plain-prose proposal — which file(s), what to change, why. The architect cannot emit tool calls because it doesn't have any.
  2. Editor pass. The editor provider runs with the full tool list. The architect's proposal is appended as a synthetic USER message wrapped in <architect_proposal> ... </architect_proposal>. The editor's job is to translate the proposal into the concrete write_file / edit_file / multi_edit calls.

Both passes write to the session's token_usage table with their own provider/model attribution, so /cost shows two rows per turn — the architect's expensive prompt + the editor's smaller one. Both passes also fire transcript events (architect_pass / editor_pass) so auditors can replay the design call when reviewing what the agent did.

Picking the editor

Three resolution rules apply, top-to-bottom:

  1. Per-session override. When you set state.editor_provider via /architect on <provider>[/<model>] or --editor-provider / --editor-model, the editor is constructed on demand from those values. A bad combination logs at WARNING and falls through to the next rule.
  2. The session's light provider. When you started Cantrip with a configured light provider (see light models), that provider acts as the editor. This is the recommended cheap-and-cheerful default.
  3. Fallback to the main provider. When no lighter variant exists, the editor runs on the architect's provider. The dual-pass shape stays — you still get the proposal-then-edit transcript — but there is no cost saving.

Fall-through after editor failures

A weak editor can stall when the architect's proposal is ambiguous. Cantrip tracks a per-turn counter (state.architect_consecutive_failures) that ticks every time an editor pass produces only failed tool calls and resets on a successful round. When the counter reaches state.architect_failure_threshold (default 2) the next editor pass uses the architect provider as the editor — the same model that wrote the proposal applies it. Counter resets on the next user turn so a single sticky problem doesn't escalate every subsequent round.

Status indicator

Every surface that shows a status bar repaints when architect mode toggles. The TUI tints with the same STATUS_BAR_CHANGED event the plan-mode and yolo-mode indicators use, so a Web or CLI surface that already listens picks up the new mode without extra wiring.

Interaction with other guards

What architect mode is not