Use architect mode
Spend on thinking once per turn, edit on a cheap model.
When to use it
Architect mode is the right fit when an architecture call has to happen but the actual edits are mechanical. Three concrete moments:
- Long BUILD sessions. The agent will spend a lot of tool calls on file edits whose intent is already clear. Splitting each turn into propose on the expensive model and edit on a cheap one cuts the bill without sacrificing the design call.
- Diff-shaped refactors. Rename a class, extract a helper, swap an event handler — the architect explains where, the editor applies it.
- Tight per-edit loops. Combined with
Ralph Loop and
--print, every iteration runs the architect once and the editor as many times as the diff needs.
How to use it
> /architect
**Architect mode on.** Architect: `claude/claude-opus-4-7` —
Editor: `claude/claude-haiku-4-5-20251001`. Each turn now runs as
*propose → edit*; both passes appear separately in `/cost`.
> Add ops-tracing to src/charm.py.
... agent's response is the editor's edits, not the architect
prose — the architect proposal is captured in the
transcript ...
> /cost
| Provider/Model | Requests | Prompt | Completion |
|-------------------------------------|----------|--------|------------|
| claude/claude-opus-4-7 | 1 | 2,400 | 410 |
| claude/claude-haiku-4-5-20251001 | 1 | 2,820 | 180 |
> /architect off
**Architect mode off.** Single-model conversation resumed.
Override the editor with a second token in the slash command:
> /architect on gemini/gemini-3-flash-preview
> /architect on gemini # provider only; uses the default model
> /architect on claude/claude-haiku-4-5-20251001
At session start, the same options live on the CLI:
$ cantrip run --architect .
$ cantrip run --architect --editor-provider claude --editor-model claude-haiku-4-5-20251001 .
$ cantrip run --print --ralph 5 --architect "add ops-tracing to this charm"
How it works
When architect mode is on, every LLM call inside the conversation loop runs as two passes:
- Architect pass. The main provider runs without tools. A short SYSTEM instruction asks for a plain-prose proposal — which file(s), what to change, why. The architect cannot emit tool calls because it doesn't have any.
- Editor pass. The editor provider runs
with the full tool list. The architect's proposal is
appended as a synthetic USER message wrapped in
<architect_proposal> ... </architect_proposal>. The editor's job is to translate the proposal into the concretewrite_file/edit_file/multi_editcalls.
Both passes write to the session's
token_usage table with their own provider/model
attribution, so /cost shows two rows per turn
— the architect's expensive prompt + the editor's smaller
one. Both passes also fire transcript events
(architect_pass / editor_pass) so
auditors can replay the design call when reviewing what the
agent did.
Picking the editor
Three resolution rules apply, top-to-bottom:
- Per-session override. When you set
state.editor_providervia/architect on <provider>[/<model>]or--editor-provider/--editor-model, the editor is constructed on demand from those values. A bad combination logs at WARNING and falls through to the next rule. - The session's light provider. When you started Cantrip with a configured light provider (see light models), that provider acts as the editor. This is the recommended cheap-and-cheerful default.
- Fallback to the main provider. When no lighter variant exists, the editor runs on the architect's provider. The dual-pass shape stays — you still get the proposal-then-edit transcript — but there is no cost saving.
Fall-through after editor failures
A weak editor can stall when the architect's proposal is
ambiguous. Cantrip tracks a per-turn counter
(state.architect_consecutive_failures) that ticks
every time an editor pass produces only failed tool calls and
resets on a successful round. When the counter reaches
state.architect_failure_threshold (default 2) the
next editor pass uses the architect provider as the
editor — the same model that wrote the proposal applies
it. Counter resets on the next user turn so a single sticky
problem doesn't escalate every subsequent round.
Status indicator
Every surface that shows a status bar repaints when architect
mode toggles. The TUI tints with the same
STATUS_BAR_CHANGED event the
plan-mode and
yolo-mode indicators use, so
a Web or CLI surface that already listens picks up the new
mode without extra wiring.
Interaction with other guards
- Plan mode. The architect pass still respects the plan-mode tool allow-list because the editor is the one that picks up tools. Plan mode + architect mode composes — the architect proposes, the editor's tool calls hit the plan-mode gate just like any other tool call would.
- Permissions. The editor runs through the
permission stack
exactly like a single-model session.
denyrules block,askrules park (or auto-approve under--yolo). - Hooks. Pre-tool hooks fire on the editor's tool calls, not the architect's prose — the architect doesn't emit any.
- /undo. Snapshots are taken at the
user-turn boundary, so
/undorolls back the whole turn (architect proposal + editor edits) atomically. - Streaming. Architect mode runs the dual-pass turn synchronously and yields the editor's response as a single chunk. You lose token-by-token streaming inside an architect-mode session in exchange for the dual-pass cost split.
What architect mode is not
- Not a multi-provider router. Picking and ranking across many models is what multi-model patterns (race, arena, oracle) cover. Architect mode is the simple two-model case.
- Not a chain-of-thought wrapper. The architect's proposal is recorded in the transcript but the editor sees only the proposal text — no hidden reasoning trace.
- Not a quality gate. A weak editor that emits unapplyable edits still emits unapplyable edits; the fall-through path catches the obvious case but bad models stay bad.
- Not persistent. The mode is session-scoped. Restarting Cantrip drops the flag back off.