How Cantrip works

Understanding the architecture helps you use Cantrip more effectively and interpret its behaviour when things go wrong.

Two concurrent loops

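Most AI coding tools operate as chatbots: you send a message, the tool responds, and you send another. Cantrip is different. It runs two loops concurrently: a conversation loop that handles your messages, and an executor loop that works through queued tasks in the background.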

This means the agent keeps working while you are reading its output, thinking, or not interacting at all. You steer; the agent drives.
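The two-loop shape can be sketched with Python's asyncio. This is a minimal illustration, not Cantrip's code: the function names, the string tasks, and the `None` sentinel used to end the session are all hypothetical. The point is that the executor drains work on its own schedule instead of waiting for the next user turn.

```python
import asyncio


async def conversation_loop(inbox: asyncio.Queue, work: asyncio.Queue) -> list[str]:
    """Foreground loop (hypothetical): turns user messages into queued work."""
    handled = []
    while True:
        message = await inbox.get()
        if message is None:           # sentinel: the user ended the session
            await work.put(None)      # tell the executor to stop after the queued work
            return handled
        handled.append(message)
        await work.put(f"task for: {message}")


async def executor_loop(work: asyncio.Queue) -> list[str]:
    """Background loop (hypothetical): processes tasks without waiting for the user."""
    done = []
    while True:
        task = await work.get()
        if task is None:
            return done
        await asyncio.sleep(0)        # stand-in for running a subagent
        done.append(task)


async def main() -> list[list[str]]:
    inbox: asyncio.Queue = asyncio.Queue()
    work: asyncio.Queue = asyncio.Queue()
    for msg in ["build a Redis charm", "add backup support", None]:
        inbox.put_nowait(msg)
    # Both loops run concurrently on the same event loop.
    return await asyncio.gather(conversation_loop(inbox, work), executor_loop(work))


handled, done = asyncio.run(main())
```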

Why two loops?

Building a charm involves many steps that do not require human input: reading documentation, scaffolding files, running tests, diagnosing failures. Waiting for the user between each step would waste time. The two-loop design lets the agent be autonomous where it can and interactive where it must (design confirmation, domain questions, steering).

The work queue

The work queue is the coordination layer between the two loops. It holds AgentTask objects, each with:

The executor picks the next ready task (one whose dependencies are all satisfied), spawns a subagent to handle it, and records the result. When a task completes, it unblocks any tasks that depend on it.
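The "ready task" rule can be sketched as a small in-memory queue. This assumes only what the paragraph states (tasks have dependencies and a result); the field names and the `pending`/`done` statuses are hypothetical and may not match Cantrip's actual `AgentTask`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentTask:
    # Hypothetical fields; the real AgentTask may carry more (or different) data.
    id: str
    depends_on: set[str] = field(default_factory=set)
    status: str = "pending"           # "pending" until the executor completes it
    result: Optional[str] = None


class WorkQueue:
    def __init__(self) -> None:
        self.tasks: dict[str, AgentTask] = {}

    def add(self, task: AgentTask) -> None:
        self.tasks[task.id] = task

    def ready(self) -> list[AgentTask]:
        """Pending tasks whose dependencies have all completed."""
        return [
            t for t in self.tasks.values()
            if t.status == "pending"
            and all(self.tasks[d].status == "done" for d in t.depends_on)
        ]

    def complete(self, task_id: str, result: str) -> None:
        # Marking a task done is what unblocks its dependents on the next ready() call.
        task = self.tasks[task_id]
        task.status, task.result = "done", result


q = WorkQueue()
q.add(AgentTask("research"))
q.add(AgentTask("scaffold", {"research"}))
q.add(AgentTask("test", {"scaffold"}))
print([t.id for t in q.ready()])   # ['research']
q.complete("research", "summary of docs")
print([t.id for t in q.ready()])   # ['scaffold']
```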

Tasks are created by the planner, which uses the LLM to decompose a high-level intent ("build a Redis charm") into concrete tasks with dependencies. The user's conversation loop can also create tasks — for example, when you say "add backup support", the agent plans new tasks and adds them to the queue.
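Whatever the LLM proposes, a plan only works if its dependencies form an acyclic graph. The sketch below checks a plan with Python's standard `graphlib` and groups it into batches that could run in parallel. The plan contents are invented for illustration, and this document does not say whether Cantrip validates plans this way.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical planner output for "build a Redis charm": task -> its dependencies.
plan = {
    "research-redis-docs": set(),
    "scaffold-charm": {"research-redis-docs"},
    "write-unit-tests": {"scaffold-charm"},
    "deploy-and-verify": {"scaffold-charm", "write-unit-tests"},
}


def execution_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches whose dependencies are all satisfied.

    Raises CycleError if the plan contains circular dependencies.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()                  # detects cycles before anything runs
    batches = []
    while sorter.is_active():
        batch = sorted(sorter.get_ready())
        batches.append(batch)
        sorter.done(*batch)           # completing a batch unblocks the next one
    return batches


print(execution_batches(plan))
# [['research-redis-docs'], ['scaffold-charm'], ['write-unit-tests'], ['deploy-and-verify']]
```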

Concurrency

By default, up to three subagents run concurrently; change this with the --concurrency flag. The agent can therefore research documentation, scaffold files, and run tests at the same time, as long as none of those tasks depends on another that has not yet finished.
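One common way to enforce a concurrency cap is a semaphore. The sketch below assumes asyncio and a limit of 3 to mirror the default; it is not necessarily how Cantrip implements --concurrency, and it ignores dependencies, which the work queue handles separately.

```python
import asyncio


async def run_subagent(name: str, gate: asyncio.Semaphore, stats: dict) -> str:
    async with gate:                  # at most `limit` subagents inside this block at once
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)     # stand-in for a subagent's LLM conversation
        stats["running"] -= 1
        return f"{name}: ok"


async def run_all(names: list[str], limit: int = 3) -> tuple[list[str], int]:
    gate = asyncio.Semaphore(limit)   # limit=3 mirrors the documented default
    stats = {"running": 0, "peak": 0}
    results = await asyncio.gather(*(run_subagent(n, gate, stats) for n in names))
    return list(results), stats["peak"]


results, peak = asyncio.run(run_all(["research", "scaffold", "test", "lint", "deploy"]))
print(peak)   # 3
```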

Subagents

Each background task runs in a subagent — an isolated LLM conversation with its own context and a focused set of tools. Subagents are important for two reasons:

  1. Context isolation. A research subagent reading hundreds of lines of documentation does not pollute the main conversation's context window. The subagent summarises its findings into a compact result.
  2. Focused tools. A research subagent does not need Juju deployment tools. A deploy subagent does not need web search. Limiting the tool set reduces confusion and cost.
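Both properties can be illustrated with a toy subagent that keeps its own message history and refuses tools outside its category's allow-list. The categories' tool names (`web_search`, `juju_deploy`, and so on) and the class itself are hypothetical; only the idea of per-category tool sets and a compact summary comes from the text above.

```python
# Hypothetical category -> tool allow-list; tool names are illustrative only.
TOOLSETS: dict[str, frozenset] = {
    "research": frozenset({"web_search", "fetch_url", "read_file"}),
    "deploy": frozenset({"juju_deploy", "juju_status", "read_file"}),
}


class Subagent:
    def __init__(self, category: str) -> None:
        self.category = category
        self.tools = TOOLSETS[category]
        # Private context: never appended to the main conversation.
        self.messages: list[dict] = []

    def call_tool(self, tool: str) -> str:
        if tool not in self.tools:
            raise PermissionError(f"{self.category} subagent cannot use {tool}")
        self.messages.append({"role": "tool", "name": tool})
        return f"ran {tool}"

    def summarise(self) -> str:
        # Only this compact string flows back to the main conversation.
        return f"{self.category}: {len(self.messages)} tool call(s)"


researcher = Subagent("research")
researcher.call_tool("web_search")
print(researcher.summarise())   # research: 1 tool call(s)
```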

Subagents have a limited number of rounds (typically 8–12 tool calls) and category-specific timeouts. If a subagent exceeds its budget, it is terminated and the task is marked as failed, which can trigger a retry or diagnostic task.
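A round budget and a per-category timeout can be combined with `asyncio.wait_for`. This is a sketch under assumptions: the `step` callback, the `BudgetExceeded` exception, and the timeout values are hypothetical, and `MAX_ROUNDS = 10` is just one value inside the typical 8 to 12 range. A real executor would hand the "failed" status to its retry or diagnostic logic.

```python
import asyncio
from typing import Awaitable, Callable

TIMEOUTS = {"research": 0.2, "test": 1.0}   # hypothetical seconds per category
MAX_ROUNDS = 10                             # one value in the typical 8-12 range


class BudgetExceeded(Exception):
    pass


async def run_with_budget(category: str, step: Callable[[int], Awaitable[bool]]) -> str:
    """Call `step` once per round until it returns True; report "done" or "failed"."""

    async def rounds() -> None:
        for n in range(MAX_ROUNDS):
            if await step(n):                # True means the subagent has finished
                return
        raise BudgetExceeded(category)       # ran out of rounds

    try:
        await asyncio.wait_for(rounds(), timeout=TIMEOUTS[category])
        return "done"
    except (BudgetExceeded, asyncio.TimeoutError):
        return "failed"                      # a retry or diagnostic task could follow


# Demo steps standing in for subagent tool calls.
async def finishes_on_third_round(n: int) -> bool:
    return n == 2


async def never_finishes(n: int) -> bool:
    return False


async def hangs(n: int) -> bool:
    await asyncio.sleep(5)
    return True


print(asyncio.run(run_with_budget("research", finishes_on_third_round)))   # done
```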

Cost routing

Tasks are routed to different models based on their category. Research and log analysis go to the light model (cheaper, faster). Code generation and design go to the primary model (more capable). See Configure light models.
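Category-based routing amounts to a lookup. In this sketch the category identifiers are hypothetical spellings of the ones named above, the model names are placeholders supplied by the caller, and sending unknown categories to the primary model is an assumption rather than documented behaviour.

```python
# Hypothetical identifiers for the categories the text routes to the light model.
LIGHT_CATEGORIES = {"research", "log_analysis"}


def route_model(category: str, light_model: str, primary_model: str) -> str:
    """Pick the cheaper model for light categories, the primary model otherwise."""
    if category in LIGHT_CATEGORIES:
        return light_model
    # Code generation, design, and (by assumption) anything unrecognised
    # go to the more capable primary model.
    return primary_model


print(route_model("research", "light-model", "primary-model"))   # light-model
```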

Context management

Long sessions accumulate a lot of context: conversation messages, tool results, research findings. Cantrip manages this in two ways:

Session persistence

All state is saved to a .cantrip SQLite file in the charm directory. This includes conversation history, task queue, design decisions, and token usage metrics. If Cantrip crashes or you stop it, you can resume by running cantrip in the same directory. The agent picks up where it left off, with the work queue intact.
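The save-and-resume idea can be sketched with Python's built-in `sqlite3` module. The table layout below is hypothetical (the real .cantrip schema is not documented here), and the sketch persists only the task queue, not conversation history or token metrics.

```python
import json
import os
import sqlite3
import tempfile


def open_session(path: str) -> sqlite3.Connection:
    # Hypothetical schema for illustration only.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks (id TEXT PRIMARY KEY, status TEXT, deps TEXT)"
    )
    return conn


def save_task(conn: sqlite3.Connection, task_id: str, status: str, deps: set) -> None:
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO tasks VALUES (?, ?, ?)",
            (task_id, status, json.dumps(sorted(deps))),
        )


def load_pending(conn: sqlite3.Connection) -> dict:
    rows = conn.execute(
        "SELECT id, deps FROM tasks WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    return {task_id: set(json.loads(deps)) for task_id, deps in rows}


with tempfile.TemporaryDirectory() as charm_dir:
    path = os.path.join(charm_dir, ".cantrip")
    conn = open_session(path)
    save_task(conn, "research", "done", set())
    save_task(conn, "scaffold", "pending", {"research"})
    conn.close()                      # simulate stopping or crashing
    conn = open_session(path)         # "resume" from the same directory
    resumed = load_pending(conn)
    conn.close()

print(resumed)   # {'scaffold': {'research'}}
```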
