Configure light models
Use cheaper models for internal tasks to reduce cost while keeping the primary model for code generation.
How cost routing works
Cantrip categorises tasks by how much model capability they need:
| Task type | Model used | Examples |
|---|---|---|
| Code generation | Primary | Writing charm code, designing integrations |
| Research | Light | Summarising docs, web search analysis |
| Test aggregation | Light | Parsing test output, deciding pass/fail |
| Log queries | Light | Analysing Loki logs, Tempo traces |
| Context compaction | Light | Summarising long conversations |
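The routing in the table can be pictured as a lookup from task type to model tier. The following is a minimal sketch for illustration only, not Cantrip's actual implementation: the `Tier` enum, the `TASK_TIERS` keys, the `pick_model` function, and the fallback to the primary model for unknown task types are all assumptions.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    PRIMARY = "primary"
    LIGHT = "light"


# Tiers copied from the table above. The key names are hypothetical and are
# not Cantrip's internal identifiers.
TASK_TIERS = {
    "code_generation": Tier.PRIMARY,
    "research": Tier.LIGHT,
    "test_aggregation": Tier.LIGHT,
    "log_queries": Tier.LIGHT,
    "context_compaction": Tier.LIGHT,
}


def pick_model(task_type: str, primary_model: str,
               light_model: Optional[str]) -> str:
    """Return the model a task would be routed to.

    Assumption: an unknown task type, or a missing light model, falls back
    to the primary model.
    """
    tier = TASK_TIERS.get(task_type, Tier.PRIMARY)
    if tier is Tier.LIGHT and light_model is not None:
        return light_model
    return primary_model
```

With a placeholder primary model name, `pick_model("research", "primary-model", "claude-haiku-4-5")` returns the light model, while `pick_model("code_generation", "primary-model", "claude-haiku-4-5")` returns the primary one.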
Light model from the same provider
The simplest setup uses a cheaper model from the same provider:
$ cantrip --provider gemini --light-model gemini-2.0-flash
Or with Claude:
$ cantrip --provider claude --light-model claude-haiku-4-5
Light model from a different provider
For maximum cost savings, use a local inference snap for light tasks while keeping a cloud model for the primary work:
$ cantrip --provider claude \
    --light-provider inference-snap \
    --light-snap nemotron-3-nano
This sends code generation to Claude but runs the light tasks (research, test aggregation, log queries, and context compaction) locally with no API cost.
Automatic detection
If you omit --light-model, Cantrip automatically selects a lighter model from the same provider when one is available. You only need to configure it explicitly when you want to override the default choice or use a different provider.
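The precedence described here (explicit setting first, then the provider's lighter model) can be sketched as below. This is a hedged illustration rather than Cantrip's real logic: `resolve_light_model`, the `provider_defaults` mapping, and the provider and model names in the usage lines are hypothetical. The final fallback to the primary model when no lighter model is available is also an assumption, since the text does not state what happens in that case.

```python
from typing import Mapping, Optional


def resolve_light_model(provider: str,
                        primary_model: str,
                        explicit_light_model: Optional[str],
                        provider_defaults: Mapping[str, str]) -> str:
    """Pick the model to use for light tasks.

    1. An explicitly configured light model always wins.
    2. Otherwise use the provider's lighter model, if one is known.
    3. Otherwise (assumption) keep using the primary model.
    """
    if explicit_light_model is not None:
        return explicit_light_model
    return provider_defaults.get(provider, primary_model)


# Hypothetical defaults table; the real per-provider choices are not
# documented in this section.
defaults = {"example-provider": "example-light-model"}

auto = resolve_light_model("example-provider", "example-primary", None, defaults)
override = resolve_light_model("example-provider", "example-primary",
                               "custom-light-model", defaults)
no_default = resolve_light_model("other-provider", "example-primary", None, defaults)
```

Here `auto` resolves to the provider's lighter model, `override` to the explicitly configured one, and `no_default` stays on the primary model.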