Run diagnostics and review
Two complementary inspectors: /diagnostics runs the deterministic linters, /review runs prompt-based checks for the “an experienced human would notice this” class of issues.
Two surfaces, two scopes
| | /diagnostics | /review |
|---|---|---|
| What runs | ruff, ty, and (for charms) charmlint | Every loaded prompt-based Check, then linter diagnostics underneath |
| How it decides | AST/regex rules. Deterministic. | One structured LLM call per Check (the CHECK_RESULT schema). |
| Cost | Local CPU only. | One LLM call per loaded Check (typically 3–10). |
| When to reach for it | “Is the code well-formed?” | “Would a reviewer flag this charm's design?” |
Both run against the active charm, not the whole repo. Both cap their output so a noisy charm doesn't flood the chat panel.
/diagnostics
cantrip> /diagnostics
charmlint:
ERROR metadata.yaml:5 unknown integration interface "promethus"
ruff:
E501 src/charm.py:42 line too long (132 > 120)
F401 src/charm.py:1 unused import "ops.testing"
ty:
<clean>
Issues are grouped by tool and severity. Output is capped at
~1500 tokens; over-budget charms get a
“N more issues suppressed” footer.
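
The capping behaviour can be pictured with a minimal sketch. This is not Cantrip's implementation: the function name `cap_report` is hypothetical, and whitespace word counts stand in for whatever tokenizer the real aggregator uses.

```python
def cap_report(lines, budget_tokens=1500):
    """Keep report lines until the budget is spent, then append a footer.

    Hypothetical helper: word count is a crude stand-in for a tokenizer.
    """
    kept, used = [], 0
    for index, line in enumerate(lines):
        cost = len(line.split())
        if used + cost > budget_tokens:
            # Everything from this line onward is dropped and counted.
            suppressed = len(lines) - index
            kept.append(f"{suppressed} more issues suppressed")
            break
        kept.append(line)
        used += cost
    return kept
```

With a budget of 6 and three 3-word lines, the first two lines are kept and the footer reports one suppressed issue.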
Results cache for 30 seconds, so a follow-up
/diagnostics in the same chat turn is free. Pass
--refresh to bust the cache after editing files
outside the agent.
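
A time-to-live cache with a refresh override behaves roughly as follows. This is a sketch under assumptions, not the shipped code: the class name `DiagnosticsCache`, its `get(refresh=...)` method, and the injectable clock are all illustrative.

```python
import time


class DiagnosticsCache:
    """Hypothetical 30-second cache around a diagnostics runner."""

    def __init__(self, run, ttl_seconds=30.0, clock=time.monotonic):
        self._run = run            # callable that actually runs the linters
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the behaviour is testable
        self._value = None
        self._stamp = None         # time of the last real run, or None

    def get(self, refresh=False):
        now = self._clock()
        expired = self._stamp is None or now - self._stamp >= self._ttl
        if refresh or expired:
            # Either the caller asked for a fresh run or the entry aged out.
            self._value = self._run()
            self._stamp = now
        return self._value
```

Calling `get()` twice within the window runs the linters once; `get(refresh=True)` plays the role of `--refresh` and always re-runs them.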
Tools that aren't installed are listed as
[skipped] rather than treated as silent passes
— a missing ty doesn't look the same as
“all clear.”
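
One plausible way to tell "not installed" apart from "clean" is to probe the `PATH` before running anything. The sketch below assumes that approach; `tool_status` is a hypothetical name, and only the `[skipped]` label comes from the text above.

```python
import shutil


def tool_status(tools, which=shutil.which):
    """Map each tool name to 'available' or '[skipped]'.

    Hypothetical helper: `which` is injectable so the lookup can be faked.
    """
    return {name: "available" if which(name) else "[skipped]" for name in tools}
```

A tool marked `[skipped]` never produces a `<clean>` line, so its absence stays visible in the report.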
Diagnostics also run pre-turn
The same aggregator runs automatically when the autonomous loop starts a BUILD or DEBUG subagent, so the agent begins each task already knowing what's broken. Result: the agent's first move on a debug task tends to be “here are the four type errors I'll fix” rather than “let me run the linter.”
The @problems mention exposes the same cache, so you can drop current diagnostics into a steering message without re-running anything:
cantrip> tighten the type errors first: @problems
/review
/review runs every loaded prompt-based Check.
Each Check is one structured LLM call — the
CHECK_RESULT schema constrains the reply to
{status, severity, message, evidence?, suggested_fix?}
— so the report is uniform regardless of which model
you're using.
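
To illustrate what a constrained reply buys you, here is a minimal validation sketch. The five field names come from the schema above; everything else is an assumption: that the reply arrives as JSON, that all values are strings, and the names `CheckResult` and `parse_check_result`. See design/CHECKS.md for the authoritative schema.

```python
import json
from dataclasses import dataclass
from typing import Optional

REQUIRED = ("status", "severity", "message")
OPTIONAL = ("evidence", "suggested_fix")


@dataclass
class CheckResult:
    status: str
    severity: str
    message: str
    evidence: Optional[str] = None
    suggested_fix: Optional[str] = None


def parse_check_result(raw: str) -> CheckResult:
    """Reject replies that miss required fields or add unknown ones."""
    data = json.loads(raw)  # assumption: the model reply is a JSON object
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = [key for key in REQUIRED if key not in data]
    unknown = [key for key in data if key not in REQUIRED + OPTIONAL]
    if missing or unknown:
        raise ValueError(f"missing={missing} unknown={unknown}")
    return CheckResult(**data)
```

Because every reply is forced into the same five fields, the report renderer never needs model-specific parsing.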
cantrip> /review
FAILED charm-readme-coherence warning
README claims a `prometheus` relation but metadata.yaml only
declares `loki`. Suggested fix: align README with metadata.
PASSED action-ergonomics
PASSED relation-data-hygiene
Deterministic checks
charmlint: 1 error in metadata.yaml
ruff: 2 issues across 1 file
ty: clean
Failures appear first, then errors (couldn't reach a
verdict), then skipped (no matching files), then passes.
The linter pass underneath is the same /diagnostics
output, so one combined run answers both
“is the code well-formed?” and “does it
hang together?”
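
The report ordering amounts to a stable sort on status. In this sketch, `FAILED` and `PASSED` are the labels shown in the sample output; `ERROR` and `SKIPPED` are assumed labels for the other two groups, and `order_results` is a hypothetical name.

```python
# Rank per the documented order: failures, errors, skipped, passes.
# "ERROR" and "SKIPPED" are assumed label spellings.
STATUS_ORDER = {"FAILED": 0, "ERROR": 1, "SKIPPED": 2, "PASSED": 3}


def order_results(results):
    """Sort (check_name, status) pairs; ties keep their original order."""
    return sorted(results, key=lambda pair: STATUS_ORDER[pair[1]])
```

Python's `sorted` is stable, so Checks with the same status stay in load order.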
What ships by default
Three Checks are bundled:
- charm-readme-coherence: README claims match the actual charm shape (relation interfaces, config options, supported actions, listed endpoints).
- action-ergonomics: Action names follow the verb-noun convention; descriptions are non-empty and do not merely restate the name; required parameters carry sensible defaults where possible.
- relation-data-hygiene: Relation data writes use stable keys, validate on read, and don't leak secrets that should live in juju secrets.
Add your own Check
Checks are loaded from three layered locations (later wins on name conflict):
- Bundled defaults shipped with Cantrip.
- ~/.config/cantrip/checks/*.md: user scope.
- <charm>/.cantrip/checks/*.md: repo scope.
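
The "later wins" rule can be sketched as a dictionary that each layer overwrites in turn. Two assumptions are made here: the function name `load_checks` is hypothetical, and a Check's name is taken from its file stem, whereas the real loader may read the name from the frontmatter instead.

```python
from pathlib import Path


def load_checks(layers):
    """Merge Check files from ordered directories; later layers override.

    Assumption: the Check name is the file stem (e.g. action-ergonomics.md).
    """
    checks = {}
    for layer in layers:
        directory = Path(layer)
        if not directory.is_dir():
            continue  # a missing layer simply contributes nothing
        for path in sorted(directory.glob("*.md")):
            checks[path.stem] = path  # overwrite any earlier layer's entry
    return checks
```

Passing the layers in the order bundled, user, repo means a repo-scoped `action-ergonomics.md` would replace both the user-scoped and the bundled one.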
Each Check is YAML frontmatter plus a markdown body. See
design/CHECKS.md
for the schema. Rough boundary against the linters:
charmlint is the right home for AST/regex rules;
/review is the right home for “an
experienced human would notice this is off but you can't
write it as a regex.”
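
Separating the frontmatter from the markdown body takes only a few lines. The sketch below assumes the common `---` delimiter convention and leaves the YAML unparsed; `split_frontmatter` is a hypothetical name, and the `name: demo` key in the usage note is a placeholder, not a field from the real schema.

```python
def split_frontmatter(text):
    """Return (frontmatter, body) for a '---'-delimited Check file.

    Assumption: the file opens with '---' and the block closes with '---'.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).strip()
            return frontmatter, body
    raise ValueError("missing closing '---'")
```

For example, `split_frontmatter("---\nname: demo\n---\n\nBody text\n")` separates the placeholder key from the prompt body; the frontmatter string would then go to a YAML parser.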
A typical workflow
- Mid-build: let the autonomous loop's pre-turn diagnostics carry the linters. You don't run anything by hand.
- Before asking for next-step changes: run /diagnostics if you've been editing files outside the agent (use --refresh to bust the 30-second cache).
- Before pushing or opening a PR: run /review. This is the run that catches cross-cutting issues a linter cannot, such as README drift, action ergonomics, and relation-data hygiene, plus any repo-scoped Checks you've added.
Caveats
- /review calls the LLM once per loaded Check. Costs scale with the number of Checks; a small repo with the three defaults is a few cents per run, but a ten-Check setup adds up across a session. The /cost command makes the spend visible.
- Prompt-based Checks are non-deterministic. Treat a FAILED verdict as “a competent reviewer flagged this”: worth investigating, not always worth acting on. The evidence field tells you which lines drove the verdict.
- Diagnostics results are cached for 30 seconds. In an agent loop that edits files repeatedly within a single turn, a second /diagnostics can return cached results that are already stale, even if only by milliseconds. Use --refresh when correctness matters more than latency.