Response schemas

Cantrip ships JSON schemas for the recurring structured-output shapes its callers consume. Pass one to `complete_structured()` (or directly to `provider.complete(response_schema=…)`) when you want a parseable reply rather than free text.

When to use a schema

Pass a response_schema when the caller will parse the reply rather than show it to the user. Recipes consume planner briefings. The oracle returns a structured second opinion that downstream code feeds back into decision points. Acceptance reports populate test matrices. Free-form prose stays untyped — schemas are for data, not conversation.

Schemas are plain dicts matching JSON Schema draft 2020-12 — the same surface every supported provider already accepts. No Pydantic, no attrs, no DSL. Pass a built-in or hand-roll your own.

Built-in schemas

Importable from cantrip.llm.schemas. Resolve by name through BUILTIN_SCHEMAS for caller paths driven by config (recipes, settings).

PLANNER_BRIEFING
Output of a planner LLM call: a list of work-queue tasks each with a title, a category (one of research / build / deploy / test / debug / infra / confirm — mirrors cantrip.agent.queue.TaskCategory), an optional description and dependency list.
ORACLE_ANSWER
Shape of an oracle reply when the caller wants more than free-form prose: answer (required), optional confidence in [0, 1], and lists of caveats and references.
CHECK_RESULT
Output of a prompt-based "Check" — the LLM evaluates a named rule against the active charm and returns status: pass | fail, a message, and optionally severity, evidence, and a suggested_fix.
ACCEPTANCE_REPORT
Acceptance-test report — what the agent produces after exercising a deployed charm. app and overall_status (pass | fail | partial) are required; coverage records which areas were exercised; findings is a list of issues to surface.

Provider matrix

Native enforcement is an optimisation — Cantrip-side validation runs regardless, so providers without native support still satisfy the contract via the corrective-retry path.

Provider Native enforcement Wire mechanism
Gemini Yes response_mime_type=application/json + response_schema
OpenAI-compatible (vLLM, Fireworks, OpenRouter, OpenCode Zen, inference-snap) Yes (where the backend supports it) response_format: {type: json_schema, json_schema: {...}}
Anthropic Claude No Argument accepted but ignored; relies on Cantrip-side validation.

The provider.supports_response_schema property distinguishes the two so callers that require native enforcement can short-circuit early.

Calling complete_structured

The high-level entry point lives at cantrip.llm.structured.complete_structured. It calls the provider with the schema, parses the reply, validates it, and returns a dict — or raises StructuredOutputError on unrecoverable failure.

from cantrip.llm.schemas import ORACLE_ANSWER
from cantrip.llm.structured import complete_structured

answer = await complete_structured(
    provider,
    messages=[
        Message(role=Role.SYSTEM, content="You are an architecture oracle."),
        Message(role=Role.USER, content="Should this charm use Pebble or systemd?"),
    ],
    schema=ORACLE_ANSWER,
)
print(answer["answer"])           # always a string
print(answer.get("confidence"))   # optional, in [0, 1] if present

For the lower-level path, pass response_schema=… directly to provider.complete() and validate manually with cantrip.llm.structured.validate_against_schema. Use this when you need provider-specific kwargs (custom thinking_budget, tool choices) the helper doesn't expose.

Validation and retry

Validation strips wrapping ```json code fences, parses the result with json.loads, and runs jsonschema.validate against the schema. Failures raise StructuredOutputError carrying the raw text, the schema, and the underlying parser or validator error.

complete_structured retries once by default (retries=1). On failure it appends the malformed reply as an assistant turn and a corrective USER turn that quotes the schema and the validation error, asking the model to emit valid JSON. Set retries=0 for one-shot calls (CI), or higher when burning extra tokens to coax a recalcitrant model is acceptable.

When all retries are exhausted, the last error is raised so the caller can surface the most recent malformed output to the user — earlier attempts are discarded.