<model-selector>
  <instruction>
    When this file is referenced with @model-selector.txt, you MUST:
    1. Execute the requested task in full — write the roadmap, plan, or
       whatever the user asked for
    2. Read docs/user-context.md to learn the user-specific subscription
       state, API keys, and platform preference order. These are the
       inputs the access-selection step consumes; without them the
       PLATFORM and THINKING fields in the output cannot be filled
       truthfully
    3. For every prompt or step you write as part of that task, insert a
       model selection block immediately before it using the criteria in
       this file (objective, pricing, Max Mode, thinking, benchmark
       sources, task categories, model options, access methods,
       selection algorithm, access selection, and conversation
       principles)
    4. The selection block is part of the task output, not a replacement
       for it
  </instruction>

  <usage>
    Reference this file alongside any task. The AI performs the task and
    annotates each prompt it writes with the appropriate model selection
    block drawn from the criteria in this file. Each block reports
    MODEL, PLATFORM, MAX MODE, THINKING, ORCHESTRATION, CONVERSATION,
    and RATIONALE: model choice comes from `<selection-algorithm>` and
    platform (access method) choice from `<access-selection>`, with the
    user-specific subscription and API-key state read from
    docs/user-context.md.
  </usage>

  <objective>
    DEFAULT POSTURE — Maximize quality. Recommend the highest-quality model
    whose strengths match the prompt's task type, regardless of cost. If Opus
    4.7 in Max Mode is the most appropriate fit for a given prompt, recommend
    Opus 4.7 in Max Mode.

    SECONDARY (tie-breaker only): When two or more models are tied in
    expected quality for the prompt's task type, recommend the one with the
    lower output price per 1M tokens.

    Under this default posture quality wins; cost only resolves true ties —
    never near-ties, never "close enough." The user is paying for access to
    every tier and expects the best outcome for each prompt.

    BUDGET-PRIORITY OVERRIDE: When the appended user-context declares a
    "Budget priority and speed posture", THAT posture is the user's explicit
    quality-vs-cost instruction for the request and OVERRIDES the default
    posture above. Honor whichever it states:
    - `cheap` (Cost) — minimize the user's REAL cost, not list price. If the
      user-context's "Active subscriptions" fund a capable model at $0 and it
      is an acceptable-quality fit, KEEP that model and lower its reasoning
      effort / thinking to the lowest level that clears the task — do NOT
      switch to a cheaper-tier model the user pays per-token for (that raises
      their real cost). Only when no $0-funded model is adequate, drop to the
      cheapest model that can do the task. This is the ONE case where the
      user's funding may change WHICH model is chosen (a deliberate, scoped
      exception to the `<access-selection>` rule that funding changes only the
      platform) — because abandoning a $0 model to lower a list price the user
      does not pay is not a real saving.
    - `balanced` (Balanced) — recommend the best value: prefer a $0-funded
      model when competitive, prefer the cheaper model when two are CLOSE in
      expected quality (not only on an exact tie), and use a sensible (not
      maxed-out) effort level; reserve the most expensive tiers for tasks that
      require them.
    - `best` (Quality) — the default quality-first posture above, at the
      highest USEFUL reasoning effort (`xhigh` / `max` where supported).
    Across the three priorities, when the chosen model is HELD (e.g. a
    $0-funded model fit for the task), reasoning effort / thinking is the
    cost-vs-quality axis: Cost = lowest-adequate, Balanced = sensible, Quality
    = highest-useful. When no posture is declared (legacy / direct callers),
    the default posture governs.
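
    The `cheap` branch above can be sketched in Python as follows. This is
    a minimal illustration, not part of the selector's contract: the
    function name `pick_model_cheap`, the dict keys, and the assumption
    that candidates arrive ordered best-fit first are all hypothetical.

```python
# Hypothetical sketch of the `cheap` posture's model choice.
def pick_model_cheap(candidates):
    """Pick a model id under the `cheap` posture.

    candidates: list of dicts with hypothetical keys 'id',
    'funded_at_zero' (bool), 'adequate' (bool), 'output_price' (float),
    assumed to be ordered best-fit first.
    """
    adequate = [c for c in candidates if c["adequate"]]
    if not adequate:
        return None  # nothing clears the task; the caller must handle this
    funded = [c for c in adequate if c["funded_at_zero"]]
    if funded:
        # Keep the $0-funded model; its effort level is lowered separately.
        return funded[0]["id"]
    # No $0-funded model is adequate: take the cheapest model that can do it.
    return min(adequate, key=lambda c: c["output_price"])["id"]
```

    With a subscription-funded Opus 4.7 and a pay-per-token GPT-5.3 Codex
    both adequate, the sketch keeps Opus 4.7; list price only decides once
    no $0-funded model qualifies.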
  </objective>

  <pricing-context>
    All prices below are per 1M tokens, sourced from Cursor's published API
    pricing. Use these prices solely as a tie-breaker after the quality
    decision is made.

    Cost interpretation:
    - Output price is the dominant cost driver for code generation, full
      implementations, comprehensive plans, and any long-form response.
    - Input price matters most when feeding large context — long files,
      repo-wide search results, multimodal payloads, or sprawling document
      corpora.
    - Cache-read price (typically ~10% of input) only matters for sustained
      sessions with reusable system prompts or persistent context.
    - Tier placement is based solely on output price per 1M tokens:
      Low &lt; $10, Medium $10–$14.99, High $15–$24.99, Very High ≥ $25.
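
    The tier thresholds above reduce to a short lookup. The sketch below
    is illustrative only; the function name `cost_tier` is hypothetical,
    and the returned strings mirror the `cost` attribute values used on
    the `tier` elements in this file.

```python
# Hypothetical helper: tier placement from output price per 1M tokens (USD).
def cost_tier(output_price_per_1m):
    """Return the cost tier for an output price, checked from the top down."""
    if output_price_per_1m >= 25:
        return "very-high"
    if output_price_per_1m >= 15:
        return "high"
    if output_price_per_1m >= 10:
        return "medium"
    return "low"
```

    For example, a $14.00 output price lands in `medium` and $15.00 in
    `high`.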

    Routing meta-models (Cursor's "Auto" / "Premium" modes; analogous
    routers from other providers) are NOT enumerated in `<model-options>`.
    The catalog tracks fixed-engine models only — a routing model's
    benchmarks, jurisdiction, and cost are by construction unknowable in
    advance, which conflicts with the selector's per-model tier ratings
    and the `<jurisdiction-context>` filter. Users who want routing
    behavior should pick a specific fixed engine directly.

    The per-token rates in `<model-options>` are only one dimension of cost. Access
    methods (see `<access-methods>`) bundle the same models behind
    subscriptions and shared token pools where the marginal cost per
    call is effectively $0 until the subscription budget is exhausted.
    docs/model-tier-cost-scale.md carries a "Subscription Tiers" section
    covering Cursor Pro/Ultra, claude.ai Max, ChatGPT Plus/Pro, Gemini
    Advanced, and similar flat-monthly plans. The `<access-selection>`
    step picks the cheapest effective path for the user's specific
    subscription state — burning sunk-cost subscription budget before
    pay-per-token spend is the default posture. This optimizes only HOW
    a model is reached (the PLATFORM), never WHICH model is chosen: a
    best-fit model the user does not yet fund is still recommended, with
    its cheapest reachable path and the required spend disclosed — never
    swapped for a cheaper-to-reach one (see `<access-selection>`).
  </pricing-context>

  <max-mode-context>
    Max Mode extends a model's context window to the maximum it supports,
    giving the model deeper codebase understanding and producing better
    results on complex tasks.

    Billing:
    - Token-based pricing at the model's API rate; consumes usage faster
      than the default context window.
    - Individual plans: billed at the model's API rate (no surcharge).
    - Teams plans: requests against fixed-model surfaces include the
      Cursor Token Rate.
    - Legacy request-based plans: Max Mode adds a 20% surcharge.

    Enable Max Mode when ANY of the following hold:
    - Overall complexity from `<selection-algorithm>` Step 2 is High.
    - Primary or secondary task category is `long-context` (large repo,
      multi-file ingestion, full-codebase reasoning).
    - Task is a `planning` prompt with many interacting concerns or
      cross-cutting architectural decisions.
    - Prompt explicitly requires extended reasoning, deep multi-step
      analysis, or chain-of-thought across many files.

    Disable Max Mode for direct, bounded prompts — single-file edits,
    isolated bug fixes, well-defined refactors, simple questions, or any
    task where default context comfortably fits the inputs.

    Max Mode is a Cursor-surface concept. Access methods outside Cursor
    (Anthropic API, Claude Code, Codex, Google API, Gemini CLI,
    direct provider APIs) do not expose a Max Mode toggle; they either
    accept the model's full native context window by default or expose
    a different long-context surface. When the chosen PLATFORM is not
    a Cursor surface, MAX MODE in the output should read `Off` (or
    `N/A` if the model offers no equivalent extended-context mode).
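
    Read together, the enable, disable, and non-Cursor rules above can be
    sketched as one function. This is a hedged illustration: the name
    `max_mode` and every parameter are hypothetical, and the boolean flags
    stand in for judgments the selector makes elsewhere.

```python
# Hypothetical sketch of the MAX MODE field decision.
def max_mode(platform_is_cursor, complexity, categories,
             cross_cutting_planning=False, needs_extended_reasoning=False,
             model_has_extended_context=True):
    """Return "On", "Off", or "N/A" for the MAX MODE output field.

    complexity: "Low", "Medium", or "High".
    categories: primary and secondary task categories, e.g. ["coding"].
    """
    if not platform_is_cursor:
        # Max Mode is a Cursor-surface concept; other platforms never show On.
        return "Off" if model_has_extended_context else "N/A"
    enable = (
        complexity == "High"
        or "long-context" in categories
        or cross_cutting_planning
        or needs_extended_reasoning
    )
    return "On" if enable else "Off"
```

    Any single enabling condition is enough on a Cursor surface; a bounded
    single-file edit at Low complexity yields `Off`.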
  </max-mode-context>

  <thinking-context>
    Thinking (also called extended thinking, reasoning effort, or
    thinking budget) lets a model spend internal reasoning tokens before
    producing its visible response. Providers expose the toggle
    differently:

    - Claude (Anthropic API, Claude Code, claude.ai): per-session
      `/effort` level with documented values `low`, `medium`, `high`,
      `xhigh`, `max` (shown in the UI as Low / Medium / High / Extra
      High / Max). Not every model supports every level — `xhigh` and
      `max` apply only to the models whose documented row lists them
      (e.g. Opus 4.7, Opus 4.8, and Fable 5 expose the full
      low/medium/high/xhigh/max range; Opus 4.6 and Sonnet 4.6 top out
      at `max` without an `xhigh` step). An effort level a model does
      not support falls back to the highest supported level at or
       below it. Extended thinking can also be switched with Option+T /
       Alt+T or `alwaysThinkingEnabled`, and disabled with
       `MAX_THINKING_TOKENS=0` (Fable 5 cannot disable extended thinking).
    - OpenAI (Codex, OpenAI API, ChatGPT advanced controls):
      reasoning-effort knob — `minimal`, `low`, `medium`, `high`,
      `xhigh` (the top "Extra High" tier; model-dependent). Higher
      effort spends more reasoning tokens before visible output.
    - Gemini (Google API, Gemini CLI): a discrete thinking-level
      knob — `low`, `medium`, `high` — across both the 3.x and 2.5
      model generations (not every model supports every level; e.g.
      Gemini 3 Pro is low/high only). Thinking can be turned off on
      models that allow it (e.g. Gemini 2.5 Flash-Lite defaults off);
      Google retired the numeric `thinkingBudget` from the docs in
      favor of these levels.
    - DeepSeek (DeepSeek API): a thinking toggle (`enabled` /
      `disabled`, default `enabled`) plus a reasoning-effort enum —
      `high`, `max` (default `high`; `max` for some complex agentic
      requests). DeepSeek's effort scale has no `low` / `medium` tier
      (for compatibility the API accepts `low` / `medium`, mapping
      both to `high`, and `xhigh`, mapping it to `max`). A consumer
      DeepThink app toggle exposes only thinking on/off, not the
      effort enum.
    - Mistral (Mistral API / La Plateforme): a `reasoning_effort`
      parameter on the unified models (Mistral Small 4 and Medium 3.5).
      The docs surface `none` (reasoning off / fast) and `high` (full
      step-by-step reasoning); Mistral's API is OpenAI-compatible, so
      intermediate `low` / `medium` may also be accepted. The former
      standalone reasoning models (Magistral) are folded into this dial;
      Codestral and the open-weight Large 3 do not expose it.
    - Cursor: usually inherits the underlying model's thinking
      behavior but does not expose the toggle in the IDE surface
      (true in both Composer mode and Chat mode).
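
    The Claude fallback rule in the list above (an unsupported effort
    level falls back to the highest supported level at or below it) can be
    sketched as follows. The function name `resolve_effort` and the
    `supported` argument are hypothetical; applying the same rule to other
    providers would be an assumption this file does not make.

```python
# Hypothetical sketch of the Claude effort fallback rule.
EFFORT_ORDER = ["low", "medium", "high", "xhigh", "max"]  # ascending

def resolve_effort(requested, supported):
    """Return the highest supported level at or below `requested`."""
    rank = EFFORT_ORDER.index(requested)
    at_or_below = [lvl for lvl in EFFORT_ORDER[:rank + 1] if lvl in supported]
    if not at_or_below:
        raise ValueError(f"no supported level at or below {requested!r}")
    return at_or_below[-1]
```

    On a model that exposes `low`, `medium`, `high`, and `max` but no
    `xhigh`, a request for `xhigh` resolves to `high`.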

    Output mapping (the THINKING field of the output format):
    `Off` / `Low` / `Medium` / `High` / `XHigh` / `Max` / `N/A`. Map
    provider-native scales onto this 7-state field:

    - Claude Code `/effort`: `low` → `Low`; `medium` → `Medium`;
      `high` → `High` (the docs' default effort on Opus 4.6, Opus 4.8,
      Fable 5, and Sonnet 4.6); `xhigh` → `XHigh` (the "Extra High"
      UI label, and the default for Opus 4.7); `max` → `Max` (the top
      "Max" UI level, ABOVE "Extra High") ONLY on models whose
      documented row exposes a `max` step above `xhigh` — Opus 4.7,
      Opus 4.8, and Fable 5. On models that reach `max` with NO `xhigh`
      step (Opus 4.6, Sonnet 4.6), `max` is their Extra-High top and
      maps to `XHigh`, not `Max`. Extended thinking disabled (Option+T
      / Alt+T, `MAX_THINKING_TOKENS=0`, or `alwaysThinkingEnabled`
      false) → `Off`; Fable 5 cannot map to `Off`.
    - OpenAI `minimal` → `Off`; `low` → `Low`; `medium` → `Medium`;
      `high` → `High`; `xhigh` / `extra-high` (the high-reasoning
      Codex / GPT variant, e.g. `gpt-5.3-codex-high`) → `XHigh`.
    - Gemini thinking levels: `low` → `Low`; `medium` → `Medium`;
      `high` → `High` (Gemini tops out at `high` — no `xhigh` tier).
      Thinking turned off, on models that allow it (e.g. Gemini 2.5
      Flash-Lite), → `Off`.
    - DeepSeek: thinking `disabled` → `Off`; `enabled` + effort
      `high` → `High`; `enabled` + effort `max` → `XHigh` (DeepSeek has
      no `xhigh` step, so its `max` is the Extra-High top, not the
      above-`xhigh` cross-provider `Max`). DeepSeek effort has no
      `low` / `medium` tier, so no DeepSeek level maps to `Low` or
      `Medium`. A consumer DeepThink on/off toggle (no
      effort enum) maps `On` → `High` (default effort) and `Off` →
      `Off`.
    - Mistral: `reasoning_effort` `none` → `Off`; `high` → `High`
      (Mistral's documented scale tops out at `high`, so no `XHigh`).
      If the OpenAI-compatible `low` / `medium` values are accepted
      they map to `Low` / `Medium`. A Mistral model without the dial
      (Codestral, Large 3) maps to `N/A`.
    - `N/A` when the chosen access method does not expose a thinking
      toggle (e.g. Cursor — neither its Composer mode nor its Chat
      mode surfaces the dial), regardless of whether the underlying
      model supports one.
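
    The output mapping above is essentially a lookup table with one
    conditional for Claude's `max`. The sketch below is illustrative: the
    provider keys, the `"off"` and `"disabled"` labels for thinking turned
    off, and the `model_has_xhigh` flag are hypothetical names, not
    provider API values beyond those quoted in the list.

```python
# Hypothetical sketch of the provider-native to THINKING field mapping.
THINKING_MAP = {
    "openai": {"minimal": "Off", "low": "Low", "medium": "Medium",
               "high": "High", "xhigh": "XHigh"},
    "gemini": {"off": "Off", "low": "Low", "medium": "Medium", "high": "High"},
    "deepseek": {"disabled": "Off", "high": "High", "max": "XHigh"},
    "mistral": {"none": "Off", "low": "Low", "medium": "Medium", "high": "High"},
}

def thinking_field(provider, native, exposes_thinking=True, model_has_xhigh=False):
    """Map a provider-native effort value onto the 7-state THINKING field."""
    if not exposes_thinking:
        return "N/A"  # e.g. Cursor, regardless of the underlying model
    if provider == "claude":
        if native == "off":
            return "Off"
        if native == "max":
            # `max` is `Max` only when the model also has an `xhigh` step.
            return "Max" if model_has_xhigh else "XHigh"
        return {"low": "Low", "medium": "Medium",
                "high": "High", "xhigh": "XHigh"}[native]
    return THINKING_MAP[provider][native]
```

    The Fable 5 restriction (it cannot map to `Off`) is left to the
    caller in this sketch.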

    Two Claude Code controls that must NOT be conflated:
    - `ultracode` is a SESSION setting (set with `/effort ultracode` or
      `"ultracode": true`): it sends `xhigh` to the model AND has Claude
      orchestrate Dynamic Workflows for substantive tasks. It is the
      ORCHESTRATION `Ultracode` value (see `<orchestration-context>`),
      not a per-turn keyword and not an effort level in its own right.
    - `ultrathink` is a PER-TURN prompt keyword: include it anywhere in a
      single prompt for deeper reasoning on that turn only; it does NOT
      change the session effort level and does NOT orchestrate workflows.
      The phrases "think", "think hard", and "think more" are NOT
      recognized keywords.

    Decision rule (applied during `<access-selection>` Step E):
    - Overall complexity from `<selection-algorithm>` Step 2 Low →
      THINKING `Off`.
    - Overall complexity Medium → THINKING `Medium`.
    - Overall complexity High → THINKING `High`.
    - High complexity AND the prompt involves novel problem-solving,
      multi-step proof / verification, or chain-of-thought across
      many files (i.e., the conditions that would push
      `<selection-algorithm>` Step 3 to require S-tier in PRIMARY) →
      THINKING `XHigh`.
    - PRIMARY task category `planning` or `knowledge` with
      cross-cutting scope → bump THINKING up at least one level
      (`Off` → `Low`, `Low` → `Medium`, `Medium` → `High`, `High` →
      `XHigh`).
    - Ceiling — `Max`: the top output level, used ONLY on models whose
      `<model-options>` row exposes a `max` step ABOVE `xhigh` (Opus
      4.7, Opus 4.8, Fable 5). On such a model, prefer `Max` over
      `XHigh` when the appended user-context declares a Quality (`best`)
      budget posture, or for the most demanding tasks (the `XHigh`
      conditions above taken to their limit). Under a Cost (`cheap`)
      budget posture, never emit `Max` and do not exceed the lowest
      level that clears the task. Absent any budget posture, `XHigh`
      stays the default ceiling and `Max` is reserved for the genuinely
      hardest tasks on max-capable models. Never emit `Max` for a model
      whose row lacks a `max`-above-`xhigh` step.
    - Chosen access method's `exposes-thinking` attribute is `no` →
      THINKING `N/A`, overriding the above.
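
    A minimal sketch of the decision rule above, assuming no budget
    posture is declared so that `XHigh` is the ceiling and the `Max` rule
    is out of scope. The function name `thinking_level` and its boolean
    flags are hypothetical, and the bump is applied as exactly one level
    although the rule allows more.

```python
# Hypothetical sketch of the THINKING decision rule (no budget posture).
SCALE = ["Off", "Low", "Medium", "High", "XHigh"]  # ascending, XHigh ceiling

def thinking_level(complexity, exposes_thinking=True,
                   novel_or_multistep=False, cross_cutting_bump=False):
    """Return the THINKING field for a prompt.

    complexity: "Low", "Medium", or "High".
    novel_or_multistep: the conditions that would require S-tier in PRIMARY.
    cross_cutting_bump: PRIMARY is planning/knowledge with cross-cutting scope.
    """
    if not exposes_thinking:
        return "N/A"  # overrides every other rule
    level = {"Low": "Off", "Medium": "Medium", "High": "High"}[complexity]
    if complexity == "High" and novel_or_multistep:
        level = "XHigh"
    if cross_cutting_bump:
        # Bump one level, capped at the XHigh ceiling.
        level = SCALE[min(SCALE.index(level) + 1, len(SCALE) - 1)]
    return level
```

    A Low-complexity planning prompt with cross-cutting scope moves from
    `Off` to `Low`; a High-complexity one moves from `High` to `XHigh`.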

    Thinking and Max Mode are orthogonal: a Cursor call may have
    Max Mode On and THINKING `N/A` (Cursor does not expose the
    thinking toggle); a Claude Code call may have Max Mode Off and
    THINKING `High` (Anthropic's surface exposes thinking, not Max
    Mode).
  </thinking-context>

  <orchestration-context>
    Orchestration (Claude Code's Dynamic Workflows feature, shipped
    with Opus 4.8 in May 2026) lets the model fan a single prompt
    out across parallel subagents from a script Claude writes and
    the runtime executes. A run can use up to 1,000 agents, with 16
    running concurrently.
    Intermediate results live in script variables, not the model's
    context window. Workflows can adversarially cross-check
    findings before reporting.

    Providers expose orchestration differently:

    - Claude Code (CLI, IDE extension): per-prompt opt-in by
      including the word "workflow" in the prompt; session-wide
      opt-in via `/effort ultracode`. Ultracode pins reasoning at
      `xhigh` AND auto-authors a workflow for every substantive
      task in the session.
    - All other surfaces (Cursor, Codex, Gemini CLI, claude.ai
      web, ChatGPT app, direct APIs): no equivalent built-in
      orchestration primitive at time of writing.

    Output mapping (the ORCHESTRATION field of the output format):
    `None` / `PerPrompt` / `Ultracode` / `N/A`.

    - Claude Code default → `None` (single-agent turn-by-turn).
    - Claude Code with `workflow` keyword on this prompt only →
      `PerPrompt`.
    - Claude Code with `/effort ultracode` session-wide → `Ultracode`.
    - Any non-Claude-Code platform → `N/A`.

    Decision rule (applied during `<access-selection>`):
    - PRIMARY task category `planning` with cross-cutting scope AND
      overall complexity High AND chosen access method is Claude
      Code → recommend `ORCHESTRATION: Ultracode`.
    - PRIMARY task category `long-context` with multi-source
      cross-checking required (e.g., codebase audit, migration
      sweep, cited research) AND chosen access method is Claude
      Code → recommend `ORCHESTRATION: Ultracode`.
    - Single well-scoped deliverable (one file, one bug fix, one
      refactor) → `None` even on Claude Code.
    - Chosen access method's `exposes-orchestration` attribute is
      `no` → `N/A` regardless of the above.
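
    The decision rule above can be sketched as a small function. This is
    an illustration under stated assumptions: the name `orchestration`,
    the `"claude-code"` platform id, and the boolean flags are
    hypothetical; checking the single-deliverable rule before the
    `Ultracode` rules is an ordering choice this file does not spell out;
    and `PerPrompt` is omitted because the rule never recommends it.

```python
# Hypothetical sketch of the ORCHESTRATION decision rule.
def orchestration(platform, primary, complexity, cross_cutting=False,
                  multi_source_cross_check=False, single_deliverable=False,
                  exposes_orchestration=True):
    """Return "None", "Ultracode", or "N/A" for the ORCHESTRATION field."""
    if platform != "claude-code" or not exposes_orchestration:
        return "N/A"
    if single_deliverable:
        return "None"  # one file, one bug fix, one refactor
    if primary == "planning" and cross_cutting and complexity == "High":
        return "Ultracode"
    if primary == "long-context" and multi_source_cross_check:
        return "Ultracode"
    return "None"  # Claude Code default: single-agent, turn-by-turn
```

    A cross-cutting High-complexity planning prompt on Claude Code yields
    `Ultracode`; the same prompt on Cursor yields `N/A`.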

    Cost note: Ultracode lifts the per-prompt token-cost ceiling
    ("token cost is not a constraint" per Anthropic's built-in
    framing). On claude.ai Max ($200/mo) the marginal cost per call
    is $0, but the session budget burns 10-100x faster than at
    THINKING `High`. Recommend Ultracode as a deliberate per-step
    opt-in, not as a default, and pair it with a
    session-budget-awareness clause in the rationale.

    Orchestration, thinking, and Max Mode are three orthogonal
    axes. Cursor + Max Mode + THINKING N/A + ORCHESTRATION N/A
    is valid. Claude Code + THINKING XHigh + ORCHESTRATION
    Ultracode is valid. Claude Code + THINKING XHigh +
    ORCHESTRATION None is also valid (Extra High effort, no
    auto-workflow).
  </orchestration-context>

  <jurisdiction-context>
    Some users restrict which model providers are acceptable based on
    the provider's HQ jurisdiction — typically driven by data-
    sovereignty, vendor-trust, regulatory-compliance, or export-control
    concerns. The selector supports this via the `jurisdiction`
    attribute on every `<model>` element and the
    `provider-jurisdiction` attribute on every `<method>` element,
    combined with an allowed-jurisdictions list the user supplies in
    `docs/user-context.md` (or the SaaS-side `profiles` row).

    Valid jurisdiction codes (ISO-3166-1 alpha-2-style, lowercase):

    - `us` — United States. Today: Anthropic, OpenAI, Google, xAI,
      Cursor.
    - `eu` — European Union member state. Today: Mistral (La
      Plateforme). `eu` is in the default allowed list, so EU-operator
      models surface for any user holding the relevant provider key.
    - `uk` — United Kingdom.
    - `ca` — Canada.
    - `au` — Australia.
    - `jp` — Japan.
    - `kr` — South Korea.
    - `cn` — China. Today: Moonshot (Kimi), DeepSeek. Future
      Chinese-HQ entrants inherit this code.
    - `ru` — Russia. (No models on Cursor's pricing page from this
      jurisdiction at time of writing.)
    - `unknown` — provider HQ has not been editorially verified yet.
      Newly-auto-added models receive this code until the maintainer
      fills it in; the auto-add rule in `update/prompt.md` emits a
      warning so these don't ship silently.

    Default allowed list (assumed when `docs/user-context.md` carries
    no `<allowed-jurisdictions>` section):
    `[us, eu, uk, ca, au, jp, kr]`, a "Five Eyes plus closely aligned
    democracies" baseline. Users add or remove entries to widen or
    narrow the list.
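
    A minimal sketch of the jurisdiction filter, assuming each model is
    represented as a dict carrying its `jurisdiction` attribute. The
    function name `filter_by_jurisdiction` and the placeholder model ids
    in the usage note are hypothetical.

```python
# Hypothetical sketch of the allowed-jurisdictions filter.
DEFAULT_ALLOWED = ["us", "eu", "uk", "ca", "au", "jp", "kr"]

def filter_by_jurisdiction(models, allowed=None):
    """Keep models whose operator jurisdiction is in the allowed list.

    allowed=None means the user supplied no list, so the default applies.
    """
    allowed = DEFAULT_ALLOWED if allowed is None else allowed
    return [m for m in models if m["jurisdiction"] in allowed]
```

    Under the default list a `cn` or `unknown` model is filtered out;
    passing `DEFAULT_ALLOWED + ["cn"]` widens the filter to include the
    former.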

    The base weights of a model and the operator of a model may
    carry different jurisdictions. The `jurisdiction` attribute
    reflects the OPERATOR — the entity whose terms govern the data
    flow when a call is placed. Composer 2 / Composer 2.5 are
    `us`-jurisdiction because Cursor operates them, even though
    their base weights derive from Moonshot's Kimi K2 series; the
    data path is governed by Cursor's privacy policy and US law.
    When base-weights origin matters for a user's compliance
    posture, the `best-for` attribute discloses the lineage so the
    user can decide whether to widen the filter further.

    Routing meta-models (e.g., Cursor's "Auto" and "Premium" modes;
    OpenRouter-style routers; any "router-of-routers") are NOT
    enumerated in `<model-options>` precisely because their routing
    is opaque — the selector cannot guarantee a specific call's
    jurisdiction without knowing the routed engine, and the routing
    decision is the routing provider's, not the user's. As of
    2026-05-21 roadmodel exposes only fixed-engine models. Users
    who want routing behavior should pick a specific fixed engine
    directly and accept that the underlying provider may pool-route
    among models of the same family.
  </jurisdiction-context>

  <availability-context>
    A model can be present in `<model-options>` for catalog completeness
    (pricing, benchmarks, and capability reference) yet be temporarily
    UNAVAILABLE to recommend — its provider has restricted, withdrawn,
    waitlisted, or otherwise blocked end-user access. Recommending such a
    model is not actionable: the user cannot run it however strong its tier
    ratings are.

    Authoritative source — the runtime override. Live availability is
    maintained out of band by a daily probe + AI web-search verification and
    delivered at request time as a runtime availability-override block. When
    that block is present it is authoritative: it lists the complete current
    unavailable set, and a catalogued model absent from it is available — even
    if it appears in the fallback list below. The list below is only a
    cold-start fallback, applied when no runtime override is supplied (the
    availability service was unreachable); it is intentionally conservative so
    an outage fails closed rather than recommending a possibly-restricted model.

    Cold-start fallback — treat as unavailable ONLY when no runtime override is
    present (enforced at Step 0a of `<selection-algorithm>`):

    - `claude-fable-5` (Fable 5) — access restricted by Anthropic under an
      export-control directive as of 2026-06-12. The runtime override lifts
      this automatically once verification confirms access is restored; this
      fallback line is the fail-closed default for when the availability
      service can't be reached, and does not itself need editing.

    When an unavailable model would otherwise have been the best fit, return
    the next-best AVAILABLE model and disclose the substitution in the
    RATIONALE, exactly as the `<jurisdiction-context>` filter does.
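
    The precedence between the runtime override and the cold-start
    fallback can be sketched as follows. This is illustrative only: the
    function names and the convention that `runtime_override=None` means
    "no override block was supplied" are assumptions, not the service's
    actual interface.

```python
# Hypothetical sketch of availability filtering with a cold-start fallback.
COLD_START_UNAVAILABLE = {"claude-fable-5"}

def unavailable_set(runtime_override=None):
    """Return the ids to treat as unavailable.

    A supplied override (even an empty one) is authoritative; the
    fallback applies only when no override is present.
    """
    if runtime_override is not None:
        return set(runtime_override)
    return set(COLD_START_UNAVAILABLE)

def first_available(ranked_ids, runtime_override=None):
    """Return (model_id, substituted) for a best-first ranked list."""
    blocked = unavailable_set(runtime_override)
    for position, model_id in enumerate(ranked_ids):
        if model_id not in blocked:
            # substituted is True when a better-ranked model was skipped,
            # which the RATIONALE should then disclose.
            return model_id, position > 0
    return None, False
```

    With no override, a ranking of `claude-fable-5` then `opus-4.8`
    returns `opus-4.8` with the substitution flag set; with an empty
    override, `claude-fable-5` is returned unchanged.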
  </availability-context>

  <benchmark-sources>
    Authoritative LLM leaderboards the AI may cite when justifying a model
    recommendation. When reasoning about a model's strength in a task
    category, ground the rationale in one of these sources by name.

    - LMArena — human-preference Elo across general chat (chatbot-arena.com)
    - Artificial Analysis Intelligence Index — composite of 10 evaluations
      including GPQA Diamond, Humanity's Last Exam, SciCode,
      Terminal-Bench Hard, and AA-Omniscience
    - Aider polyglot — coding across C++, Go, Java, JavaScript, Python, Rust
    - SWE-bench Verified — real GitHub issues, 500-instance human-filtered
      subset; gold standard for software-engineering capability
    - LiveCodeBench — contamination-free coding benchmark with rolling
      problems from LeetCode / AtCoder / Codeforces; complements
      SWE-bench by measuring algorithmic problem-solving on items the
      models could not have trained on
    - τ²-bench — agentic / tool-use benchmark with a real tool–agent–user
      loop across airline, retail, and banking domains (Sierra Research)
    - LiveBench — contamination-resistant multi-domain benchmark
    - Terminal-Bench 2.0 — terminal and agent task execution
    - GPQA Diamond — graduate-level science reasoning
    - AIME — advanced math olympiad problems
    - MMMU — multimodal university-level understanding
    - HLE (Humanity's Last Exam) — frontier-difficulty general intelligence
    - CursorBench — Cursor's proprietary benchmark built from real coding
      sessions with terse prompts and multi-file solutions
  </benchmark-sources>

  <task-categories>
    Classify every prompt into one primary category from this list. If the
    prompt spans two categories, list both and use the more demanding one
    as primary; the other becomes the secondary category for tie-breaking.

    - coding — implementation, debugging, refactoring, multi-file edits,
      writing tests, fixing build/lint errors
    - planning — architecture decisions, design docs, multi-step plans,
      ambiguity resolution, trade-off analysis, roadmap construction
    - agentic — autonomous tool use, terminal commands, long-running
      multi-step execution, end-to-end agent loops
    - multimodal — image, video, audio, or screenshot understanding
      alongside text or code
    - long-context — large repo or file ingestion, codebase-wide reasoning,
      multi-document synthesis, sustained sessions with persistent context
    - knowledge — domain expertise, factual recall, cross-domain accuracy,
      grounded research, low-hallucination requirements
    - speed — latency-sensitive completions, high-volume routine work,
      autocomplete-style tasks where wall-clock time dominates utility
  </task-categories>

  <model-options>
    Each model entry carries: pricing, S/A/B/C/D tier ratings across the
    seven task categories, headline benchmark numbers grounded in the
    sources above, and a free-text best-for description.

    Tier ratings:
    - S — top-1 or top-2 globally in this category
    - A — strong, reliable, near-frontier
    - B — competent for the category
    - C — limited; usable only for trivial work in the category
    - D — not suited; do not select for this category
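
    As a rough illustration of how tier letters and the output-price
    tie-breaker interact, the sketch below ranks models for one category.
    It assumes that equal tier letters count as a quality tie, which is a
    simplification of the selection algorithm; the function name
    `best_for_category` and the dict layout are hypothetical, while the
    sample values are copied from the rows in this file.

```python
# Hypothetical sketch: tier comparison with an output-price tie-breaker.
TIER_RANK = {"S": 4, "A": 3, "B": 2, "C": 1, "D": 0}

def best_for_category(models, category):
    """Return the id of the highest-tier model, cheapest output on a tie."""
    eligible = [m for m in models if m["tiers"][category] != "D"]
    if not eligible:
        return None  # D means "do not select for this category"
    best = min(
        eligible,
        key=lambda m: (-TIER_RANK[m["tiers"][category]], m["output_price"]),
    )
    return best["id"]

MODELS = [
    {"id": "opus-4.7", "output_price": 25.00,
     "tiers": {"coding": "S", "agentic": "A", "speed": "D"}},
    {"id": "sonnet-4.6", "output_price": 15.00,
     "tiers": {"coding": "A", "agentic": "S", "speed": "B"}},
    {"id": "gpt-5.3-codex", "output_price": 14.00,
     "tiers": {"coding": "S", "agentic": "S", "speed": "B"}},
]
```

    Here `best_for_category(MODELS, "agentic")` returns `gpt-5.3-codex`:
    it ties Sonnet 4.6 at S and has the lower output price.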

    <tier cost="very-high">
      <model id="opus-4.7" name="Opus 4.7"
             input-price-per-1m="$5.00" output-price-per-1m="$25.00"
             jurisdiction="us"
             tier-coding="S" tier-planning="S" tier-agentic="A"
             tier-multimodal="A" tier-long-context="S" tier-knowledge="S"
             tier-speed="D"
             headline-benchmarks="AA Intelligence Index 53.5 (#2); LMArena Text #5 (Elo 1481.7); LMArena WebDev #5 (Elo 1556.9); AA-Omniscience 26.2 (#2)"
             pricing-notes="Hidden by default; Requires Max Mode on request-based plans; Up to 1M tokens in Max Mode at the same per-token rates (no long-context surcharge)"
             best-for="Deepest abstract and scientific reasoning, highest coherence on long unsupervised multi-step agent chains, best long-context recall at 1M tokens, 128K output ceiling for large single-shot deliverables, and novel problem-solving where high ambiguity demands creative judgment over pattern-matching" />
      <model id="opus-4.8" name="Opus 4.8"
             input-price-per-1m="$5.00" output-price-per-1m="$25.00"
             jurisdiction="us"
             tier-coding="S" tier-planning="S" tier-agentic="S"
             tier-multimodal="A" tier-long-context="S" tier-knowledge="S"
             tier-speed="D"
             headline-benchmarks="AA Intelligence Index 55.7 (#1); HLE 45.7%; Terminal-Bench Hard 58.3 (top-tier); τ²-bench retail pass_1 94.4%"
             pricing-notes="Requires Max Mode on request-based plans; Fast mode (`claude-opus-4-8-fast`) requires Max Mode; Fast mode is 3x lower per-token pricing than Opus 4.7 fast mode; Up to 1M tokens in Max Mode at the same per-token rates (no long-context surcharge)"
             best-for="Anthropic's Opus 4.7 successor at the same very-high tier pricing — placeholder tier ratings inherited from opus-4.7 pending benchmark coverage; the 3x cheaper fast-mode per-token rate (vs opus-4.7 fast mode) is the headline cost-structure change to surface in the next editorial pass" />
      <model id="claude-fable-5" name="Fable 5"
             input-price-per-1m="$10.00" output-price-per-1m="$50.00"
             jurisdiction="us"
             tier-coding="S" tier-planning="S" tier-agentic="S"
             tier-multimodal="S" tier-long-context="S" tier-knowledge="S"
             tier-speed="D"
             headline-benchmarks="AA Intelligence Index 59.9 (#1); HLE 53.3% (#1); Terminal-Bench Hard 62.9 (#1)"
             pricing-notes="Requires data retention approval for Enterprise customers, Teams and individual customers with Privacy Mode enabled; Anthropic stores agent input and output data for harm-prevention processes; this data is not used to train or improve Anthropic models or products; Requests that trip a security guardrail are automatically routed to Claude Opus; About 2x the cost of Claude Opus 4.8; Requires Max Mode on request-based plans"
             best-for="Anthropic's new top-of-line Fable family flagship (no predecessor) — S-tier across coding, planning, agentic, multimodal, long-context, and knowledge, leading HLE (53.3%) and Terminal-Bench Hard (62.9) with state-of-the-art vision and a 1M default context; about 2x the cost of Opus 4.8 and latency-slow (output ~64 tokens/s), so reserve for the hardest reasoning, agentic, and vision work where maximum capability outweighs cost and speed; security-guardrail trips auto-route to Opus. Tier profile sourced from the catalog cron's 2026-06-11 dry-run reconciliation against the live benchmark sources (τ²-bench retail not yet published for this model), pending editorial confirmation in the next refresh." />
      <model id="gpt-5.5" name="GPT-5.5"
             input-price-per-1m="$5.00" output-price-per-1m="$30.00"
             jurisdiction="us"
             tier-coding="S" tier-planning="S" tier-agentic="S"
             tier-multimodal="A" tier-long-context="A" tier-knowledge="A"
             tier-speed="D"
             headline-benchmarks="AA Intelligence Index 54.8 (#1); LMArena Text Elo 1463.3 (#18); HLE 44.3%; AA-Omniscience 20.1 (#3)"
             pricing-notes="Requires Max Mode on request-based plans; Agentic and reasoning capabilities; More token-efficient than GPT-5.4 on comparable tasks; Improved persistence on long-running tasks; Fast mode is available at higher rates; Long context (Max Mode) supports up to 1M tokens with 2x input pricing"
             best-for="OpenAI's most capable frontier model and highest-cost GPT offering, best suited for the most demanding reasoning, long-horizon planning, and tasks where maximum intelligence is required regardless of cost — strongest single model for hard coding, agentic execution, and reasoning, but verify factual claims due to elevated hallucination" />
    </tier>
    <tier cost="high">
      <model id="sonnet-4.6" name="Sonnet 4.6"
             input-price-per-1m="$3.00" output-price-per-1m="$15.00"
             jurisdiction="us"
             tier-coding="A" tier-planning="A" tier-agentic="S"
             tier-multimodal="A" tier-long-context="A" tier-knowledge="A"
             tier-speed="B"
             headline-benchmarks="AA Intelligence Index 47.2; LMArena Text Elo 1457.1 (#23); AA-Omniscience 12.4; top-ranked tool-calling on Anthropic lineage"
             pricing-notes="Requires Max Mode on request-based plans; Up to 1M tokens in Max Mode at the same per-token rates (no long-context surcharge)"
             best-for="Top-ranked tool-calling and agentic execution globally, near-Opus coding quality at 2-3x the speed, strong mathematical reasoning (89% MATH), and complex but well-structured tasks needing reliable high-throughput multi-step implementation" />
      <model id="gpt-5.4" name="GPT-5.4"
             input-price-per-1m="$2.50" output-price-per-1m="$15.00"
             jurisdiction="us"
             tier-coding="A" tier-planning="A" tier-agentic="S"
             tier-multimodal="A" tier-long-context="A" tier-knowledge="S"
             tier-speed="B"
             headline-benchmarks="AA Intelligence Index 51.4 (#4); LMArena Text Elo 1453.8 (#27); GPT-5.4 (xhigh) Output Speed 155.4 tokens/s; lowest factual error rate among GPT models"
             pricing-notes="Hidden by default; Requires Max Mode on request-based plans; Agentic and reasoning capabilities; 90% discount on cached input tokens; Fast mode is 15% faster with 2x pricing; Long context (Max Mode) supports up to 1M tokens with 2x input pricing"
             best-for="Broadest professional domain expertise (outperforms human specialists in 83% of occupations), native computer-use capability surpassing human baselines, lowest factual error rate among GPT models, and cross-domain knowledge work requiring deep real-world accuracy and grounding" />
    </tier>
    <tier cost="medium">
      <model id="gpt-5.3-codex" name="GPT-5.3 Codex"
             input-price-per-1m="$1.75" output-price-per-1m="$14.00"
             jurisdiction="us"
             tier-coding="S" tier-planning="B" tier-agentic="S"
             tier-multimodal="D" tier-long-context="B" tier-knowledge="B"
             tier-speed="B"
             headline-benchmarks="AA Intelligence Index 44.3 (xhigh); HLE 39.9%; Codex lineage retains strong Terminal-Bench and SWE-bench Verified performance for autonomous coding"
             pricing-notes="Requires Max Mode on request-based plans; Agentic and reasoning capabilities; Available reasoning effort variant is gpt-5.3-codex-high"
             best-for="Highest terminal and tool-use proficiency at the medium tier, most token-efficient autonomous coding, excels at long-running agentic sessions spanning debugging through deployment, and hard algorithmic problems requiring sustained code reasoning across languages — the cost-efficient pick for pure coding and agentic execution when an S-tier coding rating is needed" />
      <model id="gpt-5.2" name="GPT-5.2"
             input-price-per-1m="$1.75" output-price-per-1m="$14.00"
             jurisdiction="us"
             tier-coding="B" tier-planning="A" tier-agentic="B"
             tier-multimodal="C" tier-long-context="A" tier-knowledge="A"
             tier-speed="B"
             headline-benchmarks="AA Intelligence Index 42.2 (xhigh); GPQA 90.3; LiveCodeBench 88.9; HLE 35.4%; released 2025-12-10"
             pricing-notes="Hidden by default; Agentic and reasoning capabilities; Available reasoning effort variant is gpt-5.2-high"
             best-for="Earlier-flagship GPT reasoning model (December 2025) with 400K context and broad knowledge coverage (GPQA 71.2, MMLU Pro 81.4); same medium-tier pricing as GPT-5.3 Codex but lacks Codex's autonomous-coding specialization — pick gpt-5.3-codex over gpt-5.2 for coding/agentic tasks; gpt-5.2 fits when broad reasoning at A-tier knowledge and a 400K context window are the primary need at the medium price tier" />
      <model id="gemini-3.1-pro" name="Gemini 3.1 Pro"
             input-price-per-1m="$2.00" output-price-per-1m="$12.00"
             jurisdiction="us"
             tier-coding="A" tier-planning="A" tier-agentic="A"
             tier-multimodal="S" tier-long-context="S" tier-knowledge="A"
             tier-speed="B"
             headline-benchmarks="AA Intelligence Index 46.5 (#3); AA-Omniscience 32.9 (#1); HLE 44.7% (#1); LMArena Text Elo 1479.8 (#7); 1M-token context"
             pricing-notes="-"
             best-for="True native multimodal understanding (text, image, video, audio, and code in a single pass), 1M-token context optimized for heterogeneous inputs, strong agentic multi-step tool use, and synthesizing insights across large mixed-media datasets or sprawling document corpora — the obvious choice whenever multimodal or long-context is the primary category" />
      <model id="gemini-3-pro" name="Gemini 3 Pro"
             input-price-per-1m="$2.00" output-price-per-1m="$12.00"
             jurisdiction="us"
             tier-coding="A" tier-planning="A" tier-agentic="A"
             tier-multimodal="S" tier-long-context="S" tier-knowledge="A"
             tier-speed="B"
             headline-benchmarks="Gemini 3 generation Pro variant predating the 3.1 refresh; 1M-token context; native multimodal across text/image/video/audio/code"
             pricing-notes="Hidden by default"
             best-for="Gemini 3 family Pro model at the same medium-tier pricing as gemini-3.1-pro — pick gemini-3.1-pro over gemini-3-pro when both are available since 3.1 carries the updated benchmarks and is the canonical visible Gemini Pro; gemini-3-pro fits when reproducing earlier Gemini-3-generation outputs or when the 3.1 refresh's behavioral changes are undesirable for a specific workload" />
      <model id="gpt-5" name="GPT-5"
             input-price-per-1m="$1.25" output-price-per-1m="$10.00"
             jurisdiction="us"
             tier-coding="A" tier-planning="A" tier-agentic="A"
             tier-multimodal="B" tier-long-context="A" tier-knowledge="A"
             tier-speed="B"
             headline-benchmarks="Earlier flagship GPT-5 family entry with agentic and reasoning capabilities at medium-tier output pricing; specific AA / LMArena numbers pending benchmark refresh"
             pricing-notes="Hidden by default; Agentic and reasoning capabilities; Available reasoning effort variant is gpt-5-high"
             best-for="OpenAI's baseline GPT-5 family flagship — broad reasoning capability at medium-tier pricing ($10/M output), useful when a balanced GPT-5-class model is needed without the premium of GPT-5.4 / 5.5 and without the codex coding specialization; superseded by GPT-5.2 / 5.3 / 5.4 for most production use cases but available on Cursor's pool" />
      <model id="gpt-5.1-codex" name="GPT-5.1 Codex"
             input-price-per-1m="$1.25" output-price-per-1m="$10.00"
             jurisdiction="us"
             tier-coding="S" tier-planning="B" tier-agentic="A"
             tier-multimodal="D" tier-long-context="B" tier-knowledge="B"
             tier-speed="B"
             headline-benchmarks="Earlier-generation Codex specialization at medium-tier output pricing; strong terminal and tool-use proficiency carried forward from the Codex lineage"
             pricing-notes="Hidden by default; Agentic and reasoning capabilities"
             best-for="Earlier Codex generation in the same medium cost tier as gpt-5.3-codex but at a lower output price ($10/M versus $14/M for gpt-5.3-codex), making it the lowest-cost S-tier coding model in the medium tier; prefer gpt-5.3-codex when latest-generation Codex quality matters; prefer gpt-5.1-codex when reproducing earlier-Codex-generation outputs or when the lower output price compounds across a high-volume coding workload" />
    </tier>
    <tier cost="low">
      <model id="composer-2" name="Composer 2 (Fast)"
             input-price-per-1m="$0.50" output-price-per-1m="$2.50"
             jurisdiction="us"
             tier-coding="A" tier-planning="B" tier-agentic="A"
             tier-multimodal="D" tier-long-context="B" tier-knowledge="B"
             tier-speed="S"
             headline-benchmarks="CursorBench 61.3 (+37% over Composer 1.5); SWE-bench Multilingual 73.7; Terminal-Bench 2.0 61.7"
             pricing-notes="Hidden by default"
             best-for="Cursor's enforced default Composer model: purpose-built for multi-file agentic editing, fine-tuned on real developer sessions, self-summarizing 200K context for sustained long tasks, and frontier-level coding quality with speed-optimized inference at a low output price ($2.50/M); the default choice for standard implementation, multi-file changes, and roadmap execution where coding-A is sufficient" />
      <model id="grok-4.3" name="Grok 4.3"
             input-price-per-1m="$1.25" output-price-per-1m="$2.50"
             jurisdiction="us"
             tier-coding="B" tier-planning="A" tier-agentic="S"
             tier-multimodal="B" tier-long-context="S" tier-knowledge="A"
             tier-speed="B"
             headline-benchmarks="AA Intelligence Index 37.6 (#7); AA-Omniscience 18.3 (#4); HLE 35.0%; LMArena Search Elo 1165.3"
             pricing-notes="Hidden by default; Requires Max Mode on request-based plans"
             best-for="Latest Grok release with built-in multi-agent self-verification, configurable reasoning depth, and signature 2M-token context with hallucination-resistant grounding — leads the low tier on agentic execution and long-context, ideal when massive context, factual accuracy, and aggressive cost efficiency must coexist" />
      <model id="claude-4.5-haiku" name="Claude 4.5 Haiku"
             input-price-per-1m="$1.00" output-price-per-1m="$5.00"
             jurisdiction="us"
             tier-coding="B" tier-planning="B" tier-agentic="B"
             tier-multimodal="B" tier-long-context="B" tier-knowledge="B"
             tier-speed="S"
             headline-benchmarks="AA Intelligence Index 29.6; Output Speed 120.7 tokens/s; AA-Omniscience -4.2; latency leader among Claude family"
             pricing-notes="Hidden by default; Bedrock/Vertex: regional endpoints +10% surcharge; Cache: writes 1.25x, reads 0.1x"
             best-for="Speed-optimized lowest-cost Claude model, ideal for simple completions, high-volume repetitive tasks, and latency-sensitive workflows where a lightweight capable response matters more than deep reasoning" />
      <model id="gpt-5.4-mini" name="GPT-5.4 Mini"
             input-price-per-1m="$0.75" output-price-per-1m="$4.50"
             jurisdiction="us"
             tier-coding="B" tier-planning="C" tier-agentic="C"
             tier-multimodal="B" tier-long-context="B" tier-knowledge="B"
             tier-speed="A"
             headline-benchmarks="AA Intelligence Index 40 (xhigh); Output Speed 175.9 tokens/s; HLE 26.6% (GPT-5.4-mini xhigh)"
             pricing-notes="Hidden by default; Smaller, faster variant of GPT-5.4; 90% discount on cached input tokens"
             best-for="Lightweight GPT-5.4 variant balancing quality and cost, well-suited for straightforward coding, short-form generation, and high-throughput workloads needing solid GPT reasoning at a fraction of the flagship price" />
      <model id="gpt-5.4-nano" name="GPT-5.4 Nano"
             input-price-per-1m="$0.20" output-price-per-1m="$1.25"
             jurisdiction="us"
             tier-coding="C" tier-planning="D" tier-agentic="D"
             tier-multimodal="C" tier-long-context="C" tier-knowledge="C"
             tier-speed="S"
             headline-benchmarks="Cheapest GPT-5.4 family variant; throughput-optimized inference"
             pricing-notes="Hidden by default; Smallest GPT-5.4 variant, optimized for cost; 90% discount on cached input tokens"
             best-for="Ultra-low-cost GPT variant for trivial text tasks, simple lookups, rapid classification, and extreme-throughput pipelines where cost efficiency is the sole constraint and task complexity is minimal" />
      <model id="composer-2.5" name="Composer 2.5"
             input-price-per-1m="$0.50" output-price-per-1m="$2.50"
             jurisdiction="us"
             tier-coding="A" tier-planning="B" tier-agentic="A"
             tier-multimodal="D" tier-long-context="B" tier-knowledge="B"
             tier-speed="S"
             headline-benchmarks="Composer 2 family successor at the same output price ($2.50/M); Cursor's release notes claim substantial intelligence + behavior improvements over Composer 2 trained on ~25x more synthetic tasks; specific benchmark numbers pending republish (CursorBench 61.3 + SWE-bench Multilingual 73.7 + Terminal-Bench 2.0 61.7 from Composer 2 carry forward as floors)"
             pricing-notes="-"
             best-for="Composer 2's successor at the same output price — Cursor's purpose-built multi-file agentic editor with frontier-level coding quality and speed-optimized inference; prefer over Composer 2 when both are available since 2.5 supersedes 2 within the same series per the equal-output-price replacement rule (Composer 2 is now Hidden by default on Cursor's pricing page)" />
      <model id="gemini-2.5-flash" name="Gemini 2.5 Flash"
             input-price-per-1m="$0.30" output-price-per-1m="$2.50"
             jurisdiction="us"
             tier-coding="B" tier-planning="B" tier-agentic="B"
             tier-multimodal="A" tier-long-context="A" tier-knowledge="B"
             tier-speed="S"
             headline-benchmarks="High-throughput Gemini Flash variant with native multimodal grounding; 1M-token context; designed for low-cost high-volume inference"
             pricing-notes="Hidden by default"
             best-for="Google's cheap, fast, multimodal Flash model at $0.30/M input and $2.50/M output: the cost-efficient pick for high-volume structured-output tasks (model recommendation, classification, light planning with strong system-prompt grounding) where multimodal capability matters and frontier-class reasoning does not; powers free-tier SaaS surfaces where per-call cost discipline is essential and the bundled templates do the structural heavy lifting" />
      <model id="gemini-3-flash" name="Gemini 3 Flash"
             input-price-per-1m="$0.50" output-price-per-1m="$3.00"
             jurisdiction="us"
             tier-coding="B" tier-planning="A" tier-agentic="A"
             tier-multimodal="S" tier-long-context="S" tier-knowledge="A"
             tier-speed="S"
             headline-benchmarks="Gemini 3 generation Flash variant; native multimodal across text/image/video/audio; 1M-token context; throughput-optimized inference"
             pricing-notes="Hidden by default"
             best-for="Gemini 3 generation's cheap-tier model, with meaningfully stronger planning, agentic, and knowledge ratings than 2.5 Flash at a slightly higher cost ($3.00/M output vs $2.50/M) and native multimodal-S; pick over 2.5 Flash when the task benefits from Gemini 3 family improvements and per-call cost discipline still matters" />
      <model id="gemini-3.5-flash" name="Gemini 3.5 Flash"
             input-price-per-1m="$1.50" output-price-per-1m="$9.00"
             jurisdiction="us"
             tier-coding="B" tier-planning="A" tier-agentic="A"
             tier-multimodal="B" tier-long-context="B" tier-knowledge="A"
             tier-speed="S"
             headline-benchmarks="AA Intelligence Index 50.2 (high reasoning); τ²-bench retail pass_1 45.6 (Gemini 3.5 Flash); Output Speed 215.6 tokens/s"
             pricing-notes="-"
             best-for="Auto-added cheap-tier Google model; pending editorial best-for refinement." />
      <model id="gpt-5-mini" name="GPT-5 Mini"
             input-price-per-1m="$0.25" output-price-per-1m="$2.00"
             jurisdiction="us"
             tier-coding="B" tier-planning="C" tier-agentic="C"
             tier-multimodal="B" tier-long-context="B" tier-knowledge="B"
             tier-speed="S"
             headline-benchmarks="Cheapest GPT-5 family variant at $2.00/M output; throughput-optimized inference"
             pricing-notes="Hidden by default"
             best-for="The cheapest GPT-5 family variant at $2.00/M output — well-suited for trivial text tasks, simple lookups, rapid classification, and high-throughput pipelines where the cost-per-call is the binding constraint; not appropriate for multi-step planning or autonomous agentic execution; competitive with Gemini 2.5 Flash on cost but lacks Gemini's native multimodal-A rating" />
      <model id="kimi-k2.5" name="Kimi K2.5"
             input-price-per-1m="$0.60" output-price-per-1m="$3.00"
             jurisdiction="cn"
             tier-coding="B" tier-planning="B" tier-agentic="B"
             tier-multimodal="C" tier-long-context="B" tier-knowledge="B"
             tier-speed="B"
             headline-benchmarks="Moonshot AI's Kimi K2 series successor at $3.00/M output; competitive cost positioning across general text tasks; specific benchmark numbers pending refresh"
             pricing-notes="Hidden by default"
             best-for="Moonshot's affordable mid-volume model — a non-Google / non-OpenAI / non-Anthropic option at low-tier pricing for cost-conscious code and text generation when provider diversity is desired (vendor-risk hedging, regional preferences); routed via Cursor's pool only — no direct Moonshot access method is currently enumerated in the access-methods block" />
      <model id="deepseek-v4-pro" name="DeepSeek-V4-Pro"
             input-price-per-1m="$0.435" output-price-per-1m="$0.87"
             jurisdiction="cn"
             tier-coding="A" tier-planning="A" tier-agentic="A"
             tier-multimodal="D" tier-long-context="A" tier-knowledge="A"
             tier-speed="C"
             headline-benchmarks="AA Intelligence Index 52 (reasoning, max effort) — independently measured by Artificial Analysis; SWE-bench Verified 80.6%, LiveCodeBench 93.5, Terminal-Bench 2.0 67.9, Codeforces CodeElo 3206, Putnam-2025 120/120 (DeepSeek-reported); 1M-token context; text-only (no image input); ~46 tokens/s (notably slow)"
             pricing-notes="Provider-direct DeepSeek API per-token pricing (not via the Cursor pool); cache-hit input $0.003625/M"
             best-for="DeepSeek's V4-Pro flagship: a very low-cost ($0.87/M output), cn-jurisdiction reasoning model with a 1M-token context window and thinking mode on by default. Strong general intelligence (Artificial Analysis Intelligence Index 52, above GPT-5.4's 51.4 and below GPT-5.5's 54.8 as listed in this catalog) and a frontier-approaching coding profile (SWE-bench Verified 80.6%, LiveCodeBench 93.5, Codeforces CodeElo 3206); these coding numbers are DeepSeek-reported, so it is rated coding-A rather than S pending an independent SWE-bench leaderboard entry. Text-only (no multimodal) and notably slow (~46 tokens/s), so not for latency-sensitive or image work. Reached via the `deepseek-api` method (provider-direct per-token, not the Cursor pool) when the cn jurisdiction is acceptable and a deepseek-api-key is configured; the cheapest model in the catalog rated A across coding, planning, and agentic (deepseek-v4-flash is cheaper but A-tier in coding only)." />
      <model id="deepseek-v4-flash" name="DeepSeek-V4-Flash"
             input-price-per-1m="$0.14" output-price-per-1m="$0.28"
             jurisdiction="cn"
             tier-coding="A" tier-planning="B" tier-agentic="B"
             tier-multimodal="D" tier-long-context="B" tier-knowledge="B"
             tier-speed="A"
             headline-benchmarks="AA Intelligence Index 47 (reasoning, max effort) — independently measured by Artificial Analysis; SWE-bench Verified 79.0% (DeepSeek-reported); 1M-token context; text-only (no image input); ~90 tokens/s"
             pricing-notes="Provider-direct DeepSeek API per-token pricing (not via the Cursor pool); cache-hit input $0.0028/M"
             best-for="DeepSeek's V4-Flash — the fast (~90 tokens/s), cheapest DeepSeek variant ($0.28/M output) with a 1M-token context window, for high-throughput / latency-sensitive text and code work under the cn jurisdiction with a deepseek-api-key. Mid-pack general intelligence (Artificial Analysis Intelligence Index 47) paired with a strong, DeepSeek-reported coding result (SWE-bench Verified 79.0%) — rated coding-A on that basis with the rest of its profile B-tier; text-only (no multimodal). Reached via the `deepseek-api` method (provider-direct per-token). Pick V4-Pro over V4-Flash when reasoning depth or the strongest coding matters; pick V4-Flash when speed and the lowest cost dominate." />
      <model id="mistral-medium-3.5" name="Mistral Medium 3.5"
             input-price-per-1m="$1.50" output-price-per-1m="$7.50"
             jurisdiction="eu"
             tier-coding="B" tier-planning="B" tier-agentic="C"
             tier-multimodal="B" tier-long-context="B" tier-knowledge="B"
             tier-speed="B"
             headline-benchmarks="AA Intelligence Index 39 (independently measured by Artificial Analysis); unified chat / reasoning / code model with an adjustable reasoning dial (reasoning_effort); multimodal (text + image input)"
             pricing-notes="Provider-direct Mistral API per-token pricing (not via the Cursor pool); eu-jurisdiction; prices manually maintained (Mistral publishes no machine-readable price source)"
             best-for="Mistral's flagship unified model — the EU-jurisdiction choice for data-sovereignty / EU-regulatory workloads at low cost ($7.50/M output), with adjustable reasoning and multimodal (vision) input. Artificial Analysis Intelligence Index 39 places it mid-pack (below the US/cn frontier such as Gemini 3.1 Pro or DeepSeek V4-Pro) — pick it when the operator's EU jurisdiction is the deciding constraint, not when raw capability is. Reached via the `mistral-api` method (provider-direct per-token) with a mistral-api-key." />
      <model id="mistral-small-4" name="Mistral Small 4"
             input-price-per-1m="$0.10" output-price-per-1m="$0.30"
             jurisdiction="eu"
             tier-coding="C" tier-planning="C" tier-agentic="C"
             tier-multimodal="B" tier-long-context="C" tier-knowledge="C"
             tier-speed="A"
             headline-benchmarks="AA Intelligence Index 28 (independently measured by Artificial Analysis); compact Mixture-of-Experts unifying the former Small / Magistral / Pixtral / Devstral lines; adjustable reasoning_effort; multimodal (text + image)"
             pricing-notes="Provider-direct Mistral API per-token pricing (not via the Cursor pool); eu-jurisdiction; prices manually maintained (Mistral publishes no machine-readable price source)"
             best-for="Mistral's cheapest fast model ($0.30/M output) — a small EU-jurisdiction MoE with multimodal input and an optional reasoning dial, for high-throughput / latency-sensitive text and light multimodal work where the EU operator matters and a mistral-api-key is configured. Artificial Analysis Intelligence Index 28 is low, so it is a cost / sovereignty pick rather than a capability pick. Reached via the `mistral-api` method (provider-direct per-token)." />
      <model id="mistral-large-3" name="Mistral Large 3"
             input-price-per-1m="$0.50" output-price-per-1m="$1.50"
             jurisdiction="eu"
             tier-coding="C" tier-planning="C" tier-agentic="C"
             tier-multimodal="D" tier-long-context="C" tier-knowledge="C"
             tier-speed="B"
             headline-benchmarks="AA Intelligence Index 23 (independently measured by Artificial Analysis); open-weight Mixture-of-Experts (self-hostable); text-only"
             pricing-notes="Provider-direct Mistral API per-token pricing (not via the Cursor pool); eu-jurisdiction; prices manually maintained (Mistral publishes no machine-readable price source)"
             best-for="Mistral's open-weight Large 3 (MoE) — an EU-jurisdiction, self-hostable option at very low cost ($1.50/M output) for data-sovereignty workloads or teams that want to run the weights themselves. Artificial Analysis Intelligence Index 23 sits below the frontier and even below Mistral's own Medium 3.5 (Mistral repositioned Large as an open community model) — pick it for the open-weights / EU-operator profile, not raw capability. Reached via the `mistral-api` method (provider-direct per-token) or self-hosting." />
      <model id="codestral" name="Codestral"
             input-price-per-1m="$0.30" output-price-per-1m="$0.90"
             jurisdiction="eu"
             tier-coding="B" tier-planning="C" tier-agentic="C"
             tier-multimodal="D" tier-long-context="B" tier-knowledge="C"
             tier-speed="A"
             headline-benchmarks="Mistral's code-specialist model; fast low-latency completion / fill-in-the-middle across many languages with a large code context window; specific public benchmark numbers pending"
             pricing-notes="Provider-direct Mistral API per-token pricing (not via the Cursor pool); eu-jurisdiction; prices manually maintained (Mistral publishes no machine-readable price source)"
             best-for="Mistral's dedicated code model — fast, cheap ($0.90/M output) code completion and fill-in-the-middle under the EU jurisdiction, for autocomplete-style / high-throughput coding loops where an EU operator and low latency matter more than top-tier agentic reasoning. Reached via the `mistral-api` method (provider-direct per-token) with a mistral-api-key; prefer mistral-medium-3.5 for reasoning-heavy coding, codestral for fast bounded completions." />
      <model id="glm-5.2" name="GLM-5.2"
             input-price-per-1m="$1.40" output-price-per-1m="$4.40"
             jurisdiction="cn"
             tier-coding="A" tier-planning="A" tier-agentic="A"
             tier-multimodal="D" tier-long-context="B" tier-knowledge="A"
             tier-speed="B"
             headline-benchmarks="z.ai (Zhipu AI) GLM-5.2 flagship (~Jun 2026) — strong coding / agentic model in the GLM-5 line; text-only; cn-jurisdiction; specific public benchmark numbers pending independent refresh"
             pricing-notes="Provider-direct z.ai (Zhipu) API per-token pricing (not via the Cursor pool); cn-jurisdiction; cache-input $0.26/M"
             best-for="z.ai's flagship GLM-5.2 — a low-cost ($4.40/M output), cn-jurisdiction reasoning / coding model and the strongest GLM for multi-step coding and agentic work. Pick it when the cn jurisdiction is acceptable and a zai-api-key is configured and you want a frontier-adjacent coder at a fraction of US-frontier output price; text-only, so not for multimodal work (use the GLM-V vision line, not yet in the catalog). Reached via the `zai-api` method (provider-direct per-token, not the Cursor pool). Rated coding/planning/agentic-A on the GLM-5 line's positioning pending an independent benchmark refresh (the catalog cron's Opus lane owns the numbers)." />
      <model id="glm-4.6" name="GLM-4.6"
             input-price-per-1m="$0.60" output-price-per-1m="$2.20"
             jurisdiction="cn"
             tier-coding="A" tier-planning="B" tier-agentic="B"
             tier-multimodal="D" tier-long-context="B" tier-knowledge="B"
             tier-speed="B"
             headline-benchmarks="z.ai GLM-4.6 — the widely-adopted, cost-efficient GLM coding workhorse; regarded as a strong value coding model in the GLM-4.x line; text-only; cn-jurisdiction; specific numbers pending independent refresh"
             pricing-notes="Provider-direct z.ai (Zhipu) API per-token pricing (not via the Cursor pool); cn-jurisdiction; cache-input $0.11/M"
             best-for="z.ai's GLM-4.6 — the proven value coding model: very low cost ($2.20/M output), cn-jurisdiction, and a strong code-generation / fill profile that made it a popular coding-plan default. Pick it over glm-5.2 when cost dominates and the task is bounded coding rather than the hardest agentic reasoning; pick glm-5.2 when reasoning depth or agentic autonomy matters. Text-only. Reached via the `zai-api` method (provider-direct per-token) with a zai-api-key." />
      <model id="glm-4.5-air" name="GLM-4.5-Air"
             input-price-per-1m="$0.20" output-price-per-1m="$1.10"
             jurisdiction="cn"
             tier-coding="B" tier-planning="C" tier-agentic="C"
             tier-multimodal="D" tier-long-context="C" tier-knowledge="C"
             tier-speed="A"
             headline-benchmarks="z.ai GLM-4.5-Air — a lightweight, fast, low-cost GLM variant for high-throughput text / code; text-only; cn-jurisdiction; specific numbers pending independent refresh"
             pricing-notes="Provider-direct z.ai (Zhipu) API per-token pricing (not via the Cursor pool); cn-jurisdiction; cache-input $0.03/M"
             best-for="z.ai's GLM-4.5-Air — the cheapest non-free GLM ($1.10/M output): a small, fast, cn-jurisdiction model for high-throughput / latency-sensitive text and light coding where cost is the binding constraint, not top-tier capability. Reached via the `zai-api` method (provider-direct per-token) with a zai-api-key; step up to glm-4.6 for serious coding or glm-5.2 for reasoning / agentic work." />
      <model id="gpt-oss-120b" name="gpt-oss-120b"
             input-price-per-1m="$0.15" output-price-per-1m="$0.60"
             jurisdiction="us"
             tier-coding="B" tier-planning="B" tier-agentic="B"
             tier-multimodal="D" tier-long-context="C" tier-knowledge="B"
             tier-speed="A"
             headline-benchmarks="OpenAI gpt-oss-120b — open-weight (Apache-2.0) Mixture-of-Experts reasoning model (~117B total / ~5B active) with configurable reasoning effort; OpenAI positions it near o4-mini on reasoning; 128K context; hosted by Groq (~500 tokens/s); us-jurisdiction"
             pricing-notes="Provider-direct Groq-hosted pricing for OpenAI's open-weight gpt-oss (Apache-2.0); us-jurisdiction; prices manually maintained from groq.com/pricing"
             best-for="OpenAI's open-weight gpt-oss-120b (Apache-2.0) hosted by Groq — a very low-cost ($0.60/M output), fast (~500 tokens/s), us-jurisdiction reasoning model with an adjustable reasoning dial, for cost / throughput-sensitive reasoning and coding where an open-weight, self-hostable model (data-sovereignty, on-prem portability) is preferred. OpenAI positions it near o4-mini; 128K context; text-only. Reached via the `groq-api` method (provider-direct per-token) with a groq-api-key — note this host is not among the operator's current subscriptions, so it is catalog-present but only recommendable once a groq-api-key (or another gpt-oss host) is configured." />
      <model id="gpt-oss-20b" name="gpt-oss-20b"
             input-price-per-1m="$0.075" output-price-per-1m="$0.30"
             jurisdiction="us"
             tier-coding="C" tier-planning="C" tier-agentic="C"
             tier-multimodal="D" tier-long-context="C" tier-knowledge="C"
             tier-speed="S"
             headline-benchmarks="OpenAI gpt-oss-20b — smaller open-weight (Apache-2.0) Mixture-of-Experts reasoning model (~21B total / ~3.6B active); OpenAI positions it near o3-mini; 128K context; very fast on Groq (~1000 tokens/s); us-jurisdiction"
             pricing-notes="Provider-direct Groq-hosted pricing for OpenAI's open-weight gpt-oss (Apache-2.0); us-jurisdiction; prices manually maintained from groq.com/pricing"
             best-for="OpenAI's smaller open-weight gpt-oss-20b (Apache-2.0) hosted by Groq — the cheapest gpt-oss ($0.30/M output) and extremely fast (~1000 tokens/s), for high-throughput / latency-sensitive light reasoning, classification, and simple code under the us jurisdiction, or as an on-device / self-hostable open-weight option. OpenAI positions it near o3-mini; 128K context; text-only. Reached via the `groq-api` method (provider-direct per-token) with a groq-api-key — catalog-present but only recommendable once a groq-api-key (or another gpt-oss host) is configured. Prefer gpt-oss-120b when reasoning quality matters more than raw speed / cost." />
    </tier>
  </model-options>
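
  Illustration only, not part of the selection algorithm: a minimal Python
  sketch of how the tier and output-price attributes above can be read
  together when a best-for note makes a "cheapest model at tier X" claim.
  The rows are a hand-copied subset of the catalog; the names `CATALOG`,
  `meets`, and `cheapest_meeting` are hypothetical. The sketch ignores
  jurisdiction, access method, and the quality-first default posture.

```python
# Hand-copied subset of the catalog: tier ratings and output price in USD per 1M tokens.
CATALOG = [
    {"id": "gpt-5.3-codex", "coding": "S", "planning": "B", "agentic": "S", "output": 14.00},
    {"id": "composer-2.5", "coding": "A", "planning": "B", "agentic": "A", "output": 2.50},
    {"id": "deepseek-v4-pro", "coding": "A", "planning": "A", "agentic": "A", "output": 0.87},
    {"id": "deepseek-v4-flash", "coding": "A", "planning": "B", "agentic": "B", "output": 0.28},
    {"id": "glm-5.2", "coding": "A", "planning": "A", "agentic": "A", "output": 4.40},
    {"id": "glm-4.6", "coding": "A", "planning": "B", "agentic": "B", "output": 2.20},
]

TIERS = "SABCD"  # ordered best to worst


def meets(model, floors):
    """True when every category in `floors` is rated at the floor tier or better."""
    return all(
        model[category] in TIERS[: TIERS.index(floor) + 1]
        for category, floor in floors.items()
    )


def cheapest_meeting(catalog, floors):
    """Return the id of the lowest-output-price model that meets all floors, or None."""
    eligible = [m for m in catalog if meets(m, floors)]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m["output"])["id"]


# Coding-A alone versus A across coding, planning, and agentic.
print(cheapest_meeting(CATALOG, {"coding": "A"}))
print(cheapest_meeting(CATALOG, {"coding": "A", "planning": "A", "agentic": "A"}))
```

  Widening the floors changes the answer, which is why a "cheapest A-tier"
  claim should name the categories it covers.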

  <access-methods>
    Each access method is a way to run one or more models from
    `<model-options>`. Methods differ in (a) which models they expose,
    (b) how billing works (per-token, subscription-included,
    subscription-pool, subscription-or-key), (c) which capability
    toggles (Max Mode, thinking) they expose, and (d) what credentials
    the user must hold.

    The `<access-selection>` algorithm consumes this list together with
    the user-specific state in docs/user-context.md to pick a PLATFORM
    for the chosen model.

    Billing types:
    - subscription-included — a flat-monthly plan with a usage budget
      pool that the call draws from at $0 marginal cost until the
      budget is exhausted (e.g. claude.ai Max, ChatGPT Plus).
    - subscription-pool — a flat-monthly plan with a shared token pool
      consumed across many models (e.g. Cursor Ultra). $0 marginal
      cost until the pool is exhausted.
    - subscription-or-key — surface accepts either a subscription OR a
      direct API key; if a subscription is active, prefer it.
    - per-token — pay-per-token at the provider's published API rate.

    <method id="anthropic-api" name="Anthropic API"
            provider="anthropic" billing="per-token"
            provider-jurisdiction="us"
            requires="anthropic-api-key"
            supports-models="opus-4.8,claude-fable-5,opus-4.7,sonnet-4.6,claude-4.5-haiku"
            exposes-max-mode="no" exposes-thinking="yes"
            exposes-orchestration="no"
            best-for="Programmatic / scripted Claude use outside Claude Code — raw API headers, batch endpoints, or features not surfaced by Claude Code. Falls back here when claude.ai Max budget is exhausted." />
    <method id="claude-code" name="Claude Code"
            provider="anthropic" billing="subscription-or-key"
            provider-jurisdiction="us"
            requires="claude-max-subscription OR anthropic-api-key"
            supports-models="opus-4.8,claude-fable-5,opus-4.7,sonnet-4.6,claude-4.5-haiku"
            exposes-max-mode="no" exposes-thinking="yes"
            exposes-orchestration="yes"
            best-for="Default for Claude coding or terminal tasks when a claude.ai Max subscription is active — $0 marginal cost until the Max budget is exhausted, full tool-use surface, runs as a CLI and as an IDE extension inside Cursor. Heavy Opus usage that would cost over $1,000/mo on per-token API is fully covered by a $100/mo Max plan. Exposes the full `/effort` dial (low/medium/high/xhigh/max — Opus 4.6 and Sonnet 4.6 top out at max with no xhigh step; Opus 4.7, Opus 4.8, and Fable 5 expose the full range) plus Ultracode (session-wide xhigh + Dynamic Workflows) and the per-turn `ultrathink` keyword." />
    <method id="claude-web" name="claude.ai web / desktop"
            provider="anthropic" billing="subscription-included"
            provider-jurisdiction="us"
            requires="claude-max-subscription"
            supports-models="opus-4.8,opus-4.7,sonnet-4.6,claude-4.5-haiku"
            exposes-max-mode="no" exposes-thinking="yes"
            exposes-orchestration="no"
            best-for="Chat-driven Claude use (no terminal, no codebase tool use) under the same Max budget that funds Claude Code — pick when the task is conversational rather than code-editing." />
    <method id="openai-api" name="OpenAI API"
            provider="openai" billing="per-token"
            provider-jurisdiction="us"
            requires="openai-api-key"
            supports-models="gpt-5.5,gpt-5.4,gpt-5.3-codex,gpt-5.2,gpt-5.1-codex,gpt-5,gpt-5.4-mini,gpt-5.4-nano,gpt-5-mini"
            exposes-max-mode="no" exposes-thinking="yes"
            exposes-orchestration="no"
            best-for="Programmatic / scripted GPT use when an OpenAI API key is configured. Pay-per-token at OpenAI's published rates." />
    <method id="codex-cli" name="Codex"
            provider="openai" billing="subscription-or-key"
            provider-jurisdiction="us"
            requires="chatgpt-subscription OR openai-api-key"
            supports-models="gpt-5.5,gpt-5.4,gpt-5.3-codex,gpt-5.2,gpt-5.1-codex,gpt-5,gpt-5.4-mini,gpt-5-mini"
            exposes-max-mode="no" exposes-thinking="yes"
            exposes-orchestration="no"
            best-for="Default for GPT-driven autonomous coding sessions when a ChatGPT Plus/Pro subscription is active — pays from the ChatGPT budget instead of the per-token API rate. Best surface for gpt-5.3-codex / gpt-5.1-codex on long-running terminal / agentic work." />
    <method id="chatgpt-app" name="ChatGPT (web / desktop)"
            provider="openai" billing="subscription-included"
            provider-jurisdiction="us"
            requires="chatgpt-subscription"
            supports-models="gpt-5.5,gpt-5.4,gpt-5,gpt-5.4-mini,gpt-5-mini"
            exposes-max-mode="no" exposes-thinking="yes"
            exposes-orchestration="no"
            best-for="Chat-driven GPT use without terminal or IDE integration; subscription-funded so marginal cost is $0 until ChatGPT's usage limits kick in." />
    <method id="google-api" name="Google AI Studio API"
            provider="google" billing="per-token"
            provider-jurisdiction="us"
            requires="google-api-key"
            supports-models="gemini-3.1-pro,gemini-3-pro,gemini-3.5-flash,gemini-3-flash,gemini-2.5-flash"
            exposes-max-mode="no" exposes-thinking="yes"
            exposes-orchestration="no"
            best-for="Programmatic / scripted Gemini use with a Google API key. Pay-per-token at Google's published rates. Powers the roadmodel SaaS free-tier surfaces (/recommend on Gemini 2.5 Flash; /roadmap on Gemini 2.5 Flash with 3.1 Pro escalation)." />
    <method id="gemini-cli" name="Gemini CLI"
            provider="google" billing="subscription-or-key"
            provider-jurisdiction="us"
            requires="gemini-advanced-subscription OR google-api-key"
            supports-models="gemini-3.1-pro,gemini-3-pro,gemini-3.5-flash,gemini-3-flash,gemini-2.5-flash"
            exposes-max-mode="no" exposes-thinking="yes"
            exposes-orchestration="no"
            best-for="Terminal-driven Gemini use; the CLI surface for multimodal and long-context Gemini work outside Cursor's pool." />
    <method id="gemini-app" name="Gemini (web / app)"
            provider="google" billing="subscription-included"
            provider-jurisdiction="us"
            requires="gemini-advanced-subscription"
            supports-models="gemini-3.1-pro,gemini-3-pro,gemini-3.5-flash,gemini-3-flash,gemini-2.5-flash"
            exposes-max-mode="no" exposes-thinking="yes"
            exposes-orchestration="no"
            best-for="Chat-driven Gemini use under the Gemini Advanced subscription budget." />
    <method id="xai-api" name="xAI API"
            provider="xai" billing="per-token"
            provider-jurisdiction="us"
            requires="xai-api-key"
            supports-models="grok-4.3"
            exposes-max-mode="no" exposes-thinking="no"
            exposes-orchestration="no"
            best-for="Direct Grok API access for 2M-context or hallucination-resistant tasks; pay-per-token at xAI's published rates." />
    <method id="deepseek-api" name="DeepSeek API"
            provider="deepseek" billing="per-token"
            provider-jurisdiction="cn"
            requires="deepseek-api-key"
            supports-models="deepseek-v4-flash,deepseek-v4-pro"
            exposes-max-mode="no" exposes-thinking="yes"
            exposes-orchestration="no"
            best-for="Direct DeepSeek API access (provider-direct per-token; OpenAI-format at api.deepseek.com and Anthropic-format at api.deepseek.com/anthropic) for the deepseek-v4 models — cost-conscious coding / reasoning / long-context (1M) work when the cn jurisdiction is acceptable and a deepseek-api-key is configured. Exposes the full thinking dial (toggle + reasoning_effort `high`/`max`). Not routed via the Cursor pool. cn-jurisdiction: excluded by the default allowed-jurisdictions list unless the user opts into cn." />
    <method id="mistral-api" name="Mistral API"
            provider="mistral" billing="per-token"
            provider-jurisdiction="eu"
            requires="mistral-api-key"
            supports-models="mistral-medium-3.5,mistral-small-4,mistral-large-3,codestral"
            exposes-max-mode="no" exposes-thinking="yes"
            exposes-orchestration="no"
            best-for="Direct Mistral API access (provider-direct per-token; La Plateforme at api.mistral.ai) for the Mistral models — the EU-jurisdiction option for data-sovereignty / EU-regulatory workloads at low cost. Exposes a reasoning dial on the unified models (Mistral Small 4 / Medium 3.5) via the `reasoning_effort` parameter. Not routed via the Cursor pool. eu-jurisdiction is in the default allowed-jurisdictions list, so Mistral surfaces for any user with a mistral-api-key configured (no jurisdiction opt-in required, unlike cn providers)." />
    <method id="zai-api" name="z.ai API"
            provider="zai" billing="per-token"
            provider-jurisdiction="cn"
            requires="zai-api-key"
            supports-models="glm-5.2,glm-4.6,glm-4.5-air"
            exposes-max-mode="no" exposes-thinking="yes"
            exposes-orchestration="no"
            best-for="Direct z.ai (Zhipu AI) API access (provider-direct per-token; OpenAI-format and Anthropic-format endpoints at api.z.ai) for the GLM models — cost-conscious coding / reasoning / agentic work when the cn jurisdiction is acceptable and a zai-api-key is configured. Exposes the GLM thinking dial. Not routed via the Cursor pool. cn-jurisdiction: excluded by the default allowed-jurisdictions list unless the user opts into cn." />
    <method id="groq-api" name="Groq API"
            provider="groq" billing="per-token"
            provider-jurisdiction="us"
            requires="groq-api-key"
            supports-models="gpt-oss-120b,gpt-oss-20b"
            exposes-max-mode="no" exposes-thinking="yes"
            exposes-orchestration="no"
            best-for="Direct Groq API access (provider-direct per-token; OpenAI-format at api.groq.com) hosting OpenAI's open-weight gpt-oss (Apache-2.0) models — very low-cost, very high-throughput reasoning / coding under the us jurisdiction. Exposes the gpt-oss reasoning-effort dial. Groq is the pinned host that defines gpt-oss price + access (price = f(model, platform)); the open weights can also be self-hosted or run on another host. Note: Groq is not among the operator's current subscriptions, so gpt-oss is catalog-present but only recommendable once a groq-api-key is configured." />
    <method id="cursor" name="Cursor"
            provider="cursor" billing="subscription-pool"
            provider-jurisdiction="us"
            requires="cursor-pro-or-ultra-subscription"
            supports-models="opus-4.8,claude-fable-5,opus-4.7,gpt-5.5,sonnet-4.6,gpt-5.4,gpt-5.3-codex,gpt-5.2,gemini-3.1-pro,gemini-3-pro,gpt-5,gpt-5.1-codex,grok-4.3,claude-4.5-haiku,gpt-5.4-mini,gpt-5.4-nano,composer-2,composer-2.5,gemini-2.5-flash,gemini-3-flash,gemini-3.5-flash,gpt-5-mini,kimi-k2.5"
            exposes-max-mode="yes" exposes-thinking="no"
            exposes-orchestration="no"
            best-for="Cursor IDE — single Platform covering both UI modes (Composer for multi-file autonomous editing; Chat for interactive model-picker). The operator picks the mode at task time based on the chosen Model: composer-2 / composer-2.5 imply Composer mode; frontier models (opus-4.7, gpt-5.5, sonnet-4.6, etc.) imply Chat mode. Cursor's own Auto and Premium routing modes are deliberately NOT enumerated as roadmodel-recommendable models because their routing is opaque (see `jurisdiction-context` for the rationale) — operators who want routing behavior pick a specific fixed model and let Cursor's pool handle the call. All routes through the $0-marginal Cursor pool. Defer to claude-code when the chosen model is Claude and claude.ai Max is active (Max budget is cheaper marginal cost than burning Cursor pool tokens on Claude calls that have a dedicated Anthropic subscription path)." />
  </access-methods>

  <selection-algorithm>
    Run this procedure for every prompt that needs a model recommendation.
    Quality wins at every step; cost only enters at Step 5. Two hard
    pre-filters run first (Step 0a then Step 0b), because a model that is
    unavailable, or of a forbidden jurisdiction, can never be recommended
    regardless of quality.

    Step 0a — Filter out unavailable models.
      Drop every `<model>` whose id is listed as currently unavailable in
      `<availability-context>`. Such a model is never recommended,
      regardless of its tier ratings — its provider has restricted or
      withdrawn end-user access, so the recommendation would not be
      actionable. When an unavailable model would otherwise have been the
      best fit, the RATIONALE MUST disclose it — e.g., "Fable 5 was the
      strongest fit but is currently unavailable (access restricted by
      Anthropic); next-best available model returned instead."

    Step 0b — Filter candidate models by allowed jurisdictions.
      Read the user's allowed-jurisdictions list from
      `docs/user-context.md` (the SaaS surface reads it from the
      user's `profiles.allowed_jurisdictions` column). Default when
      absent is `[us, eu, uk, ca, au, jp, kr]`. Drop every `<model>`
      whose `jurisdiction` attribute is not in the allowed list. The
      result is the input candidate set for Step 1.

      When the filter eliminates the otherwise-best model, the
      RATIONALE in the output MUST disclose the substitution — e.g.,
      "Kimi K2.5 was the strongest cost fit at this tier but was
      excluded by the jurisdiction filter (jurisdiction=cn, allowed
      list=[us, eu, uk, ca, au, jp, kr]); next-best fit returned
      instead." A silent filter is a worse experience than a
      transparent one.

      If the filter would eliminate every candidate (no allowed
      provider serves the task's PRIMARY at the required tier),
      emit a hard error rather than picking a forbidden model:
      "No allowed-jurisdiction model meets the required tier for
      this task. Either widen the allowed-jurisdictions list or
      lower the quality requirement."

      Models whose `jurisdiction` is `unknown` are treated as
      forbidden under the default allowed list; the maintainer must
      set the jurisdiction editorially before such a model becomes
      recommendable.
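
      Illustrative sketch (not part of the procedure): the two
      pre-filters above, written in Python. The `Model` class, the
      `prefilter` name, and the `dropped` map are assumptions made for
      this example only. The tier check in the hard-error rule is left
      out, so the sketch raises only when no candidate survives at all.

```python
from dataclasses import dataclass

DEFAULT_ALLOWED = ["us", "eu", "uk", "ca", "au", "jp", "kr"]


@dataclass(frozen=True)
class Model:
    # Hypothetical record; only the two attributes the filters read.
    id: str
    jurisdiction: str  # e.g. "us", "cn", or "unknown"


def prefilter(models, unavailable_ids, allowed=None):
    """Apply Step 0a (availability), then Step 0b (jurisdiction).

    Returns (candidates, dropped). `dropped` maps a model id to the
    reason it was removed, so the RATIONALE can disclose a substitution.
    """
    allowed = DEFAULT_ALLOWED if allowed is None else allowed
    candidates, dropped = [], {}
    for m in models:
        if m.id in unavailable_ids:
            dropped[m.id] = "unavailable"
        elif m.jurisdiction not in allowed:
            # "unknown" is not in the default list, so it is dropped too.
            dropped[m.id] = f"jurisdiction={m.jurisdiction}"
        else:
            candidates.append(m)
    if not candidates:
        # Simplified hard error: never fall back to a forbidden model.
        raise ValueError("No allowed-jurisdiction model is available for this task.")
    return candidates, dropped
```

      Returning the dropped ids with their reasons keeps the filter
      transparent rather than silent.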

    Step 1 — Classify the prompt's task category.
      Pick exactly one PRIMARY category from `<task-categories>`. If the
      prompt clearly spans two, also pick a SECONDARY category and use the
      more demanding one as PRIMARY. Examples:
        - "Implement a multi-file refactor" → PRIMARY coding
        - "Design the auth architecture for our app" → PRIMARY planning
        - "Investigate the screenshot and fix the layout bug" → PRIMARY
          multimodal, SECONDARY coding
        - "Audit the entire repo for race conditions" → PRIMARY
          long-context, SECONDARY coding

    Step 2 — Score complexity dimensions.
      Rate each dimension Low / Medium / High:
        - Complexity (how many interacting concerns)
        - Ambiguity (judgment calls or trade-offs needed)
        - Scope (localized vs cross-cutting)
        - Novelty (known pattern vs creative problem-solving)
      Take the maximum of the four ratings as the overall complexity level.

    Step 3 — Set the minimum acceptable tier rating in PRIMARY.
        - Overall complexity High → require S in the PRIMARY category
        - Overall complexity Medium → require A or better
        - Overall complexity Low → B or better is acceptable
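
      Illustrative sketch (not part of the procedure): Steps 2 and 3 as
      two small Python functions. The function names and the `LEVELS`
      and `MIN_TIER` tables are assumptions made for this example.

```python
# Ordering of the three ratings used in Step 2.
LEVELS = {"Low": 0, "Medium": 1, "High": 2}

# Step 3: overall complexity -> minimum acceptable tier in PRIMARY.
MIN_TIER = {"High": "S", "Medium": "A", "Low": "B"}


def overall_complexity(complexity, ambiguity, scope, novelty):
    """Step 2: the overall level is the maximum of the four ratings."""
    return max((complexity, ambiguity, scope, novelty), key=LEVELS.__getitem__)


def minimum_tier(overall):
    """Step 3: look up the minimum PRIMARY tier for an overall level."""
    return MIN_TIER[overall]
```

      A prompt rated Low on three dimensions and High on one is still
      High overall, so it requires an S rating in PRIMARY.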

    Step 4 — Filter and rank candidates by quality.
      Filter the model list to those meeting the minimum rating in PRIMARY.
      Among survivors, prefer the model with the highest tier rating in
      PRIMARY. If multiple models tie at the top of PRIMARY, break the tie
      by rating in SECONDARY. If still tied, break by overall coverage —
      number of S/A ratings across the seven categories.

    Step 5 — Apply the cost tie-breaker.
      Only if Step 4 leaves two or more models still tied (identical
      PRIMARY and SECONDARY ratings and equal overall coverage), recommend
      the one with the lower `output-price-per-1m`. Never use cost to
      demote a higher-quality model.
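
      Illustrative sketch (not part of the procedure): Steps 4 and 5 as
      one sort key in Python. The dict layout (`id`, `tiers`,
      `output_price`) and the `pick_model` name are assumptions made for
      this example. The sketch assumes price is compared only after
      PRIMARY, SECONDARY, and coverage are all equal.

```python
TIER_RANK = {"S": 4, "A": 3, "B": 2, "C": 1, "D": 0}


def pick_model(models, primary, secondary, min_tier):
    """Return the best model dict, or None when none meets min_tier."""
    floor = TIER_RANK[min_tier]
    survivors = [m for m in models if TIER_RANK[m["tiers"][primary]] >= floor]
    if not survivors:
        return None

    def coverage(m):
        # Number of S/A ratings across all rated categories.
        return sum(1 for t in m["tiers"].values() if t in ("S", "A"))

    def key(m):
        sec = TIER_RANK[m["tiers"][secondary]] if secondary else 0
        # Quality terms come first (negated so higher sorts first);
        # price is last, so it only ever resolves a true tie.
        return (-TIER_RANK[m["tiers"][primary]], -sec, -coverage(m), m["output_price"])

    return min(survivors, key=key)
```

      Because price is the final element of the key, a cheaper model
      can never outrank one with a better quality rating.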

    Step 6 — Decide Max Mode.
      Enable Max Mode iff any of these hold:
        - Overall complexity is High
        - PRIMARY or SECONDARY is `long-context`
        - PRIMARY is `planning` with cross-cutting scope
        - Prompt explicitly requires extended reasoning across many files
      Otherwise leave Max Mode Off.
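
      Illustrative sketch (not part of the procedure): the Step 6 rule
      as a Python predicate. The function and parameter names are
      assumptions made for this example; the two boolean flags stand in
      for judgments the classifier makes about the prompt.

```python
def max_mode_requested(overall, primary, secondary=None,
                       cross_cutting=False, needs_multi_file_reasoning=False):
    """Step 6: True when any one of the four conditions holds."""
    return (
        overall == "High"
        or "long-context" in (primary, secondary)
        or (primary == "planning" and cross_cutting)
        or needs_multi_file_reasoning
    )
```

      The result is only a request: whether Max Mode is actually On
      depends on the platform chosen later in `<access-selection>`.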

    Step 7 — Name a backup model (the BACKUP output field).
      After the primary MODEL is fixed, also name a BACKUP: a second model
      the user can fall back to if the primary is unavailable to them (no
      funded platform, provider outage, or a later access restriction).
      Choose it from the SAME candidate set that survived Step 0a (available)
      and Step 0b (allowed jurisdiction), applying the Step 4-5 quality-then-
      cost ranking — but:
        - Prefer a model from a DIFFERENT provider/family than the primary,
          so one provider's outage or access block does not take out both.
          Only fall back to a same-family model when no distinct provider
          meets the minimum PRIMARY tier.
        - The BACKUP must meet the same minimum PRIMARY tier (Step 3) and
          must NEVER be an unavailable model (Step 0a) or the same model as
          the primary.
      Emit `None` only when no qualifying distinct alternative exists. The
      BACKUP does not change the primary MODEL, MAX MODE, THINKING, or
      PLATFORM — it is advisory.
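
      Illustrative sketch (not part of the procedure): Step 7 in Python.
      It assumes `ranked` already holds only model ids that passed
      Steps 0a, 0b, and 3, sorted best-first by the Step 4-5 ranking.
      The `pick_backup` name and the `provider_of` map are assumptions
      made for this example.

```python
def pick_backup(ranked, primary_id, provider_of):
    """Return the BACKUP model id, or None when no alternative exists."""
    others = [m for m in ranked if m != primary_id]
    # Prefer the best-ranked model from a different provider/family.
    for m in others:
        if provider_of[m] != provider_of[primary_id]:
            return m
    # Otherwise fall back to the best same-family model, if there is one.
    return others[0] if others else None
```

      The primary is never returned as its own backup, and the caller
      emits `None` when the function returns None.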

    Guardrails:
      - Never sacrifice quality to save cost — the cost step is a true-tie
        resolver, not a downgrade trigger.
      - For PRIMARY = `multimodal`, only consider models with tier-multimodal
        of S or A (currently: gemini-3-flash, gemini-3-pro, gemini-3.1-pro at S; sonnet-4.6, gpt-5.4, opus-4.7, opus-4.8, gpt-5.5 at A).
      - For PRIMARY = `long-context`, prefer models with native large
        context (opus-4.7 1M, opus-4.8 1M, gemini-3.1-pro 1M, grok-4.3 2M) over forcing
        a smaller-context model into Max Mode truncation.
      - For PRIMARY = `coding` at S-tier requirement, the candidate set is
        gpt-5.1-codex, gpt-5.3-codex, opus-4.7, opus-4.8, gpt-5.5; cost tie-breaker favors
        gpt-5.1-codex when the ratings are equivalent for the prompt.
      - Default to composer-2 for routine multi-file implementation when a
        coding-A rating suffices; escalate only on a concrete capability
        gap.
  </selection-algorithm>

  <access-selection>
    After `<selection-algorithm>` picks a MODEL and a Max Mode setting,
    run this procedure to pick a PLATFORM (access method) and a
    THINKING level. Read docs/user-context.md first to learn the user's
    subscription state, API keys, and platform preference order — the
    PLATFORM choice is meaningless without that input.

    Step A0 — Filter access methods by allowed jurisdictions.
      Reduce `<access-methods>` to those whose
      `provider-jurisdiction` attribute is in the user's allowed-
      jurisdictions list from `docs/user-context.md` (default
      `[us, eu, uk, ca, au, jp, kr]`). This is a defense-in-depth
      pass that backs up the Step 0b filter in `<selection-algorithm>`:
      it prevents a chosen model from being routed through a
      provider in a forbidden jurisdiction even if the model
      itself passed Step 0b (e.g., a US-operator model that is
      only reachable via a reseller in a restricted jurisdiction).
      In practice the two filters usually agree, but the two-step
      structure handles the edge case cleanly. Methods with
      `provider-jurisdiction="unknown"` are treated as forbidden
      under the default allowed list.

    Step A — Filter access methods by model support.
      Reduce the candidate set from Step A0 to those whose
      `supports-models` attribute lists the chosen model id.
      The result is the candidate set of platforms.
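
      Illustrative sketch (not part of the procedure): Steps A0 and A
      as one Python filter. Each method is assumed to be a dict keyed
      by the attribute names used in `<access-methods>`; the
      `candidate_methods` name is an assumption made for this example.

```python
def candidate_methods(methods, model_id, allowed):
    """Keep methods in an allowed jurisdiction that list the model."""
    return [
        m for m in methods
        # Step A0: "unknown" is dropped unless it is explicitly allowed.
        if m["provider-jurisdiction"] in allowed
        # Step A: supports-models is a comma-separated list of model ids.
        and model_id in m["supports-models"].split(",")
    ]
```

      Splitting `supports-models` on commas matters: a plain substring
      test would wrongly match `gpt-5` against `gpt-5-mini`.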

    Step B — Tag credential availability (funded vs unfunded; do NOT drop).
      Tag each candidate access method FUNDED when the user's active
      subscriptions or API keys per docs/user-context.md satisfy its
      `requires` clause, and UNFUNDED otherwise. Keep BOTH: an unfunded
      method is still a valid platform that simply costs real money to
      use. Do NOT drop unfunded methods, and never let funding change the
      MODEL — the model is fixed by `<selection-algorithm>` on quality.
      This tag only feeds the cost ranking in Step C and the spend
      disclosure in the RATIONALE (consider every model, surface the
      cheaper path, never suppress the better pick).

    Step C — Rank by effective marginal cost.
      Order the candidates from Step A lowest-cost first:
        1. `subscription-included` and `subscription-or-key` methods
           backed by an active subscription (marginal cost $0 until
           the subscription's usage budget is exhausted).
        2. `subscription-pool` methods backed by an active pool
           (marginal cost $0 until the pool is exhausted).
        3. `subscription-or-key` methods backed only by an API key,
           and `per-token` methods (real dollars per call).
      Within a tier, prefer the access method whose surface matches
      the task — Claude Code over claude.ai web for coding, Codex
      over ChatGPT app for autonomous coding sessions. Cursor's
      Composer vs Chat UI modes are both reached via the single
      `cursor` access method — the operator picks the mode at task
      time based on the chosen Model (composer-2 / composer-2.5
      imply Composer mode; frontier models imply Chat mode).
      If every survivor is UNFUNDED (the user holds no credential for
      any method that reaches the chosen model), still pick the cheapest
      of them by published per-token rate and disclose in the RATIONALE
      that reaching this model needs pay-per-token spend (and a
      credential the user has not declared, if so). Do NOT swap the
      model for a cheaper-to-reach one.
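
      Illustrative sketch (not part of the procedure): the Step C cost
      ordering in Python. The `funding` map (method id to `None`,
      `"subscription"`, or `"key"`) and both function names are
      assumptions made for this example. Placing UNFUNDED methods in a
      fourth, last tier is also an assumption, drawn from the guardrail
      that prefers funded methods. The surface-match preference within
      a tier and the per-token price comparison are left out.

```python
def cost_tier(method, funded_by):
    """Map a method and its funding to a Step C tier (1 is cheapest)."""
    billing = method["billing"]
    if funded_by == "subscription":
        if billing in ("subscription-included", "subscription-or-key"):
            return 1  # $0 marginal until the usage budget is exhausted
        if billing == "subscription-pool":
            return 2  # $0 marginal until the pool is exhausted
    if funded_by == "key":
        return 3  # real dollars per call
    return 4  # UNFUNDED: kept, never dropped, ranked last (assumption)


def rank_methods(methods, funding):
    """Sort methods cheapest-first; sorted() is stable within a tier."""
    return sorted(methods, key=lambda m: cost_tier(m, funding.get(m["id"])))
```

      Unfunded methods stay in the list, so the chosen model remains
      reachable even when every path to it costs real money.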

    Step D — Apply user-context.md preference overrides.
      docs/user-context.md may set a preferred platform order. When
      the user's preference puts a method ahead of a cheaper-on-paper
      one, honor the preference — it reflects subscription-utilization
      economics the catalog cannot see (e.g. burning Max budget that
      would otherwise expire vs preserving Cursor pool tokens for
      OpenAI / Google calls that have no other paid path on this
      account).

    Step E — Determine the THINKING level.
      Apply the decision rule in `<thinking-context>` against the
      overall complexity from `<selection-algorithm>` Step 2. If the
      chosen access method's `exposes-thinking` attribute is `no`,
      set THINKING `N/A` regardless of complexity.

    Step E2 — Determine the ORCHESTRATION setting.
      Apply the decision rule in `<orchestration-context>`. If the
      chosen access method's `exposes-orchestration` attribute is `no`,
      set ORCHESTRATION `N/A`. Otherwise consider the PRIMARY task
      category, scope, and complexity per the rule list. Default to
      ORCHESTRATION `None` for well-scoped single deliverables on
      Claude Code.

    Step F — Resolve MAX MODE against the chosen PLATFORM.
      Max Mode is a Cursor-surface concept. If the chosen access
      method's `exposes-max-mode` attribute is `no`, set MAX MODE
      `Off` in the output even when `<selection-algorithm>` Step 6
      enabled it — Max Mode does not apply outside Cursor. The
      `<selection-algorithm>` rationale for enabling Max Mode (long
      context, cross-cutting reasoning) still holds; it just manifests
      as native context-window use on non-Cursor surfaces.
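
      Illustrative sketch (not part of the procedure): how Steps E and
      F clamp the requested settings to what the chosen platform
      exposes. The method is assumed to be a dict keyed by the
      attribute names in `<access-methods>`; the `resolve_toggles` name
      is an assumption made for this example.

```python
def resolve_toggles(method, max_mode_requested, thinking_level):
    """Return (MAX MODE, THINKING) as they should appear in the output."""
    # Step F: Max Mode is only On where the platform exposes it.
    if max_mode_requested and method["exposes-max-mode"] == "yes":
        max_mode = "On"
    else:
        max_mode = "Off"
    # Step E: THINKING is N/A where the platform exposes no thinking dial.
    thinking = thinking_level if method["exposes-thinking"] == "yes" else "N/A"
    return max_mode, thinking
```

      With the attributes listed for `cursor`, a High-complexity request
      resolves to MAX MODE On and THINKING N/A; with those listed for
      `claude-code`, it resolves to MAX MODE Off and THINKING High.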

    Step G — Emit the output fields.
      Emit PLATFORM, THINKING, ORCHESTRATION, and MAX MODE in the
      output. The PLATFORM field is the `name` attribute of the
      chosen access method. The RATIONALE must name (a) the
      subscription or API key that pays for the call (or note the
      lack thereof), (b) why the THINKING level was chosen (or why
      it is `N/A`), and (c) why ORCHESTRATION was chosen — and when
      ORCHESTRATION is `Ultracode`, the rationale must call out the
      session-budget caveat (claude.ai Max budget burns 10-100x
      faster than at High effort).

    Guardrails:
    - Prefer a FUNDED access method, but NEVER hard-exclude an unfunded
      one and NEVER downgrade the MODEL to avoid spend. The model from
      `<selection-algorithm>` stands on quality. If no funded method
      reaches it, still recommend that model via its cheapest access
      method (Step C) and add a RATIONALE clause disclosing the
      pay-per-token spend — and, when the user has not declared that
      credential, noting that reaching this pick needs an API key or
      subscription they have not listed. Consider every model, surface
      the cheaper path, never suppress the better pick. (Jurisdiction is
      the one hard filter here, applied in Step A0 and in
      `<selection-algorithm>` Step 0b, because it is a compliance
      constraint, not a cost one; funding is always a soft, tie-break
      preference.)
    - On a quality tie, do not burn pay-per-token spend when an already-
      paid subscription can serve the call: subscriptions are sunk cost,
      a per-token call is real cash out. This is a tie-break only — it
      never overrides the quality decision or drops a model.
    - When the chosen model is a Cursor-only model (composer-2,
      composer-2.5), the only valid access method is `cursor`
      (the operator uses Composer mode at task time). Cursor's
      Auto and Premium routing modes are intentionally NOT
      recommendable engines per `<jurisdiction-context>` — if
      Cursor routing behavior is desired, the operator picks
      a specific fixed model and lets the Cursor pool handle
      the call, which keeps the catalog's tier ratings and
      jurisdiction filter meaningful.
    - When the chosen model is Claude (opus-4.8, claude-fable-5, opus-4.7,
      sonnet-4.6, claude-4.5-haiku) AND the user has both
      claude-max-subscription and cursor-pro-or-ultra-subscription active,
      prefer claude-code (or claude-web for non-coding tasks) over
      `cursor`: the Max subscription is dedicated Claude budget that the
      Cursor pool cannot substitute for, while the Cursor pool can absorb
      OpenAI / Google / xAI calls that have no other paid path.
  </access-selection>

  <conversation-principles>
    <principle>Start a New conversation when the prompt is self-contained and carries no dependency on prior turns, when switching to a significantly different domain or task type, or when accumulated context from earlier steps would add noise and increase cost without improving output quality.</principle>
    <principle>Continue the current conversation when the prompt explicitly builds on decisions, outputs, or context established in immediately prior steps — for example, iterating on a file just created, referencing a plan just written, or following up on an error just encountered.</principle>
    <principle>In roadmap annotation, treat each phase or major feature boundary as a natural New conversation break unless sequential steps within that phase share tight context dependencies.</principle>
    <principle>When in doubt, prefer New. A clean context produces more focused, higher-quality results than a bloated one — and input cost scales linearly with context size.</principle>
  </conversation-principles>

  <output-format>
    CRITICAL: Respond in EXACTLY this format and ABSOLUTELY NOTHING ELSE.
    Do not add any preamble or explanation, and do not perform any actions.

    BACKUP names the fallback model from Step 7 of the
    `<selection-algorithm>` — the next-best AVAILABLE model to use if the
    primary MODEL is unavailable to the user. Emit `None` when no distinct
    alternative qualifies; never emit an unavailable model (Step 0a) here.

    Single-prompt mode — output one block:

    MODEL: [Model Name]
    BACKUP: [Model Name or None]
    PLATFORM: [Access Method Name]
    MAX MODE: [On/Off]
    THINKING: [Off/Low/Medium/High/XHigh/Max/N/A]
    ORCHESTRATION: [None/PerPrompt/Ultracode/N/A]
    CONVERSATION: [New/Continue]
    RATIONALE: [2-3 sentences that MUST name (a) the prompt's PRIMARY task
                category, (b) the recommended model's tier rating in that
                category, (c) at least one headline benchmark or named
                leaderboard from <benchmark-sources> supporting the choice,
                (d) the cost tie-breaker outcome if Step 5 of the
                selection-algorithm applied, (e) the subscription or API
                key that pays for the chosen PLATFORM (or note its
                absence), (f) why the THINKING level was set as stated
                (or why it is N/A), and (g) why ORCHESTRATION was set to
                its value, including the session-budget caveat when
                Ultracode is recommended. Also note the conversation
                handling decision.]

    Roadmap annotation mode — output one block per prompt, in order, each
    closing with a PROMPT line that gives the prompt identifier or a brief
    label:

    MODEL: [Model Name]
    BACKUP: [Model Name or None]
    PLATFORM: [Access Method Name]
    MAX MODE: [On/Off]
    THINKING: [Off/Low/Medium/High/XHigh/Max/N/A]
    ORCHESTRATION: [None/PerPrompt/Ultracode/N/A]
    CONVERSATION: [New/Continue]
    RATIONALE: [2-3 sentences that MUST name (a) the prompt's PRIMARY task
                category, (b) the recommended model's tier rating in that
                category, (c) at least one headline benchmark or named
                leaderboard from <benchmark-sources> supporting the choice,
                (d) the cost tie-breaker outcome if Step 5 of the
                selection-algorithm applied, (e) the subscription or API
                key that pays for the chosen PLATFORM (or note its
                absence), (f) why the THINKING level was set as stated
                (or why it is N/A), and (g) why ORCHESTRATION was set to
                its value, including the session-budget caveat when
                Ultracode is recommended. Also note the conversation
                handling decision.]
    PROMPT: [Prompt # or short label]
  </output-format>

</model-selector>
