{# P2-T02 / P2-T03 / P2-T04 / P2-T06 / P2-T16 — per-step LLM Explorer. Renders one ``llm`` step from a scan's trace. The chrome (eyebrow, breadcrumb, card frames, palette) reuses the T55 admin tokens; this page only adds its own local rhythm (data grid, prompt/response side-by-side pre blocks, comparisons table styling). No Tailwind. No emojis. Vanilla JS only — used twice: once to lazy-fetch the S3 dump (P2-T03), once to wire the copy-curl button (P2-T04). Both helpers run < 80 LOC. Inputs (assembled by ``routes.debug._llm_step_view``): * ``view.scan_id`` / ``view.scan_id_short`` * ``view.step_index`` / ``view.step_index_padded`` / ``view.total_step_count`` * ``view.step_name`` / ``view.status_label`` / ``view.status_pill_class`` * ``view.duration`` / ``view.cost`` / ``view.model`` * ``view.llm_call_id`` / ``view.dump_url`` (lazy fetch target) * ``view.metadata_rows`` — list of ``{key, value}`` rendered as a data grid * ``view.comparisons`` — list of pre-formatted row dicts; empty list when the operator has never re-run this step * ``view.copy_curl_payload`` — JSON-safe dict spliced client-side into the clipboard-ready curl string * ``view.back_to_trace_url`` Re-run / mutation flows (P2-T07-T09) intentionally land later; this page renders a placeholder card for that section. #} {% extends "admin_layout.html" %} {% from 'partials/help_tooltip.html' import tip, tip_styles %} {% from 'partials/notifications.html' import notify_styles %} {% block page_title %}admin :: step :: {{ view.scan_id_short }} / {{ view.step_index_padded }}{% endblock %} {% block breadcrumb %} admin/ scans/ {{ view.scan_id_short }}/ step {{ view.step_index_padded }} {% endblock %} {% block head_extra %} {{ tip_styles() }} {{ notify_styles() }} {% if view.is_in_progress %}{% endif %} {% endblock %} {% block content %} {# ============== Sticky breadcrumb strip ============== #}
← back to trace · step {{ view.step_index_padded }} of {{ view.total_step_count }} {{ tip("This page renders one step from the scan's trace. The trace lives at the link to your left; the per-step Explorer never mutates production data.", position="bottom") }}
{# ============== Eyebrow + headline ============== #}
02.1 — scan debug · step {{ view.step_index_padded }} · llm

Step {{ view.step_index_padded }} — {{ view.step_name }} {% if view.is_cancelled_scan %}[cancelled scan]{% endif %}

{{ view.model or '—' }}· {{ view.duration }}· {{ view.cost }}· {{ view.status_label }} {% if is_admin and view.owner_email %} ·{{ view.owner_email }} {% endif %}
{% if view.is_in_progress %}
// in progress Step still running — refresh to update. Auto-refresh every 30s.
{% endif %} {% if view.error_class %}
[errored] {{ view.error_class }} {% if view.error_message %}
{{ view.error_message }}
{% endif %}
{% endif %} {# ============== BUCKET CLASSIFICATION card (intent_filter only) ============== The headline for the intent_filter step: the A/B/C endpoint->bucket assignment this step produces. Rendered ONLY when ``view.buckets`` is set (every other LLM step leaves it ``None``), and pinned above the raw prompt/response panels which stay available below for re-run / debugging. #} {% if view.buckets %}
bucket classification {{ tip("The intent_filter step's endpoint->bucket assignment. A = first-party primary DATA endpoints (what you actually scrape); B = session/auth PREREQUISITES needed before A works; C = noise / trackers / ignored.") }}
{% if view.buckets.skipped %}
Generic scan — no intent filtering was applied; all captured endpoints were treated uniformly.
{% else %}
A · data endpoints {{ tip("Bucket A: the first-party primary data endpoints you actually scrape. These return the records the user wants.") }} {{ view.buckets.a_count }}
{% if view.buckets.a %}
    {% for url in view.buckets.a %}
  • {{ url }}
  • {% endfor %}
{% else %}
none — no data endpoints classified into A
{% endif %}
B · prerequisites {{ tip("Bucket B: session / auth prerequisites that must run before the A endpoints return data (token mints, init / bootstrap calls, CSRF fetches).") }} {{ view.buckets.b_count }}
{% if view.buckets.b %}
    {% for url in view.buckets.b %}
  • {{ url }}
  • {% endfor %}
{% else %}
none — A works without prerequisites
{% endif %}
C · ignored {{ tip("Bucket C: noise — trackers, analytics, third-party widgets, and anything irrelevant to the scrape. Filtered out of validation + synthesis.") }} {{ view.buckets.c_count }}
{% if view.buckets.c %}
    {% for url in view.buckets.c %}
  • {{ url }}
  • {% endfor %}
{% else %}
none — nothing classified as noise
{% endif %}
{% if view.buckets.promoted_to_b or view.buckets.demoted_from_a or view.buckets.defaulted_to_c or view.buckets.rationale %}
guardrail corrections + rationale
{% if view.buckets.promoted_to_b %}
promoted to B ({{ view.buckets.promoted_to_b_count }}) {{ tip("Guardrail moved these out of A into B: the model first called them data endpoints, but they're really prerequisites that gate the real data.") }} {{ view.buckets.promoted_to_b | join(', ') }}
{% endif %} {% if view.buckets.demoted_from_a %}
demoted from A ({{ view.buckets.demoted_from_a_count }}) {{ tip("Guardrail removed these from A: classified as data by the model but judged not to be first-party primary data endpoints.") }} {{ view.buckets.demoted_from_a | join(', ') }}
{% endif %} {% if view.buckets.defaulted_to_c %}
defaulted to C ({{ view.buckets.defaulted_to_c_count }}) {{ tip("Endpoints the model left unclassified; the guardrail defaulted them to C (ignored) rather than guess them into A or B.") }} {{ view.buckets.defaulted_to_c | join(', ') }}
{% endif %} {% if view.buckets.rationale %}
rationale {{ tip("The model's free-text explanation for the bucket assignment, persisted with the scan.") }} {{ view.buckets.rationale }}
{% endif %}
{% endif %} {% endif %}
{% endif %} {# ============== TIMING + METADATA card ============== #}
timing + metadata {{ tip("Server-measured timings, prompt + token + cost stats, and the row id you'd reach for in the DB or with the copy-curl button.") }}
{% for row in view.metadata_rows %}
{{ row.key }}{%- if row.key == 'prompt_name' %} {{ tip("The registered prompt template invoked by this step (e.g. flow_confirm, scan_synthesis).") }} {%- elif row.key == 'prompt_version' %} {{ tip("Version of the prompt template at the time this call ran; bumps when the template body is edited.") }} {%- elif row.key == 'model' %} {{ tip("Model identifier passed to the provider SDK at call time.") }} {%- elif row.key == 'provider' %} {{ tip("Provider that handled the call (anthropic / openai / xai).") }} {%- elif row.key == 'started_at' %} {{ tip("Wall-clock UTC start, recorded by the wrapper that called the SDK.") }} {%- elif row.key == 'ended_at' %} {{ tip("Wall-clock UTC end, computed as started_at + latency_ms.") }} {%- elif row.key == 'elapsed' %} {{ tip("Total time spent inside the SDK call, server-measured (not client RTT).") }} {%- elif row.key == 'retry_count' %} {{ tip("Number of retries the wrapper attempted before this row's terminal status was recorded.") }} {%- elif row.key == 'input_tokens' %} {{ tip("Prompt tokens billed by the provider for this call.") }} {%- elif row.key == 'output_tokens' %} {{ tip("Completion tokens billed by the provider for this call.") }} {%- elif row.key == 'cache_read_tokens' %} {{ tip("Tokens served from the provider's prompt-cache layer (cheaper than full prompt tokens).") }} {%- elif row.key == 'cache_write_tokens' %} {{ tip("Tokens written to the provider's prompt-cache for reuse on subsequent calls.") }} {%- elif row.key == 'cost_usd' %} {{ tip("Server-side cost computed from token counts + the model's price card at call time.") }} {%- elif row.key == 's3_prompt_path' %} {{ tip("Object-store path to the gzipped + KMS-encrypted prompt dump (system + user).") }} {%- elif row.key == 's3_response_path' %} {{ tip("Object-store path to the gzipped response dump (parsed + raw).") }} {%- elif row.key == 'db_row' %} {{ tip("Primary key of the llm_calls row backing this step.") }} {%- endif %}
{%- if row.key == 'db_row' -%} {{ row.value }} {%- elif row.key in ('s3_prompt_path', 's3_response_path') -%} {{ row.value }} {%- else -%} {{ row.value }} {%- endif -%}
{% endfor %}
{# ============== PROMPT + RESPONSE card ============== #}
prompt + response {{ tip("System + user prompts and the parsed structured-output response. Fetched lazily from S3 to keep the page render fast.") }}
{% if view.dump_url %} // click to load — keeps page render fast on cached scans {% else %} // dump unavailable — pre-T15 scan or missing llm_call row {% endif %}
system prompt





      
user prompt





      
parsed response (structured output)



    
{# ============== RE-RUN panel (P2-T07/T08/T09) ============== #}
re-run {{ tip("Fires one LlmExperiment row against the source call. Always an experiment — never mutates the production scan. Cost goes to your admin daily cap.") }}
{% if not view.rerun_enabled %}
// {{ view.rerun_locked_reason or 'unavailable' }}
{% else %} {# Primary form: model dropdown + system/user textareas + run button. Plain HTML form POST → 303 redirect back to this page with ?just_ran= so the comparisons table picks up the new row on full re-render. CSRF travels in the form body (the require_csrf dependency accepts either header OR ``csrf_token`` field). #}
{# ── Left column: MODEL + QUICK COMPARE ── #}
model {{ tip("Model id passed to the provider SDK. Re-runs are temperature=0 for determinism. The list moves to the Models Registry in P4-T08.") }}
quick compare {{ tip("One-click re-run with a different model and the original prompts unchanged. Each fires a separate experiment row — useful for 'how would this look on a cheaper / smarter / different-provider model'.") }}
{% for qm in view.quick_compare_models %} {# Each quick-compare button is its own tiny form so the click submits independently of the main form's textareas. No JS — the browser handles the inline POST natively. #} {# Empty system + user => rerun handler falls back to the source dump verbatim (no prompt edits). #} {% endfor %}
{# ── Right column: PROMPT EDIT + RUN button ── #}
prompt edit {{ tip("Leave both fields empty to re-run with the source call's prompts verbatim. Edits are diffed against the original in the experiment's diff view.") }}
{% set _cap = '%.2f' % (view.cap_usd or 0.0) %} {% if view.spent_today_usd is none %} // est. cost ~$0.05 · daily budget $? / ${{ _cap }} {% else %} {% set _spent = '%.2f' % view.spent_today_usd %} // est. cost ~$0.05 · daily budget = view.cap_usd %} class="danger"{% elif view.cap_usd and view.spent_today_usd >= (view.cap_usd * 0.8) %} class="warn"{% endif %}>${{ _spent }} / ${{ _cap }} {% endif %}
{% endif %}
{# ============== FOCUSED DIFF panel (FIX 2 / ?focus_experiment) ============== #} {% if view.focus_experiment %} {% set fx = view.focus_experiment %}
focused re-run · {{ fx.id_short }} {{ fx.outcome_label }} {{ tip("The experiment pointed at by ?focus_experiment. Shows the full result -- including the error message when it failed -- so a failed re-run is diagnosable.") }}
model{{ fx.model }} status{{ fx.outcome_label }} cost{{ fx.cost }} latency{{ fx.latency }} input tokens{{ fx.tokens_in }} output tokens{{ fx.tokens_out }} cache read{{ fx.cache_read_tokens }} cache write{{ fx.cache_write_tokens }} fired{{ fx.when_iso }}
{% if fx.original %}
// original call: {{ fx.original.model }} · in {{ fx.original.input_tokens }} · out {{ fx.original.output_tokens }} · {{ fx.original.cost }}
{% endif %} {% if fx.error_message %}
error message{{ fx.error_message }}
{% endif %}
system prompt
{% if fx.system %}{{ fx.system }}{% else %}// (empty){% endif %}
user prompt
{% if fx.user %}{{ fx.user }}{% else %}// (empty){% endif %}
response
{% if fx.response %}
{{ fx.response }}
{% elif fx.errored %}
// errored before a response was produced
{% else %}
// (no response body)
{% endif %}
{% endif %} {# ============== COMPARISONS card (P2-T06) ============== #}
comparisons · {{ view.comparisons|length }} previous re-run{{ '' if view.comparisons|length == 1 else 's' }} {{ tip("Every prior LlmExperiment row fired against this step. Each row links to the side-by-side diff view.") }}
{% if view.just_ran %} {# P2-T11: surface the just-fired experiment id + a deep-link to the diff view. Rendered once per ?just_ran= load; not audited (the audit row fired on the mutation, not on the read). #}
{{ view.just_ran.id_short }} {{ view.just_ran.model }} {% if view.just_ran.ok == 'ok' %} // ok {% else %} // errored {% endif %} open diff →
{% endif %} {% if not view.comparisons %}
no re-runs yet · use the Re-run panel above to start
{% else %}
{% for row in view.comparisons %} {# data-experiment-id is consumed by the just-ran notification wiring at the bottom of the page: when ?just_ran={exp_id} is present we look up this row's model / cost / latency to build the notification body. #} {# P5-T03: step-focused diff link (?focus_experiment={exp_id}). #} {% endfor %}
when {{ tip("Relative time since the experiment fired.") }} model {{ tip("Model the experiment ran against.") }} edits {{ tip("Whether the prompts were edited vs the source call (today this is a coarse 'none / edited' signal; P5 ships a semantic diff).") }} cost {{ tip("Cost the experiment billed to the admin's daily cap.") }} latency {{ tip("Wall-clock SDK call duration on the experiment.") }} outcome {{ tip("Aligned: parseable, no error. Errored: ran but the response was missing or unparseable.") }} fired by {{ tip("Admin who fired the experiment; the truncated handle is the local-part of their email.") }} {{ tip("Diff view: opens this step page with the experiment focused (?focus_experiment={exp_id}). The cross-scan index lives at /admin/experiments.") }}
{{ row.when_rel }} {{ row.model }} {{ row.edits_flag }} {{ row.cost }} {{ row.latency }} {{ row.outcome_label }} {{ row.fired_by }}diff →
{% endif %}
{% endblock %} {% block scripts %} {% endblock %}