{# T55 admin evals index — terminal-brutalist re-skin. #} {# Lists scans with at least one llm_evals row. #} {% extends "admin_layout.html" %} {% block page_title %}admin :: evals{% endblock %} {% block breadcrumb %} admin/ evals {% endblock %} {% block content %}
07 — evals

Eval campaigns. {{ rows|length }} scan{{ 's' if rows|length != 1 else '' }} evaluated.

Fixture-based regression evals: run a prompt against a fixture set across all current models, then watch cost / quality / drift over time. Each row below is one scan that already has llm_evals results.

{# P5-T05: in-page explainer — what evals are FOR after the redesign, #} {# and where single-scan model comparison moved. #}
what evals are for read me

Eval campaigns run a prompt against a fixture set across every current model, so you can compare cost, quality, and drift over time. This is the place for prompt-regression questions that span many scans.

For a one-off model comparison on a single scan, you no longer need the eval matrix. Open the scan's trace, click into any LLM step, and use the quick-compare buttons in the per-step Explorer.
→ browse scans

budget & controls
today: ${{ '%.4f'|format(today_usd or 0.0) }} / ${{ '%.2f'|format(cap_usd or 0.0) }}
browse scans →
scans with evals
{% if not rows %}
// No evals recorded yet. Open a scan from /admin/scans, then use the per-step quick-compare buttons in the Explorer, or click "start new eval" above to launch a fixture campaign.
{% else %}
scan target owner evals total cost last eval action
{% for row in rows %}
{{ row.scan_id_short }} {{ row.target_url }} {{ row.owner_email or '(anonymous)' }} {{ row.eval_count }} ${{ '%.4f'|format(row.total_cost) }} {{ row.last_eval_at }} open scan
{% endfor %}
{% endif %}
{# T48.1: shared modal for "Start new eval". select-scan mode -- the #} {# index page doesn't pin a scan id, so the picker is visible. #} {% with mode='select-scan' %} {% include 'partials/eval_new_modal.html' %} {% endwith %} {% endblock %}