{% extends "base.html" %}
{% block title %}{{ validation_name }}{% endblock %}
{% block content %}
{{ paper_title }}

Comparing {{ alignment.n_actual }} human comment{{ 's' if alignment.n_actual != 1 }} against {{ alignment.n_ai }} AI comment{{ 's' if alignment.n_ai != 1 }}.

LLM: These files are written to the run directory on every validation. They are intermediate steps; feel free to ignore unless something looks wrong.

{{ validation_name }}
{% if paper_title %}{{ llm_provider }} / {{ llm_model }}
· Base URL: {{ llm_base_url }}
{% if launched_at %}
· Launched: {{ launched_at }}
{% endif %}
{% if ended_at %}
· Ended: {{ ended_at }}
{% endif %}
Intermediate artifacts — for debugging why pairs got the verdicts they did
| File | Size | Description |
|---|---|---|
| {{ f.abs_path }} | {{ f.size }} | {{ f.description }} |
Everything produced by this validation lives in a single directory. Paths below are absolute so you can copy-paste straight into a shell.
Run directory: {{ run_files.run_dir }}
Emitted — total AI comments this reviewer produced. Caught — distinct human comments this reviewer helped match (as primary or supporting); can exceed Emitted when one AI comment matches several human comments. False alarms — AI comments that matched no human comment (sim < 0.35). Noise ratio — False alarms ÷ Emitted.
| Reviewer ID | Persona | Emitted | Caught | False alarms | Noise ratio |
|---|---|---|---|---|---|
| {{ ps.reviewer_id or '—' }} | {{ ps.persona }} | {{ ps.comments_emitted }} | {{ ps.actual_comments_helped_catch }} | {{ ps.false_alarms }} | {{ ps.noise_ratio if ps.noise_ratio is not none else '—' }} |
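The per-reviewer numbers above can be sketched in plain Python. This is a minimal sketch, not the pipeline's actual code: the function name `reviewer_stats` and the input fields `best_sim` and `caught_human_ids` are hypothetical; only the 0.35 false-alarm threshold and the Noise ratio formula (False alarms ÷ Emitted) come from the definitions above.

```python
# Minimal sketch of the per-reviewer metrics defined above.
# Hypothetical input shape: each AI comment carries its best similarity to
# any human comment and the set of human comment ids it helped match.

SIM_THRESHOLD = 0.35  # below this, an AI comment counts as a false alarm

def reviewer_stats(ai_comments):
    emitted = len(ai_comments)
    false_alarms = sum(1 for c in ai_comments if c["best_sim"] < SIM_THRESHOLD)
    caught_ids = set()
    for c in ai_comments:
        # Distinct human comments: one AI comment matching several human
        # comments is why Caught can exceed Emitted.
        caught_ids |= c["caught_human_ids"]
    noise_ratio = false_alarms / emitted if emitted else None
    return {
        "emitted": emitted,
        "caught": len(caught_ids),
        "false_alarms": false_alarms,
        "noise_ratio": noise_ratio,
    }
```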
| Reviewer | Sub-rating | Value | Expected persona | Verdict |
|---|---|---|---|---|
| {{ a.reviewer_label }} | {{ a.sub_rating }} | {{ a.value }}/{{ a.scale }} | {{ a.expected_persona or '—' }} | {{ a.failure_mode.replace('_', ' ') }} |
{{ ai.reviewer_id or '—' }}
{% if ai.persona %} / {{ ai.persona }}{% endif %}
{% if ai.comment_id %}
({{ ai.comment_id }})
{% endif %}
{% if ai.severity %}
{{ ai.severity|upper }}
{% endif %}
sim {{ "%.2f"|format(sim) }}
{{ ai.summary }}
{% endif %} {% if ai.description %}{{ ai.description }}
{% endif %}
{{ h.primary_ai.reviewer_id }} / {{ h.primary_ai.persona }}
(sim {{ "%.2f"|format(h.primary_sim) }})
{% if h.supporting_ai %} · plus {{ h.supporting_ai|length }} more{% endif %}
— click to see AI comment text
Primary match:
{{ ai_match_card(h.primary_ai, h.primary_sim) }} {% if h.supporting_ai %}Also matched by {{ h.supporting_ai|length }} additional AI comment{{ 's' if h.supporting_ai|length != 1 }}:
{% for sup in h.supporting_ai %} {{ ai_match_card(sup.ai, sup.sim) }} {% endfor %} {% endif %}
Category: {{ m.category or 'uncategorized' }} · best AI similarity {{ "%.2f"|format(m.best_sim) }}
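The primary/supporting split rendered above can be sketched as follows. This is an assumed reconstruction, not the pipeline's actual matcher: for one human comment, the AI comment with the highest similarity at or above the 0.35 threshold is treated as the primary match, and any other AI comments above the threshold are supporting. The function name `split_matches` and the dict input shape are hypothetical.

```python
# Hypothetical sketch: split one human comment's AI matches into a primary
# match (highest similarity) and supporting matches (others at or above
# the threshold). Below the threshold, an AI comment does not match at all.

SIM_THRESHOLD = 0.35

def split_matches(sims):
    """sims: dict mapping ai_comment_id -> similarity to this human comment.
    Returns (primary_id, primary_sim, supporting), where supporting is a
    list of (ai_id, sim) pairs sorted by descending similarity."""
    above = sorted(
        ((s, aid) for aid, s in sims.items() if s >= SIM_THRESHOLD),
        reverse=True,
    )
    if not above:
        return None, None, []  # unmatched human comment
    primary_sim, primary_id = above[0]
    supporting = [(aid, s) for s, aid in above[1:]]
    return primary_id, primary_sim, supporting
```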
{{ fa.reviewer_id }} / {{ fa.persona }}
{{ fa.summary }}
{{ s.rationale }}
{% if s.prompt_patch_hint %}Hint: {{ s.prompt_patch_hint }}
{% endif %} {% if s.fix_hint %}Fix: {{ s.fix_hint }}
{% endif %} {% if s.example_misses %}