{% extends "base.html" %}
{% block title %}{{ data.review_name }}{% endblock %}
{% block content %}

← Back to review

{{ data.review_name }}

{{ data.paper.title }}

Reminder: this output is a draft-polishing aid, not a peer-review generator. Most venues have strict policies against using LLMs in assigned reviews; use it at your own discretion, and disclose when you have used it. Every comment is a suggestion to evaluate, not a finding to accept. AI reviewers hallucinate, miss context, and over-confidently flag non-issues. Expect to reject roughly half of what you see.

Download report (.md) Download per-reviewer data (.md) {% if data.clarity_review and data.clarity_review.comments %} Download writing clarity review (.md) {% endif %}
{% if data.llm_provider and data.llm_model %}

LLM: {{ data.llm_provider }} / {{ data.llm_model }}  ·  Base URL: {{ data.llm_base_url or '(default)' }} {% if data.launched_at %}  ·  Launched: {{ data.launched_at }} {% endif %} {% if data.ended_at %}  ·  Ended: {{ data.ended_at }} {% endif %}

{% endif %}
{% if run_files %}
{# One macro for the three run-files tables so column changes land in one place. #}
{% macro run_files_table(files) -%}
<table>
  <tr><th>File</th><th>Size</th><th>Description</th></tr>
  {% for f in files %}
  <tr><td>{{ f.abs_path }}</td><td>{{ f.size }}</td><td>{{ f.description }}</td></tr>
  {% endfor %}
</table>
{%- endmacro %}

Source files on disk

Everything produced by this review lives in a single directory. Paths below are absolute so you can copy-paste straight into a shell.

Run directory: {{ run_files.run_dir }}

{% if run_files.inputs %}

Inputs

{{ run_files_table(run_files.inputs) }} {% endif %} {% if run_files.outputs %}

Outputs

{{ run_files_table(run_files.outputs) }} {% endif %} {% if run_files.internal %}
Internal artifacts ({{ run_files.internal|length }})
{{ run_files_table(run_files.internal) }}
{% endif %}
{% endif %}

Selected reviewers

{{ data.selected|length }} reviewers selected by topic similarity, then diversified across personas.
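A minimal sketch of that selection step, assuming each candidate already carries a topic-similarity score; the function shape and the per-persona cap are illustrative assumptions, not the actual implementation:

```python
def select_reviewers(candidates, k=8, per_persona_cap=2):
    """Rank candidates by topic similarity, then diversify across personas.

    Each candidate is assumed to look like
    {"id": "r1", "persona": "Methodology Critic", "score": 0.83}
    (illustrative field names, not necessarily the real ones).
    """
    picked, per_persona = [], {}
    for c in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if per_persona.get(c["persona"], 0) >= per_persona_cap:
            continue  # this persona is already well represented
        picked.append(c)
        per_persona[c["persona"]] = per_persona.get(c["persona"], 0) + 1
        if len(picked) == k:
            break
    return picked
```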

<table>
  <tr><th>ID</th><th>Domain</th><th>Persona</th><th>Selection relevance</th></tr>
  {% for r in data.selected | sort(attribute='id') %}
  <tr><td>{{ r.id }}</td><td>{{ r.domain }}</td><td>{{ r.persona }}</td><td>{{ "%.3f"|format(r.score) }}</td></tr>
  {% endfor %}
</table>
{% if data.n_reviewers_total is defined %}

Format-fix retries: {{ data.n_format_repairs or 0 }} of {{ data.n_reviewers_total }} reviewer(s) (incl. the clarity reviewer) needed a markdown-repair pass to produce usable output. {% if (data.n_format_repairs or 0) > 0 %} A high count suggests the model is deviating from the expected comment format — consider a different provider/model or tightening the persona prompt. {% endif %}

{% endif %}
{% if data.clarity_review and data.clarity_review.comments %}

Writing clarity review

An always-on reviewer focused on writing quality (flow, terminology, grammar, figures, structure). It is not part of the ranked issues and is not compared against human reviewers during Validation. Reviewer: {{ data.clarity_review._reviewer_id }} / {{ data.clarity_review._persona }}.

{% for c in data.clarity_review.comments %} {% set sev = c.severity or 'minor' %}
{{ sev|upper }} category: {{ c.category or 'clarity' }}, section: {{ c.section_reference or 'general' }}
{% if c.summary %}

{{ c.summary }}

{% endif %} {% if c.description %}

{{ c.description }}

{% endif %}
{% endfor %}
{% endif %}

Ranked issues

Ordered by commonality × importance: issues raised by multiple reviewers, weighted by how severe they are, rise to the top.

How the score is computed

Each issue cluster groups comments that look like the same concern (cosine similarity on summary + keywords). Clusters are then scored:

score = num_distinct_reviewers × (0.5·avg_severity + 0.5·max_severity)

where severity weights are:

  • MAJOR = 3 (would block acceptance)
  • MODERATE = 2 (significant revision)
  • MINOR = 1 (improvement but not blocking)

Higher-severity comments AND broader reviewer agreement both push a cluster up. Intuition: one reviewer's minor issue scores 1·(0.5·1 + 0.5·1) = 1; three reviewers agreeing on a moderate issue scores 3·(0.5·2 + 0.5·2) = 6; five reviewers on a mix of major and moderate scores 5·(0.5·2.4 + 0.5·3) = 13.5.
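A minimal sketch of that computation; the field names are illustrative, not necessarily the real ones:

```python
# Severity weights from the list above.
SEVERITY_WEIGHT = {"major": 3, "moderate": 2, "minor": 1}

def cluster_score(comments):
    """Score one cluster of comments that raise the same concern."""
    # Missing severities default to minor, matching the report's display rule.
    weights = [SEVERITY_WEIGHT.get(c.get("severity") or "minor", 1) for c in comments]
    reviewers = {c["_reviewer_id"] for c in comments}  # distinct reviewers only
    avg_sev = sum(weights) / len(weights)
    max_sev = max(weights)
    return len(reviewers) * (0.5 * avg_sev + 0.5 * max_sev)

# The worked examples above:
#   1 reviewer,  one minor        -> 1 * (0.5*1   + 0.5*1) = 1.0
#   3 reviewers, all moderate     -> 3 * (0.5*2   + 0.5*2) = 6.0
#   5 reviewers, avg 2.4, max 3   -> 5 * (0.5*2.4 + 0.5*3) = 13.5
```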

The severity badge next to each cluster shows the worst individual comment in the cluster, not an average. A cluster can legitimately have score 6 with a MINOR label — that means six reviewers all flagged the same thing as minor, and none escalated it to moderate or major. The count of agreeing reviewers still pushes its score up. The small coloured chips beside the label ("6 minor" etc.) show the severity mix so you can tell a broadly-agreed nit from a genuinely severe issue at a glance.
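For completeness, a minimal sketch of the grouping step itself, assuming an embedding function that returns unit-normalized vectors for "summary + keywords"; the greedy strategy, threshold, and names are illustrative assumptions:

```python
def cluster_comments(comments, embed, threshold=0.8):
    """Greedily group comments whose embeddings are cosine-similar."""
    clusters = []  # each cluster: {"seed": vector, "members": [comments]}
    for c in comments:
        text = (c.get("summary") or "") + " " + " ".join(c.get("keywords") or [])
        v = embed(text)
        best = None
        for cl in clusters:
            # Dot product equals cosine similarity for unit-normalized vectors.
            sim = sum(a * b for a, b in zip(v, cl["seed"]))
            if sim >= threshold and (best is None or sim > best[0]):
                best = (sim, cl)
        if best is not None:
            best[1]["members"].append(c)
        else:
            clusters.append({"seed": v, "members": [c]})
    return clusters
```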

What the category and section fields show

Each cluster row shows two tags pulled from the representative comment (the highest-severity member):

  • category — a classification of what kind of concern this is, snapped to a fixed vocabulary at parse time. Examples: novelty, evaluation, methodology, reproducibility, presentation, deployment, related_work, correctness. This is what the validator uses to route each miss to an expected persona (Methodology Critic catches methodology, Novelty Hunter catches novelty, etc.). Shown as general if the LLM didn't pick a specific one.
  • section — a normalized anchor indicating where in the paper the comment applies, extracted from the reviewer's text by pattern matching: Section 3.2, Table 2, Figure 4, Algorithm 1, Eq. 7, or named parts like Abstract, Introduction, Related Work, Conclusion, References. Shown as general when the comment doesn't point at any specific part of the paper. A sketch of how both fields are derived follows this list.
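A minimal sketch of both derivations, assuming plain string matching; the vocabulary set, regex patterns, and function names are illustrative assumptions, not the project's actual code:

```python
import re

# Fixed category vocabulary the parser snaps to (examples from the text above).
CATEGORY_VOCAB = {
    "novelty", "evaluation", "methodology", "reproducibility",
    "presentation", "deployment", "related_work", "correctness",
}

def snap_category(raw):
    """Normalize a free-form category and snap it to the vocabulary, else 'general'."""
    c = (raw or "").strip().lower().replace(" ", "_")
    return c if c in CATEGORY_VOCAB else "general"

# Numbered anchors like "Section 3.2", "Table 2", "Fig. 4", "Eq. 7".
_NUMBERED = re.compile(
    r"\b(Section|Table|Figure|Fig\.?|Algorithm|Equation|Eq\.?)\s+(\d+(?:\.\d+)*)",
    re.IGNORECASE,
)
# Named parts like "Abstract" or "Related Work".
_NAMED = re.compile(
    r"\b(Abstract|Introduction|Related Work|Conclusion|References)\b",
    re.IGNORECASE,
)

def normalize_section_reference(text):
    """Return the first recognizable anchor in a comment's text, else 'general'."""
    m = _NUMBERED.search(text)
    if m:
        return m.group(1).capitalize() + " " + m.group(2)
    m = _NAMED.search(text)
    if m:
        return m.group(1).title()
    return "general"
```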

Clusters group comments by semantic similarity, so the representative's category and section are shared by most of the cluster — but individual members may phrase section references slightly differently. Expand the "other reviewers raised the same issue" details below each cluster to see the full text each reviewer wrote.

{% for c in data.ranked_clusters %}
{% set rep = c.representative %}
{% set sev = rep.severity or 'minor' %}
#{{ c.rank }} {{ sev|upper }}
{# Severity mix — makes "score 6, MINOR" readable as "6 reviewers all flagged this as minor". Only shown for multi-member clusters. #}
{% if c.members|length > 1 %}
{% set n_major = c.members|selectattr('severity', 'equalto', 'major')|list|length %}
{% set n_moderate = c.members|selectattr('severity', 'equalto', 'moderate')|list|length %}
{# Count minors by subtraction so comments with no explicit severity fall back to minor, matching the per-member display below. #}
{% set n_minor = c.members|length - n_major - n_moderate %}
{% if n_major %}{{ n_major }} major{% endif %}
{% if n_moderate %}{{ n_moderate }} moderate{% endif %}
{% if n_minor %}{{ n_minor }} minor{% endif %}
{% endif %}

Score {{ c.score }} · {{ c.size }} comment{{ 's' if c.size != 1 }} · {{ c.num_distinct_reviewers }} reviewer{{ 's' if c.num_distinct_reviewers != 1 }} · category: {{ rep.category or 'general' }} · section: {{ rep.section_reference or 'general' }}

Description. {{ rep.description }}

{% if c.members|length > 1 %}
{{ c.members|length - 1 }} other reviewer{{ 's' if c.members|length - 1 != 1 }} raised the same issue
    {% for m in c.members %} {% if m is not sameas rep %}
  • {{ m._reviewer_id }} ({{ m._persona }}) {{ (m.severity or 'minor')|upper }} category: {{ m.category or 'general' }}, section: {{ m.section_reference or 'general' }}
    {% if m.summary %}

    {{ m.summary }}

    {% endif %} {% if m.description %}

    {{ m.description }}

    {% endif %}
    {% endif %} {% endfor %}
{% endif %}
{% else %}

No issues were raised, which is surprising: the reviewers produced no comments for this paper.

{% endfor %}
{% endblock %}