{% import "_macros.html" as ui %} {# ---------- 1. Header ---------- #}

Instrument Review

{{ review.filename }}
Reviewed {{ review.timestamp.strftime("%Y-%m-%d %H:%M") }} · Model: {{ review.parameters.model }} · Categories: {{ review.parameters.category_rubric_version }} · Review ID: {{ review.review_id }}
{# ---------- 2. Score panel ---------- #}
{{ '%.0f'|format(review.overall_score) }}
{{ score_band(review.overall_score) }}
Overall defense score / 100
{{ '%.0f'|format(review.completion_likelihood) }}
{{ score_band(100 - review.completion_likelihood) }} resistance
Bot completion likelihood / 100
lower is better
{{ review.coverage }} / {{ review.categories|length }}
{{ coverage_band }} breadth
Defense breadth
categories with a present defense
{% set failed_cats = review.categories | selectattr("error") | list %} {% if failed_cats %}
Note: {{ failed_cats|length }} categor{{ 'ies' if failed_cats|length != 1 else 'y' }} did not complete and {{ 'were' if failed_cats|length != 1 else 'was' }} weighted at zero in the overall score.
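{# Maintainer note (not rendered). The note above says failed categories are "weighted at zero in the overall score". The aggregation itself is not shown in this template, so the following is only a minimal sketch of one way that could work, assuming a weighted mean with equal weights by default. `CategoryResult`, `overall_score`, and the category names are hypothetical stand-ins, not the package's actual API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CategoryResult:
    """Hypothetical stand-in for one entry of review.categories."""
    category: str
    score: float
    error: Optional[str] = None


def overall_score(categories: List[CategoryResult],
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean in which errored categories contribute a score of zero.

    Assumption: every category has weight 1.0 unless `weights` says otherwise.
    """
    weights = weights or dict()
    total_weight = 0.0
    accumulated = 0.0
    for c in categories:
        w = weights.get(c.category, 1.0)
        # A failed reviewer keeps its weight but counts as 0 points.
        effective = 0.0 if c.error else c.score
        accumulated += w * effective
        total_weight += w
    return accumulated / total_weight if total_weight else 0.0


cats = [
    CategoryResult("attention_checks", 80.0),      # hypothetical category names
    CategoryResult("open_text", 60.0),
    CategoryResult("timing", 0.0, error="timeout"),
]
failed = [c for c in cats if c.error]  # mirrors selectattr("error") in the template
print(len(failed), round(overall_score(cats), 1))  # 1 46.7
```

Keeping the failed category in the denominator is what makes a reviewer error pull the overall score down rather than silently dropping out of the average. #}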
{% endif %} {# ---------- 3. Bot-resistance assessment ---------- #} {% if review.overall_feedback and review.overall_feedback.headline %}

Bot-resistance assessment

{{ review.overall_feedback.headline }}

{% for para in review.overall_feedback.paragraphs %}

{{ para }}

{% endfor %}
{% endif %} {# ---------- 4. Methods statement ---------- #}

Methods statement for your paper

Paste this into the Methods section of your manuscript to document that the instrument was reviewed for resistance to non-human (bot) responses.
{# ---------- 4b. Reproducibility (GUIDE-LLM B.2 / B.4 / B.5 / C.1 / C.2) ---------- #}

Reproducibility

All reviewer calls are single-shot API calls via the LangChain {{ "ChatGoogleGenerativeAI" if review.parameters.provider == "google" else "ChatOpenAI" }} client with a Pydantic structured-output schema (no multi-turn chat, no fine-tuned model, stateless between calls). Sampling parameters: temperature={{ review.parameters.temperature }}, seed={{ review.parameters.seed }}.

No system instructions are used; all reviewer guidance is in the user prompt. Each per-category reviewer runs in parallel against the same survey-blob prefix; a separate consolidation call merges cross-category findings.

Exact prompt templates and the rubric definitions are versioned in the Survey Shield package (surveyshield/review/dimensions.py at rubric version {{ review.parameters.category_rubric_version }}; canonical text on GitHub).

{# ---------- 5. Survey at a glance ---------- #}

Survey at a glance

Title: {{ survey.name }}
Questions: {{ review.parsed_summary.n_questions }}
Blocks: {{ review.parsed_summary.n_blocks }}
Force-response: {{ review.parsed_summary.force_response_pct }}% of questions
Question types: {% for t, n in review.parsed_summary.type_histogram.items() -%} {{ t }}: {{ n }}{% if not loop.last %} · {% endif %} {%- endfor %}
{# ---------- 6. Per-category cards ---------- #}

Per-category review

{% for c in review.categories %}

{{ category_display(c.category) }}

{{ '%.0f'|format(c.score) }} / 100 · {{ score_band(c.score) }}
{% if c.error %}

Reviewer error: {{ c.error }}

{% else %}

{{ c.summary }}

{% if c.findings %}
{{ c.findings|length }} finding{% if c.findings|length != 1 %}s{% endif %} {% for f in c.findings %}
{{ ui.severity_class(f.severity) }} {% if f.locator %} {{ f.locator }} {% endif %}
{{ f.rationale }}
{% if f.excerpt %}
"{{ f.excerpt }}"
{% endif %} {% if f.suggested_fix %}
→ {{ f.suggested_fix }}
{% endif %} {% if f.contributing_categories %}
Flagged by: {{ f.contributing_categories | map(attribute='value') | map('replace', '-', ' ') | join(', ') }}
{% endif %}
{% endfor %}
{% endif %} {% endif %}
{% endfor %} {# ---------- 7. Top recommendations ---------- #}

Top recommendations

{% if review.recommendations %}
{% for r in review.recommendations %}
  {{ loop.index }}. {{ r.title }}
    {{ r.detail }}
    Surfaced by: {{ r.related_categories | map(attribute='value') | map('replace', '-', ' ') | join(', ') }} {% if r.projected_lift is not none and r.projected_lift > 0 %} · Projected lift on overall score: +{{ '%.1f'|format(r.projected_lift) }} {% endif %}
{% endfor %}
{% else %}

No actionable recommendations: every reviewer either passed cleanly or returned only low-impact findings.

{% endif %} {# ---------- 7b. Score-lift simulator ---------- #} {% if score_lift_tiers %}

Score-lift simulator

Cumulative projection if researchers add one well-placed defense item per category, starting with the highest-impact recommendation. The projection assumes each new item lifts its category to {{ score_band(75) }} (75/100), a conservative single-item floor; multiple well-distributed items can push the score higher.
| Action | Category targeted | Projected overall score | Band |
|---|---|---|---|
| No additions | | {{ '%.0f'|format(review.overall_score) }} / 100 | {{ score_band(review.overall_score) }} |
{% for t in score_lift_tiers -%}
| Add {{ t.n_added }} high-impact item{{ 's' if t.n_added != 1 else '' }} | {{ t.category }} | {{ '%.0f'|format(t.projected_score) }} / 100 (+{{ '%.1f'|format(t.delta) }}) | {{ t.band }} |
{% endfor %}
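{# Maintainer note (not rendered). The simulator text describes a cumulative projection in which each added item lifts its category to a floor of 75/100. The sketch below shows one way the `score_lift_tiers` rows could be computed. It rests on assumptions not confirmed by this template: the overall score is an unweighted mean of category scores, `delta` is measured against the no-additions baseline, and the function name and example categories are hypothetical. The `band` field is left out because the `score_band` thresholds are not shown here.

```python
from typing import Dict, List


def score_lift_tiers(category_scores: Dict[str, float],
                     ranked_targets: List[str],
                     floor: float = 75.0) -> List[dict]:
    """Cumulatively lift each targeted category to `floor` and re-average.

    Assumption: the overall score is the plain mean of category scores.
    """
    if not category_scores:
        raise ValueError("category_scores must not be empty")
    scores = dict(category_scores)  # work on a copy
    baseline = sum(scores.values()) / len(scores)
    tiers = []
    for n_added, category in enumerate(ranked_targets, start=1):
        # Never lower a category that already sits above the floor.
        scores[category] = max(scores[category], floor)
        projected = sum(scores.values()) / len(scores)
        tiers.append(dict(
            n_added=n_added,
            category=category,
            projected_score=projected,
            delta=projected - baseline,
        ))
    return tiers


# Hypothetical category names, ranked by recommendation impact.
tiers = score_lift_tiers(
    dict(attention_checks=40.0, open_text=60.0, timing=90.0),
    ranked_targets=["attention_checks", "open_text"],
)
for t in tiers:
    print(t["n_added"], t["category"],
          round(t["projected_score"], 1), round(t["delta"], 1))
```

Each row reuses the lifted scores from the rows before it, which is what makes the projection cumulative rather than a set of independent what-if scenarios. #}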
{% endif %} {# ---------- 8. Narrative summary ---------- #}

Narrative summary

{{ review.narrative_summary }}
{# ---------- 9. Evidence appendix ---------- #}

Evidence appendix

{% for q in survey.questions %}
{{ q.qid }} · {{ q.type }}{% if q.subtype %}/{{ q.subtype }}{% endif %} · {{ q.text[:80] }}{% if q.text|length > 80 %}…{% endif %}
{{ q.text }}
{% if q.choices %}
Choices: {{ q.choices|join(' | ') }}
{% endif %} {% if q.force_response %}
force_response: ON
{% endif %} {% if q.hidden %}
hidden
{% endif %} {% if findings_by_qid.get(q.qid) %}
Flagged by: {% for cat_name, f in findings_by_qid[q.qid] -%} {{ cat_name }}{% if not loop.last %}, {% endif %} {%- endfor %}
{% endif %}
{% endfor %}
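{# Maintainer note (not rendered). The appendix loop reads `findings_by_qid` as a mapping from question id to a list of (category name, finding) pairs. A minimal sketch of building such a mapping follows. It assumes each finding's `locator` holds the question id it refers to, which this template does not confirm; the function name and the plain dicts standing in for the review objects are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_findings_by_qid(categories: List[dict]) -> Dict[str, List[Tuple[str, dict]]]:
    """Group findings by question id as (category name, finding) pairs.

    Assumption: finding["locator"] is the qid; findings without one are skipped.
    """
    by_qid = defaultdict(list)
    for cat in categories:
        for finding in cat.get("findings", []):
            qid = finding.get("locator")
            if qid:
                by_qid[qid].append((cat["category"], finding))
    return dict(by_qid)


# Hypothetical review data.
categories = [
    dict(category="attention checks",
         findings=[dict(locator="Q3", rationale="No instructed-response item.")]),
    dict(category="open text",
         findings=[dict(locator="Q3", rationale="Free-text prompt is generic."),
                   dict(locator=None, rationale="Survey-wide observation.")]),
]
findings_by_qid = build_findings_by_qid(categories)
print([name for name, _ in findings_by_qid["Q3"]])  # ['attention checks', 'open text']
```

Returning a plain dict keeps `findings_by_qid.get(q.qid)` falsy for unflagged questions, so the "Flagged by" line is skipped for them. #}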
{# ---------- 10. Citation block ---------- #}

Cite Survey Shield

APA
{{ citation.apa }}
BibTeX
{{ citation.bibtex }}
{# ---------- 11. Footer ---------- #} {{ ui.copy_button_script() }}