AI Red Team Evaluation Report

Overall Safety

{{ overall_safety_score }}/100

Safety Score Break-up

{# Dummy data: replace later with the real dict from the backend. #}
{% set breakup = {
  'Breakability':  {'severity': 'Low',    'score': 1.5},
  'Hallucination': {'severity': 'High',   'score': 8.0},
  'Data Leaks':    {'severity': 'Medium', 'score': 5.3},
  'Cybersecurity': {'severity': 'High',   'score': 7.9},
  'Adversarial':   {'severity': 'Medium', 'score': 6.2}
} %}
{% for label, info in breakup.items() %}
  • {{ label }}: {{ info.severity }} ({{ '{:.1f}'.format(info.score) }})
{% endfor %}
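{# Note for template maintainers: the loop above expects `breakup` to map each category to a `severity` label and a numeric `score`. A minimal backend sketch of building that dict is shown here. The function names and the severity cut-offs (4.0 and 7.0) are assumptions chosen only so that the dummy data keeps its labels; the real backend may assign severity differently.

```python
from typing import Dict

# Hypothetical cut-offs: scores below 4.0 are Low, below 7.0 are Medium, else High.
SEVERITY_BANDS = [(4.0, "Low"), (7.0, "Medium")]


def severity_for(score: float) -> str:
    """Map a numeric score to a severity label using the assumed bands."""
    for upper_bound, label in SEVERITY_BANDS:
        if score < upper_bound:
            return label
    return "High"


def build_breakup(scores: Dict[str, float]) -> Dict[str, Dict[str, object]]:
    """Return the category -> severity/score mapping the template iterates over."""
    return {
        name: {"severity": severity_for(score), "score": score}
        for name, score in scores.items()
    }
```

Passing the result as `breakup` in the render context would let the dummy `set` block be dropped. #}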

Regulatory Readiness

Shows your compliance level against leading AI governance frameworks. Each percentage is the share of a framework's required controls that you currently meet.

{% for framework, pct in regulatory_readiness.items() %}
{{ framework }}: {{ pct }}%
{% endfor %}
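{# Note for template maintainers: each readiness percentage is described above as the share of required controls that are met. A minimal sketch of that arithmetic follows. The function names and the (met, required) input shape are assumptions, not the backend's actual API.

```python
from typing import Dict, Tuple


def readiness_pct(controls_met: int, controls_required: int) -> int:
    """Share of required controls that are met, as a whole-number percentage."""
    if controls_required <= 0:
        # No required controls recorded: report 0 rather than dividing by zero.
        return 0
    return round(100 * controls_met / controls_required)


def build_regulatory_readiness(
    control_counts: Dict[str, Tuple[int, int]]
) -> Dict[str, int]:
    """Turn framework -> (met, required) into the framework -> pct mapping."""
    return {
        framework: readiness_pct(met, required)
        for framework, (met, required) in control_counts.items()
    }
```

For example, a framework with 18 of 24 required controls met would be reported as 75%. #}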

Findings Summary

Counts of Critical, High, and Medium issues, plus total jailbreaks and total findings.

{% for label, val in {
  'Critical': findings_summary.critical,
  'High': findings_summary.high,
  'Medium': findings_summary.medium,
  'Total Jailbreaks': findings_summary.jailbreaks,
  'Total Findings': findings_summary.total_findings
}.items() %}
{{ label }}: {{ val }}
{% endfor %}

Plugin Summary

Failure rate for each plugin category.

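{% for p in findings_list %}
{# Guard against an empty category and drop the trailing ".0" that round() leaves. #}
{% set fail_pct = ((p.failed / p.total * 100) | round | int) if p.total else 0 %}
{{ p.title }}: {{ fail_pct }}% ({{ p.failed }} of {{ p.total }} failed)
{% endfor %}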
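{# Note for template maintainers: the loop above expects each entry of `findings_list` to expose `title`, `failed`, and `total`. A minimal sketch of aggregating those counts per plugin category follows. The input shape (pairs of plugin title and a `passed` flag) and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_findings_list(results: Iterable[Tuple[str, bool]]) -> List[Dict[str, object]]:
    """Aggregate (plugin_title, passed) pairs into per-plugin failure counts."""
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for title, passed in results:
        totals[title] += 1
        if not passed:
            failures[title] += 1
    # Categories appear in the order they were first seen.
    return [
        {"title": title, "failed": failures[title], "total": totals[title]}
        for title in totals
    ]


def fail_pct(failed: int, total: int) -> int:
    """Failure rate as a whole-number percentage; 0 when there were no runs."""
    if total <= 0:
        return 0
    return round(failed / total * 100)
```

Plain dicts are enough here because Jinja resolves `p.title` by falling back to item lookup when the attribute does not exist. #}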

Summary of Evaluation Results

{# Format a score object as "score (severity)"; fall back to N/A or the raw value. #}
{% macro score_str(o) -%}
  {%- if o is none -%} N/A
  {%- elif o.overall_score is defined -%} {{ "{:.2f}".format(o.overall_score.score) }} ({{ o.overall_score.severity.value }})
  {%- elif o.score is defined -%} {{ "{:.2f}".format(o.score) }}{% if o.severity is defined %} ({{ o.severity.value }}){% endif %}
  {%- else -%} {{ o }}
  {%- endif -%}
{%- endmacro %}
{# Render one Field | Value line of a finding's detail table. #}
{% macro row(label, val) -%}
{{ label }} | {{ val if val is not none else '–' }}
{%- endmacro %}
Plugin ID | Prompt Goal | Success | Risk Score (Overall) | Jailbreak | Description
{% for res in report.eval_results %}
{{ res.plugin_id }} | {{ res.prompt.goal }} | {{ res.responses[0].success }} | {{ '{:.2f}'.format(res.risk_score.overall.overall_score.score) }} ({{ res.risk_score.overall.overall_score.severity.value }}) | {{ res.responses[0].jailbreak_achieved or '–' }} | {{ res.responses[0].description or '–' }}

Finding for Run ID: {{ res.run_id }}
Field | Value
{{ row('Plugin ID', res.plugin_id) }}
{{ row('Prompt Goal', res.prompt.goal) }}
{{ row('Base Prompt', res.prompt.base_prompt) }}
{{ row('User Message', res.responses[0].user_message) }}
{{ row('Assistant Response', res.responses[0].assistant_response) }}
{{ row('Success', res.responses[0].success) }}
{{ row('Risk Score (AIVSS)', score_str(res.risk_score.aivss.aivss_score)) }}
{{ row('Base Score', score_str(res.risk_score.base.base)) }}
{{ row('Breakability Score', score_str(res.risk_score.breakability.breakability)) }}
{{ row('Overall Score', score_str(res.risk_score.overall.overall_score)) }}
{{ row('Jailbreak Achieved', res.responses[0].jailbreak_achieved or 'N/A') }}
{{ row('Description', res.responses[0].description) }}
{% endfor %}