AI Security Assessment Report

Comprehensive Red Team Evaluation

Client
{{ client_name }}
Assessor
{{ assessor_name }}
Engagement ID
{{ engagement_id }}
Assessment Period
{{ date_range }}
Report Date
{{ report_date }}
Scope
{{ scope }}
CONFIDENTIAL

Table of Contents

1. Executive Summary

Overall Risk Rating
{{ overall_risk | upper }}
{% if overall_risk == 'critical' %}
Critical vulnerabilities were identified that allow immediate exploitation of the AI system. Urgent remediation is required before production use.
{% elif overall_risk == 'high' %}
High-severity vulnerabilities were identified that pose significant risk to the AI system's security posture. Prompt remediation is recommended.
{% elif overall_risk == 'medium' %}
Moderate vulnerabilities were identified. While not immediately exploitable in all cases, they should be addressed in the next development cycle.
{% elif overall_risk == 'low' %}
Minor issues were identified. The AI system demonstrates a generally strong security posture with room for hardening.
{% else %}
The assessment completed without identifying actionable security findings. Continue monitoring and periodic testing.
{% endif %}
{{ total_tests }}
Total Tests
{{ severity_counts.critical }}
Critical
{{ severity_counts.high }}
High
{{ severity_counts.medium }}
Medium
{{ severity_counts.low }}
Low
{{ "%.1f" | format(pass_rate) }}%
Pass Rate

Key Findings

{% if recommendation_summary %}

Recommendations

{{ recommendation_summary }}

{% endif %}

2. Findings Overview

{% if findings %}
Severity | Title | Category | OWASP | CVSS | Status
{% for f in findings %}
{{ f.severity }} | {{ f.title }} | {{ f.category }} | {{ f.owasp_llm or '—' }} | {{ "%.1f" | format(f.cvss) if f.cvss else '—' }} | {{ 'Pass' if f.passed else 'Fail' }}
{% endfor %}
{% else %}

No findings to report.

{% endif %}

3. Detailed Findings

{% for f in findings %}
{{ f.id }}
{{ f.title }}
{% if f.cvss %}CVSS {{ "%.1f" | format(f.cvss) }}{% endif %} {{ f.severity }}

{{ f.description }}

Category
{{ f.category }}
Technique
{{ f.technique or '—' }}
Confidence
{{ f.confidence }}
Status
{{ 'Defended' if f.passed else 'Vulnerable' }}
Evidence: Request / Response

Prompt Sent

{{ f.prompt }}

Response Received

{{ f.response }}
{% if f.detector_evidence %}

Detector Analysis

{% for line in f.detector_evidence %}{{ line }}
{% endfor %}
{% endif %}
{% if f.owasp_llm or f.owasp_agentic or f.atlas or f.cwe %}
{% if f.owasp_llm %}
OWASP LLM {{ f.owasp_llm }}
{% endif %} {% if f.owasp_agentic %}
OWASP Agentic {{ f.owasp_agentic }}
{% endif %} {% if f.atlas %}
MITRE ATLAS {{ f.atlas }}
{% endif %} {% if f.cwe %}
CWE {{ f.cwe }}
{% endif %}
{% endif %}

Remediation

{{ f.remediation }}

{% if f.references %}

References: {% for ref in f.references %} {{ ref }}{% if not loop.last %}, {% endif %} {% endfor %}

{% endif %}
{% endfor %}

4. Compliance Evidence

4.1 NIST AI Risk Management Framework

The following table maps assessment findings to NIST AI RMF functions and categories. Rows marked "Evidence Available" have related findings that can serve as evidence for compliance activities.

NIST AI RMF Function | Category | Status | Related Findings
{% for row in nist_mapping %}
{{ row.function }} | {{ row.category }} | {% if row.findings %}Evidence Available{% else %}No Evidence{% endif %} | {{ row.findings | join(', ') if row.findings else '—' }}
{% endfor %}

4.2 OWASP LLM Top 10 Coverage

ID | Risk | Tested | Findings | Result
{% for row in owasp_llm_coverage %}
{{ row.id }} | {{ row.name }} | {% if row.tested %}Yes ({{ row.test_count }}){% else %}Not Tested{% endif %} | {{ row.finding_count }} | {% if not row.tested %}N/A{% elif row.finding_count > 0 %}Issues Found{% else %}Pass{% endif %}
{% endfor %}

4.3 OWASP Agentic Security Top 10 Coverage

ID | Risk | Tested | Findings | Result
{% for row in owasp_agentic_coverage %}
{{ row.id }} | {{ row.name }} | {% if row.tested %}Yes ({{ row.test_count }}){% else %}Not Tested{% endif %} | {{ row.finding_count }} | {% if not row.tested %}N/A{% elif row.finding_count > 0 %}Issues Found{% else %}Pass{% endif %}
{% endfor %}

4.4 SOC 2 Type II Relevance

AI security findings map to SOC 2 Trust Services Criteria. The following table identifies which Common Criteria are supported by evidence from this assessment.

SOC 2 Criteria | Description | AI Security Relevance | Related Findings
{% for row in soc2_mapping %}
{{ row.criteria }} | {{ row.description }} | {{ row.relevance }} | {{ row.findings | join(', ') if row.findings else '—' }}
{% endfor %}

5. Methodology

Tools & Configuration

  • Platform: AI Purple Ops v{{ aipop_version }}
  • Adapter: {{ adapter_name }}
  • Model: {{ model_name }}
{% if suites_run %}
  • Suites: {{ suites_run | join(', ') }}
{% endif %}

Execution Details

  • Total Tests: {{ total_tests }}
  • Total Cost: ${{ "%.4f" | format(total_cost) }}
  • P50 Latency: {{ "%.1f" | format(p50_latency) }} ms
  • Generated: {{ report_timestamp }}

Scope Boundaries

{{ scope }}

This assessment covers AI-specific security risks. Traditional infrastructure, network, and application security testing is out of scope unless explicitly stated.

Frameworks Referenced

  • OWASP LLM Top 10 (2025)
  • OWASP Agentic Security Initiative
  • MITRE ATLAS
  • NIST AI Risk Management Framework
  • SOC 2 Trust Services Criteria

6. Appendix: Raw Evidence

Full request and response data for each test case. Expand individual sections to view raw evidence.

{% for f in all_results %}
{{ 'PASS' if f.passed else 'FAIL' }} {{ f.id }} — {{ f.title }}

Category: {{ f.category }} | Risk: {{ f.risk }} | Suite: {{ f.suite_id }}

Prompt

{{ f.prompt }}

Response

{{ f.response }}
{% if f.detector_evidence %}

Detector Analysis

{% for line in f.detector_evidence %}{{ line }}
{% endfor %}
{% endif %}
{% endfor %}

How findings are scored

Severities and framework mappings (OWASP, MITRE ATLAS, CVSS) come from a static taxonomy file maintained by the AIPOP project, not from LLM-generated analysis. Each test case ID maps to a hand-curated entry with a title, description, remediation, and framework references.
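The lookup described above can be pictured as a plain dictionary keyed by test case ID. The sketch below is illustrative only: the `TaxonomyEntry` fields, the `TAXONOMY` table, the `enrich` helper, the test ID, and the sample values are all assumptions made for this example and do not reflect the actual AIPOP taxonomy file or its schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TaxonomyEntry:
    """One hand-curated record (hypothetical shape, not the real schema)."""
    title: str
    severity: str
    description: str
    remediation: str
    owasp_llm: Optional[str] = None
    cvss: Optional[float] = None
    references: Tuple[str, ...] = ()


# Hypothetical static table: test case ID -> curated entry.
TAXONOMY = {
    "example-test-001": TaxonomyEntry(
        title="Example prompt injection finding",
        severity="high",
        description="Placeholder description for illustration.",
        remediation="Placeholder remediation for illustration.",
        owasp_llm="LLM01",
        cvss=7.5,  # illustrative estimate, not a real score
    ),
}


def enrich(test_id: str, passed: bool) -> dict:
    """Attach curated metadata to a raw test result by static lookup."""
    entry = TAXONOMY.get(test_id)
    if entry is None:
        # Unknown IDs are flagged for manual review instead of guessed at.
        return {"id": test_id, "passed": passed, "title": test_id,
                "severity": "unmapped", "owasp_llm": None, "cvss": None}
    return {"id": test_id, "passed": passed, "title": entry.title,
            "severity": entry.severity, "owasp_llm": entry.owasp_llm,
            "cvss": entry.cvss}
```

Because the table is static, the same test ID always yields the same severity and mapping, which is what makes the result reviewable by hand.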

CVSS scores are estimates based on the vulnerability class, not on the specific target’s environment. Adjust them to your assessment context: a finding in a dev sandbox has a different impact than the same finding in production.
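One rough way to make such an adjustment explicit is to scale the estimated score by a per-environment weight and then re-derive the severity band. In the sketch below, the band thresholds follow the CVSS v3.x qualitative severity rating scale, but the `CONTEXT_WEIGHT` values and the `adjust_cvss` helper are hypothetical: they are not part of CVSS, not the CVSS environmental metric calculation, and not something the report generator is known to do.

```python
def cvss_band(score: float) -> str:
    """Map a CVSS v3.x score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


# Hypothetical weights chosen for illustration; tune per engagement.
CONTEXT_WEIGHT = {"production": 1.0, "staging": 0.8, "dev-sandbox": 0.6}


def adjust_cvss(base: float, environment: str) -> float:
    """Scale an estimated score by the environment weight, capped at 10.0."""
    weight = CONTEXT_WEIGHT[environment]
    return round(min(10.0, base * weight), 1)
```

With these assumed weights, an estimated 8.0 ("high") found in a dev sandbox would be reported as 4.8 ("medium"), while the same finding in production keeps its original score.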

Review all findings before sharing this report. Remove false positives, adjust severities for your context, and add engagement-specific notes.
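After a review pass, the summary figures shown in the Executive Summary (`total_tests`, `severity_counts`, `pass_rate`, `overall_risk`) should be recomputed so they match the edited findings. The following is a minimal sketch of that recomputation. It assumes each result is a dict with `id`, `severity`, and `passed` keys, that a confirmed false positive is counted as a pass, and that the overall rating is the highest severity with at least one failing result. The `summarize` helper and all three assumptions are illustrative; the source does not specify how the real report context is built.

```python
from collections import Counter

SEVERITY_ORDER = ("critical", "high", "medium", "low")


def summarize(results, false_positive_ids=frozenset()):
    """Recompute summary fields after marking reviewed false positives."""
    # Assumption: a false positive is a test the target actually defended.
    reviewed = [
        dict(r, passed=True) if r["id"] in false_positive_ids else r
        for r in results
    ]
    failed = [r for r in reviewed if not r["passed"]]
    counts = Counter(r["severity"] for r in failed)
    severity_counts = {s: counts.get(s, 0) for s in SEVERITY_ORDER}

    total = len(reviewed)
    pass_rate = 100.0 * (total - len(failed)) / total if total else 0.0

    # Assumption: overall risk is the worst severity that still fails.
    overall = next((s for s in SEVERITY_ORDER if severity_counts[s]), "none")
    return {
        "total_tests": total,
        "severity_counts": severity_counts,
        "pass_rate": pass_rate,
        "overall_risk": overall,
    }
```

Recomputing from the reviewed list, rather than editing the counts by hand, keeps the summary tiles and the findings table consistent with each other.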