{% extends "base.html" %}
{% block title %}Evals{% endblock %}
{% block content %}

Evaluations

Test and benchmark agent performance

{{ summary.total_runs }}
Total Runs
{{ summary.passed }}
Passed
{{ summary.failed }}
Failed
{{ "%.0f"|format(summary.pass_rate) }}%
Pass Rate
{{ "%.0f"|format(summary.avg_accuracy) }}%
Avg Accuracy

📋 Scenarios

{% include "partials/scenario_list.html" %}

📈 Recent Results

{% include "partials/results_list.html" %}
{% endblock %}