{# ============================================================
   DISPLAY SETTINGS
   field_table_decimal_places: number of decimal places shown in
   the per-field metric tables (score, precision, recall, F1).
   Default is 4. Change to e.g. 2 for "0.85" style or 6 for more
   precision.
   ============================================================ #}
{% set field_table_decimal_places = 2 %}
{% set _field_fmt = "%." ~ field_table_decimal_places ~ "f" %}
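The `_field_fmt` expression splices the decimal-place count into a printf-style format string. A minimal Python sketch of the same behavior (the values are illustrative):

```python
# Mirrors the Jinja expression "%." ~ field_table_decimal_places ~ "f":
# the decimal-place count is spliced into a printf-style format string.
field_table_decimal_places = 2
field_fmt = "%." + str(field_table_decimal_places) + "f"

print(field_fmt % 0.851234)              # 0.85  (two places)
print(("%." + str(6) + "f") % 0.851234)  # 0.851234  (six places)
```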

Valtron Evaluation Report

Model Comparison and Performance Analysis
Generated: {{ timestamp }}

Summary

Models Evaluated: {{ num_models }}

Documents Tested: {{ num_documents }}

Original Prompt Template: {{ prompt_template|e }}

View Detailed Input/Output Analysis
{% if has_field_metrics and all_field_names|length == 1 %}

Performance Metrics

{% else %}

Performance Metrics

{% endif %}
{% if has_field_metrics and all_field_names|length > 1 %}{% else %}{% endif %}
{% if has_optimizations %}{% endif %}

Documents | Model | Prompt | Performance Rank / Accuracy | Total Cost | Avg Cost/Doc | Total Time | Avg Time/Doc | Prompt Manipulation

{% for result in results %}
{% if result.metrics %}
{{ result.metrics.total_documents }}
{{ result.model }}
{% if model_override_prompts.get(result.model) %}{% else %}{% endif %}
{% if has_field_metrics and all_field_names|length > 1 %}
{% set rank_info = performance_ranks[result.model] %}
#{{ rank_info.rank }}{% if rank_info.delta_pct < 0 %} {{ "%.2f"|format(rank_info.delta_pct) }}%{% endif %}
{% elif has_field_metrics and all_field_names|length == 1 %}
{% set _sf = all_field_names | first %}
{% set _fm = result.metrics.aggregated_field_metrics.get(_sf) if result.metrics and result.metrics.aggregated_field_metrics else none %}
{{ "%.2f"|format(_fm.precision * 100) if _fm else 'N/A' }}%
{% else %}
{{ "%.2f"|format(result.metrics.accuracy * 100) }}%
{% endif %}
{% set _has_custom_cost_rate = result.llm_config and (result.llm_config.cost_rate is defined or result.llm_config.cost_rate_imputed is defined) %}
{% set _is_best_cost = result.metrics.total_cost <= performance_best.best_total_cost %}
${{ "%.2f"|format(result.metrics.total_cost) }}{% if _has_custom_cost_rate %} (est){% endif %}
{% if result.llm_config %}{% if result.llm_config.cost_rate is defined %} i {% elif result.llm_config.cost_rate_imputed is defined %} i {% endif %}{% endif %}
${{ "%.6f"|format(result.metrics.average_cost_per_document) }}
{{ "%.2f"|format(result.metrics.total_time) }}s
{{ "%.2f"|format(result.metrics.average_time_per_document) }}s
{% if has_optimizations %}
{% set manipulations = prompt_optimizations.get(result.model, []) %}
{% if manipulations %}{{ ', '.join(manipulations) }}{% else %}None{% endif %}
{% endif %}
{% endif %}
{% endfor %}
{% if has_optimizations %}

Note: Prompt manipulations can include: "explanation" (chain-of-thought reasoning), "few_shot" (example injection), and others. Click "View Prompt" to see the enhanced prompt for each model.

{% endif %}
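The Accuracy column above branches on the shape of the field metrics: with exactly one tracked field it shows that field's precision, otherwise the overall accuracy (with multiple fields the template shows a performance rank instead). A hedged Python sketch of that selection logic (the dict layout and field name are illustrative, not a real API):

```python
def accuracy_cell(result_metrics, field_names, has_field_metrics):
    """Choose what the Accuracy column shows, mirroring the template's branches:
    exactly one tracked field -> that field's precision; otherwise -> overall
    accuracy. (With multiple fields the template shows a performance rank.)"""
    if has_field_metrics and len(field_names) == 1:
        fm = result_metrics.get("aggregated_field_metrics", {}).get(field_names[0])
        return "%.2f%%" % (fm["precision"] * 100) if fm else "N/A"
    return "%.2f%%" % (result_metrics["accuracy"] * 100)

# Illustrative data; the real metric objects come from the evaluation backend.
metrics = {"accuracy": 0.91,
           "aggregated_field_metrics": {"invoice_total": {"precision": 0.875}}}
print(accuracy_cell(metrics, ["invoice_total"], True))  # 87.50%
print(accuracy_cell(metrics, [], False))                # 91.00%
```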
{% if has_field_metrics and all_field_names|length > 1 %}

Per-Field Metrics

Detailed accuracy metrics for individual fields in structured outputs.
How are fields scored?

{% endif %}
{% if recommendation %}

AI-Powered Recommendation

{{ recommendation | replace('\n', '<br>') | safe }}
{% endif %}
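Jinja's replace filter behaves like Python's str.replace; a minimal sketch of mapping the recommendation's newlines to HTML line breaks (assumed here to be the intended substitution, with illustrative text):

```python
# Map literal newlines in the recommendation text to HTML line breaks,
# as the template's replace filter does before marking the string safe.
recommendation = "Prefer model A for cost.\nPrefer model B for accuracy."
print(recommendation.replace("\n", "<br>"))
# Prefer model A for cost.<br>Prefer model B for accuracy.
```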

Visual Analysis

Aggregate Metrics Comparison

{% if not has_field_metrics or all_field_names|length == 1 %}

Quality: Accuracy by Model

{% endif %}

Speed: Average Time per Document

Cost: Average Cost per Document

Distribution Histograms

{% if has_field_metrics and all_field_names|length == 1 %}Accuracy Distribution{% else %}Score Distribution{% endif %}

Processing Time Distribution

Cost Distribution

Model Comparison Scatter Plot

Copied to clipboard
{% if has_field_metrics %} {% endif %}