{# ============================================================
   DISPLAY SETTINGS
   field_table_decimal_places: number of decimal places shown in the
   per-field metric tables (score, precision, recall, F1).
   Default is 4. Change to e.g. 2 for "0.85" style or 6 for more precision.
   ============================================================ #}
{% set field_table_decimal_places = 2 %}
{% set _field_fmt = "%." ~ field_table_decimal_places ~ "f" %}
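{#
A minimal sketch, in plain Python, of how the `_field_fmt` setting above behaves. Jinja's `~` operator concatenates its operands as strings, and the `format` filter applies printf-style `%` formatting, so the two steps can be reproduced outside the template. The helper names `build_field_format` and `format_metric` are hypothetical and exist only for this illustration; they are not part of the template or of any library.

```python
def build_field_format(decimal_places: int) -> str:
    # Same result as the template expression that joins "%.", the
    # configured number of places, and "f" with Jinja's ~ operator.
    return "%." + str(decimal_places) + "f"


def format_metric(value: float, decimal_places: int = 4) -> str:
    # Printf-style formatting, which is what "fmt|format(value)" does in Jinja.
    return build_field_format(decimal_places) % value


if __name__ == "__main__":
    # Show the same metric at the settings mentioned in the comment block.
    for places in (2, 4, 6):
        print(places, format_metric(0.8512, places))
```

With a value of 0.8512, a setting of 2 gives "0.85", 4 gives "0.8512", and 6 gives "0.851200".
#}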
Models Evaluated: {{ num_models }}
Documents Tested: {{ num_documents }}
Original Prompt Template:
{{ prompt_template|e }}
| Documents | Model | Prompt | {% if has_field_metrics and all_field_names|length > 1 %}Performance Rank | {% else %}Accuracy | {% endif %}Total Cost | Avg Cost/Doc | Total Time | Avg Time/Doc | {% if has_optimizations %}Prompt Manipulation | {% endif %}
|---|---|---|---|---|---|---|---|{% if has_optimizations %}---|{% endif %}
| {{ result.metrics.total_documents }} | {{ result.model }} | {% if model_override_prompts.get(result.model) %} {% else %} {% endif %} | {% if has_field_metrics and all_field_names|length > 1 %} {% set rank_info = performance_ranks[result.model] %} #{{ rank_info.rank }}{% if rank_info.delta_pct < 0 %} {{ "%.2f"|format(rank_info.delta_pct) }}%{% endif %} {% elif has_field_metrics and all_field_names|length == 1 %} {% set _sf = all_field_names | first %} {% set _fm = result.metrics.aggregated_field_metrics.get(_sf) if result.metrics and result.metrics.aggregated_field_metrics else none %} {{ "%.2f"|format(_fm.precision * 100) if _fm else 'N/A' }}% {% else %} {{ "%.2f"|format(result.metrics.accuracy * 100) }}% {% endif %} | {% set _has_custom_cost_rate = result.llm_config and (result.llm_config.cost_rate is defined or result.llm_config.cost_rate_imputed is defined) %} {% set _is_best_cost = result.metrics.total_cost <= performance_best.best_total_cost %} ${{ "%.2f"|format(result.metrics.total_cost) }}{% if _has_custom_cost_rate %} (est){% endif %} {% if result.llm_config %} {% if result.llm_config.cost_rate is defined %} {% elif result.llm_config.cost_rate_imputed is defined %} {% endif %} {% endif %} | ${{ "%.6f"|format(result.metrics.average_cost_per_document) }} | {{ "%.2f"|format(result.metrics.total_time) }}s | {{ "%.2f"|format(result.metrics.average_time_per_document) }}s | {% if has_optimizations %}{% set manipulations = prompt_optimizations.get(result.model, []) %} {% if manipulations %} {{ ', '.join(manipulations) }} {% else %} None {% endif %} | {% endif %}
Note: Prompt manipulations include "explanation" (chain-of-thought reasoning) and "few_shot" (example injection), among others. Click "View Prompt" to see the enhanced prompt for each model.
{% endif %}
Detailed accuracy metrics for individual fields in structured outputs.
How are fields scored?