{% extends "base.html" %} {% from "components/custom_dropdown.html" import render_dropdown %} {% from "components/help_macros.html" import tooltip, help_panel, help_step, help_tip %} {% set active_page = 'benchmark' %} {% block title %}Benchmark Configuration - Deep Research System{% endblock %} {% block extra_head %} {% endblock %} {% block content %}
Purpose: Benchmarks help you evaluate if your configuration works well, not for research papers or production use.
Responsible Usage: Use reasonable example counts to avoid overwhelming search engines. Default of 75 examples is good for testing.
Requirements: Benchmarks require an evaluation model for grading. Configure in Evaluation Settings below. Default: OpenRouter with Claude 3.7 Sonnet.