Configure your evaluation backend

Connect to LangFuse

We need your LangFuse API keys to access your datasets and evaluators.
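
For reference, these are the standard LangFuse credentials; a minimal sketch of supplying them via environment variables and the LangFuse Python SDK (key values are placeholders):

    # Standard LangFuse environment variables; find your keys under
    # Project Settings -> API Keys in the LangFuse dashboard.
    import os

    os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."             # public key (placeholder)
    os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."             # secret key (placeholder)
    os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"  # or your self-hosted URL

    from langfuse import Langfuse

    client = Langfuse()  # reads the variables above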

Select a Dataset

Choose the dataset to evaluate your target function against.

Dataset Preview

Sample rows from your dataset.
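
If you want to inspect a dataset outside the wizard, the LangFuse Python SDK exposes it directly; a minimal sketch (the dataset name is illustrative):

    from langfuse import Langfuse

    client = Langfuse()  # credentials from the environment, as above

    # Fetch a dataset by the name shown in the LangFuse dashboard.
    dataset = client.get_dataset("qa-examples")  # illustrative name

    for item in dataset.items:
        print(item.input)            # what your target function receives
        print(item.expected_output)  # what evaluators compare against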

Source Files & Target

Select the files Weco will optimize, then choose the target function.

Source Files
Select the Python files that Weco will modify during optimization.
Target Function
Use module:function notation, e.g. agent:run_chain imports run_chain from agent.py
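
To make the module:function format concrete, here is a hypothetical agent.py matching the example above; the body is a stand-in for your real chain, and the exact signature depends on your dataset:

    # agent.py (hypothetical), selected as a source file and targeted
    # via "agent:run_chain". Weco rewrites the implementation during
    # optimization, so keep the function's interface stable.

    def run_chain(example: dict) -> str:
        """Take one dataset example and return the output to be scored."""
        question = example.get("input", "")
        return f"Stub answer for: {question}"  # stand-in for a real LLM chain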

Evaluators

Evaluators score your target function's outputs on each dataset example.

Evaluator Files
Select Python files containing your evaluator functions.

Space-separated. Use module:function notation for custom evaluators. Leave empty to default to your metric name.
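
As an illustration, a custom evaluator referenced as evaluators:exact_match might look like the sketch below (the exact signature Weco passes may differ; shown here as model output plus expected answer):

    # evaluators.py (hypothetical), referenced as "evaluators:exact_match".

    def exact_match(output: str, expected: str) -> float:
        """Score one example: 1.0 for an exact match, 0.0 otherwise."""
        return float(output.strip().lower() == expected.strip().lower())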
Managed Evaluators (LLM-as-a-Judge)
Managed evaluators run server-side in LangFuse using LLM-as-a-Judge. Enter the name of each evaluator as configured in your LangFuse dashboard under Evaluation → Evaluators.
A custom function that receives all aggregated evaluator scores as a dict and returns a single float. Use this to combine multiple evaluators into one composite metric (e.g. a weighted average).
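
A minimal sketch of such a function, assuming two hypothetical evaluators named exact_match and helpfulness:

    # aggregate.py (hypothetical), e.g. referenced as "aggregate:weighted_score".

    def weighted_score(scores: dict[str, float]) -> float:
        """Receive all evaluator scores as a dict; return one float."""
        weights = {"exact_match": 0.7, "helpfulness": 0.3}  # illustrative weights
        return sum(weights.get(name, 0.0) * value for name, value in scores.items())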

Run Settings

Configure optimization parameters. All fields have sensible defaults.

Optimization
The metric name your evaluator outputs
Optimization iterations (default: 100)
LLM model. Leave blank for auto.
LangFuse
Parallel eval threads
Maximum time, in seconds, to poll for server-side LLM-as-a-Judge scores (default: 900)
Execution
Time limit per evaluation, in seconds
Save execution logs and code snapshots
Auto-apply best solution
Require manual review of each change

Review & Start

Confirm your configuration and begin optimization.

Next time, skip the wizard with the ready-made weco command printed on this screen; it captures everything configured above.

Configuration Sent

Return to your terminal — optimization is starting.