Test Judge
Select traces in the viewer and pick the models to compare. The backend runs inference on the fly, then feeds each model's output into the judge.
Standard judges: 1+ models. Ranking judges: 2+ models.
Max: 50
Edit the prompt and test your changes, or save to update the judge permanently.
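The model-count rule above (standard judges take one or more models, ranking judges two or more) could be checked before a test run with something like the sketch below. This is an illustration only, not the backend's actual code: `JudgeKind`, `MIN_MODELS`, and `validate_model_selection` are hypothetical names, and dropping duplicate model names before counting is an assumption. The 50 limit is left out because the text does not say what it caps.

```python
from enum import Enum


class JudgeKind(Enum):
    # Hypothetical labels for the two judge types named in the text.
    STANDARD = "standard"
    RANKING = "ranking"


# Minimum number of models per judge kind, as stated in the text.
MIN_MODELS = {JudgeKind.STANDARD: 1, JudgeKind.RANKING: 2}


def validate_model_selection(kind, models):
    """Return the de-duplicated model list, or raise ValueError if too few."""
    # Assumption: selecting the same model twice counts once.
    unique = list(dict.fromkeys(models))  # drops duplicates, keeps order
    required = MIN_MODELS[kind]
    if len(unique) < required:
        raise ValueError(
            f"{kind.value} judge needs at least {required} model(s), "
            f"got {len(unique)}"
        )
    return unique
```

With this sketch, a ranking judge given a single model is rejected before any inference runs, while a standard judge accepts it.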