Configuration Wizard

Create your optimized model comparison configuration

Current Model
Test Models
Prompt
Training Data
Response Format
Field Metrics
Review

What model are you currently using?

Tell us which LLM you're currently using for this task.

Select models to test against

We've suggested 3 models that would provide a good comparison. You can edit this list after generation.

This can significantly improve the performance of smaller, cheaper models like gpt-4o-mini.

Enter your prompt template

Your prompt must include at least one placeholder. Use {content} for plain-text documents, or use named keys like {text}, {topic} when your data's content field is a JSON object with those keys.
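
For illustration, here is what a template might look like in each mode (the task wording and placeholder values below are hypothetical, not required by the wizard):

```python
# Plain-text documents: the whole document is substituted for {content}.
plain_text_template = (
    "Classify the sentiment of the following review as positive or negative.\n\n"
    "Review:\n{content}"
)

# JSON content: each key of the content object gets its own named placeholder.
json_content_template = (
    "Write a one-sentence summary of the passage below, focusing on {topic}.\n\n"
    "Passage:\n{text}"
)

# str.format shows how the placeholders are filled for a single example.
print(json_content_template.format(
    text="The new pricing tier launches in May.",
    topic="pricing",
))
```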

Provide labeled training data

Upload or link to your labeled dataset. This will be used for evaluation and few-shot learning.

The Browse button supports files up to 1 GB. For larger files, enter the path directly.

Relative paths resolve from: ...

📋 View required data format
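
The exact schema is shown in the "View required data format" panel; as a rough sketch, a labeled record pairs the document with its expected output. The field names below are assumptions for illustration only:

```python
import json

# Illustrative records: "content" holds the document (plain text, or a JSON object
# whose keys match your prompt placeholders) and "label" holds the expected output.
# These field names are assumptions; the wizard panel shows the exact format.
examples = [
    {"content": "The battery easily lasts two full days.", "label": "positive"},
    {"content": {"text": "Prices went up again this quarter.", "topic": "pricing"},
     "label": "negative"},
]

with open("train.jsonl", "w") as f:
    for row in examples:
        f.write(json.dumps(row) + "\n")
```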

Response Format

This is the Pydantic model inferred from your data. The inference is best-effort: for plain-text labels with up to 50 unique values, a Literal enum is generated; for JSON labels, the structure is recursively inferred from your first example. If you need a more precise schema, pass your own Pydantic model as response_format when constructing ModelEval. See Structured extraction mode for details.

This schema will be saved to your config file and used for structured output.
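
If the inferred schema is not precise enough, you can define your own model and pass it as response_format. A minimal sketch is below; the field name and label values are placeholders, and the ModelEval import and `data` variable come from your own recipe setup:

```python
from typing import Literal

from pydantic import BaseModel

# Hypothetical schema: swap the field name and allowed values for your own labels.
class ReviewLabel(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]

# Pass the model when constructing ModelEval instead of relying on the inferred one.
evaluator = ModelEval(
    config="./config.json",
    data=data,
    response_format=ReviewLabel,
)
```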

Field-Level Metrics

Review & Download Configuration

Your configuration is ready! You can edit it below or download it directly.

✅ Configuration generated successfully! Save this JSON file and pass it to your recipe, for example: ModelEval(config="./config.json", data=data)
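
Putting it together, a short sketch of wiring the downloaded config into a recipe. Only the ModelEval(config=..., data=...) call comes from the wizard's note; the data-loading step is an assumption about your setup:

```python
import json

# Loading labeled examples from a JSONL file is an assumption; use whatever loader
# matches the dataset you provided in the Training Data step.
with open("train.jsonl") as f:
    data = [json.loads(line) for line in f]

# Construct the recipe from the generated configuration file.
evaluator = ModelEval(config="./config.json", data=data)
```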