What model are you currently using?
Tell us which LLM you're currently using for this task.
Select models to test against
We've suggested 3 models that would provide a good comparison. You can edit this list after generation.
This can significantly improve the performance of smaller, cheaper models like gpt-4o-mini.
Enter your prompt template
Your prompt must include at least one placeholder. Use {content} for plain-text documents, or use named keys like {text}, {topic} when your data's content field is a JSON object with those keys.
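Both placeholder styles can be sketched with ordinary Python string formatting; the template wording below is an illustration, not a required prompt:

```python
# Plain-text documents: the whole document is substituted for {content}.
template = "Classify the sentiment of the following review:\n\n{content}"
prompt = template.format(content="The battery life is fantastic.")

# JSON documents: named keys map onto keys of the content object.
keyed_template = "Summarize this {topic} passage:\n\n{text}"
keyed_prompt = keyed_template.format(
    topic="history",
    text="The aqueducts supplied Rome with water for centuries.",
)
```

At generation time, each placeholder is filled from the corresponding field of your data, so a template with `{text}` and `{topic}` requires both keys to be present in every example.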
Provide labeled training data
Upload or link to your labeled dataset. This will be used for evaluation and few-shot learning.
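A minimal sketch of what a labeled dataset can look like, serialized as JSON Lines; the field names (`content`, `label`) are assumptions for illustration, not a required schema:

```python
import json

# Two hand-labeled examples: one input document plus its gold label each.
examples = [
    {"content": "The battery life is fantastic.", "label": "positive"},
    {"content": "Screen cracked within a week.", "label": "negative"},
]

# JSON Lines format: one JSON object per line.
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl)
```

Labels from this file serve double duty: they are the ground truth for scoring each model, and a handful of them can be injected into the prompt as few-shot demonstrations.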
Response Format
This is the Pydantic model inferred from your data. It is a best-effort inference: for plain-text labels with up to 50 unique values, a Literal enum is generated; for JSON labels, the structure is recursively inferred from your first example. If you need a more precise schema, pass your own Pydantic model as response_format when constructing ModelEval. See Structured extraction mode for details.
This schema will be saved to your config file and used for structured output.
Field-Level Metrics
Review & Download Configuration
Your configuration is ready! You can edit it below or download it directly.