modernization_classic (v1.0.0)
2 run summaries · 2 models
Mean win rate
Model      Win rate
Qwen-32B   1.000
Llama-70B  0.000
Per-axis rollup
Model      Scenario       functional_accuracy  completeness
Qwen-32B   cobol_billing  88.5                 82.0
Llama-70B  cobol_billing  72.0                 68.0