tokenjam 0.5.1 · n=20 tasks · k=1 sample(s) · deepseek:deepseek-reasoner → deepseek:deepseek-chat

| task | orig | cand | candidate detail | |
|---|---|---|---|---|
| HumanEval/0 | 1/1 | 1/1 | ok | |
| HumanEval/1 | 1/1 | 1/1 | ok | |
| HumanEval/2 | 1/1 | 1/1 | ok | |
| HumanEval/3 | 1/1 | 1/1 | ok | |
| HumanEval/4 | 1/1 | 1/1 | ok | |
| HumanEval/5 | 1/1 | 1/1 | ok | |
| HumanEval/6 | 1/1 | 1/1 | ok | |
| HumanEval/7 | 1/1 | 1/1 | ok | |
| HumanEval/8 | 1/1 | 1/1 | ok | |
| HumanEval/9 | 1/1 | 1/1 | ok | |
| HumanEval/10 | 0/1 | 1/1 | ok | |
| HumanEval/11 | 1/1 | 1/1 | ok | |
| HumanEval/12 | 1/1 | 1/1 | ok | |
| HumanEval/13 | 1/1 | 1/1 | ok | |
| HumanEval/14 | 1/1 | 1/1 | ok | |
| HumanEval/15 | 1/1 | 1/1 | ok | |
| HumanEval/16 | 1/1 | 1/1 | ok | |
| HumanEval/17 | 1/1 | 1/1 | ok | |
| HumanEval/18 | 1/1 | 1/1 | ok | |
| HumanEval/19 | 1/1 | 1/1 | ok | |