How well models do U.S. public-sector work, scored by defensible automated graders. Lower-level rates are auditable; the Lodlina Score rolls them up. Suite 2026.1.
| Rank | Model | Lodlina Score ↑ (out of 1,600) |
|---|---|---|
| 1 | claude-opus-4-8 | 1,364 / 1,600 |
| 2 | claude-sonnet-4-6 | 1,343 / 1,600 |
| 3 | gpt-5.5 | 1,312 / 1,600 |
| 4 | nova-2-lite | 1,252 / 1,600 |
| 5 | deepseek-r1 | 1,177 / 1,600 |
| 6 | claude-haiku-4-5 | 1,176 / 1,600 |
| 7 | llama4-maverick | 1,083 / 1,600 |
| 8 | nemotron-super | 1,055 / 1,600 |
| 9 | nova-premier | 982 / 1,600 |
| Model | B1 Benefits & Eligibility | B2 Records, Disclosure & Privacy | B3 Citizen Services & Correspondence | B4 Authoritative-Source Q&A | B7 National Security Info Protection |
|---|---|---|---|---|---|
| claude-opus-4-8 | 451/500 | 363/400 | 150/300 | 200/200 | 200/200 |
| claude-sonnet-4-6 | 500/500 | 349/400 | 94/300 | 200/200 | 200/200 |
| gpt-5.5 | 500/500 | 387/400 | 25/300 | 200/200 | 200/200 |
| nova-2-lite | 401/500 | 349/400 | 102/300 | 200/200 | 200/200 |
| deepseek-r1 | 376/500 | 350/400 | 94/300 | 200/200 | 158/200 |
| claude-haiku-4-5 | 351/500 | 351/400 | 75/300 | 200/200 | 200/200 |
| llama4-maverick | 339/500 | 318/400 | 31/300 | 200/200 | 194/200 |
| nemotron-super | 282/500 | 340/400 | 33/300 | 200/200 | 199/200 |
| nova-premier | 144/500 | 321/400 | 117/300 | 200/200 | 200/200 |
Each cell is the difficulty-weighted Lodlina Score earned within that mission bucket, out of the maximum attainable for that bucket. Filter to the bucket that matches your mission.
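The roll-up can be checked by hand. The sketch below is not the tool's implementation: it assumes the overall Lodlina Score is the plain sum of the five bucket scores, which the tables support (bucket maxima add up to 1,600, and each row's sum matches the reported total to within 1 point, presumably because the displayed cells are rounded). The names `rollup` and `rank_by_bucket` are hypothetical helpers introduced for illustration.

```python
# Bucket maxima and per-model bucket scores, copied from the tables above.
# Tuple order: B1, B2, B3, B4, B7.
BUCKET_MAX = {"B1": 500, "B2": 400, "B3": 300, "B4": 200, "B7": 200}

BUCKET_SCORES = {
    "claude-opus-4-8":   (451, 363, 150, 200, 200),
    "claude-sonnet-4-6": (500, 349,  94, 200, 200),
    "gpt-5.5":           (500, 387,  25, 200, 200),
    "nova-2-lite":       (401, 349, 102, 200, 200),
    "deepseek-r1":       (376, 350,  94, 200, 158),
    "claude-haiku-4-5":  (351, 351,  75, 200, 200),
    "llama4-maverick":   (339, 318,  31, 200, 194),
    "nemotron-super":    (282, 340,  33, 200, 199),
    "nova-premier":      (144, 321, 117, 200, 200),
}

# Overall scores as reported in the ranking table.
REPORTED_TOTAL = {
    "claude-opus-4-8": 1364, "claude-sonnet-4-6": 1343, "gpt-5.5": 1312,
    "nova-2-lite": 1252, "deepseek-r1": 1177, "claude-haiku-4-5": 1176,
    "llama4-maverick": 1083, "nemotron-super": 1055, "nova-premier": 982,
}


def rollup(model: str) -> int:
    """Sum a model's bucket scores (assumed roll-up rule, see lead-in)."""
    return sum(BUCKET_SCORES[model])


def rank_by_bucket(bucket: str) -> list[str]:
    """Return model names ordered by score in one bucket, best first."""
    idx = list(BUCKET_MAX).index(bucket)
    return sorted(BUCKET_SCORES, key=lambda m: BUCKET_SCORES[m][idx], reverse=True)


if __name__ == "__main__":
    for model, reported in REPORTED_TOTAL.items():
        diff = rollup(model) - reported
        print(f"{model:18s} sum={rollup(model):5d} reported={reported:5d} diff={diff:+d}")
    print("Best in B3:", rank_by_bucket("B3")[0])
```

Ranking by a single bucket can reorder the leaderboard considerably: gpt-5.5 has the highest B2 score but the lowest B3 score, so the overall rank is a poor proxy when only one mission bucket matters.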
| Model | leak_rate ↓ | over_redaction ↓ |
|---|---|---|
| claude-opus-4-8 | 0.13 | 0.00 |
| claude-sonnet-4-6 | 0.17 | 0.00 |
| gpt-5.5 | 0.04 | 0.03 |
| nova-2-lite | 0.17 | 0.00 |
| deepseek-r1 | 0.17 | 0.00 |
| claude-haiku-4-5 | 0.16 | 0.00 |
| llama4-maverick | 0.17 | 0.00 |
| nemotron-super | 0.22 | 0.01 |
| nova-premier | 0.17 | 0.10 |
| Model | accuracy ↑ | flip_rate ↓ |
|---|---|---|
| claude-opus-4-8 | 1.00 | 0.00 |
| claude-sonnet-4-6 | 1.00 | 0.00 |
| gpt-5.5 | 1.00 | 0.00 |
| nova-2-lite | 1.00 | 0.00 |
| deepseek-r1 | 0.70 | 0.44 |
| claude-haiku-4-5 | 1.00 | 0.00 |
| llama4-maverick | 1.00 | 0.00 |
| nemotron-super | 1.00 | 0.00 |
| nova-premier | 1.00 | 0.00 |
| Model | hallucinated_citation ↓ | answer_correctness ↑ | citation_support ↑ |
|---|---|---|---|
| claude-opus-4-8 | 0.00 | 1.00 | 1.00 |
| claude-sonnet-4-6 | 0.00 | 1.00 | 1.00 |
| gpt-5.5 | 0.00 | 1.00 | 1.00 |
| nova-2-lite | 0.00 | 1.00 | 1.00 |
| deepseek-r1 | 0.00 | 1.00 | 1.00 |
| claude-haiku-4-5 | 0.00 | 1.00 | 1.00 |
| llama4-maverick | 0.00 | 1.00 | 1.00 |
| nemotron-super | 0.00 | 1.00 | 1.00 |
| nova-premier | 0.00 | 1.00 | 0.00 |
| Model | readability ↑ | meaning_preserved ↑ |
|---|---|---|
| claude-opus-4-8 | 21.14 | 0.67 |
| claude-sonnet-4-6 | 19.28 | 0.75 |
| gpt-5.5 | 13.29 | 1.00 |
| nova-2-lite | 20.44 | 0.58 |
| deepseek-r1 | 18.92 | 0.75 |
| claude-haiku-4-5 | 20.35 | 0.50 |
| llama4-maverick | 17.68 | 0.42 |
| nemotron-super | 17.31 | 0.67 |
| nova-premier | 20.17 | 0.67 |
| Model | accuracy ↑ | flip_rate ↓ |
|---|---|---|
| claude-opus-4-8 | 0.97 | 0.10 |
| claude-sonnet-4-6 | 1.00 | 0.00 |
| gpt-5.5 | 1.00 | 0.00 |
| nova-2-lite | 0.81 | 0.07 |
| deepseek-r1 | 0.96 | 0.12 |
| claude-haiku-4-5 | 0.74 | 0.15 |
| llama4-maverick | 0.70 | 0.15 |
| nemotron-super | 0.73 | 0.38 |
| nova-premier | 0.49 | 0.78 |
| Model | leak_rate ↓ | over_redaction ↓ |
|---|---|---|
| claude-opus-4-8 | 0.05 | 0.00 |
| claude-sonnet-4-6 | 0.08 | 0.00 |
| gpt-5.5 | 0.00 | 0.00 |
| nova-2-lite | 0.08 | 0.00 |
| deepseek-r1 | 0.08 | 0.00 |
| claude-haiku-4-5 | 0.08 | 0.00 |
| llama4-maverick | 0.08 | 0.17 |
| nemotron-super | 0.07 | 0.00 |
| nova-premier | 0.08 | 0.07 |
| Model | leak_rate ↓ | over_redaction ↓ |
|---|---|---|
| claude-opus-4-8 | 0.00 | 0.00 |
| claude-sonnet-4-6 | 0.00 | 0.00 |
| gpt-5.5 | 0.00 | 0.00 |
| nova-2-lite | 0.00 | 0.00 |
| deepseek-r1 | 0.21 | 0.00 |
| claude-haiku-4-5 | 0.00 | 0.00 |
| llama4-maverick | 0.03 | 0.00 |
| nemotron-super | 0.01 | 0.00 |
| nova-premier | 0.00 | 0.00 |
0.4.0; suite 2026.1 (scores comparable only within a suite version); max attainable 1,600

Reproduce:
lodlina leaderboard --models claude-opus-4-8 claude-sonnet-4-6 claude-haiku-4-5 gpt-5.5 nova-premier llama4-maverick deepseek-r1 nemotron-super nova-2-lite --graders bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0 openai-api/bedrock-mantle/openai.gpt-5.4 bedrock/us.amazon.nova-pro-v1:0 --html