🔥 ULTIMATE BENCHMARK

Agent Sandbox Runtime - groq / llama-3.3-70b-versatile

100%
🔥 GOD TIER

12/12 tests passed in 11.1s

📈 Success by Category

⚡ Response Times

📋 All Results

Task                                                Status  Time    Lines  Confidence
Calculate factorial of 10                           PASS    787ms   14     99%
Check if 'racecar' is a palindrome                  PASS    421ms   7      99%
Print fibonacci sequence up to 10 terms             PASS    922ms   10     95%
Find all prime numbers between 1 and 50             PASS    1298ms  18     95%
Sort a list using quicksort algorithm               PASS    725ms   17     95%
Implement binary search on sorted list [1,3,5,7,9,  PASS    1435ms  32     95%
Solve the Tower of Hanoi for 4 disks                PASS    1368ms  17     95%
Find longest common subsequence of 'ABCDGH' and 'A  PASS    950ms   33     95%
Implement a min-heap and insert [5,3,8,1,2]         PASS    848ms   27     95%
Generate all permutations of [1,2,3,4] without ite  PASS    498ms   21     95%
Solve 8-queens problem and print one solution       PASS    993ms   43     95%
Implement Dijkstra's shortest path for a 5-node gr  PASS    800ms   40     95%

Generated 2025-12-18 23:01:27 | Agent Sandbox Runtime v0.1.0