identity layers + randn queries
mean abs randn paper: 0.21875
production_forward2 fwd+bwd:  243.421 ms
production_forward2 fwd-only: 24.792 ms
production_forward2 bwd-only: 219.192 ms
production_forward2 peak allocated: fwd=2.692 GiB, fwd+bwd=6.071 GiB
production_forward2 peak reserved:  fwd=2.973 GiB, fwd+bwd=8.723 GiB

Autotune Choices Stats:
{"num_choices": 15, "num_triton_choices": 14, "best_kernel": "mm", "best_time": 0.319487988948822, "best_triton_pos": 1, "best_triton_time": 6.01907205581665, "best_triton_kernel": "triton_mm_214", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"}
AUTOTUNE mm(512x131072, 131072x8)
strides: [1, 512], [s5, 1]
dtypes: torch.float32, torch.float32
  mm 0.3195 ms 100.0% 
  triton_mm_214 6.0191 ms 5.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2
  triton_mm_213 7.0451 ms 4.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_mm_221 7.7476 ms 4.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_218 8.1531 ms 3.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_225 8.4183 ms 3.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_215 8.6446 ms 3.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_219 8.6446 ms 3.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_220 8.6456 ms 3.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_222 9.0173 ms 3.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 1.3155 seconds and 1.2963 seconds precompiling for 15 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_231", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.19046400487422943, "best_triton_pos": 0}
AUTOTUNE mm(131072x8, 8x512)
strides: [s5, 1], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_231 0.1905 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  mm 0.1946 ms 97.9% 
  triton_mm_226 0.1997 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
  triton_mm_240 0.2017 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_234 0.2079 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_236 0.2161 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_232 0.2304 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_233 0.2304 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_227 0.2355 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_237 0.2406 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.6854 seconds and 0.0043 seconds precompiling for 18 choices
Autotune Choices Stats:
{"num_choices": 15, "num_triton_choices": 14, "best_kernel": "mm", "best_time": 0.6123520135879517, "best_triton_pos": 1, "best_triton_time": 10.909695625305176, "best_triton_kernel": "triton_mm_259", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"}
AUTOTUNE mm(512x262144, 262144x8)
strides: [1, 512], [s5, 1]
dtypes: torch.float32, torch.float32
  mm 0.6124 ms 100.0% 
  triton_mm_259 10.9097 ms 5.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2
  triton_mm_258 11.3715 ms 5.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_mm_266 15.7891 ms 3.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_270 16.1710 ms 3.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_263 16.9697 ms 3.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_265 17.1264 ms 3.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_264 17.1325 ms 3.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_260 17.1346 ms 3.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_267 17.3056 ms 3.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 2.1563 seconds and 1.1843 seconds precompiling for 15 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_276", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.374783992767334, "best_triton_pos": 0}
AUTOTUNE mm(262144x8, 8x512)
strides: [s5, 1], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_276 0.3748 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  mm 0.3799 ms 98.7% 
  triton_mm_271 0.3922 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
  triton_mm_285 0.4106 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_279 0.4188 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_278 0.4618 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_277 0.4700 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_272 0.4874 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_281 0.4884 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_282 0.5038 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 1.1781 seconds and 0.0004 seconds precompiling for 18 choices
Autotune Choices Stats:
{"num_choices": 15, "num_triton_choices": 14, "best_kernel": "mm", "best_time": 1.201151967048645, "best_triton_pos": 1, "best_triton_time": 21.816320419311523, "best_triton_kernel": "triton_mm_304", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"}
AUTOTUNE mm(512x524288, 524288x8)
strides: [1, 512], [s5, 1]
dtypes: torch.float32, torch.float32
  mm 1.2012 ms 100.0% 
  triton_mm_304 21.8163 ms 5.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2
  triton_mm_303 22.7041 ms 5.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_mm_311 31.5740 ms 3.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_315 32.3328 ms 3.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_308 33.9343 ms 3.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_310 34.2508 ms 3.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_305 34.2620 ms 3.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_309 34.2630 ms 3.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_312 34.5948 ms 3.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 3.9438 seconds and 0.0005 seconds precompiling for 15 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_321", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.7413759827613831, "best_triton_pos": 0}
AUTOTUNE mm(524288x8, 8x512)
strides: [s5, 1], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_321 0.7414 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  mm 0.7465 ms 99.3% 
  triton_mm_316 0.7772 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
  triton_mm_330 0.8090 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_324 0.8294 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_323 0.9165 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_322 0.9329 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_317 0.9677 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_326 0.9687 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_327 0.9994 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.6548 seconds and 0.0003 seconds precompiling for 18 choices

torch_compile_phases_forward fwd+bwd:  260.734 ms
torch_compile_phases_forward fwd-only: 43.712 ms
torch_compile_phases_forward bwd-only: 213.838 ms
torch_compile_phases_forward peak allocated: fwd=5.342 GiB, fwd+bwd=6.469 GiB
torch_compile_phases_forward peak reserved:  fwd=5.848 GiB, fwd+bwd=9.848 GiB
production_forward fwd+bwd:  124.509 ms
production_forward fwd-only: 22.782 ms
production_forward bwd-only: 102.019 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=6.071 GiB
production_forward peak reserved:  fwd=2.223 GiB, fwd+bwd=6.098 GiB
paper_forward fwd+bwd:  535.273 ms
paper_forward fwd-only: 97.326 ms
paper_forward bwd-only: 439.261 ms
paper_forward peak allocated: fwd=6.194 GiB, fwd+bwd=10.068 GiB
paper_forward peak reserved:  fwd=6.223 GiB, fwd+bwd=10.223 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.001622954849153757, max_abs=0.03936767578125
production_forward grad[0] vs paper_forward: mean_abs=0.008612722158432007, max_abs=0.3125, mean_rel=0.07476150989532471, max_rel=116.77863311767578, norm_rel=0.020482482388615608, ref_abs_avg=0.4543583393096924, test_abs_avg=0.45436903834342957
production_forward grad[1] vs paper_forward: mean_abs=7.327532768249512, max_abs=64.0, mean_rel=0.1615167111158371, max_rel=263.7208557128906, norm_rel=0.020811807364225388, ref_abs_avg=312.56329345703125, test_abs_avg=312.6188049316406
production_forward grad[2] vs paper_forward: mean_abs=1.2993431091308594, max_abs=4.796875, mean_rel=0.1010420173406601, max_rel=7.954300880432129, norm_rel=0.024547293782234192, ref_abs_avg=52.77109146118164, test_abs_avg=52.74391174316406
production_forward grad[3] vs paper_forward: mean_abs=1.6429557800292969, max_abs=12.0, mean_rel=0.18185992538928986, max_rel=2999.7626953125, norm_rel=0.02520584501326084, ref_abs_avg=65.59686279296875, test_abs_avg=65.59568786621094
production_forward grad[4] vs paper_forward: mean_abs=1.5103346109390259, max_abs=9.0, mean_rel=0.4252282977104187, max_rel=3874.999755859375, norm_rel=0.02343526855111122, ref_abs_avg=64.83712768554688, test_abs_avg=64.84431457519531
production_forward grad[5] vs paper_forward: mean_abs=1.1779274940490723, max_abs=4.0, mean_rel=0.09858773648738861, max_rel=8.041523933410645, norm_rel=0.025126254186034203, ref_abs_avg=45.942832946777344, test_abs_avg=46.001346588134766
production_forward grad[6] vs paper_forward: mean_abs=1.4390149116516113, max_abs=9.5, mean_rel=0.1729777455329895, max_rel=2032.04931640625, norm_rel=0.024914264678955078, ref_abs_avg=58.08663558959961, test_abs_avg=58.08409118652344
production_forward grad[7] vs paper_forward: mean_abs=1.3347513675689697, max_abs=8.5, mean_rel=0.3503764867782593, max_rel=3624.999755859375, norm_rel=0.02330653741955757, ref_abs_avg=57.473087310791016, test_abs_avg=57.47651672363281
production_forward grad[8] vs paper_forward: mean_abs=0.9916296005249023, max_abs=4.0, mean_rel=0.11793404072523117, max_rel=14.573022842407227, norm_rel=0.022442927584052086, ref_abs_avg=44.297142028808594, test_abs_avg=44.216468811035156
production_forward grad[9] vs paper_forward: mean_abs=1.2877596616744995, max_abs=9.0, mean_rel=0.17300061881542206, max_rel=1026.6156005859375, norm_rel=0.02464010939002037, ref_abs_avg=52.56721878051758, test_abs_avg=52.56978988647461
production_forward grad[10] vs paper_forward: mean_abs=1.1903250217437744, max_abs=7.0, mean_rel=0.4368489384651184, max_rel=5874.99951171875, norm_rel=0.023080570623278618, ref_abs_avg=51.85448455810547, test_abs_avg=51.85536193847656
production_forward grad[11] vs paper_forward: mean_abs=0.8824477195739746, max_abs=3.5, mean_rel=0.09863635897636414, max_rel=18.107067108154297, norm_rel=0.022324640303850174, ref_abs_avg=39.76739501953125, test_abs_avg=39.80281066894531
production_forward grad[12] vs paper_forward: mean_abs=1.1915969848632812, max_abs=8.0, mean_rel=0.17529258131980896, max_rel=3151.536865234375, norm_rel=0.024475032463669777, ref_abs_avg=48.99323272705078, test_abs_avg=48.99232482910156
production_forward grad[13] vs paper_forward: mean_abs=1.1012144088745117, max_abs=6.5, mean_rel=0.33448484539985657, max_rel=5374.99951171875, norm_rel=0.022906823083758354, ref_abs_avg=48.356754302978516, test_abs_avg=48.360755920410156
production_forward grad[14] vs paper_forward: mean_abs=0.8857226371765137, max_abs=3.25, mean_rel=0.11290035396814346, max_rel=21.989652633666992, norm_rel=0.02193720079958439, ref_abs_avg=39.879486083984375, test_abs_avg=39.85713195800781
production_forward grad[15] vs paper_forward: mean_abs=1.11199152469635, max_abs=7.0, mean_rel=0.1616661250591278, max_rel=1428.79296875, norm_rel=0.02421475760638714, ref_abs_avg=46.20357131958008, test_abs_avg=46.205780029296875
production_forward grad[16] vs paper_forward: mean_abs=1.031006932258606, max_abs=6.25, mean_rel=0.36769887804985046, max_rel=3499.999755859375, norm_rel=0.022814298048615456, ref_abs_avg=45.401756286621094, test_abs_avg=45.39902114868164
production_forward grad[17] vs paper_forward: mean_abs=0.7435297966003418, max_abs=3.265625, mean_rel=0.07346451282501221, max_rel=3.049910545349121, norm_rel=0.021056415513157845, ref_abs_avg=35.46287155151367, test_abs_avg=35.50519561767578
production_forward grad[18] vs paper_forward: mean_abs=1.0449295043945312, max_abs=8.0, mean_rel=0.17345628142356873, max_rel=1968.7772216796875, norm_rel=0.024173729121685028, ref_abs_avg=43.54027557373047, test_abs_avg=43.54477310180664
production_forward grad[19] vs paper_forward: mean_abs=0.9626057147979736, max_abs=6.0, mean_rel=0.3150082528591156, max_rel=2749.999755859375, norm_rel=0.022422607988119125, ref_abs_avg=43.077884674072266, test_abs_avg=43.078880310058594
production_forward grad[20] vs paper_forward: mean_abs=0.7853202819824219, max_abs=3.25, mean_rel=0.09472481161355972, max_rel=12.562556266784668, norm_rel=0.023104963824152946, ref_abs_avg=34.119895935058594, test_abs_avg=34.07027816772461
production_forward grad[21] vs paper_forward: mean_abs=0.9914911985397339, max_abs=8.0, mean_rel=0.15340624749660492, max_rel=1282.1387939453125, norm_rel=0.02398747391998768, ref_abs_avg=41.540916442871094, test_abs_avg=41.54512023925781
production_forward grad[22] vs paper_forward: mean_abs=0.9105414152145386, max_abs=5.0, mean_rel=0.3217881917953491, max_rel=2937.499755859375, norm_rel=0.022180553525686264, ref_abs_avg=41.20490646362305, test_abs_avg=41.200775146484375
production_forward grad[23] vs paper_forward: mean_abs=0.7358689308166504, max_abs=3.0, mean_rel=0.0918242558836937, max_rel=5.6319355964660645, norm_rel=0.02260497771203518, ref_abs_avg=32.770477294921875, test_abs_avg=32.77979278564453
production_forward grad[24] vs paper_forward: mean_abs=0.9403342604637146, max_abs=6.5, mean_rel=0.15493495762348175, max_rel=1334.2552490234375, norm_rel=0.023848624899983406, ref_abs_avg=39.662315368652344, test_abs_avg=39.66231918334961
production_forward grad[25] vs paper_forward: mean_abs=0.8632340431213379, max_abs=5.0, mean_rel=0.30031618475914, max_rel=2312.5, norm_rel=0.022041641175746918, ref_abs_avg=39.334991455078125, test_abs_avg=39.32968521118164
production_forward grad[26] vs paper_forward: mean_abs=0.8736929893493652, max_abs=3.25, mean_rel=0.08513287454843521, max_rel=2.9613163471221924, norm_rel=0.02703416720032692, ref_abs_avg=32.66877365112305, test_abs_avg=32.764244079589844
production_forward grad[27] vs paper_forward: mean_abs=1.078906536102295, max_abs=8.375, mean_rel=0.1679408848285675, max_rel=1093.8560791015625, norm_rel=0.025838658213615417, ref_abs_avg=41.94619369506836, test_abs_avg=41.945343017578125
production_forward grad[28] vs paper_forward: mean_abs=1.0103070735931396, max_abs=6.5, mean_rel=0.31830698251724243, max_rel=3874.999755859375, norm_rel=0.024553440511226654, ref_abs_avg=41.31233215332031, test_abs_avg=41.31605911254883
production_forward grad[29] vs paper_forward: mean_abs=0.7980701923370361, max_abs=3.0, mean_rel=0.09207286685705185, max_rel=7.863340377807617, norm_rel=0.025762420147657394, ref_abs_avg=31.037708282470703, test_abs_avg=31.111053466796875
production_forward grad[30] vs paper_forward: mean_abs=1.0047919750213623, max_abs=7.0, mean_rel=0.18415117263793945, max_rel=1692.4656982421875, norm_rel=0.025976572185754776, ref_abs_avg=38.84492111206055, test_abs_avg=38.8476676940918
production_forward grad[31] vs paper_forward: mean_abs=0.9382805824279785, max_abs=7.0, mean_rel=0.32302331924438477, max_rel=2781.249755859375, norm_rel=0.02458244562149048, ref_abs_avg=38.34797668457031, test_abs_avg=38.35060119628906
production_forward grad[32] vs paper_forward: mean_abs=0.7173625230789185, max_abs=3.5, mean_rel=0.13856279850006104, max_rel=15.015739440917969, norm_rel=0.02390938624739647, ref_abs_avg=30.002788543701172, test_abs_avg=30.03772735595703
production_forward grad[33] vs paper_forward: mean_abs=0.9352625608444214, max_abs=7.0, mean_rel=0.16314706206321716, max_rel=1527.20458984375, norm_rel=0.0259272251278162, ref_abs_avg=36.2186279296875, test_abs_avg=36.22191619873047
production_forward grad[34] vs paper_forward: mean_abs=0.8756327033042908, max_abs=6.5625, mean_rel=0.28306132555007935, max_rel=2015.6248779296875, norm_rel=0.02445199526846409, ref_abs_avg=35.930599212646484, test_abs_avg=35.930728912353516
production_forward grad[35] vs paper_forward: mean_abs=0.6939075589179993, max_abs=2.5, mean_rel=0.3283408284187317, max_rel=38.47515869140625, norm_rel=0.02474275976419449, ref_abs_avg=27.646896362304688, test_abs_avg=27.62041473388672
production_forward grad[36] vs paper_forward: mean_abs=0.879143238067627, max_abs=6.75, mean_rel=0.15611135959625244, max_rel=1057.7501220703125, norm_rel=0.025701619684696198, ref_abs_avg=34.35588836669922, test_abs_avg=34.35706329345703
production_forward grad[37] vs paper_forward: mean_abs=0.8201910257339478, max_abs=5.75, mean_rel=0.27817773818969727, max_rel=3249.999755859375, norm_rel=0.024149129167199135, ref_abs_avg=34.01878356933594, test_abs_avg=34.02170944213867
production_forward grad[38] vs paper_forward: mean_abs=0.6589587330818176, max_abs=2.40625, mean_rel=0.16888312995433807, max_rel=45.589881896972656, norm_rel=0.02320408634841442, ref_abs_avg=28.422138214111328, test_abs_avg=28.44690704345703
production_forward grad[39] vs paper_forward: mean_abs=0.8296570181846619, max_abs=5.75, mean_rel=0.17724591493606567, max_rel=1419.3619384765625, norm_rel=0.025500303134322166, ref_abs_avg=32.64189147949219, test_abs_avg=32.64117431640625
production_forward grad[40] vs paper_forward: mean_abs=0.7748596668243408, max_abs=5.75, mean_rel=0.24521860480308533, max_rel=3749.999755859375, norm_rel=0.024173540994524956, ref_abs_avg=32.124820709228516, test_abs_avg=32.13078308105469
production_forward grad[41] vs paper_forward: mean_abs=0.6173443794250488, max_abs=2.375, mean_rel=0.21737922728061676, max_rel=42.23107147216797, norm_rel=0.023794829845428467, ref_abs_avg=25.578018188476562, test_abs_avg=25.56656265258789
production_forward grad[42] vs paper_forward: mean_abs=0.7882556915283203, max_abs=5.5, mean_rel=0.16566623747348785, max_rel=1847.21728515625, norm_rel=0.025322934612631798, ref_abs_avg=31.2303466796875, test_abs_avg=31.229412078857422
production_forward grad[43] vs paper_forward: mean_abs=0.7316010594367981, max_abs=4.5625, mean_rel=0.24624919891357422, max_rel=1749.9998779296875, norm_rel=0.023832296952605247, ref_abs_avg=30.770891189575195, test_abs_avg=30.7714786529541
production_forward grad[44] vs paper_forward: mean_abs=0.5874471664428711, max_abs=2.125, mean_rel=0.10287551581859589, max_rel=6.180919170379639, norm_rel=0.022050272673368454, ref_abs_avg=25.50808334350586, test_abs_avg=25.489011764526367
production_forward grad[45] vs paper_forward: mean_abs=0.7454500794410706, max_abs=4.75, mean_rel=0.16717201471328735, max_rel=1122.7301025390625, norm_rel=0.025210924446582794, ref_abs_avg=29.70465850830078, test_abs_avg=29.705219268798828
production_forward grad[46] vs paper_forward: mean_abs=0.6971765756607056, max_abs=4.5, mean_rel=0.26286378502845764, max_rel=1937.4998779296875, norm_rel=0.023534946143627167, ref_abs_avg=29.688003540039062, test_abs_avg=29.69601821899414
production_forward grad[47] vs paper_forward: mean_abs=0.5402498245239258, max_abs=2.125, mean_rel=0.11381737142801285, max_rel=7.142640590667725, norm_rel=0.022050490602850914, ref_abs_avg=25.094926834106445, test_abs_avg=25.09738540649414
production_forward grad[48] vs paper_forward: mean_abs=0.7165796160697937, max_abs=4.75, mean_rel=0.15337447822093964, max_rel=996.9852905273438, norm_rel=0.024681614711880684, ref_abs_avg=29.102567672729492, test_abs_avg=29.10317611694336
production_forward grad[49] vs paper_forward: mean_abs=0.6659319400787354, max_abs=4.140625, mean_rel=0.2691343426704407, max_rel=1749.9998779296875, norm_rel=0.023428084328770638, ref_abs_avg=28.50307846069336, test_abs_avg=28.501449584960938
production_forward grad[50] vs paper_forward: mean_abs=0.6131646037101746, max_abs=2.8125, mean_rel=0.13392826914787292, max_rel=21.493894577026367, norm_rel=0.02472953498363495, ref_abs_avg=25.26034927368164, test_abs_avg=25.311290740966797
production_forward grad[51] vs paper_forward: mean_abs=0.8293404579162598, max_abs=6.0, mean_rel=0.17914006114006042, max_rel=2122.722412109375, norm_rel=0.026912108063697815, ref_abs_avg=30.92561912536621, test_abs_avg=30.923860549926758
production_forward grad[52] vs paper_forward: mean_abs=0.7714024782180786, max_abs=5.0, mean_rel=0.30063068866729736, max_rel=2312.5, norm_rel=0.02531629242002964, ref_abs_avg=30.551387786865234, test_abs_avg=30.549705505371094
production_forward grad[53] vs paper_forward: mean_abs=0.6086562871932983, max_abs=2.5625, mean_rel=0.3127855360507965, max_rel=113.10273742675781, norm_rel=0.025990687310695648, ref_abs_avg=23.69620704650879, test_abs_avg=23.696243286132812
production_forward grad[54] vs paper_forward: mean_abs=0.7455875873565674, max_abs=5.25, mean_rel=0.1754496991634369, max_rel=1257.5255126953125, norm_rel=0.026417167857289314, ref_abs_avg=28.30862808227539, test_abs_avg=28.308183670043945
production_forward grad[55] vs paper_forward: mean_abs=0.6986494660377502, max_abs=4.75, mean_rel=0.29010313749313354, max_rel=2187.5, norm_rel=0.024736573919653893, ref_abs_avg=28.314376831054688, test_abs_avg=28.31806755065918
production_forward grad[56] vs paper_forward: mean_abs=0.5957450866699219, max_abs=2.0625, mean_rel=0.07835832983255386, max_rel=2.2422173023223877, norm_rel=0.02725970186293125, ref_abs_avg=21.69518280029297, test_abs_avg=21.72208023071289
production_forward grad[57] vs paper_forward: mean_abs=0.6903611421585083, max_abs=5.25, mean_rel=0.17018458247184753, max_rel=1374.150146484375, norm_rel=0.025742031633853912, ref_abs_avg=26.913257598876953, test_abs_avg=26.912307739257812
production_forward grad[58] vs paper_forward: mean_abs=0.6425697803497314, max_abs=4.0625, mean_rel=0.26393723487854004, max_rel=2406.25, norm_rel=0.024228308349847794, ref_abs_avg=26.50649642944336, test_abs_avg=26.51204490661621
production_forward grad[59] vs paper_forward: mean_abs=0.5314702987670898, max_abs=2.1875, mean_rel=0.08567290008068085, max_rel=2.1669485569000244, norm_rel=0.02524501644074917, ref_abs_avg=20.57448387145996, test_abs_avg=20.58504867553711
production_forward grad[60] vs paper_forward: mean_abs=0.6390479207038879, max_abs=5.0, mean_rel=0.1666254997253418, max_rel=1498.0164794921875, norm_rel=0.025233222171664238, ref_abs_avg=25.374771118164062, test_abs_avg=25.372737884521484
production_forward grad[61] vs paper_forward: mean_abs=0.5964449644088745, max_abs=4.0, mean_rel=0.2544530928134918, max_rel=1687.4998779296875, norm_rel=0.023540154099464417, ref_abs_avg=25.331562042236328, test_abs_avg=25.33446502685547
production_forward grad[62] vs paper_forward: mean_abs=0.45984363555908203, max_abs=1.75, mean_rel=0.08669789135456085, max_rel=8.390641212463379, norm_rel=0.022602228447794914, ref_abs_avg=20.36969757080078, test_abs_avg=20.359512329101562
production_forward grad[63] vs paper_forward: mean_abs=0.6034504771232605, max_abs=5.0, mean_rel=0.1619981974363327, max_rel=1016.1087646484375, norm_rel=0.024725282564759254, ref_abs_avg=24.43368911743164, test_abs_avg=24.43579864501953
production_forward grad[64] vs paper_forward: mean_abs=0.561205267906189, max_abs=4.0, mean_rel=0.23305410146713257, max_rel=2375.0, norm_rel=0.022963346913456917, ref_abs_avg=24.402387619018555, test_abs_avg=24.398845672607422
production_forward grad[65] vs paper_forward: mean_abs=0.4418187141418457, max_abs=1.625, mean_rel=0.1199997067451477, max_rel=15.352655410766602, norm_rel=0.02406703121960163, ref_abs_avg=18.055755615234375, test_abs_avg=18.030765533447266
production_forward grad[66] vs paper_forward: mean_abs=0.5709408521652222, max_abs=5.0, mean_rel=0.15291815996170044, max_rel=820.8823852539062, norm_rel=0.024258553981781006, ref_abs_avg=23.56433868408203, test_abs_avg=23.56158447265625
production_forward grad[67] vs paper_forward: mean_abs=0.5286343097686768, max_abs=3.53125, mean_rel=0.24346858263015747, max_rel=1906.2498779296875, norm_rel=0.022632552310824394, ref_abs_avg=23.367782592773438, test_abs_avg=23.36508560180664
production_forward grad[68] vs paper_forward: mean_abs=0.42055606842041016, max_abs=1.625, mean_rel=0.0933968722820282, max_rel=10.139562606811523, norm_rel=0.02353857457637787, ref_abs_avg=17.885040283203125, test_abs_avg=17.929319381713867
production_forward grad[69] vs paper_forward: mean_abs=0.5424137115478516, max_abs=4.0, mean_rel=0.15290923416614532, max_rel=1422.6962890625, norm_rel=0.023966720327734947, ref_abs_avg=22.66676139831543, test_abs_avg=22.667064666748047
production_forward grad[70] vs paper_forward: mean_abs=0.49868083000183105, max_abs=4.5, mean_rel=0.23461100459098816, max_rel=1234.375, norm_rel=0.022581078112125397, ref_abs_avg=22.126178741455078, test_abs_avg=22.120559692382812
production_forward grad[71] vs paper_forward: mean_abs=0.38988351821899414, max_abs=1.5, mean_rel=0.12578904628753662, max_rel=12.444671630859375, norm_rel=0.021279457956552505, ref_abs_avg=17.994375228881836, test_abs_avg=18.00123405456543
production_forward grad[72] vs paper_forward: mean_abs=0.5161941051483154, max_abs=4.0, mean_rel=0.14932523667812347, max_rel=709.3079223632812, norm_rel=0.02323201298713684, ref_abs_avg=22.236278533935547, test_abs_avg=22.237369537353516
production_forward grad[73] vs paper_forward: mean_abs=0.4746284484863281, max_abs=3.5, mean_rel=0.19820399582386017, max_rel=1312.4998779296875, norm_rel=0.02176845818758011, ref_abs_avg=21.809003829956055, test_abs_avg=21.807559967041016
production_forward grad[74] vs paper_forward: mean_abs=0.4377784729003906, max_abs=1.6875, mean_rel=0.06726641207933426, max_rel=4.360990524291992, norm_rel=0.02252696454524994, ref_abs_avg=19.610614776611328, test_abs_avg=19.599525451660156
production_forward grad[75] vs paper_forward: mean_abs=0.5664868950843811, max_abs=4.75, mean_rel=0.1538984477519989, max_rel=998.2214965820312, norm_rel=0.02426525019109249, ref_abs_avg=23.357080459594727, test_abs_avg=23.358367919921875
production_forward grad[76] vs paper_forward: mean_abs=0.5212175846099854, max_abs=3.5, mean_rel=0.21928976476192474, max_rel=1687.4998779296875, norm_rel=0.022634880617260933, ref_abs_avg=23.05202865600586, test_abs_avg=23.05221176147461
production_forward grad[77] vs paper_forward: mean_abs=0.4057788848876953, max_abs=1.5, mean_rel=0.058887094259262085, max_rel=1.767603874206543, norm_rel=0.022553803399205208, ref_abs_avg=18.313594818115234, test_abs_avg=18.322845458984375
production_forward grad[78] vs paper_forward: mean_abs=0.5293735265731812, max_abs=4.0, mean_rel=0.14693310856819153, max_rel=650.0675659179688, norm_rel=0.023764416575431824, ref_abs_avg=22.345035552978516, test_abs_avg=22.346189498901367
production_forward grad[79] vs paper_forward: mean_abs=0.4897052049636841, max_abs=4.0, mean_rel=0.22195680439472198, max_rel=1281.25, norm_rel=0.022114222869277, ref_abs_avg=22.141401290893555, test_abs_avg=22.138851165771484
production_forward grad[80] vs paper_forward: mean_abs=0.41481971740722656, max_abs=1.625, mean_rel=0.08783680200576782, max_rel=6.714012622833252, norm_rel=0.023344310000538826, ref_abs_avg=17.50365447998047, test_abs_avg=17.4765682220459
production_forward grad[81] vs paper_forward: mean_abs=0.49978625774383545, max_abs=4.5, mean_rel=0.1517615169286728, max_rel=1397.4420166015625, norm_rel=0.023177778348326683, ref_abs_avg=21.63033676147461, test_abs_avg=21.630168914794922
production_forward grad[82] vs paper_forward: mean_abs=0.45602431893348694, max_abs=4.625, mean_rel=0.21406076848506927, max_rel=1468.7498779296875, norm_rel=0.02195136621594429, ref_abs_avg=20.81479263305664, test_abs_avg=20.815967559814453
production_forward grad[83] vs paper_forward: mean_abs=0.3689908981323242, max_abs=1.46875, mean_rel=0.07733423262834549, max_rel=3.941765308380127, norm_rel=0.021359218284487724, ref_abs_avg=17.291542053222656, test_abs_avg=17.26528549194336
production_forward grad[84] vs paper_forward: mean_abs=0.46638253331184387, max_abs=4.25, mean_rel=0.1361355483531952, max_rel=657.03125, norm_rel=0.022859251126646996, ref_abs_avg=20.474063873291016, test_abs_avg=20.474653244018555
production_forward grad[85] vs paper_forward: mean_abs=0.425375759601593, max_abs=3.25, mean_rel=0.23023861646652222, max_rel=1882.8123779296875, norm_rel=0.02057340182363987, ref_abs_avg=20.639244079589844, test_abs_avg=20.647653579711914
production_forward grad[86] vs paper_forward: mean_abs=0.3431727886199951, max_abs=1.25, mean_rel=0.1124533861875534, max_rel=16.878929138183594, norm_rel=0.020821820944547653, ref_abs_avg=16.732683181762695, test_abs_avg=16.777610778808594
production_forward grad[87] vs paper_forward: mean_abs=0.44307777285575867, max_abs=7.0, mean_rel=0.14455926418304443, max_rel=1646.81982421875, norm_rel=0.022302227094769478, ref_abs_avg=19.98227310180664, test_abs_avg=19.98349380493164
production_forward grad[88] vs paper_forward: mean_abs=0.4036865234375, max_abs=3.75, mean_rel=0.17689219117164612, max_rel=874.9999389648438, norm_rel=0.020793672651052475, ref_abs_avg=19.44961166381836, test_abs_avg=19.446266174316406
production_forward grad[89] vs paper_forward: mean_abs=0.3243138790130615, max_abs=1.25, mean_rel=0.0761113166809082, max_rel=3.92130446434021, norm_rel=0.0200493186712265, ref_abs_avg=16.20738983154297, test_abs_avg=16.219173431396484
production_forward grad[90] vs paper_forward: mean_abs=0.42058616876602173, max_abs=4.5, mean_rel=0.1355120986700058, max_rel=686.6209106445312, norm_rel=0.022083228453993797, ref_abs_avg=19.22994613647461, test_abs_avg=19.231599807739258
production_forward grad[91] vs paper_forward: mean_abs=0.36742809414863586, max_abs=3.5, mean_rel=0.18739719688892365, max_rel=1156.25, norm_rel=0.0196620374917984, ref_abs_avg=18.82971954345703, test_abs_avg=18.837200164794922
production_forward grad[92] vs paper_forward: mean_abs=0.3098461627960205, max_abs=1.3125, mean_rel=0.11930476874113083, max_rel=14.417338371276855, norm_rel=0.019176745787262917, ref_abs_avg=16.653289794921875, test_abs_avg=16.671180725097656
production_forward grad[93] vs paper_forward: mean_abs=0.38782280683517456, max_abs=4.25, mean_rel=0.12591367959976196, max_rel=603.6089477539062, norm_rel=0.021270455792546272, ref_abs_avg=18.498981475830078, test_abs_avg=18.4989013671875
production_forward grad[94] vs paper_forward: mean_abs=0.35714423656463623, max_abs=3.5, mean_rel=0.1764681488275528, max_rel=1062.5, norm_rel=0.01998935639858246, ref_abs_avg=18.107589721679688, test_abs_avg=18.118717193603516
production_forward grad[95] vs paper_forward: mean_abs=0.29448509216308594, max_abs=1.5, mean_rel=0.056718431413173676, max_rel=4.356293678283691, norm_rel=0.019460733979940414, ref_abs_avg=15.742973327636719, test_abs_avg=15.710733413696289
production_forward grad[96] vs paper_forward: mean_abs=0.3668394386768341, max_abs=4.5, mean_rel=0.12694381177425385, max_rel=622.2876586914062, norm_rel=0.020889008417725563, ref_abs_avg=17.8472900390625, test_abs_avg=17.84670639038086
production_forward grad[97] vs paper_forward: mean_abs=0.3250695466995239, max_abs=3.3125, mean_rel=0.15252266824245453, max_rel=1109.375, norm_rel=0.0183539018034935, ref_abs_avg=17.846160888671875, test_abs_avg=17.85053253173828
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016254665097221732, max_abs=0.03936767578125
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008627177216112614, max_abs=0.33984375, mean_rel=0.07481513917446136, max_rel=143.45785522460938, norm_rel=0.020509354770183563, ref_abs_avg=0.4543583393096924, test_abs_avg=0.45435839891433716
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.341368675231934, max_abs=56.0, mean_rel=0.1594865471124649, max_rel=222.23423767089844, norm_rel=0.020769892260432243, ref_abs_avg=312.56329345703125, test_abs_avg=312.6437072753906
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3162879943847656, max_abs=5.0, mean_rel=0.1063491702079773, max_rel=11.46961784362793, norm_rel=0.025102227926254272, ref_abs_avg=52.77109146118164, test_abs_avg=52.718109130859375
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6426926851272583, max_abs=11.0, mean_rel=0.17547772824764252, max_rel=1896.62109375, norm_rel=0.025199545547366142, ref_abs_avg=65.59686279296875, test_abs_avg=65.59330749511719
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5059058666229248, max_abs=10.0, mean_rel=0.41001060605049133, max_rel=5062.5, norm_rel=0.02336426079273224, ref_abs_avg=64.83712768554688, test_abs_avg=64.84687805175781
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1080050468444824, max_abs=4.625, mean_rel=0.11581698060035706, max_rel=16.938541412353516, norm_rel=0.02427377924323082, ref_abs_avg=45.942832946777344, test_abs_avg=45.93769836425781
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4420092105865479, max_abs=10.0, mean_rel=0.16854047775268555, max_rel=2199.522705078125, norm_rel=0.024970801547169685, ref_abs_avg=58.08663558959961, test_abs_avg=58.08062744140625
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3341827392578125, max_abs=8.5, mean_rel=0.38749778270721436, max_rel=4562.5, norm_rel=0.02330201305449009, ref_abs_avg=57.473087310791016, test_abs_avg=57.474544525146484
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.9599865674972534, max_abs=3.75, mean_rel=0.105204276740551, max_rel=15.493727684020996, norm_rel=0.022092584520578384, ref_abs_avg=44.297142028808594, test_abs_avg=44.22880554199219
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2930742502212524, max_abs=8.0, mean_rel=0.17235544323921204, max_rel=987.63623046875, norm_rel=0.02471659518778324, ref_abs_avg=52.56721878051758, test_abs_avg=52.56804656982422
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.19833505153656, max_abs=7.0, mean_rel=0.4134219288825989, max_rel=4812.5, norm_rel=0.023233601823449135, ref_abs_avg=51.85448455810547, test_abs_avg=51.861083984375
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9473209381103516, max_abs=3.6875, mean_rel=0.09404350817203522, max_rel=7.6102166175842285, norm_rel=0.02412649244070053, ref_abs_avg=39.76739501953125, test_abs_avg=39.792808532714844
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1972332000732422, max_abs=8.0, mean_rel=0.1671801507472992, max_rel=1675.77685546875, norm_rel=0.024573244154453278, ref_abs_avg=48.99323272705078, test_abs_avg=48.994300842285156
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1086028814315796, max_abs=6.34375, mean_rel=0.3451943099498749, max_rel=4250.0, norm_rel=0.023033231496810913, ref_abs_avg=48.356754302978516, test_abs_avg=48.35527801513672
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.9409008026123047, max_abs=3.5, mean_rel=0.10164124518632889, max_rel=10.308874130249023, norm_rel=0.02313069999217987, ref_abs_avg=39.879486083984375, test_abs_avg=39.87897491455078
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1181312799453735, max_abs=8.0, mean_rel=0.16257232427597046, max_rel=1754.4588623046875, norm_rel=0.024360869079828262, ref_abs_avg=46.20357131958008, test_abs_avg=46.20514678955078
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0338764190673828, max_abs=6.25, mean_rel=0.3441392779350281, max_rel=3062.499755859375, norm_rel=0.022869819775223732, ref_abs_avg=45.401756286621094, test_abs_avg=45.40065002441406
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.7589294910430908, max_abs=3.0, mean_rel=0.07595732063055038, max_rel=3.822039842605591, norm_rel=0.0216273982077837, ref_abs_avg=35.46287155151367, test_abs_avg=35.55052185058594
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.050072431564331, max_abs=8.0, mean_rel=0.1752830147743225, max_rel=1589.18115234375, norm_rel=0.024275926873087883, ref_abs_avg=43.54027557373047, test_abs_avg=43.54326629638672
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9685444831848145, max_abs=7.0, mean_rel=0.3013050854206085, max_rel=2609.374755859375, norm_rel=0.02257373370230198, ref_abs_avg=43.077884674072266, test_abs_avg=43.0787467956543
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.8178826570510864, max_abs=3.25, mean_rel=0.10346893966197968, max_rel=11.044794082641602, norm_rel=0.0233786441385746, ref_abs_avg=34.119895935058594, test_abs_avg=34.07442855834961
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9956966638565063, max_abs=7.0, mean_rel=0.15091973543167114, max_rel=849.9949951171875, norm_rel=0.024103863164782524, ref_abs_avg=41.540916442871094, test_abs_avg=41.544166564941406
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9125430583953857, max_abs=6.0, mean_rel=0.28816014528274536, max_rel=2250.0, norm_rel=0.02221718803048134, ref_abs_avg=41.20490646362305, test_abs_avg=41.199493408203125
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7030372619628906, max_abs=2.75, mean_rel=0.08145569264888763, max_rel=4.081298351287842, norm_rel=0.022301632910966873, ref_abs_avg=32.770477294921875, test_abs_avg=32.790557861328125
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9445517659187317, max_abs=8.0, mean_rel=0.15347662568092346, max_rel=1754.4473876953125, norm_rel=0.023945175111293793, ref_abs_avg=39.662315368652344, test_abs_avg=39.66310119628906
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8682165741920471, max_abs=5.0, mean_rel=0.30648595094680786, max_rel=2624.999755859375, norm_rel=0.02217467688024044, ref_abs_avg=39.334991455078125, test_abs_avg=39.3297233581543
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.9023323059082031, max_abs=3.75, mean_rel=0.0894627720117569, max_rel=4.509340763092041, norm_rel=0.027926595881581306, ref_abs_avg=32.66877365112305, test_abs_avg=32.776123046875
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.079871416091919, max_abs=9.0, mean_rel=0.16865167021751404, max_rel=1761.843505859375, norm_rel=0.02583705633878708, ref_abs_avg=41.94619369506836, test_abs_avg=41.947357177734375
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.009177803993225, max_abs=7.0, mean_rel=0.3234425485134125, max_rel=4437.5, norm_rel=0.024522166699171066, ref_abs_avg=41.31233215332031, test_abs_avg=41.321502685546875
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8664360046386719, max_abs=2.75, mean_rel=0.10797019302845001, max_rel=6.25903844833374, norm_rel=0.02715664729475975, ref_abs_avg=31.037708282470703, test_abs_avg=31.142074584960938
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.0076849460601807, max_abs=7.0, mean_rel=0.18198999762535095, max_rel=1860.8565673828125, norm_rel=0.0260575320571661, ref_abs_avg=38.84492111206055, test_abs_avg=38.84809112548828
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9391304850578308, max_abs=7.0, mean_rel=0.30462658405303955, max_rel=2812.499755859375, norm_rel=0.024610789492726326, ref_abs_avg=38.34797668457031, test_abs_avg=38.34963607788086
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7252277135848999, max_abs=3.25, mean_rel=0.26385697722435, max_rel=73.66064453125, norm_rel=0.02426731027662754, ref_abs_avg=30.002788543701172, test_abs_avg=30.05382537841797
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9375123381614685, max_abs=7.0, mean_rel=0.16225463151931763, max_rel=1153.2783203125, norm_rel=0.02598477154970169, ref_abs_avg=36.2186279296875, test_abs_avg=36.22057342529297
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.880990743637085, max_abs=6.0, mean_rel=0.28742414712905884, max_rel=2671.874755859375, norm_rel=0.024623898789286613, ref_abs_avg=35.930599212646484, test_abs_avg=35.931617736816406
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.664084792137146, max_abs=3.0, mean_rel=0.27788829803466797, max_rel=33.99971389770508, norm_rel=0.024006040766835213, ref_abs_avg=27.646896362304688, test_abs_avg=27.615211486816406
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8823970556259155, max_abs=7.0, mean_rel=0.15792682766914368, max_rel=1509.6234130859375, norm_rel=0.025788575410842896, ref_abs_avg=34.35588836669922, test_abs_avg=34.356475830078125
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8242357969284058, max_abs=5.0, mean_rel=0.26809728145599365, max_rel=3374.999755859375, norm_rel=0.024263104423880577, ref_abs_avg=34.01878356933594, test_abs_avg=34.023319244384766
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.679795503616333, max_abs=2.9375, mean_rel=0.2088773101568222, max_rel=63.8048210144043, norm_rel=0.024118922650814056, ref_abs_avg=28.422138214111328, test_abs_avg=28.446561813354492
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8326656818389893, max_abs=6.5, mean_rel=0.17265263199806213, max_rel=1992.3924560546875, norm_rel=0.02559180185198784, ref_abs_avg=32.64189147949219, test_abs_avg=32.64073944091797
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.777188241481781, max_abs=5.0, mean_rel=0.2326270192861557, max_rel=3281.249755859375, norm_rel=0.024240434169769287, ref_abs_avg=32.124820709228516, test_abs_avg=32.13214874267578
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6195989847183228, max_abs=3.0, mean_rel=0.11910790205001831, max_rel=11.17064380645752, norm_rel=0.024311084300279617, ref_abs_avg=25.578018188476562, test_abs_avg=25.544540405273438
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7895028591156006, max_abs=5.5, mean_rel=0.16669924557209015, max_rel=1669.2000732421875, norm_rel=0.025362834334373474, ref_abs_avg=31.2303466796875, test_abs_avg=31.229665756225586
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7313006520271301, max_abs=5.0, mean_rel=0.2529115676879883, max_rel=1874.9998779296875, norm_rel=0.02382032759487629, ref_abs_avg=30.770891189575195, test_abs_avg=30.770999908447266
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5882850885391235, max_abs=2.25, mean_rel=0.11122514307498932, max_rel=6.714174747467041, norm_rel=0.02230633608996868, ref_abs_avg=25.50808334350586, test_abs_avg=25.488792419433594
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7475310564041138, max_abs=5.5, mean_rel=0.17424531280994415, max_rel=1191.5491943359375, norm_rel=0.02527128905057907, ref_abs_avg=29.70465850830078, test_abs_avg=29.705488204956055
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.6978241801261902, max_abs=4.5, mean_rel=0.26164621114730835, max_rel=1843.7498779296875, norm_rel=0.023565245792269707, ref_abs_avg=29.688003540039062, test_abs_avg=29.696014404296875
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5603019595146179, max_abs=2.0, mean_rel=0.10549093782901764, max_rel=6.627584934234619, norm_rel=0.022784225642681122, ref_abs_avg=25.094926834106445, test_abs_avg=25.106246948242188
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7183058261871338, max_abs=5.625, mean_rel=0.15756471455097198, max_rel=900.0115356445312, norm_rel=0.024757783859968185, ref_abs_avg=29.102567672729492, test_abs_avg=29.103883743286133
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.667857825756073, max_abs=4.25, mean_rel=0.27287885546684265, max_rel=1843.7498779296875, norm_rel=0.02350897714495659, ref_abs_avg=28.50307846069336, test_abs_avg=28.49871253967285
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.618057906627655, max_abs=2.5, mean_rel=0.3535439074039459, max_rel=135.09844970703125, norm_rel=0.024682803079485893, ref_abs_avg=25.26034927368164, test_abs_avg=25.297954559326172
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8266832232475281, max_abs=6.0, mean_rel=0.18677332997322083, max_rel=2795.177734375, norm_rel=0.02683732658624649, ref_abs_avg=30.92561912536621, test_abs_avg=30.922531127929688
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7656170725822449, max_abs=4.875, mean_rel=0.28786399960517883, max_rel=2312.5, norm_rel=0.025123994797468185, ref_abs_avg=30.551387786865234, test_abs_avg=30.549701690673828
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.6072041988372803, max_abs=2.3125, mean_rel=0.1863633096218109, max_rel=43.84651565551758, norm_rel=0.02606327086687088, ref_abs_avg=23.69620704650879, test_abs_avg=23.708093643188477
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7455657124519348, max_abs=6.0, mean_rel=0.16936209797859192, max_rel=971.2791748046875, norm_rel=0.026407983154058456, ref_abs_avg=28.30862808227539, test_abs_avg=28.306425094604492
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.7024540305137634, max_abs=5.0, mean_rel=0.28292664885520935, max_rel=2187.5, norm_rel=0.024857578799128532, ref_abs_avg=28.314376831054688, test_abs_avg=28.315156936645508
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5708847045898438, max_abs=2.25, mean_rel=0.08182719349861145, max_rel=4.8732075691223145, norm_rel=0.02655201032757759, ref_abs_avg=21.69518280029297, test_abs_avg=21.71704864501953
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6915296912193298, max_abs=5.0, mean_rel=0.17243990302085876, max_rel=2184.488525390625, norm_rel=0.0257805697619915, ref_abs_avg=26.913257598876953, test_abs_avg=26.913150787353516
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6422226428985596, max_abs=4.0, mean_rel=0.2737218737602234, max_rel=1906.2498779296875, norm_rel=0.024223603308200836, ref_abs_avg=26.50649642944336, test_abs_avg=26.513591766357422
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5457191467285156, max_abs=2.125, mean_rel=0.09120003134012222, max_rel=2.7507050037384033, norm_rel=0.025753116235136986, ref_abs_avg=20.57448387145996, test_abs_avg=20.586950302124023
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6387416124343872, max_abs=6.0, mean_rel=0.1678033173084259, max_rel=1142.4459228515625, norm_rel=0.02524806559085846, ref_abs_avg=25.374771118164062, test_abs_avg=25.372251510620117
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5981165766716003, max_abs=4.25, mean_rel=0.2477068454027176, max_rel=1703.1248779296875, norm_rel=0.02360096015036106, ref_abs_avg=25.331562042236328, test_abs_avg=25.331798553466797
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4695749282836914, max_abs=1.65625, mean_rel=0.10013407468795776, max_rel=13.062490463256836, norm_rel=0.02274969220161438, ref_abs_avg=20.36969757080078, test_abs_avg=20.355939865112305
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6039891242980957, max_abs=6.5, mean_rel=0.1674565076828003, max_rel=1364.4859619140625, norm_rel=0.02475721575319767, ref_abs_avg=24.43368911743164, test_abs_avg=24.43444061279297
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.560133159160614, max_abs=4.0, mean_rel=0.2332913875579834, max_rel=2375.0, norm_rel=0.022923607379198074, ref_abs_avg=24.402387619018555, test_abs_avg=24.396137237548828
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.44340837001800537, max_abs=1.62890625, mean_rel=0.12954437732696533, max_rel=14.247154235839844, norm_rel=0.024281533434987068, ref_abs_avg=18.055755615234375, test_abs_avg=18.021324157714844
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5722213983535767, max_abs=4.375, mean_rel=0.15536372363567352, max_rel=627.1193237304688, norm_rel=0.024293819442391396, ref_abs_avg=23.56433868408203, test_abs_avg=23.562702178955078
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5289388298988342, max_abs=4.0, mean_rel=0.22946351766586304, max_rel=1718.7498779296875, norm_rel=0.022636719048023224, ref_abs_avg=23.367782592773438, test_abs_avg=23.36530113220215
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.40995049476623535, max_abs=1.5, mean_rel=0.0766352117061615, max_rel=4.488271713256836, norm_rel=0.022772060707211494, ref_abs_avg=17.885040283203125, test_abs_avg=17.921100616455078
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5428876876831055, max_abs=4.25, mean_rel=0.1528533399105072, max_rel=937.0219116210938, norm_rel=0.023970846086740494, ref_abs_avg=22.66676139831543, test_abs_avg=22.66686248779297
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5008252263069153, max_abs=3.5, mean_rel=0.240648552775383, max_rel=1437.4998779296875, norm_rel=0.022674892097711563, ref_abs_avg=22.126178741455078, test_abs_avg=22.11807632446289
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.4080315828323364, max_abs=1.75, mean_rel=0.09562461823225021, max_rel=5.5900068283081055, norm_rel=0.02230004221200943, ref_abs_avg=17.994375228881836, test_abs_avg=17.987178802490234
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.517330527305603, max_abs=4.0, mean_rel=0.15051105618476868, max_rel=865.5225830078125, norm_rel=0.02326561138033867, ref_abs_avg=22.236278533935547, test_abs_avg=22.237533569335938
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.47779005765914917, max_abs=3.5, mean_rel=0.18924576044082642, max_rel=1062.5, norm_rel=0.021900832653045654, ref_abs_avg=21.809003829956055, test_abs_avg=21.809232711791992
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4425315856933594, max_abs=1.5625, mean_rel=0.06586867570877075, max_rel=3.8412868976593018, norm_rel=0.02273814007639885, ref_abs_avg=19.610614776611328, test_abs_avg=19.615089416503906
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5648384094238281, max_abs=6.0, mean_rel=0.15513953566551208, max_rel=893.142578125, norm_rel=0.024186914786696434, ref_abs_avg=23.357080459594727, test_abs_avg=23.357982635498047
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5203309059143066, max_abs=3.8125, mean_rel=0.21497328579425812, max_rel=1499.9998779296875, norm_rel=0.022609233856201172, ref_abs_avg=23.05202865600586, test_abs_avg=23.049877166748047
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.41733741760253906, max_abs=1.5, mean_rel=0.06540289521217346, max_rel=2.6591358184814453, norm_rel=0.023095928132534027, ref_abs_avg=18.313594818115234, test_abs_avg=18.314661026000977
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5298852324485779, max_abs=4.5, mean_rel=0.14661866426467896, max_rel=619.93212890625, norm_rel=0.023790279403328896, ref_abs_avg=22.345035552978516, test_abs_avg=22.34557342529297
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.4901171922683716, max_abs=4.0, mean_rel=0.2199845314025879, max_rel=1953.1248779296875, norm_rel=0.022150451317429543, ref_abs_avg=22.141401290893555, test_abs_avg=22.136655807495117
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3890652656555176, max_abs=1.5, mean_rel=0.07027832418680191, max_rel=3.6570208072662354, norm_rel=0.02208368107676506, ref_abs_avg=17.50365447998047, test_abs_avg=17.46218490600586
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.4998764097690582, max_abs=5.0, mean_rel=0.152219757437706, max_rel=1269.2662353515625, norm_rel=0.02318592183291912, ref_abs_avg=21.63033676147461, test_abs_avg=21.629417419433594
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.45369499921798706, max_abs=3.75, mean_rel=0.197933167219162, max_rel=1296.8748779296875, norm_rel=0.02181175723671913, ref_abs_avg=20.81479263305664, test_abs_avg=20.812786102294922
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.37446367740631104, max_abs=1.4375, mean_rel=0.0867757648229599, max_rel=4.934399604797363, norm_rel=0.021781083196401596, ref_abs_avg=17.291542053222656, test_abs_avg=17.27645492553711
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4670872688293457, max_abs=4.25, mean_rel=0.137263685464859, max_rel=723.8335571289062, norm_rel=0.022905593737959862, ref_abs_avg=20.474063873291016, test_abs_avg=20.474037170410156
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4259909987449646, max_abs=3.625, mean_rel=0.22166728973388672, max_rel=1250.0, norm_rel=0.020630165934562683, ref_abs_avg=20.639244079589844, test_abs_avg=20.651283264160156
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.34117579460144043, max_abs=1.375, mean_rel=0.10321897268295288, max_rel=12.118547439575195, norm_rel=0.020673835650086403, ref_abs_avg=16.732683181762695, test_abs_avg=16.764068603515625
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4443570077419281, max_abs=5.0, mean_rel=0.1446053385734558, max_rel=1541.6875, norm_rel=0.02236163429915905, ref_abs_avg=19.98227310180664, test_abs_avg=19.983030319213867
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4036344885826111, max_abs=3.75, mean_rel=0.17946982383728027, max_rel=1156.25, norm_rel=0.02079077810049057, ref_abs_avg=19.44961166381836, test_abs_avg=19.44367027282715
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3366560935974121, max_abs=1.1875, mean_rel=0.06839767098426819, max_rel=2.019014596939087, norm_rel=0.02039405144751072, ref_abs_avg=16.20738983154297, test_abs_avg=16.21099281311035
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.4207456707954407, max_abs=4.5, mean_rel=0.13384146988391876, max_rel=607.6271362304688, norm_rel=0.02208344079554081, ref_abs_avg=19.22994613647461, test_abs_avg=19.23104476928711
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.36750757694244385, max_abs=3.875, mean_rel=0.18169207870960236, max_rel=1093.75, norm_rel=0.019674237817525864, ref_abs_avg=18.82971954345703, test_abs_avg=18.837717056274414
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.303499698638916, max_abs=1.25, mean_rel=0.08820011466741562, max_rel=15.602392196655273, norm_rel=0.018500249832868576, ref_abs_avg=16.653289794921875, test_abs_avg=16.667316436767578
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3891524076461792, max_abs=4.5, mean_rel=0.1264590620994568, max_rel=706.2421264648438, norm_rel=0.02133117988705635, ref_abs_avg=18.498981475830078, test_abs_avg=18.497905731201172
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3557860255241394, max_abs=3.875, mean_rel=0.17449596524238586, max_rel=1062.5, norm_rel=0.019880568608641624, ref_abs_avg=18.107589721679688, test_abs_avg=18.122249603271484
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.31254422664642334, max_abs=1.5, mean_rel=0.06339585036039352, max_rel=3.8694138526916504, norm_rel=0.020438114181160927, ref_abs_avg=15.742973327636719, test_abs_avg=15.70394515991211
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3675740361213684, max_abs=4.75, mean_rel=0.12427549064159393, max_rel=611.6472778320312, norm_rel=0.02092980593442917, ref_abs_avg=17.8472900390625, test_abs_avg=17.845985412597656
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3247195780277252, max_abs=3.5, mean_rel=0.14860278367996216, max_rel=1250.0, norm_rel=0.01834586262702942, ref_abs_avg=17.846160888671875, test_abs_avg=17.854942321777344
production_forward2 vs paper_forward output: mean_abs=0.001622954849153757, max_abs=0.03936767578125
production_forward2 grad[0] vs paper_forward: mean_abs=0.008618341758847237, max_abs=0.34375, mean_rel=0.07471009343862534, max_rel=143.45785522460938, norm_rel=0.020489094778895378, ref_abs_avg=0.4543583393096924, test_abs_avg=0.45435863733291626
production_forward2 grad[1] vs paper_forward: mean_abs=7.309306621551514, max_abs=64.0, mean_rel=0.1662234216928482, max_rel=332.86517333984375, norm_rel=0.02072340063750744, ref_abs_avg=312.56329345703125, test_abs_avg=312.6120910644531
production_forward2 grad[2] vs paper_forward: mean_abs=1.294403076171875, max_abs=4.8125, mean_rel=0.09862012416124344, max_rel=7.6673359870910645, norm_rel=0.024216553196310997, ref_abs_avg=52.77109146118164, test_abs_avg=52.67282485961914
production_forward2 grad[3] vs paper_forward: mean_abs=1.6406900882720947, max_abs=12.0, mean_rel=0.17607679963111877, max_rel=2938.5068359375, norm_rel=0.025160860270261765, ref_abs_avg=65.59686279296875, test_abs_avg=65.59607696533203
production_forward2 grad[4] vs paper_forward: mean_abs=1.5131826400756836, max_abs=10.59375, mean_rel=0.44026312232017517, max_rel=5125.0, norm_rel=0.023479007184505463, ref_abs_avg=64.83712768554688, test_abs_avg=64.84288024902344
production_forward2 grad[5] vs paper_forward: mean_abs=1.1565899848937988, max_abs=4.0625, mean_rel=0.1980377435684204, max_rel=48.886558532714844, norm_rel=0.024727480486035347, ref_abs_avg=45.942832946777344, test_abs_avg=45.955413818359375
production_forward2 grad[6] vs paper_forward: mean_abs=1.438621997833252, max_abs=10.0, mean_rel=0.17322684824466705, max_rel=2577.331298828125, norm_rel=0.02490806020796299, ref_abs_avg=58.08663558959961, test_abs_avg=58.08269119262695
production_forward2 grad[7] vs paper_forward: mean_abs=1.332421064376831, max_abs=8.5, mean_rel=0.3457374572753906, max_rel=4250.0, norm_rel=0.02327112853527069, ref_abs_avg=57.473087310791016, test_abs_avg=57.47073745727539
production_forward2 grad[8] vs paper_forward: mean_abs=0.9797840118408203, max_abs=3.875, mean_rel=0.10491427779197693, max_rel=7.811600685119629, norm_rel=0.022051816806197166, ref_abs_avg=44.297142028808594, test_abs_avg=44.202720642089844
production_forward2 grad[9] vs paper_forward: mean_abs=1.291883111000061, max_abs=9.0, mean_rel=0.17436370253562927, max_rel=1267.73486328125, norm_rel=0.024714650586247444, ref_abs_avg=52.56721878051758, test_abs_avg=52.56949234008789
production_forward2 grad[10] vs paper_forward: mean_abs=1.1960910558700562, max_abs=7.125, mean_rel=0.41440120339393616, max_rel=5562.49951171875, norm_rel=0.023194288834929466, ref_abs_avg=51.85448455810547, test_abs_avg=51.859031677246094
production_forward2 grad[11] vs paper_forward: mean_abs=0.9202365875244141, max_abs=3.125, mean_rel=0.09860105067491531, max_rel=15.70778751373291, norm_rel=0.02348512038588524, ref_abs_avg=39.76739501953125, test_abs_avg=39.82506561279297
production_forward2 grad[12] vs paper_forward: mean_abs=1.196050763130188, max_abs=8.5, mean_rel=0.16822248697280884, max_rel=1897.6956787109375, norm_rel=0.024567484855651855, ref_abs_avg=48.99323272705078, test_abs_avg=48.99338150024414
production_forward2 grad[13] vs paper_forward: mean_abs=1.1047286987304688, max_abs=6.75, mean_rel=0.3259357810020447, max_rel=5749.99951171875, norm_rel=0.02296512760221958, ref_abs_avg=48.356754302978516, test_abs_avg=48.35602569580078
production_forward2 grad[14] vs paper_forward: mean_abs=0.9040813446044922, max_abs=3.5625, mean_rel=0.10500243306159973, max_rel=10.896833419799805, norm_rel=0.022330285981297493, ref_abs_avg=39.879486083984375, test_abs_avg=39.86685562133789
production_forward2 grad[15] vs paper_forward: mean_abs=1.116745114326477, max_abs=8.0, mean_rel=0.1636662632226944, max_rel=1382.269287109375, norm_rel=0.02432164177298546, ref_abs_avg=46.20357131958008, test_abs_avg=46.205482482910156
production_forward2 grad[16] vs paper_forward: mean_abs=1.0323597192764282, max_abs=7.0, mean_rel=0.35462018847465515, max_rel=3187.499755859375, norm_rel=0.022857144474983215, ref_abs_avg=45.401756286621094, test_abs_avg=45.39863967895508
production_forward2 grad[17] vs paper_forward: mean_abs=0.7669970989227295, max_abs=3.0, mean_rel=0.07182439416646957, max_rel=3.513188123703003, norm_rel=0.021421173587441444, ref_abs_avg=35.46287155151367, test_abs_avg=35.53325653076172
production_forward2 grad[18] vs paper_forward: mean_abs=1.0492637157440186, max_abs=7.0, mean_rel=0.1737874448299408, max_rel=1539.24462890625, norm_rel=0.024258678779006004, ref_abs_avg=43.54027557373047, test_abs_avg=43.54315185546875
production_forward2 grad[19] vs paper_forward: mean_abs=0.9687963724136353, max_abs=6.0, mean_rel=0.29817232489585876, max_rel=2874.999755859375, norm_rel=0.022585967555642128, ref_abs_avg=43.077884674072266, test_abs_avg=43.07685852050781
production_forward2 grad[20] vs paper_forward: mean_abs=0.8064441680908203, max_abs=2.875, mean_rel=0.09876050055027008, max_rel=13.547050476074219, norm_rel=0.023319125175476074, ref_abs_avg=34.119895935058594, test_abs_avg=34.034950256347656
production_forward2 grad[21] vs paper_forward: mean_abs=0.9941779375076294, max_abs=6.5, mean_rel=0.1511843502521515, max_rel=982.3236083984375, norm_rel=0.024058910086750984, ref_abs_avg=41.540916442871094, test_abs_avg=41.54603576660156
production_forward2 grad[22] vs paper_forward: mean_abs=0.9131505489349365, max_abs=5.75, mean_rel=0.3198315501213074, max_rel=2749.999755859375, norm_rel=0.022238895297050476, ref_abs_avg=41.20490646362305, test_abs_avg=41.20151138305664
production_forward2 grad[23] vs paper_forward: mean_abs=0.7184562683105469, max_abs=3.0, mean_rel=0.08318646252155304, max_rel=4.852043628692627, norm_rel=0.022331172600388527, ref_abs_avg=32.770477294921875, test_abs_avg=32.78633117675781
production_forward2 grad[24] vs paper_forward: mean_abs=0.9424668550491333, max_abs=7.0, mean_rel=0.15581181645393372, max_rel=1721.4910888671875, norm_rel=0.02390160970389843, ref_abs_avg=39.662315368652344, test_abs_avg=39.661800384521484
production_forward2 grad[25] vs paper_forward: mean_abs=0.8665506839752197, max_abs=5.625, mean_rel=0.29583922028541565, max_rel=2375.0, norm_rel=0.02212061919271946, ref_abs_avg=39.334991455078125, test_abs_avg=39.329689025878906
production_forward2 grad[26] vs paper_forward: mean_abs=0.8831171989440918, max_abs=3.25, mean_rel=0.08529412746429443, max_rel=3.1806957721710205, norm_rel=0.02749807760119438, ref_abs_avg=32.66877365112305, test_abs_avg=32.7814826965332
production_forward2 grad[27] vs paper_forward: mean_abs=1.0763919353485107, max_abs=8.0, mean_rel=0.16937971115112305, max_rel=1391.295654296875, norm_rel=0.02577131800353527, ref_abs_avg=41.94619369506836, test_abs_avg=41.94499969482422
production_forward2 grad[28] vs paper_forward: mean_abs=1.004709005355835, max_abs=6.0, mean_rel=0.33110660314559937, max_rel=4125.0, norm_rel=0.024423521012067795, ref_abs_avg=41.31233215332031, test_abs_avg=41.31666946411133
production_forward2 grad[29] vs paper_forward: mean_abs=0.8472623825073242, max_abs=3.0, mean_rel=0.09750614315271378, max_rel=6.3268256187438965, norm_rel=0.026746239513158798, ref_abs_avg=31.037708282470703, test_abs_avg=31.140796661376953
production_forward2 grad[30] vs paper_forward: mean_abs=1.0055434703826904, max_abs=7.0, mean_rel=0.18370646238327026, max_rel=1409.177734375, norm_rel=0.02598799765110016, ref_abs_avg=38.84492111206055, test_abs_avg=38.846946716308594
production_forward2 grad[31] vs paper_forward: mean_abs=0.9404537677764893, max_abs=7.0, mean_rel=0.3054429590702057, max_rel=2937.499755859375, norm_rel=0.02465372532606125, ref_abs_avg=38.34797668457031, test_abs_avg=38.35154724121094
production_forward2 grad[32] vs paper_forward: mean_abs=0.7187846899032593, max_abs=3.125, mean_rel=0.13916605710983276, max_rel=14.859827995300293, norm_rel=0.02422412857413292, ref_abs_avg=30.002788543701172, test_abs_avg=30.061649322509766
production_forward2 grad[33] vs paper_forward: mean_abs=0.9376535415649414, max_abs=6.5, mean_rel=0.16420960426330566, max_rel=1939.1571044921875, norm_rel=0.02599402889609337, ref_abs_avg=36.2186279296875, test_abs_avg=36.220733642578125
production_forward2 grad[34] vs paper_forward: mean_abs=0.8782665729522705, max_abs=5.6875, mean_rel=0.28575268387794495, max_rel=2281.25, norm_rel=0.024532770738005638, ref_abs_avg=35.930599212646484, test_abs_avg=35.92949295043945
production_forward2 grad[35] vs paper_forward: mean_abs=0.6904768347740173, max_abs=2.5, mean_rel=0.3546421527862549, max_rel=51.29098892211914, norm_rel=0.02430298924446106, ref_abs_avg=27.646896362304688, test_abs_avg=27.610496520996094
production_forward2 grad[36] vs paper_forward: mean_abs=0.8823140859603882, max_abs=6.0, mean_rel=0.1602632999420166, max_rel=1217.5589599609375, norm_rel=0.025784242898225784, ref_abs_avg=34.35588836669922, test_abs_avg=34.357059478759766
production_forward2 grad[37] vs paper_forward: mean_abs=0.8218919634819031, max_abs=5.25, mean_rel=0.2598000168800354, max_rel=2250.0, norm_rel=0.024203110486268997, ref_abs_avg=34.01878356933594, test_abs_avg=34.0224609375
production_forward2 grad[38] vs paper_forward: mean_abs=0.6581549644470215, max_abs=2.75, mean_rel=0.12179835885763168, max_rel=21.63023567199707, norm_rel=0.0234635341912508, ref_abs_avg=28.422138214111328, test_abs_avg=28.4506778717041
production_forward2 grad[39] vs paper_forward: mean_abs=0.8321986794471741, max_abs=6.0, mean_rel=0.17299702763557434, max_rel=1615.652099609375, norm_rel=0.025569263845682144, ref_abs_avg=32.64189147949219, test_abs_avg=32.64137268066406
production_forward2 grad[40] vs paper_forward: mean_abs=0.7769180536270142, max_abs=4.625, mean_rel=0.2540922164916992, max_rel=3343.749755859375, norm_rel=0.024225490167737007, ref_abs_avg=32.124820709228516, test_abs_avg=32.13130569458008
production_forward2 grad[41] vs paper_forward: mean_abs=0.6030049324035645, max_abs=2.5, mean_rel=0.20632445812225342, max_rel=37.574737548828125, norm_rel=0.023547155782580376, ref_abs_avg=25.578018188476562, test_abs_avg=25.55367660522461
production_forward2 grad[42] vs paper_forward: mean_abs=0.7900124788284302, max_abs=5.75, mean_rel=0.1636076271533966, max_rel=1824.9652099609375, norm_rel=0.02536633051931858, ref_abs_avg=31.2303466796875, test_abs_avg=31.22832679748535
production_forward2 grad[43] vs paper_forward: mean_abs=0.7327291369438171, max_abs=4.8125, mean_rel=0.2449284791946411, max_rel=1812.4998779296875, norm_rel=0.023886747658252716, ref_abs_avg=30.770891189575195, test_abs_avg=30.771556854248047
production_forward2 grad[44] vs paper_forward: mean_abs=0.5908880233764648, max_abs=2.375, mean_rel=0.09474250674247742, max_rel=5.017451763153076, norm_rel=0.022761771455407143, ref_abs_avg=25.50808334350586, test_abs_avg=25.489683151245117
production_forward2 grad[45] vs paper_forward: mean_abs=0.7456744909286499, max_abs=5.5, mean_rel=0.1686241775751114, max_rel=1076.50634765625, norm_rel=0.02521517686545849, ref_abs_avg=29.70465850830078, test_abs_avg=29.70537567138672
production_forward2 grad[46] vs paper_forward: mean_abs=0.6976312398910522, max_abs=4.5, mean_rel=0.25273922085762024, max_rel=1624.9998779296875, norm_rel=0.02356722019612789, ref_abs_avg=29.688003540039062, test_abs_avg=29.695045471191406
production_forward2 grad[47] vs paper_forward: mean_abs=0.5572105646133423, max_abs=2.0, mean_rel=0.11149032413959503, max_rel=5.761233806610107, norm_rel=0.022349951788783073, ref_abs_avg=25.094926834106445, test_abs_avg=25.110654830932617
production_forward2 grad[48] vs paper_forward: mean_abs=0.7176893949508667, max_abs=5.5, mean_rel=0.15538069605827332, max_rel=1040.084716796875, norm_rel=0.02472979947924614, ref_abs_avg=29.102567672729492, test_abs_avg=29.103618621826172
production_forward2 grad[49] vs paper_forward: mean_abs=0.6674594879150391, max_abs=4.5, mean_rel=0.2526475787162781, max_rel=1843.7498779296875, norm_rel=0.023477207869291306, ref_abs_avg=28.50307846069336, test_abs_avg=28.500734329223633
production_forward2 grad[50] vs paper_forward: mean_abs=0.6274153590202332, max_abs=2.609375, mean_rel=0.3553827404975891, max_rel=128.921142578125, norm_rel=0.024855267256498337, ref_abs_avg=25.26034927368164, test_abs_avg=25.295562744140625
production_forward2 grad[51] vs paper_forward: mean_abs=0.8265042901039124, max_abs=6.0, mean_rel=0.1825784146785736, max_rel=1784.7723388671875, norm_rel=0.026825280860066414, ref_abs_avg=30.92561912536621, test_abs_avg=30.9234561920166
production_forward2 grad[52] vs paper_forward: mean_abs=0.7669751644134521, max_abs=4.875, mean_rel=0.30537450313568115, max_rel=2749.999755859375, norm_rel=0.02518410049378872, ref_abs_avg=30.551387786865234, test_abs_avg=30.549163818359375
production_forward2 grad[53] vs paper_forward: mean_abs=0.6045857667922974, max_abs=2.625, mean_rel=0.8061472773551941, max_rel=365.23004150390625, norm_rel=0.025839703157544136, ref_abs_avg=23.69620704650879, test_abs_avg=23.713781356811523
production_forward2 grad[54] vs paper_forward: mean_abs=0.7456534504890442, max_abs=5.5, mean_rel=0.17349721491336823, max_rel=1237.940673828125, norm_rel=0.026409495621919632, ref_abs_avg=28.30862808227539, test_abs_avg=28.307266235351562
production_forward2 grad[55] vs paper_forward: mean_abs=0.6996740698814392, max_abs=5.0, mean_rel=0.2879320979118347, max_rel=2500.0, norm_rel=0.024755483493208885, ref_abs_avg=28.314376831054688, test_abs_avg=28.317955017089844
production_forward2 grad[56] vs paper_forward: mean_abs=0.5978126525878906, max_abs=2.0, mean_rel=0.07788616418838501, max_rel=2.468862533569336, norm_rel=0.027310123667120934, ref_abs_avg=21.69518280029297, test_abs_avg=21.723407745361328
production_forward2 grad[57] vs paper_forward: mean_abs=0.6915078163146973, max_abs=5.0, mean_rel=0.17158202826976776, max_rel=1663.0533447265625, norm_rel=0.025767413899302483, ref_abs_avg=26.913257598876953, test_abs_avg=26.912179946899414
production_forward2 grad[58] vs paper_forward: mean_abs=0.6410669088363647, max_abs=4.75, mean_rel=0.26938194036483765, max_rel=2375.0, norm_rel=0.02416132763028145, ref_abs_avg=26.50649642944336, test_abs_avg=26.510759353637695
production_forward2 grad[59] vs paper_forward: mean_abs=0.5226545333862305, max_abs=1.9375, mean_rel=0.08539166301488876, max_rel=3.051417112350464, norm_rel=0.024800850078463554, ref_abs_avg=20.57448387145996, test_abs_avg=20.588165283203125
production_forward2 grad[60] vs paper_forward: mean_abs=0.6386798620223999, max_abs=5.0, mean_rel=0.16783030331134796, max_rel=1487.2415771484375, norm_rel=0.025238221511244774, ref_abs_avg=25.374771118164062, test_abs_avg=25.372278213500977
production_forward2 grad[61] vs paper_forward: mean_abs=0.5971934795379639, max_abs=4.0, mean_rel=0.2505461871623993, max_rel=1437.4998779296875, norm_rel=0.023577744141221046, ref_abs_avg=25.331562042236328, test_abs_avg=25.333946228027344
production_forward2 grad[62] vs paper_forward: mean_abs=0.461148738861084, max_abs=1.8125, mean_rel=0.09445488452911377, max_rel=14.333233833312988, norm_rel=0.022700048983097076, ref_abs_avg=20.36969757080078, test_abs_avg=20.35654067993164
production_forward2 grad[63] vs paper_forward: mean_abs=0.60362708568573, max_abs=4.75, mean_rel=0.16180437803268433, max_rel=1139.0987548828125, norm_rel=0.02474011480808258, ref_abs_avg=24.43368911743164, test_abs_avg=24.434974670410156
production_forward2 grad[64] vs paper_forward: mean_abs=0.5609679222106934, max_abs=4.0, mean_rel=0.23495878279209137, max_rel=2500.0, norm_rel=0.022954165935516357, ref_abs_avg=24.402387619018555, test_abs_avg=24.397737503051758
production_forward2 grad[65] vs paper_forward: mean_abs=0.4393789768218994, max_abs=1.56640625, mean_rel=0.11871995031833649, max_rel=11.888750076293945, norm_rel=0.024095380678772926, ref_abs_avg=18.055755615234375, test_abs_avg=18.040306091308594
production_forward2 grad[66] vs paper_forward: mean_abs=0.5722808241844177, max_abs=5.25, mean_rel=0.15400415658950806, max_rel=853.1878662109375, norm_rel=0.024298863485455513, ref_abs_avg=23.56433868408203, test_abs_avg=23.561859130859375
production_forward2 grad[67] vs paper_forward: mean_abs=0.5294010639190674, max_abs=3.5, mean_rel=0.24136283993721008, max_rel=1671.8748779296875, norm_rel=0.022664399817585945, ref_abs_avg=23.367782592773438, test_abs_avg=23.364791870117188
production_forward2 grad[68] vs paper_forward: mean_abs=0.41421008110046387, max_abs=1.625, mean_rel=0.0974174290895462, max_rel=13.546717643737793, norm_rel=0.023307496681809425, ref_abs_avg=17.885040283203125, test_abs_avg=17.927757263183594
production_forward2 grad[69] vs paper_forward: mean_abs=0.543369472026825, max_abs=4.5, mean_rel=0.15220963954925537, max_rel=1251.6842041015625, norm_rel=0.024002378806471825, ref_abs_avg=22.66676139831543, test_abs_avg=22.666526794433594
production_forward2 grad[70] vs paper_forward: mean_abs=0.4994013011455536, max_abs=4.0, mean_rel=0.23488061130046844, max_rel=1187.5, norm_rel=0.022609727457165718, ref_abs_avg=22.126178741455078, test_abs_avg=22.120729446411133
production_forward2 grad[71] vs paper_forward: mean_abs=0.38361406326293945, max_abs=1.5, mean_rel=0.09883399307727814, max_rel=3.35026478767395, norm_rel=0.021384235471487045, ref_abs_avg=17.994375228881836, test_abs_avg=17.99594497680664
production_forward2 grad[72] vs paper_forward: mean_abs=0.5165113210678101, max_abs=4.0, mean_rel=0.1512090265750885, max_rel=740.1299438476562, norm_rel=0.023243658244609833, ref_abs_avg=22.236278533935547, test_abs_avg=22.236730575561523
production_forward2 grad[73] vs paper_forward: mean_abs=0.47581279277801514, max_abs=3.75, mean_rel=0.19397664070129395, max_rel=1156.25, norm_rel=0.021812979131937027, ref_abs_avg=21.809003829956055, test_abs_avg=21.80735206604004
production_forward2 grad[74] vs paper_forward: mean_abs=0.4338340759277344, max_abs=1.5, mean_rel=0.06861161440610886, max_rel=5.332610130310059, norm_rel=0.02219904400408268, ref_abs_avg=19.610614776611328, test_abs_avg=19.609725952148438
production_forward2 grad[75] vs paper_forward: mean_abs=0.5639302730560303, max_abs=4.5, mean_rel=0.15444420278072357, max_rel=1264.0849609375, norm_rel=0.024147989228367805, ref_abs_avg=23.357080459594727, test_abs_avg=23.357975006103516
production_forward2 grad[76] vs paper_forward: mean_abs=0.5179383754730225, max_abs=3.375, mean_rel=0.2112576812505722, max_rel=1656.2498779296875, norm_rel=0.02249329537153244, ref_abs_avg=23.05202865600586, test_abs_avg=23.051284790039062
production_forward2 grad[77] vs paper_forward: mean_abs=0.402651309967041, max_abs=1.5, mean_rel=0.05644501745700836, max_rel=1.1712126731872559, norm_rel=0.022254247218370438, ref_abs_avg=18.313594818115234, test_abs_avg=18.303936004638672
production_forward2 grad[78] vs paper_forward: mean_abs=0.5289593935012817, max_abs=4.25, mean_rel=0.1458844244480133, max_rel=695.7244873046875, norm_rel=0.023739900439977646, ref_abs_avg=22.345035552978516, test_abs_avg=22.34589958190918
production_forward2 grad[79] vs paper_forward: mean_abs=0.4885796010494232, max_abs=4.0, mean_rel=0.2213180661201477, max_rel=1218.75, norm_rel=0.022079236805438995, ref_abs_avg=22.141401290893555, test_abs_avg=22.13895034790039
production_forward2 grad[80] vs paper_forward: mean_abs=0.40660035610198975, max_abs=1.75, mean_rel=0.08763298392295837, max_rel=7.6092143058776855, norm_rel=0.023179473355412483, ref_abs_avg=17.50365447998047, test_abs_avg=17.469593048095703
production_forward2 grad[81] vs paper_forward: mean_abs=0.49990054965019226, max_abs=4.5, mean_rel=0.1497924029827118, max_rel=1435.2054443359375, norm_rel=0.023169726133346558, ref_abs_avg=21.63033676147461, test_abs_avg=21.630268096923828
production_forward2 grad[82] vs paper_forward: mean_abs=0.45646369457244873, max_abs=4.25, mean_rel=0.21459564566612244, max_rel=1499.9998779296875, norm_rel=0.021968314424157143, ref_abs_avg=20.81479263305664, test_abs_avg=20.81575584411621
production_forward2 grad[83] vs paper_forward: mean_abs=0.3812439441680908, max_abs=1.5625, mean_rel=0.08083656430244446, max_rel=5.0782599449157715, norm_rel=0.021925685927271843, ref_abs_avg=17.291542053222656, test_abs_avg=17.264816284179688
production_forward2 grad[84] vs paper_forward: mean_abs=0.46660810708999634, max_abs=4.5, mean_rel=0.1363305151462555, max_rel=582.88134765625, norm_rel=0.022875091060996056, ref_abs_avg=20.474063873291016, test_abs_avg=20.474258422851562
production_forward2 grad[85] vs paper_forward: mean_abs=0.4245447516441345, max_abs=4.0, mean_rel=0.2304004281759262, max_rel=1874.9998779296875, norm_rel=0.02055060863494873, ref_abs_avg=20.639244079589844, test_abs_avg=20.647600173950195
production_forward2 grad[86] vs paper_forward: mean_abs=0.33950066566467285, max_abs=1.25, mean_rel=0.1097659021615982, max_rel=17.731534957885742, norm_rel=0.020571904256939888, ref_abs_avg=16.732683181762695, test_abs_avg=16.77522087097168
production_forward2 grad[87] vs paper_forward: mean_abs=0.44347333908081055, max_abs=6.0, mean_rel=0.1448824405670166, max_rel=1705.2265625, norm_rel=0.02232181653380394, ref_abs_avg=19.98227310180664, test_abs_avg=19.983417510986328
production_forward2 grad[88] vs paper_forward: mean_abs=0.40372800827026367, max_abs=3.5, mean_rel=0.18095353245735168, max_rel=820.3124389648438, norm_rel=0.020797278732061386, ref_abs_avg=19.44961166381836, test_abs_avg=19.446067810058594
production_forward2 grad[89] vs paper_forward: mean_abs=0.32431507110595703, max_abs=1.125, mean_rel=0.07585425674915314, max_rel=3.9611778259277344, norm_rel=0.01998264715075493, ref_abs_avg=16.20738983154297, test_abs_avg=16.216745376586914
production_forward2 grad[90] vs paper_forward: mean_abs=0.4208695888519287, max_abs=5.0, mean_rel=0.1363944113254547, max_rel=707.6943359375, norm_rel=0.022092651575803757, ref_abs_avg=19.22994613647461, test_abs_avg=19.231353759765625
production_forward2 grad[91] vs paper_forward: mean_abs=0.36713898181915283, max_abs=3.5, mean_rel=0.18652892112731934, max_rel=1187.5, norm_rel=0.019656889140605927, ref_abs_avg=18.82971954345703, test_abs_avg=18.837129592895508
production_forward2 grad[92] vs paper_forward: mean_abs=0.31003737449645996, max_abs=1.25, mean_rel=0.11397328972816467, max_rel=12.681290626525879, norm_rel=0.01911891996860504, ref_abs_avg=16.653289794921875, test_abs_avg=16.674619674682617
production_forward2 grad[93] vs paper_forward: mean_abs=0.3882303833961487, max_abs=4.25, mean_rel=0.12526389956474304, max_rel=568.7848510742188, norm_rel=0.021294448524713516, ref_abs_avg=18.498981475830078, test_abs_avg=18.499069213867188
production_forward2 grad[94] vs paper_forward: mean_abs=0.35724717378616333, max_abs=3.5, mean_rel=0.17615589499473572, max_rel=1039.0625, norm_rel=0.019986633211374283, ref_abs_avg=18.107589721679688, test_abs_avg=18.11867332458496
production_forward2 grad[95] vs paper_forward: mean_abs=0.29448509216308594, max_abs=1.5, mean_rel=0.056718431413173676, max_rel=4.356293678283691, norm_rel=0.019460733979940414, ref_abs_avg=15.742973327636719, test_abs_avg=15.710733413696289
production_forward2 grad[96] vs paper_forward: mean_abs=0.3668394386768341, max_abs=4.5, mean_rel=0.12694381177425385, max_rel=622.2876586914062, norm_rel=0.020889008417725563, ref_abs_avg=17.8472900390625, test_abs_avg=17.84670639038086
production_forward2 grad[97] vs paper_forward: mean_abs=0.3250695466995239, max_abs=3.3125, mean_rel=0.15252266824245453, max_rel=1109.375, norm_rel=0.0183539018034935, ref_abs_avg=17.846160888671875, test_abs_avg=17.85053253173828
identity layers + randn queries
mean abs randn paper: 0.21484375
production_forward2 fwd+bwd:  243.399 ms
production_forward2 fwd-only: 24.782 ms
production_forward2 bwd-only: 219.124 ms
production_forward2 peak allocated: fwd=2.692 GiB, fwd+bwd=6.071 GiB
production_forward2 peak reserved:  fwd=2.975 GiB, fwd+bwd=8.725 GiB
torch_compile_phases_forward fwd+bwd:  260.424 ms
torch_compile_phases_forward fwd-only: 43.669 ms
torch_compile_phases_forward bwd-only: 213.842 ms
torch_compile_phases_forward peak allocated: fwd=5.342 GiB, fwd+bwd=6.469 GiB
torch_compile_phases_forward peak reserved:  fwd=5.850 GiB, fwd+bwd=9.850 GiB
paper_forward fwd+bwd:  536.057 ms
paper_forward fwd-only: 97.297 ms
paper_forward bwd-only: 439.688 ms
paper_forward peak allocated: fwd=6.194 GiB, fwd+bwd=10.068 GiB
paper_forward peak reserved:  fwd=6.225 GiB, fwd+bwd=10.225 GiB
production_forward fwd+bwd:  124.474 ms
production_forward fwd-only: 22.769 ms
production_forward bwd-only: 102.006 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=6.071 GiB
production_forward peak reserved:  fwd=2.225 GiB, fwd+bwd=6.100 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0015884855529293418, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008295292966067791, max_abs=0.34375, mean_rel=0.07369919121265411, max_rel=84.75749969482422, norm_rel=0.020120985805988312, ref_abs_avg=0.4435698091983795, test_abs_avg=0.44358110427856445
production_forward grad[1] vs paper_forward: mean_abs=7.079155445098877, max_abs=53.0, mean_rel=0.14909011125564575, max_rel=249.170166015625, norm_rel=0.020964985713362694, ref_abs_avg=302.96246337890625, test_abs_avg=302.8840637207031
production_forward grad[2] vs paper_forward: mean_abs=1.1657276153564453, max_abs=6.0, mean_rel=0.13671496510505676, max_rel=24.981182098388672, norm_rel=0.023680446669459343, ref_abs_avg=50.880393981933594, test_abs_avg=50.92145538330078
production_forward grad[3] vs paper_forward: mean_abs=1.5507068634033203, max_abs=10.0, mean_rel=0.15974195301532745, max_rel=999.5101318359375, norm_rel=0.0248278621584177, ref_abs_avg=62.89813995361328, test_abs_avg=62.90218734741211
production_forward grad[4] vs paper_forward: mean_abs=1.4318064451217651, max_abs=8.0, mean_rel=0.41132253408432007, max_rel=5249.99951171875, norm_rel=0.0231376551091671, ref_abs_avg=62.213104248046875, test_abs_avg=62.22469711303711
production_forward grad[5] vs paper_forward: mean_abs=1.0299057960510254, max_abs=4.375, mean_rel=0.06597226858139038, max_rel=2.798433780670166, norm_rel=0.021962970495224, ref_abs_avg=47.7823600769043, test_abs_avg=47.84791564941406
production_forward grad[6] vs paper_forward: mean_abs=1.3693277835845947, max_abs=9.0, mean_rel=0.163907989859581, max_rel=1185.3492431640625, norm_rel=0.024478232488036156, ref_abs_avg=56.361183166503906, test_abs_avg=56.36566925048828
production_forward grad[7] vs paper_forward: mean_abs=1.2629772424697876, max_abs=9.5, mean_rel=0.34790870547294617, max_rel=3874.999755859375, norm_rel=0.022803496569395065, ref_abs_avg=55.67506408691406, test_abs_avg=55.684844970703125
production_forward grad[8] vs paper_forward: mean_abs=0.9980781674385071, max_abs=3.625, mean_rel=2.422100305557251, max_rel=1190.95263671875, norm_rel=0.023159554228186607, ref_abs_avg=41.70429229736328, test_abs_avg=41.75518798828125
production_forward grad[9] vs paper_forward: mean_abs=1.2446842193603516, max_abs=9.5, mean_rel=0.15800753235816956, max_rel=1367.0625, norm_rel=0.024255570024251938, ref_abs_avg=51.61774826049805, test_abs_avg=51.61874771118164
production_forward grad[10] vs paper_forward: mean_abs=1.1350743770599365, max_abs=7.0, mean_rel=0.3483137786388397, max_rel=3312.499755859375, norm_rel=0.02244064398109913, ref_abs_avg=50.882110595703125, test_abs_avg=50.87993621826172
production_forward grad[11] vs paper_forward: mean_abs=0.9096806049346924, max_abs=3.5, mean_rel=0.18772350251674652, max_rel=44.05624008178711, norm_rel=0.02427668310701847, ref_abs_avg=38.153289794921875, test_abs_avg=38.16948318481445
production_forward grad[12] vs paper_forward: mean_abs=1.1468863487243652, max_abs=8.0, mean_rel=0.15792162716388702, max_rel=1111.2698974609375, norm_rel=0.024010544642806053, ref_abs_avg=48.06172180175781, test_abs_avg=48.06591796875
production_forward grad[13] vs paper_forward: mean_abs=1.0621087551116943, max_abs=6.5, mean_rel=0.37370896339416504, max_rel=3249.999755859375, norm_rel=0.022338788956403732, ref_abs_avg=47.745670318603516, test_abs_avg=47.75434875488281
production_forward grad[14] vs paper_forward: mean_abs=0.8281083106994629, max_abs=3.5, mean_rel=0.23233942687511444, max_rel=36.302371978759766, norm_rel=0.023147709667682648, ref_abs_avg=36.13105010986328, test_abs_avg=36.18876647949219
production_forward grad[15] vs paper_forward: mean_abs=1.0692914724349976, max_abs=9.25, mean_rel=0.16448667645454407, max_rel=1477.490966796875, norm_rel=0.02386217936873436, ref_abs_avg=45.069244384765625, test_abs_avg=45.06923294067383
production_forward grad[16] vs paper_forward: mean_abs=0.9910165667533875, max_abs=6.5, mean_rel=0.3079448342323303, max_rel=3437.499755859375, norm_rel=0.02246423065662384, ref_abs_avg=44.32792663574219, test_abs_avg=44.328041076660156
production_forward grad[17] vs paper_forward: mean_abs=0.7967877388000488, max_abs=3.5, mean_rel=0.1062498390674591, max_rel=8.197535514831543, norm_rel=0.023813951760530472, ref_abs_avg=33.41709899902344, test_abs_avg=33.455684661865234
production_forward grad[18] vs paper_forward: mean_abs=1.0100841522216797, max_abs=7.0, mean_rel=0.16092072427272797, max_rel=1100.8184814453125, norm_rel=0.023760110139846802, ref_abs_avg=42.73699188232422, test_abs_avg=42.738922119140625
production_forward grad[19] vs paper_forward: mean_abs=0.9301986694335938, max_abs=6.25, mean_rel=0.29324281215667725, max_rel=2874.999755859375, norm_rel=0.02200784534215927, ref_abs_avg=42.47849655151367, test_abs_avg=42.482913970947266
production_forward grad[20] vs paper_forward: mean_abs=0.7292603850364685, max_abs=3.375, mean_rel=0.49717962741851807, max_rel=214.18148803710938, norm_rel=0.020978054031729698, ref_abs_avg=34.94683837890625, test_abs_avg=34.920806884765625
production_forward grad[21] vs paper_forward: mean_abs=0.957912027835846, max_abs=7.0, mean_rel=0.15368854999542236, max_rel=1206.22265625, norm_rel=0.023549674078822136, ref_abs_avg=40.90215301513672, test_abs_avg=40.90165710449219
production_forward grad[22] vs paper_forward: mean_abs=0.8712919354438782, max_abs=6.0, mean_rel=0.28012222051620483, max_rel=5437.49951171875, norm_rel=0.02166539430618286, ref_abs_avg=40.4848747253418, test_abs_avg=40.47909927368164
production_forward grad[23] vs paper_forward: mean_abs=0.6993536949157715, max_abs=2.46875, mean_rel=0.13642188906669617, max_rel=27.693655014038086, norm_rel=0.0228477381169796, ref_abs_avg=29.988601684570312, test_abs_avg=30.0006160736084
production_forward grad[24] vs paper_forward: mean_abs=0.9029415249824524, max_abs=6.5, mean_rel=0.15298792719841003, max_rel=1060.7781982421875, norm_rel=0.02347424067556858, ref_abs_avg=38.689720153808594, test_abs_avg=38.69062805175781
production_forward grad[25] vs paper_forward: mean_abs=0.8302915096282959, max_abs=5.375, mean_rel=0.25293171405792236, max_rel=2125.0, norm_rel=0.021772492676973343, ref_abs_avg=38.26338195800781, test_abs_avg=38.264461517333984
production_forward grad[26] vs paper_forward: mean_abs=0.7959712147712708, max_abs=3.0, mean_rel=0.5884718894958496, max_rel=255.1057586669922, norm_rel=0.022618256509304047, ref_abs_avg=35.49198913574219, test_abs_avg=35.49894332885742
production_forward grad[27] vs paper_forward: mean_abs=1.039656400680542, max_abs=7.0, mean_rel=0.16813036799430847, max_rel=1259.4666748046875, norm_rel=0.025237075984477997, ref_abs_avg=41.434444427490234, test_abs_avg=41.43403244018555
production_forward grad[28] vs paper_forward: mean_abs=0.9669123291969299, max_abs=6.0, mean_rel=0.3678680658340454, max_rel=2656.249755859375, norm_rel=0.02368830144405365, ref_abs_avg=40.99825668334961, test_abs_avg=40.998023986816406
production_forward grad[29] vs paper_forward: mean_abs=0.7172480821609497, max_abs=3.0, mean_rel=0.32131242752075195, max_rel=100.08045959472656, norm_rel=0.02258642576634884, ref_abs_avg=31.349300384521484, test_abs_avg=31.339570999145508
production_forward grad[30] vs paper_forward: mean_abs=0.9669992923736572, max_abs=7.0, mean_rel=0.1776610016822815, max_rel=2319.224853515625, norm_rel=0.0254246536642313, ref_abs_avg=38.23497772216797, test_abs_avg=38.23162078857422
production_forward grad[31] vs paper_forward: mean_abs=0.9069280624389648, max_abs=6.0, mean_rel=0.2884242832660675, max_rel=2578.124755859375, norm_rel=0.02415328472852707, ref_abs_avg=37.73699188232422, test_abs_avg=37.73052978515625
production_forward grad[32] vs paper_forward: mean_abs=0.6904118657112122, max_abs=2.75, mean_rel=0.18875063955783844, max_rel=49.26773452758789, norm_rel=0.02383565716445446, ref_abs_avg=29.52902603149414, test_abs_avg=29.541881561279297
production_forward grad[33] vs paper_forward: mean_abs=0.9051080942153931, max_abs=6.0, mean_rel=0.15968677401542664, max_rel=845.8026733398438, norm_rel=0.025330038741230965, ref_abs_avg=35.92019271850586, test_abs_avg=35.91963577270508
production_forward grad[34] vs paper_forward: mean_abs=0.8459784984588623, max_abs=5.5, mean_rel=0.23595526814460754, max_rel=2375.0, norm_rel=0.024135647341609, ref_abs_avg=35.144676208496094, test_abs_avg=35.14396667480469
production_forward grad[35] vs paper_forward: mean_abs=0.6615982055664062, max_abs=2.375, mean_rel=0.37980982661247253, max_rel=51.534141540527344, norm_rel=0.023553811013698578, ref_abs_avg=27.986764907836914, test_abs_avg=28.004833221435547
production_forward grad[36] vs paper_forward: mean_abs=0.8498971462249756, max_abs=6.0, mean_rel=0.1594824194908142, max_rel=1123.8465576171875, norm_rel=0.025184601545333862, ref_abs_avg=33.90088653564453, test_abs_avg=33.90178680419922
production_forward grad[37] vs paper_forward: mean_abs=0.7888083457946777, max_abs=4.75, mean_rel=0.24609221518039703, max_rel=2187.5, norm_rel=0.023477673530578613, ref_abs_avg=33.695587158203125, test_abs_avg=33.69791793823242
production_forward grad[38] vs paper_forward: mean_abs=0.5863163471221924, max_abs=2.3125, mean_rel=0.08304569125175476, max_rel=8.6951322555542, norm_rel=0.022477606311440468, ref_abs_avg=26.343456268310547, test_abs_avg=26.34347152709961
production_forward grad[39] vs paper_forward: mean_abs=0.806178629398346, max_abs=5.5, mean_rel=0.15374450385570526, max_rel=1201.5228271484375, norm_rel=0.024948030710220337, ref_abs_avg=32.45726013183594, test_abs_avg=32.45917510986328
production_forward grad[40] vs paper_forward: mean_abs=0.7529951333999634, max_abs=5.375, mean_rel=0.28928816318511963, max_rel=2125.0, norm_rel=0.023522593080997467, ref_abs_avg=32.128570556640625, test_abs_avg=32.126319885253906
production_forward grad[41] vs paper_forward: mean_abs=0.6016460657119751, max_abs=2.34375, mean_rel=0.22771137952804565, max_rel=54.736732482910156, norm_rel=0.022794701159000397, ref_abs_avg=26.492977142333984, test_abs_avg=26.48883628845215
production_forward grad[42] vs paper_forward: mean_abs=0.7638840079307556, max_abs=5.5, mean_rel=0.1544790416955948, max_rel=1802.3084716796875, norm_rel=0.02452789805829525, ref_abs_avg=31.25214385986328, test_abs_avg=31.25341796875
production_forward grad[43] vs paper_forward: mean_abs=0.7143856287002563, max_abs=5.0, mean_rel=0.2954310178756714, max_rel=2624.999755859375, norm_rel=0.023180818185210228, ref_abs_avg=30.87991714477539, test_abs_avg=30.880643844604492
production_forward grad[44] vs paper_forward: mean_abs=0.5802441239356995, max_abs=2.125, mean_rel=0.29719704389572144, max_rel=92.40967559814453, norm_rel=0.023423712700605392, ref_abs_avg=24.37251091003418, test_abs_avg=24.406246185302734
production_forward grad[45] vs paper_forward: mean_abs=0.7283544540405273, max_abs=5.5, mean_rel=0.16061054170131683, max_rel=1188.904541015625, norm_rel=0.024353351444005966, ref_abs_avg=29.99771499633789, test_abs_avg=29.996475219726562
production_forward grad[46] vs paper_forward: mean_abs=0.674971878528595, max_abs=4.125, mean_rel=0.24833473563194275, max_rel=2031.2498779296875, norm_rel=0.02273617498576641, ref_abs_avg=29.694520950317383, test_abs_avg=29.68876075744629
production_forward grad[47] vs paper_forward: mean_abs=0.5425186157226562, max_abs=2.125, mean_rel=0.17145156860351562, max_rel=27.698904037475586, norm_rel=0.0231917854398489, ref_abs_avg=23.264705657958984, test_abs_avg=23.256160736083984
production_forward grad[48] vs paper_forward: mean_abs=0.695467472076416, max_abs=5.5, mean_rel=0.15702888369560242, max_rel=859.5733642578125, norm_rel=0.024206943809986115, ref_abs_avg=28.799495697021484, test_abs_avg=28.79946517944336
production_forward grad[49] vs paper_forward: mean_abs=0.6466957330703735, max_abs=4.25, mean_rel=0.2500818073749542, max_rel=1749.9998779296875, norm_rel=0.022659139707684517, ref_abs_avg=28.646867752075195, test_abs_avg=28.649883270263672
production_forward grad[50] vs paper_forward: mean_abs=0.5894870758056641, max_abs=2.25, mean_rel=0.08141770213842392, max_rel=5.379535675048828, norm_rel=0.023875175043940544, ref_abs_avg=24.734539031982422, test_abs_avg=24.728586196899414
production_forward grad[51] vs paper_forward: mean_abs=0.7771075367927551, max_abs=5.0, mean_rel=0.1759299337863922, max_rel=1583.47119140625, norm_rel=0.026054387912154198, ref_abs_avg=29.921646118164062, test_abs_avg=29.922983169555664
production_forward grad[52] vs paper_forward: mean_abs=0.7206545472145081, max_abs=5.25, mean_rel=0.27400243282318115, max_rel=1999.9998779296875, norm_rel=0.02485431358218193, ref_abs_avg=29.1123046875, test_abs_avg=29.118885040283203
production_forward grad[53] vs paper_forward: mean_abs=0.5667381286621094, max_abs=2.46875, mean_rel=0.10802970081567764, max_rel=8.43765640258789, norm_rel=0.024828214198350906, ref_abs_avg=22.65166473388672, test_abs_avg=22.680072784423828
production_forward grad[54] vs paper_forward: mean_abs=0.7006261348724365, max_abs=4.75, mean_rel=0.1605292558670044, max_rel=1097.6595458984375, norm_rel=0.025688033550977707, ref_abs_avg=27.361656188964844, test_abs_avg=27.36100959777832
production_forward grad[55] vs paper_forward: mean_abs=0.6573499441146851, max_abs=5.0, mean_rel=0.2732386291027069, max_rel=1812.4998779296875, norm_rel=0.024362772703170776, ref_abs_avg=27.02971649169922, test_abs_avg=27.032737731933594
production_forward grad[56] vs paper_forward: mean_abs=0.509010910987854, max_abs=2.25, mean_rel=0.14147910475730896, max_rel=9.506891250610352, norm_rel=0.025424709543585777, ref_abs_avg=20.007184982299805, test_abs_avg=20.02527618408203
production_forward grad[57] vs paper_forward: mean_abs=0.6556500792503357, max_abs=5.0, mean_rel=0.16367128491401672, max_rel=1030.518798828125, norm_rel=0.025078408420085907, ref_abs_avg=26.196563720703125, test_abs_avg=26.195354461669922
production_forward grad[58] vs paper_forward: mean_abs=0.609128475189209, max_abs=5.0, mean_rel=0.28109103441238403, max_rel=1999.9998779296875, norm_rel=0.02362126111984253, ref_abs_avg=25.812030792236328, test_abs_avg=25.815603256225586
production_forward grad[59] vs paper_forward: mean_abs=0.5191726684570312, max_abs=2.375, mean_rel=0.07636978477239609, max_rel=5.833463191986084, norm_rel=0.024492155760526657, ref_abs_avg=21.51241683959961, test_abs_avg=21.49408721923828
production_forward grad[60] vs paper_forward: mean_abs=0.6183673739433289, max_abs=5.0, mean_rel=0.1526329219341278, max_rel=918.520263671875, norm_rel=0.024667473509907722, ref_abs_avg=25.125457763671875, test_abs_avg=25.125778198242188
production_forward grad[61] vs paper_forward: mean_abs=0.5681890249252319, max_abs=4.5, mean_rel=0.2276793122291565, max_rel=2375.0, norm_rel=0.02333579957485199, ref_abs_avg=24.41583251953125, test_abs_avg=24.414165496826172
production_forward grad[62] vs paper_forward: mean_abs=0.4352583885192871, max_abs=1.65625, mean_rel=0.12357918918132782, max_rel=18.652795791625977, norm_rel=0.02197817526757717, ref_abs_avg=20.36512565612793, test_abs_avg=20.37527084350586
production_forward grad[63] vs paper_forward: mean_abs=0.5808057188987732, max_abs=4.5, mean_rel=0.15750721096992493, max_rel=1191.98388671875, norm_rel=0.02420988865196705, ref_abs_avg=23.995847702026367, test_abs_avg=23.99648094177246
production_forward grad[64] vs paper_forward: mean_abs=0.5322829484939575, max_abs=4.375, mean_rel=0.24867551028728485, max_rel=1624.9998779296875, norm_rel=0.02286449633538723, ref_abs_avg=23.278202056884766, test_abs_avg=23.28339385986328
production_forward grad[65] vs paper_forward: mean_abs=0.4331246614456177, max_abs=1.5, mean_rel=0.06596534699201584, max_rel=1.420517921447754, norm_rel=0.02296900935471058, ref_abs_avg=18.863739013671875, test_abs_avg=18.886363983154297
production_forward grad[66] vs paper_forward: mean_abs=0.5462549924850464, max_abs=4.5, mean_rel=0.15120291709899902, max_rel=1364.773681640625, norm_rel=0.023798463866114616, ref_abs_avg=22.999900817871094, test_abs_avg=22.997943878173828
production_forward grad[67] vs paper_forward: mean_abs=0.5055403113365173, max_abs=4.0, mean_rel=0.2355487048625946, max_rel=1718.7498779296875, norm_rel=0.02225707843899727, ref_abs_avg=22.71340560913086, test_abs_avg=22.71158218383789
production_forward grad[68] vs paper_forward: mean_abs=0.4253370761871338, max_abs=1.375, mean_rel=0.12344608455896378, max_rel=11.457369804382324, norm_rel=0.022062314674258232, ref_abs_avg=18.962921142578125, test_abs_avg=18.946426391601562
production_forward grad[69] vs paper_forward: mean_abs=0.5233749151229858, max_abs=4.5, mean_rel=0.14450618624687195, max_rel=634.4127197265625, norm_rel=0.023581603541970253, ref_abs_avg=22.214208602905273, test_abs_avg=22.211528778076172
production_forward grad[70] vs paper_forward: mean_abs=0.48444032669067383, max_abs=4.0, mean_rel=0.21575108170509338, max_rel=1484.3748779296875, norm_rel=0.021854136139154434, ref_abs_avg=22.122600555419922, test_abs_avg=22.1165714263916
production_forward grad[71] vs paper_forward: mean_abs=0.40186119079589844, max_abs=1.5, mean_rel=0.1343262791633606, max_rel=29.076738357543945, norm_rel=0.022222749888896942, ref_abs_avg=18.3199462890625, test_abs_avg=18.2808837890625
production_forward grad[72] vs paper_forward: mean_abs=0.49846023321151733, max_abs=4.125, mean_rel=0.149729922413826, max_rel=727.5308227539062, norm_rel=0.02308318391442299, ref_abs_avg=21.618694305419922, test_abs_avg=21.617902755737305
production_forward grad[73] vs paper_forward: mean_abs=0.46139711141586304, max_abs=4.25, mean_rel=0.19750115275382996, max_rel=1062.5, norm_rel=0.021554121747612953, ref_abs_avg=21.377172470092773, test_abs_avg=21.382965087890625
production_forward grad[74] vs paper_forward: mean_abs=0.4478015899658203, max_abs=1.75, mean_rel=0.200907364487648, max_rel=44.2147331237793, norm_rel=0.02458152547478676, ref_abs_avg=18.011995315551758, test_abs_avg=18.022138595581055
production_forward grad[75] vs paper_forward: mean_abs=0.5519197583198547, max_abs=5.5, mean_rel=0.15899668633937836, max_rel=1211.07470703125, norm_rel=0.024857172742486, ref_abs_avg=22.28485107421875, test_abs_avg=22.28519058227539
production_forward grad[76] vs paper_forward: mean_abs=0.5067760348320007, max_abs=4.0, mean_rel=0.2511552572250366, max_rel=1531.2498779296875, norm_rel=0.023175852373242378, ref_abs_avg=21.94292640686035, test_abs_avg=21.950307846069336
production_forward grad[77] vs paper_forward: mean_abs=0.4030303955078125, max_abs=1.875, mean_rel=0.05974293500185013, max_rel=2.143944025039673, norm_rel=0.023172598332166672, ref_abs_avg=17.437042236328125, test_abs_avg=17.488483428955078
production_forward grad[78] vs paper_forward: mean_abs=0.5144606232643127, max_abs=4.5, mean_rel=0.15760943293571472, max_rel=730.6946411132812, norm_rel=0.024181509390473366, ref_abs_avg=21.324974060058594, test_abs_avg=21.3262996673584
production_forward grad[79] vs paper_forward: mean_abs=0.47265759110450745, max_abs=4.0, mean_rel=0.2033841907978058, max_rel=2093.75, norm_rel=0.02250535972416401, ref_abs_avg=21.033740997314453, test_abs_avg=21.034313201904297
production_forward grad[80] vs paper_forward: mean_abs=0.36406826972961426, max_abs=1.4375, mean_rel=0.11637569963932037, max_rel=10.13036060333252, norm_rel=0.023562293499708176, ref_abs_avg=15.63050651550293, test_abs_avg=15.617064476013184
production_forward grad[81] vs paper_forward: mean_abs=0.47740089893341064, max_abs=4.0, mean_rel=0.14998291432857513, max_rel=762.083740234375, norm_rel=0.023531729355454445, ref_abs_avg=20.319713592529297, test_abs_avg=20.31964683532715
production_forward grad[82] vs paper_forward: mean_abs=0.4390020966529846, max_abs=4.0, mean_rel=0.1998189389705658, max_rel=1499.9998779296875, norm_rel=0.02198079228401184, ref_abs_avg=19.98800277709961, test_abs_avg=19.991249084472656
production_forward grad[83] vs paper_forward: mean_abs=0.3503713607788086, max_abs=1.3125, mean_rel=0.12654893100261688, max_rel=16.740896224975586, norm_rel=0.022590946406126022, ref_abs_avg=15.301029205322266, test_abs_avg=15.283998489379883
production_forward grad[84] vs paper_forward: mean_abs=0.43715333938598633, max_abs=4.25, mean_rel=0.1406836360692978, max_rel=896.6620483398438, norm_rel=0.022857943549752235, ref_abs_avg=19.220355987548828, test_abs_avg=19.220149993896484
production_forward grad[85] vs paper_forward: mean_abs=0.4029553234577179, max_abs=3.5, mean_rel=0.19344811141490936, max_rel=1374.9998779296875, norm_rel=0.02118932269513607, ref_abs_avg=19.050106048583984, test_abs_avg=19.050966262817383
production_forward grad[86] vs paper_forward: mean_abs=0.3306032419204712, max_abs=1.25, mean_rel=0.14855694770812988, max_rel=14.779600143432617, norm_rel=0.021118858829140663, ref_abs_avg=15.540349006652832, test_abs_avg=15.54596996307373
production_forward grad[87] vs paper_forward: mean_abs=0.41547486186027527, max_abs=4.0, mean_rel=0.12933313846588135, max_rel=562.8778686523438, norm_rel=0.022281739860773087, ref_abs_avg=18.754602432250977, test_abs_avg=18.754650115966797
production_forward grad[88] vs paper_forward: mean_abs=0.3782564401626587, max_abs=4.25, mean_rel=0.19057884812355042, max_rel=1281.25, norm_rel=0.020145758986473083, ref_abs_avg=18.897640228271484, test_abs_avg=18.892440795898438
production_forward grad[89] vs paper_forward: mean_abs=0.3210945129394531, max_abs=1.25, mean_rel=0.1159241572022438, max_rel=8.838383674621582, norm_rel=0.022024191915988922, ref_abs_avg=14.594757080078125, test_abs_avg=14.596855163574219
production_forward grad[90] vs paper_forward: mean_abs=0.40379244089126587, max_abs=4.0, mean_rel=0.13144786655902863, max_rel=925.1610717773438, norm_rel=0.021786317229270935, ref_abs_avg=18.700483322143555, test_abs_avg=18.699844360351562
production_forward grad[91] vs paper_forward: mean_abs=0.3565223515033722, max_abs=4.5, mean_rel=0.16867822408676147, max_rel=937.4999389648438, norm_rel=0.020181797444820404, ref_abs_avg=17.779239654541016, test_abs_avg=17.776897430419922
production_forward grad[92] vs paper_forward: mean_abs=0.28823816776275635, max_abs=1.3125, mean_rel=0.07237069308757782, max_rel=1.5933923721313477, norm_rel=0.0191034022718668, ref_abs_avg=14.862128257751465, test_abs_avg=14.875140190124512
production_forward grad[93] vs paper_forward: mean_abs=0.36745911836624146, max_abs=5.5, mean_rel=0.12647368013858795, max_rel=742.4666748046875, norm_rel=0.021151652559638023, ref_abs_avg=17.61752700805664, test_abs_avg=17.61819839477539
production_forward grad[94] vs paper_forward: mean_abs=0.33280545473098755, max_abs=4.0, mean_rel=0.1497732400894165, max_rel=781.2499389648438, norm_rel=0.01880253665149212, ref_abs_avg=17.80240249633789, test_abs_avg=17.803367614746094
production_forward grad[95] vs paper_forward: mean_abs=0.26389074325561523, max_abs=1.25, mean_rel=0.0812288150191307, max_rel=6.621312618255615, norm_rel=0.018481090664863586, ref_abs_avg=14.256451606750488, test_abs_avg=14.245870590209961
production_forward grad[96] vs paper_forward: mean_abs=0.34643107652664185, max_abs=4.0, mean_rel=0.12324550747871399, max_rel=585.5897216796875, norm_rel=0.020732270553708076, ref_abs_avg=16.994476318359375, test_abs_avg=16.994918823242188
production_forward grad[97] vs paper_forward: mean_abs=0.31490394473075867, max_abs=3.5, mean_rel=0.1547372043132782, max_rel=1125.0, norm_rel=0.01876204088330269, ref_abs_avg=16.966373443603516, test_abs_avg=16.969791412353516
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0015932887326925993, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.00831769872456789, max_abs=0.34375, mean_rel=0.07380838692188263, max_rel=105.25938415527344, norm_rel=0.02016628161072731, ref_abs_avg=0.4435698091983795, test_abs_avg=0.44356590509414673
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.025076866149902, max_abs=48.0, mean_rel=0.16438081860542297, max_rel=425.8399963378906, norm_rel=0.020698802545666695, ref_abs_avg=302.96246337890625, test_abs_avg=302.9163818359375
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.1380176544189453, max_abs=5.5, mean_rel=0.10940945148468018, max_rel=22.72299575805664, norm_rel=0.023184508085250854, ref_abs_avg=50.880393981933594, test_abs_avg=50.91626739501953
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.551417350769043, max_abs=10.0, mean_rel=0.16056591272354126, max_rel=1263.1260986328125, norm_rel=0.02483144775032997, ref_abs_avg=62.89813995361328, test_abs_avg=62.901023864746094
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.4355791807174683, max_abs=9.0, mean_rel=0.4189189374446869, max_rel=4875.0, norm_rel=0.023197349160909653, ref_abs_avg=62.213104248046875, test_abs_avg=62.222652435302734
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.0002098083496094, max_abs=4.125, mean_rel=0.0712130218744278, max_rel=3.7523322105407715, norm_rel=0.021506059914827347, ref_abs_avg=47.7823600769043, test_abs_avg=47.855743408203125
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.3733665943145752, max_abs=9.0, mean_rel=0.1607314646244049, max_rel=1033.7054443359375, norm_rel=0.02453172206878662, ref_abs_avg=56.361183166503906, test_abs_avg=56.36466979980469
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.266129970550537, max_abs=8.0, mean_rel=0.350603312253952, max_rel=3562.499755859375, norm_rel=0.02286975458264351, ref_abs_avg=55.67506408691406, test_abs_avg=55.67848587036133
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.9814448356628418, max_abs=4.25, mean_rel=2.8662712574005127, max_rel=1416.4317626953125, norm_rel=0.023610416799783707, ref_abs_avg=41.70429229736328, test_abs_avg=41.703155517578125
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.250828742980957, max_abs=9.0, mean_rel=0.15926101803779602, max_rel=1434.2972412109375, norm_rel=0.02439476177096367, ref_abs_avg=51.61774826049805, test_abs_avg=51.616424560546875
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.1432301998138428, max_abs=7.0, mean_rel=0.3512888550758362, max_rel=3124.999755859375, norm_rel=0.022628789767622948, ref_abs_avg=50.882110595703125, test_abs_avg=50.87586212158203
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9073069095611572, max_abs=3.125, mean_rel=0.16438347101211548, max_rel=39.46826171875, norm_rel=0.023919086903333664, ref_abs_avg=38.153289794921875, test_abs_avg=38.158485412597656
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1527830362319946, max_abs=8.0, mean_rel=0.16155414283275604, max_rel=1564.341796875, norm_rel=0.02411901019513607, ref_abs_avg=48.06172180175781, test_abs_avg=48.061607360839844
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.065746784210205, max_abs=6.0, mean_rel=0.35237354040145874, max_rel=4375.0, norm_rel=0.022403886541724205, ref_abs_avg=47.745670318603516, test_abs_avg=47.749351501464844
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.808300256729126, max_abs=3.5, mean_rel=0.2998211681842804, max_rel=68.94792938232422, norm_rel=0.022394172847270966, ref_abs_avg=36.13105010986328, test_abs_avg=36.17314147949219
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.073876976966858, max_abs=7.0, mean_rel=0.17250488698482513, max_rel=1625.0252685546875, norm_rel=0.023982111364603043, ref_abs_avg=45.069244384765625, test_abs_avg=45.066261291503906
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.9960794448852539, max_abs=6.5, mean_rel=0.3312554955482483, max_rel=2937.499755859375, norm_rel=0.022583279758691788, ref_abs_avg=44.32792663574219, test_abs_avg=44.3289794921875
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.7869353294372559, max_abs=3.0, mean_rel=0.1319560408592224, max_rel=10.066258430480957, norm_rel=0.02384481206536293, ref_abs_avg=33.41709899902344, test_abs_avg=33.46241760253906
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.013784646987915, max_abs=6.125, mean_rel=0.162038654088974, max_rel=1115.799560546875, norm_rel=0.023834899067878723, ref_abs_avg=42.73699188232422, test_abs_avg=42.73826599121094
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9335375428199768, max_abs=6.75, mean_rel=0.2825208902359009, max_rel=2375.0, norm_rel=0.02209404669702053, ref_abs_avg=42.47849655151367, test_abs_avg=42.48003005981445
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7059211134910583, max_abs=2.8046875, mean_rel=0.7120339870452881, max_rel=327.8162536621094, norm_rel=0.020615898072719574, ref_abs_avg=34.94683837890625, test_abs_avg=34.926361083984375
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9609125256538391, max_abs=6.25, mean_rel=0.15323151648044586, max_rel=1955.23388671875, norm_rel=0.023609332740306854, ref_abs_avg=40.90215301513672, test_abs_avg=40.90052795410156
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.8784412145614624, max_abs=6.0, mean_rel=0.2767171263694763, max_rel=3499.999755859375, norm_rel=0.02181297540664673, ref_abs_avg=40.4848747253418, test_abs_avg=40.478843688964844
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7082648277282715, max_abs=2.9375, mean_rel=0.20371299982070923, max_rel=63.874717712402344, norm_rel=0.023536324501037598, ref_abs_avg=29.988601684570312, test_abs_avg=29.974111557006836
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9055487513542175, max_abs=6.0, mean_rel=0.1563502848148346, max_rel=1011.94873046875, norm_rel=0.023552458733320236, ref_abs_avg=38.689720153808594, test_abs_avg=38.690635681152344
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8310635089874268, max_abs=5.1875, mean_rel=0.2621793746948242, max_rel=2874.999755859375, norm_rel=0.02179219201207161, ref_abs_avg=38.26338195800781, test_abs_avg=38.26186752319336
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8031986355781555, max_abs=3.25, mean_rel=1.4589695930480957, max_rel=698.76953125, norm_rel=0.02272392436861992, ref_abs_avg=35.49198913574219, test_abs_avg=35.477577209472656
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0396078824996948, max_abs=7.0, mean_rel=0.16316016018390656, max_rel=927.2544555664062, norm_rel=0.025219202041625977, ref_abs_avg=41.434444427490234, test_abs_avg=41.43352508544922
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.968964159488678, max_abs=6.0, mean_rel=0.34980517625808716, max_rel=3124.999755859375, norm_rel=0.02372286096215248, ref_abs_avg=40.99825668334961, test_abs_avg=40.99718475341797
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7210394144058228, max_abs=3.046875, mean_rel=0.5370665192604065, max_rel=208.12916564941406, norm_rel=0.02286970242857933, ref_abs_avg=31.349300384521484, test_abs_avg=31.31534767150879
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9684524536132812, max_abs=7.0, mean_rel=0.16997568309307098, max_rel=1704.8565673828125, norm_rel=0.025470778346061707, ref_abs_avg=38.23497772216797, test_abs_avg=38.23110580444336
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9100433588027954, max_abs=6.0, mean_rel=0.280594140291214, max_rel=2562.5, norm_rel=0.024234062060713768, ref_abs_avg=37.73699188232422, test_abs_avg=37.734458923339844
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7005718350410461, max_abs=2.5, mean_rel=0.1981629729270935, max_rel=52.263038635253906, norm_rel=0.02371852472424507, ref_abs_avg=29.52902603149414, test_abs_avg=29.535837173461914
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9080345034599304, max_abs=8.0, mean_rel=0.15964782238006592, max_rel=702.6607666015625, norm_rel=0.02539646066725254, ref_abs_avg=35.92019271850586, test_abs_avg=35.91862106323242
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.845973014831543, max_abs=5.59375, mean_rel=0.22767063975334167, max_rel=2593.749755859375, norm_rel=0.02412918396294117, ref_abs_avg=35.144676208496094, test_abs_avg=35.1459846496582
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6527328491210938, max_abs=2.296875, mean_rel=0.3438340127468109, max_rel=57.54859924316406, norm_rel=0.023052772507071495, ref_abs_avg=27.986764907836914, test_abs_avg=28.000499725341797
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.851082980632782, max_abs=6.5, mean_rel=0.15783020853996277, max_rel=1046.8232421875, norm_rel=0.025229476392269135, ref_abs_avg=33.90088653564453, test_abs_avg=33.89937973022461
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.793590784072876, max_abs=5.0, mean_rel=0.25680649280548096, max_rel=1999.9998779296875, norm_rel=0.023595694452524185, ref_abs_avg=33.695587158203125, test_abs_avg=33.696128845214844
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.565180778503418, max_abs=2.5, mean_rel=0.12219428271055222, max_rel=25.692880630493164, norm_rel=0.021803176030516624, ref_abs_avg=26.343456268310547, test_abs_avg=26.332059860229492
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8076756596565247, max_abs=6.0, mean_rel=0.15457138419151306, max_rel=958.5037231445312, norm_rel=0.024976322427392006, ref_abs_avg=32.45726013183594, test_abs_avg=32.45948028564453
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7539597749710083, max_abs=5.75, mean_rel=0.2601245641708374, max_rel=2125.0, norm_rel=0.023561133071780205, ref_abs_avg=32.128570556640625, test_abs_avg=32.12549591064453
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6274391412734985, max_abs=2.6875, mean_rel=0.17979153990745544, max_rel=29.343461990356445, norm_rel=0.023691507056355476, ref_abs_avg=26.492977142333984, test_abs_avg=26.49407958984375
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7664014101028442, max_abs=5.5, mean_rel=0.15823602676391602, max_rel=1578.3924560546875, norm_rel=0.02459201216697693, ref_abs_avg=31.25214385986328, test_abs_avg=31.253555297851562
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7161441445350647, max_abs=5.0, mean_rel=0.30101263523101807, max_rel=2531.25, norm_rel=0.023225149139761925, ref_abs_avg=30.87991714477539, test_abs_avg=30.881296157836914
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5536217093467712, max_abs=2.375, mean_rel=0.23531880974769592, max_rel=56.845218658447266, norm_rel=0.022954916581511497, ref_abs_avg=24.37251091003418, test_abs_avg=24.41655731201172
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.730520486831665, max_abs=5.25, mean_rel=0.1607801616191864, max_rel=1328.775146484375, norm_rel=0.02442949078977108, ref_abs_avg=29.99771499633789, test_abs_avg=29.996185302734375
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.6759856939315796, max_abs=4.5, mean_rel=0.2520596385002136, max_rel=2437.5, norm_rel=0.022776726633310318, ref_abs_avg=29.694520950317383, test_abs_avg=29.687782287597656
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5258145332336426, max_abs=2.140625, mean_rel=0.1451626420021057, max_rel=25.272924423217773, norm_rel=0.02270854450762272, ref_abs_avg=23.264705657958984, test_abs_avg=23.253469467163086
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.696740984916687, max_abs=5.0, mean_rel=0.158881276845932, max_rel=957.7858276367188, norm_rel=0.02426263689994812, ref_abs_avg=28.799495697021484, test_abs_avg=28.798763275146484
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6494436860084534, max_abs=4.03125, mean_rel=0.25653910636901855, max_rel=1874.9998779296875, norm_rel=0.02274964191019535, ref_abs_avg=28.646867752075195, test_abs_avg=28.64717674255371
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.5826401114463806, max_abs=2.0, mean_rel=0.07821646332740784, max_rel=3.6259965896606445, norm_rel=0.023508861660957336, ref_abs_avg=24.734539031982422, test_abs_avg=24.729637145996094
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7758520841598511, max_abs=5.5, mean_rel=0.17630591988563538, max_rel=1645.3173828125, norm_rel=0.02600637450814247, ref_abs_avg=29.921646118164062, test_abs_avg=29.921676635742188
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7199006676673889, max_abs=5.25, mean_rel=0.272361695766449, max_rel=2046.8748779296875, norm_rel=0.024839304387569427, ref_abs_avg=29.1123046875, test_abs_avg=29.116619110107422
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5738674402236938, max_abs=1.7734375, mean_rel=0.10102207958698273, max_rel=5.94005823135376, norm_rel=0.024592667818069458, ref_abs_avg=22.65166473388672, test_abs_avg=22.67363739013672
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7014187574386597, max_abs=4.5, mean_rel=0.16265082359313965, max_rel=1409.9720458984375, norm_rel=0.025719184428453445, ref_abs_avg=27.361656188964844, test_abs_avg=27.361248016357422
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.658440113067627, max_abs=4.5, mean_rel=0.26360830664634705, max_rel=1874.9998779296875, norm_rel=0.02439943328499794, ref_abs_avg=27.02971649169922, test_abs_avg=27.029186248779297
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5014019012451172, max_abs=1.875, mean_rel=0.13392360508441925, max_rel=11.720702171325684, norm_rel=0.024627255275845528, ref_abs_avg=20.007184982299805, test_abs_avg=20.005823135375977
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.657233715057373, max_abs=5.0, mean_rel=0.1638135462999344, max_rel=1064.149658203125, norm_rel=0.02514726109802723, ref_abs_avg=26.196563720703125, test_abs_avg=26.195140838623047
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6109797954559326, max_abs=5.0, mean_rel=0.26557084918022156, max_rel=2468.75, norm_rel=0.023701829835772514, ref_abs_avg=25.812030792236328, test_abs_avg=25.816789627075195
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5271625518798828, max_abs=1.8125, mean_rel=0.0815344899892807, max_rel=8.330732345581055, norm_rel=0.024885937571525574, ref_abs_avg=21.51241683959961, test_abs_avg=21.49110984802246
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6203693151473999, max_abs=5.0, mean_rel=0.15577644109725952, max_rel=992.3642578125, norm_rel=0.024747036397457123, ref_abs_avg=25.125457763671875, test_abs_avg=25.124221801757812
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5693100690841675, max_abs=5.0, mean_rel=0.22890311479568481, max_rel=1749.9998779296875, norm_rel=0.023378990590572357, ref_abs_avg=24.41583251953125, test_abs_avg=24.415279388427734
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4427461624145508, max_abs=2.0, mean_rel=0.1434139609336853, max_rel=23.59184455871582, norm_rel=0.02240637131035328, ref_abs_avg=20.36512565612793, test_abs_avg=20.3714542388916
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5822298526763916, max_abs=5.0, mean_rel=0.15640999376773834, max_rel=763.755615234375, norm_rel=0.024277927353978157, ref_abs_avg=23.995847702026367, test_abs_avg=23.996788024902344
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5328131914138794, max_abs=4.125, mean_rel=0.23125958442687988, max_rel=1562.4998779296875, norm_rel=0.02287660725414753, ref_abs_avg=23.278202056884766, test_abs_avg=23.278667449951172
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4356861114501953, max_abs=1.5625, mean_rel=0.06829769164323807, max_rel=2.4823455810546875, norm_rel=0.02338753454387188, ref_abs_avg=18.863739013671875, test_abs_avg=18.88848876953125
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5484225749969482, max_abs=4.15625, mean_rel=0.15335991978645325, max_rel=1253.3724365234375, norm_rel=0.02388676442205906, ref_abs_avg=22.999900817871094, test_abs_avg=22.99846649169922
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5074203610420227, max_abs=4.25, mean_rel=0.2363419532775879, max_rel=1812.4998779296875, norm_rel=0.022356998175382614, ref_abs_avg=22.71340560913086, test_abs_avg=22.711387634277344
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4116063117980957, max_abs=1.5, mean_rel=0.12719663977622986, max_rel=14.613669395446777, norm_rel=0.021725039929151535, ref_abs_avg=18.962921142578125, test_abs_avg=18.94460678100586
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5255569815635681, max_abs=4.5, mean_rel=0.14672017097473145, max_rel=661.9743041992188, norm_rel=0.023667754605412483, ref_abs_avg=22.214208602905273, test_abs_avg=22.211254119873047
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.48808595538139343, max_abs=3.8125, mean_rel=0.22287724912166595, max_rel=1640.6248779296875, norm_rel=0.022016074508428574, ref_abs_avg=22.122600555419922, test_abs_avg=22.115371704101562
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.3938894271850586, max_abs=1.375, mean_rel=0.13646090030670166, max_rel=32.67386245727539, norm_rel=0.02188323251903057, ref_abs_avg=18.3199462890625, test_abs_avg=18.296112060546875
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5004763603210449, max_abs=4.0, mean_rel=0.15230783820152283, max_rel=759.1484375, norm_rel=0.023170197382569313, ref_abs_avg=21.618694305419922, test_abs_avg=21.616806030273438
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.46585455536842346, max_abs=4.125, mean_rel=0.19875100255012512, max_rel=1218.75, norm_rel=0.021768003702163696, ref_abs_avg=21.377172470092773, test_abs_avg=21.38384246826172
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.43032360076904297, max_abs=1.703125, mean_rel=0.17254987359046936, max_rel=31.184850692749023, norm_rel=0.023779449984431267, ref_abs_avg=18.011995315551758, test_abs_avg=18.03130340576172
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.550473690032959, max_abs=5.25, mean_rel=0.15801295638084412, max_rel=1066.9287109375, norm_rel=0.02477746643126011, ref_abs_avg=22.28485107421875, test_abs_avg=22.284099578857422
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5062907934188843, max_abs=4.5, mean_rel=0.25253826379776, max_rel=1468.7498779296875, norm_rel=0.023169413208961487, ref_abs_avg=21.94292640686035, test_abs_avg=21.94974136352539
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.3821132183074951, max_abs=1.875, mean_rel=0.06218692660331726, max_rel=2.2764997482299805, norm_rel=0.022086339071393013, ref_abs_avg=17.437042236328125, test_abs_avg=17.472095489501953
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5145856142044067, max_abs=4.0, mean_rel=0.15591168403625488, max_rel=755.6962280273438, norm_rel=0.024201788008213043, ref_abs_avg=21.324974060058594, test_abs_avg=21.32583999633789
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.4719638526439667, max_abs=4.0, mean_rel=0.20459222793579102, max_rel=1843.7498779296875, norm_rel=0.022478116676211357, ref_abs_avg=21.033740997314453, test_abs_avg=21.034278869628906
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.36648133397102356, max_abs=1.5, mean_rel=0.12153427302837372, max_rel=10.465094566345215, norm_rel=0.023799514397978783, ref_abs_avg=15.63050651550293, test_abs_avg=15.646275520324707
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.4782480299472809, max_abs=4.435676574707031, mean_rel=0.15057089924812317, max_rel=743.1292724609375, norm_rel=0.023550083860754967, ref_abs_avg=20.319713592529297, test_abs_avg=20.320159912109375
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4391661286354065, max_abs=3.75, mean_rel=0.19695162773132324, max_rel=1843.7498779296875, norm_rel=0.022013137117028236, ref_abs_avg=19.98800277709961, test_abs_avg=19.991836547851562
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3605306148529053, max_abs=1.4375, mean_rel=0.17912207543849945, max_rel=50.10548400878906, norm_rel=0.02333734557032585, ref_abs_avg=15.301029205322266, test_abs_avg=15.289692878723145
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.438689649105072, max_abs=4.5, mean_rel=0.1416909396648407, max_rel=901.9384155273438, norm_rel=0.022925643250346184, ref_abs_avg=19.220355987548828, test_abs_avg=19.220178604125977
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.40326887369155884, max_abs=3.9375, mean_rel=0.18365181982517242, max_rel=1437.4998779296875, norm_rel=0.02122960053384304, ref_abs_avg=19.050106048583984, test_abs_avg=19.049461364746094
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.33699584007263184, max_abs=1.2568359375, mean_rel=0.19296593964099884, max_rel=34.85035705566406, norm_rel=0.021488357335329056, ref_abs_avg=15.540349006652832, test_abs_avg=15.552149772644043
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.41665104031562805, max_abs=4.25, mean_rel=0.12782901525497437, max_rel=510.3043212890625, norm_rel=0.022328604012727737, ref_abs_avg=18.754602432250977, test_abs_avg=18.75503921508789
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.3798263967037201, max_abs=4.5, mean_rel=0.18606987595558167, max_rel=1062.5, norm_rel=0.020225608721375465, ref_abs_avg=18.897640228271484, test_abs_avg=18.88855743408203
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.33002495765686035, max_abs=1.25, mean_rel=0.1367063820362091, max_rel=20.268474578857422, norm_rel=0.022393686696887016, ref_abs_avg=14.594757080078125, test_abs_avg=14.60243034362793
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.4044516384601593, max_abs=4.0, mean_rel=0.13313479721546173, max_rel=709.7243041992188, norm_rel=0.021818222478032112, ref_abs_avg=18.700483322143555, test_abs_avg=18.699731826782227
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3589654564857483, max_abs=3.75, mean_rel=0.170536607503891, max_rel=1312.4998779296875, norm_rel=0.02033485658466816, ref_abs_avg=17.779239654541016, test_abs_avg=17.775760650634766
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.27712559700012207, max_abs=1.1875, mean_rel=0.07602567970752716, max_rel=2.362720012664795, norm_rel=0.018237100914120674, ref_abs_avg=14.862128257751465, test_abs_avg=14.876054763793945
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3685048222541809, max_abs=5.5, mean_rel=0.12719596922397614, max_rel=699.9559326171875, norm_rel=0.021209876984357834, ref_abs_avg=17.61752700805664, test_abs_avg=17.618324279785156
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3329051733016968, max_abs=4.0, mean_rel=0.1532495766878128, max_rel=890.6249389648438, norm_rel=0.018817828968167305, ref_abs_avg=17.80240249633789, test_abs_avg=17.80097770690918
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2686910629272461, max_abs=1.234375, mean_rel=0.07678426802158356, max_rel=4.985458850860596, norm_rel=0.0187954381108284, ref_abs_avg=14.256451606750488, test_abs_avg=14.242576599121094
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.34747961163520813, max_abs=4.5, mean_rel=0.12006505578756332, max_rel=412.48382568359375, norm_rel=0.02078990451991558, ref_abs_avg=16.994476318359375, test_abs_avg=16.994895935058594
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.313196063041687, max_abs=3.625, mean_rel=0.15214741230010986, max_rel=832.0311889648438, norm_rel=0.018604684621095657, ref_abs_avg=16.966373443603516, test_abs_avg=16.971149444580078
production_forward2 vs paper_forward output: mean_abs=0.0015884855529293418, max_abs=0.0390625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008303320035338402, max_abs=0.34375, mean_rel=0.07367774844169617, max_rel=105.25938415527344, norm_rel=0.020132210105657578, ref_abs_avg=0.4435698091983795, test_abs_avg=0.4435667395591736
production_forward2 grad[1] vs paper_forward: mean_abs=7.027438163757324, max_abs=53.0, mean_rel=0.16097348928451538, max_rel=495.207275390625, norm_rel=0.020806878805160522, ref_abs_avg=302.96246337890625, test_abs_avg=302.8829650878906
production_forward2 grad[2] vs paper_forward: mean_abs=1.1223726272583008, max_abs=5.5, mean_rel=0.1162869781255722, max_rel=24.792999267578125, norm_rel=0.022932475432753563, ref_abs_avg=50.880393981933594, test_abs_avg=50.87944412231445
production_forward2 grad[3] vs paper_forward: mean_abs=1.5497702360153198, max_abs=12.0, mean_rel=0.15791870653629303, max_rel=1257.876708984375, norm_rel=0.02482086978852749, ref_abs_avg=62.89813995361328, test_abs_avg=62.897186279296875
production_forward2 grad[4] vs paper_forward: mean_abs=1.431244134902954, max_abs=9.3125, mean_rel=0.36919254064559937, max_rel=3999.999755859375, norm_rel=0.023126743733882904, ref_abs_avg=62.213104248046875, test_abs_avg=62.222198486328125
production_forward2 grad[5] vs paper_forward: mean_abs=1.03311288356781, max_abs=4.5, mean_rel=0.06910820305347443, max_rel=2.976425886154175, norm_rel=0.02208760939538479, ref_abs_avg=47.7823600769043, test_abs_avg=47.854339599609375
production_forward2 grad[6] vs paper_forward: mean_abs=1.371793270111084, max_abs=9.0, mean_rel=0.15997371077537537, max_rel=921.376708984375, norm_rel=0.024499325081706047, ref_abs_avg=56.361183166503906, test_abs_avg=56.364532470703125
production_forward2 grad[7] vs paper_forward: mean_abs=1.2642065286636353, max_abs=8.5, mean_rel=0.35213011503219604, max_rel=3874.999755859375, norm_rel=0.02284161001443863, ref_abs_avg=55.67506408691406, test_abs_avg=55.676055908203125
production_forward2 grad[8] vs paper_forward: mean_abs=0.9635763168334961, max_abs=3.5, mean_rel=2.1595053672790527, max_rel=1042.1363525390625, norm_rel=0.02292337827384472, ref_abs_avg=41.70429229736328, test_abs_avg=41.71305847167969
production_forward2 grad[9] vs paper_forward: mean_abs=1.247664451599121, max_abs=8.0, mean_rel=0.1623937338590622, max_rel=2345.701416015625, norm_rel=0.024332676082849503, ref_abs_avg=51.61774826049805, test_abs_avg=51.616065979003906
production_forward2 grad[10] vs paper_forward: mean_abs=1.1402804851531982, max_abs=7.0, mean_rel=0.37023574113845825, max_rel=3812.499755859375, norm_rel=0.02253415435552597, ref_abs_avg=50.882110595703125, test_abs_avg=50.876617431640625
production_forward2 grad[11] vs paper_forward: mean_abs=0.9276297092437744, max_abs=3.625, mean_rel=0.18254214525222778, max_rel=37.461021423339844, norm_rel=0.02448125369846821, ref_abs_avg=38.153289794921875, test_abs_avg=38.143096923828125
production_forward2 grad[12] vs paper_forward: mean_abs=1.151895523071289, max_abs=9.0, mean_rel=0.15920040011405945, max_rel=1130.92919921875, norm_rel=0.02410726062953472, ref_abs_avg=48.06172180175781, test_abs_avg=48.06108856201172
production_forward2 grad[13] vs paper_forward: mean_abs=1.0653164386749268, max_abs=6.0625, mean_rel=0.36726248264312744, max_rel=3624.999755859375, norm_rel=0.022402092814445496, ref_abs_avg=47.745670318603516, test_abs_avg=47.750755310058594
production_forward2 grad[14] vs paper_forward: mean_abs=0.8042986392974854, max_abs=3.25, mean_rel=0.30003029108047485, max_rel=65.2695541381836, norm_rel=0.02262873202562332, ref_abs_avg=36.13105010986328, test_abs_avg=36.20017623901367
production_forward2 grad[15] vs paper_forward: mean_abs=1.0725352764129639, max_abs=7.5, mean_rel=0.17008677124977112, max_rel=1724.880859375, norm_rel=0.023939289152622223, ref_abs_avg=45.069244384765625, test_abs_avg=45.06598663330078
production_forward2 grad[16] vs paper_forward: mean_abs=0.9930413961410522, max_abs=6.25, mean_rel=0.3097469210624695, max_rel=2859.374755859375, norm_rel=0.02250639535486698, ref_abs_avg=44.32792663574219, test_abs_avg=44.32646179199219
production_forward2 grad[17] vs paper_forward: mean_abs=0.7654190063476562, max_abs=3.0, mean_rel=0.12103679776191711, max_rel=9.042130470275879, norm_rel=0.02277655526995659, ref_abs_avg=33.41709899902344, test_abs_avg=33.46021270751953
production_forward2 grad[18] vs paper_forward: mean_abs=1.0124306678771973, max_abs=7.0, mean_rel=0.16095378994941711, max_rel=1237.6781005859375, norm_rel=0.023813309147953987, ref_abs_avg=42.73699188232422, test_abs_avg=42.73841094970703
production_forward2 grad[19] vs paper_forward: mean_abs=0.9314874410629272, max_abs=6.75, mean_rel=0.27789703011512756, max_rel=2687.499755859375, norm_rel=0.022033054381608963, ref_abs_avg=42.47849655151367, test_abs_avg=42.48313903808594
production_forward2 grad[20] vs paper_forward: mean_abs=0.7139970660209656, max_abs=2.875, mean_rel=0.5406382083892822, max_rel=237.8085174560547, norm_rel=0.02065322734415531, ref_abs_avg=34.94683837890625, test_abs_avg=34.92076110839844
production_forward2 grad[21] vs paper_forward: mean_abs=0.9606443643569946, max_abs=7.0, mean_rel=0.1538936197757721, max_rel=658.17529296875, norm_rel=0.023597799241542816, ref_abs_avg=40.90215301513672, test_abs_avg=40.900611877441406
production_forward2 grad[22] vs paper_forward: mean_abs=0.8758739233016968, max_abs=5.5, mean_rel=0.2898714542388916, max_rel=4718.75, norm_rel=0.021747760474681854, ref_abs_avg=40.4848747253418, test_abs_avg=40.47773742675781
production_forward2 grad[23] vs paper_forward: mean_abs=0.7172198295593262, max_abs=2.96875, mean_rel=0.1527971476316452, max_rel=36.459991455078125, norm_rel=0.023392437025904655, ref_abs_avg=29.988601684570312, test_abs_avg=29.979290008544922
production_forward2 grad[24] vs paper_forward: mean_abs=0.9050284624099731, max_abs=7.0, mean_rel=0.15653973817825317, max_rel=929.78857421875, norm_rel=0.023542538285255432, ref_abs_avg=38.689720153808594, test_abs_avg=38.68968963623047
production_forward2 grad[25] vs paper_forward: mean_abs=0.8315428495407104, max_abs=5.5, mean_rel=0.2441350221633911, max_rel=1687.4998779296875, norm_rel=0.02180037833750248, ref_abs_avg=38.26338195800781, test_abs_avg=38.26284408569336
production_forward2 grad[26] vs paper_forward: mean_abs=0.7895815968513489, max_abs=3.0, mean_rel=0.5882266163825989, max_rel=249.12644958496094, norm_rel=0.0222734734416008, ref_abs_avg=35.49198913574219, test_abs_avg=35.49956130981445
production_forward2 grad[27] vs paper_forward: mean_abs=1.0378377437591553, max_abs=8.0, mean_rel=0.16936340928077698, max_rel=1006.3295288085938, norm_rel=0.025188306346535683, ref_abs_avg=41.434444427490234, test_abs_avg=41.432884216308594
production_forward2 grad[28] vs paper_forward: mean_abs=0.9670404195785522, max_abs=6.0, mean_rel=0.35493576526641846, max_rel=3124.999755859375, norm_rel=0.023680731654167175, ref_abs_avg=40.99825668334961, test_abs_avg=40.99277877807617
production_forward2 grad[29] vs paper_forward: mean_abs=0.7265609502792358, max_abs=3.0, mean_rel=0.24961966276168823, max_rel=56.270111083984375, norm_rel=0.02307642623782158, ref_abs_avg=31.349300384521484, test_abs_avg=31.34013557434082
production_forward2 grad[30] vs paper_forward: mean_abs=0.9673053026199341, max_abs=7.0, mean_rel=0.17612047493457794, max_rel=2023.214111328125, norm_rel=0.0254384595900774, ref_abs_avg=38.23497772216797, test_abs_avg=38.23040771484375
production_forward2 grad[31] vs paper_forward: mean_abs=0.9085770845413208, max_abs=6.0, mean_rel=0.2805432379245758, max_rel=2593.749755859375, norm_rel=0.0241964440792799, ref_abs_avg=37.73699188232422, test_abs_avg=37.72787857055664
production_forward2 grad[32] vs paper_forward: mean_abs=0.6746860146522522, max_abs=2.41015625, mean_rel=0.09589934349060059, max_rel=5.029376983642578, norm_rel=0.022910594940185547, ref_abs_avg=29.52902603149414, test_abs_avg=29.526334762573242
production_forward2 grad[33] vs paper_forward: mean_abs=0.9069975018501282, max_abs=7.0, mean_rel=0.15946274995803833, max_rel=972.5081176757812, norm_rel=0.025373049080371857, ref_abs_avg=35.92019271850586, test_abs_avg=35.91836929321289
production_forward2 grad[34] vs paper_forward: mean_abs=0.844631552696228, max_abs=5.5, mean_rel=0.23290589451789856, max_rel=3031.249755859375, norm_rel=0.024090507999062538, ref_abs_avg=35.144676208496094, test_abs_avg=35.14326858520508
production_forward2 grad[35] vs paper_forward: mean_abs=0.6694431304931641, max_abs=2.5, mean_rel=0.39746731519699097, max_rel=78.67456817626953, norm_rel=0.023370057344436646, ref_abs_avg=27.986764907836914, test_abs_avg=27.98966407775879
production_forward2 grad[36] vs paper_forward: mean_abs=0.8513449430465698, max_abs=6.5, mean_rel=0.1591421216726303, max_rel=1117.427978515625, norm_rel=0.025214755907654762, ref_abs_avg=33.90088653564453, test_abs_avg=33.90078353881836
production_forward2 grad[37] vs paper_forward: mean_abs=0.7895064949989319, max_abs=4.6875, mean_rel=0.24161292612552643, max_rel=2624.999755859375, norm_rel=0.02350260689854622, ref_abs_avg=33.695587158203125, test_abs_avg=33.69758987426758
production_forward2 grad[38] vs paper_forward: mean_abs=0.5911033153533936, max_abs=2.75, mean_rel=0.1384139358997345, max_rel=35.113319396972656, norm_rel=0.022581102326512337, ref_abs_avg=26.343456268310547, test_abs_avg=26.34825325012207
production_forward2 grad[39] vs paper_forward: mean_abs=0.8080371618270874, max_abs=6.0, mean_rel=0.15387660264968872, max_rel=1196.30029296875, norm_rel=0.02499968744814396, ref_abs_avg=32.45726013183594, test_abs_avg=32.4584846496582
production_forward2 grad[40] vs paper_forward: mean_abs=0.7525404095649719, max_abs=5.375, mean_rel=0.27114275097846985, max_rel=2515.625, norm_rel=0.02353072538971901, ref_abs_avg=32.128570556640625, test_abs_avg=32.126564025878906
production_forward2 grad[41] vs paper_forward: mean_abs=0.6194626092910767, max_abs=2.5, mean_rel=0.28826069831848145, max_rel=89.09233093261719, norm_rel=0.02314312756061554, ref_abs_avg=26.492977142333984, test_abs_avg=26.508045196533203
production_forward2 grad[42] vs paper_forward: mean_abs=0.765314519405365, max_abs=5.5, mean_rel=0.15464632213115692, max_rel=1610.38037109375, norm_rel=0.02457168884575367, ref_abs_avg=31.25214385986328, test_abs_avg=31.25325584411621
production_forward2 grad[43] vs paper_forward: mean_abs=0.7160757780075073, max_abs=5.0, mean_rel=0.2872154414653778, max_rel=2406.25, norm_rel=0.02323206141591072, ref_abs_avg=30.87991714477539, test_abs_avg=30.881378173828125
production_forward2 grad[44] vs paper_forward: mean_abs=0.5666918158531189, max_abs=2.5, mean_rel=0.34753865003585815, max_rel=107.93782043457031, norm_rel=0.023274976760149002, ref_abs_avg=24.37251091003418, test_abs_avg=24.409635543823242
production_forward2 grad[45] vs paper_forward: mean_abs=0.7281359434127808, max_abs=5.5, mean_rel=0.15912938117980957, max_rel=1134.2391357421875, norm_rel=0.024366483092308044, ref_abs_avg=29.99771499633789, test_abs_avg=29.995803833007812
production_forward2 grad[46] vs paper_forward: mean_abs=0.6763403415679932, max_abs=4.125, mean_rel=0.24700400233268738, max_rel=2250.0, norm_rel=0.02279253676533699, ref_abs_avg=29.694520950317383, test_abs_avg=29.689491271972656
production_forward2 grad[47] vs paper_forward: mean_abs=0.5415370464324951, max_abs=2.25, mean_rel=0.16318845748901367, max_rel=24.942108154296875, norm_rel=0.023241886869072914, ref_abs_avg=23.264705657958984, test_abs_avg=23.262643814086914
production_forward2 grad[48] vs paper_forward: mean_abs=0.6961624622344971, max_abs=5.0, mean_rel=0.1553620994091034, max_rel=764.4299926757812, norm_rel=0.024243254214525223, ref_abs_avg=28.799495697021484, test_abs_avg=28.798784255981445
production_forward2 grad[49] vs paper_forward: mean_abs=0.6486812233924866, max_abs=4.21875, mean_rel=0.25436586141586304, max_rel=1906.2498779296875, norm_rel=0.022731106728315353, ref_abs_avg=28.646867752075195, test_abs_avg=28.64893913269043
production_forward2 grad[50] vs paper_forward: mean_abs=0.5778083801269531, max_abs=2.0, mean_rel=0.07676694542169571, max_rel=3.7090346813201904, norm_rel=0.023473674431443214, ref_abs_avg=24.734539031982422, test_abs_avg=24.706031799316406
production_forward2 grad[51] vs paper_forward: mean_abs=0.7745365500450134, max_abs=5.0, mean_rel=0.1715383231639862, max_rel=1226.5648193359375, norm_rel=0.02596300281584263, ref_abs_avg=29.921646118164062, test_abs_avg=29.922691345214844
production_forward2 grad[52] vs paper_forward: mean_abs=0.7185848951339722, max_abs=5.125, mean_rel=0.2733921408653259, max_rel=2046.8748779296875, norm_rel=0.024795694276690483, ref_abs_avg=29.1123046875, test_abs_avg=29.117259979248047
production_forward2 grad[53] vs paper_forward: mean_abs=0.5859878063201904, max_abs=2.25, mean_rel=0.1099407970905304, max_rel=7.8194427490234375, norm_rel=0.02502887323498726, ref_abs_avg=22.65166473388672, test_abs_avg=22.66388702392578
production_forward2 grad[54] vs paper_forward: mean_abs=0.7011326551437378, max_abs=4.5, mean_rel=0.16258859634399414, max_rel=1133.838623046875, norm_rel=0.02570854127407074, ref_abs_avg=27.361656188964844, test_abs_avg=27.361724853515625
production_forward2 grad[55] vs paper_forward: mean_abs=0.6567636132240295, max_abs=4.5, mean_rel=0.2668258547782898, max_rel=1937.4998779296875, norm_rel=0.02433507703244686, ref_abs_avg=27.02971649169922, test_abs_avg=27.03084945678711
production_forward2 grad[56] vs paper_forward: mean_abs=0.5086460113525391, max_abs=1.875, mean_rel=0.13363297283649445, max_rel=11.506463050842285, norm_rel=0.02507873997092247, ref_abs_avg=20.007184982299805, test_abs_avg=20.013708114624023
production_forward2 grad[57] vs paper_forward: mean_abs=0.6556673049926758, max_abs=5.0, mean_rel=0.16316208243370056, max_rel=1053.23974609375, norm_rel=0.025091638788580894, ref_abs_avg=26.196563720703125, test_abs_avg=26.195117950439453
production_forward2 grad[58] vs paper_forward: mean_abs=0.6103833913803101, max_abs=4.125, mean_rel=0.27299025654792786, max_rel=2250.0, norm_rel=0.023667292669415474, ref_abs_avg=25.812030792236328, test_abs_avg=25.815025329589844
production_forward2 grad[59] vs paper_forward: mean_abs=0.5183870792388916, max_abs=2.0, mean_rel=0.07908590883016586, max_rel=7.160137176513672, norm_rel=0.024566324427723885, ref_abs_avg=21.51241683959961, test_abs_avg=21.487064361572266
production_forward2 grad[60] vs paper_forward: mean_abs=0.6194438934326172, max_abs=4.5, mean_rel=0.15459994971752167, max_rel=1111.9990234375, norm_rel=0.024688368663191795, ref_abs_avg=25.125457763671875, test_abs_avg=25.12483787536621
production_forward2 grad[61] vs paper_forward: mean_abs=0.5688862800598145, max_abs=5.0, mean_rel=0.22596323490142822, max_rel=2437.5, norm_rel=0.02334989234805107, ref_abs_avg=24.41583251953125, test_abs_avg=24.413087844848633
production_forward2 grad[62] vs paper_forward: mean_abs=0.42881035804748535, max_abs=1.875, mean_rel=0.09334686398506165, max_rel=6.66737699508667, norm_rel=0.02171807922422886, ref_abs_avg=20.36512565612793, test_abs_avg=20.388185501098633
production_forward2 grad[63] vs paper_forward: mean_abs=0.5809763073921204, max_abs=4.5, mean_rel=0.15776285529136658, max_rel=1069.85205078125, norm_rel=0.02421659417450428, ref_abs_avg=23.995847702026367, test_abs_avg=23.996917724609375
production_forward2 grad[64] vs paper_forward: mean_abs=0.53386390209198, max_abs=4.125, mean_rel=0.2425938993692398, max_rel=1562.4998779296875, norm_rel=0.02291676588356495, ref_abs_avg=23.278202056884766, test_abs_avg=23.282657623291016
production_forward2 grad[65] vs paper_forward: mean_abs=0.4454660415649414, max_abs=1.625, mean_rel=0.07139740884304047, max_rel=2.1312081813812256, norm_rel=0.02348480373620987, ref_abs_avg=18.863739013671875, test_abs_avg=18.885814666748047
production_forward2 grad[66] vs paper_forward: mean_abs=0.5469496250152588, max_abs=4.5, mean_rel=0.15066303312778473, max_rel=1267.297607421875, norm_rel=0.02384260669350624, ref_abs_avg=22.999900817871094, test_abs_avg=22.99843978881836
production_forward2 grad[67] vs paper_forward: mean_abs=0.5060768127441406, max_abs=3.75, mean_rel=0.24086812138557434, max_rel=1624.9998779296875, norm_rel=0.022286178544163704, ref_abs_avg=22.71340560913086, test_abs_avg=22.711162567138672
production_forward2 grad[68] vs paper_forward: mean_abs=0.41753435134887695, max_abs=1.375, mean_rel=0.11160451173782349, max_rel=8.823698043823242, norm_rel=0.02160690911114216, ref_abs_avg=18.962921142578125, test_abs_avg=18.94127655029297
production_forward2 grad[69] vs paper_forward: mean_abs=0.5247746706008911, max_abs=4.5, mean_rel=0.1457137018442154, max_rel=819.0906372070312, norm_rel=0.02362917549908161, ref_abs_avg=22.214208602905273, test_abs_avg=22.211881637573242
production_forward2 grad[70] vs paper_forward: mean_abs=0.4850007891654968, max_abs=4.0, mean_rel=0.21299421787261963, max_rel=1656.2498779296875, norm_rel=0.02187766134738922, ref_abs_avg=22.122600555419922, test_abs_avg=22.116390228271484
production_forward2 grad[71] vs paper_forward: mean_abs=0.3955373764038086, max_abs=1.375, mean_rel=0.145904079079628, max_rel=34.40580749511719, norm_rel=0.021799668669700623, ref_abs_avg=18.3199462890625, test_abs_avg=18.279138565063477
production_forward2 grad[72] vs paper_forward: mean_abs=0.4988921284675598, max_abs=4.25, mean_rel=0.1504337042570114, max_rel=859.2709350585938, norm_rel=0.023094847798347473, ref_abs_avg=21.618694305419922, test_abs_avg=21.61741065979004
production_forward2 grad[73] vs paper_forward: mean_abs=0.46245646476745605, max_abs=4.0, mean_rel=0.1970694214105606, max_rel=968.7499389648438, norm_rel=0.02160746231675148, ref_abs_avg=21.377172470092773, test_abs_avg=21.383068084716797
production_forward2 grad[74] vs paper_forward: mean_abs=0.4393274784088135, max_abs=1.75, mean_rel=0.18274828791618347, max_rel=33.269630432128906, norm_rel=0.024207672104239464, ref_abs_avg=18.011995315551758, test_abs_avg=18.028127670288086
production_forward2 grad[75] vs paper_forward: mean_abs=0.549232006072998, max_abs=5.25, mean_rel=0.1578870415687561, max_rel=1166.946044921875, norm_rel=0.024723727256059647, ref_abs_avg=22.28485107421875, test_abs_avg=22.285306930541992
production_forward2 grad[76] vs paper_forward: mean_abs=0.5048183798789978, max_abs=4.0, mean_rel=0.24907785654067993, max_rel=1499.9998779296875, norm_rel=0.023081017658114433, ref_abs_avg=21.94292640686035, test_abs_avg=21.94934844970703
production_forward2 grad[77] vs paper_forward: mean_abs=0.3916587829589844, max_abs=1.75, mean_rel=0.05821503326296806, max_rel=1.9018858671188354, norm_rel=0.02261548861861229, ref_abs_avg=17.437042236328125, test_abs_avg=17.476696014404297
production_forward2 grad[78] vs paper_forward: mean_abs=0.5135632157325745, max_abs=4.0, mean_rel=0.15630723536014557, max_rel=772.9962768554688, norm_rel=0.02414582297205925, ref_abs_avg=21.324974060058594, test_abs_avg=21.326610565185547
production_forward2 grad[79] vs paper_forward: mean_abs=0.47126442193984985, max_abs=4.25, mean_rel=0.2071617990732193, max_rel=2218.75, norm_rel=0.02243385650217533, ref_abs_avg=21.033740997314453, test_abs_avg=21.034311294555664
production_forward2 grad[80] vs paper_forward: mean_abs=0.3591771125793457, max_abs=1.375, mean_rel=0.10917805135250092, max_rel=6.597054481506348, norm_rel=0.02329058013856411, ref_abs_avg=15.63050651550293, test_abs_avg=15.62059211730957
production_forward2 grad[81] vs paper_forward: mean_abs=0.47721540927886963, max_abs=4.3125, mean_rel=0.14866787195205688, max_rel=799.4480590820312, norm_rel=0.023509392514824867, ref_abs_avg=20.319713592529297, test_abs_avg=20.320289611816406
production_forward2 grad[82] vs paper_forward: mean_abs=0.4380452036857605, max_abs=4.0, mean_rel=0.19449052214622498, max_rel=1624.9998779296875, norm_rel=0.021939143538475037, ref_abs_avg=19.98800277709961, test_abs_avg=19.990558624267578
production_forward2 grad[83] vs paper_forward: mean_abs=0.3578987121582031, max_abs=1.28125, mean_rel=0.1659785807132721, max_rel=41.666664123535156, norm_rel=0.023008689284324646, ref_abs_avg=15.301029205322266, test_abs_avg=15.288540840148926
production_forward2 grad[84] vs paper_forward: mean_abs=0.437211811542511, max_abs=4.75, mean_rel=0.14139601588249207, max_rel=770.0286254882812, norm_rel=0.022850533947348595, ref_abs_avg=19.220355987548828, test_abs_avg=19.220300674438477
production_forward2 grad[85] vs paper_forward: mean_abs=0.4029075801372528, max_abs=4.0, mean_rel=0.19431015849113464, max_rel=1374.9998779296875, norm_rel=0.02118401974439621, ref_abs_avg=19.050106048583984, test_abs_avg=19.051284790039062
production_forward2 grad[86] vs paper_forward: mean_abs=0.3284728527069092, max_abs=1.25, mean_rel=0.15154440701007843, max_rel=13.027325630187988, norm_rel=0.020897693932056427, ref_abs_avg=15.540349006652832, test_abs_avg=15.543914794921875
production_forward2 grad[87] vs paper_forward: mean_abs=0.41592076420783997, max_abs=4.0, mean_rel=0.12901869416236877, max_rel=475.080810546875, norm_rel=0.022301888093352318, ref_abs_avg=18.754602432250977, test_abs_avg=18.754745483398438
production_forward2 grad[88] vs paper_forward: mean_abs=0.3778098523616791, max_abs=4.0, mean_rel=0.1872662901878357, max_rel=1281.25, norm_rel=0.020130690187215805, ref_abs_avg=18.897640228271484, test_abs_avg=18.892475128173828
production_forward2 grad[89] vs paper_forward: mean_abs=0.3253900408744812, max_abs=1.25, mean_rel=0.16593289375305176, max_rel=35.287078857421875, norm_rel=0.022015178576111794, ref_abs_avg=14.594757080078125, test_abs_avg=14.600723266601562
production_forward2 grad[90] vs paper_forward: mean_abs=0.4039750397205353, max_abs=4.0, mean_rel=0.13254500925540924, max_rel=929.015869140625, norm_rel=0.021796485409140587, ref_abs_avg=18.700483322143555, test_abs_avg=18.699783325195312
production_forward2 grad[91] vs paper_forward: mean_abs=0.35663968324661255, max_abs=4.5, mean_rel=0.1680004894733429, max_rel=968.7499389648438, norm_rel=0.020181924104690552, ref_abs_avg=17.779239654541016, test_abs_avg=17.776968002319336
production_forward2 grad[92] vs paper_forward: mean_abs=0.28432178497314453, max_abs=1.3125, mean_rel=0.07304640114307404, max_rel=1.6458971500396729, norm_rel=0.01897438056766987, ref_abs_avg=14.862128257751465, test_abs_avg=14.874914169311523
production_forward2 grad[93] vs paper_forward: mean_abs=0.36763817071914673, max_abs=5.5, mean_rel=0.12582334876060486, max_rel=732.656494140625, norm_rel=0.021160433068871498, ref_abs_avg=17.61752700805664, test_abs_avg=17.617996215820312
production_forward2 grad[94] vs paper_forward: mean_abs=0.3329095244407654, max_abs=4.0, mean_rel=0.15121465921401978, max_rel=749.9999389648438, norm_rel=0.018815604969859123, ref_abs_avg=17.80240249633789, test_abs_avg=17.803586959838867
production_forward2 grad[95] vs paper_forward: mean_abs=0.26389074325561523, max_abs=1.25, mean_rel=0.0812288150191307, max_rel=6.621312618255615, norm_rel=0.018481090664863586, ref_abs_avg=14.256451606750488, test_abs_avg=14.245870590209961
production_forward2 grad[96] vs paper_forward: mean_abs=0.34643107652664185, max_abs=4.0, mean_rel=0.12324550747871399, max_rel=585.5897216796875, norm_rel=0.020732270553708076, ref_abs_avg=16.994476318359375, test_abs_avg=16.994918823242188
production_forward2 grad[97] vs paper_forward: mean_abs=0.31490394473075867, max_abs=3.5, mean_rel=0.1547372043132782, max_rel=1125.0, norm_rel=0.01876204088330269, ref_abs_avg=16.966373443603516, test_abs_avg=16.969791412353516
identity layers + randn queries
mean abs randn paper: 0.2177734375
production_forward fwd+bwd:  124.536 ms
production_forward fwd-only: 22.781 ms
production_forward bwd-only: 102.163 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=6.071 GiB
production_forward peak reserved:  fwd=2.225 GiB, fwd+bwd=6.100 GiB
torch_compile_phases_forward fwd+bwd:  260.501 ms
torch_compile_phases_forward fwd-only: 43.622 ms
torch_compile_phases_forward bwd-only: 213.891 ms
torch_compile_phases_forward peak allocated: fwd=5.342 GiB, fwd+bwd=6.469 GiB
torch_compile_phases_forward peak reserved:  fwd=5.850 GiB, fwd+bwd=9.850 GiB
paper_forward fwd+bwd:  535.999 ms
paper_forward fwd-only: 97.262 ms
paper_forward bwd-only: 439.613 ms
paper_forward peak allocated: fwd=6.194 GiB, fwd+bwd=10.068 GiB
paper_forward peak reserved:  fwd=6.225 GiB, fwd+bwd=10.225 GiB
production_forward2 fwd+bwd:  243.328 ms
production_forward2 fwd-only: 24.772 ms
production_forward2 bwd-only: 219.060 ms
production_forward2 peak allocated: fwd=2.692 GiB, fwd+bwd=6.071 GiB
production_forward2 peak reserved:  fwd=2.975 GiB, fwd+bwd=8.725 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016604738775640726, max_abs=0.0625
production_forward grad[0] vs paper_forward: mean_abs=0.00886599626392126, max_abs=0.46875, mean_rel=0.07585589587688446, max_rel=99.1597671508789, norm_rel=0.02078019268810749, ref_abs_avg=0.4614817500114441, test_abs_avg=0.46148228645324707
production_forward grad[1] vs paper_forward: mean_abs=7.599981784820557, max_abs=64.0, mean_rel=0.20753054320812225, max_rel=1000.3134155273438, norm_rel=0.021152064204216003, ref_abs_avg=322.84356689453125, test_abs_avg=322.91387939453125
production_forward grad[2] vs paper_forward: mean_abs=1.363255500793457, max_abs=5.6875, mean_rel=0.12542293965816498, max_rel=16.5623779296875, norm_rel=0.025506282225251198, ref_abs_avg=55.132720947265625, test_abs_avg=55.0892219543457
production_forward grad[3] vs paper_forward: mean_abs=1.6968581676483154, max_abs=12.0, mean_rel=0.20094981789588928, max_rel=2702.014404296875, norm_rel=0.025585593655705452, ref_abs_avg=66.75446319580078, test_abs_avg=66.75099182128906
production_forward grad[4] vs paper_forward: mean_abs=1.5788280963897705, max_abs=9.671875, mean_rel=0.4294275641441345, max_rel=6124.99951171875, norm_rel=0.024049125611782074, ref_abs_avg=66.01354217529297, test_abs_avg=65.99153137207031
production_forward grad[5] vs paper_forward: mean_abs=1.2060576677322388, max_abs=4.75, mean_rel=0.18970361351966858, max_rel=29.910886764526367, norm_rel=0.026242036372423172, ref_abs_avg=45.96514892578125, test_abs_avg=46.03305435180664
production_forward grad[6] vs paper_forward: mean_abs=1.4775108098983765, max_abs=10.0, mean_rel=0.18196168541908264, max_rel=2419.167236328125, norm_rel=0.02514774352312088, ref_abs_avg=59.17359924316406, test_abs_avg=59.17401123046875
production_forward grad[7] vs paper_forward: mean_abs=1.3658198118209839, max_abs=9.0, mean_rel=0.482318639755249, max_rel=4875.0, norm_rel=0.023409537971019745, ref_abs_avg=58.61775207519531, test_abs_avg=58.6171989440918
production_forward grad[8] vs paper_forward: mean_abs=1.0707511901855469, max_abs=4.5, mean_rel=0.15396526455879211, max_rel=21.44224739074707, norm_rel=0.02507113292813301, ref_abs_avg=43.111061096191406, test_abs_avg=43.11427688598633
production_forward grad[9] vs paper_forward: mean_abs=1.3429486751556396, max_abs=9.0, mean_rel=0.176259845495224, max_rel=1493.8905029296875, norm_rel=0.024866754189133644, ref_abs_avg=54.31037521362305, test_abs_avg=54.310447692871094
production_forward grad[10] vs paper_forward: mean_abs=1.242974042892456, max_abs=7.125, mean_rel=0.3586859405040741, max_rel=4312.5, norm_rel=0.023108238354325294, ref_abs_avg=53.96864318847656, test_abs_avg=53.9710807800293
production_forward grad[11] vs paper_forward: mean_abs=0.9544229507446289, max_abs=3.625, mean_rel=0.12686455249786377, max_rel=10.947750091552734, norm_rel=0.02436353638768196, ref_abs_avg=39.254661560058594, test_abs_avg=39.21284484863281
production_forward grad[12] vs paper_forward: mean_abs=1.2368342876434326, max_abs=9.75, mean_rel=0.16884513199329376, max_rel=1028.4122314453125, norm_rel=0.024822624400258064, ref_abs_avg=50.15018844604492, test_abs_avg=50.15131378173828
production_forward grad[13] vs paper_forward: mean_abs=1.145312786102295, max_abs=7.0, mean_rel=0.29901933670043945, max_rel=4937.5, norm_rel=0.022987889125943184, ref_abs_avg=50.054649353027344, test_abs_avg=50.054237365722656
production_forward grad[14] vs paper_forward: mean_abs=0.9163150787353516, max_abs=3.3125, mean_rel=0.09730532020330429, max_rel=5.7816643714904785, norm_rel=0.02528461255133152, ref_abs_avg=35.790283203125, test_abs_avg=35.81653594970703
production_forward grad[15] vs paper_forward: mean_abs=1.1516516208648682, max_abs=8.0, mean_rel=0.17557746171951294, max_rel=2529.65576171875, norm_rel=0.024535462260246277, ref_abs_avg=47.24266815185547, test_abs_avg=47.240509033203125
production_forward grad[16] vs paper_forward: mean_abs=1.0629334449768066, max_abs=6.5, mean_rel=0.3535383343696594, max_rel=3874.999755859375, norm_rel=0.022935405373573303, ref_abs_avg=46.56147766113281, test_abs_avg=46.55811309814453
production_forward grad[17] vs paper_forward: mean_abs=0.8071496486663818, max_abs=3.75, mean_rel=0.08226826041936874, max_rel=12.501344680786133, norm_rel=0.02176922746002674, ref_abs_avg=38.23847198486328, test_abs_avg=38.21745300292969
production_forward grad[18] vs paper_forward: mean_abs=1.084444522857666, max_abs=7.0625, mean_rel=0.15752959251403809, max_rel=1051.3482666015625, norm_rel=0.02447653003036976, ref_abs_avg=44.58213806152344, test_abs_avg=44.58169937133789
production_forward grad[19] vs paper_forward: mean_abs=1.0039114952087402, max_abs=7.0, mean_rel=0.3166542053222656, max_rel=2624.999755859375, norm_rel=0.02292085625231266, ref_abs_avg=44.01583480834961, test_abs_avg=44.01882553100586
production_forward grad[20] vs paper_forward: mean_abs=0.8325629234313965, max_abs=3.375, mean_rel=0.19953487813472748, max_rel=47.1361083984375, norm_rel=0.02316869981586933, ref_abs_avg=35.71661376953125, test_abs_avg=35.76447296142578
production_forward grad[21] vs paper_forward: mean_abs=1.029761552810669, max_abs=7.0, mean_rel=0.15543723106384277, max_rel=1371.05517578125, norm_rel=0.024318544194102287, ref_abs_avg=42.601844787597656, test_abs_avg=42.59819030761719
production_forward grad[22] vs paper_forward: mean_abs=0.948664665222168, max_abs=6.0, mean_rel=0.32913240790367126, max_rel=2500.0, norm_rel=0.022651035338640213, ref_abs_avg=42.150115966796875, test_abs_avg=42.14785385131836
production_forward grad[23] vs paper_forward: mean_abs=0.7787948846817017, max_abs=3.0, mean_rel=0.08239251375198364, max_rel=3.392530918121338, norm_rel=0.024491317570209503, ref_abs_avg=33.00856399536133, test_abs_avg=33.003807067871094
production_forward grad[24] vs paper_forward: mean_abs=0.9821560978889465, max_abs=7.0, mean_rel=0.15584754943847656, max_rel=796.5933227539062, norm_rel=0.024159785360097885, ref_abs_avg=40.873695373535156, test_abs_avg=40.872215270996094
production_forward grad[25] vs paper_forward: mean_abs=0.907339334487915, max_abs=6.0, mean_rel=0.28927403688430786, max_rel=2656.249755859375, norm_rel=0.022701626643538475, ref_abs_avg=40.18817138671875, test_abs_avg=40.18694305419922
production_forward grad[26] vs paper_forward: mean_abs=0.8903789520263672, max_abs=4.0, mean_rel=0.07780183851718903, max_rel=2.9194836616516113, norm_rel=0.02506278082728386, ref_abs_avg=35.86631774902344, test_abs_avg=35.888633728027344
production_forward grad[27] vs paper_forward: mean_abs=1.1543216705322266, max_abs=9.0, mean_rel=0.16903099417686462, max_rel=1272.435791015625, norm_rel=0.026287682354450226, ref_abs_avg=44.16026306152344, test_abs_avg=44.164039611816406
production_forward grad[28] vs paper_forward: mean_abs=1.0561273097991943, max_abs=6.5, mean_rel=0.3697260618209839, max_rel=3937.499755859375, norm_rel=0.024649860337376595, ref_abs_avg=43.079345703125, test_abs_avg=43.080665588378906
production_forward grad[29] vs paper_forward: mean_abs=0.8257122039794922, max_abs=3.5, mean_rel=0.1838379055261612, max_rel=28.56215476989746, norm_rel=0.026377364993095398, ref_abs_avg=31.315065383911133, test_abs_avg=31.259809494018555
production_forward grad[30] vs paper_forward: mean_abs=1.0403653383255005, max_abs=8.0, mean_rel=0.17594972252845764, max_rel=1289.946044921875, norm_rel=0.026460494846105576, ref_abs_avg=39.48816680908203, test_abs_avg=39.48564147949219
production_forward grad[31] vs paper_forward: mean_abs=0.968350887298584, max_abs=6.25, mean_rel=0.3635968565940857, max_rel=3593.749755859375, norm_rel=0.02478184923529625, ref_abs_avg=39.22298812866211, test_abs_avg=39.22713088989258
production_forward grad[32] vs paper_forward: mean_abs=0.7188296318054199, max_abs=3.4453125, mean_rel=0.11025188863277435, max_rel=20.03004264831543, norm_rel=0.023760080337524414, ref_abs_avg=30.834375381469727, test_abs_avg=30.75200843811035
production_forward grad[33] vs paper_forward: mean_abs=0.9550033807754517, max_abs=6.5, mean_rel=0.17710626125335693, max_rel=1260.0833740234375, norm_rel=0.02616133540868759, ref_abs_avg=36.68043518066406, test_abs_avg=36.68061447143555
production_forward grad[34] vs paper_forward: mean_abs=0.891151487827301, max_abs=6.375, mean_rel=0.3497955799102783, max_rel=2843.749755859375, norm_rel=0.024705884978175163, ref_abs_avg=36.15293884277344, test_abs_avg=36.15797424316406
production_forward grad[35] vs paper_forward: mean_abs=0.6938269138336182, max_abs=3.125, mean_rel=0.12554599344730377, max_rel=14.17525863647461, norm_rel=0.025858793407678604, ref_abs_avg=27.694332122802734, test_abs_avg=27.71279525756836
production_forward grad[36] vs paper_forward: mean_abs=0.8968280553817749, max_abs=7.0, mean_rel=0.16840946674346924, max_rel=1774.3385009765625, norm_rel=0.025791512802243233, ref_abs_avg=34.857269287109375, test_abs_avg=34.85677719116211
production_forward grad[37] vs paper_forward: mean_abs=0.8373700380325317, max_abs=4.9609375, mean_rel=0.25672534108161926, max_rel=2562.5, norm_rel=0.02445116825401783, ref_abs_avg=34.33097457885742, test_abs_avg=34.33025360107422
production_forward grad[38] vs paper_forward: mean_abs=0.7041375637054443, max_abs=2.75, mean_rel=0.11884256452322006, max_rel=6.050017833709717, norm_rel=0.025609701871871948, ref_abs_avg=27.55855941772461, test_abs_avg=27.571826934814453
production_forward grad[39] vs paper_forward: mean_abs=0.845865786075592, max_abs=5.5, mean_rel=0.17627784609794617, max_rel=760.1469116210938, norm_rel=0.02554486319422722, ref_abs_avg=33.21516418457031, test_abs_avg=33.21213912963867
production_forward grad[40] vs paper_forward: mean_abs=0.7888064384460449, max_abs=5.0, mean_rel=0.25676247477531433, max_rel=1703.1248779296875, norm_rel=0.02431291714310646, ref_abs_avg=32.544158935546875, test_abs_avg=32.541160583496094
production_forward grad[41] vs paper_forward: mean_abs=0.6172387599945068, max_abs=2.515625, mean_rel=0.24665766954421997, max_rel=49.762264251708984, norm_rel=0.024887703359127045, ref_abs_avg=25.10940170288086, test_abs_avg=25.093551635742188
production_forward grad[42] vs paper_forward: mean_abs=0.7936143279075623, max_abs=5.5, mean_rel=0.1703113317489624, max_rel=1675.908203125, norm_rel=0.025417344644665718, ref_abs_avg=31.35395050048828, test_abs_avg=31.35381317138672
production_forward grad[43] vs paper_forward: mean_abs=0.7467986941337585, max_abs=4.75, mean_rel=0.2976877987384796, max_rel=2437.5, norm_rel=0.02401137165725231, ref_abs_avg=31.15105438232422, test_abs_avg=31.157991409301758
production_forward grad[44] vs paper_forward: mean_abs=0.5850238800048828, max_abs=2.5, mean_rel=0.07394461333751678, max_rel=5.88866662979126, norm_rel=0.02160206250846386, ref_abs_avg=27.935659408569336, test_abs_avg=27.865074157714844
production_forward grad[45] vs paper_forward: mean_abs=0.7629286646842957, max_abs=5.25, mean_rel=0.16067340970039368, max_rel=1500.9073486328125, norm_rel=0.024959994480013847, ref_abs_avg=30.651477813720703, test_abs_avg=30.650283813476562
production_forward grad[46] vs paper_forward: mean_abs=0.7094770669937134, max_abs=4.484375, mean_rel=0.2847749888896942, max_rel=3156.249755859375, norm_rel=0.023688429966568947, ref_abs_avg=29.989580154418945, test_abs_avg=29.991254806518555
production_forward grad[47] vs paper_forward: mean_abs=0.5656595230102539, max_abs=2.25, mean_rel=0.10624077171087265, max_rel=5.194834232330322, norm_rel=0.023338614031672478, ref_abs_avg=24.404666900634766, test_abs_avg=24.419193267822266
production_forward grad[48] vs paper_forward: mean_abs=0.7302733659744263, max_abs=5.0, mean_rel=0.15716400742530823, max_rel=1095.322509765625, norm_rel=0.02487149089574814, ref_abs_avg=29.4409122467041, test_abs_avg=29.440231323242188
production_forward grad[49] vs paper_forward: mean_abs=0.6842970848083496, max_abs=5.25, mean_rel=0.26650887727737427, max_rel=2031.2498779296875, norm_rel=0.02339542657136917, ref_abs_avg=29.28753089904785, test_abs_avg=29.284128189086914
production_forward grad[50] vs paper_forward: mean_abs=0.6066265106201172, max_abs=2.5, mean_rel=0.11663071811199188, max_rel=12.973278999328613, norm_rel=0.02387046255171299, ref_abs_avg=25.53633689880371, test_abs_avg=25.585514068603516
production_forward grad[51] vs paper_forward: mean_abs=0.8305883407592773, max_abs=6.0, mean_rel=0.16999968886375427, max_rel=1202.7293701171875, norm_rel=0.02636691741645336, ref_abs_avg=31.60063362121582, test_abs_avg=31.598724365234375
production_forward grad[52] vs paper_forward: mean_abs=0.7708744406700134, max_abs=5.0, mean_rel=0.2575022876262665, max_rel=2109.375, norm_rel=0.024685164913535118, ref_abs_avg=31.306232452392578, test_abs_avg=31.305822372436523
production_forward grad[53] vs paper_forward: mean_abs=0.576957106590271, max_abs=2.75, mean_rel=0.4369036555290222, max_rel=158.0146942138672, norm_rel=0.025224745273590088, ref_abs_avg=23.70248794555664, test_abs_avg=23.732789993286133
production_forward grad[54] vs paper_forward: mean_abs=0.758804202079773, max_abs=6.0, mean_rel=0.18297737836837769, max_rel=1910.7613525390625, norm_rel=0.025986654683947563, ref_abs_avg=29.288164138793945, test_abs_avg=29.288890838623047
production_forward grad[55] vs paper_forward: mean_abs=0.7092815637588501, max_abs=4.875, mean_rel=0.2148413062095642, max_rel=1874.9998779296875, norm_rel=0.024317249655723572, ref_abs_avg=29.212854385375977, test_abs_avg=29.211502075195312
production_forward grad[56] vs paper_forward: mean_abs=0.5496143102645874, max_abs=2.0, mean_rel=0.16301476955413818, max_rel=14.738120079040527, norm_rel=0.023579353466629982, ref_abs_avg=23.196456909179688, test_abs_avg=23.198287963867188
production_forward grad[57] vs paper_forward: mean_abs=0.7135049104690552, max_abs=5.75, mean_rel=0.16119825839996338, max_rel=1834.2396240234375, norm_rel=0.02523883618414402, ref_abs_avg=28.300537109375, test_abs_avg=28.299169540405273
production_forward grad[58] vs paper_forward: mean_abs=0.6609541177749634, max_abs=5.5, mean_rel=0.2742798924446106, max_rel=1874.9998779296875, norm_rel=0.02357645519077778, ref_abs_avg=27.994022369384766, test_abs_avg=27.99510955810547
production_forward grad[59] vs paper_forward: mean_abs=0.5348153114318848, max_abs=2.0625, mean_rel=0.07640981674194336, max_rel=3.5376908779144287, norm_rel=0.024476563557982445, ref_abs_avg=21.66891860961914, test_abs_avg=21.69759750366211
production_forward grad[60] vs paper_forward: mean_abs=0.6616694331169128, max_abs=4.75, mean_rel=0.1495152860879898, max_rel=791.4088745117188, norm_rel=0.024809740483760834, ref_abs_avg=26.708284378051758, test_abs_avg=26.707698822021484
production_forward grad[61] vs paper_forward: mean_abs=0.6136754751205444, max_abs=4.125, mean_rel=0.24228861927986145, max_rel=1734.3748779296875, norm_rel=0.023426219820976257, ref_abs_avg=26.261947631835938, test_abs_avg=26.256881713867188
production_forward grad[62] vs paper_forward: mean_abs=0.491779088973999, max_abs=2.0, mean_rel=0.08007891476154327, max_rel=4.639122486114502, norm_rel=0.023535385727882385, ref_abs_avg=20.74570083618164, test_abs_avg=20.714086532592773
production_forward grad[63] vs paper_forward: mean_abs=0.6250420808792114, max_abs=5.0, mean_rel=0.1519143283367157, max_rel=908.1576538085938, norm_rel=0.024416804313659668, ref_abs_avg=25.62554931640625, test_abs_avg=25.626218795776367
production_forward grad[64] vs paper_forward: mean_abs=0.5862881541252136, max_abs=4.125, mean_rel=0.224452942609787, max_rel=1656.2498779296875, norm_rel=0.02319423109292984, ref_abs_avg=25.3399715423584, test_abs_avg=25.342710494995117
production_forward grad[65] vs paper_forward: mean_abs=0.44530290365219116, max_abs=1.6875, mean_rel=0.06361408531665802, max_rel=3.6624958515167236, norm_rel=0.022467462345957756, ref_abs_avg=20.803085327148438, test_abs_avg=20.83248519897461
production_forward grad[66] vs paper_forward: mean_abs=0.5937150716781616, max_abs=5.5, mean_rel=0.1518399715423584, max_rel=1062.3577880859375, norm_rel=0.02411472611129284, ref_abs_avg=24.631240844726562, test_abs_avg=24.63155174255371
production_forward grad[67] vs paper_forward: mean_abs=0.5447468757629395, max_abs=4.0, mean_rel=0.2605075538158417, max_rel=1968.7498779296875, norm_rel=0.022698543965816498, ref_abs_avg=23.98655128479004, test_abs_avg=23.99281883239746
production_forward grad[68] vs paper_forward: mean_abs=0.38339781761169434, max_abs=1.75, mean_rel=0.15810084342956543, max_rel=13.650861740112305, norm_rel=0.019037410616874695, ref_abs_avg=20.167598724365234, test_abs_avg=20.162883758544922
production_forward grad[69] vs paper_forward: mean_abs=0.5626124739646912, max_abs=4.5, mean_rel=0.1506769061088562, max_rel=1077.4193115234375, norm_rel=0.02375972643494606, ref_abs_avg=23.687047958374023, test_abs_avg=23.688556671142578
production_forward grad[70] vs paper_forward: mean_abs=0.5195127725601196, max_abs=4.0, mean_rel=0.251750111579895, max_rel=1812.4998779296875, norm_rel=0.02182239666581154, ref_abs_avg=23.70197105407715, test_abs_avg=23.697879791259766
production_forward grad[71] vs paper_forward: mean_abs=0.4167671203613281, max_abs=1.5, mean_rel=0.06938566267490387, max_rel=2.393805742263794, norm_rel=0.020170744508504868, ref_abs_avg=20.75304412841797, test_abs_avg=20.768226623535156
production_forward grad[72] vs paper_forward: mean_abs=0.5362730622291565, max_abs=4.625, mean_rel=0.145473450422287, max_rel=1001.8280029296875, norm_rel=0.023456208407878876, ref_abs_avg=22.899269104003906, test_abs_avg=22.899307250976562
production_forward grad[73] vs paper_forward: mean_abs=0.49510258436203003, max_abs=3.5, mean_rel=0.20718348026275635, max_rel=1953.1248779296875, norm_rel=0.021750081330537796, ref_abs_avg=22.77716064453125, test_abs_avg=22.776729583740234
production_forward grad[74] vs paper_forward: mean_abs=0.4792630672454834, max_abs=1.65625, mean_rel=0.09525616466999054, max_rel=6.838956832885742, norm_rel=0.022060487419366837, ref_abs_avg=21.95119857788086, test_abs_avg=21.910409927368164
production_forward grad[75] vs paper_forward: mean_abs=0.6126663684844971, max_abs=5.0, mean_rel=0.15211433172225952, max_rel=1251.143798828125, norm_rel=0.025097517296671867, ref_abs_avg=24.445791244506836, test_abs_avg=24.445602416992188
production_forward grad[76] vs paper_forward: mean_abs=0.5732258558273315, max_abs=4.0625, mean_rel=0.24683046340942383, max_rel=1250.0, norm_rel=0.023822130635380745, ref_abs_avg=24.05337142944336, test_abs_avg=24.055940628051758
production_forward grad[77] vs paper_forward: mean_abs=0.4528695344924927, max_abs=1.875, mean_rel=0.2567923963069916, max_rel=62.93764877319336, norm_rel=0.02394886314868927, ref_abs_avg=18.93608283996582, test_abs_avg=18.903911590576172
production_forward grad[78] vs paper_forward: mean_abs=0.5634803771972656, max_abs=5.0, mean_rel=0.15566585958003998, max_rel=1019.8110961914062, norm_rel=0.024406924843788147, ref_abs_avg=23.112775802612305, test_abs_avg=23.11225128173828
production_forward grad[79] vs paper_forward: mean_abs=0.5184007287025452, max_abs=4.0, mean_rel=0.22488312423229218, max_rel=1539.0623779296875, norm_rel=0.022933855652809143, ref_abs_avg=22.554485321044922, test_abs_avg=22.55431365966797
production_forward grad[80] vs paper_forward: mean_abs=0.4341311454772949, max_abs=1.75, mean_rel=0.08563699573278427, max_rel=8.866537094116211, norm_rel=0.02383660152554512, ref_abs_avg=18.338659286499023, test_abs_avg=18.333423614501953
production_forward grad[81] vs paper_forward: mean_abs=0.5241726636886597, max_abs=5.5, mean_rel=0.15524205565452576, max_rel=1052.275390625, norm_rel=0.023784618824720383, ref_abs_avg=22.069169998168945, test_abs_avg=22.068984985351562
production_forward grad[82] vs paper_forward: mean_abs=0.47810542583465576, max_abs=4.5, mean_rel=0.1963576078414917, max_rel=1187.5, norm_rel=0.02232505939900875, ref_abs_avg=21.489160537719727, test_abs_avg=21.493213653564453
production_forward grad[83] vs paper_forward: mean_abs=0.376187801361084, max_abs=1.5546875, mean_rel=0.09590309113264084, max_rel=10.46780776977539, norm_rel=0.02155986800789833, ref_abs_avg=17.414867401123047, test_abs_avg=17.39276695251465
production_forward grad[84] vs paper_forward: mean_abs=0.48290571570396423, max_abs=4.625, mean_rel=0.150826096534729, max_rel=1259.0438232421875, norm_rel=0.023077843710780144, ref_abs_avg=21.005718231201172, test_abs_avg=21.004676818847656
production_forward grad[85] vs paper_forward: mean_abs=0.4365924000740051, max_abs=4.625, mean_rel=0.19479677081108093, max_rel=2437.5, norm_rel=0.02106095664203167, ref_abs_avg=20.810348510742188, test_abs_avg=20.812177658081055
production_forward grad[86] vs paper_forward: mean_abs=0.35965919494628906, max_abs=1.5, mean_rel=0.07554233074188232, max_rel=2.9232046604156494, norm_rel=0.02171231433749199, ref_abs_avg=16.9937801361084, test_abs_avg=17.012657165527344
production_forward grad[87] vs paper_forward: mean_abs=0.4520058035850525, max_abs=4.5, mean_rel=0.13954997062683105, max_rel=726.2759399414062, norm_rel=0.02253052406013012, ref_abs_avg=20.199729919433594, test_abs_avg=20.197444915771484
production_forward grad[88] vs paper_forward: mean_abs=0.41262537240982056, max_abs=4.0, mean_rel=0.17773771286010742, max_rel=1999.9998779296875, norm_rel=0.02036518044769764, ref_abs_avg=20.310237884521484, test_abs_avg=20.31125259399414
production_forward grad[89] vs paper_forward: mean_abs=0.32332098484039307, max_abs=1.375, mean_rel=0.06958938390016556, max_rel=2.0405120849609375, norm_rel=0.020913027226924896, ref_abs_avg=15.86269474029541, test_abs_avg=15.870159149169922
production_forward grad[90] vs paper_forward: mean_abs=0.42117196321487427, max_abs=4.0, mean_rel=0.13874197006225586, max_rel=790.0745849609375, norm_rel=0.022053610533475876, ref_abs_avg=19.267946243286133, test_abs_avg=19.267494201660156
production_forward grad[91] vs paper_forward: mean_abs=0.38464778661727905, max_abs=3.75, mean_rel=0.18559511005878448, max_rel=1031.25, norm_rel=0.02000054344534874, ref_abs_avg=19.248991012573242, test_abs_avg=19.24510955810547
production_forward grad[92] vs paper_forward: mean_abs=0.3230583667755127, max_abs=1.25, mean_rel=0.11536981165409088, max_rel=11.386541366577148, norm_rel=0.02011512778699398, ref_abs_avg=16.10123062133789, test_abs_avg=16.11487579345703
production_forward grad[93] vs paper_forward: mean_abs=0.40401262044906616, max_abs=4.8125, mean_rel=0.13859793543815613, max_rel=1071.6513671875, norm_rel=0.021551981568336487, ref_abs_avg=18.987159729003906, test_abs_avg=18.987733840942383
production_forward grad[94] vs paper_forward: mean_abs=0.36702555418014526, max_abs=4.75, mean_rel=0.17046675086021423, max_rel=921.8749389648438, norm_rel=0.019882893189787865, ref_abs_avg=18.775047302246094, test_abs_avg=18.783010482788086
production_forward grad[95] vs paper_forward: mean_abs=0.2976747751235962, max_abs=1.28125, mean_rel=0.24105316400527954, max_rel=49.77203369140625, norm_rel=0.019378751516342163, ref_abs_avg=15.982851028442383, test_abs_avg=15.98659896850586
production_forward grad[96] vs paper_forward: mean_abs=0.379716157913208, max_abs=5.0, mean_rel=0.12150269001722336, max_rel=541.396484375, norm_rel=0.02094312757253647, ref_abs_avg=18.445327758789062, test_abs_avg=18.443662643432617
production_forward grad[97] vs paper_forward: mean_abs=0.3403608798980713, max_abs=3.875, mean_rel=0.17716923356056213, max_rel=1687.4998779296875, norm_rel=0.019488809630274773, ref_abs_avg=17.762676239013672, test_abs_avg=17.748802185058594
torch_compile_phases_forward vs paper_forward output: mean_abs=0.001664182636886835, max_abs=0.078125
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008895155973732471, max_abs=0.390625, mean_rel=0.07600295543670654, max_rel=108.68462371826172, norm_rel=0.020846083760261536, ref_abs_avg=0.4614817500114441, test_abs_avg=0.4614758789539337
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.5738677978515625, max_abs=64.0, mean_rel=0.1976175308227539, max_rel=918.8793334960938, norm_rel=0.021068139001727104, ref_abs_avg=322.84356689453125, test_abs_avg=322.8900451660156
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3134126663208008, max_abs=5.7734375, mean_rel=0.1272810846567154, max_rel=20.43284797668457, norm_rel=0.02431408129632473, ref_abs_avg=55.132720947265625, test_abs_avg=55.108253479003906
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.697149395942688, max_abs=11.5, mean_rel=0.2012498676776886, max_rel=4493.884765625, norm_rel=0.025582553818821907, ref_abs_avg=66.75446319580078, test_abs_avg=66.750732421875
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5833380222320557, max_abs=10.0, mean_rel=0.45851194858551025, max_rel=4375.0, norm_rel=0.02412106655538082, ref_abs_avg=66.01354217529297, test_abs_avg=65.99573516845703
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.2383469343185425, max_abs=4.25, mean_rel=0.2133091688156128, max_rel=35.68268966674805, norm_rel=0.026845069602131844, ref_abs_avg=45.96514892578125, test_abs_avg=46.08142852783203
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4856321811676025, max_abs=10.0, mean_rel=0.18152250349521637, max_rel=1944.906005859375, norm_rel=0.02528211660683155, ref_abs_avg=59.17359924316406, test_abs_avg=59.17272186279297
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3795990943908691, max_abs=8.5, mean_rel=0.44476598501205444, max_rel=4937.5, norm_rel=0.023632854223251343, ref_abs_avg=58.61775207519531, test_abs_avg=58.6096076965332
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0633652210235596, max_abs=4.25, mean_rel=0.15779462456703186, max_rel=25.513790130615234, norm_rel=0.024888236075639725, ref_abs_avg=43.111061096191406, test_abs_avg=43.0841178894043
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.350881576538086, max_abs=10.0, mean_rel=0.17264264822006226, max_rel=1048.29052734375, norm_rel=0.025011910125613213, ref_abs_avg=54.31037521362305, test_abs_avg=54.31089782714844
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2491497993469238, max_abs=7.875, mean_rel=0.3520992398262024, max_rel=3124.999755859375, norm_rel=0.023226391524076462, ref_abs_avg=53.96864318847656, test_abs_avg=53.973297119140625
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=1.0050115585327148, max_abs=4.375, mean_rel=0.10939829796552658, max_rel=12.674070358276367, norm_rel=0.02533833310008049, ref_abs_avg=39.254661560058594, test_abs_avg=39.24085998535156
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.2446352243423462, max_abs=9.0, mean_rel=0.1713581681251526, max_rel=1256.479736328125, norm_rel=0.024975277483463287, ref_abs_avg=50.15018844604492, test_abs_avg=50.15039825439453
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1489126682281494, max_abs=6.75, mean_rel=0.31081733107566833, max_rel=4718.75, norm_rel=0.02305787056684494, ref_abs_avg=50.054649353027344, test_abs_avg=50.054542541503906
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.9347763061523438, max_abs=3.75, mean_rel=0.10970334708690643, max_rel=9.656829833984375, norm_rel=0.026415003463625908, ref_abs_avg=35.790283203125, test_abs_avg=35.88578796386719
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1572850942611694, max_abs=8.0, mean_rel=0.17852970957756042, max_rel=2407.439697265625, norm_rel=0.024644482880830765, ref_abs_avg=47.24266815185547, test_abs_avg=47.24032974243164
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0668282508850098, max_abs=7.0, mean_rel=0.35285454988479614, max_rel=3374.999755859375, norm_rel=0.023004909977316856, ref_abs_avg=46.56147766113281, test_abs_avg=46.555110931396484
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8544211387634277, max_abs=3.5, mean_rel=0.06742402911186218, max_rel=2.541506767272949, norm_rel=0.02252151444554329, ref_abs_avg=38.23847198486328, test_abs_avg=38.23066711425781
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0903583765029907, max_abs=7.5, mean_rel=0.1617106795310974, max_rel=1264.07177734375, norm_rel=0.024599315598607063, ref_abs_avg=44.58213806152344, test_abs_avg=44.578208923339844
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=1.0117456912994385, max_abs=6.015625, mean_rel=0.32970288395881653, max_rel=3937.499755859375, norm_rel=0.02308150753378868, ref_abs_avg=44.01583480834961, test_abs_avg=44.018524169921875
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.8223998546600342, max_abs=3.5625, mean_rel=0.188347727060318, max_rel=48.10486602783203, norm_rel=0.023259414359927177, ref_abs_avg=35.71661376953125, test_abs_avg=35.77874755859375
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0352740287780762, max_abs=8.0, mean_rel=0.15608519315719604, max_rel=1507.263916015625, norm_rel=0.024438537657260895, ref_abs_avg=42.601844787597656, test_abs_avg=42.597999572753906
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9539009928703308, max_abs=5.5, mean_rel=0.32331737875938416, max_rel=2562.5, norm_rel=0.022764557972550392, ref_abs_avg=42.150115966796875, test_abs_avg=42.14894485473633
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7910661697387695, max_abs=3.5, mean_rel=0.08215297758579254, max_rel=4.216988563537598, norm_rel=0.02492673136293888, ref_abs_avg=33.00856399536133, test_abs_avg=32.98162841796875
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9875204563140869, max_abs=7.0, mean_rel=0.1600649356842041, max_rel=961.80078125, norm_rel=0.024299519136548042, ref_abs_avg=40.873695373535156, test_abs_avg=40.87366485595703
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.912428617477417, max_abs=6.0, mean_rel=0.27590620517730713, max_rel=2265.625, norm_rel=0.022821348160505295, ref_abs_avg=40.18817138671875, test_abs_avg=40.18981170654297
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8933801651000977, max_abs=4.0, mean_rel=0.08969130367040634, max_rel=9.077800750732422, norm_rel=0.025034241378307343, ref_abs_avg=35.86631774902344, test_abs_avg=35.86478042602539
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1550722122192383, max_abs=8.0, mean_rel=0.17093223333358765, max_rel=1395.1292724609375, norm_rel=0.026308828964829445, ref_abs_avg=44.16026306152344, test_abs_avg=44.16539764404297
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0605067014694214, max_abs=6.75, mean_rel=0.3731542229652405, max_rel=4125.0, norm_rel=0.024736041203141212, ref_abs_avg=43.079345703125, test_abs_avg=43.07876205444336
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8468480110168457, max_abs=3.125, mean_rel=0.14137952029705048, max_rel=17.664358139038086, norm_rel=0.027006739750504494, ref_abs_avg=31.315065383911133, test_abs_avg=31.268421173095703
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.0439212322235107, max_abs=7.5, mean_rel=0.17920427024364471, max_rel=1856.011962890625, norm_rel=0.026536300778388977, ref_abs_avg=39.48816680908203, test_abs_avg=39.485347747802734
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9711611270904541, max_abs=6.5, mean_rel=0.3400839567184448, max_rel=2781.249755859375, norm_rel=0.024859661236405373, ref_abs_avg=39.22298812866211, test_abs_avg=39.22719192504883
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7410540580749512, max_abs=3.25, mean_rel=0.12303853780031204, max_rel=22.774045944213867, norm_rel=0.024219805374741554, ref_abs_avg=30.834375381469727, test_abs_avg=30.727563858032227
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9594042897224426, max_abs=7.0, mean_rel=0.18040913343429565, max_rel=1409.4178466796875, norm_rel=0.026278862729668617, ref_abs_avg=36.68043518066406, test_abs_avg=36.6810302734375
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8940486907958984, max_abs=5.75, mean_rel=0.33333680033683777, max_rel=2218.75, norm_rel=0.02481353096663952, ref_abs_avg=36.15293884277344, test_abs_avg=36.15980529785156
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6981481909751892, max_abs=3.0, mean_rel=0.23681426048278809, max_rel=42.79676818847656, norm_rel=0.026168053969740868, ref_abs_avg=27.694332122802734, test_abs_avg=27.724754333496094
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8983882069587708, max_abs=6.5, mean_rel=0.16180497407913208, max_rel=1171.207763671875, norm_rel=0.02585713565349579, ref_abs_avg=34.857269287109375, test_abs_avg=34.856056213378906
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8374899625778198, max_abs=5.75, mean_rel=0.2504824995994568, max_rel=2343.75, norm_rel=0.024473635479807854, ref_abs_avg=34.33097457885742, test_abs_avg=34.32814025878906
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6895947456359863, max_abs=2.75, mean_rel=0.11727134883403778, max_rel=7.060760498046875, norm_rel=0.0254012830555439, ref_abs_avg=27.55855941772461, test_abs_avg=27.56207275390625
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8482368588447571, max_abs=6.0, mean_rel=0.18199999630451202, max_rel=1177.5826416015625, norm_rel=0.025621896609663963, ref_abs_avg=33.21516418457031, test_abs_avg=33.21369934082031
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7903283834457397, max_abs=4.9375, mean_rel=0.2719186544418335, max_rel=2171.875, norm_rel=0.024358579888939857, ref_abs_avg=32.544158935546875, test_abs_avg=32.54110336303711
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6167171001434326, max_abs=2.75, mean_rel=0.20452448725700378, max_rel=60.834190368652344, norm_rel=0.025168471038341522, ref_abs_avg=25.10940170288086, test_abs_avg=25.08131217956543
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7961792945861816, max_abs=5.25, mean_rel=0.1730402559041977, max_rel=2037.073486328125, norm_rel=0.02550020068883896, ref_abs_avg=31.35395050048828, test_abs_avg=31.35397720336914
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7500946521759033, max_abs=5.0, mean_rel=0.2878858745098114, max_rel=2531.25, norm_rel=0.024102265015244484, ref_abs_avg=31.15105438232422, test_abs_avg=31.158607482910156
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5901508331298828, max_abs=2.6015625, mean_rel=0.0712447315454483, max_rel=5.102603435516357, norm_rel=0.021640457212924957, ref_abs_avg=27.935659408569336, test_abs_avg=27.86078643798828
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7653650641441345, max_abs=5.0, mean_rel=0.1595463752746582, max_rel=1863.70751953125, norm_rel=0.02503049746155739, ref_abs_avg=30.651477813720703, test_abs_avg=30.65110206604004
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7132986783981323, max_abs=5.265625, mean_rel=0.2874588966369629, max_rel=2624.999755859375, norm_rel=0.023816771805286407, ref_abs_avg=29.989580154418945, test_abs_avg=29.9931640625
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5674676895141602, max_abs=2.625, mean_rel=0.11070327460765839, max_rel=5.597014904022217, norm_rel=0.023918183520436287, ref_abs_avg=24.404666900634766, test_abs_avg=24.434436798095703
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7324730157852173, max_abs=6.0, mean_rel=0.15598106384277344, max_rel=1597.9051513671875, norm_rel=0.024946238845586777, ref_abs_avg=29.4409122467041, test_abs_avg=29.43975830078125
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6866230368614197, max_abs=5.0, mean_rel=0.259138286113739, max_rel=1999.9998779296875, norm_rel=0.023478548973798752, ref_abs_avg=29.28753089904785, test_abs_avg=29.284826278686523
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6284205913543701, max_abs=2.75, mean_rel=0.09632930159568787, max_rel=4.975498199462891, norm_rel=0.024700352922081947, ref_abs_avg=25.53633689880371, test_abs_avg=25.610774993896484
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8305580615997314, max_abs=6.0, mean_rel=0.1675393283367157, max_rel=1365.5023193359375, norm_rel=0.026365483179688454, ref_abs_avg=31.60063362121582, test_abs_avg=31.59972381591797
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7677834033966064, max_abs=5.3125, mean_rel=0.25355619192123413, max_rel=2437.5, norm_rel=0.024589618667960167, ref_abs_avg=31.306232452392578, test_abs_avg=31.30603790283203
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5792654752731323, max_abs=2.5, mean_rel=0.5248175263404846, max_rel=141.72862243652344, norm_rel=0.025191789492964745, ref_abs_avg=23.70248794555664, test_abs_avg=23.7447566986084
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.759456992149353, max_abs=5.5, mean_rel=0.1829357147216797, max_rel=1671.6756591796875, norm_rel=0.026020457968115807, ref_abs_avg=29.288164138793945, test_abs_avg=29.289518356323242
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.7111238837242126, max_abs=4.53125, mean_rel=0.2158697247505188, max_rel=2203.125, norm_rel=0.024418823421001434, ref_abs_avg=29.212854385375977, test_abs_avg=29.21062469482422
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5774726867675781, max_abs=2.0, mean_rel=0.21418407559394836, max_rel=18.485946655273438, norm_rel=0.024505900219082832, ref_abs_avg=23.196456909179688, test_abs_avg=23.221420288085938
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.7143827676773071, max_abs=5.5, mean_rel=0.16079509258270264, max_rel=1680.1685791015625, norm_rel=0.025280801579356194, ref_abs_avg=28.300537109375, test_abs_avg=28.300403594970703
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6587520837783813, max_abs=4.5, mean_rel=0.26109132170677185, max_rel=2093.75, norm_rel=0.023509053513407707, ref_abs_avg=27.994022369384766, test_abs_avg=27.994853973388672
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5099191665649414, max_abs=2.0625, mean_rel=0.0832168385386467, max_rel=5.900230407714844, norm_rel=0.023321358487010002, ref_abs_avg=21.66891860961914, test_abs_avg=21.683759689331055
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6635971069335938, max_abs=5.0, mean_rel=0.14796137809753418, max_rel=730.2619018554688, norm_rel=0.02488815225660801, ref_abs_avg=26.708284378051758, test_abs_avg=26.70900535583496
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6141360402107239, max_abs=4.5, mean_rel=0.24650691449642181, max_rel=2156.25, norm_rel=0.023411374539136887, ref_abs_avg=26.261947631835938, test_abs_avg=26.25849151611328
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4955185651779175, max_abs=2.0, mean_rel=0.09041163325309753, max_rel=5.786173343658447, norm_rel=0.023678164929151535, ref_abs_avg=20.74570083618164, test_abs_avg=20.716983795166016
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6272420883178711, max_abs=5.0, mean_rel=0.1527252197265625, max_rel=852.211181640625, norm_rel=0.024514710530638695, ref_abs_avg=25.62554931640625, test_abs_avg=25.627395629882812
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5877746939659119, max_abs=4.0, mean_rel=0.2177674025297165, max_rel=1812.4998779296875, norm_rel=0.023250319063663483, ref_abs_avg=25.3399715423584, test_abs_avg=25.341625213623047
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.46480274200439453, max_abs=2.25, mean_rel=0.08175691962242126, max_rel=6.612289905548096, norm_rel=0.02295769937336445, ref_abs_avg=20.803085327148438, test_abs_avg=20.849281311035156
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5949811935424805, max_abs=5.0, mean_rel=0.14978721737861633, max_rel=1035.316162109375, norm_rel=0.02415570244193077, ref_abs_avg=24.631240844726562, test_abs_avg=24.63323211669922
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5469323396682739, max_abs=4.0, mean_rel=0.26955854892730713, max_rel=2749.999755859375, norm_rel=0.02280334383249283, ref_abs_avg=23.98655128479004, test_abs_avg=23.992748260498047
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.41242289543151855, max_abs=1.75, mean_rel=0.18150968849658966, max_rel=20.151569366455078, norm_rel=0.019863111898303032, ref_abs_avg=20.167598724365234, test_abs_avg=20.177335739135742
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.564645528793335, max_abs=5.0, mean_rel=0.15195442736148834, max_rel=1256.989013671875, norm_rel=0.023836921900510788, ref_abs_avg=23.687047958374023, test_abs_avg=23.68950653076172
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5197216272354126, max_abs=4.5, mean_rel=0.2517145276069641, max_rel=1867.1873779296875, norm_rel=0.02185676246881485, ref_abs_avg=23.70197105407715, test_abs_avg=23.701961517333984
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.4060487747192383, max_abs=1.5, mean_rel=0.07298875600099564, max_rel=3.4415955543518066, norm_rel=0.02006685733795166, ref_abs_avg=20.75304412841797, test_abs_avg=20.747234344482422
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5371957421302795, max_abs=4.0, mean_rel=0.1450318843126297, max_rel=1111.250244140625, norm_rel=0.02351512759923935, ref_abs_avg=22.899269104003906, test_abs_avg=22.89984703063965
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4951682686805725, max_abs=3.5, mean_rel=0.18971791863441467, max_rel=1546.8748779296875, norm_rel=0.021735329180955887, ref_abs_avg=22.77716064453125, test_abs_avg=22.775249481201172
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4788074493408203, max_abs=1.77734375, mean_rel=0.09172351658344269, max_rel=8.41847038269043, norm_rel=0.02222622185945511, ref_abs_avg=21.95119857788086, test_abs_avg=21.92438316345215
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.6117326021194458, max_abs=5.0, mean_rel=0.15094566345214844, max_rel=1152.6944580078125, norm_rel=0.02505725994706154, ref_abs_avg=24.445791244506836, test_abs_avg=24.44619369506836
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.570334792137146, max_abs=4.125, mean_rel=0.24305883049964905, max_rel=1562.4998779296875, norm_rel=0.023696154356002808, ref_abs_avg=24.05337142944336, test_abs_avg=24.056575775146484
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.44924962520599365, max_abs=1.625, mean_rel=0.190485417842865, max_rel=19.58381462097168, norm_rel=0.02351016364991665, ref_abs_avg=18.93608283996582, test_abs_avg=18.9217529296875
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5628231167793274, max_abs=5.0, mean_rel=0.15171611309051514, max_rel=953.642578125, norm_rel=0.024379119277000427, ref_abs_avg=23.112775802612305, test_abs_avg=23.11148452758789
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5171797275543213, max_abs=5.0, mean_rel=0.2252490073442459, max_rel=1406.2498779296875, norm_rel=0.02292480506002903, ref_abs_avg=22.554485321044922, test_abs_avg=22.555879592895508
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.4261493682861328, max_abs=1.578125, mean_rel=0.07647412270307541, max_rel=8.341411590576172, norm_rel=0.023386674001812935, ref_abs_avg=18.338659286499023, test_abs_avg=18.348487854003906
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.5243678092956543, max_abs=5.0, mean_rel=0.15819713473320007, max_rel=1013.6649169921875, norm_rel=0.023789962753653526, ref_abs_avg=22.069169998168945, test_abs_avg=22.06890106201172
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.47912469506263733, max_abs=4.5, mean_rel=0.1889776587486267, max_rel=1218.75, norm_rel=0.022395115345716476, ref_abs_avg=21.489160537719727, test_abs_avg=21.492332458496094
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.37639617919921875, max_abs=1.375, mean_rel=0.08295631408691406, max_rel=7.868322849273682, norm_rel=0.021218551322817802, ref_abs_avg=17.414867401123047, test_abs_avg=17.394481658935547
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.48292797803878784, max_abs=4.25, mean_rel=0.15313056111335754, max_rel=1039.0548095703125, norm_rel=0.023085111752152443, ref_abs_avg=21.005718231201172, test_abs_avg=21.004484176635742
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4393608570098877, max_abs=4.25, mean_rel=0.20086133480072021, max_rel=2093.75, norm_rel=0.021170398220419884, ref_abs_avg=20.810348510742188, test_abs_avg=20.815624237060547
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3654823303222656, max_abs=1.375, mean_rel=0.07704862952232361, max_rel=3.715298891067505, norm_rel=0.021734479814767838, ref_abs_avg=16.9937801361084, test_abs_avg=16.997678756713867
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4524044394493103, max_abs=4.0625, mean_rel=0.1412837654352188, max_rel=565.3453369140625, norm_rel=0.02252865582704544, ref_abs_avg=20.199729919433594, test_abs_avg=20.197410583496094
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.41051504015922546, max_abs=4.25, mean_rel=0.17928236722946167, max_rel=1812.4998779296875, norm_rel=0.020243868231773376, ref_abs_avg=20.310237884521484, test_abs_avg=20.309139251708984
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3290224075317383, max_abs=1.375, mean_rel=0.07509036362171173, max_rel=3.7178778648376465, norm_rel=0.02132290229201317, ref_abs_avg=15.86269474029541, test_abs_avg=15.88119125366211
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.4221813678741455, max_abs=4.0, mean_rel=0.13677367568016052, max_rel=568.9105834960938, norm_rel=0.02210956998169422, ref_abs_avg=19.267946243286133, test_abs_avg=19.267990112304688
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.38741710782051086, max_abs=3.3125, mean_rel=0.1829068809747696, max_rel=941.4061889648438, norm_rel=0.02012505941092968, ref_abs_avg=19.248991012573242, test_abs_avg=19.246124267578125
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.31882643699645996, max_abs=1.25, mean_rel=0.1028178334236145, max_rel=11.25578784942627, norm_rel=0.019857004284858704, ref_abs_avg=16.10123062133789, test_abs_avg=16.117359161376953
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.4049803614616394, max_abs=4.5, mean_rel=0.13895457983016968, max_rel=1106.7957763671875, norm_rel=0.02160104364156723, ref_abs_avg=18.987159729003906, test_abs_avg=18.986572265625
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3657037317752838, max_abs=5.0, mean_rel=0.16907435655593872, max_rel=999.9999389648438, norm_rel=0.019733011722564697, ref_abs_avg=18.775047302246094, test_abs_avg=18.782047271728516
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.30526864528656006, max_abs=1.25, mean_rel=0.14586345851421356, max_rel=33.14109420776367, norm_rel=0.019495705142617226, ref_abs_avg=15.982851028442383, test_abs_avg=15.9898681640625
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3801865577697754, max_abs=4.625, mean_rel=0.12238931655883789, max_rel=640.5354614257812, norm_rel=0.02096184529364109, ref_abs_avg=18.445327758789062, test_abs_avg=18.443544387817383
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.343178927898407, max_abs=3.875, mean_rel=0.18319815397262573, max_rel=1812.4998779296875, norm_rel=0.01973666250705719, ref_abs_avg=17.762676239013672, test_abs_avg=17.745864868164062
production_forward2 vs paper_forward output: mean_abs=0.0016604738775640726, max_abs=0.0625
production_forward2 grad[0] vs paper_forward: mean_abs=0.0088798888027668, max_abs=0.5, mean_rel=0.07588572800159454, max_rel=107.97050476074219, norm_rel=0.020811481401324272, ref_abs_avg=0.4614817500114441, test_abs_avg=0.4614693522453308
production_forward2 grad[1] vs paper_forward: mean_abs=7.554663181304932, max_abs=64.0, mean_rel=0.23191863298416138, max_rel=1378.7423095703125, norm_rel=0.02104104310274124, ref_abs_avg=322.84356689453125, test_abs_avg=322.89569091796875
production_forward2 grad[2] vs paper_forward: mean_abs=1.3279047012329102, max_abs=5.38671875, mean_rel=0.11868170648813248, max_rel=10.498644828796387, norm_rel=0.024807533249258995, ref_abs_avg=55.132720947265625, test_abs_avg=55.137664794921875
production_forward2 grad[3] vs paper_forward: mean_abs=1.697638988494873, max_abs=11.0, mean_rel=0.19334927201271057, max_rel=2040.23291015625, norm_rel=0.025577377527952194, ref_abs_avg=66.75446319580078, test_abs_avg=66.74716186523438
production_forward2 grad[4] vs paper_forward: mean_abs=1.580031394958496, max_abs=10.171875, mean_rel=0.4402734041213989, max_rel=5624.99951171875, norm_rel=0.02407102659344673, ref_abs_avg=66.01354217529297, test_abs_avg=65.99214935302734
production_forward2 grad[5] vs paper_forward: mean_abs=1.2370494604110718, max_abs=4.5, mean_rel=0.28509604930877686, max_rel=87.46709442138672, norm_rel=0.02712268941104412, ref_abs_avg=45.96514892578125, test_abs_avg=46.04334259033203
production_forward2 grad[6] vs paper_forward: mean_abs=1.4827029705047607, max_abs=11.0, mean_rel=0.18058231472969055, max_rel=2354.32177734375, norm_rel=0.025219229981303215, ref_abs_avg=59.17359924316406, test_abs_avg=59.168792724609375
production_forward2 grad[7] vs paper_forward: mean_abs=1.3746318817138672, max_abs=9.0, mean_rel=0.4386194944381714, max_rel=5437.49951171875, norm_rel=0.023550810292363167, ref_abs_avg=58.61775207519531, test_abs_avg=58.61357879638672
production_forward2 grad[8] vs paper_forward: mean_abs=1.0773588418960571, max_abs=4.0, mean_rel=0.13860578835010529, max_rel=14.171634674072266, norm_rel=0.02526560053229332, ref_abs_avg=43.111061096191406, test_abs_avg=43.06140899658203
production_forward2 grad[9] vs paper_forward: mean_abs=1.350567102432251, max_abs=9.0, mean_rel=0.17396092414855957, max_rel=1780.8270263671875, norm_rel=0.024987095966935158, ref_abs_avg=54.31037521362305, test_abs_avg=54.30968475341797
production_forward2 grad[10] vs paper_forward: mean_abs=1.249069094657898, max_abs=7.5, mean_rel=0.36260542273521423, max_rel=3874.999755859375, norm_rel=0.023223750293254852, ref_abs_avg=53.96864318847656, test_abs_avg=53.972755432128906
production_forward2 grad[11] vs paper_forward: mean_abs=0.9838371276855469, max_abs=3.796875, mean_rel=0.11629163473844528, max_rel=10.487398147583008, norm_rel=0.024931730702519417, ref_abs_avg=39.254661560058594, test_abs_avg=39.2281608581543
production_forward2 grad[12] vs paper_forward: mean_abs=1.2414103746414185, max_abs=9.0, mean_rel=0.16614656150341034, max_rel=1109.45458984375, norm_rel=0.024898139759898186, ref_abs_avg=50.15018844604492, test_abs_avg=50.15031433105469
production_forward2 grad[13] vs paper_forward: mean_abs=1.148836374282837, max_abs=7.75, mean_rel=0.30782434344291687, max_rel=4593.75, norm_rel=0.02308143675327301, ref_abs_avg=50.054649353027344, test_abs_avg=50.05313491821289
production_forward2 grad[14] vs paper_forward: mean_abs=0.9328765869140625, max_abs=3.5, mean_rel=0.10744846612215042, max_rel=9.94694995880127, norm_rel=0.026218609884381294, ref_abs_avg=35.790283203125, test_abs_avg=35.863670349121094
production_forward2 grad[15] vs paper_forward: mean_abs=1.1553997993469238, max_abs=8.0, mean_rel=0.17965254187583923, max_rel=2016.34814453125, norm_rel=0.024611128494143486, ref_abs_avg=47.24266815185547, test_abs_avg=47.239845275878906
production_forward2 grad[16] vs paper_forward: mean_abs=1.066666603088379, max_abs=6.03125, mean_rel=0.3964001536369324, max_rel=4937.5, norm_rel=0.023000376299023628, ref_abs_avg=46.56147766113281, test_abs_avg=46.558990478515625
production_forward2 grad[17] vs paper_forward: mean_abs=0.8214287757873535, max_abs=3.5, mean_rel=0.07556536793708801, max_rel=9.130599021911621, norm_rel=0.02177017740905285, ref_abs_avg=38.23847198486328, test_abs_avg=38.17902374267578
production_forward2 grad[18] vs paper_forward: mean_abs=1.0875331163406372, max_abs=7.0, mean_rel=0.16067807376384735, max_rel=1368.6629638671875, norm_rel=0.024557964876294136, ref_abs_avg=44.58213806152344, test_abs_avg=44.58049011230469
production_forward2 grad[19] vs paper_forward: mean_abs=1.008935809135437, max_abs=6.5, mean_rel=0.31460994482040405, max_rel=2718.749755859375, norm_rel=0.02303541637957096, ref_abs_avg=44.01583480834961, test_abs_avg=44.01871109008789
production_forward2 grad[20] vs paper_forward: mean_abs=0.8307251930236816, max_abs=3.25, mean_rel=0.18937399983406067, max_rel=43.906917572021484, norm_rel=0.023353638127446175, ref_abs_avg=35.71661376953125, test_abs_avg=35.77443313598633
production_forward2 grad[21] vs paper_forward: mean_abs=1.0328700542449951, max_abs=7.0, mean_rel=0.1565156728029251, max_rel=1694.5509033203125, norm_rel=0.024378923699259758, ref_abs_avg=42.601844787597656, test_abs_avg=42.59746551513672
production_forward2 grad[22] vs paper_forward: mean_abs=0.9527462720870972, max_abs=5.9375, mean_rel=0.35408496856689453, max_rel=2812.499755859375, norm_rel=0.022714953869581223, ref_abs_avg=42.150115966796875, test_abs_avg=42.148193359375
production_forward2 grad[23] vs paper_forward: mean_abs=0.7836246490478516, max_abs=3.0, mean_rel=0.07988063991069794, max_rel=4.48438024520874, norm_rel=0.024678537622094154, ref_abs_avg=33.00856399536133, test_abs_avg=32.961612701416016
production_forward2 grad[24] vs paper_forward: mean_abs=0.9844465255737305, max_abs=7.0, mean_rel=0.1544341892004013, max_rel=1102.2890625, norm_rel=0.02421550452709198, ref_abs_avg=40.873695373535156, test_abs_avg=40.87018585205078
production_forward2 grad[25] vs paper_forward: mean_abs=0.9082688689231873, max_abs=5.5, mean_rel=0.2774503827095032, max_rel=2312.5, norm_rel=0.022729957476258278, ref_abs_avg=40.18817138671875, test_abs_avg=40.185340881347656
production_forward2 grad[26] vs paper_forward: mean_abs=0.8579769134521484, max_abs=4.0, mean_rel=0.0863533690571785, max_rel=8.067401885986328, norm_rel=0.024448798969388008, ref_abs_avg=35.86631774902344, test_abs_avg=35.8425407409668
production_forward2 grad[27] vs paper_forward: mean_abs=1.1527091264724731, max_abs=7.5, mean_rel=0.17366838455200195, max_rel=1755.5413818359375, norm_rel=0.026256639510393143, ref_abs_avg=44.16026306152344, test_abs_avg=44.162261962890625
production_forward2 grad[28] vs paper_forward: mean_abs=1.0545854568481445, max_abs=6.875, mean_rel=0.37409472465515137, max_rel=4062.499755859375, norm_rel=0.024604499340057373, ref_abs_avg=43.079345703125, test_abs_avg=43.07703399658203
production_forward2 grad[29] vs paper_forward: mean_abs=0.8202404975891113, max_abs=3.1328125, mean_rel=0.1750773936510086, max_rel=28.56215476989746, norm_rel=0.026081275194883347, ref_abs_avg=31.315065383911133, test_abs_avg=31.25393295288086
production_forward2 grad[30] vs paper_forward: mean_abs=1.0416982173919678, max_abs=7.0, mean_rel=0.17490088939666748, max_rel=1586.143310546875, norm_rel=0.026498088613152504, ref_abs_avg=39.48816680908203, test_abs_avg=39.48469161987305
production_forward2 grad[31] vs paper_forward: mean_abs=0.9713062047958374, max_abs=6.5, mean_rel=0.3633987307548523, max_rel=3718.749755859375, norm_rel=0.024852637201547623, ref_abs_avg=39.22298812866211, test_abs_avg=39.22527313232422
production_forward2 grad[32] vs paper_forward: mean_abs=0.7112917900085449, max_abs=3.56640625, mean_rel=0.10880318284034729, max_rel=18.232248306274414, norm_rel=0.023571019992232323, ref_abs_avg=30.834375381469727, test_abs_avg=30.743410110473633
production_forward2 grad[33] vs paper_forward: mean_abs=0.9575481414794922, max_abs=6.625, mean_rel=0.1787920892238617, max_rel=1179.7923583984375, norm_rel=0.02621849998831749, ref_abs_avg=36.68043518066406, test_abs_avg=36.67936706542969
production_forward2 grad[34] vs paper_forward: mean_abs=0.8929330706596375, max_abs=6.25, mean_rel=0.3442670702934265, max_rel=2671.874755859375, norm_rel=0.024762991815805435, ref_abs_avg=36.15293884277344, test_abs_avg=36.15730667114258
production_forward2 grad[35] vs paper_forward: mean_abs=0.7063095569610596, max_abs=2.75, mean_rel=0.180490642786026, max_rel=28.925939559936523, norm_rel=0.026428041979670525, ref_abs_avg=27.694332122802734, test_abs_avg=27.696561813354492
production_forward2 grad[36] vs paper_forward: mean_abs=0.8981336951255798, max_abs=6.5, mean_rel=0.16766944527626038, max_rel=1443.7763671875, norm_rel=0.0258472952991724, ref_abs_avg=34.857269287109375, test_abs_avg=34.85558319091797
production_forward2 grad[37] vs paper_forward: mean_abs=0.838374137878418, max_abs=5.25, mean_rel=0.26337558031082153, max_rel=2624.999755859375, norm_rel=0.024483520537614822, ref_abs_avg=34.33097457885742, test_abs_avg=34.330238342285156
production_forward2 grad[38] vs paper_forward: mean_abs=0.6940901279449463, max_abs=2.75, mean_rel=0.12866900861263275, max_rel=7.2051520347595215, norm_rel=0.025182150304317474, ref_abs_avg=27.55855941772461, test_abs_avg=27.573787689208984
production_forward2 grad[39] vs paper_forward: mean_abs=0.8469246625900269, max_abs=6.0, mean_rel=0.1771332323551178, max_rel=1158.8876953125, norm_rel=0.02559206634759903, ref_abs_avg=33.21516418457031, test_abs_avg=33.213111877441406
production_forward2 grad[40] vs paper_forward: mean_abs=0.7899947166442871, max_abs=5.0, mean_rel=0.26555949449539185, max_rel=2250.0, norm_rel=0.024344749748706818, ref_abs_avg=32.544158935546875, test_abs_avg=32.54168701171875
production_forward2 grad[41] vs paper_forward: mean_abs=0.6284644603729248, max_abs=2.5, mean_rel=0.2551092207431793, max_rel=62.6199836730957, norm_rel=0.025351669639348984, ref_abs_avg=25.10940170288086, test_abs_avg=25.090736389160156
production_forward2 grad[42] vs paper_forward: mean_abs=0.7949822545051575, max_abs=5.046875, mean_rel=0.17421582341194153, max_rel=1974.081298828125, norm_rel=0.02546011656522751, ref_abs_avg=31.35395050048828, test_abs_avg=31.353618621826172
production_forward2 grad[43] vs paper_forward: mean_abs=0.7495750188827515, max_abs=5.0, mean_rel=0.3013453185558319, max_rel=2187.5, norm_rel=0.024093441665172577, ref_abs_avg=31.15105438232422, test_abs_avg=31.15731430053711
production_forward2 grad[44] vs paper_forward: mean_abs=0.5864744186401367, max_abs=2.6796875, mean_rel=0.07271943241357803, max_rel=6.371359348297119, norm_rel=0.02171102911233902, ref_abs_avg=27.935659408569336, test_abs_avg=27.87771224975586
production_forward2 grad[45] vs paper_forward: mean_abs=0.7644548416137695, max_abs=5.25, mean_rel=0.1614854782819748, max_rel=1991.754638671875, norm_rel=0.025013091042637825, ref_abs_avg=30.651477813720703, test_abs_avg=30.650541305541992
production_forward2 grad[46] vs paper_forward: mean_abs=0.7111580967903137, max_abs=5.03125, mean_rel=0.2905467748641968, max_rel=2531.25, norm_rel=0.02374151162803173, ref_abs_avg=29.989580154418945, test_abs_avg=29.991050720214844
production_forward2 grad[47] vs paper_forward: mean_abs=0.5662345886230469, max_abs=2.125, mean_rel=0.10954081267118454, max_rel=4.077665328979492, norm_rel=0.023544752970337868, ref_abs_avg=24.404666900634766, test_abs_avg=24.41374397277832
production_forward2 grad[48] vs paper_forward: mean_abs=0.7319099307060242, max_abs=5.203125, mean_rel=0.15417206287384033, max_rel=1400.167724609375, norm_rel=0.02492038905620575, ref_abs_avg=29.4409122467041, test_abs_avg=29.43930435180664
production_forward2 grad[49] vs paper_forward: mean_abs=0.6848595142364502, max_abs=4.75, mean_rel=0.26905685663223267, max_rel=1906.2498779296875, norm_rel=0.02341579832136631, ref_abs_avg=29.28753089904785, test_abs_avg=29.284137725830078
production_forward2 grad[50] vs paper_forward: mean_abs=0.6223430633544922, max_abs=2.25, mean_rel=0.10737445950508118, max_rel=11.956219673156738, norm_rel=0.024272166192531586, ref_abs_avg=25.53633689880371, test_abs_avg=25.59945297241211
production_forward2 grad[51] vs paper_forward: mean_abs=0.8290277719497681, max_abs=6.0, mean_rel=0.16690492630004883, max_rel=1139.8143310546875, norm_rel=0.026308367028832436, ref_abs_avg=31.60063362121582, test_abs_avg=31.59817123413086
production_forward2 grad[52] vs paper_forward: mean_abs=0.7684682607650757, max_abs=5.0, mean_rel=0.25893136858940125, max_rel=2187.5, norm_rel=0.024630511179566383, ref_abs_avg=31.306232452392578, test_abs_avg=31.303936004638672
production_forward2 grad[53] vs paper_forward: mean_abs=0.5913628339767456, max_abs=2.625, mean_rel=0.5760039687156677, max_rel=228.8236846923828, norm_rel=0.025554846972227097, ref_abs_avg=23.70248794555664, test_abs_avg=23.734893798828125
production_forward2 grad[54] vs paper_forward: mean_abs=0.758595883846283, max_abs=6.0, mean_rel=0.18134592473506927, max_rel=1785.0032958984375, norm_rel=0.02598322182893753, ref_abs_avg=29.288164138793945, test_abs_avg=29.288463592529297
production_forward2 grad[55] vs paper_forward: mean_abs=0.7109546661376953, max_abs=4.75, mean_rel=0.21427422761917114, max_rel=2187.5, norm_rel=0.024385616183280945, ref_abs_avg=29.212854385375977, test_abs_avg=29.20980453491211
production_forward2 grad[56] vs paper_forward: mean_abs=0.5631875991821289, max_abs=2.5, mean_rel=0.1796858310699463, max_rel=13.337852478027344, norm_rel=0.023798933252692223, ref_abs_avg=23.196456909179688, test_abs_avg=23.200735092163086
production_forward2 grad[57] vs paper_forward: mean_abs=0.7132585048675537, max_abs=5.0, mean_rel=0.16287238895893097, max_rel=1812.2294921875, norm_rel=0.02524084784090519, ref_abs_avg=28.300537109375, test_abs_avg=28.298917770385742
production_forward2 grad[58] vs paper_forward: mean_abs=0.6625930070877075, max_abs=4.5, mean_rel=0.27641475200653076, max_rel=1999.9998779296875, norm_rel=0.023634467273950577, ref_abs_avg=27.994022369384766, test_abs_avg=27.994163513183594
production_forward2 grad[59] vs paper_forward: mean_abs=0.5316524505615234, max_abs=1.875, mean_rel=0.08182326704263687, max_rel=3.029862403869629, norm_rel=0.024254633113741875, ref_abs_avg=21.66891860961914, test_abs_avg=21.675418853759766
production_forward2 grad[60] vs paper_forward: mean_abs=0.6618096828460693, max_abs=5.0, mean_rel=0.14878475666046143, max_rel=739.5777587890625, norm_rel=0.024820629507303238, ref_abs_avg=26.708284378051758, test_abs_avg=26.70806884765625
production_forward2 grad[61] vs paper_forward: mean_abs=0.6149507164955139, max_abs=4.0, mean_rel=0.2451438307762146, max_rel=1578.1248779296875, norm_rel=0.02345786802470684, ref_abs_avg=26.261947631835938, test_abs_avg=26.25720977783203
production_forward2 grad[62] vs paper_forward: mean_abs=0.4916808605194092, max_abs=1.875, mean_rel=0.08188154548406601, max_rel=8.566023826599121, norm_rel=0.023795614019036293, ref_abs_avg=20.74570083618164, test_abs_avg=20.71826934814453
production_forward2 grad[63] vs paper_forward: mean_abs=0.6256554126739502, max_abs=4.5, mean_rel=0.15415313839912415, max_rel=881.6644897460938, norm_rel=0.024459663778543472, ref_abs_avg=25.62554931640625, test_abs_avg=25.62656021118164
production_forward2 grad[64] vs paper_forward: mean_abs=0.5870065689086914, max_abs=4.5, mean_rel=0.2241874635219574, max_rel=1531.2498779296875, norm_rel=0.023218637332320213, ref_abs_avg=25.3399715423584, test_abs_avg=25.342880249023438
production_forward2 grad[65] vs paper_forward: mean_abs=0.4586460590362549, max_abs=1.875, mean_rel=0.0736156553030014, max_rel=6.849857330322266, norm_rel=0.02278207242488861, ref_abs_avg=20.803085327148438, test_abs_avg=20.837005615234375
production_forward2 grad[66] vs paper_forward: mean_abs=0.5943063497543335, max_abs=5.25, mean_rel=0.15061575174331665, max_rel=1452.57421875, norm_rel=0.024130389094352722, ref_abs_avg=24.631240844726562, test_abs_avg=24.632488250732422
production_forward2 grad[67] vs paper_forward: mean_abs=0.5452627539634705, max_abs=4.0, mean_rel=0.25920435786247253, max_rel=2140.625, norm_rel=0.022721512243151665, ref_abs_avg=23.98655128479004, test_abs_avg=23.99323272705078
production_forward2 grad[68] vs paper_forward: mean_abs=0.386826753616333, max_abs=1.5, mean_rel=0.14882490038871765, max_rel=15.174235343933105, norm_rel=0.01914471760392189, ref_abs_avg=20.167598724365234, test_abs_avg=20.160877227783203
production_forward2 grad[69] vs paper_forward: mean_abs=0.5632179379463196, max_abs=5.0, mean_rel=0.15167798101902008, max_rel=1225.759521484375, norm_rel=0.023786764591932297, ref_abs_avg=23.687047958374023, test_abs_avg=23.6885986328125
production_forward2 grad[70] vs paper_forward: mean_abs=0.5201395750045776, max_abs=4.0, mean_rel=0.2491365373134613, max_rel=2093.75, norm_rel=0.02184951864182949, ref_abs_avg=23.70197105407715, test_abs_avg=23.69735336303711
production_forward2 grad[71] vs paper_forward: mean_abs=0.4188213348388672, max_abs=1.578125, mean_rel=0.07177463918924332, max_rel=3.7138466835021973, norm_rel=0.02039237879216671, ref_abs_avg=20.75304412841797, test_abs_avg=20.76192283630371
production_forward2 grad[72] vs paper_forward: mean_abs=0.5371862649917603, max_abs=4.375, mean_rel=0.14694920182228088, max_rel=991.1192626953125, norm_rel=0.023494599387049675, ref_abs_avg=22.899269104003906, test_abs_avg=22.89933204650879
production_forward2 grad[73] vs paper_forward: mean_abs=0.49602317810058594, max_abs=3.5, mean_rel=0.2057078331708908, max_rel=2125.0, norm_rel=0.02177872322499752, ref_abs_avg=22.77716064453125, test_abs_avg=22.776905059814453
production_forward2 grad[74] vs paper_forward: mean_abs=0.48339366912841797, max_abs=1.8125, mean_rel=0.10452473908662796, max_rel=10.334050178527832, norm_rel=0.021787600591778755, ref_abs_avg=21.95119857788086, test_abs_avg=21.91107940673828
production_forward2 grad[75] vs paper_forward: mean_abs=0.6102110147476196, max_abs=5.0, mean_rel=0.14993399381637573, max_rel=972.138916015625, norm_rel=0.024992244318127632, ref_abs_avg=24.445791244506836, test_abs_avg=24.4447021484375
production_forward2 grad[76] vs paper_forward: mean_abs=0.5700905919075012, max_abs=4.0, mean_rel=0.2409410923719406, max_rel=1374.9998779296875, norm_rel=0.02369149960577488, ref_abs_avg=24.05337142944336, test_abs_avg=24.05624771118164
production_forward2 grad[77] vs paper_forward: mean_abs=0.45473524928092957, max_abs=1.75, mean_rel=0.22559651732444763, max_rel=41.99141311645508, norm_rel=0.023905040696263313, ref_abs_avg=18.93608283996582, test_abs_avg=18.916393280029297
production_forward2 grad[78] vs paper_forward: mean_abs=0.5630795359611511, max_abs=5.5, mean_rel=0.15444402396678925, max_rel=1035.262939453125, norm_rel=0.024379190057516098, ref_abs_avg=23.112775802612305, test_abs_avg=23.112483978271484
production_forward2 grad[79] vs paper_forward: mean_abs=0.5152440071105957, max_abs=4.25, mean_rel=0.22685527801513672, max_rel=1546.8748779296875, norm_rel=0.022803733125329018, ref_abs_avg=22.554485321044922, test_abs_avg=22.55443572998047
production_forward2 grad[80] vs paper_forward: mean_abs=0.42936158180236816, max_abs=1.75, mean_rel=0.08104132115840912, max_rel=8.543383598327637, norm_rel=0.023759102448821068, ref_abs_avg=18.338659286499023, test_abs_avg=18.33692169189453
production_forward2 grad[81] vs paper_forward: mean_abs=0.5241441130638123, max_abs=5.0, mean_rel=0.15851372480392456, max_rel=1080.5218505859375, norm_rel=0.02377755008637905, ref_abs_avg=22.069169998168945, test_abs_avg=22.06859016418457
production_forward2 grad[82] vs paper_forward: mean_abs=0.4784340262413025, max_abs=4.5, mean_rel=0.19967257976531982, max_rel=1218.75, norm_rel=0.022321034222841263, ref_abs_avg=21.489160537719727, test_abs_avg=21.49254608154297
production_forward2 grad[83] vs paper_forward: mean_abs=0.37241697311401367, max_abs=1.5546875, mean_rel=0.09921121597290039, max_rel=12.291345596313477, norm_rel=0.021111449226737022, ref_abs_avg=17.414867401123047, test_abs_avg=17.388713836669922
production_forward2 grad[84] vs paper_forward: mean_abs=0.48255085945129395, max_abs=4.5, mean_rel=0.15041273832321167, max_rel=1094.0521240234375, norm_rel=0.02306903526186943, ref_abs_avg=21.005718231201172, test_abs_avg=21.00478744506836
production_forward2 grad[85] vs paper_forward: mean_abs=0.4374849498271942, max_abs=4.5, mean_rel=0.1957121640443802, max_rel=2234.375, norm_rel=0.021094663068652153, ref_abs_avg=20.810348510742188, test_abs_avg=20.812055587768555
production_forward2 grad[86] vs paper_forward: mean_abs=0.3555610179901123, max_abs=1.5, mean_rel=0.06669815629720688, max_rel=2.1678807735443115, norm_rel=0.021344982087612152, ref_abs_avg=16.9937801361084, test_abs_avg=17.014677047729492
production_forward2 grad[87] vs paper_forward: mean_abs=0.4519036114215851, max_abs=5.0, mean_rel=0.13935483992099762, max_rel=847.2545166015625, norm_rel=0.02252434752881527, ref_abs_avg=20.199729919433594, test_abs_avg=20.197402954101562
production_forward2 grad[88] vs paper_forward: mean_abs=0.4129082262516022, max_abs=3.875, mean_rel=0.17914624512195587, max_rel=2062.5, norm_rel=0.020370004698634148, ref_abs_avg=20.310237884521484, test_abs_avg=20.31146240234375
production_forward2 grad[89] vs paper_forward: mean_abs=0.3255951404571533, max_abs=1.28125, mean_rel=0.07217966765165329, max_rel=2.7060892581939697, norm_rel=0.020971087738871574, ref_abs_avg=15.86269474029541, test_abs_avg=15.861927032470703
production_forward2 grad[90] vs paper_forward: mean_abs=0.4210694432258606, max_abs=4.5, mean_rel=0.13622137904167175, max_rel=556.7186889648438, norm_rel=0.022051479667425156, ref_abs_avg=19.267946243286133, test_abs_avg=19.267494201660156
production_forward2 grad[91] vs paper_forward: mean_abs=0.384907603263855, max_abs=3.75, mean_rel=0.1848468780517578, max_rel=1125.0, norm_rel=0.020005803555250168, ref_abs_avg=19.248991012573242, test_abs_avg=19.24513816833496
production_forward2 grad[92] vs paper_forward: mean_abs=0.327038049697876, max_abs=1.25, mean_rel=0.12022744864225388, max_rel=10.688813209533691, norm_rel=0.020265603438019753, ref_abs_avg=16.10123062133789, test_abs_avg=16.119380950927734
production_forward2 grad[93] vs paper_forward: mean_abs=0.4042969346046448, max_abs=4.8125, mean_rel=0.1369512975215912, max_rel=1024.7923583984375, norm_rel=0.021560490131378174, ref_abs_avg=18.987159729003906, test_abs_avg=18.98787498474121
production_forward2 grad[94] vs paper_forward: mean_abs=0.3671689033508301, max_abs=4.75, mean_rel=0.17011265456676483, max_rel=828.1249389648438, norm_rel=0.01988394744694233, ref_abs_avg=18.775047302246094, test_abs_avg=18.783092498779297
production_forward2 grad[95] vs paper_forward: mean_abs=0.2976747751235962, max_abs=1.28125, mean_rel=0.24105316400527954, max_rel=49.77203369140625, norm_rel=0.019378751516342163, ref_abs_avg=15.982851028442383, test_abs_avg=15.98659896850586
production_forward2 grad[96] vs paper_forward: mean_abs=0.379716157913208, max_abs=5.0, mean_rel=0.12150269001722336, max_rel=541.396484375, norm_rel=0.02094312757253647, ref_abs_avg=18.445327758789062, test_abs_avg=18.443662643432617
production_forward2 grad[97] vs paper_forward: mean_abs=0.3403608798980713, max_abs=3.875, mean_rel=0.17716923356056213, max_rel=1687.4998779296875, norm_rel=0.019488809630274773, ref_abs_avg=17.762676239013672, test_abs_avg=17.748802185058594
identity layers + randn queries
mean abs randn paper: 0.2177734375
production_forward2 fwd+bwd:  243.421 ms
production_forward2 fwd-only: 24.778 ms
production_forward2 bwd-only: 219.153 ms
production_forward2 peak allocated: fwd=2.692 GiB, fwd+bwd=6.071 GiB
production_forward2 peak reserved:  fwd=2.977 GiB, fwd+bwd=8.727 GiB
torch_compile_phases_forward fwd+bwd:  260.432 ms
torch_compile_phases_forward fwd-only: 43.600 ms
torch_compile_phases_forward bwd-only: 213.839 ms
torch_compile_phases_forward peak allocated: fwd=5.342 GiB, fwd+bwd=6.469 GiB
torch_compile_phases_forward peak reserved:  fwd=5.852 GiB, fwd+bwd=9.852 GiB
production_forward fwd+bwd:  124.480 ms
production_forward fwd-only: 22.777 ms
production_forward bwd-only: 102.057 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=6.071 GiB
production_forward peak reserved:  fwd=2.227 GiB, fwd+bwd=6.102 GiB
paper_forward fwd+bwd:  536.054 ms
paper_forward fwd-only: 97.328 ms
paper_forward bwd-only: 439.759 ms
paper_forward peak allocated: fwd=6.194 GiB, fwd+bwd=10.068 GiB
paper_forward peak reserved:  fwd=6.227 GiB, fwd+bwd=10.227 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.001596258021891117, max_abs=0.0546875
production_forward grad[0] vs paper_forward: mean_abs=0.008457420393824577, max_abs=0.482421875, mean_rel=0.0739588662981987, max_rel=161.10708618164062, norm_rel=0.02025221846997738, ref_abs_avg=0.45100098848342896, test_abs_avg=0.4510093927383423
production_forward grad[1] vs paper_forward: mean_abs=7.224140644073486, max_abs=62.0, mean_rel=0.17505739629268646, max_rel=440.9304504394531, norm_rel=0.020455239340662956, ref_abs_avg=312.2709655761719, test_abs_avg=312.2920837402344
production_forward grad[2] vs paper_forward: mean_abs=1.2495990991592407, max_abs=6.0, mean_rel=0.388888418674469, max_rel=131.50535583496094, norm_rel=0.02284567803144455, ref_abs_avg=55.33161163330078, test_abs_avg=55.43883514404297
production_forward grad[3] vs paper_forward: mean_abs=1.6231608390808105, max_abs=13.0, mean_rel=0.17421334981918335, max_rel=1781.42529296875, norm_rel=0.025029784068465233, ref_abs_avg=65.29249572753906, test_abs_avg=65.30018615722656
production_forward grad[4] vs paper_forward: mean_abs=1.5014855861663818, max_abs=11.875, mean_rel=0.5275877714157104, max_rel=4687.5, norm_rel=0.023452233523130417, ref_abs_avg=64.38485717773438, test_abs_avg=64.39712524414062
production_forward grad[5] vs paper_forward: mean_abs=1.1215167045593262, max_abs=4.375, mean_rel=0.14442308247089386, max_rel=29.513818740844727, norm_rel=0.02478051371872425, ref_abs_avg=44.46185302734375, test_abs_avg=44.44917297363281
production_forward grad[6] vs paper_forward: mean_abs=1.4120374917984009, max_abs=9.25, mean_rel=0.16045832633972168, max_rel=2603.421142578125, norm_rel=0.02468213625252247, ref_abs_avg=57.5794677734375, test_abs_avg=57.58450698852539
production_forward grad[7] vs paper_forward: mean_abs=1.3080146312713623, max_abs=9.25, mean_rel=0.38556838035583496, max_rel=4312.5, norm_rel=0.022898482158780098, ref_abs_avg=57.41876220703125, test_abs_avg=57.41958999633789
production_forward grad[8] vs paper_forward: mean_abs=1.0034699440002441, max_abs=3.75, mean_rel=0.1481683850288391, max_rel=17.353567123413086, norm_rel=0.02437232993543148, ref_abs_avg=42.60327911376953, test_abs_avg=42.56932830810547
production_forward grad[9] vs paper_forward: mean_abs=1.287189245223999, max_abs=8.5, mean_rel=0.16012424230575562, max_rel=1603.9278564453125, norm_rel=0.024373959749937057, ref_abs_avg=53.116214752197266, test_abs_avg=53.12090301513672
production_forward grad[10] vs paper_forward: mean_abs=1.1975177526474, max_abs=7.75, mean_rel=0.38301897048950195, max_rel=3437.499755859375, norm_rel=0.023109814152121544, ref_abs_avg=52.12453842163086, test_abs_avg=52.1235466003418
production_forward grad[11] vs paper_forward: mean_abs=0.907231330871582, max_abs=3.78125, mean_rel=0.08808168768882751, max_rel=8.2679443359375, norm_rel=0.021973775699734688, ref_abs_avg=41.1490364074707, test_abs_avg=41.17754364013672
production_forward grad[12] vs paper_forward: mean_abs=1.1793371438980103, max_abs=8.0, mean_rel=0.1642647087574005, max_rel=1039.94677734375, norm_rel=0.024236397817730904, ref_abs_avg=48.96343994140625, test_abs_avg=48.96625900268555
production_forward grad[13] vs paper_forward: mean_abs=1.0884957313537598, max_abs=6.5, mean_rel=0.2901025414466858, max_rel=3156.249755859375, norm_rel=0.02251124195754528, ref_abs_avg=48.59226989746094, test_abs_avg=48.589115142822266
production_forward grad[14] vs paper_forward: mean_abs=0.8279032707214355, max_abs=3.625, mean_rel=0.07294351607561111, max_rel=3.3559610843658447, norm_rel=0.022267024964094162, ref_abs_avg=37.92662811279297, test_abs_avg=37.91217041015625
production_forward grad[15] vs paper_forward: mean_abs=1.0950309038162231, max_abs=7.5625, mean_rel=0.15685953199863434, max_rel=903.8271484375, norm_rel=0.024131430312991142, ref_abs_avg=45.61953353881836, test_abs_avg=45.623626708984375
production_forward grad[16] vs paper_forward: mean_abs=1.0069217681884766, max_abs=6.5, mean_rel=0.29262790083885193, max_rel=2812.499755859375, norm_rel=0.02239544875919819, ref_abs_avg=45.17500305175781, test_abs_avg=45.17671585083008
production_forward grad[17] vs paper_forward: mean_abs=0.7918853759765625, max_abs=3.5, mean_rel=0.06583340466022491, max_rel=7.408024787902832, norm_rel=0.021290164440870285, ref_abs_avg=37.738929748535156, test_abs_avg=37.804054260253906
production_forward grad[18] vs paper_forward: mean_abs=1.0262746810913086, max_abs=7.0, mean_rel=0.1735491305589676, max_rel=4167.6318359375, norm_rel=0.023829707875847816, ref_abs_avg=43.32691955566406, test_abs_avg=43.33195495605469
production_forward grad[19] vs paper_forward: mean_abs=0.941179096698761, max_abs=5.9375, mean_rel=0.40342700481414795, max_rel=3374.999755859375, norm_rel=0.022141261026263237, ref_abs_avg=42.7127799987793, test_abs_avg=42.71599578857422
production_forward grad[20] vs paper_forward: mean_abs=0.7395579814910889, max_abs=2.875, mean_rel=0.1519288718700409, max_rel=19.311920166015625, norm_rel=0.0230121910572052, ref_abs_avg=32.465248107910156, test_abs_avg=32.48136901855469
production_forward grad[21] vs paper_forward: mean_abs=0.9662338495254517, max_abs=7.0, mean_rel=0.16212402284145355, max_rel=1082.1260986328125, norm_rel=0.023766234517097473, ref_abs_avg=40.8896598815918, test_abs_avg=40.88990020751953
production_forward grad[22] vs paper_forward: mean_abs=0.8912217617034912, max_abs=6.0, mean_rel=0.2796657085418701, max_rel=2171.875, norm_rel=0.021872080862522125, ref_abs_avg=40.97001647949219, test_abs_avg=40.96843719482422
production_forward grad[23] vs paper_forward: mean_abs=0.6719064712524414, max_abs=3.0, mean_rel=0.09664194285869598, max_rel=10.399513244628906, norm_rel=0.02129838429391384, ref_abs_avg=31.806474685668945, test_abs_avg=31.823871612548828
production_forward grad[24] vs paper_forward: mean_abs=0.9183919429779053, max_abs=6.0, mean_rel=0.16531498730182648, max_rel=2077.4130859375, norm_rel=0.02349177747964859, ref_abs_avg=39.27857208251953, test_abs_avg=39.279640197753906
production_forward grad[25] vs paper_forward: mean_abs=0.8462282419204712, max_abs=5.4375, mean_rel=0.3055301606655121, max_rel=3093.749755859375, norm_rel=0.022110603749752045, ref_abs_avg=38.50322723388672, test_abs_avg=38.50401306152344
production_forward grad[26] vs paper_forward: mean_abs=0.8248233795166016, max_abs=3.0, mean_rel=0.1732514649629593, max_rel=28.205385208129883, norm_rel=0.023232653737068176, ref_abs_avg=35.115325927734375, test_abs_avg=35.132965087890625
production_forward grad[27] vs paper_forward: mean_abs=1.0677096843719482, max_abs=7.6875, mean_rel=0.16516190767288208, max_rel=1217.74462890625, norm_rel=0.02570585161447525, ref_abs_avg=41.75147247314453, test_abs_avg=41.755287170410156
production_forward grad[28] vs paper_forward: mean_abs=0.9998958706855774, max_abs=7.0, mean_rel=0.27481991052627563, max_rel=2375.0, norm_rel=0.02420823648571968, ref_abs_avg=41.48081588745117, test_abs_avg=41.484840393066406
production_forward grad[29] vs paper_forward: mean_abs=0.7659802436828613, max_abs=3.1376953125, mean_rel=0.42547091841697693, max_rel=160.4574432373047, norm_rel=0.023867862299084663, ref_abs_avg=31.557735443115234, test_abs_avg=31.631202697753906
production_forward grad[30] vs paper_forward: mean_abs=0.9890298247337341, max_abs=6.5, mean_rel=0.1785326451063156, max_rel=1868.7593994140625, norm_rel=0.025895066559314728, ref_abs_avg=38.351966857910156, test_abs_avg=38.35226058959961
production_forward grad[31] vs paper_forward: mean_abs=0.922748327255249, max_abs=6.0, mean_rel=0.3464096784591675, max_rel=3374.999755859375, norm_rel=0.024428237229585648, ref_abs_avg=37.93696594238281, test_abs_avg=37.941261291503906
production_forward grad[32] vs paper_forward: mean_abs=0.7500106692314148, max_abs=3.125, mean_rel=0.10302083194255829, max_rel=5.360263347625732, norm_rel=0.025067314505577087, ref_abs_avg=29.431020736694336, test_abs_avg=29.474708557128906
production_forward grad[33] vs paper_forward: mean_abs=0.9259103536605835, max_abs=6.5, mean_rel=0.17665190994739532, max_rel=1734.442626953125, norm_rel=0.025827556848526, ref_abs_avg=35.99446487426758, test_abs_avg=35.99281311035156
production_forward grad[34] vs paper_forward: mean_abs=0.8529282212257385, max_abs=5.4375, mean_rel=0.33477187156677246, max_rel=2749.999755859375, norm_rel=0.024220049381256104, ref_abs_avg=35.380775451660156, test_abs_avg=35.38665771484375
production_forward grad[35] vs paper_forward: mean_abs=0.6536149978637695, max_abs=2.5, mean_rel=0.0932028591632843, max_rel=4.314523696899414, norm_rel=0.02392423525452614, ref_abs_avg=28.047130584716797, test_abs_avg=28.11555290222168
production_forward grad[36] vs paper_forward: mean_abs=0.8576577305793762, max_abs=6.0, mean_rel=0.1637624055147171, max_rel=795.3255615234375, norm_rel=0.025487681850790977, ref_abs_avg=33.79550552368164, test_abs_avg=33.79491424560547
production_forward grad[37] vs paper_forward: mean_abs=0.8033301830291748, max_abs=5.0, mean_rel=0.2608124911785126, max_rel=2500.0, norm_rel=0.024089371785521507, ref_abs_avg=33.41982650756836, test_abs_avg=33.42304611206055
production_forward grad[38] vs paper_forward: mean_abs=0.6237435340881348, max_abs=2.25, mean_rel=0.10287152975797653, max_rel=5.175599098205566, norm_rel=0.02312571369111538, ref_abs_avg=27.500953674316406, test_abs_avg=27.45136070251465
production_forward grad[39] vs paper_forward: mean_abs=0.808901309967041, max_abs=5.0, mean_rel=0.17809543013572693, max_rel=2373.635498046875, norm_rel=0.025140516459941864, ref_abs_avg=32.27251434326172, test_abs_avg=32.27400588989258
production_forward grad[40] vs paper_forward: mean_abs=0.750271201133728, max_abs=4.5, mean_rel=0.3289031684398651, max_rel=2500.0, norm_rel=0.023685969412326813, ref_abs_avg=31.763946533203125, test_abs_avg=31.769718170166016
production_forward grad[41] vs paper_forward: mean_abs=0.6458759307861328, max_abs=2.4453125, mean_rel=0.12738294899463654, max_rel=8.171303749084473, norm_rel=0.02599167823791504, ref_abs_avg=24.72956657409668, test_abs_avg=24.77192497253418
production_forward grad[42] vs paper_forward: mean_abs=0.7628855109214783, max_abs=5.5, mean_rel=0.17937731742858887, max_rel=1719.5611572265625, norm_rel=0.025038596242666245, ref_abs_avg=30.550914764404297, test_abs_avg=30.552101135253906
production_forward grad[43] vs paper_forward: mean_abs=0.7153111696243286, max_abs=4.5, mean_rel=0.2782260775566101, max_rel=2234.375, norm_rel=0.023581383749842644, ref_abs_avg=30.40683364868164, test_abs_avg=30.406757354736328
production_forward grad[44] vs paper_forward: mean_abs=0.574653148651123, max_abs=2.5, mean_rel=0.3147905468940735, max_rel=50.03468322753906, norm_rel=0.024008339270949364, ref_abs_avg=24.181915283203125, test_abs_avg=24.191640853881836
production_forward grad[45] vs paper_forward: mean_abs=0.730134904384613, max_abs=5.5, mean_rel=0.15414640307426453, max_rel=1033.6749267578125, norm_rel=0.024752043187618256, ref_abs_avg=29.594470977783203, test_abs_avg=29.595300674438477
production_forward grad[46] vs paper_forward: mean_abs=0.6796993017196655, max_abs=5.0, mean_rel=0.27803555130958557, max_rel=1843.7498779296875, norm_rel=0.023323779925704002, ref_abs_avg=29.2280216217041, test_abs_avg=29.23041534423828
production_forward grad[47] vs paper_forward: mean_abs=0.5757877826690674, max_abs=2.25, mean_rel=0.1045951098203659, max_rel=4.883205890655518, norm_rel=0.02459791861474514, ref_abs_avg=23.448001861572266, test_abs_avg=23.49013900756836
production_forward grad[48] vs paper_forward: mean_abs=0.6990542411804199, max_abs=6.0, mean_rel=0.16993176937103271, max_rel=1712.8021240234375, norm_rel=0.02435479499399662, ref_abs_avg=28.761096954345703, test_abs_avg=28.760587692260742
production_forward grad[49] vs paper_forward: mean_abs=0.6479822397232056, max_abs=4.75, mean_rel=0.2795029878616333, max_rel=2078.125, norm_rel=0.022719992324709892, ref_abs_avg=28.517372131347656, test_abs_avg=28.519601821899414
production_forward grad[50] vs paper_forward: mean_abs=0.6081315279006958, max_abs=2.5, mean_rel=0.5474380254745483, max_rel=207.1383056640625, norm_rel=0.024968363344669342, ref_abs_avg=24.127334594726562, test_abs_avg=24.113388061523438
production_forward grad[51] vs paper_forward: mean_abs=0.7707498073577881, max_abs=6.0, mean_rel=0.16085189580917358, max_rel=1181.167236328125, norm_rel=0.02613847702741623, ref_abs_avg=29.569416046142578, test_abs_avg=29.569061279296875
production_forward grad[52] vs paper_forward: mean_abs=0.7196826338768005, max_abs=4.75, mean_rel=0.25657591223716736, max_rel=1999.9998779296875, norm_rel=0.024543724954128265, ref_abs_avg=29.39780616760254, test_abs_avg=29.403764724731445
production_forward grad[53] vs paper_forward: mean_abs=0.5877223014831543, max_abs=2.375, mean_rel=0.1438458412885666, max_rel=18.577865600585938, norm_rel=0.0255039744079113, ref_abs_avg=23.39666748046875, test_abs_avg=23.373546600341797
production_forward grad[54] vs paper_forward: mean_abs=0.712722897529602, max_abs=5.0, mean_rel=0.16369670629501343, max_rel=994.9840698242188, norm_rel=0.025783734396100044, ref_abs_avg=27.732595443725586, test_abs_avg=27.73265266418457
production_forward grad[55] vs paper_forward: mean_abs=0.664456844329834, max_abs=4.75, mean_rel=0.2620549499988556, max_rel=2125.0, norm_rel=0.02396542951464653, ref_abs_avg=27.74726104736328, test_abs_avg=27.74752426147461
production_forward grad[56] vs paper_forward: mean_abs=0.5463593006134033, max_abs=2.125, mean_rel=0.230149507522583, max_rel=44.75147247314453, norm_rel=0.025606542825698853, ref_abs_avg=21.358078002929688, test_abs_avg=21.3231201171875
production_forward grad[57] vs paper_forward: mean_abs=0.6618894338607788, max_abs=5.25, mean_rel=0.16776356101036072, max_rel=1021.3348388671875, norm_rel=0.025339791551232338, ref_abs_avg=26.198177337646484, test_abs_avg=26.197120666503906
production_forward grad[58] vs paper_forward: mean_abs=0.6194117069244385, max_abs=4.5, mean_rel=0.22573065757751465, max_rel=1749.9998779296875, norm_rel=0.02384856529533863, ref_abs_avg=26.066823959350586, test_abs_avg=26.067407608032227
production_forward grad[59] vs paper_forward: mean_abs=0.5139098167419434, max_abs=2.25, mean_rel=0.07192246615886688, max_rel=1.4968658685684204, norm_rel=0.02486862987279892, ref_abs_avg=20.552642822265625, test_abs_avg=20.52869987487793
production_forward grad[60] vs paper_forward: mean_abs=0.6301922798156738, max_abs=5.5, mean_rel=0.16995307803153992, max_rel=888.4699096679688, norm_rel=0.024978823959827423, ref_abs_avg=25.25185203552246, test_abs_avg=25.253021240234375
production_forward grad[61] vs paper_forward: mean_abs=0.5780270099639893, max_abs=4.0, mean_rel=0.21534931659698486, max_rel=1749.9998779296875, norm_rel=0.023549804463982582, ref_abs_avg=24.592565536499023, test_abs_avg=24.596839904785156
production_forward grad[62] vs paper_forward: mean_abs=0.46791720390319824, max_abs=1.75, mean_rel=0.17340387403964996, max_rel=24.519250869750977, norm_rel=0.023087264969944954, ref_abs_avg=20.188934326171875, test_abs_avg=20.211334228515625
production_forward grad[63] vs paper_forward: mean_abs=0.5931323766708374, max_abs=5.0, mean_rel=0.1478368192911148, max_rel=618.027099609375, norm_rel=0.024379417300224304, ref_abs_avg=24.389375686645508, test_abs_avg=24.390506744384766
production_forward grad[64] vs paper_forward: mean_abs=0.551316499710083, max_abs=4.0, mean_rel=0.22865308821201324, max_rel=1734.3748779296875, norm_rel=0.022899234667420387, ref_abs_avg=24.027902603149414, test_abs_avg=24.031940460205078
production_forward grad[65] vs paper_forward: mean_abs=0.4511408805847168, max_abs=1.875, mean_rel=0.16225680708885193, max_rel=23.132152557373047, norm_rel=0.0237092487514019, ref_abs_avg=18.580955505371094, test_abs_avg=18.581783294677734
production_forward grad[66] vs paper_forward: mean_abs=0.5650010108947754, max_abs=4.0, mean_rel=0.1548420786857605, max_rel=828.095458984375, norm_rel=0.024183064699172974, ref_abs_avg=23.41526222229004, test_abs_avg=23.417654037475586
production_forward grad[67] vs paper_forward: mean_abs=0.5186716914176941, max_abs=3.75, mean_rel=0.23899243772029877, max_rel=1562.4998779296875, norm_rel=0.02217726968228817, ref_abs_avg=23.285140991210938, test_abs_avg=23.29094696044922
production_forward grad[68] vs paper_forward: mean_abs=0.3883100748062134, max_abs=1.625, mean_rel=0.145260751247406, max_rel=31.258968353271484, norm_rel=0.021070696413517, ref_abs_avg=18.744577407836914, test_abs_avg=18.754858016967773
production_forward grad[69] vs paper_forward: mean_abs=0.5294244289398193, max_abs=5.0, mean_rel=0.15424327552318573, max_rel=742.0400390625, norm_rel=0.023806657642126083, ref_abs_avg=22.28221893310547, test_abs_avg=22.28173065185547
production_forward grad[70] vs paper_forward: mean_abs=0.4934520423412323, max_abs=3.5, mean_rel=0.20440439879894257, max_rel=1624.9998779296875, norm_rel=0.02217045985162258, ref_abs_avg=22.28622055053711, test_abs_avg=22.28314208984375
production_forward grad[71] vs paper_forward: mean_abs=0.3843865394592285, max_abs=1.375, mean_rel=0.16967050731182098, max_rel=12.041082382202148, norm_rel=0.02083478681743145, ref_abs_avg=18.34326934814453, test_abs_avg=18.377458572387695
production_forward grad[72] vs paper_forward: mean_abs=0.5174693465232849, max_abs=5.0, mean_rel=0.13990318775177002, max_rel=801.9118041992188, norm_rel=0.02325778640806675, ref_abs_avg=22.26012420654297, test_abs_avg=22.26009178161621
production_forward grad[73] vs paper_forward: mean_abs=0.47291505336761475, max_abs=3.5, mean_rel=0.2158472090959549, max_rel=1468.7498779296875, norm_rel=0.02158251777291298, ref_abs_avg=21.86115837097168, test_abs_avg=21.857669830322266
production_forward grad[74] vs paper_forward: mean_abs=0.4618723392486572, max_abs=2.03125, mean_rel=0.09524643421173096, max_rel=3.9789721965789795, norm_rel=0.024474218487739563, ref_abs_avg=18.89735984802246, test_abs_avg=18.867509841918945
production_forward grad[75] vs paper_forward: mean_abs=0.5812089443206787, max_abs=4.75, mean_rel=0.16018983721733093, max_rel=1044.6929931640625, norm_rel=0.02485373243689537, ref_abs_avg=23.412639617919922, test_abs_avg=23.412052154541016
production_forward grad[76] vs paper_forward: mean_abs=0.528212308883667, max_abs=4.5, mean_rel=0.21254369616508484, max_rel=1687.4998779296875, norm_rel=0.023020287975668907, ref_abs_avg=22.919174194335938, test_abs_avg=22.923416137695312
production_forward grad[77] vs paper_forward: mean_abs=0.4134082794189453, max_abs=1.5, mean_rel=0.11649829894304276, max_rel=15.701287269592285, norm_rel=0.023854315280914307, ref_abs_avg=17.54078483581543, test_abs_avg=17.510997772216797
production_forward grad[78] vs paper_forward: mean_abs=0.5239824056625366, max_abs=5.0, mean_rel=0.1401059776544571, max_rel=529.5013427734375, norm_rel=0.024221263825893402, ref_abs_avg=21.674060821533203, test_abs_avg=21.674701690673828
production_forward grad[79] vs paper_forward: mean_abs=0.4904981255531311, max_abs=4.125, mean_rel=0.20304451882839203, max_rel=1523.4373779296875, norm_rel=0.02253003604710102, ref_abs_avg=21.641521453857422, test_abs_avg=21.64173698425293
production_forward grad[80] vs paper_forward: mean_abs=0.363189697265625, max_abs=1.3125, mean_rel=0.09979163110256195, max_rel=13.822729110717773, norm_rel=0.0203288234770298, ref_abs_avg=17.707599639892578, test_abs_avg=17.717025756835938
production_forward grad[81] vs paper_forward: mean_abs=0.4821624755859375, max_abs=5.0, mean_rel=0.14388759434223175, max_rel=628.8514404296875, norm_rel=0.023653987795114517, ref_abs_avg=20.43639373779297, test_abs_avg=20.4361629486084
production_forward grad[82] vs paper_forward: mean_abs=0.4493248164653778, max_abs=4.0, mean_rel=0.21614563465118408, max_rel=1468.7498779296875, norm_rel=0.021973583847284317, ref_abs_avg=20.51148223876953, test_abs_avg=20.512195587158203
production_forward grad[83] vs paper_forward: mean_abs=0.364168643951416, max_abs=1.5, mean_rel=0.31320920586586, max_rel=76.78132629394531, norm_rel=0.02219315990805626, ref_abs_avg=16.28215789794922, test_abs_avg=16.269325256347656
production_forward grad[84] vs paper_forward: mean_abs=0.45966169238090515, max_abs=5.0, mean_rel=0.1403358429670334, max_rel=1130.1737060546875, norm_rel=0.022869205102324486, ref_abs_avg=20.17764663696289, test_abs_avg=20.178892135620117
production_forward grad[85] vs paper_forward: mean_abs=0.4172593951225281, max_abs=4.0, mean_rel=0.22294580936431885, max_rel=1406.2498779296875, norm_rel=0.021102426573634148, ref_abs_avg=19.862648010253906, test_abs_avg=19.852859497070312
production_forward grad[86] vs paper_forward: mean_abs=0.35504794120788574, max_abs=1.46875, mean_rel=0.10178224742412567, max_rel=8.482511520385742, norm_rel=0.022084373980760574, ref_abs_avg=15.965728759765625, test_abs_avg=15.998064994812012
production_forward grad[87] vs paper_forward: mean_abs=0.42718496918678284, max_abs=4.0, mean_rel=0.14047959446907043, max_rel=841.040771484375, norm_rel=0.022616736590862274, ref_abs_avg=19.0079345703125, test_abs_avg=19.009654998779297
production_forward grad[88] vs paper_forward: mean_abs=0.39471209049224854, max_abs=4.65625, mean_rel=0.18761684000492096, max_rel=1250.0, norm_rel=0.02054382488131523, ref_abs_avg=19.241615295410156, test_abs_avg=19.241470336914062
production_forward grad[89] vs paper_forward: mean_abs=0.31467390060424805, max_abs=1.4375, mean_rel=0.08618251234292984, max_rel=6.020353317260742, norm_rel=0.020597359165549278, ref_abs_avg=15.03955078125, test_abs_avg=14.988563537597656
production_forward grad[90] vs paper_forward: mean_abs=0.40198570489883423, max_abs=4.0, mean_rel=0.13107696175575256, max_rel=483.8177490234375, norm_rel=0.022057678550481796, ref_abs_avg=18.346031188964844, test_abs_avg=18.346725463867188
production_forward grad[91] vs paper_forward: mean_abs=0.3747498393058777, max_abs=3.75, mean_rel=0.21457479894161224, max_rel=1218.75, norm_rel=0.020453905686736107, ref_abs_avg=18.47541046142578, test_abs_avg=18.47509765625
production_forward grad[92] vs paper_forward: mean_abs=0.29924190044403076, max_abs=1.125, mean_rel=0.1157994493842125, max_rel=12.320401191711426, norm_rel=0.018893981352448463, ref_abs_avg=16.333789825439453, test_abs_avg=16.34463119506836
production_forward grad[93] vs paper_forward: mean_abs=0.38481730222702026, max_abs=4.0, mean_rel=0.1292169839143753, max_rel=1171.73193359375, norm_rel=0.02169407345354557, ref_abs_avg=17.946361541748047, test_abs_avg=17.94729232788086
production_forward grad[94] vs paper_forward: mean_abs=0.35745769739151, max_abs=4.5, mean_rel=0.1723567545413971, max_rel=1640.6248779296875, norm_rel=0.020342370495200157, ref_abs_avg=17.91436195373535, test_abs_avg=17.91999053955078
production_forward grad[95] vs paper_forward: mean_abs=0.29990243911743164, max_abs=1.25, mean_rel=0.1340816617012024, max_rel=16.671106338500977, norm_rel=0.019957860931754112, ref_abs_avg=15.153791427612305, test_abs_avg=15.169294357299805
production_forward grad[96] vs paper_forward: mean_abs=0.3577744662761688, max_abs=4.6640625, mean_rel=0.12068364024162292, max_rel=559.4188232421875, norm_rel=0.020974954590201378, ref_abs_avg=17.363662719726562, test_abs_avg=17.36322784423828
production_forward grad[97] vs paper_forward: mean_abs=0.3191410303115845, max_abs=3.5, mean_rel=0.15930865705013275, max_rel=999.9999389648438, norm_rel=0.018380651250481606, ref_abs_avg=17.518436431884766, test_abs_avg=17.518383026123047
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0015983309131115675, max_abs=0.0546875
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008477482944726944, max_abs=0.501953125, mean_rel=0.07409079372882843, max_rel=137.9774169921875, norm_rel=0.020299335941672325, ref_abs_avg=0.45100098848342896, test_abs_avg=0.4509992003440857
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.199189186096191, max_abs=56.0, mean_rel=0.25906410813331604, max_rel=1267.8375244140625, norm_rel=0.020412912592291832, ref_abs_avg=312.2709655761719, test_abs_avg=312.2421875
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.242340326309204, max_abs=5.0, mean_rel=0.40086284279823303, max_rel=133.33143615722656, norm_rel=0.02246461808681488, ref_abs_avg=55.33161163330078, test_abs_avg=55.389076232910156
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6243844032287598, max_abs=10.0, mean_rel=0.17533622682094574, max_rel=2610.384521484375, norm_rel=0.025043658912181854, ref_abs_avg=65.29249572753906, test_abs_avg=65.29592895507812
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5020784139633179, max_abs=10.0, mean_rel=0.533743679523468, max_rel=5093.75, norm_rel=0.02346144989132881, ref_abs_avg=64.38485717773438, test_abs_avg=64.38914489746094
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.0509982109069824, max_abs=4.25, mean_rel=0.08166996389627457, max_rel=3.0606539249420166, norm_rel=0.02357889898121357, ref_abs_avg=44.46185302734375, test_abs_avg=44.42535400390625
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4156296253204346, max_abs=11.0, mean_rel=0.15987950563430786, max_rel=2970.069580078125, norm_rel=0.02474295161664486, ref_abs_avg=57.5794677734375, test_abs_avg=57.5843505859375
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3159570693969727, max_abs=10.5, mean_rel=0.3733102083206177, max_rel=4187.5, norm_rel=0.023032404482364655, ref_abs_avg=57.41876220703125, test_abs_avg=57.41828155517578
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0645418167114258, max_abs=4.5, mean_rel=0.1371508240699768, max_rel=18.74046516418457, norm_rel=0.025580130517482758, ref_abs_avg=42.60327911376953, test_abs_avg=42.62757110595703
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2911267280578613, max_abs=9.0, mean_rel=0.1613660752773285, max_rel=1369.723388671875, norm_rel=0.024446770548820496, ref_abs_avg=53.116214752197266, test_abs_avg=53.11846923828125
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.1986589431762695, max_abs=7.5, mean_rel=0.34585902094841003, max_rel=2500.0, norm_rel=0.023118557408452034, ref_abs_avg=52.12453842163086, test_abs_avg=52.122352600097656
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.8921060562133789, max_abs=3.6875, mean_rel=0.09181956946849823, max_rel=11.521963119506836, norm_rel=0.021976526826620102, ref_abs_avg=41.1490364074707, test_abs_avg=41.14268112182617
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1846493482589722, max_abs=8.0, mean_rel=0.16847893595695496, max_rel=1793.7862548828125, norm_rel=0.0243388619273901, ref_abs_avg=48.96343994140625, test_abs_avg=48.96306228637695
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.0953996181488037, max_abs=7.0, mean_rel=0.30098992586135864, max_rel=3249.999755859375, norm_rel=0.022638626396656036, ref_abs_avg=48.59226989746094, test_abs_avg=48.58863830566406
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8291530609130859, max_abs=3.0, mean_rel=0.06945076584815979, max_rel=2.1637816429138184, norm_rel=0.022509997710585594, ref_abs_avg=37.92662811279297, test_abs_avg=37.941741943359375
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0999860763549805, max_abs=7.5, mean_rel=0.1530110239982605, max_rel=819.6793823242188, norm_rel=0.024228692054748535, ref_abs_avg=45.61953353881836, test_abs_avg=45.624610900878906
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0132510662078857, max_abs=6.0, mean_rel=0.33832311630249023, max_rel=2999.999755859375, norm_rel=0.02254531718790531, ref_abs_avg=45.17500305175781, test_abs_avg=45.17467498779297
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.7785177230834961, max_abs=3.5, mean_rel=0.07024715095758438, max_rel=9.312604904174805, norm_rel=0.020823849365115166, ref_abs_avg=37.738929748535156, test_abs_avg=37.80708312988281
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0324863195419312, max_abs=7.0, mean_rel=0.17177066206932068, max_rel=4416.4453125, norm_rel=0.023960283026099205, ref_abs_avg=43.32691955566406, test_abs_avg=43.33131790161133
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9484407901763916, max_abs=6.0, mean_rel=0.40229928493499756, max_rel=2999.999755859375, norm_rel=0.02231864258646965, ref_abs_avg=42.7127799987793, test_abs_avg=42.71154022216797
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7557728290557861, max_abs=3.0, mean_rel=0.15633761882781982, max_rel=21.60498809814453, norm_rel=0.023172030225396156, ref_abs_avg=32.465248107910156, test_abs_avg=32.46697998046875
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9710155725479126, max_abs=6.625, mean_rel=0.1693229228258133, max_rel=1107.332275390625, norm_rel=0.02390451356768608, ref_abs_avg=40.8896598815918, test_abs_avg=40.88948440551758
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.8976106643676758, max_abs=5.625, mean_rel=0.2836398482322693, max_rel=3328.124755859375, norm_rel=0.022020868957042694, ref_abs_avg=40.97001647949219, test_abs_avg=40.97133255004883
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.682098388671875, max_abs=3.0, mean_rel=0.10248887538909912, max_rel=6.874175071716309, norm_rel=0.021525781601667404, ref_abs_avg=31.806474685668945, test_abs_avg=31.810800552368164
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9237200021743774, max_abs=8.0, mean_rel=0.16862067580223083, max_rel=2356.467529296875, norm_rel=0.023647436872124672, ref_abs_avg=39.27857208251953, test_abs_avg=39.27841567993164
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8492891788482666, max_abs=5.75, mean_rel=0.3119468688964844, max_rel=2562.5, norm_rel=0.022188808768987656, ref_abs_avg=38.50322723388672, test_abs_avg=38.500709533691406
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8101806640625, max_abs=3.0, mean_rel=0.15878266096115112, max_rel=32.736907958984375, norm_rel=0.022808024659752846, ref_abs_avg=35.115325927734375, test_abs_avg=35.116554260253906
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.068639874458313, max_abs=7.0625, mean_rel=0.1682216227054596, max_rel=1345.9580078125, norm_rel=0.025740593671798706, ref_abs_avg=41.75147247314453, test_abs_avg=41.75273132324219
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0018630027770996, max_abs=6.5, mean_rel=0.28112363815307617, max_rel=2031.2498779296875, norm_rel=0.024259215220808983, ref_abs_avg=41.48081588745117, test_abs_avg=41.48359680175781
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7539379596710205, max_abs=3.0, mean_rel=0.4199470281600952, max_rel=142.8785400390625, norm_rel=0.023328371345996857, ref_abs_avg=31.557735443115234, test_abs_avg=31.600271224975586
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9910195469856262, max_abs=7.0, mean_rel=0.17557795345783234, max_rel=3030.15380859375, norm_rel=0.0259673073887825, ref_abs_avg=38.351966857910156, test_abs_avg=38.35194396972656
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9238199591636658, max_abs=5.625, mean_rel=0.36221250891685486, max_rel=2812.499755859375, norm_rel=0.02445880137383938, ref_abs_avg=37.93696594238281, test_abs_avg=37.93830871582031
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7115981578826904, max_abs=2.5, mean_rel=0.0922422856092453, max_rel=4.037654399871826, norm_rel=0.024016616865992546, ref_abs_avg=29.431020736694336, test_abs_avg=29.454639434814453
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9276896715164185, max_abs=7.0, mean_rel=0.18012799322605133, max_rel=2036.138916015625, norm_rel=0.02589179202914238, ref_abs_avg=35.99446487426758, test_abs_avg=35.993534088134766
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.854401707649231, max_abs=5.875, mean_rel=0.3264598548412323, max_rel=2562.5, norm_rel=0.024249423295259476, ref_abs_avg=35.380775451660156, test_abs_avg=35.38653564453125
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6649230718612671, max_abs=2.75, mean_rel=0.0863913893699646, max_rel=4.034549713134766, norm_rel=0.023809975013136864, ref_abs_avg=28.047130584716797, test_abs_avg=28.09531021118164
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8610340356826782, max_abs=6.5, mean_rel=0.16886289417743683, max_rel=1530.02978515625, norm_rel=0.02557677961885929, ref_abs_avg=33.79550552368164, test_abs_avg=33.79364776611328
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8058628439903259, max_abs=5.3125, mean_rel=0.2667301595211029, max_rel=2812.499755859375, norm_rel=0.024161312729120255, ref_abs_avg=33.41982650756836, test_abs_avg=33.42012023925781
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6307926177978516, max_abs=2.53125, mean_rel=0.10332754254341125, max_rel=4.754042625427246, norm_rel=0.02387651801109314, ref_abs_avg=27.500953674316406, test_abs_avg=27.465251922607422
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.811423659324646, max_abs=6.25, mean_rel=0.17272982001304626, max_rel=1795.240234375, norm_rel=0.02522827684879303, ref_abs_avg=32.27251434326172, test_abs_avg=32.27393341064453
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7539757490158081, max_abs=5.140625, mean_rel=0.34165021777153015, max_rel=2593.749755859375, norm_rel=0.02380829118192196, ref_abs_avg=31.763946533203125, test_abs_avg=31.768394470214844
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6508970260620117, max_abs=2.296875, mean_rel=0.11229556798934937, max_rel=4.507686138153076, norm_rel=0.02572379820048809, ref_abs_avg=24.72956657409668, test_abs_avg=24.762733459472656
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7646304368972778, max_abs=5.5, mean_rel=0.18096008896827698, max_rel=1551.0543212890625, norm_rel=0.025116287171840668, ref_abs_avg=30.550914764404297, test_abs_avg=30.551036834716797
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.716488242149353, max_abs=4.5, mean_rel=0.26499465107917786, max_rel=1999.9998779296875, norm_rel=0.023626847192645073, ref_abs_avg=30.40683364868164, test_abs_avg=30.406414031982422
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5705552101135254, max_abs=2.5, mean_rel=0.29959267377853394, max_rel=46.59939193725586, norm_rel=0.0239525455981493, ref_abs_avg=24.181915283203125, test_abs_avg=24.195499420166016
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7318066954612732, max_abs=5.5, mean_rel=0.1552869826555252, max_rel=1144.419921875, norm_rel=0.02479551173746586, ref_abs_avg=29.594470977783203, test_abs_avg=29.594337463378906
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.6822417974472046, max_abs=5.0, mean_rel=0.2853631377220154, max_rel=1937.4998779296875, norm_rel=0.023419510573148727, ref_abs_avg=29.2280216217041, test_abs_avg=29.22847557067871
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5712385177612305, max_abs=2.625, mean_rel=0.10287810862064362, max_rel=6.781830310821533, norm_rel=0.02424987591803074, ref_abs_avg=23.448001861572266, test_abs_avg=23.485567092895508
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6999367475509644, max_abs=5.0, mean_rel=0.17341451346874237, max_rel=2083.488037109375, norm_rel=0.02440427616238594, ref_abs_avg=28.761096954345703, test_abs_avg=28.760622024536133
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.649553656578064, max_abs=4.25, mean_rel=0.27991238236427307, max_rel=2312.5, norm_rel=0.022787852212786674, ref_abs_avg=28.517372131347656, test_abs_avg=28.519874572753906
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6073004007339478, max_abs=2.75, mean_rel=0.3912683129310608, max_rel=121.36466979980469, norm_rel=0.024854831397533417, ref_abs_avg=24.127334594726562, test_abs_avg=24.11663246154785
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7704238891601562, max_abs=6.0, mean_rel=0.16177287697792053, max_rel=960.9778442382812, norm_rel=0.026131030172109604, ref_abs_avg=29.569416046142578, test_abs_avg=29.567222595214844
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7178304195404053, max_abs=4.75, mean_rel=0.27439427375793457, max_rel=3499.999755859375, norm_rel=0.024465566501021385, ref_abs_avg=29.39780616760254, test_abs_avg=29.4005069732666
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.6021819114685059, max_abs=2.125, mean_rel=0.13354089856147766, max_rel=11.89342975616455, norm_rel=0.025696856901049614, ref_abs_avg=23.39666748046875, test_abs_avg=23.389951705932617
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7131228446960449, max_abs=5.75, mean_rel=0.16325972974300385, max_rel=908.0049438476562, norm_rel=0.025798557326197624, ref_abs_avg=27.732595443725586, test_abs_avg=27.73225212097168
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6640032529830933, max_abs=4.625, mean_rel=0.2614946663379669, max_rel=2375.0, norm_rel=0.023941002786159515, ref_abs_avg=27.74726104736328, test_abs_avg=27.750492095947266
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5529415607452393, max_abs=2.125, mean_rel=0.2363511621952057, max_rel=49.527950286865234, norm_rel=0.025844961404800415, ref_abs_avg=21.358078002929688, test_abs_avg=21.34728240966797
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6636087894439697, max_abs=4.5, mean_rel=0.16630958020687103, max_rel=1122.587158203125, norm_rel=0.025390593335032463, ref_abs_avg=26.198177337646484, test_abs_avg=26.197010040283203
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6198467016220093, max_abs=4.25, mean_rel=0.22353196144104004, max_rel=1624.9998779296875, norm_rel=0.023862039670348167, ref_abs_avg=26.066823959350586, test_abs_avg=26.070289611816406
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.4899749755859375, max_abs=2.0, mean_rel=0.07177960872650146, max_rel=4.104687213897705, norm_rel=0.023563072085380554, ref_abs_avg=20.552642822265625, test_abs_avg=20.539024353027344
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6307671070098877, max_abs=5.5, mean_rel=0.16990482807159424, max_rel=1078.896484375, norm_rel=0.025023920461535454, ref_abs_avg=25.25185203552246, test_abs_avg=25.252471923828125
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5835252404212952, max_abs=4.0, mean_rel=0.2339276671409607, max_rel=1570.3123779296875, norm_rel=0.02377774566411972, ref_abs_avg=24.592565536499023, test_abs_avg=24.592689514160156
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4603005647659302, max_abs=1.875, mean_rel=0.1886487603187561, max_rel=29.712158203125, norm_rel=0.02287081629037857, ref_abs_avg=20.188934326171875, test_abs_avg=20.21749496459961
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.595095157623291, max_abs=6.0, mean_rel=0.15260049700737, max_rel=813.3819580078125, norm_rel=0.024449173361063004, ref_abs_avg=24.389375686645508, test_abs_avg=24.38930320739746
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5532103180885315, max_abs=4.0, mean_rel=0.2231982797384262, max_rel=1820.3123779296875, norm_rel=0.02300444431602955, ref_abs_avg=24.027902603149414, test_abs_avg=24.034330368041992
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4603743553161621, max_abs=1.6875, mean_rel=0.16070868074893951, max_rel=19.029367446899414, norm_rel=0.024297364056110382, ref_abs_avg=18.580955505371094, test_abs_avg=18.58233070373535
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5672374367713928, max_abs=5.0, mean_rel=0.15405838191509247, max_rel=680.4285888671875, norm_rel=0.02426913008093834, ref_abs_avg=23.41526222229004, test_abs_avg=23.416929244995117
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5182039737701416, max_abs=4.0, mean_rel=0.23644891381263733, max_rel=1406.2498779296875, norm_rel=0.022135382518172264, ref_abs_avg=23.285140991210938, test_abs_avg=23.286666870117188
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.3825265169143677, max_abs=1.8203125, mean_rel=0.14187559485435486, max_rel=24.60110092163086, norm_rel=0.020671354606747627, ref_abs_avg=18.744577407836914, test_abs_avg=18.760480880737305
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5313758850097656, max_abs=4.5, mean_rel=0.15409299731254578, max_rel=1198.570556640625, norm_rel=0.02389201521873474, ref_abs_avg=22.28221893310547, test_abs_avg=22.282466888427734
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.49541884660720825, max_abs=3.375, mean_rel=0.20125727355480194, max_rel=1656.2498779296875, norm_rel=0.02226671390235424, ref_abs_avg=22.28622055053711, test_abs_avg=22.284229278564453
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.40606069564819336, max_abs=1.4375, mean_rel=0.1465485394001007, max_rel=11.786312103271484, norm_rel=0.021899983286857605, ref_abs_avg=18.34326934814453, test_abs_avg=18.366003036499023
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5182843208312988, max_abs=4.0, mean_rel=0.13907533884048462, max_rel=893.8088989257812, norm_rel=0.023308835923671722, ref_abs_avg=22.26012420654297, test_abs_avg=22.259681701660156
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4728015661239624, max_abs=4.0, mean_rel=0.21176862716674805, max_rel=1359.3748779296875, norm_rel=0.02156902104616165, ref_abs_avg=21.86115837097168, test_abs_avg=21.860193252563477
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4401000738143921, max_abs=1.75, mean_rel=0.11110709607601166, max_rel=12.718024253845215, norm_rel=0.023678191006183624, ref_abs_avg=18.89735984802246, test_abs_avg=18.879474639892578
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.579900324344635, max_abs=5.0, mean_rel=0.15621396899223328, max_rel=691.0672607421875, norm_rel=0.024806587025523186, ref_abs_avg=23.412639617919922, test_abs_avg=23.411115646362305
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5323617458343506, max_abs=4.25, mean_rel=0.2172369360923767, max_rel=1718.7498779296875, norm_rel=0.023201962932944298, ref_abs_avg=22.919174194335938, test_abs_avg=22.920211791992188
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4045710563659668, max_abs=1.625, mean_rel=0.10690613090991974, max_rel=12.594318389892578, norm_rel=0.023667193949222565, ref_abs_avg=17.54078483581543, test_abs_avg=17.519224166870117
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5234715938568115, max_abs=4.5, mean_rel=0.1413121074438095, max_rel=516.3792724609375, norm_rel=0.024201663210988045, ref_abs_avg=21.674060821533203, test_abs_avg=21.674104690551758
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.4883772134780884, max_abs=4.5, mean_rel=0.20164787769317627, max_rel=1414.0623779296875, norm_rel=0.022454816848039627, ref_abs_avg=21.641521453857422, test_abs_avg=21.63943099975586
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.38188719749450684, max_abs=1.5, mean_rel=0.10668712854385376, max_rel=15.779176712036133, norm_rel=0.021004673093557358, ref_abs_avg=17.707599639892578, test_abs_avg=17.715322494506836
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.48284411430358887, max_abs=5.0, mean_rel=0.1448296755552292, max_rel=649.746337890625, norm_rel=0.023700278252363205, ref_abs_avg=20.43639373779297, test_abs_avg=20.43595314025879
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4480682909488678, max_abs=4.0, mean_rel=0.21892446279525757, max_rel=1406.2498779296875, norm_rel=0.02187332510948181, ref_abs_avg=20.51148223876953, test_abs_avg=20.513660430908203
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3631415367126465, max_abs=1.25, mean_rel=0.3011614978313446, max_rel=76.47419738769531, norm_rel=0.022250361740589142, ref_abs_avg=16.28215789794922, test_abs_avg=16.278322219848633
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.46084117889404297, max_abs=5.0, mean_rel=0.1432340443134308, max_rel=1397.692138671875, norm_rel=0.02291359193623066, ref_abs_avg=20.17764663696289, test_abs_avg=20.17851448059082
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.419483482837677, max_abs=3.75, mean_rel=0.21494191884994507, max_rel=1499.9998779296875, norm_rel=0.02119864523410797, ref_abs_avg=19.862648010253906, test_abs_avg=19.852108001708984
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3570718765258789, max_abs=1.375, mean_rel=0.1026352047920227, max_rel=9.530858993530273, norm_rel=0.022100400179624557, ref_abs_avg=15.965728759765625, test_abs_avg=16.00297737121582
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4278339147567749, max_abs=4.25, mean_rel=0.13930073380470276, max_rel=752.7387084960938, norm_rel=0.022655636072158813, ref_abs_avg=19.0079345703125, test_abs_avg=19.009572982788086
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.39900368452072144, max_abs=4.25, mean_rel=0.18849585950374603, max_rel=1421.8748779296875, norm_rel=0.020784009248018265, ref_abs_avg=19.241615295410156, test_abs_avg=19.244619369506836
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.30849123001098633, max_abs=1.3125, mean_rel=0.08970611542463303, max_rel=10.117888450622559, norm_rel=0.020403297618031502, ref_abs_avg=15.03955078125, test_abs_avg=14.988163948059082
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.4031005799770355, max_abs=4.0, mean_rel=0.1293395459651947, max_rel=486.7558288574219, norm_rel=0.022120347246527672, ref_abs_avg=18.346031188964844, test_abs_avg=18.34685516357422
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3778502345085144, max_abs=4.0, mean_rel=0.21255287528038025, max_rel=1125.0, norm_rel=0.020630626007914543, ref_abs_avg=18.47541046142578, test_abs_avg=18.4708251953125
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.3017768859863281, max_abs=1.25, mean_rel=0.09393994510173798, max_rel=13.312697410583496, norm_rel=0.01908181980252266, ref_abs_avg=16.333789825439453, test_abs_avg=16.347000122070312
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.38574132323265076, max_abs=3.5, mean_rel=0.130109503865242, max_rel=828.5684204101562, norm_rel=0.02174990624189377, ref_abs_avg=17.946361541748047, test_abs_avg=17.947349548339844
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.35908544063568115, max_abs=3.75, mean_rel=0.18314845860004425, max_rel=2109.375, norm_rel=0.02036111429333687, ref_abs_avg=17.91436195373535, test_abs_avg=17.920167922973633
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.30550599098205566, max_abs=1.03515625, mean_rel=0.11024690419435501, max_rel=16.75990104675293, norm_rel=0.020448768511414528, ref_abs_avg=15.153791427612305, test_abs_avg=15.158050537109375
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3582507371902466, max_abs=6.5, mean_rel=0.11863303184509277, max_rel=387.8876647949219, norm_rel=0.02099687047302723, ref_abs_avg=17.363662719726562, test_abs_avg=17.362367630004883
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3185705244541168, max_abs=3.5, mean_rel=0.16713941097259521, max_rel=1374.9998779296875, norm_rel=0.01835271157324314, ref_abs_avg=17.518436431884766, test_abs_avg=17.51852798461914
production_forward2 vs paper_forward output: mean_abs=0.001596258021891117, max_abs=0.0546875
production_forward2 grad[0] vs paper_forward: mean_abs=0.008466176688671112, max_abs=0.501953125, mean_rel=0.07394055277109146, max_rel=151.3214569091797, norm_rel=0.020265964791178703, ref_abs_avg=0.45100098848342896, test_abs_avg=0.45099937915802
production_forward2 grad[1] vs paper_forward: mean_abs=7.177000999450684, max_abs=62.0, mean_rel=0.15341059863567352, max_rel=176.82843017578125, norm_rel=0.020259663462638855, ref_abs_avg=312.2709655761719, test_abs_avg=312.2501220703125
production_forward2 grad[2] vs paper_forward: mean_abs=1.2347412109375, max_abs=5.75, mean_rel=0.2697150707244873, max_rel=55.95183563232422, norm_rel=0.022207040339708328, ref_abs_avg=55.33161163330078, test_abs_avg=55.47355651855469
production_forward2 grad[3] vs paper_forward: mean_abs=1.6201850175857544, max_abs=12.0, mean_rel=0.17177847027778625, max_rel=2174.910888671875, norm_rel=0.024979403242468834, ref_abs_avg=65.29249572753906, test_abs_avg=65.29847717285156
production_forward2 grad[4] vs paper_forward: mean_abs=1.4995936155319214, max_abs=10.375, mean_rel=0.5360546112060547, max_rel=4812.5, norm_rel=0.023399796336889267, ref_abs_avg=64.38485717773438, test_abs_avg=64.39146423339844
production_forward2 grad[5] vs paper_forward: mean_abs=1.083142638206482, max_abs=4.875, mean_rel=0.1416001170873642, max_rel=32.165348052978516, norm_rel=0.024282485246658325, ref_abs_avg=44.46185302734375, test_abs_avg=44.443511962890625
production_forward2 grad[6] vs paper_forward: mean_abs=1.415358304977417, max_abs=10.0, mean_rel=0.15786537528038025, max_rel=1894.567626953125, norm_rel=0.024740144610404968, ref_abs_avg=57.5794677734375, test_abs_avg=57.58562469482422
production_forward2 grad[7] vs paper_forward: mean_abs=1.3130173683166504, max_abs=9.75, mean_rel=0.37732526659965515, max_rel=4281.25, norm_rel=0.02300303429365158, ref_abs_avg=57.41876220703125, test_abs_avg=57.41975402832031
production_forward2 grad[8] vs paper_forward: mean_abs=1.0505077838897705, max_abs=5.0, mean_rel=0.15243078768253326, max_rel=21.236881256103516, norm_rel=0.025539925321936607, ref_abs_avg=42.60327911376953, test_abs_avg=42.62521743774414
production_forward2 grad[9] vs paper_forward: mean_abs=1.2892544269561768, max_abs=9.0, mean_rel=0.1646721512079239, max_rel=1785.1407470703125, norm_rel=0.024409713223576546, ref_abs_avg=53.116214752197266, test_abs_avg=53.120487213134766
production_forward2 grad[10] vs paper_forward: mean_abs=1.1974661350250244, max_abs=6.75, mean_rel=0.3352220058441162, max_rel=3687.499755859375, norm_rel=0.023105524480342865, ref_abs_avg=52.12453842163086, test_abs_avg=52.12366485595703
production_forward2 grad[11] vs paper_forward: mean_abs=0.9089250564575195, max_abs=3.65625, mean_rel=0.09030589461326599, max_rel=8.871535301208496, norm_rel=0.022138776257634163, ref_abs_avg=41.1490364074707, test_abs_avg=41.153099060058594
production_forward2 grad[12] vs paper_forward: mean_abs=1.1825642585754395, max_abs=8.5, mean_rel=0.16721975803375244, max_rel=1832.7779541015625, norm_rel=0.024289580062031746, ref_abs_avg=48.96343994140625, test_abs_avg=48.965736389160156
production_forward2 grad[13] vs paper_forward: mean_abs=1.091099500656128, max_abs=6.5, mean_rel=0.30932939052581787, max_rel=3437.499755859375, norm_rel=0.022545302286744118, ref_abs_avg=48.59226989746094, test_abs_avg=48.58610534667969
production_forward2 grad[14] vs paper_forward: mean_abs=0.8389863967895508, max_abs=3.4375, mean_rel=0.06687310338020325, max_rel=2.382660388946533, norm_rel=0.02276192046701908, ref_abs_avg=37.92662811279297, test_abs_avg=37.90359878540039
production_forward2 grad[15] vs paper_forward: mean_abs=1.098227620124817, max_abs=8.5, mean_rel=0.16053268313407898, max_rel=1306.9287109375, norm_rel=0.024203995242714882, ref_abs_avg=45.61953353881836, test_abs_avg=45.62254333496094
production_forward2 grad[16] vs paper_forward: mean_abs=1.01170015335083, max_abs=6.5, mean_rel=0.29528120160102844, max_rel=2937.499755859375, norm_rel=0.022490952163934708, ref_abs_avg=45.17500305175781, test_abs_avg=45.17646026611328
production_forward2 grad[17] vs paper_forward: mean_abs=0.7964916229248047, max_abs=3.5, mean_rel=0.07451456040143967, max_rel=7.693711757659912, norm_rel=0.02139330841600895, ref_abs_avg=37.738929748535156, test_abs_avg=37.79113006591797
production_forward2 grad[18] vs paper_forward: mean_abs=1.0305500030517578, max_abs=8.0, mean_rel=0.17393572628498077, max_rel=3856.61474609375, norm_rel=0.023912684991955757, ref_abs_avg=43.32691955566406, test_abs_avg=43.33112716674805
production_forward2 grad[19] vs paper_forward: mean_abs=0.9456976652145386, max_abs=5.5, mean_rel=0.4199534058570862, max_rel=3218.749755859375, norm_rel=0.022249843925237656, ref_abs_avg=42.7127799987793, test_abs_avg=42.71329879760742
production_forward2 grad[20] vs paper_forward: mean_abs=0.732496976852417, max_abs=3.25, mean_rel=0.13257688283920288, max_rel=17.082550048828125, norm_rel=0.022613385692238808, ref_abs_avg=32.465248107910156, test_abs_avg=32.44944763183594
production_forward2 grad[21] vs paper_forward: mean_abs=0.9691739678382874, max_abs=6.0, mean_rel=0.15979354083538055, max_rel=801.8854370117188, norm_rel=0.023859737440943718, ref_abs_avg=40.8896598815918, test_abs_avg=40.889461517333984
production_forward2 grad[22] vs paper_forward: mean_abs=0.8941579461097717, max_abs=5.625, mean_rel=0.2887326180934906, max_rel=2843.749755859375, norm_rel=0.021951846778392792, ref_abs_avg=40.97001647949219, test_abs_avg=40.96885299682617
production_forward2 grad[23] vs paper_forward: mean_abs=0.6931858062744141, max_abs=3.125, mean_rel=0.09998802095651627, max_rel=8.673958778381348, norm_rel=0.021961363032460213, ref_abs_avg=31.806474685668945, test_abs_avg=31.801937103271484
production_forward2 grad[24] vs paper_forward: mean_abs=0.9211927652359009, max_abs=6.0, mean_rel=0.1664516031742096, max_rel=2728.5400390625, norm_rel=0.023585505783557892, ref_abs_avg=39.27857208251953, test_abs_avg=39.27884292602539
production_forward2 grad[25] vs paper_forward: mean_abs=0.8486895561218262, max_abs=5.6875, mean_rel=0.31855592131614685, max_rel=3374.999755859375, norm_rel=0.02216986007988453, ref_abs_avg=38.50322723388672, test_abs_avg=38.50170135498047
production_forward2 grad[26] vs paper_forward: mean_abs=0.7964553833007812, max_abs=3.748046875, mean_rel=0.1639420986175537, max_rel=32.47034454345703, norm_rel=0.022614583373069763, ref_abs_avg=35.115325927734375, test_abs_avg=35.139312744140625
production_forward2 grad[27] vs paper_forward: mean_abs=1.0663535594940186, max_abs=8.0, mean_rel=0.16252166032791138, max_rel=1088.6334228515625, norm_rel=0.025665603578090668, ref_abs_avg=41.75147247314453, test_abs_avg=41.754180908203125
production_forward2 grad[28] vs paper_forward: mean_abs=1.000536561012268, max_abs=6.75, mean_rel=0.2952224016189575, max_rel=2250.0, norm_rel=0.02423599734902382, ref_abs_avg=41.48081588745117, test_abs_avg=41.48539352416992
production_forward2 grad[29] vs paper_forward: mean_abs=0.759911060333252, max_abs=4.0, mean_rel=0.3997093141078949, max_rel=130.89292907714844, norm_rel=0.023973817005753517, ref_abs_avg=31.557735443115234, test_abs_avg=31.607585906982422
production_forward2 grad[30] vs paper_forward: mean_abs=0.9892417192459106, max_abs=7.0, mean_rel=0.17318516969680786, max_rel=2387.389404296875, norm_rel=0.025907693430781364, ref_abs_avg=38.351966857910156, test_abs_avg=38.352516174316406
production_forward2 grad[31] vs paper_forward: mean_abs=0.9206889271736145, max_abs=6.0, mean_rel=0.36432763934135437, max_rel=3312.499755859375, norm_rel=0.024370361119508743, ref_abs_avg=37.93696594238281, test_abs_avg=37.93973159790039
production_forward2 grad[32] vs paper_forward: mean_abs=0.7266921997070312, max_abs=2.75, mean_rel=0.0925225168466568, max_rel=5.208333492279053, norm_rel=0.024866165593266487, ref_abs_avg=29.431020736694336, test_abs_avg=29.47639274597168
production_forward2 grad[33] vs paper_forward: mean_abs=0.9259051084518433, max_abs=7.0, mean_rel=0.17701956629753113, max_rel=1734.442626953125, norm_rel=0.025825683027505875, ref_abs_avg=35.99446487426758, test_abs_avg=35.992828369140625
production_forward2 grad[34] vs paper_forward: mean_abs=0.8527324199676514, max_abs=6.0625, mean_rel=0.3120543658733368, max_rel=2484.375, norm_rel=0.024205561727285385, ref_abs_avg=35.380775451660156, test_abs_avg=35.386253356933594
production_forward2 grad[35] vs paper_forward: mean_abs=0.6555612087249756, max_abs=3.0, mean_rel=0.0794057697057724, max_rel=3.2859299182891846, norm_rel=0.023843195289373398, ref_abs_avg=28.047130584716797, test_abs_avg=28.08792495727539
production_forward2 grad[36] vs paper_forward: mean_abs=0.860274076461792, max_abs=6.0, mean_rel=0.16569840908050537, max_rel=1727.940673828125, norm_rel=0.025548506528139114, ref_abs_avg=33.79550552368164, test_abs_avg=33.793212890625
production_forward2 grad[37] vs paper_forward: mean_abs=0.8045798540115356, max_abs=5.0625, mean_rel=0.25896400213241577, max_rel=2437.5, norm_rel=0.024129964411258698, ref_abs_avg=33.41982650756836, test_abs_avg=33.421993255615234
production_forward2 grad[38] vs paper_forward: mean_abs=0.6045951843261719, max_abs=2.375, mean_rel=0.10021300613880157, max_rel=7.006254196166992, norm_rel=0.02299991436302662, ref_abs_avg=27.500953674316406, test_abs_avg=27.463367462158203
production_forward2 grad[39] vs paper_forward: mean_abs=0.8101983070373535, max_abs=5.5, mean_rel=0.1799597144126892, max_rel=2869.40283203125, norm_rel=0.025187818333506584, ref_abs_avg=32.27251434326172, test_abs_avg=32.27386474609375
production_forward2 grad[40] vs paper_forward: mean_abs=0.7518457174301147, max_abs=4.375, mean_rel=0.33462443947792053, max_rel=2187.5, norm_rel=0.02371937222778797, ref_abs_avg=31.763946533203125, test_abs_avg=31.76918601989746
production_forward2 grad[41] vs paper_forward: mean_abs=0.6511712074279785, max_abs=2.75, mean_rel=0.11929388344287872, max_rel=7.644509315490723, norm_rel=0.026076439768075943, ref_abs_avg=24.72956657409668, test_abs_avg=24.76658058166504
production_forward2 grad[42] vs paper_forward: mean_abs=0.7639427185058594, max_abs=5.5, mean_rel=0.1784805953502655, max_rel=1823.787109375, norm_rel=0.02508343569934368, ref_abs_avg=30.550914764404297, test_abs_avg=30.55132293701172
production_forward2 grad[43] vs paper_forward: mean_abs=0.7176868915557861, max_abs=5.0, mean_rel=0.2759896516799927, max_rel=2593.749755859375, norm_rel=0.023667285218834877, ref_abs_avg=30.40683364868164, test_abs_avg=30.407405853271484
production_forward2 grad[44] vs paper_forward: mean_abs=0.5678558349609375, max_abs=2.5, mean_rel=0.29311603307724, max_rel=42.63559341430664, norm_rel=0.023656345903873444, ref_abs_avg=24.181915283203125, test_abs_avg=24.17362403869629
production_forward2 grad[45] vs paper_forward: mean_abs=0.7306362986564636, max_abs=5.75, mean_rel=0.15364083647727966, max_rel=901.0899047851562, norm_rel=0.024765778332948685, ref_abs_avg=29.594470977783203, test_abs_avg=29.594484329223633
production_forward2 grad[46] vs paper_forward: mean_abs=0.6820746660232544, max_abs=4.5, mean_rel=0.28645116090774536, max_rel=2015.6248779296875, norm_rel=0.02339521422982216, ref_abs_avg=29.2280216217041, test_abs_avg=29.229215621948242
production_forward2 grad[47] vs paper_forward: mean_abs=0.5793933868408203, max_abs=2.40625, mean_rel=0.09373429417610168, max_rel=3.7413110733032227, norm_rel=0.02463899739086628, ref_abs_avg=23.448001861572266, test_abs_avg=23.504589080810547
production_forward2 grad[48] vs paper_forward: mean_abs=0.6993680000305176, max_abs=6.0, mean_rel=0.17489585280418396, max_rel=1746.313232421875, norm_rel=0.024374231696128845, ref_abs_avg=28.761096954345703, test_abs_avg=28.76007080078125
production_forward2 grad[49] vs paper_forward: mean_abs=0.6502268314361572, max_abs=4.25, mean_rel=0.28776809573173523, max_rel=1890.6248779296875, norm_rel=0.022783761844038963, ref_abs_avg=28.517372131347656, test_abs_avg=28.52102279663086
production_forward2 grad[50] vs paper_forward: mean_abs=0.6205767393112183, max_abs=2.5, mean_rel=0.46436840295791626, max_rel=162.10714721679688, norm_rel=0.02487087808549404, ref_abs_avg=24.127334594726562, test_abs_avg=24.105140686035156
production_forward2 grad[51] vs paper_forward: mean_abs=0.7695939540863037, max_abs=5.0, mean_rel=0.16386817395687103, max_rel=952.6297607421875, norm_rel=0.026095356792211533, ref_abs_avg=29.569416046142578, test_abs_avg=29.56841278076172
production_forward2 grad[52] vs paper_forward: mean_abs=0.7155726552009583, max_abs=4.5, mean_rel=0.2707812190055847, max_rel=2624.999755859375, norm_rel=0.024393679574131966, ref_abs_avg=29.39780616760254, test_abs_avg=29.404935836791992
production_forward2 grad[53] vs paper_forward: mean_abs=0.5847666263580322, max_abs=2.375, mean_rel=0.1382620930671692, max_rel=19.847063064575195, norm_rel=0.025317251682281494, ref_abs_avg=23.39666748046875, test_abs_avg=23.368934631347656
production_forward2 grad[54] vs paper_forward: mean_abs=0.7125552296638489, max_abs=5.0, mean_rel=0.16361379623413086, max_rel=926.0830688476562, norm_rel=0.02576417848467827, ref_abs_avg=27.732595443725586, test_abs_avg=27.7325439453125
production_forward2 grad[55] vs paper_forward: mean_abs=0.663662314414978, max_abs=4.875, mean_rel=0.2642006278038025, max_rel=1937.4998779296875, norm_rel=0.023926516994833946, ref_abs_avg=27.74726104736328, test_abs_avg=27.74764060974121
production_forward2 grad[56] vs paper_forward: mean_abs=0.5401637554168701, max_abs=2.0, mean_rel=0.21084217727184296, max_rel=31.616165161132812, norm_rel=0.02526637353003025, ref_abs_avg=21.358078002929688, test_abs_avg=21.319419860839844
production_forward2 grad[57] vs paper_forward: mean_abs=0.6616293787956238, max_abs=5.0, mean_rel=0.16834843158721924, max_rel=1466.3883056640625, norm_rel=0.02533470094203949, ref_abs_avg=26.198177337646484, test_abs_avg=26.19717025756836
production_forward2 grad[58] vs paper_forward: mean_abs=0.6197305917739868, max_abs=5.0, mean_rel=0.21818408370018005, max_rel=1499.9998779296875, norm_rel=0.023850539699196815, ref_abs_avg=26.066823959350586, test_abs_avg=26.067060470581055
production_forward2 grad[59] vs paper_forward: mean_abs=0.502471923828125, max_abs=2.25, mean_rel=0.07048752903938293, max_rel=2.958649158477783, norm_rel=0.024215633049607277, ref_abs_avg=20.552642822265625, test_abs_avg=20.541427612304688
production_forward2 grad[60] vs paper_forward: mean_abs=0.6297597885131836, max_abs=5.0, mean_rel=0.16949808597564697, max_rel=907.5125732421875, norm_rel=0.02497556246817112, ref_abs_avg=25.25185203552246, test_abs_avg=25.252944946289062
production_forward2 grad[61] vs paper_forward: mean_abs=0.5797243118286133, max_abs=4.0, mean_rel=0.22798776626586914, max_rel=1499.9998779296875, norm_rel=0.023615829646587372, ref_abs_avg=24.592565536499023, test_abs_avg=24.59637451171875
production_forward2 grad[62] vs paper_forward: mean_abs=0.4630556106567383, max_abs=1.875, mean_rel=0.16915258765220642, max_rel=16.243051528930664, norm_rel=0.0232168510556221, ref_abs_avg=20.188934326171875, test_abs_avg=20.21575927734375
production_forward2 grad[63] vs paper_forward: mean_abs=0.5938405990600586, max_abs=4.5, mean_rel=0.14952319860458374, max_rel=596.7001342773438, norm_rel=0.024412745609879494, ref_abs_avg=24.389375686645508, test_abs_avg=24.39004135131836
production_forward2 grad[64] vs paper_forward: mean_abs=0.5516754984855652, max_abs=4.0, mean_rel=0.2294393926858902, max_rel=1710.9373779296875, norm_rel=0.022924743592739105, ref_abs_avg=24.027902603149414, test_abs_avg=24.03199005126953
production_forward2 grad[65] vs paper_forward: mean_abs=0.45857715606689453, max_abs=1.9375, mean_rel=0.17415647208690643, max_rel=28.42258644104004, norm_rel=0.02396215684711933, ref_abs_avg=18.580955505371094, test_abs_avg=18.56821632385254
production_forward2 grad[66] vs paper_forward: mean_abs=0.5659894347190857, max_abs=4.0, mean_rel=0.15710410475730896, max_rel=779.7753295898438, norm_rel=0.024218857288360596, ref_abs_avg=23.41526222229004, test_abs_avg=23.417617797851562
production_forward2 grad[67] vs paper_forward: mean_abs=0.5190156698226929, max_abs=3.75, mean_rel=0.23684872686862946, max_rel=1546.8748779296875, norm_rel=0.02217993512749672, ref_abs_avg=23.285140991210938, test_abs_avg=23.290477752685547
production_forward2 grad[68] vs paper_forward: mean_abs=0.3859797716140747, max_abs=1.5078125, mean_rel=0.1277012676000595, max_rel=21.27216911315918, norm_rel=0.020909560844302177, ref_abs_avg=18.744577407836914, test_abs_avg=18.75766944885254
production_forward2 grad[69] vs paper_forward: mean_abs=0.5299672484397888, max_abs=5.0, mean_rel=0.15339018404483795, max_rel=836.4830322265625, norm_rel=0.023831071332097054, ref_abs_avg=22.28221893310547, test_abs_avg=22.28135108947754
production_forward2 grad[70] vs paper_forward: mean_abs=0.49322742223739624, max_abs=3.625, mean_rel=0.20818987488746643, max_rel=1624.9998779296875, norm_rel=0.02217003144323826, ref_abs_avg=22.28622055053711, test_abs_avg=22.282920837402344
production_forward2 grad[71] vs paper_forward: mean_abs=0.3860483169555664, max_abs=1.375, mean_rel=0.17415331304073334, max_rel=13.473440170288086, norm_rel=0.021233825013041496, ref_abs_avg=18.34326934814453, test_abs_avg=18.384828567504883
production_forward2 grad[72] vs paper_forward: mean_abs=0.5177562236785889, max_abs=5.0, mean_rel=0.13978293538093567, max_rel=815.6857299804688, norm_rel=0.023268183693289757, ref_abs_avg=22.26012420654297, test_abs_avg=22.26000213623047
production_forward2 grad[73] vs paper_forward: mean_abs=0.4734606146812439, max_abs=3.625, mean_rel=0.22123965620994568, max_rel=1359.3748779296875, norm_rel=0.021589120849967003, ref_abs_avg=21.86115837097168, test_abs_avg=21.85809326171875
production_forward2 grad[74] vs paper_forward: mean_abs=0.46793675422668457, max_abs=2.171875, mean_rel=0.10795089602470398, max_rel=8.867583274841309, norm_rel=0.024580011144280434, ref_abs_avg=18.89735984802246, test_abs_avg=18.88196563720703
production_forward2 grad[75] vs paper_forward: mean_abs=0.5781221389770508, max_abs=5.375, mean_rel=0.15500691533088684, max_rel=841.7819213867188, norm_rel=0.02474081702530384, ref_abs_avg=23.412639617919922, test_abs_avg=23.411745071411133
production_forward2 grad[76] vs paper_forward: mean_abs=0.5256155729293823, max_abs=4.5, mean_rel=0.21521759033203125, max_rel=1749.9998779296875, norm_rel=0.022912325337529182, ref_abs_avg=22.919174194335938, test_abs_avg=22.923847198486328
production_forward2 grad[77] vs paper_forward: mean_abs=0.4175539016723633, max_abs=1.5, mean_rel=0.09633491933345795, max_rel=6.075232982635498, norm_rel=0.02378244139254093, ref_abs_avg=17.54078483581543, test_abs_avg=17.50308609008789
production_forward2 grad[78] vs paper_forward: mean_abs=0.5230481624603271, max_abs=4.125, mean_rel=0.1400105357170105, max_rel=514.2786865234375, norm_rel=0.02417847141623497, ref_abs_avg=21.674060821533203, test_abs_avg=21.674209594726562
production_forward2 grad[79] vs paper_forward: mean_abs=0.4904297888278961, max_abs=4.5, mean_rel=0.20439523458480835, max_rel=1523.4373779296875, norm_rel=0.022529244422912598, ref_abs_avg=21.641521453857422, test_abs_avg=21.64243507385254
production_forward2 grad[80] vs paper_forward: mean_abs=0.3646702766418457, max_abs=1.3125, mean_rel=0.10450449585914612, max_rel=14.162981033325195, norm_rel=0.02025962993502617, ref_abs_avg=17.707599639892578, test_abs_avg=17.71792221069336
production_forward2 grad[81] vs paper_forward: mean_abs=0.4819841682910919, max_abs=5.0, mean_rel=0.1451415866613388, max_rel=684.885009765625, norm_rel=0.023648520931601524, ref_abs_avg=20.43639373779297, test_abs_avg=20.435787200927734
production_forward2 grad[82] vs paper_forward: mean_abs=0.44916361570358276, max_abs=4.0, mean_rel=0.21170444786548615, max_rel=1531.2498779296875, norm_rel=0.02195614017546177, ref_abs_avg=20.51148223876953, test_abs_avg=20.51180648803711
production_forward2 grad[83] vs paper_forward: mean_abs=0.36801767349243164, max_abs=1.5, mean_rel=0.27596113085746765, max_rel=60.19655990600586, norm_rel=0.022551383823156357, ref_abs_avg=16.28215789794922, test_abs_avg=16.268604278564453
production_forward2 grad[84] vs paper_forward: mean_abs=0.45991677045822144, max_abs=5.0, mean_rel=0.1403186321258545, max_rel=1135.7978515625, norm_rel=0.022883497178554535, ref_abs_avg=20.17764663696289, test_abs_avg=20.179176330566406
production_forward2 grad[85] vs paper_forward: mean_abs=0.41731059551239014, max_abs=4.0, mean_rel=0.2256273329257965, max_rel=1406.2498779296875, norm_rel=0.021093880757689476, ref_abs_avg=19.862648010253906, test_abs_avg=19.852439880371094
production_forward2 grad[86] vs paper_forward: mean_abs=0.3544583320617676, max_abs=1.390625, mean_rel=0.10257332772016525, max_rel=8.19685173034668, norm_rel=0.021995699033141136, ref_abs_avg=15.965728759765625, test_abs_avg=16.004873275756836
production_forward2 grad[87] vs paper_forward: mean_abs=0.4273947477340698, max_abs=4.5, mean_rel=0.1389952301979065, max_rel=656.0431518554688, norm_rel=0.02262146770954132, ref_abs_avg=19.0079345703125, test_abs_avg=19.010147094726562
production_forward2 grad[88] vs paper_forward: mean_abs=0.39470455050468445, max_abs=4.78125, mean_rel=0.18676972389221191, max_rel=1304.6873779296875, norm_rel=0.020551739260554314, ref_abs_avg=19.241615295410156, test_abs_avg=19.241626739501953
production_forward2 grad[89] vs paper_forward: mean_abs=0.31009602546691895, max_abs=1.5625, mean_rel=0.08041299879550934, max_rel=6.087525844573975, norm_rel=0.02042662352323532, ref_abs_avg=15.03955078125, test_abs_avg=14.987449645996094
production_forward2 grad[90] vs paper_forward: mean_abs=0.4028569757938385, max_abs=4.5, mean_rel=0.13027231395244598, max_rel=467.42327880859375, norm_rel=0.022099291905760765, ref_abs_avg=18.346031188964844, test_abs_avg=18.34662628173828
production_forward2 grad[91] vs paper_forward: mean_abs=0.3749050498008728, max_abs=3.75, mean_rel=0.21698321402072906, max_rel=1203.125, norm_rel=0.020466092973947525, ref_abs_avg=18.47541046142578, test_abs_avg=18.475244522094727
production_forward2 grad[92] vs paper_forward: mean_abs=0.30003440380096436, max_abs=1.0625, mean_rel=0.11734334379434586, max_rel=12.855805397033691, norm_rel=0.01886759325861931, ref_abs_avg=16.333789825439453, test_abs_avg=16.340166091918945
production_forward2 grad[93] vs paper_forward: mean_abs=0.3849793076515198, max_abs=4.0, mean_rel=0.12961727380752563, max_rel=1100.7325439453125, norm_rel=0.02171553298830986, ref_abs_avg=17.946361541748047, test_abs_avg=17.947357177734375
production_forward2 grad[94] vs paper_forward: mean_abs=0.3576579689979553, max_abs=4.5, mean_rel=0.17280259728431702, max_rel=1546.8748779296875, norm_rel=0.02035222202539444, ref_abs_avg=17.91436195373535, test_abs_avg=17.919921875
production_forward2 grad[95] vs paper_forward: mean_abs=0.29990243911743164, max_abs=1.25, mean_rel=0.1340816617012024, max_rel=16.671106338500977, norm_rel=0.019957860931754112, ref_abs_avg=15.153791427612305, test_abs_avg=15.169294357299805
production_forward2 grad[96] vs paper_forward: mean_abs=0.3577744662761688, max_abs=4.6640625, mean_rel=0.12068364024162292, max_rel=559.4188232421875, norm_rel=0.020974954590201378, ref_abs_avg=17.363662719726562, test_abs_avg=17.36322784423828
production_forward2 grad[97] vs paper_forward: mean_abs=0.3191410303115845, max_abs=3.5, mean_rel=0.15930865705013275, max_rel=999.9999389648438, norm_rel=0.018380651250481606, ref_abs_avg=17.518436431884766, test_abs_avg=17.518383026123047
identity layers + randn queries
mean abs randn paper: 0.220703125
production_forward fwd+bwd:  124.505 ms
production_forward fwd-only: 22.784 ms
production_forward bwd-only: 102.137 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=6.071 GiB
production_forward peak reserved:  fwd=2.227 GiB, fwd+bwd=6.102 GiB
torch_compile_phases_forward fwd+bwd:  260.518 ms
torch_compile_phases_forward fwd-only: 43.626 ms
torch_compile_phases_forward bwd-only: 213.846 ms
torch_compile_phases_forward peak allocated: fwd=5.342 GiB, fwd+bwd=6.469 GiB
torch_compile_phases_forward peak reserved:  fwd=5.852 GiB, fwd+bwd=9.852 GiB
paper_forward fwd+bwd:  536.189 ms
paper_forward fwd-only: 97.316 ms
paper_forward bwd-only: 439.946 ms
paper_forward peak allocated: fwd=6.194 GiB, fwd+bwd=10.068 GiB
paper_forward peak reserved:  fwd=6.227 GiB, fwd+bwd=10.227 GiB
production_forward2 fwd+bwd:  243.378 ms
production_forward2 fwd-only: 24.777 ms
production_forward2 bwd-only: 219.146 ms
production_forward2 peak allocated: fwd=2.692 GiB, fwd+bwd=6.071 GiB
production_forward2 peak reserved:  fwd=2.977 GiB, fwd+bwd=8.727 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016741123981773853, max_abs=0.04296875
production_forward grad[0] vs paper_forward: mean_abs=0.008893009275197983, max_abs=0.41796875, mean_rel=0.07548496127128601, max_rel=108.44863891601562, norm_rel=0.02057594433426857, ref_abs_avg=0.46663185954093933, test_abs_avg=0.4666447341442108
production_forward grad[1] vs paper_forward: mean_abs=7.556502342224121, max_abs=56.0, mean_rel=0.3954860270023346, max_rel=4447.365234375, norm_rel=0.021112937480211258, ref_abs_avg=322.0852966308594, test_abs_avg=322.16119384765625
production_forward grad[2] vs paper_forward: mean_abs=1.3170287609100342, max_abs=6.25, mean_rel=0.1043189987540245, max_rel=16.868974685668945, norm_rel=0.025020381435751915, ref_abs_avg=56.16132354736328, test_abs_avg=56.17161560058594
production_forward grad[3] vs paper_forward: mean_abs=1.6962816715240479, max_abs=14.0, mean_rel=0.18796631693840027, max_rel=1860.8065185546875, norm_rel=0.02527463249862194, ref_abs_avg=67.49689483642578, test_abs_avg=67.49959564208984
production_forward grad[4] vs paper_forward: mean_abs=1.5600481033325195, max_abs=10.0, mean_rel=0.4229925870895386, max_rel=6374.99951171875, norm_rel=0.02357388101518154, ref_abs_avg=66.54086303710938, test_abs_avg=66.53956604003906
production_forward grad[5] vs paper_forward: mean_abs=1.131784439086914, max_abs=5.125, mean_rel=0.08976208418607712, max_rel=3.816993236541748, norm_rel=0.02337941899895668, ref_abs_avg=49.25524139404297, test_abs_avg=49.24208450317383
production_forward grad[6] vs paper_forward: mean_abs=1.4796231985092163, max_abs=10.0, mean_rel=0.1634427160024643, max_rel=2046.9041748046875, norm_rel=0.024975158274173737, ref_abs_avg=59.542205810546875, test_abs_avg=59.542236328125
production_forward grad[7] vs paper_forward: mean_abs=1.362685203552246, max_abs=8.25, mean_rel=0.3441236615180969, max_rel=3437.499755859375, norm_rel=0.023412412032485008, ref_abs_avg=58.50823211669922, test_abs_avg=58.50101852416992
production_forward grad[8] vs paper_forward: mean_abs=1.0430727005004883, max_abs=4.75, mean_rel=0.09986047446727753, max_rel=10.899144172668457, norm_rel=0.0226706825196743, ref_abs_avg=46.467018127441406, test_abs_avg=46.437870025634766
production_forward grad[9] vs paper_forward: mean_abs=1.326460361480713, max_abs=8.5, mean_rel=0.18030694127082825, max_rel=1848.322265625, norm_rel=0.024730894714593887, ref_abs_avg=53.939334869384766, test_abs_avg=53.939735412597656
production_forward grad[10] vs paper_forward: mean_abs=1.223718285560608, max_abs=8.0, mean_rel=0.37077078223228455, max_rel=4250.0, norm_rel=0.02310153841972351, ref_abs_avg=53.24089050292969, test_abs_avg=53.24345016479492
production_forward grad[11] vs paper_forward: mean_abs=0.9544582366943359, max_abs=3.5, mean_rel=0.11361443996429443, max_rel=13.656607627868652, norm_rel=0.024962935596704483, ref_abs_avg=38.369720458984375, test_abs_avg=38.312286376953125
production_forward grad[12] vs paper_forward: mean_abs=1.2275310754776, max_abs=8.0, mean_rel=0.16665825247764587, max_rel=1332.2974853515625, norm_rel=0.024536477401852608, ref_abs_avg=50.32079315185547, test_abs_avg=50.32419204711914
production_forward grad[13] vs paper_forward: mean_abs=1.134539008140564, max_abs=7.0, mean_rel=0.3419908285140991, max_rel=3749.999755859375, norm_rel=0.02304796501994133, ref_abs_avg=49.488853454589844, test_abs_avg=49.49058532714844
production_forward grad[14] vs paper_forward: mean_abs=0.9248886108398438, max_abs=3.71875, mean_rel=0.10992265492677689, max_rel=5.285126686096191, norm_rel=0.02411048673093319, ref_abs_avg=38.65168762207031, test_abs_avg=38.59223175048828
production_forward grad[15] vs paper_forward: mean_abs=1.1465133428573608, max_abs=9.0, mean_rel=0.15540489554405212, max_rel=897.5476684570312, norm_rel=0.024251874536275864, ref_abs_avg=47.54347229003906, test_abs_avg=47.54659652709961
production_forward grad[16] vs paper_forward: mean_abs=1.0573298931121826, max_abs=6.3125, mean_rel=0.31866180896759033, max_rel=2874.999755859375, norm_rel=0.022718610242009163, ref_abs_avg=46.834312438964844, test_abs_avg=46.84081268310547
production_forward grad[17] vs paper_forward: mean_abs=0.812956690788269, max_abs=3.5, mean_rel=0.7352268099784851, max_rel=282.6301574707031, norm_rel=0.021769747138023376, ref_abs_avg=38.43598937988281, test_abs_avg=38.436893463134766
production_forward grad[18] vs paper_forward: mean_abs=1.0800150632858276, max_abs=8.0, mean_rel=0.17349106073379517, max_rel=2246.379638671875, norm_rel=0.024242009967565536, ref_abs_avg=44.84699249267578, test_abs_avg=44.84721374511719
production_forward grad[19] vs paper_forward: mean_abs=0.9986900091171265, max_abs=6.0, mean_rel=0.2681654095649719, max_rel=3515.624755859375, norm_rel=0.022531801834702492, ref_abs_avg=44.59004211425781, test_abs_avg=44.592445373535156
production_forward grad[20] vs paper_forward: mean_abs=0.7996921539306641, max_abs=3.125, mean_rel=0.09621407091617584, max_rel=11.019822120666504, norm_rel=0.02196810021996498, ref_abs_avg=36.25498962402344, test_abs_avg=36.20807647705078
production_forward grad[21] vs paper_forward: mean_abs=1.0246508121490479, max_abs=7.0, mean_rel=0.16025865077972412, max_rel=1574.2342529296875, norm_rel=0.024073025211691856, ref_abs_avg=42.800209045410156, test_abs_avg=42.803565979003906
production_forward grad[22] vs paper_forward: mean_abs=0.9419278502464294, max_abs=5.5, mean_rel=0.30209052562713623, max_rel=3843.749755859375, norm_rel=0.02233181893825531, ref_abs_avg=42.37274169921875, test_abs_avg=42.37403106689453
production_forward grad[23] vs paper_forward: mean_abs=0.7942891120910645, max_abs=3.375, mean_rel=0.0876685157418251, max_rel=7.171266078948975, norm_rel=0.02422710321843624, ref_abs_avg=32.76759338378906, test_abs_avg=32.801841735839844
production_forward grad[24] vs paper_forward: mean_abs=0.9704150557518005, max_abs=7.0, mean_rel=0.15640592575073242, max_rel=2243.032958984375, norm_rel=0.02400531992316246, ref_abs_avg=40.680389404296875, test_abs_avg=40.68351745605469
production_forward grad[25] vs paper_forward: mean_abs=0.8997896909713745, max_abs=5.0, mean_rel=0.3019658923149109, max_rel=2999.999755859375, norm_rel=0.022549360990524292, ref_abs_avg=40.0908203125, test_abs_avg=40.092166900634766
production_forward grad[26] vs paper_forward: mean_abs=0.8745179176330566, max_abs=3.75, mean_rel=0.10764572024345398, max_rel=9.394813537597656, norm_rel=0.023937446996569633, ref_abs_avg=37.00743103027344, test_abs_avg=37.007476806640625
production_forward grad[27] vs paper_forward: mean_abs=1.1213312149047852, max_abs=8.0, mean_rel=0.16854031383991241, max_rel=1878.5428466796875, norm_rel=0.025726081803441048, ref_abs_avg=43.81545639038086, test_abs_avg=43.822723388671875
production_forward grad[28] vs paper_forward: mean_abs=1.0418169498443604, max_abs=7.25, mean_rel=0.3408026397228241, max_rel=3749.999755859375, norm_rel=0.02430139295756817, ref_abs_avg=43.024417877197266, test_abs_avg=43.01683044433594
production_forward grad[29] vs paper_forward: mean_abs=0.8475222587585449, max_abs=2.75, mean_rel=0.0936330258846283, max_rel=9.438985824584961, norm_rel=0.02570982277393341, ref_abs_avg=32.8680419921875, test_abs_avg=32.88214111328125
production_forward grad[30] vs paper_forward: mean_abs=1.0512449741363525, max_abs=7.0, mean_rel=0.1634398251771927, max_rel=1292.9658203125, norm_rel=0.026027260348200798, ref_abs_avg=40.60076141357422, test_abs_avg=40.602783203125
production_forward grad[31] vs paper_forward: mean_abs=0.9895321130752563, max_abs=6.25, mean_rel=0.34270912408828735, max_rel=2874.999755859375, norm_rel=0.024745887145400047, ref_abs_avg=40.08302307128906, test_abs_avg=40.091026306152344
production_forward grad[32] vs paper_forward: mean_abs=0.7475728988647461, max_abs=3.0, mean_rel=0.10011759400367737, max_rel=4.851495742797852, norm_rel=0.023797666653990746, ref_abs_avg=31.821247100830078, test_abs_avg=31.76155662536621
production_forward grad[33] vs paper_forward: mean_abs=0.9710350036621094, max_abs=6.0, mean_rel=0.17762790620326996, max_rel=2343.66796875, norm_rel=0.025890078395605087, ref_abs_avg=37.64593505859375, test_abs_avg=37.64372253417969
production_forward grad[34] vs paper_forward: mean_abs=0.9075924158096313, max_abs=5.625, mean_rel=0.26647043228149414, max_rel=2749.999755859375, norm_rel=0.02453252673149109, ref_abs_avg=37.139808654785156, test_abs_avg=37.13767623901367
production_forward grad[35] vs paper_forward: mean_abs=0.7252194881439209, max_abs=3.03515625, mean_rel=0.10277548432350159, max_rel=5.7504496574401855, norm_rel=0.026474442332983017, ref_abs_avg=27.519367218017578, test_abs_avg=27.484336853027344
production_forward grad[36] vs paper_forward: mean_abs=0.9021422863006592, max_abs=8.0, mean_rel=0.1695501059293747, max_rel=1614.055419921875, norm_rel=0.02558237873017788, ref_abs_avg=35.41404342651367, test_abs_avg=35.41261291503906
production_forward grad[37] vs paper_forward: mean_abs=0.8472610116004944, max_abs=5.0, mean_rel=0.30781763792037964, max_rel=2484.375, norm_rel=0.024412011727690697, ref_abs_avg=34.84027099609375, test_abs_avg=34.836265563964844
production_forward grad[38] vs paper_forward: mean_abs=0.6198608875274658, max_abs=2.625, mean_rel=0.2332908660173416, max_rel=38.86892318725586, norm_rel=0.022744720801711082, ref_abs_avg=27.86807632446289, test_abs_avg=27.876178741455078
production_forward grad[39] vs paper_forward: mean_abs=0.8530603647232056, max_abs=6.0, mean_rel=0.17127248644828796, max_rel=1725.858154296875, norm_rel=0.025312090292572975, ref_abs_avg=33.805850982666016, test_abs_avg=33.807167053222656
production_forward grad[40] vs paper_forward: mean_abs=0.8007445335388184, max_abs=5.1875, mean_rel=0.28867942094802856, max_rel=3218.749755859375, norm_rel=0.024007882922887802, ref_abs_avg=33.423988342285156, test_abs_avg=33.42325973510742
production_forward grad[41] vs paper_forward: mean_abs=0.6364274024963379, max_abs=2.5, mean_rel=0.1670805811882019, max_rel=32.72450637817383, norm_rel=0.022901015356183052, ref_abs_avg=28.028589248657227, test_abs_avg=27.9476318359375
production_forward grad[42] vs paper_forward: mean_abs=0.8150171041488647, max_abs=6.0, mean_rel=0.1613861620426178, max_rel=1285.6080322265625, norm_rel=0.025072088465094566, ref_abs_avg=32.58353805541992, test_abs_avg=32.58488464355469
production_forward grad[43] vs paper_forward: mean_abs=0.759459912776947, max_abs=4.75, mean_rel=0.32430127263069153, max_rel=2437.5, norm_rel=0.023576682433485985, ref_abs_avg=32.27558135986328, test_abs_avg=32.28783416748047
production_forward grad[44] vs paper_forward: mean_abs=0.6054286956787109, max_abs=2.25, mean_rel=0.11374151706695557, max_rel=9.086909294128418, norm_rel=0.02594425529241562, ref_abs_avg=23.331209182739258, test_abs_avg=23.33349609375
production_forward grad[45] vs paper_forward: mean_abs=0.7678430080413818, max_abs=5.0, mean_rel=0.16783881187438965, max_rel=1211.20703125, norm_rel=0.02485409379005432, ref_abs_avg=30.998455047607422, test_abs_avg=30.997751235961914
production_forward grad[46] vs paper_forward: mean_abs=0.723170280456543, max_abs=4.5, mean_rel=0.20095296204090118, max_rel=1624.9998779296875, norm_rel=0.02353624813258648, ref_abs_avg=30.809581756591797, test_abs_avg=30.808115005493164
production_forward grad[47] vs paper_forward: mean_abs=0.5433597564697266, max_abs=2.0, mean_rel=0.1379588395357132, max_rel=16.202932357788086, norm_rel=0.02168877422809601, ref_abs_avg=24.488143920898438, test_abs_avg=24.497650146484375
production_forward grad[48] vs paper_forward: mean_abs=0.7363860607147217, max_abs=5.25, mean_rel=0.17011260986328125, max_rel=1383.7401123046875, norm_rel=0.024750608950853348, ref_abs_avg=29.84271240234375, test_abs_avg=29.841819763183594
production_forward grad[49] vs paper_forward: mean_abs=0.684888482093811, max_abs=4.5, mean_rel=0.22465947270393372, max_rel=2031.2498779296875, norm_rel=0.023353496566414833, ref_abs_avg=29.415910720825195, test_abs_avg=29.412731170654297
production_forward grad[50] vs paper_forward: mean_abs=0.6481151580810547, max_abs=2.8203125, mean_rel=0.11596833169460297, max_rel=9.015992164611816, norm_rel=0.025780746713280678, ref_abs_avg=25.235851287841797, test_abs_avg=25.301700592041016
production_forward grad[51] vs paper_forward: mean_abs=0.8308864235877991, max_abs=6.0, mean_rel=0.1747140884399414, max_rel=1774.910400390625, norm_rel=0.026580439880490303, ref_abs_avg=31.424663543701172, test_abs_avg=31.427330017089844
production_forward grad[52] vs paper_forward: mean_abs=0.7756034135818481, max_abs=5.75, mean_rel=0.27075982093811035, max_rel=2218.75, norm_rel=0.024902941659092903, ref_abs_avg=31.29350471496582, test_abs_avg=31.298742294311523
production_forward grad[53] vs paper_forward: mean_abs=0.591971755027771, max_abs=2.25, mean_rel=0.10112707316875458, max_rel=4.639845371246338, norm_rel=0.025957806035876274, ref_abs_avg=22.90882110595703, test_abs_avg=22.890647888183594
production_forward grad[54] vs paper_forward: mean_abs=0.7727890014648438, max_abs=6.0, mean_rel=0.17079928517341614, max_rel=1295.5821533203125, norm_rel=0.026171445846557617, ref_abs_avg=29.58047866821289, test_abs_avg=29.582735061645508
production_forward grad[55] vs paper_forward: mean_abs=0.7177832126617432, max_abs=5.25, mean_rel=0.24917158484458923, max_rel=1968.7498779296875, norm_rel=0.024862654507160187, ref_abs_avg=28.8914852142334, test_abs_avg=28.898927688598633
production_forward grad[56] vs paper_forward: mean_abs=0.5279064178466797, max_abs=2.375, mean_rel=0.10572583228349686, max_rel=3.6954636573791504, norm_rel=0.0226149782538414, ref_abs_avg=22.77166175842285, test_abs_avg=22.784832000732422
production_forward grad[57] vs paper_forward: mean_abs=0.7037438750267029, max_abs=5.25, mean_rel=0.16873648762702942, max_rel=1780.3626708984375, norm_rel=0.02564861625432968, ref_abs_avg=27.461261749267578, test_abs_avg=27.4647216796875
production_forward grad[58] vs paper_forward: mean_abs=0.6647211313247681, max_abs=4.0, mean_rel=0.26654571294784546, max_rel=2250.0, norm_rel=0.02422916702926159, ref_abs_avg=27.469255447387695, test_abs_avg=27.473709106445312
production_forward grad[59] vs paper_forward: mean_abs=0.5047229528427124, max_abs=2.0, mean_rel=0.08591298758983612, max_rel=4.374187469482422, norm_rel=0.022995321080088615, ref_abs_avg=21.908000946044922, test_abs_avg=21.926807403564453
production_forward grad[60] vs paper_forward: mean_abs=0.6631289720535278, max_abs=6.0, mean_rel=0.15600642561912537, max_rel=937.5989990234375, norm_rel=0.025027748197317123, ref_abs_avg=26.500511169433594, test_abs_avg=26.503314971923828
production_forward grad[61] vs paper_forward: mean_abs=0.6161932945251465, max_abs=4.09375, mean_rel=0.24725529551506042, max_rel=1843.7498779296875, norm_rel=0.02374623529613018, ref_abs_avg=26.02222442626953, test_abs_avg=26.026004791259766
production_forward grad[62] vs paper_forward: mean_abs=0.4626428484916687, max_abs=2.0, mean_rel=0.09593938291072845, max_rel=6.974343299865723, norm_rel=0.023327073082327843, ref_abs_avg=20.29357147216797, test_abs_avg=20.29340171813965
production_forward grad[63] vs paper_forward: mean_abs=0.6245710849761963, max_abs=5.0, mean_rel=0.15502695739269257, max_rel=1245.058349609375, norm_rel=0.024631589651107788, ref_abs_avg=25.365079879760742, test_abs_avg=25.364620208740234
production_forward grad[64] vs paper_forward: mean_abs=0.5848858952522278, max_abs=4.5, mean_rel=0.2433665543794632, max_rel=1937.4998779296875, norm_rel=0.02385316975414753, ref_abs_avg=24.599367141723633, test_abs_avg=24.604360580444336
production_forward grad[65] vs paper_forward: mean_abs=0.467861533164978, max_abs=1.8125, mean_rel=0.17666152119636536, max_rel=56.50807571411133, norm_rel=0.02383972331881523, ref_abs_avg=20.203231811523438, test_abs_avg=20.201480865478516
production_forward grad[66] vs paper_forward: mean_abs=0.5965962409973145, max_abs=4.546875, mean_rel=0.1544063836336136, max_rel=1138.2498779296875, norm_rel=0.02423708140850067, ref_abs_avg=24.642349243164062, test_abs_avg=24.641395568847656
production_forward grad[67] vs paper_forward: mean_abs=0.5561895370483398, max_abs=4.5, mean_rel=0.19607749581336975, max_rel=1593.7498779296875, norm_rel=0.023162418976426125, ref_abs_avg=24.052745819091797, test_abs_avg=24.042217254638672
production_forward grad[68] vs paper_forward: mean_abs=0.4382455348968506, max_abs=2.0, mean_rel=0.08983399718999863, max_rel=6.944444179534912, norm_rel=0.02123597078025341, ref_abs_avg=21.0762882232666, test_abs_avg=21.081382751464844
production_forward grad[69] vs paper_forward: mean_abs=0.568036675453186, max_abs=5.0, mean_rel=0.16114062070846558, max_rel=1173.7822265625, norm_rel=0.024029502645134926, ref_abs_avg=23.651145935058594, test_abs_avg=23.649677276611328
production_forward grad[70] vs paper_forward: mean_abs=0.5291644930839539, max_abs=3.625, mean_rel=0.20579813420772552, max_rel=1531.2498779296875, norm_rel=0.022294791415333748, ref_abs_avg=23.639257431030273, test_abs_avg=23.637191772460938
production_forward grad[71] vs paper_forward: mean_abs=0.42296302318573, max_abs=1.625, mean_rel=0.09668989479541779, max_rel=11.577775001525879, norm_rel=0.02253253385424614, ref_abs_avg=18.974536895751953, test_abs_avg=19.009939193725586
production_forward grad[72] vs paper_forward: mean_abs=0.5405746698379517, max_abs=4.0, mean_rel=0.14330944418907166, max_rel=813.9794921875, norm_rel=0.023560447618365288, ref_abs_avg=22.948837280273438, test_abs_avg=22.948884963989258
production_forward grad[73] vs paper_forward: mean_abs=0.49392619729042053, max_abs=3.25, mean_rel=0.2216256856918335, max_rel=1499.9998779296875, norm_rel=0.021747490391135216, ref_abs_avg=22.646167755126953, test_abs_avg=22.647003173828125
production_forward grad[74] vs paper_forward: mean_abs=0.4556357264518738, max_abs=1.8125, mean_rel=0.20065976679325104, max_rel=32.419376373291016, norm_rel=0.022136937826871872, ref_abs_avg=20.70419692993164, test_abs_avg=20.679615020751953
production_forward grad[75] vs paper_forward: mean_abs=0.605847954750061, max_abs=5.0, mean_rel=0.1575596034526825, max_rel=829.5665893554688, norm_rel=0.02508424036204815, ref_abs_avg=24.203937530517578, test_abs_avg=24.202585220336914
production_forward grad[76] vs paper_forward: mean_abs=0.5684249997138977, max_abs=4.15625, mean_rel=0.2300407588481903, max_rel=1515.6248779296875, norm_rel=0.023707017302513123, ref_abs_avg=23.988142013549805, test_abs_avg=23.987625122070312
production_forward grad[77] vs paper_forward: mean_abs=0.4527554512023926, max_abs=1.712890625, mean_rel=0.1582379937171936, max_rel=19.01181983947754, norm_rel=0.023181091994047165, ref_abs_avg=19.522232055664062, test_abs_avg=19.544170379638672
production_forward grad[78] vs paper_forward: mean_abs=0.5627188682556152, max_abs=5.0, mean_rel=0.1481935679912567, max_rel=673.390380859375, norm_rel=0.024454763159155846, ref_abs_avg=23.015583038330078, test_abs_avg=23.014944076538086
production_forward grad[79] vs paper_forward: mean_abs=0.5183970928192139, max_abs=4.125, mean_rel=0.227862149477005, max_rel=1515.6248779296875, norm_rel=0.022221121937036514, ref_abs_avg=23.25675392150879, test_abs_avg=23.251585006713867
production_forward grad[80] vs paper_forward: mean_abs=0.43063831329345703, max_abs=1.875, mean_rel=0.06801873445510864, max_rel=1.8353825807571411, norm_rel=0.02247125282883644, ref_abs_avg=19.126304626464844, test_abs_avg=19.07547378540039
production_forward grad[81] vs paper_forward: mean_abs=0.5306735634803772, max_abs=6.0, mean_rel=0.15303167700767517, max_rel=597.1365966796875, norm_rel=0.024148650467395782, ref_abs_avg=22.0107421875, test_abs_avg=22.007966995239258
production_forward grad[82] vs paper_forward: mean_abs=0.4881550073623657, max_abs=4.0, mean_rel=0.22325432300567627, max_rel=1499.9998779296875, norm_rel=0.02264600247144699, ref_abs_avg=21.662092208862305, test_abs_avg=21.659120559692383
production_forward grad[83] vs paper_forward: mean_abs=0.37972307205200195, max_abs=1.5625, mean_rel=0.09707474708557129, max_rel=11.226677894592285, norm_rel=0.021779382601380348, ref_abs_avg=17.442293167114258, test_abs_avg=17.44235610961914
production_forward grad[84] vs paper_forward: mean_abs=0.48851609230041504, max_abs=5.0, mean_rel=0.15054982900619507, max_rel=1211.029541015625, norm_rel=0.022976921871304512, ref_abs_avg=21.334823608398438, test_abs_avg=21.331972122192383
production_forward grad[85] vs paper_forward: mean_abs=0.4391871392726898, max_abs=3.75, mean_rel=0.22833138704299927, max_rel=1374.9998779296875, norm_rel=0.021050861105322838, ref_abs_avg=20.917903900146484, test_abs_avg=20.911609649658203
production_forward grad[86] vs paper_forward: mean_abs=0.3515198230743408, max_abs=1.625, mean_rel=0.10740115493535995, max_rel=15.095052719116211, norm_rel=0.022005213424563408, ref_abs_avg=16.089275360107422, test_abs_avg=16.073131561279297
production_forward grad[87] vs paper_forward: mean_abs=0.4564099609851837, max_abs=4.3828125, mean_rel=0.13542714715003967, max_rel=767.599853515625, norm_rel=0.02287086844444275, ref_abs_avg=20.05618667602539, test_abs_avg=20.05404281616211
production_forward grad[88] vs paper_forward: mean_abs=0.41782474517822266, max_abs=4.5, mean_rel=0.19400173425674438, max_rel=1359.3748779296875, norm_rel=0.02110055834054947, ref_abs_avg=19.85717010498047, test_abs_avg=19.849225997924805
production_forward grad[89] vs paper_forward: mean_abs=0.3600277900695801, max_abs=1.34375, mean_rel=0.08445756137371063, max_rel=3.9746789932250977, norm_rel=0.02082205004990101, ref_abs_avg=16.804622650146484, test_abs_avg=16.773908615112305
production_forward grad[90] vs paper_forward: mean_abs=0.4433710277080536, max_abs=5.25, mean_rel=0.13939379155635834, max_rel=835.25537109375, norm_rel=0.02228284440934658, ref_abs_avg=20.024398803710938, test_abs_avg=20.023975372314453
production_forward grad[91] vs paper_forward: mean_abs=0.3956829905509949, max_abs=4.0, mean_rel=0.18920516967773438, max_rel=1757.8123779296875, norm_rel=0.02043282985687256, ref_abs_avg=19.459054946899414, test_abs_avg=19.454605102539062
production_forward grad[92] vs paper_forward: mean_abs=0.3200490474700928, max_abs=1.125, mean_rel=0.10661625117063522, max_rel=16.549325942993164, norm_rel=0.019369075074791908, ref_abs_avg=16.615110397338867, test_abs_avg=16.6032772064209
production_forward grad[93] vs paper_forward: mean_abs=0.3988874554634094, max_abs=4.25, mean_rel=0.13187643885612488, max_rel=684.4544677734375, norm_rel=0.021839382126927376, ref_abs_avg=18.455795288085938, test_abs_avg=18.454139709472656
production_forward grad[94] vs paper_forward: mean_abs=0.369945764541626, max_abs=3.5, mean_rel=0.1865224540233612, max_rel=1156.25, norm_rel=0.0200192853808403, ref_abs_avg=18.59664535522461, test_abs_avg=18.590496063232422
production_forward grad[95] vs paper_forward: mean_abs=0.30229341983795166, max_abs=1.125, mean_rel=0.10429996997117996, max_rel=8.21235179901123, norm_rel=0.020682966336607933, ref_abs_avg=14.607351303100586, test_abs_avg=14.610157012939453
production_forward grad[96] vs paper_forward: mean_abs=0.3920721411705017, max_abs=4.5, mean_rel=0.12798650562763214, max_rel=535.2941284179688, norm_rel=0.02139623649418354, ref_abs_avg=18.65093421936035, test_abs_avg=18.65087127685547
production_forward grad[97] vs paper_forward: mean_abs=0.34585314989089966, max_abs=3.375, mean_rel=0.16325700283050537, max_rel=1125.0, norm_rel=0.019049903377890587, ref_abs_avg=18.27365493774414, test_abs_avg=18.267793655395508
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016775119584053755, max_abs=0.04296875
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008913736790418625, max_abs=0.4609375, mean_rel=0.075596883893013, max_rel=105.49937438964844, norm_rel=0.020629581063985825, ref_abs_avg=0.46663185954093933, test_abs_avg=0.4666336178779602
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.543246269226074, max_abs=60.0, mean_rel=0.46439650654792786, max_rel=5537.10302734375, norm_rel=0.02115343138575554, ref_abs_avg=322.0852966308594, test_abs_avg=322.1158447265625
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3456902503967285, max_abs=5.0, mean_rel=0.07752988487482071, max_rel=4.813527584075928, norm_rel=0.025695813819766045, ref_abs_avg=56.16132354736328, test_abs_avg=56.1797981262207
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.7008715867996216, max_abs=12.0, mean_rel=0.17941074073314667, max_rel=1703.4759521484375, norm_rel=0.02535449154675007, ref_abs_avg=67.49689483642578, test_abs_avg=67.49751281738281
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5696576833724976, max_abs=10.0, mean_rel=0.44134268164634705, max_rel=6624.99951171875, norm_rel=0.023690782487392426, ref_abs_avg=66.54086303710938, test_abs_avg=66.53694152832031
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.0935430526733398, max_abs=4.5, mean_rel=0.08087477087974548, max_rel=3.308917999267578, norm_rel=0.022555939853191376, ref_abs_avg=49.25524139404297, test_abs_avg=49.19044494628906
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.485816240310669, max_abs=10.5, mean_rel=0.1663396954536438, max_rel=2441.9658203125, norm_rel=0.025088444352149963, ref_abs_avg=59.542205810546875, test_abs_avg=59.53984069824219
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3616657257080078, max_abs=9.0, mean_rel=0.3281862139701843, max_rel=3749.999755859375, norm_rel=0.02339995466172695, ref_abs_avg=58.50823211669922, test_abs_avg=58.502235412597656
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0638809204101562, max_abs=4.125, mean_rel=0.10512632131576538, max_rel=8.594831466674805, norm_rel=0.023065010085701942, ref_abs_avg=46.467018127441406, test_abs_avg=46.45137023925781
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3303699493408203, max_abs=9.5, mean_rel=0.18234258890151978, max_rel=1906.4549560546875, norm_rel=0.024807417765259743, ref_abs_avg=53.939334869384766, test_abs_avg=53.93943786621094
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2315893173217773, max_abs=7.25, mean_rel=0.3874891996383667, max_rel=4500.0, norm_rel=0.023249195888638496, ref_abs_avg=53.24089050292969, test_abs_avg=53.240135192871094
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9477849006652832, max_abs=3.75, mean_rel=0.10601650923490524, max_rel=15.050293922424316, norm_rel=0.024719292297959328, ref_abs_avg=38.369720458984375, test_abs_avg=38.291603088378906
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.2318886518478394, max_abs=8.0, mean_rel=0.16727718710899353, max_rel=1294.7830810546875, norm_rel=0.024641213938593864, ref_abs_avg=50.32079315185547, test_abs_avg=50.32313537597656
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1467056274414062, max_abs=6.5, mean_rel=0.3643288314342499, max_rel=4312.5, norm_rel=0.02328301966190338, ref_abs_avg=49.488853454589844, test_abs_avg=49.49022674560547
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.9317550659179688, max_abs=3.84375, mean_rel=0.09744805842638016, max_rel=7.075933456420898, norm_rel=0.0245404914021492, ref_abs_avg=38.65168762207031, test_abs_avg=38.5845947265625
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1526577472686768, max_abs=8.0, mean_rel=0.15537616610527039, max_rel=838.6934204101562, norm_rel=0.0243842676281929, ref_abs_avg=47.54347229003906, test_abs_avg=47.54461669921875
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.065582036972046, max_abs=6.0, mean_rel=0.3246763348579407, max_rel=2812.499755859375, norm_rel=0.022901706397533417, ref_abs_avg=46.834312438964844, test_abs_avg=46.8383674621582
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8305271863937378, max_abs=3.75, mean_rel=0.7327136993408203, max_rel=250.45933532714844, norm_rel=0.02197732776403427, ref_abs_avg=38.43598937988281, test_abs_avg=38.44601821899414
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.085501790046692, max_abs=8.0, mean_rel=0.17385944724082947, max_rel=2050.48876953125, norm_rel=0.024376116693019867, ref_abs_avg=44.84699249267578, test_abs_avg=44.84722137451172
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=1.0043985843658447, max_abs=6.5, mean_rel=0.29749777913093567, max_rel=2999.999755859375, norm_rel=0.022645173594355583, ref_abs_avg=44.59004211425781, test_abs_avg=44.59321975708008
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.819298505783081, max_abs=4.125, mean_rel=0.0966978520154953, max_rel=14.439517974853516, norm_rel=0.022763686254620552, ref_abs_avg=36.25498962402344, test_abs_avg=36.22937774658203
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.029106855392456, max_abs=8.0, mean_rel=0.16256295144557953, max_rel=1656.96044921875, norm_rel=0.02417127415537834, ref_abs_avg=42.800209045410156, test_abs_avg=42.80146026611328
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9479557275772095, max_abs=5.5, mean_rel=0.30785420536994934, max_rel=4562.5, norm_rel=0.022465718910098076, ref_abs_avg=42.37274169921875, test_abs_avg=42.37504196166992
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.774540901184082, max_abs=3.0625, mean_rel=0.11491527408361435, max_rel=15.520109176635742, norm_rel=0.023788338527083397, ref_abs_avg=32.76759338378906, test_abs_avg=32.77220916748047
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9744825959205627, max_abs=7.0, mean_rel=0.15354545414447784, max_rel=1342.0560302734375, norm_rel=0.02410346455872059, ref_abs_avg=40.680389404296875, test_abs_avg=40.68147277832031
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.9016190767288208, max_abs=5.3125, mean_rel=0.2975326180458069, max_rel=3390.624755859375, norm_rel=0.022593343630433083, ref_abs_avg=40.0908203125, test_abs_avg=40.09089660644531
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8563942909240723, max_abs=3.125, mean_rel=0.09603657573461533, max_rel=9.203651428222656, norm_rel=0.023283127695322037, ref_abs_avg=37.00743103027344, test_abs_avg=36.99793243408203
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1218724250793457, max_abs=8.0, mean_rel=0.16626441478729248, max_rel=1682.38623046875, norm_rel=0.025725416839122772, ref_abs_avg=43.81545639038086, test_abs_avg=43.817779541015625
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.039531946182251, max_abs=8.0, mean_rel=0.3377770483493805, max_rel=3078.124755859375, norm_rel=0.02424233965575695, ref_abs_avg=43.024417877197266, test_abs_avg=43.01405715942383
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8543186187744141, max_abs=2.75, mean_rel=0.09717124700546265, max_rel=9.48245906829834, norm_rel=0.025926167145371437, ref_abs_avg=32.8680419921875, test_abs_avg=32.843482971191406
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.0548990964889526, max_abs=7.0, mean_rel=0.16750024259090424, max_rel=1672.3966064453125, norm_rel=0.02611403912305832, ref_abs_avg=40.60076141357422, test_abs_avg=40.600120544433594
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9928551316261292, max_abs=6.1875, mean_rel=0.33906862139701843, max_rel=2562.5, norm_rel=0.02485203556716442, ref_abs_avg=40.08302307128906, test_abs_avg=40.08856964111328
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7716267108917236, max_abs=3.75, mean_rel=0.09865555167198181, max_rel=5.46757173538208, norm_rel=0.02410106547176838, ref_abs_avg=31.821247100830078, test_abs_avg=31.760665893554688
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9749264717102051, max_abs=6.25, mean_rel=0.1783132553100586, max_rel=2190.485595703125, norm_rel=0.026017948985099792, ref_abs_avg=37.64593505859375, test_abs_avg=37.64379119873047
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.9091697931289673, max_abs=5.5, mean_rel=0.2784794569015503, max_rel=2375.0, norm_rel=0.024574728682637215, ref_abs_avg=37.139808654785156, test_abs_avg=37.131996154785156
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7074123620986938, max_abs=3.1484375, mean_rel=0.10417428612709045, max_rel=6.328707218170166, norm_rel=0.026109974831342697, ref_abs_avg=27.519367218017578, test_abs_avg=27.505115509033203
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.9060213565826416, max_abs=6.0, mean_rel=0.17116913199424744, max_rel=1172.6251220703125, norm_rel=0.025697126984596252, ref_abs_avg=35.41404342651367, test_abs_avg=35.4116096496582
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8511143922805786, max_abs=5.75, mean_rel=0.31157585978507996, max_rel=2375.0, norm_rel=0.024534009397029877, ref_abs_avg=34.84027099609375, test_abs_avg=34.838829040527344
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6567181944847107, max_abs=2.40625, mean_rel=0.2626218795776367, max_rel=67.62020874023438, norm_rel=0.02331041730940342, ref_abs_avg=27.86807632446289, test_abs_avg=27.863311767578125
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8555670976638794, max_abs=6.0, mean_rel=0.17179018259048462, max_rel=1435.89453125, norm_rel=0.02538851834833622, ref_abs_avg=33.805850982666016, test_abs_avg=33.806434631347656
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7985655069351196, max_abs=5.0, mean_rel=0.2715476453304291, max_rel=2421.875, norm_rel=0.02396821603178978, ref_abs_avg=33.423988342285156, test_abs_avg=33.422698974609375
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6296544075012207, max_abs=2.75, mean_rel=0.19172599911689758, max_rel=48.325721740722656, norm_rel=0.022924145683646202, ref_abs_avg=28.028589248657227, test_abs_avg=27.991239547729492
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.8159773945808411, max_abs=5.25, mean_rel=0.16032561659812927, max_rel=1004.7279052734375, norm_rel=0.025094905868172646, ref_abs_avg=32.58353805541992, test_abs_avg=32.5848388671875
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7591581344604492, max_abs=4.5, mean_rel=0.2975849211215973, max_rel=2187.5, norm_rel=0.023579338565468788, ref_abs_avg=32.27558135986328, test_abs_avg=32.286476135253906
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.6288137435913086, max_abs=2.5, mean_rel=0.2165835201740265, max_rel=64.5898208618164, norm_rel=0.02647966332733631, ref_abs_avg=23.331209182739258, test_abs_avg=23.343008041381836
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7713949680328369, max_abs=5.5, mean_rel=0.1722109019756317, max_rel=1330.1104736328125, norm_rel=0.024973955005407333, ref_abs_avg=30.998455047607422, test_abs_avg=30.99825668334961
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.723734438419342, max_abs=4.875, mean_rel=0.2278115451335907, max_rel=1937.4998779296875, norm_rel=0.02354198321700096, ref_abs_avg=30.809581756591797, test_abs_avg=30.807785034179688
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5664737224578857, max_abs=2.25, mean_rel=0.13127902150154114, max_rel=11.401460647583008, norm_rel=0.022705713286995888, ref_abs_avg=24.488143920898438, test_abs_avg=24.48586654663086
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7384378910064697, max_abs=5.0, mean_rel=0.16649094223976135, max_rel=1710.7003173828125, norm_rel=0.024820519611239433, ref_abs_avg=29.84271240234375, test_abs_avg=29.841581344604492
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6885808706283569, max_abs=4.25, mean_rel=0.23384633660316467, max_rel=1749.9998779296875, norm_rel=0.02348199114203453, ref_abs_avg=29.415910720825195, test_abs_avg=29.412824630737305
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6431941986083984, max_abs=2.5390625, mean_rel=0.10921214520931244, max_rel=8.689621925354004, norm_rel=0.025438616052269936, ref_abs_avg=25.235851287841797, test_abs_avg=25.290891647338867
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8304891586303711, max_abs=6.0, mean_rel=0.18159790337085724, max_rel=2175.715576171875, norm_rel=0.026561619713902473, ref_abs_avg=31.424663543701172, test_abs_avg=31.425235748291016
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.773612916469574, max_abs=5.9375, mean_rel=0.2647158205509186, max_rel=2046.8748779296875, norm_rel=0.02483958937227726, ref_abs_avg=31.29350471496582, test_abs_avg=31.299701690673828
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.6165475845336914, max_abs=2.375, mean_rel=0.10227859020233154, max_rel=5.34912109375, norm_rel=0.02676582522690296, ref_abs_avg=22.90882110595703, test_abs_avg=22.89156723022461
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7726761102676392, max_abs=6.0, mean_rel=0.16891542077064514, max_rel=1333.9134521484375, norm_rel=0.026178285479545593, ref_abs_avg=29.58047866821289, test_abs_avg=29.582260131835938
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.7184793949127197, max_abs=4.5, mean_rel=0.26804810762405396, max_rel=2015.6248779296875, norm_rel=0.024901261553168297, ref_abs_avg=28.8914852142334, test_abs_avg=28.897876739501953
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5440528392791748, max_abs=2.0, mean_rel=0.11433041095733643, max_rel=5.433747291564941, norm_rel=0.023389413952827454, ref_abs_avg=22.77166175842285, test_abs_avg=22.772197723388672
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.7053530216217041, max_abs=5.0, mean_rel=0.1665637195110321, max_rel=1681.4371337890625, norm_rel=0.025709835812449455, ref_abs_avg=27.461261749267578, test_abs_avg=27.463470458984375
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.665484607219696, max_abs=4.0, mean_rel=0.2771044373512268, max_rel=2421.875, norm_rel=0.02426411211490631, ref_abs_avg=27.469255447387695, test_abs_avg=27.473655700683594
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.513539731502533, max_abs=2.375, mean_rel=0.09474039822816849, max_rel=6.122664928436279, norm_rel=0.023284515365958214, ref_abs_avg=21.908000946044922, test_abs_avg=21.928340911865234
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.664824903011322, max_abs=5.5, mean_rel=0.15689454972743988, max_rel=1115.094970703125, norm_rel=0.02508782222867012, ref_abs_avg=26.500511169433594, test_abs_avg=26.502111434936523
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6170928478240967, max_abs=4.25, mean_rel=0.25274813175201416, max_rel=2390.625, norm_rel=0.023787522688508034, ref_abs_avg=26.02222442626953, test_abs_avg=26.025863647460938
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4733448028564453, max_abs=2.125, mean_rel=0.09905850142240524, max_rel=6.57275915145874, norm_rel=0.02354275807738304, ref_abs_avg=20.29357147216797, test_abs_avg=20.290584564208984
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6254677772521973, max_abs=5.0, mean_rel=0.15578262507915497, max_rel=1438.3887939453125, norm_rel=0.024668745696544647, ref_abs_avg=25.365079879760742, test_abs_avg=25.36458969116211
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.584129810333252, max_abs=4.5, mean_rel=0.2566010355949402, max_rel=2312.5, norm_rel=0.023825451731681824, ref_abs_avg=24.599367141723633, test_abs_avg=24.601924896240234
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.48458993434906006, max_abs=1.75, mean_rel=0.22856919467449188, max_rel=78.8552474975586, norm_rel=0.02464282140135765, ref_abs_avg=20.203231811523438, test_abs_avg=20.224599838256836
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5974945425987244, max_abs=5.0, mean_rel=0.15303827822208405, max_rel=912.1494750976562, norm_rel=0.024287164211273193, ref_abs_avg=24.642349243164062, test_abs_avg=24.641529083251953
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5578314065933228, max_abs=4.0, mean_rel=0.18903854489326477, max_rel=1468.7498779296875, norm_rel=0.023182732984423637, ref_abs_avg=24.052745819091797, test_abs_avg=24.045326232910156
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4444293975830078, max_abs=2.0, mean_rel=0.09615375846624374, max_rel=6.114757537841797, norm_rel=0.021457241848111153, ref_abs_avg=21.0762882232666, test_abs_avg=21.083940505981445
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5691714286804199, max_abs=5.5, mean_rel=0.16255629062652588, max_rel=1603.395263671875, norm_rel=0.024081144481897354, ref_abs_avg=23.651145935058594, test_abs_avg=23.649730682373047
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5288065075874329, max_abs=3.75, mean_rel=0.20422613620758057, max_rel=1531.2498779296875, norm_rel=0.022281844168901443, ref_abs_avg=23.639257431030273, test_abs_avg=23.638774871826172
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.41465115547180176, max_abs=1.5703125, mean_rel=0.06957677751779556, max_rel=3.179241180419922, norm_rel=0.022239243611693382, ref_abs_avg=18.974536895751953, test_abs_avg=18.988800048828125
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5420326590538025, max_abs=4.0, mean_rel=0.1459662914276123, max_rel=941.0117797851562, norm_rel=0.023614879697561264, ref_abs_avg=22.948837280273438, test_abs_avg=22.948745727539062
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4964977502822876, max_abs=3.3125, mean_rel=0.23405879735946655, max_rel=1374.9998779296875, norm_rel=0.021856889128684998, ref_abs_avg=22.646167755126953, test_abs_avg=22.64722442626953
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4637553095817566, max_abs=1.7734375, mean_rel=0.2256237417459488, max_rel=62.17884063720703, norm_rel=0.02263590320944786, ref_abs_avg=20.70419692993164, test_abs_avg=20.689056396484375
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.6040228009223938, max_abs=5.0, mean_rel=0.16161151230335236, max_rel=1080.836181640625, norm_rel=0.025007111951708794, ref_abs_avg=24.203937530517578, test_abs_avg=24.202491760253906
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5648192167282104, max_abs=3.90625, mean_rel=0.22800259292125702, max_rel=1492.1873779296875, norm_rel=0.023557603359222412, ref_abs_avg=23.988142013549805, test_abs_avg=23.98638916015625
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4544105529785156, max_abs=2.0, mean_rel=0.14964541792869568, max_rel=15.187728881835938, norm_rel=0.02387351170182228, ref_abs_avg=19.522232055664062, test_abs_avg=19.5234375
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5625072717666626, max_abs=5.0625, mean_rel=0.14846482872962952, max_rel=765.8377685546875, norm_rel=0.024443477392196655, ref_abs_avg=23.015583038330078, test_abs_avg=23.014392852783203
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5169063806533813, max_abs=4.25, mean_rel=0.22179856896400452, max_rel=1843.7498779296875, norm_rel=0.0221443809568882, ref_abs_avg=23.25675392150879, test_abs_avg=23.253236770629883
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.4257011413574219, max_abs=2.0, mean_rel=0.06669490039348602, max_rel=1.5213623046875, norm_rel=0.022489672526717186, ref_abs_avg=19.126304626464844, test_abs_avg=19.084911346435547
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.5316927433013916, max_abs=7.0, mean_rel=0.1529918909072876, max_rel=628.9307861328125, norm_rel=0.024198686704039574, ref_abs_avg=22.0107421875, test_abs_avg=22.007678985595703
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.48526692390441895, max_abs=4.25, mean_rel=0.21267962455749512, max_rel=1234.375, norm_rel=0.022467032074928284, ref_abs_avg=21.662092208862305, test_abs_avg=21.659969329833984
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.36513423919677734, max_abs=1.5, mean_rel=0.09846100211143494, max_rel=10.222312927246094, norm_rel=0.02129613608121872, ref_abs_avg=17.442293167114258, test_abs_avg=17.428245544433594
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.48862630128860474, max_abs=4.5, mean_rel=0.14841949939727783, max_rel=1280.4224853515625, norm_rel=0.022984378039836884, ref_abs_avg=21.334823608398438, test_abs_avg=21.330427169799805
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.43590086698532104, max_abs=4.0, mean_rel=0.20606832206249237, max_rel=1374.9998779296875, norm_rel=0.020844517275691032, ref_abs_avg=20.917903900146484, test_abs_avg=20.91119956970215
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.35925936698913574, max_abs=1.75, mean_rel=0.12453636527061462, max_rel=15.403045654296875, norm_rel=0.022331131622195244, ref_abs_avg=16.089275360107422, test_abs_avg=16.05756378173828
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4568735361099243, max_abs=5.0, mean_rel=0.13764657080173492, max_rel=1019.5599365234375, norm_rel=0.02288433350622654, ref_abs_avg=20.05618667602539, test_abs_avg=20.054489135742188
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4155304729938507, max_abs=4.0, mean_rel=0.19851870834827423, max_rel=1273.4375, norm_rel=0.020954463630914688, ref_abs_avg=19.85717010498047, test_abs_avg=19.848474502563477
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.35340970754623413, max_abs=1.375, mean_rel=0.10084293782711029, max_rel=9.045122146606445, norm_rel=0.020594825968146324, ref_abs_avg=16.804622650146484, test_abs_avg=16.783090591430664
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.44447264075279236, max_abs=5.0, mean_rel=0.13570338487625122, max_rel=652.5531616210938, norm_rel=0.02234116569161415, ref_abs_avg=20.024398803710938, test_abs_avg=20.02412223815918
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.39768558740615845, max_abs=3.75, mean_rel=0.18510594964027405, max_rel=1382.8123779296875, norm_rel=0.02053706906735897, ref_abs_avg=19.459054946899414, test_abs_avg=19.452484130859375
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.3144618272781372, max_abs=1.25, mean_rel=0.09386463463306427, max_rel=20.524124145507812, norm_rel=0.019169814884662628, ref_abs_avg=16.615110397338867, test_abs_avg=16.59392547607422
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.39898401498794556, max_abs=4.0, mean_rel=0.1334068328142166, max_rel=809.232177734375, norm_rel=0.02185688354074955, ref_abs_avg=18.455795288085938, test_abs_avg=18.453977584838867
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3738865256309509, max_abs=3.5, mean_rel=0.18127840757369995, max_rel=1156.25, norm_rel=0.020273862406611443, ref_abs_avg=18.59664535522461, test_abs_avg=18.589107513427734
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.3072357177734375, max_abs=1.1875, mean_rel=0.0979168713092804, max_rel=6.801728248596191, norm_rel=0.02112426608800888, ref_abs_avg=14.607351303100586, test_abs_avg=14.608282089233398
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.392496794462204, max_abs=4.25, mean_rel=0.1267610639333725, max_rel=492.30487060546875, norm_rel=0.02141597494482994, ref_abs_avg=18.65093421936035, test_abs_avg=18.650476455688477
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3500760495662689, max_abs=3.125, mean_rel=0.15889902412891388, max_rel=1624.9998779296875, norm_rel=0.01929680071771145, ref_abs_avg=18.27365493774414, test_abs_avg=18.265647888183594
production_forward2 vs paper_forward output: mean_abs=0.0016741123981773853, max_abs=0.04296875
production_forward2 grad[0] vs paper_forward: mean_abs=0.008903170004487038, max_abs=0.4296875, mean_rel=0.07549792528152466, max_rel=112.45730590820312, norm_rel=0.020600121468305588, ref_abs_avg=0.46663185954093933, test_abs_avg=0.46663373708724976
production_forward2 grad[1] vs paper_forward: mean_abs=7.549649715423584, max_abs=56.0, mean_rel=0.5091069936752319, max_rel=6479.57861328125, norm_rel=0.02108626440167427, ref_abs_avg=322.0852966308594, test_abs_avg=322.1294860839844
production_forward2 grad[2] vs paper_forward: mean_abs=1.2984375953674316, max_abs=6.0, mean_rel=0.185415118932724, max_rel=57.184913635253906, norm_rel=0.02439403347671032, ref_abs_avg=56.16132354736328, test_abs_avg=56.16975402832031
production_forward2 grad[3] vs paper_forward: mean_abs=1.6933281421661377, max_abs=12.0, mean_rel=0.18680095672607422, max_rel=2518.20166015625, norm_rel=0.025220666080713272, ref_abs_avg=67.49689483642578, test_abs_avg=67.49539184570312
production_forward2 grad[4] vs paper_forward: mean_abs=1.5644367933273315, max_abs=9.875, mean_rel=0.4217855632305145, max_rel=6374.99951171875, norm_rel=0.023637259379029274, ref_abs_avg=66.54086303710938, test_abs_avg=66.53890991210938
production_forward2 grad[5] vs paper_forward: mean_abs=1.1282823085784912, max_abs=4.75, mean_rel=0.09588326513767242, max_rel=5.557697296142578, norm_rel=0.023013949394226074, ref_abs_avg=49.25524139404297, test_abs_avg=49.219635009765625
production_forward2 grad[6] vs paper_forward: mean_abs=1.4827455282211304, max_abs=9.75, mean_rel=0.16594234108924866, max_rel=2633.510986328125, norm_rel=0.025046637281775475, ref_abs_avg=59.542205810546875, test_abs_avg=59.540245056152344
production_forward2 grad[7] vs paper_forward: mean_abs=1.3600893020629883, max_abs=8.5, mean_rel=0.3400517702102661, max_rel=3124.999755859375, norm_rel=0.023360401391983032, ref_abs_avg=58.50823211669922, test_abs_avg=58.49773406982422
production_forward2 grad[8] vs paper_forward: mean_abs=1.0722455978393555, max_abs=4.5, mean_rel=0.10991460829973221, max_rel=8.67766284942627, norm_rel=0.023233188316226006, ref_abs_avg=46.467018127441406, test_abs_avg=46.43315505981445
production_forward2 grad[9] vs paper_forward: mean_abs=1.3296679258346558, max_abs=11.0, mean_rel=0.1836780458688736, max_rel=1912.4940185546875, norm_rel=0.02478659898042679, ref_abs_avg=53.939334869384766, test_abs_avg=53.93986129760742
production_forward2 grad[10] vs paper_forward: mean_abs=1.2306686639785767, max_abs=7.5, mean_rel=0.37248528003692627, max_rel=5000.0, norm_rel=0.0232270285487175, ref_abs_avg=53.24089050292969, test_abs_avg=53.24201202392578
production_forward2 grad[11] vs paper_forward: mean_abs=0.9845428466796875, max_abs=3.21875, mean_rel=0.11094570904970169, max_rel=13.838393211364746, norm_rel=0.025824354961514473, ref_abs_avg=38.369720458984375, test_abs_avg=38.310791015625
production_forward2 grad[12] vs paper_forward: mean_abs=1.2310082912445068, max_abs=8.0, mean_rel=0.15927284955978394, max_rel=1022.6597290039062, norm_rel=0.024623598903417587, ref_abs_avg=50.32079315185547, test_abs_avg=50.32407760620117
production_forward2 grad[13] vs paper_forward: mean_abs=1.139580488204956, max_abs=6.75, mean_rel=0.3578791320323944, max_rel=4500.0, norm_rel=0.02314496971666813, ref_abs_avg=49.488853454589844, test_abs_avg=49.48780059814453
production_forward2 grad[14] vs paper_forward: mean_abs=0.8999083042144775, max_abs=3.5, mean_rel=0.10067908465862274, max_rel=6.342042446136475, norm_rel=0.023776622489094734, ref_abs_avg=38.65168762207031, test_abs_avg=38.59299087524414
production_forward2 grad[15] vs paper_forward: mean_abs=1.1503911018371582, max_abs=7.5234375, mean_rel=0.15624406933784485, max_rel=697.777099609375, norm_rel=0.02433692291378975, ref_abs_avg=47.54347229003906, test_abs_avg=47.54522705078125
production_forward2 grad[16] vs paper_forward: mean_abs=1.0646517276763916, max_abs=6.5, mean_rel=0.3249349296092987, max_rel=2687.499755859375, norm_rel=0.022864090278744698, ref_abs_avg=46.834312438964844, test_abs_avg=46.84008026123047
production_forward2 grad[17] vs paper_forward: mean_abs=0.8203161954879761, max_abs=4.0, mean_rel=0.7677479982376099, max_rel=298.1609191894531, norm_rel=0.021808277815580368, ref_abs_avg=38.43598937988281, test_abs_avg=38.43092727661133
production_forward2 grad[18] vs paper_forward: mean_abs=1.0832149982452393, max_abs=8.5, mean_rel=0.17383217811584473, max_rel=2241.262451171875, norm_rel=0.024310030043125153, ref_abs_avg=44.84699249267578, test_abs_avg=44.84726333618164
production_forward2 grad[19] vs paper_forward: mean_abs=1.001876950263977, max_abs=5.5, mean_rel=0.2712990343570709, max_rel=3312.499755859375, norm_rel=0.022590747103095055, ref_abs_avg=44.59004211425781, test_abs_avg=44.59189987182617
production_forward2 grad[20] vs paper_forward: mean_abs=0.8135900497436523, max_abs=3.0, mean_rel=0.10482990741729736, max_rel=15.772619247436523, norm_rel=0.022333046421408653, ref_abs_avg=36.25498962402344, test_abs_avg=36.19362258911133
production_forward2 grad[21] vs paper_forward: mean_abs=1.0273523330688477, max_abs=8.0, mean_rel=0.16411185264587402, max_rel=1421.421142578125, norm_rel=0.02411901019513607, ref_abs_avg=42.800209045410156, test_abs_avg=42.80229949951172
production_forward2 grad[22] vs paper_forward: mean_abs=0.9468610286712646, max_abs=5.25, mean_rel=0.2968807816505432, max_rel=4062.499755859375, norm_rel=0.022438669577240944, ref_abs_avg=42.37274169921875, test_abs_avg=42.37272262573242
production_forward2 grad[23] vs paper_forward: mean_abs=0.777409553527832, max_abs=2.9375, mean_rel=0.10997004806995392, max_rel=12.26635456085205, norm_rel=0.02389850839972496, ref_abs_avg=32.76759338378906, test_abs_avg=32.77355194091797
production_forward2 grad[24] vs paper_forward: mean_abs=0.9735954999923706, max_abs=6.5, mean_rel=0.15721943974494934, max_rel=1250.11962890625, norm_rel=0.02407434768974781, ref_abs_avg=40.680389404296875, test_abs_avg=40.683441162109375
production_forward2 grad[25] vs paper_forward: mean_abs=0.9023779630661011, max_abs=5.25, mean_rel=0.29166167974472046, max_rel=2874.999755859375, norm_rel=0.022605398669838905, ref_abs_avg=40.0908203125, test_abs_avg=40.09156036376953
production_forward2 grad[26] vs paper_forward: mean_abs=0.8749270439147949, max_abs=3.75, mean_rel=0.09996632486581802, max_rel=7.451333522796631, norm_rel=0.02387063391506672, ref_abs_avg=37.00743103027344, test_abs_avg=37.00518798828125
production_forward2 grad[27] vs paper_forward: mean_abs=1.1204307079315186, max_abs=8.0, mean_rel=0.16621005535125732, max_rel=1229.717041015625, norm_rel=0.02569982223212719, ref_abs_avg=43.81545639038086, test_abs_avg=43.81849670410156
production_forward2 grad[28] vs paper_forward: mean_abs=1.0398948192596436, max_abs=6.75, mean_rel=0.32041338086128235, max_rel=3124.999755859375, norm_rel=0.024239493533968925, ref_abs_avg=43.024417877197266, test_abs_avg=43.01537322998047
production_forward2 grad[29] vs paper_forward: mean_abs=0.8582000732421875, max_abs=2.890625, mean_rel=0.09623830020427704, max_rel=9.091205596923828, norm_rel=0.025981388986110687, ref_abs_avg=32.8680419921875, test_abs_avg=32.864105224609375
production_forward2 grad[30] vs paper_forward: mean_abs=1.05410897731781, max_abs=7.0, mean_rel=0.16394734382629395, max_rel=1120.034912109375, norm_rel=0.026087665930390358, ref_abs_avg=40.60076141357422, test_abs_avg=40.60257339477539
production_forward2 grad[31] vs paper_forward: mean_abs=0.9920265674591064, max_abs=6.5, mean_rel=0.35106927156448364, max_rel=2749.999755859375, norm_rel=0.02481374889612198, ref_abs_avg=40.08302307128906, test_abs_avg=40.08888244628906
production_forward2 grad[32] vs paper_forward: mean_abs=0.7445831298828125, max_abs=3.5, mean_rel=0.10363689810037613, max_rel=6.814264297485352, norm_rel=0.023852389305830002, ref_abs_avg=31.821247100830078, test_abs_avg=31.801273345947266
production_forward2 grad[33] vs paper_forward: mean_abs=0.9733482599258423, max_abs=6.5, mean_rel=0.17956557869911194, max_rel=1792.2119140625, norm_rel=0.02596936747431755, ref_abs_avg=37.64593505859375, test_abs_avg=37.64356231689453
production_forward2 grad[34] vs paper_forward: mean_abs=0.908845067024231, max_abs=6.0, mean_rel=0.25879424810409546, max_rel=2500.0, norm_rel=0.024570796638727188, ref_abs_avg=37.139808654785156, test_abs_avg=37.13656234741211
production_forward2 grad[35] vs paper_forward: mean_abs=0.7285805940628052, max_abs=2.98046875, mean_rel=0.10121184587478638, max_rel=8.159855842590332, norm_rel=0.026506207883358, ref_abs_avg=27.519367218017578, test_abs_avg=27.49034881591797
production_forward2 grad[36] vs paper_forward: mean_abs=0.9052713513374329, max_abs=6.0, mean_rel=0.17104065418243408, max_rel=1669.708740234375, norm_rel=0.025679728016257286, ref_abs_avg=35.41404342651367, test_abs_avg=35.412784576416016
production_forward2 grad[37] vs paper_forward: mean_abs=0.8489720225334167, max_abs=5.5, mean_rel=0.3150467276573181, max_rel=2500.0, norm_rel=0.02448280341923237, ref_abs_avg=34.84027099609375, test_abs_avg=34.83540344238281
production_forward2 grad[38] vs paper_forward: mean_abs=0.6253254413604736, max_abs=2.5, mean_rel=0.2773076891899109, max_rel=64.04861450195312, norm_rel=0.022711703553795815, ref_abs_avg=27.86807632446289, test_abs_avg=27.880197525024414
production_forward2 grad[39] vs paper_forward: mean_abs=0.8552657961845398, max_abs=6.0, mean_rel=0.17199358344078064, max_rel=1670.626953125, norm_rel=0.025359412655234337, ref_abs_avg=33.805850982666016, test_abs_avg=33.80679702758789
production_forward2 grad[40] vs paper_forward: mean_abs=0.8009233474731445, max_abs=5.53125, mean_rel=0.27591198682785034, max_rel=2828.124755859375, norm_rel=0.02403162233531475, ref_abs_avg=33.423988342285156, test_abs_avg=33.422142028808594
production_forward2 grad[41] vs paper_forward: mean_abs=0.6413249969482422, max_abs=2.5, mean_rel=0.18097467720508575, max_rel=34.817352294921875, norm_rel=0.02310718595981598, ref_abs_avg=28.028589248657227, test_abs_avg=27.952152252197266
production_forward2 grad[42] vs paper_forward: mean_abs=0.8161711692810059, max_abs=6.0, mean_rel=0.16190601885318756, max_rel=1248.6500244140625, norm_rel=0.025094423443078995, ref_abs_avg=32.58353805541992, test_abs_avg=32.58442687988281
production_forward2 grad[43] vs paper_forward: mean_abs=0.7610355019569397, max_abs=4.75, mean_rel=0.3172534108161926, max_rel=1999.9998779296875, norm_rel=0.02361460216343403, ref_abs_avg=32.27558135986328, test_abs_avg=32.28924560546875
production_forward2 grad[44] vs paper_forward: mean_abs=0.6188869476318359, max_abs=2.875, mean_rel=0.20327168703079224, max_rel=54.57221984863281, norm_rel=0.026282893493771553, ref_abs_avg=23.331209182739258, test_abs_avg=23.318798065185547
production_forward2 grad[45] vs paper_forward: mean_abs=0.7695027589797974, max_abs=5.0, mean_rel=0.16976298391819, max_rel=1253.999755859375, norm_rel=0.024895165115594864, ref_abs_avg=30.998455047607422, test_abs_avg=30.99806785583496
production_forward2 grad[46] vs paper_forward: mean_abs=0.7238438129425049, max_abs=4.75, mean_rel=0.20565176010131836, max_rel=1937.4998779296875, norm_rel=0.023546205833554268, ref_abs_avg=30.809581756591797, test_abs_avg=30.808134078979492
production_forward2 grad[47] vs paper_forward: mean_abs=0.5672025680541992, max_abs=2.0, mean_rel=0.14838142693042755, max_rel=15.8395357131958, norm_rel=0.02243090607225895, ref_abs_avg=24.488143920898438, test_abs_avg=24.50409698486328
production_forward2 grad[48] vs paper_forward: mean_abs=0.7369188666343689, max_abs=5.5, mean_rel=0.1691005378961563, max_rel=1544.846923828125, norm_rel=0.024772677570581436, ref_abs_avg=29.84271240234375, test_abs_avg=29.842235565185547
production_forward2 grad[49] vs paper_forward: mean_abs=0.6857724189758301, max_abs=4.0, mean_rel=0.2290966510772705, max_rel=2031.2498779296875, norm_rel=0.023387162014842033, ref_abs_avg=29.415910720825195, test_abs_avg=29.41332244873047
production_forward2 grad[50] vs paper_forward: mean_abs=0.6476020812988281, max_abs=3.0546875, mean_rel=0.1245356872677803, max_rel=16.318538665771484, norm_rel=0.02542184852063656, ref_abs_avg=25.235851287841797, test_abs_avg=25.299318313598633
production_forward2 grad[51] vs paper_forward: mean_abs=0.8296396136283875, max_abs=6.0, mean_rel=0.17764705419540405, max_rel=1245.2750244140625, norm_rel=0.02652689814567566, ref_abs_avg=31.424663543701172, test_abs_avg=31.427030563354492
production_forward2 grad[52] vs paper_forward: mean_abs=0.773605465888977, max_abs=5.5625, mean_rel=0.264107882976532, max_rel=2250.0, norm_rel=0.024830911308526993, ref_abs_avg=31.29350471496582, test_abs_avg=31.298818588256836
production_forward2 grad[53] vs paper_forward: mean_abs=0.60526442527771, max_abs=2.125, mean_rel=0.10847032070159912, max_rel=5.546142101287842, norm_rel=0.026222283020615578, ref_abs_avg=22.90882110595703, test_abs_avg=22.881704330444336
production_forward2 grad[54] vs paper_forward: mean_abs=0.7723292708396912, max_abs=5.0, mean_rel=0.16924750804901123, max_rel=1249.58447265625, norm_rel=0.02615044079720974, ref_abs_avg=29.58047866821289, test_abs_avg=29.58237075805664
production_forward2 grad[55] vs paper_forward: mean_abs=0.7175023555755615, max_abs=5.375, mean_rel=0.26260533928871155, max_rel=2031.2498779296875, norm_rel=0.02485746517777443, ref_abs_avg=28.8914852142334, test_abs_avg=28.897472381591797
production_forward2 grad[56] vs paper_forward: mean_abs=0.5204458236694336, max_abs=2.25, mean_rel=0.11202603578567505, max_rel=5.493711948394775, norm_rel=0.022462429478764534, ref_abs_avg=22.77166175842285, test_abs_avg=22.7899169921875
production_forward2 grad[57] vs paper_forward: mean_abs=0.7051111459732056, max_abs=4.75, mean_rel=0.16777357459068298, max_rel=1670.4454345703125, norm_rel=0.025703294202685356, ref_abs_avg=27.461261749267578, test_abs_avg=27.4642333984375
production_forward2 grad[58] vs paper_forward: mean_abs=0.6661478281021118, max_abs=4.34375, mean_rel=0.2742622196674347, max_rel=1999.9998779296875, norm_rel=0.024281971156597137, ref_abs_avg=27.469255447387695, test_abs_avg=27.473909378051758
production_forward2 grad[59] vs paper_forward: mean_abs=0.500225305557251, max_abs=1.9375, mean_rel=0.0927605926990509, max_rel=7.298969268798828, norm_rel=0.02273264341056347, ref_abs_avg=21.908000946044922, test_abs_avg=21.924394607543945
production_forward2 grad[60] vs paper_forward: mean_abs=0.6634542942047119, max_abs=6.0, mean_rel=0.15721577405929565, max_rel=962.95556640625, norm_rel=0.025045247748494148, ref_abs_avg=26.500511169433594, test_abs_avg=26.502840042114258
production_forward2 grad[61] vs paper_forward: mean_abs=0.6166267395019531, max_abs=4.375, mean_rel=0.25617313385009766, max_rel=1937.4998779296875, norm_rel=0.02377285622060299, ref_abs_avg=26.02222442626953, test_abs_avg=26.026962280273438
production_forward2 grad[62] vs paper_forward: mean_abs=0.4672069549560547, max_abs=2.0, mean_rel=0.10480684041976929, max_rel=11.763026237487793, norm_rel=0.023254891857504845, ref_abs_avg=20.29357147216797, test_abs_avg=20.2967586517334
production_forward2 grad[63] vs paper_forward: mean_abs=0.6253216862678528, max_abs=6.0, mean_rel=0.1546655297279358, max_rel=1361.056640625, norm_rel=0.02466960996389389, ref_abs_avg=25.365079879760742, test_abs_avg=25.36408233642578
production_forward2 grad[64] vs paper_forward: mean_abs=0.5859026908874512, max_abs=5.0, mean_rel=0.24634480476379395, max_rel=1812.4998779296875, norm_rel=0.023885849863290787, ref_abs_avg=24.599367141723633, test_abs_avg=24.60491943359375
production_forward2 grad[65] vs paper_forward: mean_abs=0.4687551259994507, max_abs=1.8125, mean_rel=0.15377521514892578, max_rel=43.328975677490234, norm_rel=0.02415216714143753, ref_abs_avg=20.203231811523438, test_abs_avg=20.22060203552246
production_forward2 grad[66] vs paper_forward: mean_abs=0.5974850058555603, max_abs=5.0, mean_rel=0.15335087478160858, max_rel=991.6631469726562, norm_rel=0.024272631853818893, ref_abs_avg=24.642349243164062, test_abs_avg=24.641441345214844
production_forward2 grad[67] vs paper_forward: mean_abs=0.5566472411155701, max_abs=4.0, mean_rel=0.1893836259841919, max_rel=1656.2498779296875, norm_rel=0.023182561621069908, ref_abs_avg=24.052745819091797, test_abs_avg=24.042884826660156
production_forward2 grad[68] vs paper_forward: mean_abs=0.44467949867248535, max_abs=1.875, mean_rel=0.09070439636707306, max_rel=7.316468238830566, norm_rel=0.021189840510487556, ref_abs_avg=21.0762882232666, test_abs_avg=21.085081100463867
production_forward2 grad[69] vs paper_forward: mean_abs=0.5685967803001404, max_abs=4.0, mean_rel=0.1601565033197403, max_rel=1219.812255859375, norm_rel=0.024058641865849495, ref_abs_avg=23.651145935058594, test_abs_avg=23.649032592773438
production_forward2 grad[70] vs paper_forward: mean_abs=0.5305804014205933, max_abs=3.75, mean_rel=0.2110057920217514, max_rel=1406.2498779296875, norm_rel=0.022349374368786812, ref_abs_avg=23.639257431030273, test_abs_avg=23.63788604736328
production_forward2 grad[71] vs paper_forward: mean_abs=0.4102494716644287, max_abs=1.3984375, mean_rel=0.0951056033372879, max_rel=13.142454147338867, norm_rel=0.022057224065065384, ref_abs_avg=18.974536895751953, test_abs_avg=19.005111694335938
production_forward2 grad[72] vs paper_forward: mean_abs=0.5413311719894409, max_abs=4.5, mean_rel=0.14360208809375763, max_rel=870.0231323242188, norm_rel=0.02359488420188427, ref_abs_avg=22.948837280273438, test_abs_avg=22.949085235595703
production_forward2 grad[73] vs paper_forward: mean_abs=0.49467581510543823, max_abs=3.5, mean_rel=0.22578421235084534, max_rel=1531.2498779296875, norm_rel=0.021784164011478424, ref_abs_avg=22.646167755126953, test_abs_avg=22.646310806274414
production_forward2 grad[74] vs paper_forward: mean_abs=0.45869553089141846, max_abs=1.875, mean_rel=0.20825357735157013, max_rel=36.958953857421875, norm_rel=0.022266583517193794, ref_abs_avg=20.70419692993164, test_abs_avg=20.68966293334961
production_forward2 grad[75] vs paper_forward: mean_abs=0.6039360761642456, max_abs=5.5, mean_rel=0.15805985033512115, max_rel=719.1463012695312, norm_rel=0.025005804374814034, ref_abs_avg=24.203937530517578, test_abs_avg=24.201393127441406
production_forward2 grad[76] vs paper_forward: mean_abs=0.5655982494354248, max_abs=4.0, mean_rel=0.22982586920261383, max_rel=1390.6248779296875, norm_rel=0.0235954150557518, ref_abs_avg=23.988142013549805, test_abs_avg=23.987768173217773
production_forward2 grad[77] vs paper_forward: mean_abs=0.44534528255462646, max_abs=1.75, mean_rel=0.15680618584156036, max_rel=20.40239906311035, norm_rel=0.02294204942882061, ref_abs_avg=19.522232055664062, test_abs_avg=19.535566329956055
production_forward2 grad[78] vs paper_forward: mean_abs=0.5618539452552795, max_abs=4.75, mean_rel=0.14884623885154724, max_rel=911.1122436523438, norm_rel=0.024409528821706772, ref_abs_avg=23.015583038330078, test_abs_avg=23.014564514160156
production_forward2 grad[79] vs paper_forward: mean_abs=0.5175127387046814, max_abs=4.375, mean_rel=0.23253068327903748, max_rel=1546.8748779296875, norm_rel=0.022182023152709007, ref_abs_avg=23.25675392150879, test_abs_avg=23.25165557861328
production_forward2 grad[80] vs paper_forward: mean_abs=0.433685302734375, max_abs=1.875, mean_rel=0.06711651384830475, max_rel=1.8644589185714722, norm_rel=0.02248787321150303, ref_abs_avg=19.126304626464844, test_abs_avg=19.06792449951172
production_forward2 grad[81] vs paper_forward: mean_abs=0.5312552452087402, max_abs=6.0, mean_rel=0.15374816954135895, max_rel=612.1776733398438, norm_rel=0.024166112765669823, ref_abs_avg=22.0107421875, test_abs_avg=22.007686614990234
production_forward2 grad[82] vs paper_forward: mean_abs=0.4881569743156433, max_abs=4.25, mean_rel=0.22191432118415833, max_rel=1312.4998779296875, norm_rel=0.022638246417045593, ref_abs_avg=21.662092208862305, test_abs_avg=21.65858268737793
production_forward2 grad[83] vs paper_forward: mean_abs=0.38385534286499023, max_abs=1.4609375, mean_rel=0.09780912846326828, max_rel=10.608607292175293, norm_rel=0.02181861363351345, ref_abs_avg=17.442293167114258, test_abs_avg=17.431644439697266
production_forward2 grad[84] vs paper_forward: mean_abs=0.48842406272888184, max_abs=6.0, mean_rel=0.15106076002120972, max_rel=1204.7210693359375, norm_rel=0.0229646023362875, ref_abs_avg=21.334823608398438, test_abs_avg=21.331207275390625
production_forward2 grad[85] vs paper_forward: mean_abs=0.4395979046821594, max_abs=3.75, mean_rel=0.22638371586799622, max_rel=1499.9998779296875, norm_rel=0.021061627194285393, ref_abs_avg=20.917903900146484, test_abs_avg=20.911819458007812
production_forward2 grad[86] vs paper_forward: mean_abs=0.34937453269958496, max_abs=1.625, mean_rel=0.11491024494171143, max_rel=15.311081886291504, norm_rel=0.021847212687134743, ref_abs_avg=16.089275360107422, test_abs_avg=16.061885833740234
production_forward2 grad[87] vs paper_forward: mean_abs=0.4564639627933502, max_abs=4.5, mean_rel=0.1346738338470459, max_rel=824.863525390625, norm_rel=0.022881779819726944, ref_abs_avg=20.05618667602539, test_abs_avg=20.054622650146484
production_forward2 grad[88] vs paper_forward: mean_abs=0.4178256392478943, max_abs=4.5, mean_rel=0.19879934191703796, max_rel=1421.8748779296875, norm_rel=0.02110595814883709, ref_abs_avg=19.85717010498047, test_abs_avg=19.84927749633789
production_forward2 grad[89] vs paper_forward: mean_abs=0.3537788391113281, max_abs=1.36181640625, mean_rel=0.08529003709554672, max_rel=4.216026306152344, norm_rel=0.020495902746915817, ref_abs_avg=16.804622650146484, test_abs_avg=16.77525520324707
production_forward2 grad[90] vs paper_forward: mean_abs=0.443715900182724, max_abs=5.0, mean_rel=0.14019566774368286, max_rel=890.4865112304688, norm_rel=0.022286297753453255, ref_abs_avg=20.024398803710938, test_abs_avg=20.023902893066406
production_forward2 grad[91] vs paper_forward: mean_abs=0.3955559730529785, max_abs=4.0, mean_rel=0.19225186109542847, max_rel=1734.3748779296875, norm_rel=0.020428458228707314, ref_abs_avg=19.459054946899414, test_abs_avg=19.454620361328125
production_forward2 grad[92] vs paper_forward: mean_abs=0.3154184818267822, max_abs=1.25, mean_rel=0.11501520872116089, max_rel=22.384666442871094, norm_rel=0.019207609817385674, ref_abs_avg=16.615110397338867, test_abs_avg=16.59861183166504
production_forward2 grad[93] vs paper_forward: mean_abs=0.3991156220436096, max_abs=4.0, mean_rel=0.13337808847427368, max_rel=758.6466064453125, norm_rel=0.021851317957043648, ref_abs_avg=18.455795288085938, test_abs_avg=18.453948974609375
production_forward2 grad[94] vs paper_forward: mean_abs=0.3698718249797821, max_abs=3.25, mean_rel=0.18331165611743927, max_rel=1140.625, norm_rel=0.020014498382806778, ref_abs_avg=18.59664535522461, test_abs_avg=18.590389251708984
production_forward2 grad[95] vs paper_forward: mean_abs=0.30229341983795166, max_abs=1.125, mean_rel=0.10429996997117996, max_rel=8.21235179901123, norm_rel=0.020682966336607933, ref_abs_avg=14.607351303100586, test_abs_avg=14.610157012939453
production_forward2 grad[96] vs paper_forward: mean_abs=0.3920721411705017, max_abs=4.5, mean_rel=0.12798650562763214, max_rel=535.2941284179688, norm_rel=0.02139623649418354, ref_abs_avg=18.65093421936035, test_abs_avg=18.65087127685547
production_forward2 grad[97] vs paper_forward: mean_abs=0.34585314989089966, max_abs=3.375, mean_rel=0.16325700283050537, max_rel=1125.0, norm_rel=0.019049903377890587, ref_abs_avg=18.27365493774414, test_abs_avg=18.267793655395508

