identity layers + randn queries
mean abs randn paper: 0.216796875
production_forward2 fwd+bwd:  224.421 ms
production_forward2 fwd-only: 22.342 ms
production_forward2 bwd-only: 202.151 ms
production_forward2 peak allocated: fwd=2.692 GiB, fwd+bwd=6.071 GiB
production_forward2 peak reserved:  fwd=3.055 GiB, fwd+bwd=8.805 GiB
mean abs difference randn: 0.001617431640625
mean relative difference randn: 0.028564453125

Autotune Choices Stats:
{"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_mm_26", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.2470719963312149, "best_triton_pos": 0}
AUTOTUNE mm(131072x512, 512x8)
strides: [512, 1], [1, 512]
dtypes: torch.float32, torch.float32
  triton_mm_26 0.2471 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_18 0.2574 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_mm_20 0.2621 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_23 0.2621 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_24 0.2621 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_19 0.2634 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2
  triton_mm_27 0.2642 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_25 0.2677 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_28 0.2690 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8
  mm 0.2738 ms 90.2% 
SingleProcess AUTOTUNE benchmarking takes 0.5553 seconds and 0.3411 seconds precompiling for 13 choices
Autotune Choices Stats:
{"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_mm_38", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.43753600120544434, "best_triton_pos": 0}
AUTOTUNE mm(262144x512, 512x8)
strides: [512, 1], [1, 512]
dtypes: torch.float32, torch.float32
  triton_mm_38 0.4375 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_35 0.4520 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_36 0.4530 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_37 0.4539 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_34 0.4540 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_32 0.4547 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_30 0.4580 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_mm_31 0.4590 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2
  triton_mm_39 0.4608 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_40 0.4608 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.5475 seconds and 0.3234 seconds precompiling for 13 choices
Autotune Choices Stats:
{"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_mm_50", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.7899199724197388, "best_triton_pos": 0}
AUTOTUNE mm(524288x512, 512x8)
strides: [512, 1], [1, 512]
dtypes: torch.float32, torch.float32
  triton_mm_50 0.7899 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_47 0.8502 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_48 0.8519 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_49 0.8561 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_46 0.8571 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_44 0.8632 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_42 0.8673 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_mm_43 0.8684 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2
  triton_mm_52 0.8724 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8
  triton_mm_51 0.8868 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.5342 seconds and 0.0013 seconds precompiling for 13 choices

torch_compile_phases_forward fwd+bwd:  190.030 ms
torch_compile_phases_forward fwd-only: 36.456 ms
torch_compile_phases_forward bwd-only: 152.609 ms
torch_compile_phases_forward peak allocated: fwd=14.157 GiB, fwd+bwd=14.784 GiB
torch_compile_phases_forward peak reserved:  fwd=14.453 GiB, fwd+bwd=18.705 GiB
mean abs difference randn: 0.00162506103515625
mean relative difference randn: 0.0286865234375
paper_forward fwd+bwd:  380.876 ms
paper_forward fwd-only: 85.738 ms
paper_forward bwd-only: 294.008 ms
paper_forward peak allocated: fwd=65.065 GiB, fwd+bwd=67.184 GiB
paper_forward peak reserved:  fwd=65.141 GiB, fwd+bwd=67.893 GiB
mean abs difference randn: 3.600120544433594e-05
mean relative difference randn: 0.000629425048828125
production_forward fwd+bwd:  112.023 ms
production_forward fwd-only: 20.512 ms
production_forward bwd-only: 91.744 ms
production_forward peak allocated: fwd=54.851 GiB, fwd+bwd=58.730 GiB
production_forward peak reserved:  fwd=56.744 GiB, fwd+bwd=58.994 GiB
mean abs difference randn: 0.001617431640625
mean relative difference randn: 0.028564453125

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016198582015931606, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008631909266114235, max_abs=0.423828125, mean_rel=0.07445527613162994, max_rel=129.91847229003906, norm_rel=0.020413510501384735, ref_abs_avg=0.4570659101009369, test_abs_avg=0.457078218460083
production_forward grad[1] vs paper_forward: mean_abs=7.409096717834473, max_abs=48.0, mean_rel=0.12289761751890182, max_rel=218.75595092773438, norm_rel=0.020721301436424255, ref_abs_avg=317.34033203125, test_abs_avg=317.2888488769531
production_forward grad[2] vs paper_forward: mean_abs=1.3104915618896484, max_abs=5.0, mean_rel=0.07017242163419724, max_rel=6.384943962097168, norm_rel=0.02263900637626648, ref_abs_avg=59.7886962890625, test_abs_avg=59.87975311279297
production_forward grad[3] vs paper_forward: mean_abs=1.63200044631958, max_abs=12.0, mean_rel=0.16235913336277008, max_rel=2207.429443359375, norm_rel=0.024240026250481606, ref_abs_avg=67.71835327148438, test_abs_avg=67.72477722167969
production_forward grad[4] vs paper_forward: mean_abs=1.5805600881576538, max_abs=10.0, mean_rel=0.1547262966632843, max_rel=1017.8270263671875, norm_rel=0.023654691874980927, ref_abs_avg=67.1153564453125, test_abs_avg=67.12377166748047
production_forward grad[5] vs paper_forward: mean_abs=1.179605484008789, max_abs=5.5, mean_rel=0.07170677930116653, max_rel=4.537046432495117, norm_rel=0.024388322606682777, ref_abs_avg=49.304603576660156, test_abs_avg=49.33555603027344
production_forward grad[6] vs paper_forward: mean_abs=1.4088881015777588, max_abs=9.515625, mean_rel=0.16177335381507874, max_rel=1377.3974609375, norm_rel=0.023832304403185844, ref_abs_avg=59.39966583251953, test_abs_avg=59.404449462890625
production_forward grad[7] vs paper_forward: mean_abs=1.3689236640930176, max_abs=9.0, mean_rel=0.15971045196056366, max_rel=2250.53466796875, norm_rel=0.023511894047260284, ref_abs_avg=58.57343292236328, test_abs_avg=58.57106399536133
production_forward grad[8] vs paper_forward: mean_abs=1.0566158294677734, max_abs=5.0, mean_rel=0.07634606212377548, max_rel=3.5424656867980957, norm_rel=0.024605736136436462, ref_abs_avg=44.26823043823242, test_abs_avg=44.2353515625
production_forward grad[9] vs paper_forward: mean_abs=1.2770246267318726, max_abs=8.5, mean_rel=0.16928833723068237, max_rel=3003.75390625, norm_rel=0.023556699976325035, ref_abs_avg=54.464271545410156, test_abs_avg=54.460044860839844
production_forward grad[10] vs paper_forward: mean_abs=1.2428855895996094, max_abs=8.0, mean_rel=0.1605246365070343, max_rel=1152.723388671875, norm_rel=0.02338935248553753, ref_abs_avg=53.364158630371094, test_abs_avg=53.358516693115234
production_forward grad[11] vs paper_forward: mean_abs=0.953275203704834, max_abs=3.9375, mean_rel=0.10408736020326614, max_rel=12.095181465148926, norm_rel=0.023706967011094093, ref_abs_avg=41.348602294921875, test_abs_avg=41.27490234375
production_forward grad[12] vs paper_forward: mean_abs=1.163097858428955, max_abs=7.5, mean_rel=0.163629412651062, max_rel=1628.7803955078125, norm_rel=0.023394262418150902, ref_abs_avg=49.95176696777344, test_abs_avg=49.955631256103516
production_forward grad[13] vs paper_forward: mean_abs=1.1322898864746094, max_abs=8.0, mean_rel=0.16033470630645752, max_rel=786.8655395507812, norm_rel=0.023012356832623482, ref_abs_avg=49.46324920654297, test_abs_avg=49.46479797363281
production_forward grad[14] vs paper_forward: mean_abs=0.8408339023590088, max_abs=3.125, mean_rel=0.2540209889411926, max_rel=61.86915588378906, norm_rel=0.0228246059268713, ref_abs_avg=37.17572784423828, test_abs_avg=37.126102447509766
production_forward grad[15] vs paper_forward: mean_abs=1.0751597881317139, max_abs=7.0, mean_rel=0.1559341549873352, max_rel=1025.7509765625, norm_rel=0.023243121802806854, ref_abs_avg=46.44171142578125, test_abs_avg=46.44089889526367
production_forward grad[16] vs paper_forward: mean_abs=1.0540874004364014, max_abs=6.3984375, mean_rel=0.16907557845115662, max_rel=2242.011962890625, norm_rel=0.022855602204799652, ref_abs_avg=46.2628173828125, test_abs_avg=46.2631950378418
production_forward grad[17] vs paper_forward: mean_abs=0.8049240112304688, max_abs=3.25, mean_rel=0.0734478160738945, max_rel=5.883532524108887, norm_rel=0.021908504888415337, ref_abs_avg=37.53044128417969, test_abs_avg=37.522308349609375
production_forward grad[18] vs paper_forward: mean_abs=1.0116996765136719, max_abs=7.0, mean_rel=0.16623792052268982, max_rel=2103.358642578125, norm_rel=0.022997930645942688, ref_abs_avg=44.158016204833984, test_abs_avg=44.158409118652344
production_forward grad[19] vs paper_forward: mean_abs=0.9949471950531006, max_abs=6.75, mean_rel=0.1523895263671875, max_rel=2073.53759765625, norm_rel=0.022829405963420868, ref_abs_avg=43.75371551513672, test_abs_avg=43.752685546875
production_forward grad[20] vs paper_forward: mean_abs=0.7910856008529663, max_abs=3.0, mean_rel=0.21939167380332947, max_rel=39.76972961425781, norm_rel=0.02264958992600441, ref_abs_avg=34.724666595458984, test_abs_avg=34.74109649658203
production_forward grad[21] vs paper_forward: mean_abs=0.957069993019104, max_abs=6.0, mean_rel=0.15128342807292938, max_rel=1168.5345458984375, norm_rel=0.02294137328863144, ref_abs_avg=41.92742156982422, test_abs_avg=41.92993927001953
production_forward grad[22] vs paper_forward: mean_abs=0.9317134618759155, max_abs=6.0, mean_rel=0.13126757740974426, max_rel=699.0028686523438, norm_rel=0.022673046216368675, ref_abs_avg=41.310211181640625, test_abs_avg=41.31005096435547
production_forward grad[23] vs paper_forward: mean_abs=0.7488555908203125, max_abs=2.75, mean_rel=0.08076034486293793, max_rel=4.816752910614014, norm_rel=0.022335076704621315, ref_abs_avg=33.95231246948242, test_abs_avg=34.00889205932617
production_forward grad[24] vs paper_forward: mean_abs=0.9059421420097351, max_abs=6.25, mean_rel=0.14253365993499756, max_rel=801.8441772460938, norm_rel=0.022766495123505592, ref_abs_avg=40.008445739746094, test_abs_avg=40.008697509765625
production_forward grad[25] vs paper_forward: mean_abs=0.8891167640686035, max_abs=5.5, mean_rel=0.1721304953098297, max_rel=2487.36083984375, norm_rel=0.02251066453754902, ref_abs_avg=39.666236877441406, test_abs_avg=39.668212890625
production_forward grad[26] vs paper_forward: mean_abs=0.8375740051269531, max_abs=3.5, mean_rel=0.09930197894573212, max_rel=4.59513521194458, norm_rel=0.023863697424530983, ref_abs_avg=35.12461853027344, test_abs_avg=35.1815185546875
production_forward grad[27] vs paper_forward: mean_abs=1.045783519744873, max_abs=7.0, mean_rel=0.16914907097816467, max_rel=1538.8133544921875, norm_rel=0.024762583896517754, ref_abs_avg=42.35835647583008, test_abs_avg=42.361961364746094
production_forward grad[28] vs paper_forward: mean_abs=1.0200003385543823, max_abs=9.25, mean_rel=0.16185835003852844, max_rel=1338.6142578125, norm_rel=0.02454584650695324, ref_abs_avg=41.76753234863281, test_abs_avg=41.77290344238281
production_forward grad[29] vs paper_forward: mean_abs=0.7889785766601562, max_abs=3.5, mean_rel=0.07716619968414307, max_rel=4.5009355545043945, norm_rel=0.02347656898200512, ref_abs_avg=34.43804931640625, test_abs_avg=34.456390380859375
production_forward grad[30] vs paper_forward: mean_abs=0.9698455333709717, max_abs=6.0, mean_rel=0.16290678083896637, max_rel=1453.2293701171875, norm_rel=0.024921726435422897, ref_abs_avg=39.090171813964844, test_abs_avg=39.09141159057617
production_forward grad[31] vs paper_forward: mean_abs=0.9577081799507141, max_abs=6.25, mean_rel=0.1571185290813446, max_rel=662.0606079101562, norm_rel=0.024731101468205452, ref_abs_avg=38.85661315917969, test_abs_avg=38.84980392456055
production_forward grad[32] vs paper_forward: mean_abs=0.7381957173347473, max_abs=3.5, mean_rel=0.379684180021286, max_rel=74.10455322265625, norm_rel=0.024312477558851242, ref_abs_avg=30.967910766601562, test_abs_avg=30.921049118041992
production_forward grad[33] vs paper_forward: mean_abs=0.9083123803138733, max_abs=6.1875, mean_rel=0.18065626919269562, max_rel=1373.4857177734375, norm_rel=0.024786999449133873, ref_abs_avg=36.744606018066406, test_abs_avg=36.74618148803711
production_forward grad[34] vs paper_forward: mean_abs=0.8846518397331238, max_abs=6.0, mean_rel=0.15662506222724915, max_rel=1063.5677490234375, norm_rel=0.02439706027507782, ref_abs_avg=36.38019943237305, test_abs_avg=36.381996154785156
production_forward grad[35] vs paper_forward: mean_abs=0.6623659133911133, max_abs=2.9375, mean_rel=0.06876542419195175, max_rel=4.373345375061035, norm_rel=0.022905806079506874, ref_abs_avg=29.8350772857666, test_abs_avg=29.860828399658203
production_forward grad[36] vs paper_forward: mean_abs=0.8457667231559753, max_abs=6.0, mean_rel=0.16741807758808136, max_rel=1026.710693359375, norm_rel=0.024414096027612686, ref_abs_avg=34.71375274658203, test_abs_avg=34.71529006958008
production_forward grad[37] vs paper_forward: mean_abs=0.837755560874939, max_abs=5.375, mean_rel=0.18738813698291779, max_rel=2627.7958984375, norm_rel=0.024287650361657143, ref_abs_avg=34.583675384521484, test_abs_avg=34.577171325683594
production_forward grad[38] vs paper_forward: mean_abs=0.6143307685852051, max_abs=2.75, mean_rel=0.07336722314357758, max_rel=4.300759792327881, norm_rel=0.02270892634987831, ref_abs_avg=27.153606414794922, test_abs_avg=27.10758399963379
production_forward grad[39] vs paper_forward: mean_abs=0.8017059564590454, max_abs=6.0, mean_rel=0.15794214606285095, max_rel=930.7965087890625, norm_rel=0.024211730808019638, ref_abs_avg=33.219635009765625, test_abs_avg=33.21659851074219
production_forward grad[40] vs paper_forward: mean_abs=0.7916998267173767, max_abs=5.0, mean_rel=0.1626567244529724, max_rel=914.9733276367188, norm_rel=0.0239862147718668, ref_abs_avg=33.022769927978516, test_abs_avg=33.0307731628418
production_forward grad[41] vs paper_forward: mean_abs=0.6284685134887695, max_abs=2.5, mean_rel=0.11223599314689636, max_rel=8.419496536254883, norm_rel=0.023678557947278023, ref_abs_avg=26.114885330200195, test_abs_avg=26.08452606201172
production_forward grad[42] vs paper_forward: mean_abs=0.756292462348938, max_abs=5.0625, mean_rel=0.16389605402946472, max_rel=1283.2784423828125, norm_rel=0.023938249796628952, ref_abs_avg=31.647933959960938, test_abs_avg=31.650680541992188
production_forward grad[43] vs paper_forward: mean_abs=0.7453383207321167, max_abs=4.75, mean_rel=0.14759203791618347, max_rel=365.2608642578125, norm_rel=0.024032285436987877, ref_abs_avg=31.114395141601562, test_abs_avg=31.115108489990234
production_forward grad[44] vs paper_forward: mean_abs=0.5575147867202759, max_abs=2.25, mean_rel=0.11071563512086868, max_rel=24.914306640625, norm_rel=0.022416742518544197, ref_abs_avg=24.87116241455078, test_abs_avg=24.843807220458984
production_forward grad[45] vs paper_forward: mean_abs=0.716915488243103, max_abs=4.75, mean_rel=0.15838858485221863, max_rel=790.2335205078125, norm_rel=0.02366834133863449, ref_abs_avg=30.36396026611328, test_abs_avg=30.36548614501953
production_forward grad[46] vs paper_forward: mean_abs=0.7102632522583008, max_abs=4.5, mean_rel=0.16084498167037964, max_rel=724.3173217773438, norm_rel=0.0237498190253973, ref_abs_avg=29.996376037597656, test_abs_avg=30.007709503173828
production_forward grad[47] vs paper_forward: mean_abs=0.5702848434448242, max_abs=2.125, mean_rel=0.10289403051137924, max_rel=9.543004989624023, norm_rel=0.024008916690945625, ref_abs_avg=23.493194580078125, test_abs_avg=23.42050552368164
production_forward grad[48] vs paper_forward: mean_abs=0.6876962184906006, max_abs=4.5, mean_rel=0.14993464946746826, max_rel=1012.7832641601562, norm_rel=0.023484261706471443, ref_abs_avg=29.346229553222656, test_abs_avg=29.345643997192383
production_forward grad[49] vs paper_forward: mean_abs=0.677096962928772, max_abs=4.5, mean_rel=0.15498366951942444, max_rel=943.4360961914062, norm_rel=0.023310407996177673, ref_abs_avg=29.075544357299805, test_abs_avg=29.073394775390625
production_forward grad[50] vs paper_forward: mean_abs=0.6016829013824463, max_abs=2.75, mean_rel=0.10023029893636703, max_rel=7.931641101837158, norm_rel=0.024660512804985046, ref_abs_avg=24.89023208618164, test_abs_avg=24.886451721191406
production_forward grad[51] vs paper_forward: mean_abs=0.7541296482086182, max_abs=5.0, mean_rel=0.16406835615634918, max_rel=972.9349975585938, norm_rel=0.025076109915971756, ref_abs_avg=30.124298095703125, test_abs_avg=30.126367568969727
production_forward grad[52] vs paper_forward: mean_abs=0.7459940910339355, max_abs=5.0, mean_rel=0.16766493022441864, max_rel=1810.8641357421875, norm_rel=0.025094520300626755, ref_abs_avg=29.828937530517578, test_abs_avg=29.827503204345703
production_forward grad[53] vs paper_forward: mean_abs=0.5688039660453796, max_abs=2.5, mean_rel=0.7059201598167419, max_rel=322.904052734375, norm_rel=0.023161761462688446, ref_abs_avg=25.008472442626953, test_abs_avg=25.004640579223633
production_forward grad[54] vs paper_forward: mean_abs=0.7055981159210205, max_abs=4.5, mean_rel=0.16223019361495972, max_rel=1550.7186279296875, norm_rel=0.024924250319600105, ref_abs_avg=28.367549896240234, test_abs_avg=28.368614196777344
production_forward grad[55] vs paper_forward: mean_abs=0.6959623098373413, max_abs=5.0, mean_rel=0.17131584882736206, max_rel=937.5989990234375, norm_rel=0.024518871679902077, ref_abs_avg=28.384349822998047, test_abs_avg=28.389236450195312
production_forward grad[56] vs paper_forward: mean_abs=0.5204524993896484, max_abs=2.0625, mean_rel=0.09033028781414032, max_rel=6.087202548980713, norm_rel=0.02411579340696335, ref_abs_avg=21.840564727783203, test_abs_avg=21.869421005249023
production_forward grad[57] vs paper_forward: mean_abs=0.6585659384727478, max_abs=4.25, mean_rel=0.1511068046092987, max_rel=1154.9156494140625, norm_rel=0.024144737049937248, ref_abs_avg=27.25929832458496, test_abs_avg=27.259334564208984
production_forward grad[58] vs paper_forward: mean_abs=0.6451855897903442, max_abs=4.25, mean_rel=0.15434379875659943, max_rel=923.5087890625, norm_rel=0.024289973080158234, ref_abs_avg=26.56673812866211, test_abs_avg=26.5731201171875
production_forward grad[59] vs paper_forward: mean_abs=0.5085592269897461, max_abs=2.0625, mean_rel=0.10008083283901215, max_rel=17.141691207885742, norm_rel=0.023271525278687477, ref_abs_avg=22.301631927490234, test_abs_avg=22.288267135620117
production_forward grad[60] vs paper_forward: mean_abs=0.6161109805107117, max_abs=5.0, mean_rel=0.15191420912742615, max_rel=1104.5863037109375, norm_rel=0.0239522997289896, ref_abs_avg=25.69477081298828, test_abs_avg=25.694400787353516
production_forward grad[61] vs paper_forward: mean_abs=0.604884147644043, max_abs=4.0, mean_rel=0.15838056802749634, max_rel=1468.3697509765625, norm_rel=0.023839743807911873, ref_abs_avg=25.421520233154297, test_abs_avg=25.42532730102539
production_forward grad[62] vs paper_forward: mean_abs=0.4734768867492676, max_abs=2.0, mean_rel=0.10261204838752747, max_rel=7.2869954109191895, norm_rel=0.023990416899323463, ref_abs_avg=19.95753288269043, test_abs_avg=19.995628356933594
production_forward grad[63] vs paper_forward: mean_abs=0.5771825313568115, max_abs=4.125, mean_rel=0.151810884475708, max_rel=1025.3070068359375, norm_rel=0.023327607661485672, ref_abs_avg=24.69721221923828, test_abs_avg=24.698150634765625
production_forward grad[64] vs paper_forward: mean_abs=0.5657346248626709, max_abs=3.625, mean_rel=0.15018294751644135, max_rel=1261.288818359375, norm_rel=0.023057540878653526, ref_abs_avg=24.55116844177246, test_abs_avg=24.54894256591797
production_forward grad[65] vs paper_forward: mean_abs=0.46119117736816406, max_abs=1.75, mean_rel=0.12852567434310913, max_rel=30.32101058959961, norm_rel=0.0242630485445261, ref_abs_avg=18.743284225463867, test_abs_avg=18.731067657470703
production_forward grad[66] vs paper_forward: mean_abs=0.5529531836509705, max_abs=4.5, mean_rel=0.15149205923080444, max_rel=767.5551147460938, norm_rel=0.023106949403882027, ref_abs_avg=23.928951263427734, test_abs_avg=23.931663513183594
production_forward grad[67] vs paper_forward: mean_abs=0.5378759503364563, max_abs=4.25, mean_rel=0.14420375227928162, max_rel=420.74847412109375, norm_rel=0.02314240299165249, ref_abs_avg=23.26605987548828, test_abs_avg=23.262958526611328
production_forward grad[68] vs paper_forward: mean_abs=0.41852009296417236, max_abs=1.96875, mean_rel=0.26608580350875854, max_rel=32.716373443603516, norm_rel=0.02302321046590805, ref_abs_avg=18.46875, test_abs_avg=18.500850677490234
production_forward grad[69] vs paper_forward: mean_abs=0.5167450904846191, max_abs=4.0, mean_rel=0.1435956209897995, max_rel=538.6544189453125, norm_rel=0.02260683849453926, ref_abs_avg=22.874717712402344, test_abs_avg=22.87527847290039
production_forward grad[70] vs paper_forward: mean_abs=0.5147347450256348, max_abs=4.0, mean_rel=0.1450570821762085, max_rel=531.8903198242188, norm_rel=0.022559238597750664, ref_abs_avg=22.80172348022461, test_abs_avg=22.800331115722656
production_forward grad[71] vs paper_forward: mean_abs=0.39005327224731445, max_abs=1.75, mean_rel=0.08846458047628403, max_rel=9.063429832458496, norm_rel=0.021231424063444138, ref_abs_avg=18.626190185546875, test_abs_avg=18.620777130126953
production_forward grad[72] vs paper_forward: mean_abs=0.49548178911209106, max_abs=4.0, mean_rel=0.155216783285141, max_rel=990.91552734375, norm_rel=0.02231770008802414, ref_abs_avg=22.17337989807129, test_abs_avg=22.17281723022461
production_forward grad[73] vs paper_forward: mean_abs=0.4837283492088318, max_abs=4.0, mean_rel=0.15708008408546448, max_rel=1197.66357421875, norm_rel=0.02209952101111412, ref_abs_avg=21.875965118408203, test_abs_avg=21.87483024597168
production_forward grad[74] vs paper_forward: mean_abs=0.46255016326904297, max_abs=1.625, mean_rel=0.15992245078086853, max_rel=34.77323532104492, norm_rel=0.025034738704562187, ref_abs_avg=18.43286895751953, test_abs_avg=18.44207763671875
production_forward grad[75] vs paper_forward: mean_abs=0.5672793984413147, max_abs=4.5, mean_rel=0.15715020895004272, max_rel=1136.9942626953125, norm_rel=0.024334434419870377, ref_abs_avg=23.290096282958984, test_abs_avg=23.28881072998047
production_forward grad[76] vs paper_forward: mean_abs=0.549296498298645, max_abs=4.40625, mean_rel=0.16205662488937378, max_rel=953.6419067382812, norm_rel=0.024027744308114052, ref_abs_avg=22.907581329345703, test_abs_avg=22.90827178955078
production_forward grad[77] vs paper_forward: mean_abs=0.42324769496917725, max_abs=1.53125, mean_rel=0.26534131169319153, max_rel=32.45288848876953, norm_rel=0.02315584197640419, ref_abs_avg=18.14274787902832, test_abs_avg=18.12887954711914
production_forward grad[78] vs paper_forward: mean_abs=0.5152022838592529, max_abs=4.5, mean_rel=0.1550201177597046, max_rel=1024.0013427734375, norm_rel=0.02348783239722252, ref_abs_avg=21.922100067138672, test_abs_avg=21.921714782714844
production_forward grad[79] vs paper_forward: mean_abs=0.498695969581604, max_abs=4.0, mean_rel=0.15890780091285706, max_rel=937.6631469726562, norm_rel=0.02316421829164028, ref_abs_avg=21.570972442626953, test_abs_avg=21.561302185058594
production_forward grad[80] vs paper_forward: mean_abs=0.3752470016479492, max_abs=1.375, mean_rel=0.06445284932851791, max_rel=2.5022478103637695, norm_rel=0.020006973296403885, ref_abs_avg=18.20590591430664, test_abs_avg=18.16749382019043
production_forward grad[81] vs paper_forward: mean_abs=0.47188690304756165, max_abs=4.5, mean_rel=0.14901846647262573, max_rel=1132.6712646484375, norm_rel=0.02263888716697693, ref_abs_avg=20.867229461669922, test_abs_avg=20.86542510986328
production_forward grad[82] vs paper_forward: mean_abs=0.45886528491973877, max_abs=3.625, mean_rel=0.14361301064491272, max_rel=763.071533203125, norm_rel=0.022876327857375145, ref_abs_avg=20.12671661376953, test_abs_avg=20.12445640563965
production_forward grad[83] vs paper_forward: mean_abs=0.36861610412597656, max_abs=1.625, mean_rel=0.062965527176857, max_rel=3.6900060176849365, norm_rel=0.022393032908439636, ref_abs_avg=16.515230178833008, test_abs_avg=16.514732360839844
production_forward grad[84] vs paper_forward: mean_abs=0.44367262721061707, max_abs=3.5, mean_rel=0.14421296119689941, max_rel=785.677734375, norm_rel=0.022196317091584206, ref_abs_avg=20.023630142211914, test_abs_avg=20.02446746826172
production_forward grad[85] vs paper_forward: mean_abs=0.42963385581970215, max_abs=4.0, mean_rel=0.14004604518413544, max_rel=866.4317626953125, norm_rel=0.02200818806886673, ref_abs_avg=19.582111358642578, test_abs_avg=19.584381103515625
production_forward grad[86] vs paper_forward: mean_abs=0.3341989517211914, max_abs=1.359375, mean_rel=0.11061249673366547, max_rel=12.05604362487793, norm_rel=0.02139262668788433, ref_abs_avg=15.552360534667969, test_abs_avg=15.53693962097168
production_forward grad[87] vs paper_forward: mean_abs=0.41507765650749207, max_abs=5.5, mean_rel=0.13834258913993835, max_rel=981.5955200195312, norm_rel=0.02166544459760189, ref_abs_avg=19.224945068359375, test_abs_avg=19.223670959472656
production_forward grad[88] vs paper_forward: mean_abs=0.406388521194458, max_abs=3.5, mean_rel=0.14216941595077515, max_rel=730.5680541992188, norm_rel=0.02134915068745613, ref_abs_avg=19.09933090209961, test_abs_avg=19.112552642822266
production_forward grad[89] vs paper_forward: mean_abs=0.32711049914360046, max_abs=1.25, mean_rel=0.14559878408908844, max_rel=13.809382438659668, norm_rel=0.0206379983574152, ref_abs_avg=15.875741958618164, test_abs_avg=15.870680809020996
production_forward grad[90] vs paper_forward: mean_abs=0.40162134170532227, max_abs=3.875, mean_rel=0.13769614696502686, max_rel=1093.23095703125, norm_rel=0.02118031494319439, ref_abs_avg=19.076217651367188, test_abs_avg=19.075899124145508
production_forward grad[91] vs paper_forward: mean_abs=0.381184846162796, max_abs=4.51171875, mean_rel=0.12313498556613922, max_rel=415.2546691894531, norm_rel=0.020549815148115158, ref_abs_avg=18.660329818725586, test_abs_avg=18.655643463134766
production_forward grad[92] vs paper_forward: mean_abs=0.3024815320968628, max_abs=1.19140625, mean_rel=0.9578438997268677, max_rel=454.9374694824219, norm_rel=0.02057402953505516, ref_abs_avg=14.86874008178711, test_abs_avg=14.873095512390137
production_forward grad[93] vs paper_forward: mean_abs=0.3689641058444977, max_abs=4.0, mean_rel=0.1317102015018463, max_rel=775.7775268554688, norm_rel=0.020633289590477943, ref_abs_avg=18.035741806030273, test_abs_avg=18.036540985107422
production_forward grad[94] vs paper_forward: mean_abs=0.3617144823074341, max_abs=3.875, mean_rel=0.12741824984550476, max_rel=542.6544189453125, norm_rel=0.020165909081697464, ref_abs_avg=18.089677810668945, test_abs_avg=18.086315155029297
production_forward grad[95] vs paper_forward: mean_abs=0.28818511962890625, max_abs=1.0546875, mean_rel=0.07881084084510803, max_rel=3.9773731231689453, norm_rel=0.0192340649664402, ref_abs_avg=15.0747652053833, test_abs_avg=15.044068336486816
production_forward grad[96] vs paper_forward: mean_abs=0.3417031168937683, max_abs=3.75, mean_rel=0.12044857442378998, max_rel=529.5870361328125, norm_rel=0.020000921562314034, ref_abs_avg=17.331872940063477, test_abs_avg=17.33136749267578
production_forward grad[97] vs paper_forward: mean_abs=0.33852916955947876, max_abs=4.0, mean_rel=0.12660183012485504, max_rel=1015.6629638671875, norm_rel=0.019967203959822655, ref_abs_avg=17.316814422607422, test_abs_avg=17.316856384277344
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016231246991083026, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008648359216749668, max_abs=0.4375, mean_rel=0.07452622056007385, max_rel=121.1634521484375, norm_rel=0.020444979891180992, ref_abs_avg=0.4570659101009369, test_abs_avg=0.4570661187171936
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.340113162994385, max_abs=48.0, mean_rel=0.11089660972356796, max_rel=57.43272399902344, norm_rel=0.020548401400446892, ref_abs_avg=317.34033203125, test_abs_avg=317.3355407714844
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.283254623413086, max_abs=5.0, mean_rel=0.06985615938901901, max_rel=6.486628532409668, norm_rel=0.022307323291897774, ref_abs_avg=59.7886962890625, test_abs_avg=59.905006408691406
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.629974365234375, max_abs=12.0, mean_rel=0.16060467064380646, max_rel=2692.59130859375, norm_rel=0.024210158735513687, ref_abs_avg=67.71835327148438, test_abs_avg=67.7174301147461
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.575556993484497, max_abs=11.0, mean_rel=0.15754669904708862, max_rel=1330.3599853515625, norm_rel=0.023591497913002968, ref_abs_avg=67.1153564453125, test_abs_avg=67.12188720703125
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1600818634033203, max_abs=4.5, mean_rel=0.06309221684932709, max_rel=6.903476238250732, norm_rel=0.02388034574687481, ref_abs_avg=49.304603576660156, test_abs_avg=49.30054473876953
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4126617908477783, max_abs=9.3203125, mean_rel=0.160888209939003, max_rel=891.6737060546875, norm_rel=0.023907143622636795, ref_abs_avg=59.39966583251953, test_abs_avg=59.402931213378906
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3761811256408691, max_abs=9.0, mean_rel=0.1626710295677185, max_rel=3354.5244140625, norm_rel=0.023630505427718163, ref_abs_avg=58.57343292236328, test_abs_avg=58.57136535644531
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0589771270751953, max_abs=4.875, mean_rel=0.07525018602609634, max_rel=3.1898717880249023, norm_rel=0.024399835616350174, ref_abs_avg=44.26823043823242, test_abs_avg=44.23846435546875
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.281709909439087, max_abs=8.25, mean_rel=0.1607944369316101, max_rel=1957.9022216796875, norm_rel=0.02363431081175804, ref_abs_avg=54.464271545410156, test_abs_avg=54.45862579345703
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2526195049285889, max_abs=7.75, mean_rel=0.1665307879447937, max_rel=1825.240234375, norm_rel=0.023584585636854172, ref_abs_avg=53.364158630371094, test_abs_avg=53.355712890625
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9011383056640625, max_abs=3.5, mean_rel=0.11975157260894775, max_rel=21.820693969726562, norm_rel=0.022616399452090263, ref_abs_avg=41.348602294921875, test_abs_avg=41.277931213378906
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1679188013076782, max_abs=7.5, mean_rel=0.16367465257644653, max_rel=1583.6976318359375, norm_rel=0.023491842672228813, ref_abs_avg=49.95176696777344, test_abs_avg=49.954200744628906
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1386674642562866, max_abs=7.5, mean_rel=0.165985107421875, max_rel=955.2216186523438, norm_rel=0.023156357929110527, ref_abs_avg=49.46324920654297, test_abs_avg=49.463069915771484
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8218955993652344, max_abs=3.5, mean_rel=0.19058650732040405, max_rel=43.787540435791016, norm_rel=0.022709498181939125, ref_abs_avg=37.17572784423828, test_abs_avg=37.08003616333008
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0804314613342285, max_abs=8.0, mean_rel=0.16081678867340088, max_rel=1088.2738037109375, norm_rel=0.023355407640337944, ref_abs_avg=46.44171142578125, test_abs_avg=46.440345764160156
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.058086633682251, max_abs=6.75, mean_rel=0.1608351171016693, max_rel=1722.7197265625, norm_rel=0.022941458970308304, ref_abs_avg=46.2628173828125, test_abs_avg=46.26121520996094
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8189659118652344, max_abs=3.375, mean_rel=0.0803292840719223, max_rel=4.870404243469238, norm_rel=0.02224164642393589, ref_abs_avg=37.53044128417969, test_abs_avg=37.47915267944336
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0166107416152954, max_abs=6.5, mean_rel=0.1595936417579651, max_rel=1836.547607421875, norm_rel=0.023104118183255196, ref_abs_avg=44.158016204833984, test_abs_avg=44.15655517578125
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9987655878067017, max_abs=7.0, mean_rel=0.15640506148338318, max_rel=2455.90283203125, norm_rel=0.022927559912204742, ref_abs_avg=43.75371551513672, test_abs_avg=43.751522064208984
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7949913740158081, max_abs=3.25, mean_rel=0.1446511596441269, max_rel=10.310680389404297, norm_rel=0.023013729602098465, ref_abs_avg=34.724666595458984, test_abs_avg=34.743141174316406
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9623095989227295, max_abs=6.5, mean_rel=0.15666890144348145, max_rel=1417.8214111328125, norm_rel=0.023053819313645363, ref_abs_avg=41.92742156982422, test_abs_avg=41.92894744873047
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9353040456771851, max_abs=6.25, mean_rel=0.1324068307876587, max_rel=580.3786010742188, norm_rel=0.022776911035180092, ref_abs_avg=41.310211181640625, test_abs_avg=41.30963134765625
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7404022216796875, max_abs=3.0, mean_rel=0.08472691476345062, max_rel=4.89694881439209, norm_rel=0.022044453769922256, ref_abs_avg=33.95231246948242, test_abs_avg=33.98800277709961
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9104976654052734, max_abs=6.0, mean_rel=0.1410239040851593, max_rel=761.1344604492188, norm_rel=0.022875335067510605, ref_abs_avg=40.008445739746094, test_abs_avg=40.007102966308594
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8901742100715637, max_abs=5.25, mean_rel=0.1641550362110138, max_rel=2087.365966796875, norm_rel=0.022562319412827492, ref_abs_avg=39.666236877441406, test_abs_avg=39.66560363769531
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8076648712158203, max_abs=3.0, mean_rel=0.09975001215934753, max_rel=7.8988938331604, norm_rel=0.023159703239798546, ref_abs_avg=35.12461853027344, test_abs_avg=35.121700286865234
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0449516773223877, max_abs=7.5, mean_rel=0.17287617921829224, max_rel=1758.6265869140625, norm_rel=0.024761704728007317, ref_abs_avg=42.35835647583008, test_abs_avg=42.358646392822266
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.018430471420288, max_abs=7.75, mean_rel=0.16977708041667938, max_rel=1590.888427734375, norm_rel=0.024531040340662003, ref_abs_avg=41.76753234863281, test_abs_avg=41.77337646484375
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8106193542480469, max_abs=3.625, mean_rel=0.08810265362262726, max_rel=6.127151966094971, norm_rel=0.02392306737601757, ref_abs_avg=34.43804931640625, test_abs_avg=34.42797088623047
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9722030162811279, max_abs=6.0, mean_rel=0.16761741042137146, max_rel=1793.58740234375, norm_rel=0.024985214695334435, ref_abs_avg=39.090171813964844, test_abs_avg=39.09001541137695
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9565351009368896, max_abs=6.375, mean_rel=0.15675979852676392, max_rel=971.3302612304688, norm_rel=0.02471882849931717, ref_abs_avg=38.85661315917969, test_abs_avg=38.853248596191406
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.748090922832489, max_abs=3.5, mean_rel=0.3356162905693054, max_rel=91.73819732666016, norm_rel=0.024238010868430138, ref_abs_avg=30.967910766601562, test_abs_avg=30.935266494750977
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9102453589439392, max_abs=6.0, mean_rel=0.18227165937423706, max_rel=1680.82177734375, norm_rel=0.024831991642713547, ref_abs_avg=36.744606018066406, test_abs_avg=36.744873046875
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8878824710845947, max_abs=5.5, mean_rel=0.1575959324836731, max_rel=902.118408203125, norm_rel=0.024474117904901505, ref_abs_avg=36.38019943237305, test_abs_avg=36.38176727294922
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6905984878540039, max_abs=2.875, mean_rel=0.10011240839958191, max_rel=7.2027363777160645, norm_rel=0.023514626547694206, ref_abs_avg=29.8350772857666, test_abs_avg=29.858116149902344
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8487230539321899, max_abs=5.5, mean_rel=0.17064881324768066, max_rel=1233.274169921875, norm_rel=0.024509506300091743, ref_abs_avg=34.71375274658203, test_abs_avg=34.71435546875
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8410719037055969, max_abs=5.5, mean_rel=0.17635725438594818, max_rel=2268.18701171875, norm_rel=0.024377839639782906, ref_abs_avg=34.583675384521484, test_abs_avg=34.57879638671875
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6410770416259766, max_abs=2.84375, mean_rel=0.07651005685329437, max_rel=4.980233669281006, norm_rel=0.023670148104429245, ref_abs_avg=27.153606414794922, test_abs_avg=27.111974716186523
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8053987622261047, max_abs=5.25, mean_rel=0.16328604519367218, max_rel=1289.33984375, norm_rel=0.02429920621216297, ref_abs_avg=33.219635009765625, test_abs_avg=33.21670150756836
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.794326901435852, max_abs=5.5, mean_rel=0.16650177538394928, max_rel=1532.4224853515625, norm_rel=0.02407691441476345, ref_abs_avg=33.022769927978516, test_abs_avg=33.027374267578125
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6222519874572754, max_abs=2.5, mean_rel=0.09080537408590317, max_rel=2.7722465991973877, norm_rel=0.02366068586707115, ref_abs_avg=26.114885330200195, test_abs_avg=26.081449508666992
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7597166299819946, max_abs=5.625, mean_rel=0.16345903277397156, max_rel=919.5059204101562, norm_rel=0.024057086557149887, ref_abs_avg=31.647933959960938, test_abs_avg=31.650440216064453
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7496786117553711, max_abs=4.5, mean_rel=0.14670267701148987, max_rel=545.7987060546875, norm_rel=0.02417134679853916, ref_abs_avg=31.114395141601562, test_abs_avg=31.115373611450195
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5788166522979736, max_abs=2.0, mean_rel=0.2096204161643982, max_rel=73.52568817138672, norm_rel=0.02283310331404209, ref_abs_avg=24.87116241455078, test_abs_avg=24.789268493652344
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7190481424331665, max_abs=5.0, mean_rel=0.1587214320898056, max_rel=964.6965942382812, norm_rel=0.02374110370874405, ref_abs_avg=30.36396026611328, test_abs_avg=30.365188598632812
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7113393545150757, max_abs=4.5, mean_rel=0.15895482897758484, max_rel=1251.1544189453125, norm_rel=0.023786822333931923, ref_abs_avg=29.996376037597656, test_abs_avg=30.007469177246094
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5473601818084717, max_abs=2.375, mean_rel=0.11007433384656906, max_rel=13.734441757202148, norm_rel=0.02329244837164879, ref_abs_avg=23.493194580078125, test_abs_avg=23.401559829711914
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6897873282432556, max_abs=4.03125, mean_rel=0.1519470363855362, max_rel=936.435791015625, norm_rel=0.023540977388620377, ref_abs_avg=29.346229553222656, test_abs_avg=29.345840454101562
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6787204742431641, max_abs=4.5, mean_rel=0.15384158492088318, max_rel=1007.7236328125, norm_rel=0.023357370868325233, ref_abs_avg=29.075544357299805, test_abs_avg=29.0754451751709
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.5956811904907227, max_abs=2.5, mean_rel=0.09967599809169769, max_rel=12.161849975585938, norm_rel=0.024131955578923225, ref_abs_avg=24.89023208618164, test_abs_avg=24.9056453704834
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7538647055625916, max_abs=5.0, mean_rel=0.16387003660202026, max_rel=1261.0343017578125, norm_rel=0.025063255801796913, ref_abs_avg=30.124298095703125, test_abs_avg=30.12592887878418
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7433785200119019, max_abs=5.5625, mean_rel=0.1659500002861023, max_rel=1797.550048828125, norm_rel=0.024995317682623863, ref_abs_avg=29.828937530517578, test_abs_avg=29.82695770263672
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.567335307598114, max_abs=2.25, mean_rel=0.49072471261024475, max_rel=212.98739624023438, norm_rel=0.02280697040259838, ref_abs_avg=25.008472442626953, test_abs_avg=24.99907875061035
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7065904140472412, max_abs=4.875, mean_rel=0.16409549117088318, max_rel=1654.0821533203125, norm_rel=0.02494986355304718, ref_abs_avg=28.367549896240234, test_abs_avg=28.368267059326172
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6953722834587097, max_abs=4.5, mean_rel=0.17545539140701294, max_rel=1083.3992919921875, norm_rel=0.0245292279869318, ref_abs_avg=28.384349822998047, test_abs_avg=28.39118766784668
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5268936157226562, max_abs=2.1875, mean_rel=0.11284641176462173, max_rel=7.856738090515137, norm_rel=0.024807177484035492, ref_abs_avg=21.840564727783203, test_abs_avg=21.887718200683594
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6602416038513184, max_abs=5.0, mean_rel=0.14905062317848206, max_rel=1077.1907958984375, norm_rel=0.02420973591506481, ref_abs_avg=27.25929832458496, test_abs_avg=27.25960922241211
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6446980237960815, max_abs=5.25, mean_rel=0.15411433577537537, max_rel=545.8671875, norm_rel=0.024268293753266335, ref_abs_avg=26.56673812866211, test_abs_avg=26.574100494384766
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.511387825012207, max_abs=2.0, mean_rel=0.10257461667060852, max_rel=18.603302001953125, norm_rel=0.023030217736959457, ref_abs_avg=22.301631927490234, test_abs_avg=22.284961700439453
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6170001029968262, max_abs=4.5, mean_rel=0.1493895947933197, max_rel=1176.337890625, norm_rel=0.02399045042693615, ref_abs_avg=25.69477081298828, test_abs_avg=25.693933486938477
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6039563417434692, max_abs=4.5, mean_rel=0.15298248827457428, max_rel=873.329345703125, norm_rel=0.023846512660384178, ref_abs_avg=25.421520233154297, test_abs_avg=25.424640655517578
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4866182804107666, max_abs=2.0, mean_rel=0.09841938316822052, max_rel=6.170216083526611, norm_rel=0.024175824597477913, ref_abs_avg=19.95753288269043, test_abs_avg=19.98480224609375
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5788939595222473, max_abs=4.5, mean_rel=0.15386006236076355, max_rel=1060.083740234375, norm_rel=0.023408010601997375, ref_abs_avg=24.69721221923828, test_abs_avg=24.696426391601562
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5721823573112488, max_abs=4.25, mean_rel=0.15223054587841034, max_rel=1064.9296875, norm_rel=0.023323964327573776, ref_abs_avg=24.55116844177246, test_abs_avg=24.547367095947266
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4578838348388672, max_abs=2.25, mean_rel=0.11209332942962646, max_rel=22.826631546020508, norm_rel=0.024511221796274185, ref_abs_avg=18.743284225463867, test_abs_avg=18.749671936035156
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5540329217910767, max_abs=4.5, mean_rel=0.15145227313041687, max_rel=740.8473510742188, norm_rel=0.023159775882959366, ref_abs_avg=23.928951263427734, test_abs_avg=23.931514739990234
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5389590859413147, max_abs=4.0, mean_rel=0.14432358741760254, max_rel=580.7429809570312, norm_rel=0.0232203658670187, ref_abs_avg=23.26605987548828, test_abs_avg=23.262187957763672
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.43366825580596924, max_abs=1.75, mean_rel=0.26236969232559204, max_rel=30.515005111694336, norm_rel=0.0235601719468832, ref_abs_avg=18.46875, test_abs_avg=18.4893798828125
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5181365609169006, max_abs=3.75, mean_rel=0.1440058946609497, max_rel=732.87255859375, norm_rel=0.0226824302226305, ref_abs_avg=22.874717712402344, test_abs_avg=22.875717163085938
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5128875970840454, max_abs=4.0, mean_rel=0.1410130262374878, max_rel=683.1011962890625, norm_rel=0.02248416468501091, ref_abs_avg=22.80172348022461, test_abs_avg=22.802730560302734
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.40078651905059814, max_abs=1.875, mean_rel=0.08878538012504578, max_rel=10.840383529663086, norm_rel=0.02188567817211151, ref_abs_avg=18.626190185546875, test_abs_avg=18.616439819335938
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.49693402647972107, max_abs=4.0, mean_rel=0.15084952116012573, max_rel=859.5473022460938, norm_rel=0.022375602275133133, ref_abs_avg=22.17337989807129, test_abs_avg=22.172372817993164
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4852869510650635, max_abs=3.5, mean_rel=0.15837116539478302, max_rel=1556.837890625, norm_rel=0.022160639986395836, ref_abs_avg=21.875965118408203, test_abs_avg=21.870929718017578
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.46760082244873047, max_abs=2.046875, mean_rel=0.1935500204563141, max_rel=51.06196212768555, norm_rel=0.025497913360595703, ref_abs_avg=18.43286895751953, test_abs_avg=18.435869216918945
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5627238750457764, max_abs=4.5, mean_rel=0.15364955365657806, max_rel=829.0945434570312, norm_rel=0.024149954319000244, ref_abs_avg=23.290096282958984, test_abs_avg=23.288646697998047
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5457335114479065, max_abs=4.21875, mean_rel=0.15621578693389893, max_rel=787.641357421875, norm_rel=0.02383209578692913, ref_abs_avg=22.907581329345703, test_abs_avg=22.903932571411133
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.3982645273208618, max_abs=1.6875, mean_rel=0.26840901374816895, max_rel=58.712745666503906, norm_rel=0.021857155486941338, ref_abs_avg=18.14274787902832, test_abs_avg=18.124584197998047
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.514427661895752, max_abs=4.0, mean_rel=0.1548892706632614, max_rel=674.4655151367188, norm_rel=0.023465417325496674, ref_abs_avg=21.922100067138672, test_abs_avg=21.92055320739746
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.49793559312820435, max_abs=5.0, mean_rel=0.15555298328399658, max_rel=848.579345703125, norm_rel=0.023126982152462006, ref_abs_avg=21.570972442626953, test_abs_avg=21.55964469909668
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.37390565872192383, max_abs=1.625, mean_rel=0.067683145403862, max_rel=3.6864728927612305, norm_rel=0.020124148577451706, ref_abs_avg=18.20590591430664, test_abs_avg=18.1815242767334
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.4722663164138794, max_abs=3.75, mean_rel=0.14790485799312592, max_rel=879.9216918945312, norm_rel=0.02264259196817875, ref_abs_avg=20.867229461669922, test_abs_avg=20.864952087402344
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4580361843109131, max_abs=3.75, mean_rel=0.14522221684455872, max_rel=687.12646484375, norm_rel=0.022814875468611717, ref_abs_avg=20.12671661376953, test_abs_avg=20.128795623779297
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3759235739707947, max_abs=1.75, mean_rel=0.06943459808826447, max_rel=3.9912309646606445, norm_rel=0.022959863767027855, ref_abs_avg=16.515230178833008, test_abs_avg=16.513351440429688
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.44403311610221863, max_abs=4.0, mean_rel=0.14575864374637604, max_rel=1182.989990234375, norm_rel=0.022190412506461143, ref_abs_avg=20.023630142211914, test_abs_avg=20.023414611816406
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4296885132789612, max_abs=4.5, mean_rel=0.14376980066299438, max_rel=866.4317626953125, norm_rel=0.022047707810997963, ref_abs_avg=19.582111358642578, test_abs_avg=19.583932876586914
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.32763969898223877, max_abs=1.25, mean_rel=0.09395521134138107, max_rel=5.8398356437683105, norm_rel=0.02136528491973877, ref_abs_avg=15.552360534667969, test_abs_avg=15.545177459716797
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4155406355857849, max_abs=4.5, mean_rel=0.13978739082813263, max_rel=746.9060668945312, norm_rel=0.021693184971809387, ref_abs_avg=19.224945068359375, test_abs_avg=19.2232666015625
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.40590783953666687, max_abs=3.75, mean_rel=0.14110758900642395, max_rel=955.212890625, norm_rel=0.0213541928678751, ref_abs_avg=19.09933090209961, test_abs_avg=19.114532470703125
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3250417709350586, max_abs=1.375, mean_rel=0.14948594570159912, max_rel=16.53199005126953, norm_rel=0.020388487726449966, ref_abs_avg=15.875741958618164, test_abs_avg=15.872077941894531
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.40169912576675415, max_abs=4.25, mean_rel=0.1365417093038559, max_rel=1063.4844970703125, norm_rel=0.021201958879828453, ref_abs_avg=19.076217651367188, test_abs_avg=19.075284957885742
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.37987059354782104, max_abs=4.32421875, mean_rel=0.12094172835350037, max_rel=378.048095703125, norm_rel=0.02041923813521862, ref_abs_avg=18.660329818725586, test_abs_avg=18.657623291015625
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.3027020990848541, max_abs=1.1484375, mean_rel=0.5895009636878967, max_rel=267.0945129394531, norm_rel=0.02065216936171055, ref_abs_avg=14.86874008178711, test_abs_avg=14.875246047973633
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3692759871482849, max_abs=4.0, mean_rel=0.13305214047431946, max_rel=647.0792236328125, norm_rel=0.020638706162571907, ref_abs_avg=18.035741806030273, test_abs_avg=18.035707473754883
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.36111152172088623, max_abs=3.90625, mean_rel=0.12578943371772766, max_rel=550.4485473632812, norm_rel=0.020083364099264145, ref_abs_avg=18.089677810668945, test_abs_avg=18.087743759155273
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.3064889907836914, max_abs=1.25, mean_rel=0.07080215960741043, max_rel=3.159801959991455, norm_rel=0.019995803013443947, ref_abs_avg=15.0747652053833, test_abs_avg=15.048443794250488
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3421424925327301, max_abs=4.0, mean_rel=0.1208137571811676, max_rel=578.637939453125, norm_rel=0.02003081887960434, ref_abs_avg=17.331872940063477, test_abs_avg=17.331165313720703
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3385849595069885, max_abs=4.0, mean_rel=0.12202657759189606, max_rel=1184.884521484375, norm_rel=0.01998005248606205, ref_abs_avg=17.316814422607422, test_abs_avg=17.318838119506836
production_forward2 vs paper_forward output: mean_abs=0.0016198582015931606, max_abs=0.0390625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008635898120701313, max_abs=0.4375, mean_rel=0.07442229241132736, max_rel=121.1634521484375, norm_rel=0.020420705899596214, ref_abs_avg=0.4570659101009369, test_abs_avg=0.4570658802986145
production_forward2 grad[1] vs paper_forward: mean_abs=7.362952709197998, max_abs=48.0, mean_rel=0.11641153693199158, max_rel=147.1778106689453, norm_rel=0.020607974380254745, ref_abs_avg=317.34033203125, test_abs_avg=317.3161926269531
production_forward2 grad[2] vs paper_forward: mean_abs=1.3148193359375, max_abs=4.5, mean_rel=0.06493759155273438, max_rel=3.6546196937561035, norm_rel=0.022350821644067764, ref_abs_avg=59.7886962890625, test_abs_avg=59.88233947753906
production_forward2 grad[3] vs paper_forward: mean_abs=1.6304649114608765, max_abs=11.0, mean_rel=0.16037550568580627, max_rel=1857.893798828125, norm_rel=0.024203255772590637, ref_abs_avg=67.71835327148438, test_abs_avg=67.71922302246094
production_forward2 grad[4] vs paper_forward: mean_abs=1.581282377243042, max_abs=10.25, mean_rel=0.15389785170555115, max_rel=958.1182250976562, norm_rel=0.02366519905626774, ref_abs_avg=67.1153564453125, test_abs_avg=67.12411499023438
production_forward2 grad[5] vs paper_forward: mean_abs=1.1733112335205078, max_abs=5.5, mean_rel=0.06940610706806183, max_rel=5.236976146697998, norm_rel=0.02400299906730652, ref_abs_avg=49.304603576660156, test_abs_avg=49.33982849121094
production_forward2 grad[6] vs paper_forward: mean_abs=1.4125950336456299, max_abs=9.5, mean_rel=0.16753941774368286, max_rel=1866.586669921875, norm_rel=0.02389807440340519, ref_abs_avg=59.39966583251953, test_abs_avg=59.403289794921875
production_forward2 grad[7] vs paper_forward: mean_abs=1.3787540197372437, max_abs=9.5, mean_rel=0.16258388757705688, max_rel=2406.2255859375, norm_rel=0.023666363209486008, ref_abs_avg=58.57343292236328, test_abs_avg=58.572635650634766
production_forward2 grad[8] vs paper_forward: mean_abs=1.0412445068359375, max_abs=5.0, mean_rel=0.07454001903533936, max_rel=3.1457977294921875, norm_rel=0.024120541289448738, ref_abs_avg=44.26823043823242, test_abs_avg=44.253177642822266
production_forward2 grad[9] vs paper_forward: mean_abs=1.280731201171875, max_abs=8.0, mean_rel=0.16443736851215363, max_rel=2458.575927734375, norm_rel=0.023620225489139557, ref_abs_avg=54.464271545410156, test_abs_avg=54.45944595336914
production_forward2 grad[10] vs paper_forward: mean_abs=1.2507610321044922, max_abs=8.0, mean_rel=0.15748390555381775, max_rel=964.8902587890625, norm_rel=0.023550957441329956, ref_abs_avg=53.364158630371094, test_abs_avg=53.35711669921875
production_forward2 grad[11] vs paper_forward: mean_abs=0.9356517791748047, max_abs=3.25, mean_rel=0.11476682126522064, max_rel=18.463665008544922, norm_rel=0.023087380453944206, ref_abs_avg=41.348602294921875, test_abs_avg=41.28566360473633
production_forward2 grad[12] vs paper_forward: mean_abs=1.1662969589233398, max_abs=7.5, mean_rel=0.1654844582080841, max_rel=1467.212158203125, norm_rel=0.023466894403100014, ref_abs_avg=49.95176696777344, test_abs_avg=49.954566955566406
production_forward2 grad[13] vs paper_forward: mean_abs=1.1350220441818237, max_abs=7.5, mean_rel=0.1606777310371399, max_rel=767.773193359375, norm_rel=0.023079056292772293, ref_abs_avg=49.46324920654297, test_abs_avg=49.461334228515625
production_forward2 grad[14] vs paper_forward: mean_abs=0.8349611759185791, max_abs=4.25, mean_rel=0.23040613532066345, max_rel=61.3737678527832, norm_rel=0.023184025660157204, ref_abs_avg=37.17572784423828, test_abs_avg=37.10899353027344
production_forward2 grad[15] vs paper_forward: mean_abs=1.0790197849273682, max_abs=7.0, mean_rel=0.16022345423698425, max_rel=1098.6943359375, norm_rel=0.023319952189922333, ref_abs_avg=46.44171142578125, test_abs_avg=46.43955993652344
production_forward2 grad[16] vs paper_forward: mean_abs=1.054985523223877, max_abs=7.25, mean_rel=0.1617729663848877, max_rel=2254.677734375, norm_rel=0.022878214716911316, ref_abs_avg=46.2628173828125, test_abs_avg=46.26327133178711
production_forward2 grad[17] vs paper_forward: mean_abs=0.7982654571533203, max_abs=3.5, mean_rel=0.07875876873731613, max_rel=4.617122173309326, norm_rel=0.022162986919283867, ref_abs_avg=37.53044128417969, test_abs_avg=37.47594451904297
production_forward2 grad[18] vs paper_forward: mean_abs=1.0153988599777222, max_abs=6.25, mean_rel=0.16392108798027039, max_rel=1877.595458984375, norm_rel=0.023068759590387344, ref_abs_avg=44.158016204833984, test_abs_avg=44.15824508666992
production_forward2 grad[19] vs paper_forward: mean_abs=0.9974404573440552, max_abs=7.0, mean_rel=0.15694166719913483, max_rel=2088.243896484375, norm_rel=0.022898105904459953, ref_abs_avg=43.75371551513672, test_abs_avg=43.750282287597656
production_forward2 grad[20] vs paper_forward: mean_abs=0.7898257970809937, max_abs=3.5, mean_rel=0.1763778030872345, max_rel=18.213760375976562, norm_rel=0.022825688123703003, ref_abs_avg=34.724666595458984, test_abs_avg=34.74846649169922
production_forward2 grad[21] vs paper_forward: mean_abs=0.9594928026199341, max_abs=6.0, mean_rel=0.15339919924736023, max_rel=849.4916381835938, norm_rel=0.023007303476333618, ref_abs_avg=41.92742156982422, test_abs_avg=41.92870330810547
production_forward2 grad[22] vs paper_forward: mean_abs=0.9344905018806458, max_abs=6.0, mean_rel=0.1323721706867218, max_rel=640.533935546875, norm_rel=0.022756356745958328, ref_abs_avg=41.310211181640625, test_abs_avg=41.306907653808594
production_forward2 grad[23] vs paper_forward: mean_abs=0.7774341106414795, max_abs=3.0, mean_rel=0.08452226221561432, max_rel=5.338025093078613, norm_rel=0.02288898639380932, ref_abs_avg=33.95231246948242, test_abs_avg=34.02185821533203
production_forward2 grad[24] vs paper_forward: mean_abs=0.9084142446517944, max_abs=6.75, mean_rel=0.14496682584285736, max_rel=855.2591552734375, norm_rel=0.022840067744255066, ref_abs_avg=40.008445739746094, test_abs_avg=40.007713317871094
production_forward2 grad[25] vs paper_forward: mean_abs=0.8902385234832764, max_abs=5.25, mean_rel=0.17277497053146362, max_rel=2798.28759765625, norm_rel=0.022568192332983017, ref_abs_avg=39.666236877441406, test_abs_avg=39.66753387451172
production_forward2 grad[26] vs paper_forward: mean_abs=0.7830018997192383, max_abs=3.375, mean_rel=0.08102404326200485, max_rel=3.677922487258911, norm_rel=0.022498808801174164, ref_abs_avg=35.12461853027344, test_abs_avg=35.11854553222656
production_forward2 grad[27] vs paper_forward: mean_abs=1.0435566902160645, max_abs=7.0, mean_rel=0.17317336797714233, max_rel=1603.49560546875, norm_rel=0.02472122758626938, ref_abs_avg=42.35835647583008, test_abs_avg=42.359283447265625
production_forward2 grad[28] vs paper_forward: mean_abs=1.0174740552902222, max_abs=7.5, mean_rel=0.16428077220916748, max_rel=1571.482666015625, norm_rel=0.024492211639881134, ref_abs_avg=41.76753234863281, test_abs_avg=41.77381134033203
production_forward2 grad[29] vs paper_forward: mean_abs=0.7831015586853027, max_abs=4.0, mean_rel=0.07627172768115997, max_rel=4.080912113189697, norm_rel=0.023134304210543633, ref_abs_avg=34.43804931640625, test_abs_avg=34.45491027832031
production_forward2 grad[30] vs paper_forward: mean_abs=0.9700535535812378, max_abs=6.5, mean_rel=0.16249705851078033, max_rel=1551.4095458984375, norm_rel=0.024922002106904984, ref_abs_avg=39.090171813964844, test_abs_avg=39.08955764770508
production_forward2 grad[31] vs paper_forward: mean_abs=0.9584656953811646, max_abs=6.5, mean_rel=0.1549445241689682, max_rel=760.5682983398438, norm_rel=0.024752598255872726, ref_abs_avg=38.85661315917969, test_abs_avg=38.84930419921875
production_forward2 grad[32] vs paper_forward: mean_abs=0.736223042011261, max_abs=3.5, mean_rel=0.38811570405960083, max_rel=72.42488861083984, norm_rel=0.023911921307444572, ref_abs_avg=30.967910766601562, test_abs_avg=30.917516708374023
production_forward2 grad[33] vs paper_forward: mean_abs=0.9094445705413818, max_abs=7.0, mean_rel=0.18280112743377686, max_rel=1582.3267822265625, norm_rel=0.02481391467154026, ref_abs_avg=36.744606018066406, test_abs_avg=36.745052337646484
production_forward2 grad[34] vs paper_forward: mean_abs=0.8871872425079346, max_abs=6.0, mean_rel=0.1572626680135727, max_rel=894.77978515625, norm_rel=0.024449409916996956, ref_abs_avg=36.38019943237305, test_abs_avg=36.38180923461914
production_forward2 grad[35] vs paper_forward: mean_abs=0.6689963340759277, max_abs=2.84375, mean_rel=0.0813576802611351, max_rel=6.890896320343018, norm_rel=0.022686706855893135, ref_abs_avg=29.8350772857666, test_abs_avg=29.83238983154297
production_forward2 grad[36] vs paper_forward: mean_abs=0.8479792475700378, max_abs=6.0, mean_rel=0.16723176836967468, max_rel=913.1408081054688, norm_rel=0.024481728672981262, ref_abs_avg=34.71375274658203, test_abs_avg=34.71441650390625
production_forward2 grad[37] vs paper_forward: mean_abs=0.8400063514709473, max_abs=6.0, mean_rel=0.18428067862987518, max_rel=2019.22705078125, norm_rel=0.024336008355021477, ref_abs_avg=34.583675384521484, test_abs_avg=34.5771484375
production_forward2 grad[38] vs paper_forward: mean_abs=0.6211004257202148, max_abs=2.84375, mean_rel=0.07550652325153351, max_rel=4.578726291656494, norm_rel=0.02276083081960678, ref_abs_avg=27.153606414794922, test_abs_avg=27.114402770996094
production_forward2 grad[39] vs paper_forward: mean_abs=0.803370475769043, max_abs=5.5, mean_rel=0.16337619721889496, max_rel=1283.556884765625, norm_rel=0.024252643808722496, ref_abs_avg=33.219635009765625, test_abs_avg=33.21630859375
production_forward2 grad[40] vs paper_forward: mean_abs=0.7939653992652893, max_abs=5.5, mean_rel=0.16722413897514343, max_rel=919.6157836914062, norm_rel=0.024064157158136368, ref_abs_avg=33.022769927978516, test_abs_avg=33.03229904174805
production_forward2 grad[41] vs paper_forward: mean_abs=0.5956202745437622, max_abs=2.25, mean_rel=0.1039593517780304, max_rel=9.832818984985352, norm_rel=0.022824548184871674, ref_abs_avg=26.114885330200195, test_abs_avg=26.079288482666016
production_forward2 grad[42] vs paper_forward: mean_abs=0.7586451172828674, max_abs=5.0, mean_rel=0.16329705715179443, max_rel=1052.487060546875, norm_rel=0.02401544153690338, ref_abs_avg=31.647933959960938, test_abs_avg=31.65040397644043
production_forward2 grad[43] vs paper_forward: mean_abs=0.7469668388366699, max_abs=5.0, mean_rel=0.14593257009983063, max_rel=387.7066955566406, norm_rel=0.02408640831708908, ref_abs_avg=31.114395141601562, test_abs_avg=31.11460304260254
production_forward2 grad[44] vs paper_forward: mean_abs=0.5782877206802368, max_abs=2.0, mean_rel=0.16818967461585999, max_rel=52.95933532714844, norm_rel=0.02308826893568039, ref_abs_avg=24.87116241455078, test_abs_avg=24.806686401367188
production_forward2 grad[45] vs paper_forward: mean_abs=0.7189885377883911, max_abs=5.25, mean_rel=0.160193532705307, max_rel=801.1375122070312, norm_rel=0.023733338341116905, ref_abs_avg=30.36396026611328, test_abs_avg=30.365013122558594
production_forward2 grad[46] vs paper_forward: mean_abs=0.7120116949081421, max_abs=4.625, mean_rel=0.16114765405654907, max_rel=1084.56640625, norm_rel=0.023801228031516075, ref_abs_avg=29.996376037597656, test_abs_avg=30.005924224853516
production_forward2 grad[47] vs paper_forward: mean_abs=0.5654401183128357, max_abs=2.4375, mean_rel=0.10044276714324951, max_rel=8.473760604858398, norm_rel=0.024214817211031914, ref_abs_avg=23.493194580078125, test_abs_avg=23.407150268554688
production_forward2 grad[48] vs paper_forward: mean_abs=0.6885668039321899, max_abs=4.5, mean_rel=0.15066123008728027, max_rel=929.4951171875, norm_rel=0.023520592600107193, ref_abs_avg=29.346229553222656, test_abs_avg=29.345964431762695
production_forward2 grad[49] vs paper_forward: mean_abs=0.6776261329650879, max_abs=4.5, mean_rel=0.15597961843013763, max_rel=856.2122192382812, norm_rel=0.02332215942442417, ref_abs_avg=29.075544357299805, test_abs_avg=29.07416343688965
production_forward2 grad[50] vs paper_forward: mean_abs=0.5999135971069336, max_abs=2.375, mean_rel=0.09591041505336761, max_rel=9.175820350646973, norm_rel=0.024479132145643234, ref_abs_avg=24.89023208618164, test_abs_avg=24.87969970703125
production_forward2 grad[51] vs paper_forward: mean_abs=0.7520776391029358, max_abs=5.0, mean_rel=0.1654832661151886, max_rel=930.1246337890625, norm_rel=0.025004975497722626, ref_abs_avg=30.124298095703125, test_abs_avg=30.125789642333984
production_forward2 grad[52] vs paper_forward: mean_abs=0.7419755458831787, max_abs=5.0, mean_rel=0.1681251972913742, max_rel=1697.6943359375, norm_rel=0.02494548261165619, ref_abs_avg=29.828937530517578, test_abs_avg=29.827255249023438
production_forward2 grad[53] vs paper_forward: mean_abs=0.5618864893913269, max_abs=2.25, mean_rel=0.701328694820404, max_rel=318.83306884765625, norm_rel=0.022361014038324356, ref_abs_avg=25.008472442626953, test_abs_avg=25.010299682617188
production_forward2 grad[54] vs paper_forward: mean_abs=0.7056462168693542, max_abs=5.0, mean_rel=0.16526126861572266, max_rel=1441.612548828125, norm_rel=0.02492297627031803, ref_abs_avg=28.367549896240234, test_abs_avg=28.368635177612305
production_forward2 grad[55] vs paper_forward: mean_abs=0.6943467855453491, max_abs=4.46875, mean_rel=0.17334827780723572, max_rel=905.9032592773438, norm_rel=0.024476835504174232, ref_abs_avg=28.384349822998047, test_abs_avg=28.387853622436523
production_forward2 grad[56] vs paper_forward: mean_abs=0.5134692192077637, max_abs=2.03125, mean_rel=0.09215192496776581, max_rel=4.325925350189209, norm_rel=0.023874934762716293, ref_abs_avg=21.840564727783203, test_abs_avg=21.860313415527344
production_forward2 grad[57] vs paper_forward: mean_abs=0.6592332124710083, max_abs=4.5, mean_rel=0.15222209692001343, max_rel=1224.63330078125, norm_rel=0.02416972815990448, ref_abs_avg=27.25929832458496, test_abs_avg=27.258779525756836
production_forward2 grad[58] vs paper_forward: mean_abs=0.6459084153175354, max_abs=5.25, mean_rel=0.1528272032737732, max_rel=814.78857421875, norm_rel=0.024312255904078484, ref_abs_avg=26.56673812866211, test_abs_avg=26.572776794433594
production_forward2 grad[59] vs paper_forward: mean_abs=0.5159711837768555, max_abs=2.0, mean_rel=0.10279132425785065, max_rel=17.313644409179688, norm_rel=0.023384150117635727, ref_abs_avg=22.301631927490234, test_abs_avg=22.280397415161133
production_forward2 grad[60] vs paper_forward: mean_abs=0.6164423823356628, max_abs=4.75, mean_rel=0.14694374799728394, max_rel=803.229736328125, norm_rel=0.023971686139702797, ref_abs_avg=25.69477081298828, test_abs_avg=25.694143295288086
production_forward2 grad[61] vs paper_forward: mean_abs=0.6055404543876648, max_abs=4.0, mean_rel=0.15645438432693481, max_rel=1397.2236328125, norm_rel=0.02388201281428337, ref_abs_avg=25.421520233154297, test_abs_avg=25.424495697021484
production_forward2 grad[62] vs paper_forward: mean_abs=0.4826169013977051, max_abs=2.0, mean_rel=0.08397597074508667, max_rel=4.637036323547363, norm_rel=0.02441345527768135, ref_abs_avg=19.95753288269043, test_abs_avg=19.983600616455078
production_forward2 grad[63] vs paper_forward: mean_abs=0.5776876211166382, max_abs=4.375, mean_rel=0.14993885159492493, max_rel=1008.986572265625, norm_rel=0.023364033550024033, ref_abs_avg=24.69721221923828, test_abs_avg=24.69754981994629
production_forward2 grad[64] vs paper_forward: mean_abs=0.5665670037269592, max_abs=4.0, mean_rel=0.15074598789215088, max_rel=1361.4541015625, norm_rel=0.023089641705155373, ref_abs_avg=24.55116844177246, test_abs_avg=24.54930877685547
production_forward2 grad[65] vs paper_forward: mean_abs=0.47089195251464844, max_abs=1.5, mean_rel=0.13484814763069153, max_rel=32.31951141357422, norm_rel=0.0247499980032444, ref_abs_avg=18.743284225463867, test_abs_avg=18.738784790039062
production_forward2 grad[66] vs paper_forward: mean_abs=0.5530873537063599, max_abs=4.5, mean_rel=0.14977169036865234, max_rel=925.9047241210938, norm_rel=0.023117421194911003, ref_abs_avg=23.928951263427734, test_abs_avg=23.93199920654297
production_forward2 grad[67] vs paper_forward: mean_abs=0.5391128063201904, max_abs=3.75, mean_rel=0.1451515257358551, max_rel=478.19677734375, norm_rel=0.023208247497677803, ref_abs_avg=23.26605987548828, test_abs_avg=23.26326560974121
production_forward2 grad[68] vs paper_forward: mean_abs=0.42673081159591675, max_abs=2.09375, mean_rel=0.3400272727012634, max_rel=41.05839538574219, norm_rel=0.02336815930902958, ref_abs_avg=18.46875, test_abs_avg=18.506229400634766
production_forward2 grad[69] vs paper_forward: mean_abs=0.5170859098434448, max_abs=4.0, mean_rel=0.1411893218755722, max_rel=633.3201904296875, norm_rel=0.022627506405115128, ref_abs_avg=22.874717712402344, test_abs_avg=22.875381469726562
production_forward2 grad[70] vs paper_forward: mean_abs=0.5152183771133423, max_abs=4.0, mean_rel=0.14261771738529205, max_rel=569.6930541992188, norm_rel=0.022577760741114616, ref_abs_avg=22.80172348022461, test_abs_avg=22.800533294677734
production_forward2 grad[71] vs paper_forward: mean_abs=0.39116859436035156, max_abs=1.84375, mean_rel=0.08801494538784027, max_rel=11.381196022033691, norm_rel=0.021316522732377052, ref_abs_avg=18.626190185546875, test_abs_avg=18.619062423706055
production_forward2 grad[72] vs paper_forward: mean_abs=0.49623215198516846, max_abs=4.0, mean_rel=0.15556606650352478, max_rel=928.2813110351562, norm_rel=0.02233758009970188, ref_abs_avg=22.17337989807129, test_abs_avg=22.172489166259766
production_forward2 grad[73] vs paper_forward: mean_abs=0.4851095676422119, max_abs=3.5, mean_rel=0.15976838767528534, max_rel=1182.3299560546875, norm_rel=0.022153398022055626, ref_abs_avg=21.875965118408203, test_abs_avg=21.874095916748047
production_forward2 grad[74] vs paper_forward: mean_abs=0.46370744705200195, max_abs=1.75, mean_rel=0.14773766696453094, max_rel=29.66304588317871, norm_rel=0.025016754865646362, ref_abs_avg=18.43286895751953, test_abs_avg=18.433637619018555
production_forward2 grad[75] vs paper_forward: mean_abs=0.5635507702827454, max_abs=4.5, mean_rel=0.15368323028087616, max_rel=981.3526611328125, norm_rel=0.02418278530240059, ref_abs_avg=23.290096282958984, test_abs_avg=23.289173126220703
production_forward2 grad[76] vs paper_forward: mean_abs=0.5477377772331238, max_abs=4.28125, mean_rel=0.16189514100551605, max_rel=848.3594970703125, norm_rel=0.02395835518836975, ref_abs_avg=22.907581329345703, test_abs_avg=22.908103942871094
production_forward2 grad[77] vs paper_forward: mean_abs=0.41785967350006104, max_abs=1.53125, mean_rel=0.24331071972846985, max_rel=24.04595375061035, norm_rel=0.022807475179433823, ref_abs_avg=18.14274787902832, test_abs_avg=18.13733673095703
production_forward2 grad[78] vs paper_forward: mean_abs=0.5142239928245544, max_abs=4.09375, mean_rel=0.1559484899044037, max_rel=999.0344848632812, norm_rel=0.023450952023267746, ref_abs_avg=21.922100067138672, test_abs_avg=21.920974731445312
production_forward2 grad[79] vs paper_forward: mean_abs=0.49807822704315186, max_abs=4.0, mean_rel=0.16048002243041992, max_rel=944.661376953125, norm_rel=0.023144623264670372, ref_abs_avg=21.570972442626953, test_abs_avg=21.56108856201172
production_forward2 grad[80] vs paper_forward: mean_abs=0.37353038787841797, max_abs=1.3125, mean_rel=0.06388954818248749, max_rel=1.6178836822509766, norm_rel=0.02000250667333603, ref_abs_avg=18.20590591430664, test_abs_avg=18.181922912597656
production_forward2 grad[81] vs paper_forward: mean_abs=0.47197654843330383, max_abs=4.0, mean_rel=0.14930826425552368, max_rel=1082.121337890625, norm_rel=0.022636448964476585, ref_abs_avg=20.867229461669922, test_abs_avg=20.865699768066406
production_forward2 grad[82] vs paper_forward: mean_abs=0.45799046754837036, max_abs=3.625, mean_rel=0.14272397756576538, max_rel=796.7239990234375, norm_rel=0.0228376854211092, ref_abs_avg=20.12671661376953, test_abs_avg=20.12508773803711
production_forward2 grad[83] vs paper_forward: mean_abs=0.3651256561279297, max_abs=1.625, mean_rel=0.06556270271539688, max_rel=4.12510871887207, norm_rel=0.022323034703731537, ref_abs_avg=16.515230178833008, test_abs_avg=16.501354217529297
production_forward2 grad[84] vs paper_forward: mean_abs=0.44349223375320435, max_abs=4.0, mean_rel=0.1459357738494873, max_rel=883.8372802734375, norm_rel=0.02217983454465866, ref_abs_avg=20.023630142211914, test_abs_avg=20.024293899536133
production_forward2 grad[85] vs paper_forward: mean_abs=0.4297807216644287, max_abs=4.5, mean_rel=0.14063063263893127, max_rel=680.3195190429688, norm_rel=0.022008715197443962, ref_abs_avg=19.582111358642578, test_abs_avg=19.584575653076172
production_forward2 grad[86] vs paper_forward: mean_abs=0.3378119468688965, max_abs=1.515625, mean_rel=0.11347587406635284, max_rel=13.610095024108887, norm_rel=0.021479934453964233, ref_abs_avg=15.552360534667969, test_abs_avg=15.527960777282715
production_forward2 grad[87] vs paper_forward: mean_abs=0.4149314761161804, max_abs=5.5, mean_rel=0.1375623345375061, max_rel=774.3551635742188, norm_rel=0.02165619097650051, ref_abs_avg=19.224945068359375, test_abs_avg=19.223705291748047
production_forward2 grad[88] vs paper_forward: mean_abs=0.4064818024635315, max_abs=3.5, mean_rel=0.14414368569850922, max_rel=793.607666015625, norm_rel=0.021352769806981087, ref_abs_avg=19.09933090209961, test_abs_avg=19.112890243530273
production_forward2 grad[89] vs paper_forward: mean_abs=0.3306131362915039, max_abs=1.25, mean_rel=0.13883160054683685, max_rel=13.11127758026123, norm_rel=0.020691249519586563, ref_abs_avg=15.875741958618164, test_abs_avg=15.873043060302734
production_forward2 grad[90] vs paper_forward: mean_abs=0.4019063115119934, max_abs=3.9375, mean_rel=0.13783833384513855, max_rel=1063.4844970703125, norm_rel=0.02119339257478714, ref_abs_avg=19.076217651367188, test_abs_avg=19.07613754272461
production_forward2 grad[91] vs paper_forward: mean_abs=0.3816690146923065, max_abs=4.41796875, mean_rel=0.12320458889007568, max_rel=453.2821350097656, norm_rel=0.020570876076817513, ref_abs_avg=18.660329818725586, test_abs_avg=18.655405044555664
production_forward2 grad[92] vs paper_forward: mean_abs=0.30372750759124756, max_abs=1.24609375, mean_rel=0.8324125409126282, max_rel=392.3231506347656, norm_rel=0.02068839594721794, ref_abs_avg=14.86874008178711, test_abs_avg=14.865274429321289
production_forward2 grad[93] vs paper_forward: mean_abs=0.36905938386917114, max_abs=5.0, mean_rel=0.13118645548820496, max_rel=720.1242065429688, norm_rel=0.020630978047847748, ref_abs_avg=18.035741806030273, test_abs_avg=18.036582946777344
production_forward2 grad[94] vs paper_forward: mean_abs=0.36199599504470825, max_abs=3.875, mean_rel=0.12806886434555054, max_rel=566.0367431640625, norm_rel=0.020183511078357697, ref_abs_avg=18.089677810668945, test_abs_avg=18.08629608154297
production_forward2 grad[95] vs paper_forward: mean_abs=0.28818511962890625, max_abs=1.0546875, mean_rel=0.07881084084510803, max_rel=3.9773731231689453, norm_rel=0.0192340649664402, ref_abs_avg=15.0747652053833, test_abs_avg=15.044068336486816
production_forward2 grad[96] vs paper_forward: mean_abs=0.3417031168937683, max_abs=3.75, mean_rel=0.12044857442378998, max_rel=529.5870361328125, norm_rel=0.020000921562314034, ref_abs_avg=17.331872940063477, test_abs_avg=17.33136749267578
production_forward2 grad[97] vs paper_forward: mean_abs=0.33852916955947876, max_abs=4.0, mean_rel=0.12660183012485504, max_rel=1015.6629638671875, norm_rel=0.019967203959822655, ref_abs_avg=17.316814422607422, test_abs_avg=17.316856384277344
identity layers + randn queries
mean abs randn paper: 0.220703125
production_forward fwd+bwd:  112.036 ms
production_forward fwd-only: 20.516 ms
production_forward bwd-only: 91.671 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=6.071 GiB
production_forward peak reserved:  fwd=2.305 GiB, fwd+bwd=6.180 GiB
mean abs difference randn: 0.0016326904296875
mean relative difference randn: 0.028564453125
production_forward2 fwd+bwd:  224.414 ms
production_forward2 fwd-only: 22.338 ms
production_forward2 bwd-only: 202.190 ms
production_forward2 peak allocated: fwd=3.567 GiB, fwd+bwd=6.946 GiB
production_forward2 peak reserved:  fwd=3.930 GiB, fwd+bwd=9.680 GiB
mean abs difference randn: 0.0016326904296875
mean relative difference randn: 0.028564453125
torch_compile_phases_forward fwd+bwd:  189.924 ms
torch_compile_phases_forward fwd-only: 36.477 ms
torch_compile_phases_forward bwd-only: 152.682 ms
torch_compile_phases_forward peak allocated: fwd=14.157 GiB, fwd+bwd=14.784 GiB
torch_compile_phases_forward peak reserved:  fwd=14.453 GiB, fwd+bwd=18.705 GiB
mean abs difference randn: 0.00164031982421875
mean relative difference randn: 0.028564453125
paper_forward fwd+bwd:  379.655 ms
paper_forward fwd-only: 85.739 ms
paper_forward bwd-only: 294.067 ms
paper_forward peak allocated: fwd=65.065 GiB, fwd+bwd=67.184 GiB
paper_forward peak reserved:  fwd=65.141 GiB, fwd+bwd=67.893 GiB
mean abs difference randn: 3.814697265625e-05
mean relative difference randn: 0.00066375732421875

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016348997596651316, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008550171740353107, max_abs=0.3125, mean_rel=0.07443486899137497, max_rel=115.0614242553711, norm_rel=0.020327702164649963, ref_abs_avg=0.4541230797767639, test_abs_avg=0.4541316628456116
production_forward grad[1] vs paper_forward: mean_abs=7.34181022644043, max_abs=56.0, mean_rel=0.1708877831697464, max_rel=235.5942840576172, norm_rel=0.020440906286239624, ref_abs_avg=317.38336181640625, test_abs_avg=317.4446716308594
production_forward grad[2] vs paper_forward: mean_abs=1.2841339111328125, max_abs=4.0, mean_rel=0.09529871493577957, max_rel=10.34029483795166, norm_rel=0.02290610782802105, ref_abs_avg=54.7523193359375, test_abs_avg=54.682403564453125
production_forward grad[3] vs paper_forward: mean_abs=1.5447527170181274, max_abs=11.0, mean_rel=0.1782831996679306, max_rel=1862.904541015625, norm_rel=0.024279840290546417, ref_abs_avg=63.9744987487793, test_abs_avg=63.97529983520508
production_forward grad[4] vs paper_forward: mean_abs=1.5003420114517212, max_abs=9.25, mean_rel=0.18559816479682922, max_rel=3796.464599609375, norm_rel=0.023960620164871216, ref_abs_avg=62.95682144165039, test_abs_avg=62.97303771972656
production_forward grad[5] vs paper_forward: mean_abs=1.116434097290039, max_abs=4.75, mean_rel=0.08919654041528702, max_rel=9.273272514343262, norm_rel=0.024922722950577736, ref_abs_avg=46.571449279785156, test_abs_avg=46.593814849853516
production_forward grad[6] vs paper_forward: mean_abs=1.334887146949768, max_abs=8.6484375, mean_rel=0.15493959188461304, max_rel=1196.54248046875, norm_rel=0.023836659267544746, ref_abs_avg=56.344791412353516, test_abs_avg=56.343414306640625
production_forward grad[7] vs paper_forward: mean_abs=1.3116445541381836, max_abs=7.609375, mean_rel=0.15865084528923035, max_rel=2147.289794921875, norm_rel=0.023595985025167465, ref_abs_avg=55.85369873046875, test_abs_avg=55.85482406616211
production_forward grad[8] vs paper_forward: mean_abs=0.9918718338012695, max_abs=3.75, mean_rel=0.08633619546890259, max_rel=2.5375616550445557, norm_rel=0.02294021286070347, ref_abs_avg=42.16327667236328, test_abs_avg=42.164302825927734
production_forward grad[9] vs paper_forward: mean_abs=1.2400152683258057, max_abs=8.0, mean_rel=0.16080351173877716, max_rel=1358.7437744140625, norm_rel=0.023822516202926636, ref_abs_avg=52.28522491455078, test_abs_avg=52.28861999511719
production_forward grad[10] vs paper_forward: mean_abs=1.2171403169631958, max_abs=7.25, mean_rel=0.16007959842681885, max_rel=852.9102172851562, norm_rel=0.023708846420049667, ref_abs_avg=51.61609649658203, test_abs_avg=51.612220764160156
production_forward grad[11] vs paper_forward: mean_abs=0.8678226470947266, max_abs=3.25, mean_rel=0.0969509482383728, max_rel=10.819181442260742, norm_rel=0.021497204899787903, ref_abs_avg=40.71278762817383, test_abs_avg=40.72135925292969
production_forward grad[12] vs paper_forward: mean_abs=1.1337075233459473, max_abs=7.25, mean_rel=0.16065995395183563, max_rel=1598.515380859375, norm_rel=0.023482581600546837, ref_abs_avg=48.5710563659668, test_abs_avg=48.57211685180664
production_forward grad[13] vs paper_forward: mean_abs=1.1156458854675293, max_abs=7.0, mean_rel=0.14691457152366638, max_rel=1295.025390625, norm_rel=0.023415766656398773, ref_abs_avg=47.86747360229492, test_abs_avg=47.87810516357422
production_forward grad[14] vs paper_forward: mean_abs=0.8386509418487549, max_abs=3.5, mean_rel=0.12036141008138657, max_rel=26.053800582885742, norm_rel=0.020951732993125916, ref_abs_avg=39.67336654663086, test_abs_avg=39.65264892578125
production_forward grad[15] vs paper_forward: mean_abs=1.0637891292572021, max_abs=7.0, mean_rel=0.16510407626628876, max_rel=1134.4056396484375, norm_rel=0.023411288857460022, ref_abs_avg=45.647682189941406, test_abs_avg=45.65516662597656
production_forward grad[16] vs paper_forward: mean_abs=1.0372636318206787, max_abs=6.375, mean_rel=0.17034640908241272, max_rel=1240.1466064453125, norm_rel=0.0231640562415123, ref_abs_avg=44.95479965209961, test_abs_avg=44.95397186279297
production_forward grad[17] vs paper_forward: mean_abs=0.8003239631652832, max_abs=2.875, mean_rel=0.06887233257293701, max_rel=2.0651350021362305, norm_rel=0.02407637983560562, ref_abs_avg=34.05404281616211, test_abs_avg=33.96539306640625
production_forward grad[18] vs paper_forward: mean_abs=0.9996981620788574, max_abs=6.0, mean_rel=0.16014567017555237, max_rel=1038.2481689453125, norm_rel=0.023200560361146927, ref_abs_avg=43.30094528198242, test_abs_avg=43.301048278808594
production_forward grad[19] vs paper_forward: mean_abs=0.9831223487854004, max_abs=6.25, mean_rel=0.1509729027748108, max_rel=1386.4488525390625, norm_rel=0.023173756897449493, ref_abs_avg=42.66361999511719, test_abs_avg=42.66311264038086
production_forward grad[20] vs paper_forward: mean_abs=0.7589230537414551, max_abs=3.5, mean_rel=0.07203622907400131, max_rel=3.4585351943969727, norm_rel=0.022236650809645653, ref_abs_avg=34.91664505004883, test_abs_avg=34.92304992675781
production_forward grad[21] vs paper_forward: mean_abs=0.9434627294540405, max_abs=5.75, mean_rel=0.1612713634967804, max_rel=1997.6470947265625, norm_rel=0.02298841066658497, ref_abs_avg=41.26051330566406, test_abs_avg=41.2633171081543
production_forward grad[22] vs paper_forward: mean_abs=0.9217301607131958, max_abs=5.5, mean_rel=0.14665286242961884, max_rel=1283.804443359375, norm_rel=0.022883113473653793, ref_abs_avg=40.44085693359375, test_abs_avg=40.440284729003906
production_forward grad[23] vs paper_forward: mean_abs=0.7419503927230835, max_abs=2.65625, mean_rel=0.15755143761634827, max_rel=36.92213439941406, norm_rel=0.02140158973634243, ref_abs_avg=33.78462219238281, test_abs_avg=33.75836944580078
production_forward grad[24] vs paper_forward: mean_abs=0.8989346623420715, max_abs=6.0, mean_rel=0.16256162524223328, max_rel=972.011962890625, norm_rel=0.02292264625430107, ref_abs_avg=39.40210723876953, test_abs_avg=39.40163803100586
production_forward grad[25] vs paper_forward: mean_abs=0.8785093426704407, max_abs=5.75, mean_rel=0.15694886445999146, max_rel=1091.1942138671875, norm_rel=0.022762900218367577, ref_abs_avg=38.75113296508789, test_abs_avg=38.75037384033203
production_forward grad[26] vs paper_forward: mean_abs=0.9019393920898438, max_abs=3.1875, mean_rel=0.15030784904956818, max_rel=28.190526962280273, norm_rel=0.02637735940515995, ref_abs_avg=33.995643615722656, test_abs_avg=33.92805480957031
production_forward grad[27] vs paper_forward: mean_abs=1.0579396486282349, max_abs=7.5, mean_rel=0.17906883358955383, max_rel=2009.2845458984375, norm_rel=0.024960864335298538, ref_abs_avg=42.53125762939453, test_abs_avg=42.53206253051758
production_forward grad[28] vs paper_forward: mean_abs=1.0324091911315918, max_abs=7.0, mean_rel=0.1746026575565338, max_rel=1200.7308349609375, norm_rel=0.02472875453531742, ref_abs_avg=41.97158432006836, test_abs_avg=41.975440979003906
production_forward grad[29] vs paper_forward: mean_abs=0.8139729499816895, max_abs=3.15625, mean_rel=0.12354850023984909, max_rel=9.722416877746582, norm_rel=0.024950899183750153, ref_abs_avg=32.65129089355469, test_abs_avg=32.638938903808594
production_forward grad[30] vs paper_forward: mean_abs=0.9801239967346191, max_abs=6.328125, mean_rel=0.17134767770767212, max_rel=1622.6136474609375, norm_rel=0.02523036301136017, ref_abs_avg=38.97388458251953, test_abs_avg=38.975276947021484
production_forward grad[31] vs paper_forward: mean_abs=0.9669541120529175, max_abs=6.5, mean_rel=0.16086655855178833, max_rel=666.0202026367188, norm_rel=0.024964526295661926, ref_abs_avg=38.89607238769531, test_abs_avg=38.90156936645508
production_forward grad[32] vs paper_forward: mean_abs=0.6893596649169922, max_abs=3.0, mean_rel=0.09850367903709412, max_rel=15.869966506958008, norm_rel=0.023946965113282204, ref_abs_avg=29.312543869018555, test_abs_avg=29.3092041015625
production_forward grad[33] vs paper_forward: mean_abs=0.9044462442398071, max_abs=5.75, mean_rel=0.15950725972652435, max_rel=1143.406005859375, norm_rel=0.024871354922652245, ref_abs_avg=36.460418701171875, test_abs_avg=36.46237564086914
production_forward grad[34] vs paper_forward: mean_abs=0.8888327479362488, max_abs=6.0, mean_rel=0.1567385494709015, max_rel=622.4317016601562, norm_rel=0.024810705333948135, ref_abs_avg=35.939231872558594, test_abs_avg=35.938392639160156
production_forward grad[35] vs paper_forward: mean_abs=0.7037715911865234, max_abs=3.0, mean_rel=0.12319625914096832, max_rel=5.780196189880371, norm_rel=0.02585962973535061, ref_abs_avg=27.477272033691406, test_abs_avg=27.48388671875
production_forward grad[36] vs paper_forward: mean_abs=0.860680103302002, max_abs=6.0, mean_rel=0.16356505453586578, max_rel=2415.352294921875, norm_rel=0.024564657360315323, ref_abs_avg=35.13132858276367, test_abs_avg=35.131099700927734
production_forward grad[37] vs paper_forward: mean_abs=0.8379416465759277, max_abs=5.0, mean_rel=0.1585819572210312, max_rel=1023.198974609375, norm_rel=0.024403030052781105, ref_abs_avg=34.405174255371094, test_abs_avg=34.406585693359375
production_forward grad[38] vs paper_forward: mean_abs=0.6479711532592773, max_abs=3.125, mean_rel=0.07952088117599487, max_rel=5.134833812713623, norm_rel=0.02506260760128498, ref_abs_avg=26.776460647583008, test_abs_avg=26.768840789794922
production_forward grad[39] vs paper_forward: mean_abs=0.8006521463394165, max_abs=5.5, mean_rel=0.16574633121490479, max_rel=1154.548583984375, norm_rel=0.024536587297916412, ref_abs_avg=32.714576721191406, test_abs_avg=32.7146110534668
production_forward grad[40] vs paper_forward: mean_abs=0.7839223146438599, max_abs=4.625, mean_rel=0.16233311593532562, max_rel=867.3363647460938, norm_rel=0.024327309802174568, ref_abs_avg=32.323997497558594, test_abs_avg=32.32109832763672
production_forward grad[41] vs paper_forward: mean_abs=0.636934757232666, max_abs=2.25, mean_rel=0.13733260333538055, max_rel=29.535429000854492, norm_rel=0.02459808997809887, ref_abs_avg=26.369247436523438, test_abs_avg=26.393413543701172
production_forward grad[42] vs paper_forward: mean_abs=0.7561969757080078, max_abs=5.0, mean_rel=0.1707914173603058, max_rel=2001.876708984375, norm_rel=0.02420112118124962, ref_abs_avg=31.306354522705078, test_abs_avg=31.3048095703125
production_forward grad[43] vs paper_forward: mean_abs=0.7467514276504517, max_abs=5.0, mean_rel=0.17262455821037292, max_rel=1450.5072021484375, norm_rel=0.024328092113137245, ref_abs_avg=30.82233428955078, test_abs_avg=30.817272186279297
production_forward grad[44] vs paper_forward: mean_abs=0.5709896087646484, max_abs=2.28125, mean_rel=0.10663805902004242, max_rel=14.15546989440918, norm_rel=0.0242161862552166, ref_abs_avg=23.4920711517334, test_abs_avg=23.49652862548828
production_forward grad[45] vs paper_forward: mean_abs=0.7231219410896301, max_abs=4.75, mean_rel=0.1683947741985321, max_rel=1435.348876953125, norm_rel=0.024007461965084076, ref_abs_avg=30.1920166015625, test_abs_avg=30.194286346435547
production_forward grad[46] vs paper_forward: mean_abs=0.7095974087715149, max_abs=4.5, mean_rel=0.1687048077583313, max_rel=1361.970947265625, norm_rel=0.023747943341732025, ref_abs_avg=29.936647415161133, test_abs_avg=29.933719635009766
production_forward grad[47] vs paper_forward: mean_abs=0.5739291310310364, max_abs=2.1875, mean_rel=0.07856759428977966, max_rel=2.401010036468506, norm_rel=0.024434681981801987, ref_abs_avg=23.920665740966797, test_abs_avg=23.946144104003906
production_forward grad[48] vs paper_forward: mean_abs=0.6910744309425354, max_abs=4.5, mean_rel=0.1573273241519928, max_rel=1503.9189453125, norm_rel=0.023723367601633072, ref_abs_avg=29.194610595703125, test_abs_avg=29.195510864257812
production_forward grad[49] vs paper_forward: mean_abs=0.683644711971283, max_abs=4.125, mean_rel=0.16048476099967957, max_rel=972.0772705078125, norm_rel=0.02363203652203083, ref_abs_avg=28.99319839477539, test_abs_avg=28.9930419921875
production_forward grad[50] vs paper_forward: mean_abs=0.6518130302429199, max_abs=2.59228515625, mean_rel=0.20899468660354614, max_rel=32.16640090942383, norm_rel=0.024741824716329575, ref_abs_avg=26.155317306518555, test_abs_avg=26.147579193115234
production_forward grad[51] vs paper_forward: mean_abs=0.7718356847763062, max_abs=6.25, mean_rel=0.16951963305473328, max_rel=2220.40673828125, norm_rel=0.02546984888613224, ref_abs_avg=30.381343841552734, test_abs_avg=30.3809757232666
production_forward grad[52] vs paper_forward: mean_abs=0.7610459327697754, max_abs=5.0, mean_rel=0.18015870451927185, max_rel=2459.953369140625, norm_rel=0.025437580421566963, ref_abs_avg=30.0129451751709, test_abs_avg=30.013595581054688
production_forward grad[53] vs paper_forward: mean_abs=0.5815966725349426, max_abs=2.0, mean_rel=0.3173968493938446, max_rel=79.37909698486328, norm_rel=0.023616550490260124, ref_abs_avg=24.183467864990234, test_abs_avg=24.18221664428711
production_forward grad[54] vs paper_forward: mean_abs=0.7123542428016663, max_abs=5.2265625, mean_rel=0.1654956191778183, max_rel=996.7627563476562, norm_rel=0.025108983740210533, ref_abs_avg=28.39315414428711, test_abs_avg=28.39371109008789
production_forward grad[55] vs paper_forward: mean_abs=0.6957076191902161, max_abs=4.5, mean_rel=0.15713638067245483, max_rel=643.5369262695312, norm_rel=0.02473313733935356, ref_abs_avg=28.148517608642578, test_abs_avg=28.149845123291016
production_forward grad[56] vs paper_forward: mean_abs=0.5748817324638367, max_abs=1.875, mean_rel=0.12830913066864014, max_rel=27.73151397705078, norm_rel=0.023518992587924004, ref_abs_avg=23.714031219482422, test_abs_avg=23.74603843688965
production_forward grad[57] vs paper_forward: mean_abs=0.6634461879730225, max_abs=5.5, mean_rel=0.15698067843914032, max_rel=765.3096313476562, norm_rel=0.024536315351724625, ref_abs_avg=27.042261123657227, test_abs_avg=27.04341697692871
production_forward grad[58] vs paper_forward: mean_abs=0.6519107222557068, max_abs=4.0, mean_rel=0.16268423199653625, max_rel=804.8724975585938, norm_rel=0.02445410005748272, ref_abs_avg=26.692340850830078, test_abs_avg=26.688575744628906
production_forward grad[59] vs paper_forward: mean_abs=0.5111764669418335, max_abs=2.0, mean_rel=0.08519045263528824, max_rel=9.097577095031738, norm_rel=0.02434147521853447, ref_abs_avg=21.03232765197754, test_abs_avg=21.029687881469727
production_forward grad[60] vs paper_forward: mean_abs=0.6200655102729797, max_abs=4.25, mean_rel=0.16037164628505707, max_rel=621.04736328125, norm_rel=0.024265598505735397, ref_abs_avg=25.59555435180664, test_abs_avg=25.597373962402344
production_forward grad[61] vs paper_forward: mean_abs=0.6109827160835266, max_abs=3.75, mean_rel=0.161670982837677, max_rel=1061.389404296875, norm_rel=0.02393779531121254, ref_abs_avg=25.575407028198242, test_abs_avg=25.573753356933594
production_forward grad[62] vs paper_forward: mean_abs=0.47800779342651367, max_abs=2.375, mean_rel=0.12342050671577454, max_rel=10.91816234588623, norm_rel=0.022320328280329704, ref_abs_avg=21.522138595581055, test_abs_avg=21.50995445251465
production_forward grad[63] vs paper_forward: mean_abs=0.5948129892349243, max_abs=4.5, mean_rel=0.1551664173603058, max_rel=1079.914794921875, norm_rel=0.023591618984937668, ref_abs_avg=25.17809295654297, test_abs_avg=25.179466247558594
production_forward grad[64] vs paper_forward: mean_abs=0.5802109241485596, max_abs=4.0, mean_rel=0.15292397141456604, max_rel=876.822021484375, norm_rel=0.023518458008766174, ref_abs_avg=24.697032928466797, test_abs_avg=24.695743560791016
production_forward grad[65] vs paper_forward: mean_abs=0.4393808841705322, max_abs=1.9375, mean_rel=0.07822705805301666, max_rel=5.243838787078857, norm_rel=0.021693089976906776, ref_abs_avg=20.235260009765625, test_abs_avg=20.255321502685547
production_forward grad[66] vs paper_forward: mean_abs=0.558256983757019, max_abs=4.0, mean_rel=0.15086646378040314, max_rel=1133.8994140625, norm_rel=0.023294033482670784, ref_abs_avg=23.954769134521484, test_abs_avg=23.954750061035156
production_forward grad[67] vs paper_forward: mean_abs=0.544215738773346, max_abs=4.5, mean_rel=0.14880189299583435, max_rel=578.9219360351562, norm_rel=0.023264005780220032, ref_abs_avg=23.416173934936523, test_abs_avg=23.423664093017578
production_forward grad[68] vs paper_forward: mean_abs=0.4265364408493042, max_abs=1.5, mean_rel=0.08559812605381012, max_rel=2.8117122650146484, norm_rel=0.02083650790154934, ref_abs_avg=20.024185180664062, test_abs_avg=20.060583114624023
production_forward grad[69] vs paper_forward: mean_abs=0.5326799750328064, max_abs=3.75, mean_rel=0.1489095389842987, max_rel=669.2312622070312, norm_rel=0.022828195244073868, ref_abs_avg=23.307003021240234, test_abs_avg=23.309171676635742
production_forward grad[70] vs paper_forward: mean_abs=0.5237758159637451, max_abs=4.0, mean_rel=0.1567673534154892, max_rel=696.611572265625, norm_rel=0.022768065333366394, ref_abs_avg=22.993915557861328, test_abs_avg=22.998205184936523
production_forward grad[71] vs paper_forward: mean_abs=0.3924999237060547, max_abs=2.1875, mean_rel=0.07799942791461945, max_rel=5.800799369812012, norm_rel=0.021429134532809258, ref_abs_avg=18.818639755249023, test_abs_avg=18.835237503051758
production_forward grad[72] vs paper_forward: mean_abs=0.5175331830978394, max_abs=4.0, mean_rel=0.14728344976902008, max_rel=799.4199829101562, norm_rel=0.022354064509272575, ref_abs_avg=23.099628448486328, test_abs_avg=23.102031707763672
production_forward grad[73] vs paper_forward: mean_abs=0.501850962638855, max_abs=4.5, mean_rel=0.14779895544052124, max_rel=1229.26318359375, norm_rel=0.02268858626484871, ref_abs_avg=22.11975860595703, test_abs_avg=22.117158889770508
production_forward grad[74] vs paper_forward: mean_abs=0.458709716796875, max_abs=2.03125, mean_rel=0.1552574783563614, max_rel=27.775959014892578, norm_rel=0.022908685728907585, ref_abs_avg=20.387224197387695, test_abs_avg=20.390323638916016
production_forward grad[75] vs paper_forward: mean_abs=0.5763847231864929, max_abs=4.0, mean_rel=0.15550333261489868, max_rel=507.99249267578125, norm_rel=0.02421184442937374, ref_abs_avg=23.79798126220703, test_abs_avg=23.797443389892578
production_forward grad[76] vs paper_forward: mean_abs=0.565446138381958, max_abs=4.0, mean_rel=0.15554696321487427, max_rel=1746.4658203125, norm_rel=0.02398206852376461, ref_abs_avg=23.613859176635742, test_abs_avg=23.612781524658203
production_forward grad[77] vs paper_forward: mean_abs=0.4342050552368164, max_abs=1.625, mean_rel=0.07854831218719482, max_rel=9.39332103729248, norm_rel=0.02293088287115097, ref_abs_avg=19.395523071289062, test_abs_avg=19.361068725585938
production_forward grad[78] vs paper_forward: mean_abs=0.5348215103149414, max_abs=4.0, mean_rel=0.15200579166412354, max_rel=521.3605346679688, norm_rel=0.023519273847341537, ref_abs_avg=22.71481704711914, test_abs_avg=22.71446990966797
production_forward grad[79] vs paper_forward: mean_abs=0.523628830909729, max_abs=4.03125, mean_rel=0.15011513233184814, max_rel=756.4760131835938, norm_rel=0.023272860795259476, ref_abs_avg=22.542613983154297, test_abs_avg=22.53989028930664
production_forward grad[80] vs paper_forward: mean_abs=0.4101526737213135, max_abs=1.75, mean_rel=0.13399925827980042, max_rel=13.006505012512207, norm_rel=0.02302118018269539, ref_abs_avg=18.026273727416992, test_abs_avg=18.04175567626953
production_forward grad[81] vs paper_forward: mean_abs=0.49719512462615967, max_abs=5.25, mean_rel=0.147199809551239, max_rel=1242.388671875, norm_rel=0.02297850139439106, ref_abs_avg=21.636795043945312, test_abs_avg=21.636249542236328
production_forward grad[82] vs paper_forward: mean_abs=0.4831356108188629, max_abs=4.0, mean_rel=0.15340575575828552, max_rel=1413.2650146484375, norm_rel=0.023055503144860268, ref_abs_avg=20.983394622802734, test_abs_avg=20.97500228881836
production_forward grad[83] vs paper_forward: mean_abs=0.3955650329589844, max_abs=1.75, mean_rel=0.11677103489637375, max_rel=8.430121421813965, norm_rel=0.022962000221014023, ref_abs_avg=17.531328201293945, test_abs_avg=17.570945739746094
production_forward grad[84] vs paper_forward: mean_abs=0.4624631404876709, max_abs=4.5, mean_rel=0.14826169610023499, max_rel=1036.1683349609375, norm_rel=0.022216113284230232, ref_abs_avg=20.808727264404297, test_abs_avg=20.808338165283203
production_forward grad[85] vs paper_forward: mean_abs=0.4479011595249176, max_abs=4.0, mean_rel=0.1372029185295105, max_rel=720.549560546875, norm_rel=0.02189534530043602, ref_abs_avg=20.53371238708496, test_abs_avg=20.535789489746094
production_forward grad[86] vs paper_forward: mean_abs=0.3457488417625427, max_abs=1.625, mean_rel=0.06168877333402634, max_rel=2.0089943408966064, norm_rel=0.02120848372578621, ref_abs_avg=16.490127563476562, test_abs_avg=16.483001708984375
production_forward grad[87] vs paper_forward: mean_abs=0.43233585357666016, max_abs=3.5, mean_rel=0.14034047722816467, max_rel=1042.385498046875, norm_rel=0.021740693598985672, ref_abs_avg=19.952346801757812, test_abs_avg=19.952899932861328
production_forward grad[88] vs paper_forward: mean_abs=0.417816698551178, max_abs=4.0, mean_rel=0.13063886761665344, max_rel=607.8197021484375, norm_rel=0.02104974538087845, ref_abs_avg=19.90412139892578, test_abs_avg=19.90563201904297
production_forward grad[89] vs paper_forward: mean_abs=0.32657063007354736, max_abs=1.25, mean_rel=0.13265974819660187, max_rel=35.18526840209961, norm_rel=0.02176339365541935, ref_abs_avg=15.640854835510254, test_abs_avg=15.643006324768066
production_forward grad[90] vs paper_forward: mean_abs=0.40805965662002563, max_abs=4.5, mean_rel=0.12803205847740173, max_rel=628.1759643554688, norm_rel=0.021067870780825615, ref_abs_avg=19.496665954589844, test_abs_avg=19.497779846191406
production_forward grad[91] vs paper_forward: mean_abs=0.39226311445236206, max_abs=4.0, mean_rel=0.12688395380973816, max_rel=883.3120727539062, norm_rel=0.020328005775809288, ref_abs_avg=19.32282257080078, test_abs_avg=19.324016571044922
production_forward grad[92] vs paper_forward: mean_abs=0.3052023649215698, max_abs=1.375, mean_rel=0.05357174575328827, max_rel=2.0685579776763916, norm_rel=0.020089907571673393, ref_abs_avg=15.546470642089844, test_abs_avg=15.572834014892578
production_forward grad[93] vs paper_forward: mean_abs=0.37353530526161194, max_abs=3.5, mean_rel=0.13086840510368347, max_rel=914.147216796875, norm_rel=0.02055351808667183, ref_abs_avg=18.32671546936035, test_abs_avg=18.325801849365234
production_forward grad[94] vs paper_forward: mean_abs=0.3689928650856018, max_abs=3.5, mean_rel=0.13171470165252686, max_rel=606.6390380859375, norm_rel=0.020402325317263603, ref_abs_avg=18.289546966552734, test_abs_avg=18.294906616210938
production_forward grad[95] vs paper_forward: mean_abs=0.29212573170661926, max_abs=1.15625, mean_rel=0.09892033785581589, max_rel=11.459933280944824, norm_rel=0.02007126808166504, ref_abs_avg=14.817773818969727, test_abs_avg=14.806751251220703
production_forward grad[96] vs paper_forward: mean_abs=0.3485307991504669, max_abs=3.5, mean_rel=0.12117356061935425, max_rel=445.811279296875, norm_rel=0.019872494041919708, ref_abs_avg=17.805862426757812, test_abs_avg=17.80567169189453
production_forward grad[97] vs paper_forward: mean_abs=0.3374374210834503, max_abs=4.0, mean_rel=0.11729143559932709, max_rel=524.0704345703125, norm_rel=0.01907011866569519, ref_abs_avg=17.797786712646484, test_abs_avg=17.795705795288086
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016384758055210114, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.00858288537710905, max_abs=0.3916015625, mean_rel=0.07457928359508514, max_rel=118.77467346191406, norm_rel=0.020411504432559013, ref_abs_avg=0.4541230797767639, test_abs_avg=0.4541257917881012
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.3413004875183105, max_abs=54.0, mean_rel=0.1616806536912918, max_rel=142.7848358154297, norm_rel=0.020437706261873245, ref_abs_avg=317.38336181640625, test_abs_avg=317.51068115234375
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.237513542175293, max_abs=4.5, mean_rel=0.11130794137716293, max_rel=19.15315055847168, norm_rel=0.022562215104699135, ref_abs_avg=54.7523193359375, test_abs_avg=54.694297790527344
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.547871708869934, max_abs=10.0, mean_rel=0.17982901632785797, max_rel=2633.2412109375, norm_rel=0.024331673979759216, ref_abs_avg=63.9744987487793, test_abs_avg=63.97567367553711
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.512689232826233, max_abs=10.0, mean_rel=0.17969651520252228, max_rel=3217.080810546875, norm_rel=0.02418125420808792, ref_abs_avg=62.95682144165039, test_abs_avg=62.975303649902344
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1244449615478516, max_abs=4.75, mean_rel=0.08731868118047714, max_rel=8.748224258422852, norm_rel=0.024699239060282707, ref_abs_avg=46.571449279785156, test_abs_avg=46.55664825439453
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.3439863920211792, max_abs=9.0859375, mean_rel=0.1621273308992386, max_rel=2695.176025390625, norm_rel=0.023980621248483658, ref_abs_avg=56.344791412353516, test_abs_avg=56.34661865234375
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.312389612197876, max_abs=8.25, mean_rel=0.15285073220729828, max_rel=969.36376953125, norm_rel=0.023600580170750618, ref_abs_avg=55.85369873046875, test_abs_avg=55.860374450683594
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.9990024566650391, max_abs=3.75, mean_rel=0.08580991625785828, max_rel=3.107287883758545, norm_rel=0.023380348458886147, ref_abs_avg=42.16327667236328, test_abs_avg=42.1573486328125
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.247679352760315, max_abs=8.0, mean_rel=0.1572684496641159, max_rel=1858.20556640625, norm_rel=0.02396431565284729, ref_abs_avg=52.28522491455078, test_abs_avg=52.288917541503906
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.218412160873413, max_abs=7.5, mean_rel=0.16178575158119202, max_rel=913.1408081054688, norm_rel=0.023724084720015526, ref_abs_avg=51.61609649658203, test_abs_avg=51.621246337890625
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9160070419311523, max_abs=3.75, mean_rel=0.077781543135643, max_rel=5.6916069984436035, norm_rel=0.022406620904803276, ref_abs_avg=40.71278762817383, test_abs_avg=40.779869079589844
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1427767276763916, max_abs=7.5, mean_rel=0.15862531960010529, max_rel=2417.26611328125, norm_rel=0.023651713505387306, ref_abs_avg=48.5710563659668, test_abs_avg=48.57133483886719
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1238811016082764, max_abs=8.0, mean_rel=0.15043142437934875, max_rel=1702.6832275390625, norm_rel=0.023589367046952248, ref_abs_avg=47.86747360229492, test_abs_avg=47.87793731689453
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8639218807220459, max_abs=4.0, mean_rel=0.12893612682819366, max_rel=30.49840545654297, norm_rel=0.02187095582485199, ref_abs_avg=39.67336654663086, test_abs_avg=39.68570327758789
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0714144706726074, max_abs=7.0, mean_rel=0.1648678183555603, max_rel=1133.651611328125, norm_rel=0.023570207878947258, ref_abs_avg=45.647682189941406, test_abs_avg=45.65403747558594
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0429896116256714, max_abs=6.0, mean_rel=0.17101293802261353, max_rel=2099.505615234375, norm_rel=0.02332461066544056, ref_abs_avg=44.95479965209961, test_abs_avg=44.95653533935547
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.7949919700622559, max_abs=3.25, mean_rel=0.0645986944437027, max_rel=1.9235849380493164, norm_rel=0.023682530969381332, ref_abs_avg=34.05404281616211, test_abs_avg=33.99781036376953
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0074204206466675, max_abs=7.0, mean_rel=0.16097865998744965, max_rel=933.0983276367188, norm_rel=0.02336880750954151, ref_abs_avg=43.30094528198242, test_abs_avg=43.30434799194336
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9892224073410034, max_abs=6.5, mean_rel=0.1513848900794983, max_rel=1241.8736572265625, norm_rel=0.0233262088149786, ref_abs_avg=42.66361999511719, test_abs_avg=42.659942626953125
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7771897315979004, max_abs=3.28125, mean_rel=0.0717802345752716, max_rel=2.9713711738586426, norm_rel=0.02294875495135784, ref_abs_avg=34.91664505004883, test_abs_avg=34.913963317871094
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9496029019355774, max_abs=6.0, mean_rel=0.1614924967288971, max_rel=1997.6470947265625, norm_rel=0.023120807483792305, ref_abs_avg=41.26051330566406, test_abs_avg=41.262855529785156
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9260649085044861, max_abs=5.75, mean_rel=0.14683201909065247, max_rel=817.0863037109375, norm_rel=0.022990597411990166, ref_abs_avg=40.44085693359375, test_abs_avg=40.44049835205078
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7539596557617188, max_abs=2.96875, mean_rel=0.18029841780662537, max_rel=44.05884552001953, norm_rel=0.021809862926602364, ref_abs_avg=33.78462219238281, test_abs_avg=33.74346160888672
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9045714139938354, max_abs=5.5, mean_rel=0.1647167205810547, max_rel=1196.332275390625, norm_rel=0.023059938102960587, ref_abs_avg=39.40210723876953, test_abs_avg=39.40117645263672
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8832203149795532, max_abs=5.75, mean_rel=0.15854918956756592, max_rel=1044.3446044921875, norm_rel=0.022887099534273148, ref_abs_avg=38.75113296508789, test_abs_avg=38.74993896484375
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.9049668312072754, max_abs=3.5, mean_rel=0.16965806484222412, max_rel=39.655303955078125, norm_rel=0.026489591225981712, ref_abs_avg=33.995643615722656, test_abs_avg=33.933372497558594
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0605201721191406, max_abs=7.375, mean_rel=0.1789703518152237, max_rel=2301.841552734375, norm_rel=0.02501491643488407, ref_abs_avg=42.53125762939453, test_abs_avg=42.53144073486328
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.036247968673706, max_abs=7.5, mean_rel=0.17234674096107483, max_rel=941.6478271484375, norm_rel=0.024820396676659584, ref_abs_avg=41.97158432006836, test_abs_avg=41.97277069091797
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8318295478820801, max_abs=3.84375, mean_rel=0.12287681549787521, max_rel=9.362887382507324, norm_rel=0.025211159139871597, ref_abs_avg=32.65129089355469, test_abs_avg=32.648746490478516
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9837197661399841, max_abs=6.0, mean_rel=0.17706209421157837, max_rel=1190.4146728515625, norm_rel=0.025315234437584877, ref_abs_avg=38.97388458251953, test_abs_avg=38.97522735595703
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9723551869392395, max_abs=7.0, mean_rel=0.15917694568634033, max_rel=727.9440307617188, norm_rel=0.02509109303355217, ref_abs_avg=38.89607238769531, test_abs_avg=38.90282440185547
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7040634155273438, max_abs=2.75, mean_rel=0.08868307620286942, max_rel=6.878266334533691, norm_rel=0.024424731731414795, ref_abs_avg=29.312543869018555, test_abs_avg=29.28483772277832
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9086408019065857, max_abs=6.0, mean_rel=0.16321203112602234, max_rel=1040.1202392578125, norm_rel=0.024986736476421356, ref_abs_avg=36.460418701171875, test_abs_avg=36.46147155761719
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.897225022315979, max_abs=5.5, mean_rel=0.15957972407341003, max_rel=820.6250610351562, norm_rel=0.025031177327036858, ref_abs_avg=35.939231872558594, test_abs_avg=35.93999481201172
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6966533660888672, max_abs=2.5625, mean_rel=0.1085611879825592, max_rel=4.731167793273926, norm_rel=0.02505762130022049, ref_abs_avg=27.477272033691406, test_abs_avg=27.50572967529297
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8633759021759033, max_abs=5.75, mean_rel=0.1662881076335907, max_rel=2337.78564453125, norm_rel=0.024634327739477158, ref_abs_avg=35.13132858276367, test_abs_avg=35.13005447387695
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.843103289604187, max_abs=5.5, mean_rel=0.15823453664779663, max_rel=1428.4310302734375, norm_rel=0.024552548304200172, ref_abs_avg=34.405174255371094, test_abs_avg=34.40608215332031
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6740623116493225, max_abs=2.75, mean_rel=0.08576594293117523, max_rel=5.820430755615234, norm_rel=0.02533170022070408, ref_abs_avg=26.776460647583008, test_abs_avg=26.807762145996094
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8034383654594421, max_abs=5.5, mean_rel=0.16307292878627777, max_rel=1147.574951171875, norm_rel=0.02463444508612156, ref_abs_avg=32.714576721191406, test_abs_avg=32.7148323059082
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7900678515434265, max_abs=5.0, mean_rel=0.15979556739330292, max_rel=901.9860229492188, norm_rel=0.02452143095433712, ref_abs_avg=32.323997497558594, test_abs_avg=32.32097625732422
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6231529712677002, max_abs=2.5, mean_rel=0.11588237434625626, max_rel=20.3450984954834, norm_rel=0.024407653138041496, ref_abs_avg=26.369247436523438, test_abs_avg=26.358753204345703
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7592825889587402, max_abs=6.0, mean_rel=0.16555534303188324, max_rel=2361.9404296875, norm_rel=0.02430751360952854, ref_abs_avg=31.306354522705078, test_abs_avg=31.304262161254883
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7492369413375854, max_abs=5.0, mean_rel=0.1762748509645462, max_rel=1641.3603515625, norm_rel=0.024419790133833885, ref_abs_avg=30.82233428955078, test_abs_avg=30.82004165649414
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5851919054985046, max_abs=2.375, mean_rel=0.09220023453235626, max_rel=11.702921867370605, norm_rel=0.024600787088274956, ref_abs_avg=23.4920711517334, test_abs_avg=23.503589630126953
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7269524335861206, max_abs=5.0, mean_rel=0.1651209145784378, max_rel=1315.5880126953125, norm_rel=0.024142369627952576, ref_abs_avg=30.1920166015625, test_abs_avg=30.193439483642578
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7126561403274536, max_abs=4.75, mean_rel=0.169290691614151, max_rel=1350.8935546875, norm_rel=0.023857783526182175, ref_abs_avg=29.936647415161133, test_abs_avg=29.933059692382812
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5818941593170166, max_abs=2.421875, mean_rel=0.09621885418891907, max_rel=4.182754039764404, norm_rel=0.024888960644602776, ref_abs_avg=23.920665740966797, test_abs_avg=23.95404624938965
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6941224336624146, max_abs=4.75, mean_rel=0.15718576312065125, max_rel=1572.57177734375, norm_rel=0.023835111409425735, ref_abs_avg=29.194610595703125, test_abs_avg=29.194995880126953
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6851134300231934, max_abs=4.5, mean_rel=0.16608715057373047, max_rel=799.9154663085938, norm_rel=0.02366163581609726, ref_abs_avg=28.99319839477539, test_abs_avg=28.99057388305664
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6432652473449707, max_abs=2.25, mean_rel=0.20311762392520905, max_rel=28.18193244934082, norm_rel=0.024562422186136246, ref_abs_avg=26.155317306518555, test_abs_avg=26.17736053466797
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7711763381958008, max_abs=5.25, mean_rel=0.1651799976825714, max_rel=2097.686279296875, norm_rel=0.025426339358091354, ref_abs_avg=30.381343841552734, test_abs_avg=30.38048553466797
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.758486270904541, max_abs=5.0, mean_rel=0.184768408536911, max_rel=2487.1337890625, norm_rel=0.025365805253386497, ref_abs_avg=30.0129451751709, test_abs_avg=30.011924743652344
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5757180452346802, max_abs=2.0, mean_rel=0.3469413220882416, max_rel=91.9852066040039, norm_rel=0.023543691262602806, ref_abs_avg=24.183467864990234, test_abs_avg=24.19239044189453
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7144620418548584, max_abs=5.5, mean_rel=0.1640200912952423, max_rel=975.5009155273438, norm_rel=0.02518720179796219, ref_abs_avg=28.39315414428711, test_abs_avg=28.393850326538086
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6984387636184692, max_abs=4.1875, mean_rel=0.15929259359836578, max_rel=624.8662719726562, norm_rel=0.024821171537041664, ref_abs_avg=28.148517608642578, test_abs_avg=28.149555206298828
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5565061569213867, max_abs=1.875, mean_rel=0.17578810453414917, max_rel=29.0639591217041, norm_rel=0.02276185154914856, ref_abs_avg=23.714031219482422, test_abs_avg=23.739404678344727
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6663886904716492, max_abs=5.5, mean_rel=0.1561218500137329, max_rel=1146.9366455078125, norm_rel=0.024632146582007408, ref_abs_avg=27.042261123657227, test_abs_avg=27.042770385742188
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6531530618667603, max_abs=4.0, mean_rel=0.1631849706172943, max_rel=717.693359375, norm_rel=0.024496324360370636, ref_abs_avg=26.692340850830078, test_abs_avg=26.69089698791504
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.4823741912841797, max_abs=2.21875, mean_rel=0.07140614837408066, max_rel=4.18406343460083, norm_rel=0.0234517864882946, ref_abs_avg=21.03232765197754, test_abs_avg=21.0419979095459
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6221694946289062, max_abs=4.0, mean_rel=0.16051822900772095, max_rel=927.8811645507812, norm_rel=0.024343546479940414, ref_abs_avg=25.59555435180664, test_abs_avg=25.596803665161133
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6143350005149841, max_abs=4.25, mean_rel=0.16146151721477509, max_rel=1204.3839111328125, norm_rel=0.024072876200079918, ref_abs_avg=25.575407028198242, test_abs_avg=25.57193374633789
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4897336959838867, max_abs=2.375, mean_rel=0.12910838425159454, max_rel=12.161723136901855, norm_rel=0.02226361259818077, ref_abs_avg=21.522138595581055, test_abs_avg=21.49122428894043
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.596896231174469, max_abs=4.421875, mean_rel=0.15575098991394043, max_rel=819.273193359375, norm_rel=0.02367766760289669, ref_abs_avg=25.17809295654297, test_abs_avg=25.179397583007812
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5817918181419373, max_abs=4.0, mean_rel=0.15334077179431915, max_rel=1101.44384765625, norm_rel=0.02359050139784813, ref_abs_avg=24.697032928466797, test_abs_avg=24.69808006286621
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.45406150817871094, max_abs=1.5625, mean_rel=0.097474604845047, max_rel=6.6203460693359375, norm_rel=0.022176183760166168, ref_abs_avg=20.235260009765625, test_abs_avg=20.236581802368164
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5609369277954102, max_abs=5.0, mean_rel=0.15142560005187988, max_rel=677.8613891601562, norm_rel=0.023395122960209846, ref_abs_avg=23.954769134521484, test_abs_avg=23.9547119140625
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5460469722747803, max_abs=4.5, mean_rel=0.15193453431129456, max_rel=687.3907470703125, norm_rel=0.02335822954773903, ref_abs_avg=23.416173934936523, test_abs_avg=23.422161102294922
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.41574758291244507, max_abs=1.6875, mean_rel=0.08362998813390732, max_rel=3.10257887840271, norm_rel=0.020733578130602837, ref_abs_avg=20.024185180664062, test_abs_avg=20.063812255859375
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5350914001464844, max_abs=4.0, mean_rel=0.15020601451396942, max_rel=671.3763427734375, norm_rel=0.022935880348086357, ref_abs_avg=23.307003021240234, test_abs_avg=23.308650970458984
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5255086421966553, max_abs=4.0, mean_rel=0.15736348927021027, max_rel=632.2252197265625, norm_rel=0.022850697860121727, ref_abs_avg=22.993915557861328, test_abs_avg=22.9971923828125
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.3988267481327057, max_abs=1.875, mean_rel=0.07978411018848419, max_rel=3.669893503189087, norm_rel=0.02160337008535862, ref_abs_avg=18.818639755249023, test_abs_avg=18.841094970703125
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5201584696769714, max_abs=4.0, mean_rel=0.15085437893867493, max_rel=1083.3802490234375, norm_rel=0.02245025709271431, ref_abs_avg=23.099628448486328, test_abs_avg=23.10152816772461
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.5047128200531006, max_abs=4.0, mean_rel=0.15159858763217926, max_rel=1374.6214599609375, norm_rel=0.02280360274016857, ref_abs_avg=22.11975860595703, test_abs_avg=22.116065979003906
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.45026350021362305, max_abs=2.125, mean_rel=0.19785179197788239, max_rel=47.02435302734375, norm_rel=0.022223804146051407, ref_abs_avg=20.387224197387695, test_abs_avg=20.398500442504883
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5746530294418335, max_abs=4.328125, mean_rel=0.15855468809604645, max_rel=968.3011474609375, norm_rel=0.02413981966674328, ref_abs_avg=23.79798126220703, test_abs_avg=23.79759979248047
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5641456842422485, max_abs=4.0, mean_rel=0.1528976559638977, max_rel=1365.8160400390625, norm_rel=0.02391720563173294, ref_abs_avg=23.613859176635742, test_abs_avg=23.61227035522461
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4345870018005371, max_abs=2.0, mean_rel=0.07719898223876953, max_rel=8.031326293945312, norm_rel=0.02264281176030636, ref_abs_avg=19.395523071289062, test_abs_avg=19.393001556396484
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5356212854385376, max_abs=4.5, mean_rel=0.15211179852485657, max_rel=512.164794921875, norm_rel=0.023533957079052925, ref_abs_avg=22.71481704711914, test_abs_avg=22.715368270874023
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5242905616760254, max_abs=3.6875, mean_rel=0.15392670035362244, max_rel=1055.788818359375, norm_rel=0.023306334391236305, ref_abs_avg=22.542613983154297, test_abs_avg=22.542747497558594
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.4008563160896301, max_abs=1.625, mean_rel=0.09649594873189926, max_rel=7.396421432495117, norm_rel=0.022085539996623993, ref_abs_avg=18.026273727416992, test_abs_avg=18.053863525390625
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.49819308519363403, max_abs=5.0, mean_rel=0.1511724442243576, max_rel=1326.937744140625, norm_rel=0.023018987849354744, ref_abs_avg=21.636795043945312, test_abs_avg=21.637996673583984
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4840244948863983, max_abs=3.5, mean_rel=0.14820954203605652, max_rel=1194.7293701171875, norm_rel=0.023121317848563194, ref_abs_avg=20.983394622802734, test_abs_avg=20.9787654876709
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.39250990748405457, max_abs=1.625, mean_rel=0.11380806565284729, max_rel=7.608338356018066, norm_rel=0.022812779992818832, ref_abs_avg=17.531328201293945, test_abs_avg=17.577577590942383
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.46379268169403076, max_abs=4.5, mean_rel=0.14886072278022766, max_rel=851.116943359375, norm_rel=0.02226565219461918, ref_abs_avg=20.808727264404297, test_abs_avg=20.80840301513672
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4476235508918762, max_abs=4.140625, mean_rel=0.14038150012493134, max_rel=735.94140625, norm_rel=0.021891923621296883, ref_abs_avg=20.53371238708496, test_abs_avg=20.535297393798828
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3647480010986328, max_abs=1.375, mean_rel=0.06884853541851044, max_rel=2.9446909427642822, norm_rel=0.021955907344818115, ref_abs_avg=16.490127563476562, test_abs_avg=16.473634719848633
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4330163300037384, max_abs=3.75, mean_rel=0.1411622166633606, max_rel=816.1693725585938, norm_rel=0.0217902734875679, ref_abs_avg=19.952346801757812, test_abs_avg=19.952899932861328
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4165509343147278, max_abs=4.0, mean_rel=0.1278207004070282, max_rel=600.0271606445312, norm_rel=0.020974712446331978, ref_abs_avg=19.90412139892578, test_abs_avg=19.90697479248047
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.33071744441986084, max_abs=1.3125, mean_rel=0.08842925727367401, max_rel=16.188579559326172, norm_rel=0.02218964323401451, ref_abs_avg=15.640854835510254, test_abs_avg=15.646064758300781
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.40851545333862305, max_abs=4.0, mean_rel=0.13104918599128723, max_rel=706.7531127929688, norm_rel=0.021090669557452202, ref_abs_avg=19.496665954589844, test_abs_avg=19.497774124145508
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.39236995577812195, max_abs=4.0, mean_rel=0.13125231862068176, max_rel=857.70361328125, norm_rel=0.020319491624832153, ref_abs_avg=19.32282257080078, test_abs_avg=19.32135772705078
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.30142784118652344, max_abs=1.25, mean_rel=0.05497026443481445, max_rel=1.944856882095337, norm_rel=0.019949791952967644, ref_abs_avg=15.546470642089844, test_abs_avg=15.549566268920898
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3738468587398529, max_abs=3.75, mean_rel=0.1319352388381958, max_rel=904.1604614257812, norm_rel=0.020585287362337112, ref_abs_avg=18.32671546936035, test_abs_avg=18.325855255126953
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.36658525466918945, max_abs=3.5, mean_rel=0.13098031282424927, max_rel=607.3897094726562, norm_rel=0.02025788463652134, ref_abs_avg=18.289546966552734, test_abs_avg=18.299091339111328
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2961566150188446, max_abs=1.0, mean_rel=0.12817803025245667, max_rel=18.428503036499023, norm_rel=0.019740238785743713, ref_abs_avg=14.817773818969727, test_abs_avg=14.801872253417969
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3492452800273895, max_abs=3.5, mean_rel=0.12141206860542297, max_rel=385.75457763671875, norm_rel=0.019923200830817223, ref_abs_avg=17.805862426757812, test_abs_avg=17.80564308166504
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3401947021484375, max_abs=3.5, mean_rel=0.11672624945640564, max_rel=596.8592529296875, norm_rel=0.019188376143574715, ref_abs_avg=17.797786712646484, test_abs_avg=17.792770385742188
production_forward2 vs paper_forward output: mean_abs=0.0016348997596651316, max_abs=0.0390625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008565776981413364, max_abs=0.2890625, mean_rel=0.07447803765535355, max_rel=111.3481674194336, norm_rel=0.020358353853225708, ref_abs_avg=0.4541230797767639, test_abs_avg=0.45412230491638184
production_forward2 grad[1] vs paper_forward: mean_abs=7.326354503631592, max_abs=56.0, mean_rel=0.17353107035160065, max_rel=191.8162384033203, norm_rel=0.02039644680917263, ref_abs_avg=317.38336181640625, test_abs_avg=317.46405029296875
production_forward2 grad[2] vs paper_forward: mean_abs=1.2887506484985352, max_abs=4.75, mean_rel=0.10308221727609634, max_rel=12.858253479003906, norm_rel=0.02358374372124672, ref_abs_avg=54.7523193359375, test_abs_avg=54.684478759765625
production_forward2 grad[3] vs paper_forward: mean_abs=1.5455069541931152, max_abs=10.0, mean_rel=0.17342916131019592, max_rel=1991.4322509765625, norm_rel=0.024279901757836342, ref_abs_avg=63.9744987487793, test_abs_avg=63.975013732910156
production_forward2 grad[4] vs paper_forward: mean_abs=1.5069239139556885, max_abs=9.0, mean_rel=0.1757715344429016, max_rel=2952.829833984375, norm_rel=0.024078788235783577, ref_abs_avg=62.95682144165039, test_abs_avg=62.972808837890625
production_forward2 grad[5] vs paper_forward: mean_abs=1.1432809829711914, max_abs=5.25, mean_rel=0.09948858618736267, max_rel=11.497004508972168, norm_rel=0.025140095502138138, ref_abs_avg=46.571449279785156, test_abs_avg=46.59681701660156
production_forward2 grad[6] vs paper_forward: mean_abs=1.33930504322052, max_abs=10.375, mean_rel=0.16038000583648682, max_rel=1529.2987060546875, norm_rel=0.02391067147254944, ref_abs_avg=56.344791412353516, test_abs_avg=56.343666076660156
production_forward2 grad[7] vs paper_forward: mean_abs=1.3138659000396729, max_abs=7.75, mean_rel=0.16315977275371552, max_rel=2409.9091796875, norm_rel=0.023620663210749626, ref_abs_avg=55.85369873046875, test_abs_avg=55.85751724243164
production_forward2 grad[8] vs paper_forward: mean_abs=1.0127191543579102, max_abs=4.125, mean_rel=0.08729620277881622, max_rel=3.0206589698791504, norm_rel=0.023317260667681694, ref_abs_avg=42.16327667236328, test_abs_avg=42.184593200683594
production_forward2 grad[9] vs paper_forward: mean_abs=1.2432941198349, max_abs=9.0, mean_rel=0.15858711302280426, max_rel=1331.4437255859375, norm_rel=0.023887619376182556, ref_abs_avg=52.28522491455078, test_abs_avg=52.287872314453125
production_forward2 grad[10] vs paper_forward: mean_abs=1.217618465423584, max_abs=7.0, mean_rel=0.15957920253276825, max_rel=1076.2958984375, norm_rel=0.02370074763894081, ref_abs_avg=51.61609649658203, test_abs_avg=51.616119384765625
production_forward2 grad[11] vs paper_forward: mean_abs=0.9005756378173828, max_abs=3.0, mean_rel=0.09099341928958893, max_rel=12.788169860839844, norm_rel=0.022107060998678207, ref_abs_avg=40.71278762817383, test_abs_avg=40.74359130859375
production_forward2 grad[12] vs paper_forward: mean_abs=1.1389132738113403, max_abs=7.5, mean_rel=0.15815860033035278, max_rel=1421.6480712890625, norm_rel=0.02356136403977871, ref_abs_avg=48.5710563659668, test_abs_avg=48.572391510009766
production_forward2 grad[13] vs paper_forward: mean_abs=1.1205024719238281, max_abs=7.25, mean_rel=0.14769713580608368, max_rel=1666.4468994140625, norm_rel=0.023516375571489334, ref_abs_avg=47.86747360229492, test_abs_avg=47.876441955566406
production_forward2 grad[14] vs paper_forward: mean_abs=0.8561363220214844, max_abs=3.5, mean_rel=0.1136741042137146, max_rel=24.763431549072266, norm_rel=0.02166728861629963, ref_abs_avg=39.67336654663086, test_abs_avg=39.60627746582031
production_forward2 grad[15] vs paper_forward: mean_abs=1.067309856414795, max_abs=6.5, mean_rel=0.1687374711036682, max_rel=1073.28955078125, norm_rel=0.023475967347621918, ref_abs_avg=45.647682189941406, test_abs_avg=45.653377532958984
production_forward2 grad[16] vs paper_forward: mean_abs=1.039829969406128, max_abs=6.5, mean_rel=0.17009645700454712, max_rel=1804.8682861328125, norm_rel=0.02324513904750347, ref_abs_avg=44.95479965209961, test_abs_avg=44.95494079589844
production_forward2 grad[17] vs paper_forward: mean_abs=0.7979273796081543, max_abs=2.8125, mean_rel=0.0662761852145195, max_rel=1.798173427581787, norm_rel=0.024005398154258728, ref_abs_avg=34.05404281616211, test_abs_avg=33.982295989990234
production_forward2 grad[18] vs paper_forward: mean_abs=1.0029973983764648, max_abs=6.0, mean_rel=0.16226813197135925, max_rel=1188.3790283203125, norm_rel=0.02327067218720913, ref_abs_avg=43.30094528198242, test_abs_avg=43.30110549926758
production_forward2 grad[19] vs paper_forward: mean_abs=0.9861695766448975, max_abs=6.0, mean_rel=0.15133732557296753, max_rel=1774.17333984375, norm_rel=0.0232499111443758, ref_abs_avg=42.66361999511719, test_abs_avg=42.6614990234375
production_forward2 grad[20] vs paper_forward: mean_abs=0.777574896812439, max_abs=3.0, mean_rel=0.06290087848901749, max_rel=1.6284445524215698, norm_rel=0.02264157123863697, ref_abs_avg=34.91664505004883, test_abs_avg=34.92572021484375
production_forward2 grad[21] vs paper_forward: mean_abs=0.9458100199699402, max_abs=6.0, mean_rel=0.1621413230895996, max_rel=2139.318115234375, norm_rel=0.023035382851958275, ref_abs_avg=41.26051330566406, test_abs_avg=41.263877868652344
production_forward2 grad[22] vs paper_forward: mean_abs=0.9241858720779419, max_abs=6.0, mean_rel=0.1454336792230606, max_rel=1040.082763671875, norm_rel=0.02294621430337429, ref_abs_avg=40.44085693359375, test_abs_avg=40.44156265258789
production_forward2 grad[23] vs paper_forward: mean_abs=0.7441964149475098, max_abs=2.466552734375, mean_rel=0.1726422756910324, max_rel=40.2356071472168, norm_rel=0.02161998115479946, ref_abs_avg=33.78462219238281, test_abs_avg=33.746768951416016
production_forward2 grad[24] vs paper_forward: mean_abs=0.9020897150039673, max_abs=6.0, mean_rel=0.16136455535888672, max_rel=965.66064453125, norm_rel=0.022989582270383835, ref_abs_avg=39.40210723876953, test_abs_avg=39.400875091552734
production_forward2 grad[25] vs paper_forward: mean_abs=0.8826518058776855, max_abs=6.0625, mean_rel=0.16163478791713715, max_rel=1194.26318359375, norm_rel=0.02287069335579872, ref_abs_avg=38.75113296508789, test_abs_avg=38.75019454956055
production_forward2 grad[26] vs paper_forward: mean_abs=0.9020791053771973, max_abs=3.875, mean_rel=0.1754191517829895, max_rel=39.25303268432617, norm_rel=0.02657410129904747, ref_abs_avg=33.995643615722656, test_abs_avg=33.9229850769043
production_forward2 grad[27] vs paper_forward: mean_abs=1.0584158897399902, max_abs=7.5, mean_rel=0.1739795207977295, max_rel=2156.151123046875, norm_rel=0.02497238852083683, ref_abs_avg=42.53125762939453, test_abs_avg=42.53117752075195
production_forward2 grad[28] vs paper_forward: mean_abs=1.0315583944320679, max_abs=7.0, mean_rel=0.17675969004631042, max_rel=1451.855712890625, norm_rel=0.024702496826648712, ref_abs_avg=41.97158432006836, test_abs_avg=41.97212219238281
production_forward2 grad[29] vs paper_forward: mean_abs=0.813748836517334, max_abs=3.0, mean_rel=0.13016411662101746, max_rel=16.129974365234375, norm_rel=0.025279471650719643, ref_abs_avg=32.65129089355469, test_abs_avg=32.637271881103516
production_forward2 grad[30] vs paper_forward: mean_abs=0.9808282256126404, max_abs=6.25, mean_rel=0.1726851761341095, max_rel=1213.8115234375, norm_rel=0.025241397321224213, ref_abs_avg=38.97388458251953, test_abs_avg=38.97502899169922
production_forward2 grad[31] vs paper_forward: mean_abs=0.9708449840545654, max_abs=6.5, mean_rel=0.1583075374364853, max_rel=686.6512451171875, norm_rel=0.0250492412596941, ref_abs_avg=38.89607238769531, test_abs_avg=38.90205383300781
production_forward2 grad[32] vs paper_forward: mean_abs=0.6799526214599609, max_abs=2.875, mean_rel=0.08149585127830505, max_rel=7.992622375488281, norm_rel=0.023332033306360245, ref_abs_avg=29.312543869018555, test_abs_avg=29.290428161621094
production_forward2 grad[33] vs paper_forward: mean_abs=0.9063519835472107, max_abs=6.0, mean_rel=0.16131803393363953, max_rel=1065.32080078125, norm_rel=0.024918818846344948, ref_abs_avg=36.460418701171875, test_abs_avg=36.46268844604492
production_forward2 grad[34] vs paper_forward: mean_abs=0.8913007974624634, max_abs=5.5, mean_rel=0.1573401540517807, max_rel=619.3009033203125, norm_rel=0.02487991191446781, ref_abs_avg=35.939231872558594, test_abs_avg=35.93887710571289
production_forward2 grad[35] vs paper_forward: mean_abs=0.7238041758537292, max_abs=2.75, mean_rel=0.1511722207069397, max_rel=14.124539375305176, norm_rel=0.026146462187170982, ref_abs_avg=27.477272033691406, test_abs_avg=27.498973846435547
production_forward2 grad[36] vs paper_forward: mean_abs=0.8617565035820007, max_abs=6.0, mean_rel=0.164899080991745, max_rel=2304.54296875, norm_rel=0.02458563819527626, ref_abs_avg=35.13132858276367, test_abs_avg=35.12995147705078
production_forward2 grad[37] vs paper_forward: mean_abs=0.8389948606491089, max_abs=5.0, mean_rel=0.16041028499603271, max_rel=1186.713623046875, norm_rel=0.02444816194474697, ref_abs_avg=34.405174255371094, test_abs_avg=34.40599822998047
production_forward2 grad[38] vs paper_forward: mean_abs=0.658050537109375, max_abs=3.25, mean_rel=0.07777445763349533, max_rel=5.334799766540527, norm_rel=0.02524135634303093, ref_abs_avg=26.776460647583008, test_abs_avg=26.778562545776367
production_forward2 grad[39] vs paper_forward: mean_abs=0.8019747138023376, max_abs=5.015625, mean_rel=0.1662464141845703, max_rel=1333.236328125, norm_rel=0.024576447904109955, ref_abs_avg=32.714576721191406, test_abs_avg=32.714866638183594
production_forward2 grad[40] vs paper_forward: mean_abs=0.7859499454498291, max_abs=4.625, mean_rel=0.15962259471416473, max_rel=838.91748046875, norm_rel=0.02439616434276104, ref_abs_avg=32.323997497558594, test_abs_avg=32.319637298583984
production_forward2 grad[41] vs paper_forward: mean_abs=0.6113953590393066, max_abs=2.5, mean_rel=0.11688163876533508, max_rel=20.804615020751953, norm_rel=0.0240621455013752, ref_abs_avg=26.369247436523438, test_abs_avg=26.382137298583984
production_forward2 grad[42] vs paper_forward: mean_abs=0.7582073211669922, max_abs=6.0, mean_rel=0.17102177441120148, max_rel=1721.02685546875, norm_rel=0.02426263689994812, ref_abs_avg=31.306354522705078, test_abs_avg=31.30521583557129
production_forward2 grad[43] vs paper_forward: mean_abs=0.7491390705108643, max_abs=4.625, mean_rel=0.17327025532722473, max_rel=1450.5072021484375, norm_rel=0.02438960410654545, ref_abs_avg=30.82233428955078, test_abs_avg=30.81718635559082
production_forward2 grad[44] vs paper_forward: mean_abs=0.5658540725708008, max_abs=2.125, mean_rel=0.08852903544902802, max_rel=9.51695442199707, norm_rel=0.02380801923573017, ref_abs_avg=23.4920711517334, test_abs_avg=23.491817474365234
production_forward2 grad[45] vs paper_forward: mean_abs=0.7246523499488831, max_abs=4.78125, mean_rel=0.1660630702972412, max_rel=1551.2650146484375, norm_rel=0.024059653282165527, ref_abs_avg=30.1920166015625, test_abs_avg=30.1942138671875
production_forward2 grad[46] vs paper_forward: mean_abs=0.7102499604225159, max_abs=4.75, mean_rel=0.16488602757453918, max_rel=1048.509521484375, norm_rel=0.02377963997423649, ref_abs_avg=29.936647415161133, test_abs_avg=29.935626983642578
production_forward2 grad[47] vs paper_forward: mean_abs=0.5824856758117676, max_abs=2.578125, mean_rel=0.08909986168146133, max_rel=4.297665119171143, norm_rel=0.024794811382889748, ref_abs_avg=23.920665740966797, test_abs_avg=23.954631805419922
production_forward2 grad[48] vs paper_forward: mean_abs=0.6923770904541016, max_abs=4.5, mean_rel=0.1576852947473526, max_rel=1872.1475830078125, norm_rel=0.023767435923218727, ref_abs_avg=29.194610595703125, test_abs_avg=29.194515228271484
production_forward2 grad[49] vs paper_forward: mean_abs=0.6844409108161926, max_abs=4.5, mean_rel=0.1587906777858734, max_rel=790.6094360351562, norm_rel=0.02365676499903202, ref_abs_avg=28.99319839477539, test_abs_avg=28.993162155151367
production_forward2 grad[50] vs paper_forward: mean_abs=0.6605925559997559, max_abs=2.375, mean_rel=0.2045658826828003, max_rel=28.482622146606445, norm_rel=0.024923143908381462, ref_abs_avg=26.155317306518555, test_abs_avg=26.144054412841797
production_forward2 grad[51] vs paper_forward: mean_abs=0.7705496549606323, max_abs=5.75, mean_rel=0.1697641760110855, max_rel=2253.875732421875, norm_rel=0.025419654324650764, ref_abs_avg=30.381343841552734, test_abs_avg=30.38106918334961
production_forward2 grad[52] vs paper_forward: mean_abs=0.7574142217636108, max_abs=4.75, mean_rel=0.1771886646747589, max_rel=2188.14990234375, norm_rel=0.025321023538708687, ref_abs_avg=30.0129451751709, test_abs_avg=30.01494789123535
production_forward2 grad[53] vs paper_forward: mean_abs=0.5663262605667114, max_abs=2.25, mean_rel=0.3006945848464966, max_rel=64.33309173583984, norm_rel=0.023351002484560013, ref_abs_avg=24.183467864990234, test_abs_avg=24.169143676757812
production_forward2 grad[54] vs paper_forward: mean_abs=0.7120821475982666, max_abs=5.1953125, mean_rel=0.1619052141904831, max_rel=844.22119140625, norm_rel=0.02509230747818947, ref_abs_avg=28.39315414428711, test_abs_avg=28.393505096435547
production_forward2 grad[55] vs paper_forward: mean_abs=0.6960882544517517, max_abs=4.5, mean_rel=0.1561134159564972, max_rel=525.2892456054688, norm_rel=0.024745237082242966, ref_abs_avg=28.148517608642578, test_abs_avg=28.150550842285156
production_forward2 grad[56] vs paper_forward: mean_abs=0.5479588508605957, max_abs=1.859375, mean_rel=0.14190077781677246, max_rel=29.397069931030273, norm_rel=0.022880032658576965, ref_abs_avg=23.714031219482422, test_abs_avg=23.7562255859375
production_forward2 grad[57] vs paper_forward: mean_abs=0.6642690896987915, max_abs=5.5, mean_rel=0.15495607256889343, max_rel=671.495361328125, norm_rel=0.02455310709774494, ref_abs_avg=27.042261123657227, test_abs_avg=27.043415069580078
production_forward2 grad[58] vs paper_forward: mean_abs=0.652316153049469, max_abs=4.25, mean_rel=0.16309624910354614, max_rel=694.2220458984375, norm_rel=0.024456070736050606, ref_abs_avg=26.692340850830078, test_abs_avg=26.6885986328125
production_forward2 grad[59] vs paper_forward: mean_abs=0.5087273716926575, max_abs=2.0, mean_rel=0.08799853920936584, max_rel=9.615384101867676, norm_rel=0.02430293895304203, ref_abs_avg=21.03232765197754, test_abs_avg=21.03521728515625
production_forward2 grad[60] vs paper_forward: mean_abs=0.6207685470581055, max_abs=4.0, mean_rel=0.15859979391098022, max_rel=801.0142211914062, norm_rel=0.024291042238473892, ref_abs_avg=25.59555435180664, test_abs_avg=25.597383499145508
production_forward2 grad[61] vs paper_forward: mean_abs=0.6121047139167786, max_abs=3.75, mean_rel=0.16281777620315552, max_rel=1187.5328369140625, norm_rel=0.023992620408535004, ref_abs_avg=25.575407028198242, test_abs_avg=25.57379913330078
production_forward2 grad[62] vs paper_forward: mean_abs=0.4883425533771515, max_abs=2.4375, mean_rel=0.11624015122652054, max_rel=11.273465156555176, norm_rel=0.022684691473841667, ref_abs_avg=21.522138595581055, test_abs_avg=21.505630493164062
production_forward2 grad[63] vs paper_forward: mean_abs=0.5957108736038208, max_abs=4.5, mean_rel=0.15631166100502014, max_rel=1049.2783203125, norm_rel=0.02362157590687275, ref_abs_avg=25.17809295654297, test_abs_avg=25.17926788330078
production_forward2 grad[64] vs paper_forward: mean_abs=0.5806975364685059, max_abs=4.0, mean_rel=0.15481728315353394, max_rel=822.4780883789062, norm_rel=0.02353336289525032, ref_abs_avg=24.697032928466797, test_abs_avg=24.695663452148438
production_forward2 grad[65] vs paper_forward: mean_abs=0.4512944221496582, max_abs=1.9375, mean_rel=0.11025677621364594, max_rel=10.379061698913574, norm_rel=0.022181857377290726, ref_abs_avg=20.235260009765625, test_abs_avg=20.240066528320312
production_forward2 grad[66] vs paper_forward: mean_abs=0.558988094329834, max_abs=4.0, mean_rel=0.15179744362831116, max_rel=1074.228515625, norm_rel=0.023320209234952927, ref_abs_avg=23.954769134521484, test_abs_avg=23.954483032226562
production_forward2 grad[67] vs paper_forward: mean_abs=0.5444265007972717, max_abs=4.25, mean_rel=0.14847120642662048, max_rel=678.3516845703125, norm_rel=0.023271318525075912, ref_abs_avg=23.416173934936523, test_abs_avg=23.422292709350586
production_forward2 grad[68] vs paper_forward: mean_abs=0.41402482986450195, max_abs=1.4375, mean_rel=0.0856795683503151, max_rel=3.424960136413574, norm_rel=0.020218845456838608, ref_abs_avg=20.024185180664062, test_abs_avg=20.062427520751953
production_forward2 grad[69] vs paper_forward: mean_abs=0.5334421396255493, max_abs=3.75, mean_rel=0.14862939715385437, max_rel=803.743896484375, norm_rel=0.022861424833536148, ref_abs_avg=23.307003021240234, test_abs_avg=23.30877113342285
production_forward2 grad[70] vs paper_forward: mean_abs=0.5243382453918457, max_abs=4.0, mean_rel=0.1563693881034851, max_rel=649.5408325195312, norm_rel=0.022787317633628845, ref_abs_avg=22.993915557861328, test_abs_avg=22.998600006103516
production_forward2 grad[71] vs paper_forward: mean_abs=0.400299072265625, max_abs=1.875, mean_rel=0.0766187533736229, max_rel=3.5515098571777344, norm_rel=0.021724727004766464, ref_abs_avg=18.818639755249023, test_abs_avg=18.83562660217285
production_forward2 grad[72] vs paper_forward: mean_abs=0.5187288522720337, max_abs=4.0, mean_rel=0.14830413460731506, max_rel=965.04931640625, norm_rel=0.02240673452615738, ref_abs_avg=23.099628448486328, test_abs_avg=23.101261138916016
production_forward2 grad[73] vs paper_forward: mean_abs=0.5022870898246765, max_abs=3.5, mean_rel=0.14990471303462982, max_rel=1241.37646484375, norm_rel=0.022710435092449188, ref_abs_avg=22.11975860595703, test_abs_avg=22.116737365722656
production_forward2 grad[74] vs paper_forward: mean_abs=0.45448827743530273, max_abs=1.875, mean_rel=0.18387718498706818, max_rel=32.88267517089844, norm_rel=0.022605624049901962, ref_abs_avg=20.387224197387695, test_abs_avg=20.38543701171875
production_forward2 grad[75] vs paper_forward: mean_abs=0.5728627443313599, max_abs=4.0, mean_rel=0.15578001737594604, max_rel=1047.794677734375, norm_rel=0.024064216762781143, ref_abs_avg=23.79798126220703, test_abs_avg=23.796710968017578
production_forward2 grad[76] vs paper_forward: mean_abs=0.5632173418998718, max_abs=4.0, mean_rel=0.1510501503944397, max_rel=1462.8443603515625, norm_rel=0.023875297978520393, ref_abs_avg=23.613859176635742, test_abs_avg=23.612808227539062
production_forward2 grad[77] vs paper_forward: mean_abs=0.44231700897216797, max_abs=1.625, mean_rel=0.07020828127861023, max_rel=4.448688507080078, norm_rel=0.02309664897620678, ref_abs_avg=19.395523071289062, test_abs_avg=19.36064338684082
production_forward2 grad[78] vs paper_forward: mean_abs=0.5338774919509888, max_abs=4.0, mean_rel=0.15114739537239075, max_rel=494.3660888671875, norm_rel=0.023485349491238594, ref_abs_avg=22.71481704711914, test_abs_avg=22.713760375976562
production_forward2 grad[79] vs paper_forward: mean_abs=0.5228403806686401, max_abs=4.125, mean_rel=0.14921137690544128, max_rel=731.1016235351562, norm_rel=0.023249970749020576, ref_abs_avg=22.542613983154297, test_abs_avg=22.540090560913086
production_forward2 grad[80] vs paper_forward: mean_abs=0.4042675495147705, max_abs=1.5, mean_rel=0.12199121713638306, max_rel=10.845873832702637, norm_rel=0.022409480065107346, ref_abs_avg=18.026273727416992, test_abs_avg=18.036510467529297
production_forward2 grad[81] vs paper_forward: mean_abs=0.4967741072177887, max_abs=5.0, mean_rel=0.1467556208372116, max_rel=1209.8697509765625, norm_rel=0.022954007610678673, ref_abs_avg=21.636795043945312, test_abs_avg=21.636526107788086
production_forward2 grad[82] vs paper_forward: mean_abs=0.4827340245246887, max_abs=4.0, mean_rel=0.15309351682662964, max_rel=1493.394775390625, norm_rel=0.02301189675927162, ref_abs_avg=20.983394622802734, test_abs_avg=20.974525451660156
production_forward2 grad[83] vs paper_forward: mean_abs=0.3929297924041748, max_abs=1.75, mean_rel=0.11859490722417831, max_rel=8.265764236450195, norm_rel=0.023025931790471077, ref_abs_avg=17.531328201293945, test_abs_avg=17.582691192626953
production_forward2 grad[84] vs paper_forward: mean_abs=0.46229302883148193, max_abs=4.5, mean_rel=0.14547991752624512, max_rel=1035.5396728515625, norm_rel=0.022212624549865723, ref_abs_avg=20.808727264404297, test_abs_avg=20.807472229003906
production_forward2 grad[85] vs paper_forward: mean_abs=0.4480755627155304, max_abs=4.25, mean_rel=0.13749226927757263, max_rel=695.9226684570312, norm_rel=0.021909698843955994, ref_abs_avg=20.53371238708496, test_abs_avg=20.534950256347656
production_forward2 grad[86] vs paper_forward: mean_abs=0.34848499298095703, max_abs=1.625, mean_rel=0.05908475071191788, max_rel=2.202521324157715, norm_rel=0.02130064368247986, ref_abs_avg=16.490127563476562, test_abs_avg=16.485401153564453
production_forward2 grad[87] vs paper_forward: mean_abs=0.43251949548721313, max_abs=3.5, mean_rel=0.14114533364772797, max_rel=1111.400634765625, norm_rel=0.021760880947113037, ref_abs_avg=19.952346801757812, test_abs_avg=19.952594757080078
production_forward2 grad[88] vs paper_forward: mean_abs=0.41780364513397217, max_abs=4.0, mean_rel=0.13120430707931519, max_rel=580.5458374023438, norm_rel=0.021049056202173233, ref_abs_avg=19.90412139892578, test_abs_avg=19.906024932861328
production_forward2 grad[89] vs paper_forward: mean_abs=0.32804977893829346, max_abs=1.25, mean_rel=0.14882589876651764, max_rel=43.777408599853516, norm_rel=0.021909387782216072, ref_abs_avg=15.640854835510254, test_abs_avg=15.644926071166992
production_forward2 grad[90] vs paper_forward: mean_abs=0.4083549678325653, max_abs=4.25, mean_rel=0.12739941477775574, max_rel=601.9835815429688, norm_rel=0.02108280546963215, ref_abs_avg=19.496665954589844, test_abs_avg=19.49787712097168
production_forward2 grad[91] vs paper_forward: mean_abs=0.3924593925476074, max_abs=4.0, mean_rel=0.12721315026283264, max_rel=889.7142333984375, norm_rel=0.020339278504252434, ref_abs_avg=19.32282257080078, test_abs_avg=19.32413101196289
production_forward2 grad[92] vs paper_forward: mean_abs=0.30867767333984375, max_abs=1.5, mean_rel=0.054435331374406815, max_rel=2.226620674133301, norm_rel=0.020183134824037552, ref_abs_avg=15.546470642089844, test_abs_avg=15.571670532226562
production_forward2 grad[93] vs paper_forward: mean_abs=0.37360715866088867, max_abs=3.5, mean_rel=0.13168388605117798, max_rel=869.2069091796875, norm_rel=0.02056053653359413, ref_abs_avg=18.32671546936035, test_abs_avg=18.325803756713867
production_forward2 grad[94] vs paper_forward: mean_abs=0.369324266910553, max_abs=4.0, mean_rel=0.13220754265785217, max_rel=622.0945434570312, norm_rel=0.020420480519533157, ref_abs_avg=18.289546966552734, test_abs_avg=18.294754028320312
production_forward2 grad[95] vs paper_forward: mean_abs=0.29212573170661926, max_abs=1.15625, mean_rel=0.09892033785581589, max_rel=11.459933280944824, norm_rel=0.02007126808166504, ref_abs_avg=14.817773818969727, test_abs_avg=14.806751251220703
production_forward2 grad[96] vs paper_forward: mean_abs=0.3485307991504669, max_abs=3.5, mean_rel=0.12117356061935425, max_rel=445.811279296875, norm_rel=0.019872494041919708, ref_abs_avg=17.805862426757812, test_abs_avg=17.80567169189453
production_forward2 grad[97] vs paper_forward: mean_abs=0.3374374210834503, max_abs=4.0, mean_rel=0.11729143559932709, max_rel=524.0704345703125, norm_rel=0.01907011866569519, ref_abs_avg=17.797786712646484, test_abs_avg=17.795705795288086
identity layers + randn queries
mean abs randn paper: 0.22265625
torch_compile_phases_forward fwd+bwd:  189.937 ms
torch_compile_phases_forward fwd-only: 36.597 ms
torch_compile_phases_forward bwd-only: 152.602 ms
torch_compile_phases_forward peak allocated: fwd=12.906 GiB, fwd+bwd=13.534 GiB
torch_compile_phases_forward peak reserved:  fwd=13.203 GiB, fwd+bwd=17.455 GiB
mean abs difference randn: 0.00171661376953125
mean relative difference randn: 0.0299072265625
production_forward2 fwd+bwd:  224.322 ms
production_forward2 fwd-only: 22.337 ms
production_forward2 bwd-only: 202.107 ms
production_forward2 peak allocated: fwd=37.925 GiB, fwd+bwd=41.304 GiB
production_forward2 peak reserved:  fwd=38.219 GiB, fwd+bwd=43.969 GiB
mean abs difference randn: 0.001708984375
mean relative difference randn: 0.02978515625
paper_forward fwd+bwd:  379.565 ms
paper_forward fwd-only: 85.699 ms
paper_forward bwd-only: 294.014 ms
paper_forward peak allocated: fwd=31.080 GiB, fwd+bwd=33.199 GiB
paper_forward peak reserved:  fwd=31.100 GiB, fwd+bwd=33.850 GiB
mean abs difference randn: 3.886222839355469e-05
mean relative difference randn: 0.000667572021484375
production_forward fwd+bwd:  112.023 ms
production_forward fwd-only: 20.501 ms
production_forward bwd-only: 91.692 ms
production_forward peak allocated: fwd=54.849 GiB, fwd+bwd=58.729 GiB
production_forward peak reserved:  fwd=55.111 GiB, fwd+bwd=58.736 GiB
mean abs difference randn: 0.001708984375
mean relative difference randn: 0.02978515625

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0017118918476626277, max_abs=0.0546875
production_forward grad[0] vs paper_forward: mean_abs=0.00900720339268446, max_abs=0.46875, mean_rel=0.07568597793579102, max_rel=111.76778411865234, norm_rel=0.02057279460132122, ref_abs_avg=0.4727647304534912, test_abs_avg=0.4727843999862671
production_forward grad[1] vs paper_forward: mean_abs=7.746265888214111, max_abs=64.0, mean_rel=0.1588873565196991, max_rel=304.6222229003906, norm_rel=0.020885657519102097, ref_abs_avg=331.90447998046875, test_abs_avg=331.8129577636719
production_forward grad[2] vs paper_forward: mean_abs=1.3443303108215332, max_abs=5.875, mean_rel=0.13398399949073792, max_rel=12.661304473876953, norm_rel=0.023967476561665535, ref_abs_avg=54.999725341796875, test_abs_avg=54.911659240722656
production_forward grad[3] vs paper_forward: mean_abs=1.6639503240585327, max_abs=11.0, mean_rel=0.17447268962860107, max_rel=1473.778076171875, norm_rel=0.02459891512989998, ref_abs_avg=67.93048095703125, test_abs_avg=67.93647766113281
production_forward grad[4] vs paper_forward: mean_abs=1.6157643795013428, max_abs=10.0, mean_rel=0.17498555779457092, max_rel=1360.448974609375, norm_rel=0.024315275251865387, ref_abs_avg=66.83785247802734, test_abs_avg=66.85940551757812
production_forward grad[5] vs paper_forward: mean_abs=1.0966949462890625, max_abs=4.625, mean_rel=0.11090415716171265, max_rel=11.110090255737305, norm_rel=0.023135555908083916, ref_abs_avg=48.83634948730469, test_abs_avg=48.92919158935547
production_forward grad[6] vs paper_forward: mean_abs=1.4440771341323853, max_abs=9.0, mean_rel=0.1749463975429535, max_rel=2415.770751953125, norm_rel=0.024297649040818214, ref_abs_avg=59.74573516845703, test_abs_avg=59.74812316894531
production_forward grad[7] vs paper_forward: mean_abs=1.4047811031341553, max_abs=8.5, mean_rel=0.16346773505210876, max_rel=1461.4534912109375, norm_rel=0.024016443639993668, ref_abs_avg=58.79291534423828, test_abs_avg=58.806129455566406
production_forward grad[8] vs paper_forward: mean_abs=1.0509028434753418, max_abs=4.5, mean_rel=0.09874039143323898, max_rel=7.853939533233643, norm_rel=0.02388026937842369, ref_abs_avg=45.0351448059082, test_abs_avg=45.09706497192383
production_forward grad[9] vs paper_forward: mean_abs=1.2963975667953491, max_abs=9.0, mean_rel=0.1758085936307907, max_rel=1254.976806640625, norm_rel=0.023966707289218903, ref_abs_avg=54.34791564941406, test_abs_avg=54.351783752441406
production_forward grad[10] vs paper_forward: mean_abs=1.2703852653503418, max_abs=8.5, mean_rel=0.15910476446151733, max_rel=1041.034912109375, norm_rel=0.02381289377808571, ref_abs_avg=53.60498046875, test_abs_avg=53.61521911621094
production_forward grad[11] vs paper_forward: mean_abs=1.0026850700378418, max_abs=4.0, mean_rel=0.09670130908489227, max_rel=3.10257887840271, norm_rel=0.02502855286002159, ref_abs_avg=39.19465255737305, test_abs_avg=39.15763854980469
production_forward grad[12] vs paper_forward: mean_abs=1.2067220211029053, max_abs=8.5, mean_rel=0.15778100490570068, max_rel=1784.9637451171875, norm_rel=0.02375195547938347, ref_abs_avg=51.07078170776367, test_abs_avg=51.07762145996094
production_forward grad[13] vs paper_forward: mean_abs=1.1728096008300781, max_abs=8.0, mean_rel=0.15658354759216309, max_rel=1306.40869140625, norm_rel=0.023517245426774025, ref_abs_avg=50.059730529785156, test_abs_avg=50.07291030883789
production_forward grad[14] vs paper_forward: mean_abs=0.8915228843688965, max_abs=5.0, mean_rel=0.13632315397262573, max_rel=13.904480934143066, norm_rel=0.0234335009008646, ref_abs_avg=39.0490837097168, test_abs_avg=39.145538330078125
production_forward grad[15] vs paper_forward: mean_abs=1.1172963380813599, max_abs=7.0, mean_rel=0.15776795148849487, max_rel=1050.6309814453125, norm_rel=0.023654354736208916, ref_abs_avg=47.4803352355957, test_abs_avg=47.487491607666016
production_forward grad[16] vs paper_forward: mean_abs=1.0912559032440186, max_abs=7.0, mean_rel=0.141679584980011, max_rel=547.2347412109375, norm_rel=0.023281270638108253, ref_abs_avg=47.117469787597656, test_abs_avg=47.12134552001953
production_forward grad[17] vs paper_forward: mean_abs=0.8378562927246094, max_abs=3.75, mean_rel=0.17388804256916046, max_rel=52.182369232177734, norm_rel=0.024135053157806396, ref_abs_avg=35.27251434326172, test_abs_avg=35.272403717041016
production_forward grad[18] vs paper_forward: mean_abs=1.0538654327392578, max_abs=6.5, mean_rel=0.1627412736415863, max_rel=2354.88134765625, norm_rel=0.023414986208081245, ref_abs_avg=45.22315216064453, test_abs_avg=45.229217529296875
production_forward grad[19] vs paper_forward: mean_abs=1.020456075668335, max_abs=6.125, mean_rel=0.15767456591129303, max_rel=1465.478271484375, norm_rel=0.023101305589079857, ref_abs_avg=44.43598937988281, test_abs_avg=44.44660949707031
production_forward grad[20] vs paper_forward: mean_abs=0.773956298828125, max_abs=2.75, mean_rel=0.31259608268737793, max_rel=89.34683227539062, norm_rel=0.02171679399907589, ref_abs_avg=35.95956039428711, test_abs_avg=35.96788787841797
production_forward grad[21] vs paper_forward: mean_abs=0.9982283115386963, max_abs=7.0, mean_rel=0.16845929622650146, max_rel=2266.458740234375, norm_rel=0.023231448605656624, ref_abs_avg=43.181480407714844, test_abs_avg=43.18466567993164
production_forward grad[22] vs paper_forward: mean_abs=0.9678919315338135, max_abs=6.5, mean_rel=0.17398616671562195, max_rel=1273.508544921875, norm_rel=0.02301735244691372, ref_abs_avg=42.261539459228516, test_abs_avg=42.26738739013672
production_forward grad[23] vs paper_forward: mean_abs=0.7752141952514648, max_abs=3.0, mean_rel=0.23175323009490967, max_rel=68.63565826416016, norm_rel=0.023118898272514343, ref_abs_avg=33.8761100769043, test_abs_avg=33.85158157348633
production_forward grad[24] vs paper_forward: mean_abs=0.9452148079872131, max_abs=5.5, mean_rel=0.16521897912025452, max_rel=1625.8936767578125, norm_rel=0.02311953902244568, ref_abs_avg=41.065887451171875, test_abs_avg=41.07262420654297
production_forward grad[25] vs paper_forward: mean_abs=0.9260151386260986, max_abs=6.0, mean_rel=0.15060865879058838, max_rel=1341.5311279296875, norm_rel=0.02282320149242878, ref_abs_avg=40.71256637573242, test_abs_avg=40.71629333496094
production_forward grad[26] vs paper_forward: mean_abs=0.9054069519042969, max_abs=3.5, mean_rel=0.08197882771492004, max_rel=6.653237342834473, norm_rel=0.024292388930916786, ref_abs_avg=37.00225067138672, test_abs_avg=37.0872802734375
production_forward grad[27] vs paper_forward: mean_abs=1.1110203266143799, max_abs=8.0, mean_rel=0.1697731614112854, max_rel=1030.4207763671875, norm_rel=0.02484036795794964, ref_abs_avg=44.903865814208984, test_abs_avg=44.90226745605469
production_forward grad[28] vs paper_forward: mean_abs=1.0897250175476074, max_abs=7.125, mean_rel=0.15000219643115997, max_rel=979.064697265625, norm_rel=0.024339163675904274, ref_abs_avg=44.98918151855469, test_abs_avg=44.986724853515625
production_forward grad[29] vs paper_forward: mean_abs=0.8455886840820312, max_abs=4.0, mean_rel=0.11138033866882324, max_rel=14.818947792053223, norm_rel=0.02552167885005474, ref_abs_avg=33.78529357910156, test_abs_avg=33.797813415527344
production_forward grad[30] vs paper_forward: mean_abs=1.0250489711761475, max_abs=6.5, mean_rel=0.1688893437385559, max_rel=1619.82275390625, norm_rel=0.02510429546236992, ref_abs_avg=40.97466278076172, test_abs_avg=40.97650146484375
production_forward grad[31] vs paper_forward: mean_abs=1.0016039609909058, max_abs=7.5, mean_rel=0.16088563203811646, max_rel=1023.1949462890625, norm_rel=0.024822810664772987, ref_abs_avg=40.57091522216797, test_abs_avg=40.57269287109375
production_forward grad[32] vs paper_forward: mean_abs=0.741795539855957, max_abs=2.5, mean_rel=0.13351009786128998, max_rel=14.520087242126465, norm_rel=0.023421604186296463, ref_abs_avg=30.866413116455078, test_abs_avg=30.867549896240234
production_forward grad[33] vs paper_forward: mean_abs=0.9462555050849915, max_abs=6.5, mean_rel=0.1794344186782837, max_rel=1172.197021484375, norm_rel=0.024945346638560295, ref_abs_avg=38.071495056152344, test_abs_avg=38.074684143066406
production_forward grad[34] vs paper_forward: mean_abs=0.9406685829162598, max_abs=5.75, mean_rel=0.16846778988838196, max_rel=1515.8720703125, norm_rel=0.02496752142906189, ref_abs_avg=37.81858825683594, test_abs_avg=37.81977844238281
production_forward grad[35] vs paper_forward: mean_abs=0.7131366729736328, max_abs=3.125, mean_rel=0.07831543684005737, max_rel=5.603754043579102, norm_rel=0.0241553895175457, ref_abs_avg=29.77979850769043, test_abs_avg=29.767253875732422
production_forward grad[36] vs paper_forward: mean_abs=0.8906649351119995, max_abs=6.0, mean_rel=0.16896766424179077, max_rel=2447.5322265625, norm_rel=0.024647967889904976, ref_abs_avg=36.25749969482422, test_abs_avg=36.25933837890625
production_forward grad[37] vs paper_forward: mean_abs=0.8754671216011047, max_abs=5.6875, mean_rel=0.16384518146514893, max_rel=1659.71240234375, norm_rel=0.024524347856640816, ref_abs_avg=35.81695556640625, test_abs_avg=35.81757354736328
production_forward grad[38] vs paper_forward: mean_abs=0.6647720336914062, max_abs=2.5, mean_rel=0.06853547692298889, max_rel=5.708930015563965, norm_rel=0.02236538752913475, ref_abs_avg=29.83633041381836, test_abs_avg=29.82992172241211
production_forward grad[39] vs paper_forward: mean_abs=0.845556378364563, max_abs=6.0, mean_rel=0.16530711948871613, max_rel=2931.646728515625, norm_rel=0.024425702169537544, ref_abs_avg=34.69029235839844, test_abs_avg=34.69134521484375
production_forward grad[40] vs paper_forward: mean_abs=0.8304583430290222, max_abs=5.0, mean_rel=0.1736600697040558, max_rel=1407.4208984375, norm_rel=0.024136481806635857, ref_abs_avg=34.48514175415039, test_abs_avg=34.48270034790039
production_forward grad[41] vs paper_forward: mean_abs=0.6403541564941406, max_abs=2.875, mean_rel=0.058443453162908554, max_rel=1.734724521636963, norm_rel=0.024609755724668503, ref_abs_avg=26.938446044921875, test_abs_avg=26.956676483154297
production_forward grad[42] vs paper_forward: mean_abs=0.7972057461738586, max_abs=5.0, mean_rel=0.1586676836013794, max_rel=1004.7091674804688, norm_rel=0.02404080331325531, ref_abs_avg=33.20514678955078, test_abs_avg=33.20600891113281
production_forward grad[43] vs paper_forward: mean_abs=0.7860814929008484, max_abs=5.0, mean_rel=0.17497478425502777, max_rel=2194.807373046875, norm_rel=0.023964902386069298, ref_abs_avg=32.886253356933594, test_abs_avg=32.880126953125
production_forward grad[44] vs paper_forward: mean_abs=0.6219282150268555, max_abs=2.75, mean_rel=0.1382303237915039, max_rel=21.98900032043457, norm_rel=0.024426745250821114, ref_abs_avg=25.886117935180664, test_abs_avg=25.889842987060547
production_forward grad[45] vs paper_forward: mean_abs=0.7546788454055786, max_abs=5.0, mean_rel=0.1551181674003601, max_rel=1645.61376953125, norm_rel=0.023856380954384804, ref_abs_avg=31.66620635986328, test_abs_avg=31.667381286621094
production_forward grad[46] vs paper_forward: mean_abs=0.742209792137146, max_abs=5.0, mean_rel=0.17386028170585632, max_rel=2247.33154296875, norm_rel=0.023516952991485596, ref_abs_avg=31.598007202148438, test_abs_avg=31.60297203063965
production_forward grad[47] vs paper_forward: mean_abs=0.5634386539459229, max_abs=2.0, mean_rel=0.07111141085624695, max_rel=3.218310594558716, norm_rel=0.022819319739937782, ref_abs_avg=24.687644958496094, test_abs_avg=24.65561294555664
production_forward grad[48] vs paper_forward: mean_abs=0.7193567752838135, max_abs=6.0, mean_rel=0.16116270422935486, max_rel=1502.8143310546875, norm_rel=0.023609371855854988, ref_abs_avg=30.533193588256836, test_abs_avg=30.534914016723633
production_forward grad[49] vs paper_forward: mean_abs=0.706771969795227, max_abs=5.0, mean_rel=0.16812951862812042, max_rel=1804.6417236328125, norm_rel=0.023513872176408768, ref_abs_avg=30.146224975585938, test_abs_avg=30.140655517578125
production_forward grad[50] vs paper_forward: mean_abs=0.6544075012207031, max_abs=2.625, mean_rel=0.10013020038604736, max_rel=6.084485054016113, norm_rel=0.02616993710398674, ref_abs_avg=25.389251708984375, test_abs_avg=25.405620574951172
production_forward grad[51] vs paper_forward: mean_abs=0.8092025518417358, max_abs=5.5, mean_rel=0.17039403319358826, max_rel=1895.912841796875, norm_rel=0.025223679840564728, ref_abs_avg=32.12438201904297, test_abs_avg=32.12343215942383
production_forward grad[52] vs paper_forward: mean_abs=0.7904027700424194, max_abs=6.0, mean_rel=0.1677989661693573, max_rel=1275.562744140625, norm_rel=0.025098111480474472, ref_abs_avg=31.645479202270508, test_abs_avg=31.641260147094727
production_forward grad[53] vs paper_forward: mean_abs=0.5883307456970215, max_abs=2.69140625, mean_rel=0.10222302377223969, max_rel=10.646032333374023, norm_rel=0.02436867542564869, ref_abs_avg=24.996501922607422, test_abs_avg=25.010604858398438
production_forward grad[54] vs paper_forward: mean_abs=0.7461695671081543, max_abs=5.5, mean_rel=0.1561291217803955, max_rel=862.1708984375, norm_rel=0.024958336725831032, ref_abs_avg=29.957948684692383, test_abs_avg=29.95876693725586
production_forward grad[55] vs paper_forward: mean_abs=0.7312613725662231, max_abs=6.0, mean_rel=0.1601293981075287, max_rel=728.9967041015625, norm_rel=0.024612581357359886, ref_abs_avg=29.768470764160156, test_abs_avg=29.770790100097656
production_forward grad[56] vs paper_forward: mean_abs=0.5561013221740723, max_abs=2.375, mean_rel=0.08445636928081512, max_rel=2.642897367477417, norm_rel=0.024283621460199356, ref_abs_avg=22.924423217773438, test_abs_avg=22.889394760131836
production_forward grad[57] vs paper_forward: mean_abs=0.6891747713088989, max_abs=5.0625, mean_rel=0.1613612174987793, max_rel=717.12744140625, norm_rel=0.024301879107952118, ref_abs_avg=28.367578506469727, test_abs_avg=28.366355895996094
production_forward grad[58] vs paper_forward: mean_abs=0.6815434098243713, max_abs=4.375, mean_rel=0.16503876447677612, max_rel=1053.8597412109375, norm_rel=0.024704450741410255, ref_abs_avg=27.697471618652344, test_abs_avg=27.700349807739258
production_forward grad[59] vs paper_forward: mean_abs=0.5262674689292908, max_abs=2.0, mean_rel=0.15749382972717285, max_rel=12.694469451904297, norm_rel=0.023478887975215912, ref_abs_avg=22.005126953125, test_abs_avg=22.009639739990234
production_forward grad[60] vs paper_forward: mean_abs=0.6527919769287109, max_abs=4.625, mean_rel=0.169001966714859, max_rel=1413.78076171875, norm_rel=0.023921901360154152, ref_abs_avg=27.269729614257812, test_abs_avg=27.26673126220703
production_forward grad[61] vs paper_forward: mean_abs=0.6390389204025269, max_abs=4.0, mean_rel=0.16837894916534424, max_rel=893.6511840820312, norm_rel=0.023795364424586296, ref_abs_avg=26.889217376708984, test_abs_avg=26.888994216918945
production_forward grad[62] vs paper_forward: mean_abs=0.48467540740966797, max_abs=2.25, mean_rel=0.06814969331026077, max_rel=3.6846370697021484, norm_rel=0.02191629260778427, ref_abs_avg=22.665374755859375, test_abs_avg=22.667112350463867
production_forward grad[63] vs paper_forward: mean_abs=0.6175565719604492, max_abs=4.5, mean_rel=0.15314853191375732, max_rel=1017.7073364257812, norm_rel=0.023599771782755852, ref_abs_avg=26.19078826904297, test_abs_avg=26.19192123413086
production_forward grad[64] vs paper_forward: mean_abs=0.5996377468109131, max_abs=4.0, mean_rel=0.15139150619506836, max_rel=1113.5703125, norm_rel=0.02317783236503601, ref_abs_avg=25.907398223876953, test_abs_avg=25.907377243041992
production_forward grad[65] vs paper_forward: mean_abs=0.45924901962280273, max_abs=1.9375, mean_rel=0.14152266085147858, max_rel=24.616832733154297, norm_rel=0.02164587192237377, ref_abs_avg=20.96725082397461, test_abs_avg=21.018333435058594
production_forward grad[66] vs paper_forward: mean_abs=0.5846621990203857, max_abs=4.5, mean_rel=0.14576447010040283, max_rel=759.160888671875, norm_rel=0.02324272319674492, ref_abs_avg=25.098064422607422, test_abs_avg=25.09908676147461
production_forward grad[67] vs paper_forward: mean_abs=0.5758174061775208, max_abs=4.5, mean_rel=0.14668050408363342, max_rel=616.0219116210938, norm_rel=0.023265263065695763, ref_abs_avg=24.791545867919922, test_abs_avg=24.78633689880371
production_forward grad[68] vs paper_forward: mean_abs=0.44878578186035156, max_abs=2.125, mean_rel=0.05713620036840439, max_rel=1.7430853843688965, norm_rel=0.022399183362722397, ref_abs_avg=20.585792541503906, test_abs_avg=20.567873001098633
production_forward grad[69] vs paper_forward: mean_abs=0.5513355731964111, max_abs=4.0, mean_rel=0.14876678586006165, max_rel=991.82275390625, norm_rel=0.02270241267979145, ref_abs_avg=24.24660301208496, test_abs_avg=24.248035430908203
production_forward grad[70] vs paper_forward: mean_abs=0.5403286814689636, max_abs=3.5, mean_rel=0.13642370700836182, max_rel=382.4134826660156, norm_rel=0.02260124497115612, ref_abs_avg=23.940311431884766, test_abs_avg=23.9354305267334
production_forward grad[71] vs paper_forward: mean_abs=0.4161806106567383, max_abs=1.9375, mean_rel=0.13549090921878815, max_rel=20.107559204101562, norm_rel=0.020669585093855858, ref_abs_avg=20.135738372802734, test_abs_avg=20.137266159057617
production_forward grad[72] vs paper_forward: mean_abs=0.5363340377807617, max_abs=4.5, mean_rel=0.14362430572509766, max_rel=1416.6419677734375, norm_rel=0.022117406129837036, ref_abs_avg=24.192882537841797, test_abs_avg=24.192951202392578
production_forward grad[73] vs paper_forward: mean_abs=0.5150616765022278, max_abs=3.515625, mean_rel=0.14280050992965698, max_rel=1204.6839599609375, norm_rel=0.02177439257502556, ref_abs_avg=23.679296493530273, test_abs_avg=23.676895141601562
production_forward grad[74] vs paper_forward: mean_abs=0.4714531898498535, max_abs=2.0, mean_rel=0.1239682137966156, max_rel=7.788308143615723, norm_rel=0.02253706380724907, ref_abs_avg=21.152355194091797, test_abs_avg=21.11377716064453
production_forward grad[75] vs paper_forward: mean_abs=0.6041897535324097, max_abs=5.0, mean_rel=0.1581534743309021, max_rel=1057.321533203125, norm_rel=0.02398277260363102, ref_abs_avg=25.1949462890625, test_abs_avg=25.192119598388672
production_forward grad[76] vs paper_forward: mean_abs=0.5950031876564026, max_abs=4.0, mean_rel=0.15863916277885437, max_rel=699.7928466796875, norm_rel=0.02384006790816784, ref_abs_avg=25.08181381225586, test_abs_avg=25.076942443847656
production_forward grad[77] vs paper_forward: mean_abs=0.4431314468383789, max_abs=1.8046875, mean_rel=0.071295827627182, max_rel=2.2326500415802, norm_rel=0.02163749374449253, ref_abs_avg=20.36173439025879, test_abs_avg=20.344295501708984
production_forward grad[78] vs paper_forward: mean_abs=0.5562570095062256, max_abs=4.75, mean_rel=0.14853911101818085, max_rel=981.5955200195312, norm_rel=0.02316214144229889, ref_abs_avg=24.00165367126465, test_abs_avg=24.000524520874023
production_forward grad[79] vs paper_forward: mean_abs=0.5342291593551636, max_abs=3.9375, mean_rel=0.14937323331832886, max_rel=1015.1168212890625, norm_rel=0.022713547572493553, ref_abs_avg=23.482269287109375, test_abs_avg=23.482959747314453
production_forward grad[80] vs paper_forward: mean_abs=0.42191386222839355, max_abs=1.875, mean_rel=0.08073312044143677, max_rel=8.025455474853516, norm_rel=0.021808147430419922, ref_abs_avg=19.877418518066406, test_abs_avg=19.845230102539062
production_forward grad[81] vs paper_forward: mean_abs=0.5140371322631836, max_abs=5.0, mean_rel=0.14752253890037537, max_rel=1014.2717895507812, norm_rel=0.022581515833735466, ref_abs_avg=22.714664459228516, test_abs_avg=22.714656829833984
production_forward grad[82] vs paper_forward: mean_abs=0.49697476625442505, max_abs=4.5, mean_rel=0.1492796093225479, max_rel=802.3889770507812, norm_rel=0.022211655974388123, ref_abs_avg=22.41864013671875, test_abs_avg=22.413970947265625
production_forward grad[83] vs paper_forward: mean_abs=0.35620689392089844, max_abs=1.3125, mean_rel=0.06770738959312439, max_rel=4.169416904449463, norm_rel=0.020767942070961, ref_abs_avg=17.5839786529541, test_abs_avg=17.61144256591797
production_forward grad[84] vs paper_forward: mean_abs=0.4732249081134796, max_abs=4.0, mean_rel=0.14127743244171143, max_rel=851.4767456054688, norm_rel=0.022111430764198303, ref_abs_avg=21.412174224853516, test_abs_avg=21.41321563720703
production_forward grad[85] vs paper_forward: mean_abs=0.4686809778213501, max_abs=3.75, mean_rel=0.14862501621246338, max_rel=741.7045288085938, norm_rel=0.022043365985155106, ref_abs_avg=21.319000244140625, test_abs_avg=21.31870460510254
production_forward grad[86] vs paper_forward: mean_abs=0.35583770275115967, max_abs=1.5625, mean_rel=0.15759696066379547, max_rel=31.411745071411133, norm_rel=0.021484818309545517, ref_abs_avg=16.809396743774414, test_abs_avg=16.820581436157227
production_forward grad[87] vs paper_forward: mean_abs=0.44702374935150146, max_abs=4.5, mean_rel=0.14014849066734314, max_rel=949.8075561523438, norm_rel=0.02156192436814308, ref_abs_avg=20.79501724243164, test_abs_avg=20.794008255004883
production_forward grad[88] vs paper_forward: mean_abs=0.44053101539611816, max_abs=3.40625, mean_rel=0.13512516021728516, max_rel=864.55908203125, norm_rel=0.02164207585155964, ref_abs_avg=20.52288055419922, test_abs_avg=20.51043701171875
production_forward grad[89] vs paper_forward: mean_abs=0.3328353762626648, max_abs=1.40625, mean_rel=0.09895352274179459, max_rel=16.14493751525879, norm_rel=0.02018318884074688, ref_abs_avg=16.637470245361328, test_abs_avg=16.63058090209961
production_forward grad[90] vs paper_forward: mean_abs=0.4174301326274872, max_abs=4.0, mean_rel=0.14039863646030426, max_rel=683.6094360351562, norm_rel=0.021075300872325897, ref_abs_avg=19.92449378967285, test_abs_avg=19.925186157226562
production_forward grad[91] vs paper_forward: mean_abs=0.4083600640296936, max_abs=3.5, mean_rel=0.1340174674987793, max_rel=537.5655517578125, norm_rel=0.020543351769447327, ref_abs_avg=20.103404998779297, test_abs_avg=20.092697143554688
production_forward grad[92] vs paper_forward: mean_abs=0.33650267124176025, max_abs=1.5, mean_rel=0.07747633755207062, max_rel=2.1233553886413574, norm_rel=0.020828111097216606, ref_abs_avg=16.562664031982422, test_abs_avg=16.57451820373535
production_forward grad[93] vs paper_forward: mean_abs=0.3968953788280487, max_abs=5.5, mean_rel=0.13131439685821533, max_rel=1206.83056640625, norm_rel=0.020461197942495346, ref_abs_avg=19.594707489013672, test_abs_avg=19.59369659423828
production_forward grad[94] vs paper_forward: mean_abs=0.3897797167301178, max_abs=3.0, mean_rel=0.12257330119609833, max_rel=441.15533447265625, norm_rel=0.020770691335201263, ref_abs_avg=18.970748901367188, test_abs_avg=18.96480941772461
production_forward grad[95] vs paper_forward: mean_abs=0.33864083886146545, max_abs=1.25, mean_rel=0.1817435920238495, max_rel=21.62746810913086, norm_rel=0.02119431458413601, ref_abs_avg=15.654851913452148, test_abs_avg=15.662538528442383
production_forward grad[96] vs paper_forward: mean_abs=0.382932186126709, max_abs=4.25, mean_rel=0.12083069235086441, max_rel=512.5667724609375, norm_rel=0.02003340795636177, ref_abs_avg=19.43062973022461, test_abs_avg=19.429391860961914
production_forward grad[97] vs paper_forward: mean_abs=0.3619166612625122, max_abs=3.5, mean_rel=0.12130562961101532, max_rel=610.9013671875, norm_rel=0.019170353189110756, ref_abs_avg=19.211219787597656, test_abs_avg=19.216392517089844
torch_compile_phases_forward vs paper_forward output: mean_abs=0.001714333426207304, max_abs=0.0546875
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.009027604013681412, max_abs=0.40625, mean_rel=0.07574637234210968, max_rel=120.10314178466797, norm_rel=0.020615091547369957, ref_abs_avg=0.4727647304534912, test_abs_avg=0.4727703928947449
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.7350850105285645, max_abs=56.0, mean_rel=0.1762937605381012, max_rel=651.8797607421875, norm_rel=0.020903130993247032, ref_abs_avg=331.90447998046875, test_abs_avg=331.9046630859375
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3292741775512695, max_abs=5.5, mean_rel=0.13743233680725098, max_rel=11.600671768188477, norm_rel=0.023874975740909576, ref_abs_avg=54.999725341796875, test_abs_avg=54.98540496826172
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6646409034729004, max_abs=11.0, mean_rel=0.1800263524055481, max_rel=2844.029052734375, norm_rel=0.024596620351076126, ref_abs_avg=67.93048095703125, test_abs_avg=67.93458557128906
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.6174525022506714, max_abs=11.0, mean_rel=0.1752489060163498, max_rel=1377.973388671875, norm_rel=0.02433571219444275, ref_abs_avg=66.83785247802734, test_abs_avg=66.84969329833984
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1493339538574219, max_abs=4.375, mean_rel=0.10926425457000732, max_rel=9.945446014404297, norm_rel=0.02345132827758789, ref_abs_avg=48.83634948730469, test_abs_avg=48.94424819946289
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.450057029724121, max_abs=9.0, mean_rel=0.17058424651622772, max_rel=1391.538330078125, norm_rel=0.024395117536187172, ref_abs_avg=59.74573516845703, test_abs_avg=59.74537658691406
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.4117121696472168, max_abs=8.8125, mean_rel=0.15975214540958405, max_rel=1684.798583984375, norm_rel=0.02412223070859909, ref_abs_avg=58.79291534423828, test_abs_avg=58.80707931518555
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0678215026855469, max_abs=5.25, mean_rel=0.10780076682567596, max_rel=7.547904014587402, norm_rel=0.023931698873639107, ref_abs_avg=45.0351448059082, test_abs_avg=45.0582389831543
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3035571575164795, max_abs=8.5, mean_rel=0.1748751401901245, max_rel=1604.3477783203125, norm_rel=0.024096237495541573, ref_abs_avg=54.34791564941406, test_abs_avg=54.351356506347656
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.274622917175293, max_abs=8.5, mean_rel=0.1720348298549652, max_rel=1597.49462890625, norm_rel=0.023881610482931137, ref_abs_avg=53.60498046875, test_abs_avg=53.605560302734375
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9984560012817383, max_abs=4.5, mean_rel=0.10596738010644913, max_rel=5.865813255310059, norm_rel=0.024956872686743736, ref_abs_avg=39.19465255737305, test_abs_avg=39.20853042602539
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.212890863418579, max_abs=10.0, mean_rel=0.15916216373443604, max_rel=1448.1826171875, norm_rel=0.023871425539255142, ref_abs_avg=51.07078170776367, test_abs_avg=51.07759475708008
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1816585063934326, max_abs=8.0, mean_rel=0.15772072970867157, max_rel=597.488525390625, norm_rel=0.023694276809692383, ref_abs_avg=50.059730529785156, test_abs_avg=50.064788818359375
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8903234004974365, max_abs=3.75, mean_rel=0.1361745297908783, max_rel=18.58218765258789, norm_rel=0.023028599098324776, ref_abs_avg=39.0490837097168, test_abs_avg=39.10321807861328
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1214244365692139, max_abs=7.0, mean_rel=0.16054701805114746, max_rel=1145.45654296875, norm_rel=0.02372881770133972, ref_abs_avg=47.4803352355957, test_abs_avg=47.48598098754883
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.093309760093689, max_abs=7.5, mean_rel=0.14019781351089478, max_rel=1020.6281127929688, norm_rel=0.02333694137632847, ref_abs_avg=47.117469787597656, test_abs_avg=47.122154235839844
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8371286392211914, max_abs=3.25, mean_rel=0.1675175577402115, max_rel=46.996543884277344, norm_rel=0.02393811009824276, ref_abs_avg=35.27251434326172, test_abs_avg=35.32582092285156
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0583462715148926, max_abs=6.03125, mean_rel=0.16537359356880188, max_rel=2651.191162109375, norm_rel=0.02350628562271595, ref_abs_avg=45.22315216064453, test_abs_avg=45.226863861083984
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=1.0274893045425415, max_abs=6.5, mean_rel=0.1569368988275528, max_rel=843.9125366210938, norm_rel=0.023270195350050926, ref_abs_avg=44.43598937988281, test_abs_avg=44.4466552734375
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7606735229492188, max_abs=3.75, mean_rel=0.22492758929729462, max_rel=35.439247131347656, norm_rel=0.021716922521591187, ref_abs_avg=35.95956039428711, test_abs_avg=35.943538665771484
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0014019012451172, max_abs=6.5, mean_rel=0.16286280751228333, max_rel=1756.484619140625, norm_rel=0.0233132503926754, ref_abs_avg=43.181480407714844, test_abs_avg=43.18360900878906
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9718739986419678, max_abs=6.0, mean_rel=0.16536389291286469, max_rel=1587.733154296875, norm_rel=0.023133059963583946, ref_abs_avg=42.261539459228516, test_abs_avg=42.26213836669922
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7488546371459961, max_abs=2.65625, mean_rel=0.14517126977443695, max_rel=34.10847854614258, norm_rel=0.022404877468943596, ref_abs_avg=33.8761100769043, test_abs_avg=33.81991195678711
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9505552649497986, max_abs=6.5, mean_rel=0.16324970126152039, max_rel=1244.4244384765625, norm_rel=0.023240862414240837, ref_abs_avg=41.065887451171875, test_abs_avg=41.07128143310547
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.9299346208572388, max_abs=6.0, mean_rel=0.15221408009529114, max_rel=1087.7591552734375, norm_rel=0.022920016199350357, ref_abs_avg=40.71256637573242, test_abs_avg=40.71290588378906
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8812580108642578, max_abs=3.875, mean_rel=0.07950696349143982, max_rel=5.88866662979126, norm_rel=0.02391606569290161, ref_abs_avg=37.00225067138672, test_abs_avg=37.06576156616211
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.111748218536377, max_abs=7.0, mean_rel=0.17096984386444092, max_rel=1071.9852294921875, norm_rel=0.024851009249687195, ref_abs_avg=44.903865814208984, test_abs_avg=44.89872360229492
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0939689874649048, max_abs=7.375, mean_rel=0.15048439800739288, max_rel=1870.29541015625, norm_rel=0.024439970031380653, ref_abs_avg=44.98918151855469, test_abs_avg=44.98213577270508
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.9077062606811523, max_abs=4.0, mean_rel=0.11896012723445892, max_rel=17.237102508544922, norm_rel=0.02675158530473709, ref_abs_avg=33.78529357910156, test_abs_avg=33.764808654785156
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.0288703441619873, max_abs=8.0, mean_rel=0.17333242297172546, max_rel=1564.017333984375, norm_rel=0.025194041430950165, ref_abs_avg=40.97466278076172, test_abs_avg=40.97374725341797
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=1.0049116611480713, max_abs=6.5, mean_rel=0.1598936766386032, max_rel=859.5343627929688, norm_rel=0.024904074147343636, ref_abs_avg=40.57091522216797, test_abs_avg=40.56768798828125
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7616491317749023, max_abs=2.8125, mean_rel=0.15118494629859924, max_rel=19.488880157470703, norm_rel=0.02408830262720585, ref_abs_avg=30.866413116455078, test_abs_avg=30.891836166381836
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9505424499511719, max_abs=7.0, mean_rel=0.1819654107093811, max_rel=1348.317138671875, norm_rel=0.02503969706594944, ref_abs_avg=38.071495056152344, test_abs_avg=38.07274627685547
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.9450397491455078, max_abs=6.375, mean_rel=0.16854345798492432, max_rel=1595.0450439453125, norm_rel=0.025093182921409607, ref_abs_avg=37.81858825683594, test_abs_avg=37.82060623168945
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7174587249755859, max_abs=2.8125, mean_rel=0.07331874221563339, max_rel=3.4880924224853516, norm_rel=0.024219414219260216, ref_abs_avg=29.77979850769043, test_abs_avg=29.749065399169922
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8937131762504578, max_abs=5.5, mean_rel=0.16582083702087402, max_rel=1784.6405029296875, norm_rel=0.024736525490880013, ref_abs_avg=36.25749969482422, test_abs_avg=36.25779724121094
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8812074661254883, max_abs=6.4375, mean_rel=0.164411723613739, max_rel=2202.8486328125, norm_rel=0.024675024673342705, ref_abs_avg=35.81695556640625, test_abs_avg=35.81452178955078
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6613407135009766, max_abs=2.625, mean_rel=0.07300669699907303, max_rel=3.940446138381958, norm_rel=0.022123076021671295, ref_abs_avg=29.83633041381836, test_abs_avg=29.868844985961914
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8489878177642822, max_abs=5.75, mean_rel=0.17023813724517822, max_rel=2487.4658203125, norm_rel=0.024521267041563988, ref_abs_avg=34.69029235839844, test_abs_avg=34.69178009033203
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.8341690301895142, max_abs=5.0, mean_rel=0.17014718055725098, max_rel=1685.5576171875, norm_rel=0.024259360507130623, ref_abs_avg=34.48514175415039, test_abs_avg=34.47968673706055
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6408014297485352, max_abs=2.75, mean_rel=0.05625106021761894, max_rel=1.2480032444000244, norm_rel=0.024556541815400124, ref_abs_avg=26.938446044921875, test_abs_avg=26.915328979492188
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.8006391525268555, max_abs=5.3125, mean_rel=0.16737160086631775, max_rel=2165.924560546875, norm_rel=0.024154076352715492, ref_abs_avg=33.20514678955078, test_abs_avg=33.206214904785156
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7898025512695312, max_abs=5.0, mean_rel=0.1727312207221985, max_rel=2405.8369140625, norm_rel=0.02408755198121071, ref_abs_avg=32.886253356933594, test_abs_avg=32.881526947021484
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.6141171455383301, max_abs=2.375, mean_rel=0.12828823924064636, max_rel=17.082773208618164, norm_rel=0.023959465324878693, ref_abs_avg=25.886117935180664, test_abs_avg=25.88072967529297
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7577160596847534, max_abs=6.0, mean_rel=0.15553414821624756, max_rel=1774.2935791015625, norm_rel=0.02395673654973507, ref_abs_avg=31.66620635986328, test_abs_avg=31.667612075805664
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.745256245136261, max_abs=5.0, mean_rel=0.17341291904449463, max_rel=2704.84375, norm_rel=0.023628275841474533, ref_abs_avg=31.598007202148438, test_abs_avg=31.598787307739258
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5798497200012207, max_abs=2.46875, mean_rel=0.08492940664291382, max_rel=7.098235607147217, norm_rel=0.024049896746873856, ref_abs_avg=24.687644958496094, test_abs_avg=24.651840209960938
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7219551801681519, max_abs=6.0, mean_rel=0.1646220088005066, max_rel=1133.925048828125, norm_rel=0.02369336225092411, ref_abs_avg=30.533193588256836, test_abs_avg=30.53316879272461
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.7088078260421753, max_abs=5.0, mean_rel=0.16570168733596802, max_rel=1779.7528076171875, norm_rel=0.023551829159259796, ref_abs_avg=30.146224975585938, test_abs_avg=30.139780044555664
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6429364681243896, max_abs=2.875, mean_rel=0.10143241286277771, max_rel=6.462721824645996, norm_rel=0.02536550536751747, ref_abs_avg=25.389251708984375, test_abs_avg=25.43012237548828
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8086365461349487, max_abs=5.5, mean_rel=0.16990506649017334, max_rel=1815.24169921875, norm_rel=0.02521091140806675, ref_abs_avg=32.12438201904297, test_abs_avg=32.12233352661133
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.791991114616394, max_abs=5.5, mean_rel=0.16724449396133423, max_rel=986.290771484375, norm_rel=0.025141671299934387, ref_abs_avg=31.645479202270508, test_abs_avg=31.64054298400879
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5879421234130859, max_abs=2.5, mean_rel=0.09156873822212219, max_rel=6.673244476318359, norm_rel=0.024198384955525398, ref_abs_avg=24.996501922607422, test_abs_avg=25.0262393951416
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7487163543701172, max_abs=5.125, mean_rel=0.1565895974636078, max_rel=940.5580444335938, norm_rel=0.02503051981329918, ref_abs_avg=29.957948684692383, test_abs_avg=29.958404541015625
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.7324550151824951, max_abs=5.5, mean_rel=0.15979991853237152, max_rel=1199.3116455078125, norm_rel=0.024652060121297836, ref_abs_avg=29.768470764160156, test_abs_avg=29.768951416015625
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5600509643554688, max_abs=2.140625, mean_rel=0.08795332908630371, max_rel=2.237342357635498, norm_rel=0.024397142231464386, ref_abs_avg=22.924423217773438, test_abs_avg=22.902917861938477
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6909290552139282, max_abs=4.75, mean_rel=0.16226637363433838, max_rel=829.0682373046875, norm_rel=0.024356096982955933, ref_abs_avg=28.367578506469727, test_abs_avg=28.365562438964844
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6813136339187622, max_abs=5.375, mean_rel=0.16438226401805878, max_rel=939.7628173828125, norm_rel=0.0247187539935112, ref_abs_avg=27.697471618652344, test_abs_avg=27.698261260986328
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5545164346694946, max_abs=2.25, mean_rel=0.13849639892578125, max_rel=9.867301940917969, norm_rel=0.024776872247457504, ref_abs_avg=22.005126953125, test_abs_avg=22.031326293945312
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.655713677406311, max_abs=5.0, mean_rel=0.1699795424938202, max_rel=1014.48095703125, norm_rel=0.02402554824948311, ref_abs_avg=27.269729614257812, test_abs_avg=27.265657424926758
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6410161256790161, max_abs=4.25, mean_rel=0.17424196004867554, max_rel=1418.7972412109375, norm_rel=0.023891137912869453, ref_abs_avg=26.889217376708984, test_abs_avg=26.88910675048828
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4882615804672241, max_abs=1.875, mean_rel=0.0625024288892746, max_rel=3.2343759536743164, norm_rel=0.02178899198770523, ref_abs_avg=22.665374755859375, test_abs_avg=22.662813186645508
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6189886927604675, max_abs=4.375, mean_rel=0.1545235514640808, max_rel=1092.32958984375, norm_rel=0.02362983301281929, ref_abs_avg=26.19078826904297, test_abs_avg=26.19038963317871
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.605596125125885, max_abs=4.0, mean_rel=0.15039962530136108, max_rel=1051.3482666015625, norm_rel=0.023381318897008896, ref_abs_avg=25.907398223876953, test_abs_avg=25.907161712646484
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.45708322525024414, max_abs=2.3125, mean_rel=0.11567815393209457, max_rel=12.79150390625, norm_rel=0.021910924464464188, ref_abs_avg=20.96725082397461, test_abs_avg=21.01204490661621
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5862776041030884, max_abs=4.0, mean_rel=0.14501166343688965, max_rel=620.90771484375, norm_rel=0.023318171501159668, ref_abs_avg=25.098064422607422, test_abs_avg=25.098407745361328
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5769491195678711, max_abs=4.0, mean_rel=0.1504371166229248, max_rel=497.7224426269531, norm_rel=0.023298202082514763, ref_abs_avg=24.791545867919922, test_abs_avg=24.78775978088379
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.45444297790527344, max_abs=1.96875, mean_rel=0.06428346037864685, max_rel=1.8835678100585938, norm_rel=0.022007348015904427, ref_abs_avg=20.585792541503906, test_abs_avg=20.55695152282715
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5527141690254211, max_abs=4.5, mean_rel=0.14311769604682922, max_rel=1047.7459716796875, norm_rel=0.02275734208524227, ref_abs_avg=24.24660301208496, test_abs_avg=24.247615814208984
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5404207706451416, max_abs=3.5, mean_rel=0.13734832406044006, max_rel=548.5223388671875, norm_rel=0.02258470095694065, ref_abs_avg=23.940311431884766, test_abs_avg=23.940204620361328
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.4035797119140625, max_abs=1.75, mean_rel=0.12044569104909897, max_rel=10.541595458984375, norm_rel=0.020008442923426628, ref_abs_avg=20.135738372802734, test_abs_avg=20.134626388549805
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5378051996231079, max_abs=4.5, mean_rel=0.1416674554347992, max_rel=736.0449829101562, norm_rel=0.022181537002325058, ref_abs_avg=24.192882537841797, test_abs_avg=24.192440032958984
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.5160128474235535, max_abs=4.0, mean_rel=0.14212936162948608, max_rel=1214.69873046875, norm_rel=0.02179039642214775, ref_abs_avg=23.679296493530273, test_abs_avg=23.676944732666016
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.47484493255615234, max_abs=2.0, mean_rel=0.10595464706420898, max_rel=3.438995361328125, norm_rel=0.022685952484607697, ref_abs_avg=21.152355194091797, test_abs_avg=21.119651794433594
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.6018882989883423, max_abs=5.0, mean_rel=0.15881377458572388, max_rel=875.5128784179688, norm_rel=0.02388915792107582, ref_abs_avg=25.1949462890625, test_abs_avg=25.19205093383789
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5901859998703003, max_abs=4.0, mean_rel=0.15435759723186493, max_rel=546.8595581054688, norm_rel=0.023598214611411095, ref_abs_avg=25.08181381225586, test_abs_avg=25.077869415283203
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.44817543029785156, max_abs=1.75, mean_rel=0.06726320087909698, max_rel=2.2740347385406494, norm_rel=0.02157614938914776, ref_abs_avg=20.36173439025879, test_abs_avg=20.335201263427734
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5555885434150696, max_abs=4.5, mean_rel=0.14755409955978394, max_rel=700.8916015625, norm_rel=0.02314596436917782, ref_abs_avg=24.00165367126465, test_abs_avg=23.999393463134766
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.536107063293457, max_abs=4.0, mean_rel=0.1498740315437317, max_rel=1091.546142578125, norm_rel=0.022821171209216118, ref_abs_avg=23.482269287109375, test_abs_avg=23.483163833618164
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.4058351516723633, max_abs=1.625, mean_rel=0.07287608087062836, max_rel=7.124351978302002, norm_rel=0.021191027015447617, ref_abs_avg=19.877418518066406, test_abs_avg=19.840373992919922
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.5144147276878357, max_abs=6.0, mean_rel=0.14653819799423218, max_rel=1247.741943359375, norm_rel=0.02260642684996128, ref_abs_avg=22.714664459228516, test_abs_avg=22.713478088378906
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4973423480987549, max_abs=4.5, mean_rel=0.14690738916397095, max_rel=585.0101928710938, norm_rel=0.022246016189455986, ref_abs_avg=22.41864013671875, test_abs_avg=22.41596221923828
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.36640477180480957, max_abs=1.5, mean_rel=0.07717941701412201, max_rel=5.644973278045654, norm_rel=0.0217935498803854, ref_abs_avg=17.5839786529541, test_abs_avg=17.615779876708984
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.47353625297546387, max_abs=4.5, mean_rel=0.1431369185447693, max_rel=678.2994995117188, norm_rel=0.02213325724005699, ref_abs_avg=21.412174224853516, test_abs_avg=21.41246795654297
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.468191534280777, max_abs=3.75, mean_rel=0.14765207469463348, max_rel=770.193603515625, norm_rel=0.02199835702776909, ref_abs_avg=21.319000244140625, test_abs_avg=21.31255340576172
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.35734689235687256, max_abs=1.5, mean_rel=0.24991613626480103, max_rel=76.10772705078125, norm_rel=0.02186775766313076, ref_abs_avg=16.809396743774414, test_abs_avg=16.825443267822266
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4476637840270996, max_abs=4.0, mean_rel=0.13897395133972168, max_rel=829.0606079101562, norm_rel=0.02160019241273403, ref_abs_avg=20.79501724243164, test_abs_avg=20.7938232421875
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4375903010368347, max_abs=3.5, mean_rel=0.13036063313484192, max_rel=819.6079711914062, norm_rel=0.02145383320748806, ref_abs_avg=20.52288055419922, test_abs_avg=20.512531280517578
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.334383487701416, max_abs=1.28125, mean_rel=0.11639514565467834, max_rel=23.831798553466797, norm_rel=0.02020089700818062, ref_abs_avg=16.637470245361328, test_abs_avg=16.630115509033203
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.4182833433151245, max_abs=4.0, mean_rel=0.1394902765750885, max_rel=783.4961547851562, norm_rel=0.021105194464325905, ref_abs_avg=19.92449378967285, test_abs_avg=19.925548553466797
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.4087583124637604, max_abs=3.875, mean_rel=0.13074517250061035, max_rel=690.0892944335938, norm_rel=0.020518707111477852, ref_abs_avg=20.103404998779297, test_abs_avg=20.093109130859375
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.3386284112930298, max_abs=1.359375, mean_rel=0.08430542796850204, max_rel=2.988140344619751, norm_rel=0.020580178126692772, ref_abs_avg=16.562664031982422, test_abs_avg=16.563602447509766
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3978757858276367, max_abs=5.25, mean_rel=0.1343124359846115, max_rel=1080.858642578125, norm_rel=0.020510252565145493, ref_abs_avg=19.594707489013672, test_abs_avg=19.5943603515625
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.39237308502197266, max_abs=3.0625, mean_rel=0.1272791028022766, max_rel=505.9501037597656, norm_rel=0.020981112495064735, ref_abs_avg=18.970748901367188, test_abs_avg=18.96368980407715
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.3340798616409302, max_abs=1.1875, mean_rel=0.18995408713817596, max_rel=25.062759399414062, norm_rel=0.02096625789999962, ref_abs_avg=15.654851913452148, test_abs_avg=15.660478591918945
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.383289635181427, max_abs=4.625, mean_rel=0.1221085637807846, max_rel=428.7129821777344, norm_rel=0.020031005144119263, ref_abs_avg=19.43062973022461, test_abs_avg=19.42917251586914
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.36489301919937134, max_abs=3.25, mean_rel=0.12158484011888504, max_rel=477.9512023925781, norm_rel=0.019322901964187622, ref_abs_avg=19.211219787597656, test_abs_avg=19.217201232910156
production_forward2 vs paper_forward output: mean_abs=0.0017118918476626277, max_abs=0.0546875
production_forward2 grad[0] vs paper_forward: mean_abs=0.009015686810016632, max_abs=0.40625, mean_rel=0.07566383481025696, max_rel=120.10314178466797, norm_rel=0.020585672929883003, ref_abs_avg=0.4727647304534912, test_abs_avg=0.47277307510375977
production_forward2 grad[1] vs paper_forward: mean_abs=7.738522052764893, max_abs=64.0, mean_rel=0.17102250456809998, max_rel=582.1376342773438, norm_rel=0.020879121497273445, ref_abs_avg=331.90447998046875, test_abs_avg=331.831787109375
production_forward2 grad[2] vs paper_forward: mean_abs=1.3306055068969727, max_abs=5.125, mean_rel=0.1173756867647171, max_rel=8.64113998413086, norm_rel=0.024423573166131973, ref_abs_avg=54.999725341796875, test_abs_avg=54.93384552001953
production_forward2 grad[3] vs paper_forward: mean_abs=1.663783073425293, max_abs=10.75, mean_rel=0.1758909523487091, max_rel=2484.78857421875, norm_rel=0.02458854578435421, ref_abs_avg=67.93048095703125, test_abs_avg=67.93484497070312
production_forward2 grad[4] vs paper_forward: mean_abs=1.6193928718566895, max_abs=9.5, mean_rel=0.176601842045784, max_rel=2271.807861328125, norm_rel=0.024351589381694794, ref_abs_avg=66.83785247802734, test_abs_avg=66.85281372070312
production_forward2 grad[5] vs paper_forward: mean_abs=1.194345474243164, max_abs=4.5, mean_rel=0.12407053261995316, max_rel=11.048792839050293, norm_rel=0.02420974336564541, ref_abs_avg=48.83634948730469, test_abs_avg=48.8878173828125
production_forward2 grad[6] vs paper_forward: mean_abs=1.4474997520446777, max_abs=9.0, mean_rel=0.17068226635456085, max_rel=2545.9658203125, norm_rel=0.024349702522158623, ref_abs_avg=59.74573516845703, test_abs_avg=59.745628356933594
production_forward2 grad[7] vs paper_forward: mean_abs=1.4114221334457397, max_abs=9.0, mean_rel=0.16148269176483154, max_rel=1917.44970703125, norm_rel=0.02413664571940899, ref_abs_avg=58.79291534423828, test_abs_avg=58.804161071777344
production_forward2 grad[8] vs paper_forward: mean_abs=1.0307774543762207, max_abs=4.23828125, mean_rel=0.11158233135938644, max_rel=8.537466049194336, norm_rel=0.023358378559350967, ref_abs_avg=45.0351448059082, test_abs_avg=45.07951736450195
production_forward2 grad[9] vs paper_forward: mean_abs=1.30164635181427, max_abs=9.0, mean_rel=0.1754688173532486, max_rel=1958.8599853515625, norm_rel=0.024062853306531906, ref_abs_avg=54.34791564941406, test_abs_avg=54.35072326660156
production_forward2 grad[10] vs paper_forward: mean_abs=1.2723960876464844, max_abs=9.5, mean_rel=0.1651953160762787, max_rel=1233.655517578125, norm_rel=0.02386355958878994, ref_abs_avg=53.60498046875, test_abs_avg=53.613243103027344
production_forward2 grad[11] vs paper_forward: mean_abs=0.9450187683105469, max_abs=4.0, mean_rel=0.09297239780426025, max_rel=4.653868675231934, norm_rel=0.023816430941224098, ref_abs_avg=39.19465255737305, test_abs_avg=39.13725280761719
production_forward2 grad[12] vs paper_forward: mean_abs=1.209383249282837, max_abs=9.5, mean_rel=0.159703329205513, max_rel=1277.893310546875, norm_rel=0.02380264177918434, ref_abs_avg=51.07078170776367, test_abs_avg=51.078311920166016
production_forward2 grad[13] vs paper_forward: mean_abs=1.179181694984436, max_abs=7.0, mean_rel=0.16322533786296844, max_rel=1020.6752319335938, norm_rel=0.02365582436323166, ref_abs_avg=50.059730529785156, test_abs_avg=50.066993713378906
production_forward2 grad[14] vs paper_forward: mean_abs=0.89569091796875, max_abs=5.0, mean_rel=0.12692973017692566, max_rel=15.588455200195312, norm_rel=0.02364567667245865, ref_abs_avg=39.0490837097168, test_abs_avg=39.131221771240234
production_forward2 grad[15] vs paper_forward: mean_abs=1.1214317083358765, max_abs=7.4375, mean_rel=0.16382408142089844, max_rel=1280.49755859375, norm_rel=0.023730559274554253, ref_abs_avg=47.4803352355957, test_abs_avg=47.48607635498047
production_forward2 grad[16] vs paper_forward: mean_abs=1.0948201417922974, max_abs=7.0, mean_rel=0.1508149951696396, max_rel=1205.4107666015625, norm_rel=0.023354806005954742, ref_abs_avg=47.117469787597656, test_abs_avg=47.12067413330078
production_forward2 grad[17] vs paper_forward: mean_abs=0.8182125091552734, max_abs=3.5, mean_rel=0.18433313071727753, max_rel=56.719966888427734, norm_rel=0.023760149255394936, ref_abs_avg=35.27251434326172, test_abs_avg=35.28501892089844
production_forward2 grad[18] vs paper_forward: mean_abs=1.0573947429656982, max_abs=7.0, mean_rel=0.16508415341377258, max_rel=2276.905029296875, norm_rel=0.023482657968997955, ref_abs_avg=45.22315216064453, test_abs_avg=45.227081298828125
production_forward2 grad[19] vs paper_forward: mean_abs=1.0216209888458252, max_abs=7.125, mean_rel=0.1581924706697464, max_rel=1097.400146484375, norm_rel=0.02313418872654438, ref_abs_avg=44.43598937988281, test_abs_avg=44.443946838378906
production_forward2 grad[20] vs paper_forward: mean_abs=0.7886714935302734, max_abs=3.5, mean_rel=0.29679402709007263, max_rel=76.22647094726562, norm_rel=0.021956108510494232, ref_abs_avg=35.95956039428711, test_abs_avg=35.980857849121094
production_forward2 grad[21] vs paper_forward: mean_abs=1.000573992729187, max_abs=7.0, mean_rel=0.1656365990638733, max_rel=2252.29296875, norm_rel=0.023289158940315247, ref_abs_avg=43.181480407714844, test_abs_avg=43.18423843383789
production_forward2 grad[22] vs paper_forward: mean_abs=0.9726550579071045, max_abs=5.5, mean_rel=0.16733801364898682, max_rel=1375.2860107421875, norm_rel=0.023120485246181488, ref_abs_avg=42.261539459228516, test_abs_avg=42.26739501953125
production_forward2 grad[23] vs paper_forward: mean_abs=0.778968334197998, max_abs=3.0, mean_rel=0.12617284059524536, max_rel=21.09636688232422, norm_rel=0.02332163229584694, ref_abs_avg=33.8761100769043, test_abs_avg=33.85091781616211
production_forward2 grad[24] vs paper_forward: mean_abs=0.9486657977104187, max_abs=6.0, mean_rel=0.1604825258255005, max_rel=1649.7626953125, norm_rel=0.023208050057291985, ref_abs_avg=41.065887451171875, test_abs_avg=41.07221984863281
production_forward2 grad[25] vs paper_forward: mean_abs=0.9295334219932556, max_abs=5.375, mean_rel=0.14948120713233948, max_rel=988.8310546875, norm_rel=0.022916538640856743, ref_abs_avg=40.71256637573242, test_abs_avg=40.71446228027344
production_forward2 grad[26] vs paper_forward: mean_abs=0.8970298767089844, max_abs=3.0, mean_rel=0.07033975422382355, max_rel=1.5296040773391724, norm_rel=0.02414686046540737, ref_abs_avg=37.00225067138672, test_abs_avg=37.05279541015625
production_forward2 grad[27] vs paper_forward: mean_abs=1.109543800354004, max_abs=7.25, mean_rel=0.1703532636165619, max_rel=1187.0380859375, norm_rel=0.024797165766358376, ref_abs_avg=44.903865814208984, test_abs_avg=44.90058898925781
production_forward2 grad[28] vs paper_forward: mean_abs=1.0886139869689941, max_abs=8.0, mean_rel=0.15365874767303467, max_rel=1645.48046875, norm_rel=0.02432314120233059, ref_abs_avg=44.98918151855469, test_abs_avg=44.98219680786133
production_forward2 grad[29] vs paper_forward: mean_abs=0.8964157104492188, max_abs=3.75, mean_rel=0.11520421504974365, max_rel=16.245038986206055, norm_rel=0.02671283297240734, ref_abs_avg=33.78529357910156, test_abs_avg=33.80224609375
production_forward2 grad[30] vs paper_forward: mean_abs=1.0270717144012451, max_abs=7.0, mean_rel=0.16991624236106873, max_rel=1549.4677734375, norm_rel=0.025143707171082497, ref_abs_avg=40.97466278076172, test_abs_avg=40.97598648071289
production_forward2 grad[31] vs paper_forward: mean_abs=1.0005509853363037, max_abs=7.0, mean_rel=0.16555583477020264, max_rel=925.6950073242188, norm_rel=0.024797866120934486, ref_abs_avg=40.57091522216797, test_abs_avg=40.571319580078125
production_forward2 grad[32] vs paper_forward: mean_abs=0.7468461990356445, max_abs=2.90625, mean_rel=0.1275666505098343, max_rel=14.762467384338379, norm_rel=0.023612666875123978, ref_abs_avg=30.866413116455078, test_abs_avg=30.862770080566406
production_forward2 grad[33] vs paper_forward: mean_abs=0.9477847814559937, max_abs=6.5, mean_rel=0.18260923027992249, max_rel=1362.483642578125, norm_rel=0.024980971589684486, ref_abs_avg=38.071495056152344, test_abs_avg=38.07384490966797
production_forward2 grad[34] vs paper_forward: mean_abs=0.9432083368301392, max_abs=6.125, mean_rel=0.16783463954925537, max_rel=1193.525146484375, norm_rel=0.025021184235811234, ref_abs_avg=37.81858825683594, test_abs_avg=37.81929016113281
production_forward2 grad[35] vs paper_forward: mean_abs=0.6928482055664062, max_abs=3.0625, mean_rel=0.07081849873065948, max_rel=2.295668125152588, norm_rel=0.02358190342783928, ref_abs_avg=29.77979850769043, test_abs_avg=29.757905960083008
production_forward2 grad[36] vs paper_forward: mean_abs=0.8921729922294617, max_abs=5.5625, mean_rel=0.16535328328609467, max_rel=1923.0465087890625, norm_rel=0.02469741925597191, ref_abs_avg=36.25749969482422, test_abs_avg=36.25896453857422
production_forward2 grad[37] vs paper_forward: mean_abs=0.879427969455719, max_abs=6.3125, mean_rel=0.16476111114025116, max_rel=1847.223876953125, norm_rel=0.024630310013890266, ref_abs_avg=35.81695556640625, test_abs_avg=35.818302154541016
production_forward2 grad[38] vs paper_forward: mean_abs=0.6651840209960938, max_abs=2.1875, mean_rel=0.07474178820848465, max_rel=5.067364692687988, norm_rel=0.022109467536211014, ref_abs_avg=29.83633041381836, test_abs_avg=29.858753204345703
production_forward2 grad[39] vs paper_forward: mean_abs=0.8468914031982422, max_abs=6.0, mean_rel=0.16774067282676697, max_rel=3449.857666015625, norm_rel=0.02445914037525654, ref_abs_avg=34.69029235839844, test_abs_avg=34.69196319580078
production_forward2 grad[40] vs paper_forward: mean_abs=0.8336002826690674, max_abs=5.0, mean_rel=0.17355936765670776, max_rel=1563.3475341796875, norm_rel=0.024246282875537872, ref_abs_avg=34.48514175415039, test_abs_avg=34.48381805419922
production_forward2 grad[41] vs paper_forward: mean_abs=0.6447830200195312, max_abs=2.625, mean_rel=0.05870372802019119, max_rel=1.5225639343261719, norm_rel=0.02443866617977619, ref_abs_avg=26.938446044921875, test_abs_avg=26.94519805908203
production_forward2 grad[42] vs paper_forward: mean_abs=0.7993409633636475, max_abs=5.0, mean_rel=0.15962067246437073, max_rel=1219.716064453125, norm_rel=0.024103758856654167, ref_abs_avg=33.20514678955078, test_abs_avg=33.205833435058594
production_forward2 grad[43] vs paper_forward: mean_abs=0.7888743877410889, max_abs=5.0, mean_rel=0.16981491446495056, max_rel=2335.49365234375, norm_rel=0.024039141833782196, ref_abs_avg=32.886253356933594, test_abs_avg=32.879520416259766
production_forward2 grad[44] vs paper_forward: mean_abs=0.6139178276062012, max_abs=2.375, mean_rel=0.09607372432947159, max_rel=7.281312942504883, norm_rel=0.02398490346968174, ref_abs_avg=25.886117935180664, test_abs_avg=25.880712509155273
production_forward2 grad[45] vs paper_forward: mean_abs=0.7561733722686768, max_abs=6.0, mean_rel=0.15251237154006958, max_rel=1571.114990234375, norm_rel=0.02390861324965954, ref_abs_avg=31.66620635986328, test_abs_avg=31.66762924194336
production_forward2 grad[46] vs paper_forward: mean_abs=0.7436445951461792, max_abs=5.5, mean_rel=0.17780287563800812, max_rel=2583.737548828125, norm_rel=0.0235687717795372, ref_abs_avg=31.598007202148438, test_abs_avg=31.601728439331055
production_forward2 grad[47] vs paper_forward: mean_abs=0.5649616718292236, max_abs=2.40625, mean_rel=0.09879224002361298, max_rel=17.67521095275879, norm_rel=0.023211507126688957, ref_abs_avg=24.687644958496094, test_abs_avg=24.66304588317871
production_forward2 grad[48] vs paper_forward: mean_abs=0.7201599478721619, max_abs=6.0, mean_rel=0.1642519235610962, max_rel=1382.92529296875, norm_rel=0.023624898865818977, ref_abs_avg=30.533193588256836, test_abs_avg=30.53461265563965
production_forward2 grad[49] vs paper_forward: mean_abs=0.7072117924690247, max_abs=5.0, mean_rel=0.16794747114181519, max_rel=1929.085693359375, norm_rel=0.02351406216621399, ref_abs_avg=30.146224975585938, test_abs_avg=30.13925552368164
production_forward2 grad[50] vs paper_forward: mean_abs=0.6357145309448242, max_abs=2.75, mean_rel=0.09605381637811661, max_rel=6.390836715698242, norm_rel=0.02522914484143257, ref_abs_avg=25.389251708984375, test_abs_avg=25.423259735107422
production_forward2 grad[51] vs paper_forward: mean_abs=0.8066693544387817, max_abs=5.25, mean_rel=0.16878804564476013, max_rel=1828.6868896484375, norm_rel=0.025138167664408684, ref_abs_avg=32.12438201904297, test_abs_avg=32.122764587402344
production_forward2 grad[52] vs paper_forward: mean_abs=0.7894741296768188, max_abs=5.625, mean_rel=0.16962939500808716, max_rel=1099.730712890625, norm_rel=0.025062352418899536, ref_abs_avg=31.645479202270508, test_abs_avg=31.639921188354492
production_forward2 grad[53] vs paper_forward: mean_abs=0.5938301086425781, max_abs=2.3984375, mean_rel=0.09718525409698486, max_rel=9.60739517211914, norm_rel=0.02443116158246994, ref_abs_avg=24.996501922607422, test_abs_avg=25.0174560546875
production_forward2 grad[54] vs paper_forward: mean_abs=0.7458124756813049, max_abs=5.0, mean_rel=0.15683649480342865, max_rel=894.2383422851562, norm_rel=0.024956217035651207, ref_abs_avg=29.957948684692383, test_abs_avg=29.95812225341797
production_forward2 grad[55] vs paper_forward: mean_abs=0.7321729063987732, max_abs=6.0, mean_rel=0.15528176724910736, max_rel=571.5340576171875, norm_rel=0.024640215560793877, ref_abs_avg=29.768470764160156, test_abs_avg=29.770153045654297
production_forward2 grad[56] vs paper_forward: mean_abs=0.5638148188591003, max_abs=2.5, mean_rel=0.09446676075458527, max_rel=3.788487672805786, norm_rel=0.024420486763119698, ref_abs_avg=22.924423217773438, test_abs_avg=22.896472930908203
production_forward2 grad[57] vs paper_forward: mean_abs=0.6894961595535278, max_abs=5.1875, mean_rel=0.15951231122016907, max_rel=642.5618896484375, norm_rel=0.024312591180205345, ref_abs_avg=28.367578506469727, test_abs_avg=28.366365432739258
production_forward2 grad[58] vs paper_forward: mean_abs=0.6806226968765259, max_abs=4.75, mean_rel=0.16432806849479675, max_rel=1020.3018188476562, norm_rel=0.02468903549015522, ref_abs_avg=27.697471618652344, test_abs_avg=27.69879913330078
production_forward2 grad[59] vs paper_forward: mean_abs=0.5427684783935547, max_abs=2.25, mean_rel=0.1579098254442215, max_rel=11.722126960754395, norm_rel=0.024364430457353592, ref_abs_avg=22.005126953125, test_abs_avg=22.004077911376953
production_forward2 grad[60] vs paper_forward: mean_abs=0.6535781621932983, max_abs=4.5, mean_rel=0.16853106021881104, max_rel=1266.5015869140625, norm_rel=0.02393919788300991, ref_abs_avg=27.269729614257812, test_abs_avg=27.26620864868164
production_forward2 grad[61] vs paper_forward: mean_abs=0.6413395404815674, max_abs=5.0, mean_rel=0.171110600233078, max_rel=1097.7110595703125, norm_rel=0.02388583868741989, ref_abs_avg=26.889217376708984, test_abs_avg=26.888986587524414
production_forward2 grad[62] vs paper_forward: mean_abs=0.4886770248413086, max_abs=2.5, mean_rel=0.06999421119689941, max_rel=3.894758939743042, norm_rel=0.022004541009664536, ref_abs_avg=22.665374755859375, test_abs_avg=22.674026489257812
production_forward2 grad[63] vs paper_forward: mean_abs=0.6177982091903687, max_abs=4.5, mean_rel=0.1541995406150818, max_rel=1350.1153564453125, norm_rel=0.023608021438121796, ref_abs_avg=26.19078826904297, test_abs_avg=26.191017150878906
production_forward2 grad[64] vs paper_forward: mean_abs=0.6007854342460632, max_abs=4.25, mean_rel=0.15235665440559387, max_rel=1026.45947265625, norm_rel=0.023202750831842422, ref_abs_avg=25.907398223876953, test_abs_avg=25.907638549804688
production_forward2 grad[65] vs paper_forward: mean_abs=0.46146726608276367, max_abs=2.0, mean_rel=0.14221496880054474, max_rel=26.466604232788086, norm_rel=0.021845676004886627, ref_abs_avg=20.96725082397461, test_abs_avg=21.02041244506836
production_forward2 grad[66] vs paper_forward: mean_abs=0.5852770805358887, max_abs=4.0, mean_rel=0.14719915390014648, max_rel=799.9025268554688, norm_rel=0.023257793858647346, ref_abs_avg=25.098064422607422, test_abs_avg=25.099151611328125
production_forward2 grad[67] vs paper_forward: mean_abs=0.576421856880188, max_abs=4.5, mean_rel=0.14735496044158936, max_rel=584.6363525390625, norm_rel=0.023291153833270073, ref_abs_avg=24.791545867919922, test_abs_avg=24.786191940307617
production_forward2 grad[68] vs paper_forward: mean_abs=0.45067787170410156, max_abs=2.125, mean_rel=0.059994664043188095, max_rel=2.0663065910339355, norm_rel=0.0223859716206789, ref_abs_avg=20.585792541503906, test_abs_avg=20.574914932250977
production_forward2 grad[69] vs paper_forward: mean_abs=0.5519342422485352, max_abs=4.5, mean_rel=0.14831843972206116, max_rel=989.1842041015625, norm_rel=0.02272227592766285, ref_abs_avg=24.24660301208496, test_abs_avg=24.248003005981445
production_forward2 grad[70] vs paper_forward: mean_abs=0.5407085418701172, max_abs=3.5, mean_rel=0.1384207159280777, max_rel=456.625244140625, norm_rel=0.022624146193265915, ref_abs_avg=23.940311431884766, test_abs_avg=23.9365177154541
production_forward2 grad[71] vs paper_forward: mean_abs=0.40307825803756714, max_abs=1.9375, mean_rel=0.11209951341152191, max_rel=14.682086944580078, norm_rel=0.020313536748290062, ref_abs_avg=20.135738372802734, test_abs_avg=20.135143280029297
production_forward2 grad[72] vs paper_forward: mean_abs=0.5367685556411743, max_abs=4.6875, mean_rel=0.14577335119247437, max_rel=1263.85498046875, norm_rel=0.02212822623550892, ref_abs_avg=24.192882537841797, test_abs_avg=24.192974090576172
production_forward2 grad[73] vs paper_forward: mean_abs=0.5152804851531982, max_abs=4.0, mean_rel=0.1425454020500183, max_rel=1189.8106689453125, norm_rel=0.02177499048411846, ref_abs_avg=23.679296493530273, test_abs_avg=23.676212310791016
production_forward2 grad[74] vs paper_forward: mean_abs=0.4669966697692871, max_abs=2.0, mean_rel=0.12058969587087631, max_rel=6.5973052978515625, norm_rel=0.022379819303750992, ref_abs_avg=21.152355194091797, test_abs_avg=21.129396438598633
production_forward2 grad[75] vs paper_forward: mean_abs=0.6006461381912231, max_abs=4.5, mean_rel=0.15672919154167175, max_rel=1259.3310546875, norm_rel=0.02384950965642929, ref_abs_avg=25.1949462890625, test_abs_avg=25.192258834838867
production_forward2 grad[76] vs paper_forward: mean_abs=0.5902328491210938, max_abs=4.25, mean_rel=0.1586407721042633, max_rel=727.0499877929688, norm_rel=0.023626625537872314, ref_abs_avg=25.08181381225586, test_abs_avg=25.077281951904297
production_forward2 grad[77] vs paper_forward: mean_abs=0.45465636253356934, max_abs=1.7109375, mean_rel=0.07145147025585175, max_rel=2.7953548431396484, norm_rel=0.021748701110482216, ref_abs_avg=20.36173439025879, test_abs_avg=20.3275146484375
production_forward2 grad[78] vs paper_forward: mean_abs=0.5549238920211792, max_abs=4.75, mean_rel=0.14508000016212463, max_rel=652.8556518554688, norm_rel=0.02311667986214161, ref_abs_avg=24.00165367126465, test_abs_avg=23.999958038330078
production_forward2 grad[79] vs paper_forward: mean_abs=0.5327338576316833, max_abs=3.5, mean_rel=0.15099604427814484, max_rel=1298.9971923828125, norm_rel=0.022643113508820534, ref_abs_avg=23.482269287109375, test_abs_avg=23.48227882385254
production_forward2 grad[80] vs paper_forward: mean_abs=0.4177837371826172, max_abs=1.5, mean_rel=0.08991952240467072, max_rel=13.263121604919434, norm_rel=0.021655458956956863, ref_abs_avg=19.877418518066406, test_abs_avg=19.845972061157227
production_forward2 grad[81] vs paper_forward: mean_abs=0.5135053396224976, max_abs=5.5, mean_rel=0.1475948542356491, max_rel=1072.6392822265625, norm_rel=0.02256937511265278, ref_abs_avg=22.714664459228516, test_abs_avg=22.71441078186035
production_forward2 grad[82] vs paper_forward: mean_abs=0.49655553698539734, max_abs=4.0, mean_rel=0.14775589108467102, max_rel=770.1543579101562, norm_rel=0.022194532677531242, ref_abs_avg=22.41864013671875, test_abs_avg=22.41376304626465
production_forward2 grad[83] vs paper_forward: mean_abs=0.36503827571868896, max_abs=1.5, mean_rel=0.07257820665836334, max_rel=4.441230297088623, norm_rel=0.021255940198898315, ref_abs_avg=17.5839786529541, test_abs_avg=17.609359741210938
production_forward2 grad[84] vs paper_forward: mean_abs=0.47303658723831177, max_abs=4.0, mean_rel=0.14199745655059814, max_rel=903.2481079101562, norm_rel=0.022103816270828247, ref_abs_avg=21.412174224853516, test_abs_avg=21.41330337524414
production_forward2 grad[85] vs paper_forward: mean_abs=0.4689216911792755, max_abs=4.25, mean_rel=0.14905886352062225, max_rel=736.1781616210938, norm_rel=0.022069450467824936, ref_abs_avg=21.319000244140625, test_abs_avg=21.318485260009766
production_forward2 grad[86] vs paper_forward: mean_abs=0.3516296148300171, max_abs=1.5, mean_rel=0.14636561274528503, max_rel=30.83127784729004, norm_rel=0.021186258643865585, ref_abs_avg=16.809396743774414, test_abs_avg=16.82614517211914
production_forward2 grad[87] vs paper_forward: mean_abs=0.44703397154808044, max_abs=4.5, mean_rel=0.13931217789649963, max_rel=905.8995971679688, norm_rel=0.021560419350862503, ref_abs_avg=20.79501724243164, test_abs_avg=20.794204711914062
production_forward2 grad[88] vs paper_forward: mean_abs=0.4409808814525604, max_abs=3.40625, mean_rel=0.1338822990655899, max_rel=819.6079711914062, norm_rel=0.02165408246219158, ref_abs_avg=20.52288055419922, test_abs_avg=20.510499954223633
production_forward2 grad[89] vs paper_forward: mean_abs=0.33086204528808594, max_abs=1.375, mean_rel=0.09836895763874054, max_rel=13.616364479064941, norm_rel=0.020244376733899117, ref_abs_avg=16.637470245361328, test_abs_avg=16.63574981689453
production_forward2 grad[90] vs paper_forward: mean_abs=0.4174923896789551, max_abs=3.875, mean_rel=0.13955239951610565, max_rel=695.8676147460938, norm_rel=0.02108314074575901, ref_abs_avg=19.92449378967285, test_abs_avg=19.92510223388672
production_forward2 grad[91] vs paper_forward: mean_abs=0.4086371064186096, max_abs=3.75, mean_rel=0.13283056020736694, max_rel=484.2722473144531, norm_rel=0.02055468037724495, ref_abs_avg=20.103404998779297, test_abs_avg=20.09326934814453
production_forward2 grad[92] vs paper_forward: mean_abs=0.33281153440475464, max_abs=1.5, mean_rel=0.07603680342435837, max_rel=2.2932238578796387, norm_rel=0.020657604560256004, ref_abs_avg=16.562664031982422, test_abs_avg=16.575410842895508
production_forward2 grad[93] vs paper_forward: mean_abs=0.3970414400100708, max_abs=5.5, mean_rel=0.1309959888458252, max_rel=1193.5704345703125, norm_rel=0.020463930442929268, ref_abs_avg=19.594707489013672, test_abs_avg=19.593374252319336
production_forward2 grad[94] vs paper_forward: mean_abs=0.3898237943649292, max_abs=3.0, mean_rel=0.12149650603532791, max_rel=431.28887939453125, norm_rel=0.020773589611053467, ref_abs_avg=18.970748901367188, test_abs_avg=18.96468734741211
production_forward2 grad[95] vs paper_forward: mean_abs=0.33864083886146545, max_abs=1.25, mean_rel=0.1817435920238495, max_rel=21.62746810913086, norm_rel=0.02119431458413601, ref_abs_avg=15.654851913452148, test_abs_avg=15.662538528442383
production_forward2 grad[96] vs paper_forward: mean_abs=0.382932186126709, max_abs=4.25, mean_rel=0.12083069235086441, max_rel=512.5667724609375, norm_rel=0.02003340795636177, ref_abs_avg=19.43062973022461, test_abs_avg=19.429391860961914
production_forward2 grad[97] vs paper_forward: mean_abs=0.3619166612625122, max_abs=3.5, mean_rel=0.12130562961101532, max_rel=610.9013671875, norm_rel=0.019170353189110756, ref_abs_avg=19.211219787597656, test_abs_avg=19.216392517089844
identity layers + randn queries
mean abs randn paper: 0.2158203125
production_forward2 fwd+bwd:  224.328 ms
production_forward2 fwd-only: 22.343 ms
production_forward2 bwd-only: 202.144 ms
production_forward2 peak allocated: fwd=2.692 GiB, fwd+bwd=6.071 GiB
production_forward2 peak reserved:  fwd=3.057 GiB, fwd+bwd=8.807 GiB
mean abs difference randn: 0.00164031982421875
mean relative difference randn: 0.0291748046875
paper_forward fwd+bwd:  379.590 ms
paper_forward fwd-only: 85.753 ms
paper_forward bwd-only: 293.997 ms
paper_forward peak allocated: fwd=31.080 GiB, fwd+bwd=33.199 GiB
paper_forward peak reserved:  fwd=31.100 GiB, fwd+bwd=33.850 GiB
mean abs difference randn: 3.7670135498046875e-05
mean relative difference randn: 0.00066375732421875
torch_compile_phases_forward fwd+bwd:  189.888 ms
torch_compile_phases_forward fwd-only: 36.577 ms
torch_compile_phases_forward bwd-only: 152.583 ms
torch_compile_phases_forward peak allocated: fwd=65.564 GiB, fwd+bwd=66.191 GiB
torch_compile_phases_forward peak reserved:  fwd=66.143 GiB, fwd+bwd=70.393 GiB
mean abs difference randn: 0.00164794921875
mean relative difference randn: 0.0291748046875
production_forward fwd+bwd:  112.010 ms
production_forward fwd-only: 20.500 ms
production_forward bwd-only: 91.651 ms
production_forward peak allocated: fwd=37.425 GiB, fwd+bwd=41.305 GiB
production_forward peak reserved:  fwd=39.227 GiB, fwd+bwd=42.227 GiB
mean abs difference randn: 0.00164031982421875
mean relative difference randn: 0.0291748046875

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016438555903732777, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.00867496244609356, max_abs=0.4375, mean_rel=0.07544642686843872, max_rel=90.99259185791016, norm_rel=0.020622195675969124, ref_abs_avg=0.4534105062484741, test_abs_avg=0.4534161388874054
production_forward grad[1] vs paper_forward: mean_abs=7.496057510375977, max_abs=48.0, mean_rel=0.2012290507555008, max_rel=931.21044921875, norm_rel=0.020750489085912704, ref_abs_avg=316.8620910644531, test_abs_avg=316.8612365722656
production_forward grad[2] vs paper_forward: mean_abs=1.2460594177246094, max_abs=4.625, mean_rel=0.10257428884506226, max_rel=7.812499523162842, norm_rel=0.022918038070201874, ref_abs_avg=53.99516677856445, test_abs_avg=54.024925231933594
production_forward grad[3] vs paper_forward: mean_abs=1.6020240783691406, max_abs=11.0, mean_rel=0.16393961012363434, max_rel=2033.9190673828125, norm_rel=0.024485906586050987, ref_abs_avg=65.772705078125, test_abs_avg=65.77174377441406
production_forward grad[4] vs paper_forward: mean_abs=1.5658392906188965, max_abs=10.0, mean_rel=0.17885878682136536, max_rel=2322.70849609375, norm_rel=0.024201394990086555, ref_abs_avg=65.07916259765625, test_abs_avg=65.0889892578125
production_forward grad[5] vs paper_forward: mean_abs=1.1149883270263672, max_abs=4.78125, mean_rel=0.11330518126487732, max_rel=7.257483005523682, norm_rel=0.023766668513417244, ref_abs_avg=46.73762512207031, test_abs_avg=46.763572692871094
production_forward grad[6] vs paper_forward: mean_abs=1.3832695484161377, max_abs=8.25, mean_rel=0.16675886511802673, max_rel=1381.2598876953125, norm_rel=0.02412310801446438, ref_abs_avg=57.606529235839844, test_abs_avg=57.6065788269043
production_forward grad[7] vs paper_forward: mean_abs=1.3575475215911865, max_abs=9.5, mean_rel=0.1549789309501648, max_rel=827.4277954101562, norm_rel=0.02402079850435257, ref_abs_avg=56.751468658447266, test_abs_avg=56.74929428100586
production_forward grad[8] vs paper_forward: mean_abs=1.0135092735290527, max_abs=3.625, mean_rel=0.07398407906293869, max_rel=2.372122287750244, norm_rel=0.023205308243632317, ref_abs_avg=43.7595329284668, test_abs_avg=43.834144592285156
production_forward grad[9] vs paper_forward: mean_abs=1.2471709251403809, max_abs=8.0, mean_rel=0.17076990008354187, max_rel=2061.994384765625, norm_rel=0.023922082036733627, ref_abs_avg=52.38103485107422, test_abs_avg=52.382598876953125
production_forward grad[10] vs paper_forward: mean_abs=1.2103054523468018, max_abs=7.5, mean_rel=0.18497662246227264, max_rel=2217.383056640625, norm_rel=0.023573867976665497, ref_abs_avg=51.592018127441406, test_abs_avg=51.601348876953125
production_forward grad[11] vs paper_forward: mean_abs=0.926414966583252, max_abs=4.5625, mean_rel=0.22780434787273407, max_rel=67.78568267822266, norm_rel=0.023298025131225586, ref_abs_avg=40.87166976928711, test_abs_avg=40.87923049926758
production_forward grad[12] vs paper_forward: mean_abs=1.1533772945404053, max_abs=7.5, mean_rel=0.14989301562309265, max_rel=1150.51708984375, norm_rel=0.023662250488996506, ref_abs_avg=49.00951385498047, test_abs_avg=49.012760162353516
production_forward grad[13] vs paper_forward: mean_abs=1.1291316747665405, max_abs=7.0, mean_rel=0.15973232686519623, max_rel=1086.26806640625, norm_rel=0.023547742515802383, ref_abs_avg=48.160552978515625, test_abs_avg=48.163551330566406
production_forward grad[14] vs paper_forward: mean_abs=0.8392572402954102, max_abs=4.0, mean_rel=0.1120963841676712, max_rel=11.251666069030762, norm_rel=0.021846871823072433, ref_abs_avg=38.350425720214844, test_abs_avg=38.308265686035156
production_forward grad[15] vs paper_forward: mean_abs=1.083276629447937, max_abs=7.0, mean_rel=0.14839163422584534, max_rel=1250.1690673828125, norm_rel=0.02344980277121067, ref_abs_avg=46.41209411621094, test_abs_avg=46.41475296020508
production_forward grad[16] vs paper_forward: mean_abs=1.0491244792938232, max_abs=6.0, mean_rel=0.15892472863197327, max_rel=1397.0321044921875, norm_rel=0.02316531352698803, ref_abs_avg=45.52634048461914, test_abs_avg=45.52931213378906
production_forward grad[17] vs paper_forward: mean_abs=0.8219280242919922, max_abs=3.5, mean_rel=0.09955503791570663, max_rel=11.215333938598633, norm_rel=0.02248517982661724, ref_abs_avg=36.803096771240234, test_abs_avg=36.790016174316406
production_forward grad[18] vs paper_forward: mean_abs=1.0167876482009888, max_abs=6.375, mean_rel=0.15826576948165894, max_rel=1178.3846435546875, norm_rel=0.0232998039573431, ref_abs_avg=43.816856384277344, test_abs_avg=43.81719207763672
production_forward grad[19] vs paper_forward: mean_abs=0.991675615310669, max_abs=6.0, mean_rel=0.14747902750968933, max_rel=537.7363891601562, norm_rel=0.02292029559612274, ref_abs_avg=43.46126174926758, test_abs_avg=43.462406158447266
production_forward grad[20] vs paper_forward: mean_abs=0.7352908849716187, max_abs=3.125, mean_rel=1.1357004642486572, max_rel=551.436767578125, norm_rel=0.022205039858818054, ref_abs_avg=33.90388488769531, test_abs_avg=33.942657470703125
production_forward grad[21] vs paper_forward: mean_abs=0.9575861692428589, max_abs=6.5, mean_rel=0.15666724741458893, max_rel=2053.12841796875, norm_rel=0.02314065955579281, ref_abs_avg=41.60504150390625, test_abs_avg=41.605918884277344
production_forward grad[22] vs paper_forward: mean_abs=0.9385092258453369, max_abs=6.0, mean_rel=0.1619918942451477, max_rel=1152.264404296875, norm_rel=0.02294212207198143, ref_abs_avg=41.09327697753906, test_abs_avg=41.09093475341797
production_forward grad[23] vs paper_forward: mean_abs=0.7222149968147278, max_abs=3.0, mean_rel=0.07236892729997635, max_rel=2.9450414180755615, norm_rel=0.021386580541729927, ref_abs_avg=33.610252380371094, test_abs_avg=33.613807678222656
production_forward grad[24] vs paper_forward: mean_abs=0.913329005241394, max_abs=5.625, mean_rel=0.15475168824195862, max_rel=895.3278198242188, norm_rel=0.023012220859527588, ref_abs_avg=39.85179901123047, test_abs_avg=39.85422897338867
production_forward grad[25] vs paper_forward: mean_abs=0.8952617645263672, max_abs=5.0, mean_rel=0.15382999181747437, max_rel=1260.2518310546875, norm_rel=0.022683685645461082, ref_abs_avg=39.645572662353516, test_abs_avg=39.64372253417969
production_forward grad[26] vs paper_forward: mean_abs=0.872889518737793, max_abs=4.25, mean_rel=0.10622741281986237, max_rel=7.860198974609375, norm_rel=0.02464757114648819, ref_abs_avg=35.232234954833984, test_abs_avg=35.247920989990234
production_forward grad[27] vs paper_forward: mean_abs=1.0497729778289795, max_abs=7.0, mean_rel=0.17779241502285004, max_rel=1572.88330078125, norm_rel=0.024844609200954437, ref_abs_avg=42.418556213378906, test_abs_avg=42.42280578613281
production_forward grad[28] vs paper_forward: mean_abs=1.033963918685913, max_abs=6.5, mean_rel=0.17687931656837463, max_rel=1543.4122314453125, norm_rel=0.024880489334464073, ref_abs_avg=41.757354736328125, test_abs_avg=41.76802444458008
production_forward grad[29] vs paper_forward: mean_abs=0.7786316871643066, max_abs=3.0, mean_rel=0.10170871019363403, max_rel=5.204470634460449, norm_rel=0.025490913540124893, ref_abs_avg=31.271564483642578, test_abs_avg=31.278776168823242
production_forward grad[30] vs paper_forward: mean_abs=0.9797378778457642, max_abs=6.5, mean_rel=0.16692256927490234, max_rel=974.8019409179688, norm_rel=0.025085192173719406, ref_abs_avg=39.1629638671875, test_abs_avg=39.16162872314453
production_forward grad[31] vs paper_forward: mean_abs=0.9623599052429199, max_abs=6.1630859375, mean_rel=0.16500741243362427, max_rel=836.27197265625, norm_rel=0.024927815422415733, ref_abs_avg=38.694114685058594, test_abs_avg=38.699058532714844
production_forward grad[32] vs paper_forward: mean_abs=0.765352725982666, max_abs=3.5, mean_rel=0.18747396767139435, max_rel=56.979862213134766, norm_rel=0.025151550769805908, ref_abs_avg=30.195205688476562, test_abs_avg=30.239255905151367
production_forward grad[33] vs paper_forward: mean_abs=0.9155340790748596, max_abs=6.0, mean_rel=0.16587629914283752, max_rel=2149.13134765625, norm_rel=0.024913830682635307, ref_abs_avg=36.86464309692383, test_abs_avg=36.86772155761719
production_forward grad[34] vs paper_forward: mean_abs=0.9003733396530151, max_abs=5.375, mean_rel=0.16399702429771423, max_rel=989.43115234375, norm_rel=0.02477240189909935, ref_abs_avg=36.529319763183594, test_abs_avg=36.528629302978516
production_forward grad[35] vs paper_forward: mean_abs=0.6962299346923828, max_abs=2.5, mean_rel=0.20044271647930145, max_rel=32.6062126159668, norm_rel=0.024138856679201126, ref_abs_avg=28.74938201904297, test_abs_avg=28.833984375
production_forward grad[36] vs paper_forward: mean_abs=0.8522543907165527, max_abs=5.140625, mean_rel=0.1684378683567047, max_rel=1174.788818359375, norm_rel=0.024709464982151985, ref_abs_avg=34.6072998046875, test_abs_avg=34.60980987548828
production_forward grad[37] vs paper_forward: mean_abs=0.8409488201141357, max_abs=5.25, mean_rel=0.1718006134033203, max_rel=1437.56787109375, norm_rel=0.02476475201547146, ref_abs_avg=34.08018493652344, test_abs_avg=34.08325958251953
production_forward grad[38] vs paper_forward: mean_abs=0.6613686084747314, max_abs=2.5, mean_rel=0.18211150169372559, max_rel=41.76176071166992, norm_rel=0.024890275672078133, ref_abs_avg=26.351638793945312, test_abs_avg=26.372512817382812
production_forward grad[39] vs paper_forward: mean_abs=0.8085692524909973, max_abs=5.0, mean_rel=0.16738715767860413, max_rel=2076.125, norm_rel=0.024424128234386444, ref_abs_avg=33.20873260498047, test_abs_avg=33.21080017089844
production_forward grad[40] vs paper_forward: mean_abs=0.7940371036529541, max_abs=4.75, mean_rel=0.1479756385087967, max_rel=584.8052978515625, norm_rel=0.024295352399349213, ref_abs_avg=32.743255615234375, test_abs_avg=32.737632751464844
production_forward grad[41] vs paper_forward: mean_abs=0.642756462097168, max_abs=2.625, mean_rel=0.09521861374378204, max_rel=4.215912342071533, norm_rel=0.025041956454515457, ref_abs_avg=25.427825927734375, test_abs_avg=25.393253326416016
production_forward grad[42] vs paper_forward: mean_abs=0.7675815224647522, max_abs=5.0, mean_rel=0.16064313054084778, max_rel=1323.577392578125, norm_rel=0.0241236612200737, ref_abs_avg=31.880401611328125, test_abs_avg=31.883872985839844
production_forward grad[43] vs paper_forward: mean_abs=0.7543596029281616, max_abs=6.3125, mean_rel=0.17525801062583923, max_rel=2079.375732421875, norm_rel=0.024175923317670822, ref_abs_avg=31.241113662719727, test_abs_avg=31.251209259033203
production_forward grad[44] vs paper_forward: mean_abs=0.5922880172729492, max_abs=2.375, mean_rel=0.09215554594993591, max_rel=2.796576738357544, norm_rel=0.026240499690175056, ref_abs_avg=22.847549438476562, test_abs_avg=22.87798309326172
production_forward grad[45] vs paper_forward: mean_abs=0.7360854148864746, max_abs=5.25, mean_rel=0.1628224104642868, max_rel=1660.6744384765625, norm_rel=0.023862242698669434, ref_abs_avg=30.863243103027344, test_abs_avg=30.861902236938477
production_forward grad[46] vs paper_forward: mean_abs=0.7225332260131836, max_abs=4.75, mean_rel=0.14371389150619507, max_rel=724.8796997070312, norm_rel=0.02410013973712921, ref_abs_avg=30.067012786865234, test_abs_avg=30.064546585083008
production_forward grad[47] vs paper_forward: mean_abs=0.5419392585754395, max_abs=2.1640625, mean_rel=0.12601163983345032, max_rel=21.03940773010254, norm_rel=0.023717518895864487, ref_abs_avg=23.13817024230957, test_abs_avg=23.078868865966797
production_forward grad[48] vs paper_forward: mean_abs=0.6944867372512817, max_abs=4.5, mean_rel=0.15563732385635376, max_rel=1495.5467529296875, norm_rel=0.023847872391343117, ref_abs_avg=29.204204559326172, test_abs_avg=29.20419692993164
production_forward grad[49] vs paper_forward: mean_abs=0.6866248846054077, max_abs=5.0, mean_rel=0.15741455554962158, max_rel=942.5625, norm_rel=0.02372106909751892, ref_abs_avg=29.003843307495117, test_abs_avg=29.00588607788086
production_forward grad[50] vs paper_forward: mean_abs=0.6273157596588135, max_abs=2.8125, mean_rel=0.09771563857793808, max_rel=2.9273908138275146, norm_rel=0.026123974472284317, ref_abs_avg=23.922178268432617, test_abs_avg=23.915491104125977
production_forward grad[51] vs paper_forward: mean_abs=0.7824016809463501, max_abs=5.03125, mean_rel=0.17354154586791992, max_rel=1520.00244140625, norm_rel=0.02571956254541874, ref_abs_avg=30.51802635192871, test_abs_avg=30.51569366455078
production_forward grad[52] vs paper_forward: mean_abs=0.7630529403686523, max_abs=5.0, mean_rel=0.15106770396232605, max_rel=1472.643310546875, norm_rel=0.025449134409427643, ref_abs_avg=30.038925170898438, test_abs_avg=30.02634048461914
production_forward grad[53] vs paper_forward: mean_abs=0.5716152191162109, max_abs=2.5, mean_rel=0.07830866426229477, max_rel=3.4814772605895996, norm_rel=0.023908449336886406, ref_abs_avg=24.265478134155273, test_abs_avg=24.232009887695312
production_forward grad[54] vs paper_forward: mean_abs=0.7183570861816406, max_abs=5.25, mean_rel=0.1653524935245514, max_rel=1328.4007568359375, norm_rel=0.025156836956739426, ref_abs_avg=28.59386444091797, test_abs_avg=28.593063354492188
production_forward grad[55] vs paper_forward: mean_abs=0.701617956161499, max_abs=4.6484375, mean_rel=0.16971397399902344, max_rel=824.2982177734375, norm_rel=0.025235185399651527, ref_abs_avg=27.917760848999023, test_abs_avg=27.9237003326416
production_forward grad[56] vs paper_forward: mean_abs=0.5300146341323853, max_abs=2.25, mean_rel=0.08720055222511292, max_rel=5.549718379974365, norm_rel=0.024536261335015297, ref_abs_avg=22.843719482421875, test_abs_avg=22.846372604370117
production_forward grad[57] vs paper_forward: mean_abs=0.6657752990722656, max_abs=5.0, mean_rel=0.1635405272245407, max_rel=1151.5908203125, norm_rel=0.024553166702389717, ref_abs_avg=27.116634368896484, test_abs_avg=27.116024017333984
production_forward grad[58] vs paper_forward: mean_abs=0.6561222076416016, max_abs=4.5, mean_rel=0.15567412972450256, max_rel=913.6448974609375, norm_rel=0.02440526895225048, ref_abs_avg=26.94525146484375, test_abs_avg=26.940494537353516
production_forward grad[59] vs paper_forward: mean_abs=0.509243369102478, max_abs=2.74609375, mean_rel=0.13306766748428345, max_rel=14.27237319946289, norm_rel=0.02638309635221958, ref_abs_avg=19.37409019470215, test_abs_avg=19.33827018737793
production_forward grad[60] vs paper_forward: mean_abs=0.6263239979743958, max_abs=4.5, mean_rel=0.16126291453838348, max_rel=1254.3978271484375, norm_rel=0.02431393414735794, ref_abs_avg=25.806875228881836, test_abs_avg=25.806562423706055
production_forward grad[61] vs paper_forward: mean_abs=0.6135972738265991, max_abs=4.0, mean_rel=0.1642848402261734, max_rel=1036.93603515625, norm_rel=0.023892471566796303, ref_abs_avg=25.67983055114746, test_abs_avg=25.679611206054688
production_forward grad[62] vs paper_forward: mean_abs=0.47589945793151855, max_abs=2.0625, mean_rel=0.17768530547618866, max_rel=32.39435577392578, norm_rel=0.024965893477201462, ref_abs_avg=19.432334899902344, test_abs_avg=19.39617919921875
production_forward grad[63] vs paper_forward: mean_abs=0.5934547781944275, max_abs=3.9375, mean_rel=0.15540188550949097, max_rel=1061.41796875, norm_rel=0.02382880449295044, ref_abs_avg=24.935592651367188, test_abs_avg=24.93560218811035
production_forward grad[64] vs paper_forward: mean_abs=0.5794906616210938, max_abs=4.0, mean_rel=0.16247397661209106, max_rel=611.969970703125, norm_rel=0.023567406460642815, ref_abs_avg=24.602170944213867, test_abs_avg=24.606830596923828
production_forward grad[65] vs paper_forward: mean_abs=0.4440838694572449, max_abs=1.90625, mean_rel=0.14307400584220886, max_rel=30.270584106445312, norm_rel=0.022240404039621353, ref_abs_avg=20.219223022460938, test_abs_avg=20.208595275878906
production_forward grad[66] vs paper_forward: mean_abs=0.5568095445632935, max_abs=3.75, mean_rel=0.15272678434848785, max_rel=953.669189453125, norm_rel=0.02329259365797043, ref_abs_avg=23.908180236816406, test_abs_avg=23.908710479736328
production_forward grad[67] vs paper_forward: mean_abs=0.5409207344055176, max_abs=3.75, mean_rel=0.14980007708072662, max_rel=1516.819091796875, norm_rel=0.022758327424526215, ref_abs_avg=23.78460693359375, test_abs_avg=23.78366470336914
production_forward grad[68] vs paper_forward: mean_abs=0.4329357147216797, max_abs=2.220703125, mean_rel=0.13839852809906006, max_rel=18.57450294494629, norm_rel=0.023048967123031616, ref_abs_avg=19.474403381347656, test_abs_avg=19.47056770324707
production_forward grad[69] vs paper_forward: mean_abs=0.5283178687095642, max_abs=3.5, mean_rel=0.1458868533372879, max_rel=924.88720703125, norm_rel=0.022823955863714218, ref_abs_avg=23.141183853149414, test_abs_avg=23.141508102416992
production_forward grad[70] vs paper_forward: mean_abs=0.5193247199058533, max_abs=3.75, mean_rel=0.14076662063598633, max_rel=905.2577514648438, norm_rel=0.022729281336069107, ref_abs_avg=22.86351203918457, test_abs_avg=22.85995101928711
production_forward grad[71] vs paper_forward: mean_abs=0.4186316430568695, max_abs=1.5625, mean_rel=0.4001833498477936, max_rel=76.53367614746094, norm_rel=0.023074675351381302, ref_abs_avg=17.851730346679688, test_abs_avg=17.841228485107422
production_forward grad[72] vs paper_forward: mean_abs=0.5051048398017883, max_abs=4.0625, mean_rel=0.15019828081130981, max_rel=822.2606811523438, norm_rel=0.022575465962290764, ref_abs_avg=22.36838722229004, test_abs_avg=22.37007713317871
production_forward grad[73] vs paper_forward: mean_abs=0.4902344048023224, max_abs=3.875, mean_rel=0.14664050936698914, max_rel=707.4822998046875, norm_rel=0.022260773926973343, ref_abs_avg=22.01224136352539, test_abs_avg=22.009010314941406
production_forward grad[74] vs paper_forward: mean_abs=0.4622560739517212, max_abs=2.0, mean_rel=0.11287759244441986, max_rel=7.834219455718994, norm_rel=0.025082994252443314, ref_abs_avg=18.95456314086914, test_abs_avg=18.957246780395508
production_forward grad[75] vs paper_forward: mean_abs=0.5611714124679565, max_abs=5.0, mean_rel=0.1533259153366089, max_rel=957.619384765625, norm_rel=0.023753155022859573, ref_abs_avg=23.63210678100586, test_abs_avg=23.63255500793457
production_forward grad[76] vs paper_forward: mean_abs=0.5543466210365295, max_abs=4.5, mean_rel=0.14541296660900116, max_rel=483.51806640625, norm_rel=0.023616762831807137, ref_abs_avg=23.500316619873047, test_abs_avg=23.506237030029297
production_forward grad[77] vs paper_forward: mean_abs=0.39281535148620605, max_abs=1.8125, mean_rel=0.08391959965229034, max_rel=7.188086986541748, norm_rel=0.02082391455769539, ref_abs_avg=19.49745750427246, test_abs_avg=19.51715087890625
production_forward grad[78] vs paper_forward: mean_abs=0.52580726146698, max_abs=5.0, mean_rel=0.1449701488018036, max_rel=1228.3609619140625, norm_rel=0.023110171779990196, ref_abs_avg=22.74782943725586, test_abs_avg=22.745616912841797
production_forward grad[79] vs paper_forward: mean_abs=0.5074856281280518, max_abs=5.0, mean_rel=0.1489746868610382, max_rel=625.5678100585938, norm_rel=0.022985871881246567, ref_abs_avg=22.233036041259766, test_abs_avg=22.23014259338379
production_forward grad[80] vs paper_forward: mean_abs=0.385811448097229, max_abs=1.6875, mean_rel=0.08557987213134766, max_rel=4.799301624298096, norm_rel=0.022196870297193527, ref_abs_avg=17.48955726623535, test_abs_avg=17.505287170410156
production_forward grad[81] vs paper_forward: mean_abs=0.48471930623054504, max_abs=4.25, mean_rel=0.15056315064430237, max_rel=1289.8594970703125, norm_rel=0.022454172372817993, ref_abs_avg=21.588407516479492, test_abs_avg=21.588733673095703
production_forward grad[82] vs paper_forward: mean_abs=0.4707908630371094, max_abs=4.75, mean_rel=0.14652910828590393, max_rel=1035.9345703125, norm_rel=0.02243144065141678, ref_abs_avg=21.04747200012207, test_abs_avg=21.04678726196289
production_forward grad[83] vs paper_forward: mean_abs=0.3554377555847168, max_abs=1.25, mean_rel=0.07041889429092407, max_rel=6.005882263183594, norm_rel=0.020316673442721367, ref_abs_avg=17.789915084838867, test_abs_avg=17.76556396484375
production_forward grad[84] vs paper_forward: mean_abs=0.44553935527801514, max_abs=4.0, mean_rel=0.1429583728313446, max_rel=1188.2823486328125, norm_rel=0.021892741322517395, ref_abs_avg=20.368284225463867, test_abs_avg=20.36811065673828
production_forward grad[85] vs paper_forward: mean_abs=0.4396742582321167, max_abs=4.125, mean_rel=0.13845577836036682, max_rel=545.2481079101562, norm_rel=0.021693985909223557, ref_abs_avg=20.355377197265625, test_abs_avg=20.357656478881836
production_forward grad[86] vs paper_forward: mean_abs=0.3508344292640686, max_abs=1.265625, mean_rel=0.18633121252059937, max_rel=40.980384826660156, norm_rel=0.021231507882475853, ref_abs_avg=16.751989364624023, test_abs_avg=16.732147216796875
production_forward grad[87] vs paper_forward: mean_abs=0.4258238673210144, max_abs=4.0, mean_rel=0.13262979686260223, max_rel=803.6228637695312, norm_rel=0.02149638906121254, ref_abs_avg=19.87241554260254, test_abs_avg=19.872554779052734
production_forward grad[88] vs paper_forward: mean_abs=0.4067755937576294, max_abs=4.0, mean_rel=0.12849026918411255, max_rel=668.233642578125, norm_rel=0.020923903211951256, ref_abs_avg=19.42223358154297, test_abs_avg=19.428268432617188
production_forward grad[89] vs paper_forward: mean_abs=0.32113122940063477, max_abs=1.125, mean_rel=0.12832337617874146, max_rel=21.823810577392578, norm_rel=0.01980394683778286, ref_abs_avg=16.068771362304688, test_abs_avg=16.086902618408203
production_forward grad[90] vs paper_forward: mean_abs=0.3986743688583374, max_abs=4.0, mean_rel=0.1308893859386444, max_rel=633.6300659179688, norm_rel=0.020717119798064232, ref_abs_avg=19.379024505615234, test_abs_avg=19.378795623779297
production_forward grad[91] vs paper_forward: mean_abs=0.3854680061340332, max_abs=3.75, mean_rel=0.127058744430542, max_rel=801.54296875, norm_rel=0.02079838141798973, ref_abs_avg=18.738975524902344, test_abs_avg=18.727705001831055
production_forward grad[92] vs paper_forward: mean_abs=0.32791829109191895, max_abs=1.25, mean_rel=0.1373235136270523, max_rel=21.810688018798828, norm_rel=0.021317679435014725, ref_abs_avg=15.711809158325195, test_abs_avg=15.71021842956543
production_forward grad[93] vs paper_forward: mean_abs=0.3777868449687958, max_abs=3.75, mean_rel=0.1303006112575531, max_rel=1025.6171875, norm_rel=0.020552638918161392, ref_abs_avg=18.57373046875, test_abs_avg=18.57126808166504
production_forward grad[94] vs paper_forward: mean_abs=0.36123043298721313, max_abs=3.90625, mean_rel=0.13201701641082764, max_rel=808.8929443359375, norm_rel=0.019751302897930145, ref_abs_avg=18.41680145263672, test_abs_avg=18.42163848876953
production_forward grad[95] vs paper_forward: mean_abs=0.27881669998168945, max_abs=1.625, mean_rel=0.11499209702014923, max_rel=9.322678565979004, norm_rel=0.018402373418211937, ref_abs_avg=15.228529930114746, test_abs_avg=15.21572494506836
production_forward grad[96] vs paper_forward: mean_abs=0.3505590260028839, max_abs=3.5, mean_rel=0.12168572843074799, max_rel=654.312255859375, norm_rel=0.019775988534092903, ref_abs_avg=18.00403594970703, test_abs_avg=18.00510597229004
production_forward grad[97] vs paper_forward: mean_abs=0.34111499786376953, max_abs=4.5, mean_rel=0.1158377155661583, max_rel=463.2244873046875, norm_rel=0.019328203052282333, ref_abs_avg=17.971115112304688, test_abs_avg=17.967222213745117
torch_compile_phases_forward vs paper_forward output: mean_abs=0.001647129887714982, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008692385628819466, max_abs=0.4765625, mean_rel=0.07553401589393616, max_rel=79.08068084716797, norm_rel=0.02066361904144287, ref_abs_avg=0.4534105062484741, test_abs_avg=0.453407883644104
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.44158411026001, max_abs=48.0, mean_rel=0.2301913946866989, max_rel=1271.57568359375, norm_rel=0.02059803530573845, ref_abs_avg=316.8620910644531, test_abs_avg=316.90093994140625
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.2740964889526367, max_abs=6.0, mean_rel=0.10057303309440613, max_rel=8.98129940032959, norm_rel=0.02384422905743122, ref_abs_avg=53.99516677856445, test_abs_avg=54.01372528076172
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6014535427093506, max_abs=11.0, mean_rel=0.15915164351463318, max_rel=1652.6619873046875, norm_rel=0.024462033063173294, ref_abs_avg=65.772705078125, test_abs_avg=65.7728042602539
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.575416088104248, max_abs=10.0, mean_rel=0.17332887649536133, max_rel=1601.402587890625, norm_rel=0.024320164695382118, ref_abs_avg=65.07916259765625, test_abs_avg=65.09284210205078
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1136856079101562, max_abs=3.75, mean_rel=0.13020412623882294, max_rel=12.876826286315918, norm_rel=0.023617075756192207, ref_abs_avg=46.73762512207031, test_abs_avg=46.75843811035156
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.3882076740264893, max_abs=9.0, mean_rel=0.16609090566635132, max_rel=2189.389404296875, norm_rel=0.024212993681430817, ref_abs_avg=57.606529235839844, test_abs_avg=57.60641860961914
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.359616756439209, max_abs=8.25, mean_rel=0.15436160564422607, max_rel=741.676025390625, norm_rel=0.024034198373556137, ref_abs_avg=56.751468658447266, test_abs_avg=56.74744415283203
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0404233932495117, max_abs=4.375, mean_rel=0.12559723854064941, max_rel=23.01078987121582, norm_rel=0.023795487359166145, ref_abs_avg=43.7595329284668, test_abs_avg=43.80195617675781
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2533295154571533, max_abs=7.5, mean_rel=0.1750565618276596, max_rel=1983.82275390625, norm_rel=0.024030327796936035, ref_abs_avg=52.38103485107422, test_abs_avg=52.379581451416016
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2164626121520996, max_abs=8.0, mean_rel=0.17962370812892914, max_rel=1760.4307861328125, norm_rel=0.023675546050071716, ref_abs_avg=51.592018127441406, test_abs_avg=51.60108947753906
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9444242715835571, max_abs=4.21875, mean_rel=0.27582377195358276, max_rel=84.1720199584961, norm_rel=0.023983178660273552, ref_abs_avg=40.87166976928711, test_abs_avg=40.89530944824219
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1590449810028076, max_abs=7.125, mean_rel=0.15042752027511597, max_rel=683.2634887695312, norm_rel=0.0237759817391634, ref_abs_avg=49.00951385498047, test_abs_avg=49.01042175292969
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1351025104522705, max_abs=7.0, mean_rel=0.15771722793579102, max_rel=1272.3990478515625, norm_rel=0.02367989532649517, ref_abs_avg=48.160552978515625, test_abs_avg=48.165977478027344
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8715248107910156, max_abs=4.0, mean_rel=0.09752605855464935, max_rel=6.392169952392578, norm_rel=0.02258765883743763, ref_abs_avg=38.350425720214844, test_abs_avg=38.30450439453125
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0895785093307495, max_abs=6.5, mean_rel=0.15008142590522766, max_rel=806.60546875, norm_rel=0.02358570694923401, ref_abs_avg=46.41209411621094, test_abs_avg=46.41338348388672
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.052673101425171, max_abs=6.5, mean_rel=0.15623487532138824, max_rel=1157.975341796875, norm_rel=0.023228837177157402, ref_abs_avg=45.52634048461914, test_abs_avg=45.5320930480957
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.820655345916748, max_abs=3.375, mean_rel=0.08370527625083923, max_rel=7.531705379486084, norm_rel=0.022429561242461205, ref_abs_avg=36.803096771240234, test_abs_avg=36.76976776123047
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.022688388824463, max_abs=6.0, mean_rel=0.16087473928928375, max_rel=1168.044921875, norm_rel=0.023433323949575424, ref_abs_avg=43.816856384277344, test_abs_avg=43.81586456298828
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9981573820114136, max_abs=6.5, mean_rel=0.15179499983787537, max_rel=688.0043334960938, norm_rel=0.0230763740837574, ref_abs_avg=43.46126174926758, test_abs_avg=43.46421813964844
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7473281621932983, max_abs=3.0, mean_rel=1.1309947967529297, max_rel=547.1082763671875, norm_rel=0.022000109776854515, ref_abs_avg=33.90388488769531, test_abs_avg=33.89374542236328
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9635671377182007, max_abs=6.5, mean_rel=0.15462622046470642, max_rel=2007.7613525390625, norm_rel=0.023293308913707733, ref_abs_avg=41.60504150390625, test_abs_avg=41.60758972167969
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9434412717819214, max_abs=5.5, mean_rel=0.16877685487270355, max_rel=1684.6605224609375, norm_rel=0.023071253672242165, ref_abs_avg=41.09327697753906, test_abs_avg=41.09177780151367
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7190554141998291, max_abs=3.0, mean_rel=0.07161372900009155, max_rel=2.5992438793182373, norm_rel=0.021421341225504875, ref_abs_avg=33.610252380371094, test_abs_avg=33.60820007324219
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9186146855354309, max_abs=6.0, mean_rel=0.15957699716091156, max_rel=1546.2373046875, norm_rel=0.02314598672091961, ref_abs_avg=39.85179901123047, test_abs_avg=39.85247802734375
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8996977806091309, max_abs=5.5, mean_rel=0.15055318176746368, max_rel=1035.489013671875, norm_rel=0.022799614816904068, ref_abs_avg=39.645572662353516, test_abs_avg=39.6419563293457
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.893286943435669, max_abs=3.875, mean_rel=0.10089732706546783, max_rel=9.108026504516602, norm_rel=0.025142226368188858, ref_abs_avg=35.232234954833984, test_abs_avg=35.233402252197266
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0501112937927246, max_abs=7.0, mean_rel=0.17889195680618286, max_rel=1589.250732421875, norm_rel=0.024855684489011765, ref_abs_avg=42.418556213378906, test_abs_avg=42.421775817871094
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0351895093917847, max_abs=6.0, mean_rel=0.17393441498279572, max_rel=1327.7581787109375, norm_rel=0.02491738833487034, ref_abs_avg=41.757354736328125, test_abs_avg=41.76496505737305
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8028736114501953, max_abs=3.25, mean_rel=0.10110096633434296, max_rel=5.345337390899658, norm_rel=0.025616297498345375, ref_abs_avg=31.271564483642578, test_abs_avg=31.29086685180664
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9818282127380371, max_abs=6.875, mean_rel=0.16666720807552338, max_rel=1234.7437744140625, norm_rel=0.025139220058918, ref_abs_avg=39.1629638671875, test_abs_avg=39.16202926635742
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9634130597114563, max_abs=6.5, mean_rel=0.1612984836101532, max_rel=678.9266967773438, norm_rel=0.024945028126239777, ref_abs_avg=38.694114685058594, test_abs_avg=38.700355529785156
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7600417137145996, max_abs=3.4375, mean_rel=0.2352544367313385, max_rel=76.67998504638672, norm_rel=0.02508286014199257, ref_abs_avg=30.195205688476562, test_abs_avg=30.230945587158203
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9187811613082886, max_abs=6.21875, mean_rel=0.16912062466144562, max_rel=2082.721435546875, norm_rel=0.025016240775585175, ref_abs_avg=36.86464309692383, test_abs_avg=36.86658477783203
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.9047542214393616, max_abs=7.0, mean_rel=0.1647670865058899, max_rel=1280.07861328125, norm_rel=0.024888772517442703, ref_abs_avg=36.529319763183594, test_abs_avg=36.528987884521484
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7118818759918213, max_abs=2.25, mean_rel=0.2297213077545166, max_rel=28.494747161865234, norm_rel=0.02380591817200184, ref_abs_avg=28.74938201904297, test_abs_avg=28.792490005493164
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8550360202789307, max_abs=5.375, mean_rel=0.16443707048892975, max_rel=1037.3037109375, norm_rel=0.02479129284620285, ref_abs_avg=34.6072998046875, test_abs_avg=34.609619140625
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8409743309020996, max_abs=4.75, mean_rel=0.17204689979553223, max_rel=1733.517578125, norm_rel=0.024771468713879585, ref_abs_avg=34.08018493652344, test_abs_avg=34.081947326660156
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6849899291992188, max_abs=2.5, mean_rel=0.09961951524019241, max_rel=8.020153045654297, norm_rel=0.025707516819238663, ref_abs_avg=26.351638793945312, test_abs_avg=26.3707218170166
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8112920522689819, max_abs=5.5, mean_rel=0.16359518468379974, max_rel=1059.9619140625, norm_rel=0.02449464239180088, ref_abs_avg=33.20873260498047, test_abs_avg=33.209964752197266
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7968498468399048, max_abs=5.0, mean_rel=0.1484868824481964, max_rel=617.1142578125, norm_rel=0.024381717666983604, ref_abs_avg=32.743255615234375, test_abs_avg=32.74095153808594
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6490793228149414, max_abs=2.875, mean_rel=0.09278152883052826, max_rel=4.221629619598389, norm_rel=0.02559303492307663, ref_abs_avg=25.427825927734375, test_abs_avg=25.419645309448242
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7700605392456055, max_abs=5.5, mean_rel=0.16342279314994812, max_rel=1275.8436279296875, norm_rel=0.024191541597247124, ref_abs_avg=31.880401611328125, test_abs_avg=31.883060455322266
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.75656658411026, max_abs=5.5625, mean_rel=0.1707446426153183, max_rel=1833.885009765625, norm_rel=0.024251393973827362, ref_abs_avg=31.241113662719727, test_abs_avg=31.252765655517578
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.583277702331543, max_abs=2.5, mean_rel=0.09616367518901825, max_rel=2.9607162475585938, norm_rel=0.025652185082435608, ref_abs_avg=22.847549438476562, test_abs_avg=22.869659423828125
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7391319274902344, max_abs=5.25, mean_rel=0.16160418093204498, max_rel=1749.4027099609375, norm_rel=0.02395251952111721, ref_abs_avg=30.863243103027344, test_abs_avg=30.862167358398438
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7240340709686279, max_abs=5.0, mean_rel=0.1432601809501648, max_rel=464.04437255859375, norm_rel=0.02415410615503788, ref_abs_avg=30.067012786865234, test_abs_avg=30.06436538696289
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5473282337188721, max_abs=2.0390625, mean_rel=0.1713854968547821, max_rel=43.35483169555664, norm_rel=0.024285422638058662, ref_abs_avg=23.13817024230957, test_abs_avg=23.086984634399414
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6962489485740662, max_abs=5.3125, mean_rel=0.1577298939228058, max_rel=937.7737426757812, norm_rel=0.023905742913484573, ref_abs_avg=29.204204559326172, test_abs_avg=29.204275131225586
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6876217126846313, max_abs=4.5, mean_rel=0.15640127658843994, max_rel=806.6766357421875, norm_rel=0.023773588240146637, ref_abs_avg=29.003843307495117, test_abs_avg=29.006628036499023
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6327664852142334, max_abs=2.84375, mean_rel=0.09559713304042816, max_rel=3.4380569458007812, norm_rel=0.026891710236668587, ref_abs_avg=23.922178268432617, test_abs_avg=23.912532806396484
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7811424732208252, max_abs=5.5, mean_rel=0.17168495059013367, max_rel=1547.3004150390625, norm_rel=0.02566707693040371, ref_abs_avg=30.51802635192871, test_abs_avg=30.515459060668945
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7626357078552246, max_abs=6.0, mean_rel=0.15288399159908295, max_rel=1945.5538330078125, norm_rel=0.025433333590626717, ref_abs_avg=30.038925170898438, test_abs_avg=30.026243209838867
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5813632011413574, max_abs=2.625, mean_rel=0.08453626930713654, max_rel=2.824777126312256, norm_rel=0.02399369701743126, ref_abs_avg=24.265478134155273, test_abs_avg=24.21561050415039
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7201873660087585, max_abs=5.25, mean_rel=0.16889739036560059, max_rel=1495.7274169921875, norm_rel=0.025221562013030052, ref_abs_avg=28.59386444091797, test_abs_avg=28.592281341552734
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.7028765678405762, max_abs=4.75, mean_rel=0.17153611779212952, max_rel=1023.3237915039062, norm_rel=0.025271369144320488, ref_abs_avg=27.917760848999023, test_abs_avg=27.918867111206055
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5407621264457703, max_abs=2.5, mean_rel=0.1162588894367218, max_rel=21.649343490600586, norm_rel=0.024611325934529305, ref_abs_avg=22.843719482421875, test_abs_avg=22.82748794555664
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6678101420402527, max_abs=5.0, mean_rel=0.1665952205657959, max_rel=1636.2216796875, norm_rel=0.02460886538028717, ref_abs_avg=27.116634368896484, test_abs_avg=27.11492156982422
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6575154066085815, max_abs=4.0, mean_rel=0.15341946482658386, max_rel=935.0604858398438, norm_rel=0.024457531049847603, ref_abs_avg=26.94525146484375, test_abs_avg=26.93939971923828
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5069513320922852, max_abs=2.29296875, mean_rel=0.12099587172269821, max_rel=11.917329788208008, norm_rel=0.02636084333062172, ref_abs_avg=19.37409019470215, test_abs_avg=19.344196319580078
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6285373568534851, max_abs=4.0, mean_rel=0.1616775393486023, max_rel=1177.928466796875, norm_rel=0.02439984865486622, ref_abs_avg=25.806875228881836, test_abs_avg=25.80667495727539
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.611492395401001, max_abs=4.25, mean_rel=0.16237162053585052, max_rel=847.1456909179688, norm_rel=0.02382352575659752, ref_abs_avg=25.67983055114746, test_abs_avg=25.679603576660156
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.49228334426879883, max_abs=1.984375, mean_rel=0.17030754685401917, max_rel=21.507707595825195, norm_rel=0.025292430073022842, ref_abs_avg=19.432334899902344, test_abs_avg=19.37239646911621
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5957309007644653, max_abs=4.0, mean_rel=0.16034425795078278, max_rel=772.8475952148438, norm_rel=0.023902885615825653, ref_abs_avg=24.935592651367188, test_abs_avg=24.935527801513672
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5807714462280273, max_abs=4.0, mean_rel=0.16018223762512207, max_rel=903.4387817382812, norm_rel=0.023622361943125725, ref_abs_avg=24.602170944213867, test_abs_avg=24.606006622314453
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4665610194206238, max_abs=2.0, mean_rel=0.17230546474456787, max_rel=35.521480560302734, norm_rel=0.023415306583046913, ref_abs_avg=20.219223022460938, test_abs_avg=20.224245071411133
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5582067966461182, max_abs=3.75, mean_rel=0.1511562466621399, max_rel=1049.5406494140625, norm_rel=0.023356880992650986, ref_abs_avg=23.908180236816406, test_abs_avg=23.90894317626953
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5410493016242981, max_abs=3.75, mean_rel=0.1458253562450409, max_rel=1225.402099609375, norm_rel=0.022757701575756073, ref_abs_avg=23.78460693359375, test_abs_avg=23.785404205322266
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.42390012741088867, max_abs=2.0, mean_rel=0.10620155930519104, max_rel=11.469222068786621, norm_rel=0.02267242968082428, ref_abs_avg=19.474403381347656, test_abs_avg=19.481613159179688
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.529059886932373, max_abs=3.75, mean_rel=0.14789117872714996, max_rel=798.618896484375, norm_rel=0.02286665514111519, ref_abs_avg=23.141183853149414, test_abs_avg=23.14208984375
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5221624374389648, max_abs=4.0, mean_rel=0.14247937500476837, max_rel=1148.052978515625, norm_rel=0.022820187732577324, ref_abs_avg=22.86351203918457, test_abs_avg=22.858858108520508
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.4135604798793793, max_abs=1.5, mean_rel=0.3215451240539551, max_rel=71.44497680664062, norm_rel=0.022351644933223724, ref_abs_avg=17.851730346679688, test_abs_avg=17.851985931396484
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5057151317596436, max_abs=4.25, mean_rel=0.14849793910980225, max_rel=966.4481201171875, norm_rel=0.02261391468346119, ref_abs_avg=22.36838722229004, test_abs_avg=22.369606018066406
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.48991602659225464, max_abs=3.625, mean_rel=0.14624333381652832, max_rel=735.5315551757812, norm_rel=0.022221941500902176, ref_abs_avg=22.01224136352539, test_abs_avg=22.012849807739258
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.46544885635375977, max_abs=2.0, mean_rel=0.10971501469612122, max_rel=9.124733924865723, norm_rel=0.024812711402773857, ref_abs_avg=18.95456314086914, test_abs_avg=18.949920654296875
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5591495037078857, max_abs=5.0, mean_rel=0.15360218286514282, max_rel=1096.818603515625, norm_rel=0.023662632331252098, ref_abs_avg=23.63210678100586, test_abs_avg=23.633785247802734
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5506902933120728, max_abs=4.0625, mean_rel=0.14693158864974976, max_rel=529.505126953125, norm_rel=0.02346368506550789, ref_abs_avg=23.500316619873047, test_abs_avg=23.504600524902344
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.401336669921875, max_abs=1.375, mean_rel=0.07974828034639359, max_rel=5.006665229797363, norm_rel=0.02090482786297798, ref_abs_avg=19.49745750427246, test_abs_avg=19.507055282592773
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5245023965835571, max_abs=5.0, mean_rel=0.1445680558681488, max_rel=1053.8597412109375, norm_rel=0.02304133214056492, ref_abs_avg=22.74782943725586, test_abs_avg=22.745132446289062
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.50694739818573, max_abs=4.03125, mean_rel=0.14495587348937988, max_rel=450.883056640625, norm_rel=0.022937148809432983, ref_abs_avg=22.233036041259766, test_abs_avg=22.234893798828125
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.4027118682861328, max_abs=1.8125, mean_rel=0.0863538309931755, max_rel=3.9024627208709717, norm_rel=0.023062020540237427, ref_abs_avg=17.48955726623535, test_abs_avg=17.51111602783203
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.48510199785232544, max_abs=4.25, mean_rel=0.14514270424842834, max_rel=1341.9061279296875, norm_rel=0.022486699745059013, ref_abs_avg=21.588407516479492, test_abs_avg=21.58794593811035
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4693548381328583, max_abs=4.5, mean_rel=0.14392036199569702, max_rel=989.8938598632812, norm_rel=0.02231415919959545, ref_abs_avg=21.04747200012207, test_abs_avg=21.049068450927734
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3580322265625, max_abs=1.34375, mean_rel=0.07107458263635635, max_rel=3.2107737064361572, norm_rel=0.020549260079860687, ref_abs_avg=17.789915084838867, test_abs_avg=17.775453567504883
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.44617998600006104, max_abs=5.0, mean_rel=0.1416770964860916, max_rel=786.1690063476562, norm_rel=0.021925969049334526, ref_abs_avg=20.368284225463867, test_abs_avg=20.366809844970703
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.43867313861846924, max_abs=4.25, mean_rel=0.13919112086296082, max_rel=785.5155639648438, norm_rel=0.021631410345435143, ref_abs_avg=20.355377197265625, test_abs_avg=20.357933044433594
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.33877426385879517, max_abs=1.5, mean_rel=0.19062566757202148, max_rel=49.21577835083008, norm_rel=0.020751953125, ref_abs_avg=16.751989364624023, test_abs_avg=16.72013282775879
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.42685139179229736, max_abs=4.0, mean_rel=0.1318153440952301, max_rel=762.826904296875, norm_rel=0.02153724618256092, ref_abs_avg=19.87241554260254, test_abs_avg=19.872325897216797
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.41157570481300354, max_abs=4.25, mean_rel=0.13157299160957336, max_rel=1044.5855712890625, norm_rel=0.02120511792600155, ref_abs_avg=19.42223358154297, test_abs_avg=19.42709732055664
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.332561731338501, max_abs=1.5, mean_rel=0.12899591028690338, max_rel=12.784658432006836, norm_rel=0.020889053121209145, ref_abs_avg=16.068771362304688, test_abs_avg=16.08417510986328
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.3998448848724365, max_abs=4.0, mean_rel=0.13060683012008667, max_rel=743.7888793945312, norm_rel=0.02076265774667263, ref_abs_avg=19.379024505615234, test_abs_avg=19.37887191772461
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3884722590446472, max_abs=3.5, mean_rel=0.13231591880321503, max_rel=945.60888671875, norm_rel=0.020944586023688316, ref_abs_avg=18.738975524902344, test_abs_avg=18.72876739501953
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.32552504539489746, max_abs=1.15625, mean_rel=0.1286526322364807, max_rel=27.67963981628418, norm_rel=0.02120368555188179, ref_abs_avg=15.711809158325195, test_abs_avg=15.723800659179688
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.378286212682724, max_abs=4.0, mean_rel=0.12966883182525635, max_rel=786.31298828125, norm_rel=0.020590096712112427, ref_abs_avg=18.57373046875, test_abs_avg=18.571876525878906
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.36280620098114014, max_abs=4.375, mean_rel=0.13329225778579712, max_rel=641.06982421875, norm_rel=0.019831467419862747, ref_abs_avg=18.41680145263672, test_abs_avg=18.41921615600586
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2765841484069824, max_abs=1.375, mean_rel=0.12716008722782135, max_rel=15.030441284179688, norm_rel=0.018260633572936058, ref_abs_avg=15.228529930114746, test_abs_avg=15.205109596252441
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.351135790348053, max_abs=3.5, mean_rel=0.12105734646320343, max_rel=783.737060546875, norm_rel=0.019800543785095215, ref_abs_avg=18.00403594970703, test_abs_avg=18.00575065612793
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3455704152584076, max_abs=4.25, mean_rel=0.12223568558692932, max_rel=507.4629211425781, norm_rel=0.019603924825787544, ref_abs_avg=17.971115112304688, test_abs_avg=17.97127914428711
production_forward2 vs paper_forward output: mean_abs=0.0016438555903732777, max_abs=0.0390625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008681712672114372, max_abs=0.4375, mean_rel=0.07541597634553909, max_rel=82.3691177368164, norm_rel=0.020633680745959282, ref_abs_avg=0.4534105062484741, test_abs_avg=0.45340433716773987
production_forward2 grad[1] vs paper_forward: mean_abs=7.46407413482666, max_abs=48.0, mean_rel=0.19053860008716583, max_rel=787.500732421875, norm_rel=0.020621756091713905, ref_abs_avg=316.8620910644531, test_abs_avg=316.8234558105469
production_forward2 grad[2] vs paper_forward: mean_abs=1.2588224411010742, max_abs=4.5625, mean_rel=0.09789802134037018, max_rel=10.2731294631958, norm_rel=0.0232496727257967, ref_abs_avg=53.99516677856445, test_abs_avg=54.04026794433594
production_forward2 grad[3] vs paper_forward: mean_abs=1.6007676124572754, max_abs=11.0, mean_rel=0.1624365746974945, max_rel=1454.97314453125, norm_rel=0.02445010282099247, ref_abs_avg=65.772705078125, test_abs_avg=65.77266693115234
production_forward2 grad[4] vs paper_forward: mean_abs=1.5690324306488037, max_abs=10.0, mean_rel=0.17197370529174805, max_rel=1855.9810791015625, norm_rel=0.024239294230937958, ref_abs_avg=65.07916259765625, test_abs_avg=65.09127807617188
production_forward2 grad[5] vs paper_forward: mean_abs=1.069509506225586, max_abs=4.25, mean_rel=0.1118713766336441, max_rel=9.126151084899902, norm_rel=0.023196624591946602, ref_abs_avg=46.73762512207031, test_abs_avg=46.798248291015625
production_forward2 grad[6] vs paper_forward: mean_abs=1.3858983516693115, max_abs=9.0, mean_rel=0.17234688997268677, max_rel=3455.54443359375, norm_rel=0.02415955439209938, ref_abs_avg=57.606529235839844, test_abs_avg=57.60588073730469
production_forward2 grad[7] vs paper_forward: mean_abs=1.3597955703735352, max_abs=8.4921875, mean_rel=0.15771213173866272, max_rel=803.4172973632812, norm_rel=0.024076106026768684, ref_abs_avg=56.751468658447266, test_abs_avg=56.75229263305664
production_forward2 grad[8] vs paper_forward: mean_abs=1.0652074813842773, max_abs=4.5, mean_rel=0.10222339630126953, max_rel=12.992486953735352, norm_rel=0.024067146703600883, ref_abs_avg=43.7595329284668, test_abs_avg=43.82754135131836
production_forward2 grad[9] vs paper_forward: mean_abs=1.2523584365844727, max_abs=8.0, mean_rel=0.1687341183423996, max_rel=1828.64453125, norm_rel=0.024016696959733963, ref_abs_avg=52.38103485107422, test_abs_avg=52.38081359863281
production_forward2 grad[10] vs paper_forward: mean_abs=1.213525652885437, max_abs=7.25, mean_rel=0.1817682683467865, max_rel=1266.0235595703125, norm_rel=0.023654252290725708, ref_abs_avg=51.592018127441406, test_abs_avg=51.60052490234375
production_forward2 grad[11] vs paper_forward: mean_abs=0.9407048225402832, max_abs=4.0, mean_rel=0.2441253960132599, max_rel=66.52519226074219, norm_rel=0.023881874978542328, ref_abs_avg=40.87166976928711, test_abs_avg=40.857261657714844
production_forward2 grad[12] vs paper_forward: mean_abs=1.1565043926239014, max_abs=7.0, mean_rel=0.15075565874576569, max_rel=994.765869140625, norm_rel=0.023732690140604973, ref_abs_avg=49.00951385498047, test_abs_avg=49.0117301940918
production_forward2 grad[13] vs paper_forward: mean_abs=1.131324052810669, max_abs=7.0, mean_rel=0.15730127692222595, max_rel=1171.578125, norm_rel=0.023606566712260246, ref_abs_avg=48.160552978515625, test_abs_avg=48.1590690612793
production_forward2 grad[14] vs paper_forward: mean_abs=0.8687238693237305, max_abs=4.0, mean_rel=0.10808130353689194, max_rel=8.266677856445312, norm_rel=0.022537406533956528, ref_abs_avg=38.350425720214844, test_abs_avg=38.31158447265625
production_forward2 grad[15] vs paper_forward: mean_abs=1.0880777835845947, max_abs=7.0, mean_rel=0.1494382917881012, max_rel=897.5690307617188, norm_rel=0.023553315550088882, ref_abs_avg=46.41209411621094, test_abs_avg=46.41356658935547
production_forward2 grad[16] vs paper_forward: mean_abs=1.051208257675171, max_abs=6.25, mean_rel=0.1618974357843399, max_rel=1411.97314453125, norm_rel=0.02319052256643772, ref_abs_avg=45.52634048461914, test_abs_avg=45.52690124511719
production_forward2 grad[17] vs paper_forward: mean_abs=0.8324575424194336, max_abs=3.0, mean_rel=0.0996280312538147, max_rel=9.68925952911377, norm_rel=0.02265443094074726, ref_abs_avg=36.803096771240234, test_abs_avg=36.83369445800781
production_forward2 grad[18] vs paper_forward: mean_abs=1.0210765600204468, max_abs=6.0, mean_rel=0.15747463703155518, max_rel=944.0186767578125, norm_rel=0.0233951173722744, ref_abs_avg=43.816856384277344, test_abs_avg=43.816734313964844
production_forward2 grad[19] vs paper_forward: mean_abs=0.9972863793373108, max_abs=6.0, mean_rel=0.15105147659778595, max_rel=762.1427612304688, norm_rel=0.02306034043431282, ref_abs_avg=43.46126174926758, test_abs_avg=43.463199615478516
production_forward2 grad[20] vs paper_forward: mean_abs=0.7451468706130981, max_abs=2.75, mean_rel=0.2984253764152527, max_rel=122.91606140136719, norm_rel=0.022277874872088432, ref_abs_avg=33.90388488769531, test_abs_avg=33.90565490722656
production_forward2 grad[21] vs paper_forward: mean_abs=0.9611718654632568, max_abs=7.0, mean_rel=0.1568019688129425, max_rel=2189.229248046875, norm_rel=0.023231005296111107, ref_abs_avg=41.60504150390625, test_abs_avg=41.606651306152344
production_forward2 grad[22] vs paper_forward: mean_abs=0.9418781995773315, max_abs=5.875, mean_rel=0.16662448644638062, max_rel=1594.5855712890625, norm_rel=0.02303430438041687, ref_abs_avg=41.09327697753906, test_abs_avg=41.08871078491211
production_forward2 grad[23] vs paper_forward: mean_abs=0.7283706665039062, max_abs=3.0, mean_rel=0.06917521357536316, max_rel=2.576190710067749, norm_rel=0.02179790288209915, ref_abs_avg=33.610252380371094, test_abs_avg=33.588134765625
production_forward2 grad[24] vs paper_forward: mean_abs=0.9165229201316833, max_abs=6.0, mean_rel=0.15555819869041443, max_rel=1236.5701904296875, norm_rel=0.023098589852452278, ref_abs_avg=39.85179901123047, test_abs_avg=39.85284423828125
production_forward2 grad[25] vs paper_forward: mean_abs=0.8969874382019043, max_abs=5.25, mean_rel=0.15641072392463684, max_rel=1380.6058349609375, norm_rel=0.022730862721800804, ref_abs_avg=39.645572662353516, test_abs_avg=39.64278793334961
production_forward2 grad[26] vs paper_forward: mean_abs=0.8856716156005859, max_abs=3.568359375, mean_rel=0.10985491424798965, max_rel=10.177593231201172, norm_rel=0.025013351812958717, ref_abs_avg=35.232234954833984, test_abs_avg=35.2314567565918
production_forward2 grad[27] vs paper_forward: mean_abs=1.0483026504516602, max_abs=6.75, mean_rel=0.1755572259426117, max_rel=1435.074462890625, norm_rel=0.024814926087856293, ref_abs_avg=42.418556213378906, test_abs_avg=42.419979095458984
production_forward2 grad[28] vs paper_forward: mean_abs=1.0327279567718506, max_abs=6.125, mean_rel=0.17432263493537903, max_rel=1327.7581787109375, norm_rel=0.024854673072695732, ref_abs_avg=41.757354736328125, test_abs_avg=41.766178131103516
production_forward2 grad[29] vs paper_forward: mean_abs=0.783104419708252, max_abs=3.0, mean_rel=0.1013813242316246, max_rel=3.4299182891845703, norm_rel=0.02544292062520981, ref_abs_avg=31.271564483642578, test_abs_avg=31.26246452331543
production_forward2 grad[30] vs paper_forward: mean_abs=0.9820823669433594, max_abs=6.5, mean_rel=0.16751211881637573, max_rel=1101.308349609375, norm_rel=0.025140469893813133, ref_abs_avg=39.1629638671875, test_abs_avg=39.161216735839844
production_forward2 grad[31] vs paper_forward: mean_abs=0.9651154279708862, max_abs=7.0, mean_rel=0.16326695680618286, max_rel=782.5602416992188, norm_rel=0.024981480091810226, ref_abs_avg=38.694114685058594, test_abs_avg=38.698184967041016
production_forward2 grad[32] vs paper_forward: mean_abs=0.7471394538879395, max_abs=3.0, mean_rel=0.21243548393249512, max_rel=68.79993438720703, norm_rel=0.024368466809391975, ref_abs_avg=30.195205688476562, test_abs_avg=30.234580993652344
production_forward2 grad[33] vs paper_forward: mean_abs=0.9184620380401611, max_abs=7.0, mean_rel=0.16549095511436462, max_rel=1761.767578125, norm_rel=0.024982744827866554, ref_abs_avg=36.86464309692383, test_abs_avg=36.866214752197266
production_forward2 grad[34] vs paper_forward: mean_abs=0.9026491045951843, max_abs=6.0, mean_rel=0.16219252347946167, max_rel=820.262451171875, norm_rel=0.02482711337506771, ref_abs_avg=36.529319763183594, test_abs_avg=36.526939392089844
production_forward2 grad[35] vs paper_forward: mean_abs=0.6829543113708496, max_abs=2.5, mean_rel=0.22439171373844147, max_rel=32.149383544921875, norm_rel=0.023456580936908722, ref_abs_avg=28.74938201904297, test_abs_avg=28.805280685424805
production_forward2 grad[36] vs paper_forward: mean_abs=0.8549602031707764, max_abs=5.5, mean_rel=0.16572271287441254, max_rel=1091.3797607421875, norm_rel=0.024778956547379494, ref_abs_avg=34.6072998046875, test_abs_avg=34.60903549194336
production_forward2 grad[37] vs paper_forward: mean_abs=0.8430811166763306, max_abs=5.25, mean_rel=0.17284294962882996, max_rel=1233.2216796875, norm_rel=0.024838022887706757, ref_abs_avg=34.08018493652344, test_abs_avg=34.08290100097656
production_forward2 grad[38] vs paper_forward: mean_abs=0.6863551139831543, max_abs=2.375, mean_rel=0.10613729059696198, max_rel=7.356536388397217, norm_rel=0.02553623728454113, ref_abs_avg=26.351638793945312, test_abs_avg=26.375049591064453
production_forward2 grad[39] vs paper_forward: mean_abs=0.810015082359314, max_abs=5.5, mean_rel=0.1686258167028427, max_rel=1747.1513671875, norm_rel=0.02446819096803665, ref_abs_avg=33.20873260498047, test_abs_avg=33.21070098876953
production_forward2 grad[40] vs paper_forward: mean_abs=0.7961045503616333, max_abs=4.75, mean_rel=0.1507657915353775, max_rel=621.2510375976562, norm_rel=0.024349085986614227, ref_abs_avg=32.743255615234375, test_abs_avg=32.737884521484375
production_forward2 grad[41] vs paper_forward: mean_abs=0.653728723526001, max_abs=2.5, mean_rel=0.09152796864509583, max_rel=4.7806572914123535, norm_rel=0.02574131451547146, ref_abs_avg=25.427825927734375, test_abs_avg=25.407848358154297
production_forward2 grad[42] vs paper_forward: mean_abs=0.7687426805496216, max_abs=5.0, mean_rel=0.16375738382339478, max_rel=1157.216796875, norm_rel=0.024155346676707268, ref_abs_avg=31.880401611328125, test_abs_avg=31.883522033691406
production_forward2 grad[43] vs paper_forward: mean_abs=0.7565680742263794, max_abs=5.75, mean_rel=0.1699477881193161, max_rel=1934.969482421875, norm_rel=0.02423691935837269, ref_abs_avg=31.241113662719727, test_abs_avg=31.249126434326172
production_forward2 grad[44] vs paper_forward: mean_abs=0.5975408554077148, max_abs=2.25, mean_rel=0.0975075215101242, max_rel=4.233861923217773, norm_rel=0.026117412373423576, ref_abs_avg=22.847549438476562, test_abs_avg=22.874073028564453
production_forward2 grad[45] vs paper_forward: mean_abs=0.7378636598587036, max_abs=5.0, mean_rel=0.15968775749206543, max_rel=1267.54541015625, norm_rel=0.02390984259545803, ref_abs_avg=30.863243103027344, test_abs_avg=30.861858367919922
production_forward2 grad[46] vs paper_forward: mean_abs=0.7242390513420105, max_abs=4.75, mean_rel=0.1402120739221573, max_rel=467.78936767578125, norm_rel=0.02415536902844906, ref_abs_avg=30.067012786865234, test_abs_avg=30.06265640258789
production_forward2 grad[47] vs paper_forward: mean_abs=0.5452390909194946, max_abs=1.96875, mean_rel=0.13977719843387604, max_rel=29.58574104309082, norm_rel=0.023852193728089333, ref_abs_avg=23.13817024230957, test_abs_avg=23.09198760986328
production_forward2 grad[48] vs paper_forward: mean_abs=0.6958798170089722, max_abs=5.3125, mean_rel=0.1581684648990631, max_rel=1323.4678955078125, norm_rel=0.023890875279903412, ref_abs_avg=29.204204559326172, test_abs_avg=29.20419692993164
production_forward2 grad[49] vs paper_forward: mean_abs=0.6865646243095398, max_abs=4.75, mean_rel=0.15761038661003113, max_rel=1110.1551513671875, norm_rel=0.02372516691684723, ref_abs_avg=29.003843307495117, test_abs_avg=29.004858016967773
production_forward2 grad[50] vs paper_forward: mean_abs=0.6327365040779114, max_abs=2.9375, mean_rel=0.10251203179359436, max_rel=4.1193766593933105, norm_rel=0.02664998359978199, ref_abs_avg=23.922178268432617, test_abs_avg=23.921506881713867
production_forward2 grad[51] vs paper_forward: mean_abs=0.7801716327667236, max_abs=5.3984375, mean_rel=0.1726585030555725, max_rel=1547.3004150390625, norm_rel=0.025631362572312355, ref_abs_avg=30.51802635192871, test_abs_avg=30.515094757080078
production_forward2 grad[52] vs paper_forward: mean_abs=0.7606333494186401, max_abs=5.0, mean_rel=0.15073531866073608, max_rel=1371.3052978515625, norm_rel=0.025361670181155205, ref_abs_avg=30.038925170898438, test_abs_avg=30.02703094482422
production_forward2 grad[53] vs paper_forward: mean_abs=0.5673007965087891, max_abs=2.25, mean_rel=0.07314354926347733, max_rel=1.8600670099258423, norm_rel=0.023876603692770004, ref_abs_avg=24.265478134155273, test_abs_avg=24.23396110534668
production_forward2 grad[54] vs paper_forward: mean_abs=0.7186112403869629, max_abs=5.5, mean_rel=0.16520312428474426, max_rel=1141.3885498046875, norm_rel=0.0251611415296793, ref_abs_avg=28.59386444091797, test_abs_avg=28.592327117919922
production_forward2 grad[55] vs paper_forward: mean_abs=0.7009870409965515, max_abs=4.7109375, mean_rel=0.17121519148349762, max_rel=847.1209716796875, norm_rel=0.025217927992343903, ref_abs_avg=27.917760848999023, test_abs_avg=27.921993255615234
production_forward2 grad[56] vs paper_forward: mean_abs=0.5390464067459106, max_abs=2.3125, mean_rel=0.10160261392593384, max_rel=10.429698944091797, norm_rel=0.02471267431974411, ref_abs_avg=22.843719482421875, test_abs_avg=22.832847595214844
production_forward2 grad[57] vs paper_forward: mean_abs=0.6663918495178223, max_abs=5.0, mean_rel=0.16693100333213806, max_rel=1164.1785888671875, norm_rel=0.024573914706707, ref_abs_avg=27.116634368896484, test_abs_avg=27.115142822265625
production_forward2 grad[58] vs paper_forward: mean_abs=0.6570327281951904, max_abs=4.256591796875, mean_rel=0.15620183944702148, max_rel=920.783447265625, norm_rel=0.024426592513918877, ref_abs_avg=26.94525146484375, test_abs_avg=26.93948745727539
production_forward2 grad[59] vs paper_forward: mean_abs=0.5206328630447388, max_abs=2.52734375, mean_rel=0.13774290680885315, max_rel=13.135455131530762, norm_rel=0.026848528534173965, ref_abs_avg=19.37409019470215, test_abs_avg=19.33019256591797
production_forward2 grad[60] vs paper_forward: mean_abs=0.6273103952407837, max_abs=4.25, mean_rel=0.16002200543880463, max_rel=1147.96630859375, norm_rel=0.024351127445697784, ref_abs_avg=25.806875228881836, test_abs_avg=25.8056583404541
production_forward2 grad[61] vs paper_forward: mean_abs=0.6127846240997314, max_abs=4.0, mean_rel=0.1660403609275818, max_rel=1158.944091796875, norm_rel=0.023877281695604324, ref_abs_avg=25.67983055114746, test_abs_avg=25.67879867553711
production_forward2 grad[62] vs paper_forward: mean_abs=0.483797550201416, max_abs=2.0546875, mean_rel=0.1730353981256485, max_rel=38.9100341796875, norm_rel=0.02514338679611683, ref_abs_avg=19.432334899902344, test_abs_avg=19.400650024414062
production_forward2 grad[63] vs paper_forward: mean_abs=0.5944684743881226, max_abs=4.25, mean_rel=0.157697856426239, max_rel=996.0812377929688, norm_rel=0.023852145299315453, ref_abs_avg=24.935592651367188, test_abs_avg=24.935285568237305
production_forward2 grad[64] vs paper_forward: mean_abs=0.5796438455581665, max_abs=4.0, mean_rel=0.1612367480993271, max_rel=698.542236328125, norm_rel=0.023580702021718025, ref_abs_avg=24.602170944213867, test_abs_avg=24.605907440185547
production_forward2 grad[65] vs paper_forward: mean_abs=0.45333701372146606, max_abs=1.75, mean_rel=0.13235385715961456, max_rel=24.757144927978516, norm_rel=0.022805867716670036, ref_abs_avg=20.219223022460938, test_abs_avg=20.200990676879883
production_forward2 grad[66] vs paper_forward: mean_abs=0.5578999519348145, max_abs=3.5, mean_rel=0.15253213047981262, max_rel=807.5069580078125, norm_rel=0.023324452340602875, ref_abs_avg=23.908180236816406, test_abs_avg=23.908512115478516
production_forward2 grad[67] vs paper_forward: mean_abs=0.5426074266433716, max_abs=3.75, mean_rel=0.14896178245544434, max_rel=1359.9022216796875, norm_rel=0.022837607190012932, ref_abs_avg=23.78460693359375, test_abs_avg=23.784053802490234
production_forward2 grad[68] vs paper_forward: mean_abs=0.437347412109375, max_abs=1.931640625, mean_rel=0.12029734253883362, max_rel=12.425702095031738, norm_rel=0.022872325032949448, ref_abs_avg=19.474403381347656, test_abs_avg=19.475147247314453
production_forward2 grad[69] vs paper_forward: mean_abs=0.5289828777313232, max_abs=3.625, mean_rel=0.14703303575515747, max_rel=1133.469970703125, norm_rel=0.022859059274196625, ref_abs_avg=23.141183853149414, test_abs_avg=23.141571044921875
production_forward2 grad[70] vs paper_forward: mean_abs=0.5201072096824646, max_abs=4.0, mean_rel=0.14100201427936554, max_rel=954.92041015625, norm_rel=0.02274751104414463, ref_abs_avg=22.86351203918457, test_abs_avg=22.859092712402344
production_forward2 grad[71] vs paper_forward: mean_abs=0.41589805483818054, max_abs=1.5625, mean_rel=0.3692753314971924, max_rel=82.7531967163086, norm_rel=0.02270704321563244, ref_abs_avg=17.851730346679688, test_abs_avg=17.853805541992188
production_forward2 grad[72] vs paper_forward: mean_abs=0.5055949687957764, max_abs=4.03125, mean_rel=0.14905160665512085, max_rel=779.3941040039062, norm_rel=0.022604648023843765, ref_abs_avg=22.36838722229004, test_abs_avg=22.369129180908203
production_forward2 grad[73] vs paper_forward: mean_abs=0.49018535017967224, max_abs=3.9375, mean_rel=0.14627447724342346, max_rel=687.8478393554688, norm_rel=0.02226869948208332, ref_abs_avg=22.01224136352539, test_abs_avg=22.009126663208008
production_forward2 grad[74] vs paper_forward: mean_abs=0.4609041213989258, max_abs=2.375, mean_rel=0.10935935378074646, max_rel=7.882992744445801, norm_rel=0.02530822530388832, ref_abs_avg=18.95456314086914, test_abs_avg=18.957927703857422
production_forward2 grad[75] vs paper_forward: mean_abs=0.5589946508407593, max_abs=4.5, mean_rel=0.15177924931049347, max_rel=1061.434326171875, norm_rel=0.023653781041502953, ref_abs_avg=23.63210678100586, test_abs_avg=23.632282257080078
production_forward2 grad[76] vs paper_forward: mean_abs=0.5519148707389832, max_abs=4.5, mean_rel=0.14723336696624756, max_rel=478.74029541015625, norm_rel=0.023512408137321472, ref_abs_avg=23.500316619873047, test_abs_avg=23.506418228149414
production_forward2 grad[77] vs paper_forward: mean_abs=0.40078437328338623, max_abs=1.75, mean_rel=0.09034864604473114, max_rel=7.612252235412598, norm_rel=0.021196426823735237, ref_abs_avg=19.49745750427246, test_abs_avg=19.511627197265625
production_forward2 grad[78] vs paper_forward: mean_abs=0.5246173143386841, max_abs=5.0, mean_rel=0.14220087230205536, max_rel=1201.5146484375, norm_rel=0.02306661382317543, ref_abs_avg=22.74782943725586, test_abs_avg=22.74496841430664
production_forward2 grad[79] vs paper_forward: mean_abs=0.5058400630950928, max_abs=4.0, mean_rel=0.1501455456018448, max_rel=554.5726928710938, norm_rel=0.022932440042495728, ref_abs_avg=22.233036041259766, test_abs_avg=22.230085372924805
production_forward2 grad[80] vs paper_forward: mean_abs=0.4016626477241516, max_abs=1.828125, mean_rel=0.08547919988632202, max_rel=5.041690826416016, norm_rel=0.023038852959871292, ref_abs_avg=17.48955726623535, test_abs_avg=17.521610260009766
production_forward2 grad[81] vs paper_forward: mean_abs=0.4845930337905884, max_abs=4.5, mean_rel=0.1487298607826233, max_rel=1347.6890869140625, norm_rel=0.0224520992487669, ref_abs_avg=21.588407516479492, test_abs_avg=21.58843994140625
production_forward2 grad[82] vs paper_forward: mean_abs=0.4714202284812927, max_abs=4.5, mean_rel=0.14728400111198425, max_rel=1220.09765625, norm_rel=0.02244986966252327, ref_abs_avg=21.04747200012207, test_abs_avg=21.0463924407959
production_forward2 grad[83] vs paper_forward: mean_abs=0.3597477674484253, max_abs=1.15625, mean_rel=0.06618072837591171, max_rel=2.164430618286133, norm_rel=0.020401854068040848, ref_abs_avg=17.789915084838867, test_abs_avg=17.76805877685547
production_forward2 grad[84] vs paper_forward: mean_abs=0.4457368850708008, max_abs=5.0, mean_rel=0.14254674315452576, max_rel=1127.3560791015625, norm_rel=0.02190321870148182, ref_abs_avg=20.368284225463867, test_abs_avg=20.36745834350586
production_forward2 grad[85] vs paper_forward: mean_abs=0.43953147530555725, max_abs=4.0, mean_rel=0.13778898119926453, max_rel=688.724853515625, norm_rel=0.021684151142835617, ref_abs_avg=20.355377197265625, test_abs_avg=20.357646942138672
production_forward2 grad[86] vs paper_forward: mean_abs=0.3443332314491272, max_abs=1.25, mean_rel=0.1913977712392807, max_rel=43.03923034667969, norm_rel=0.02074470743536949, ref_abs_avg=16.751989364624023, test_abs_avg=16.737417221069336
production_forward2 grad[87] vs paper_forward: mean_abs=0.42571866512298584, max_abs=4.0, mean_rel=0.13047802448272705, max_rel=720.6826782226562, norm_rel=0.021486716344952583, ref_abs_avg=19.87241554260254, test_abs_avg=19.87248992919922
production_forward2 grad[88] vs paper_forward: mean_abs=0.4070279002189636, max_abs=3.25, mean_rel=0.128236785531044, max_rel=902.4935302734375, norm_rel=0.020932327955961227, ref_abs_avg=19.42223358154297, test_abs_avg=19.427961349487305
production_forward2 grad[89] vs paper_forward: mean_abs=0.3200201988220215, max_abs=1.25, mean_rel=0.1164088323712349, max_rel=16.929683685302734, norm_rel=0.01990802213549614, ref_abs_avg=16.068771362304688, test_abs_avg=16.08905029296875
production_forward2 grad[90] vs paper_forward: mean_abs=0.39891189336776733, max_abs=4.25, mean_rel=0.13080641627311707, max_rel=725.4290771484375, norm_rel=0.020724229514598846, ref_abs_avg=19.379024505615234, test_abs_avg=19.378623962402344
production_forward2 grad[91] vs paper_forward: mean_abs=0.385730117559433, max_abs=3.75, mean_rel=0.12771154940128326, max_rel=757.2150268554688, norm_rel=0.02081763744354248, ref_abs_avg=18.738975524902344, test_abs_avg=18.727718353271484
production_forward2 grad[92] vs paper_forward: mean_abs=0.3244926929473877, max_abs=1.25, mean_rel=0.13940641283988953, max_rel=24.43627166748047, norm_rel=0.021286476403474808, ref_abs_avg=15.711809158325195, test_abs_avg=15.710552215576172
production_forward2 grad[93] vs paper_forward: mean_abs=0.37787842750549316, max_abs=3.5, mean_rel=0.12992142140865326, max_rel=873.677978515625, norm_rel=0.0205598883330822, ref_abs_avg=18.57373046875, test_abs_avg=18.57134437561035
production_forward2 grad[94] vs paper_forward: mean_abs=0.3613802492618561, max_abs=3.90625, mean_rel=0.1317794770002365, max_rel=717.7806396484375, norm_rel=0.01975361630320549, ref_abs_avg=18.41680145263672, test_abs_avg=18.42152976989746
production_forward2 grad[95] vs paper_forward: mean_abs=0.27881669998168945, max_abs=1.625, mean_rel=0.11499209702014923, max_rel=9.322678565979004, norm_rel=0.018402373418211937, ref_abs_avg=15.228529930114746, test_abs_avg=15.21572494506836
production_forward2 grad[96] vs paper_forward: mean_abs=0.3505590260028839, max_abs=3.5, mean_rel=0.12168572843074799, max_rel=654.312255859375, norm_rel=0.019775988534092903, ref_abs_avg=18.00403594970703, test_abs_avg=18.00510597229004
production_forward2 grad[97] vs paper_forward: mean_abs=0.34111499786376953, max_abs=4.5, mean_rel=0.1158377155661583, max_rel=463.2244873046875, norm_rel=0.019328203052282333, ref_abs_avg=17.971115112304688, test_abs_avg=17.967222213745117
identity layers + randn queries
mean abs randn paper: 0.216796875
paper_forward fwd+bwd:  379.960 ms
paper_forward fwd-only: 85.826 ms
paper_forward bwd-only: 294.145 ms
paper_forward peak allocated: fwd=29.830 GiB, fwd+bwd=31.949 GiB
paper_forward peak reserved:  fwd=29.850 GiB, fwd+bwd=32.600 GiB
mean abs difference randn: 3.647804260253906e-05
mean relative difference randn: 0.000637054443359375
production_forward2 fwd+bwd:  224.350 ms
production_forward2 fwd-only: 22.333 ms
production_forward2 bwd-only: 202.142 ms
production_forward2 peak allocated: fwd=55.349 GiB, fwd+bwd=58.728 GiB
production_forward2 peak reserved:  fwd=55.611 GiB, fwd+bwd=61.363 GiB
mean abs difference randn: 0.00165557861328125
mean relative difference randn: 0.029296875
production_forward fwd+bwd:  112.025 ms
production_forward fwd-only: 20.510 ms
production_forward bwd-only: 91.668 ms
production_forward peak allocated: fwd=3.442 GiB, fwd+bwd=7.322 GiB
production_forward peak reserved:  fwd=5.061 GiB, fwd+bwd=8.061 GiB
mean abs difference randn: 0.00165557861328125
mean relative difference randn: 0.029296875
torch_compile_phases_forward fwd+bwd:  189.957 ms
torch_compile_phases_forward fwd-only: 36.446 ms
torch_compile_phases_forward bwd-only: 152.611 ms
torch_compile_phases_forward peak allocated: fwd=13.781 GiB, fwd+bwd=14.409 GiB
torch_compile_phases_forward peak reserved:  fwd=14.453 GiB, fwd+bwd=18.705 GiB
mean abs difference randn: 0.0016632080078125
mean relative difference randn: 0.0294189453125

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.001656435546465218, max_abs=0.03515625
production_forward grad[0] vs paper_forward: mean_abs=0.008766619488596916, max_abs=0.359375, mean_rel=0.07509181648492813, max_rel=146.0196533203125, norm_rel=0.02049698680639267, ref_abs_avg=0.46179795265197754, test_abs_avg=0.46179887652397156
production_forward grad[1] vs paper_forward: mean_abs=7.541618347167969, max_abs=56.0, mean_rel=0.16736863553524017, max_rel=196.0317840576172, norm_rel=0.020769281312823296, ref_abs_avg=322.6158752441406, test_abs_avg=322.5877685546875
production_forward grad[2] vs paper_forward: mean_abs=1.3132696151733398, max_abs=6.5, mean_rel=0.17005562782287598, max_rel=29.250656127929688, norm_rel=0.024588795378804207, ref_abs_avg=55.42802047729492, test_abs_avg=55.3482780456543
production_forward grad[3] vs paper_forward: mean_abs=1.6159980297088623, max_abs=11.0, mean_rel=0.18096262216567993, max_rel=3018.01708984375, norm_rel=0.02422279119491577, ref_abs_avg=67.03909301757812, test_abs_avg=67.04411315917969
production_forward grad[4] vs paper_forward: mean_abs=1.5753518342971802, max_abs=10.0, mean_rel=0.16418470442295074, max_rel=1005.169677734375, norm_rel=0.024029720574617386, ref_abs_avg=65.90721130371094, test_abs_avg=65.90469360351562
production_forward grad[5] vs paper_forward: mean_abs=1.1901540756225586, max_abs=4.5, mean_rel=0.09627442061901093, max_rel=6.1428422927856445, norm_rel=0.02352185733616352, ref_abs_avg=49.96830749511719, test_abs_avg=50.066776275634766
production_forward grad[6] vs paper_forward: mean_abs=1.417792797088623, max_abs=10.25, mean_rel=0.15701952576637268, max_rel=1374.23828125, norm_rel=0.023906923830509186, ref_abs_avg=59.58451843261719, test_abs_avg=59.587371826171875
production_forward grad[7] vs paper_forward: mean_abs=1.3878774642944336, max_abs=9.0, mean_rel=0.1585463583469391, max_rel=503.7158203125, norm_rel=0.023786041885614395, ref_abs_avg=58.67168426513672, test_abs_avg=58.6705322265625
production_forward grad[8] vs paper_forward: mean_abs=1.0349493026733398, max_abs=5.0, mean_rel=0.22002574801445007, max_rel=32.88441848754883, norm_rel=0.023916853591799736, ref_abs_avg=41.42108154296875, test_abs_avg=41.470481872558594
production_forward grad[9] vs paper_forward: mean_abs=1.279581069946289, max_abs=8.0, mean_rel=0.16528217494487762, max_rel=1451.975341796875, norm_rel=0.0237700417637825, ref_abs_avg=54.108741760253906, test_abs_avg=54.11620330810547
production_forward grad[10] vs paper_forward: mean_abs=1.2559798955917358, max_abs=7.5, mean_rel=0.15576393902301788, max_rel=1192.8248291015625, norm_rel=0.023571578785777092, ref_abs_avg=53.53357696533203, test_abs_avg=53.547760009765625
production_forward grad[11] vs paper_forward: mean_abs=0.9704008102416992, max_abs=4.875, mean_rel=0.10722281783819199, max_rel=5.988156318664551, norm_rel=0.023373819887638092, ref_abs_avg=40.50605010986328, test_abs_avg=40.50851058959961
production_forward grad[12] vs paper_forward: mean_abs=1.1809748411178589, max_abs=8.5, mean_rel=0.15452569723129272, max_rel=1663.381591796875, norm_rel=0.023437395691871643, ref_abs_avg=50.60166931152344, test_abs_avg=50.6021728515625
production_forward grad[13] vs paper_forward: mean_abs=1.1473374366760254, max_abs=7.0, mean_rel=0.161492258310318, max_rel=2642.48095703125, norm_rel=0.023093923926353455, ref_abs_avg=49.9005126953125, test_abs_avg=49.90232467651367
production_forward grad[14] vs paper_forward: mean_abs=0.9094564914703369, max_abs=4.1875, mean_rel=0.1482466757297516, max_rel=20.46766471862793, norm_rel=0.02388136275112629, ref_abs_avg=37.63853454589844, test_abs_avg=37.699928283691406
production_forward grad[15] vs paper_forward: mean_abs=1.1021370887756348, max_abs=6.5, mean_rel=0.17277543246746063, max_rel=2674.087158203125, norm_rel=0.023307550698518753, ref_abs_avg=47.55043029785156, test_abs_avg=47.5506706237793
production_forward grad[16] vs paper_forward: mean_abs=1.0718754529953003, max_abs=6.10595703125, mean_rel=0.14729568362236023, max_rel=1332.0926513671875, norm_rel=0.023134984076023102, ref_abs_avg=46.61479949951172, test_abs_avg=46.613460540771484
production_forward grad[17] vs paper_forward: mean_abs=0.834764838218689, max_abs=3.71875, mean_rel=0.19043564796447754, max_rel=42.88237380981445, norm_rel=0.022245870903134346, ref_abs_avg=37.68003845214844, test_abs_avg=37.688350677490234
production_forward grad[18] vs paper_forward: mean_abs=1.0338894128799438, max_abs=6.5, mean_rel=0.15427391231060028, max_rel=1592.978759765625, norm_rel=0.023134564980864525, ref_abs_avg=44.9437255859375, test_abs_avg=44.94546127319336
production_forward grad[19] vs paper_forward: mean_abs=1.010792851448059, max_abs=6.0, mean_rel=0.15532007813453674, max_rel=1153.1331787109375, norm_rel=0.022928040474653244, ref_abs_avg=44.35914993286133, test_abs_avg=44.366050720214844
production_forward grad[20] vs paper_forward: mean_abs=0.7909536361694336, max_abs=3.125, mean_rel=0.09572990238666534, max_rel=9.015830993652344, norm_rel=0.023883670568466187, ref_abs_avg=34.179813385009766, test_abs_avg=34.184852600097656
production_forward grad[21] vs paper_forward: mean_abs=0.9701224565505981, max_abs=6.0, mean_rel=0.1649436354637146, max_rel=1353.21630859375, norm_rel=0.023041436448693275, ref_abs_avg=42.34089279174805, test_abs_avg=42.34196472167969
production_forward grad[22] vs paper_forward: mean_abs=0.950322151184082, max_abs=6.25, mean_rel=0.15383680164813995, max_rel=1377.9429931640625, norm_rel=0.022884562611579895, ref_abs_avg=41.78714370727539, test_abs_avg=41.788414001464844
production_forward grad[23] vs paper_forward: mean_abs=0.796684741973877, max_abs=3.0, mean_rel=0.10665536671876907, max_rel=15.003925323486328, norm_rel=0.02292640693485737, ref_abs_avg=34.986515045166016, test_abs_avg=34.97389221191406
production_forward grad[24] vs paper_forward: mean_abs=0.931485652923584, max_abs=6.25, mean_rel=0.15400414168834686, max_rel=2021.8919677734375, norm_rel=0.022738534957170486, ref_abs_avg=41.15446472167969, test_abs_avg=41.151512145996094
production_forward grad[25] vs paper_forward: mean_abs=0.9075039625167847, max_abs=5.25, mean_rel=0.18119943141937256, max_rel=2122.30126953125, norm_rel=0.022664597257971764, ref_abs_avg=40.21721649169922, test_abs_avg=40.22042465209961
production_forward grad[26] vs paper_forward: mean_abs=0.9147510528564453, max_abs=4.0, mean_rel=0.09401050209999084, max_rel=8.553215026855469, norm_rel=0.026668615639209747, ref_abs_avg=35.37638854980469, test_abs_avg=35.48035430908203
production_forward grad[27] vs paper_forward: mean_abs=1.0941706895828247, max_abs=7.0, mean_rel=0.18670600652694702, max_rel=2314.18505859375, norm_rel=0.024933427572250366, ref_abs_avg=44.080909729003906, test_abs_avg=44.08184814453125
production_forward grad[28] vs paper_forward: mean_abs=1.0718159675598145, max_abs=8.0, mean_rel=0.17522016167640686, max_rel=1508.6636962890625, norm_rel=0.024858832359313965, ref_abs_avg=43.307594299316406, test_abs_avg=43.306678771972656
production_forward grad[29] vs paper_forward: mean_abs=0.8407237529754639, max_abs=3.25, mean_rel=0.14011618494987488, max_rel=24.150836944580078, norm_rel=0.026020005345344543, ref_abs_avg=31.892459869384766, test_abs_avg=31.945871353149414
production_forward grad[30] vs paper_forward: mean_abs=0.9990115761756897, max_abs=7.0, mean_rel=0.17565324902534485, max_rel=1891.6826171875, norm_rel=0.02518557943403721, ref_abs_avg=39.83612823486328, test_abs_avg=39.83374786376953
production_forward grad[31] vs paper_forward: mean_abs=0.9768403768539429, max_abs=7.5, mean_rel=0.18605485558509827, max_rel=1050.862548828125, norm_rel=0.02483302354812622, ref_abs_avg=39.48025894165039, test_abs_avg=39.46694564819336
production_forward grad[32] vs paper_forward: mean_abs=0.7582244873046875, max_abs=3.015625, mean_rel=0.08027759194374084, max_rel=2.587059497833252, norm_rel=0.025579286739230156, ref_abs_avg=29.38788604736328, test_abs_avg=29.39977264404297
production_forward grad[33] vs paper_forward: mean_abs=0.9188641309738159, max_abs=6.5, mean_rel=0.17360574007034302, max_rel=1355.6746826171875, norm_rel=0.025005733594298363, ref_abs_avg=36.878517150878906, test_abs_avg=36.87763977050781
production_forward grad[34] vs paper_forward: mean_abs=0.9159027338027954, max_abs=5.5, mean_rel=0.16444145143032074, max_rel=928.9007568359375, norm_rel=0.024884631857275963, ref_abs_avg=36.962501525878906, test_abs_avg=36.965309143066406
production_forward grad[35] vs paper_forward: mean_abs=0.698115348815918, max_abs=2.625, mean_rel=0.13262221217155457, max_rel=15.563626289367676, norm_rel=0.023270465433597565, ref_abs_avg=29.88469696044922, test_abs_avg=29.854145050048828
production_forward grad[36] vs paper_forward: mean_abs=0.8681442737579346, max_abs=5.5, mean_rel=0.16780096292495728, max_rel=2119.1259765625, norm_rel=0.024477409198880196, ref_abs_avg=35.57512664794922, test_abs_avg=35.571353912353516
production_forward grad[37] vs paper_forward: mean_abs=0.8605866432189941, max_abs=5.75, mean_rel=0.17428366839885712, max_rel=1794.0224609375, norm_rel=0.024570809677243233, ref_abs_avg=35.16339111328125, test_abs_avg=35.165138244628906
production_forward grad[38] vs paper_forward: mean_abs=0.6460466384887695, max_abs=2.5, mean_rel=0.14626213908195496, max_rel=18.718791961669922, norm_rel=0.02350449562072754, ref_abs_avg=27.215896606445312, test_abs_avg=27.20701789855957
production_forward grad[39] vs paper_forward: mean_abs=0.8248687982559204, max_abs=5.5, mean_rel=0.17078569531440735, max_rel=1403.943359375, norm_rel=0.024402840062975883, ref_abs_avg=33.91107940673828, test_abs_avg=33.911190032958984
production_forward grad[40] vs paper_forward: mean_abs=0.8066900372505188, max_abs=5.0, mean_rel=0.16923066973686218, max_rel=1130.75927734375, norm_rel=0.024147307500243187, ref_abs_avg=33.53181838989258, test_abs_avg=33.52789306640625
production_forward grad[41] vs paper_forward: mean_abs=0.60268235206604, max_abs=2.7109375, mean_rel=0.3397125005722046, max_rel=131.58689880371094, norm_rel=0.02312241680920124, ref_abs_avg=26.89397430419922, test_abs_avg=26.85370635986328
production_forward grad[42] vs paper_forward: mean_abs=0.7751505970954895, max_abs=5.171875, mean_rel=0.17360159754753113, max_rel=1553.4776611328125, norm_rel=0.023968541994690895, ref_abs_avg=32.37318420410156, test_abs_avg=32.37373352050781
production_forward grad[43] vs paper_forward: mean_abs=0.7596390247344971, max_abs=5.5, mean_rel=0.16255803406238556, max_rel=1060.671142578125, norm_rel=0.023771649226546288, ref_abs_avg=31.971927642822266, test_abs_avg=31.969364166259766
production_forward grad[44] vs paper_forward: mean_abs=0.5791447162628174, max_abs=2.875, mean_rel=0.10211212933063507, max_rel=11.505637168884277, norm_rel=0.02264036238193512, ref_abs_avg=25.65668487548828, test_abs_avg=25.641881942749023
production_forward grad[45] vs paper_forward: mean_abs=0.7401366233825684, max_abs=5.0, mean_rel=0.17046460509300232, max_rel=2200.492431640625, norm_rel=0.023711755871772766, ref_abs_avg=31.268787384033203, test_abs_avg=31.268070220947266
production_forward grad[46] vs paper_forward: mean_abs=0.7265003323554993, max_abs=5.0, mean_rel=0.16579744219779968, max_rel=1350.0308837890625, norm_rel=0.023564079776406288, ref_abs_avg=30.898422241210938, test_abs_avg=30.902917861938477
production_forward grad[47] vs paper_forward: mean_abs=0.550801694393158, max_abs=2.25, mean_rel=0.25572919845581055, max_rel=82.90214538574219, norm_rel=0.0224936343729496, ref_abs_avg=24.9323673248291, test_abs_avg=24.884654998779297
production_forward grad[48] vs paper_forward: mean_abs=0.7047370672225952, max_abs=5.0, mean_rel=0.15317612886428833, max_rel=922.7888793945312, norm_rel=0.023350892588496208, ref_abs_avg=30.20827293395996, test_abs_avg=30.208677291870117
production_forward grad[49] vs paper_forward: mean_abs=0.6939269304275513, max_abs=4.8125, mean_rel=0.15072166919708252, max_rel=538.0653686523438, norm_rel=0.023497572168707848, ref_abs_avg=29.59029769897461, test_abs_avg=29.58919334411621
production_forward grad[50] vs paper_forward: mean_abs=0.636530876159668, max_abs=2.875, mean_rel=0.1387467086315155, max_rel=24.49537467956543, norm_rel=0.025937777012586594, ref_abs_avg=24.952110290527344, test_abs_avg=24.96184539794922
production_forward grad[51] vs paper_forward: mean_abs=0.8005472421646118, max_abs=5.25, mean_rel=0.1650850921869278, max_rel=1188.587646484375, norm_rel=0.0254213847219944, ref_abs_avg=31.56496810913086, test_abs_avg=31.565319061279297
production_forward grad[52] vs paper_forward: mean_abs=0.7807434797286987, max_abs=5.25, mean_rel=0.20100155472755432, max_rel=2061.2998046875, norm_rel=0.02539658732712269, ref_abs_avg=30.90643882751465, test_abs_avg=30.911746978759766
production_forward grad[53] vs paper_forward: mean_abs=0.609573245048523, max_abs=2.4375, mean_rel=0.10435421019792557, max_rel=8.781907081604004, norm_rel=0.024119727313518524, ref_abs_avg=24.975601196289062, test_abs_avg=25.016857147216797
production_forward grad[54] vs paper_forward: mean_abs=0.7181857228279114, max_abs=4.6875, mean_rel=0.1671743243932724, max_rel=1299.1817626953125, norm_rel=0.024680910632014275, ref_abs_avg=29.089229583740234, test_abs_avg=29.090679168701172
production_forward grad[55] vs paper_forward: mean_abs=0.7033591270446777, max_abs=5.5, mean_rel=0.17685070633888245, max_rel=2047.2490234375, norm_rel=0.024728218093514442, ref_abs_avg=28.50194549560547, test_abs_avg=28.49941635131836
production_forward grad[56] vs paper_forward: mean_abs=0.549144983291626, max_abs=2.21875, mean_rel=0.19600093364715576, max_rel=31.848920822143555, norm_rel=0.024100035429000854, ref_abs_avg=22.963457107543945, test_abs_avg=22.94822120666504
production_forward grad[57] vs paper_forward: mean_abs=0.6714603304862976, max_abs=4.5, mean_rel=0.15984991192817688, max_rel=989.0477294921875, norm_rel=0.02408756874501705, ref_abs_avg=27.861852645874023, test_abs_avg=27.862937927246094
production_forward grad[58] vs paper_forward: mean_abs=0.6524665355682373, max_abs=5.0, mean_rel=0.1487116515636444, max_rel=777.9663696289062, norm_rel=0.023918086662888527, ref_abs_avg=27.31515884399414, test_abs_avg=27.314308166503906
production_forward grad[59] vs paper_forward: mean_abs=0.49249768257141113, max_abs=1.875, mean_rel=0.09666868299245834, max_rel=7.133633136749268, norm_rel=0.02310030348598957, ref_abs_avg=21.2255916595459, test_abs_avg=21.22384262084961
production_forward grad[60] vs paper_forward: mean_abs=0.6286062002182007, max_abs=4.296875, mean_rel=0.16019368171691895, max_rel=922.188720703125, norm_rel=0.023932961747050285, ref_abs_avg=26.24979019165039, test_abs_avg=26.249710083007812
production_forward grad[61] vs paper_forward: mean_abs=0.6138253808021545, max_abs=5.0, mean_rel=0.16795039176940918, max_rel=1077.138427734375, norm_rel=0.023333173245191574, ref_abs_avg=26.292442321777344, test_abs_avg=26.290864944458008
production_forward grad[62] vs paper_forward: mean_abs=0.47077059745788574, max_abs=1.953125, mean_rel=0.16168230772018433, max_rel=28.228195190429688, norm_rel=0.023602349683642387, ref_abs_avg=20.75735092163086, test_abs_avg=20.78348159790039
production_forward grad[63] vs paper_forward: mean_abs=0.5892536640167236, max_abs=5.0, mean_rel=0.1592307984828949, max_rel=958.842041015625, norm_rel=0.023364219814538956, ref_abs_avg=25.191635131835938, test_abs_avg=25.191726684570312
production_forward grad[64] vs paper_forward: mean_abs=0.5795659422874451, max_abs=4.5, mean_rel=0.14647850394248962, max_rel=667.8187866210938, norm_rel=0.02333616092801094, ref_abs_avg=24.87888526916504, test_abs_avg=24.867809295654297
production_forward grad[65] vs paper_forward: mean_abs=0.441774845123291, max_abs=2.375, mean_rel=0.10196900367736816, max_rel=5.899691581726074, norm_rel=0.023015771061182022, ref_abs_avg=19.222089767456055, test_abs_avg=19.228374481201172
production_forward grad[66] vs paper_forward: mean_abs=0.5599398612976074, max_abs=4.265625, mean_rel=0.14580872654914856, max_rel=553.0821533203125, norm_rel=0.0230946596711874, ref_abs_avg=24.244924545288086, test_abs_avg=24.24245262145996
production_forward grad[67] vs paper_forward: mean_abs=0.5493358969688416, max_abs=5.0, mean_rel=0.1529843807220459, max_rel=598.8043823242188, norm_rel=0.023019500076770782, ref_abs_avg=23.908363342285156, test_abs_avg=23.89972496032715
production_forward grad[68] vs paper_forward: mean_abs=0.4357919692993164, max_abs=1.75, mean_rel=0.05923297628760338, max_rel=1.552717924118042, norm_rel=0.02157299593091011, ref_abs_avg=20.41046142578125, test_abs_avg=20.455089569091797
production_forward grad[69] vs paper_forward: mean_abs=0.5355095863342285, max_abs=5.0, mean_rel=0.15342596173286438, max_rel=880.2237548828125, norm_rel=0.02282477356493473, ref_abs_avg=23.468610763549805, test_abs_avg=23.468090057373047
production_forward grad[70] vs paper_forward: mean_abs=0.522459864616394, max_abs=4.5, mean_rel=0.15182504057884216, max_rel=651.3082885742188, norm_rel=0.022633066400885582, ref_abs_avg=23.067672729492188, test_abs_avg=23.05970001220703
production_forward grad[71] vs paper_forward: mean_abs=0.42899131774902344, max_abs=1.75, mean_rel=0.07030946016311646, max_rel=3.5249009132385254, norm_rel=0.023048322647809982, ref_abs_avg=18.965551376342773, test_abs_avg=18.94503402709961
production_forward grad[72] vs paper_forward: mean_abs=0.5067758560180664, max_abs=4.0, mean_rel=0.14291971921920776, max_rel=1058.799072265625, norm_rel=0.022302687168121338, ref_abs_avg=22.70415496826172, test_abs_avg=22.702905654907227
production_forward grad[73] vs paper_forward: mean_abs=0.49490734934806824, max_abs=3.59375, mean_rel=0.13451677560806274, max_rel=669.5357666015625, norm_rel=0.022244734689593315, ref_abs_avg=22.264915466308594, test_abs_avg=22.26923370361328
production_forward grad[74] vs paper_forward: mean_abs=0.44257497787475586, max_abs=1.75, mean_rel=0.07345804572105408, max_rel=2.4967634677886963, norm_rel=0.0238149706274271, ref_abs_avg=18.89409637451172, test_abs_avg=18.900415420532227
production_forward grad[75] vs paper_forward: mean_abs=0.5549705624580383, max_abs=4.5, mean_rel=0.15718507766723633, max_rel=965.381103515625, norm_rel=0.023875268176198006, ref_abs_avg=23.234760284423828, test_abs_avg=23.23485565185547
production_forward grad[76] vs paper_forward: mean_abs=0.5431972742080688, max_abs=4.0, mean_rel=0.14446786046028137, max_rel=916.4304809570312, norm_rel=0.023599078878760338, ref_abs_avg=23.00525665283203, test_abs_avg=23.011707305908203
production_forward grad[77] vs paper_forward: mean_abs=0.444674015045166, max_abs=1.75, mean_rel=0.09661619365215302, max_rel=13.790474891662598, norm_rel=0.02330566942691803, ref_abs_avg=19.245468139648438, test_abs_avg=19.264957427978516
production_forward grad[78] vs paper_forward: mean_abs=0.5258604288101196, max_abs=4.5, mean_rel=0.15043523907661438, max_rel=1493.939208984375, norm_rel=0.02333109639585018, ref_abs_avg=22.523651123046875, test_abs_avg=22.52276611328125
production_forward grad[79] vs paper_forward: mean_abs=0.5106149315834045, max_abs=4.0, mean_rel=0.1630687564611435, max_rel=1988.2998046875, norm_rel=0.023073669523000717, ref_abs_avg=22.084484100341797, test_abs_avg=22.08112335205078
production_forward grad[80] vs paper_forward: mean_abs=0.3965587019920349, max_abs=1.875, mean_rel=0.14276933670043945, max_rel=18.59913444519043, norm_rel=0.02217290550470352, ref_abs_avg=17.633087158203125, test_abs_avg=17.626140594482422
production_forward grad[81] vs paper_forward: mean_abs=0.48822277784347534, max_abs=4.5, mean_rel=0.1489093154668808, max_rel=1051.9383544921875, norm_rel=0.02278553694486618, ref_abs_avg=21.408138275146484, test_abs_avg=21.40907859802246
production_forward grad[82] vs paper_forward: mean_abs=0.47311335802078247, max_abs=4.0, mean_rel=0.14728900790214539, max_rel=1046.068603515625, norm_rel=0.022987207397818565, ref_abs_avg=20.699317932128906, test_abs_avg=20.69815444946289
production_forward grad[83] vs paper_forward: mean_abs=0.36212652921676636, max_abs=1.46875, mean_rel=0.6576038002967834, max_rel=290.82684326171875, norm_rel=0.021892575547099113, ref_abs_avg=16.433135986328125, test_abs_avg=16.42238426208496
production_forward grad[84] vs paper_forward: mean_abs=0.45531323552131653, max_abs=4.5, mean_rel=0.14736297726631165, max_rel=829.3242797851562, norm_rel=0.02217791974544525, ref_abs_avg=20.576242446899414, test_abs_avg=20.576061248779297
production_forward grad[85] vs paper_forward: mean_abs=0.4325432777404785, max_abs=3.75, mean_rel=0.13667820394039154, max_rel=562.6572265625, norm_rel=0.02163873426616192, ref_abs_avg=20.051090240478516, test_abs_avg=20.04547882080078
production_forward grad[86] vs paper_forward: mean_abs=0.34038376808166504, max_abs=1.375, mean_rel=0.1382315456867218, max_rel=18.291385650634766, norm_rel=0.02145872265100479, ref_abs_avg=16.240827560424805, test_abs_avg=16.215797424316406
production_forward grad[87] vs paper_forward: mean_abs=0.41989666223526, max_abs=5.0, mean_rel=0.13803236186504364, max_rel=751.4496459960938, norm_rel=0.02154696173965931, ref_abs_avg=19.532272338867188, test_abs_avg=19.53079605102539
production_forward grad[88] vs paper_forward: mean_abs=0.4064665138721466, max_abs=3.5, mean_rel=0.13423758745193481, max_rel=612.958251953125, norm_rel=0.021173741668462753, ref_abs_avg=19.261489868164062, test_abs_avg=19.269405364990234
production_forward grad[89] vs paper_forward: mean_abs=0.31465619802474976, max_abs=1.125, mean_rel=0.11413410305976868, max_rel=20.863073348999023, norm_rel=0.020865701138973236, ref_abs_avg=15.075325965881348, test_abs_avg=15.078287124633789
production_forward grad[90] vs paper_forward: mean_abs=0.38481026887893677, max_abs=4.0, mean_rel=0.13015437126159668, max_rel=613.380615234375, norm_rel=0.020789148285984993, ref_abs_avg=18.62749671936035, test_abs_avg=18.626468658447266
production_forward grad[91] vs paper_forward: mean_abs=0.3901175856590271, max_abs=4.0, mean_rel=0.1314821094274521, max_rel=1130.8067626953125, norm_rel=0.020817382261157036, ref_abs_avg=18.947162628173828, test_abs_avg=18.952953338623047
production_forward grad[92] vs paper_forward: mean_abs=0.3478403091430664, max_abs=1.3125, mean_rel=0.16120360791683197, max_rel=35.3707160949707, norm_rel=0.022885316982865334, ref_abs_avg=15.16098403930664, test_abs_avg=15.173409461975098
production_forward grad[93] vs paper_forward: mean_abs=0.3779502809047699, max_abs=4.75, mean_rel=0.1284690648317337, max_rel=884.1799926757812, norm_rel=0.020517844706773758, ref_abs_avg=18.61284637451172, test_abs_avg=18.611305236816406
production_forward grad[94] vs paper_forward: mean_abs=0.3599070608615875, max_abs=3.875, mean_rel=0.12646308541297913, max_rel=564.6521606445312, norm_rel=0.020389074459671974, ref_abs_avg=17.86054039001465, test_abs_avg=17.864337921142578
production_forward grad[95] vs paper_forward: mean_abs=0.3036472201347351, max_abs=1.25, mean_rel=0.0887424498796463, max_rel=6.4361186027526855, norm_rel=0.02020084112882614, ref_abs_avg=15.50328254699707, test_abs_avg=15.496624946594238
production_forward grad[96] vs paper_forward: mean_abs=0.3487859070301056, max_abs=3.5, mean_rel=0.12425471842288971, max_rel=807.1906127929688, norm_rel=0.019748158752918243, ref_abs_avg=17.92497444152832, test_abs_avg=17.924070358276367
production_forward grad[97] vs paper_forward: mean_abs=0.35044848918914795, max_abs=3.25, mean_rel=0.11464925855398178, max_rel=394.41497802734375, norm_rel=0.01988852396607399, ref_abs_avg=17.953140258789062, test_abs_avg=17.94904899597168
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016602998366579413, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.00878569670021534, max_abs=0.359375, mean_rel=0.07511980086565018, max_rel=121.446533203125, norm_rel=0.02052956447005272, ref_abs_avg=0.46179795265197754, test_abs_avg=0.4617875814437866
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.465423583984375, max_abs=48.0, mean_rel=0.1707155704498291, max_rel=297.9353942871094, norm_rel=0.020507873967289925, ref_abs_avg=322.6158752441406, test_abs_avg=322.62445068359375
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3511743545532227, max_abs=6.0, mean_rel=0.20967090129852295, max_rel=56.85801696777344, norm_rel=0.02480834722518921, ref_abs_avg=55.42802047729492, test_abs_avg=55.30979919433594
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6154533624649048, max_abs=11.0, mean_rel=0.18312087655067444, max_rel=2209.341552734375, norm_rel=0.02421646937727928, ref_abs_avg=67.03909301757812, test_abs_avg=67.04078674316406
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5665100812911987, max_abs=10.0, mean_rel=0.17050987482070923, max_rel=1138.609375, norm_rel=0.02388543263077736, ref_abs_avg=65.90721130371094, test_abs_avg=65.90692138671875
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.192824363708496, max_abs=4.5625, mean_rel=0.14244195818901062, max_rel=25.27659034729004, norm_rel=0.023837769404053688, ref_abs_avg=49.96830749511719, test_abs_avg=50.087608337402344
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4205443859100342, max_abs=9.5, mean_rel=0.15978217124938965, max_rel=2241.902587890625, norm_rel=0.02396504022181034, ref_abs_avg=59.58451843261719, test_abs_avg=59.586143493652344
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3858237266540527, max_abs=9.0, mean_rel=0.1625472605228424, max_rel=720.5462646484375, norm_rel=0.023740040138363838, ref_abs_avg=58.67168426513672, test_abs_avg=58.674468994140625
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0694169998168945, max_abs=5.25, mean_rel=0.24986132979393005, max_rel=31.316160202026367, norm_rel=0.025355398654937744, ref_abs_avg=41.42108154296875, test_abs_avg=41.42173767089844
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2834302186965942, max_abs=8.25, mean_rel=0.16253432631492615, max_rel=1415.69189453125, norm_rel=0.02382686920464039, ref_abs_avg=54.108741760253906, test_abs_avg=54.11341094970703
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2581284046173096, max_abs=7.25, mean_rel=0.16183751821517944, max_rel=1412.75, norm_rel=0.023612910881638527, ref_abs_avg=53.53357696533203, test_abs_avg=53.542659759521484
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9732046127319336, max_abs=4.125, mean_rel=0.11654676496982574, max_rel=7.404088497161865, norm_rel=0.023475296795368195, ref_abs_avg=40.50605010986328, test_abs_avg=40.52882385253906
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1847411394119263, max_abs=7.71875, mean_rel=0.16247765719890594, max_rel=1484.4453125, norm_rel=0.02350243367254734, ref_abs_avg=50.60166931152344, test_abs_avg=50.600929260253906
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1555100679397583, max_abs=7.0, mean_rel=0.15586017072200775, max_rel=2928.12890625, norm_rel=0.02325213886797428, ref_abs_avg=49.9005126953125, test_abs_avg=49.8980598449707
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.9045751094818115, max_abs=3.5625, mean_rel=0.13140669465065002, max_rel=10.012064933776855, norm_rel=0.024038689211010933, ref_abs_avg=37.63853454589844, test_abs_avg=37.69664764404297
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1081953048706055, max_abs=7.0, mean_rel=0.16949786245822906, max_rel=3611.4375, norm_rel=0.02341259829699993, ref_abs_avg=47.55043029785156, test_abs_avg=47.55194854736328
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0759015083312988, max_abs=6.625, mean_rel=0.15020182728767395, max_rel=1203.9239501953125, norm_rel=0.023219160735607147, ref_abs_avg=46.61479949951172, test_abs_avg=46.61492156982422
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8200277090072632, max_abs=4.0, mean_rel=0.11919975280761719, max_rel=9.985160827636719, norm_rel=0.022177238017320633, ref_abs_avg=37.68003845214844, test_abs_avg=37.64683532714844
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0367125272750854, max_abs=7.0, mean_rel=0.1587986946105957, max_rel=1479.2313232421875, norm_rel=0.023199448361992836, ref_abs_avg=44.9437255859375, test_abs_avg=44.94287872314453
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=1.0162042379379272, max_abs=6.5, mean_rel=0.15857848525047302, max_rel=1667.4609375, norm_rel=0.023040907457470894, ref_abs_avg=44.35914993286133, test_abs_avg=44.36493682861328
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7874002456665039, max_abs=3.0, mean_rel=0.09603177011013031, max_rel=11.560051918029785, norm_rel=0.023752696812152863, ref_abs_avg=34.179813385009766, test_abs_avg=34.21506118774414
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9755420684814453, max_abs=6.375, mean_rel=0.1710009127855301, max_rel=1624.6641845703125, norm_rel=0.023153582587838173, ref_abs_avg=42.34089279174805, test_abs_avg=42.34265899658203
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9542405605316162, max_abs=6.125, mean_rel=0.1567869782447815, max_rel=1673.71533203125, norm_rel=0.02295628935098648, ref_abs_avg=41.78714370727539, test_abs_avg=41.78363037109375
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7932348251342773, max_abs=3.0, mean_rel=0.105008065700531, max_rel=12.550299644470215, norm_rel=0.022873345762491226, ref_abs_avg=34.986515045166016, test_abs_avg=34.96085739135742
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9354616403579712, max_abs=6.5, mean_rel=0.1552504152059555, max_rel=1901.75830078125, norm_rel=0.022835586220026016, ref_abs_avg=41.15446472167969, test_abs_avg=41.151649475097656
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.909334123134613, max_abs=5.5, mean_rel=0.1744806170463562, max_rel=2375.0, norm_rel=0.02272803522646427, ref_abs_avg=40.21721649169922, test_abs_avg=40.22007369995117
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.9050083160400391, max_abs=4.0, mean_rel=0.07779032737016678, max_rel=4.639657497406006, norm_rel=0.026418844237923622, ref_abs_avg=35.37638854980469, test_abs_avg=35.46505355834961
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0936024188995361, max_abs=6.875, mean_rel=0.19058340787887573, max_rel=3421.0625, norm_rel=0.024908240884542465, ref_abs_avg=44.080909729003906, test_abs_avg=44.07976531982422
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0654232501983643, max_abs=6.5, mean_rel=0.1704801321029663, max_rel=882.4887084960938, norm_rel=0.024703675881028175, ref_abs_avg=43.307594299316406, test_abs_avg=43.30767822265625
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8102307319641113, max_abs=4.0, mean_rel=0.14770543575286865, max_rel=26.63619041442871, norm_rel=0.025747573003172874, ref_abs_avg=31.892459869384766, test_abs_avg=31.903108596801758
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.0013024806976318, max_abs=7.28125, mean_rel=0.18012496829032898, max_rel=1467.053466796875, norm_rel=0.025230826810002327, ref_abs_avg=39.83612823486328, test_abs_avg=39.83323669433594
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9770870208740234, max_abs=6.5, mean_rel=0.1934019774198532, max_rel=1682.672119140625, norm_rel=0.024822182953357697, ref_abs_avg=39.48025894165039, test_abs_avg=39.46621322631836
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7887916564941406, max_abs=3.0, mean_rel=0.07655707746744156, max_rel=1.3428518772125244, norm_rel=0.02646656148135662, ref_abs_avg=29.38788604736328, test_abs_avg=29.367664337158203
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9206603765487671, max_abs=6.0, mean_rel=0.1715240776538849, max_rel=1321.8643798828125, norm_rel=0.02506098337471485, ref_abs_avg=36.878517150878906, test_abs_avg=36.87532043457031
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.9156557321548462, max_abs=5.625, mean_rel=0.16497665643692017, max_rel=706.0619506835938, norm_rel=0.024866696447134018, ref_abs_avg=36.962501525878906, test_abs_avg=36.964019775390625
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.701624870300293, max_abs=2.625, mean_rel=0.11893238127231598, max_rel=12.811437606811523, norm_rel=0.023026078939437866, ref_abs_avg=29.88469696044922, test_abs_avg=29.834606170654297
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8718903064727783, max_abs=5.859375, mean_rel=0.17089375853538513, max_rel=1926.4735107421875, norm_rel=0.024566199630498886, ref_abs_avg=35.57512664794922, test_abs_avg=35.571475982666016
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8613458275794983, max_abs=5.25, mean_rel=0.17145827412605286, max_rel=1721.770751953125, norm_rel=0.024597635492682457, ref_abs_avg=35.16339111328125, test_abs_avg=35.163787841796875
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6943860054016113, max_abs=2.375, mean_rel=0.14074666798114777, max_rel=13.769828796386719, norm_rel=0.02511279284954071, ref_abs_avg=27.215896606445312, test_abs_avg=27.21141242980957
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8287520408630371, max_abs=5.125, mean_rel=0.16784371435642242, max_rel=2043.413330078125, norm_rel=0.02451818250119686, ref_abs_avg=33.91107940673828, test_abs_avg=33.909912109375
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.8066491484642029, max_abs=5.125, mean_rel=0.1694691777229309, max_rel=1374.1053466796875, norm_rel=0.024175355210900307, ref_abs_avg=33.53181838989258, test_abs_avg=33.53114318847656
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6161501407623291, max_abs=2.25, mean_rel=0.4132613241672516, max_rel=154.7921142578125, norm_rel=0.02281855046749115, ref_abs_avg=26.89397430419922, test_abs_avg=26.896455764770508
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7764265537261963, max_abs=5.0, mean_rel=0.17343752086162567, max_rel=1868.487548828125, norm_rel=0.024015579372644424, ref_abs_avg=32.37318420410156, test_abs_avg=32.37174606323242
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7611837387084961, max_abs=5.0, mean_rel=0.1663045734167099, max_rel=1037.0433349609375, norm_rel=0.023822098970413208, ref_abs_avg=31.971927642822266, test_abs_avg=31.966758728027344
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5974161624908447, max_abs=2.875, mean_rel=0.1641722470521927, max_rel=41.73356628417969, norm_rel=0.02305646426975727, ref_abs_avg=25.65668487548828, test_abs_avg=25.652751922607422
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7427530288696289, max_abs=4.78125, mean_rel=0.17139190435409546, max_rel=2056.6591796875, norm_rel=0.023796064779162407, ref_abs_avg=31.268787384033203, test_abs_avg=31.267292022705078
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7264099717140198, max_abs=5.0, mean_rel=0.16071882843971252, max_rel=1326.5628662109375, norm_rel=0.023557009175419807, ref_abs_avg=30.898422241210938, test_abs_avg=30.90087890625
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5495068430900574, max_abs=2.5, mean_rel=0.36412954330444336, max_rel=141.9005889892578, norm_rel=0.02222169190645218, ref_abs_avg=24.9323673248291, test_abs_avg=24.879072189331055
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7065194845199585, max_abs=5.0, mean_rel=0.15747381746768951, max_rel=1640.5352783203125, norm_rel=0.023399149999022484, ref_abs_avg=30.20827293395996, test_abs_avg=30.20897102355957
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6947920322418213, max_abs=5.0, mean_rel=0.1474105268716812, max_rel=500.2645263671875, norm_rel=0.023521143943071365, ref_abs_avg=29.59029769897461, test_abs_avg=29.588966369628906
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6122646331787109, max_abs=2.5, mean_rel=0.1428288072347641, max_rel=28.91084861755371, norm_rel=0.025265583768486977, ref_abs_avg=24.952110290527344, test_abs_avg=24.96256446838379
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7992360591888428, max_abs=5.5, mean_rel=0.1639809012413025, max_rel=1161.48046875, norm_rel=0.02536364272236824, ref_abs_avg=31.56496810913086, test_abs_avg=31.56487274169922
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7806021571159363, max_abs=5.5, mean_rel=0.19364067912101746, max_rel=2034.3544921875, norm_rel=0.02537233754992485, ref_abs_avg=30.90643882751465, test_abs_avg=30.909507751464844
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.6044416427612305, max_abs=2.75, mean_rel=0.08526893705129623, max_rel=4.216564655303955, norm_rel=0.024672869592905045, ref_abs_avg=24.975601196289062, test_abs_avg=25.019512176513672
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7188464403152466, max_abs=4.75, mean_rel=0.16692140698432922, max_rel=1389.5404052734375, norm_rel=0.024696124717593193, ref_abs_avg=29.089229583740234, test_abs_avg=29.09003257751465
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.7033118009567261, max_abs=5.0, mean_rel=0.17388996481895447, max_rel=1712.5074462890625, norm_rel=0.024741770699620247, ref_abs_avg=28.50194549560547, test_abs_avg=28.50133514404297
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.558790922164917, max_abs=1.84375, mean_rel=0.207318514585495, max_rel=33.72891616821289, norm_rel=0.02437298744916916, ref_abs_avg=22.963457107543945, test_abs_avg=22.949787139892578
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6723161339759827, max_abs=5.0, mean_rel=0.16185131669044495, max_rel=867.7034912109375, norm_rel=0.02412283606827259, ref_abs_avg=27.861852645874023, test_abs_avg=27.86182403564453
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6505044102668762, max_abs=4.5, mean_rel=0.1481146514415741, max_rel=707.663818359375, norm_rel=0.02384064346551895, ref_abs_avg=27.31515884399414, test_abs_avg=27.312519073486328
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.47372984886169434, max_abs=2.0625, mean_rel=0.0978596955537796, max_rel=5.467382907867432, norm_rel=0.02266225777566433, ref_abs_avg=21.2255916595459, test_abs_avg=21.215478897094727
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6306372880935669, max_abs=4.21875, mean_rel=0.1562071144580841, max_rel=931.51220703125, norm_rel=0.024000054225325584, ref_abs_avg=26.24979019165039, test_abs_avg=26.2476806640625
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6158424615859985, max_abs=4.4375, mean_rel=0.16339142620563507, max_rel=869.4322509765625, norm_rel=0.02339894138276577, ref_abs_avg=26.292442321777344, test_abs_avg=26.287654876708984
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.48146820068359375, max_abs=2.0, mean_rel=0.16291314363479614, max_rel=27.191619873046875, norm_rel=0.02364380471408367, ref_abs_avg=20.75735092163086, test_abs_avg=20.776657104492188
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5897902250289917, max_abs=4.75, mean_rel=0.15843836963176727, max_rel=986.2216796875, norm_rel=0.023392360657453537, ref_abs_avg=25.191635131835938, test_abs_avg=25.19084930419922
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5815513134002686, max_abs=4.21875, mean_rel=0.14848200976848602, max_rel=693.35791015625, norm_rel=0.023400478065013885, ref_abs_avg=24.87888526916504, test_abs_avg=24.86894989013672
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.44111621379852295, max_abs=2.0, mean_rel=0.0920553207397461, max_rel=5.327268123626709, norm_rel=0.023056261241436005, ref_abs_avg=19.222089767456055, test_abs_avg=19.235427856445312
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.56172114610672, max_abs=4.4375, mean_rel=0.14892955124378204, max_rel=1242.968994140625, norm_rel=0.023161225020885468, ref_abs_avg=24.244924545288086, test_abs_avg=24.242095947265625
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5518737435340881, max_abs=4.140625, mean_rel=0.155913308262825, max_rel=771.3080444335938, norm_rel=0.023112084716558456, ref_abs_avg=23.908363342285156, test_abs_avg=23.900373458862305
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.42639923095703125, max_abs=1.625, mean_rel=0.05640169978141785, max_rel=1.5128095149993896, norm_rel=0.021224265918135643, ref_abs_avg=20.41046142578125, test_abs_avg=20.432022094726562
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5367096662521362, max_abs=5.0, mean_rel=0.1511150598526001, max_rel=1155.1473388671875, norm_rel=0.022859446704387665, ref_abs_avg=23.468610763549805, test_abs_avg=23.467531204223633
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5243935585021973, max_abs=3.9375, mean_rel=0.15063053369522095, max_rel=615.9161376953125, norm_rel=0.022713614627718925, ref_abs_avg=23.067672729492188, test_abs_avg=23.06024932861328
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.42614126205444336, max_abs=1.6875, mean_rel=0.07698233425617218, max_rel=4.524925708770752, norm_rel=0.02286199852824211, ref_abs_avg=18.965551376342773, test_abs_avg=18.966087341308594
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5082801580429077, max_abs=3.5, mean_rel=0.14689311385154724, max_rel=1415.92236328125, norm_rel=0.022370269522070885, ref_abs_avg=22.70415496826172, test_abs_avg=22.702917098999023
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4943962097167969, max_abs=4.25, mean_rel=0.13584502041339874, max_rel=681.7681274414062, norm_rel=0.022226402536034584, ref_abs_avg=22.264915466308594, test_abs_avg=22.264108657836914
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4649531841278076, max_abs=2.25, mean_rel=0.0832185447216034, max_rel=5.201590538024902, norm_rel=0.025142161175608635, ref_abs_avg=18.89409637451172, test_abs_avg=18.883563995361328
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5513052940368652, max_abs=4.25, mean_rel=0.15484608709812164, max_rel=839.2532348632812, norm_rel=0.02373337931931019, ref_abs_avg=23.234760284423828, test_abs_avg=23.234939575195312
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5401948690414429, max_abs=3.75, mean_rel=0.14416751265525818, max_rel=800.710205078125, norm_rel=0.02349221333861351, ref_abs_avg=23.00525665283203, test_abs_avg=23.01076889038086
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.43167805671691895, max_abs=1.625, mean_rel=0.10457497090101242, max_rel=15.191150665283203, norm_rel=0.022703353315591812, ref_abs_avg=19.245468139648438, test_abs_avg=19.264663696289062
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5250300168991089, max_abs=4.296875, mean_rel=0.15689948201179504, max_rel=1476.2913818359375, norm_rel=0.023308683186769485, ref_abs_avg=22.523651123046875, test_abs_avg=22.523130416870117
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5132584571838379, max_abs=3.5, mean_rel=0.16187457740306854, max_rel=2045.1109619140625, norm_rel=0.023153439164161682, ref_abs_avg=22.084484100341797, test_abs_avg=22.078521728515625
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3868703842163086, max_abs=1.5, mean_rel=0.10991445183753967, max_rel=10.15432357788086, norm_rel=0.021756010130047798, ref_abs_avg=17.633087158203125, test_abs_avg=17.62580108642578
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.4881046414375305, max_abs=4.25, mean_rel=0.14782753586769104, max_rel=920.4317626953125, norm_rel=0.022800110280513763, ref_abs_avg=21.408138275146484, test_abs_avg=21.408794403076172
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.47372668981552124, max_abs=3.875, mean_rel=0.14857375621795654, max_rel=952.6657104492188, norm_rel=0.022985156625509262, ref_abs_avg=20.699317932128906, test_abs_avg=20.700040817260742
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3849472403526306, max_abs=1.375, mean_rel=0.42040589451789856, max_rel=164.18968200683594, norm_rel=0.022901343181729317, ref_abs_avg=16.433135986328125, test_abs_avg=16.437511444091797
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4557732343673706, max_abs=4.0, mean_rel=0.15170159935951233, max_rel=1020.8355102539062, norm_rel=0.022215235978364944, ref_abs_avg=20.576242446899414, test_abs_avg=20.57586097717285
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4350714087486267, max_abs=3.75, mean_rel=0.13815459609031677, max_rel=467.55377197265625, norm_rel=0.02176312729716301, ref_abs_avg=20.051090240478516, test_abs_avg=20.041982650756836
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.33384108543395996, max_abs=1.375, mean_rel=0.14608117938041687, max_rel=18.460201263427734, norm_rel=0.021041493862867355, ref_abs_avg=16.240827560424805, test_abs_avg=16.203880310058594
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.42108798027038574, max_abs=4.25, mean_rel=0.14028090238571167, max_rel=677.251708984375, norm_rel=0.021624214947223663, ref_abs_avg=19.532272338867188, test_abs_avg=19.530696868896484
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4056098163127899, max_abs=3.5, mean_rel=0.13196666538715363, max_rel=439.9364318847656, norm_rel=0.021164201200008392, ref_abs_avg=19.261489868164062, test_abs_avg=19.2665958404541
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.31106722354888916, max_abs=1.25, mean_rel=0.10186693072319031, max_rel=11.021929740905762, norm_rel=0.020865648984909058, ref_abs_avg=15.075325965881348, test_abs_avg=15.076882362365723
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.385699987411499, max_abs=3.5, mean_rel=0.13171209394931793, max_rel=911.8831787109375, norm_rel=0.02084335871040821, ref_abs_avg=18.62749671936035, test_abs_avg=18.626157760620117
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.38421422243118286, max_abs=4.0, mean_rel=0.1293371319770813, max_rel=800.3587036132812, norm_rel=0.020468080416321754, ref_abs_avg=18.947162628173828, test_abs_avg=18.955821990966797
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.3390064239501953, max_abs=1.3125, mean_rel=0.17205867171287537, max_rel=39.82871627807617, norm_rel=0.022354543209075928, ref_abs_avg=15.16098403930664, test_abs_avg=15.17802619934082
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.378508597612381, max_abs=4.875, mean_rel=0.1281656175851822, max_rel=995.81640625, norm_rel=0.02055736444890499, ref_abs_avg=18.61284637451172, test_abs_avg=18.611129760742188
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.36383795738220215, max_abs=3.75, mean_rel=0.12923571467399597, max_rel=595.416015625, norm_rel=0.020574050024151802, ref_abs_avg=17.86054039001465, test_abs_avg=17.866836547851562
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.3137573003768921, max_abs=1.1875, mean_rel=0.0993601456284523, max_rel=10.380106925964355, norm_rel=0.02053828537464142, ref_abs_avg=15.50328254699707, test_abs_avg=15.493200302124023
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.34941279888153076, max_abs=5.0, mean_rel=0.1245025247335434, max_rel=721.40478515625, norm_rel=0.019785592332482338, ref_abs_avg=17.92497444152832, test_abs_avg=17.923988342285156
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.34578633308410645, max_abs=3.5, mean_rel=0.1134827584028244, max_rel=357.1706848144531, norm_rel=0.019638460129499435, ref_abs_avg=17.953140258789062, test_abs_avg=17.947826385498047
production_forward2 vs paper_forward output: mean_abs=0.001656435546465218, max_abs=0.03515625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008775422349572182, max_abs=0.359375, mean_rel=0.0750124454498291, max_rel=157.96319580078125, norm_rel=0.020512601360678673, ref_abs_avg=0.46179795265197754, test_abs_avg=0.46178776025772095
production_forward2 grad[1] vs paper_forward: mean_abs=7.429131984710693, max_abs=56.0, mean_rel=0.15590283274650574, max_rel=279.8192138671875, norm_rel=0.020423633977770805, ref_abs_avg=322.6158752441406, test_abs_avg=322.6128234863281
production_forward2 grad[2] vs paper_forward: mean_abs=1.2950716018676758, max_abs=6.5, mean_rel=0.14732560515403748, max_rel=30.784399032592773, norm_rel=0.024045143276453018, ref_abs_avg=55.42802047729492, test_abs_avg=55.30434799194336
production_forward2 grad[3] vs paper_forward: mean_abs=1.6123871803283691, max_abs=11.0390625, mean_rel=0.1810983419418335, max_rel=3841.133544921875, norm_rel=0.024195585399866104, ref_abs_avg=67.03909301757812, test_abs_avg=67.04054260253906
production_forward2 grad[4] vs paper_forward: mean_abs=1.5686089992523193, max_abs=9.6875, mean_rel=0.16199907660484314, max_rel=1402.9600830078125, norm_rel=0.023924071341753006, ref_abs_avg=65.90721130371094, test_abs_avg=65.9022445678711
production_forward2 grad[5] vs paper_forward: mean_abs=1.197504997253418, max_abs=4.0, mean_rel=0.11345475912094116, max_rel=11.266610145568848, norm_rel=0.023897835984826088, ref_abs_avg=49.96830749511719, test_abs_avg=50.0290641784668
production_forward2 grad[6] vs paper_forward: mean_abs=1.4183955192565918, max_abs=9.0, mean_rel=0.15785187482833862, max_rel=1939.6824951171875, norm_rel=0.023936478421092033, ref_abs_avg=59.58451843261719, test_abs_avg=59.586517333984375
production_forward2 grad[7] vs paper_forward: mean_abs=1.3917796611785889, max_abs=9.0, mean_rel=0.16300657391548157, max_rel=609.7156372070312, norm_rel=0.023841898888349533, ref_abs_avg=58.67168426513672, test_abs_avg=58.66948699951172
production_forward2 grad[8] vs paper_forward: mean_abs=1.0676298141479492, max_abs=5.5, mean_rel=0.21616721153259277, max_rel=22.377086639404297, norm_rel=0.02511708438396454, ref_abs_avg=41.42108154296875, test_abs_avg=41.42753982543945
production_forward2 grad[9] vs paper_forward: mean_abs=1.2829959392547607, max_abs=8.0, mean_rel=0.16411754488945007, max_rel=1266.472412109375, norm_rel=0.023821931332349777, ref_abs_avg=54.108741760253906, test_abs_avg=54.11311721801758
production_forward2 grad[10] vs paper_forward: mean_abs=1.2558788061141968, max_abs=8.0, mean_rel=0.15980057418346405, max_rel=1740.435791015625, norm_rel=0.02357197366654873, ref_abs_avg=53.53357696533203, test_abs_avg=53.542972564697266
production_forward2 grad[11] vs paper_forward: mean_abs=0.9471054077148438, max_abs=4.5, mean_rel=0.11123037338256836, max_rel=6.869014263153076, norm_rel=0.023031434044241905, ref_abs_avg=40.50605010986328, test_abs_avg=40.55303955078125
production_forward2 grad[12] vs paper_forward: mean_abs=1.1838940382003784, max_abs=7.0, mean_rel=0.15649405121803284, max_rel=1494.970947265625, norm_rel=0.02349024824798107, ref_abs_avg=50.60166931152344, test_abs_avg=50.601524353027344
production_forward2 grad[13] vs paper_forward: mean_abs=1.1511338949203491, max_abs=7.0, mean_rel=0.15480145812034607, max_rel=2511.55908203125, norm_rel=0.023161254823207855, ref_abs_avg=49.9005126953125, test_abs_avg=49.90153503417969
production_forward2 grad[14] vs paper_forward: mean_abs=0.9043688774108887, max_abs=4.5, mean_rel=0.13892580568790436, max_rel=19.237592697143555, norm_rel=0.023677585646510124, ref_abs_avg=37.63853454589844, test_abs_avg=37.70915222167969
production_forward2 grad[15] vs paper_forward: mean_abs=1.106191635131836, max_abs=7.0, mean_rel=0.16848978400230408, max_rel=3156.546875, norm_rel=0.0233834907412529, ref_abs_avg=47.55043029785156, test_abs_avg=47.55045700073242
production_forward2 grad[16] vs paper_forward: mean_abs=1.0742195844650269, max_abs=6.349609375, mean_rel=0.14865906536579132, max_rel=1252.58642578125, norm_rel=0.02317146211862564, ref_abs_avg=46.61479949951172, test_abs_avg=46.61466598510742
production_forward2 grad[17] vs paper_forward: mean_abs=0.8484855890274048, max_abs=4.03125, mean_rel=0.10961771011352539, max_rel=9.235067367553711, norm_rel=0.022991767153143883, ref_abs_avg=37.68003845214844, test_abs_avg=37.69695281982422
production_forward2 grad[18] vs paper_forward: mean_abs=1.0361120700836182, max_abs=7.0, mean_rel=0.1555647850036621, max_rel=1464.0650634765625, norm_rel=0.02319004386663437, ref_abs_avg=44.9437255859375, test_abs_avg=44.94268035888672
production_forward2 grad[19] vs paper_forward: mean_abs=1.0138955116271973, max_abs=6.75, mean_rel=0.15633244812488556, max_rel=1261.5179443359375, norm_rel=0.022986171767115593, ref_abs_avg=44.35914993286133, test_abs_avg=44.366371154785156
production_forward2 grad[20] vs paper_forward: mean_abs=0.80340576171875, max_abs=3.5, mean_rel=0.09703667461872101, max_rel=10.10621166229248, norm_rel=0.024096015840768814, ref_abs_avg=34.179813385009766, test_abs_avg=34.22218704223633
production_forward2 grad[21] vs paper_forward: mean_abs=0.9741590023040771, max_abs=6.375, mean_rel=0.1646624207496643, max_rel=1312.4119873046875, norm_rel=0.023125125095248222, ref_abs_avg=42.34089279174805, test_abs_avg=42.34117126464844
production_forward2 grad[22] vs paper_forward: mean_abs=0.9516024589538574, max_abs=6.5, mean_rel=0.15691085159778595, max_rel=1168.7381591796875, norm_rel=0.02289816178381443, ref_abs_avg=41.78714370727539, test_abs_avg=41.7834587097168
production_forward2 grad[23] vs paper_forward: mean_abs=0.7892341613769531, max_abs=3.0, mean_rel=0.10693881660699844, max_rel=14.513200759887695, norm_rel=0.022812265902757645, ref_abs_avg=34.986515045166016, test_abs_avg=34.99903869628906
production_forward2 grad[24] vs paper_forward: mean_abs=0.9341068863868713, max_abs=6.0, mean_rel=0.15701600909233093, max_rel=1841.69140625, norm_rel=0.022810211405158043, ref_abs_avg=41.15446472167969, test_abs_avg=41.15039825439453
production_forward2 grad[25] vs paper_forward: mean_abs=0.9082603454589844, max_abs=5.5, mean_rel=0.17364998161792755, max_rel=2203.125, norm_rel=0.022678961977362633, ref_abs_avg=40.21721649169922, test_abs_avg=40.21661376953125
production_forward2 grad[26] vs paper_forward: mean_abs=0.8968601226806641, max_abs=4.75, mean_rel=0.08063732832670212, max_rel=8.110170364379883, norm_rel=0.026072174310684204, ref_abs_avg=35.37638854980469, test_abs_avg=35.47318649291992
production_forward2 grad[27] vs paper_forward: mean_abs=1.0922309160232544, max_abs=7.5, mean_rel=0.18821389973163605, max_rel=3005.9833984375, norm_rel=0.024881530553102493, ref_abs_avg=44.080909729003906, test_abs_avg=44.080543518066406
production_forward2 grad[28] vs paper_forward: mean_abs=1.0659656524658203, max_abs=6.25, mean_rel=0.17037734389305115, max_rel=1100.163330078125, norm_rel=0.02473987638950348, ref_abs_avg=43.307594299316406, test_abs_avg=43.303955078125
production_forward2 grad[29] vs paper_forward: mean_abs=0.8127176761627197, max_abs=3.75, mean_rel=0.11375881731510162, max_rel=11.901591300964355, norm_rel=0.025785036385059357, ref_abs_avg=31.892459869384766, test_abs_avg=31.94232177734375
production_forward2 grad[30] vs paper_forward: mean_abs=1.001023769378662, max_abs=7.0, mean_rel=0.17611362040042877, max_rel=1782.6925048828125, norm_rel=0.025218084454536438, ref_abs_avg=39.83612823486328, test_abs_avg=39.833763122558594
production_forward2 grad[31] vs paper_forward: mean_abs=0.9775118231773376, max_abs=6.0, mean_rel=0.1835278868675232, max_rel=1297.476806640625, norm_rel=0.02484789863228798, ref_abs_avg=39.48025894165039, test_abs_avg=39.468868255615234
production_forward2 grad[32] vs paper_forward: mean_abs=0.7711982727050781, max_abs=3.375, mean_rel=0.08109043538570404, max_rel=3.018547534942627, norm_rel=0.026215186342597008, ref_abs_avg=29.38788604736328, test_abs_avg=29.38918685913086
production_forward2 grad[33] vs paper_forward: mean_abs=0.9200060963630676, max_abs=6.0, mean_rel=0.1746872365474701, max_rel=1339.7840576171875, norm_rel=0.025043753907084465, ref_abs_avg=36.878517150878906, test_abs_avg=36.876712799072266
production_forward2 grad[34] vs paper_forward: mean_abs=0.9134000539779663, max_abs=6.0, mean_rel=0.16314269602298737, max_rel=975.3255004882812, norm_rel=0.02483038604259491, ref_abs_avg=36.962501525878906, test_abs_avg=36.96540832519531
production_forward2 grad[35] vs paper_forward: mean_abs=0.7149534225463867, max_abs=2.625, mean_rel=0.13256719708442688, max_rel=17.645875930786133, norm_rel=0.023583652451634407, ref_abs_avg=29.88469696044922, test_abs_avg=29.881641387939453
production_forward2 grad[36] vs paper_forward: mean_abs=0.869479238986969, max_abs=5.8125, mean_rel=0.16973817348480225, max_rel=2459.972900390625, norm_rel=0.024517642334103584, ref_abs_avg=35.57512664794922, test_abs_avg=35.571197509765625
production_forward2 grad[37] vs paper_forward: mean_abs=0.8595103025436401, max_abs=5.5, mean_rel=0.17286911606788635, max_rel=1685.6448974609375, norm_rel=0.024547161534428596, ref_abs_avg=35.16339111328125, test_abs_avg=35.16478729248047
production_forward2 grad[38] vs paper_forward: mean_abs=0.6778327226638794, max_abs=2.75, mean_rel=0.15113253891468048, max_rel=20.83977699279785, norm_rel=0.02459135465323925, ref_abs_avg=27.215896606445312, test_abs_avg=27.202470779418945
production_forward2 grad[39] vs paper_forward: mean_abs=0.826273500919342, max_abs=5.28125, mean_rel=0.17069697380065918, max_rel=1761.31640625, norm_rel=0.024452228099107742, ref_abs_avg=33.91107940673828, test_abs_avg=33.910892486572266
production_forward2 grad[40] vs paper_forward: mean_abs=0.807306706905365, max_abs=5.0, mean_rel=0.17416539788246155, max_rel=1481.4638671875, norm_rel=0.024174870923161507, ref_abs_avg=33.53181838989258, test_abs_avg=33.52936553955078
production_forward2 grad[41] vs paper_forward: mean_abs=0.6123806834220886, max_abs=2.5546875, mean_rel=0.45138275623321533, max_rel=189.59994506835938, norm_rel=0.023031380027532578, ref_abs_avg=26.89397430419922, test_abs_avg=26.87334632873535
production_forward2 grad[42] vs paper_forward: mean_abs=0.7763288021087646, max_abs=5.0, mean_rel=0.1744619905948639, max_rel=1532.480224609375, norm_rel=0.024010667577385902, ref_abs_avg=32.37318420410156, test_abs_avg=32.37184143066406
production_forward2 grad[43] vs paper_forward: mean_abs=0.7598079442977905, max_abs=5.0, mean_rel=0.16189557313919067, max_rel=1110.1640625, norm_rel=0.02377397008240223, ref_abs_avg=31.971927642822266, test_abs_avg=31.969924926757812
production_forward2 grad[44] vs paper_forward: mean_abs=0.5816867351531982, max_abs=3.0, mean_rel=0.14463599026203156, max_rel=33.775177001953125, norm_rel=0.022814590483903885, ref_abs_avg=25.65668487548828, test_abs_avg=25.63577651977539
production_forward2 grad[45] vs paper_forward: mean_abs=0.7417310476303101, max_abs=4.84375, mean_rel=0.172463059425354, max_rel=2516.996337890625, norm_rel=0.023759763687849045, ref_abs_avg=31.268787384033203, test_abs_avg=31.267118453979492
production_forward2 grad[46] vs paper_forward: mean_abs=0.7285642623901367, max_abs=5.5, mean_rel=0.16411089897155762, max_rel=1537.7742919921875, norm_rel=0.02361493557691574, ref_abs_avg=30.898422241210938, test_abs_avg=30.902957916259766
production_forward2 grad[47] vs paper_forward: mean_abs=0.558436930179596, max_abs=2.25, mean_rel=0.3694338798522949, max_rel=141.9005889892578, norm_rel=0.022858822718262672, ref_abs_avg=24.9323673248291, test_abs_avg=24.86528968811035
production_forward2 grad[48] vs paper_forward: mean_abs=0.7062105536460876, max_abs=4.5, mean_rel=0.15724557638168335, max_rel=1465.8455810546875, norm_rel=0.02338501624763012, ref_abs_avg=30.20827293395996, test_abs_avg=30.20929718017578
production_forward2 grad[49] vs paper_forward: mean_abs=0.6946709156036377, max_abs=4.5, mean_rel=0.15165100991725922, max_rel=506.5646667480469, norm_rel=0.023532038554549217, ref_abs_avg=29.59029769897461, test_abs_avg=29.58919334411621
production_forward2 grad[50] vs paper_forward: mean_abs=0.63909912109375, max_abs=2.5, mean_rel=0.12527239322662354, max_rel=17.977291107177734, norm_rel=0.025563694536685944, ref_abs_avg=24.952110290527344, test_abs_avg=24.948143005371094
production_forward2 grad[51] vs paper_forward: mean_abs=0.7986176609992981, max_abs=5.5, mean_rel=0.16357529163360596, max_rel=1266.0421142578125, norm_rel=0.02534857764840126, ref_abs_avg=31.56496810913086, test_abs_avg=31.565475463867188
production_forward2 grad[52] vs paper_forward: mean_abs=0.7786224484443665, max_abs=6.25, mean_rel=0.19743645191192627, max_rel=1989.2236328125, norm_rel=0.02531571313738823, ref_abs_avg=30.90643882751465, test_abs_avg=30.911884307861328
production_forward2 grad[53] vs paper_forward: mean_abs=0.5990228652954102, max_abs=2.5, mean_rel=0.09589846432209015, max_rel=6.090593338012695, norm_rel=0.02414979785680771, ref_abs_avg=24.975601196289062, test_abs_avg=25.026044845581055
production_forward2 grad[54] vs paper_forward: mean_abs=0.7167866230010986, max_abs=4.5, mean_rel=0.16608354449272156, max_rel=1626.731689453125, norm_rel=0.02464352175593376, ref_abs_avg=29.089229583740234, test_abs_avg=29.090717315673828
production_forward2 grad[55] vs paper_forward: mean_abs=0.7018444538116455, max_abs=6.0, mean_rel=0.1774778664112091, max_rel=1764.0062255859375, norm_rel=0.024673068895936012, ref_abs_avg=28.50194549560547, test_abs_avg=28.49848175048828
production_forward2 grad[56] vs paper_forward: mean_abs=0.5534179210662842, max_abs=1.97265625, mean_rel=0.20380575954914093, max_rel=37.4889030456543, norm_rel=0.02423921227455139, ref_abs_avg=22.963457107543945, test_abs_avg=22.9438419342041
production_forward2 grad[57] vs paper_forward: mean_abs=0.6715116500854492, max_abs=5.0, mean_rel=0.16279640793800354, max_rel=930.1929931640625, norm_rel=0.02409152127802372, ref_abs_avg=27.861852645874023, test_abs_avg=27.862796783447266
production_forward2 grad[58] vs paper_forward: mean_abs=0.6533027291297913, max_abs=4.5625, mean_rel=0.14969536662101746, max_rel=903.7705078125, norm_rel=0.02394525334239006, ref_abs_avg=27.31515884399414, test_abs_avg=27.312437057495117
production_forward2 grad[59] vs paper_forward: mean_abs=0.49728918075561523, max_abs=1.875, mean_rel=0.10599811375141144, max_rel=8.175039291381836, norm_rel=0.023171572014689445, ref_abs_avg=21.2255916595459, test_abs_avg=21.217697143554688
production_forward2 grad[60] vs paper_forward: mean_abs=0.6297117471694946, max_abs=4.296875, mean_rel=0.15853258967399597, max_rel=1019.2376708984375, norm_rel=0.02396766096353531, ref_abs_avg=26.24979019165039, test_abs_avg=26.249622344970703
production_forward2 grad[61] vs paper_forward: mean_abs=0.6144095659255981, max_abs=5.0625, mean_rel=0.16547797620296478, max_rel=1178.8720703125, norm_rel=0.023373456671833992, ref_abs_avg=26.292442321777344, test_abs_avg=26.291004180908203
production_forward2 grad[62] vs paper_forward: mean_abs=0.46608591079711914, max_abs=2.171875, mean_rel=0.157058447599411, max_rel=27.932031631469727, norm_rel=0.0232424084097147, ref_abs_avg=20.75735092163086, test_abs_avg=20.779338836669922
production_forward2 grad[63] vs paper_forward: mean_abs=0.589758574962616, max_abs=5.25, mean_rel=0.15970107913017273, max_rel=1232.719482421875, norm_rel=0.023389797657728195, ref_abs_avg=25.191635131835938, test_abs_avg=25.191421508789062
production_forward2 grad[64] vs paper_forward: mean_abs=0.5801708698272705, max_abs=4.5, mean_rel=0.14583440124988556, max_rel=765.71875, norm_rel=0.023374561220407486, ref_abs_avg=24.87888526916504, test_abs_avg=24.866844177246094
production_forward2 grad[65] vs paper_forward: mean_abs=0.44485175609588623, max_abs=2.3125, mean_rel=0.10706211626529694, max_rel=10.474392890930176, norm_rel=0.022859716787934303, ref_abs_avg=19.222089767456055, test_abs_avg=19.224124908447266
production_forward2 grad[66] vs paper_forward: mean_abs=0.5607990026473999, max_abs=4.625, mean_rel=0.1471305936574936, max_rel=695.96875, norm_rel=0.023130230605602264, ref_abs_avg=24.244924545288086, test_abs_avg=24.241960525512695
production_forward2 grad[67] vs paper_forward: mean_abs=0.5504035949707031, max_abs=5.0, mean_rel=0.15153801441192627, max_rel=716.8331909179688, norm_rel=0.023049410432577133, ref_abs_avg=23.908363342285156, test_abs_avg=23.89885902404785
production_forward2 grad[68] vs paper_forward: mean_abs=0.4351625442504883, max_abs=1.5, mean_rel=0.05813843756914139, max_rel=1.4799224138259888, norm_rel=0.021719908341765404, ref_abs_avg=20.41046142578125, test_abs_avg=20.450714111328125
production_forward2 grad[69] vs paper_forward: mean_abs=0.5359549522399902, max_abs=5.0, mean_rel=0.15272994339466095, max_rel=830.6945190429688, norm_rel=0.022838713601231575, ref_abs_avg=23.468610763549805, test_abs_avg=23.467966079711914
production_forward2 grad[70] vs paper_forward: mean_abs=0.5230456590652466, max_abs=4.0, mean_rel=0.1525035798549652, max_rel=848.1475219726562, norm_rel=0.022664876654744148, ref_abs_avg=23.067672729492188, test_abs_avg=23.059438705444336
production_forward2 grad[71] vs paper_forward: mean_abs=0.43280696868896484, max_abs=1.75, mean_rel=0.07601722329854965, max_rel=5.277044773101807, norm_rel=0.02328030951321125, ref_abs_avg=18.965551376342773, test_abs_avg=18.946758270263672
production_forward2 grad[72] vs paper_forward: mean_abs=0.5076832175254822, max_abs=4.0, mean_rel=0.14330145716667175, max_rel=956.763916015625, norm_rel=0.02234681509435177, ref_abs_avg=22.70415496826172, test_abs_avg=22.70295524597168
production_forward2 grad[73] vs paper_forward: mean_abs=0.49532443284988403, max_abs=4.0, mean_rel=0.1339012086391449, max_rel=559.8778686523438, norm_rel=0.022269103676080704, ref_abs_avg=22.264915466308594, test_abs_avg=22.268701553344727
production_forward2 grad[74] vs paper_forward: mean_abs=0.445639967918396, max_abs=1.875, mean_rel=0.08101275563240051, max_rel=3.3983724117279053, norm_rel=0.02366931550204754, ref_abs_avg=18.89409637451172, test_abs_avg=18.890363693237305
production_forward2 grad[75] vs paper_forward: mean_abs=0.5518974661827087, max_abs=4.5, mean_rel=0.15569013357162476, max_rel=1064.6068115234375, norm_rel=0.023749960586428642, ref_abs_avg=23.234760284423828, test_abs_avg=23.23465919494629
production_forward2 grad[76] vs paper_forward: mean_abs=0.5404744148254395, max_abs=4.125, mean_rel=0.1430024653673172, max_rel=787.3578491210938, norm_rel=0.02348119579255581, ref_abs_avg=23.00525665283203, test_abs_avg=23.011232376098633
production_forward2 grad[77] vs paper_forward: mean_abs=0.44283461570739746, max_abs=1.625, mean_rel=0.08372686058282852, max_rel=8.888111114501953, norm_rel=0.023184025660157204, ref_abs_avg=19.245468139648438, test_abs_avg=19.25818634033203
production_forward2 grad[78] vs paper_forward: mean_abs=0.5249752998352051, max_abs=4.25, mean_rel=0.15149039030075073, max_rel=1499.8218994140625, norm_rel=0.02328726276755333, ref_abs_avg=22.523651123046875, test_abs_avg=22.522476196289062
production_forward2 grad[79] vs paper_forward: mean_abs=0.5107666850090027, max_abs=4.0, mean_rel=0.1634797751903534, max_rel=2045.1109619140625, norm_rel=0.02306569367647171, ref_abs_avg=22.084484100341797, test_abs_avg=22.080476760864258
production_forward2 grad[80] vs paper_forward: mean_abs=0.3922082185745239, max_abs=1.75, mean_rel=0.1507745385169983, max_rel=17.104476928710938, norm_rel=0.022156866267323494, ref_abs_avg=17.633087158203125, test_abs_avg=17.627838134765625
production_forward2 grad[81] vs paper_forward: mean_abs=0.48782944679260254, max_abs=4.25, mean_rel=0.14846055209636688, max_rel=955.0387573242188, norm_rel=0.022765139117836952, ref_abs_avg=21.408138275146484, test_abs_avg=21.40899085998535
production_forward2 grad[82] vs paper_forward: mean_abs=0.4725438952445984, max_abs=3.75, mean_rel=0.14823246002197266, max_rel=1016.1796875, norm_rel=0.022952811792492867, ref_abs_avg=20.699317932128906, test_abs_avg=20.697277069091797
production_forward2 grad[83] vs paper_forward: mean_abs=0.36185622215270996, max_abs=1.40625, mean_rel=0.22396963834762573, max_rel=64.95063018798828, norm_rel=0.021964794024825096, ref_abs_avg=16.433135986328125, test_abs_avg=16.422271728515625
production_forward2 grad[84] vs paper_forward: mean_abs=0.45543554425239563, max_abs=4.0, mean_rel=0.14729659259319305, max_rel=811.2061157226562, norm_rel=0.0221815574914217, ref_abs_avg=20.576242446899414, test_abs_avg=20.57620620727539
production_forward2 grad[85] vs paper_forward: mean_abs=0.43255579471588135, max_abs=3.75, mean_rel=0.13520777225494385, max_rel=583.823974609375, norm_rel=0.021636275574564934, ref_abs_avg=20.051090240478516, test_abs_avg=20.045001983642578
production_forward2 grad[86] vs paper_forward: mean_abs=0.33619141578674316, max_abs=1.2890625, mean_rel=0.14155302941799164, max_rel=17.723661422729492, norm_rel=0.021090468391776085, ref_abs_avg=16.240827560424805, test_abs_avg=16.217872619628906
production_forward2 grad[87] vs paper_forward: mean_abs=0.42009198665618896, max_abs=4.0, mean_rel=0.13881641626358032, max_rel=697.5775756835938, norm_rel=0.021557796746492386, ref_abs_avg=19.532272338867188, test_abs_avg=19.5308780670166
production_forward2 grad[88] vs paper_forward: mean_abs=0.40689098834991455, max_abs=3.5, mean_rel=0.13362786173820496, max_rel=593.7874145507812, norm_rel=0.021191341802477837, ref_abs_avg=19.261489868164062, test_abs_avg=19.269237518310547
production_forward2 grad[89] vs paper_forward: mean_abs=0.31579816341400146, max_abs=1.25, mean_rel=0.11239397525787354, max_rel=17.947179794311523, norm_rel=0.020960764959454536, ref_abs_avg=15.075325965881348, test_abs_avg=15.068632125854492
production_forward2 grad[90] vs paper_forward: mean_abs=0.38496437668800354, max_abs=4.0, mean_rel=0.13240423798561096, max_rel=639.5650634765625, norm_rel=0.02079550363123417, ref_abs_avg=18.62749671936035, test_abs_avg=18.626358032226562
production_forward2 grad[91] vs paper_forward: mean_abs=0.3902134597301483, max_abs=4.0, mean_rel=0.13087917864322662, max_rel=1086.7470703125, norm_rel=0.020812246948480606, ref_abs_avg=18.947162628173828, test_abs_avg=18.95298957824707
production_forward2 grad[92] vs paper_forward: mean_abs=0.34354662895202637, max_abs=1.3125, mean_rel=0.17828261852264404, max_rel=45.69450759887695, norm_rel=0.022671116515994072, ref_abs_avg=15.16098403930664, test_abs_avg=15.169672966003418
production_forward2 grad[93] vs paper_forward: mean_abs=0.3781406879425049, max_abs=4.75, mean_rel=0.1285073459148407, max_rel=897.3670654296875, norm_rel=0.020518969744443893, ref_abs_avg=18.61284637451172, test_abs_avg=18.611215591430664
production_forward2 grad[94] vs paper_forward: mean_abs=0.3599199056625366, max_abs=4.0, mean_rel=0.12655238807201385, max_rel=533.8883056640625, norm_rel=0.020389650017023087, ref_abs_avg=17.86054039001465, test_abs_avg=17.864221572875977
production_forward2 grad[95] vs paper_forward: mean_abs=0.3036472201347351, max_abs=1.25, mean_rel=0.0887424498796463, max_rel=6.4361186027526855, norm_rel=0.02020084112882614, ref_abs_avg=15.50328254699707, test_abs_avg=15.496624946594238
production_forward2 grad[96] vs paper_forward: mean_abs=0.3487859070301056, max_abs=3.5, mean_rel=0.12425471842288971, max_rel=807.1906127929688, norm_rel=0.019748158752918243, ref_abs_avg=17.92497444152832, test_abs_avg=17.924070358276367
production_forward2 grad[97] vs paper_forward: mean_abs=0.35044848918914795, max_abs=3.25, mean_rel=0.11464925855398178, max_rel=394.41497802734375, norm_rel=0.01988852396607399, ref_abs_avg=17.953140258789062, test_abs_avg=17.94904899597168

