identity layers + randn queries

Autotune Choices Stats:
{"num_choices": 15, "num_triton_choices": 14, "best_kernel": "mm", "best_time": 0.18943999707698822, "best_triton_pos": 1, "best_triton_time": 0.1955839991569519, "best_triton_kernel": "triton_mm_166", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"}
AUTOTUNE mm(65536x512, 512x8)
strides: [512, 1], [1, 512]
dtypes: torch.float32, torch.float32
  mm 0.1894 ms 100.0% 
  triton_mm_166 0.1956 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_169 0.1976 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_163 0.3133 ms 60.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_167 0.3144 ms 60.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_164 0.3154 ms 60.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_159 0.3164 ms 59.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_161 0.3164 ms 59.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_162 0.3164 ms 59.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_168 0.3174 ms 59.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.5614 seconds and 0.9610 seconds precompiling for 15 choices
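Each "Autotune Choices Stats" line above is plain JSON, so the margin between the ATen `mm` baseline and the best Triton candidate can be read off programmatically. A minimal sketch, using the stats line logged for this first matmul:

```python
import json

# the "Autotune Choices Stats" line is plain JSON; parse the one logged above
line = ('{"num_choices": 15, "num_triton_choices": 14, "best_kernel": "mm", '
        '"best_time": 0.18943999707698822, "best_triton_pos": 1, '
        '"best_triton_time": 0.1955839991569519}')
stats = json.loads(line)

# how close the best Triton candidate came to the ATen mm baseline
rel = stats["best_time"] / stats["best_triton_time"]
print(f"{rel:.1%}")  # matches the 96.9% shown for triton_mm_166
```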
Autotune Choices Stats:
{"num_choices": 15, "num_triton_choices": 14, "best_kernel": "mm", "best_time": 0.3481599986553192, "best_triton_pos": 1, "best_triton_time": 0.3758080005645752, "best_triton_kernel": "triton_mm_180", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"}
AUTOTUNE mm(131072x512, 512x8)
strides: [512, 1], [1, 512]
dtypes: torch.float32, torch.float32
  mm 0.3482 ms 100.0% 
  triton_mm_180 0.3758 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_183 0.3820 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_176 0.5868 ms 59.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_177 0.5878 ms 59.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_175 0.5888 ms 59.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_178 0.5898 ms 59.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_181 0.6185 ms 56.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_182 0.6185 ms 56.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8
  triton_mm_179 0.6513 ms 53.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.5854 seconds and 0.6738 seconds precompiling for 15 choices
Autotune Choices Stats:
{"num_choices": 6, "num_triton_choices": 0, "best_kernel": "mm", "best_time": 0.30105599761009216}
AUTOTUNE mm(512x131072, 131072x8)
strides: [1, 512], [8, 1]
dtypes: torch.float32, torch.float32
  mm 0.3011 ms 100.0% 
  decompose_k_mm_256_split_22 1.1233 ms 26.8% k_split=256
  decompose_k_mm_128_split_21 1.1735 ms 25.7% k_split=128
  decompose_k_mm_64_split_20 1.3916 ms 21.6% k_split=64
  decompose_k_mm_32_split_19 1.8596 ms 16.2% k_split=32
  decompose_k_mm_16_split_18 1.8616 ms 16.2% k_split=16
SingleProcess AUTOTUNE benchmarking takes 2.4789 seconds and 0.0005 seconds precompiling for 6 choices
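The `decompose_k_mm_*` candidates above split the long K=131072 reduction into `k_split` independent partial matmuls whose results are summed. A minimal NumPy sketch of the idea, at illustrative (not the logged) sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, N, k_split = 512, 1024, 8, 16
A = rng.standard_normal((M, K))
B = rng.standard_normal((K, N))

# split K into k_split chunks, matmul each chunk pair, sum the partials
A_s = A.reshape(M, k_split, K // k_split)   # (M, split, chunk)
B_s = B.reshape(k_split, K // k_split, N)   # (split, chunk, N)
out = np.einsum('msc,scn->mn', A_s, B_s)    # sums over split and chunk

assert np.allclose(out, A @ B)
```

On a GPU the partial products run as separate, better-shaped matmuls; here (as the table shows) none of the splits beat the plain `mm`.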
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_194", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.15667200088500977, "best_triton_pos": 0}
AUTOTUNE mm(131072x8, 8x512)
strides: [8, 1], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_194 0.1567 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_195 0.1659 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_189 0.1710 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_191 0.1710 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  mm 0.1741 ms 90.0% 
  triton_mm_192 0.1741 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_193 0.1812 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_196 0.1843 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_198 0.1956 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_190 0.2017 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.5972 seconds and 0.0004 seconds precompiling for 18 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_208", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.3604480028152466, "best_triton_pos": 0}
AUTOTUNE mm(327680x1, 1x512)
strides: [1, 0], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_208 0.3604 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_206 0.3645 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_211 0.3645 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_209 0.3686 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_212 0.3738 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_205 0.3758 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_207 0.4045 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_210 0.4157 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_213 0.4321 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  mm 0.4372 ms 82.4% 
SingleProcess AUTOTUNE benchmarking takes 0.7037 seconds and 0.8850 seconds precompiling for 18 choices
Autotune Choices Stats:
{"num_choices": 8, "num_triton_choices": 0, "best_kernel": "mm", "best_time": 0.13414399325847626}
AUTOTUNE mm(512x65536, 65536x8)
strides: [1, 512], [8, 1]
dtypes: torch.float32, torch.float32
  mm 0.1341 ms 100.0% 
  decompose_k_mm_128_split_29 0.4884 ms 27.5% k_split=128
  decompose_k_mm_64_split_28 0.5796 ms 23.1% k_split=64
  decompose_k_mm_16_split_26 0.7680 ms 17.5% k_split=16
  decompose_k_mm_32_split_27 0.7680 ms 17.5% k_split=32
  decompose_k_mm_8_split_25 1.5176 ms 8.8% k_split=8
  decompose_k_mm_4_split_24 3.0198 ms 4.4% k_split=4
  decompose_k_mm_2_split_23 6.0242 ms 2.2% k_split=2
SingleProcess AUTOTUNE benchmarking takes 1.3488 seconds and 0.0004 seconds precompiling for 8 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_223", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.07782399654388428, "best_triton_pos": 0}
AUTOTUNE mm(65536x8, 8x512)
strides: [8, 1], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_223 0.0778 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_225 0.0788 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_226 0.0809 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_228 0.0809 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  mm 0.0829 ms 93.8% 
  triton_mm_229 0.0840 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_227 0.0901 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_224 0.0911 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_220 0.0932 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8
  triton_mm_230 0.0932 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.3925 seconds and 0.0002 seconds precompiling for 18 choices

torch_compile_phases_forward fwd+bwd:  84.216 ms
torch_compile_phases_forward bwd-only: 67.228 ms
torch_compile_phases_forward peak allocated: fwd=6.424 GiB, fwd+bwd=6.737 GiB
torch_compile_phases_forward peak reserved:  fwd=6.646 GiB, fwd+bwd=8.773 GiB
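The forward-only time is not logged directly, but it follows from the two numbers above by subtraction:

```python
# numbers reported for torch_compile_phases_forward above, in ms
fwd_bwd_ms = 84.216
bwd_ms = 67.228
fwd_ms = fwd_bwd_ms - bwd_ms  # forward-only time
print(f"fwd-only: {fwd_ms:.3f} ms")
```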

/usr/local/lib/python3.12/dist-packages/torch/_inductor/lowering.py:7627: UserWarning: 
Online softmax is disabled on the fly since Inductor decides to
split the reduction. Cut an issue to PyTorch if this is an
important use case and you want to speed it up with online
softmax.

  warnings.warn(
E0428 21:01:43.143000 2448 torch/_inductor/select_algorithm.py:3727] [5/1] Runtime error during autotuning: 
E0428 21:01:43.143000 2448 torch/_inductor/select_algorithm.py:3727] [5/1] CUDA driver error: invalid argument
E0428 21:01:43.143000 2448 torch/_inductor/select_algorithm.py:3727] [5/1] 
E0428 21:01:43.143000 2448 torch/_inductor/select_algorithm.py:3727] [5/1] This may mean this GPU is too small for max_autotune mode.
E0428 21:01:43.143000 2448 torch/_inductor/select_algorithm.py:3727] [5/1] 
E0428 21:01:43.143000 2448 torch/_inductor/select_algorithm.py:3727] [5/1] . 
E0428 21:01:43.143000 2448 torch/_inductor/select_algorithm.py:3727] [5/1] Ignoring this choice.
[the same autotuning runtime error was logged 11 more times, once per remaining Triton bmm choice]
Autotune Choices Stats:
{"num_choices": 13, "num_triton_choices": 12, "best_kernel": "bmm", "best_time": 1.7397760152816772, "best_triton_pos": 1, "best_triton_time": Infinity, "best_triton_kernel": "triton_bmm_235", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2"}
AUTOTUNE bmm(65536x2x1, 65536x1x512)
strides: [1, 65536, 0], [512, 0, 1]
dtypes: torch.float32, torch.float32
  bmm 1.7398 ms 100.0% 
  triton_bmm_235 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
  triton_bmm_236 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_bmm_237 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_bmm_238 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2
  triton_bmm_239 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_bmm_240 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_bmm_241 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_bmm_242 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_bmm_243 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.1647 seconds and 0.0004 seconds precompiling for 13 choices
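The stats line for this bmm contains a bare `Infinity` token for `best_triton_time` (every Triton bmm choice failed to benchmark after the driver errors above). That is not strict JSON, but Python's `json` module accepts it by default:

```python
import json
import math

line = ('{"num_choices": 13, "best_kernel": "bmm", '
        '"best_time": 1.7397760152816772, "best_triton_time": Infinity}')
# json.loads parses Infinity/-Infinity/NaN unless parse_constant overrides it
stats = json.loads(line)
assert math.isinf(stats["best_triton_time"])
```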
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_252", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.15360000729560852, "best_triton_pos": 0}
AUTOTUNE mm(512x1, 1x131072)
strides: [1, 512], [0, 1]
dtypes: torch.float32, torch.float32
  triton_mm_252 0.1536 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_254 0.1536 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_257 0.1536 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_255 0.1546 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_258 0.1546 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_251 0.1618 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_253 0.1659 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_259 0.1679 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  mm 0.1690 ms 90.9% 
  triton_mm_256 0.1700 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.5879 seconds and 0.7307 seconds precompiling for 18 choices
/usr/local/lib/python3.12/dist-packages/torch/_inductor/lowering.py:7627: UserWarning: 
Online softmax is disabled on the fly since Inductor decides to
split the reduction. Cut an issue to PyTorch if this is an
important use case and you want to speed it up with online
softmax.

  warnings.warn(
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_271", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.07884799689054489, "best_triton_pos": 0}
AUTOTUNE mm(512x1, 1x65536)
strides: [1, 512], [0, 1]
dtypes: torch.float32, torch.float32
  triton_mm_271 0.0788 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_269 0.0799 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  mm 0.0809 ms 97.5% 
  triton_mm_268 0.0819 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_270 0.0819 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_275 0.0819 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_272 0.0829 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_274 0.0829 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_264 0.0840 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
  triton_mm_276 0.0870 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.3864 seconds and 1.0878 seconds precompiling for 18 choices

paper_forward fwd+bwd:  194.271 ms
paper_forward bwd-only: 153.893 ms
paper_forward peak allocated: fwd=14.884 GiB, fwd+bwd=15.944 GiB
paper_forward peak reserved:  fwd=14.932 GiB, fwd+bwd=16.182 GiB
liger_forward fwd+bwd:  167.348 ms
liger_forward bwd-only: 145.757 ms
liger_forward peak allocated: fwd=7.681 GiB, fwd+bwd=7.681 GiB
liger_forward peak reserved:  fwd=7.732 GiB, fwd+bwd=8.045 GiB
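Putting the three implementations' fwd+bwd numbers side by side (values copied from the summaries above):

```python
# end-to-end fwd+bwd times reported above, in ms
paper, liger, compiled = 194.271, 167.348, 84.216
# peak allocated for fwd+bwd, in GiB
paper_mem, compiled_mem = 15.944, 6.737

print(f"compiled vs paper: {paper / compiled:.2f}x faster")
print(f"compiled vs liger: {liger / compiled:.2f}x faster")
print(f"memory: {paper_mem / compiled_mem:.2f}x less allocated than paper")
```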
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (1, 512, 8, 1, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 11.65s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
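The sweep walked above is the full cross product of num_warps ∈ {1, 2, 4, 8, 16} and num_stages ∈ {1, 2, 3, 4}, i.e. 20 configs per kernel. In Triton this grid would typically be declared via `@triton.autotune` with a list of `triton.Config(...)` entries; a pure-Python sketch of just the grid (triton itself not imported here):

```python
# the sweep the autotuner walked above: every (num_warps, num_stages) pair
warps = (1, 2, 4, 8, 16)
stages = (1, 2, 3, 4)
configs = [(w, s) for w in warps for s in stages]
assert len(configs) == 20
# the selected config above, (num_warps=1, num_stages=3), is one of them
assert (1, 3) in configs
```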
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with configs num_warps in {1, 2, 4, 8, 16}, num_stages in {1, 2, 3, 4} (all combinations; num_ctas: 1, maxnreg: None throughout)
Triton autotuning for function phase_2_online_softmax_merge_intrablock_out_kernel,
with key as (512, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16'),
finished after 2.42s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with configs num_warps in {1, 2, 4, 8, 16}, num_stages in {1, 2, 3, 4} (all combinations; num_ctas: 1, maxnreg: None throughout)
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (2, 512, 8, 2, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 14.62s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (3, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 17.47s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (4, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 17.54s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (5, 512, 1, 8, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 8.07s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with configs num_warps in {1, 2, 4, 8, 16}, num_stages in {1, 2, 3, 4} (all combinations; num_ctas: 1, maxnreg: None throughout)
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (5, 512, 1, 8, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 14.14s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with configs BLOCK_BATCH_SEQ in {128, 256}, BLOCK_HIDDEN in {32, 64}, num_warps in {4, 8} (all combinations; num_ctas: 1, num_stages: 1, maxnreg: None throughout)
Triton autotuning for function phase_1_reduce_grad_pseudo_queries_kernel,
with key as (65536, 512, 1, 'torch.float32', 'torch.float32'),
finished after 1.94s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
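The four-line summaries in this log can be scraped to map each (function, key) pair to its winning config and tuning time. A rough sketch; the regex below is an assumption about this log's exact line format, not a Triton API, and the embedded sample is one summary copied from above:

```python
import re

# One summary block in this log's format; in practice, read the real log file.
LOG = """\
Triton autotuning for function phase_1_reduce_grad_pseudo_queries_kernel,
with key as (65536, 512, 1, 'torch.float32', 'torch.float32'),
finished after 1.94s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
"""

def best_configs(log_text):
    """Map each (function, key) pair to its tuning time and winning config."""
    pattern = re.compile(
        r"Triton autotuning for function (\w+),\n"
        r"with key as \((.*?)\),\n"
        r"finished after ([\d.]+)s,\n"
        r"best config selected: (.*?);",
        re.DOTALL,
    )
    return {
        (fn, key): {"seconds": float(secs), "config": cfg}
        for fn, key, secs, cfg in pattern.findall(log_text)
    }

results = best_configs(LOG)
```

Summing the "seconds" values over all summaries gives the total autotuning overhead paid on the first run for each distinct key.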
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with configs num_warps in {1, 2, 4, 8, 16}, num_stages in {1, 2, 3, 4} (all combinations; num_ctas: 1, maxnreg: None throughout)
Triton autotuning for function phase_2_online_softmax_merge_intrablock_backward_kernel,
with key as (512, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 2.60s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with configs BLOCK_BATCH_SEQ in {128, 256}, BLOCK_HIDDEN in {32, 64}, num_warps in {4, 8} (all combinations; num_ctas: 1, num_stages: 1, maxnreg: None throughout)
Triton autotuning for function phase_2_reduce_grad_pseudo_query_kernel,
with key as (65536, 512, 'torch.float32', 'torch.float32'),
finished after 1.89s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (4, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 52.16s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Triton autotuning for function phase_1_reduce_grad_pseudo_queries_kernel,
with key as (65536, 512, 8, 'torch.float32', 'torch.float32'),
finished after 2.03s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (3, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 44.62s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (2, 512, 8, 2, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 32.96s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (1, 512, 8, 1, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 21.26s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None;
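Each sweep above enumerates 20 candidate configs, the cross product of num_warps in {1, 2, 4, 8, 16} and num_stages in {1, 2, 3, 4}, then caches the fastest one per key. A minimal generic sketch of that pick-the-fastest loop follows; the names `autotune` and `fake_kernel` are illustrative stand-ins, not Triton's actual API (Triton does this via the `@triton.autotune` decorator with a list of `triton.Config` objects and a `key`).

```python
import time
from itertools import product

def autotune(kernel, configs, warmup=2, rep=5):
    """Benchmark every config and return (best_config, best_time_s).

    A generic stand-in for what Triton's autotuner does once per cache
    key: run the whole sweep, keep the fastest config, reuse it afterwards.
    """
    best_cfg, best_t = None, float("inf")
    for cfg in configs:
        for _ in range(warmup):            # discard warm-up runs
            kernel(**cfg)
        t0 = time.perf_counter()
        for _ in range(rep):
            kernel(**cfg)
        elapsed = (time.perf_counter() - t0) / rep
        if elapsed < best_t:
            best_cfg, best_t = cfg, elapsed
    return best_cfg, best_t

# The 20 configs per sweep in the log: num_warps x num_stages.
configs = [{"num_warps": w, "num_stages": s}
           for w, s in product([1, 2, 4, 8, 16], [1, 2, 3, 4])]

def fake_kernel(num_warps, num_stages):
    # Hypothetical placeholder for a real kernel launch; on a GPU the
    # cost would genuinely depend on the config.
    _ = sum(range(100 * num_warps + num_stages))

best_cfg, best_t = autotune(fake_kernel, configs, warmup=1, rep=3)
```

Note the per-key cost visible in the log (21-45 s per sweep): the autotuner reruns the whole grid for every new key tuple (shape/dtype combination), which is why the same 20 config lines repeat for each key.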
production_forward fwd+bwd:  57.764 ms
production_forward bwd-only: 49.458 ms
production_forward peak allocated: fwd=1.128 GiB, fwd+bwd=5.130 GiB
production_forward peak reserved:  fwd=2.131 GiB, fwd+bwd=5.256 GiB
pytorch_attn_res_forward fwd+bwd:  994.262 ms
pytorch_attn_res_forward bwd-only: 820.203 ms
pytorch_attn_res_forward peak allocated: fwd=43.745 GiB, fwd+bwd=44.867 GiB
pytorch_attn_res_forward peak reserved:  fwd=44.922 GiB, fwd+bwd=46.174 GiB
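A sketch of the kind of harness that could produce the timing lines above; `bench_ms` is a hypothetical name, and this CPU-only version omits the CUDA-specific parts, which are noted in the docstring. The bwd-only figure is presumably derived by subtraction (fwd+bwd time minus fwd-only time).

```python
import time

def bench_ms(fn, warmup=3, rep=10):
    """Average wall-clock time of fn() in milliseconds.

    For CUDA numbers like those above, a real harness must also call
    torch.cuda.synchronize() before reading the clock, and the peak
    allocated/reserved lines would come from
    torch.cuda.max_memory_allocated() / torch.cuda.max_memory_reserved()
    after a torch.cuda.reset_peak_memory_stats().
    """
    for _ in range(warmup):                # warm-up runs are discarded
        fn()
    t0 = time.perf_counter()
    for _ in range(rep):
        fn()
    return (time.perf_counter() - t0) / rep * 1e3

# bwd_only_ms = bench_ms(fwd_plus_bwd) - bench_ms(fwd_only)   # assumed
```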

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016621432732790709, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.00844741053879261, max_abs=0.4453125, mean_rel=0.07281096279621124, max_rel=140.03741455078125, norm_rel=0.01994001306593418, ref_abs_avg=0.4600657820701599, test_abs_avg=0.46009618043899536
production_forward grad[1] vs paper_forward: mean_abs=5.2351908683776855, max_abs=40.0, mean_rel=0.14164376258850098, max_rel=88.60671997070312, norm_rel=0.02087828330695629, ref_abs_avg=224.78880310058594, test_abs_avg=224.86109924316406
production_forward grad[2] vs paper_forward: mean_abs=0.8771915435791016, max_abs=3.75, mean_rel=0.09431654214859009, max_rel=6.64244270324707, norm_rel=0.024978818371891975, ref_abs_avg=35.474117279052734, test_abs_avg=35.48321533203125
production_forward grad[3] vs paper_forward: mean_abs=1.065678358078003, max_abs=6.5, mean_rel=0.5112793445587158, max_rel=4125.0, norm_rel=0.023369718343019485, ref_abs_avg=45.86791229248047, test_abs_avg=45.871253967285156
production_forward grad[4] vs paper_forward: mean_abs=1.0397562980651855, max_abs=6.5, mean_rel=0.2799282968044281, max_rel=3562.499755859375, norm_rel=0.023089125752449036, ref_abs_avg=45.2935791015625, test_abs_avg=45.29520797729492
production_forward grad[5] vs paper_forward: mean_abs=0.7950403690338135, max_abs=3.25, mean_rel=0.07861720025539398, max_rel=2.634491443634033, norm_rel=0.02329234406352043, ref_abs_avg=34.386600494384766, test_abs_avg=34.386444091796875
production_forward grad[6] vs paper_forward: mean_abs=0.9425928592681885, max_abs=7.0, mean_rel=0.4562044143676758, max_rel=3999.999755859375, norm_rel=0.02298324927687645, ref_abs_avg=41.25053024291992, test_abs_avg=41.255577087402344
production_forward grad[7] vs paper_forward: mean_abs=0.9184994101524353, max_abs=5.5, mean_rel=0.2792612612247467, max_rel=2718.749755859375, norm_rel=0.022636542096734047, ref_abs_avg=40.795616149902344, test_abs_avg=40.7982177734375
production_forward grad[8] vs paper_forward: mean_abs=0.6988534927368164, max_abs=3.0, mean_rel=0.07894454896450043, max_rel=8.184221267700195, norm_rel=0.02254626154899597, ref_abs_avg=31.117130279541016, test_abs_avg=31.012615203857422
production_forward grad[9] vs paper_forward: mean_abs=0.8543533682823181, max_abs=5.46875, mean_rel=0.44558024406433105, max_rel=2999.999755859375, norm_rel=0.02284705825150013, ref_abs_avg=37.62277603149414, test_abs_avg=37.62663269042969
production_forward grad[10] vs paper_forward: mean_abs=0.8368165493011475, max_abs=5.125, mean_rel=0.22738172113895416, max_rel=1859.3748779296875, norm_rel=0.022543536499142647, ref_abs_avg=37.29949951171875, test_abs_avg=37.304847717285156
production_forward grad[11] vs paper_forward: mean_abs=0.6360292434692383, max_abs=2.375, mean_rel=0.16176337003707886, max_rel=11.02155876159668, norm_rel=0.02331395633518696, ref_abs_avg=27.387165069580078, test_abs_avg=27.37973403930664
production_forward grad[12] vs paper_forward: mean_abs=0.7949001789093018, max_abs=5.0, mean_rel=0.3855287432670593, max_rel=2187.5, norm_rel=0.02270597405731678, ref_abs_avg=35.220008850097656, test_abs_avg=35.223480224609375
production_forward grad[13] vs paper_forward: mean_abs=0.773632824420929, max_abs=4.75, mean_rel=0.23065777122974396, max_rel=2874.999755859375, norm_rel=0.02234330214560032, ref_abs_avg=34.84288787841797, test_abs_avg=34.848045349121094
production_forward grad[14] vs paper_forward: mean_abs=0.6402603983879089, max_abs=2.375, mean_rel=0.2497742921113968, max_rel=72.6661376953125, norm_rel=0.022637514397501945, ref_abs_avg=27.258831024169922, test_abs_avg=27.319318771362305
production_forward grad[15] vs paper_forward: mean_abs=0.7438783645629883, max_abs=4.7578125, mean_rel=0.3871786892414093, max_rel=2874.999755859375, norm_rel=0.022513290867209435, ref_abs_avg=33.24609375, test_abs_avg=33.251705169677734
production_forward grad[16] vs paper_forward: mean_abs=0.723299503326416, max_abs=4.25, mean_rel=0.2729055881500244, max_rel=2281.25, norm_rel=0.022273031994700432, ref_abs_avg=32.64417266845703, test_abs_avg=32.64942169189453
production_forward grad[17] vs paper_forward: mean_abs=0.5619542598724365, max_abs=2.25, mean_rel=0.07239760458469391, max_rel=2.909863233566284, norm_rel=0.022600268945097923, ref_abs_avg=25.22259521484375, test_abs_avg=25.230960845947266
production_forward grad[18] vs paper_forward: mean_abs=0.6934369802474976, max_abs=4.25, mean_rel=0.35275810956954956, max_rel=2999.999755859375, norm_rel=0.022303510457277298, ref_abs_avg=31.229719161987305, test_abs_avg=31.231369018554688
production_forward grad[19] vs paper_forward: mean_abs=0.6833133697509766, max_abs=5.0, mean_rel=0.2597936689853668, max_rel=3749.999755859375, norm_rel=0.022159021347761154, ref_abs_avg=30.976266860961914, test_abs_avg=30.979408264160156
production_forward grad[20] vs paper_forward: mean_abs=0.5476148128509521, max_abs=2.240234375, mean_rel=0.16805396974086761, max_rel=39.27169418334961, norm_rel=0.021440215408802032, ref_abs_avg=25.488330841064453, test_abs_avg=25.486238479614258
production_forward grad[21] vs paper_forward: mean_abs=0.6608970165252686, max_abs=4.125, mean_rel=0.33341342210769653, max_rel=2437.5, norm_rel=0.022346001118421555, ref_abs_avg=29.76357078552246, test_abs_avg=29.76540184020996
production_forward grad[22] vs paper_forward: mean_abs=0.6456248164176941, max_abs=4.25, mean_rel=0.24961625039577484, max_rel=1874.9998779296875, norm_rel=0.022043567150831223, ref_abs_avg=29.416351318359375, test_abs_avg=29.419551849365234
production_forward grad[23] vs paper_forward: mean_abs=0.49860286712646484, max_abs=2.25, mean_rel=0.15246231853961945, max_rel=18.201772689819336, norm_rel=0.021660542115569115, ref_abs_avg=23.674747467041016, test_abs_avg=23.72454261779785
production_forward grad[24] vs paper_forward: mean_abs=0.6289588212966919, max_abs=4.0, mean_rel=0.3250219225883484, max_rel=1999.9998779296875, norm_rel=0.022150862962007523, ref_abs_avg=28.560895919799805, test_abs_avg=28.563819885253906
production_forward grad[25] vs paper_forward: mean_abs=0.6162135601043701, max_abs=4.0, mean_rel=0.216140478849411, max_rel=1749.9998779296875, norm_rel=0.021794168278574944, ref_abs_avg=28.36595916748047, test_abs_avg=28.370161056518555
production_forward grad[26] vs paper_forward: mean_abs=0.6066062450408936, max_abs=2.40625, mean_rel=0.18835964798927307, max_rel=41.08372497558594, norm_rel=0.023988213390111923, ref_abs_avg=25.649154663085938, test_abs_avg=25.678142547607422
production_forward grad[27] vs paper_forward: mean_abs=0.7286376953125, max_abs=4.9375, mean_rel=0.3530241847038269, max_rel=2640.624755859375, norm_rel=0.02396545000374317, ref_abs_avg=30.570690155029297, test_abs_avg=30.577287673950195
production_forward grad[28] vs paper_forward: mean_abs=0.71254563331604, max_abs=4.5, mean_rel=0.251210480928421, max_rel=2390.625, norm_rel=0.023660006001591682, ref_abs_avg=30.233238220214844, test_abs_avg=30.24408721923828
production_forward grad[29] vs paper_forward: mean_abs=0.5605611801147461, max_abs=2.25, mean_rel=0.15797239542007446, max_rel=11.97568130493164, norm_rel=0.024662798270583153, ref_abs_avg=23.545520782470703, test_abs_avg=23.558271408081055
production_forward grad[30] vs paper_forward: mean_abs=0.6889122724533081, max_abs=4.25, mean_rel=0.3887021541595459, max_rel=2687.499755859375, norm_rel=0.024418022483587265, ref_abs_avg=28.33119773864746, test_abs_avg=28.336244583129883
production_forward grad[31] vs paper_forward: mean_abs=0.6735402345657349, max_abs=4.0, mean_rel=0.23396892845630646, max_rel=1890.6248779296875, norm_rel=0.024296695366501808, ref_abs_avg=27.838775634765625, test_abs_avg=27.844390869140625
production_forward grad[32] vs paper_forward: mean_abs=0.47575387358665466, max_abs=2.21484375, mean_rel=0.39405855536460876, max_rel=153.86956787109375, norm_rel=0.023877786472439766, ref_abs_avg=20.606718063354492, test_abs_avg=20.571914672851562
production_forward grad[33] vs paper_forward: mean_abs=0.6389240026473999, max_abs=4.125, mean_rel=0.3626914918422699, max_rel=2796.874755859375, norm_rel=0.024278732016682625, ref_abs_avg=26.435688018798828, test_abs_avg=26.440929412841797
production_forward grad[34] vs paper_forward: mean_abs=0.6277029514312744, max_abs=3.75, mean_rel=0.22350755333900452, max_rel=1374.9998779296875, norm_rel=0.02408279851078987, ref_abs_avg=26.199146270751953, test_abs_avg=26.202178955078125
production_forward grad[35] vs paper_forward: mean_abs=0.4878774881362915, max_abs=2.0, mean_rel=0.2702536880970001, max_rel=57.30256652832031, norm_rel=0.024107569828629494, ref_abs_avg=20.239421844482422, test_abs_avg=20.247913360595703
production_forward grad[36] vs paper_forward: mean_abs=0.5981341600418091, max_abs=4.0, mean_rel=0.31154245138168335, max_rel=2687.499755859375, norm_rel=0.023895282298326492, ref_abs_avg=25.104877471923828, test_abs_avg=25.108322143554688
production_forward grad[37] vs paper_forward: mean_abs=0.5897332429885864, max_abs=3.625, mean_rel=0.17011667788028717, max_rel=1218.75, norm_rel=0.023868173360824585, ref_abs_avg=24.73564338684082, test_abs_avg=24.732975006103516
production_forward grad[38] vs paper_forward: mean_abs=0.4465658664703369, max_abs=2.25, mean_rel=0.15438322722911835, max_rel=12.02711296081543, norm_rel=0.022725747898221016, ref_abs_avg=20.02178382873535, test_abs_avg=20.056209564208984
production_forward grad[39] vs paper_forward: mean_abs=0.5658649206161499, max_abs=3.5, mean_rel=0.2865141034126282, max_rel=1953.1248779296875, norm_rel=0.023837406188249588, ref_abs_avg=23.822280883789062, test_abs_avg=23.826786041259766
production_forward grad[40] vs paper_forward: mean_abs=0.5572620034217834, max_abs=4.0, mean_rel=0.2245531678199768, max_rel=2437.5, norm_rel=0.02406209334731102, ref_abs_avg=23.222286224365234, test_abs_avg=23.225643157958984
production_forward grad[41] vs paper_forward: mean_abs=0.4754953384399414, max_abs=2.0, mean_rel=0.07467465102672577, max_rel=2.4695513248443604, norm_rel=0.025740498676896095, ref_abs_avg=19.07509994506836, test_abs_avg=19.05921745300293
production_forward grad[42] vs paper_forward: mean_abs=0.5347285270690918, max_abs=3.25, mean_rel=0.27696532011032104, max_rel=1968.7498779296875, norm_rel=0.023555995896458626, ref_abs_avg=22.759305953979492, test_abs_avg=22.762401580810547
production_forward grad[43] vs paper_forward: mean_abs=0.5279946327209473, max_abs=3.25, mean_rel=0.22627916932106018, max_rel=1671.8748779296875, norm_rel=0.02365676313638687, ref_abs_avg=22.3861083984375, test_abs_avg=22.38298797607422
production_forward grad[44] vs paper_forward: mean_abs=0.4063425064086914, max_abs=1.43359375, mean_rel=0.1034744456410408, max_rel=5.105606555938721, norm_rel=0.02304207906126976, ref_abs_avg=17.370952606201172, test_abs_avg=17.378406524658203
production_forward grad[45] vs paper_forward: mean_abs=0.5090469717979431, max_abs=4.0, mean_rel=0.3001173734664917, max_rel=1937.4998779296875, norm_rel=0.023312747478485107, ref_abs_avg=21.88553237915039, test_abs_avg=21.886695861816406
production_forward grad[46] vs paper_forward: mean_abs=0.5008906126022339, max_abs=3.75, mean_rel=0.2081308662891388, max_rel=1281.25, norm_rel=0.023119444027543068, ref_abs_avg=21.696189880371094, test_abs_avg=21.694162368774414
production_forward grad[47] vs paper_forward: mean_abs=0.4129853844642639, max_abs=1.625, mean_rel=0.07306171953678131, max_rel=1.3363525867462158, norm_rel=0.023951513692736626, ref_abs_avg=16.739242553710938, test_abs_avg=16.71992301940918
production_forward grad[48] vs paper_forward: mean_abs=0.48837947845458984, max_abs=3.0, mean_rel=0.2993902266025543, max_rel=1562.4998779296875, norm_rel=0.023283405229449272, ref_abs_avg=21.01951789855957, test_abs_avg=21.020341873168945
production_forward grad[49] vs paper_forward: mean_abs=0.4822111427783966, max_abs=3.25, mean_rel=0.19711428880691528, max_rel=1218.75, norm_rel=0.02320772036910057, ref_abs_avg=20.804094314575195, test_abs_avg=20.808547973632812
production_forward grad[50] vs paper_forward: mean_abs=0.4501352310180664, max_abs=2.0, mean_rel=0.12679365277290344, max_rel=23.426956176757812, norm_rel=0.02358507178723812, ref_abs_avg=19.293697357177734, test_abs_avg=19.290424346923828
production_forward grad[51] vs paper_forward: mean_abs=0.5500832796096802, max_abs=4.0, mean_rel=0.3030139207839966, max_rel=2125.0, norm_rel=0.024104826152324677, ref_abs_avg=22.85861587524414, test_abs_avg=22.862838745117188
production_forward grad[52] vs paper_forward: mean_abs=0.5323808789253235, max_abs=3.25, mean_rel=0.2110104113817215, max_rel=1374.9998779296875, norm_rel=0.023952649906277657, ref_abs_avg=22.292659759521484, test_abs_avg=22.300914764404297
production_forward grad[53] vs paper_forward: mean_abs=0.44314074516296387, max_abs=1.5625, mean_rel=0.10853271186351776, max_rel=11.153543472290039, norm_rel=0.025182528421282768, ref_abs_avg=17.43967628479004, test_abs_avg=17.40729522705078
production_forward grad[54] vs paper_forward: mean_abs=0.5008724927902222, max_abs=3.375, mean_rel=0.30852362513542175, max_rel=2375.0, norm_rel=0.023863960057497025, ref_abs_avg=21.034650802612305, test_abs_avg=21.0389404296875
production_forward grad[55] vs paper_forward: mean_abs=0.48770007491111755, max_abs=3.15625, mean_rel=0.20766660571098328, max_rel=1671.8748779296875, norm_rel=0.023404782637953758, ref_abs_avg=20.858680725097656, test_abs_avg=20.86368751525879
production_forward grad[56] vs paper_forward: mean_abs=0.40517663955688477, max_abs=1.625, mean_rel=0.08222305774688721, max_rel=5.377387523651123, norm_rel=0.024050552397966385, ref_abs_avg=16.8723087310791, test_abs_avg=16.854286193847656
production_forward grad[57] vs paper_forward: mean_abs=0.4670841097831726, max_abs=3.125, mean_rel=0.2728216052055359, max_rel=2593.749755859375, norm_rel=0.023417405784130096, ref_abs_avg=19.974973678588867, test_abs_avg=19.97821807861328
production_forward grad[58] vs paper_forward: mean_abs=0.45706707239151, max_abs=2.90625, mean_rel=0.19871193170547485, max_rel=1624.9998779296875, norm_rel=0.023349298164248466, ref_abs_avg=19.60144805908203, test_abs_avg=19.60318374633789
production_forward grad[59] vs paper_forward: mean_abs=0.3405742645263672, max_abs=1.25, mean_rel=0.06764545291662216, max_rel=3.732480049133301, norm_rel=0.02237614430487156, ref_abs_avg=15.873289108276367, test_abs_avg=15.90664291381836
production_forward grad[60] vs paper_forward: mean_abs=0.44004499912261963, max_abs=3.0, mean_rel=0.2674647569656372, max_rel=1874.9998779296875, norm_rel=0.022986793890595436, ref_abs_avg=19.159875869750977, test_abs_avg=19.162071228027344
production_forward grad[61] vs paper_forward: mean_abs=0.42646172642707825, max_abs=3.125, mean_rel=0.19618088006973267, max_rel=1624.9998779296875, norm_rel=0.022639458999037743, ref_abs_avg=18.835948944091797, test_abs_avg=18.839290618896484
production_forward grad[62] vs paper_forward: mean_abs=0.3424968719482422, max_abs=1.5625, mean_rel=0.15934841334819794, max_rel=18.163135528564453, norm_rel=0.023442445322871208, ref_abs_avg=14.476025581359863, test_abs_avg=14.442276954650879
production_forward grad[63] vs paper_forward: mean_abs=0.4071807861328125, max_abs=3.21875, mean_rel=0.24713540077209473, max_rel=2109.375, norm_rel=0.02249998040497303, ref_abs_avg=18.0826358795166, test_abs_avg=18.084247589111328
production_forward grad[64] vs paper_forward: mean_abs=0.39893078804016113, max_abs=3.0, mean_rel=0.21281680464744568, max_rel=1843.7498779296875, norm_rel=0.02223573811352253, ref_abs_avg=17.942668914794922, test_abs_avg=17.943851470947266
production_forward grad[65] vs paper_forward: mean_abs=0.32924970984458923, max_abs=1.375, mean_rel=0.14080466330051422, max_rel=22.317096710205078, norm_rel=0.02252836711704731, ref_abs_avg=14.372766494750977, test_abs_avg=14.3516845703125
production_forward grad[66] vs paper_forward: mean_abs=0.3909444510936737, max_abs=2.875, mean_rel=0.24258528649806976, max_rel=1624.9998779296875, norm_rel=0.02215571142733097, ref_abs_avg=17.640853881835938, test_abs_avg=17.642230987548828
production_forward grad[67] vs paper_forward: mean_abs=0.3822891414165497, max_abs=2.8125, mean_rel=0.17686638236045837, max_rel=1218.75, norm_rel=0.022148434072732925, ref_abs_avg=17.30759048461914, test_abs_avg=17.30960464477539
production_forward grad[68] vs paper_forward: mean_abs=0.2942647933959961, max_abs=1.1875, mean_rel=0.07385742664337158, max_rel=4.478495121002197, norm_rel=0.022001542150974274, ref_abs_avg=13.908746719360352, test_abs_avg=13.920014381408691
production_forward grad[69] vs paper_forward: mean_abs=0.3688483238220215, max_abs=2.5, mean_rel=0.2220471203327179, max_rel=1749.9998779296875, norm_rel=0.02164873667061329, ref_abs_avg=17.0279598236084, test_abs_avg=17.029911041259766
production_forward grad[70] vs paper_forward: mean_abs=0.36919480562210083, max_abs=3.3779296875, mean_rel=0.17213013768196106, max_rel=1187.5, norm_rel=0.021822212263941765, ref_abs_avg=16.956562042236328, test_abs_avg=16.960058212280273
production_forward grad[71] vs paper_forward: mean_abs=0.2832911014556885, max_abs=1.125, mean_rel=0.06090712547302246, max_rel=3.679279088973999, norm_rel=0.018767785280942917, ref_abs_avg=14.888690948486328, test_abs_avg=14.90494155883789
production_forward grad[72] vs paper_forward: mean_abs=0.35829320549964905, max_abs=2.75, mean_rel=0.24114327132701874, max_rel=1437.4998779296875, norm_rel=0.021419377997517586, ref_abs_avg=16.728961944580078, test_abs_avg=16.72982406616211
production_forward grad[73] vs paper_forward: mean_abs=0.3534094989299774, max_abs=2.6875, mean_rel=0.17920845746994019, max_rel=999.9999389648438, norm_rel=0.021576015278697014, ref_abs_avg=16.39342498779297, test_abs_avg=16.396791458129883
production_forward grad[74] vs paper_forward: mean_abs=0.28634023666381836, max_abs=1.375, mean_rel=0.06513399630784988, max_rel=1.816113829612732, norm_rel=0.022051675245165825, ref_abs_avg=13.307266235351562, test_abs_avg=13.27767562866211
production_forward grad[75] vs paper_forward: mean_abs=0.3786771893501282, max_abs=2.8125, mean_rel=0.2464650273323059, max_rel=1343.7498779296875, norm_rel=0.023566750809550285, ref_abs_avg=16.098392486572266, test_abs_avg=16.101085662841797
production_forward grad[76] vs paper_forward: mean_abs=0.37092387676239014, max_abs=2.5, mean_rel=0.17646080255508423, max_rel=1406.2498779296875, norm_rel=0.02283794805407524, ref_abs_avg=16.235929489135742, test_abs_avg=16.23088836669922
production_forward grad[77] vs paper_forward: mean_abs=0.30356502532958984, max_abs=1.1875, mean_rel=0.10040082037448883, max_rel=10.27904224395752, norm_rel=0.023679658770561218, ref_abs_avg=12.923069953918457, test_abs_avg=12.9012451171875
production_forward grad[78] vs paper_forward: mean_abs=0.3480707108974457, max_abs=3.1875, mean_rel=0.219404399394989, max_rel=1499.9998779296875, norm_rel=0.02285470999777317, ref_abs_avg=15.245328903198242, test_abs_avg=15.24691390991211
production_forward grad[79] vs paper_forward: mean_abs=0.3405326008796692, max_abs=2.75, mean_rel=0.16783742606639862, max_rel=859.3749389648438, norm_rel=0.022843651473522186, ref_abs_avg=14.929855346679688, test_abs_avg=14.939491271972656
production_forward grad[80] vs paper_forward: mean_abs=0.26551905274391174, max_abs=1.0, mean_rel=0.4422324001789093, max_rel=151.3004150390625, norm_rel=0.02208888716995716, ref_abs_avg=12.319372177124023, test_abs_avg=12.34486198425293
production_forward grad[81] vs paper_forward: mean_abs=0.3286503851413727, max_abs=3.03125, mean_rel=0.21648699045181274, max_rel=1515.6248779296875, norm_rel=0.022339606657624245, ref_abs_avg=14.7099609375, test_abs_avg=14.71082592010498
production_forward grad[82] vs paper_forward: mean_abs=0.3232482373714447, max_abs=2.53125, mean_rel=0.15769445896148682, max_rel=945.3124389648438, norm_rel=0.022421546280384064, ref_abs_avg=14.469204902648926, test_abs_avg=14.469968795776367
production_forward grad[83] vs paper_forward: mean_abs=0.2518165707588196, max_abs=1.296875, mean_rel=0.08060109615325928, max_rel=6.766908645629883, norm_rel=0.022266676649451256, ref_abs_avg=11.722786903381348, test_abs_avg=11.713851928710938
production_forward grad[84] vs paper_forward: mean_abs=0.3112246096134186, max_abs=3.0, mean_rel=0.20886145532131195, max_rel=1125.0, norm_rel=0.0219044741243124, ref_abs_avg=14.227313995361328, test_abs_avg=14.227931022644043
production_forward grad[85] vs paper_forward: mean_abs=0.2986195683479309, max_abs=2.8125, mean_rel=0.16929204761981964, max_rel=999.9999389648438, norm_rel=0.022121848538517952, ref_abs_avg=13.575211524963379, test_abs_avg=13.577457427978516
production_forward grad[86] vs paper_forward: mean_abs=0.25557076930999756, max_abs=1.0, mean_rel=0.12455320358276367, max_rel=7.522261619567871, norm_rel=0.023515833541750908, ref_abs_avg=10.758685111999512, test_abs_avg=10.764371871948242
production_forward grad[87] vs paper_forward: mean_abs=0.2860739231109619, max_abs=2.75, mean_rel=0.24171647429466248, max_rel=1343.7498779296875, norm_rel=0.021467864513397217, ref_abs_avg=13.37537956237793, test_abs_avg=13.376228332519531
production_forward grad[88] vs paper_forward: mean_abs=0.28104454278945923, max_abs=2.25, mean_rel=0.1726798713207245, max_rel=1437.4998779296875, norm_rel=0.021072473376989365, ref_abs_avg=13.361084938049316, test_abs_avg=13.363449096679688
production_forward grad[89] vs paper_forward: mean_abs=0.23059141635894775, max_abs=0.875, mean_rel=0.09131640195846558, max_rel=12.214709281921387, norm_rel=0.019911326467990875, ref_abs_avg=11.682544708251953, test_abs_avg=11.674335479736328
production_forward grad[90] vs paper_forward: mean_abs=0.272255003452301, max_abs=3.5, mean_rel=0.19844821095466614, max_rel=1046.875, norm_rel=0.02062051370739937, ref_abs_avg=13.271541595458984, test_abs_avg=13.271957397460938
production_forward grad[91] vs paper_forward: mean_abs=0.2635205090045929, max_abs=2.53125, mean_rel=0.14818578958511353, max_rel=843.7499389648438, norm_rel=0.020299524068832397, ref_abs_avg=13.092889785766602, test_abs_avg=13.098148345947266
production_forward grad[92] vs paper_forward: mean_abs=0.22491693496704102, max_abs=0.9375, mean_rel=0.1124451607465744, max_rel=20.951520919799805, norm_rel=0.021148664876818657, ref_abs_avg=10.735759735107422, test_abs_avg=10.732950210571289
production_forward grad[93] vs paper_forward: mean_abs=0.2520468831062317, max_abs=3.0, mean_rel=0.20004281401634216, max_rel=1781.2498779296875, norm_rel=0.02011808007955551, ref_abs_avg=12.66917610168457, test_abs_avg=12.669696807861328
production_forward grad[94] vs paper_forward: mean_abs=0.24607723951339722, max_abs=2.5, mean_rel=0.1438222974538803, max_rel=703.1249389648438, norm_rel=0.01990147866308689, ref_abs_avg=12.527870178222656, test_abs_avg=12.529139518737793
production_forward grad[95] vs paper_forward: mean_abs=0.21243087947368622, max_abs=1.0, mean_rel=0.10460207611322403, max_rel=14.825494766235352, norm_rel=0.01965353451669216, ref_abs_avg=10.972978591918945, test_abs_avg=10.967432022094727
production_forward grad[96] vs paper_forward: mean_abs=0.24596744775772095, max_abs=3.125, mean_rel=0.16512688994407654, max_rel=1484.3748779296875, norm_rel=0.019846994429826736, ref_abs_avg=12.605910301208496, test_abs_avg=12.605249404907227
production_forward grad[97] vs paper_forward: mean_abs=0.2296103835105896, max_abs=2.53125, mean_rel=0.12434893846511841, max_rel=976.5624389648438, norm_rel=0.019295688718557358, ref_abs_avg=12.18551254272461, test_abs_avg=12.18825912475586
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016660474939271808, max_abs=0.03515625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008795153349637985, max_abs=0.484375, mean_rel=0.07549016922712326, max_rel=127.94539642333984, norm_rel=0.020631754770874977, ref_abs_avg=0.4600657820701599, test_abs_avg=0.46007609367370605
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=5.301075458526611, max_abs=40.0, mean_rel=0.1474834531545639, max_rel=110.08341217041016, norm_rel=0.021175215020775795, ref_abs_avg=224.78880310058594, test_abs_avg=224.8474884033203
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=0.9276199340820312, max_abs=4.0, mean_rel=0.10154200345277786, max_rel=4.832826137542725, norm_rel=0.026641706004738808, ref_abs_avg=35.474117279052734, test_abs_avg=35.48945236206055
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.100957989692688, max_abs=6.5625, mean_rel=0.5150226354598999, max_rel=3812.499755859375, norm_rel=0.02413857728242874, ref_abs_avg=45.86791229248047, test_abs_avg=45.86997985839844
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.0776374340057373, max_abs=6.625, mean_rel=0.28817614912986755, max_rel=3156.249755859375, norm_rel=0.023933636024594307, ref_abs_avg=45.2935791015625, test_abs_avg=45.293460845947266
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=0.7690801620483398, max_abs=3.5, mean_rel=0.09846532344818115, max_rel=8.952591896057129, norm_rel=0.023088397458195686, ref_abs_avg=34.386600494384766, test_abs_avg=34.379737854003906
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=0.9741805195808411, max_abs=6.5, mean_rel=0.473825603723526, max_rel=4375.0, norm_rel=0.02373555302619934, ref_abs_avg=41.25053024291992, test_abs_avg=41.25342559814453
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=0.949428379535675, max_abs=5.5, mean_rel=0.29486995935440063, max_rel=2624.999755859375, norm_rel=0.023386992514133453, ref_abs_avg=40.795616149902344, test_abs_avg=40.79694366455078
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.7431612014770508, max_abs=2.5, mean_rel=0.09169657528400421, max_rel=10.940535545349121, norm_rel=0.02369360439479351, ref_abs_avg=31.117130279541016, test_abs_avg=31.026199340820312
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=0.8820309042930603, max_abs=6.0, mean_rel=0.43945205211639404, max_rel=3249.999755859375, norm_rel=0.023575417697429657, ref_abs_avg=37.62277603149414, test_abs_avg=37.62164306640625
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=0.8657312393188477, max_abs=5.5, mean_rel=0.24641665816307068, max_rel=2093.75, norm_rel=0.02331128530204296, ref_abs_avg=37.29949951171875, test_abs_avg=37.30149841308594
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.6531314849853516, max_abs=2.75, mean_rel=0.17245033383369446, max_rel=18.572101593017578, norm_rel=0.024145783856511116, ref_abs_avg=27.387165069580078, test_abs_avg=27.40390396118164
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=0.8195544481277466, max_abs=5.25, mean_rel=0.3934154808521271, max_rel=3468.749755859375, norm_rel=0.02339855767786503, ref_abs_avg=35.220008850097656, test_abs_avg=35.22136688232422
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=0.7978272438049316, max_abs=4.625, mean_rel=0.23358651995658875, max_rel=2468.75, norm_rel=0.023041922599077225, ref_abs_avg=34.84288787841797, test_abs_avg=34.845619201660156
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.6641609072685242, max_abs=2.75, mean_rel=0.35643285512924194, max_rel=126.28257751464844, norm_rel=0.02350827306509018, ref_abs_avg=27.258831024169922, test_abs_avg=27.329906463623047
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=0.7655897736549377, max_abs=4.75, mean_rel=0.3950309753417969, max_rel=3187.499755859375, norm_rel=0.023167457431554794, ref_abs_avg=33.24609375, test_abs_avg=33.250511169433594
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.744899570941925, max_abs=4.25, mean_rel=0.25806641578674316, max_rel=2078.125, norm_rel=0.022922944277524948, ref_abs_avg=32.64417266845703, test_abs_avg=32.646629333496094
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.5902221202850342, max_abs=2.75, mean_rel=0.08307772129774094, max_rel=3.6131949424743652, norm_rel=0.023671112954616547, ref_abs_avg=25.22259521484375, test_abs_avg=25.242597579956055
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.712125837802887, max_abs=4.5, mean_rel=0.3734392523765564, max_rel=2375.0, norm_rel=0.02290213294327259, ref_abs_avg=31.229719161987305, test_abs_avg=31.23056411743164
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.701522946357727, max_abs=4.25, mean_rel=0.259413480758667, max_rel=2749.999755859375, norm_rel=0.022744974121451378, ref_abs_avg=30.976266860961914, test_abs_avg=30.97844886779785
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.5905864238739014, max_abs=2.0, mean_rel=0.14257006347179413, max_rel=26.934669494628906, norm_rel=0.02295061945915222, ref_abs_avg=25.488330841064453, test_abs_avg=25.48125457763672
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.6786319017410278, max_abs=4.0625, mean_rel=0.3460838794708252, max_rel=2375.0, norm_rel=0.02292875200510025, ref_abs_avg=29.76357078552246, test_abs_avg=29.76565170288086
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.6635147929191589, max_abs=4.0, mean_rel=0.261231929063797, max_rel=2203.125, norm_rel=0.022651933133602142, ref_abs_avg=29.416351318359375, test_abs_avg=29.41933822631836
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.5373392105102539, max_abs=2.0, mean_rel=0.1508064568042755, max_rel=16.837871551513672, norm_rel=0.02269929088652134, ref_abs_avg=23.674747467041016, test_abs_avg=23.7125186920166
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.6448228359222412, max_abs=4.0, mean_rel=0.3355834186077118, max_rel=2375.0, norm_rel=0.02270098216831684, ref_abs_avg=28.560895919799805, test_abs_avg=28.562883377075195
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.6307092905044556, max_abs=3.75, mean_rel=0.22738389670848846, max_rel=2062.5, norm_rel=0.022314012050628662, ref_abs_avg=28.36595916748047, test_abs_avg=28.36859893798828
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.6076529026031494, max_abs=2.5, mean_rel=0.2111211121082306, max_rel=46.79473114013672, norm_rel=0.02426801435649395, ref_abs_avg=25.649154663085938, test_abs_avg=25.67198371887207
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=0.7475307583808899, max_abs=5.0, mean_rel=0.3704819679260254, max_rel=3124.999755859375, norm_rel=0.024577436968684196, ref_abs_avg=30.570690155029297, test_abs_avg=30.577239990234375
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.7317333221435547, max_abs=4.5, mean_rel=0.26900631189346313, max_rel=2500.0, norm_rel=0.024284135550260544, ref_abs_avg=30.233238220214844, test_abs_avg=30.241121292114258
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.5870203971862793, max_abs=2.25, mean_rel=0.16650544106960297, max_rel=11.296934127807617, norm_rel=0.025244513526558876, ref_abs_avg=23.545520782470703, test_abs_avg=23.550140380859375
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.7044100761413574, max_abs=4.75, mean_rel=0.4046538770198822, max_rel=2624.999755859375, norm_rel=0.024959390982985497, ref_abs_avg=28.33119773864746, test_abs_avg=28.334218978881836
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.6882776021957397, max_abs=4.125, mean_rel=0.23149491846561432, max_rel=1906.2498779296875, norm_rel=0.024840328842401505, ref_abs_avg=27.838775634765625, test_abs_avg=27.843006134033203
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.5093213319778442, max_abs=2.375, mean_rel=0.3892762064933777, max_rel=142.59767150878906, norm_rel=0.024832550436258316, ref_abs_avg=20.606718063354492, test_abs_avg=20.584766387939453
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.6527535915374756, max_abs=4.375, mean_rel=0.36881202459335327, max_rel=2484.375, norm_rel=0.024783644825220108, ref_abs_avg=26.435688018798828, test_abs_avg=26.439851760864258
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.6410864591598511, max_abs=3.625, mean_rel=0.22706103324890137, max_rel=1781.2498779296875, norm_rel=0.024589568376541138, ref_abs_avg=26.199146270751953, test_abs_avg=26.204559326171875
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.50788414478302, max_abs=2.0, mean_rel=0.19311481714248657, max_rel=38.67213821411133, norm_rel=0.024789944291114807, ref_abs_avg=20.239421844482422, test_abs_avg=20.270591735839844
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.6097210049629211, max_abs=3.875, mean_rel=0.317084401845932, max_rel=2781.249755859375, norm_rel=0.024347830563783646, ref_abs_avg=25.104877471923828, test_abs_avg=25.108312606811523
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.5986956357955933, max_abs=3.75, mean_rel=0.16949979960918427, max_rel=1107.21337890625, norm_rel=0.024249771609902382, ref_abs_avg=24.73564338684082, test_abs_avg=24.73554039001465
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.48169398307800293, max_abs=1.875, mean_rel=0.21239778399467468, max_rel=25.410158157348633, norm_rel=0.02393406815826893, ref_abs_avg=20.02178382873535, test_abs_avg=20.035722732543945
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.5764164328575134, max_abs=3.5, mean_rel=0.30161458253860474, max_rel=2125.0, norm_rel=0.024275412783026695, ref_abs_avg=23.822280883789062, test_abs_avg=23.825551986694336
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.5680235624313354, max_abs=3.84375, mean_rel=0.2344270944595337, max_rel=2593.749755859375, norm_rel=0.02451973222196102, ref_abs_avg=23.222286224365234, test_abs_avg=23.22415542602539
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.4788482189178467, max_abs=2.125, mean_rel=0.08096587657928467, max_rel=2.916431427001953, norm_rel=0.02556358277797699, ref_abs_avg=19.07509994506836, test_abs_avg=19.04537582397461
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.5436545610427856, max_abs=3.25, mean_rel=0.28582829236984253, max_rel=2375.0, norm_rel=0.023941824212670326, ref_abs_avg=22.759305953979492, test_abs_avg=22.762046813964844
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.5363178253173828, max_abs=3.375, mean_rel=0.23155824840068817, max_rel=1468.7498779296875, norm_rel=0.024029355496168137, ref_abs_avg=22.3861083984375, test_abs_avg=22.383317947387695
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.41514021158218384, max_abs=1.5078125, mean_rel=0.10774683952331543, max_rel=4.9114766120910645, norm_rel=0.02358683943748474, ref_abs_avg=17.370952606201172, test_abs_avg=17.374483108520508
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.5178704857826233, max_abs=3.53125, mean_rel=0.29836374521255493, max_rel=1937.4998779296875, norm_rel=0.023693667724728584, ref_abs_avg=21.88553237915039, test_abs_avg=21.886550903320312
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.5078302621841431, max_abs=3.5, mean_rel=0.20842760801315308, max_rel=1624.9998779296875, norm_rel=0.02346673607826233, ref_abs_avg=21.696189880371094, test_abs_avg=21.694936752319336
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.41507911682128906, max_abs=1.875, mean_rel=0.07397622615098953, max_rel=1.8653254508972168, norm_rel=0.024267038330435753, ref_abs_avg=16.739242553710938, test_abs_avg=16.737016677856445
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.49473637342453003, max_abs=3.296875, mean_rel=0.3041794002056122, max_rel=2250.0, norm_rel=0.023597724735736847, ref_abs_avg=21.01951789855957, test_abs_avg=21.02069854736328
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.48987704515457153, max_abs=3.0, mean_rel=0.19166763126850128, max_rel=1562.4998779296875, norm_rel=0.023574912920594215, ref_abs_avg=20.804094314575195, test_abs_avg=20.80915641784668
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.47037315368652344, max_abs=1.75, mean_rel=0.10553858429193497, max_rel=11.953587532043457, norm_rel=0.02407827600836754, ref_abs_avg=19.293697357177734, test_abs_avg=19.30217933654785
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.5589489340782166, max_abs=3.875, mean_rel=0.30913421511650085, max_rel=1906.2498779296875, norm_rel=0.02448870614171028, ref_abs_avg=22.85861587524414, test_abs_avg=22.862960815429688
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.5392613410949707, max_abs=3.5, mean_rel=0.217403843998909, max_rel=1531.2498779296875, norm_rel=0.024250837042927742, ref_abs_avg=22.292659759521484, test_abs_avg=22.300670623779297
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.4520721435546875, max_abs=1.75, mean_rel=0.12072225660085678, max_rel=12.923686981201172, norm_rel=0.02586476318538189, ref_abs_avg=17.43967628479004, test_abs_avg=17.402976989746094
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.5095855593681335, max_abs=3.5, mean_rel=0.310454398393631, max_rel=2312.5, norm_rel=0.024259746074676514, ref_abs_avg=21.034650802612305, test_abs_avg=21.03889274597168
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.49743983149528503, max_abs=3.0625, mean_rel=0.20852801203727722, max_rel=1499.9998779296875, norm_rel=0.023856431245803833, ref_abs_avg=20.858680725097656, test_abs_avg=20.862424850463867
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.41153621673583984, max_abs=1.5, mean_rel=0.08381270617246628, max_rel=6.580990791320801, norm_rel=0.02458474598824978, ref_abs_avg=16.8723087310791, test_abs_avg=16.85849380493164
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.4742625951766968, max_abs=3.21875, mean_rel=0.27552321553230286, max_rel=1687.4998779296875, norm_rel=0.023753007873892784, ref_abs_avg=19.974973678588867, test_abs_avg=19.97728157043457
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.4632062613964081, max_abs=2.75, mean_rel=0.19888631999492645, max_rel=1624.9998779296875, norm_rel=0.02367682009935379, ref_abs_avg=19.60144805908203, test_abs_avg=19.600933074951172
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.37157320976257324, max_abs=1.28125, mean_rel=0.07546370476484299, max_rel=4.836989402770996, norm_rel=0.02397286146879196, ref_abs_avg=15.873289108276367, test_abs_avg=15.916666984558105
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.4458408057689667, max_abs=3.0, mean_rel=0.2661173343658447, max_rel=1437.4998779296875, norm_rel=0.02328488789498806, ref_abs_avg=19.159875869750977, test_abs_avg=19.161945343017578
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.43478286266326904, max_abs=2.75, mean_rel=0.19840434193611145, max_rel=1531.2498779296875, norm_rel=0.023077717050909996, ref_abs_avg=18.835948944091797, test_abs_avg=18.838733673095703
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.3435596525669098, max_abs=1.4375, mean_rel=0.19540750980377197, max_rel=30.320133209228516, norm_rel=0.02341267839074135, ref_abs_avg=14.476025581359863, test_abs_avg=14.45240592956543
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.41188275814056396, max_abs=2.75, mean_rel=0.2501086890697479, max_rel=1687.4998779296875, norm_rel=0.02276524528861046, ref_abs_avg=18.0826358795166, test_abs_avg=18.0838623046875
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.4044736623764038, max_abs=2.75, mean_rel=0.22074130177497864, max_rel=2250.0, norm_rel=0.022532610222697258, ref_abs_avg=17.942668914794922, test_abs_avg=17.943218231201172
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.3396916389465332, max_abs=1.5, mean_rel=0.1259998381137848, max_rel=14.737486839294434, norm_rel=0.02293999493122101, ref_abs_avg=14.372766494750977, test_abs_avg=14.346692085266113
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.39539703726768494, max_abs=2.75, mean_rel=0.23658083379268646, max_rel=1499.9998779296875, norm_rel=0.02239641174674034, ref_abs_avg=17.640853881835938, test_abs_avg=17.642166137695312
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.3860263228416443, max_abs=2.625, mean_rel=0.17998719215393066, max_rel=1437.4998779296875, norm_rel=0.02235013246536255, ref_abs_avg=17.30759048461914, test_abs_avg=17.30736541748047
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.3035874366760254, max_abs=1.3125, mean_rel=0.07290492206811905, max_rel=4.549686908721924, norm_rel=0.02246750518679619, ref_abs_avg=13.908746719360352, test_abs_avg=13.929866790771484
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.3727935552597046, max_abs=2.75, mean_rel=0.22945180535316467, max_rel=1749.9998779296875, norm_rel=0.02188294567167759, ref_abs_avg=17.0279598236084, test_abs_avg=17.029516220092773
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.3712831735610962, max_abs=3.1904296875, mean_rel=0.17078016698360443, max_rel=1031.25, norm_rel=0.021942107006907463, ref_abs_avg=16.956562042236328, test_abs_avg=16.96044158935547
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.28732728958129883, max_abs=1.125, mean_rel=0.06041457876563072, max_rel=2.2206671237945557, norm_rel=0.018957290798425674, ref_abs_avg=14.888690948486328, test_abs_avg=14.914785385131836
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.3613988161087036, max_abs=2.875, mean_rel=0.24807092547416687, max_rel=1374.9998779296875, norm_rel=0.021590523421764374, ref_abs_avg=16.728961944580078, test_abs_avg=16.729454040527344
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.35580793023109436, max_abs=3.125, mean_rel=0.18472695350646973, max_rel=874.9999389648438, norm_rel=0.02172030322253704, ref_abs_avg=16.39342498779297, test_abs_avg=16.39472770690918
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.30904901027679443, max_abs=1.0, mean_rel=0.07094857096672058, max_rel=1.5479555130004883, norm_rel=0.023106688633561134, ref_abs_avg=13.307266235351562, test_abs_avg=13.289194107055664
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.3827005624771118, max_abs=2.75, mean_rel=0.24594777822494507, max_rel=1718.7498779296875, norm_rel=0.02379421889781952, ref_abs_avg=16.098392486572266, test_abs_avg=16.100982666015625
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.3753819167613983, max_abs=2.5625, mean_rel=0.18752899765968323, max_rel=1624.9998779296875, norm_rel=0.023124247789382935, ref_abs_avg=16.235929489135742, test_abs_avg=16.231847763061523
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.29718732833862305, max_abs=1.25, mean_rel=0.1015322357416153, max_rel=9.08598804473877, norm_rel=0.023534083738923073, ref_abs_avg=12.923069953918457, test_abs_avg=12.918262481689453
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.3518837094306946, max_abs=3.5, mean_rel=0.22027575969696045, max_rel=1499.9998779296875, norm_rel=0.02309395745396614, ref_abs_avg=15.245328903198242, test_abs_avg=15.247604370117188
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.34582453966140747, max_abs=2.75, mean_rel=0.1745745688676834, max_rel=1031.25, norm_rel=0.0231636893004179, ref_abs_avg=14.929855346679688, test_abs_avg=14.940502166748047
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.27212420105934143, max_abs=1.091796875, mean_rel=0.4714643061161041, max_rel=140.91653442382812, norm_rel=0.02276458963751793, ref_abs_avg=12.319372177124023, test_abs_avg=12.329559326171875
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.3312557339668274, max_abs=2.65625, mean_rel=0.22667613625526428, max_rel=1929.6873779296875, norm_rel=0.022526714950799942, ref_abs_avg=14.7099609375, test_abs_avg=14.710326194763184
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.3273400068283081, max_abs=2.5, mean_rel=0.16157761216163635, max_rel=796.8749389648438, norm_rel=0.02268267795443535, ref_abs_avg=14.469204902648926, test_abs_avg=14.46898078918457
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.2604236602783203, max_abs=1.375, mean_rel=0.08705376088619232, max_rel=7.044288158416748, norm_rel=0.02282167598605156, ref_abs_avg=11.722786903381348, test_abs_avg=11.72310733795166
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.3135335445404053, max_abs=3.125, mean_rel=0.21161571145057678, max_rel=1125.0, norm_rel=0.02206621877849102, ref_abs_avg=14.227313995361328, test_abs_avg=14.227725982666016
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.30118975043296814, max_abs=2.6875, mean_rel=0.16584324836730957, max_rel=953.1249389648438, norm_rel=0.022267786785960197, ref_abs_avg=13.575211524963379, test_abs_avg=13.576786994934082
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.2416517734527588, max_abs=1.0, mean_rel=0.1159229651093483, max_rel=7.250263690948486, norm_rel=0.02268083207309246, ref_abs_avg=10.758685111999512, test_abs_avg=10.771507263183594
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.28790968656539917, max_abs=2.625, mean_rel=0.23952944576740265, max_rel=1343.7498779296875, norm_rel=0.021583780646324158, ref_abs_avg=13.37537956237793, test_abs_avg=13.375787734985352
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.2849249839782715, max_abs=2.453125, mean_rel=0.1676032692193985, max_rel=1015.6249389648438, norm_rel=0.021382678300142288, ref_abs_avg=13.361084938049316, test_abs_avg=13.36194133758545
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.22624433040618896, max_abs=0.875, mean_rel=0.08988979458808899, max_rel=13.989458084106445, norm_rel=0.019745297729969025, ref_abs_avg=11.682544708251953, test_abs_avg=11.688526153564453
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.274126797914505, max_abs=3.5625, mean_rel=0.20101609826087952, max_rel=1148.4375, norm_rel=0.020771922543644905, ref_abs_avg=13.271541595458984, test_abs_avg=13.27206802368164
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.26688095927238464, max_abs=2.5, mean_rel=0.14456894993782043, max_rel=593.75, norm_rel=0.02056942693889141, ref_abs_avg=13.092889785766602, test_abs_avg=13.099349975585938
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.2238701581954956, max_abs=1.0, mean_rel=0.09619911015033722, max_rel=12.63537883758545, norm_rel=0.02129758894443512, ref_abs_avg=10.735759735107422, test_abs_avg=10.725440979003906
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.2530471086502075, max_abs=3.0, mean_rel=0.20448271930217743, max_rel=2093.75, norm_rel=0.02018611878156662, ref_abs_avg=12.66917610168457, test_abs_avg=12.669975280761719
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.24763812124729156, max_abs=2.75, mean_rel=0.145236074924469, max_rel=874.9999389648438, norm_rel=0.020077990368008614, ref_abs_avg=12.527870178222656, test_abs_avg=12.52806568145752
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2105032205581665, max_abs=0.875, mean_rel=0.08302993327379227, max_rel=8.392487525939941, norm_rel=0.019653646275401115, ref_abs_avg=10.972978591918945, test_abs_avg=10.962709426879883
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.24625417590141296, max_abs=3.25, mean_rel=0.1618860512971878, max_rel=1468.7498779296875, norm_rel=0.019856588914990425, ref_abs_avg=12.605910301208496, test_abs_avg=12.605535507202148
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.23233141005039215, max_abs=2.25, mean_rel=0.12245278805494308, max_rel=625.0, norm_rel=0.019458865746855736, ref_abs_avg=12.18551254272461, test_abs_avg=12.184371948242188
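The per-gradient statistics above (mean_abs, max_abs, mean_rel, max_rel, norm_rel, ref_abs_avg, test_abs_avg) can be reproduced with a small helper. This is a hedged reconstruction of what the logging likely computes, not the actual comparison code; the function name and the exact guard against division by zero are assumptions:

```python
import numpy as np

def grad_compare(ref, test):
    """Sketch of the per-tensor comparison stats in the log above.

    ref/test are the reference and test gradients flattened to arrays.
    The tiny-denominator guard is an assumption; the real script may
    differ, which would shift max_rel on near-zero reference entries.
    """
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    diff = np.abs(ref - test)
    # Elementwise relative error; near-zero ref entries dominate max_rel.
    rel = diff / np.maximum(np.abs(ref), np.finfo(np.float64).tiny)
    return {
        "mean_abs": diff.mean(),
        "max_abs": diff.max(),
        "mean_rel": rel.mean(),
        "max_rel": rel.max(),
        # Tensor-level relative error: ||ref - test|| / ||ref||.
        # Robust to isolated near-zero elements, hence the stable ~0.02
        # values in the log even when max_rel is in the thousands.
        "norm_rel": np.linalg.norm(diff) / np.linalg.norm(ref),
        "ref_abs_avg": np.abs(ref).mean(),
        "test_abs_avg": np.abs(test).mean(),
    }
```

Under this reading, norm_rel is the headline number per gradient, while mean_rel/max_rel are sensitive to the magnitude distribution of individual elements.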
liger_forward vs paper_forward output: mean_abs=0.00038585939910262823, max_abs=0.03125
liger_forward grad[0] vs paper_forward: mean_abs=0.004702524747699499, max_abs=0.203125, mean_rel=0.03638814762234688, max_rel=61.644744873046875, norm_rel=0.012401288375258446, ref_abs_avg=0.4600657820701599, test_abs_avg=0.460044264793396
liger_forward grad[1] vs paper_forward: mean_abs=2.4941816329956055, max_abs=16.0, mean_rel=0.07282566279172897, max_rel=117.22714233398438, norm_rel=0.010157186537981033, ref_abs_avg=224.78880310058594, test_abs_avg=224.84486389160156
liger_forward grad[2] vs paper_forward: mean_abs=0.48952674865722656, max_abs=2.0, mean_rel=0.0499117337167263, max_rel=3.018594741821289, norm_rel=0.014294338412582874, ref_abs_avg=35.474117279052734, test_abs_avg=35.43171691894531
liger_forward grad[3] vs paper_forward: mean_abs=0.6003615260124207, max_abs=4.0, mean_rel=0.29087093472480774, max_rel=2500.0, norm_rel=0.013335256837308407, ref_abs_avg=45.86791229248047, test_abs_avg=45.86886215209961
liger_forward grad[4] vs paper_forward: mean_abs=0.5814459919929504, max_abs=3.5625, mean_rel=0.15147140622138977, max_rel=1359.3748779296875, norm_rel=0.013066079467535019, ref_abs_avg=45.2935791015625, test_abs_avg=45.293338775634766
liger_forward grad[5] vs paper_forward: mean_abs=0.4340848922729492, max_abs=1.6328125, mean_rel=0.08878309279680252, max_rel=19.327077865600586, norm_rel=0.013105095364153385, ref_abs_avg=34.386600494384766, test_abs_avg=34.383628845214844
liger_forward grad[6] vs paper_forward: mean_abs=0.5265727043151855, max_abs=3.25, mean_rel=0.23234263062477112, max_rel=1749.9998779296875, norm_rel=0.01301246415823698, ref_abs_avg=41.25053024291992, test_abs_avg=41.249900817871094
liger_forward grad[7] vs paper_forward: mean_abs=0.5100048780441284, max_abs=3.5, mean_rel=0.15609890222549438, max_rel=1812.4998779296875, norm_rel=0.012729666195809841, ref_abs_avg=40.795616149902344, test_abs_avg=40.796974182128906
liger_forward grad[8] vs paper_forward: mean_abs=0.3697662353515625, max_abs=1.375, mean_rel=0.038704823702573776, max_rel=1.3574998378753662, norm_rel=0.011975618079304695, ref_abs_avg=31.117130279541016, test_abs_avg=31.1103515625
liger_forward grad[9] vs paper_forward: mean_abs=0.47231149673461914, max_abs=3.25, mean_rel=0.20391035079956055, max_rel=1656.2498779296875, norm_rel=0.012791852466762066, ref_abs_avg=37.62277603149414, test_abs_avg=37.62274169921875
liger_forward grad[10] vs paper_forward: mean_abs=0.46046045422554016, max_abs=2.75, mean_rel=0.1389351338148117, max_rel=1312.4998779296875, norm_rel=0.012563515454530716, ref_abs_avg=37.29949951171875, test_abs_avg=37.30130386352539
liger_forward grad[11] vs paper_forward: mean_abs=0.3671722412109375, max_abs=1.5625, mean_rel=0.09256426990032196, max_rel=7.954150199890137, norm_rel=0.01355702430009842, ref_abs_avg=27.387165069580078, test_abs_avg=27.379531860351562
liger_forward grad[12] vs paper_forward: mean_abs=0.4351508617401123, max_abs=2.625, mean_rel=0.21703337132930756, max_rel=1749.9998779296875, norm_rel=0.012604412622749805, ref_abs_avg=35.220008850097656, test_abs_avg=35.221744537353516
liger_forward grad[13] vs paper_forward: mean_abs=0.4200195372104645, max_abs=2.5, mean_rel=0.1304604411125183, max_rel=1437.4998779296875, norm_rel=0.012295175343751907, ref_abs_avg=34.84288787841797, test_abs_avg=34.83959197998047
liger_forward grad[14] vs paper_forward: mean_abs=0.3403356671333313, max_abs=1.75, mean_rel=0.1397155374288559, max_rel=43.81913375854492, norm_rel=0.01246617455035448, ref_abs_avg=27.258831024169922, test_abs_avg=27.253681182861328
liger_forward grad[15] vs paper_forward: mean_abs=0.4045390188694, max_abs=2.59375, mean_rel=0.21235623955726624, max_rel=1374.9998779296875, norm_rel=0.012415549717843533, ref_abs_avg=33.24609375, test_abs_avg=33.24802780151367
liger_forward grad[16] vs paper_forward: mean_abs=0.3910837173461914, max_abs=2.4375, mean_rel=0.1341622769832611, max_rel=1093.75, norm_rel=0.012220828793942928, ref_abs_avg=32.64417266845703, test_abs_avg=32.64379119873047
liger_forward grad[17] vs paper_forward: mean_abs=0.3096189498901367, max_abs=1.25, mean_rel=0.045786745846271515, max_rel=4.043051242828369, norm_rel=0.01273912750184536, ref_abs_avg=25.22259521484375, test_abs_avg=25.192302703857422
liger_forward grad[18] vs paper_forward: mean_abs=0.3750329315662384, max_abs=2.5, mean_rel=0.1862451136112213, max_rel=1250.0, norm_rel=0.012245974503457546, ref_abs_avg=31.229719161987305, test_abs_avg=31.22920036315918
liger_forward grad[19] vs paper_forward: mean_abs=0.36561116576194763, max_abs=2.5, mean_rel=0.12217946350574493, max_rel=937.4999389648438, norm_rel=0.012039793655276299, ref_abs_avg=30.976266860961914, test_abs_avg=30.978137969970703
liger_forward grad[20] vs paper_forward: mean_abs=0.29506754875183105, max_abs=1.25, mean_rel=0.0728675127029419, max_rel=11.186270713806152, norm_rel=0.011953305453062057, ref_abs_avg=25.488330841064453, test_abs_avg=25.49544906616211
liger_forward grad[21] vs paper_forward: mean_abs=0.35344403982162476, max_abs=2.25, mean_rel=0.17302316427230835, max_rel=1499.9998779296875, norm_rel=0.012131499126553535, ref_abs_avg=29.76357078552246, test_abs_avg=29.76443862915039
liger_forward grad[22] vs paper_forward: mean_abs=0.3439379632472992, max_abs=2.125, mean_rel=0.1289680004119873, max_rel=890.6249389648438, norm_rel=0.011922161094844341, ref_abs_avg=29.416351318359375, test_abs_avg=29.417274475097656
liger_forward grad[23] vs paper_forward: mean_abs=0.2849764823913574, max_abs=1.078125, mean_rel=0.05475275218486786, max_rel=3.9880900382995605, norm_rel=0.012371310032904148, ref_abs_avg=23.674747467041016, test_abs_avg=23.681991577148438
liger_forward grad[24] vs paper_forward: mean_abs=0.3354838192462921, max_abs=2.125, mean_rel=0.16433103382587433, max_rel=999.9999389648438, norm_rel=0.011990544386208057, ref_abs_avg=28.560895919799805, test_abs_avg=28.561500549316406
liger_forward grad[25] vs paper_forward: mean_abs=0.32437822222709656, max_abs=2.0, mean_rel=0.11352013051509857, max_rel=874.9999389648438, norm_rel=0.01165814884006977, ref_abs_avg=28.36595916748047, test_abs_avg=28.368568420410156
liger_forward grad[26] vs paper_forward: mean_abs=0.31140875816345215, max_abs=1.25, mean_rel=0.07566532492637634, max_rel=12.93857192993164, norm_rel=0.012421555817127228, ref_abs_avg=25.649154663085938, test_abs_avg=25.631954193115234
liger_forward grad[27] vs paper_forward: mean_abs=0.36927270889282227, max_abs=2.5625, mean_rel=0.1812010258436203, max_rel=1250.0, norm_rel=0.012334072031080723, ref_abs_avg=30.570690155029297, test_abs_avg=30.573604583740234
liger_forward grad[28] vs paper_forward: mean_abs=0.36067333817481995, max_abs=2.25, mean_rel=0.12950636446475983, max_rel=1968.7498779296875, norm_rel=0.012145807035267353, ref_abs_avg=30.233238220214844, test_abs_avg=30.231861114501953
liger_forward grad[29] vs paper_forward: mean_abs=0.28394436836242676, max_abs=1.5, mean_rel=0.08586951345205307, max_rel=7.516844272613525, norm_rel=0.01256791315972805, ref_abs_avg=23.545520782470703, test_abs_avg=23.572694778442383
liger_forward grad[30] vs paper_forward: mean_abs=0.3377203941345215, max_abs=2.25, mean_rel=0.17532075941562653, max_rel=1125.0, norm_rel=0.01215872261673212, ref_abs_avg=28.33119773864746, test_abs_avg=28.33176612854004
liger_forward grad[31] vs paper_forward: mean_abs=0.32628095149993896, max_abs=2.0, mean_rel=0.11085618287324905, max_rel=1421.8748779296875, norm_rel=0.011952908709645271, ref_abs_avg=27.838775634765625, test_abs_avg=27.837854385375977
liger_forward grad[32] vs paper_forward: mean_abs=0.25261786580085754, max_abs=1.0, mean_rel=0.19986562430858612, max_rel=71.56852722167969, norm_rel=0.012438679113984108, ref_abs_avg=20.606718063354492, test_abs_avg=20.603260040283203
liger_forward grad[33] vs paper_forward: mean_abs=0.309715211391449, max_abs=2.25, mean_rel=0.16640636324882507, max_rel=1687.4998779296875, norm_rel=0.011968491598963737, ref_abs_avg=26.435688018798828, test_abs_avg=26.435771942138672
liger_forward grad[34] vs paper_forward: mean_abs=0.3003780245780945, max_abs=2.0, mean_rel=0.1045701876282692, max_rel=687.4999389648438, norm_rel=0.011695345863699913, ref_abs_avg=26.199146270751953, test_abs_avg=26.196529388427734
liger_forward grad[35] vs paper_forward: mean_abs=0.2278686761856079, max_abs=1.0, mean_rel=0.06178806722164154, max_rel=8.466526985168457, norm_rel=0.011498657986521721, ref_abs_avg=20.239421844482422, test_abs_avg=20.274566650390625
liger_forward grad[36] vs paper_forward: mean_abs=0.28692126274108887, max_abs=2.0, mean_rel=0.15348626673221588, max_rel=1312.4998779296875, norm_rel=0.011681056581437588, ref_abs_avg=25.104877471923828, test_abs_avg=25.10515594482422
liger_forward grad[37] vs paper_forward: mean_abs=0.2781580686569214, max_abs=2.0, mean_rel=0.08744120597839355, max_rel=781.2499389648438, norm_rel=0.011468100361526012, ref_abs_avg=24.73564338684082, test_abs_avg=24.735530853271484
liger_forward grad[38] vs paper_forward: mean_abs=0.22899067401885986, max_abs=1.140625, mean_rel=0.09393744170665741, max_rel=11.335542678833008, norm_rel=0.011893205344676971, ref_abs_avg=20.02178382873535, test_abs_avg=20.015872955322266
liger_forward grad[39] vs paper_forward: mean_abs=0.27006858587265015, max_abs=1.875, mean_rel=0.14058656990528107, max_rel=1374.9998779296875, norm_rel=0.011574692092835903, ref_abs_avg=23.822280883789062, test_abs_avg=23.822629928588867
liger_forward grad[40] vs paper_forward: mean_abs=0.26095524430274963, max_abs=1.75, mean_rel=0.10440540313720703, max_rel=874.9999389648438, norm_rel=0.011456874199211597, ref_abs_avg=23.222286224365234, test_abs_avg=23.225200653076172
liger_forward grad[41] vs paper_forward: mean_abs=0.2133646011352539, max_abs=0.75, mean_rel=0.033525481820106506, max_rel=0.9557846188545227, norm_rel=0.011504155583679676, ref_abs_avg=19.07509994506836, test_abs_avg=19.081411361694336
liger_forward grad[42] vs paper_forward: mean_abs=0.25419822335243225, max_abs=1.75, mean_rel=0.14113126695156097, max_rel=937.4999389648438, norm_rel=0.011403319425880909, ref_abs_avg=22.759305953979492, test_abs_avg=22.759437561035156
liger_forward grad[43] vs paper_forward: mean_abs=0.2452203780412674, max_abs=1.5, mean_rel=0.09540539234876633, max_rel=554.6875, norm_rel=0.011190001852810383, ref_abs_avg=22.3861083984375, test_abs_avg=22.385822296142578
liger_forward grad[44] vs paper_forward: mean_abs=0.19289088249206543, max_abs=0.6875, mean_rel=0.045555830001831055, max_rel=2.427865743637085, norm_rel=0.01113906130194664, ref_abs_avg=17.370952606201172, test_abs_avg=17.375179290771484
liger_forward grad[45] vs paper_forward: mean_abs=0.24063998460769653, max_abs=1.5, mean_rel=0.13805994391441345, max_rel=999.9999389648438, norm_rel=0.011226058006286621, ref_abs_avg=21.88553237915039, test_abs_avg=21.88558578491211
liger_forward grad[46] vs paper_forward: mean_abs=0.23405541479587555, max_abs=1.5, mean_rel=0.09749211370944977, max_rel=781.2499389648438, norm_rel=0.011015874333679676, ref_abs_avg=21.696189880371094, test_abs_avg=21.695350646972656
liger_forward grad[47] vs paper_forward: mean_abs=0.18409422039985657, max_abs=0.78125, mean_rel=0.03343389555811882, max_rel=1.2433773279190063, norm_rel=0.011105437763035297, ref_abs_avg=16.739242553710938, test_abs_avg=16.73394012451172
liger_forward grad[48] vs paper_forward: mean_abs=0.2290748655796051, max_abs=1.625, mean_rel=0.14056634902954102, max_rel=937.4999389648438, norm_rel=0.011141409166157246, ref_abs_avg=21.01951789855957, test_abs_avg=21.0199031829834
liger_forward grad[49] vs paper_forward: mean_abs=0.22250314056873322, max_abs=1.5, mean_rel=0.08687359094619751, max_rel=874.9999389648438, norm_rel=0.0109290461987257, ref_abs_avg=20.804094314575195, test_abs_avg=20.80472183227539
liger_forward grad[50] vs paper_forward: mean_abs=0.18821144104003906, max_abs=0.8125, mean_rel=0.04278096556663513, max_rel=2.8462417125701904, norm_rel=0.010072242468595505, ref_abs_avg=19.293697357177734, test_abs_avg=19.30657196044922
liger_forward grad[51] vs paper_forward: mean_abs=0.26065170764923096, max_abs=2.0, mean_rel=0.14220955967903137, max_rel=874.9999389648438, norm_rel=0.011628331616520882, ref_abs_avg=22.85861587524414, test_abs_avg=22.858545303344727
liger_forward grad[52] vs paper_forward: mean_abs=0.2505020797252655, max_abs=1.8125, mean_rel=0.10353630781173706, max_rel=718.7499389648438, norm_rel=0.011453536339104176, ref_abs_avg=22.292659759521484, test_abs_avg=22.294639587402344
liger_forward grad[53] vs paper_forward: mean_abs=0.18975399434566498, max_abs=1.0, mean_rel=0.046773046255111694, max_rel=2.149282693862915, norm_rel=0.011180967092514038, ref_abs_avg=17.43967628479004, test_abs_avg=17.423919677734375
liger_forward grad[54] vs paper_forward: mean_abs=0.23611974716186523, max_abs=1.5, mean_rel=0.1346738040447235, max_rel=718.7499389648438, norm_rel=0.01145152747631073, ref_abs_avg=21.034650802612305, test_abs_avg=21.036115646362305
liger_forward grad[55] vs paper_forward: mean_abs=0.22781917452812195, max_abs=1.6875, mean_rel=0.09269370138645172, max_rel=843.7499389648438, norm_rel=0.011125599965453148, ref_abs_avg=20.858680725097656, test_abs_avg=20.860916137695312
liger_forward grad[56] vs paper_forward: mean_abs=0.17722797393798828, max_abs=0.6875, mean_rel=0.032789669930934906, max_rel=1.1744835376739502, norm_rel=0.010804105550050735, ref_abs_avg=16.8723087310791, test_abs_avg=16.875682830810547
liger_forward grad[57] vs paper_forward: mean_abs=0.21944937109947205, max_abs=1.5625, mean_rel=0.1303943693637848, max_rel=999.9999389648438, norm_rel=0.011211934499442577, ref_abs_avg=19.974973678588867, test_abs_avg=19.9759521484375
liger_forward grad[58] vs paper_forward: mean_abs=0.2120630443096161, max_abs=1.5, mean_rel=0.09445507824420929, max_rel=726.5624389648438, norm_rel=0.011046936735510826, ref_abs_avg=19.60144805908203, test_abs_avg=19.603302001953125
liger_forward grad[59] vs paper_forward: mean_abs=0.1772773563861847, max_abs=0.875, mean_rel=0.031024087220430374, max_rel=1.058824896812439, norm_rel=0.011495506390929222, ref_abs_avg=15.873289108276367, test_abs_avg=15.889914512634277
liger_forward grad[60] vs paper_forward: mean_abs=0.20559990406036377, max_abs=1.5, mean_rel=0.12113773822784424, max_rel=781.2499389648438, norm_rel=0.010953783988952637, ref_abs_avg=19.159875869750977, test_abs_avg=19.160438537597656
liger_forward grad[61] vs paper_forward: mean_abs=0.19935140013694763, max_abs=1.25, mean_rel=0.08784932643175125, max_rel=718.7499389648438, norm_rel=0.010789008811116219, ref_abs_avg=18.835948944091797, test_abs_avg=18.834156036376953
liger_forward grad[62] vs paper_forward: mean_abs=0.15347576141357422, max_abs=0.625, mean_rel=0.07602083683013916, max_rel=9.4506196975708, norm_rel=0.010701145976781845, ref_abs_avg=14.476025581359863, test_abs_avg=14.483244895935059
liger_forward grad[63] vs paper_forward: mean_abs=0.1909661442041397, max_abs=1.5, mean_rel=0.1101379469037056, max_rel=656.2499389648438, norm_rel=0.010779596865177155, ref_abs_avg=18.0826358795166, test_abs_avg=18.08257484436035
liger_forward grad[64] vs paper_forward: mean_abs=0.1869625449180603, max_abs=1.5, mean_rel=0.09277801960706711, max_rel=468.7499694824219, norm_rel=0.010605810210108757, ref_abs_avg=17.942668914794922, test_abs_avg=17.941438674926758
liger_forward grad[65] vs paper_forward: mean_abs=0.13817787170410156, max_abs=0.515625, mean_rel=0.051747795194387436, max_rel=5.170541286468506, norm_rel=0.009810369461774826, ref_abs_avg=14.372766494750977, test_abs_avg=14.369983673095703
liger_forward grad[66] vs paper_forward: mean_abs=0.18193726241588593, max_abs=1.5, mean_rel=0.11221146583557129, max_rel=687.4999389648438, norm_rel=0.010531565174460411, ref_abs_avg=17.640853881835938, test_abs_avg=17.64081573486328
liger_forward grad[67] vs paper_forward: mean_abs=0.17715807259082794, max_abs=1.375, mean_rel=0.0819428414106369, max_rel=562.5, norm_rel=0.010454259812831879, ref_abs_avg=17.30759048461914, test_abs_avg=17.307483673095703
liger_forward grad[68] vs paper_forward: mean_abs=0.13723421096801758, max_abs=0.75, mean_rel=0.03066437505185604, max_rel=1.5721226930618286, norm_rel=0.01053204108029604, ref_abs_avg=13.908746719360352, test_abs_avg=13.907468795776367
liger_forward grad[69] vs paper_forward: mean_abs=0.171299010515213, max_abs=1.25, mean_rel=0.10466142743825912, max_rel=718.7499389648438, norm_rel=0.01029494870454073, ref_abs_avg=17.0279598236084, test_abs_avg=17.02815818786621
liger_forward grad[70] vs paper_forward: mean_abs=0.16775082051753998, max_abs=1.5, mean_rel=0.07930788397789001, max_rel=531.25, norm_rel=0.010093126446008682, ref_abs_avg=16.956562042236328, test_abs_avg=16.958189010620117
liger_forward grad[71] vs paper_forward: mean_abs=0.13977718353271484, max_abs=0.5, mean_rel=0.03326113149523735, max_rel=1.6224604845046997, norm_rel=0.009449874050915241, ref_abs_avg=14.888690948486328, test_abs_avg=14.893996238708496
liger_forward grad[72] vs paper_forward: mean_abs=0.16583684086799622, max_abs=1.25, mean_rel=0.10532047599554062, max_rel=749.9999389648438, norm_rel=0.010140075348317623, ref_abs_avg=16.728961944580078, test_abs_avg=16.729259490966797
liger_forward grad[73] vs paper_forward: mean_abs=0.15866434574127197, max_abs=1.25, mean_rel=0.08495400846004486, max_rel=499.9999694824219, norm_rel=0.009916041046380997, ref_abs_avg=16.39342498779297, test_abs_avg=16.391206741333008
liger_forward grad[74] vs paper_forward: mean_abs=0.14657020568847656, max_abs=0.6875, mean_rel=0.03626884147524834, max_rel=1.8054817914962769, norm_rel=0.011430490761995316, ref_abs_avg=13.307266235351562, test_abs_avg=13.302018165588379
liger_forward grad[75] vs paper_forward: mean_abs=0.17966414988040924, max_abs=1.375, mean_rel=0.11431068181991577, max_rel=937.4999389648438, norm_rel=0.011391761712729931, ref_abs_avg=16.098392486572266, test_abs_avg=16.098535537719727
liger_forward grad[76] vs paper_forward: mean_abs=0.17618635296821594, max_abs=1.5, mean_rel=0.09010006487369537, max_rel=527.34375, norm_rel=0.01105670165270567, ref_abs_avg=16.235929489135742, test_abs_avg=16.234024047851562
liger_forward grad[77] vs paper_forward: mean_abs=0.1305985450744629, max_abs=0.625, mean_rel=0.03712262958288193, max_rel=1.4890397787094116, norm_rel=0.010270418599247932, ref_abs_avg=12.923069953918457, test_abs_avg=12.922069549560547
liger_forward grad[78] vs paper_forward: mean_abs=0.16475814580917358, max_abs=1.5, mean_rel=0.09412902593612671, max_rel=437.4999694824219, norm_rel=0.011024311184883118, ref_abs_avg=15.245328903198242, test_abs_avg=15.245157241821289
liger_forward grad[79] vs paper_forward: mean_abs=0.16034090518951416, max_abs=1.5, mean_rel=0.08126077055931091, max_rel=679.6874389648438, norm_rel=0.010952651500701904, ref_abs_avg=14.929855346679688, test_abs_avg=14.93142318725586
liger_forward grad[80] vs paper_forward: mean_abs=0.1245788037776947, max_abs=0.6875, mean_rel=0.08281173557043076, max_rel=11.92117977142334, norm_rel=0.010682350024580956, ref_abs_avg=12.319372177124023, test_abs_avg=12.322736740112305
liger_forward grad[81] vs paper_forward: mean_abs=0.15627892315387726, max_abs=1.5, mean_rel=0.10554172098636627, max_rel=625.0, norm_rel=0.01084146648645401, ref_abs_avg=14.7099609375, test_abs_avg=14.71023178100586
liger_forward grad[82] vs paper_forward: mean_abs=0.1522962749004364, max_abs=1.34375, mean_rel=0.08087264746427536, max_rel=921.8749389648438, norm_rel=0.01078172866255045, ref_abs_avg=14.469204902648926, test_abs_avg=14.469598770141602
liger_forward grad[83] vs paper_forward: mean_abs=0.11408543586730957, max_abs=0.5, mean_rel=0.03613375127315521, max_rel=1.9243215322494507, norm_rel=0.01031915657222271, ref_abs_avg=11.722786903381348, test_abs_avg=11.708070755004883
liger_forward grad[84] vs paper_forward: mean_abs=0.14684166014194489, max_abs=1.25, mean_rel=0.10316084325313568, max_rel=812.4999389648438, norm_rel=0.01056126318871975, ref_abs_avg=14.227313995361328, test_abs_avg=14.2269287109375
liger_forward grad[85] vs paper_forward: mean_abs=0.14060759544372559, max_abs=1.25, mean_rel=0.08393800258636475, max_rel=484.3749694824219, norm_rel=0.01062087807804346, ref_abs_avg=13.575211524963379, test_abs_avg=13.575268745422363
liger_forward grad[86] vs paper_forward: mean_abs=0.11251389980316162, max_abs=0.375, mean_rel=0.06877239048480988, max_rel=8.099942207336426, norm_rel=0.01041314285248518, ref_abs_avg=10.758685111999512, test_abs_avg=10.761014938354492
liger_forward grad[87] vs paper_forward: mean_abs=0.13561159372329712, max_abs=1.5, mean_rel=0.10955987125635147, max_rel=640.625, norm_rel=0.010413329117000103, ref_abs_avg=13.37537956237793, test_abs_avg=13.375846862792969
liger_forward grad[88] vs paper_forward: mean_abs=0.13164004683494568, max_abs=1.25, mean_rel=0.0769137293100357, max_rel=359.3749694824219, norm_rel=0.010070040822029114, ref_abs_avg=13.361084938049316, test_abs_avg=13.358186721801758
liger_forward grad[89] vs paper_forward: mean_abs=0.10343825817108154, max_abs=0.40625, mean_rel=0.08382600545883179, max_rel=24.921911239624023, norm_rel=0.009066377766430378, ref_abs_avg=11.682544708251953, test_abs_avg=11.6861572265625
liger_forward grad[90] vs paper_forward: mean_abs=0.12898661196231842, max_abs=1.25, mean_rel=0.09381866455078125, max_rel=785.1561889648438, norm_rel=0.01000142376869917, ref_abs_avg=13.271541595458984, test_abs_avg=13.271718978881836
liger_forward grad[91] vs paper_forward: mean_abs=0.12525765597820282, max_abs=1.125, mean_rel=0.06979386508464813, max_rel=453.1249694824219, norm_rel=0.00991375558078289, ref_abs_avg=13.092889785766602, test_abs_avg=13.096902847290039
liger_forward grad[92] vs paper_forward: mean_abs=0.09998834133148193, max_abs=0.5, mean_rel=0.04065484553575516, max_rel=4.706034183502197, norm_rel=0.009863447397947311, ref_abs_avg=10.735759735107422, test_abs_avg=10.737569808959961
liger_forward grad[93] vs paper_forward: mean_abs=0.11876900494098663, max_abs=1.75, mean_rel=0.08924213796854019, max_rel=718.7499389648438, norm_rel=0.009722206741571426, ref_abs_avg=12.66917610168457, test_abs_avg=12.668953895568848
liger_forward grad[94] vs paper_forward: mean_abs=0.11742924898862839, max_abs=1.3125, mean_rel=0.06391569972038269, max_rel=308.59375, norm_rel=0.009766132570803165, ref_abs_avg=12.527870178222656, test_abs_avg=12.527554512023926
liger_forward grad[95] vs paper_forward: mean_abs=0.09454965591430664, max_abs=0.3125, mean_rel=0.035156868398189545, max_rel=3.734102249145508, norm_rel=0.008831310085952282, ref_abs_avg=10.972978591918945, test_abs_avg=10.973855972290039
liger_forward grad[96] vs paper_forward: mean_abs=0.11619603633880615, max_abs=1.5, mean_rel=0.08208882808685303, max_rel=562.5, norm_rel=0.009632146917283535, ref_abs_avg=12.605910301208496, test_abs_avg=12.606014251708984
liger_forward grad[97] vs paper_forward: mean_abs=0.1099359393119812, max_abs=1.25, mean_rel=0.06057432293891907, max_rel=437.4999694824219, norm_rel=0.009427709504961967, ref_abs_avg=12.18551254272461, test_abs_avg=12.187074661254883
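The per-gradient lines above all report the same seven statistics. As a rough sketch of how they could be computed (the function name `grad_diff_stats` and the exact eps handling are assumptions, not the actual comparison code from this harness):

```python
import torch

def grad_diff_stats(ref: torch.Tensor, test: torch.Tensor, eps: float = 1e-12) -> dict:
    """Hypothetical reconstruction of the per-gradient metrics in the log:
    absolute error stats, elementwise relative error stats, a global
    norm-based relative error, and the average magnitude of each tensor."""
    diff = (ref - test).abs()
    # elementwise relative error; clamp avoids division by exact zeros
    rel = diff / ref.abs().clamp_min(eps)
    return {
        "mean_abs": diff.mean().item(),
        "max_abs": diff.max().item(),
        "mean_rel": rel.mean().item(),
        "max_rel": rel.max().item(),
        # global relative error: ||ref - test|| / ||ref||
        "norm_rel": ((ref - test).norm() / ref.norm().clamp_min(eps)).item(),
        "ref_abs_avg": ref.abs().mean().item(),
        "test_abs_avg": test.abs().mean().item(),
    }
```

Note the pattern visible in the log itself: `max_rel` can be huge (hundreds to thousands) on elements where the reference gradient is near zero, while `norm_rel` stays around 0.01, so the norm-based metric is the more meaningful agreement signal here.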

identity layers + randn queries
production_forward fwd+bwd:  57.855 ms
production_forward bwd-only: 48.924 ms
production_forward peak allocated: fwd=1.300 GiB, fwd+bwd=5.302 GiB
production_forward peak reserved:  fwd=2.428 GiB, fwd+bwd=5.428 GiB
liger_forward fwd+bwd:  168.042 ms
liger_forward bwd-only: 145.935 ms
liger_forward peak allocated: fwd=7.853 GiB, fwd+bwd=7.853 GiB
liger_forward peak reserved:  fwd=7.904 GiB, fwd+bwd=8.217 GiB
torch_compile_phases_forward fwd+bwd:  84.147 ms
torch_compile_phases_forward bwd-only: 67.166 ms
torch_compile_phases_forward peak allocated: fwd=6.596 GiB, fwd+bwd=6.909 GiB
torch_compile_phases_forward peak reserved:  fwd=6.943 GiB, fwd+bwd=9.070 GiB
pytorch_attn_res_forward fwd+bwd:  992.666 ms
pytorch_attn_res_forward bwd-only: 817.661 ms
pytorch_attn_res_forward peak allocated: fwd=43.917 GiB, fwd+bwd=45.039 GiB
pytorch_attn_res_forward peak reserved:  fwd=45.406 GiB, fwd+bwd=46.658 GiB
paper_forward fwd+bwd:  194.382 ms
paper_forward bwd-only: 154.020 ms
paper_forward peak allocated: fwd=15.056 GiB, fwd+bwd=16.116 GiB
paper_forward peak reserved:  fwd=15.104 GiB, fwd+bwd=16.354 GiB
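The timing and peak-memory lines above could be produced by a loop like the following (a sketch under assumptions: the helper name `bench_fwd_bwd` and the scalar-sum loss are illustrative, not the actual harness; GiB figures come from `torch.cuda.max_memory_allocated` / `max_memory_reserved`):

```python
import time
import torch

def bench_fwd_bwd(fn, inputs, iters: int = 10):
    """Sketch of a fwd+bwd benchmark: average ms per iteration plus
    CUDA peak allocated/reserved memory in GiB (zeros on CPU)."""
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.reset_peak_memory_stats()
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        out = fn(*inputs)
        # reduce to a scalar so backward() can run
        loss = out.float().sum()
        loss.backward()
    if use_cuda:
        torch.cuda.synchronize()
    elapsed_ms = (time.perf_counter() - start) * 1e3 / iters
    peak_alloc_gib = torch.cuda.max_memory_allocated() / 2**30 if use_cuda else 0.0
    peak_resv_gib = torch.cuda.max_memory_reserved() / 2**30 if use_cuda else 0.0
    return elapsed_ms, peak_alloc_gib, peak_resv_gib
```

A "bwd-only" figure like those above would presumably be derived by also timing the forward pass alone and subtracting it from the fwd+bwd total.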

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.001613008906133473, max_abs=0.03125
production_forward grad[0] vs paper_forward: mean_abs=0.008189539425075054, max_abs=0.296875, mean_rel=0.07170166075229645, max_rel=122.70584869384766, norm_rel=0.01968325488269329, ref_abs_avg=0.45215165615081787, test_abs_avg=0.4521888494491577
production_forward grad[1] vs paper_forward: mean_abs=5.052803039550781, max_abs=48.0, mean_rel=0.15489374101161957, max_rel=211.27796936035156, norm_rel=0.020439887419342995, ref_abs_avg=219.76283264160156, test_abs_avg=219.7893524169922
production_forward grad[2] vs paper_forward: mean_abs=0.8835868835449219, max_abs=3.5625, mean_rel=0.15889255702495575, max_rel=15.498384475708008, norm_rel=0.0234852135181427, ref_abs_avg=36.81531524658203, test_abs_avg=36.82783508300781
production_forward grad[3] vs paper_forward: mean_abs=1.051203966140747, max_abs=8.0, mean_rel=0.5097329616546631, max_rel=4062.499755859375, norm_rel=0.023142144083976746, ref_abs_avg=45.72734069824219, test_abs_avg=45.73485565185547
production_forward grad[4] vs paper_forward: mean_abs=1.0182623863220215, max_abs=6.5, mean_rel=0.3445832133293152, max_rel=3687.499755859375, norm_rel=0.022730017080903053, ref_abs_avg=45.06587219238281, test_abs_avg=45.07117462158203
production_forward grad[5] vs paper_forward: mean_abs=0.7838284969329834, max_abs=2.75, mean_rel=0.11794203519821167, max_rel=10.453630447387695, norm_rel=0.024573786184191704, ref_abs_avg=31.4451847076416, test_abs_avg=31.472679138183594
production_forward grad[6] vs paper_forward: mean_abs=0.9229426383972168, max_abs=6.25, mean_rel=0.42268332839012146, max_rel=3812.499755859375, norm_rel=0.022840557619929314, ref_abs_avg=40.6658935546875, test_abs_avg=40.668983459472656
production_forward grad[7] vs paper_forward: mean_abs=0.8985924124717712, max_abs=5.5, mean_rel=0.2915041446685791, max_rel=2968.749755859375, norm_rel=0.022512752562761307, ref_abs_avg=40.16265106201172, test_abs_avg=40.16539764404297
production_forward grad[8] vs paper_forward: mean_abs=0.6813516616821289, max_abs=3.25, mean_rel=0.08630362153053284, max_rel=4.361237525939941, norm_rel=0.022957079112529755, ref_abs_avg=29.935989379882812, test_abs_avg=29.945011138916016
production_forward grad[9] vs paper_forward: mean_abs=0.8374464511871338, max_abs=5.75, mean_rel=0.403420090675354, max_rel=3124.999755859375, norm_rel=0.022671138867735863, ref_abs_avg=37.126014709472656, test_abs_avg=37.13442611694336
production_forward grad[10] vs paper_forward: mean_abs=0.8187312483787537, max_abs=5.0, mean_rel=0.246662437915802, max_rel=2500.0, norm_rel=0.02245226316154003, ref_abs_avg=36.651641845703125, test_abs_avg=36.64887619018555
production_forward grad[11] vs paper_forward: mean_abs=0.6357238292694092, max_abs=2.125, mean_rel=0.09060713648796082, max_rel=7.1500749588012695, norm_rel=0.02198091335594654, ref_abs_avg=28.56995391845703, test_abs_avg=28.558937072753906
production_forward grad[12] vs paper_forward: mean_abs=0.7700285315513611, max_abs=5.5, mean_rel=0.4122500419616699, max_rel=3062.499755859375, norm_rel=0.022372275590896606, ref_abs_avg=34.624847412109375, test_abs_avg=34.6295280456543
production_forward grad[13] vs paper_forward: mean_abs=0.7527225613594055, max_abs=4.875, mean_rel=0.23212553560733795, max_rel=1906.2498779296875, norm_rel=0.02229015715420246, ref_abs_avg=33.97350311279297, test_abs_avg=33.98246383666992
production_forward grad[14] vs paper_forward: mean_abs=0.5713958740234375, max_abs=3.0, mean_rel=0.0853842943906784, max_rel=7.188145160675049, norm_rel=0.021858014166355133, ref_abs_avg=26.997406005859375, test_abs_avg=26.970848083496094
production_forward grad[15] vs paper_forward: mean_abs=0.7217075824737549, max_abs=4.5, mean_rel=0.36425644159317017, max_rel=2843.749755859375, norm_rel=0.022273758426308632, ref_abs_avg=32.597877502441406, test_abs_avg=32.601715087890625
production_forward grad[16] vs paper_forward: mean_abs=0.7065678238868713, max_abs=4.5, mean_rel=0.24671262502670288, max_rel=2187.5, norm_rel=0.022077932953834534, ref_abs_avg=32.10940933227539, test_abs_avg=32.113182067871094
production_forward grad[17] vs paper_forward: mean_abs=0.5527763366699219, max_abs=2.515625, mean_rel=0.1303633451461792, max_rel=16.588348388671875, norm_rel=0.023853885009884834, ref_abs_avg=23.074634552001953, test_abs_avg=23.080032348632812
production_forward grad[18] vs paper_forward: mean_abs=0.6783912777900696, max_abs=4.25, mean_rel=0.31892651319503784, max_rel=3624.999755859375, norm_rel=0.022225074470043182, ref_abs_avg=30.705598831176758, test_abs_avg=30.71027374267578
production_forward grad[19] vs paper_forward: mean_abs=0.6658324003219604, max_abs=4.078125, mean_rel=0.20036503672599792, max_rel=2187.5, norm_rel=0.022023124620318413, ref_abs_avg=30.367618560791016, test_abs_avg=30.37188148498535
production_forward grad[20] vs paper_forward: mean_abs=0.5250397324562073, max_abs=2.0, mean_rel=0.3271685540676117, max_rel=116.02200317382812, norm_rel=0.020740460604429245, ref_abs_avg=25.196876525878906, test_abs_avg=25.171844482421875
production_forward grad[21] vs paper_forward: mean_abs=0.6431406736373901, max_abs=4.0, mean_rel=0.3403722643852234, max_rel=2375.0, norm_rel=0.02201908640563488, ref_abs_avg=29.357746124267578, test_abs_avg=29.359773635864258
production_forward grad[22] vs paper_forward: mean_abs=0.6300106048583984, max_abs=4.0, mean_rel=0.20394748449325562, max_rel=1828.1248779296875, norm_rel=0.02179350145161152, ref_abs_avg=29.025928497314453, test_abs_avg=29.031476974487305
production_forward grad[23] vs paper_forward: mean_abs=0.5056028366088867, max_abs=1.91943359375, mean_rel=0.10781625658273697, max_rel=5.922231674194336, norm_rel=0.02280445024371147, ref_abs_avg=22.348602294921875, test_abs_avg=22.368297576904297
production_forward grad[24] vs paper_forward: mean_abs=0.6113514304161072, max_abs=4.0, mean_rel=0.326471745967865, max_rel=2812.499755859375, norm_rel=0.021866118535399437, ref_abs_avg=28.09967803955078, test_abs_avg=28.104347229003906
production_forward grad[25] vs paper_forward: mean_abs=0.5980910062789917, max_abs=3.90625, mean_rel=0.18403804302215576, max_rel=1835.9373779296875, norm_rel=0.021637853235006332, ref_abs_avg=27.73802375793457, test_abs_avg=27.74295425415039
production_forward grad[26] vs paper_forward: mean_abs=0.535435676574707, max_abs=2.5, mean_rel=0.09598422050476074, max_rel=10.209741592407227, norm_rel=0.022773100063204765, ref_abs_avg=23.98821258544922, test_abs_avg=24.00550651550293
production_forward grad[27] vs paper_forward: mean_abs=0.7036082744598389, max_abs=4.75, mean_rel=0.3154544532299042, max_rel=2375.0, norm_rel=0.02395617589354515, ref_abs_avg=29.507240295410156, test_abs_avg=29.51278305053711
production_forward grad[28] vs paper_forward: mean_abs=0.6907514333724976, max_abs=5.5, mean_rel=0.23544156551361084, max_rel=2062.5, norm_rel=0.023835379630327225, ref_abs_avg=29.135578155517578, test_abs_avg=29.14533233642578
production_forward grad[29] vs paper_forward: mean_abs=0.5373849868774414, max_abs=2.375, mean_rel=0.08569655567407608, max_rel=3.6484317779541016, norm_rel=0.023967750370502472, ref_abs_avg=22.71471405029297, test_abs_avg=22.717144012451172
production_forward grad[30] vs paper_forward: mean_abs=0.6518963575363159, max_abs=4.5625, mean_rel=0.3180224299430847, max_rel=3031.249755859375, norm_rel=0.02426340989768505, ref_abs_avg=26.98845672607422, test_abs_avg=26.991046905517578
production_forward grad[31] vs paper_forward: mean_abs=0.6427279114723206, max_abs=4.125, mean_rel=0.22440940141677856, max_rel=1828.1248779296875, norm_rel=0.02418694831430912, ref_abs_avg=26.684879302978516, test_abs_avg=26.689266204833984
production_forward grad[32] vs paper_forward: mean_abs=0.5222187042236328, max_abs=1.875, mean_rel=0.1314404159784317, max_rel=8.862496376037598, norm_rel=0.024657083675265312, ref_abs_avg=20.62885284423828, test_abs_avg=20.601593017578125
production_forward grad[33] vs paper_forward: mean_abs=0.6077059507369995, max_abs=3.75, mean_rel=0.3042398691177368, max_rel=2187.5, norm_rel=0.024161696434020996, ref_abs_avg=25.24026107788086, test_abs_avg=25.24444580078125
production_forward grad[34] vs paper_forward: mean_abs=0.5961793065071106, max_abs=3.75, mean_rel=0.23419412970542908, max_rel=2406.25, norm_rel=0.02391962520778179, ref_abs_avg=25.046863555908203, test_abs_avg=25.054012298583984
production_forward grad[35] vs paper_forward: mean_abs=0.4474067687988281, max_abs=2.0, mean_rel=0.11644724011421204, max_rel=17.071096420288086, norm_rel=0.022137384861707687, ref_abs_avg=20.51504898071289, test_abs_avg=20.569671630859375
production_forward grad[36] vs paper_forward: mean_abs=0.571506917476654, max_abs=4.0, mean_rel=0.31228604912757874, max_rel=2250.0, norm_rel=0.02396899089217186, ref_abs_avg=23.928983688354492, test_abs_avg=23.933856964111328
production_forward grad[37] vs paper_forward: mean_abs=0.5593961477279663, max_abs=3.3984375, mean_rel=0.21172212064266205, max_rel=1937.4998779296875, norm_rel=0.023500027135014534, ref_abs_avg=23.87112808227539, test_abs_avg=23.87458038330078
production_forward grad[38] vs paper_forward: mean_abs=0.42716312408447266, max_abs=1.625, mean_rel=0.17893168330192566, max_rel=16.719335556030273, norm_rel=0.02265433594584465, ref_abs_avg=18.77190399169922, test_abs_avg=18.81562042236328
production_forward grad[39] vs paper_forward: mean_abs=0.5394347906112671, max_abs=3.5625, mean_rel=0.305110365152359, max_rel=2375.0, norm_rel=0.023663828149437904, ref_abs_avg=22.881351470947266, test_abs_avg=22.883617401123047
production_forward grad[40] vs paper_forward: mean_abs=0.5303590297698975, max_abs=3.25, mean_rel=0.19742687046527863, max_rel=1210.9375, norm_rel=0.023597972467541695, ref_abs_avg=22.553905487060547, test_abs_avg=22.55375099182129
production_forward grad[41] vs paper_forward: mean_abs=0.4102509021759033, max_abs=1.5, mean_rel=0.11697344481945038, max_rel=5.578297138214111, norm_rel=0.023282596841454506, ref_abs_avg=17.675657272338867, test_abs_avg=17.671566009521484
production_forward grad[42] vs paper_forward: mean_abs=0.5128207206726074, max_abs=3.5, mean_rel=0.26276206970214844, max_rel=1687.4998779296875, norm_rel=0.023279869928956032, ref_abs_avg=22.101913452148438, test_abs_avg=22.105754852294922
production_forward grad[43] vs paper_forward: mean_abs=0.5014892816543579, max_abs=3.34375, mean_rel=0.21464474499225616, max_rel=1640.6248779296875, norm_rel=0.02297237142920494, ref_abs_avg=21.87014389038086, test_abs_avg=21.871538162231445
production_forward grad[44] vs paper_forward: mean_abs=0.39441609382629395, max_abs=1.75, mean_rel=0.16798308491706848, max_rel=45.32048797607422, norm_rel=0.02297486551105976, ref_abs_avg=17.338542938232422, test_abs_avg=17.332378387451172
production_forward grad[45] vs paper_forward: mean_abs=0.4859774112701416, max_abs=3.5, mean_rel=0.30647513270378113, max_rel=1749.9998779296875, norm_rel=0.02330099605023861, ref_abs_avg=20.903797149658203, test_abs_avg=20.907543182373047
production_forward grad[46] vs paper_forward: mean_abs=0.4815678000450134, max_abs=3.25, mean_rel=0.20884555578231812, max_rel=1374.9998779296875, norm_rel=0.02349759265780449, ref_abs_avg=20.573020935058594, test_abs_avg=20.5760555267334
production_forward grad[47] vs paper_forward: mean_abs=0.3617624044418335, max_abs=1.34375, mean_rel=0.1647670567035675, max_rel=15.945206642150879, norm_rel=0.021571138873696327, ref_abs_avg=16.573673248291016, test_abs_avg=16.57705307006836
production_forward grad[48] vs paper_forward: mean_abs=0.46814024448394775, max_abs=3.0, mean_rel=0.28324541449546814, max_rel=1781.2498779296875, norm_rel=0.022938307374715805, ref_abs_avg=20.424524307250977, test_abs_avg=20.427711486816406
production_forward grad[49] vs paper_forward: mean_abs=0.45816147327423096, max_abs=2.90625, mean_rel=0.18919363617897034, max_rel=1531.2498779296875, norm_rel=0.02302403189241886, ref_abs_avg=19.992259979248047, test_abs_avg=19.992698669433594
production_forward grad[50] vs paper_forward: mean_abs=0.42344969511032104, max_abs=1.75, mean_rel=0.08780350536108017, max_rel=2.698878049850464, norm_rel=0.023351894691586494, ref_abs_avg=18.229686737060547, test_abs_avg=18.275232315063477
production_forward grad[51] vs paper_forward: mean_abs=0.5325555801391602, max_abs=4.5, mean_rel=0.29340213537216187, max_rel=2125.0, norm_rel=0.02435528114438057, ref_abs_avg=21.937835693359375, test_abs_avg=21.9392032623291
production_forward grad[52] vs paper_forward: mean_abs=0.5228139162063599, max_abs=3.470703125, mean_rel=0.22314903140068054, max_rel=1406.2498779296875, norm_rel=0.024618621915578842, ref_abs_avg=21.294940948486328, test_abs_avg=21.299169540405273
production_forward grad[53] vs paper_forward: mean_abs=0.3907017707824707, max_abs=1.4375, mean_rel=0.09579619765281677, max_rel=5.951128005981445, norm_rel=0.023619258776307106, ref_abs_avg=16.53561782836914, test_abs_avg=16.5318603515625
production_forward grad[54] vs paper_forward: mean_abs=0.4849616289138794, max_abs=3.5, mean_rel=0.307672381401062, max_rel=1843.7498779296875, norm_rel=0.02379573881626129, ref_abs_avg=20.411123275756836, test_abs_avg=20.412830352783203
production_forward grad[55] vs paper_forward: mean_abs=0.47726351022720337, max_abs=3.5, mean_rel=0.189573734998703, max_rel=1218.75, norm_rel=0.02374638058245182, ref_abs_avg=20.121692657470703, test_abs_avg=20.119823455810547
production_forward grad[56] vs paper_forward: mean_abs=0.3965872526168823, max_abs=1.8125, mean_rel=0.11169908940792084, max_rel=12.660346031188965, norm_rel=0.02551070787012577, ref_abs_avg=15.880451202392578, test_abs_avg=15.903633117675781
production_forward grad[57] vs paper_forward: mean_abs=0.45215505361557007, max_abs=3.0390625, mean_rel=0.27540603280067444, max_rel=1828.1248779296875, norm_rel=0.023400021716952324, ref_abs_avg=19.357479095458984, test_abs_avg=19.358997344970703
production_forward grad[58] vs paper_forward: mean_abs=0.4426085352897644, max_abs=3.1875, mean_rel=0.22247330844402313, max_rel=1312.4998779296875, norm_rel=0.023414794355630875, ref_abs_avg=18.964744567871094, test_abs_avg=18.968111038208008
production_forward grad[59] vs paper_forward: mean_abs=0.34710943698883057, max_abs=1.4375, mean_rel=0.08967950940132141, max_rel=8.770166397094727, norm_rel=0.022976292297244072, ref_abs_avg=15.696249008178711, test_abs_avg=15.678422927856445
production_forward grad[60] vs paper_forward: mean_abs=0.4213852286338806, max_abs=2.796875, mean_rel=0.2670566439628601, max_rel=1531.2498779296875, norm_rel=0.023041801527142525, ref_abs_avg=18.306289672851562, test_abs_avg=18.307512283325195
production_forward grad[61] vs paper_forward: mean_abs=0.4134387969970703, max_abs=2.75, mean_rel=0.21065853536128998, max_rel=1499.9998779296875, norm_rel=0.022686675190925598, ref_abs_avg=18.220102310180664, test_abs_avg=18.221206665039062
production_forward grad[62] vs paper_forward: mean_abs=0.33240988850593567, max_abs=1.625, mean_rel=0.1398245096206665, max_rel=39.00375747680664, norm_rel=0.022389935329556465, ref_abs_avg=14.937843322753906, test_abs_avg=14.897598266601562
production_forward grad[63] vs paper_forward: mean_abs=0.40013813972473145, max_abs=3.8125, mean_rel=0.23326198756694794, max_rel=1812.4998779296875, norm_rel=0.022775808349251747, ref_abs_avg=17.58639144897461, test_abs_avg=17.587554931640625
production_forward grad[64] vs paper_forward: mean_abs=0.3917028605937958, max_abs=2.875, mean_rel=0.1691853255033493, max_rel=1250.0, norm_rel=0.02272726036608219, ref_abs_avg=17.27721405029297, test_abs_avg=17.282018661499023
production_forward grad[65] vs paper_forward: mean_abs=0.28968381881713867, max_abs=1.28125, mean_rel=0.08210879564285278, max_rel=11.01206111907959, norm_rel=0.0214137714356184, ref_abs_avg=14.078263282775879, test_abs_avg=14.077423095703125
production_forward grad[66] vs paper_forward: mean_abs=0.376334547996521, max_abs=3.0, mean_rel=0.2230772078037262, max_rel=1125.0, norm_rel=0.022311769425868988, ref_abs_avg=16.87433624267578, test_abs_avg=16.87557029724121
production_forward grad[67] vs paper_forward: mean_abs=0.3688699007034302, max_abs=3.75, mean_rel=0.18601679801940918, max_rel=1031.25, norm_rel=0.022266212850809097, ref_abs_avg=16.63825035095215, test_abs_avg=16.639896392822266
production_forward grad[68] vs paper_forward: mean_abs=0.29639673233032227, max_abs=1.25, mean_rel=0.14066842198371887, max_rel=18.20893096923828, norm_rel=0.021294139325618744, ref_abs_avg=14.038765907287598, test_abs_avg=14.040138244628906
production_forward grad[69] vs paper_forward: mean_abs=0.36458510160446167, max_abs=3.125, mean_rel=0.24403013288974762, max_rel=1531.2498779296875, norm_rel=0.021948015317320824, ref_abs_avg=16.6068058013916, test_abs_avg=16.607837677001953
production_forward grad[70] vs paper_forward: mean_abs=0.35293474793434143, max_abs=3.0, mean_rel=0.1632082462310791, max_rel=882.8124389648438, norm_rel=0.02187964878976345, ref_abs_avg=16.130495071411133, test_abs_avg=16.1298828125
production_forward grad[71] vs paper_forward: mean_abs=0.27646803855895996, max_abs=1.25, mean_rel=0.11127042770385742, max_rel=19.067956924438477, norm_rel=0.02049611136317253, ref_abs_avg=13.467937469482422, test_abs_avg=13.470550537109375
production_forward grad[72] vs paper_forward: mean_abs=0.34562569856643677, max_abs=2.5, mean_rel=0.23286697268486023, max_rel=1312.4998779296875, norm_rel=0.021713146939873695, ref_abs_avg=15.886011123657227, test_abs_avg=15.886649131774902
production_forward grad[73] vs paper_forward: mean_abs=0.33555838465690613, max_abs=3.25, mean_rel=0.15972930192947388, max_rel=999.9999389648438, norm_rel=0.021380580961704254, ref_abs_avg=15.698066711425781, test_abs_avg=15.696810722351074
production_forward grad[74] vs paper_forward: mean_abs=0.30694007873535156, max_abs=1.25, mean_rel=0.08761201798915863, max_rel=3.578306198120117, norm_rel=0.022372178733348846, ref_abs_avg=14.100811958312988, test_abs_avg=14.085565567016602
production_forward grad[75] vs paper_forward: mean_abs=0.380851149559021, max_abs=2.75, mean_rel=0.2515720725059509, max_rel=1374.9998779296875, norm_rel=0.02332450821995735, ref_abs_avg=16.368633270263672, test_abs_avg=16.36968994140625
production_forward grad[76] vs paper_forward: mean_abs=0.3788970112800598, max_abs=3.625, mean_rel=0.19044269621372223, max_rel=1414.0623779296875, norm_rel=0.023236963897943497, ref_abs_avg=16.27424430847168, test_abs_avg=16.283466339111328
production_forward grad[77] vs paper_forward: mean_abs=0.27355504035949707, max_abs=1.125, mean_rel=0.08036424219608307, max_rel=9.906886100769043, norm_rel=0.020307624712586403, ref_abs_avg=13.505280494689941, test_abs_avg=13.506446838378906
production_forward grad[78] vs paper_forward: mean_abs=0.3524198830127716, max_abs=2.625, mean_rel=0.2284984141588211, max_rel=1062.5, norm_rel=0.022593647241592407, ref_abs_avg=15.626630783081055, test_abs_avg=15.626389503479004
production_forward grad[79] vs paper_forward: mean_abs=0.34093213081359863, max_abs=2.75, mean_rel=0.1547621488571167, max_rel=914.0624389648438, norm_rel=0.022028552368283272, ref_abs_avg=15.478316307067871, test_abs_avg=15.48515796661377
production_forward grad[80] vs paper_forward: mean_abs=0.2751131057739258, max_abs=1.125, mean_rel=0.06325580179691315, max_rel=1.9733595848083496, norm_rel=0.02155054546892643, ref_abs_avg=12.814900398254395, test_abs_avg=12.813520431518555
production_forward grad[81] vs paper_forward: mean_abs=0.3263111114501953, max_abs=3.75, mean_rel=0.23153257369995117, max_rel=1781.2498779296875, norm_rel=0.0220468919724226, ref_abs_avg=14.831981658935547, test_abs_avg=14.831811904907227
production_forward grad[82] vs paper_forward: mean_abs=0.31935784220695496, max_abs=4.5, mean_rel=0.17346592247486115, max_rel=999.9999389648438, norm_rel=0.021860390901565552, ref_abs_avg=14.663915634155273, test_abs_avg=14.669600486755371
production_forward grad[83] vs paper_forward: mean_abs=0.2503626346588135, max_abs=1.09375, mean_rel=0.08746186643838882, max_rel=7.993175983428955, norm_rel=0.02098315954208374, ref_abs_avg=11.704788208007812, test_abs_avg=11.71341609954834
production_forward grad[84] vs paper_forward: mean_abs=0.3023774027824402, max_abs=2.6875, mean_rel=0.18616297841072083, max_rel=1781.2498779296875, norm_rel=0.021307626739144325, ref_abs_avg=14.239021301269531, test_abs_avg=14.237858772277832
production_forward grad[85] vs paper_forward: mean_abs=0.2961236834526062, max_abs=3.5, mean_rel=0.17062067985534668, max_rel=804.6874389648438, norm_rel=0.021459173411130905, ref_abs_avg=13.92523193359375, test_abs_avg=13.928701400756836
production_forward grad[86] vs paper_forward: mean_abs=0.24256521463394165, max_abs=1.0625, mean_rel=0.09518994390964508, max_rel=4.056853294372559, norm_rel=0.02355407178401947, ref_abs_avg=10.413944244384766, test_abs_avg=10.418254852294922
production_forward grad[87] vs paper_forward: mean_abs=0.2817533016204834, max_abs=2.75, mean_rel=0.1981532871723175, max_rel=1218.75, norm_rel=0.02092721313238144, ref_abs_avg=13.514683723449707, test_abs_avg=13.51449966430664
production_forward grad[88] vs paper_forward: mean_abs=0.27556294202804565, max_abs=3.125, mean_rel=0.14107748866081238, max_rel=1187.5, norm_rel=0.02113107033073902, ref_abs_avg=13.152386665344238, test_abs_avg=13.1520357131958
production_forward grad[89] vs paper_forward: mean_abs=0.2206832766532898, max_abs=0.919921875, mean_rel=0.14126597344875336, max_rel=23.9399356842041, norm_rel=0.021377572789788246, ref_abs_avg=10.370721817016602, test_abs_avg=10.372151374816895
production_forward grad[90] vs paper_forward: mean_abs=0.26815280318260193, max_abs=2.5, mean_rel=0.19386549293994904, max_rel=1062.5, norm_rel=0.020291768014431, ref_abs_avg=13.299307823181152, test_abs_avg=13.297822952270508
production_forward grad[91] vs paper_forward: mean_abs=0.2606692910194397, max_abs=3.25, mean_rel=0.15001636743545532, max_rel=906.2499389648438, norm_rel=0.020257847383618355, ref_abs_avg=13.005265235900879, test_abs_avg=13.005974769592285
production_forward grad[92] vs paper_forward: mean_abs=0.2054329514503479, max_abs=0.8984375, mean_rel=0.09894607961177826, max_rel=12.505753517150879, norm_rel=0.018925979733467102, ref_abs_avg=10.696525573730469, test_abs_avg=10.679920196533203
production_forward grad[93] vs paper_forward: mean_abs=0.25168949365615845, max_abs=2.5, mean_rel=0.18554627895355225, max_rel=1093.75, norm_rel=0.01991489715874195, ref_abs_avg=12.77432632446289, test_abs_avg=12.772873878479004
production_forward grad[94] vs paper_forward: mean_abs=0.24188044667243958, max_abs=3.75, mean_rel=0.1367185413837433, max_rel=695.3124389648438, norm_rel=0.019000528380274773, ref_abs_avg=12.800605773925781, test_abs_avg=12.79692554473877
production_forward grad[95] vs paper_forward: mean_abs=0.19383907318115234, max_abs=0.75, mean_rel=0.07333885133266449, max_rel=5.828317642211914, norm_rel=0.019046427682042122, ref_abs_avg=10.366214752197266, test_abs_avg=10.354594230651855
production_forward grad[96] vs paper_forward: mean_abs=0.23112152516841888, max_abs=2.5, mean_rel=0.17383868992328644, max_rel=1218.75, norm_rel=0.01934141293168068, ref_abs_avg=12.123010635375977, test_abs_avg=12.12219524383545
production_forward grad[97] vs paper_forward: mean_abs=0.23144622147083282, max_abs=3.25, mean_rel=0.1475067436695099, max_rel=718.7499389648438, norm_rel=0.019461872056126595, ref_abs_avg=12.038049697875977, test_abs_avg=12.036060333251953
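The per-tensor metric lines above can be reproduced with a sketch like the following. Only the metric names come from the log; the exact formulas (element-wise absolute/relative error, L2-norm ratio, mean absolute magnitudes) are assumed from standard error-metric conventions, and the `compare` helper is hypothetical, not the script that produced this output:

```python
import math

def compare(ref, test):
    """Assumed reconstruction of the comparison metrics printed in the log.

    ref, test: equal-length sequences of floats (flattened tensors).
    """
    diff = [abs(r - t) for r, t in zip(ref, test)]
    # Guard against division by zero in the relative error; the clamp
    # value is a guess, which would explain the very large max_rel
    # entries where the reference element is near zero.
    rel = [d / max(abs(r), 1e-30) for d, r in zip(diff, ref)]
    l2 = lambda xs: math.sqrt(sum(x * x for x in xs))
    return {
        "mean_abs": sum(diff) / len(diff),       # mean |ref - test|
        "max_abs": max(diff),                    # worst absolute error
        "mean_rel": sum(rel) / len(rel),         # mean element-wise relative error
        "max_rel": max(rel),                     # worst relative error
        "norm_rel": l2(diff) / l2(ref),          # ||ref - test|| / ||ref||
        "ref_abs_avg": sum(abs(r) for r in ref) / len(ref),
        "test_abs_avg": sum(abs(t) for t in test) / len(test),
    }
```

Note that `norm_rel` stays near 0.02 throughout even when `max_rel` spikes into the thousands, which is consistent with a few near-zero reference elements dominating the relative error while the overall tensors agree closely.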
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016177948564291, max_abs=0.04296875
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008535533212125301, max_abs=0.484375, mean_rel=0.07437631487846375, max_rel=74.09425354003906, norm_rel=0.02041635476052761, ref_abs_avg=0.45215165615081787, test_abs_avg=0.45218074321746826
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=5.2006378173828125, max_abs=40.0, mean_rel=0.15257948637008667, max_rel=161.68402099609375, norm_rel=0.02092604711651802, ref_abs_avg=219.76283264160156, test_abs_avg=219.79495239257812
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=0.9042301177978516, max_abs=3.5, mean_rel=0.12535104155540466, max_rel=11.27136516571045, norm_rel=0.02436264604330063, ref_abs_avg=36.81531524658203, test_abs_avg=36.796939849853516
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.0905038118362427, max_abs=7.0, mean_rel=0.5225694179534912, max_rel=4406.25, norm_rel=0.023983921855688095, ref_abs_avg=45.72734069824219, test_abs_avg=45.73494338989258
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.0633400678634644, max_abs=7.5, mean_rel=0.3430405259132385, max_rel=3062.499755859375, norm_rel=0.023740258067846298, ref_abs_avg=45.06587219238281, test_abs_avg=45.07447052001953
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=0.7939009666442871, max_abs=3.0, mean_rel=0.1237674355506897, max_rel=14.352757453918457, norm_rel=0.025341039523482323, ref_abs_avg=31.4451847076416, test_abs_avg=31.48040008544922
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=0.9555220007896423, max_abs=6.375, mean_rel=0.4274117052555084, max_rel=3437.499755859375, norm_rel=0.023636730387806892, ref_abs_avg=40.6658935546875, test_abs_avg=40.66623306274414
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=0.928152322769165, max_abs=5.25, mean_rel=0.3102227747440338, max_rel=2812.499755859375, norm_rel=0.02322688326239586, ref_abs_avg=40.16265106201172, test_abs_avg=40.16584396362305
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.6729822158813477, max_abs=2.75, mean_rel=0.07968392968177795, max_rel=1.7041040658950806, norm_rel=0.0227156113833189, ref_abs_avg=29.935989379882812, test_abs_avg=29.951236724853516
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=0.866046667098999, max_abs=5.25, mean_rel=0.4422232508659363, max_rel=2999.999755859375, norm_rel=0.02345963567495346, ref_abs_avg=37.126014709472656, test_abs_avg=37.13220977783203
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=0.8489782810211182, max_abs=5.0625, mean_rel=0.24576681852340698, max_rel=1999.9998779296875, norm_rel=0.023270606994628906, ref_abs_avg=36.651641845703125, test_abs_avg=36.650360107421875
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.630861222743988, max_abs=2.5625, mean_rel=0.10078947246074677, max_rel=17.232669830322266, norm_rel=0.022089792415499687, ref_abs_avg=28.56995391845703, test_abs_avg=28.559772491455078
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=0.7961475849151611, max_abs=5.0, mean_rel=0.43496885895729065, max_rel=2999.999755859375, norm_rel=0.02312779426574707, ref_abs_avg=34.624847412109375, test_abs_avg=34.62908172607422
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=0.7749375700950623, max_abs=5.125, mean_rel=0.24602551758289337, max_rel=2437.5, norm_rel=0.022932961583137512, ref_abs_avg=33.97350311279297, test_abs_avg=33.9787483215332
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.5813484191894531, max_abs=3.3125, mean_rel=0.09407997876405716, max_rel=7.593630313873291, norm_rel=0.022157538682222366, ref_abs_avg=26.997406005859375, test_abs_avg=26.985065460205078
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=0.7439092397689819, max_abs=4.7734375, mean_rel=0.38535118103027344, max_rel=2562.5, norm_rel=0.02292824350297451, ref_abs_avg=32.597877502441406, test_abs_avg=32.60096740722656
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.7280315160751343, max_abs=5.0, mean_rel=0.26917406916618347, max_rel=2546.875, norm_rel=0.022754456847906113, ref_abs_avg=32.10940933227539, test_abs_avg=32.11669921875
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.5760045051574707, max_abs=2.328125, mean_rel=0.1420496255159378, max_rel=15.721855163574219, norm_rel=0.024372754618525505, ref_abs_avg=23.074634552001953, test_abs_avg=23.06173324584961
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.6988261342048645, max_abs=4.15625, mean_rel=0.32821065187454224, max_rel=3249.999755859375, norm_rel=0.02287788689136505, ref_abs_avg=30.705598831176758, test_abs_avg=30.709354400634766
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.6846797466278076, max_abs=4.25, mean_rel=0.20992200076580048, max_rel=2312.5, norm_rel=0.022648772224783897, ref_abs_avg=30.367618560791016, test_abs_avg=30.37203025817871
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.576240599155426, max_abs=2.25, mean_rel=0.16233737766742706, max_rel=29.79726219177246, norm_rel=0.022240107879042625, ref_abs_avg=25.196876525878906, test_abs_avg=25.162769317626953
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.6603381633758545, max_abs=4.25, mean_rel=0.3359594941139221, max_rel=2250.0, norm_rel=0.022616835311055183, ref_abs_avg=29.357746124267578, test_abs_avg=29.359355926513672
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.6474484205245972, max_abs=3.875, mean_rel=0.191414937376976, max_rel=1718.7498779296875, norm_rel=0.022419797256588936, ref_abs_avg=29.025928497314453, test_abs_avg=29.03118896484375
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.5477714538574219, max_abs=2.25, mean_rel=0.10435578227043152, max_rel=7.528260231018066, norm_rel=0.024692267179489136, ref_abs_avg=22.348602294921875, test_abs_avg=22.32992172241211
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.6267411708831787, max_abs=4.1875, mean_rel=0.3405146896839142, max_rel=3374.999755859375, norm_rel=0.022417260333895683, ref_abs_avg=28.09967803955078, test_abs_avg=28.1031551361084
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.6127415895462036, max_abs=4.140625, mean_rel=0.18700936436653137, max_rel=2031.2498779296875, norm_rel=0.022179147228598595, ref_abs_avg=27.73802375793457, test_abs_avg=27.743427276611328
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.6014389991760254, max_abs=2.25, mean_rel=0.09059128165245056, max_rel=3.655909299850464, norm_rel=0.024783439934253693, ref_abs_avg=23.98821258544922, test_abs_avg=23.999061584472656
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=0.7235155701637268, max_abs=4.75, mean_rel=0.3199552297592163, max_rel=2874.999755859375, norm_rel=0.02463718317449093, ref_abs_avg=29.507240295410156, test_abs_avg=29.51272201538086
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.711334228515625, max_abs=4.5, mean_rel=0.24441245198249817, max_rel=2203.125, norm_rel=0.02451549470424652, ref_abs_avg=29.135578155517578, test_abs_avg=29.143905639648438
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.5610685348510742, max_abs=2.25, mean_rel=0.0844825953245163, max_rel=3.8491313457489014, norm_rel=0.024735381826758385, ref_abs_avg=22.71471405029297, test_abs_avg=22.72366714477539
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.6674274206161499, max_abs=4.375, mean_rel=0.33838415145874023, max_rel=2500.0, norm_rel=0.024816932156682014, ref_abs_avg=26.98845672607422, test_abs_avg=26.990150451660156
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.6562882661819458, max_abs=4.5, mean_rel=0.22668668627738953, max_rel=2296.875, norm_rel=0.024705693125724792, ref_abs_avg=26.684879302978516, test_abs_avg=26.692119598388672
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.5430496335029602, max_abs=2.0, mean_rel=0.14335665106773376, max_rel=10.533539772033691, norm_rel=0.026080546900629997, ref_abs_avg=20.62885284423828, test_abs_avg=20.60509490966797
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.6200940012931824, max_abs=3.875, mean_rel=0.3258892893791199, max_rel=2187.5, norm_rel=0.024652553722262383, ref_abs_avg=25.24026107788086, test_abs_avg=25.244319915771484
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.6087312698364258, max_abs=3.9609375, mean_rel=0.23039472103118896, max_rel=2500.0, norm_rel=0.024400684982538223, ref_abs_avg=25.046863555908203, test_abs_avg=25.05614471435547
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.4697842597961426, max_abs=1.875, mean_rel=0.13260938227176666, max_rel=17.38777732849121, norm_rel=0.02295023202896118, ref_abs_avg=20.51504898071289, test_abs_avg=20.559417724609375
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.583962082862854, max_abs=4.0, mean_rel=0.32160618901252747, max_rel=2562.5, norm_rel=0.024506477639079094, ref_abs_avg=23.928983688354492, test_abs_avg=23.933902740478516
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.5712155103683472, max_abs=3.5859375, mean_rel=0.20893102884292603, max_rel=1718.7498779296875, norm_rel=0.0240099485963583, ref_abs_avg=23.87112808227539, test_abs_avg=23.875106811523438
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.4258143901824951, max_abs=1.625, mean_rel=0.17198434472084045, max_rel=15.751967430114746, norm_rel=0.022518588230013847, ref_abs_avg=18.77190399169922, test_abs_avg=18.807071685791016
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.5498174428939819, max_abs=3.5, mean_rel=0.3123282790184021, max_rel=1999.9998779296875, norm_rel=0.02411791868507862, ref_abs_avg=22.881351470947266, test_abs_avg=22.88282012939453
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.5393285155296326, max_abs=3.5, mean_rel=0.21090400218963623, max_rel=1671.8748779296875, norm_rel=0.023982547223567963, ref_abs_avg=22.553905487060547, test_abs_avg=22.554737091064453
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.4269566535949707, max_abs=1.75, mean_rel=0.12604546546936035, max_rel=7.897165298461914, norm_rel=0.024301104247570038, ref_abs_avg=17.675657272338867, test_abs_avg=17.678913116455078
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.5224161744117737, max_abs=3.6875, mean_rel=0.26974451541900635, max_rel=1781.2498779296875, norm_rel=0.023695124313235283, ref_abs_avg=22.101913452148438, test_abs_avg=22.104995727539062
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.511029064655304, max_abs=3.125, mean_rel=0.21834130585193634, max_rel=1507.8123779296875, norm_rel=0.023411652073264122, ref_abs_avg=21.87014389038086, test_abs_avg=21.87110137939453
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.39292633533477783, max_abs=1.625, mean_rel=0.24082671105861664, max_rel=66.44136810302734, norm_rel=0.023137683048844337, ref_abs_avg=17.338542938232422, test_abs_avg=17.348209381103516
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.4942186772823334, max_abs=3.75, mean_rel=0.3040013909339905, max_rel=1593.7498779296875, norm_rel=0.023690715432167053, ref_abs_avg=20.903797149658203, test_abs_avg=20.907480239868164
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.48995184898376465, max_abs=3.0, mean_rel=0.1988397240638733, max_rel=1562.4998779296875, norm_rel=0.02389449067413807, ref_abs_avg=20.573020935058594, test_abs_avg=20.57428741455078
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.38717329502105713, max_abs=1.625, mean_rel=0.26070863008499146, max_rel=58.1904182434082, norm_rel=0.023132098838686943, ref_abs_avg=16.573673248291016, test_abs_avg=16.591323852539062
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.4741339385509491, max_abs=3.25, mean_rel=0.2762812077999115, max_rel=1624.9998779296875, norm_rel=0.023248914629220963, ref_abs_avg=20.424524307250977, test_abs_avg=20.42681884765625
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.46558427810668945, max_abs=3.25, mean_rel=0.19341522455215454, max_rel=1140.625, norm_rel=0.023392867296934128, ref_abs_avg=19.992259979248047, test_abs_avg=19.9930419921875
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.43958091735839844, max_abs=1.5625, mean_rel=0.10489203035831451, max_rel=4.411405563354492, norm_rel=0.024203257635235786, ref_abs_avg=18.229686737060547, test_abs_avg=18.265727996826172
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.5419049263000488, max_abs=4.0, mean_rel=0.29739412665367126, max_rel=1999.9998779296875, norm_rel=0.024797994643449783, ref_abs_avg=21.937835693359375, test_abs_avg=21.939125061035156
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.5310348868370056, max_abs=3.578125, mean_rel=0.22452911734580994, max_rel=1218.75, norm_rel=0.025018634274601936, ref_abs_avg=21.294940948486328, test_abs_avg=21.29883575439453
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.4090309143066406, max_abs=1.4375, mean_rel=0.111261747777462, max_rel=8.572784423828125, norm_rel=0.0242041926831007, ref_abs_avg=16.53561782836914, test_abs_avg=16.510662078857422
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.4924333989620209, max_abs=4.0, mean_rel=0.2985828220844269, max_rel=1812.4998779296875, norm_rel=0.024161215871572495, ref_abs_avg=20.411123275756836, test_abs_avg=20.412662506103516
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.4857882857322693, max_abs=3.0, mean_rel=0.1913311779499054, max_rel=1562.4998779296875, norm_rel=0.024174045771360397, ref_abs_avg=20.121692657470703, test_abs_avg=20.119823455810547
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.39719486236572266, max_abs=1.890625, mean_rel=0.13397151231765747, max_rel=20.1676082611084, norm_rel=0.025636514648795128, ref_abs_avg=15.880451202392578, test_abs_avg=15.914958953857422
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.45914000272750854, max_abs=3.25, mean_rel=0.29220545291900635, max_rel=1828.1248779296875, norm_rel=0.0237426795065403, ref_abs_avg=19.357479095458984, test_abs_avg=19.35814666748047
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.4471049904823303, max_abs=3.125, mean_rel=0.23978835344314575, max_rel=1796.8748779296875, norm_rel=0.023646844550967216, ref_abs_avg=18.964744567871094, test_abs_avg=18.96698760986328
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.3462042808532715, max_abs=1.59375, mean_rel=0.1123126894235611, max_rel=16.81293487548828, norm_rel=0.02329220622777939, ref_abs_avg=15.696249008178711, test_abs_avg=15.706547737121582
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.4272519052028656, max_abs=2.96875, mean_rel=0.26873230934143066, max_rel=1593.7498779296875, norm_rel=0.023358795791864395, ref_abs_avg=18.306289672851562, test_abs_avg=18.30716896057129
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.4193849563598633, max_abs=3.25, mean_rel=0.21543216705322266, max_rel=1468.7498779296875, norm_rel=0.023036500439047813, ref_abs_avg=18.220102310180664, test_abs_avg=18.219491958618164
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.3400644063949585, max_abs=1.25, mean_rel=0.12894275784492493, max_rel=21.616540908813477, norm_rel=0.022878432646393776, ref_abs_avg=14.937843322753906, test_abs_avg=14.892619132995605
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.40514397621154785, max_abs=2.75, mean_rel=0.2349453568458557, max_rel=1562.4998779296875, norm_rel=0.02304944396018982, ref_abs_avg=17.58639144897461, test_abs_avg=17.58769989013672
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.4005458354949951, max_abs=3.375, mean_rel=0.16447660326957703, max_rel=1187.5, norm_rel=0.023246804252266884, ref_abs_avg=17.27721405029297, test_abs_avg=17.282588958740234
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.30851030349731445, max_abs=1.46875, mean_rel=0.09253127872943878, max_rel=10.815417289733887, norm_rel=0.022689061239361763, ref_abs_avg=14.078263282775879, test_abs_avg=14.07581901550293
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.3809363543987274, max_abs=3.0, mean_rel=0.22670643031597137, max_rel=1437.4998779296875, norm_rel=0.02257569506764412, ref_abs_avg=16.87433624267578, test_abs_avg=16.875717163085938
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.3745766282081604, max_abs=3.25, mean_rel=0.19760607182979584, max_rel=1125.0, norm_rel=0.022588465362787247, ref_abs_avg=16.63825035095215, test_abs_avg=16.64056396484375
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.3008146286010742, max_abs=1.1513671875, mean_rel=0.1755671203136444, max_rel=30.845306396484375, norm_rel=0.022031284868717194, ref_abs_avg=14.038765907287598, test_abs_avg=14.037792205810547
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.3680921196937561, max_abs=2.5625, mean_rel=0.2421083301305771, max_rel=1578.1248779296875, norm_rel=0.022150080651044846, ref_abs_avg=16.6068058013916, test_abs_avg=16.60817527770996
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.3570203483104706, max_abs=4.0, mean_rel=0.1626673936843872, max_rel=796.8749389648438, norm_rel=0.022133855149149895, ref_abs_avg=16.130495071411133, test_abs_avg=16.13120460510254
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.2822437286376953, max_abs=1.5, mean_rel=0.09655453264713287, max_rel=6.68781852722168, norm_rel=0.02095734141767025, ref_abs_avg=13.467937469482422, test_abs_avg=13.46472454071045
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.34861743450164795, max_abs=2.75, mean_rel=0.23757758736610413, max_rel=1187.5, norm_rel=0.021895671263337135, ref_abs_avg=15.886011123657227, test_abs_avg=15.886600494384766
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.3391875624656677, max_abs=2.625, mean_rel=0.1586894392967224, max_rel=1093.75, norm_rel=0.02159418910741806, ref_abs_avg=15.698066711425781, test_abs_avg=15.696518898010254
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.31900492310523987, max_abs=1.28125, mean_rel=0.08022402226924896, max_rel=2.649641752243042, norm_rel=0.0230373814702034, ref_abs_avg=14.100811958312988, test_abs_avg=14.093510627746582
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.3863239586353302, max_abs=2.875, mean_rel=0.25741344690322876, max_rel=1593.7498779296875, norm_rel=0.023654798045754433, ref_abs_avg=16.368633270263672, test_abs_avg=16.370540618896484
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.3843719959259033, max_abs=3.25, mean_rel=0.17938373982906342, max_rel=1390.6248779296875, norm_rel=0.02360917255282402, ref_abs_avg=16.27424430847168, test_abs_avg=16.284385681152344
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.27431201934814453, max_abs=1.25, mean_rel=0.09677518159151077, max_rel=12.269184112548828, norm_rel=0.02085547149181366, ref_abs_avg=13.505280494689941, test_abs_avg=13.51108169555664
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.35721713304519653, max_abs=2.75, mean_rel=0.23309887945652008, max_rel=1250.0, norm_rel=0.022894293069839478, ref_abs_avg=15.626630783081055, test_abs_avg=15.626224517822266
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.34704291820526123, max_abs=2.625, mean_rel=0.15941785275936127, max_rel=960.9374389648438, norm_rel=0.022431574761867523, ref_abs_avg=15.478316307067871, test_abs_avg=15.48672866821289
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.27195239067077637, max_abs=1.25, mean_rel=0.06359227001667023, max_rel=1.8632391691207886, norm_rel=0.021896565333008766, ref_abs_avg=12.814900398254395, test_abs_avg=12.798942565917969
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.32995349168777466, max_abs=3.6875, mean_rel=0.2308783382177353, max_rel=1624.9998779296875, norm_rel=0.022266263142228127, ref_abs_avg=14.831981658935547, test_abs_avg=14.832508087158203
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.3229813873767853, max_abs=3.5, mean_rel=0.18329425156116486, max_rel=1140.625, norm_rel=0.02206788770854473, ref_abs_avg=14.663915634155273, test_abs_avg=14.668200492858887
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.2539067268371582, max_abs=1.125, mean_rel=0.09052213281393051, max_rel=6.294963836669922, norm_rel=0.021660879254341125, ref_abs_avg=11.704788208007812, test_abs_avg=11.714119911193848
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.30471038818359375, max_abs=2.625, mean_rel=0.18983682990074158, max_rel=1531.2498779296875, norm_rel=0.021461378782987595, ref_abs_avg=14.239021301269531, test_abs_avg=14.238251686096191
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.29778096079826355, max_abs=3.875, mean_rel=0.16992640495300293, max_rel=781.2499389648438, norm_rel=0.021554093807935715, ref_abs_avg=13.92523193359375, test_abs_avg=13.93004035949707
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.24663853645324707, max_abs=0.978515625, mean_rel=0.10711503028869629, max_rel=7.591462135314941, norm_rel=0.02385174296796322, ref_abs_avg=10.413944244384766, test_abs_avg=10.410432815551758
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.2835496962070465, max_abs=2.5, mean_rel=0.19716797769069672, max_rel=1437.4998779296875, norm_rel=0.02105928398668766, ref_abs_avg=13.514683723449707, test_abs_avg=13.51429557800293
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.279907763004303, max_abs=3.875, mean_rel=0.14100825786590576, max_rel=1499.9998779296875, norm_rel=0.021532760933041573, ref_abs_avg=13.152386665344238, test_abs_avg=13.15504264831543
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.2258228063583374, max_abs=1.125, mean_rel=0.13038606941699982, max_rel=19.849151611328125, norm_rel=0.022051634266972542, ref_abs_avg=10.370721817016602, test_abs_avg=10.358813285827637
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.26981455087661743, max_abs=2.375, mean_rel=0.19176241755485535, max_rel=1374.9998779296875, norm_rel=0.020411768928170204, ref_abs_avg=13.299307823181152, test_abs_avg=13.29809284210205
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.260348916053772, max_abs=3.25, mean_rel=0.14648909866809845, max_rel=671.8749389648438, norm_rel=0.02017958089709282, ref_abs_avg=13.005265235900879, test_abs_avg=13.007566452026367
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.20145711302757263, max_abs=0.828125, mean_rel=0.10499130934476852, max_rel=7.433783531188965, norm_rel=0.019084518775343895, ref_abs_avg=10.696525573730469, test_abs_avg=10.695367813110352
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.25284016132354736, max_abs=2.6875, mean_rel=0.18634265661239624, max_rel=1234.375, norm_rel=0.020006606355309486, ref_abs_avg=12.77432632446289, test_abs_avg=12.773123741149902
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.24349944293498993, max_abs=3.875, mean_rel=0.13197655975818634, max_rel=554.6875, norm_rel=0.019137971103191376, ref_abs_avg=12.800605773925781, test_abs_avg=12.796117782592773
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.20222872495651245, max_abs=0.75, mean_rel=0.07769054174423218, max_rel=5.693558692932129, norm_rel=0.019528662785887718, ref_abs_avg=10.366214752197266, test_abs_avg=10.36555290222168
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.2313867062330246, max_abs=2.78125, mean_rel=0.17509795725345612, max_rel=1093.75, norm_rel=0.01937716454267502, ref_abs_avg=12.123010635375977, test_abs_avg=12.122305870056152
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.23116083443164825, max_abs=3.0, mean_rel=0.1448800265789032, max_rel=734.3749389648438, norm_rel=0.01943598873913288, ref_abs_avg=12.038049697875977, test_abs_avg=12.038419723510742
liger_forward vs paper_forward output: mean_abs=0.00038665931788273156, max_abs=0.0302734375
liger_forward grad[0] vs paper_forward: mean_abs=0.00465250201523304, max_abs=0.30078125, mean_rel=0.03659655898809433, max_rel=68.59514617919922, norm_rel=0.012502272613346577, ref_abs_avg=0.45215165615081787, test_abs_avg=0.45214277505874634
liger_forward grad[1] vs paper_forward: mean_abs=2.471778631210327, max_abs=24.0, mean_rel=0.07318079471588135, max_rel=85.33113098144531, norm_rel=0.010237117297947407, ref_abs_avg=219.76283264160156, test_abs_avg=219.75228881835938
liger_forward grad[2] vs paper_forward: mean_abs=0.48089027404785156, max_abs=2.0, mean_rel=0.10298997163772583, max_rel=8.588645935058594, norm_rel=0.013236230239272118, ref_abs_avg=36.81531524658203, test_abs_avg=36.744415283203125
liger_forward grad[3] vs paper_forward: mean_abs=0.6102097630500793, max_abs=4.0, mean_rel=0.3070196211338043, max_rel=2500.0, norm_rel=0.013585480861365795, ref_abs_avg=45.72734069824219, test_abs_avg=45.7296142578125
liger_forward grad[4] vs paper_forward: mean_abs=0.5906386375427246, max_abs=4.0, mean_rel=0.16422562301158905, max_rel=1312.4998779296875, norm_rel=0.013341612182557583, ref_abs_avg=45.06587219238281, test_abs_avg=45.06493377685547
liger_forward grad[5] vs paper_forward: mean_abs=0.41857385635375977, max_abs=1.625, mean_rel=0.11744654178619385, max_rel=27.380531311035156, norm_rel=0.0132050896063447, ref_abs_avg=31.4451847076416, test_abs_avg=31.466773986816406
liger_forward grad[6] vs paper_forward: mean_abs=0.5274055004119873, max_abs=4.0, mean_rel=0.22686509788036346, max_rel=1687.4998779296875, norm_rel=0.013214366510510445, ref_abs_avg=40.6658935546875, test_abs_avg=40.66834259033203
liger_forward grad[7] vs paper_forward: mean_abs=0.5115777254104614, max_abs=3.0, mean_rel=0.19434058666229248, max_rel=2250.0, norm_rel=0.012958312407135963, ref_abs_avg=40.16265106201172, test_abs_avg=40.166229248046875
liger_forward grad[8] vs paper_forward: mean_abs=0.37586259841918945, max_abs=1.5, mean_rel=0.04409191012382507, max_rel=1.8054251670837402, norm_rel=0.013132697902619839, ref_abs_avg=29.935989379882812, test_abs_avg=29.91927719116211
liger_forward grad[9] vs paper_forward: mean_abs=0.47272926568984985, max_abs=3.0, mean_rel=0.22625522315502167, max_rel=1374.9998779296875, norm_rel=0.012968110851943493, ref_abs_avg=37.126014709472656, test_abs_avg=37.127967834472656
liger_forward grad[10] vs paper_forward: mean_abs=0.45591050386428833, max_abs=3.0, mean_rel=0.14972856640815735, max_rel=1499.9998779296875, norm_rel=0.012661170214414597, ref_abs_avg=36.651641845703125, test_abs_avg=36.65071105957031
liger_forward grad[11] vs paper_forward: mean_abs=0.35538196563720703, max_abs=1.5, mean_rel=0.09818577021360397, max_rel=22.3947811126709, norm_rel=0.012563502416014671, ref_abs_avg=28.56995391845703, test_abs_avg=28.56740379333496
liger_forward grad[12] vs paper_forward: mean_abs=0.4323870539665222, max_abs=2.875, mean_rel=0.23464089632034302, max_rel=1437.4998779296875, norm_rel=0.012735970318317413, ref_abs_avg=34.624847412109375, test_abs_avg=34.62726593017578
liger_forward grad[13] vs paper_forward: mean_abs=0.4189513921737671, max_abs=2.5, mean_rel=0.12232539802789688, max_rel=1156.25, norm_rel=0.012549348175525665, ref_abs_avg=33.97350311279297, test_abs_avg=33.974342346191406
liger_forward grad[14] vs paper_forward: mean_abs=0.3296194076538086, max_abs=1.5, mean_rel=0.043710485100746155, max_rel=3.5440733432769775, norm_rel=0.01226933766156435, ref_abs_avg=26.997406005859375, test_abs_avg=26.969451904296875
liger_forward grad[15] vs paper_forward: mean_abs=0.40165582299232483, max_abs=2.5, mean_rel=0.20629292726516724, max_rel=1374.9998779296875, norm_rel=0.012559376657009125, ref_abs_avg=32.597877502441406, test_abs_avg=32.599098205566406
liger_forward grad[16] vs paper_forward: mean_abs=0.3881272077560425, max_abs=2.5, mean_rel=0.13521289825439453, max_rel=1062.5, norm_rel=0.012311046943068504, ref_abs_avg=32.10940933227539, test_abs_avg=32.110496520996094
liger_forward grad[17] vs paper_forward: mean_abs=0.30322742462158203, max_abs=1.375, mean_rel=0.08794474601745605, max_rel=9.146341323852539, norm_rel=0.013134408742189407, ref_abs_avg=23.074634552001953, test_abs_avg=23.057838439941406
liger_forward grad[18] vs paper_forward: mean_abs=0.37359219789505005, max_abs=2.375, mean_rel=0.17081966996192932, max_rel=1125.0, norm_rel=0.0124236224219203, ref_abs_avg=30.705598831176758, test_abs_avg=30.705717086791992
liger_forward grad[19] vs paper_forward: mean_abs=0.36343902349472046, max_abs=2.5, mean_rel=0.09043365716934204, max_rel=1031.25, norm_rel=0.012191633693873882, ref_abs_avg=30.367618560791016, test_abs_avg=30.369770050048828
liger_forward grad[20] vs paper_forward: mean_abs=0.2911149859428406, max_abs=1.0234375, mean_rel=0.08713772892951965, max_rel=19.435443878173828, norm_rel=0.011570761911571026, ref_abs_avg=25.196876525878906, test_abs_avg=25.194169998168945
liger_forward grad[21] vs paper_forward: mean_abs=0.351188600063324, max_abs=2.25, mean_rel=0.18009516596794128, max_rel=1250.0, norm_rel=0.012200870551168919, ref_abs_avg=29.357746124267578, test_abs_avg=29.359111785888672
liger_forward grad[22] vs paper_forward: mean_abs=0.3388688862323761, max_abs=2.25, mean_rel=0.10830600559711456, max_rel=843.7499389648438, norm_rel=0.011893186718225479, ref_abs_avg=29.025928497314453, test_abs_avg=29.02808380126953
liger_forward grad[23] vs paper_forward: mean_abs=0.28407907485961914, max_abs=1.0, mean_rel=0.05084347724914551, max_rel=2.8205511569976807, norm_rel=0.012907061725854874, ref_abs_avg=22.348602294921875, test_abs_avg=22.35869789123535
liger_forward grad[24] vs paper_forward: mean_abs=0.3321465253829956, max_abs=2.25, mean_rel=0.17698170244693756, max_rel=1187.5, norm_rel=0.012064189650118351, ref_abs_avg=28.09967803955078, test_abs_avg=28.100154876708984
liger_forward grad[25] vs paper_forward: mean_abs=0.32070398330688477, max_abs=2.25, mean_rel=0.1010909378528595, max_rel=812.4999389648438, norm_rel=0.011793972924351692, ref_abs_avg=27.73802375793457, test_abs_avg=27.738378524780273
liger_forward grad[26] vs paper_forward: mean_abs=0.29579830169677734, max_abs=1.25, mean_rel=0.05501532554626465, max_rel=9.384190559387207, norm_rel=0.012527521699666977, ref_abs_avg=23.98821258544922, test_abs_avg=23.99853515625
liger_forward grad[27] vs paper_forward: mean_abs=0.3632913827896118, max_abs=2.5, mean_rel=0.16871225833892822, max_rel=1328.1248779296875, norm_rel=0.012549386359751225, ref_abs_avg=29.507240295410156, test_abs_avg=29.507671356201172
liger_forward grad[28] vs paper_forward: mean_abs=0.35087329149246216, max_abs=2.25, mean_rel=0.1230664923787117, max_rel=984.3749389648438, norm_rel=0.012263718992471695, ref_abs_avg=29.135578155517578, test_abs_avg=29.136146545410156
liger_forward grad[29] vs paper_forward: mean_abs=0.2756185531616211, max_abs=0.875, mean_rel=0.04444336146116257, max_rel=1.9998328685760498, norm_rel=0.012244575656950474, ref_abs_avg=22.71471405029297, test_abs_avg=22.71368408203125
liger_forward grad[30] vs paper_forward: mean_abs=0.32529228925704956, max_abs=2.0, mean_rel=0.17090116441249847, max_rel=1156.25, norm_rel=0.012299278751015663, ref_abs_avg=26.98845672607422, test_abs_avg=26.98910140991211
liger_forward grad[31] vs paper_forward: mean_abs=0.3155195116996765, max_abs=2.0, mean_rel=0.11333788931369781, max_rel=1039.0625, norm_rel=0.012051071040332317, ref_abs_avg=26.684879302978516, test_abs_avg=26.687244415283203
liger_forward grad[32] vs paper_forward: mean_abs=0.23880815505981445, max_abs=1.046875, mean_rel=0.05070025473833084, max_rel=3.655406951904297, norm_rel=0.01171589083969593, ref_abs_avg=20.62885284423828, test_abs_avg=20.631608963012695
liger_forward grad[33] vs paper_forward: mean_abs=0.2988095283508301, max_abs=2.0, mean_rel=0.1592203676700592, max_rel=1062.5, norm_rel=0.012080955319106579, ref_abs_avg=25.24026107788086, test_abs_avg=25.240371704101562
liger_forward grad[34] vs paper_forward: mean_abs=0.2898522615432739, max_abs=1.75, mean_rel=0.1126556470990181, max_rel=1390.6248779296875, norm_rel=0.011810447089374065, ref_abs_avg=25.046863555908203, test_abs_avg=25.04717254638672
liger_forward grad[35] vs paper_forward: mean_abs=0.23682832717895508, max_abs=1.0, mean_rel=0.0859317034482956, max_rel=9.391576766967773, norm_rel=0.01167244277894497, ref_abs_avg=20.51504898071289, test_abs_avg=20.5142879486084
liger_forward grad[36] vs paper_forward: mean_abs=0.2788374722003937, max_abs=1.75, mean_rel=0.14434422552585602, max_rel=874.9999389648438, norm_rel=0.011898940429091454, ref_abs_avg=23.928983688354492, test_abs_avg=23.929807662963867
liger_forward grad[37] vs paper_forward: mean_abs=0.270079106092453, max_abs=1.75, mean_rel=0.09070465713739395, max_rel=656.2499389648438, norm_rel=0.011546233668923378, ref_abs_avg=23.87112808227539, test_abs_avg=23.870683670043945
liger_forward grad[38] vs paper_forward: mean_abs=0.22692012786865234, max_abs=0.8125, mean_rel=0.12140848487615585, max_rel=24.05355453491211, norm_rel=0.012014707550406456, ref_abs_avg=18.77190399169922, test_abs_avg=18.777158737182617
liger_forward grad[39] vs paper_forward: mean_abs=0.26173725724220276, max_abs=1.75, mean_rel=0.15128310024738312, max_rel=749.9999389648438, norm_rel=0.011681687086820602, ref_abs_avg=22.881351470947266, test_abs_avg=22.881446838378906
liger_forward grad[40] vs paper_forward: mean_abs=0.2526758015155792, max_abs=1.5, mean_rel=0.1002628281712532, max_rel=796.8749389648438, norm_rel=0.011429816484451294, ref_abs_avg=22.553905487060547, test_abs_avg=22.554977416992188
liger_forward grad[41] vs paper_forward: mean_abs=0.21412992477416992, max_abs=0.75, mean_rel=0.05552014335989952, max_rel=3.312328338623047, norm_rel=0.01222287118434906, ref_abs_avg=17.675657272338867, test_abs_avg=17.692577362060547
liger_forward grad[42] vs paper_forward: mean_abs=0.24886244535446167, max_abs=1.5, mean_rel=0.1349940001964569, max_rel=999.9999389648438, norm_rel=0.011491966433823109, ref_abs_avg=22.101913452148438, test_abs_avg=22.10190200805664
liger_forward grad[43] vs paper_forward: mean_abs=0.2416122704744339, max_abs=1.5, mean_rel=0.10707555711269379, max_rel=632.8125, norm_rel=0.01126962061971426, ref_abs_avg=21.87014389038086, test_abs_avg=21.870464324951172
liger_forward grad[44] vs paper_forward: mean_abs=0.18385326862335205, max_abs=0.75, mean_rel=0.0839509665966034, max_rel=18.585201263427734, norm_rel=0.010701067745685577, ref_abs_avg=17.338542938232422, test_abs_avg=17.331661224365234
liger_forward grad[45] vs paper_forward: mean_abs=0.2337392270565033, max_abs=1.75, mean_rel=0.14285901188850403, max_rel=874.9999389648438, norm_rel=0.011402377858757973, ref_abs_avg=20.903797149658203, test_abs_avg=20.905025482177734
liger_forward grad[46] vs paper_forward: mean_abs=0.229766383767128, max_abs=1.5, mean_rel=0.09262499213218689, max_rel=609.375, norm_rel=0.011404731310904026, ref_abs_avg=20.573020935058594, test_abs_avg=20.571386337280273
liger_forward grad[47] vs paper_forward: mean_abs=0.18329322338104248, max_abs=0.75, mean_rel=0.12980730831623077, max_rel=26.09349822998047, norm_rel=0.01132142636924982, ref_abs_avg=16.573673248291016, test_abs_avg=16.588703155517578
liger_forward grad[48] vs paper_forward: mean_abs=0.22404949367046356, max_abs=1.5625, mean_rel=0.12709680199623108, max_rel=874.9999389648438, norm_rel=0.011194167658686638, ref_abs_avg=20.424524307250977, test_abs_avg=20.42441177368164
liger_forward grad[49] vs paper_forward: mean_abs=0.21639582514762878, max_abs=1.4375, mean_rel=0.09687305241823196, max_rel=578.125, norm_rel=0.011079618707299232, ref_abs_avg=19.992259979248047, test_abs_avg=19.99172592163086
liger_forward grad[50] vs paper_forward: mean_abs=0.20429468154907227, max_abs=0.75, mean_rel=0.04290201514959335, max_rel=1.711376428604126, norm_rel=0.011533753015100956, ref_abs_avg=18.229686737060547, test_abs_avg=18.209609985351562
liger_forward grad[51] vs paper_forward: mean_abs=0.25900983810424805, max_abs=2.0, mean_rel=0.13970325887203217, max_rel=859.3749389648438, norm_rel=0.012046766467392445, ref_abs_avg=21.937835693359375, test_abs_avg=21.93770980834961
liger_forward grad[52] vs paper_forward: mean_abs=0.25052544474601746, max_abs=2.0, mean_rel=0.11397579312324524, max_rel=843.7499389648438, norm_rel=0.011977018788456917, ref_abs_avg=21.294940948486328, test_abs_avg=21.29383087158203
liger_forward grad[53] vs paper_forward: mean_abs=0.18646860122680664, max_abs=0.75, mean_rel=0.05071033909916878, max_rel=4.066142559051514, norm_rel=0.011395549401640892, ref_abs_avg=16.53561782836914, test_abs_avg=16.52196502685547
liger_forward grad[54] vs paper_forward: mean_abs=0.23256944119930267, max_abs=1.75, mean_rel=0.14834538102149963, max_rel=1437.4998779296875, norm_rel=0.011627360247075558, ref_abs_avg=20.411123275756836, test_abs_avg=20.41180992126465
liger_forward grad[55] vs paper_forward: mean_abs=0.22653944790363312, max_abs=1.5, mean_rel=0.09148001670837402, max_rel=656.2499389648438, norm_rel=0.011473244987428188, ref_abs_avg=20.121692657470703, test_abs_avg=20.122623443603516
liger_forward grad[56] vs paper_forward: mean_abs=0.1803288459777832, max_abs=0.75, mean_rel=0.03523213043808937, max_rel=1.4810523986816406, norm_rel=0.011587899178266525, ref_abs_avg=15.880451202392578, test_abs_avg=15.890531539916992
liger_forward grad[57] vs paper_forward: mean_abs=0.21685925126075745, max_abs=1.5, mean_rel=0.13675557076931, max_rel=906.2499389648438, norm_rel=0.011427177116274834, ref_abs_avg=19.357479095458984, test_abs_avg=19.358022689819336
liger_forward grad[58] vs paper_forward: mean_abs=0.2100183665752411, max_abs=1.5, mean_rel=0.10913387686014175, max_rel=718.7499389648438, norm_rel=0.011304566636681557, ref_abs_avg=18.964744567871094, test_abs_avg=18.967693328857422
liger_forward grad[59] vs paper_forward: mean_abs=0.1592165231704712, max_abs=0.625, mean_rel=0.03978333994746208, max_rel=2.094896078109741, norm_rel=0.010383840650320053, ref_abs_avg=15.696249008178711, test_abs_avg=15.68685245513916
liger_forward grad[60] vs paper_forward: mean_abs=0.20131133496761322, max_abs=1.25, mean_rel=0.12703470885753632, max_rel=687.4999389648438, norm_rel=0.011217868886888027, ref_abs_avg=18.306289672851562, test_abs_avg=18.306373596191406
liger_forward grad[61] vs paper_forward: mean_abs=0.19738422334194183, max_abs=1.5, mean_rel=0.09529048204421997, max_rel=781.2499389648438, norm_rel=0.011027687229216099, ref_abs_avg=18.220102310180664, test_abs_avg=18.219947814941406
liger_forward grad[62] vs paper_forward: mean_abs=0.15510225296020508, max_abs=0.625, mean_rel=0.04984497278928757, max_rel=6.226503849029541, norm_rel=0.01068856567144394, ref_abs_avg=14.937843322753906, test_abs_avg=14.93359375
liger_forward grad[63] vs paper_forward: mean_abs=0.19039073586463928, max_abs=1.65625, mean_rel=0.10975983738899231, max_rel=656.2499389648438, norm_rel=0.01104908436536789, ref_abs_avg=17.58639144897461, test_abs_avg=17.58732032775879
liger_forward grad[64] vs paper_forward: mean_abs=0.18601717054843903, max_abs=1.578125, mean_rel=0.08310724794864655, max_rel=499.9999694824219, norm_rel=0.011011192575097084, ref_abs_avg=17.27721405029297, test_abs_avg=17.276216506958008
liger_forward grad[65] vs paper_forward: mean_abs=0.14648544788360596, max_abs=0.5625, mean_rel=0.03543262183666229, max_rel=1.094287395477295, norm_rel=0.011048542335629463, ref_abs_avg=14.078263282775879, test_abs_avg=14.074016571044922
liger_forward grad[66] vs paper_forward: mean_abs=0.17915841937065125, max_abs=1.25, mean_rel=0.10500581562519073, max_rel=625.0, norm_rel=0.010834144428372383, ref_abs_avg=16.87433624267578, test_abs_avg=16.875137329101562
liger_forward grad[67] vs paper_forward: mean_abs=0.17504601180553436, max_abs=1.25, mean_rel=0.08730627596378326, max_rel=843.7499389648438, norm_rel=0.010748853906989098, ref_abs_avg=16.63825035095215, test_abs_avg=16.637718200683594
liger_forward grad[68] vs paper_forward: mean_abs=0.13726434111595154, max_abs=0.75, mean_rel=0.06396561115980148, max_rel=12.972325325012207, norm_rel=0.010527027770876884, ref_abs_avg=14.038765907287598, test_abs_avg=14.041239738464355
liger_forward grad[69] vs paper_forward: mean_abs=0.17247028648853302, max_abs=1.25, mean_rel=0.11371605098247528, max_rel=593.75, norm_rel=0.010598479770123959, ref_abs_avg=16.6068058013916, test_abs_avg=16.607330322265625
liger_forward grad[70] vs paper_forward: mean_abs=0.1676402986049652, max_abs=1.5, mean_rel=0.07676004618406296, max_rel=507.8124694824219, norm_rel=0.010594220831990242, ref_abs_avg=16.130495071411133, test_abs_avg=16.13185691833496
liger_forward grad[71] vs paper_forward: mean_abs=0.13204741477966309, max_abs=0.5, mean_rel=0.06421877443790436, max_rel=9.039168357849121, norm_rel=0.009858822450041771, ref_abs_avg=13.467937469482422, test_abs_avg=13.470399856567383
liger_forward grad[72] vs paper_forward: mean_abs=0.16315285861492157, max_abs=1.25, mean_rel=0.11272026598453522, max_rel=625.0, norm_rel=0.010479791089892387, ref_abs_avg=15.886011123657227, test_abs_avg=15.886419296264648
liger_forward grad[73] vs paper_forward: mean_abs=0.15699106454849243, max_abs=1.375, mean_rel=0.07797329127788544, max_rel=562.5, norm_rel=0.010205033235251904, ref_abs_avg=15.698066711425781, test_abs_avg=15.698427200317383
liger_forward grad[74] vs paper_forward: mean_abs=0.15064716339111328, max_abs=0.75, mean_rel=0.033857136964797974, max_rel=1.4642937183380127, norm_rel=0.011061768978834152, ref_abs_avg=14.100811958312988, test_abs_avg=14.095827102661133
liger_forward grad[75] vs paper_forward: mean_abs=0.18533699214458466, max_abs=1.5, mean_rel=0.11384924501180649, max_rel=687.4999389648438, norm_rel=0.011542895808815956, ref_abs_avg=16.368633270263672, test_abs_avg=16.368850708007812
liger_forward grad[76] vs paper_forward: mean_abs=0.18213078379631042, max_abs=1.53125, mean_rel=0.08400571346282959, max_rel=562.5, norm_rel=0.011379438452422619, ref_abs_avg=16.27424430847168, test_abs_avg=16.274166107177734
liger_forward grad[77] vs paper_forward: mean_abs=0.13464570045471191, max_abs=0.609375, mean_rel=0.040382225066423416, max_rel=2.3664512634277344, norm_rel=0.010181441903114319, ref_abs_avg=13.505280494689941, test_abs_avg=13.516606330871582
liger_forward grad[78] vs paper_forward: mean_abs=0.17179760336875916, max_abs=1.75, mean_rel=0.1112251728773117, max_rel=640.625, norm_rel=0.011209150776267052, ref_abs_avg=15.626630783081055, test_abs_avg=15.626327514648438
liger_forward grad[79] vs paper_forward: mean_abs=0.16456958651542664, max_abs=1.5, mean_rel=0.08275754749774933, max_rel=453.1249694824219, norm_rel=0.010851612314581871, ref_abs_avg=15.478316307067871, test_abs_avg=15.480184555053711
liger_forward grad[80] vs paper_forward: mean_abs=0.12284088134765625, max_abs=0.5, mean_rel=0.03271661698818207, max_rel=1.2895560264587402, norm_rel=0.00992149580270052, ref_abs_avg=12.814900398254395, test_abs_avg=12.806418418884277
liger_forward grad[81] vs paper_forward: mean_abs=0.15849095582962036, max_abs=1.5, mean_rel=0.11372976005077362, max_rel=749.9999389648438, norm_rel=0.010907714255154133, ref_abs_avg=14.831981658935547, test_abs_avg=14.831988334655762
liger_forward grad[82] vs paper_forward: mean_abs=0.15264125168323517, max_abs=2.0, mean_rel=0.077436663210392, max_rel=382.8124694824219, norm_rel=0.010683918371796608, ref_abs_avg=14.663915634155273, test_abs_avg=14.663762092590332
liger_forward grad[83] vs paper_forward: mean_abs=0.11754465103149414, max_abs=0.4375, mean_rel=0.03790440037846565, max_rel=1.3264731168746948, norm_rel=0.009972738102078438, ref_abs_avg=11.704788208007812, test_abs_avg=11.708294868469238
liger_forward grad[84] vs paper_forward: mean_abs=0.1466565728187561, max_abs=1.125, mean_rel=0.09423595666885376, max_rel=625.0, norm_rel=0.010548586025834084, ref_abs_avg=14.239021301269531, test_abs_avg=14.238621711730957
liger_forward grad[85] vs paper_forward: mean_abs=0.14291629195213318, max_abs=1.25, mean_rel=0.08084367215633392, max_rel=390.6249694824219, norm_rel=0.010549994185566902, ref_abs_avg=13.92523193359375, test_abs_avg=13.925729751586914
liger_forward grad[86] vs paper_forward: mean_abs=0.1123971939086914, max_abs=0.40625, mean_rel=0.05362405627965927, max_rel=5.108755111694336, norm_rel=0.01084171049296856, ref_abs_avg=10.413944244384766, test_abs_avg=10.41629695892334
liger_forward grad[87] vs paper_forward: mean_abs=0.13653147220611572, max_abs=1.25, mean_rel=0.09668824821710587, max_rel=390.6249694824219, norm_rel=0.01036097202450037, ref_abs_avg=13.514683723449707, test_abs_avg=13.51481819152832
liger_forward grad[88] vs paper_forward: mean_abs=0.1352463662624359, max_abs=1.75, mean_rel=0.06787607073783875, max_rel=347.6562194824219, norm_rel=0.010574872605502605, ref_abs_avg=13.152386665344238, test_abs_avg=13.152841567993164
liger_forward grad[89] vs paper_forward: mean_abs=0.10925090312957764, max_abs=0.5, mean_rel=0.08709798753261566, max_rel=12.218266487121582, norm_rel=0.010551861487329006, ref_abs_avg=10.370721817016602, test_abs_avg=10.363309860229492
liger_forward grad[90] vs paper_forward: mean_abs=0.13000443577766418, max_abs=1.5, mean_rel=0.09110752493143082, max_rel=531.25, norm_rel=0.010067758150398731, ref_abs_avg=13.299307823181152, test_abs_avg=13.299057006835938
liger_forward grad[91] vs paper_forward: mean_abs=0.1265941709280014, max_abs=1.5, mean_rel=0.07488644868135452, max_rel=414.0624694824219, norm_rel=0.01006999984383583, ref_abs_avg=13.005265235900879, test_abs_avg=13.00646686553955
liger_forward grad[92] vs paper_forward: mean_abs=0.10482442378997803, max_abs=0.40625, mean_rel=0.07394100725650787, max_rel=9.897573471069336, norm_rel=0.010082248598337173, ref_abs_avg=10.696525573730469, test_abs_avg=10.695322036743164
liger_forward grad[93] vs paper_forward: mean_abs=0.12170802801847458, max_abs=1.5, mean_rel=0.08767828345298767, max_rel=625.0, norm_rel=0.009856848046183586, ref_abs_avg=12.77432632446289, test_abs_avg=12.774356842041016
liger_forward grad[94] vs paper_forward: mean_abs=0.11674012243747711, max_abs=1.5, mean_rel=0.0699886828660965, max_rel=421.8749694824219, norm_rel=0.009386011399328709, ref_abs_avg=12.800605773925781, test_abs_avg=12.801008224487305
liger_forward grad[95] vs paper_forward: mean_abs=0.09050697088241577, max_abs=0.4375, mean_rel=0.04183147847652435, max_rel=5.828317642211914, norm_rel=0.008908163756132126, ref_abs_avg=10.366214752197266, test_abs_avg=10.373685836791992
liger_forward grad[96] vs paper_forward: mean_abs=0.11162217706441879, max_abs=1.0, mean_rel=0.0842774361371994, max_rel=499.9999694824219, norm_rel=0.009592259302735329, ref_abs_avg=12.123010635375977, test_abs_avg=12.122907638549805
liger_forward grad[97] vs paper_forward: mean_abs=0.11247557401657104, max_abs=1.546875, mean_rel=0.0697464868426323, max_rel=328.1249694824219, norm_rel=0.009711495600640774, ref_abs_avg=12.038049697875977, test_abs_avg=12.04100513458252
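The metrics above (mean_abs, max_abs, mean_rel, max_rel, norm_rel, ref_abs_avg, test_abs_avg) can be reproduced with a small comparison helper. This is a hedged sketch, not the actual script: the function name, the epsilon guard, and the float32 upcast are assumptions, but each returned field matches the definition its name suggests (norm_rel being the ratio of error norm to reference norm, which is far less sensitive to near-zero entries than max_rel).

```python
import torch

def compare_tensors(ref: torch.Tensor, test: torch.Tensor, eps: float = 1e-6) -> dict:
    """Sketch of the per-tensor comparison metrics printed in the log above."""
    ref_f = ref.float()
    test_f = test.float()
    diff = (test_f - ref_f).abs()
    # Elementwise relative error; clamp the denominator so near-zero
    # reference entries do not blow up to inf (they still dominate max_rel).
    rel = diff / ref_f.abs().clamp_min(eps)
    return {
        "mean_abs": diff.mean().item(),
        "max_abs": diff.max().item(),
        "mean_rel": rel.mean().item(),
        "max_rel": rel.max().item(),
        # Whole-tensor relative error: ||test - ref|| / ||ref||.
        "norm_rel": (diff.norm() / ref_f.norm().clamp_min(eps)).item(),
        "ref_abs_avg": ref_f.abs().mean().item(),
        "test_abs_avg": test_f.abs().mean().item(),
    }
```

Note how the huge max_rel values (e.g. 2500.0) coexist with norm_rel around 0.01: a handful of tiny reference entries inflate the elementwise ratio while the tensors agree to ~1% in norm.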
identity layers + randn queries
liger_forward fwd+bwd:  167.802 ms
liger_forward bwd-only: 146.350 ms
liger_forward peak allocated: fwd=7.853 GiB, fwd+bwd=7.853 GiB
liger_forward peak reserved:  fwd=7.904 GiB, fwd+bwd=8.217 GiB
torch_compile_phases_forward fwd+bwd:  85.015 ms
torch_compile_phases_forward bwd-only: 67.132 ms
torch_compile_phases_forward peak allocated: fwd=6.596 GiB, fwd+bwd=6.909 GiB
torch_compile_phases_forward peak reserved:  fwd=6.818 GiB, fwd+bwd=8.945 GiB
pytorch_attn_res_forward fwd+bwd:  992.459 ms
pytorch_attn_res_forward bwd-only: 817.582 ms
pytorch_attn_res_forward peak allocated: fwd=43.917 GiB, fwd+bwd=45.039 GiB
pytorch_attn_res_forward peak reserved:  fwd=45.094 GiB, fwd+bwd=46.346 GiB
production_forward fwd+bwd:  57.853 ms
production_forward bwd-only: 48.954 ms
production_forward peak allocated: fwd=1.300 GiB, fwd+bwd=5.302 GiB
production_forward peak reserved:  fwd=2.305 GiB, fwd+bwd=5.430 GiB
paper_forward fwd+bwd:  195.665 ms
paper_forward bwd-only: 153.975 ms
paper_forward peak allocated: fwd=15.056 GiB, fwd+bwd=16.116 GiB
paper_forward peak reserved:  fwd=15.104 GiB, fwd+bwd=16.354 GiB
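The fwd+bwd timings and peak-memory figures above could be collected along the following lines. This is a sketch under assumptions (the function name, warm-up count, and iteration count are illustrative, and the real harness presumably also runs a forward-only pass so that "bwd-only" can be reported as the difference); the CUDA calls themselves — `reset_peak_memory_stats`, `max_memory_allocated`, `max_memory_reserved` — are the standard PyTorch APIs for the "peak allocated" / "peak reserved" numbers shown.

```python
import time
import torch

def bench_fwd_bwd(fn, *args, iters: int = 10):
    """Time fn(*args).sum().backward() and report (ms/iter, peak GiB allocated)."""
    # Warm up once so compilation / autotuning does not pollute the timing.
    fn(*args).sum().backward()
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args).sum().backward()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # CUDA launches are async; sync before stopping the clock
    ms = (time.perf_counter() - t0) * 1e3 / iters
    peak_gib = (
        torch.cuda.max_memory_allocated() / 2**30
        if torch.cuda.is_available()
        else float("nan")
    )
    return ms, peak_gib
```

The same loop without `.backward()` would give the fwd-only peak ("fwd=…" in the log); `torch.cuda.max_memory_reserved()` gives the caching-allocator figure reported as "peak reserved".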

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016120814252644777, max_abs=0.032470703125
production_forward grad[0] vs paper_forward: mean_abs=0.008073119446635246, max_abs=0.375, mean_rel=0.07052332162857056, max_rel=95.65785217285156, norm_rel=0.019205190241336823, ref_abs_avg=0.45513755083084106, test_abs_avg=0.4551530182361603
production_forward grad[1] vs paper_forward: mean_abs=4.93707275390625, max_abs=40.0, mean_rel=0.17235787212848663, max_rel=332.4716491699219, norm_rel=0.01970656029880047, ref_abs_avg=220.91897583007812, test_abs_avg=220.88218688964844
production_forward grad[2] vs paper_forward: mean_abs=0.8008972406387329, max_abs=3.0, mean_rel=0.4747125208377838, max_rel=101.20562744140625, norm_rel=0.021789485588669777, ref_abs_avg=37.39579772949219, test_abs_avg=37.368377685546875
production_forward grad[3] vs paper_forward: mean_abs=1.017902135848999, max_abs=7.0, mean_rel=0.42080339789390564, max_rel=3374.999755859375, norm_rel=0.022542638704180717, ref_abs_avg=45.413307189941406, test_abs_avg=45.41690444946289
production_forward grad[4] vs paper_forward: mean_abs=0.9921409487724304, max_abs=7.5, mean_rel=0.28773605823516846, max_rel=3249.999755859375, norm_rel=0.022258738055825233, ref_abs_avg=44.841949462890625, test_abs_avg=44.839622497558594
production_forward grad[5] vs paper_forward: mean_abs=0.748626708984375, max_abs=3.4892578125, mean_rel=0.1113518700003624, max_rel=17.512645721435547, norm_rel=0.02374056912958622, ref_abs_avg=31.152013778686523, test_abs_avg=31.158456802368164
production_forward grad[6] vs paper_forward: mean_abs=0.9026038646697998, max_abs=6.25, mean_rel=0.4480903446674347, max_rel=2999.999755859375, norm_rel=0.02241562306880951, ref_abs_avg=40.50524139404297, test_abs_avg=40.50713348388672
production_forward grad[7] vs paper_forward: mean_abs=0.8780787587165833, max_abs=5.28125, mean_rel=0.264748752117157, max_rel=3124.999755859375, norm_rel=0.022038493305444717, ref_abs_avg=40.084922790527344, test_abs_avg=40.08911895751953
production_forward grad[8] vs paper_forward: mean_abs=0.6925449371337891, max_abs=2.9375, mean_rel=0.07937045395374298, max_rel=3.672668933868408, norm_rel=0.023943394422531128, ref_abs_avg=29.721181869506836, test_abs_avg=29.669208526611328
production_forward grad[9] vs paper_forward: mean_abs=0.8178051710128784, max_abs=5.0, mean_rel=0.3783573508262634, max_rel=2796.874755859375, norm_rel=0.022143250331282616, ref_abs_avg=37.17271423339844, test_abs_avg=37.173057556152344
production_forward grad[10] vs paper_forward: mean_abs=0.7976640462875366, max_abs=4.75, mean_rel=0.21181133389472961, max_rel=2765.624755859375, norm_rel=0.021731600165367126, ref_abs_avg=36.91738510131836, test_abs_avg=36.91822814941406
production_forward grad[11] vs paper_forward: mean_abs=0.6061177253723145, max_abs=2.69921875, mean_rel=0.09720370918512344, max_rel=8.718583106994629, norm_rel=0.02092796005308628, ref_abs_avg=28.961830139160156, test_abs_avg=29.016738891601562
production_forward grad[12] vs paper_forward: mean_abs=0.7570273280143738, max_abs=5.0, mean_rel=0.3804816007614136, max_rel=3062.499755859375, norm_rel=0.021990876644849777, ref_abs_avg=34.61470031738281, test_abs_avg=34.616172790527344
production_forward grad[13] vs paper_forward: mean_abs=0.7381021976470947, max_abs=4.5, mean_rel=0.26022952795028687, max_rel=2500.0, norm_rel=0.021561075001955032, ref_abs_avg=34.41101837158203, test_abs_avg=34.414024353027344
production_forward grad[14] vs paper_forward: mean_abs=0.5921429395675659, max_abs=2.4375, mean_rel=0.08361420780420303, max_rel=6.626277923583984, norm_rel=0.023008311167359352, ref_abs_avg=25.978391647338867, test_abs_avg=25.948772430419922
production_forward grad[15] vs paper_forward: mean_abs=0.705967903137207, max_abs=4.25, mean_rel=0.3358229696750641, max_rel=2874.999755859375, norm_rel=0.021811731159687042, ref_abs_avg=32.56072998046875, test_abs_avg=32.561485290527344
production_forward grad[16] vs paper_forward: mean_abs=0.6875119209289551, max_abs=4.3125, mean_rel=0.2064734697341919, max_rel=2125.0, norm_rel=0.021486015990376472, ref_abs_avg=32.15374755859375, test_abs_avg=32.15707015991211
production_forward grad[17] vs paper_forward: mean_abs=0.536940336227417, max_abs=1.8125, mean_rel=0.18919667601585388, max_rel=26.441246032714844, norm_rel=0.02092897519469261, ref_abs_avg=25.502288818359375, test_abs_avg=25.514759063720703
production_forward grad[18] vs paper_forward: mean_abs=0.6639932990074158, max_abs=4.0, mean_rel=0.3354358673095703, max_rel=2078.125, norm_rel=0.021654782816767693, ref_abs_avg=30.84798240661621, test_abs_avg=30.847963333129883
production_forward grad[19] vs paper_forward: mean_abs=0.6452008485794067, max_abs=3.625, mean_rel=0.2610732316970825, max_rel=1812.4998779296875, norm_rel=0.021192720159888268, ref_abs_avg=30.54539680480957, test_abs_avg=30.543594360351562
production_forward grad[20] vs paper_forward: mean_abs=0.50473952293396, max_abs=2.4375, mean_rel=0.07661083340644836, max_rel=5.386773109436035, norm_rel=0.021053191274404526, ref_abs_avg=24.271549224853516, test_abs_avg=24.309593200683594
production_forward grad[21] vs paper_forward: mean_abs=0.6305606365203857, max_abs=4.5, mean_rel=0.3285656273365021, max_rel=2218.75, norm_rel=0.021595247089862823, ref_abs_avg=29.36042022705078, test_abs_avg=29.359411239624023
production_forward grad[22] vs paper_forward: mean_abs=0.614356517791748, max_abs=4.125, mean_rel=0.20888552069664001, max_rel=1562.4998779296875, norm_rel=0.02123788744211197, ref_abs_avg=29.033206939697266, test_abs_avg=29.032947540283203
production_forward grad[23] vs paper_forward: mean_abs=0.4894217550754547, max_abs=2.25, mean_rel=0.07553291320800781, max_rel=2.2835474014282227, norm_rel=0.020453162491321564, ref_abs_avg=23.873512268066406, test_abs_avg=23.86612319946289
production_forward grad[24] vs paper_forward: mean_abs=0.5994867086410522, max_abs=3.5, mean_rel=0.30998992919921875, max_rel=2187.5, norm_rel=0.021450545638799667, ref_abs_avg=28.10967445373535, test_abs_avg=28.109848022460938
production_forward grad[25] vs paper_forward: mean_abs=0.5836635828018188, max_abs=3.5, mean_rel=0.1973741352558136, max_rel=1843.7498779296875, norm_rel=0.021021898835897446, ref_abs_avg=27.909358978271484, test_abs_avg=27.913372039794922
production_forward grad[26] vs paper_forward: mean_abs=0.610933780670166, max_abs=2.0625, mean_rel=0.06515668332576752, max_rel=1.8714772462844849, norm_rel=0.02322845719754696, ref_abs_avg=26.34033966064453, test_abs_avg=26.365358352661133
production_forward grad[27] vs paper_forward: mean_abs=0.7053101062774658, max_abs=4.5, mean_rel=0.35427579283714294, max_rel=2500.0, norm_rel=0.02328813448548317, ref_abs_avg=30.453622817993164, test_abs_avg=30.454204559326172
production_forward grad[28] vs paper_forward: mean_abs=0.6846031546592712, max_abs=4.5, mean_rel=0.2428843379020691, max_rel=2062.5, norm_rel=0.022819871082901955, ref_abs_avg=30.14556312561035, test_abs_avg=30.145675659179688
production_forward grad[29] vs paper_forward: mean_abs=0.5398566722869873, max_abs=2.25, mean_rel=0.08531219512224197, max_rel=6.311266899108887, norm_rel=0.02262718416750431, ref_abs_avg=23.63501739501953, test_abs_avg=23.62322235107422
production_forward grad[30] vs paper_forward: mean_abs=0.645942211151123, max_abs=4.75, mean_rel=0.3215712308883667, max_rel=2500.0, norm_rel=0.023708416149020195, ref_abs_avg=27.351669311523438, test_abs_avg=27.352460861206055
production_forward grad[31] vs paper_forward: mean_abs=0.6332442164421082, max_abs=4.3125, mean_rel=0.22005945444107056, max_rel=2312.5, norm_rel=0.023433763533830643, ref_abs_avg=27.157873153686523, test_abs_avg=27.161724090576172
production_forward grad[32] vs paper_forward: mean_abs=0.49519920349121094, max_abs=2.0712890625, mean_rel=0.14724169671535492, max_rel=24.99098014831543, norm_rel=0.022313477471470833, ref_abs_avg=22.01713752746582, test_abs_avg=22.034320831298828
production_forward grad[33] vs paper_forward: mean_abs=0.6057522296905518, max_abs=4.0, mean_rel=0.31018030643463135, max_rel=2375.0, norm_rel=0.02342873066663742, ref_abs_avg=25.953052520751953, test_abs_avg=25.9530029296875
production_forward grad[34] vs paper_forward: mean_abs=0.599031388759613, max_abs=4.125, mean_rel=0.22744707763195038, max_rel=2500.0, norm_rel=0.023476695641875267, ref_abs_avg=25.623046875, test_abs_avg=25.62411117553711
production_forward grad[35] vs paper_forward: mean_abs=0.46359843015670776, max_abs=1.5625, mean_rel=0.10810677707195282, max_rel=7.249507427215576, norm_rel=0.02346215210855007, ref_abs_avg=19.734241485595703, test_abs_avg=19.728185653686523
production_forward grad[36] vs paper_forward: mean_abs=0.567221999168396, max_abs=3.5, mean_rel=0.3250660300254822, max_rel=2437.5, norm_rel=0.02327856793999672, ref_abs_avg=24.456052780151367, test_abs_avg=24.45618438720703
production_forward grad[37] vs paper_forward: mean_abs=0.5538569688796997, max_abs=3.375, mean_rel=0.19365853071212769, max_rel=1593.7498779296875, norm_rel=0.023419663310050964, ref_abs_avg=23.72242546081543, test_abs_avg=23.721858978271484
production_forward grad[38] vs paper_forward: mean_abs=0.4427793025970459, max_abs=1.75, mean_rel=0.14697253704071045, max_rel=18.526025772094727, norm_rel=0.023436836898326874, ref_abs_avg=18.811264038085938, test_abs_avg=18.77511215209961
production_forward grad[39] vs paper_forward: mean_abs=0.5307730436325073, max_abs=3.5, mean_rel=0.3232519328594208, max_rel=1781.2498779296875, norm_rel=0.023201927542686462, ref_abs_avg=22.94829559326172, test_abs_avg=22.94908905029297
production_forward grad[40] vs paper_forward: mean_abs=0.520715594291687, max_abs=3.5, mean_rel=0.21305227279663086, max_rel=1781.2498779296875, norm_rel=0.022858310490846634, ref_abs_avg=22.805240631103516, test_abs_avg=22.80809783935547
production_forward grad[41] vs paper_forward: mean_abs=0.4146614074707031, max_abs=1.5625, mean_rel=0.07078533619642258, max_rel=5.656833171844482, norm_rel=0.0228054691106081, ref_abs_avg=18.579368591308594, test_abs_avg=18.569408416748047
production_forward grad[42] vs paper_forward: mean_abs=0.5034499764442444, max_abs=3.1875, mean_rel=0.3082931935787201, max_rel=1999.9998779296875, norm_rel=0.022953256964683533, ref_abs_avg=22.00307846069336, test_abs_avg=22.003673553466797
production_forward grad[43] vs paper_forward: mean_abs=0.5014567375183105, max_abs=3.0, mean_rel=0.19393381476402283, max_rel=1734.3748779296875, norm_rel=0.02280678041279316, ref_abs_avg=22.056079864501953, test_abs_avg=22.056734085083008
production_forward grad[44] vs paper_forward: mean_abs=0.40216174721717834, max_abs=1.5625, mean_rel=0.21223393082618713, max_rel=70.3034439086914, norm_rel=0.022519350051879883, ref_abs_avg=17.83018684387207, test_abs_avg=17.81210708618164
production_forward grad[45] vs paper_forward: mean_abs=0.4866086542606354, max_abs=3.25, mean_rel=0.3394656479358673, max_rel=1843.7498779296875, norm_rel=0.022691404446959496, ref_abs_avg=21.479015350341797, test_abs_avg=21.480350494384766
production_forward grad[46] vs paper_forward: mean_abs=0.4729144871234894, max_abs=3.0, mean_rel=0.20298103988170624, max_rel=1617.1873779296875, norm_rel=0.022614045068621635, ref_abs_avg=20.938274383544922, test_abs_avg=20.936992645263672
production_forward grad[47] vs paper_forward: mean_abs=0.36387181282043457, max_abs=1.75, mean_rel=0.07462526857852936, max_rel=3.331346273422241, norm_rel=0.021186428144574165, ref_abs_avg=17.084997177124023, test_abs_avg=17.105628967285156
production_forward grad[48] vs paper_forward: mean_abs=0.46148064732551575, max_abs=3.0625, mean_rel=0.2856382131576538, max_rel=2312.5, norm_rel=0.022552967071533203, ref_abs_avg=20.49856948852539, test_abs_avg=20.497406005859375
production_forward grad[49] vs paper_forward: mean_abs=0.4501930773258209, max_abs=3.0, mean_rel=0.20387470722198486, max_rel=1624.9998779296875, norm_rel=0.02223789505660534, ref_abs_avg=20.253211975097656, test_abs_avg=20.25273323059082
production_forward grad[50] vs paper_forward: mean_abs=0.42903995513916016, max_abs=1.8125, mean_rel=0.0739511251449585, max_rel=4.396850109100342, norm_rel=0.024414826184511185, ref_abs_avg=17.707799911499023, test_abs_avg=17.700679779052734
production_forward grad[51] vs paper_forward: mean_abs=0.519123911857605, max_abs=3.5, mean_rel=0.30389443039894104, max_rel=1562.4998779296875, norm_rel=0.023799574002623558, ref_abs_avg=21.862266540527344, test_abs_avg=21.862926483154297
production_forward grad[52] vs paper_forward: mean_abs=0.506606936454773, max_abs=3.125, mean_rel=0.2183467447757721, max_rel=1624.9998779296875, norm_rel=0.02365436963737011, ref_abs_avg=21.49928092956543, test_abs_avg=21.50278091430664
production_forward grad[53] vs paper_forward: mean_abs=0.38550084829330444, max_abs=1.890625, mean_rel=0.08178924024105072, max_rel=3.6888937950134277, norm_rel=0.023508284240961075, ref_abs_avg=16.75925064086914, test_abs_avg=16.77794647216797
production_forward grad[54] vs paper_forward: mean_abs=0.4725269675254822, max_abs=3.875, mean_rel=0.2714635729789734, max_rel=1687.4998779296875, norm_rel=0.02344287373125553, ref_abs_avg=20.187374114990234, test_abs_avg=20.187965393066406
production_forward grad[55] vs paper_forward: mean_abs=0.4695202112197876, max_abs=3.25, mean_rel=0.20390412211418152, max_rel=1749.9998779296875, norm_rel=0.0234746802598238, ref_abs_avg=20.025814056396484, test_abs_avg=20.02530860900879
production_forward grad[56] vs paper_forward: mean_abs=0.35165876150131226, max_abs=1.25, mean_rel=0.15059149265289307, max_rel=18.929372787475586, norm_rel=0.020970456302165985, ref_abs_avg=16.879745483398438, test_abs_avg=16.857940673828125
production_forward grad[57] vs paper_forward: mean_abs=0.4420493245124817, max_abs=3.25, mean_rel=0.2631356120109558, max_rel=1312.4998779296875, norm_rel=0.023077601566910744, ref_abs_avg=19.18609046936035, test_abs_avg=19.18801498413086
production_forward grad[58] vs paper_forward: mean_abs=0.4319431185722351, max_abs=2.875, mean_rel=0.19698399305343628, max_rel=1062.5, norm_rel=0.022553011775016785, ref_abs_avg=19.12338638305664, test_abs_avg=19.124380111694336
production_forward grad[59] vs paper_forward: mean_abs=0.3475315570831299, max_abs=1.375, mean_rel=0.09965429455041885, max_rel=4.721233367919922, norm_rel=0.02162993513047695, ref_abs_avg=15.405131340026855, test_abs_avg=15.451444625854492
production_forward grad[60] vs paper_forward: mean_abs=0.4163353443145752, max_abs=3.0, mean_rel=0.2625371813774109, max_rel=1843.7498779296875, norm_rel=0.02252628281712532, ref_abs_avg=18.498188018798828, test_abs_avg=18.497344970703125
production_forward grad[61] vs paper_forward: mean_abs=0.40562891960144043, max_abs=2.875, mean_rel=0.20750389993190765, max_rel=929.6874389648438, norm_rel=0.0225448627024889, ref_abs_avg=17.97370147705078, test_abs_avg=17.976383209228516
production_forward grad[62] vs paper_forward: mean_abs=0.3139653205871582, max_abs=1.25, mean_rel=0.08126741647720337, max_rel=6.807301044464111, norm_rel=0.021222906187176704, ref_abs_avg=14.73922348022461, test_abs_avg=14.758588790893555
production_forward grad[63] vs paper_forward: mean_abs=0.3908108174800873, max_abs=3.0, mean_rel=0.264749675989151, max_rel=1562.4998779296875, norm_rel=0.02214968018233776, ref_abs_avg=17.625429153442383, test_abs_avg=17.624975204467773
production_forward grad[64] vs paper_forward: mean_abs=0.385009765625, max_abs=2.875, mean_rel=0.2184847891330719, max_rel=2062.5, norm_rel=0.021907538175582886, ref_abs_avg=17.57745361328125, test_abs_avg=17.576091766357422
production_forward grad[65] vs paper_forward: mean_abs=0.31050968170166016, max_abs=1.25, mean_rel=0.09059859067201614, max_rel=4.108374118804932, norm_rel=0.022359242662787437, ref_abs_avg=13.536881446838379, test_abs_avg=13.536105155944824
production_forward grad[66] vs paper_forward: mean_abs=0.3692493736743927, max_abs=3.125, mean_rel=0.24535667896270752, max_rel=1718.7498779296875, norm_rel=0.021842414513230324, ref_abs_avg=16.915512084960938, test_abs_avg=16.915111541748047
production_forward grad[67] vs paper_forward: mean_abs=0.36083024740219116, max_abs=2.5, mean_rel=0.18249264359474182, max_rel=1187.5, norm_rel=0.021530626341700554, ref_abs_avg=16.734439849853516, test_abs_avg=16.726743698120117
production_forward grad[68] vs paper_forward: mean_abs=0.28034353256225586, max_abs=1.1875, mean_rel=0.08836999535560608, max_rel=5.885166645050049, norm_rel=0.020118573680520058, ref_abs_avg=13.921530723571777, test_abs_avg=13.900588989257812
production_forward grad[69] vs paper_forward: mean_abs=0.34996330738067627, max_abs=2.25, mean_rel=0.23021657764911652, max_rel=1187.5, norm_rel=0.02148791216313839, ref_abs_avg=16.294937133789062, test_abs_avg=16.29602813720703
production_forward grad[70] vs paper_forward: mean_abs=0.3473503589630127, max_abs=2.5, mean_rel=0.17621386051177979, max_rel=1499.9998779296875, norm_rel=0.021188538521528244, ref_abs_avg=16.39023208618164, test_abs_avg=16.397438049316406
production_forward grad[71] vs paper_forward: mean_abs=0.2782841920852661, max_abs=1.3125, mean_rel=0.446779727935791, max_rel=97.1871109008789, norm_rel=0.021536292508244514, ref_abs_avg=13.350378036499023, test_abs_avg=13.36163330078125
production_forward grad[72] vs paper_forward: mean_abs=0.33584728837013245, max_abs=3.375, mean_rel=0.2395259141921997, max_rel=1624.9998779296875, norm_rel=0.02111135795712471, ref_abs_avg=15.868847846984863, test_abs_avg=15.869150161743164
production_forward grad[73] vs paper_forward: mean_abs=0.3280464708805084, max_abs=2.5, mean_rel=0.17627862095832825, max_rel=999.9999389648438, norm_rel=0.020816951990127563, ref_abs_avg=15.744575500488281, test_abs_avg=15.74538803100586
production_forward grad[74] vs paper_forward: mean_abs=0.2883593440055847, max_abs=1.0625, mean_rel=0.16132092475891113, max_rel=26.615591049194336, norm_rel=0.021254224702715874, ref_abs_avg=13.607373237609863, test_abs_avg=13.62105941772461
production_forward grad[75] vs paper_forward: mean_abs=0.3618320822715759, max_abs=3.0, mean_rel=0.24306565523147583, max_rel=1281.25, norm_rel=0.022957611829042435, ref_abs_avg=15.783008575439453, test_abs_avg=15.784140586853027
production_forward grad[76] vs paper_forward: mean_abs=0.3528703451156616, max_abs=2.75, mean_rel=0.1798364222049713, max_rel=1078.125, norm_rel=0.02247016318142414, ref_abs_avg=15.695590019226074, test_abs_avg=15.697491645812988
production_forward grad[77] vs paper_forward: mean_abs=0.2867164611816406, max_abs=1.25, mean_rel=0.27833378314971924, max_rel=29.09609031677246, norm_rel=0.02230243757367134, ref_abs_avg=12.416604995727539, test_abs_avg=12.43720817565918
production_forward grad[78] vs paper_forward: mean_abs=0.3342609703540802, max_abs=3.21875, mean_rel=0.22155021131038666, max_rel=1062.5, norm_rel=0.02250860072672367, ref_abs_avg=14.831121444702148, test_abs_avg=14.830930709838867
production_forward grad[79] vs paper_forward: mean_abs=0.32792606949806213, max_abs=3.0, mean_rel=0.18068262934684753, max_rel=1273.4375, norm_rel=0.022260867059230804, ref_abs_avg=14.735480308532715, test_abs_avg=14.733787536621094
production_forward grad[80] vs paper_forward: mean_abs=0.2559542655944824, max_abs=1.28125, mean_rel=0.06902569532394409, max_rel=4.033360481262207, norm_rel=0.020749658346176147, ref_abs_avg=12.702856063842773, test_abs_avg=12.697870254516602
production_forward grad[81] vs paper_forward: mean_abs=0.3110562562942505, max_abs=2.75, mean_rel=0.24501623213291168, max_rel=1468.7498779296875, norm_rel=0.02171470783650875, ref_abs_avg=14.351104736328125, test_abs_avg=14.351478576660156
production_forward grad[82] vs paper_forward: mean_abs=0.29889702796936035, max_abs=3.0, mean_rel=0.1619434952735901, max_rel=1062.5, norm_rel=0.02065782994031906, ref_abs_avg=14.373897552490234, test_abs_avg=14.379171371459961
production_forward grad[83] vs paper_forward: mean_abs=0.24148344993591309, max_abs=0.9375, mean_rel=0.09242333471775055, max_rel=5.918840408325195, norm_rel=0.021349992603063583, ref_abs_avg=11.396480560302734, test_abs_avg=11.410261154174805
production_forward grad[84] vs paper_forward: mean_abs=0.2900714576244354, max_abs=2.4375, mean_rel=0.21592830121517181, max_rel=906.2499389648438, norm_rel=0.021320248022675514, ref_abs_avg=13.62846851348877, test_abs_avg=13.629850387573242
production_forward grad[85] vs paper_forward: mean_abs=0.2855831980705261, max_abs=2.75, mean_rel=0.1540326178073883, max_rel=679.6874389648438, norm_rel=0.021247321739792824, ref_abs_avg=13.475860595703125, test_abs_avg=13.48416519165039
production_forward grad[86] vs paper_forward: mean_abs=0.21966370940208435, max_abs=0.9375, mean_rel=0.6448442339897156, max_rel=236.93002319335938, norm_rel=0.019988970831036568, ref_abs_avg=11.46314525604248, test_abs_avg=11.468585014343262
production_forward grad[87] vs paper_forward: mean_abs=0.2718283534049988, max_abs=2.75, mean_rel=0.19846539199352264, max_rel=1187.5, norm_rel=0.020768404006958008, ref_abs_avg=13.120227813720703, test_abs_avg=13.121952056884766
production_forward grad[88] vs paper_forward: mean_abs=0.26712194085121155, max_abs=2.375, mean_rel=0.15793770551681519, max_rel=922.8806762695312, norm_rel=0.020934896543622017, ref_abs_avg=12.793086051940918, test_abs_avg=12.797964096069336
production_forward grad[89] vs paper_forward: mean_abs=0.22129391133785248, max_abs=0.84375, mean_rel=0.1014193445444107, max_rel=10.208209037780762, norm_rel=0.01952294632792473, ref_abs_avg=11.660257339477539, test_abs_avg=11.666555404663086
production_forward grad[90] vs paper_forward: mean_abs=0.26049163937568665, max_abs=2.75, mean_rel=0.19152992963790894, max_rel=1093.75, norm_rel=0.020320305600762367, ref_abs_avg=12.901397705078125, test_abs_avg=12.9028902053833
production_forward grad[91] vs paper_forward: mean_abs=0.252587229013443, max_abs=2.5, mean_rel=0.15348559617996216, max_rel=890.6249389648438, norm_rel=0.020340988412499428, ref_abs_avg=12.536197662353516, test_abs_avg=12.54094123840332
production_forward grad[92] vs paper_forward: mean_abs=0.20952963829040527, max_abs=0.78125, mean_rel=0.09147277474403381, max_rel=5.587708473205566, norm_rel=0.02055131457746029, ref_abs_avg=10.175165176391602, test_abs_avg=10.181870460510254
production_forward grad[93] vs paper_forward: mean_abs=0.2424703985452652, max_abs=2.5, mean_rel=0.17640739679336548, max_rel=820.3124389648438, norm_rel=0.019818780943751335, ref_abs_avg=12.352249145507812, test_abs_avg=12.353897094726562
production_forward grad[94] vs paper_forward: mean_abs=0.23718081414699554, max_abs=2.6875, mean_rel=0.13520732522010803, max_rel=687.4999389648438, norm_rel=0.0194686409085989, ref_abs_avg=12.359708786010742, test_abs_avg=12.355003356933594
production_forward grad[95] vs paper_forward: mean_abs=0.20438289642333984, max_abs=0.75, mean_rel=0.07564788311719894, max_rel=2.9142212867736816, norm_rel=0.020220335572957993, ref_abs_avg=10.042418479919434, test_abs_avg=10.039766311645508
production_forward grad[96] vs paper_forward: mean_abs=0.2337367683649063, max_abs=2.3125, mean_rel=0.17572468519210815, max_rel=1187.5, norm_rel=0.019636966288089752, ref_abs_avg=12.094030380249023, test_abs_avg=12.095294952392578
production_forward grad[97] vs paper_forward: mean_abs=0.22482500970363617, max_abs=3.0, mean_rel=0.12904760241508484, max_rel=593.75, norm_rel=0.0193239226937294, ref_abs_avg=11.880948066711426, test_abs_avg=11.8790922164917
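The per-gradient lines above all report the same seven statistics (`mean_abs`, `max_abs`, `mean_rel`, `max_rel`, `norm_rel`, `ref_abs_avg`, `test_abs_avg`). The exact definitions used by the comparison script are not shown in the log; the sketch below is a plausible reconstruction under common assumptions: absolute-difference statistics, element-wise relative error against `|ref|` (which explains the huge `max_rel` values wherever a reference element is near zero), and an error-norm-to-reference-norm ratio for `norm_rel`. The function name `compare_tensors` is hypothetical.

```python
import torch

def compare_tensors(ref: torch.Tensor, test: torch.Tensor) -> dict:
    """Plausible reconstruction of the metrics in the log lines above.

    Assumptions (not confirmed by the log): relative error is element-wise
    against |ref| with a tiny clamp to avoid division by zero, and
    norm_rel is ||test - ref|| / ||ref||.
    """
    ref = ref.float()
    test = test.float()
    diff = (test - ref).abs()
    # element-wise relative error; near-zero reference entries produce
    # the very large max_rel values seen in the log
    rel = diff / ref.abs().clamp_min(1e-12)
    return {
        "mean_abs": diff.mean().item(),
        "max_abs": diff.max().item(),
        "mean_rel": rel.mean().item(),
        "max_rel": rel.max().item(),
        # scale-free summary: error norm relative to reference norm
        "norm_rel": (diff.norm() / ref.norm().clamp_min(1e-12)).item(),
        "ref_abs_avg": ref.abs().mean().item(),
        "test_abs_avg": test.abs().mean().item(),
    }
```

Note that `norm_rel` stays near 0.02 throughout the log even where `max_rel` exceeds 1000, which is consistent with the interpretation above: a norm ratio is dominated by the large-magnitude entries, while element-wise relative error blows up on near-zero reference elements.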
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016181794926524162, max_abs=0.031982421875
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008420441299676895, max_abs=0.375, mean_rel=0.07319074869155884, max_rel=118.19507598876953, norm_rel=0.019901873543858528, ref_abs_avg=0.45513755083084106, test_abs_avg=0.45514506101608276
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=5.052704334259033, max_abs=48.0, mean_rel=0.16302619874477386, max_rel=290.0380554199219, norm_rel=0.020093800500035286, ref_abs_avg=220.91897583007812, test_abs_avg=220.83615112304688
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=0.858479380607605, max_abs=3.25, mean_rel=0.5372728109359741, max_rel=117.8689193725586, norm_rel=0.02300412952899933, ref_abs_avg=37.39579772949219, test_abs_avg=37.34455108642578
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.0528557300567627, max_abs=7.875, mean_rel=0.4597163200378418, max_rel=3843.749755859375, norm_rel=0.023319827392697334, ref_abs_avg=45.413307189941406, test_abs_avg=45.41521453857422
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.0315136909484863, max_abs=8.0, mean_rel=0.3014986515045166, max_rel=3609.374755859375, norm_rel=0.023126786574721336, ref_abs_avg=44.841949462890625, test_abs_avg=44.844730377197266
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=0.7823657989501953, max_abs=3.2392578125, mean_rel=0.11061038076877594, max_rel=16.257890701293945, norm_rel=0.02474748156964779, ref_abs_avg=31.152013778686523, test_abs_avg=31.14769744873047
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=0.9326424598693848, max_abs=6.4375, mean_rel=0.48178794980049133, max_rel=3437.499755859375, norm_rel=0.02316517010331154, ref_abs_avg=40.50524139404297, test_abs_avg=40.50737762451172
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=0.9094710946083069, max_abs=5.59375, mean_rel=0.29217350482940674, max_rel=3281.249755859375, norm_rel=0.022811977192759514, ref_abs_avg=40.084922790527344, test_abs_avg=40.08594512939453
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.7316417694091797, max_abs=2.859375, mean_rel=0.07902193069458008, max_rel=3.9312005043029785, norm_rel=0.024772029370069504, ref_abs_avg=29.721181869506836, test_abs_avg=29.684520721435547
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=0.8445594310760498, max_abs=5.5, mean_rel=0.3971824645996094, max_rel=3062.499755859375, norm_rel=0.02286713384091854, ref_abs_avg=37.17271423339844, test_abs_avg=37.17241668701172
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=0.8243649005889893, max_abs=5.5, mean_rel=0.22069212794303894, max_rel=2546.875, norm_rel=0.02244090475142002, ref_abs_avg=36.91738510131836, test_abs_avg=36.91632080078125
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.6290135383605957, max_abs=2.38671875, mean_rel=0.09250976145267487, max_rel=7.709195613861084, norm_rel=0.021704617887735367, ref_abs_avg=28.961830139160156, test_abs_avg=28.988086700439453
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=0.7802910208702087, max_abs=4.9375, mean_rel=0.4166554808616638, max_rel=3124.999755859375, norm_rel=0.022648748010396957, ref_abs_avg=34.61470031738281, test_abs_avg=34.61614227294922
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=0.7608820199966431, max_abs=4.5, mean_rel=0.2590652108192444, max_rel=2624.999755859375, norm_rel=0.022211063653230667, ref_abs_avg=34.41101837158203, test_abs_avg=34.41257858276367
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.5871272087097168, max_abs=2.5, mean_rel=0.07799670100212097, max_rel=3.147355318069458, norm_rel=0.022708801552653313, ref_abs_avg=25.978391647338867, test_abs_avg=25.916492462158203
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=0.7264968752861023, max_abs=5.375, mean_rel=0.37841370701789856, max_rel=3249.999755859375, norm_rel=0.022457309067249298, ref_abs_avg=32.56072998046875, test_abs_avg=32.56120681762695
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.7078028917312622, max_abs=4.0, mean_rel=0.22501125931739807, max_rel=2874.999755859375, norm_rel=0.022119389846920967, ref_abs_avg=32.15374755859375, test_abs_avg=32.156150817871094
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.5545461177825928, max_abs=2.5, mean_rel=0.14720603823661804, max_rel=16.101194381713867, norm_rel=0.021596619859337807, ref_abs_avg=25.502288818359375, test_abs_avg=25.5159912109375
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.6832891702651978, max_abs=4.28125, mean_rel=0.3456072211265564, max_rel=2375.0, norm_rel=0.022264286875724792, ref_abs_avg=30.84798240661621, test_abs_avg=30.846797943115234
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.6637058258056641, max_abs=4.625, mean_rel=0.25600194931030273, max_rel=1874.9998779296875, norm_rel=0.021820012480020523, ref_abs_avg=30.54539680480957, test_abs_avg=30.544113159179688
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.5107536315917969, max_abs=1.875, mean_rel=0.07611867785453796, max_rel=2.3615241050720215, norm_rel=0.021116655319929123, ref_abs_avg=24.271549224853516, test_abs_avg=24.311988830566406
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.6468952894210815, max_abs=4.125, mean_rel=0.3273559808731079, max_rel=2906.249755859375, norm_rel=0.0221414752304554, ref_abs_avg=29.36042022705078, test_abs_avg=29.358518600463867
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.6301339864730835, max_abs=3.796875, mean_rel=0.22831255197525024, max_rel=1749.9998779296875, norm_rel=0.02180521935224533, ref_abs_avg=29.033206939697266, test_abs_avg=29.032058715820312
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.4784870147705078, max_abs=2.0625, mean_rel=0.07852671295404434, max_rel=4.6361918449401855, norm_rel=0.0201810784637928, ref_abs_avg=23.873512268066406, test_abs_avg=23.877296447753906
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.6152671575546265, max_abs=4.0, mean_rel=0.32783323526382446, max_rel=2312.5, norm_rel=0.021995263174176216, ref_abs_avg=28.10967445373535, test_abs_avg=28.109142303466797
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.5975382924079895, max_abs=3.75, mean_rel=0.2217300981283188, max_rel=2375.0, norm_rel=0.021501103416085243, ref_abs_avg=27.909358978271484, test_abs_avg=27.912599563598633
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.6119251251220703, max_abs=2.5625, mean_rel=0.06461472809314728, max_rel=2.620255947113037, norm_rel=0.02326364815235138, ref_abs_avg=26.34033966064453, test_abs_avg=26.396371841430664
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=0.7256133556365967, max_abs=4.75, mean_rel=0.3726520538330078, max_rel=2687.499755859375, norm_rel=0.02392667531967163, ref_abs_avg=30.453622817993164, test_abs_avg=30.45357894897461
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.7068469524383545, max_abs=4.25, mean_rel=0.2606738805770874, max_rel=2968.749755859375, norm_rel=0.0235622338950634, ref_abs_avg=30.14556312561035, test_abs_avg=30.142484664916992
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.5369174480438232, max_abs=2.27734375, mean_rel=0.10062115639448166, max_rel=10.576883316040039, norm_rel=0.022433573380112648, ref_abs_avg=23.63501739501953, test_abs_avg=23.632091522216797
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.6607054471969604, max_abs=4.75, mean_rel=0.3373013734817505, max_rel=2281.25, norm_rel=0.024247488006949425, ref_abs_avg=27.351669311523438, test_abs_avg=27.35110092163086
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.6475683450698853, max_abs=4.0, mean_rel=0.21379578113555908, max_rel=2093.75, norm_rel=0.02395023964345455, ref_abs_avg=27.157873153686523, test_abs_avg=27.15821075439453
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.5089535713195801, max_abs=2.2275390625, mean_rel=0.16089072823524475, max_rel=25.7125244140625, norm_rel=0.02271934039890766, ref_abs_avg=22.01713752746582, test_abs_avg=22.026689529418945
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.6186178922653198, max_abs=4.0, mean_rel=0.3251696228981018, max_rel=1999.9998779296875, norm_rel=0.023932091891765594, ref_abs_avg=25.953052520751953, test_abs_avg=25.952489852905273
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.6115027070045471, max_abs=4.15625, mean_rel=0.23298805952072144, max_rel=2343.75, norm_rel=0.02394801937043667, ref_abs_avg=25.623046875, test_abs_avg=25.62246322631836
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.4568476676940918, max_abs=1.5, mean_rel=0.12091030180454254, max_rel=8.37486457824707, norm_rel=0.023263264447450638, ref_abs_avg=19.734241485595703, test_abs_avg=19.74005699157715
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.5781098008155823, max_abs=4.25, mean_rel=0.33797192573547363, max_rel=1999.9998779296875, norm_rel=0.023726483806967735, ref_abs_avg=24.456052780151367, test_abs_avg=24.456270217895508
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.5650991201400757, max_abs=3.375, mean_rel=0.214867502450943, max_rel=1874.9998779296875, norm_rel=0.023877494037151337, ref_abs_avg=23.72242546081543, test_abs_avg=23.72241973876953
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.4435079097747803, max_abs=1.875, mean_rel=0.11702679842710495, max_rel=16.62520408630371, norm_rel=0.023461826145648956, ref_abs_avg=18.811264038085938, test_abs_avg=18.79386329650879
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.5404434204101562, max_abs=3.5, mean_rel=0.3262141942977905, max_rel=2078.125, norm_rel=0.023626523092389107, ref_abs_avg=22.94829559326172, test_abs_avg=22.94829750061035
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.5311462879180908, max_abs=3.5, mean_rel=0.2271306812763214, max_rel=1789.0623779296875, norm_rel=0.02331165038049221, ref_abs_avg=22.805240631103516, test_abs_avg=22.807125091552734
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.4294929504394531, max_abs=2.0, mean_rel=0.06867344677448273, max_rel=3.579342842102051, norm_rel=0.023580733686685562, ref_abs_avg=18.579368591308594, test_abs_avg=18.565269470214844
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.5115690231323242, max_abs=3.15625, mean_rel=0.308933824300766, max_rel=2062.5, norm_rel=0.023320097476243973, ref_abs_avg=22.00307846069336, test_abs_avg=22.004291534423828
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.5078476667404175, max_abs=3.5, mean_rel=0.18800786137580872, max_rel=1828.1248779296875, norm_rel=0.023100141435861588, ref_abs_avg=22.056079864501953, test_abs_avg=22.058006286621094
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.40414875745773315, max_abs=1.5, mean_rel=0.11449293792247772, max_rel=14.69343090057373, norm_rel=0.022551346570253372, ref_abs_avg=17.83018684387207, test_abs_avg=17.8067569732666
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.49382269382476807, max_abs=3.125, mean_rel=0.34790316224098206, max_rel=1906.2498779296875, norm_rel=0.023036498576402664, ref_abs_avg=21.479015350341797, test_abs_avg=21.48040008544922
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.4820595681667328, max_abs=3.0, mean_rel=0.2110150009393692, max_rel=1734.3748779296875, norm_rel=0.023038776591420174, ref_abs_avg=20.938274383544922, test_abs_avg=20.937395095825195
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.3843876123428345, max_abs=1.5, mean_rel=0.1011798158288002, max_rel=13.510018348693848, norm_rel=0.022047460079193115, ref_abs_avg=17.084997177124023, test_abs_avg=17.10630989074707
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.4683513045310974, max_abs=3.25, mean_rel=0.2898663282394409, max_rel=2046.8748779296875, norm_rel=0.022876249626278877, ref_abs_avg=20.49856948852539, test_abs_avg=20.497304916381836
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.4581559896469116, max_abs=3.1171875, mean_rel=0.20134860277175903, max_rel=1874.9998779296875, norm_rel=0.022635551169514656, ref_abs_avg=20.253211975097656, test_abs_avg=20.24990463256836
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.4427001476287842, max_abs=1.6875, mean_rel=0.07519324123859406, max_rel=2.9148662090301514, norm_rel=0.024953534826636314, ref_abs_avg=17.707799911499023, test_abs_avg=17.68985366821289
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.5276829600334167, max_abs=3.5, mean_rel=0.30911222100257874, max_rel=1812.4998779296875, norm_rel=0.024195948615670204, ref_abs_avg=21.862266540527344, test_abs_avg=21.862945556640625
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.5152114629745483, max_abs=3.125, mean_rel=0.2304275631904602, max_rel=1999.9998779296875, norm_rel=0.02404172159731388, ref_abs_avg=21.49928092956543, test_abs_avg=21.499330520629883
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.39411258697509766, max_abs=2.109375, mean_rel=0.08896812796592712, max_rel=7.405947208404541, norm_rel=0.02428894303739071, ref_abs_avg=16.75925064086914, test_abs_avg=16.781848907470703
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.4805471897125244, max_abs=3.5, mean_rel=0.27266138792037964, max_rel=1437.4998779296875, norm_rel=0.023834368214011192, ref_abs_avg=20.187374114990234, test_abs_avg=20.186195373535156
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.4768329858779907, max_abs=3.25, mean_rel=0.2001539021730423, max_rel=1468.7498779296875, norm_rel=0.023826543241739273, ref_abs_avg=20.025814056396484, test_abs_avg=20.026639938354492
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.3474847078323364, max_abs=1.3125, mean_rel=0.18108724057674408, max_rel=39.792240142822266, norm_rel=0.02103925310075283, ref_abs_avg=16.879745483398438, test_abs_avg=16.844493865966797
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.44793784618377686, max_abs=3.375, mean_rel=0.27206647396087646, max_rel=1281.25, norm_rel=0.023381248116493225, ref_abs_avg=19.18609046936035, test_abs_avg=19.187519073486328
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.4390849173069, max_abs=3.0, mean_rel=0.21538421511650085, max_rel=1234.375, norm_rel=0.02292495407164097, ref_abs_avg=19.12338638305664, test_abs_avg=19.125370025634766
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.3392634391784668, max_abs=1.375, mean_rel=0.10000831633806229, max_rel=7.382925033569336, norm_rel=0.021304121240973473, ref_abs_avg=15.405131340026855, test_abs_avg=15.450162887573242
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.42141008377075195, max_abs=3.0, mean_rel=0.2581212818622589, max_rel=1874.9998779296875, norm_rel=0.022796152159571648, ref_abs_avg=18.498188018798828, test_abs_avg=18.496726989746094
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.4096211791038513, max_abs=2.6875, mean_rel=0.20415040850639343, max_rel=1562.4998779296875, norm_rel=0.02276262454688549, ref_abs_avg=17.97370147705078, test_abs_avg=17.977191925048828
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.3240528106689453, max_abs=1.375, mean_rel=0.09264303743839264, max_rel=9.47309684753418, norm_rel=0.02189246378839016, ref_abs_avg=14.73922348022461, test_abs_avg=14.76253890991211
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.39560723304748535, max_abs=3.0, mean_rel=0.2740938663482666, max_rel=1937.4998779296875, norm_rel=0.022413410246372223, ref_abs_avg=17.625429153442383, test_abs_avg=17.624656677246094
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.39082473516464233, max_abs=2.625, mean_rel=0.21860206127166748, max_rel=2250.0, norm_rel=0.022249847650527954, ref_abs_avg=17.57745361328125, test_abs_avg=17.57525062561035
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.3136310577392578, max_abs=1.1875, mean_rel=0.100099116563797, max_rel=3.6324572563171387, norm_rel=0.022377679124474525, ref_abs_avg=13.536881446838379, test_abs_avg=13.53869915008545
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.3736630976200104, max_abs=3.375, mean_rel=0.24378836154937744, max_rel=1312.4998779296875, norm_rel=0.022074539214372635, ref_abs_avg=16.915512084960938, test_abs_avg=16.914907455444336
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.36613649129867554, max_abs=2.75, mean_rel=0.17855435609817505, max_rel=1187.5, norm_rel=0.021831568330526352, ref_abs_avg=16.734439849853516, test_abs_avg=16.726852416992188
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.2858247756958008, max_abs=1.0, mean_rel=0.08715423941612244, max_rel=5.5899553298950195, norm_rel=0.020398087799549103, ref_abs_avg=13.921530723571777, test_abs_avg=13.910181045532227
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.35389626026153564, max_abs=2.375, mean_rel=0.22701159119606018, max_rel=1125.0, norm_rel=0.021718144416809082, ref_abs_avg=16.294937133789062, test_abs_avg=16.295555114746094
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.35096409916877747, max_abs=3.0, mean_rel=0.17780062556266785, max_rel=1125.0, norm_rel=0.021401599049568176, ref_abs_avg=16.39023208618164, test_abs_avg=16.400182723999023
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.2827250063419342, max_abs=1.1875, mean_rel=0.5440053939819336, max_rel=122.19792175292969, norm_rel=0.021601004526019096, ref_abs_avg=13.350378036499023, test_abs_avg=13.355846405029297
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.3383997678756714, max_abs=3.5, mean_rel=0.23367831110954285, max_rel=1187.5, norm_rel=0.021264396607875824, ref_abs_avg=15.868847846984863, test_abs_avg=15.869341850280762
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.33095782995224, max_abs=2.375, mean_rel=0.1788141429424286, max_rel=921.8749389648438, norm_rel=0.02100575715303421, ref_abs_avg=15.744575500488281, test_abs_avg=15.743617057800293
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.30278700590133667, max_abs=1.125, mean_rel=0.21845358610153198, max_rel=34.270416259765625, norm_rel=0.022685818374156952, ref_abs_avg=13.607373237609863, test_abs_avg=13.609090805053711
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.36723989248275757, max_abs=2.75, mean_rel=0.2459825575351715, max_rel=1593.7498779296875, norm_rel=0.023280462250113487, ref_abs_avg=15.783008575439453, test_abs_avg=15.782180786132812
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.3579269051551819, max_abs=2.625, mean_rel=0.187599316239357, max_rel=968.7499389648438, norm_rel=0.022755805402994156, ref_abs_avg=15.695590019226074, test_abs_avg=15.695127487182617
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.28946077823638916, max_abs=1.1875, mean_rel=0.2101406455039978, max_rel=28.030786514282227, norm_rel=0.02252236381173134, ref_abs_avg=12.416604995727539, test_abs_avg=12.445413589477539
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.3382546603679657, max_abs=3.3125, mean_rel=0.22910623252391815, max_rel=1109.375, norm_rel=0.022769462317228317, ref_abs_avg=14.831121444702148, test_abs_avg=14.829667091369629
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.33459192514419556, max_abs=3.0, mean_rel=0.18233580887317657, max_rel=1453.1248779296875, norm_rel=0.022696873173117638, ref_abs_avg=14.735480308532715, test_abs_avg=14.733620643615723
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.25574302673339844, max_abs=1.0, mean_rel=0.07020720094442368, max_rel=5.9985456466674805, norm_rel=0.020808517932891846, ref_abs_avg=12.702856063842773, test_abs_avg=12.695892333984375
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.31468480825424194, max_abs=2.625, mean_rel=0.253233402967453, max_rel=1437.4998779296875, norm_rel=0.021940333768725395, ref_abs_avg=14.351104736328125, test_abs_avg=14.34982967376709
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.30394071340560913, max_abs=2.5, mean_rel=0.16093409061431885, max_rel=874.9999389648438, norm_rel=0.02097773738205433, ref_abs_avg=14.373897552490234, test_abs_avg=14.375737190246582
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.2486894130706787, max_abs=1.0, mean_rel=0.09646951407194138, max_rel=9.281558990478516, norm_rel=0.021624337881803513, ref_abs_avg=11.396480560302734, test_abs_avg=11.414304733276367
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.2929171621799469, max_abs=2.375, mean_rel=0.21424002945423126, max_rel=1390.6248779296875, norm_rel=0.02152554877102375, ref_abs_avg=13.62846851348877, test_abs_avg=13.629036903381348
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.2882874011993408, max_abs=2.75, mean_rel=0.153170645236969, max_rel=683.5936889648438, norm_rel=0.021393366158008575, ref_abs_avg=13.475860595703125, test_abs_avg=13.483207702636719
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.22942456603050232, max_abs=0.9375, mean_rel=0.6443466544151306, max_rel=229.2245330810547, norm_rel=0.02073494717478752, ref_abs_avg=11.46314525604248, test_abs_avg=11.473411560058594
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.27448004484176636, max_abs=3.0, mean_rel=0.19577831029891968, max_rel=1468.7498779296875, norm_rel=0.020986022427678108, ref_abs_avg=13.120227813720703, test_abs_avg=13.121016502380371
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.26869285106658936, max_abs=2.4375, mean_rel=0.1474037766456604, max_rel=841.2479858398438, norm_rel=0.021121861413121223, ref_abs_avg=12.793086051940918, test_abs_avg=12.799783706665039
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.2249019593000412, max_abs=0.78662109375, mean_rel=0.19413521885871887, max_rel=43.666290283203125, norm_rel=0.01984587498009205, ref_abs_avg=11.660257339477539, test_abs_avg=11.65919303894043
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.26241064071655273, max_abs=3.0, mean_rel=0.19464659690856934, max_rel=1187.5, norm_rel=0.02046411857008934, ref_abs_avg=12.901397705078125, test_abs_avg=12.90185260772705
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.2523933947086334, max_abs=2.75, mean_rel=0.15296411514282227, max_rel=1031.25, norm_rel=0.02030457742512226, ref_abs_avg=12.536197662353516, test_abs_avg=12.540443420410156
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.2076406627893448, max_abs=0.875, mean_rel=0.09016457200050354, max_rel=6.06734037399292, norm_rel=0.020595069974660873, ref_abs_avg=10.175165176391602, test_abs_avg=10.192729949951172
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.2437887042760849, max_abs=2.5, mean_rel=0.17685666680335999, max_rel=999.9999389648438, norm_rel=0.019921589642763138, ref_abs_avg=12.352249145507812, test_abs_avg=12.352937698364258
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.23818159103393555, max_abs=2.65625, mean_rel=0.1366395354270935, max_rel=968.7499389648438, norm_rel=0.019522182643413544, ref_abs_avg=12.359708786010742, test_abs_avg=12.353797912597656
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.20459318161010742, max_abs=0.75, mean_rel=0.08194688707590103, max_rel=3.20564341545105, norm_rel=0.020451653748750687, ref_abs_avg=10.042418479919434, test_abs_avg=10.036016464233398
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.23512297868728638, max_abs=2.4375, mean_rel=0.1702626347541809, max_rel=1187.5, norm_rel=0.01976202428340912, ref_abs_avg=12.094030380249023, test_abs_avg=12.094766616821289
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.2252025157213211, max_abs=2.5, mean_rel=0.1322275698184967, max_rel=609.375, norm_rel=0.019237833097577095, ref_abs_avg=11.880948066711426, test_abs_avg=11.874992370605469
liger_forward vs paper_forward output: mean_abs=0.0003822897851932794, max_abs=0.02734375
liger_forward grad[0] vs paper_forward: mean_abs=0.004624573513865471, max_abs=0.25, mean_rel=0.03607728332281113, max_rel=57.86094665527344, norm_rel=0.012271609157323837, ref_abs_avg=0.45513755083084106, test_abs_avg=0.45512014627456665
liger_forward grad[1] vs paper_forward: mean_abs=2.433082342147827, max_abs=16.75, mean_rel=0.09026651829481125, max_rel=260.1099853515625, norm_rel=0.00995609164237976, ref_abs_avg=220.91897583007812, test_abs_avg=220.94369506835938
liger_forward grad[2] vs paper_forward: mean_abs=0.46754440665245056, max_abs=2.0, mean_rel=0.33468762040138245, max_rel=118.331787109375, norm_rel=0.012935002334415913, ref_abs_avg=37.39579772949219, test_abs_avg=37.397491455078125
liger_forward grad[3] vs paper_forward: mean_abs=0.5888388156890869, max_abs=4.125, mean_rel=0.25078579783439636, max_rel=2125.0, norm_rel=0.013203863054513931, ref_abs_avg=45.413307189941406, test_abs_avg=45.41435241699219
liger_forward grad[4] vs paper_forward: mean_abs=0.5705884695053101, max_abs=3.75, mean_rel=0.1591716706752777, max_rel=1687.4998779296875, norm_rel=0.012946059927344322, ref_abs_avg=44.841949462890625, test_abs_avg=44.841976165771484
liger_forward grad[5] vs paper_forward: mean_abs=0.4200878143310547, max_abs=2.0, mean_rel=0.04932049289345741, max_rel=1.9850605726242065, norm_rel=0.013532212935388088, ref_abs_avg=31.152013778686523, test_abs_avg=31.150028228759766
liger_forward grad[6] vs paper_forward: mean_abs=0.5138000249862671, max_abs=3.5, mean_rel=0.24901440739631653, max_rel=1874.9998779296875, norm_rel=0.012926307506859303, ref_abs_avg=40.50524139404297, test_abs_avg=40.50469207763672
liger_forward grad[7] vs paper_forward: mean_abs=0.49684929847717285, max_abs=3.625, mean_rel=0.13611246645450592, max_rel=1796.8748779296875, norm_rel=0.01264062151312828, ref_abs_avg=40.084922790527344, test_abs_avg=40.08250427246094
liger_forward grad[8] vs paper_forward: mean_abs=0.3934030532836914, max_abs=1.25, mean_rel=0.04423471540212631, max_rel=2.4077138900756836, norm_rel=0.013416201807558537, ref_abs_avg=29.721181869506836, test_abs_avg=29.697864532470703
liger_forward grad[9] vs paper_forward: mean_abs=0.4613674283027649, max_abs=3.0, mean_rel=0.21929273009300232, max_rel=1562.4998779296875, norm_rel=0.012659784406423569, ref_abs_avg=37.17271423339844, test_abs_avg=37.17267608642578
liger_forward grad[10] vs paper_forward: mean_abs=0.44674152135849, max_abs=3.4375, mean_rel=0.1148323267698288, max_rel=874.9999389648438, norm_rel=0.012339232489466667, ref_abs_avg=36.91738510131836, test_abs_avg=36.91746520996094
liger_forward grad[11] vs paper_forward: mean_abs=0.34731054306030273, max_abs=1.513671875, mean_rel=0.07368100434541702, max_rel=6.102583885192871, norm_rel=0.012349439784884453, ref_abs_avg=28.961830139160156, test_abs_avg=28.989694595336914
liger_forward grad[12] vs paper_forward: mean_abs=0.4227398633956909, max_abs=2.5, mean_rel=0.22478409111499786, max_rel=1499.9998779296875, norm_rel=0.012455444782972336, ref_abs_avg=34.61470031738281, test_abs_avg=34.615234375
liger_forward grad[13] vs paper_forward: mean_abs=0.40898823738098145, max_abs=2.5, mean_rel=0.1502733826637268, max_rel=1437.4998779296875, norm_rel=0.012116840109229088, ref_abs_avg=34.41101837158203, test_abs_avg=34.41522216796875
liger_forward grad[14] vs paper_forward: mean_abs=0.3245064914226532, max_abs=1.25, mean_rel=0.046504855155944824, max_rel=4.046762466430664, norm_rel=0.012976061552762985, ref_abs_avg=25.978391647338867, test_abs_avg=25.992046356201172
liger_forward grad[15] vs paper_forward: mean_abs=0.3905942440032959, max_abs=2.5, mean_rel=0.1907399445772171, max_rel=1374.9998779296875, norm_rel=0.012239642441272736, ref_abs_avg=32.56072998046875, test_abs_avg=32.56189727783203
liger_forward grad[16] vs paper_forward: mean_abs=0.37832528352737427, max_abs=2.5, mean_rel=0.11711431294679642, max_rel=999.9999389648438, norm_rel=0.011991148814558983, ref_abs_avg=32.15374755859375, test_abs_avg=32.155609130859375
liger_forward grad[17] vs paper_forward: mean_abs=0.3091866970062256, max_abs=1.25, mean_rel=0.07437358051538467, max_rel=6.840376853942871, norm_rel=0.011885383166372776, ref_abs_avg=25.502288818359375, test_abs_avg=25.500370025634766
liger_forward grad[18] vs paper_forward: mean_abs=0.3652400076389313, max_abs=2.25, mean_rel=0.18629145622253418, max_rel=1624.9998779296875, norm_rel=0.012092811986804008, ref_abs_avg=30.84798240661621, test_abs_avg=30.84836196899414
liger_forward grad[19] vs paper_forward: mean_abs=0.3531234860420227, max_abs=2.125, mean_rel=0.12856857478618622, max_rel=874.9999389648438, norm_rel=0.011783057823777199, ref_abs_avg=30.54539680480957, test_abs_avg=30.542743682861328
liger_forward grad[20] vs paper_forward: mean_abs=0.28516387939453125, max_abs=1.15625, mean_rel=0.0476626493036747, max_rel=3.2479805946350098, norm_rel=0.012008801102638245, ref_abs_avg=24.271549224853516, test_abs_avg=24.247447967529297
liger_forward grad[21] vs paper_forward: mean_abs=0.34406524896621704, max_abs=2.5, mean_rel=0.1606447398662567, max_rel=1250.0, norm_rel=0.011964979581534863, ref_abs_avg=29.36042022705078, test_abs_avg=29.35966682434082
liger_forward grad[22] vs paper_forward: mean_abs=0.3318841755390167, max_abs=2.0, mean_rel=0.11807844042778015, max_rel=843.7499389648438, norm_rel=0.01166288647800684, ref_abs_avg=29.033206939697266, test_abs_avg=29.031448364257812
liger_forward grad[23] vs paper_forward: mean_abs=0.2601010203361511, max_abs=1.0, mean_rel=0.042844366282224655, max_rel=2.265363931655884, norm_rel=0.011196121573448181, ref_abs_avg=23.873512268066406, test_abs_avg=23.892635345458984
liger_forward grad[24] vs paper_forward: mean_abs=0.3255824148654938, max_abs=2.25, mean_rel=0.1770596206188202, max_rel=1374.9998779296875, norm_rel=0.011829840019345284, ref_abs_avg=28.10967445373535, test_abs_avg=28.109619140625
liger_forward grad[25] vs paper_forward: mean_abs=0.31372883915901184, max_abs=1.75, mean_rel=0.11349020898342133, max_rel=937.4999389648438, norm_rel=0.011474561877548695, ref_abs_avg=27.909358978271484, test_abs_avg=27.90964126586914
liger_forward grad[26] vs paper_forward: mean_abs=0.2875504493713379, max_abs=1.0, mean_rel=0.03244038671255112, max_rel=0.9050060510635376, norm_rel=0.011105872690677643, ref_abs_avg=26.34033966064453, test_abs_avg=26.340784072875977
liger_forward grad[27] vs paper_forward: mean_abs=0.365744948387146, max_abs=2.625, mean_rel=0.1696006953716278, max_rel=1374.9998779296875, norm_rel=0.01225039828568697, ref_abs_avg=30.453622817993164, test_abs_avg=30.45226287841797
liger_forward grad[28] vs paper_forward: mean_abs=0.355404794216156, max_abs=2.25, mean_rel=0.13012956082820892, max_rel=968.7499389648438, norm_rel=0.012028980068862438, ref_abs_avg=30.14556312561035, test_abs_avg=30.14556121826172
liger_forward grad[29] vs paper_forward: mean_abs=0.2507479190826416, max_abs=1.0, mean_rel=0.04687100648880005, max_rel=3.197493076324463, norm_rel=0.010917718522250652, ref_abs_avg=23.63501739501953, test_abs_avg=23.625396728515625
liger_forward grad[30] vs paper_forward: mean_abs=0.3208892047405243, max_abs=2.0, mean_rel=0.1528729647397995, max_rel=1062.5, norm_rel=0.01198074221611023, ref_abs_avg=27.351669311523438, test_abs_avg=27.35077476501465
liger_forward grad[31] vs paper_forward: mean_abs=0.3124169111251831, max_abs=2.0, mean_rel=0.11063484102487564, max_rel=921.8749389648438, norm_rel=0.011747492477297783, ref_abs_avg=27.157873153686523, test_abs_avg=27.157642364501953
liger_forward grad[32] vs paper_forward: mean_abs=0.2387866973876953, max_abs=1.0, mean_rel=0.07251083105802536, max_rel=11.822800636291504, norm_rel=0.010975144803524017, ref_abs_avg=22.01713752746582, test_abs_avg=22.01481056213379
liger_forward grad[33] vs paper_forward: mean_abs=0.297372043132782, max_abs=2.0, mean_rel=0.1423311084508896, max_rel=1250.0, norm_rel=0.011701184324920177, ref_abs_avg=25.953052520751953, test_abs_avg=25.95162582397461
liger_forward grad[34] vs paper_forward: mean_abs=0.28865015506744385, max_abs=1.875, mean_rel=0.10918905586004257, max_rel=843.7499389648438, norm_rel=0.011494646780192852, ref_abs_avg=25.623046875, test_abs_avg=25.62166404724121
liger_forward grad[35] vs paper_forward: mean_abs=0.21662521362304688, max_abs=0.908203125, mean_rel=0.05394810438156128, max_rel=4.647119998931885, norm_rel=0.011410350911319256, ref_abs_avg=19.734241485595703, test_abs_avg=19.74068832397461
liger_forward grad[36] vs paper_forward: mean_abs=0.27617108821868896, max_abs=1.75, mean_rel=0.15349577367305756, max_rel=906.2499389648438, norm_rel=0.01152928825467825, ref_abs_avg=24.456052780151367, test_abs_avg=24.455270767211914
liger_forward grad[37] vs paper_forward: mean_abs=0.2670978009700775, max_abs=1.75, mean_rel=0.09092992544174194, max_rel=527.34375, norm_rel=0.011482911184430122, ref_abs_avg=23.72242546081543, test_abs_avg=23.72182273864746
liger_forward grad[38] vs paper_forward: mean_abs=0.21239395439624786, max_abs=0.875, mean_rel=0.03598121926188469, max_rel=1.892676830291748, norm_rel=0.011521616950631142, ref_abs_avg=18.811264038085938, test_abs_avg=18.809276580810547
liger_forward grad[39] vs paper_forward: mean_abs=0.2570754289627075, max_abs=2.0, mean_rel=0.14945411682128906, max_rel=1593.7498779296875, norm_rel=0.01143671665340662, ref_abs_avg=22.94829559326172, test_abs_avg=22.9477481842041
liger_forward grad[40] vs paper_forward: mean_abs=0.24890486896038055, max_abs=1.75, mean_rel=0.10606798529624939, max_rel=843.7499389648438, norm_rel=0.011123019270598888, ref_abs_avg=22.805240631103516, test_abs_avg=22.804676055908203
liger_forward grad[41] vs paper_forward: mean_abs=0.19849026203155518, max_abs=1.0, mean_rel=0.03128965198993683, max_rel=0.9094430208206177, norm_rel=0.011241251602768898, ref_abs_avg=18.579368591308594, test_abs_avg=18.561534881591797
liger_forward grad[42] vs paper_forward: mean_abs=0.24274660646915436, max_abs=1.5, mean_rel=0.1352289468050003, max_rel=999.9999389648438, norm_rel=0.011262902989983559, ref_abs_avg=22.00307846069336, test_abs_avg=22.003005981445312
liger_forward grad[43] vs paper_forward: mean_abs=0.23571814596652985, max_abs=1.5, mean_rel=0.08726032078266144, max_rel=781.2499389648438, norm_rel=0.010915384627878666, ref_abs_avg=22.056079864501953, test_abs_avg=22.056533813476562
liger_forward grad[44] vs paper_forward: mean_abs=0.1809181571006775, max_abs=0.75, mean_rel=0.15739330649375916, max_rel=59.25273513793945, norm_rel=0.0104982303455472, ref_abs_avg=17.83018684387207, test_abs_avg=17.82928466796875
liger_forward grad[45] vs paper_forward: mean_abs=0.2315634787082672, max_abs=1.5, mean_rel=0.14907555282115936, max_rel=937.4999389648438, norm_rel=0.011020593345165253, ref_abs_avg=21.479015350341797, test_abs_avg=21.47813606262207
liger_forward grad[46] vs paper_forward: mean_abs=0.22353985905647278, max_abs=1.5, mean_rel=0.0941486582159996, max_rel=671.8749389648438, norm_rel=0.01090225763618946, ref_abs_avg=20.938274383544922, test_abs_avg=20.937545776367188
liger_forward grad[47] vs paper_forward: mean_abs=0.18290388584136963, max_abs=0.75, mean_rel=0.05325894057750702, max_rel=3.5888357162475586, norm_rel=0.010791987180709839, ref_abs_avg=17.084997177124023, test_abs_avg=17.076244354248047
liger_forward grad[48] vs paper_forward: mean_abs=0.2201615273952484, max_abs=1.5, mean_rel=0.14121049642562866, max_rel=749.9999389648438, norm_rel=0.010967087931931019, ref_abs_avg=20.49856948852539, test_abs_avg=20.498462677001953
liger_forward grad[49] vs paper_forward: mean_abs=0.21289734542369843, max_abs=1.375, mean_rel=0.09501221776008606, max_rel=687.4999389648438, norm_rel=0.010717793367803097, ref_abs_avg=20.253211975097656, test_abs_avg=20.253925323486328
liger_forward grad[50] vs paper_forward: mean_abs=0.2048492431640625, max_abs=0.875, mean_rel=0.03668826073408127, max_rel=1.2740716934204102, norm_rel=0.01205406617373228, ref_abs_avg=17.707799911499023, test_abs_avg=17.702795028686523
liger_forward grad[51] vs paper_forward: mean_abs=0.2506313920021057, max_abs=1.75, mean_rel=0.1451742798089981, max_rel=968.7499389648438, norm_rel=0.01168818213045597, ref_abs_avg=21.862266540527344, test_abs_avg=21.86161994934082
liger_forward grad[52] vs paper_forward: mean_abs=0.24317505955696106, max_abs=1.625, mean_rel=0.10679464042186737, max_rel=874.9999389648438, norm_rel=0.011561045423150063, ref_abs_avg=21.49928092956543, test_abs_avg=21.50027847290039
liger_forward grad[53] vs paper_forward: mean_abs=0.19052410125732422, max_abs=0.75, mean_rel=0.04225095361471176, max_rel=2.224600076675415, norm_rel=0.01158352941274643, ref_abs_avg=16.75925064086914, test_abs_avg=16.77515411376953
liger_forward grad[54] vs paper_forward: mean_abs=0.22528426349163055, max_abs=1.5, mean_rel=0.12669576704502106, max_rel=749.9999389648438, norm_rel=0.011380113661289215, ref_abs_avg=20.187374114990234, test_abs_avg=20.186817169189453
liger_forward grad[55] vs paper_forward: mean_abs=0.22068898379802704, max_abs=1.5, mean_rel=0.10126985609531403, max_rel=546.875, norm_rel=0.011218930594623089, ref_abs_avg=20.025814056396484, test_abs_avg=20.023609161376953
liger_forward grad[56] vs paper_forward: mean_abs=0.16794312000274658, max_abs=0.6875, mean_rel=0.061880145221948624, max_rel=9.99492073059082, norm_rel=0.010490410029888153, ref_abs_avg=16.879745483398438, test_abs_avg=16.881961822509766
liger_forward grad[57] vs paper_forward: mean_abs=0.20920632779598236, max_abs=1.5, mean_rel=0.1287371665239334, max_rel=874.9999389648438, norm_rel=0.011123396456241608, ref_abs_avg=19.18609046936035, test_abs_avg=19.185701370239258
liger_forward grad[58] vs paper_forward: mean_abs=0.20278866589069366, max_abs=1.375, mean_rel=0.0903693437576294, max_rel=664.0624389648438, norm_rel=0.010789407417178154, ref_abs_avg=19.12338638305664, test_abs_avg=19.12253189086914
liger_forward grad[59] vs paper_forward: mean_abs=0.15841229259967804, max_abs=0.75, mean_rel=0.05819479376077652, max_rel=7.698469161987305, norm_rel=0.010339896194636822, ref_abs_avg=15.405131340026855, test_abs_avg=15.394161224365234
liger_forward grad[60] vs paper_forward: mean_abs=0.19710953533649445, max_abs=1.25, mean_rel=0.12213877588510513, max_rel=625.0, norm_rel=0.010867866687476635, ref_abs_avg=18.498188018798828, test_abs_avg=18.49789810180664
liger_forward grad[61] vs paper_forward: mean_abs=0.1898314654827118, max_abs=1.25, mean_rel=0.09883640706539154, max_rel=562.5, norm_rel=0.010760975070297718, ref_abs_avg=17.97370147705078, test_abs_avg=17.97511863708496
liger_forward grad[62] vs paper_forward: mean_abs=0.147538423538208, max_abs=0.51953125, mean_rel=0.04179120808839798, max_rel=2.4685816764831543, norm_rel=0.010222351178526878, ref_abs_avg=14.73922348022461, test_abs_avg=14.753058433532715
liger_forward grad[63] vs paper_forward: mean_abs=0.18405207991600037, max_abs=1.5, mean_rel=0.11871735006570816, max_rel=812.4999389648438, norm_rel=0.010641150176525116, ref_abs_avg=17.625429153442383, test_abs_avg=17.624839782714844
liger_forward grad[64] vs paper_forward: mean_abs=0.17840322852134705, max_abs=1.25, mean_rel=0.08574806153774261, max_rel=640.625, norm_rel=0.01035371609032154, ref_abs_avg=17.57745361328125, test_abs_avg=17.577266693115234
liger_forward grad[65] vs paper_forward: mean_abs=0.13614273071289062, max_abs=0.6875, mean_rel=0.05157088860869408, max_rel=3.0115244388580322, norm_rel=0.010022723115980625, ref_abs_avg=13.536881446838379, test_abs_avg=13.535683631896973
liger_forward grad[66] vs paper_forward: mean_abs=0.17324835062026978, max_abs=1.25, mean_rel=0.11817886680364609, max_rel=687.4999389648438, norm_rel=0.010456890799105167, ref_abs_avg=16.915512084960938, test_abs_avg=16.915111541748047
liger_forward grad[67] vs paper_forward: mean_abs=0.16828599572181702, max_abs=1.25, mean_rel=0.08639520406723022, max_rel=546.875, norm_rel=0.010250873863697052, ref_abs_avg=16.734439849853516, test_abs_avg=16.73703384399414
liger_forward grad[68] vs paper_forward: mean_abs=0.12949752807617188, max_abs=0.5, mean_rel=0.048155538737773895, max_rel=2.481628894805908, norm_rel=0.009612057358026505, ref_abs_avg=13.921530723571777, test_abs_avg=13.918916702270508
liger_forward grad[69] vs paper_forward: mean_abs=0.163995623588562, max_abs=1.1875, mean_rel=0.09975235164165497, max_rel=578.125, norm_rel=0.01029015053063631, ref_abs_avg=16.294937133789062, test_abs_avg=16.29517364501953
liger_forward grad[70] vs paper_forward: mean_abs=0.16042891144752502, max_abs=1.28125, mean_rel=0.07363422214984894, max_rel=499.9999694824219, norm_rel=0.010006429627537727, ref_abs_avg=16.39023208618164, test_abs_avg=16.390151977539062
liger_forward grad[71] vs paper_forward: mean_abs=0.12503251433372498, max_abs=0.53125, mean_rel=0.1573605239391327, max_rel=34.26787567138672, norm_rel=0.009835765697062016, ref_abs_avg=13.350378036499023, test_abs_avg=13.34341049194336
liger_forward grad[72] vs paper_forward: mean_abs=0.1565394103527069, max_abs=1.625, mean_rel=0.10974304378032684, max_rel=593.75, norm_rel=0.010065428912639618, ref_abs_avg=15.868847846984863, test_abs_avg=15.868691444396973
liger_forward grad[73] vs paper_forward: mean_abs=0.15095752477645874, max_abs=1.0625, mean_rel=0.08036452531814575, max_rel=437.4999694824219, norm_rel=0.009787563234567642, ref_abs_avg=15.744575500488281, test_abs_avg=15.746417999267578
liger_forward grad[74] vs paper_forward: mean_abs=0.15100866556167603, max_abs=0.625, mean_rel=0.11207129806280136, max_rel=33.765289306640625, norm_rel=0.011433436535298824, ref_abs_avg=13.607373237609863, test_abs_avg=13.59727954864502
liger_forward grad[75] vs paper_forward: mean_abs=0.1756768673658371, max_abs=1.5, mean_rel=0.11715338379144669, max_rel=1093.75, norm_rel=0.011343569494783878, ref_abs_avg=15.783008575439453, test_abs_avg=15.782238960266113
liger_forward grad[76] vs paper_forward: mean_abs=0.1716332882642746, max_abs=1.5, mean_rel=0.08194009214639664, max_rel=531.25, norm_rel=0.011109696701169014, ref_abs_avg=15.695590019226074, test_abs_avg=15.69436264038086
liger_forward grad[77] vs paper_forward: mean_abs=0.13127130270004272, max_abs=0.5625, mean_rel=0.09365038573741913, max_rel=12.675420761108398, norm_rel=0.010712090879678726, ref_abs_avg=12.416604995727539, test_abs_avg=12.418729782104492
liger_forward grad[78] vs paper_forward: mean_abs=0.1605290174484253, max_abs=1.5, mean_rel=0.10301288217306137, max_rel=531.25, norm_rel=0.011012333445250988, ref_abs_avg=14.831121444702148, test_abs_avg=14.831131935119629
liger_forward grad[79] vs paper_forward: mean_abs=0.1572589874267578, max_abs=1.5, mean_rel=0.07921873778104782, max_rel=484.3749694824219, norm_rel=0.010865372605621815, ref_abs_avg=14.735480308532715, test_abs_avg=14.734989166259766
liger_forward grad[80] vs paper_forward: mean_abs=0.11985397338867188, max_abs=0.5, mean_rel=0.03456383943557739, max_rel=2.0601065158843994, norm_rel=0.010074151679873466, ref_abs_avg=12.702856063842773, test_abs_avg=12.705934524536133
liger_forward grad[81] vs paper_forward: mean_abs=0.1491658091545105, max_abs=1.3125, mean_rel=0.10677787661552429, max_rel=546.875, norm_rel=0.010635236278176308, ref_abs_avg=14.351104736328125, test_abs_avg=14.351236343383789
liger_forward grad[82] vs paper_forward: mean_abs=0.14417248964309692, max_abs=1.34375, mean_rel=0.08098234236240387, max_rel=562.5, norm_rel=0.010217578150331974, ref_abs_avg=14.373897552490234, test_abs_avg=14.374497413635254
liger_forward grad[83] vs paper_forward: mean_abs=0.10236191749572754, max_abs=0.5, mean_rel=0.030195416882634163, max_rel=1.2040094137191772, norm_rel=0.009610971435904503, ref_abs_avg=11.396480560302734, test_abs_avg=11.396627426147461
liger_forward grad[84] vs paper_forward: mean_abs=0.13894854485988617, max_abs=1.1875, mean_rel=0.09843692183494568, max_rel=625.0, norm_rel=0.010429168120026588, ref_abs_avg=13.62846851348877, test_abs_avg=13.628118515014648
liger_forward grad[85] vs paper_forward: mean_abs=0.13400456309318542, max_abs=1.125, mean_rel=0.07326047122478485, max_rel=464.8437194824219, norm_rel=0.010144812054932117, ref_abs_avg=13.475860595703125, test_abs_avg=13.479531288146973
liger_forward grad[86] vs paper_forward: mean_abs=0.10566392540931702, max_abs=0.46875, mean_rel=0.46020206809043884, max_rel=215.767333984375, norm_rel=0.00972031056880951, ref_abs_avg=11.46314525604248, test_abs_avg=11.468317031860352
liger_forward grad[87] vs paper_forward: mean_abs=0.1308959424495697, max_abs=1.25, mean_rel=0.09331455081701279, max_rel=499.9999694824219, norm_rel=0.01023504976183176, ref_abs_avg=13.120227813720703, test_abs_avg=13.120628356933594
liger_forward grad[88] vs paper_forward: mean_abs=0.1272953748703003, max_abs=1.25, mean_rel=0.07345078140497208, max_rel=507.8124694824219, norm_rel=0.010186314582824707, ref_abs_avg=12.793086051940918, test_abs_avg=12.793466567993164
liger_forward grad[89] vs paper_forward: mean_abs=0.10005669295787811, max_abs=0.5, mean_rel=0.2175023853778839, max_rel=62.66272735595703, norm_rel=0.009192102588713169, ref_abs_avg=11.660257339477539, test_abs_avg=11.657408714294434
liger_forward grad[90] vs paper_forward: mean_abs=0.1247536838054657, max_abs=1.5, mean_rel=0.08830413222312927, max_rel=562.5, norm_rel=0.00996796041727066, ref_abs_avg=12.901397705078125, test_abs_avg=12.90130615234375
liger_forward grad[91] vs paper_forward: mean_abs=0.12204189598560333, max_abs=1.25, mean_rel=0.07110138982534409, max_rel=781.2499389648438, norm_rel=0.010071567259728909, ref_abs_avg=12.536197662353516, test_abs_avg=12.534669876098633
liger_forward grad[92] vs paper_forward: mean_abs=0.08953499794006348, max_abs=0.359375, mean_rel=0.04391242191195488, max_rel=3.701157569885254, norm_rel=0.009113107807934284, ref_abs_avg=10.175165176391602, test_abs_avg=10.17031478881836
liger_forward grad[93] vs paper_forward: mean_abs=0.11585702002048492, max_abs=1.25, mean_rel=0.08831460773944855, max_rel=562.5, norm_rel=0.009715150110423565, ref_abs_avg=12.352249145507812, test_abs_avg=12.352171897888184
liger_forward grad[94] vs paper_forward: mean_abs=0.11436866223812103, max_abs=1.0625, mean_rel=0.06817656010389328, max_rel=468.7499694824219, norm_rel=0.009596971794962883, ref_abs_avg=12.359708786010742, test_abs_avg=12.357593536376953
liger_forward grad[95] vs paper_forward: mean_abs=0.09673607349395752, max_abs=0.375, mean_rel=0.041070614010095596, max_rel=2.894041061401367, norm_rel=0.009967442601919174, ref_abs_avg=10.042418479919434, test_abs_avg=10.049426078796387
liger_forward grad[96] vs paper_forward: mean_abs=0.11126862466335297, max_abs=1.5, mean_rel=0.08146065473556519, max_rel=499.9999694824219, norm_rel=0.009601271711289883, ref_abs_avg=12.094030380249023, test_abs_avg=12.094173431396484
liger_forward grad[97] vs paper_forward: mean_abs=0.10846517235040665, max_abs=1.25, mean_rel=0.06248801574110985, max_rel=265.625, norm_rel=0.009531374089419842, ref_abs_avg=11.880948066711426, test_abs_avg=11.880422592163086

