identity layers + randn queries

/usr/local/lib/python3.12/dist-packages/torch/_inductor/lowering.py:7627: UserWarning: 
Online softmax is disabled on the fly since Inductor decides to
split the reduction. Cut an issue to PyTorch if this is an
important use case and you want to speed it up with online
softmax.

  warnings.warn(
/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py:321: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
E0428 03:59:02.967000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Runtime error during autotuning: 
E0428 03:59:02.967000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] CUDA driver error: invalid argument
E0428 03:59:02.967000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:02.967000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] This may mean this GPU is too small for max_autotune mode.
E0428 03:59:02.967000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:02.967000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] . 
E0428 03:59:02.967000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Ignoring this choice.
E0428 03:59:02.973000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Runtime error during autotuning: 
E0428 03:59:02.973000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] CUDA driver error: invalid argument
E0428 03:59:02.973000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:02.973000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] This may mean this GPU is too small for max_autotune mode.
E0428 03:59:02.973000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:02.973000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] . 
E0428 03:59:02.973000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Ignoring this choice.
E0428 03:59:02.979000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Runtime error during autotuning: 
E0428 03:59:02.979000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] CUDA driver error: invalid argument
E0428 03:59:02.979000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:02.979000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] This may mean this GPU is too small for max_autotune mode.
E0428 03:59:02.979000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:02.979000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] . 
E0428 03:59:02.979000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Ignoring this choice.
E0428 03:59:02.985000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Runtime error during autotuning: 
E0428 03:59:02.985000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] CUDA driver error: invalid argument
E0428 03:59:02.985000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:02.985000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] This may mean this GPU is too small for max_autotune mode.
E0428 03:59:02.985000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:02.985000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] . 
E0428 03:59:02.985000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Ignoring this choice.
E0428 03:59:02.990000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Runtime error during autotuning: 
E0428 03:59:02.990000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] CUDA driver error: invalid argument
E0428 03:59:02.990000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:02.990000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] This may mean this GPU is too small for max_autotune mode.
E0428 03:59:02.990000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:02.990000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] . 
E0428 03:59:02.990000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Ignoring this choice.
E0428 03:59:02.998000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Runtime error during autotuning: 
E0428 03:59:02.998000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] CUDA driver error: invalid argument
E0428 03:59:02.998000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:02.998000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] This may mean this GPU is too small for max_autotune mode.
E0428 03:59:02.998000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:02.998000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] . 
E0428 03:59:02.998000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Ignoring this choice.
E0428 03:59:03.003000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Runtime error during autotuning: 
E0428 03:59:03.003000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] CUDA driver error: invalid argument
E0428 03:59:03.003000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:03.003000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] This may mean this GPU is too small for max_autotune mode.
E0428 03:59:03.003000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:03.003000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] . 
E0428 03:59:03.003000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Ignoring this choice.
E0428 03:59:03.009000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Runtime error during autotuning: 
E0428 03:59:03.009000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] CUDA driver error: invalid argument
E0428 03:59:03.009000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:03.009000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] This may mean this GPU is too small for max_autotune mode.
E0428 03:59:03.009000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:03.009000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] . 
E0428 03:59:03.009000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Ignoring this choice.
E0428 03:59:03.014000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Runtime error during autotuning: 
E0428 03:59:03.014000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] CUDA driver error: invalid argument
E0428 03:59:03.014000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:03.014000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] This may mean this GPU is too small for max_autotune mode.
E0428 03:59:03.014000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:03.014000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] . 
E0428 03:59:03.014000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Ignoring this choice.
E0428 03:59:03.019000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Runtime error during autotuning: 
E0428 03:59:03.019000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] CUDA driver error: invalid argument
E0428 03:59:03.019000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:03.019000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] This may mean this GPU is too small for max_autotune mode.
E0428 03:59:03.019000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:03.019000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] . 
E0428 03:59:03.019000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Ignoring this choice.
E0428 03:59:03.025000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Runtime error during autotuning: 
E0428 03:59:03.025000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] CUDA driver error: invalid argument
E0428 03:59:03.025000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:03.025000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] This may mean this GPU is too small for max_autotune mode.
E0428 03:59:03.025000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:03.025000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] . 
E0428 03:59:03.025000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Ignoring this choice.
E0428 03:59:03.030000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Runtime error during autotuning: 
E0428 03:59:03.030000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] CUDA driver error: invalid argument
E0428 03:59:03.030000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:03.030000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] This may mean this GPU is too small for max_autotune mode.
E0428 03:59:03.030000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] 
E0428 03:59:03.030000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] . 
E0428 03:59:03.030000 4294 torch/_inductor/select_algorithm.py:3727] [3/1] Ignoring this choice.
Autotune Choices Stats:
{"num_choices": 13, "num_triton_choices": 12, "best_kernel": "bmm", "best_time": 1.7106239795684814, "best_triton_pos": 1, "best_triton_time": Infinity, "best_triton_kernel": "triton_bmm_51", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2"}
AUTOTUNE bmm(131072x2x1, 131072x1x512)
strides: [1, 131072, 0], [512, 0, 1]
dtypes: torch.float32, torch.float32
  bmm 1.7106 ms 100.0% 
  triton_bmm_51 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
  triton_bmm_52 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_bmm_53 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_bmm_54 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2
  triton_bmm_55 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_bmm_56 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_bmm_57 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_bmm_58 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_bmm_59 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.2825 seconds and 0.1584 seconds precompiling for 13 choices
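Each `Autotune Choices Stats` line is a single JSON object. Below is a minimal sketch of reading one back in Python, assuming the line is copied verbatim (it is abridged here: `best_triton_kernel_desc` is dropped). Note that `best_triton_time` is the bare token `Infinity`. That token is not strict JSON, but Python's `json` module accepts it by default and maps it to `float('inf')`. This `inf` is how the 12 Triton `bmm` candidates show up after each hit "CUDA driver error: invalid argument" (12 error blocks, 12 Triton choices), which is why the ATen `bmm` fallback is selected.

```python
import json
import math

# Stats line from the bmm autotune run (abridged: best_triton_kernel_desc dropped).
STATS_LINE = (
    '{"num_choices": 13, "num_triton_choices": 12, "best_kernel": "bmm", '
    '"best_time": 1.7106239795684814, "best_triton_pos": 1, '
    '"best_triton_time": Infinity, "best_triton_kernel": "triton_bmm_51"}'
)


def parse_autotune_stats(line: str) -> dict:
    """Parse an Inductor 'Autotune Choices Stats' JSON line.

    json.loads accepts the non-standard 'Infinity' token by default
    and returns float('inf') for it.
    """
    return json.loads(line)


stats = parse_autotune_stats(STATS_LINE)
# Every Triton candidate failed to benchmark, so the best Triton time is
# infinite and the non-Triton (ATen) kernel wins.
triton_all_failed = math.isinf(stats["best_triton_time"])
print(stats["best_kernel"], stats["best_time"], triton_all_failed)
```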
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_70", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.173567995429039, "best_triton_pos": 0}
AUTOTUNE mm(512x1, 1x262144)
strides: [1, 512], [0, 1]
dtypes: torch.float32, torch.float32
  triton_mm_70 0.1736 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_75 0.1742 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_68 0.1745 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_72 0.1752 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_73 0.1795 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_71 0.1796 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_74 0.1800 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_67 0.1828 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_69 0.1843 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_63 0.1849 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
SingleProcess AUTOTUNE benchmarking takes 0.5564 seconds and 0.2878 seconds precompiling for 18 choices
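The percentage column in each AUTOTUNE table appears to be the winner's time divided by the candidate's time. Under that reading, the selected kernel shows 100.0% and a candidate that failed to benchmark (`inf ms`) shows 0.0%, as in the `bmm` table. The sketch below reproduces a few rows of the `mm(512x1, 1x262144)` table from the `best_time` in its stats line. The per-row times are printed rounded to four decimals, so a recomputed percentage can differ from the log in the last digit. The `failed_choice` row is hypothetical and only illustrates the `inf` case.

```python
import math

# best_time of triton_mm_70, taken from the JSON stats line (milliseconds).
BEST_MS = 0.173567995429039

# A few candidate times as printed in the table (rounded to 4 decimals).
rows = {
    "triton_mm_70": 0.1736,
    "triton_mm_72": 0.1752,
    "triton_mm_73": 0.1795,
    "triton_mm_63": 0.1849,
    "failed_choice": math.inf,  # hypothetical: a candidate that failed to benchmark
}


def relative_pct(best_ms: float, candidate_ms: float) -> float:
    """Speed of a candidate relative to the winner, as a percentage."""
    return 100.0 * best_ms / candidate_ms


for name, ms in rows.items():
    print(f"{name:14s} {ms:.4f} ms {relative_pct(BEST_MS, ms):.1f}%")
```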
/usr/local/lib/python3.12/dist-packages/torch/_inductor/lowering.py:7627: UserWarning: 
Online softmax is disabled on the fly since Inductor decides to
split the reduction. Cut an issue to PyTorch if this is an
important use case and you want to speed it up with online
softmax.

  warnings.warn(
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_92", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.08886399865150452, "best_triton_pos": 0}
AUTOTUNE mm(512x1, 1x131072)
strides: [1, 512], [0, 1]
dtypes: torch.float32, torch.float32
  triton_mm_92 0.0889 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_85 0.0890 ms 99.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_87 0.0892 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_86 0.0892 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_89 0.0892 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_90 0.0894 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_88 0.0900 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_91 0.0900 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_84 0.0928 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_93 0.0932 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.3456 seconds and 0.5553 seconds precompiling for 18 choices

paper_forward fwd+bwd:  221.180 ms
paper_forward bwd-only: 173.920 ms
paper_forward peak allocated: fwd=35.128 GiB, fwd+bwd=37.247 GiB
paper_forward peak reserved:  fwd=35.916 GiB, fwd+bwd=38.416 GiB
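The four `paper_forward` summary lines can be combined with simple arithmetic. This assumes `bwd-only` times the backward pass in isolation and `fwd+bwd` times a complete step. Under that assumption, their difference is a rough forward-time estimate of about 47.3 ms. The gap between the `fwd` and `fwd+bwd` peaks is the extra peak memory the backward pass adds. This is a back-of-the-envelope sketch, not a separate measurement.

```python
# Figures copied from the paper_forward summary lines.
fwd_bwd_ms = 221.180
bwd_only_ms = 173.920
peak_alloc_gib = {"fwd": 35.128, "fwd+bwd": 37.247}
peak_reserved_gib = {"fwd": 35.916, "fwd+bwd": 38.416}

# Assumption: forward time is roughly (full step) - (backward alone).
fwd_est_ms = fwd_bwd_ms - bwd_only_ms

# Extra peak memory attributable to running the backward pass as well.
bwd_extra_alloc_gib = peak_alloc_gib["fwd+bwd"] - peak_alloc_gib["fwd"]
bwd_extra_reserved_gib = peak_reserved_gib["fwd+bwd"] - peak_reserved_gib["fwd"]

print(f"forward estimate: {fwd_est_ms:.3f} ms")
print(
    f"backward extra peak: allocated {bwd_extra_alloc_gib:.3f} GiB, "
    f"reserved {bwd_extra_reserved_gib:.3f} GiB"
)
```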
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (1, 512, 8, 1, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 9.87s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None;
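The 20 `Autotuning kernel ... with config` lines before each summary form the cross product of `num_warps` in {1, 2, 4, 8, 16} and `num_stages` in {1, 2, 3, 4}, with `num_ctas` fixed at 1. The plain-Python sketch below enumerates the same grid in the same order as the log. In Triton code, each entry would typically become a `triton.Config(..., num_warps=w, num_stages=s)` passed to `@triton.autotune`. The kernel's actual decorator is not shown in this log, so that mapping is an assumption.

```python
from itertools import product

NUM_WARPS = [1, 2, 4, 8, 16]
NUM_STAGES = [1, 2, 3, 4]

# product() varies the last iterable fastest: outer loop over num_warps,
# inner loop over num_stages, matching the order of the log lines.
configs = [
    {"num_warps": w, "num_ctas": 1, "num_stages": s, "maxnreg": None}
    for w, s in product(NUM_WARPS, NUM_STAGES)
]

for c in configs[:3]:
    print(", ".join(f"{k}: {v}" for k, v in c.items()))
print(len(configs), "configs")
```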
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_2_online_softmax_merge_intrablock_out_kernel,
with key as (512, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16'),
finished after 5.00s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (2, 512, 8, 2, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 11.20s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (3, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 12.84s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (4, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 13.13s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (5, 512, 1, 8, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 6.90s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (5, 512, 1, 8, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 9.89s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_1_reduce_grad_pseudo_queries_kernel,
with key as (131072, 512, 1, 'torch.float32', 'torch.float32'),
finished after 1.76s,
best config selected: BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_2_online_softmax_merge_intrablock_backward_kernel,
with key as (512, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 5.40s,
best config selected: num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_2_reduce_grad_pseudo_query_kernel,
with key as (131072, 512, 'torch.float32', 'torch.float32'),
finished after 1.74s,
best config selected: BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (4, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 30.82s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_1_reduce_grad_pseudo_queries_kernel,
with key as (131072, 512, 8, 'torch.float32', 'torch.float32'),
finished after 1.73s,
best config selected: BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (3, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 29.24s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (2, 512, 8, 2, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 21.46s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (1, 512, 8, 1, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 13.16s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None;
production_forward fwd+bwd:  66.271 ms
production_forward bwd-only: 56.363 ms
production_forward peak allocated: fwd=7.614 GiB, fwd+bwd=15.618 GiB
production_forward peak reserved:  fwd=27.117 GiB, fwd+bwd=27.117 GiB

Autotune Choices Stats:
{"num_choices": 17, "num_triton_choices": 16, "best_kernel": "mm", "best_time": 0.13820800185203552, "best_triton_pos": 1, "best_triton_time": 0.21302400529384613, "best_triton_kernel": "triton_mm_108", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"}
AUTOTUNE mm(131072x512, 512x8)
strides: [512, 1], [1, 512]
dtypes: torch.float32, torch.float32
  mm 0.1382 ms 100.0% 
  triton_mm_108 0.2130 ms 64.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_111 0.2144 ms 64.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_98 0.3475 ms 39.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_mm_104 0.3492 ms 39.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_105 0.3502 ms 39.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_109 0.3506 ms 39.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_106 0.3520 ms 39.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_100 0.3530 ms 39.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_103 0.3538 ms 39.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.5905 seconds and 0.9188 seconds precompiling for 17 choices
Autotune Choices Stats:
{"num_choices": 17, "num_triton_choices": 16, "best_kernel": "mm", "best_time": 0.2568320035934448, "best_triton_pos": 1, "best_triton_time": 0.4176639914512634, "best_triton_kernel": "triton_mm_124", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"}
AUTOTUNE mm(262144x512, 512x8)
strides: [512, 1], [1, 512]
dtypes: torch.float32, torch.float32
  mm 0.2568 ms 100.0% 
  triton_mm_124 0.4177 ms 61.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_127 0.4233 ms 60.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_114 0.6790 ms 37.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_mm_120 0.6873 ms 37.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_121 0.6920 ms 37.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_125 0.6944 ms 37.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_122 0.6946 ms 37.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_119 0.6956 ms 36.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_116 0.6973 ms 36.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.6272 seconds and 0.7947 seconds precompiling for 17 choices
Autotune Choices Stats:
{"num_choices": 6, "num_triton_choices": 0, "best_kernel": "decompose_k_mm_128_split_3", "best_kernel_desc": "k_split=128", "best_time": 0.2361920028924942}
AUTOTUNE mm(512x262144, 262144x8)
strides: [1, 512], [8, 1]
dtypes: torch.float32, torch.float32
  decompose_k_mm_128_split_3 0.2362 ms 100.0% k_split=128
  decompose_k_mm_256_split_4 0.2405 ms 98.2% k_split=256
  decompose_k_mm_64_split_2 0.2542 ms 92.9% k_split=64
  mm 0.2650 ms 89.1% 
  decompose_k_mm_32_split_1 0.3789 ms 62.3% k_split=32
  decompose_k_mm_16_split_0 0.5984 ms 39.5% k_split=16
SingleProcess AUTOTUNE benchmarking takes 3.8985 seconds and 0.0003 seconds precompiling for 6 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_134", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.18614399433135986, "best_triton_pos": 0}
AUTOTUNE mm(262144x8, 8x512)
strides: [8, 1], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_134 0.1861 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_136 0.1863 ms 99.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_138 0.1863 ms 99.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_141 0.1864 ms 99.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_140 0.1869 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_137 0.1873 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_139 0.1873 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_131 0.2140 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8
  triton_mm_143 0.2144 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_129 0.2235 ms 83.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
SingleProcess AUTOTUNE benchmarking takes 0.5795 seconds and 0.0003 seconds precompiling for 18 choices
Autotune Choices Stats:
{"num_choices": 17, "num_triton_choices": 16, "best_kernel": "mm", "best_time": 0.4920639991760254, "best_triton_pos": 1, "best_triton_time": 0.8266559839248657, "best_triton_kernel": "triton_mm_157", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"}
AUTOTUNE mm(524288x512, 512x8)
strides: [512, 1], [1, 512]
dtypes: torch.float32, torch.float32
  mm 0.4921 ms 100.0% 
  triton_mm_157 0.8267 ms 59.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_160 0.8344 ms 59.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_147 1.3394 ms 36.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_mm_153 1.3476 ms 36.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_154 1.3585 ms 36.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_152 1.3618 ms 36.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_155 1.3627 ms 36.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_149 1.3679 ms 36.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_158 1.3796 ms 35.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.6734 seconds and 0.5146 seconds precompiling for 17 choices
Autotune Choices Stats:
{"num_choices": 6, "num_triton_choices": 0, "best_kernel": "decompose_k_mm_256_split_9", "best_kernel_desc": "k_split=256", "best_time": 0.4564479887485504}
AUTOTUNE mm(512x524288, 524288x8)
strides: [1, 512], [8, 1]
dtypes: torch.float32, torch.float32
  decompose_k_mm_256_split_9 0.4564 ms 100.0% k_split=256
  decompose_k_mm_128_split_8 0.4805 ms 95.0% k_split=128
  decompose_k_mm_64_split_7 0.4884 ms 93.5% k_split=64
  mm 0.4967 ms 91.9% 
  decompose_k_mm_32_split_6 0.7381 ms 61.8% k_split=32
  decompose_k_mm_16_split_5 1.1740 ms 38.9% k_split=16
SingleProcess AUTOTUNE benchmarking takes 5.0723 seconds and 0.0003 seconds precompiling for 6 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_171", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.3672640025615692, "best_triton_pos": 0}
AUTOTUNE mm(524288x8, 8x512)
strides: [8, 1], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_171 0.3673 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_169 0.3679 ms 99.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_173 0.3680 ms 99.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_167 0.3682 ms 99.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_170 0.3683 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_174 0.3684 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_172 0.3686 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_176 0.4154 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_164 0.4231 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8
  triton_mm_162 0.4412 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
SingleProcess AUTOTUNE benchmarking takes 0.6522 seconds and 0.0004 seconds precompiling for 18 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_184", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.41916799545288086, "best_triton_pos": 0}
AUTOTUNE mm(655360x1, 1x512)
strides: [1, 0], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_184 0.4192 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_188 0.4193 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_191 0.4196 ms 99.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_186 0.4197 ms 99.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_187 0.4208 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_189 0.4209 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_190 0.4209 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_183 0.4271 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_185 0.4292 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_194 0.4362 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.6553 seconds and 0.3385 seconds precompiling for 18 choices
Autotune Choices Stats:
{"num_choices": 9, "num_triton_choices": 0, "best_kernel": "decompose_k_mm_128_split_16", "best_kernel_desc": "k_split=128", "best_time": 0.1276479959487915}
AUTOTUNE mm(512x131072, 131072x8)
strides: [1, 512], [8, 1]
dtypes: torch.float32, torch.float32
  decompose_k_mm_128_split_16 0.1276 ms 100.0% k_split=128
  decompose_k_mm_256_split_17 0.1317 ms 96.9% k_split=256
  decompose_k_mm_64_split_15 0.1343 ms 95.1% k_split=64
  mm 0.1496 ms 85.3% 
  decompose_k_mm_32_split_14 0.1999 ms 63.9% k_split=32
  decompose_k_mm_16_split_13 0.3090 ms 41.3% k_split=16
  decompose_k_mm_8_split_12 0.5766 ms 22.1% k_split=8
  decompose_k_mm_4_split_11 1.1334 ms 11.3% k_split=4
  decompose_k_mm_2_split_10 2.2836 ms 5.6% k_split=2
SingleProcess AUTOTUNE benchmarking takes 5.5952 seconds and 0.0003 seconds precompiling for 9 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_208", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.09532800316810608, "best_triton_pos": 0}
AUTOTUNE mm(131072x8, 8x512)
strides: [8, 1], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_208 0.0953 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_203 0.0956 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_201 0.0957 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_205 0.0957 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_206 0.0958 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_207 0.0958 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_204 0.0961 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_200 0.1095 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_202 0.1126 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_196 0.1135 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
SingleProcess AUTOTUNE benchmarking takes 0.3923 seconds and 0.0002 seconds precompiling for 18 choices

torch_compile_phases_forward fwd+bwd:  94.945 ms
torch_compile_phases_forward bwd-only: 76.457 ms
torch_compile_phases_forward peak allocated: fwd=18.203 GiB, fwd+bwd=18.831 GiB
torch_compile_phases_forward peak reserved:  fwd=27.148 GiB, fwd+bwd=27.148 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016654758946970105, max_abs=0.0625
production_forward grad[0] vs paper_forward: mean_abs=0.008518998511135578, max_abs=0.46875, mean_rel=0.07290728390216827, max_rel=93.58258819580078, norm_rel=0.02002289891242981, ref_abs_avg=0.4632517397403717, test_abs_avg=0.46327489614486694
production_forward grad[1] vs paper_forward: mean_abs=7.452637672424316, max_abs=58.0, mean_rel=0.4005202054977417, max_rel=2304.8095703125, norm_rel=0.0205695740878582, ref_abs_avg=319.7028503417969, test_abs_avg=319.7225341796875
production_forward grad[2] vs paper_forward: mean_abs=1.2774238586425781, max_abs=4.9375, mean_rel=0.12514081597328186, max_rel=11.442474365234375, norm_rel=0.024547800421714783, ref_abs_avg=51.78153991699219, test_abs_avg=51.711055755615234
production_forward grad[3] vs paper_forward: mean_abs=1.5181827545166016, max_abs=10.0, mean_rel=0.161967933177948, max_rel=1305.9298095703125, norm_rel=0.02321453019976616, ref_abs_avg=65.74951934814453, test_abs_avg=65.75315856933594
production_forward grad[4] vs paper_forward: mean_abs=1.4811153411865234, max_abs=9.0, mean_rel=0.17986826598644257, max_rel=1563.7967529296875, norm_rel=0.022940563037991524, ref_abs_avg=64.91040802001953, test_abs_avg=64.90865325927734
production_forward grad[5] vs paper_forward: mean_abs=1.1728630065917969, max_abs=4.375, mean_rel=0.12349933385848999, max_rel=13.641446113586426, norm_rel=0.024067072197794914, ref_abs_avg=47.477569580078125, test_abs_avg=47.50248718261719
production_forward grad[6] vs paper_forward: mean_abs=1.328892469406128, max_abs=8.875, mean_rel=0.16928833723068237, max_rel=1857.443603515625, norm_rel=0.02291671372950077, ref_abs_avg=58.29741668701172, test_abs_avg=58.30044937133789
production_forward grad[7] vs paper_forward: mean_abs=1.2896332740783691, max_abs=8.25, mean_rel=0.15614837408065796, max_rel=1051.284912109375, norm_rel=0.022657768800854683, ref_abs_avg=57.251365661621094, test_abs_avg=57.25658416748047
production_forward grad[8] vs paper_forward: mean_abs=1.0139613151550293, max_abs=4.0, mean_rel=0.4295579791069031, max_rel=182.16693115234375, norm_rel=0.02339957468211651, ref_abs_avg=45.542484283447266, test_abs_avg=45.57767105102539
production_forward grad[9] vs paper_forward: mean_abs=1.208635687828064, max_abs=8.0, mean_rel=0.1727234423160553, max_rel=2558.640380859375, norm_rel=0.022754954174160957, ref_abs_avg=53.35002136230469, test_abs_avg=53.354129791259766
production_forward grad[10] vs paper_forward: mean_abs=1.1730821132659912, max_abs=7.46875, mean_rel=0.14566245675086975, max_rel=1545.4078369140625, norm_rel=0.022427665069699287, ref_abs_avg=52.57612609863281, test_abs_avg=52.57883071899414
production_forward grad[11] vs paper_forward: mean_abs=1.0141643285751343, max_abs=3.5, mean_rel=0.11241155117750168, max_rel=7.250516891479492, norm_rel=0.023757033050060272, ref_abs_avg=41.29027557373047, test_abs_avg=41.381004333496094
production_forward grad[12] vs paper_forward: mean_abs=1.1277250051498413, max_abs=7.25, mean_rel=0.17143994569778442, max_rel=2358.135498046875, norm_rel=0.02263796702027321, ref_abs_avg=50.01451873779297, test_abs_avg=50.0211181640625
production_forward grad[13] vs paper_forward: mean_abs=1.0960041284561157, max_abs=6.53125, mean_rel=0.1445268839597702, max_rel=812.50634765625, norm_rel=0.02233768440783024, ref_abs_avg=49.40098571777344, test_abs_avg=49.40299987792969
production_forward grad[14] vs paper_forward: mean_abs=0.8289132118225098, max_abs=4.0, mean_rel=0.08865149319171906, max_rel=9.020618438720703, norm_rel=0.022168509662151337, ref_abs_avg=38.331520080566406, test_abs_avg=38.348045349121094
production_forward grad[15] vs paper_forward: mean_abs=1.0578091144561768, max_abs=7.5, mean_rel=0.16821438074111938, max_rel=1854.988525390625, norm_rel=0.02248060703277588, ref_abs_avg=47.30946350097656, test_abs_avg=47.31088638305664
production_forward grad[16] vs paper_forward: mean_abs=1.0325124263763428, max_abs=6.5, mean_rel=0.16717267036437988, max_rel=2033.3349609375, norm_rel=0.02219264954328537, ref_abs_avg=46.737037658691406, test_abs_avg=46.739501953125
production_forward grad[17] vs paper_forward: mean_abs=0.7875006794929504, max_abs=3.5, mean_rel=0.3253899812698364, max_rel=51.926700592041016, norm_rel=0.020719876512885094, ref_abs_avg=38.618812561035156, test_abs_avg=38.55270767211914
production_forward grad[18] vs paper_forward: mean_abs=0.996688723564148, max_abs=6.5, mean_rel=0.15190023183822632, max_rel=964.931884765625, norm_rel=0.022325150668621063, ref_abs_avg=44.79883575439453, test_abs_avg=44.80400848388672
production_forward grad[19] vs paper_forward: mean_abs=0.9670752882957458, max_abs=5.375, mean_rel=0.14973920583724976, max_rel=1130.927978515625, norm_rel=0.02210114151239395, ref_abs_avg=43.99592590332031, test_abs_avg=44.0016975402832
production_forward grad[20] vs paper_forward: mean_abs=0.7947652339935303, max_abs=3.5, mean_rel=0.13096168637275696, max_rel=9.871190071105957, norm_rel=0.022769412025809288, ref_abs_avg=35.170310974121094, test_abs_avg=35.21849060058594
production_forward grad[21] vs paper_forward: mean_abs=0.9404255747795105, max_abs=6.0, mean_rel=0.1584216058254242, max_rel=1533.0150146484375, norm_rel=0.022263754159212112, ref_abs_avg=42.448646545410156, test_abs_avg=42.45132827758789
production_forward grad[22] vs paper_forward: mean_abs=0.9159735441207886, max_abs=5.6875, mean_rel=0.16030260920524597, max_rel=3064.137451171875, norm_rel=0.02183503285050392, ref_abs_avg=42.17488098144531, test_abs_avg=42.17503356933594
production_forward grad[23] vs paper_forward: mean_abs=0.7562215328216553, max_abs=3.25, mean_rel=0.18995440006256104, max_rel=65.62367248535156, norm_rel=0.02227471023797989, ref_abs_avg=34.842926025390625, test_abs_avg=34.86686706542969
production_forward grad[24] vs paper_forward: mean_abs=0.8959184885025024, max_abs=6.0, mean_rel=0.1500815600156784, max_rel=1227.953125, norm_rel=0.022102292627096176, ref_abs_avg=40.71876525878906, test_abs_avg=40.71992492675781
production_forward grad[25] vs paper_forward: mean_abs=0.8738772869110107, max_abs=6.0, mean_rel=0.13936200737953186, max_rel=1205.2752685546875, norm_rel=0.021608414128422737, ref_abs_avg=40.65659713745117, test_abs_avg=40.65644073486328
production_forward grad[26] vs paper_forward: mean_abs=0.8431460857391357, max_abs=4.04296875, mean_rel=0.29896411299705505, max_rel=55.165679931640625, norm_rel=0.02393944188952446, ref_abs_avg=36.39126205444336, test_abs_avg=36.33989334106445
production_forward grad[27] vs paper_forward: mean_abs=1.0474584102630615, max_abs=6.5, mean_rel=0.164387509226799, max_rel=1429.860595703125, norm_rel=0.024118879809975624, ref_abs_avg=43.602325439453125, test_abs_avg=43.604610443115234
production_forward grad[28] vs paper_forward: mean_abs=1.0266287326812744, max_abs=6.75, mean_rel=0.15771164000034332, max_rel=734.788330078125, norm_rel=0.023709461092948914, ref_abs_avg=43.521766662597656, test_abs_avg=43.52079391479492
production_forward grad[29] vs paper_forward: mean_abs=0.8238115310668945, max_abs=3.625, mean_rel=0.1368308663368225, max_rel=14.383748054504395, norm_rel=0.024127818644046783, ref_abs_avg=34.1488151550293, test_abs_avg=34.074867248535156
production_forward grad[30] vs paper_forward: mean_abs=0.9831939339637756, max_abs=6.5, mean_rel=0.17147421836853027, max_rel=1551.7266845703125, norm_rel=0.024332894012331963, ref_abs_avg=40.55364990234375, test_abs_avg=40.554344177246094
production_forward grad[31] vs paper_forward: mean_abs=0.9631476402282715, max_abs=6.333984375, mean_rel=0.16565704345703125, max_rel=1021.4910278320312, norm_rel=0.024329079315066338, ref_abs_avg=39.72119903564453, test_abs_avg=39.719329833984375
production_forward grad[32] vs paper_forward: mean_abs=0.7212202548980713, max_abs=2.5, mean_rel=0.07756897807121277, max_rel=2.74511981010437, norm_rel=0.022833453491330147, ref_abs_avg=30.953411102294922, test_abs_avg=30.9094181060791
production_forward grad[33] vs paper_forward: mean_abs=0.909630298614502, max_abs=6.125, mean_rel=0.17214557528495789, max_rel=2182.703125, norm_rel=0.024301918223500252, ref_abs_avg=37.55168914794922, test_abs_avg=37.552616119384766
production_forward grad[34] vs paper_forward: mean_abs=0.8967108130455017, max_abs=5.375, mean_rel=0.1536305844783783, max_rel=623.2088012695312, norm_rel=0.024301491677761078, ref_abs_avg=37.02279281616211, test_abs_avg=37.02341079711914
production_forward grad[35] vs paper_forward: mean_abs=0.7494276165962219, max_abs=3.0, mean_rel=0.25096043944358826, max_rel=89.22208404541016, norm_rel=0.02503092773258686, ref_abs_avg=30.34878158569336, test_abs_avg=30.333621978759766
production_forward grad[36] vs paper_forward: mean_abs=0.8546004295349121, max_abs=5.375, mean_rel=0.17293822765350342, max_rel=2100.362060546875, norm_rel=0.024166475981473923, ref_abs_avg=35.463897705078125, test_abs_avg=35.46619415283203
production_forward grad[37] vs paper_forward: mean_abs=0.8406765460968018, max_abs=5.1875, mean_rel=0.150588259100914, max_rel=1057.8095703125, norm_rel=0.024212362244725227, ref_abs_avg=34.797122955322266, test_abs_avg=34.79473876953125
production_forward grad[38] vs paper_forward: mean_abs=0.6563339233398438, max_abs=3.25, mean_rel=0.08321191370487213, max_rel=2.424250364303589, norm_rel=0.023831799626350403, ref_abs_avg=27.601783752441406, test_abs_avg=27.617374420166016
production_forward grad[39] vs paper_forward: mean_abs=0.8075810074806213, max_abs=4.75, mean_rel=0.16280360519886017, max_rel=843.1109619140625, norm_rel=0.023993048816919327, ref_abs_avg=33.75590515136719, test_abs_avg=33.756126403808594
production_forward grad[40] vs paper_forward: mean_abs=0.7948808073997498, max_abs=5.0, mean_rel=0.16685591638088226, max_rel=1434.181884765625, norm_rel=0.023811141029000282, ref_abs_avg=33.50721740722656, test_abs_avg=33.50347900390625
production_forward grad[41] vs paper_forward: mean_abs=0.6314897537231445, max_abs=2.5, mean_rel=0.18916204571723938, max_rel=23.614015579223633, norm_rel=0.02464752085506916, ref_abs_avg=25.453895568847656, test_abs_avg=25.45682144165039
production_forward grad[42] vs paper_forward: mean_abs=0.7591022253036499, max_abs=5.25, mean_rel=0.16650733351707458, max_rel=1517.4384765625, norm_rel=0.023679612204432487, ref_abs_avg=32.10530471801758, test_abs_avg=32.10535430908203
production_forward grad[43] vs paper_forward: mean_abs=0.750572681427002, max_abs=5.375, mean_rel=0.16210415959358215, max_rel=1158.7574462890625, norm_rel=0.023475633934140205, ref_abs_avg=32.020896911621094, test_abs_avg=32.013668060302734
production_forward grad[44] vs paper_forward: mean_abs=0.6499509811401367, max_abs=2.5, mean_rel=0.12744808197021484, max_rel=9.50069808959961, norm_rel=0.025176873430609703, ref_abs_avg=25.92569351196289, test_abs_avg=25.882530212402344
production_forward grad[45] vs paper_forward: mean_abs=0.7289621233940125, max_abs=5.0, mean_rel=0.15000605583190918, max_rel=1216.447021484375, norm_rel=0.023236867040395737, ref_abs_avg=31.374011993408203, test_abs_avg=31.375917434692383
production_forward grad[46] vs paper_forward: mean_abs=0.7130269408226013, max_abs=4.75, mean_rel=0.14612269401550293, max_rel=1264.6473388671875, norm_rel=0.023044107481837273, ref_abs_avg=31.00945281982422, test_abs_avg=31.013938903808594
production_forward grad[47] vs paper_forward: mean_abs=0.5743707418441772, max_abs=2.625, mean_rel=0.09442218393087387, max_rel=7.966911315917969, norm_rel=0.02344457618892193, ref_abs_avg=25.33816909790039, test_abs_avg=25.361234664916992
production_forward grad[48] vs paper_forward: mean_abs=0.6899017095565796, max_abs=4.75, mean_rel=0.15291425585746765, max_rel=1018.7800903320312, norm_rel=0.023082105442881584, ref_abs_avg=29.87822723388672, test_abs_avg=29.880889892578125
production_forward grad[49] vs paper_forward: mean_abs=0.6830809712409973, max_abs=4.25, mean_rel=0.16197223961353302, max_rel=902.9703369140625, norm_rel=0.023197516798973083, ref_abs_avg=29.538257598876953, test_abs_avg=29.53912353515625
production_forward grad[50] vs paper_forward: mean_abs=0.6126470565795898, max_abs=2.5, mean_rel=0.06373167783021927, max_rel=2.5524754524230957, norm_rel=0.02397139184176922, ref_abs_avg=26.228256225585938, test_abs_avg=26.231346130371094
production_forward grad[51] vs paper_forward: mean_abs=0.773491621017456, max_abs=5.5, mean_rel=0.1649339497089386, max_rel=870.9198608398438, norm_rel=0.024802112951874733, ref_abs_avg=31.264230728149414, test_abs_avg=31.266937255859375
production_forward grad[52] vs paper_forward: mean_abs=0.7596398591995239, max_abs=5.125, mean_rel=0.16643387079238892, max_rel=1184.9976806640625, norm_rel=0.024342648684978485, ref_abs_avg=31.309131622314453, test_abs_avg=31.304359436035156
production_forward grad[53] vs paper_forward: mean_abs=0.6179871559143066, max_abs=2.375, mean_rel=0.13560158014297485, max_rel=27.587841033935547, norm_rel=0.02622617967426777, ref_abs_avg=23.878612518310547, test_abs_avg=23.943614959716797
production_forward grad[54] vs paper_forward: mean_abs=0.7138476371765137, max_abs=4.75, mean_rel=0.15653420984745026, max_rel=607.0428466796875, norm_rel=0.024330344051122665, ref_abs_avg=29.376312255859375, test_abs_avg=29.378028869628906
production_forward grad[55] vs paper_forward: mean_abs=0.6988780498504639, max_abs=5.033203125, mean_rel=0.15520597994327545, max_rel=579.5032958984375, norm_rel=0.024064533412456512, ref_abs_avg=29.06734848022461, test_abs_avg=29.070384979248047
production_forward grad[56] vs paper_forward: mean_abs=0.516338586807251, max_abs=2.5, mean_rel=0.1477835774421692, max_rel=28.951234817504883, norm_rel=0.022385340183973312, ref_abs_avg=23.136579513549805, test_abs_avg=23.183103561401367
production_forward grad[57] vs paper_forward: mean_abs=0.6676130294799805, max_abs=4.5, mean_rel=0.1670438051223755, max_rel=1632.44970703125, norm_rel=0.024053050205111504, ref_abs_avg=27.801513671875, test_abs_avg=27.802078247070312
production_forward grad[58] vs paper_forward: mean_abs=0.6557706594467163, max_abs=4.5, mean_rel=0.1582634150981903, max_rel=1087.7591552734375, norm_rel=0.02391214109957218, ref_abs_avg=27.49233627319336, test_abs_avg=27.49544906616211
production_forward grad[59] vs paper_forward: mean_abs=0.5465116500854492, max_abs=2.1875, mean_rel=0.08241534233093262, max_rel=2.439330816268921, norm_rel=0.02446090802550316, ref_abs_avg=22.122207641601562, test_abs_avg=22.117168426513672
production_forward grad[60] vs paper_forward: mean_abs=0.6244281530380249, max_abs=4.875, mean_rel=0.15396906435489655, max_rel=809.7369384765625, norm_rel=0.023388659581542015, ref_abs_avg=26.680299758911133, test_abs_avg=26.681438446044922
production_forward grad[61] vs paper_forward: mean_abs=0.6137118339538574, max_abs=4.25, mean_rel=0.14781780540943146, max_rel=943.8117065429688, norm_rel=0.02322634682059288, ref_abs_avg=26.463903427124023, test_abs_avg=26.468658447265625
production_forward grad[62] vs paper_forward: mean_abs=0.4893287420272827, max_abs=1.75, mean_rel=0.10504641383886337, max_rel=17.163522720336914, norm_rel=0.024097014218568802, ref_abs_avg=20.49696159362793, test_abs_avg=20.50664520263672
production_forward grad[63] vs paper_forward: mean_abs=0.5936920642852783, max_abs=4.125, mean_rel=0.1563752293586731, max_rel=897.6026000976562, norm_rel=0.023221604526042938, ref_abs_avg=25.55374526977539, test_abs_avg=25.553970336914062
production_forward grad[64] vs paper_forward: mean_abs=0.5807110071182251, max_abs=4.0, mean_rel=0.14968770742416382, max_rel=595.2125854492188, norm_rel=0.023170296102762222, ref_abs_avg=25.127513885498047, test_abs_avg=25.125789642333984
production_forward grad[65] vs paper_forward: mean_abs=0.44329094886779785, max_abs=1.875, mean_rel=0.15366703271865845, max_rel=15.374253273010254, norm_rel=0.021282948553562164, ref_abs_avg=21.373435974121094, test_abs_avg=21.369468688964844
production_forward grad[66] vs paper_forward: mean_abs=0.5592066645622253, max_abs=4.0, mean_rel=0.14836031198501587, max_rel=892.2314453125, norm_rel=0.02258925512433052, ref_abs_avg=24.74059295654297, test_abs_avg=24.742761611938477
production_forward grad[67] vs paper_forward: mean_abs=0.5467325448989868, max_abs=4.0, mean_rel=0.14207591116428375, max_rel=1002.4619140625, norm_rel=0.02266218140721321, ref_abs_avg=24.192153930664062, test_abs_avg=24.19886016845703
production_forward grad[68] vs paper_forward: mean_abs=0.4304618835449219, max_abs=2.0, mean_rel=0.08929311484098434, max_rel=8.423604965209961, norm_rel=0.0224017146974802, ref_abs_avg=19.47371482849121, test_abs_avg=19.518327713012695
production_forward grad[69] vs paper_forward: mean_abs=0.5261807441711426, max_abs=4.0, mean_rel=0.14630231261253357, max_rel=900.5332641601562, norm_rel=0.02243897318840027, ref_abs_avg=23.46450424194336, test_abs_avg=23.466644287109375
production_forward grad[70] vs paper_forward: mean_abs=0.517173171043396, max_abs=3.75, mean_rel=0.14406485855579376, max_rel=750.7918701171875, norm_rel=0.021847397089004517, ref_abs_avg=23.6771240234375, test_abs_avg=23.682052612304688
production_forward grad[71] vs paper_forward: mean_abs=0.43697428703308105, max_abs=1.75, mean_rel=0.07934122532606125, max_rel=8.679762840270996, norm_rel=0.022843411192297935, ref_abs_avg=19.46068572998047, test_abs_avg=19.474782943725586
production_forward grad[72] vs paper_forward: mean_abs=0.5109798908233643, max_abs=3.75, mean_rel=0.15046346187591553, max_rel=833.275390625, norm_rel=0.022122783586382866, ref_abs_avg=23.096134185791016, test_abs_avg=23.096622467041016
production_forward grad[73] vs paper_forward: mean_abs=0.49459463357925415, max_abs=3.5, mean_rel=0.14258143305778503, max_rel=528.8984375, norm_rel=0.021628180518746376, ref_abs_avg=22.868499755859375, test_abs_avg=22.862384796142578
production_forward grad[74] vs paper_forward: mean_abs=0.4723236560821533, max_abs=1.75, mean_rel=0.10864382237195969, max_rel=10.577202796936035, norm_rel=0.023026203736662865, ref_abs_avg=20.740882873535156, test_abs_avg=20.792612075805664
production_forward grad[75] vs paper_forward: mean_abs=0.5635857582092285, max_abs=5.0, mean_rel=0.14675450325012207, max_rel=1340.5513916015625, norm_rel=0.023367326706647873, ref_abs_avg=24.108306884765625, test_abs_avg=24.110950469970703
production_forward grad[76] vs paper_forward: mean_abs=0.5467734336853027, max_abs=4.5, mean_rel=0.15847119688987732, max_rel=1232.1920166015625, norm_rel=0.022940289229154587, ref_abs_avg=23.849597930908203, test_abs_avg=23.847084045410156
production_forward grad[77] vs paper_forward: mean_abs=0.43400800228118896, max_abs=1.6875, mean_rel=0.27671703696250916, max_rel=94.97405242919922, norm_rel=0.02167009748518467, ref_abs_avg=20.5399112701416, test_abs_avg=20.525394439697266
production_forward grad[78] vs paper_forward: mean_abs=0.5198736786842346, max_abs=4.875, mean_rel=0.15859726071357727, max_rel=666.1768798828125, norm_rel=0.022381940856575966, ref_abs_avg=23.15810203552246, test_abs_avg=23.160165786743164
production_forward grad[79] vs paper_forward: mean_abs=0.4957965016365051, max_abs=3.8125, mean_rel=0.1618141382932663, max_rel=781.0682373046875, norm_rel=0.02214241400361061, ref_abs_avg=22.37938690185547, test_abs_avg=22.396671295166016
production_forward grad[80] vs paper_forward: mean_abs=0.38216257095336914, max_abs=1.375, mean_rel=0.11090967804193497, max_rel=18.642837524414062, norm_rel=0.020705237984657288, ref_abs_avg=18.462278366088867, test_abs_avg=18.472084045410156
production_forward grad[81] vs paper_forward: mean_abs=0.47318291664123535, max_abs=4.1875, mean_rel=0.14774373173713684, max_rel=704.0009765625, norm_rel=0.021862870082259178, ref_abs_avg=21.62053871154785, test_abs_avg=21.624034881591797
production_forward grad[82] vs paper_forward: mean_abs=0.4655679762363434, max_abs=4.0, mean_rel=0.14880719780921936, max_rel=954.92041015625, norm_rel=0.022021634504199028, ref_abs_avg=21.191295623779297, test_abs_avg=21.20261573791504
production_forward grad[83] vs paper_forward: mean_abs=0.3649725914001465, max_abs=1.25, mean_rel=0.09465238451957703, max_rel=9.204964637756348, norm_rel=0.02210909314453602, ref_abs_avg=16.474449157714844, test_abs_avg=16.475933074951172
production_forward grad[84] vs paper_forward: mean_abs=0.44840100407600403, max_abs=4.0, mean_rel=0.14360016584396362, max_rel=1157.293212890625, norm_rel=0.021681511774659157, ref_abs_avg=20.692161560058594, test_abs_avg=20.69257926940918
production_forward grad[85] vs paper_forward: mean_abs=0.4362154006958008, max_abs=3.25, mean_rel=0.1373739242553711, max_rel=874.3975830078125, norm_rel=0.020924720913171768, ref_abs_avg=20.76996612548828, test_abs_avg=20.772932052612305
production_forward grad[86] vs paper_forward: mean_abs=0.3473074436187744, max_abs=1.5, mean_rel=0.09637275338172913, max_rel=17.750463485717773, norm_rel=0.0205176193267107, ref_abs_avg=16.890522003173828, test_abs_avg=16.844112396240234
production_forward grad[87] vs paper_forward: mean_abs=0.4172918200492859, max_abs=4.0, mean_rel=0.1365627497434616, max_rel=773.59521484375, norm_rel=0.02075302042067051, ref_abs_avg=20.156517028808594, test_abs_avg=20.157203674316406
production_forward grad[88] vs paper_forward: mean_abs=0.4121657609939575, max_abs=3.796875, mean_rel=0.12905296683311462, max_rel=471.0376892089844, norm_rel=0.020780568942427635, ref_abs_avg=20.019521713256836, test_abs_avg=20.005586624145508
production_forward grad[89] vs paper_forward: mean_abs=0.3408393859863281, max_abs=1.5, mean_rel=0.12645599246025085, max_rel=14.920045852661133, norm_rel=0.0203987006098032, ref_abs_avg=16.71544075012207, test_abs_avg=16.68522834777832
production_forward grad[90] vs paper_forward: mean_abs=0.3961946964263916, max_abs=4.0, mean_rel=0.13044631481170654, max_rel=594.0381469726562, norm_rel=0.020537598058581352, ref_abs_avg=19.403541564941406, test_abs_avg=19.403783798217773
production_forward grad[91] vs paper_forward: mean_abs=0.3842688798904419, max_abs=3.5, mean_rel=0.12136111408472061, max_rel=335.6299133300781, norm_rel=0.02026146464049816, ref_abs_avg=19.11168098449707, test_abs_avg=19.113590240478516
production_forward grad[92] vs paper_forward: mean_abs=0.32644712924957275, max_abs=1.375, mean_rel=0.07573610544204712, max_rel=4.833032131195068, norm_rel=0.02141587994992733, ref_abs_avg=15.52446460723877, test_abs_avg=15.520085334777832
production_forward grad[93] vs paper_forward: mean_abs=0.3722144365310669, max_abs=3.5, mean_rel=0.1226610466837883, max_rel=668.6150512695312, norm_rel=0.02014128677546978, ref_abs_avg=18.639108657836914, test_abs_avg=18.63927459716797
production_forward grad[94] vs paper_forward: mean_abs=0.3703354001045227, max_abs=3.75, mean_rel=0.13033543527126312, max_rel=634.5086669921875, norm_rel=0.020139433443546295, ref_abs_avg=18.561412811279297, test_abs_avg=18.555173873901367
production_forward grad[95] vs paper_forward: mean_abs=0.2998638153076172, max_abs=1.40625, mean_rel=0.1698717176914215, max_rel=16.023876190185547, norm_rel=0.0194186232984066, ref_abs_avg=15.40833854675293, test_abs_avg=15.432969093322754
production_forward grad[96] vs paper_forward: mean_abs=0.35400331020355225, max_abs=4.25, mean_rel=0.128422811627388, max_rel=916.7642822265625, norm_rel=0.01969258114695549, ref_abs_avg=18.250940322875977, test_abs_avg=18.25259780883789
production_forward grad[97] vs paper_forward: mean_abs=0.34582409262657166, max_abs=3.0, mean_rel=0.11763449758291245, max_rel=345.463134765625, norm_rel=0.019388198852539062, ref_abs_avg=18.06781578063965, test_abs_avg=18.06976890563965
torch_compile_phases_forward vs paper_forward output: mean_abs=0.001667745877057314, max_abs=0.0625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008860427886247635, max_abs=0.5625, mean_rel=0.07549554109573364, max_rel=98.87422943115234, norm_rel=0.020696887746453285, ref_abs_avg=0.4632517397403717, test_abs_avg=0.4632631540298462
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.567413330078125, max_abs=54.0, mean_rel=0.265608549118042, max_rel=567.538330078125, norm_rel=0.02082483097910881, ref_abs_avg=319.7028503417969, test_abs_avg=319.7274169921875
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3031865358352661, max_abs=5.0, mean_rel=0.11052730679512024, max_rel=6.431874752044678, norm_rel=0.025234123691916466, ref_abs_avg=51.78153991699219, test_abs_avg=51.702659606933594
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.5708155632019043, max_abs=10.0, mean_rel=0.1711399257183075, max_rel=2716.18701171875, norm_rel=0.024015257135033607, ref_abs_avg=65.74951934814453, test_abs_avg=65.75206756591797
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5331566333770752, max_abs=9.5, mean_rel=0.18634280562400818, max_rel=2070.55908203125, norm_rel=0.023744897916913033, ref_abs_avg=64.91040802001953, test_abs_avg=64.90675354003906
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.202406883239746, max_abs=4.25, mean_rel=0.12134503573179245, max_rel=10.836397171020508, norm_rel=0.024871785193681717, ref_abs_avg=47.477569580078125, test_abs_avg=47.47572326660156
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.374287486076355, max_abs=8.25, mean_rel=0.18074378371238708, max_rel=1785.703369140625, norm_rel=0.023688865825533867, ref_abs_avg=58.29741668701172, test_abs_avg=58.29857635498047
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3294564485549927, max_abs=9.0, mean_rel=0.16063839197158813, max_rel=1525.7784423828125, norm_rel=0.023337146267294884, ref_abs_avg=57.251365661621094, test_abs_avg=57.26171112060547
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0304920673370361, max_abs=4.75, mean_rel=0.4715733826160431, max_rel=185.0053253173828, norm_rel=0.024107545614242554, ref_abs_avg=45.542484283447266, test_abs_avg=45.59440612792969
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2454924583435059, max_abs=8.0, mean_rel=0.1741131842136383, max_rel=3056.560302734375, norm_rel=0.023464320227503777, ref_abs_avg=53.35002136230469, test_abs_avg=53.35222625732422
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2088892459869385, max_abs=8.0, mean_rel=0.14994299411773682, max_rel=1275.755859375, norm_rel=0.023107770830392838, ref_abs_avg=52.57612609863281, test_abs_avg=52.57414627075195
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9743947982788086, max_abs=4.0, mean_rel=0.1070394217967987, max_rel=6.162259101867676, norm_rel=0.023328591138124466, ref_abs_avg=41.29027557373047, test_abs_avg=41.323402404785156
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1608881950378418, max_abs=7.0, mean_rel=0.17493095993995667, max_rel=2475.313232421875, norm_rel=0.023293128237128258, ref_abs_avg=50.01451873779297, test_abs_avg=50.018218994140625
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1314271688461304, max_abs=6.5, mean_rel=0.157230943441391, max_rel=1296.4072265625, norm_rel=0.023031694814562798, ref_abs_avg=49.40098571777344, test_abs_avg=49.405303955078125
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8545126914978027, max_abs=3.75, mean_rel=0.15090452134609222, max_rel=35.773921966552734, norm_rel=0.022743752226233482, ref_abs_avg=38.331520080566406, test_abs_avg=38.320682525634766
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0869684219360352, max_abs=7.25, mean_rel=0.17331290245056152, max_rel=2065.590576171875, norm_rel=0.023095933720469475, ref_abs_avg=47.30946350097656, test_abs_avg=47.310028076171875
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0590119361877441, max_abs=6.5, mean_rel=0.17874398827552795, max_rel=2263.117919921875, norm_rel=0.0227699875831604, ref_abs_avg=46.737037658691406, test_abs_avg=46.73752212524414
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8254387974739075, max_abs=3.0, mean_rel=0.4173988997936249, max_rel=98.41093444824219, norm_rel=0.02141418680548668, ref_abs_avg=38.618812561035156, test_abs_avg=38.587501525878906
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0220370292663574, max_abs=6.5, mean_rel=0.15622320771217346, max_rel=1082.5218505859375, norm_rel=0.022883307188749313, ref_abs_avg=44.79883575439453, test_abs_avg=44.802764892578125
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9924474954605103, max_abs=6.0, mean_rel=0.15615178644657135, max_rel=1320.9344482421875, norm_rel=0.022684868425130844, ref_abs_avg=43.99592590332031, test_abs_avg=44.004825592041016
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.8113508224487305, max_abs=3.0, mean_rel=0.14488285779953003, max_rel=12.102205276489258, norm_rel=0.023146606981754303, ref_abs_avg=35.170310974121094, test_abs_avg=35.19209289550781
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9633204340934753, max_abs=6.0, mean_rel=0.16686910390853882, max_rel=2180.131103515625, norm_rel=0.022782014682888985, ref_abs_avg=42.448646545410156, test_abs_avg=42.44930648803711
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9382060766220093, max_abs=5.75, mean_rel=0.170835942029953, max_rel=3092.641845703125, norm_rel=0.022359779104590416, ref_abs_avg=42.17488098144531, test_abs_avg=42.173465728759766
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7780587673187256, max_abs=2.75, mean_rel=0.23474034667015076, max_rel=83.02401733398438, norm_rel=0.022856662049889565, ref_abs_avg=34.842926025390625, test_abs_avg=34.88314437866211
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9168771505355835, max_abs=6.5, mean_rel=0.1551147997379303, max_rel=1492.29150390625, norm_rel=0.022612974047660828, ref_abs_avg=40.71876525878906, test_abs_avg=40.71952438354492
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8900942802429199, max_abs=6.25, mean_rel=0.13552239537239075, max_rel=870.0900268554688, norm_rel=0.02200891077518463, ref_abs_avg=40.65659713745117, test_abs_avg=40.652931213378906
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.848466694355011, max_abs=3.5625, mean_rel=0.2286805361509323, max_rel=46.283172607421875, norm_rel=0.02411198802292347, ref_abs_avg=36.39126205444336, test_abs_avg=36.351165771484375
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0724046230316162, max_abs=6.875, mean_rel=0.16409409046173096, max_rel=1578.5382080078125, norm_rel=0.0246890839189291, ref_abs_avg=43.602325439453125, test_abs_avg=43.60235595703125
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0535991191864014, max_abs=6.5, mean_rel=0.16831724345684052, max_rel=1165.4898681640625, norm_rel=0.024345751851797104, ref_abs_avg=43.521766662597656, test_abs_avg=43.515724182128906
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8415961265563965, max_abs=3.625, mean_rel=0.1274239718914032, max_rel=16.20159912109375, norm_rel=0.02480754628777504, ref_abs_avg=34.1488151550293, test_abs_avg=34.100616455078125
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.004536509513855, max_abs=6.78125, mean_rel=0.1772502213716507, max_rel=2300.4287109375, norm_rel=0.02483634278178215, ref_abs_avg=40.55364990234375, test_abs_avg=40.55366516113281
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.984130322933197, max_abs=6.125, mean_rel=0.16784580051898956, max_rel=1068.2540283203125, norm_rel=0.02486734464764595, ref_abs_avg=39.72119903564453, test_abs_avg=39.71841049194336
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7165975570678711, max_abs=2.4375, mean_rel=0.08158450573682785, max_rel=3.743345260620117, norm_rel=0.022970495745539665, ref_abs_avg=30.953411102294922, test_abs_avg=30.951587677001953
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9280308485031128, max_abs=6.375, mean_rel=0.17531949281692505, max_rel=2193.1484375, norm_rel=0.024776732549071312, ref_abs_avg=37.55168914794922, test_abs_avg=37.55144500732422
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.9103007316589355, max_abs=5.47265625, mean_rel=0.15415120124816895, max_rel=740.45947265625, norm_rel=0.024655818939208984, ref_abs_avg=37.02279281616211, test_abs_avg=37.02351379394531
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.745044469833374, max_abs=3.5, mean_rel=0.22244277596473694, max_rel=73.12920379638672, norm_rel=0.025217076763510704, ref_abs_avg=30.34878158569336, test_abs_avg=30.336624145507812
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.870812177658081, max_abs=5.5, mean_rel=0.17772532999515533, max_rel=1893.807861328125, norm_rel=0.024614367634058, ref_abs_avg=35.463897705078125, test_abs_avg=35.46495819091797
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8556492924690247, max_abs=5.5, mean_rel=0.15620088577270508, max_rel=1033.506103515625, norm_rel=0.024659497663378716, ref_abs_avg=34.797122955322266, test_abs_avg=34.79609680175781
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6386375427246094, max_abs=2.5, mean_rel=0.07558880746364594, max_rel=2.69138765335083, norm_rel=0.023685507476329803, ref_abs_avg=27.601783752441406, test_abs_avg=27.623123168945312
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8205389976501465, max_abs=5.0, mean_rel=0.16812962293624878, max_rel=1343.6990966796875, norm_rel=0.024367986246943474, ref_abs_avg=33.75590515136719, test_abs_avg=33.75505828857422
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.8052850961685181, max_abs=5.0, mean_rel=0.16378575563430786, max_rel=762.2932739257812, norm_rel=0.024113712832331657, ref_abs_avg=33.50721740722656, test_abs_avg=33.506248474121094
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6276065111160278, max_abs=2.5, mean_rel=0.17003269493579865, max_rel=23.332632064819336, norm_rel=0.025162870064377785, ref_abs_avg=25.453895568847656, test_abs_avg=25.473979949951172
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7706257104873657, max_abs=5.3125, mean_rel=0.1671084463596344, max_rel=1089.4334716796875, norm_rel=0.0240496639162302, ref_abs_avg=32.10530471801758, test_abs_avg=32.106475830078125
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7611961960792542, max_abs=5.0, mean_rel=0.16239914298057556, max_rel=1323.1522216796875, norm_rel=0.023819759488105774, ref_abs_avg=32.020896911621094, test_abs_avg=32.01490020751953
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.6448068618774414, max_abs=2.4375, mean_rel=0.1281200647354126, max_rel=7.044826984405518, norm_rel=0.02509603090584278, ref_abs_avg=25.92569351196289, test_abs_avg=25.896448135375977
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.738348126411438, max_abs=5.0, mean_rel=0.15411582589149475, max_rel=1301.482666015625, norm_rel=0.02353176474571228, ref_abs_avg=31.374011993408203, test_abs_avg=31.376256942749023
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.721059262752533, max_abs=4.25, mean_rel=0.15019385516643524, max_rel=1641.9093017578125, norm_rel=0.023306522518396378, ref_abs_avg=31.00945281982422, test_abs_avg=31.01165008544922
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5905985832214355, max_abs=2.6875, mean_rel=0.1374853551387787, max_rel=20.569622039794922, norm_rel=0.02376209944486618, ref_abs_avg=25.33816909790039, test_abs_avg=25.355722427368164
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6980623006820679, max_abs=5.25, mean_rel=0.15213733911514282, max_rel=763.9780883789062, norm_rel=0.023349637165665627, ref_abs_avg=29.87822723388672, test_abs_avg=29.87993049621582
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.690363883972168, max_abs=4.5, mean_rel=0.1683500111103058, max_rel=1136.6705322265625, norm_rel=0.02342938631772995, ref_abs_avg=29.538257598876953, test_abs_avg=29.54256820678711
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.627662181854248, max_abs=2.75, mean_rel=0.06488049030303955, max_rel=2.495436906814575, norm_rel=0.02450259029865265, ref_abs_avg=26.228256225585938, test_abs_avg=26.210281372070312
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.785698413848877, max_abs=5.25, mean_rel=0.17014867067337036, max_rel=1509.7220458984375, norm_rel=0.02519429475069046, ref_abs_avg=31.264230728149414, test_abs_avg=31.266050338745117
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7713805437088013, max_abs=5.0, mean_rel=0.1669187843799591, max_rel=1610.71240234375, norm_rel=0.02472553215920925, ref_abs_avg=31.309131622314453, test_abs_avg=31.3051700592041
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.6113376617431641, max_abs=2.25, mean_rel=0.10769528150558472, max_rel=14.385651588439941, norm_rel=0.026359841227531433, ref_abs_avg=23.878612518310547, test_abs_avg=23.912179946899414
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7233974933624268, max_abs=4.75, mean_rel=0.15817159414291382, max_rel=838.4029541015625, norm_rel=0.02463892661035061, ref_abs_avg=29.376312255859375, test_abs_avg=29.377601623535156
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.710113525390625, max_abs=4.75, mean_rel=0.15911906957626343, max_rel=894.635009765625, norm_rel=0.024457428604364395, ref_abs_avg=29.06734848022461, test_abs_avg=29.071094512939453
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5447981357574463, max_abs=2.375, mean_rel=0.17467862367630005, max_rel=37.81766891479492, norm_rel=0.02338765375316143, ref_abs_avg=23.136579513549805, test_abs_avg=23.211631774902344
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6761747598648071, max_abs=4.625, mean_rel=0.16834503412246704, max_rel=1425.9134521484375, norm_rel=0.024359259754419327, ref_abs_avg=27.801513671875, test_abs_avg=27.802248001098633
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.663684606552124, max_abs=4.25, mean_rel=0.15765345096588135, max_rel=851.1920776367188, norm_rel=0.024193331599235535, ref_abs_avg=27.49233627319336, test_abs_avg=27.492321014404297
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5247163772583008, max_abs=1.9375, mean_rel=0.08614811301231384, max_rel=3.6618525981903076, norm_rel=0.023717796429991722, ref_abs_avg=22.122207641601562, test_abs_avg=22.1055965423584
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.632275402545929, max_abs=5.0, mean_rel=0.16394147276878357, max_rel=810.5548095703125, norm_rel=0.023664211854338646, ref_abs_avg=26.680299758911133, test_abs_avg=26.682323455810547
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6221229434013367, max_abs=4.375, mean_rel=0.15159407258033752, max_rel=925.6534423828125, norm_rel=0.023510348051786423, ref_abs_avg=26.463903427124023, test_abs_avg=26.47450065612793
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.5061162710189819, max_abs=1.75, mean_rel=0.10664943605661392, max_rel=14.3228759765625, norm_rel=0.02467581070959568, ref_abs_avg=20.49696159362793, test_abs_avg=20.491107940673828
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6011306643486023, max_abs=4.5625, mean_rel=0.16126513481140137, max_rel=1212.3885498046875, norm_rel=0.023484092205762863, ref_abs_avg=25.55374526977539, test_abs_avg=25.553510665893555
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5866578817367554, max_abs=4.0, mean_rel=0.15278837084770203, max_rel=818.352783203125, norm_rel=0.0234063770622015, ref_abs_avg=25.127513885498047, test_abs_avg=25.126277923583984
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.45010876655578613, max_abs=2.0, mean_rel=0.09314295649528503, max_rel=6.4563164710998535, norm_rel=0.021649757400155067, ref_abs_avg=21.373435974121094, test_abs_avg=21.356515884399414
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5653481483459473, max_abs=4.0625, mean_rel=0.1498694121837616, max_rel=958.1089477539062, norm_rel=0.022852633148431778, ref_abs_avg=24.74059295654297, test_abs_avg=24.742290496826172
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5514698624610901, max_abs=3.75, mean_rel=0.14418113231658936, max_rel=752.8997192382812, norm_rel=0.022867275401949883, ref_abs_avg=24.192153930664062, test_abs_avg=24.197864532470703
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.43216896057128906, max_abs=2.0, mean_rel=0.09098407626152039, max_rel=9.305328369140625, norm_rel=0.022227562963962555, ref_abs_avg=19.47371482849121, test_abs_avg=19.51595687866211
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5306707620620728, max_abs=3.625, mean_rel=0.14889663457870483, max_rel=750.5245971679688, norm_rel=0.022623885422945023, ref_abs_avg=23.46450424194336, test_abs_avg=23.467105865478516
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5235695838928223, max_abs=3.5, mean_rel=0.14563320577144623, max_rel=839.0811767578125, norm_rel=0.02212703414261341, ref_abs_avg=23.6771240234375, test_abs_avg=23.678241729736328
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.41899871826171875, max_abs=1.875, mean_rel=0.08664338290691376, max_rel=14.333029747009277, norm_rel=0.022128121927380562, ref_abs_avg=19.46068572998047, test_abs_avg=19.473846435546875
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5155200958251953, max_abs=3.5, mean_rel=0.14923067390918732, max_rel=933.0415649414062, norm_rel=0.022292450070381165, ref_abs_avg=23.096134185791016, test_abs_avg=23.097427368164062
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.498475044965744, max_abs=3.5, mean_rel=0.14159783720970154, max_rel=554.0245361328125, norm_rel=0.021801112219691277, ref_abs_avg=22.868499755859375, test_abs_avg=22.86479949951172
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.47229909896850586, max_abs=1.890625, mean_rel=0.0994129404425621, max_rel=6.724338531494141, norm_rel=0.023291822522878647, ref_abs_avg=20.740882873535156, test_abs_avg=20.787338256835938
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5698763728141785, max_abs=4.5, mean_rel=0.14839869737625122, max_rel=1777.0108642578125, norm_rel=0.023606998845934868, ref_abs_avg=24.108306884765625, test_abs_avg=24.111661911010742
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5556427836418152, max_abs=4.5, mean_rel=0.15667687356472015, max_rel=1025.624755859375, norm_rel=0.02331918105483055, ref_abs_avg=23.849597930908203, test_abs_avg=23.845420837402344
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4323927164077759, max_abs=1.875, mean_rel=0.13289694488048553, max_rel=18.547657012939453, norm_rel=0.021462319418787956, ref_abs_avg=20.5399112701416, test_abs_avg=20.538293838500977
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5252096652984619, max_abs=4.75, mean_rel=0.15883812308311462, max_rel=708.7962646484375, norm_rel=0.022593602538108826, ref_abs_avg=23.15810203552246, test_abs_avg=23.160240173339844
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5019298791885376, max_abs=4.0625, mean_rel=0.1615845263004303, max_rel=718.995849609375, norm_rel=0.022431667894124985, ref_abs_avg=22.37938690185547, test_abs_avg=22.393661499023438
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3930068016052246, max_abs=1.75, mean_rel=0.11132191121578217, max_rel=21.984525680541992, norm_rel=0.021621987223625183, ref_abs_avg=18.462278366088867, test_abs_avg=18.494609832763672
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.477117121219635, max_abs=4.26171875, mean_rel=0.14653527736663818, max_rel=802.5789184570312, norm_rel=0.022031618282198906, ref_abs_avg=21.62053871154785, test_abs_avg=21.624235153198242
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4697335958480835, max_abs=4.0, mean_rel=0.14924903213977814, max_rel=750.751708984375, norm_rel=0.022249411791563034, ref_abs_avg=21.191295623779297, test_abs_avg=21.198936462402344
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.36609840393066406, max_abs=1.34375, mean_rel=0.10548914223909378, max_rel=11.111226081848145, norm_rel=0.022155143320560455, ref_abs_avg=16.474449157714844, test_abs_avg=16.49014663696289
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.45105141401290894, max_abs=4.5, mean_rel=0.1463400423526764, max_rel=1024.5228271484375, norm_rel=0.021809294819831848, ref_abs_avg=20.692161560058594, test_abs_avg=20.693519592285156
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4410425126552582, max_abs=4.0, mean_rel=0.1360759139060974, max_rel=593.5035400390625, norm_rel=0.021192051470279694, ref_abs_avg=20.76996612548828, test_abs_avg=20.779062271118164
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3475637435913086, max_abs=1.671875, mean_rel=0.08804166316986084, max_rel=9.193105697631836, norm_rel=0.021117765456438065, ref_abs_avg=16.890522003173828, test_abs_avg=16.84868621826172
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.419735848903656, max_abs=3.75, mean_rel=0.13542956113815308, max_rel=816.4420776367188, norm_rel=0.020858658477663994, ref_abs_avg=20.156517028808594, test_abs_avg=20.157852172851562
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4197811782360077, max_abs=3.375, mean_rel=0.13271228969097137, max_rel=475.2853698730469, norm_rel=0.021097859367728233, ref_abs_avg=20.019521713256836, test_abs_avg=20.001731872558594
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3254871368408203, max_abs=1.5, mean_rel=0.14884069561958313, max_rel=24.343233108520508, norm_rel=0.019825957715511322, ref_abs_avg=16.71544075012207, test_abs_avg=16.702239990234375
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.39799559116363525, max_abs=4.25, mean_rel=0.13066279888153076, max_rel=835.6368408203125, norm_rel=0.020611176267266273, ref_abs_avg=19.403541564941406, test_abs_avg=19.40442657470703
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3828899562358856, max_abs=3.5, mean_rel=0.12368260324001312, max_rel=346.12860107421875, norm_rel=0.020124003291130066, ref_abs_avg=19.11168098449707, test_abs_avg=19.113548278808594
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.33109569549560547, max_abs=1.25, mean_rel=0.06915919482707977, max_rel=3.261108875274658, norm_rel=0.021352870389819145, ref_abs_avg=15.52446460723877, test_abs_avg=15.527841567993164
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.37348097562789917, max_abs=3.25, mean_rel=0.12404264509677887, max_rel=588.04931640625, norm_rel=0.020207472145557404, ref_abs_avg=18.639108657836914, test_abs_avg=18.63943099975586
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.37181246280670166, max_abs=4.0, mean_rel=0.12950852513313293, max_rel=603.719970703125, norm_rel=0.0203104130923748, ref_abs_avg=18.561412811279297, test_abs_avg=18.55459213256836
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.3039358854293823, max_abs=1.34375, mean_rel=0.17401008307933807, max_rel=20.22771453857422, norm_rel=0.019736479967832565, ref_abs_avg=15.40833854675293, test_abs_avg=15.433759689331055
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3547913730144501, max_abs=4.5, mean_rel=0.12860357761383057, max_rel=808.2271118164062, norm_rel=0.019729649648070335, ref_abs_avg=18.250940322875977, test_abs_avg=18.25257110595703
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3438899517059326, max_abs=3.9375, mean_rel=0.11723987758159637, max_rel=372.18560791015625, norm_rel=0.01932443305850029, ref_abs_avg=18.06781578063965, test_abs_avg=18.07026481628418
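The script that printed these lines is not shown, so the exact metric definitions are unknown. Below is a minimal PyTorch sketch of one plausible set of definitions. It assumes `mean_rel`/`max_rel` are elementwise |test - ref| / |ref|, with a small floor `eps` that is itself an assumption, and that `norm_rel` is ||test - ref||_2 / ||ref||_2. The function name `compare` is hypothetical.

Under these definitions, the elementwise relative error is dominated by reference entries that are near zero. That would explain why `max_rel` reaches the hundreds or thousands while `norm_rel` stays close to 0.02 for every gradient above.

```python
import torch

def compare(ref: torch.Tensor, test: torch.Tensor, eps: float = 1e-12) -> dict:
    """Hypothetical re-creation of the comparison metrics printed in this log."""
    # Upcast so the metrics are not themselves affected by low-precision rounding.
    ref = ref.detach().float()
    test = test.detach().float()
    diff = (test - ref).abs()
    # Elementwise relative error. Near-zero reference entries inflate this,
    # so max_rel can be huge even when the tensors agree well overall.
    rel = diff / ref.abs().clamp_min(eps)
    return {
        "mean_abs": diff.mean().item(),
        "max_abs": diff.max().item(),
        "mean_rel": rel.mean().item(),
        "max_rel": rel.max().item(),
        # Norm-wise relative error: ||test - ref||_2 / ||ref||_2.
        "norm_rel": (torch.linalg.vector_norm(test - ref)
                     / torch.linalg.vector_norm(ref)).item(),
        "ref_abs_avg": ref.abs().mean().item(),
        "test_abs_avg": test.abs().mean().item(),
    }

ref = torch.tensor([1.0, 2.0, 4.0])
test = torch.tensor([1.0, 2.5, 4.0])
m = compare(ref, test)
```

For tensors this small every metric can be checked by hand. For example, `norm_rel` here is 0.5 / sqrt(21).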
identity layers + randn queries
torch_compile_phases_forward fwd+bwd:  94.912 ms
torch_compile_phases_forward bwd-only: 76.458 ms
torch_compile_phases_forward peak allocated: fwd=18.203 GiB, fwd+bwd=18.831 GiB
torch_compile_phases_forward peak reserved:  fwd=27.398 GiB, fwd+bwd=27.398 GiB
paper_forward fwd+bwd:  221.169 ms
paper_forward bwd-only: 173.977 ms
paper_forward peak allocated: fwd=35.128 GiB, fwd+bwd=37.247 GiB
paper_forward peak reserved:  fwd=36.166 GiB, fwd+bwd=38.666 GiB
production_forward fwd+bwd:  66.269 ms
production_forward bwd-only: 56.452 ms
production_forward peak allocated: fwd=7.614 GiB, fwd+bwd=15.618 GiB
production_forward peak reserved:  fwd=27.369 GiB, fwd+bwd=27.369 GiB
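The measurement code behind these timing and memory lines is also not shown. Below is a minimal sketch of a harness that reports the same kinds of numbers. It is an assumption, not the original script, and the names `benchmark`, `_sync`, `iters` and `warmup` are hypothetical. It uses `torch.cuda.synchronize`, `reset_peak_memory_stats`, `max_memory_allocated` and `max_memory_reserved`. On a CPU-only machine it still times the work but skips the memory statistics.

Peak reserved counts memory held by PyTorch's caching allocator, not only live tensors. That is consistent with every variant above reporting peak reserved at or above peak allocated.

```python
import time
import torch

def _sync() -> None:
    # CUDA kernels launch asynchronously; wait for them before reading the clock.
    if torch.cuda.is_available():
        torch.cuda.synchronize()

def benchmark(fn, *inputs, iters: int = 10, warmup: int = 3) -> dict:
    """Hypothetical fwd+bwd timing / peak-memory harness (not the original script)."""
    cuda = torch.cuda.is_available()
    gib = 2 ** 30

    def clear_grads() -> None:
        # Drop accumulated .grad so each iteration does the same work.
        for t in inputs:
            if isinstance(t, torch.Tensor):
                t.grad = None

    # Warm-up absorbs one-off costs such as torch.compile autotuning.
    for _ in range(warmup):
        fn(*inputs).float().sum().backward()
        clear_grads()

    total, bwd = 0.0, 0.0
    stats = {}
    for i in range(iters):
        clear_grads()
        if cuda and i == 0:
            torch.cuda.reset_peak_memory_stats()
        _sync()
        t0 = time.perf_counter()
        loss = fn(*inputs).float().sum()
        _sync()
        t1 = time.perf_counter()
        if cuda and i == 0:
            # Peak after the first forward pass only.
            stats["peak_alloc_fwd_gib"] = torch.cuda.max_memory_allocated() / gib
        loss.backward()
        _sync()
        t2 = time.perf_counter()
        total += t2 - t0  # forward + backward
        bwd += t2 - t1    # backward only
    stats["fwd+bwd_ms"] = total * 1e3 / iters
    stats["bwd_only_ms"] = bwd * 1e3 / iters
    if cuda:
        stats["peak_alloc_fwd+bwd_gib"] = torch.cuda.max_memory_allocated() / gib
        stats["peak_reserved_gib"] = torch.cuda.max_memory_reserved() / gib
    return stats

x = torch.randn(64, 64, requires_grad=True)
w = torch.randn(64, 64, requires_grad=True)
s = benchmark(lambda a, b: a @ b, x, w, iters=3, warmup=1)
```

Timing forward and backward inside one loop keeps "bwd-only" a sub-interval of "fwd+bwd". The extra synchronizations add a little overhead to the combined figure.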

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0015977541916072369, max_abs=0.05078125
production_forward grad[0] vs paper_forward: mean_abs=0.008167203515768051, max_abs=0.4453125, mean_rel=0.07214279472827911, max_rel=111.38794708251953, norm_rel=0.019791405647993088, ref_abs_avg=0.4472152292728424, test_abs_avg=0.44722840189933777
production_forward grad[1] vs paper_forward: mean_abs=7.090784549713135, max_abs=48.0, mean_rel=0.15776444971561432, max_rel=275.8760681152344, norm_rel=0.020244434475898743, ref_abs_avg=309.1185607910156, test_abs_avg=309.1405334472656
production_forward grad[2] vs paper_forward: mean_abs=1.172018051147461, max_abs=4.75, mean_rel=0.10754811018705368, max_rel=13.383315086364746, norm_rel=0.02331123687326908, ref_abs_avg=51.0189094543457, test_abs_avg=50.969093322753906
production_forward grad[3] vs paper_forward: mean_abs=1.4840878248214722, max_abs=9.953125, mean_rel=0.17294615507125854, max_rel=3093.509521484375, norm_rel=0.023097580298781395, ref_abs_avg=64.55804443359375, test_abs_avg=64.55648040771484
production_forward grad[4] vs paper_forward: mean_abs=1.4399341344833374, max_abs=9.25, mean_rel=0.14459505677223206, max_rel=863.2056884765625, norm_rel=0.022886069491505623, ref_abs_avg=63.32072448730469, test_abs_avg=63.32365417480469
production_forward grad[5] vs paper_forward: mean_abs=1.1639957427978516, max_abs=3.75, mean_rel=0.08288787305355072, max_rel=4.939473628997803, norm_rel=0.02406454272568226, ref_abs_avg=47.978172302246094, test_abs_avg=48.00908660888672
production_forward grad[6] vs paper_forward: mean_abs=1.2833302021026611, max_abs=8.25, mean_rel=0.15157011151313782, max_rel=1003.3912963867188, norm_rel=0.02278510108590126, ref_abs_avg=56.587257385253906, test_abs_avg=56.58577346801758
production_forward grad[7] vs paper_forward: mean_abs=1.2502455711364746, max_abs=7.5, mean_rel=0.16259756684303284, max_rel=2230.168701171875, norm_rel=0.022647323086857796, ref_abs_avg=55.575618743896484, test_abs_avg=55.5749626159668
production_forward grad[8] vs paper_forward: mean_abs=0.9858608245849609, max_abs=4.0, mean_rel=0.5147910118103027, max_rel=200.07200622558594, norm_rel=0.022538481280207634, ref_abs_avg=42.72745132446289, test_abs_avg=42.665802001953125
production_forward grad[9] vs paper_forward: mean_abs=1.1631006002426147, max_abs=8.0, mean_rel=0.16267520189285278, max_rel=3500.30810546875, norm_rel=0.02262093871831894, ref_abs_avg=51.665409088134766, test_abs_avg=51.6677360534668
production_forward grad[10] vs paper_forward: mean_abs=1.1385966539382935, max_abs=7.875, mean_rel=0.157988041639328, max_rel=1309.734619140625, norm_rel=0.02245917171239853, ref_abs_avg=50.978118896484375, test_abs_avg=50.97285079956055
production_forward grad[11] vs paper_forward: mean_abs=0.8680672645568848, max_abs=3.125, mean_rel=0.1614493727684021, max_rel=26.864818572998047, norm_rel=0.02203613519668579, ref_abs_avg=39.098716735839844, test_abs_avg=39.18687057495117
production_forward grad[12] vs paper_forward: mean_abs=1.077020287513733, max_abs=8.0, mean_rel=0.14755424857139587, max_rel=1722.5042724609375, norm_rel=0.022449376061558723, ref_abs_avg=48.23442840576172, test_abs_avg=48.238059997558594
production_forward grad[13] vs paper_forward: mean_abs=1.0432857275009155, max_abs=6.5, mean_rel=0.1643766164779663, max_rel=1646.15673828125, norm_rel=0.02209310233592987, ref_abs_avg=47.39921951293945, test_abs_avg=47.40217971801758
production_forward grad[14] vs paper_forward: mean_abs=0.829566478729248, max_abs=3.0, mean_rel=0.16138239204883575, max_rel=49.87189483642578, norm_rel=0.021121470257639885, ref_abs_avg=39.232025146484375, test_abs_avg=39.24961471557617
production_forward grad[15] vs paper_forward: mean_abs=0.9993599653244019, max_abs=6.0, mean_rel=0.15401317179203033, max_rel=1265.3563232421875, norm_rel=0.022279823198914528, ref_abs_avg=45.07920837402344, test_abs_avg=45.079471588134766
production_forward grad[16] vs paper_forward: mean_abs=0.9766547083854675, max_abs=5.5, mean_rel=0.1515088677406311, max_rel=899.2672119140625, norm_rel=0.022018112242221832, ref_abs_avg=44.606109619140625, test_abs_avg=44.60919189453125
production_forward grad[17] vs paper_forward: mean_abs=0.7883419990539551, max_abs=3.0, mean_rel=0.0676390528678894, max_rel=2.8446593284606934, norm_rel=0.02215452678501606, ref_abs_avg=35.749332427978516, test_abs_avg=35.74151611328125
production_forward grad[18] vs paper_forward: mean_abs=0.9445236921310425, max_abs=5.8125, mean_rel=0.16195064783096313, max_rel=2136.938720703125, norm_rel=0.022214101627469063, ref_abs_avg=42.722877502441406, test_abs_avg=42.728973388671875
production_forward grad[19] vs paper_forward: mean_abs=0.9217382073402405, max_abs=6.0, mean_rel=0.1567838191986084, max_rel=1412.5279541015625, norm_rel=0.021954189985990524, ref_abs_avg=42.209068298339844, test_abs_avg=42.21091842651367
production_forward grad[20] vs paper_forward: mean_abs=0.6932201385498047, max_abs=3.0, mean_rel=0.08435606956481934, max_rel=4.155585289001465, norm_rel=0.021061871200799942, ref_abs_avg=33.044219970703125, test_abs_avg=33.073429107666016
production_forward grad[21] vs paper_forward: mean_abs=0.8986415863037109, max_abs=5.5, mean_rel=0.14653220772743225, max_rel=1196.228271484375, norm_rel=0.021995076909661293, ref_abs_avg=41.023563385009766, test_abs_avg=41.02836227416992
production_forward grad[22] vs paper_forward: mean_abs=0.8731679916381836, max_abs=5.0, mean_rel=0.15774491429328918, max_rel=981.7335815429688, norm_rel=0.02169143036007881, ref_abs_avg=40.420135498046875, test_abs_avg=40.421539306640625
production_forward grad[23] vs paper_forward: mean_abs=0.6914491653442383, max_abs=2.5, mean_rel=0.1573675572872162, max_rel=23.795623779296875, norm_rel=0.02099580317735672, ref_abs_avg=31.935867309570312, test_abs_avg=31.981718063354492
production_forward grad[24] vs paper_forward: mean_abs=0.8527892827987671, max_abs=5.6875, mean_rel=0.1393432319164276, max_rel=632.293212890625, norm_rel=0.021787134930491447, ref_abs_avg=39.2745475769043, test_abs_avg=39.27585220336914
production_forward grad[25] vs paper_forward: mean_abs=0.8314722776412964, max_abs=5.625, mean_rel=0.13799259066581726, max_rel=851.787353515625, norm_rel=0.021708054468035698, ref_abs_avg=38.44120788574219, test_abs_avg=38.44970703125
production_forward grad[26] vs paper_forward: mean_abs=0.8193533420562744, max_abs=4.0, mean_rel=0.21081393957138062, max_rel=42.64678955078125, norm_rel=0.023453738540410995, ref_abs_avg=34.35024642944336, test_abs_avg=34.30268096923828
production_forward grad[27] vs paper_forward: mean_abs=0.9916075468063354, max_abs=8.5, mean_rel=0.16060376167297363, max_rel=1560.7369384765625, norm_rel=0.02403956465423107, ref_abs_avg=41.43635177612305, test_abs_avg=41.436988830566406
production_forward grad[28] vs paper_forward: mean_abs=0.9782524108886719, max_abs=6.5, mean_rel=0.15160606801509857, max_rel=1259.6029052734375, norm_rel=0.024105912074446678, ref_abs_avg=40.77674102783203, test_abs_avg=40.77757263183594
production_forward grad[29] vs paper_forward: mean_abs=0.7632508277893066, max_abs=2.75, mean_rel=0.10425563156604767, max_rel=19.908588409423828, norm_rel=0.024756044149398804, ref_abs_avg=30.481657028198242, test_abs_avg=30.461669921875
production_forward grad[30] vs paper_forward: mean_abs=0.927750825881958, max_abs=6.125, mean_rel=0.19243843853473663, max_rel=2371.4404296875, norm_rel=0.024430785328149796, ref_abs_avg=38.1173095703125, test_abs_avg=38.12151336669922
production_forward grad[31] vs paper_forward: mean_abs=0.9177513122558594, max_abs=6.0625, mean_rel=0.19484445452690125, max_rel=1679.872802734375, norm_rel=0.024231068789958954, ref_abs_avg=38.00283432006836, test_abs_avg=38.00244140625
production_forward grad[32] vs paper_forward: mean_abs=0.6848773956298828, max_abs=2.5, mean_rel=0.09997083246707916, max_rel=5.077687740325928, norm_rel=0.022767843678593636, ref_abs_avg=30.03038787841797, test_abs_avg=30.00863265991211
production_forward grad[33] vs paper_forward: mean_abs=0.8617587089538574, max_abs=5.890625, mean_rel=0.17262309789657593, max_rel=2056.419677734375, norm_rel=0.024236170575022697, ref_abs_avg=35.65735626220703, test_abs_avg=35.65928649902344
production_forward grad[34] vs paper_forward: mean_abs=0.8500055074691772, max_abs=6.421875, mean_rel=0.16095712780952454, max_rel=806.178466796875, norm_rel=0.024226751178503036, ref_abs_avg=35.16852569580078, test_abs_avg=35.1732063293457
production_forward grad[35] vs paper_forward: mean_abs=0.637272834777832, max_abs=2.515625, mean_rel=0.07845056802034378, max_rel=5.008347511291504, norm_rel=0.022943170741200447, ref_abs_avg=28.641633987426758, test_abs_avg=28.624122619628906
production_forward grad[36] vs paper_forward: mean_abs=0.8084390759468079, max_abs=5.5, mean_rel=0.15882275998592377, max_rel=1004.2057495117188, norm_rel=0.0240686796605587, ref_abs_avg=33.666404724121094, test_abs_avg=33.66859436035156
production_forward grad[37] vs paper_forward: mean_abs=0.7930560111999512, max_abs=5.0, mean_rel=0.14785811305046082, max_rel=675.1695556640625, norm_rel=0.02410055696964264, ref_abs_avg=33.05060577392578, test_abs_avg=33.05577087402344
production_forward grad[38] vs paper_forward: mean_abs=0.605743408203125, max_abs=2.28125, mean_rel=0.20288226008415222, max_rel=38.512290954589844, norm_rel=0.02234850637614727, ref_abs_avg=27.363842010498047, test_abs_avg=27.331768035888672
production_forward grad[39] vs paper_forward: mean_abs=0.7606208324432373, max_abs=4.75, mean_rel=0.16199246048927307, max_rel=1403.1070556640625, norm_rel=0.02381058596074581, ref_abs_avg=32.018619537353516, test_abs_avg=32.019798278808594
production_forward grad[40] vs paper_forward: mean_abs=0.7443884611129761, max_abs=4.75, mean_rel=0.16028308868408203, max_rel=708.9313354492188, norm_rel=0.02367492765188217, ref_abs_avg=31.559024810791016, test_abs_avg=31.55914306640625
production_forward grad[41] vs paper_forward: mean_abs=0.6229772567749023, max_abs=2.5, mean_rel=0.3288368284702301, max_rel=99.59901428222656, norm_rel=0.024449389427900314, ref_abs_avg=25.5723819732666, test_abs_avg=25.581748962402344
production_forward grad[42] vs paper_forward: mean_abs=0.7211810350418091, max_abs=4.75, mean_rel=0.15760046243667603, max_rel=1072.0604248046875, norm_rel=0.02363482490181923, ref_abs_avg=30.640249252319336, test_abs_avg=30.640033721923828
production_forward grad[43] vs paper_forward: mean_abs=0.7104966044425964, max_abs=4.75, mean_rel=0.1641843318939209, max_rel=1270.2860107421875, norm_rel=0.023714736104011536, ref_abs_avg=30.083797454833984, test_abs_avg=30.086711883544922
production_forward grad[44] vs paper_forward: mean_abs=0.5590126514434814, max_abs=2.25, mean_rel=0.11583127081394196, max_rel=6.129013538360596, norm_rel=0.02396344393491745, ref_abs_avg=23.25376319885254, test_abs_avg=23.217296600341797
production_forward grad[45] vs paper_forward: mean_abs=0.6852697730064392, max_abs=4.5, mean_rel=0.16397184133529663, max_rel=1410.5860595703125, norm_rel=0.023410074412822723, ref_abs_avg=29.347599029541016, test_abs_avg=29.349735260009766
production_forward grad[46] vs paper_forward: mean_abs=0.67252516746521, max_abs=4.25, mean_rel=0.1633007824420929, max_rel=1469.8837890625, norm_rel=0.023352494463324547, ref_abs_avg=28.885866165161133, test_abs_avg=28.891502380371094
production_forward grad[47] vs paper_forward: mean_abs=0.5128335952758789, max_abs=1.75, mean_rel=0.23893696069717407, max_rel=26.09225845336914, norm_rel=0.021225033327937126, ref_abs_avg=23.753896713256836, test_abs_avg=23.763208389282227
production_forward grad[48] vs paper_forward: mean_abs=0.651338517665863, max_abs=4.5, mean_rel=0.1513720452785492, max_rel=1002.0753173828125, norm_rel=0.02328955940902233, ref_abs_avg=28.03961181640625, test_abs_avg=28.037841796875
production_forward grad[49] vs paper_forward: mean_abs=0.6436254382133484, max_abs=4.5, mean_rel=0.1562505066394806, max_rel=1281.759765625, norm_rel=0.02309444732964039, ref_abs_avg=27.934171676635742, test_abs_avg=27.93688201904297
production_forward grad[50] vs paper_forward: mean_abs=0.598461389541626, max_abs=2.75, mean_rel=0.08888791501522064, max_rel=5.6058220863342285, norm_rel=0.02428632602095604, ref_abs_avg=25.23055648803711, test_abs_avg=25.304351806640625
production_forward grad[51] vs paper_forward: mean_abs=0.7280669212341309, max_abs=4.5, mean_rel=0.14993427693843842, max_rel=823.0374755859375, norm_rel=0.024201802909374237, ref_abs_avg=30.169279098510742, test_abs_avg=30.17022132873535
production_forward grad[52] vs paper_forward: mean_abs=0.7101314067840576, max_abs=4.25, mean_rel=0.1530064046382904, max_rel=899.01025390625, norm_rel=0.023767441511154175, ref_abs_avg=29.902870178222656, test_abs_avg=29.901079177856445
production_forward grad[53] vs paper_forward: mean_abs=0.5552911758422852, max_abs=2.5, mean_rel=0.21638208627700806, max_rel=53.41718673706055, norm_rel=0.025006171315908432, ref_abs_avg=22.34347152709961, test_abs_avg=22.31334686279297
production_forward grad[54] vs paper_forward: mean_abs=0.6625850796699524, max_abs=4.75, mean_rel=0.16522981226444244, max_rel=2514.39208984375, norm_rel=0.023827897384762764, ref_abs_avg=27.83119773864746, test_abs_avg=27.831161499023438
production_forward grad[55] vs paper_forward: mean_abs=0.6491132974624634, max_abs=4.5625, mean_rel=0.15357355773448944, max_rel=818.657470703125, norm_rel=0.023413723334670067, ref_abs_avg=27.736282348632812, test_abs_avg=27.73164176940918
production_forward grad[56] vs paper_forward: mean_abs=0.5181951522827148, max_abs=2.15625, mean_rel=0.08957041800022125, max_rel=6.671534538269043, norm_rel=0.024163680151104927, ref_abs_avg=21.82950210571289, test_abs_avg=21.814151763916016
production_forward grad[57] vs paper_forward: mean_abs=0.6187740564346313, max_abs=4.5, mean_rel=0.15822365880012512, max_rel=1217.1282958984375, norm_rel=0.023273710161447525, ref_abs_avg=26.577068328857422, test_abs_avg=26.575870513916016
production_forward grad[58] vs paper_forward: mean_abs=0.6054142117500305, max_abs=4.125, mean_rel=0.1576094925403595, max_rel=925.53125, norm_rel=0.023103317245841026, ref_abs_avg=26.205734252929688, test_abs_avg=26.213510513305664
production_forward grad[59] vs paper_forward: mean_abs=0.5239009857177734, max_abs=2.375, mean_rel=0.08174414187669754, max_rel=3.472960948944092, norm_rel=0.026759082451462746, ref_abs_avg=19.22702980041504, test_abs_avg=19.19750213623047
production_forward grad[60] vs paper_forward: mean_abs=0.5795510411262512, max_abs=3.5, mean_rel=0.14904865622520447, max_rel=1336.5390625, norm_rel=0.02290290780365467, ref_abs_avg=25.294780731201172, test_abs_avg=25.294334411621094
production_forward grad[61] vs paper_forward: mean_abs=0.5691012144088745, max_abs=3.875, mean_rel=0.15018108487129211, max_rel=917.573974609375, norm_rel=0.022576777264475822, ref_abs_avg=25.2205810546875, test_abs_avg=25.219100952148438
production_forward grad[62] vs paper_forward: mean_abs=0.4481625556945801, max_abs=2.21875, mean_rel=0.1134144514799118, max_rel=20.448699951171875, norm_rel=0.02203158102929592, ref_abs_avg=20.93893814086914, test_abs_avg=20.944236755371094
production_forward grad[63] vs paper_forward: mean_abs=0.5458974242210388, max_abs=4.75, mean_rel=0.15595418214797974, max_rel=989.9447631835938, norm_rel=0.02253836952149868, ref_abs_avg=24.23200035095215, test_abs_avg=24.23312759399414
production_forward grad[64] vs paper_forward: mean_abs=0.5417816638946533, max_abs=4.125, mean_rel=0.1422097384929657, max_rel=722.1474609375, norm_rel=0.02224576659500599, ref_abs_avg=24.323627471923828, test_abs_avg=24.329513549804688
production_forward grad[65] vs paper_forward: mean_abs=0.4187049865722656, max_abs=2.5, mean_rel=0.1260150671005249, max_rel=24.186561584472656, norm_rel=0.022000307217240334, ref_abs_avg=19.37779998779297, test_abs_avg=19.345386505126953
production_forward grad[66] vs paper_forward: mean_abs=0.5241988897323608, max_abs=3.75, mean_rel=0.15946361422538757, max_rel=2071.21484375, norm_rel=0.022310197353363037, ref_abs_avg=23.495349884033203, test_abs_avg=23.49573516845703
production_forward grad[67] vs paper_forward: mean_abs=0.5083705186843872, max_abs=4.0, mean_rel=0.14551305770874023, max_rel=1458.9398193359375, norm_rel=0.02192043513059616, ref_abs_avg=23.202388763427734, test_abs_avg=23.199298858642578
production_forward grad[68] vs paper_forward: mean_abs=0.4119380712509155, max_abs=1.75, mean_rel=0.07991290092468262, max_rel=6.412116527557373, norm_rel=0.021598849445581436, ref_abs_avg=19.07666015625, test_abs_avg=19.093212127685547
production_forward grad[69] vs paper_forward: mean_abs=0.4957698881626129, max_abs=3.5, mean_rel=0.1496790498495102, max_rel=1109.8751220703125, norm_rel=0.021516695618629456, ref_abs_avg=23.02351951599121, test_abs_avg=23.024633407592773
production_forward grad[70] vs paper_forward: mean_abs=0.4805387258529663, max_abs=3.5, mean_rel=0.14309130609035492, max_rel=1114.7078857421875, norm_rel=0.021239669993519783, ref_abs_avg=22.649585723876953, test_abs_avg=22.648557662963867
production_forward grad[71] vs paper_forward: mean_abs=0.39905738830566406, max_abs=1.625, mean_rel=0.061241816729307175, max_rel=3.3445050716400146, norm_rel=0.021474013105034828, ref_abs_avg=18.58806610107422, test_abs_avg=18.573392868041992
production_forward grad[72] vs paper_forward: mean_abs=0.47691959142684937, max_abs=3.875, mean_rel=0.14499711990356445, max_rel=673.1004028320312, norm_rel=0.02140110544860363, ref_abs_avg=22.289203643798828, test_abs_avg=22.28896141052246
production_forward grad[73] vs paper_forward: mean_abs=0.46429020166397095, max_abs=4.5, mean_rel=0.13389810919761658, max_rel=548.0206298828125, norm_rel=0.021128645166754723, ref_abs_avg=21.9764347076416, test_abs_avg=21.972198486328125
production_forward grad[74] vs paper_forward: mean_abs=0.46332550048828125, max_abs=1.8125, mean_rel=0.10281109064817429, max_rel=6.03770637512207, norm_rel=0.02419355697929859, ref_abs_avg=19.247657775878906, test_abs_avg=19.259017944335938
production_forward grad[75] vs paper_forward: mean_abs=0.5295878648757935, max_abs=4.0, mean_rel=0.15657344460487366, max_rel=1006.0153198242188, norm_rel=0.022994887083768845, ref_abs_avg=23.06048011779785, test_abs_avg=23.058956146240234
production_forward grad[76] vs paper_forward: mean_abs=0.5162273049354553, max_abs=5.109375, mean_rel=0.1466081738471985, max_rel=843.4500732421875, norm_rel=0.022777771577239037, ref_abs_avg=22.729183197021484, test_abs_avg=22.726972579956055
production_forward grad[77] vs paper_forward: mean_abs=0.38275575637817383, max_abs=2.3125, mean_rel=0.10655141621828079, max_rel=11.294943809509277, norm_rel=0.02307703346014023, ref_abs_avg=17.273143768310547, test_abs_avg=17.246097564697266
production_forward grad[78] vs paper_forward: mean_abs=0.48539286851882935, max_abs=3.75, mean_rel=0.14497700333595276, max_rel=651.3595581054688, norm_rel=0.022483432665467262, ref_abs_avg=21.623868942260742, test_abs_avg=21.62207794189453
production_forward grad[79] vs paper_forward: mean_abs=0.4736734628677368, max_abs=4.0, mean_rel=0.14520540833473206, max_rel=1255.0595703125, norm_rel=0.021905457600951195, ref_abs_avg=21.594587326049805, test_abs_avg=21.593135833740234
production_forward grad[80] vs paper_forward: mean_abs=0.38149070739746094, max_abs=1.75, mean_rel=0.08707258105278015, max_rel=6.093994140625, norm_rel=0.021952994167804718, ref_abs_avg=17.928085327148438, test_abs_avg=17.922765731811523
production_forward grad[81] vs paper_forward: mean_abs=0.4448619782924652, max_abs=3.5, mean_rel=0.13929937779903412, max_rel=1412.9093017578125, norm_rel=0.02170666866004467, ref_abs_avg=20.55243682861328, test_abs_avg=20.552108764648438
production_forward grad[82] vs paper_forward: mean_abs=0.43797311186790466, max_abs=3.822265625, mean_rel=0.14005303382873535, max_rel=697.93994140625, norm_rel=0.021723324432969093, ref_abs_avg=20.246646881103516, test_abs_avg=20.247314453125
production_forward grad[83] vs paper_forward: mean_abs=0.3448977470397949, max_abs=1.25, mean_rel=0.1302151083946228, max_rel=20.622896194458008, norm_rel=0.020818034186959267, ref_abs_avg=16.43000030517578, test_abs_avg=16.409908294677734
production_forward grad[84] vs paper_forward: mean_abs=0.41559672355651855, max_abs=3.5, mean_rel=0.1362980455160141, max_rel=795.1072387695312, norm_rel=0.02103397808969021, ref_abs_avg=19.787853240966797, test_abs_avg=19.787044525146484
production_forward grad[85] vs paper_forward: mean_abs=0.4059907793998718, max_abs=3.5, mean_rel=0.13768541812896729, max_rel=771.468505859375, norm_rel=0.02040785923600197, ref_abs_avg=19.872760772705078, test_abs_avg=19.880863189697266
production_forward grad[86] vs paper_forward: mean_abs=0.3193451762199402, max_abs=1.375, mean_rel=0.09762223064899445, max_rel=8.162002563476562, norm_rel=0.02121606096625328, ref_abs_avg=15.476546287536621, test_abs_avg=15.495158195495605
production_forward grad[87] vs paper_forward: mean_abs=0.3878134787082672, max_abs=4.0, mean_rel=0.1298278123140335, max_rel=690.4376220703125, norm_rel=0.020384248346090317, ref_abs_avg=19.124412536621094, test_abs_avg=19.124584197998047
production_forward grad[88] vs paper_forward: mean_abs=0.38323432207107544, max_abs=4.5, mean_rel=0.13091982901096344, max_rel=476.93798828125, norm_rel=0.020548978820443153, ref_abs_avg=18.758922576904297, test_abs_avg=18.758056640625
production_forward grad[89] vs paper_forward: mean_abs=0.32086801528930664, max_abs=1.25, mean_rel=0.10442514717578888, max_rel=9.036144256591797, norm_rel=0.022074231877923012, ref_abs_avg=14.71555233001709, test_abs_avg=14.719938278198242
production_forward grad[90] vs paper_forward: mean_abs=0.3660762906074524, max_abs=3.375, mean_rel=0.12241692841053009, max_rel=487.91204833984375, norm_rel=0.01994776912033558, ref_abs_avg=18.481075286865234, test_abs_avg=18.480958938598633
production_forward grad[91] vs paper_forward: mean_abs=0.36379820108413696, max_abs=4.03125, mean_rel=0.12215753644704819, max_rel=452.05169677734375, norm_rel=0.02009844221174717, ref_abs_avg=18.245283126831055, test_abs_avg=18.240821838378906
production_forward grad[92] vs paper_forward: mean_abs=0.3075484037399292, max_abs=1.0, mean_rel=0.14023883640766144, max_rel=29.672435760498047, norm_rel=0.019946571439504623, ref_abs_avg=15.66010856628418, test_abs_avg=15.670270919799805
production_forward grad[93] vs paper_forward: mean_abs=0.3462033271789551, max_abs=3.625, mean_rel=0.11600151658058167, max_rel=467.10406494140625, norm_rel=0.019381405785679817, ref_abs_avg=18.061439514160156, test_abs_avg=18.061491012573242
production_forward grad[94] vs paper_forward: mean_abs=0.3341531753540039, max_abs=4.3203125, mean_rel=0.1178349107503891, max_rel=452.5238952636719, norm_rel=0.01913466677069664, ref_abs_avg=17.731449127197266, test_abs_avg=17.731704711914062
production_forward grad[95] vs paper_forward: mean_abs=0.2626011371612549, max_abs=1.0625, mean_rel=0.195988729596138, max_rel=26.028181076049805, norm_rel=0.01912333257496357, ref_abs_avg=14.011417388916016, test_abs_avg=13.974802017211914
production_forward grad[96] vs paper_forward: mean_abs=0.32323193550109863, max_abs=3.625, mean_rel=0.12024138867855072, max_rel=681.7757568359375, norm_rel=0.019128737971186638, ref_abs_avg=17.173198699951172, test_abs_avg=17.172332763671875
production_forward grad[97] vs paper_forward: mean_abs=0.32228630781173706, max_abs=4.0, mean_rel=0.11973360180854797, max_rel=592.480712890625, norm_rel=0.019245782867074013, ref_abs_avg=17.12924575805664, test_abs_avg=17.129364013671875
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016026649391278625, max_abs=0.05078125
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008505335077643394, max_abs=0.421875, mean_rel=0.07484282553195953, max_rel=89.12887573242188, norm_rel=0.02047440968453884, ref_abs_avg=0.4472152292728424, test_abs_avg=0.4472154974937439
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.220457553863525, max_abs=56.0, mean_rel=0.14011287689208984, max_rel=94.55640411376953, norm_rel=0.020644156262278557, ref_abs_avg=309.1185607910156, test_abs_avg=309.1248779296875
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.2753181457519531, max_abs=4.5, mean_rel=0.12575598061084747, max_rel=12.929917335510254, norm_rel=0.025849243625998497, ref_abs_avg=51.0189094543457, test_abs_avg=50.92920684814453
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.532768964767456, max_abs=9.9375, mean_rel=0.17893686890602112, max_rel=2773.339111328125, norm_rel=0.023847796022892, ref_abs_avg=64.55804443359375, test_abs_avg=64.55720520019531
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.4963016510009766, max_abs=10.5, mean_rel=0.15198111534118652, max_rel=864.8245239257812, norm_rel=0.02377883717417717, ref_abs_avg=63.32072448730469, test_abs_avg=63.323097229003906
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1827678680419922, max_abs=4.5, mean_rel=0.07923927903175354, max_rel=3.444639205932617, norm_rel=0.024238990619778633, ref_abs_avg=47.978172302246094, test_abs_avg=47.9970588684082
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.3267898559570312, max_abs=8.25, mean_rel=0.1622951328754425, max_rel=1191.93603515625, norm_rel=0.023544427007436752, ref_abs_avg=56.587257385253906, test_abs_avg=56.58624267578125
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.2973759174346924, max_abs=8.9375, mean_rel=0.1688404381275177, max_rel=1909.5675048828125, norm_rel=0.023475008085370064, ref_abs_avg=55.575618743896484, test_abs_avg=55.574913024902344
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.9338469505310059, max_abs=4.5625, mean_rel=0.6183030605316162, max_rel=262.05670166015625, norm_rel=0.02215922437608242, ref_abs_avg=42.72745132446289, test_abs_avg=42.67915344238281
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2002485990524292, max_abs=8.0, mean_rel=0.16313761472702026, max_rel=2511.09814453125, norm_rel=0.023323019966483116, ref_abs_avg=51.665409088134766, test_abs_avg=51.666648864746094
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.1757285594940186, max_abs=8.5, mean_rel=0.1621582955121994, max_rel=1332.7294921875, norm_rel=0.02317247912287712, ref_abs_avg=50.978118896484375, test_abs_avg=50.968746185302734
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9091858863830566, max_abs=3.5, mean_rel=0.22290080785751343, max_rel=46.9066276550293, norm_rel=0.02286318875849247, ref_abs_avg=39.098716735839844, test_abs_avg=39.210350036621094
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.109023928642273, max_abs=7.0, mean_rel=0.14916357398033142, max_rel=1029.01025390625, norm_rel=0.023111125454306602, ref_abs_avg=48.23442840576172, test_abs_avg=48.236610412597656
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.0781145095825195, max_abs=6.625, mean_rel=0.17313966155052185, max_rel=1963.8507080078125, norm_rel=0.022841211408376694, ref_abs_avg=47.39921951293945, test_abs_avg=47.40218734741211
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8494551181793213, max_abs=3.25, mean_rel=0.20052890479564667, max_rel=68.82610321044922, norm_rel=0.021815136075019836, ref_abs_avg=39.232025146484375, test_abs_avg=39.22110366821289
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0272979736328125, max_abs=6.375, mean_rel=0.15838176012039185, max_rel=1368.59326171875, norm_rel=0.022902678698301315, ref_abs_avg=45.07920837402344, test_abs_avg=45.078163146972656
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0027799606323242, max_abs=6.421875, mean_rel=0.15694667398929596, max_rel=1468.231689453125, norm_rel=0.0226118341088295, ref_abs_avg=44.606109619140625, test_abs_avg=44.609375
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.854697585105896, max_abs=3.25, mean_rel=0.08110196143388748, max_rel=2.9898622035980225, norm_rel=0.023611344397068024, ref_abs_avg=35.749332427978516, test_abs_avg=35.71760940551758
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.9687913656234741, max_abs=6.75, mean_rel=0.15814855694770813, max_rel=1391.98876953125, norm_rel=0.022787755355238914, ref_abs_avg=42.722877502441406, test_abs_avg=42.72565460205078
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9466137290000916, max_abs=6.25, mean_rel=0.1630031168460846, max_rel=1224.152099609375, norm_rel=0.02252248488366604, ref_abs_avg=42.209068298339844, test_abs_avg=42.20785903930664
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7401928901672363, max_abs=3.125, mean_rel=0.07728676497936249, max_rel=4.552224636077881, norm_rel=0.022001298144459724, ref_abs_avg=33.044219970703125, test_abs_avg=33.05857849121094
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9208905100822449, max_abs=6.0, mean_rel=0.15247313678264618, max_rel=954.1151733398438, norm_rel=0.022547805681824684, ref_abs_avg=41.023563385009766, test_abs_avg=41.026947021484375
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.8934761881828308, max_abs=5.34375, mean_rel=0.16131162643432617, max_rel=1239.3304443359375, norm_rel=0.02220733091235161, ref_abs_avg=40.420135498046875, test_abs_avg=40.42013168334961
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.6990580558776855, max_abs=2.5, mean_rel=0.15219804644584656, max_rel=25.51616096496582, norm_rel=0.021486321464180946, ref_abs_avg=31.935867309570312, test_abs_avg=31.98371124267578
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.8725870847702026, max_abs=5.875, mean_rel=0.14588811993598938, max_rel=1069.6007080078125, norm_rel=0.022287901490926743, ref_abs_avg=39.2745475769043, test_abs_avg=39.2750358581543
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8499525785446167, max_abs=5.5, mean_rel=0.14043359458446503, max_rel=851.787353515625, norm_rel=0.022206388413906097, ref_abs_avg=38.44120788574219, test_abs_avg=38.44941711425781
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8311922550201416, max_abs=4.0, mean_rel=0.22372539341449738, max_rel=35.26251983642578, norm_rel=0.024339254945516586, ref_abs_avg=34.35024642944336, test_abs_avg=34.330379486083984
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0151323080062866, max_abs=7.75, mean_rel=0.16463229060173035, max_rel=1323.404296875, norm_rel=0.024615369737148285, ref_abs_avg=41.43635177612305, test_abs_avg=41.43474197387695
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.00469970703125, max_abs=6.75, mean_rel=0.1567278802394867, max_rel=1197.4532470703125, norm_rel=0.024765802547335625, ref_abs_avg=40.77674102783203, test_abs_avg=40.767822265625
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7609305381774902, max_abs=3.0, mean_rel=0.12848129868507385, max_rel=26.045944213867188, norm_rel=0.02474498376250267, ref_abs_avg=30.481657028198242, test_abs_avg=30.464584350585938
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.948604166507721, max_abs=7.0, mean_rel=0.200073704123497, max_rel=2196.68798828125, norm_rel=0.024954212829470634, ref_abs_avg=38.1173095703125, test_abs_avg=38.12261962890625
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9384802579879761, max_abs=5.625, mean_rel=0.19382169842720032, max_rel=843.862060546875, norm_rel=0.024759477004408836, ref_abs_avg=38.00283432006836, test_abs_avg=38.00297546386719
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.686953067779541, max_abs=3.125, mean_rel=0.09178601205348969, max_rel=4.453997611999512, norm_rel=0.02319948561489582, ref_abs_avg=30.03038787841797, test_abs_avg=30.012554168701172
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.8790357708930969, max_abs=5.75, mean_rel=0.173273503780365, max_rel=1091.5916748046875, norm_rel=0.024723807349801064, ref_abs_avg=35.65735626220703, test_abs_avg=35.65693664550781
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8659957051277161, max_abs=5.71875, mean_rel=0.16392739117145538, max_rel=862.4669799804688, norm_rel=0.024692155420780182, ref_abs_avg=35.16852569580078, test_abs_avg=35.17246627807617
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6510767936706543, max_abs=2.609375, mean_rel=0.0802937000989914, max_rel=6.875339984893799, norm_rel=0.02317940443754196, ref_abs_avg=28.641633987426758, test_abs_avg=28.637163162231445
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.823634147644043, max_abs=5.25, mean_rel=0.1632607877254486, max_rel=1125.394287109375, norm_rel=0.024514896795153618, ref_abs_avg=33.666404724121094, test_abs_avg=33.668556213378906
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8109254837036133, max_abs=5.5, mean_rel=0.15645447373390198, max_rel=1383.573486328125, norm_rel=0.024632839486002922, ref_abs_avg=33.05060577392578, test_abs_avg=33.05621337890625
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6354742050170898, max_abs=2.75, mean_rel=0.23877407610416412, max_rel=49.82280349731445, norm_rel=0.023331724107265472, ref_abs_avg=27.363842010498047, test_abs_avg=27.324935913085938
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.7756906747817993, max_abs=5.75, mean_rel=0.16361454129219055, max_rel=1322.154296875, norm_rel=0.024261808022856712, ref_abs_avg=32.018619537353516, test_abs_avg=32.01995086669922
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7568076848983765, max_abs=4.75, mean_rel=0.16163292527198792, max_rel=981.4042358398438, norm_rel=0.024063169956207275, ref_abs_avg=31.559024810791016, test_abs_avg=31.559431076049805
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6068564057350159, max_abs=2.625, mean_rel=0.25434768199920654, max_rel=73.0158462524414, norm_rel=0.024388067424297333, ref_abs_avg=25.5723819732666, test_abs_avg=25.5662899017334
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7331657409667969, max_abs=5.0, mean_rel=0.15943960845470428, max_rel=1161.48681640625, norm_rel=0.024013319984078407, ref_abs_avg=30.640249252319336, test_abs_avg=30.639415740966797
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.721416711807251, max_abs=4.5, mean_rel=0.16947868466377258, max_rel=1329.1243896484375, norm_rel=0.024062011390924454, ref_abs_avg=30.083797454833984, test_abs_avg=30.086009979248047
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5839071273803711, max_abs=2.25, mean_rel=0.13079045712947845, max_rel=10.8042573928833, norm_rel=0.02467901073396206, ref_abs_avg=23.25376319885254, test_abs_avg=23.220787048339844
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.6945465803146362, max_abs=4.25, mean_rel=0.16750627756118774, max_rel=1496.964599609375, norm_rel=0.023723267018795013, ref_abs_avg=29.347599029541016, test_abs_avg=29.350181579589844
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.685508131980896, max_abs=4.25, mean_rel=0.17150482535362244, max_rel=989.5096435546875, norm_rel=0.02379940263926983, ref_abs_avg=28.885866165161133, test_abs_avg=28.88985824584961
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5287604331970215, max_abs=2.0, mean_rel=0.3109826445579529, max_rel=42.97819519042969, norm_rel=0.02195214293897152, ref_abs_avg=23.753896713256836, test_abs_avg=23.754684448242188
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6602046489715576, max_abs=5.09375, mean_rel=0.15149345993995667, max_rel=916.7846069335938, norm_rel=0.023608505725860596, ref_abs_avg=28.03961181640625, test_abs_avg=28.038345336914062
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6511654257774353, max_abs=4.5, mean_rel=0.15554097294807434, max_rel=1304.9239501953125, norm_rel=0.023351551964879036, ref_abs_avg=27.934171676635742, test_abs_avg=27.934040069580078
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.613311767578125, max_abs=2.75, mean_rel=0.08918511122465134, max_rel=3.5098438262939453, norm_rel=0.024241143837571144, ref_abs_avg=25.23055648803711, test_abs_avg=25.31441879272461
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7398619055747986, max_abs=5.0, mean_rel=0.1537698209285736, max_rel=728.0463256835938, norm_rel=0.024573950096964836, ref_abs_avg=30.169279098510742, test_abs_avg=30.168596267700195
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7212566137313843, max_abs=4.625, mean_rel=0.1547035425901413, max_rel=1228.96923828125, norm_rel=0.024119138717651367, ref_abs_avg=29.902870178222656, test_abs_avg=29.899673461914062
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5648436546325684, max_abs=2.2958984375, mean_rel=0.22275912761688232, max_rel=57.51997375488281, norm_rel=0.025416459888219833, ref_abs_avg=22.34347152709961, test_abs_avg=22.306392669677734
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.6722487211227417, max_abs=4.6875, mean_rel=0.1716245710849762, max_rel=2595.063232421875, norm_rel=0.024180425330996513, ref_abs_avg=27.83119773864746, test_abs_avg=27.830318450927734
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6582281589508057, max_abs=4.5, mean_rel=0.15782895684242249, max_rel=708.1809692382812, norm_rel=0.023749802261590958, ref_abs_avg=27.736282348632812, test_abs_avg=27.72940444946289
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5376567840576172, max_abs=2.0, mean_rel=0.08285081386566162, max_rel=6.206850528717041, norm_rel=0.02492373250424862, ref_abs_avg=21.82950210571289, test_abs_avg=21.817541122436523
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6265639662742615, max_abs=4.5, mean_rel=0.15914857387542725, max_rel=895.8727416992188, norm_rel=0.023565879091620445, ref_abs_avg=26.577068328857422, test_abs_avg=26.575244903564453
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6107347011566162, max_abs=4.5, mean_rel=0.15881092846393585, max_rel=865.57568359375, norm_rel=0.023309359326958656, ref_abs_avg=26.205734252929688, test_abs_avg=26.214248657226562
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5032253265380859, max_abs=2.375, mean_rel=0.08039071410894394, max_rel=3.0039823055267334, norm_rel=0.026293836534023285, ref_abs_avg=19.22702980041504, test_abs_avg=19.199304580688477
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.5855474472045898, max_abs=3.875, mean_rel=0.1541575938463211, max_rel=1779.5992431640625, norm_rel=0.023156097158789635, ref_abs_avg=25.294780731201172, test_abs_avg=25.294204711914062
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5766212940216064, max_abs=4.125, mean_rel=0.1535465568304062, max_rel=1144.4739990234375, norm_rel=0.022894950583577156, ref_abs_avg=25.2205810546875, test_abs_avg=25.221084594726562
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4536423683166504, max_abs=1.96875, mean_rel=0.14260458946228027, max_rel=29.725818634033203, norm_rel=0.02199718914926052, ref_abs_avg=20.93893814086914, test_abs_avg=20.948135375976562
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5523617267608643, max_abs=3.75, mean_rel=0.15702348947525024, max_rel=930.1677856445312, norm_rel=0.022805942222476006, ref_abs_avg=24.23200035095215, test_abs_avg=24.232860565185547
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5475372672080994, max_abs=4.484375, mean_rel=0.1450727880001068, max_rel=691.525146484375, norm_rel=0.022492049261927605, ref_abs_avg=24.323627471923828, test_abs_avg=24.33061981201172
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4316554069519043, max_abs=2.25, mean_rel=0.13684521615505219, max_rel=24.633813858032227, norm_rel=0.022124024108052254, ref_abs_avg=19.37779998779297, test_abs_avg=19.36151885986328
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5283762216567993, max_abs=3.5, mean_rel=0.15543925762176514, max_rel=1705.2489013671875, norm_rel=0.02249799109995365, ref_abs_avg=23.495349884033203, test_abs_avg=23.495941162109375
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5126123428344727, max_abs=4.125, mean_rel=0.14300958812236786, max_rel=1060.5487060546875, norm_rel=0.022094054147601128, ref_abs_avg=23.202388763427734, test_abs_avg=23.199321746826172
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4155964255332947, max_abs=2.0, mean_rel=0.08089698851108551, max_rel=4.860708236694336, norm_rel=0.021855466067790985, ref_abs_avg=19.07666015625, test_abs_avg=19.089313507080078
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.4993361830711365, max_abs=4.0, mean_rel=0.1490098237991333, max_rel=1068.0966796875, norm_rel=0.021664725616574287, ref_abs_avg=23.02351951599121, test_abs_avg=23.024545669555664
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.48739761114120483, max_abs=3.5, mean_rel=0.14677694439888, max_rel=933.1897583007812, norm_rel=0.021524356678128242, ref_abs_avg=22.649585723876953, test_abs_avg=22.646854400634766
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.39702582359313965, max_abs=1.546875, mean_rel=0.06719672679901123, max_rel=4.809046268463135, norm_rel=0.021468112245202065, ref_abs_avg=18.58806610107422, test_abs_avg=18.57166290283203
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.48032069206237793, max_abs=3.375, mean_rel=0.14919039607048035, max_rel=811.49560546875, norm_rel=0.02154719829559326, ref_abs_avg=22.289203643798828, test_abs_avg=22.289413452148438
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4684482216835022, max_abs=4.5, mean_rel=0.13359415531158447, max_rel=528.4859008789062, norm_rel=0.021331600844860077, ref_abs_avg=21.9764347076416, test_abs_avg=21.972904205322266
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.46693944931030273, max_abs=1.84375, mean_rel=0.09352731704711914, max_rel=2.3901166915893555, norm_rel=0.02484474889934063, ref_abs_avg=19.247657775878906, test_abs_avg=19.278451919555664
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5353459119796753, max_abs=4.25, mean_rel=0.15799883008003235, max_rel=924.3651733398438, norm_rel=0.023227544501423836, ref_abs_avg=23.06048011779785, test_abs_avg=23.058917999267578
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5202178955078125, max_abs=4.0, mean_rel=0.14778590202331543, max_rel=1016.4771728515625, norm_rel=0.02293100208044052, ref_abs_avg=22.729183197021484, test_abs_avg=22.725765228271484
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4092569351196289, max_abs=1.875, mean_rel=0.13213437795639038, max_rel=13.856712341308594, norm_rel=0.023951776325702667, ref_abs_avg=17.273143768310547, test_abs_avg=17.241703033447266
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.49051612615585327, max_abs=4.75, mean_rel=0.14439347386360168, max_rel=676.4297485351562, norm_rel=0.02269633486866951, ref_abs_avg=21.623868942260742, test_abs_avg=21.62122344970703
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.4770733416080475, max_abs=4.5, mean_rel=0.14317214488983154, max_rel=931.8353271484375, norm_rel=0.022055406123399734, ref_abs_avg=21.594587326049805, test_abs_avg=21.59290313720703
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.37195873260498047, max_abs=1.5, mean_rel=0.07351210713386536, max_rel=3.6563963890075684, norm_rel=0.021538686007261276, ref_abs_avg=17.928085327148438, test_abs_avg=17.917545318603516
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.44872573018074036, max_abs=3.5, mean_rel=0.14381182193756104, max_rel=1948.345703125, norm_rel=0.02188216522336006, ref_abs_avg=20.55243682861328, test_abs_avg=20.55179786682129
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.44512495398521423, max_abs=4.375, mean_rel=0.1436809003353119, max_rel=915.7861328125, norm_rel=0.022101745009422302, ref_abs_avg=20.246646881103516, test_abs_avg=20.24705696105957
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.34756994247436523, max_abs=1.5, mean_rel=0.11719104647636414, max_rel=15.782828330993652, norm_rel=0.02101898193359375, ref_abs_avg=16.43000030517578, test_abs_avg=16.417781829833984
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.41886216402053833, max_abs=3.5, mean_rel=0.1382676661014557, max_rel=826.9189453125, norm_rel=0.021168997511267662, ref_abs_avg=19.787853240966797, test_abs_avg=19.787290573120117
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4124109148979187, max_abs=4.0, mean_rel=0.1404924988746643, max_rel=867.447509765625, norm_rel=0.02075060084462166, ref_abs_avg=19.872760772705078, test_abs_avg=19.880802154541016
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.31165027618408203, max_abs=1.59375, mean_rel=0.11418501287698746, max_rel=14.777270317077637, norm_rel=0.02089964784681797, ref_abs_avg=15.476546287536621, test_abs_avg=15.493707656860352
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.39025184512138367, max_abs=3.5, mean_rel=0.1285119652748108, max_rel=662.59326171875, norm_rel=0.020487124100327492, ref_abs_avg=19.124412536621094, test_abs_avg=19.124887466430664
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.3826660215854645, max_abs=4.0, mean_rel=0.1269364058971405, max_rel=496.2803955078125, norm_rel=0.020478634163737297, ref_abs_avg=18.758922576904297, test_abs_avg=18.756256103515625
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.31655681133270264, max_abs=1.265625, mean_rel=0.09590204060077667, max_rel=6.906938076019287, norm_rel=0.021716348826885223, ref_abs_avg=14.71555233001709, test_abs_avg=14.717865943908691
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.36782652139663696, max_abs=3.25, mean_rel=0.1227685883641243, max_rel=605.3362426757812, norm_rel=0.020041028037667274, ref_abs_avg=18.481075286865234, test_abs_avg=18.481346130371094
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.36442622542381287, max_abs=3.921875, mean_rel=0.12425820529460907, max_rel=463.4255676269531, norm_rel=0.02011212147772312, ref_abs_avg=18.245283126831055, test_abs_avg=18.243663787841797
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.2925189733505249, max_abs=1.15625, mean_rel=0.07666333764791489, max_rel=8.186208724975586, norm_rel=0.01924728974699974, ref_abs_avg=15.66010856628418, test_abs_avg=15.666683197021484
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.34789425134658813, max_abs=3.5, mean_rel=0.11810702085494995, max_rel=501.9000549316406, norm_rel=0.019472848623991013, ref_abs_avg=18.061439514160156, test_abs_avg=18.062259674072266
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3339076042175293, max_abs=4.0, mean_rel=0.11882957816123962, max_rel=378.9432067871094, norm_rel=0.019132565706968307, ref_abs_avg=17.731449127197266, test_abs_avg=17.731576919555664
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2661426067352295, max_abs=1.03125, mean_rel=0.1939842700958252, max_rel=29.330368041992188, norm_rel=0.01918560080230236, ref_abs_avg=14.011417388916016, test_abs_avg=13.974077224731445
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3239689767360687, max_abs=3.5, mean_rel=0.11833168566226959, max_rel=372.763671875, norm_rel=0.019150489941239357, ref_abs_avg=17.173198699951172, test_abs_avg=17.172893524169922
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.32455211877822876, max_abs=4.0, mean_rel=0.1216612234711647, max_rel=589.2261962890625, norm_rel=0.019357098266482353, ref_abs_avg=17.12924575805664, test_abs_avg=17.130794525146484
identity layers + randn queries
torch_compile_phases_forward fwd+bwd:  94.920 ms
torch_compile_phases_forward bwd-only: 76.434 ms
torch_compile_phases_forward peak allocated: fwd=18.203 GiB, fwd+bwd=18.831 GiB
torch_compile_phases_forward peak reserved:  fwd=27.398 GiB, fwd+bwd=27.398 GiB
paper_forward fwd+bwd:  221.168 ms
paper_forward bwd-only: 173.987 ms
paper_forward peak allocated: fwd=35.128 GiB, fwd+bwd=37.247 GiB
paper_forward peak reserved:  fwd=36.166 GiB, fwd+bwd=38.666 GiB
production_forward fwd+bwd:  66.270 ms
production_forward bwd-only: 56.447 ms
production_forward peak allocated: fwd=7.614 GiB, fwd+bwd=15.618 GiB
production_forward peak reserved:  fwd=27.369 GiB, fwd+bwd=27.369 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.001649742480367422, max_abs=0.04296875
production_forward grad[0] vs paper_forward: mean_abs=0.008448777720332146, max_abs=0.44140625, mean_rel=0.07293469458818436, max_rel=124.33363342285156, norm_rel=0.020027944818139076, ref_abs_avg=0.459066241979599, test_abs_avg=0.4590953588485718
production_forward grad[1] vs paper_forward: mean_abs=7.43505334854126, max_abs=50.0, mean_rel=0.1899489313364029, max_rel=1003.95458984375, norm_rel=0.020584605634212494, ref_abs_avg=320.4160461425781, test_abs_avg=320.5039978027344
production_forward grad[2] vs paper_forward: mean_abs=1.1497135162353516, max_abs=4.5, mean_rel=0.07891376316547394, max_rel=1.8595248460769653, norm_rel=0.020959001034498215, ref_abs_avg=56.0364990234375, test_abs_avg=56.05887985229492
production_forward grad[3] vs paper_forward: mean_abs=1.5078632831573486, max_abs=10.0, mean_rel=0.15911218523979187, max_rel=2372.531494140625, norm_rel=0.02317625656723976, ref_abs_avg=65.3734359741211, test_abs_avg=65.38638305664062
production_forward grad[4] vs paper_forward: mean_abs=1.4559884071350098, max_abs=9.25, mean_rel=0.15673638880252838, max_rel=1143.433837890625, norm_rel=0.022727621719241142, ref_abs_avg=64.3333511352539, test_abs_avg=64.34315490722656
production_forward grad[5] vs paper_forward: mean_abs=1.0380077362060547, max_abs=4.875, mean_rel=0.06795412302017212, max_rel=2.643428087234497, norm_rel=0.021622778847813606, ref_abs_avg=49.15776824951172, test_abs_avg=49.184417724609375
production_forward grad[6] vs paper_forward: mean_abs=1.3455129861831665, max_abs=8.25, mean_rel=0.15248586237430573, max_rel=1705.723876953125, norm_rel=0.022980840876698494, ref_abs_avg=58.84486389160156, test_abs_avg=58.85527420043945
production_forward grad[7] vs paper_forward: mean_abs=1.3067328929901123, max_abs=8.25, mean_rel=0.1552780568599701, max_rel=1096.2310791015625, norm_rel=0.022698504850268364, ref_abs_avg=57.86680221557617, test_abs_avg=57.87770080566406
production_forward grad[8] vs paper_forward: mean_abs=1.0399208068847656, max_abs=4.5, mean_rel=0.07214441895484924, max_rel=1.6089258193969727, norm_rel=0.024328000843524933, ref_abs_avg=42.72032928466797, test_abs_avg=42.69437789916992
production_forward grad[9] vs paper_forward: mean_abs=1.2215121984481812, max_abs=7.125, mean_rel=0.16534423828125, max_rel=1561.912109375, norm_rel=0.02278960309922695, ref_abs_avg=53.87067794799805, test_abs_avg=53.880733489990234
production_forward grad[10] vs paper_forward: mean_abs=1.1881604194641113, max_abs=7.0, mean_rel=0.14088088274002075, max_rel=705.0911254882812, norm_rel=0.022478366270661354, ref_abs_avg=53.133949279785156, test_abs_avg=53.138572692871094
production_forward grad[11] vs paper_forward: mean_abs=0.9391121864318848, max_abs=3.8125, mean_rel=0.10177282989025116, max_rel=5.310145378112793, norm_rel=0.023428358137607574, ref_abs_avg=39.10554504394531, test_abs_avg=39.06904983520508
production_forward grad[12] vs paper_forward: mean_abs=1.1183981895446777, max_abs=8.0126953125, mean_rel=0.15017148852348328, max_rel=1778.18359375, norm_rel=0.022528594359755516, ref_abs_avg=49.90147018432617, test_abs_avg=49.90629196166992
production_forward grad[13] vs paper_forward: mean_abs=1.0925657749176025, max_abs=6.75, mean_rel=0.14957758784294128, max_rel=1783.5296630859375, norm_rel=0.022378070279955864, ref_abs_avg=49.09611511230469, test_abs_avg=49.09971237182617
production_forward grad[14] vs paper_forward: mean_abs=0.9134807586669922, max_abs=3.8125, mean_rel=0.15069586038589478, max_rel=34.00656509399414, norm_rel=0.024717597290873528, ref_abs_avg=36.98609924316406, test_abs_avg=36.961490631103516
production_forward grad[15] vs paper_forward: mean_abs=1.046877145767212, max_abs=6.5, mean_rel=0.1598348468542099, max_rel=1223.4688720703125, norm_rel=0.022390957921743393, ref_abs_avg=47.01218795776367, test_abs_avg=47.013282775878906
production_forward grad[16] vs paper_forward: mean_abs=1.0283138751983643, max_abs=6.25, mean_rel=0.16212692856788635, max_rel=960.4981689453125, norm_rel=0.022332744672894478, ref_abs_avg=46.26674270629883, test_abs_avg=46.27116012573242
production_forward grad[17] vs paper_forward: mean_abs=0.809445858001709, max_abs=3.25, mean_rel=0.11200263351202011, max_rel=16.88396453857422, norm_rel=0.021886663511395454, ref_abs_avg=37.528446197509766, test_abs_avg=37.499759674072266
production_forward grad[18] vs paper_forward: mean_abs=0.9859213829040527, max_abs=6.25, mean_rel=0.1526910811662674, max_rel=1244.9749755859375, norm_rel=0.022260885685682297, ref_abs_avg=44.49477767944336, test_abs_avg=44.498329162597656
production_forward grad[19] vs paper_forward: mean_abs=0.9634844660758972, max_abs=6.25, mean_rel=0.16525143384933472, max_rel=1107.449951171875, norm_rel=0.02213476039469242, ref_abs_avg=43.74237823486328, test_abs_avg=43.74628448486328
production_forward grad[20] vs paper_forward: mean_abs=0.7617974281311035, max_abs=3.5, mean_rel=0.09695565700531006, max_rel=14.5635986328125, norm_rel=0.023305216804146767, ref_abs_avg=32.73814010620117, test_abs_avg=32.78096008300781
production_forward grad[21] vs paper_forward: mean_abs=0.9264571070671082, max_abs=6.0, mean_rel=0.16273528337478638, max_rel=1357.1541748046875, norm_rel=0.022196399047970772, ref_abs_avg=41.953399658203125, test_abs_avg=41.9563102722168
production_forward grad[22] vs paper_forward: mean_abs=0.9108655452728271, max_abs=5.5, mean_rel=0.1446262151002884, max_rel=987.632080078125, norm_rel=0.02195824310183525, ref_abs_avg=41.719825744628906, test_abs_avg=41.73051834106445
production_forward grad[23] vs paper_forward: mean_abs=0.7058844566345215, max_abs=3.0, mean_rel=0.11080117523670197, max_rel=20.164026260375977, norm_rel=0.022417429834604263, ref_abs_avg=31.52224349975586, test_abs_avg=31.50636100769043
production_forward grad[24] vs paper_forward: mean_abs=0.8889153003692627, max_abs=5.25, mean_rel=0.1644761860370636, max_rel=1458.580078125, norm_rel=0.021992355585098267, ref_abs_avg=40.5753173828125, test_abs_avg=40.57865905761719
production_forward grad[25] vs paper_forward: mean_abs=0.8700404167175293, max_abs=5.25, mean_rel=0.1626664400100708, max_rel=2047.2379150390625, norm_rel=0.02188717946410179, ref_abs_avg=39.965858459472656, test_abs_avg=39.97325897216797
production_forward grad[26] vs paper_forward: mean_abs=0.9164772033691406, max_abs=4.0, mean_rel=0.22686344385147095, max_rel=47.48908233642578, norm_rel=0.025026030838489532, ref_abs_avg=35.585262298583984, test_abs_avg=35.55208969116211
production_forward grad[27] vs paper_forward: mean_abs=1.0453091859817505, max_abs=7.0, mean_rel=0.1715097427368164, max_rel=1267.2640380859375, norm_rel=0.02395118959248066, ref_abs_avg=43.829437255859375, test_abs_avg=43.834510803222656
production_forward grad[28] vs paper_forward: mean_abs=1.0168657302856445, max_abs=6.75, mean_rel=0.17279469966888428, max_rel=1799.341552734375, norm_rel=0.023736974224448204, ref_abs_avg=42.99359893798828, test_abs_avg=42.997920989990234
production_forward grad[29] vs paper_forward: mean_abs=0.7733850479125977, max_abs=3.125, mean_rel=0.09534981846809387, max_rel=8.959664344787598, norm_rel=0.022303471341729164, ref_abs_avg=36.35183334350586, test_abs_avg=36.34494400024414
production_forward grad[30] vs paper_forward: mean_abs=0.9696294069290161, max_abs=6.5, mean_rel=0.16379860043525696, max_rel=1629.25634765625, norm_rel=0.024272296577692032, ref_abs_avg=40.0872802734375, test_abs_avg=40.090789794921875
production_forward grad[31] vs paper_forward: mean_abs=0.948121190071106, max_abs=5.75, mean_rel=0.1646009385585785, max_rel=740.6244506835938, norm_rel=0.024127954617142677, ref_abs_avg=39.37628936767578, test_abs_avg=39.3720817565918
production_forward grad[32] vs paper_forward: mean_abs=0.7061233520507812, max_abs=3.4375, mean_rel=0.06682421267032623, max_rel=9.50394058227539, norm_rel=0.024295881390571594, ref_abs_avg=30.18592071533203, test_abs_avg=30.164260864257812
production_forward grad[33] vs paper_forward: mean_abs=0.894433856010437, max_abs=6.0, mean_rel=0.17174747586250305, max_rel=1551.890869140625, norm_rel=0.02414235658943653, ref_abs_avg=37.19481658935547, test_abs_avg=37.19565200805664
production_forward grad[34] vs paper_forward: mean_abs=0.8818950057029724, max_abs=6.0, mean_rel=0.15144354104995728, max_rel=777.947021484375, norm_rel=0.02406919375061989, ref_abs_avg=36.71428298950195, test_abs_avg=36.72164535522461
production_forward grad[35] vs paper_forward: mean_abs=0.6975224018096924, max_abs=2.75, mean_rel=0.24743105471134186, max_rel=55.12055969238281, norm_rel=0.023812277242541313, ref_abs_avg=29.077392578125, test_abs_avg=29.095165252685547
production_forward grad[36] vs paper_forward: mean_abs=0.8381572365760803, max_abs=5.5, mean_rel=0.16899950802326202, max_rel=2271.14306640625, norm_rel=0.02395433932542801, ref_abs_avg=35.134063720703125, test_abs_avg=35.1370849609375
production_forward grad[37] vs paper_forward: mean_abs=0.8238779306411743, max_abs=5.34375, mean_rel=0.15996518731117249, max_rel=829.1776733398438, norm_rel=0.02379399724304676, ref_abs_avg=34.81127166748047, test_abs_avg=34.81171417236328
production_forward grad[38] vs paper_forward: mean_abs=0.6196650266647339, max_abs=2.5, mean_rel=0.6366889476776123, max_rel=281.8712463378906, norm_rel=0.0226602666079998, ref_abs_avg=27.639110565185547, test_abs_avg=27.624267578125
production_forward grad[39] vs paper_forward: mean_abs=0.7876027822494507, max_abs=5.25, mean_rel=0.15294073522090912, max_rel=807.2042236328125, norm_rel=0.023591935634613037, ref_abs_avg=33.455379486083984, test_abs_avg=33.45689392089844
production_forward grad[40] vs paper_forward: mean_abs=0.78006511926651, max_abs=5.25, mean_rel=0.14449851214885712, max_rel=411.6438293457031, norm_rel=0.023708904162049294, ref_abs_avg=33.00205993652344, test_abs_avg=33.0149040222168
production_forward grad[41] vs paper_forward: mean_abs=0.5837535858154297, max_abs=2.6875, mean_rel=0.0714026540517807, max_rel=3.737809419631958, norm_rel=0.02181464619934559, ref_abs_avg=27.178325653076172, test_abs_avg=27.217844009399414
production_forward grad[42] vs paper_forward: mean_abs=0.7478670477867126, max_abs=4.875, mean_rel=0.16076985001564026, max_rel=1605.989013671875, norm_rel=0.023378845304250717, ref_abs_avg=32.04686737060547, test_abs_avg=32.05186462402344
production_forward grad[43] vs paper_forward: mean_abs=0.7322757244110107, max_abs=4.5, mean_rel=0.1543700397014618, max_rel=785.1490478515625, norm_rel=0.02336565963923931, ref_abs_avg=31.4578857421875, test_abs_avg=31.46356201171875
production_forward grad[44] vs paper_forward: mean_abs=0.594298779964447, max_abs=2.404296875, mean_rel=0.08437229692935944, max_rel=5.901732921600342, norm_rel=0.023532887920737267, ref_abs_avg=25.394744873046875, test_abs_avg=25.389991760253906
production_forward grad[45] vs paper_forward: mean_abs=0.7180967330932617, max_abs=4.875, mean_rel=0.14570701122283936, max_rel=997.4854125976562, norm_rel=0.02324357070028782, ref_abs_avg=30.93465805053711, test_abs_avg=30.937602996826172
production_forward grad[46] vs paper_forward: mean_abs=0.6990365982055664, max_abs=4.25, mean_rel=0.14906209707260132, max_rel=579.2068481445312, norm_rel=0.022907059639692307, ref_abs_avg=30.6446533203125, test_abs_avg=30.64495849609375
production_forward grad[47] vs paper_forward: mean_abs=0.566359281539917, max_abs=2.375, mean_rel=0.3662498891353607, max_rel=102.67347717285156, norm_rel=0.024082433432340622, ref_abs_avg=23.886869430541992, test_abs_avg=23.845657348632812
production_forward grad[48] vs paper_forward: mean_abs=0.6831339597702026, max_abs=4.5, mean_rel=0.15729327499866486, max_rel=742.4028930664062, norm_rel=0.023111077025532722, ref_abs_avg=29.569833755493164, test_abs_avg=29.571380615234375
production_forward grad[49] vs paper_forward: mean_abs=0.6692405939102173, max_abs=4.2333984375, mean_rel=0.15677738189697266, max_rel=909.15380859375, norm_rel=0.02283535711467266, ref_abs_avg=29.37718963623047, test_abs_avg=29.378562927246094
production_forward grad[50] vs paper_forward: mean_abs=0.6761951446533203, max_abs=2.75, mean_rel=0.09385745972394943, max_rel=7.705833435058594, norm_rel=0.025881504639983177, ref_abs_avg=26.28071403503418, test_abs_avg=26.295032501220703
production_forward grad[51] vs paper_forward: mean_abs=0.7647764682769775, max_abs=4.75, mean_rel=0.17001810669898987, max_rel=888.4236450195312, norm_rel=0.02478477731347084, ref_abs_avg=30.878366470336914, test_abs_avg=30.881563186645508
production_forward grad[52] vs paper_forward: mean_abs=0.7426903247833252, max_abs=4.75, mean_rel=0.15388169884681702, max_rel=546.5349731445312, norm_rel=0.024695569649338722, ref_abs_avg=30.19180679321289, test_abs_avg=30.195968627929688
production_forward grad[53] vs paper_forward: mean_abs=0.5930418968200684, max_abs=2.5, mean_rel=0.07762818783521652, max_rel=2.6736581325531006, norm_rel=0.024739433079957962, ref_abs_avg=23.673542022705078, test_abs_avg=23.730667114257812
production_forward grad[54] vs paper_forward: mean_abs=0.7033743262290955, max_abs=4.4375, mean_rel=0.17236623167991638, max_rel=1057.23974609375, norm_rel=0.024448120966553688, ref_abs_avg=28.8134765625, test_abs_avg=28.817739486694336
production_forward grad[55] vs paper_forward: mean_abs=0.688870370388031, max_abs=4.6875, mean_rel=0.1575445681810379, max_rel=695.6791381835938, norm_rel=0.0244416706264019, ref_abs_avg=28.278791427612305, test_abs_avg=28.286720275878906
production_forward grad[56] vs paper_forward: mean_abs=0.5266191363334656, max_abs=2.75, mean_rel=0.1395222544670105, max_rel=32.880226135253906, norm_rel=0.024139543995261192, ref_abs_avg=22.34365463256836, test_abs_avg=22.305706024169922
production_forward grad[57] vs paper_forward: mean_abs=0.6447056531906128, max_abs=4.375, mean_rel=0.15975321829319, max_rel=1290.7030029296875, norm_rel=0.023977026343345642, ref_abs_avg=26.883485794067383, test_abs_avg=26.888145446777344
production_forward grad[58] vs paper_forward: mean_abs=0.6328392028808594, max_abs=4.25, mean_rel=0.1473846733570099, max_rel=595.272705078125, norm_rel=0.023962166160345078, ref_abs_avg=26.467296600341797, test_abs_avg=26.466833114624023
production_forward grad[59] vs paper_forward: mean_abs=0.5102431774139404, max_abs=2.0, mean_rel=0.1238560676574707, max_rel=14.210551261901855, norm_rel=0.024351028725504875, ref_abs_avg=20.555030822753906, test_abs_avg=20.528278350830078
production_forward grad[60] vs paper_forward: mean_abs=0.5991656184196472, max_abs=4.0, mean_rel=0.16006115078926086, max_rel=1062.99462890625, norm_rel=0.023586487397551537, ref_abs_avg=25.398757934570312, test_abs_avg=25.402379989624023
production_forward grad[61] vs paper_forward: mean_abs=0.5862406492233276, max_abs=4.0, mean_rel=0.14791664481163025, max_rel=540.2449340820312, norm_rel=0.023378023877739906, ref_abs_avg=25.148372650146484, test_abs_avg=25.14965057373047
production_forward grad[62] vs paper_forward: mean_abs=0.47828006744384766, max_abs=1.6875, mean_rel=0.11640214920043945, max_rel=16.81736183166504, norm_rel=0.021891839802265167, ref_abs_avg=21.36952018737793, test_abs_avg=21.37923812866211
production_forward grad[63] vs paper_forward: mean_abs=0.5736292600631714, max_abs=4.125, mean_rel=0.15242193639278412, max_rel=987.1666870117188, norm_rel=0.023084163665771484, ref_abs_avg=24.840877532958984, test_abs_avg=24.845027923583984
production_forward grad[64] vs paper_forward: mean_abs=0.5584641695022583, max_abs=4.125, mean_rel=0.15138699114322662, max_rel=713.1561889648438, norm_rel=0.022632822394371033, ref_abs_avg=24.652873992919922, test_abs_avg=24.650875091552734
production_forward grad[65] vs paper_forward: mean_abs=0.45328330993652344, max_abs=1.875, mean_rel=0.12672662734985352, max_rel=10.828511238098145, norm_rel=0.02588237263262272, ref_abs_avg=17.808656692504883, test_abs_avg=17.794795989990234
production_forward grad[66] vs paper_forward: mean_abs=0.5428460836410522, max_abs=3.75, mean_rel=0.15405352413654327, max_rel=1451.529296875, norm_rel=0.0230262354016304, ref_abs_avg=23.607202529907227, test_abs_avg=23.608182907104492
production_forward grad[67] vs paper_forward: mean_abs=0.5332059860229492, max_abs=3.59375, mean_rel=0.14833885431289673, max_rel=785.0802001953125, norm_rel=0.022785471752285957, ref_abs_avg=23.4188232421875, test_abs_avg=23.41301155090332
production_forward grad[68] vs paper_forward: mean_abs=0.42465561628341675, max_abs=1.8125, mean_rel=0.16398054361343384, max_rel=39.53496551513672, norm_rel=0.02198771759867668, ref_abs_avg=19.492490768432617, test_abs_avg=19.45941162109375
production_forward grad[69] vs paper_forward: mean_abs=0.5117159485816956, max_abs=3.70703125, mean_rel=0.14411652088165283, max_rel=635.4049682617188, norm_rel=0.022175267338752747, ref_abs_avg=23.04941177368164, test_abs_avg=23.050582885742188
production_forward grad[70] vs paper_forward: mean_abs=0.5022590160369873, max_abs=3.75, mean_rel=0.1464952528476715, max_rel=512.2227783203125, norm_rel=0.022008588537573814, ref_abs_avg=22.826370239257812, test_abs_avg=22.836734771728516
production_forward grad[71] vs paper_forward: mean_abs=0.40982913970947266, max_abs=1.9375, mean_rel=0.0700875073671341, max_rel=3.353394031524658, norm_rel=0.02139228582382202, ref_abs_avg=18.789501190185547, test_abs_avg=18.77777862548828
production_forward grad[72] vs paper_forward: mean_abs=0.4935030937194824, max_abs=3.5, mean_rel=0.15698187053203583, max_rel=1331.9766845703125, norm_rel=0.022401822730898857, ref_abs_avg=22.035436630249023, test_abs_avg=22.03746795654297
production_forward grad[73] vs paper_forward: mean_abs=0.4859909415245056, max_abs=3.5, mean_rel=0.13474196195602417, max_rel=800.8939819335938, norm_rel=0.02215959131717682, ref_abs_avg=21.91643524169922, test_abs_avg=21.916915893554688
production_forward grad[74] vs paper_forward: mean_abs=0.4828147888183594, max_abs=1.953125, mean_rel=0.09290885180234909, max_rel=3.4196386337280273, norm_rel=0.025066407397389412, ref_abs_avg=19.602495193481445, test_abs_avg=19.5863094329834
production_forward grad[75] vs paper_forward: mean_abs=0.5698778629302979, max_abs=4.3125, mean_rel=0.16345733404159546, max_rel=1051.12060546875, norm_rel=0.023277224972844124, ref_abs_avg=24.467998504638672, test_abs_avg=24.46926498413086
production_forward grad[76] vs paper_forward: mean_abs=0.5602322220802307, max_abs=4.3125, mean_rel=0.15410421788692474, max_rel=593.224853515625, norm_rel=0.023365087807178497, ref_abs_avg=24.05145263671875, test_abs_avg=24.046268463134766
production_forward grad[77] vs paper_forward: mean_abs=0.4260239601135254, max_abs=1.75, mean_rel=0.08027816563844681, max_rel=2.9759647846221924, norm_rel=0.022529175505042076, ref_abs_avg=18.867801666259766, test_abs_avg=18.84417724609375
production_forward grad[78] vs paper_forward: mean_abs=0.5288444757461548, max_abs=5.130859375, mean_rel=0.14511889219284058, max_rel=710.6123657226562, norm_rel=0.02275310829281807, ref_abs_avg=23.23119354248047, test_abs_avg=23.23430061340332
production_forward grad[79] vs paper_forward: mean_abs=0.507059633731842, max_abs=4.421875, mean_rel=0.15220907330513, max_rel=1079.2451171875, norm_rel=0.022595621645450592, ref_abs_avg=22.473499298095703, test_abs_avg=22.476816177368164
production_forward grad[80] vs paper_forward: mean_abs=0.39149045944213867, max_abs=1.625, mean_rel=0.12975472211837769, max_rel=17.079696655273438, norm_rel=0.021818924695253372, ref_abs_avg=18.22663116455078, test_abs_avg=18.242937088012695
production_forward grad[81] vs paper_forward: mean_abs=0.48252540826797485, max_abs=4.0, mean_rel=0.14195583760738373, max_rel=651.3770751953125, norm_rel=0.022171977907419205, ref_abs_avg=21.771211624145508, test_abs_avg=21.770536422729492
production_forward grad[82] vs paper_forward: mean_abs=0.4723203778266907, max_abs=4.0, mean_rel=0.13227149844169617, max_rel=866.8185424804688, norm_rel=0.02235020510852337, ref_abs_avg=21.239639282226562, test_abs_avg=21.245668411254883
production_forward grad[83] vs paper_forward: mean_abs=0.36899852752685547, max_abs=1.25, mean_rel=0.09010029584169388, max_rel=6.870510578155518, norm_rel=0.021227559074759483, ref_abs_avg=17.652252197265625, test_abs_avg=17.657161712646484
production_forward grad[84] vs paper_forward: mean_abs=0.44273045659065247, max_abs=4.3125, mean_rel=0.1305491030216217, max_rel=705.6141967773438, norm_rel=0.021124236285686493, ref_abs_avg=21.013282775878906, test_abs_avg=21.01412010192871
production_forward grad[85] vs paper_forward: mean_abs=0.42912811040878296, max_abs=3.25, mean_rel=0.1359434574842453, max_rel=520.0132446289062, norm_rel=0.0212227962911129, ref_abs_avg=20.29203987121582, test_abs_avg=20.299549102783203
production_forward grad[86] vs paper_forward: mean_abs=0.3556241989135742, max_abs=1.59375, mean_rel=0.092844158411026, max_rel=8.575883865356445, norm_rel=0.021704189479351044, ref_abs_avg=16.789966583251953, test_abs_avg=16.788047790527344
production_forward grad[87] vs paper_forward: mean_abs=0.41411471366882324, max_abs=4.0, mean_rel=0.12891581654548645, max_rel=604.5701904296875, norm_rel=0.020852819085121155, ref_abs_avg=19.957019805908203, test_abs_avg=19.959396362304688
production_forward grad[88] vs paper_forward: mean_abs=0.4126437306404114, max_abs=4.0, mean_rel=0.12786057591438293, max_rel=736.0169067382812, norm_rel=0.0211371760815382, ref_abs_avg=19.689491271972656, test_abs_avg=19.68933868408203
production_forward grad[89] vs paper_forward: mean_abs=0.3536376953125, max_abs=1.25, mean_rel=0.08626540005207062, max_rel=4.288085460662842, norm_rel=0.02198617160320282, ref_abs_avg=15.955394744873047, test_abs_avg=15.95290470123291
production_forward grad[90] vs paper_forward: mean_abs=0.394855797290802, max_abs=3.75, mean_rel=0.13487595319747925, max_rel=550.947021484375, norm_rel=0.020495200529694557, ref_abs_avg=19.3695068359375, test_abs_avg=19.370820999145508
production_forward grad[91] vs paper_forward: mean_abs=0.38067805767059326, max_abs=3.5, mean_rel=0.1286252737045288, max_rel=563.9064331054688, norm_rel=0.01975499466061592, ref_abs_avg=19.345115661621094, test_abs_avg=19.351722717285156
production_forward grad[92] vs paper_forward: mean_abs=0.3115050196647644, max_abs=1.5, mean_rel=0.184164896607399, max_rel=40.07530975341797, norm_rel=0.02009190060198307, ref_abs_avg=15.763510704040527, test_abs_avg=15.797310829162598
production_forward grad[93] vs paper_forward: mean_abs=0.3702397346496582, max_abs=5.0, mean_rel=0.12721258401870728, max_rel=624.2447509765625, norm_rel=0.019920283928513527, ref_abs_avg=18.774188995361328, test_abs_avg=18.773632049560547
production_forward grad[94] vs paper_forward: mean_abs=0.3598160743713379, max_abs=3.375, mean_rel=0.12620443105697632, max_rel=613.0035400390625, norm_rel=0.019593480974435806, ref_abs_avg=18.63051414489746, test_abs_avg=18.634227752685547
production_forward grad[95] vs paper_forward: mean_abs=0.3101479709148407, max_abs=1.25, mean_rel=0.15766067802906036, max_rel=36.774227142333984, norm_rel=0.020263321697711945, ref_abs_avg=15.347694396972656, test_abs_avg=15.327895164489746
production_forward grad[96] vs paper_forward: mean_abs=0.3424396514892578, max_abs=4.18359375, mean_rel=0.12263244390487671, max_rel=1167.620361328125, norm_rel=0.019141700118780136, ref_abs_avg=18.171680450439453, test_abs_avg=18.17171859741211
production_forward grad[97] vs paper_forward: mean_abs=0.33973783254623413, max_abs=3.5, mean_rel=0.1197495311498642, max_rel=565.4859619140625, norm_rel=0.01912401244044304, ref_abs_avg=18.00951385498047, test_abs_avg=18.0202579498291
torch_compile_phases_forward vs paper_forward output: mean_abs=0.00165324448607862, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008796913549304008, max_abs=0.43359375, mean_rel=0.07559118419885635, max_rel=103.21298217773438, norm_rel=0.02073061466217041, ref_abs_avg=0.459066241979599, test_abs_avg=0.4590792655944824
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.591123104095459, max_abs=56.0, mean_rel=0.19232654571533203, max_rel=770.6726684570312, norm_rel=0.02098901756107807, ref_abs_avg=320.4160461425781, test_abs_avg=320.5
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.267085075378418, max_abs=5.0, mean_rel=0.08987784385681152, max_rel=3.075760841369629, norm_rel=0.022761713713407516, ref_abs_avg=56.0364990234375, test_abs_avg=56.0511360168457
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.5606231689453125, max_abs=10.0, mean_rel=0.1621570587158203, max_rel=2911.361572265625, norm_rel=0.023973001167178154, ref_abs_avg=65.3734359741211, test_abs_avg=65.3819580078125
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5126445293426514, max_abs=9.0, mean_rel=0.16007497906684875, max_rel=1214.9349365234375, norm_rel=0.02359982766211033, ref_abs_avg=64.3333511352539, test_abs_avg=64.34307098388672
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.0981063842773438, max_abs=4.625, mean_rel=0.06846991181373596, max_rel=1.9857330322265625, norm_rel=0.022794945165514946, ref_abs_avg=49.15776824951172, test_abs_avg=49.13855743408203
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.393824815750122, max_abs=9.875, mean_rel=0.15780624747276306, max_rel=1432.177734375, norm_rel=0.023786121979355812, ref_abs_avg=58.84486389160156, test_abs_avg=58.85261917114258
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.352924108505249, max_abs=8.0, mean_rel=0.1716260015964508, max_rel=2705.9638671875, norm_rel=0.0234854593873024, ref_abs_avg=57.86680221557617, test_abs_avg=57.88258743286133
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0905466079711914, max_abs=5.0, mean_rel=0.06783199310302734, max_rel=1.5720620155334473, norm_rel=0.025929421186447144, ref_abs_avg=42.72032928466797, test_abs_avg=42.750633239746094
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2628461122512817, max_abs=8.0, mean_rel=0.17232026159763336, max_rel=1397.78955078125, norm_rel=0.023557858541607857, ref_abs_avg=53.87067794799805, test_abs_avg=53.878543853759766
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2315514087677002, max_abs=7.0, mean_rel=0.14528964459896088, max_rel=553.8026123046875, norm_rel=0.023300355300307274, ref_abs_avg=53.133949279785156, test_abs_avg=53.14027404785156
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=1.0263776779174805, max_abs=4.125, mean_rel=0.11071816086769104, max_rel=6.666271686553955, norm_rel=0.025049543008208275, ref_abs_avg=39.10554504394531, test_abs_avg=39.040828704833984
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.154857873916626, max_abs=7.5, mean_rel=0.15305832028388977, max_rel=1384.3699951171875, norm_rel=0.023246217519044876, ref_abs_avg=49.90147018432617, test_abs_avg=49.902931213378906
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1282153129577637, max_abs=7.0, mean_rel=0.1579926759004593, max_rel=1819.4443359375, norm_rel=0.023102277889847755, ref_abs_avg=49.09611511230469, test_abs_avg=49.099342346191406
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8839735984802246, max_abs=3.5, mean_rel=0.09648020565509796, max_rel=13.167284965515137, norm_rel=0.023852385580539703, ref_abs_avg=36.98609924316406, test_abs_avg=36.990509033203125
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.077850580215454, max_abs=7.0, mean_rel=0.15778306126594543, max_rel=1163.8280029296875, norm_rel=0.023048754781484604, ref_abs_avg=47.01218795776367, test_abs_avg=47.01117706298828
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0571439266204834, max_abs=7.0, mean_rel=0.16323170065879822, max_rel=1406.2205810546875, norm_rel=0.02295403741300106, ref_abs_avg=46.26674270629883, test_abs_avg=46.26842498779297
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8553543090820312, max_abs=3.25, mean_rel=0.11572533845901489, max_rel=17.310564041137695, norm_rel=0.023111822083592415, ref_abs_avg=37.528446197509766, test_abs_avg=37.49330139160156
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0124895572662354, max_abs=6.5, mean_rel=0.15377606451511383, max_rel=963.4181518554688, norm_rel=0.022856751456856728, ref_abs_avg=44.49477767944336, test_abs_avg=44.49684143066406
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.988336443901062, max_abs=6.0, mean_rel=0.16018280386924744, max_rel=1034.337158203125, norm_rel=0.022682076320052147, ref_abs_avg=43.74237823486328, test_abs_avg=43.74663543701172
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7813546657562256, max_abs=4.0, mean_rel=0.07855775952339172, max_rel=5.570196151733398, norm_rel=0.024178920313715935, ref_abs_avg=32.73814010620117, test_abs_avg=32.80183792114258
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9522197246551514, max_abs=6.25, mean_rel=0.1700076013803482, max_rel=1472.8499755859375, norm_rel=0.02278730273246765, ref_abs_avg=41.953399658203125, test_abs_avg=41.955928802490234
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9339059591293335, max_abs=6.0, mean_rel=0.1507480889558792, max_rel=2036.4705810546875, norm_rel=0.022496307268738747, ref_abs_avg=41.719825744628906, test_abs_avg=41.72589111328125
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.750281810760498, max_abs=3.0, mean_rel=0.15043506026268005, max_rel=32.181278228759766, norm_rel=0.023851102218031883, ref_abs_avg=31.52224349975586, test_abs_avg=31.528148651123047
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9109101295471191, max_abs=6.0, mean_rel=0.17040690779685974, max_rel=1407.251708984375, norm_rel=0.022532233968377113, ref_abs_avg=40.5753173828125, test_abs_avg=40.577335357666016
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8928764462471008, max_abs=5.25, mean_rel=0.1624753177165985, max_rel=2509.16015625, norm_rel=0.022438958287239075, ref_abs_avg=39.965858459472656, test_abs_avg=39.97271728515625
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.9292259216308594, max_abs=3.5, mean_rel=0.3051283657550812, max_rel=64.29718780517578, norm_rel=0.025142954662442207, ref_abs_avg=35.585262298583984, test_abs_avg=35.577842712402344
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.071785569190979, max_abs=7.0, mean_rel=0.18116530776023865, max_rel=1565.5770263671875, norm_rel=0.024554142728447914, ref_abs_avg=43.829437255859375, test_abs_avg=43.832401275634766
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0447765588760376, max_abs=6.5, mean_rel=0.1804535984992981, max_rel=1902.4593505859375, norm_rel=0.02438819780945778, ref_abs_avg=42.99359893798828, test_abs_avg=42.99253463745117
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.77935791015625, max_abs=3.75, mean_rel=0.09077216684818268, max_rel=8.653514862060547, norm_rel=0.02250521443784237, ref_abs_avg=36.35183334350586, test_abs_avg=36.323787689208984
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9895302057266235, max_abs=6.25, mean_rel=0.1712392121553421, max_rel=1308.95751953125, norm_rel=0.024761272594332695, ref_abs_avg=40.0872802734375, test_abs_avg=40.088539123535156
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.968360424041748, max_abs=5.875, mean_rel=0.1624646633863449, max_rel=733.8062133789062, norm_rel=0.024652961641550064, ref_abs_avg=39.37628936767578, test_abs_avg=39.373130798339844
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7148818969726562, max_abs=3.96875, mean_rel=0.07978463172912598, max_rel=13.734353065490723, norm_rel=0.024266202002763748, ref_abs_avg=30.18592071533203, test_abs_avg=30.17566680908203
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9139941930770874, max_abs=6.5, mean_rel=0.1743442565202713, max_rel=1044.869140625, norm_rel=0.024671288207173347, ref_abs_avg=37.19481658935547, test_abs_avg=37.19397735595703
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8974382281303406, max_abs=6.0, mean_rel=0.15225383639335632, max_rel=841.9248657226562, norm_rel=0.024483447894454002, ref_abs_avg=36.71428298950195, test_abs_avg=36.722198486328125
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7141361236572266, max_abs=2.625, mean_rel=0.18771491944789886, max_rel=42.016117095947266, norm_rel=0.024560537189245224, ref_abs_avg=29.077392578125, test_abs_avg=29.104774475097656
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8542202711105347, max_abs=6.0, mean_rel=0.17442406713962555, max_rel=2147.542724609375, norm_rel=0.02440088428556919, ref_abs_avg=35.134063720703125, test_abs_avg=35.13739776611328
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8406044244766235, max_abs=5.0, mean_rel=0.16789034008979797, max_rel=963.3812255859375, norm_rel=0.02424774132668972, ref_abs_avg=34.81127166748047, test_abs_avg=34.814178466796875
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6164182424545288, max_abs=2.5, mean_rel=0.20387884974479675, max_rel=49.502986907958984, norm_rel=0.022587088868021965, ref_abs_avg=27.639110565185547, test_abs_avg=27.64192008972168
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8020453453063965, max_abs=5.5, mean_rel=0.1544291377067566, max_rel=685.0562133789062, norm_rel=0.02402246929705143, ref_abs_avg=33.455379486083984, test_abs_avg=33.455955505371094
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7932590842247009, max_abs=5.0625, mean_rel=0.14504799246788025, max_rel=279.32257080078125, norm_rel=0.02409331500530243, ref_abs_avg=33.00205993652344, test_abs_avg=33.016151428222656
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6233043670654297, max_abs=2.65625, mean_rel=0.08139267563819885, max_rel=3.657209873199463, norm_rel=0.022914638742804527, ref_abs_avg=27.178325653076172, test_abs_avg=27.17637825012207
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7618116140365601, max_abs=5.0, mean_rel=0.1646352857351303, max_rel=1487.98193359375, norm_rel=0.02381305769085884, ref_abs_avg=32.04686737060547, test_abs_avg=32.05077362060547
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7453799247741699, max_abs=4.875, mean_rel=0.16183024644851685, max_rel=816.250244140625, norm_rel=0.023770950734615326, ref_abs_avg=31.4578857421875, test_abs_avg=31.463428497314453
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.6146128177642822, max_abs=2.5, mean_rel=0.10041016340255737, max_rel=10.467469215393066, norm_rel=0.02400125190615654, ref_abs_avg=25.394744873046875, test_abs_avg=25.406436920166016
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.729487419128418, max_abs=4.8515625, mean_rel=0.15043634176254272, max_rel=1104.1710205078125, norm_rel=0.023617852479219437, ref_abs_avg=30.93465805053711, test_abs_avg=30.936634063720703
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7124204039573669, max_abs=4.5, mean_rel=0.1472778022289276, max_rel=631.2794189453125, norm_rel=0.02332126535475254, ref_abs_avg=30.6446533203125, test_abs_avg=30.64561653137207
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5790669918060303, max_abs=2.25, mean_rel=0.32163625955581665, max_rel=87.58443450927734, norm_rel=0.024359440430998802, ref_abs_avg=23.886869430541992, test_abs_avg=23.85071563720703
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6931669116020203, max_abs=4.375, mean_rel=0.15812091529369354, max_rel=552.2486572265625, norm_rel=0.02343817427754402, ref_abs_avg=29.569833755493164, test_abs_avg=29.571697235107422
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6803508996963501, max_abs=4.2021484375, mean_rel=0.15537118911743164, max_rel=829.2599487304688, norm_rel=0.02319248393177986, ref_abs_avg=29.37718963623047, test_abs_avg=29.376441955566406
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6874294281005859, max_abs=2.625, mean_rel=0.09263977408409119, max_rel=7.338068008422852, norm_rel=0.02650301344692707, ref_abs_avg=26.28071403503418, test_abs_avg=26.317855834960938
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7772656679153442, max_abs=5.75, mean_rel=0.1754671335220337, max_rel=826.946044921875, norm_rel=0.025196127593517303, ref_abs_avg=30.878366470336914, test_abs_avg=30.88155746459961
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7573320865631104, max_abs=5.25, mean_rel=0.16451402008533478, max_rel=865.9722900390625, norm_rel=0.02516106143593788, ref_abs_avg=30.19180679321289, test_abs_avg=30.196514129638672
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.575838565826416, max_abs=2.5, mean_rel=0.0696554034948349, max_rel=2.691720485687256, norm_rel=0.02413676679134369, ref_abs_avg=23.673542022705078, test_abs_avg=23.73253631591797
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7158445715904236, max_abs=5.25, mean_rel=0.17550459504127502, max_rel=936.37548828125, norm_rel=0.024858612567186356, ref_abs_avg=28.8134765625, test_abs_avg=28.816478729248047
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6993459463119507, max_abs=5.0625, mean_rel=0.1627878099679947, max_rel=937.3577880859375, norm_rel=0.024826155975461006, ref_abs_avg=28.278791427612305, test_abs_avg=28.285871505737305
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5327905416488647, max_abs=2.3125, mean_rel=0.18085940182209015, max_rel=43.173309326171875, norm_rel=0.024465365335345268, ref_abs_avg=22.34365463256836, test_abs_avg=22.305870056152344
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6546027660369873, max_abs=4.5, mean_rel=0.1589108556509018, max_rel=967.9248046875, norm_rel=0.024324432015419006, ref_abs_avg=26.883485794067383, test_abs_avg=26.886281967163086
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6417403817176819, max_abs=4.0, mean_rel=0.1509692221879959, max_rel=763.84326171875, norm_rel=0.0243032518774271, ref_abs_avg=26.467296600341797, test_abs_avg=26.468338012695312
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.49034690856933594, max_abs=1.75, mean_rel=0.12159715592861176, max_rel=15.037680625915527, norm_rel=0.02394496463239193, ref_abs_avg=20.555030822753906, test_abs_avg=20.533525466918945
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6070914268493652, max_abs=4.17578125, mean_rel=0.16455566883087158, max_rel=1082.0068359375, norm_rel=0.02387770637869835, ref_abs_avg=25.398757934570312, test_abs_avg=25.40228843688965
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5951454043388367, max_abs=4.0078125, mean_rel=0.1536531299352646, max_rel=814.5397338867188, norm_rel=0.02373410388827324, ref_abs_avg=25.148372650146484, test_abs_avg=25.14737319946289
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4886813163757324, max_abs=1.8125, mean_rel=0.1398707926273346, max_rel=19.35970115661621, norm_rel=0.022612133994698524, ref_abs_avg=21.36952018737793, test_abs_avg=21.377120971679688
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5806448459625244, max_abs=4.25, mean_rel=0.15358608961105347, max_rel=855.5234375, norm_rel=0.02335435338318348, ref_abs_avg=24.840877532958984, test_abs_avg=24.844802856445312
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5665051937103271, max_abs=4.25, mean_rel=0.15787175297737122, max_rel=791.93505859375, norm_rel=0.02294417843222618, ref_abs_avg=24.652873992919922, test_abs_avg=24.65011978149414
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.46053266525268555, max_abs=1.90625, mean_rel=0.13637924194335938, max_rel=8.796440124511719, norm_rel=0.02568124607205391, ref_abs_avg=17.808656692504883, test_abs_avg=17.783205032348633
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5492949485778809, max_abs=4.1875, mean_rel=0.15536411106586456, max_rel=1660.6744384765625, norm_rel=0.023281343281269073, ref_abs_avg=23.607202529907227, test_abs_avg=23.607301712036133
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5397353172302246, max_abs=3.25, mean_rel=0.14749866724014282, max_rel=516.5013427734375, norm_rel=0.023033518344163895, ref_abs_avg=23.4188232421875, test_abs_avg=23.412290573120117
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4366621971130371, max_abs=1.90625, mean_rel=0.13606107234954834, max_rel=34.428340911865234, norm_rel=0.022218650206923485, ref_abs_avg=19.492490768432617, test_abs_avg=19.471778869628906
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5175907015800476, max_abs=4.453125, mean_rel=0.14504221081733704, max_rel=725.5101928710938, norm_rel=0.022405890747904778, ref_abs_avg=23.04941177368164, test_abs_avg=23.051044464111328
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5066094398498535, max_abs=4.0, mean_rel=0.14925004541873932, max_rel=796.5723876953125, norm_rel=0.022198021411895752, ref_abs_avg=22.826370239257812, test_abs_avg=22.83661651611328
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.39505958557128906, max_abs=1.75, mean_rel=0.06893983483314514, max_rel=3.2717859745025635, norm_rel=0.02100296877324581, ref_abs_avg=18.789501190185547, test_abs_avg=18.77235221862793
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.49753475189208984, max_abs=3.5625, mean_rel=0.15173396468162537, max_rel=774.9618530273438, norm_rel=0.022566452622413635, ref_abs_avg=22.035436630249023, test_abs_avg=22.035999298095703
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4905327260494232, max_abs=3.5, mean_rel=0.13734208047389984, max_rel=778.1189575195312, norm_rel=0.022395838052034378, ref_abs_avg=21.91643524169922, test_abs_avg=21.914722442626953
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4842686653137207, max_abs=2.078125, mean_rel=0.09605444967746735, max_rel=3.9692234992980957, norm_rel=0.025088757276535034, ref_abs_avg=19.602495193481445, test_abs_avg=19.574960708618164
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5767514705657959, max_abs=4.25, mean_rel=0.16052059829235077, max_rel=837.5849609375, norm_rel=0.02356010675430298, ref_abs_avg=24.467998504638672, test_abs_avg=24.469768524169922
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5686755180358887, max_abs=3.75, mean_rel=0.15655207633972168, max_rel=615.424072265625, norm_rel=0.02368374541401863, ref_abs_avg=24.05145263671875, test_abs_avg=24.04924964904785
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.42807841300964355, max_abs=1.75, mean_rel=0.07963189482688904, max_rel=3.7393977642059326, norm_rel=0.022672107443213463, ref_abs_avg=18.867801666259766, test_abs_avg=18.845932006835938
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5345446467399597, max_abs=5.380859375, mean_rel=0.14950230717658997, max_rel=1209.145751953125, norm_rel=0.022995537146925926, ref_abs_avg=23.23119354248047, test_abs_avg=23.234487533569336
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5152830481529236, max_abs=4.5625, mean_rel=0.1508941352367401, max_rel=977.0508422851562, norm_rel=0.022953473031520844, ref_abs_avg=22.473499298095703, test_abs_avg=22.473770141601562
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.402512788772583, max_abs=1.546875, mean_rel=0.07495936751365662, max_rel=2.808669090270996, norm_rel=0.02198336087167263, ref_abs_avg=18.22663116455078, test_abs_avg=18.237716674804688
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.4873928427696228, max_abs=3.703125, mean_rel=0.14352425932884216, max_rel=879.9217529296875, norm_rel=0.022396888583898544, ref_abs_avg=21.771211624145508, test_abs_avg=21.77105140686035
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4774739146232605, max_abs=3.75, mean_rel=0.13496381044387817, max_rel=468.1062316894531, norm_rel=0.022588487714529037, ref_abs_avg=21.239639282226562, test_abs_avg=21.24614715576172
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3784494400024414, max_abs=1.375, mean_rel=0.0895194560289383, max_rel=5.0707597732543945, norm_rel=0.021476393565535545, ref_abs_avg=17.652252197265625, test_abs_avg=17.67158317565918
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.44649744033813477, max_abs=4.25, mean_rel=0.13766267895698547, max_rel=878.5010986328125, norm_rel=0.021291328594088554, ref_abs_avg=21.013282775878906, test_abs_avg=21.013694763183594
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.431794136762619, max_abs=3.5, mean_rel=0.13817772269248962, max_rel=619.329833984375, norm_rel=0.02142091654241085, ref_abs_avg=20.29203987121582, test_abs_avg=20.298282623291016
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.34034061431884766, max_abs=1.78125, mean_rel=0.09006820619106293, max_rel=6.296977519989014, norm_rel=0.02058319002389908, ref_abs_avg=16.789966583251953, test_abs_avg=16.780115127563477
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.41706159710884094, max_abs=4.25, mean_rel=0.1297827810049057, max_rel=596.8394165039062, norm_rel=0.020979836583137512, ref_abs_avg=19.957019805908203, test_abs_avg=19.959365844726562
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4165239930152893, max_abs=4.0, mean_rel=0.13217437267303467, max_rel=720.2942504882812, norm_rel=0.02135845087468624, ref_abs_avg=19.689491271972656, test_abs_avg=19.694969177246094
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3450765609741211, max_abs=1.125, mean_rel=0.09137720614671707, max_rel=10.927701950073242, norm_rel=0.02119375765323639, ref_abs_avg=15.955394744873047, test_abs_avg=15.95400619506836
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.39727091789245605, max_abs=4.0, mean_rel=0.13591307401657104, max_rel=589.0117797851562, norm_rel=0.0206157173961401, ref_abs_avg=19.3695068359375, test_abs_avg=19.370758056640625
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.38503509759902954, max_abs=3.625, mean_rel=0.1321340948343277, max_rel=623.2647094726562, norm_rel=0.019973929971456528, ref_abs_avg=19.345115661621094, test_abs_avg=19.34759521484375
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.3141477704048157, max_abs=1.25, mean_rel=0.32646608352661133, max_rel=110.21232604980469, norm_rel=0.02018456533551216, ref_abs_avg=15.763510704040527, test_abs_avg=15.789545059204102
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3715730607509613, max_abs=4.5, mean_rel=0.1241852194070816, max_rel=685.2992553710938, norm_rel=0.01997782662510872, ref_abs_avg=18.774188995361328, test_abs_avg=18.774066925048828
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3630167245864868, max_abs=3.4375, mean_rel=0.13014855980873108, max_rel=705.3041381835938, norm_rel=0.01976119540631771, ref_abs_avg=18.63051414489746, test_abs_avg=18.6351261138916
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.29559779167175293, max_abs=1.25, mean_rel=0.10256273299455643, max_rel=12.09619426727295, norm_rel=0.019730616360902786, ref_abs_avg=15.347694396972656, test_abs_avg=15.320915222167969
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.34342288970947266, max_abs=3.8011474609375, mean_rel=0.12274359166622162, max_rel=1086.3341064453125, norm_rel=0.01919695921242237, ref_abs_avg=18.171680450439453, test_abs_avg=18.172224044799805
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3375081419944763, max_abs=3.5, mean_rel=0.11657179892063141, max_rel=615.06103515625, norm_rel=0.018998464569449425, ref_abs_avg=18.00951385498047, test_abs_avg=18.024181365966797
identity layers + randn queries
production_forward fwd+bwd:  66.275 ms
production_forward bwd-only: 56.392 ms
production_forward peak allocated: fwd=7.614 GiB, fwd+bwd=15.618 GiB
production_forward peak reserved:  fwd=27.371 GiB, fwd+bwd=27.371 GiB
paper_forward fwd+bwd:  221.177 ms
paper_forward bwd-only: 173.967 ms
paper_forward peak allocated: fwd=35.128 GiB, fwd+bwd=37.247 GiB
paper_forward peak reserved:  fwd=36.166 GiB, fwd+bwd=38.666 GiB
torch_compile_phases_forward fwd+bwd:  94.909 ms
torch_compile_phases_forward bwd-only: 76.507 ms
torch_compile_phases_forward peak allocated: fwd=18.203 GiB, fwd+bwd=18.831 GiB
torch_compile_phases_forward peak reserved:  fwd=27.398 GiB, fwd+bwd=27.398 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0015950091183185577, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008043903857469559, max_abs=0.40625, mean_rel=0.07047916948795319, max_rel=121.73290252685547, norm_rel=0.019338367506861687, ref_abs_avg=0.4523320198059082, test_abs_avg=0.45234376192092896
production_forward grad[1] vs paper_forward: mean_abs=6.974730491638184, max_abs=48.0, mean_rel=0.12695229053497314, max_rel=104.84223937988281, norm_rel=0.019747665151953697, ref_abs_avg=313.39501953125, test_abs_avg=313.40631103515625
production_forward grad[2] vs paper_forward: mean_abs=1.3020706176757812, max_abs=4.75, mean_rel=0.10968004912137985, max_rel=11.32137680053711, norm_rel=0.024724479764699936, ref_abs_avg=52.85480880737305, test_abs_avg=52.82827377319336
production_forward grad[3] vs paper_forward: mean_abs=1.4406064748764038, max_abs=9.25, mean_rel=0.1708848476409912, max_rel=1951.9805908203125, norm_rel=0.02269710786640644, ref_abs_avg=63.85650634765625, test_abs_avg=63.8643913269043
production_forward grad[4] vs paper_forward: mean_abs=1.4134156703948975, max_abs=9.0, mean_rel=0.16933313012123108, max_rel=1764.3653564453125, norm_rel=0.022649845108389854, ref_abs_avg=62.702152252197266, test_abs_avg=62.695438385009766
production_forward grad[5] vs paper_forward: mean_abs=1.09627366065979, max_abs=5.375, mean_rel=0.12665969133377075, max_rel=17.78216552734375, norm_rel=0.023770280182361603, ref_abs_avg=46.38446044921875, test_abs_avg=46.41971969604492
production_forward grad[6] vs paper_forward: mean_abs=1.2666044235229492, max_abs=8.0, mean_rel=0.17953138053417206, max_rel=2719.451416015625, norm_rel=0.02241692692041397, ref_abs_avg=56.81344985961914, test_abs_avg=56.82051086425781
production_forward grad[7] vs paper_forward: mean_abs=1.2321349382400513, max_abs=7.375, mean_rel=0.16376031935214996, max_rel=1451.7774658203125, norm_rel=0.022248366847634315, ref_abs_avg=55.64513397216797, test_abs_avg=55.643646240234375
production_forward grad[8] vs paper_forward: mean_abs=0.989284336566925, max_abs=4.125, mean_rel=0.7163071632385254, max_rel=283.2282409667969, norm_rel=0.022644290700554848, ref_abs_avg=43.47526550292969, test_abs_avg=43.43124771118164
production_forward grad[9] vs paper_forward: mean_abs=1.156860113143921, max_abs=7.125, mean_rel=0.15972468256950378, max_rel=1445.444580078125, norm_rel=0.02227117493748665, ref_abs_avg=52.17969512939453, test_abs_avg=52.1834602355957
production_forward grad[10] vs paper_forward: mean_abs=1.1241552829742432, max_abs=7.0, mean_rel=0.15210209786891937, max_rel=1417.7255859375, norm_rel=0.022022632881999016, ref_abs_avg=51.32337188720703, test_abs_avg=51.32931137084961
production_forward grad[11] vs paper_forward: mean_abs=0.857384443283081, max_abs=3.5, mean_rel=0.08346922695636749, max_rel=6.620571136474609, norm_rel=0.020981943234801292, ref_abs_avg=40.414947509765625, test_abs_avg=40.37037658691406
production_forward grad[12] vs paper_forward: mean_abs=1.0612739324569702, max_abs=7.25, mean_rel=0.1635822355747223, max_rel=3319.058349609375, norm_rel=0.022044643759727478, ref_abs_avg=48.40314483642578, test_abs_avg=48.40400695800781
production_forward grad[13] vs paper_forward: mean_abs=1.0292609930038452, max_abs=6.0, mean_rel=0.14116601645946503, max_rel=626.87548828125, norm_rel=0.021834857761859894, ref_abs_avg=47.38698196411133, test_abs_avg=47.39281463623047
production_forward grad[14] vs paper_forward: mean_abs=0.8472521305084229, max_abs=3.5, mean_rel=0.10075391083955765, max_rel=11.56149959564209, norm_rel=0.022406136617064476, ref_abs_avg=38.10015869140625, test_abs_avg=38.190406799316406
production_forward grad[15] vs paper_forward: mean_abs=0.9955525398254395, max_abs=6.0, mean_rel=0.14832395315170288, max_rel=1019.2686767578125, norm_rel=0.021942762657999992, ref_abs_avg=45.580665588378906, test_abs_avg=45.581016540527344
production_forward grad[16] vs paper_forward: mean_abs=0.9671547412872314, max_abs=7.0, mean_rel=0.141407310962677, max_rel=772.8536987304688, norm_rel=0.02162429876625538, ref_abs_avg=44.968727111816406, test_abs_avg=44.96944808959961
production_forward grad[17] vs paper_forward: mean_abs=0.7879390716552734, max_abs=3.125, mean_rel=0.0907692015171051, max_rel=4.159959316253662, norm_rel=0.02384135313332081, ref_abs_avg=32.910308837890625, test_abs_avg=32.96958923339844
production_forward grad[18] vs paper_forward: mean_abs=0.9357656240463257, max_abs=5.5, mean_rel=0.15540927648544312, max_rel=1351.5587158203125, norm_rel=0.021818486973643303, ref_abs_avg=43.08544158935547, test_abs_avg=43.08769226074219
production_forward grad[19] vs paper_forward: mean_abs=0.9122893214225769, max_abs=5.5, mean_rel=0.15534451603889465, max_rel=2060.6357421875, norm_rel=0.021501144394278526, ref_abs_avg=42.65062713623047, test_abs_avg=42.6535530090332
production_forward grad[20] vs paper_forward: mean_abs=0.7572293281555176, max_abs=2.75, mean_rel=0.10655753314495087, max_rel=5.162026405334473, norm_rel=0.02274986170232296, ref_abs_avg=32.65208053588867, test_abs_avg=32.715599060058594
production_forward grad[21] vs paper_forward: mean_abs=0.8915905952453613, max_abs=5.7890625, mean_rel=0.15177829563617706, max_rel=1019.2973022460938, norm_rel=0.021772759035229683, ref_abs_avg=41.14165115356445, test_abs_avg=41.14199447631836
production_forward grad[22] vs paper_forward: mean_abs=0.8649089336395264, max_abs=5.25, mean_rel=0.1539185643196106, max_rel=1641.9359130859375, norm_rel=0.02136390097439289, ref_abs_avg=40.67073059082031, test_abs_avg=40.67312240600586
production_forward grad[23] vs paper_forward: mean_abs=0.7206997871398926, max_abs=2.65625, mean_rel=0.1425744891166687, max_rel=16.979747772216797, norm_rel=0.021950585767626762, ref_abs_avg=32.599605560302734, test_abs_avg=32.58192825317383
production_forward grad[24] vs paper_forward: mean_abs=0.8455740213394165, max_abs=5.25, mean_rel=0.15695971250534058, max_rel=1791.522705078125, norm_rel=0.021589813753962517, ref_abs_avg=39.33604049682617, test_abs_avg=39.332523345947266
production_forward grad[25] vs paper_forward: mean_abs=0.8247722387313843, max_abs=5.125, mean_rel=0.14237037301063538, max_rel=622.4500122070312, norm_rel=0.021146461367607117, ref_abs_avg=39.16452407836914, test_abs_avg=39.16471862792969
production_forward grad[26] vs paper_forward: mean_abs=0.8380279541015625, max_abs=3.0, mean_rel=0.16687026619911194, max_rel=26.05136489868164, norm_rel=0.024891044944524765, ref_abs_avg=34.07139587402344, test_abs_avg=34.05481719970703
production_forward grad[27] vs paper_forward: mean_abs=0.9859452247619629, max_abs=7.0, mean_rel=0.1619449257850647, max_rel=1072.1883544921875, norm_rel=0.023548629134893417, ref_abs_avg=42.057228088378906, test_abs_avg=42.0588493347168
production_forward grad[28] vs paper_forward: mean_abs=0.9661052227020264, max_abs=6.0, mean_rel=0.1738981306552887, max_rel=2123.94921875, norm_rel=0.02333994023501873, ref_abs_avg=41.55974578857422, test_abs_avg=41.563148498535156
production_forward grad[29] vs paper_forward: mean_abs=0.7932968139648438, max_abs=3.625, mean_rel=0.08027805387973785, max_rel=2.945357322692871, norm_rel=0.025510115548968315, ref_abs_avg=30.99534797668457, test_abs_avg=31.090028762817383
production_forward grad[30] vs paper_forward: mean_abs=0.9157314300537109, max_abs=6.640625, mean_rel=0.16726115345954895, max_rel=1381.9073486328125, norm_rel=0.02370322495698929, ref_abs_avg=38.79198455810547, test_abs_avg=38.79130172729492
production_forward grad[31] vs paper_forward: mean_abs=0.898733913898468, max_abs=6.34375, mean_rel=0.16636459529399872, max_rel=1485.2657470703125, norm_rel=0.023705357685685158, ref_abs_avg=38.07465362548828, test_abs_avg=38.08544158935547
production_forward grad[32] vs paper_forward: mean_abs=0.7156021595001221, max_abs=3.0, mean_rel=0.14534306526184082, max_rel=10.677970886230469, norm_rel=0.025339119136333466, ref_abs_avg=28.11443328857422, test_abs_avg=28.136642456054688
production_forward grad[33] vs paper_forward: mean_abs=0.8499606251716614, max_abs=6.0, mean_rel=0.161357581615448, max_rel=1153.1708984375, norm_rel=0.023542800918221474, ref_abs_avg=36.191871643066406, test_abs_avg=36.193138122558594
production_forward grad[34] vs paper_forward: mean_abs=0.8381414413452148, max_abs=5.421875, mean_rel=0.16499626636505127, max_rel=783.5176391601562, norm_rel=0.023648560047149658, ref_abs_avg=35.5361442565918, test_abs_avg=35.535308837890625
production_forward grad[35] vs paper_forward: mean_abs=0.6850605010986328, max_abs=2.724609375, mean_rel=0.11636684834957123, max_rel=13.612792015075684, norm_rel=0.0240385290235281, ref_abs_avg=28.06060791015625, test_abs_avg=28.057374954223633
production_forward grad[36] vs paper_forward: mean_abs=0.8018349409103394, max_abs=5.03125, mean_rel=0.16284416615962982, max_rel=1345.8892822265625, norm_rel=0.023419342935085297, ref_abs_avg=34.30431365966797, test_abs_avg=34.30628204345703
production_forward grad[37] vs paper_forward: mean_abs=0.7869951725006104, max_abs=5.125, mean_rel=0.15969377756118774, max_rel=1076.65380859375, norm_rel=0.023199383169412613, ref_abs_avg=34.03364562988281, test_abs_avg=34.03845977783203
production_forward grad[38] vs paper_forward: mean_abs=0.6160979270935059, max_abs=2.5, mean_rel=0.06115700304508209, max_rel=1.2803963422775269, norm_rel=0.021251754835247993, ref_abs_avg=28.906829833984375, test_abs_avg=28.922292709350586
production_forward grad[39] vs paper_forward: mean_abs=0.7529424428939819, max_abs=4.78125, mean_rel=0.16533952951431274, max_rel=893.6482543945312, norm_rel=0.023128924891352654, ref_abs_avg=32.61396789550781, test_abs_avg=32.61445617675781
production_forward grad[40] vs paper_forward: mean_abs=0.7397165298461914, max_abs=4.9375, mean_rel=0.158278688788414, max_rel=944.6959838867188, norm_rel=0.023195834830403328, ref_abs_avg=32.02424621582031, test_abs_avg=32.02238082885742
production_forward grad[41] vs paper_forward: mean_abs=0.5799243450164795, max_abs=2.25, mean_rel=0.09116000682115555, max_rel=6.86686897277832, norm_rel=0.022548478096723557, ref_abs_avg=25.903766632080078, test_abs_avg=25.867416381835938
production_forward grad[42] vs paper_forward: mean_abs=0.711328387260437, max_abs=4.6875, mean_rel=0.1461368203163147, max_rel=511.7132873535156, norm_rel=0.022905763238668442, ref_abs_avg=31.09197235107422, test_abs_avg=31.094688415527344
production_forward grad[43] vs paper_forward: mean_abs=0.695934534072876, max_abs=4.4375, mean_rel=0.1629306972026825, max_rel=1696.203857421875, norm_rel=0.0226927287876606, ref_abs_avg=30.764232635498047, test_abs_avg=30.771377563476562
production_forward grad[44] vs paper_forward: mean_abs=0.5946788787841797, max_abs=2.109375, mean_rel=0.08957357704639435, max_rel=6.263225555419922, norm_rel=0.023951763287186623, ref_abs_avg=24.339197158813477, test_abs_avg=24.341463088989258
production_forward grad[45] vs paper_forward: mean_abs=0.6801884174346924, max_abs=4.5, mean_rel=0.1656390130519867, max_rel=1892.837158203125, norm_rel=0.022767100483179092, ref_abs_avg=29.953121185302734, test_abs_avg=29.95374298095703
production_forward grad[46] vs paper_forward: mean_abs=0.668542742729187, max_abs=4.5, mean_rel=0.14990942180156708, max_rel=1360.42333984375, norm_rel=0.02245813049376011, ref_abs_avg=29.855716705322266, test_abs_avg=29.851699829101562
production_forward grad[47] vs paper_forward: mean_abs=0.5390291213989258, max_abs=2.25, mean_rel=0.05180981010198593, max_rel=3.455310821533203, norm_rel=0.022425705567002296, ref_abs_avg=24.892053604125977, test_abs_avg=24.89519500732422
production_forward grad[48] vs paper_forward: mean_abs=0.6460038423538208, max_abs=4.0, mean_rel=0.14407548308372498, max_rel=1239.8702392578125, norm_rel=0.022588025778532028, ref_abs_avg=28.634845733642578, test_abs_avg=28.635833740234375
production_forward grad[49] vs paper_forward: mean_abs=0.6395388841629028, max_abs=4.0, mean_rel=0.15128740668296814, max_rel=886.8599853515625, norm_rel=0.022234773263335228, ref_abs_avg=28.829504013061523, test_abs_avg=28.828685760498047
production_forward grad[50] vs paper_forward: mean_abs=0.5786914825439453, max_abs=2.5, mean_rel=0.0874616801738739, max_rel=5.0864996910095215, norm_rel=0.02343904972076416, ref_abs_avg=24.93248176574707, test_abs_avg=24.95996856689453
production_forward grad[51] vs paper_forward: mean_abs=0.7211467623710632, max_abs=4.75, mean_rel=0.16753679513931274, max_rel=1012.29248046875, norm_rel=0.02444741316139698, ref_abs_avg=29.58096694946289, test_abs_avg=29.583024978637695
production_forward grad[52] vs paper_forward: mean_abs=0.7012895345687866, max_abs=4.6875, mean_rel=0.16653600335121155, max_rel=2140.123046875, norm_rel=0.024286217987537384, ref_abs_avg=28.96044921875, test_abs_avg=28.9606876373291
production_forward grad[53] vs paper_forward: mean_abs=0.5462751388549805, max_abs=2.5, mean_rel=0.12392076849937439, max_rel=13.949872016906738, norm_rel=0.022979727014899254, ref_abs_avg=23.545124053955078, test_abs_avg=23.52910804748535
production_forward grad[54] vs paper_forward: mean_abs=0.6542763710021973, max_abs=4.375, mean_rel=0.1692204475402832, max_rel=828.4053955078125, norm_rel=0.023856142535805702, ref_abs_avg=27.457454681396484, test_abs_avg=27.460487365722656
production_forward grad[55] vs paper_forward: mean_abs=0.6452645063400269, max_abs=4.4375, mean_rel=0.16503554582595825, max_rel=1175.59423828125, norm_rel=0.023620381951332092, ref_abs_avg=27.35483741760254, test_abs_avg=27.350669860839844
production_forward grad[56] vs paper_forward: mean_abs=0.5134178400039673, max_abs=2.0, mean_rel=0.2132766842842102, max_rel=44.35696029663086, norm_rel=0.0251037385314703, ref_abs_avg=20.393516540527344, test_abs_avg=20.43911361694336
production_forward grad[57] vs paper_forward: mean_abs=0.6144624352455139, max_abs=4.5, mean_rel=0.1509000062942505, max_rel=1385.8746337890625, norm_rel=0.023281540721654892, ref_abs_avg=26.350587844848633, test_abs_avg=26.349185943603516
production_forward grad[58] vs paper_forward: mean_abs=0.5929765105247498, max_abs=4.0, mean_rel=0.14903610944747925, max_rel=1085.52392578125, norm_rel=0.023053888231515884, ref_abs_avg=25.73780059814453, test_abs_avg=25.738489151000977
production_forward grad[59] vs paper_forward: mean_abs=0.4676990509033203, max_abs=2.125, mean_rel=0.14041930437088013, max_rel=23.1534366607666, norm_rel=0.0222884863615036, ref_abs_avg=21.08668327331543, test_abs_avg=21.10771369934082
production_forward grad[60] vs paper_forward: mean_abs=0.5651162266731262, max_abs=3.75, mean_rel=0.14942657947540283, max_rel=616.0778198242188, norm_rel=0.023031100630760193, ref_abs_avg=24.53970718383789, test_abs_avg=24.538848876953125
production_forward grad[61] vs paper_forward: mean_abs=0.560918390750885, max_abs=3.875, mean_rel=0.15607917308807373, max_rel=1481.4638671875, norm_rel=0.022830339148640633, ref_abs_avg=24.61306381225586, test_abs_avg=24.60400390625
production_forward grad[62] vs paper_forward: mean_abs=0.46897685527801514, max_abs=1.875, mean_rel=0.08420182764530182, max_rel=2.55269193649292, norm_rel=0.02408209815621376, ref_abs_avg=19.706266403198242, test_abs_avg=19.701627731323242
production_forward grad[63] vs paper_forward: mean_abs=0.5346164703369141, max_abs=4.2421875, mean_rel=0.14228150248527527, max_rel=1620.600830078125, norm_rel=0.02257712185382843, ref_abs_avg=23.634416580200195, test_abs_avg=23.634662628173828
production_forward grad[64] vs paper_forward: mean_abs=0.5301381349563599, max_abs=3.5, mean_rel=0.15081487596035004, max_rel=922.166259765625, norm_rel=0.022100718691945076, ref_abs_avg=23.931594848632812, test_abs_avg=23.928607940673828
production_forward grad[65] vs paper_forward: mean_abs=0.4125823974609375, max_abs=1.625, mean_rel=0.052923232316970825, max_rel=1.8784728050231934, norm_rel=0.021551335230469704, ref_abs_avg=20.070449829101562, test_abs_avg=20.05207061767578
production_forward grad[66] vs paper_forward: mean_abs=0.5133517384529114, max_abs=3.75, mean_rel=0.14912721514701843, max_rel=928.823486328125, norm_rel=0.022091079503297806, ref_abs_avg=23.205997467041016, test_abs_avg=23.204933166503906
production_forward grad[67] vs paper_forward: mean_abs=0.5006880164146423, max_abs=4.0, mean_rel=0.14574448764324188, max_rel=896.6741943359375, norm_rel=0.021854698657989502, ref_abs_avg=22.880666732788086, test_abs_avg=22.878631591796875
production_forward grad[68] vs paper_forward: mean_abs=0.39137935638427734, max_abs=1.75, mean_rel=0.0919872596859932, max_rel=7.276960849761963, norm_rel=0.020851772278547287, ref_abs_avg=18.800762176513672, test_abs_avg=18.805192947387695
production_forward grad[69] vs paper_forward: mean_abs=0.4885767698287964, max_abs=4.0, mean_rel=0.14240580797195435, max_rel=1025.0108642578125, norm_rel=0.0219730231910944, ref_abs_avg=22.18653106689453, test_abs_avg=22.185958862304688
production_forward grad[70] vs paper_forward: mean_abs=0.48274701833724976, max_abs=3.25, mean_rel=0.14172589778900146, max_rel=595.4290161132812, norm_rel=0.02185114286839962, ref_abs_avg=22.045766830444336, test_abs_avg=22.045425415039062
production_forward grad[71] vs paper_forward: mean_abs=0.3773496150970459, max_abs=1.75, mean_rel=0.15884242951869965, max_rel=19.58087158203125, norm_rel=0.02110215276479721, ref_abs_avg=17.951358795166016, test_abs_avg=17.970226287841797
production_forward grad[72] vs paper_forward: mean_abs=0.46806415915489197, max_abs=3.25, mean_rel=0.1436680555343628, max_rel=882.4887084960938, norm_rel=0.02181575633585453, ref_abs_avg=21.433683395385742, test_abs_avg=21.433507919311523
production_forward grad[73] vs paper_forward: mean_abs=0.46008428931236267, max_abs=3.59375, mean_rel=0.14268097281455994, max_rel=726.1318359375, norm_rel=0.021652882918715477, ref_abs_avg=21.255321502685547, test_abs_avg=21.25559425354004
production_forward grad[74] vs paper_forward: mean_abs=0.43796825408935547, max_abs=1.703125, mean_rel=0.07888312637805939, max_rel=4.003766059875488, norm_rel=0.023337190970778465, ref_abs_avg=18.549528121948242, test_abs_avg=18.542705535888672
production_forward grad[75] vs paper_forward: mean_abs=0.5256468653678894, max_abs=3.875, mean_rel=0.16529613733291626, max_rel=1000.5526733398438, norm_rel=0.02299569547176361, ref_abs_avg=22.863981246948242, test_abs_avg=22.866992950439453
production_forward grad[76] vs paper_forward: mean_abs=0.5156885385513306, max_abs=4.5, mean_rel=0.16074052453041077, max_rel=871.9226684570312, norm_rel=0.022856896743178368, ref_abs_avg=22.64212417602539, test_abs_avg=22.642114639282227
production_forward grad[77] vs paper_forward: mean_abs=0.39021921157836914, max_abs=1.75, mean_rel=0.07648972421884537, max_rel=5.161654949188232, norm_rel=0.02179165929555893, ref_abs_avg=18.228086471557617, test_abs_avg=18.241458892822266
production_forward grad[78] vs paper_forward: mean_abs=0.48908156156539917, max_abs=4.75, mean_rel=0.16089200973510742, max_rel=1310.9864501953125, norm_rel=0.02245890535414219, ref_abs_avg=21.760997772216797, test_abs_avg=21.761388778686523
production_forward grad[79] vs paper_forward: mean_abs=0.4775537848472595, max_abs=3.5, mean_rel=0.15153613686561584, max_rel=754.3764038085938, norm_rel=0.022676249966025352, ref_abs_avg=21.092327117919922, test_abs_avg=21.091472625732422
production_forward grad[80] vs paper_forward: mean_abs=0.37929296493530273, max_abs=1.5, mean_rel=0.07930654287338257, max_rel=7.440178394317627, norm_rel=0.020342709496617317, ref_abs_avg=18.435665130615234, test_abs_avg=18.46546745300293
production_forward grad[81] vs paper_forward: mean_abs=0.457123726606369, max_abs=3.75, mean_rel=0.14631657302379608, max_rel=853.962158203125, norm_rel=0.02193213626742363, ref_abs_avg=20.859342575073242, test_abs_avg=20.86185073852539
production_forward grad[82] vs paper_forward: mean_abs=0.44888269901275635, max_abs=3.5, mean_rel=0.1340169757604599, max_rel=606.870361328125, norm_rel=0.02208772674202919, ref_abs_avg=20.33094024658203, test_abs_avg=20.335615158081055
production_forward grad[83] vs paper_forward: mean_abs=0.36616384983062744, max_abs=1.5, mean_rel=0.1174488514661789, max_rel=31.319368362426758, norm_rel=0.021613532677292824, ref_abs_avg=17.17294692993164, test_abs_avg=17.185253143310547
production_forward grad[84] vs paper_forward: mean_abs=0.42764025926589966, max_abs=3.5, mean_rel=0.1399458795785904, max_rel=737.2361450195312, norm_rel=0.0213353019207716, ref_abs_avg=20.056137084960938, test_abs_avg=20.057254791259766
production_forward grad[85] vs paper_forward: mean_abs=0.41861462593078613, max_abs=3.5, mean_rel=0.14443862438201904, max_rel=1091.595458984375, norm_rel=0.02083042450249195, ref_abs_avg=20.124881744384766, test_abs_avg=20.120271682739258
production_forward grad[86] vs paper_forward: mean_abs=0.3449382781982422, max_abs=1.3125, mean_rel=0.10856223106384277, max_rel=10.632018089294434, norm_rel=0.021576257422566414, ref_abs_avg=16.195781707763672, test_abs_avg=16.16190528869629
production_forward grad[87] vs paper_forward: mean_abs=0.4088680148124695, max_abs=3.5, mean_rel=0.1333579421043396, max_rel=688.0474853515625, norm_rel=0.02105351723730564, ref_abs_avg=19.489471435546875, test_abs_avg=19.488059997558594
production_forward grad[88] vs paper_forward: mean_abs=0.39287835359573364, max_abs=4.25, mean_rel=0.13366781175136566, max_rel=536.2233276367188, norm_rel=0.020622828975319862, ref_abs_avg=19.136775970458984, test_abs_avg=19.131664276123047
production_forward grad[89] vs paper_forward: mean_abs=0.3135068416595459, max_abs=1.25, mean_rel=0.2532130181789398, max_rel=81.8522720336914, norm_rel=0.020580777898430824, ref_abs_avg=15.371847152709961, test_abs_avg=15.386255264282227
production_forward grad[90] vs paper_forward: mean_abs=0.37315088510513306, max_abs=3.25, mean_rel=0.13010632991790771, max_rel=676.072509765625, norm_rel=0.020217595621943474, ref_abs_avg=18.594972610473633, test_abs_avg=18.594863891601562
production_forward grad[91] vs paper_forward: mean_abs=0.3657234013080597, max_abs=3.25, mean_rel=0.1269454061985016, max_rel=849.8837890625, norm_rel=0.019635051488876343, ref_abs_avg=18.721431732177734, test_abs_avg=18.729190826416016
production_forward grad[92] vs paper_forward: mean_abs=0.29332637786865234, max_abs=1.25, mean_rel=0.07252855598926544, max_rel=4.502819538116455, norm_rel=0.019153254106640816, ref_abs_avg=15.415428161621094, test_abs_avg=15.39692497253418
production_forward grad[93] vs paper_forward: mean_abs=0.35160598158836365, max_abs=3.25, mean_rel=0.1261643022298813, max_rel=823.617919921875, norm_rel=0.01995747908949852, ref_abs_avg=17.78490447998047, test_abs_avg=17.783252716064453
production_forward grad[94] vs paper_forward: mean_abs=0.347895085811615, max_abs=4.0, mean_rel=0.12176235765218735, max_rel=385.2628479003906, norm_rel=0.019580837339162827, ref_abs_avg=17.97455406188965, test_abs_avg=17.965160369873047
production_forward grad[95] vs paper_forward: mean_abs=0.2802537977695465, max_abs=1.25, mean_rel=0.19536928832530975, max_rel=73.50054168701172, norm_rel=0.019167771562933922, ref_abs_avg=15.00484848022461, test_abs_avg=14.9918212890625
production_forward grad[96] vs paper_forward: mean_abs=0.3349747955799103, max_abs=3.75, mean_rel=0.1321457177400589, max_rel=1135.9215087890625, norm_rel=0.019295450299978256, ref_abs_avg=17.633211135864258, test_abs_avg=17.632633209228516
production_forward grad[97] vs paper_forward: mean_abs=0.3216038942337036, max_abs=3.5, mean_rel=0.1247197613120079, max_rel=450.7082824707031, norm_rel=0.01913806051015854, ref_abs_avg=17.11100959777832, test_abs_avg=17.101058959960938
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0015969579108059406, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008382457308471203, max_abs=0.40625, mean_rel=0.07304292917251587, max_rel=118.53523254394531, norm_rel=0.020015522837638855, ref_abs_avg=0.4523320198059082, test_abs_avg=0.45232897996902466
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.089765548706055, max_abs=50.0, mean_rel=0.12810909748077393, max_rel=99.1454849243164, norm_rel=0.02004494145512581, ref_abs_avg=313.39501953125, test_abs_avg=313.4252624511719
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3680505752563477, max_abs=5.3125, mean_rel=0.1341870129108429, max_rel=12.870923042297363, norm_rel=0.02599797584116459, ref_abs_avg=52.85480880737305, test_abs_avg=52.8731575012207
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.4915099143981934, max_abs=9.75, mean_rel=0.17386381328105927, max_rel=2241.114013671875, norm_rel=0.023486925289034843, ref_abs_avg=63.85650634765625, test_abs_avg=63.860511779785156
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.4606820344924927, max_abs=9.0, mean_rel=0.16761204600334167, max_rel=906.7855224609375, norm_rel=0.023422548547387123, ref_abs_avg=62.702152252197266, test_abs_avg=62.699119567871094
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.0662381649017334, max_abs=4.5, mean_rel=0.0963933914899826, max_rel=6.960411548614502, norm_rel=0.023213915526866913, ref_abs_avg=46.38446044921875, test_abs_avg=46.48432922363281
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.3082972764968872, max_abs=8.0, mean_rel=0.19256383180618286, max_rel=4035.345947265625, norm_rel=0.023135246708989143, ref_abs_avg=56.81344985961914, test_abs_avg=56.81822204589844
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.2747881412506104, max_abs=8.0, mean_rel=0.1760869324207306, max_rel=1404.1181640625, norm_rel=0.023018188774585724, ref_abs_avg=55.64513397216797, test_abs_avg=55.64762878417969
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0197412967681885, max_abs=4.375, mean_rel=0.5886012315750122, max_rel=227.59117126464844, norm_rel=0.02392587438225746, ref_abs_avg=43.47526550292969, test_abs_avg=43.437740325927734
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.1938600540161133, max_abs=8.25, mean_rel=0.16356174647808075, max_rel=2974.334228515625, norm_rel=0.0229753777384758, ref_abs_avg=52.17969512939453, test_abs_avg=52.18449783325195
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.1641807556152344, max_abs=7.75, mean_rel=0.16637200117111206, max_rel=1741.64306640625, norm_rel=0.02279122918844223, ref_abs_avg=51.32337188720703, test_abs_avg=51.33143615722656
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9486312866210938, max_abs=3.5, mean_rel=0.09042491763830185, max_rel=9.580621719360352, norm_rel=0.023159315809607506, ref_abs_avg=40.414947509765625, test_abs_avg=40.386680603027344
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.0933775901794434, max_abs=7.25, mean_rel=0.16459059715270996, max_rel=2349.3291015625, norm_rel=0.022699400782585144, ref_abs_avg=48.40314483642578, test_abs_avg=48.400604248046875
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.0631656646728516, max_abs=6.5, mean_rel=0.14909347891807556, max_rel=1266.038818359375, norm_rel=0.02253606542944908, ref_abs_avg=47.38698196411133, test_abs_avg=47.389892578125
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8829154968261719, max_abs=3.5, mean_rel=0.10190965235233307, max_rel=8.733908653259277, norm_rel=0.02333522029221058, ref_abs_avg=38.10015869140625, test_abs_avg=38.214176177978516
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0245296955108643, max_abs=6.25, mean_rel=0.15436041355133057, max_rel=1271.26220703125, norm_rel=0.02258930914103985, ref_abs_avg=45.580665588378906, test_abs_avg=45.57938766479492
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.9955242872238159, max_abs=6.0, mean_rel=0.14159417152404785, max_rel=718.736328125, norm_rel=0.022258054465055466, ref_abs_avg=44.968727111816406, test_abs_avg=44.969444274902344
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8390430212020874, max_abs=3.515625, mean_rel=0.09890154004096985, max_rel=4.4227800369262695, norm_rel=0.025400640442967415, ref_abs_avg=32.910308837890625, test_abs_avg=32.91327667236328
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.9616027474403381, max_abs=6.0, mean_rel=0.16314968466758728, max_rel=2256.57177734375, norm_rel=0.02240796759724617, ref_abs_avg=43.08544158935547, test_abs_avg=43.087886810302734
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9384516477584839, max_abs=6.0, mean_rel=0.16114403307437897, max_rel=2047.1683349609375, norm_rel=0.02211560681462288, ref_abs_avg=42.65062713623047, test_abs_avg=42.656436920166016
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7787108421325684, max_abs=3.1875, mean_rel=0.10400126874446869, max_rel=7.582129001617432, norm_rel=0.023494228720664978, ref_abs_avg=32.65208053588867, test_abs_avg=32.71897888183594
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9142856597900391, max_abs=5.3125, mean_rel=0.15728121995925903, max_rel=1117.86767578125, norm_rel=0.022313009947538376, ref_abs_avg=41.14165115356445, test_abs_avg=41.140586853027344
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.8860096335411072, max_abs=5.5, mean_rel=0.15713754296302795, max_rel=1450.0318603515625, norm_rel=0.021902985870838165, ref_abs_avg=40.67073059082031, test_abs_avg=40.672325134277344
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.6884274482727051, max_abs=2.75, mean_rel=0.132756307721138, max_rel=22.26027488708496, norm_rel=0.021340278908610344, ref_abs_avg=32.599605560302734, test_abs_avg=32.59658432006836
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.8661766052246094, max_abs=5.5, mean_rel=0.15939080715179443, max_rel=1534.259033203125, norm_rel=0.0220937617123127, ref_abs_avg=39.33604049682617, test_abs_avg=39.33191680908203
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8438620567321777, max_abs=5.25, mean_rel=0.1491106152534485, max_rel=833.1666870117188, norm_rel=0.021636156365275383, ref_abs_avg=39.16452407836914, test_abs_avg=39.1639404296875
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8512306809425354, max_abs=3.5, mean_rel=0.19667121767997742, max_rel=37.112586975097656, norm_rel=0.02518387883901596, ref_abs_avg=34.07139587402344, test_abs_avg=34.07243347167969
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0088821649551392, max_abs=7.0, mean_rel=0.16426825523376465, max_rel=1191.4835205078125, norm_rel=0.024098673835396767, ref_abs_avg=42.057228088378906, test_abs_avg=42.05792999267578
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.9843502640724182, max_abs=6.0, mean_rel=0.17558354139328003, max_rel=2013.1544189453125, norm_rel=0.023756470531225204, ref_abs_avg=41.55974578857422, test_abs_avg=41.560157775878906
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8207120895385742, max_abs=3.125, mean_rel=0.08826623857021332, max_rel=7.450729846954346, norm_rel=0.026384782046079636, ref_abs_avg=30.99534797668457, test_abs_avg=31.080612182617188
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9344642758369446, max_abs=7.0, mean_rel=0.1735350340604782, max_rel=1701.3919677734375, norm_rel=0.024197082966566086, ref_abs_avg=38.79198455810547, test_abs_avg=38.79253005981445
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9192107915878296, max_abs=6.5, mean_rel=0.17287778854370117, max_rel=1892.9234619140625, norm_rel=0.024242212995886803, ref_abs_avg=38.07465362548828, test_abs_avg=38.08282470703125
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7095649242401123, max_abs=2.875, mean_rel=0.19294671714305878, max_rel=36.15425491333008, norm_rel=0.024634381756186485, ref_abs_avg=28.11443328857422, test_abs_avg=28.11257553100586
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.866140604019165, max_abs=5.4375, mean_rel=0.1608048379421234, max_rel=1056.5511474609375, norm_rel=0.023995833471417427, ref_abs_avg=36.191871643066406, test_abs_avg=36.19251251220703
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8533005118370056, max_abs=5.25, mean_rel=0.16773423552513123, max_rel=1634.441162109375, norm_rel=0.024082161486148834, ref_abs_avg=35.5361442565918, test_abs_avg=35.53459930419922
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6675338745117188, max_abs=2.5, mean_rel=0.1451009064912796, max_rel=34.031982421875, norm_rel=0.02392846718430519, ref_abs_avg=28.06060791015625, test_abs_avg=28.079437255859375
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.816965639591217, max_abs=5.0, mean_rel=0.16236750781536102, max_rel=1725.6654052734375, norm_rel=0.023852087557315826, ref_abs_avg=34.30431365966797, test_abs_avg=34.304840087890625
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8023547530174255, max_abs=5.125, mean_rel=0.15882185101509094, max_rel=871.120361328125, norm_rel=0.02363990806043148, ref_abs_avg=34.03364562988281, test_abs_avg=34.03801345825195
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6388788223266602, max_abs=2.5, mean_rel=0.07456712424755096, max_rel=4.904318332672119, norm_rel=0.022184079512953758, ref_abs_avg=28.906829833984375, test_abs_avg=28.93859100341797
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.7664605379104614, max_abs=5.0, mean_rel=0.16545766592025757, max_rel=781.1359252929688, norm_rel=0.023527266457676888, ref_abs_avg=32.61396789550781, test_abs_avg=32.61302185058594
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7508246302604675, max_abs=4.5, mean_rel=0.16311156749725342, max_rel=1046.48974609375, norm_rel=0.023560425266623497, ref_abs_avg=32.02424621582031, test_abs_avg=32.02096176147461
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6115474700927734, max_abs=2.125, mean_rel=0.1038084328174591, max_rel=5.441778182983398, norm_rel=0.023578468710184097, ref_abs_avg=25.903766632080078, test_abs_avg=25.86674690246582
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7211476564407349, max_abs=4.5, mean_rel=0.1499633640050888, max_rel=577.3268432617188, norm_rel=0.023228511214256287, ref_abs_avg=31.09197235107422, test_abs_avg=31.093109130859375
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7065818309783936, max_abs=4.5, mean_rel=0.16741341352462769, max_rel=2205.114501953125, norm_rel=0.02302393689751625, ref_abs_avg=30.764232635498047, test_abs_avg=30.77056884765625
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.6241538524627686, max_abs=2.23046875, mean_rel=0.08630631864070892, max_rel=4.464450359344482, norm_rel=0.025462139397859573, ref_abs_avg=24.339197158813477, test_abs_avg=24.341251373291016
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.6899161338806152, max_abs=4.5, mean_rel=0.16216829419136047, max_rel=1707.6802978515625, norm_rel=0.023086553439497948, ref_abs_avg=29.953121185302734, test_abs_avg=29.952796936035156
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.6787098050117493, max_abs=4.5, mean_rel=0.1490754336118698, max_rel=968.93017578125, norm_rel=0.02278810366988182, ref_abs_avg=29.855716705322266, test_abs_avg=29.84925079345703
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5479412078857422, max_abs=2.375, mean_rel=0.052748437970876694, max_rel=3.810115098953247, norm_rel=0.022399097681045532, ref_abs_avg=24.892053604125977, test_abs_avg=24.905250549316406
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6530647277832031, max_abs=4.375, mean_rel=0.14721010625362396, max_rel=1163.4117431640625, norm_rel=0.022837307304143906, ref_abs_avg=28.634845733642578, test_abs_avg=28.636611938476562
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.644966721534729, max_abs=4.375, mean_rel=0.1513252556324005, max_rel=837.363525390625, norm_rel=0.022448888048529625, ref_abs_avg=28.829504013061523, test_abs_avg=28.82805061340332
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.5944509506225586, max_abs=2.25, mean_rel=0.08349432796239853, max_rel=6.624876499176025, norm_rel=0.024031026288866997, ref_abs_avg=24.93248176574707, test_abs_avg=24.951820373535156
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.731102466583252, max_abs=4.5, mean_rel=0.17187458276748657, max_rel=1005.3223266601562, norm_rel=0.02477111853659153, ref_abs_avg=29.58096694946289, test_abs_avg=29.582290649414062
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7123098373413086, max_abs=5.125, mean_rel=0.16790851950645447, max_rel=1762.4886474609375, norm_rel=0.024646950885653496, ref_abs_avg=28.96044921875, test_abs_avg=28.95964241027832
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5281496047973633, max_abs=2.25, mean_rel=0.11434212327003479, max_rel=13.793058395385742, norm_rel=0.022592049092054367, ref_abs_avg=23.545124053955078, test_abs_avg=23.518878936767578
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.6634858846664429, max_abs=4.5, mean_rel=0.17011499404907227, max_rel=1008.7173461914062, norm_rel=0.02419253997504711, ref_abs_avg=27.457454681396484, test_abs_avg=27.459932327270508
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.654936671257019, max_abs=4.25, mean_rel=0.17047427594661713, max_rel=1176.6541748046875, norm_rel=0.023953508585691452, ref_abs_avg=27.35483741760254, test_abs_avg=27.350549697875977
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5299094319343567, max_abs=2.0, mean_rel=0.30324193835258484, max_rel=86.4659194946289, norm_rel=0.025633376091718674, ref_abs_avg=20.393516540527344, test_abs_avg=20.448326110839844
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6224955320358276, max_abs=4.662109375, mean_rel=0.15512937307357788, max_rel=1438.291259765625, norm_rel=0.02356407605111599, ref_abs_avg=26.350587844848633, test_abs_avg=26.348812103271484
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6022034287452698, max_abs=4.0, mean_rel=0.1532495617866516, max_rel=949.7989501953125, norm_rel=0.02341274358332157, ref_abs_avg=25.73780059814453, test_abs_avg=25.738801956176758
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.4881401062011719, max_abs=1.9375, mean_rel=0.15157592296600342, max_rel=24.48600959777832, norm_rel=0.023063139989972115, ref_abs_avg=21.08668327331543, test_abs_avg=21.112871170043945
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.572299599647522, max_abs=3.625, mean_rel=0.1535019725561142, max_rel=897.712646484375, norm_rel=0.023303115740418434, ref_abs_avg=24.53970718383789, test_abs_avg=24.53844451904297
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5681336522102356, max_abs=3.75, mean_rel=0.15378941595554352, max_rel=1531.564453125, norm_rel=0.02311493270099163, ref_abs_avg=24.61306381225586, test_abs_avg=24.603227615356445
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.44641435146331787, max_abs=1.875, mean_rel=0.08504331111907959, max_rel=3.232499361038208, norm_rel=0.02343592420220375, ref_abs_avg=19.706266403198242, test_abs_avg=19.711132049560547
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5403339862823486, max_abs=3.6953125, mean_rel=0.14230284094810486, max_rel=867.8968505859375, norm_rel=0.022822078317403793, ref_abs_avg=23.634416580200195, test_abs_avg=23.63573455810547
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5359066724777222, max_abs=4.0, mean_rel=0.15832918882369995, max_rel=1291.1280517578125, norm_rel=0.0223411675542593, ref_abs_avg=23.931594848632812, test_abs_avg=23.928680419921875
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.404508113861084, max_abs=1.875, mean_rel=0.05122857168316841, max_rel=1.055559754371643, norm_rel=0.02105501852929592, ref_abs_avg=20.070449829101562, test_abs_avg=20.068418502807617
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5181112289428711, max_abs=3.75, mean_rel=0.1524774432182312, max_rel=1414.2691650390625, norm_rel=0.02229878306388855, ref_abs_avg=23.205997467041016, test_abs_avg=23.20479393005371
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5058791041374207, max_abs=4.0, mean_rel=0.14381438493728638, max_rel=703.7122192382812, norm_rel=0.022076625376939774, ref_abs_avg=22.880666732788086, test_abs_avg=22.8783016204834
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.3983116149902344, max_abs=1.8125, mean_rel=0.09301064908504486, max_rel=10.850886344909668, norm_rel=0.021949367597699165, ref_abs_avg=18.800762176513672, test_abs_avg=18.80323028564453
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.49176210165023804, max_abs=3.25, mean_rel=0.14402109384536743, max_rel=1217.033203125, norm_rel=0.022128066048026085, ref_abs_avg=22.18653106689453, test_abs_avg=22.18532943725586
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.4856603443622589, max_abs=3.25, mean_rel=0.14520570635795593, max_rel=534.9220581054688, norm_rel=0.02199842967092991, ref_abs_avg=22.045766830444336, test_abs_avg=22.045761108398438
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.3870362639427185, max_abs=1.625, mean_rel=0.1581641435623169, max_rel=29.156530380249023, norm_rel=0.021789396181702614, ref_abs_avg=17.951358795166016, test_abs_avg=17.97633171081543
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.4710097014904022, max_abs=3.5, mean_rel=0.14334642887115479, max_rel=983.4520263671875, norm_rel=0.02194654755294323, ref_abs_avg=21.433683395385742, test_abs_avg=21.4329891204834
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4623779356479645, max_abs=3.40625, mean_rel=0.1468198299407959, max_rel=630.4000244140625, norm_rel=0.0217579435557127, ref_abs_avg=21.255321502685547, test_abs_avg=21.254093170166016
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4444904327392578, max_abs=1.8125, mean_rel=0.08697567880153656, max_rel=5.170316219329834, norm_rel=0.023542268201708794, ref_abs_avg=18.549528121948242, test_abs_avg=18.57492446899414
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5318737030029297, max_abs=4.0, mean_rel=0.16282877326011658, max_rel=1401.97021484375, norm_rel=0.023264771327376366, ref_abs_avg=22.863981246948242, test_abs_avg=22.866167068481445
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5237638354301453, max_abs=5.5, mean_rel=0.15505990386009216, max_rel=822.964111328125, norm_rel=0.02323470078408718, ref_abs_avg=22.64212417602539, test_abs_avg=22.64341163635254
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4051628112792969, max_abs=1.625, mean_rel=0.08534271270036697, max_rel=6.585559844970703, norm_rel=0.02228037267923355, ref_abs_avg=18.228086471557617, test_abs_avg=18.229881286621094
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.49283477663993835, max_abs=4.5, mean_rel=0.1621970683336258, max_rel=1237.75048828125, norm_rel=0.022620124742388725, ref_abs_avg=21.760997772216797, test_abs_avg=21.76128387451172
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.48270028829574585, max_abs=4.0, mean_rel=0.14654052257537842, max_rel=512.7576904296875, norm_rel=0.022952768951654434, ref_abs_avg=21.092327117919922, test_abs_avg=21.091472625732422
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3903059959411621, max_abs=1.5, mean_rel=0.10084734857082367, max_rel=11.360620498657227, norm_rel=0.02108413726091385, ref_abs_avg=18.435665130615234, test_abs_avg=18.466806411743164
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.45988109707832336, max_abs=3.5, mean_rel=0.14581117033958435, max_rel=921.81201171875, norm_rel=0.022059014067053795, ref_abs_avg=20.859342575073242, test_abs_avg=20.861427307128906
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.45113304257392883, max_abs=3.59375, mean_rel=0.1365208625793457, max_rel=644.4833984375, norm_rel=0.02215285412967205, ref_abs_avg=20.33094024658203, test_abs_avg=20.33715057373047
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3504636287689209, max_abs=1.375, mean_rel=0.12212205678224564, max_rel=33.58640670776367, norm_rel=0.02096964418888092, ref_abs_avg=17.17294692993164, test_abs_avg=17.18667984008789
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.43032267689704895, max_abs=3.5, mean_rel=0.14019781351089478, max_rel=596.7909545898438, norm_rel=0.02145347185432911, ref_abs_avg=20.056137084960938, test_abs_avg=20.056711196899414
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4211446940898895, max_abs=3.28125, mean_rel=0.14725260436534882, max_rel=950.30712890625, norm_rel=0.02092909999191761, ref_abs_avg=20.124881744384766, test_abs_avg=20.121227264404297
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.35517096519470215, max_abs=1.5, mean_rel=0.11862422525882721, max_rel=9.969135284423828, norm_rel=0.022326260805130005, ref_abs_avg=16.195781707763672, test_abs_avg=16.167118072509766
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4113920331001282, max_abs=3.625, mean_rel=0.12903381884098053, max_rel=473.1209411621094, norm_rel=0.021165287122130394, ref_abs_avg=19.489471435546875, test_abs_avg=19.488418579101562
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.3977905511856079, max_abs=4.0, mean_rel=0.13585148751735687, max_rel=461.5111389160156, norm_rel=0.02086428366601467, ref_abs_avg=19.136775970458984, test_abs_avg=19.1318416595459
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.32839322090148926, max_abs=1.4375, mean_rel=0.27879175543785095, max_rel=88.33963775634766, norm_rel=0.021569952368736267, ref_abs_avg=15.371847152709961, test_abs_avg=15.378182411193848
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.3753470778465271, max_abs=3.625, mean_rel=0.1303544044494629, max_rel=624.3173217773438, norm_rel=0.020330453291535378, ref_abs_avg=18.594972610473633, test_abs_avg=18.59481430053711
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.36349761486053467, max_abs=3.25, mean_rel=0.12657199800014496, max_rel=625.2569580078125, norm_rel=0.01954701729118824, ref_abs_avg=18.721431732177734, test_abs_avg=18.728666305541992
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.30110540986061096, max_abs=1.25, mean_rel=0.0684308111667633, max_rel=5.308435916900635, norm_rel=0.020029472187161446, ref_abs_avg=15.415428161621094, test_abs_avg=15.39291000366211
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3525426387786865, max_abs=3.5, mean_rel=0.12744757533073425, max_rel=890.3743896484375, norm_rel=0.020015619695186615, ref_abs_avg=17.78490447998047, test_abs_avg=17.783538818359375
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.34498468041419983, max_abs=4.0, mean_rel=0.12272854149341583, max_rel=342.85400390625, norm_rel=0.019372284412384033, ref_abs_avg=17.97455406188965, test_abs_avg=17.967071533203125
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2736329436302185, max_abs=1.125, mean_rel=0.287314772605896, max_rel=118.27540588378906, norm_rel=0.01864452287554741, ref_abs_avg=15.00484848022461, test_abs_avg=14.986794471740723
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3349347412586212, max_abs=3.796875, mean_rel=0.12841267883777618, max_rel=728.0657348632812, norm_rel=0.01928795501589775, ref_abs_avg=17.633211135864258, test_abs_avg=17.632356643676758
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.32290905714035034, max_abs=3.90625, mean_rel=0.12443424016237259, max_rel=481.01397705078125, norm_rel=0.01924760639667511, ref_abs_avg=17.11100959777832, test_abs_avg=17.104476928710938
identity layers + randn queries
production_forward fwd+bwd:  66.270 ms
production_forward bwd-only: 56.409 ms
production_forward peak allocated: fwd=7.614 GiB, fwd+bwd=15.618 GiB
production_forward peak reserved:  fwd=27.371 GiB, fwd+bwd=27.371 GiB
paper_forward fwd+bwd:  221.174 ms
paper_forward bwd-only: 173.980 ms
paper_forward peak allocated: fwd=35.128 GiB, fwd+bwd=37.247 GiB
paper_forward peak reserved:  fwd=36.166 GiB, fwd+bwd=38.666 GiB
torch_compile_phases_forward fwd+bwd:  94.911 ms
torch_compile_phases_forward bwd-only: 76.545 ms
torch_compile_phases_forward peak allocated: fwd=18.203 GiB, fwd+bwd=18.831 GiB
torch_compile_phases_forward peak reserved:  fwd=27.398 GiB, fwd+bwd=27.398 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016720001585781574, max_abs=0.046875
production_forward grad[0] vs paper_forward: mean_abs=0.008477874100208282, max_abs=0.7109375, mean_rel=0.0723189264535904, max_rel=143.09793090820312, norm_rel=0.01975870318710804, ref_abs_avg=0.4660375714302063, test_abs_avg=0.4660454988479614
production_forward grad[1] vs paper_forward: mean_abs=7.376903533935547, max_abs=80.0, mean_rel=0.15721966326236725, max_rel=293.109130859375, norm_rel=0.02046961709856987, ref_abs_avg=322.6180725097656, test_abs_avg=322.517578125
production_forward grad[2] vs paper_forward: mean_abs=1.3228353261947632, max_abs=4.5, mean_rel=0.2813342809677124, max_rel=42.939693450927734, norm_rel=0.023760776966810226, ref_abs_avg=54.63983154296875, test_abs_avg=54.56299591064453
production_forward grad[3] vs paper_forward: mean_abs=1.5518525838851929, max_abs=10.25, mean_rel=0.16014519333839417, max_rel=2051.858642578125, norm_rel=0.02287754975259304, ref_abs_avg=68.2209701538086, test_abs_avg=68.22044372558594
production_forward grad[4] vs paper_forward: mean_abs=1.5186190605163574, max_abs=9.5, mean_rel=0.17191778123378754, max_rel=1057.9822998046875, norm_rel=0.02284475602209568, ref_abs_avg=66.91441345214844, test_abs_avg=66.91508483886719
production_forward grad[5] vs paper_forward: mean_abs=1.149400234222412, max_abs=5.0, mean_rel=0.11247576028108597, max_rel=8.785395622253418, norm_rel=0.024046393111348152, ref_abs_avg=47.94666290283203, test_abs_avg=47.868812561035156
production_forward grad[6] vs paper_forward: mean_abs=1.3537847995758057, max_abs=9.0, mean_rel=0.14245820045471191, max_rel=1062.0830078125, norm_rel=0.022654257714748383, ref_abs_avg=60.040122985839844, test_abs_avg=60.0424690246582
production_forward grad[7] vs paper_forward: mean_abs=1.324637532234192, max_abs=8.1875, mean_rel=0.16287769377231598, max_rel=1688.8153076171875, norm_rel=0.022469719871878624, ref_abs_avg=59.26263427734375, test_abs_avg=59.26838302612305
production_forward grad[8] vs paper_forward: mean_abs=1.0212247371673584, max_abs=4.0, mean_rel=0.11859779804944992, max_rel=23.319713592529297, norm_rel=0.023199403658509254, ref_abs_avg=44.08515548706055, test_abs_avg=44.114227294921875
production_forward grad[9] vs paper_forward: mean_abs=1.2173768281936646, max_abs=8.0, mean_rel=0.15605145692825317, max_rel=1256.4666748046875, norm_rel=0.02241409942507744, ref_abs_avg=54.59223937988281, test_abs_avg=54.59063720703125
production_forward grad[10] vs paper_forward: mean_abs=1.1784217357635498, max_abs=7.25, mean_rel=0.13890458643436432, max_rel=1033.0296630859375, norm_rel=0.02204061672091484, ref_abs_avg=53.75177001953125, test_abs_avg=53.75605010986328
production_forward grad[11] vs paper_forward: mean_abs=0.9449496269226074, max_abs=3.5, mean_rel=0.07203857600688934, max_rel=3.6746792793273926, norm_rel=0.02301669493317604, ref_abs_avg=40.81098175048828, test_abs_avg=40.778297424316406
production_forward grad[12] vs paper_forward: mean_abs=1.1117639541625977, max_abs=7.0, mean_rel=0.1428396850824356, max_rel=1074.34033203125, norm_rel=0.022189030423760414, ref_abs_avg=50.33495330810547, test_abs_avg=50.33366394042969
production_forward grad[13] vs paper_forward: mean_abs=1.0850458145141602, max_abs=6.625, mean_rel=0.15152913331985474, max_rel=981.3342895507812, norm_rel=0.021860091015696526, ref_abs_avg=49.92258834838867, test_abs_avg=49.91757583618164
production_forward grad[14] vs paper_forward: mean_abs=0.8949718475341797, max_abs=3.25, mean_rel=0.1158326268196106, max_rel=15.451228141784668, norm_rel=0.023211369290947914, ref_abs_avg=38.643150329589844, test_abs_avg=38.58320999145508
production_forward grad[15] vs paper_forward: mean_abs=1.039781093597412, max_abs=6.0, mean_rel=0.15320539474487305, max_rel=1873.3262939453125, norm_rel=0.022154785692691803, ref_abs_avg=47.155731201171875, test_abs_avg=47.15537643432617
production_forward grad[16] vs paper_forward: mean_abs=1.0098106861114502, max_abs=6.5, mean_rel=0.15145006775856018, max_rel=726.1478881835938, norm_rel=0.02172769047319889, ref_abs_avg=46.71297836303711, test_abs_avg=46.71520233154297
production_forward grad[17] vs paper_forward: mean_abs=0.8168935775756836, max_abs=3.5625, mean_rel=0.08200572431087494, max_rel=8.526673316955566, norm_rel=0.023271776735782623, ref_abs_avg=34.49773406982422, test_abs_avg=34.492469787597656
production_forward grad[18] vs paper_forward: mean_abs=0.9853103160858154, max_abs=6.0, mean_rel=0.173050194978714, max_rel=1980.118896484375, norm_rel=0.022007066756486893, ref_abs_avg=44.95509338378906, test_abs_avg=44.9517822265625
production_forward grad[19] vs paper_forward: mean_abs=0.9603774547576904, max_abs=6.625, mean_rel=0.14507479965686798, max_rel=1237.971435546875, norm_rel=0.021625041961669922, ref_abs_avg=44.582763671875, test_abs_avg=44.58050537109375
production_forward grad[20] vs paper_forward: mean_abs=0.7827023267745972, max_abs=3.0, mean_rel=0.132924884557724, max_rel=26.87325668334961, norm_rel=0.021500572562217712, ref_abs_avg=36.380767822265625, test_abs_avg=36.377899169921875
production_forward grad[21] vs paper_forward: mean_abs=0.9344282150268555, max_abs=5.75, mean_rel=0.14652228355407715, max_rel=734.7089233398438, norm_rel=0.021909678354859352, ref_abs_avg=42.8272705078125, test_abs_avg=42.8275032043457
production_forward grad[22] vs paper_forward: mean_abs=0.9105226993560791, max_abs=5.5, mean_rel=0.1462191343307495, max_rel=1210.799560546875, norm_rel=0.02137666568160057, ref_abs_avg=42.75831604003906, test_abs_avg=42.756134033203125
production_forward grad[23] vs paper_forward: mean_abs=0.710932731628418, max_abs=2.75, mean_rel=0.09151268750429153, max_rel=11.10071849822998, norm_rel=0.021854110062122345, ref_abs_avg=33.42189025878906, test_abs_avg=33.44903564453125
production_forward grad[24] vs paper_forward: mean_abs=0.8901308178901672, max_abs=5.25, mean_rel=0.14755505323410034, max_rel=1201.29345703125, norm_rel=0.021721288561820984, ref_abs_avg=41.1409912109375, test_abs_avg=41.14326858520508
production_forward grad[25] vs paper_forward: mean_abs=0.8675469160079956, max_abs=5.5, mean_rel=0.14701971411705017, max_rel=632.5264892578125, norm_rel=0.021446844562888145, ref_abs_avg=40.59618377685547, test_abs_avg=40.60091781616211
production_forward grad[26] vs paper_forward: mean_abs=0.856116771697998, max_abs=3.6875, mean_rel=0.11897329241037369, max_rel=9.041109085083008, norm_rel=0.02470395155251026, ref_abs_avg=34.23219680786133, test_abs_avg=34.15070724487305
production_forward grad[27] vs paper_forward: mean_abs=1.0126559734344482, max_abs=7.0, mean_rel=0.16293026506900787, max_rel=1494.5631103515625, norm_rel=0.02347465604543686, ref_abs_avg=43.33843231201172, test_abs_avg=43.336204528808594
production_forward grad[28] vs paper_forward: mean_abs=0.9924048185348511, max_abs=7.0, mean_rel=0.16453050076961517, max_rel=1356.2088623046875, norm_rel=0.023299524560570717, ref_abs_avg=42.727508544921875, test_abs_avg=42.73234558105469
production_forward grad[29] vs paper_forward: mean_abs=0.7577953338623047, max_abs=3.0, mean_rel=0.13919022679328918, max_rel=27.554004669189453, norm_rel=0.023664437234401703, ref_abs_avg=32.10810852050781, test_abs_avg=32.08725357055664
production_forward grad[30] vs paper_forward: mean_abs=0.9541342854499817, max_abs=6.25, mean_rel=0.1680767685174942, max_rel=1662.58740234375, norm_rel=0.023850582540035248, ref_abs_avg=40.173282623291016, test_abs_avg=40.17411422729492
production_forward grad[31] vs paper_forward: mean_abs=0.9314475059509277, max_abs=5.5, mean_rel=0.1867372989654541, max_rel=2980.778564453125, norm_rel=0.023571569472551346, ref_abs_avg=39.67695617675781, test_abs_avg=39.678375244140625
production_forward grad[32] vs paper_forward: mean_abs=0.7535196542739868, max_abs=3.25, mean_rel=0.22422100603580475, max_rel=57.01218032836914, norm_rel=0.02272968180477619, ref_abs_avg=32.806617736816406, test_abs_avg=32.69838333129883
production_forward grad[33] vs paper_forward: mean_abs=0.8820127844810486, max_abs=5.25, mean_rel=0.17187249660491943, max_rel=1851.2657470703125, norm_rel=0.02367515303194523, ref_abs_avg=37.33856201171875, test_abs_avg=37.338829040527344
production_forward grad[34] vs paper_forward: mean_abs=0.8694982528686523, max_abs=5.75, mean_rel=0.17761999368667603, max_rel=1542.355712890625, norm_rel=0.02369106002151966, ref_abs_avg=36.79296875, test_abs_avg=36.799190521240234
production_forward grad[35] vs paper_forward: mean_abs=0.673607587814331, max_abs=3.0, mean_rel=0.11368097364902496, max_rel=18.878833770751953, norm_rel=0.02378002554178238, ref_abs_avg=29.755447387695312, test_abs_avg=29.74593734741211
production_forward grad[36] vs paper_forward: mean_abs=0.8321484327316284, max_abs=6.0, mean_rel=0.1541873961687088, max_rel=935.5899658203125, norm_rel=0.0233671385794878, ref_abs_avg=35.66194152832031, test_abs_avg=35.66025924682617
production_forward grad[37] vs paper_forward: mean_abs=0.8159606456756592, max_abs=5.1328125, mean_rel=0.1462782472372055, max_rel=789.36767578125, norm_rel=0.02334933914244175, ref_abs_avg=35.052085876464844, test_abs_avg=35.044654846191406
production_forward grad[38] vs paper_forward: mean_abs=0.6155581474304199, max_abs=3.0, mean_rel=0.08029500395059586, max_rel=5.317865371704102, norm_rel=0.021200215443968773, ref_abs_avg=30.094499588012695, test_abs_avg=30.1225528717041
production_forward grad[39] vs paper_forward: mean_abs=0.7900021076202393, max_abs=5.3125, mean_rel=0.16840462386608124, max_rel=1545.341552734375, norm_rel=0.02325618453323841, ref_abs_avg=34.06078338623047, test_abs_avg=34.05937194824219
production_forward grad[40] vs paper_forward: mean_abs=0.7767734527587891, max_abs=5.125, mean_rel=0.14567533135414124, max_rel=600.3020629882812, norm_rel=0.023197762668132782, ref_abs_avg=33.57579803466797, test_abs_avg=33.576011657714844
production_forward grad[41] vs paper_forward: mean_abs=0.6365258693695068, max_abs=2.1875, mean_rel=0.11106348037719727, max_rel=11.28754711151123, norm_rel=0.02457047812640667, ref_abs_avg=25.591310501098633, test_abs_avg=25.551658630371094
production_forward grad[42] vs paper_forward: mean_abs=0.7445094585418701, max_abs=5.125, mean_rel=0.15789249539375305, max_rel=1096.3612060546875, norm_rel=0.023149140179157257, ref_abs_avg=32.23855972290039, test_abs_avg=32.238365173339844
production_forward grad[43] vs paper_forward: mean_abs=0.7376550436019897, max_abs=4.546875, mean_rel=0.14902596175670624, max_rel=861.2081298828125, norm_rel=0.023022573441267014, ref_abs_avg=32.17693328857422, test_abs_avg=32.172157287597656
production_forward grad[44] vs paper_forward: mean_abs=0.5984504222869873, max_abs=2.125, mean_rel=0.5674595236778259, max_rel=168.1606903076172, norm_rel=0.022679543122649193, ref_abs_avg=26.301368713378906, test_abs_avg=26.268474578857422
production_forward grad[45] vs paper_forward: mean_abs=0.7156437635421753, max_abs=5.0, mean_rel=0.1519334316253662, max_rel=791.1825561523438, norm_rel=0.022953862324357033, ref_abs_avg=31.207632064819336, test_abs_avg=31.20597267150879
production_forward grad[46] vs paper_forward: mean_abs=0.7109723687171936, max_abs=4.5, mean_rel=0.1635032445192337, max_rel=1107.83984375, norm_rel=0.022942742332816124, ref_abs_avg=31.064285278320312, test_abs_avg=31.05853271484375
production_forward grad[47] vs paper_forward: mean_abs=0.5647716522216797, max_abs=2.015625, mean_rel=0.08687444031238556, max_rel=4.7450270652771, norm_rel=0.023058664053678513, ref_abs_avg=24.344444274902344, test_abs_avg=24.375303268432617
production_forward grad[48] vs paper_forward: mean_abs=0.6878722310066223, max_abs=4.875, mean_rel=0.15148603916168213, max_rel=1235.36962890625, norm_rel=0.022647298872470856, ref_abs_avg=30.39740753173828, test_abs_avg=30.396081924438477
production_forward grad[49] vs paper_forward: mean_abs=0.6746107339859009, max_abs=4.109375, mean_rel=0.16794228553771973, max_rel=931.43408203125, norm_rel=0.022558322176337242, ref_abs_avg=29.90087890625, test_abs_avg=29.90314483642578
production_forward grad[50] vs paper_forward: mean_abs=0.6473970413208008, max_abs=2.625, mean_rel=0.09120021760463715, max_rel=5.210201740264893, norm_rel=0.02617141418159008, ref_abs_avg=24.713558197021484, test_abs_avg=24.751365661621094
production_forward grad[51] vs paper_forward: mean_abs=0.7769217491149902, max_abs=5.5, mean_rel=0.16479140520095825, max_rel=1448.5150146484375, norm_rel=0.02410777471959591, ref_abs_avg=32.28428268432617, test_abs_avg=32.284549713134766
production_forward grad[52] vs paper_forward: mean_abs=0.7673789262771606, max_abs=5.4375, mean_rel=0.16159233450889587, max_rel=960.7925415039062, norm_rel=0.024256134405732155, ref_abs_avg=31.798320770263672, test_abs_avg=31.79737091064453
production_forward grad[53] vs paper_forward: mean_abs=0.6201686859130859, max_abs=2.25, mean_rel=0.1549915373325348, max_rel=18.44685173034668, norm_rel=0.024700438603758812, ref_abs_avg=25.167144775390625, test_abs_avg=25.173044204711914
production_forward grad[54] vs paper_forward: mean_abs=0.7122741937637329, max_abs=4.6015625, mean_rel=0.16403505206108093, max_rel=1485.6611328125, norm_rel=0.023710113018751144, ref_abs_avg=30.053455352783203, test_abs_avg=30.053197860717773
production_forward grad[55] vs paper_forward: mean_abs=0.6913506984710693, max_abs=4.5, mean_rel=0.1541537642478943, max_rel=930.7457275390625, norm_rel=0.02327747270464897, ref_abs_avg=29.757034301757812, test_abs_avg=29.75825309753418
production_forward grad[56] vs paper_forward: mean_abs=0.5495123863220215, max_abs=2.0, mean_rel=0.09504890441894531, max_rel=3.39697265625, norm_rel=0.02439197152853012, ref_abs_avg=23.01959800720215, test_abs_avg=23.02235221862793
production_forward grad[57] vs paper_forward: mean_abs=0.6569998264312744, max_abs=5.25, mean_rel=0.14850501716136932, max_rel=861.3987426757812, norm_rel=0.023181991651654243, ref_abs_avg=28.34645652770996, test_abs_avg=28.347761154174805
production_forward grad[58] vs paper_forward: mean_abs=0.6451204419136047, max_abs=4.0, mean_rel=0.14180490374565125, max_rel=980.4886474609375, norm_rel=0.02302270010113716, ref_abs_avg=28.049253463745117, test_abs_avg=28.050201416015625
production_forward grad[59] vs paper_forward: mean_abs=0.49401021003723145, max_abs=1.75, mean_rel=0.13060763478279114, max_rel=23.19635581970215, norm_rel=0.02322990819811821, ref_abs_avg=21.71236801147461, test_abs_avg=21.71642303466797
production_forward grad[60] vs paper_forward: mean_abs=0.6128798127174377, max_abs=4.125, mean_rel=0.15715044736862183, max_rel=1335.4644775390625, norm_rel=0.0228646881878376, ref_abs_avg=26.810312271118164, test_abs_avg=26.808338165283203
production_forward grad[61] vs paper_forward: mean_abs=0.6102420687675476, max_abs=4.1875, mean_rel=0.15780608355998993, max_rel=1396.181396484375, norm_rel=0.023120656609535217, ref_abs_avg=26.515762329101562, test_abs_avg=26.52096939086914
production_forward grad[62] vs paper_forward: mean_abs=0.47536230087280273, max_abs=1.875, mean_rel=0.11679750680923462, max_rel=16.743831634521484, norm_rel=0.022369401529431343, ref_abs_avg=21.748477935791016, test_abs_avg=21.78339385986328
production_forward grad[63] vs paper_forward: mean_abs=0.5803079009056091, max_abs=3.9375, mean_rel=0.1545925885438919, max_rel=1206.9974365234375, norm_rel=0.02241155132651329, ref_abs_avg=25.875782012939453, test_abs_avg=25.876113891601562
production_forward grad[64] vs paper_forward: mean_abs=0.5698016881942749, max_abs=4.25, mean_rel=0.14269647002220154, max_rel=749.8041381835938, norm_rel=0.022508995607495308, ref_abs_avg=25.39825439453125, test_abs_avg=25.39232063293457
production_forward grad[65] vs paper_forward: mean_abs=0.4593973159790039, max_abs=1.625, mean_rel=0.07612276822328568, max_rel=3.8889617919921875, norm_rel=0.022016657516360283, ref_abs_avg=20.83205223083496, test_abs_avg=20.829914093017578
production_forward grad[66] vs paper_forward: mean_abs=0.5511504411697388, max_abs=3.6875, mean_rel=0.14912158250808716, max_rel=1178.32470703125, norm_rel=0.021834904327988625, ref_abs_avg=25.202817916870117, test_abs_avg=25.20083236694336
production_forward grad[67] vs paper_forward: mean_abs=0.5320560932159424, max_abs=3.71875, mean_rel=0.14045503735542297, max_rel=515.6848754882812, norm_rel=0.021817630156874657, ref_abs_avg=24.440797805786133, test_abs_avg=24.439674377441406
production_forward grad[68] vs paper_forward: mean_abs=0.4486231803894043, max_abs=2.0, mean_rel=0.0735221803188324, max_rel=2.6925766468048096, norm_rel=0.021347787231206894, ref_abs_avg=21.60535430908203, test_abs_avg=21.61075210571289
production_forward grad[69] vs paper_forward: mean_abs=0.5224257707595825, max_abs=4.0, mean_rel=0.13967272639274597, max_rel=1185.61962890625, norm_rel=0.021452728658914566, ref_abs_avg=24.323007583618164, test_abs_avg=24.321826934814453
production_forward grad[70] vs paper_forward: mean_abs=0.5131248831748962, max_abs=3.5, mean_rel=0.14000669121742249, max_rel=960.783935546875, norm_rel=0.021023867651820183, ref_abs_avg=24.352964401245117, test_abs_avg=24.351287841796875
production_forward grad[71] vs paper_forward: mean_abs=0.4378318786621094, max_abs=1.734375, mean_rel=0.0972682535648346, max_rel=5.333737850189209, norm_rel=0.021988539025187492, ref_abs_avg=19.678485870361328, test_abs_avg=19.668624877929688
production_forward grad[72] vs paper_forward: mean_abs=0.5005031824111938, max_abs=4.25, mean_rel=0.1391678750514984, max_rel=874.5810546875, norm_rel=0.021345198154449463, ref_abs_avg=23.38199806213379, test_abs_avg=23.378843307495117
production_forward grad[73] vs paper_forward: mean_abs=0.48878002166748047, max_abs=4.0, mean_rel=0.14457568526268005, max_rel=798.0374145507812, norm_rel=0.02120324969291687, ref_abs_avg=23.047794342041016, test_abs_avg=23.051158905029297
production_forward grad[74] vs paper_forward: mean_abs=0.4390341639518738, max_abs=1.875, mean_rel=0.14771397411823273, max_rel=15.073622703552246, norm_rel=0.022540029138326645, ref_abs_avg=19.342159271240234, test_abs_avg=19.344860076904297
production_forward grad[75] vs paper_forward: mean_abs=0.546994686126709, max_abs=4.25, mean_rel=0.15886330604553223, max_rel=1293.5196533203125, norm_rel=0.023168617859482765, ref_abs_avg=23.604801177978516, test_abs_avg=23.60158348083496
production_forward grad[76] vs paper_forward: mean_abs=0.5380619168281555, max_abs=4.0, mean_rel=0.15775072574615479, max_rel=756.909912109375, norm_rel=0.023408811539411545, ref_abs_avg=23.114532470703125, test_abs_avg=23.10807228088379
production_forward grad[77] vs paper_forward: mean_abs=0.40741491317749023, max_abs=1.75, mean_rel=0.06904642283916473, max_rel=2.4993479251861572, norm_rel=0.022052669897675514, ref_abs_avg=18.924415588378906, test_abs_avg=18.934450149536133
production_forward grad[78] vs paper_forward: mean_abs=0.49663084745407104, max_abs=3.83984375, mean_rel=0.14954540133476257, max_rel=965.7177734375, norm_rel=0.022521264851093292, ref_abs_avg=22.046722412109375, test_abs_avg=22.043912887573242
production_forward grad[79] vs paper_forward: mean_abs=0.4830814599990845, max_abs=3.5, mean_rel=0.1462765783071518, max_rel=604.3734130859375, norm_rel=0.021911432966589928, ref_abs_avg=21.984285354614258, test_abs_avg=21.99626922607422
production_forward grad[80] vs paper_forward: mean_abs=0.37293195724487305, max_abs=1.5, mean_rel=0.10348401963710785, max_rel=8.166253089904785, norm_rel=0.0203865859657526, ref_abs_avg=17.786697387695312, test_abs_avg=17.787654876708984
production_forward grad[81] vs paper_forward: mean_abs=0.46226513385772705, max_abs=4.0, mean_rel=0.14172179996967316, max_rel=1009.8522338867188, norm_rel=0.021783551201224327, ref_abs_avg=21.19062614440918, test_abs_avg=21.18743324279785
production_forward grad[82] vs paper_forward: mean_abs=0.452786922454834, max_abs=4.375, mean_rel=0.1402266025543213, max_rel=1392.8197021484375, norm_rel=0.02147722989320755, ref_abs_avg=21.101350784301758, test_abs_avg=21.099319458007812
production_forward grad[83] vs paper_forward: mean_abs=0.34847378730773926, max_abs=1.4375, mean_rel=0.11307613551616669, max_rel=8.628486633300781, norm_rel=0.022475607693195343, ref_abs_avg=15.622149467468262, test_abs_avg=15.649053573608398
production_forward grad[84] vs paper_forward: mean_abs=0.425670325756073, max_abs=4.0, mean_rel=0.14150887727737427, max_rel=1168.538818359375, norm_rel=0.021311337128281593, ref_abs_avg=20.00954246520996, test_abs_avg=20.007139205932617
production_forward grad[85] vs paper_forward: mean_abs=0.4178611934185028, max_abs=3.75, mean_rel=0.126310795545578, max_rel=282.3403015136719, norm_rel=0.020978551357984543, ref_abs_avg=19.984024047851562, test_abs_avg=19.977088928222656
production_forward grad[86] vs paper_forward: mean_abs=0.35207223892211914, max_abs=1.280517578125, mean_rel=0.1365993618965149, max_rel=14.446121215820312, norm_rel=0.020640261471271515, ref_abs_avg=17.488576889038086, test_abs_avg=17.482158660888672
production_forward grad[87] vs paper_forward: mean_abs=0.40814918279647827, max_abs=3.5, mean_rel=0.13400909304618835, max_rel=1423.05224609375, norm_rel=0.021021388471126556, ref_abs_avg=19.47963523864746, test_abs_avg=19.478221893310547
production_forward grad[88] vs paper_forward: mean_abs=0.3905583918094635, max_abs=3.25, mean_rel=0.13293257355690002, max_rel=566.94580078125, norm_rel=0.02024090103805065, ref_abs_avg=19.379932403564453, test_abs_avg=19.38290786743164
production_forward grad[89] vs paper_forward: mean_abs=0.341572642326355, max_abs=1.25, mean_rel=0.26258066296577454, max_rel=78.77558135986328, norm_rel=0.02220025286078453, ref_abs_avg=15.600576400756836, test_abs_avg=15.628432273864746
production_forward grad[90] vs paper_forward: mean_abs=0.3865264654159546, max_abs=4.6875, mean_rel=0.13479948043823242, max_rel=1344.98974609375, norm_rel=0.02035614848136902, ref_abs_avg=19.094287872314453, test_abs_avg=19.092639923095703
production_forward grad[91] vs paper_forward: mean_abs=0.37641286849975586, max_abs=3.5, mean_rel=0.1224517896771431, max_rel=275.8164978027344, norm_rel=0.020397737622261047, ref_abs_avg=18.600528717041016, test_abs_avg=18.608530044555664
production_forward grad[92] vs paper_forward: mean_abs=0.30860161781311035, max_abs=1.25, mean_rel=0.06646398454904556, max_rel=1.8399159908294678, norm_rel=0.01993732340633869, ref_abs_avg=15.427177429199219, test_abs_avg=15.433662414550781
production_forward grad[93] vs paper_forward: mean_abs=0.3566489815711975, max_abs=4.75, mean_rel=0.1268945038318634, max_rel=961.2263793945312, norm_rel=0.019877566024661064, ref_abs_avg=18.101932525634766, test_abs_avg=18.101192474365234
production_forward grad[94] vs paper_forward: mean_abs=0.34361791610717773, max_abs=3.25, mean_rel=0.12126530706882477, max_rel=681.8007202148438, norm_rel=0.01915421336889267, ref_abs_avg=18.064653396606445, test_abs_avg=18.06195068359375
production_forward grad[95] vs paper_forward: mean_abs=0.29396331310272217, max_abs=1.03125, mean_rel=0.2039196640253067, max_rel=48.3339729309082, norm_rel=0.019455673173069954, ref_abs_avg=15.260076522827148, test_abs_avg=15.270183563232422
production_forward grad[96] vs paper_forward: mean_abs=0.34127581119537354, max_abs=3.75, mean_rel=0.1250830739736557, max_rel=536.35986328125, norm_rel=0.019481249153614044, ref_abs_avg=17.789379119873047, test_abs_avg=17.7882022857666
production_forward grad[97] vs paper_forward: mean_abs=0.32498300075531006, max_abs=3.25, mean_rel=0.11865884065628052, max_rel=351.7719421386719, norm_rel=0.018812449648976326, ref_abs_avg=17.516357421875, test_abs_avg=17.513046264648438
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016758710844442248, max_abs=0.046875
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.00883019994944334, max_abs=0.6875, mean_rel=0.07501435279846191, max_rel=126.99114227294922, norm_rel=0.02045655995607376, ref_abs_avg=0.4660375714302063, test_abs_avg=0.4660353362560272
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.554815292358398, max_abs=64.0, mean_rel=0.18296001851558685, max_rel=566.1011352539062, norm_rel=0.020867308601737022, ref_abs_avg=322.6180725097656, test_abs_avg=322.51171875
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3590664863586426, max_abs=5.375, mean_rel=0.31427666544914246, max_rel=45.80399703979492, norm_rel=0.024412214756011963, ref_abs_avg=54.63983154296875, test_abs_avg=54.648643493652344
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6066155433654785, max_abs=10.5, mean_rel=0.17036333680152893, max_rel=2316.63427734375, norm_rel=0.0236770361661911, ref_abs_avg=68.2209701538086, test_abs_avg=68.2203598022461
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5743869543075562, max_abs=9.0, mean_rel=0.1743171513080597, max_rel=1359.3729248046875, norm_rel=0.02366592548787594, ref_abs_avg=66.91441345214844, test_abs_avg=66.91183471679688
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1827893257141113, max_abs=4.0, mean_rel=0.1450706422328949, max_rel=21.38644790649414, norm_rel=0.02469373494386673, ref_abs_avg=47.94666290283203, test_abs_avg=47.955509185791016
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4020626544952393, max_abs=8.5, mean_rel=0.14816781878471375, max_rel=1565.6085205078125, norm_rel=0.02346760220825672, ref_abs_avg=60.040122985839844, test_abs_avg=60.040855407714844
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.371098279953003, max_abs=8.0390625, mean_rel=0.17317652702331543, max_rel=1810.9459228515625, norm_rel=0.02325042150914669, ref_abs_avg=59.26263427734375, test_abs_avg=59.2628173828125
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0545578002929688, max_abs=4.0, mean_rel=0.1124473512172699, max_rel=15.666359901428223, norm_rel=0.024305107071995735, ref_abs_avg=44.08515548706055, test_abs_avg=44.04423141479492
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2576762437820435, max_abs=7.8125, mean_rel=0.15936338901519775, max_rel=1597.2099609375, norm_rel=0.023147908970713615, ref_abs_avg=54.59223937988281, test_abs_avg=54.589088439941406
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.218808889389038, max_abs=7.5, mean_rel=0.14499376714229584, max_rel=1061.7073974609375, norm_rel=0.022798636928200722, ref_abs_avg=53.75177001953125, test_abs_avg=53.75634002685547
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9947748184204102, max_abs=3.625, mean_rel=0.08413806557655334, max_rel=6.232743263244629, norm_rel=0.024439821019768715, ref_abs_avg=40.81098175048828, test_abs_avg=40.761749267578125
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1471877098083496, max_abs=7.0, mean_rel=0.14748632907867432, max_rel=1265.1593017578125, norm_rel=0.02286403439939022, ref_abs_avg=50.33495330810547, test_abs_avg=50.332767486572266
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1220335960388184, max_abs=6.875, mean_rel=0.16217085719108582, max_rel=1463.3580322265625, norm_rel=0.022588660940527916, ref_abs_avg=49.92258834838867, test_abs_avg=49.914276123046875
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.909388542175293, max_abs=3.671875, mean_rel=0.10342147201299667, max_rel=11.05366325378418, norm_rel=0.02330198884010315, ref_abs_avg=38.643150329589844, test_abs_avg=38.56754684448242
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0708832740783691, max_abs=6.5, mean_rel=0.15394799411296844, max_rel=1509.6234130859375, norm_rel=0.02280263975262642, ref_abs_avg=47.155731201171875, test_abs_avg=47.15510559082031
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0389961004257202, max_abs=7.0, mean_rel=0.15281984210014343, max_rel=691.5645751953125, norm_rel=0.02236625924706459, ref_abs_avg=46.71297836303711, test_abs_avg=46.714012145996094
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8472773432731628, max_abs=3.3125, mean_rel=0.1191178634762764, max_rel=22.755002975463867, norm_rel=0.024321915581822395, ref_abs_avg=34.49773406982422, test_abs_avg=34.47368621826172
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0102970600128174, max_abs=6.5, mean_rel=0.17288550734519958, max_rel=1857.90283203125, norm_rel=0.022586464881896973, ref_abs_avg=44.95509338378906, test_abs_avg=44.95192337036133
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.990516185760498, max_abs=6.0, mean_rel=0.1523466557264328, max_rel=1446.9117431640625, norm_rel=0.022294742986559868, ref_abs_avg=44.582763671875, test_abs_avg=44.57794952392578
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7990502119064331, max_abs=4.0, mean_rel=0.11975663900375366, max_rel=16.849136352539062, norm_rel=0.022008679807186127, ref_abs_avg=36.380767822265625, test_abs_avg=36.35710144042969
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9579369425773621, max_abs=6.0, mean_rel=0.1561327576637268, max_rel=1125.6937255859375, norm_rel=0.022472452372312546, ref_abs_avg=42.8272705078125, test_abs_avg=42.827964782714844
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.931859016418457, max_abs=5.75, mean_rel=0.1459304392337799, max_rel=1065.26806640625, norm_rel=0.02189452201128006, ref_abs_avg=42.75831604003906, test_abs_avg=42.75727844238281
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7182130813598633, max_abs=3.0625, mean_rel=0.11112411320209503, max_rel=12.347868919372559, norm_rel=0.022107204422354698, ref_abs_avg=33.42189025878906, test_abs_avg=33.43525695800781
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.912076473236084, max_abs=5.5, mean_rel=0.1450294703245163, max_rel=947.6964111328125, norm_rel=0.02223423682153225, ref_abs_avg=41.1409912109375, test_abs_avg=41.143577575683594
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8876028060913086, max_abs=5.3125, mean_rel=0.14814996719360352, max_rel=695.1279907226562, norm_rel=0.021943867206573486, ref_abs_avg=40.59618377685547, test_abs_avg=40.60077667236328
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8542232513427734, max_abs=4.59375, mean_rel=0.13554099202156067, max_rel=12.4830322265625, norm_rel=0.02476443722844124, ref_abs_avg=34.23219680786133, test_abs_avg=34.182090759277344
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0390825271606445, max_abs=6.5, mean_rel=0.1643860936164856, max_rel=1114.744873046875, norm_rel=0.024079974740743637, ref_abs_avg=43.33843231201172, test_abs_avg=43.33670425415039
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0134841203689575, max_abs=6.375, mean_rel=0.17234477400779724, max_rel=1763.9873046875, norm_rel=0.023838508874177933, ref_abs_avg=42.727508544921875, test_abs_avg=42.73418045043945
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7622628211975098, max_abs=3.375, mean_rel=0.1701686829328537, max_rel=42.03465270996094, norm_rel=0.023816801607608795, ref_abs_avg=32.10810852050781, test_abs_avg=32.053077697753906
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.977751612663269, max_abs=5.9375, mean_rel=0.17531748116016388, max_rel=1796.4639892578125, norm_rel=0.02443019114434719, ref_abs_avg=40.173282623291016, test_abs_avg=40.1736946105957
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9525834918022156, max_abs=5.75, mean_rel=0.18962517380714417, max_rel=2510.74755859375, norm_rel=0.02409784309566021, ref_abs_avg=39.67695617675781, test_abs_avg=39.67878341674805
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7772082090377808, max_abs=3.0, mean_rel=0.18034511804580688, max_rel=34.86480712890625, norm_rel=0.023131025955080986, ref_abs_avg=32.806617736816406, test_abs_avg=32.70241928100586
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9001122117042542, max_abs=5.75, mean_rel=0.1773531138896942, max_rel=2178.78173828125, norm_rel=0.02416319213807583, ref_abs_avg=37.33856201171875, test_abs_avg=37.33733367919922
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8874316215515137, max_abs=5.0, mean_rel=0.17902109026908875, max_rel=919.3872680664062, norm_rel=0.024186067283153534, ref_abs_avg=36.79296875, test_abs_avg=36.803565979003906
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6770904064178467, max_abs=3.0, mean_rel=0.14800211787223816, max_rel=33.33563232421875, norm_rel=0.023808907717466354, ref_abs_avg=29.755447387695312, test_abs_avg=29.751941680908203
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8491188287734985, max_abs=6.0, mean_rel=0.1573578268289566, max_rel=1425.6343994140625, norm_rel=0.023851260542869568, ref_abs_avg=35.66194152832031, test_abs_avg=35.65907287597656
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8311917185783386, max_abs=5.0, mean_rel=0.14886882901191711, max_rel=809.7303466796875, norm_rel=0.023786667734384537, ref_abs_avg=35.052085876464844, test_abs_avg=35.04503631591797
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6426448822021484, max_abs=3.0, mean_rel=0.08843468874692917, max_rel=3.169710874557495, norm_rel=0.021528085693717003, ref_abs_avg=30.094499588012695, test_abs_avg=30.124069213867188
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8058149814605713, max_abs=5.75, mean_rel=0.16982248425483704, max_rel=1657.4847412109375, norm_rel=0.023711103945970535, ref_abs_avg=34.06078338623047, test_abs_avg=34.059688568115234
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7891897559165955, max_abs=5.625, mean_rel=0.15178588032722473, max_rel=523.1950073242188, norm_rel=0.02355678379535675, ref_abs_avg=33.57579803466797, test_abs_avg=33.57712936401367
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6342010498046875, max_abs=2.75, mean_rel=0.10019239783287048, max_rel=15.844414710998535, norm_rel=0.02465137094259262, ref_abs_avg=25.591310501098633, test_abs_avg=25.52931785583496
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7573157548904419, max_abs=4.875, mean_rel=0.1640610545873642, max_rel=826.6533203125, norm_rel=0.02353561297059059, ref_abs_avg=32.23855972290039, test_abs_avg=32.23885726928711
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7514317631721497, max_abs=4.625, mean_rel=0.15497758984565735, max_rel=644.7297973632812, norm_rel=0.023418176919221878, ref_abs_avg=32.17693328857422, test_abs_avg=32.17396926879883
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.6308305263519287, max_abs=2.1875, mean_rel=0.6382737755775452, max_rel=134.296142578125, norm_rel=0.024092433974146843, ref_abs_avg=26.301368713378906, test_abs_avg=26.26194190979004
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7264583706855774, max_abs=4.75, mean_rel=0.15391764044761658, max_rel=1284.761962890625, norm_rel=0.02331807091832161, ref_abs_avg=31.207632064819336, test_abs_avg=31.206279754638672
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7221740484237671, max_abs=4.75, mean_rel=0.1645326018333435, max_rel=959.78466796875, norm_rel=0.023299165070056915, ref_abs_avg=31.064285278320312, test_abs_avg=31.060684204101562
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5762624740600586, max_abs=2.53125, mean_rel=0.0824764221906662, max_rel=3.548726797103882, norm_rel=0.023420073091983795, ref_abs_avg=24.344444274902344, test_abs_avg=24.37143325805664
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6981951594352722, max_abs=4.9375, mean_rel=0.15194793045520782, max_rel=1005.8478393554688, norm_rel=0.022978607565164566, ref_abs_avg=30.39740753173828, test_abs_avg=30.395923614501953
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6825810670852661, max_abs=4.25, mean_rel=0.16449500620365143, max_rel=1221.4713134765625, norm_rel=0.02284414693713188, ref_abs_avg=29.90087890625, test_abs_avg=29.905094146728516
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6586637496948242, max_abs=2.390625, mean_rel=0.10685443878173828, max_rel=7.792883396148682, norm_rel=0.026635238900780678, ref_abs_avg=24.713558197021484, test_abs_avg=24.73259735107422
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7903727889060974, max_abs=5.5, mean_rel=0.17014604806900024, max_rel=1469.3521728515625, norm_rel=0.024521054700016975, ref_abs_avg=32.28428268432617, test_abs_avg=32.284400939941406
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7774214744567871, max_abs=5.3125, mean_rel=0.15982411801815033, max_rel=989.6278686523438, norm_rel=0.024560613557696342, ref_abs_avg=31.798320770263672, test_abs_avg=31.800804138183594
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.6319952011108398, max_abs=2.21875, mean_rel=0.18813812732696533, max_rel=32.039710998535156, norm_rel=0.025107575580477715, ref_abs_avg=25.167144775390625, test_abs_avg=25.174137115478516
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7246072292327881, max_abs=4.75, mean_rel=0.1623089611530304, max_rel=1881.864501953125, norm_rel=0.024116123095154762, ref_abs_avg=30.053455352783203, test_abs_avg=30.05350112915039
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.7027580738067627, max_abs=4.5, mean_rel=0.15665483474731445, max_rel=992.3763427734375, norm_rel=0.023642029613256454, ref_abs_avg=29.757034301757812, test_abs_avg=29.757099151611328
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5654172897338867, max_abs=1.75, mean_rel=0.09568962454795837, max_rel=4.189077377319336, norm_rel=0.024562858045101166, ref_abs_avg=23.01959800720215, test_abs_avg=23.03925323486328
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.667195200920105, max_abs=5.0, mean_rel=0.15441277623176575, max_rel=1097.937744140625, norm_rel=0.02353191375732422, ref_abs_avg=28.34645652770996, test_abs_avg=28.34677505493164
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6537970900535583, max_abs=4.5, mean_rel=0.14647024869918823, max_rel=1041.5447998046875, norm_rel=0.023314082995057106, ref_abs_avg=28.049253463745117, test_abs_avg=28.045915603637695
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5074872970581055, max_abs=1.75, mean_rel=0.12459345161914825, max_rel=18.547788619995117, norm_rel=0.023455677554011345, ref_abs_avg=21.71236801147461, test_abs_avg=21.717361450195312
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6220155954360962, max_abs=4.0, mean_rel=0.1596660017967224, max_rel=1637.48583984375, norm_rel=0.023204972967505455, ref_abs_avg=26.810312271118164, test_abs_avg=26.807899475097656
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6181478500366211, max_abs=4.0, mean_rel=0.1539820432662964, max_rel=1244.0814208984375, norm_rel=0.0233954768627882, ref_abs_avg=26.515762329101562, test_abs_avg=26.51871681213379
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.48383665084838867, max_abs=2.0, mean_rel=0.12015040963888168, max_rel=17.65702247619629, norm_rel=0.022545218467712402, ref_abs_avg=21.748477935791016, test_abs_avg=21.76337432861328
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.588615894317627, max_abs=4.625, mean_rel=0.1566181778907776, max_rel=1498.971923828125, norm_rel=0.0227141622453928, ref_abs_avg=25.875782012939453, test_abs_avg=25.876312255859375
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5746367573738098, max_abs=4.5, mean_rel=0.14154508709907532, max_rel=692.501953125, norm_rel=0.022725306451320648, ref_abs_avg=25.39825439453125, test_abs_avg=25.391693115234375
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4760017395019531, max_abs=1.625, mean_rel=0.07970879226922989, max_rel=4.502292156219482, norm_rel=0.023022929206490517, ref_abs_avg=20.83205223083496, test_abs_avg=20.842031478881836
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5572540760040283, max_abs=4.0, mean_rel=0.14602209627628326, max_rel=1125.9619140625, norm_rel=0.022062640637159348, ref_abs_avg=25.202817916870117, test_abs_avg=25.20234489440918
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5377416610717773, max_abs=3.9375, mean_rel=0.1415279507637024, max_rel=646.7265014648438, norm_rel=0.022045355290174484, ref_abs_avg=24.440797805786133, test_abs_avg=24.43737030029297
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4559977054595947, max_abs=1.75, mean_rel=0.10074310004711151, max_rel=10.895923614501953, norm_rel=0.021643852815032005, ref_abs_avg=21.60535430908203, test_abs_avg=21.625370025634766
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5273407697677612, max_abs=4.234375, mean_rel=0.1404004991054535, max_rel=929.7320556640625, norm_rel=0.0216494370251894, ref_abs_avg=24.323007583618164, test_abs_avg=24.321460723876953
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5186148881912231, max_abs=3.625, mean_rel=0.14220798015594482, max_rel=818.6763916015625, norm_rel=0.021257130429148674, ref_abs_avg=24.352964401245117, test_abs_avg=24.34960174560547
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.43621015548706055, max_abs=1.75, mean_rel=0.11426451057195663, max_rel=12.010329246520996, norm_rel=0.022098053246736526, ref_abs_avg=19.678485870361328, test_abs_avg=19.659591674804688
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.504456639289856, max_abs=4.25, mean_rel=0.14041513204574585, max_rel=883.7249145507812, norm_rel=0.021518904715776443, ref_abs_avg=23.38199806213379, test_abs_avg=23.37894058227539
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.49300456047058105, max_abs=4.5, mean_rel=0.14500196278095245, max_rel=959.7581176757812, norm_rel=0.021378574892878532, ref_abs_avg=23.047794342041016, test_abs_avg=23.05228614807129
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4598655700683594, max_abs=1.75, mean_rel=0.1254730373620987, max_rel=14.438943862915039, norm_rel=0.023657480254769325, ref_abs_avg=19.342159271240234, test_abs_avg=19.339969635009766
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.552742063999176, max_abs=4.125, mean_rel=0.1533830761909485, max_rel=1153.04052734375, norm_rel=0.02339799888432026, ref_abs_avg=23.604801177978516, test_abs_avg=23.601484298706055
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5426390171051025, max_abs=3.75, mean_rel=0.16027560830116272, max_rel=785.2792358398438, norm_rel=0.02361634001135826, ref_abs_avg=23.114532470703125, test_abs_avg=23.106929779052734
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.3975895643234253, max_abs=1.625, mean_rel=0.06770896911621094, max_rel=1.9597090482711792, norm_rel=0.02125682681798935, ref_abs_avg=18.924415588378906, test_abs_avg=18.93916893005371
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5023550391197205, max_abs=4.0, mean_rel=0.14934314787387848, max_rel=1121.25, norm_rel=0.0227661095559597, ref_abs_avg=22.046722412109375, test_abs_avg=22.044139862060547
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.4872819185256958, max_abs=3.5, mean_rel=0.14475879073143005, max_rel=633.852294921875, norm_rel=0.022104596719145775, ref_abs_avg=21.984285354614258, test_abs_avg=21.995500564575195
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3777475357055664, max_abs=1.4375, mean_rel=0.10538503527641296, max_rel=6.629379749298096, norm_rel=0.020538099110126495, ref_abs_avg=17.786697387695312, test_abs_avg=17.80596923828125
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.46642133593559265, max_abs=4.5, mean_rel=0.14311742782592773, max_rel=803.9989013671875, norm_rel=0.021963447332382202, ref_abs_avg=21.19062614440918, test_abs_avg=21.187519073486328
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.45777904987335205, max_abs=4.015625, mean_rel=0.14104843139648438, max_rel=1238.0556640625, norm_rel=0.02170081064105034, ref_abs_avg=21.101350784301758, test_abs_avg=21.09960174560547
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.34753429889678955, max_abs=1.5625, mean_rel=0.12181442975997925, max_rel=10.375015258789062, norm_rel=0.022066477686166763, ref_abs_avg=15.622149467468262, test_abs_avg=15.648090362548828
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4299827218055725, max_abs=3.671875, mean_rel=0.14492550492286682, max_rel=1491.428955078125, norm_rel=0.02150583639740944, ref_abs_avg=20.00954246520996, test_abs_avg=20.007827758789062
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.42111700773239136, max_abs=3.40625, mean_rel=0.12933743000030518, max_rel=514.0313110351562, norm_rel=0.021120445802807808, ref_abs_avg=19.984024047851562, test_abs_avg=19.979860305786133
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3520193099975586, max_abs=1.30078125, mean_rel=0.12226579338312149, max_rel=12.493342399597168, norm_rel=0.02032480202615261, ref_abs_avg=17.488576889038086, test_abs_avg=17.48455810546875
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.41045504808425903, max_abs=3.75, mean_rel=0.13394588232040405, max_rel=1084.4818115234375, norm_rel=0.021113233640789986, ref_abs_avg=19.47963523864746, test_abs_avg=19.478261947631836
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.39502134919166565, max_abs=3.75, mean_rel=0.13420018553733826, max_rel=452.863525390625, norm_rel=0.020472073927521706, ref_abs_avg=19.379932403564453, test_abs_avg=19.380027770996094
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3508607745170593, max_abs=1.375, mean_rel=0.2257080078125, max_rel=61.244102478027344, norm_rel=0.02256912551820278, ref_abs_avg=15.600576400756836, test_abs_avg=15.620950698852539
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.3887392282485962, max_abs=5.1875, mean_rel=0.13421201705932617, max_rel=857.4876708984375, norm_rel=0.020470095798373222, ref_abs_avg=19.094287872314453, test_abs_avg=19.0922794342041
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.37694495916366577, max_abs=3.5, mean_rel=0.11907358467578888, max_rel=254.77308654785156, norm_rel=0.020424049347639084, ref_abs_avg=18.600528717041016, test_abs_avg=18.608173370361328
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.31101131439208984, max_abs=1.25, mean_rel=0.07604102790355682, max_rel=6.718577861785889, norm_rel=0.020117996260523796, ref_abs_avg=15.427177429199219, test_abs_avg=15.422711372375488
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3572304844856262, max_abs=4.0, mean_rel=0.12815523147583008, max_rel=1013.6430053710938, norm_rel=0.019908176735043526, ref_abs_avg=18.101932525634766, test_abs_avg=18.10116958618164
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3482063412666321, max_abs=3.5, mean_rel=0.12289281189441681, max_rel=712.4442749023438, norm_rel=0.019409876316785812, ref_abs_avg=18.064653396606445, test_abs_avg=18.059280395507812
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.30053436756134033, max_abs=1.125, mean_rel=0.22706401348114014, max_rel=62.56611251831055, norm_rel=0.01993796043097973, ref_abs_avg=15.260076522827148, test_abs_avg=15.270149230957031
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.34224653244018555, max_abs=3.5, mean_rel=0.12558358907699585, max_rel=695.6043701171875, norm_rel=0.01953098550438881, ref_abs_avg=17.789379119873047, test_abs_avg=17.788515090942383
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3304298222064972, max_abs=3.90625, mean_rel=0.11972342431545258, max_rel=289.06060791015625, norm_rel=0.019136345013976097, ref_abs_avg=17.516357421875, test_abs_avg=17.51569366455078
identity layers + randn queries
torch_compile_phases_forward fwd+bwd:  94.908 ms
torch_compile_phases_forward bwd-only: 76.554 ms
torch_compile_phases_forward peak allocated: fwd=18.203 GiB, fwd+bwd=18.831 GiB
torch_compile_phases_forward peak reserved:  fwd=27.398 GiB, fwd+bwd=27.398 GiB
paper_forward fwd+bwd:  221.199 ms
paper_forward bwd-only: 173.997 ms
paper_forward peak allocated: fwd=35.128 GiB, fwd+bwd=37.247 GiB
paper_forward peak reserved:  fwd=36.168 GiB, fwd+bwd=38.668 GiB
production_forward fwd+bwd:  66.277 ms
production_forward bwd-only: 56.457 ms
production_forward peak allocated: fwd=7.614 GiB, fwd+bwd=15.618 GiB
production_forward peak reserved:  fwd=27.373 GiB, fwd+bwd=27.373 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016745917964726686, max_abs=0.046875
production_forward grad[0] vs paper_forward: mean_abs=0.008426288142800331, max_abs=0.34375, mean_rel=0.07201476395130157, max_rel=81.25924682617188, norm_rel=0.01952899992465973, ref_abs_avg=0.46651750802993774, test_abs_avg=0.466536283493042
production_forward grad[1] vs paper_forward: mean_abs=7.356612682342529, max_abs=72.0, mean_rel=0.1314416229724884, max_rel=135.41351318359375, norm_rel=0.020792296156287193, ref_abs_avg=322.8476257324219, test_abs_avg=322.8671569824219
production_forward grad[2] vs paper_forward: mean_abs=1.2317004203796387, max_abs=5.5, mean_rel=0.24609030783176422, max_rel=53.395668029785156, norm_rel=0.02277306281030178, ref_abs_avg=52.16645431518555, test_abs_avg=52.16815185546875
production_forward grad[3] vs paper_forward: mean_abs=1.4902119636535645, max_abs=10.0, mean_rel=0.16605687141418457, max_rel=1529.0196533203125, norm_rel=0.022740794345736504, ref_abs_avg=65.89208221435547, test_abs_avg=65.89422607421875
production_forward grad[4] vs paper_forward: mean_abs=1.4558825492858887, max_abs=10.125, mean_rel=0.15742337703704834, max_rel=707.830810546875, norm_rel=0.02246030792593956, ref_abs_avg=65.16464233398438, test_abs_avg=65.16783142089844
production_forward grad[5] vs paper_forward: mean_abs=1.0019607543945312, max_abs=4.6875, mean_rel=0.09490180015563965, max_rel=11.373950004577637, norm_rel=0.02064444124698639, ref_abs_avg=49.93217086791992, test_abs_avg=49.98293685913086
production_forward grad[6] vs paper_forward: mean_abs=1.3193594217300415, max_abs=8.0, mean_rel=0.1566663235425949, max_rel=1059.442626953125, norm_rel=0.022472452372312546, ref_abs_avg=58.9739990234375, test_abs_avg=58.975860595703125
production_forward grad[7] vs paper_forward: mean_abs=1.295166015625, max_abs=7.5, mean_rel=0.15979693830013275, max_rel=908.8097534179688, norm_rel=0.022291937842965126, ref_abs_avg=58.462459564208984, test_abs_avg=58.46347427368164
production_forward grad[8] vs paper_forward: mean_abs=1.0316898822784424, max_abs=5.0, mean_rel=0.08158436417579651, max_rel=4.969345569610596, norm_rel=0.02345500886440277, ref_abs_avg=44.58271026611328, test_abs_avg=44.533912658691406
production_forward grad[9] vs paper_forward: mean_abs=1.2107577323913574, max_abs=8.5, mean_rel=0.1548716127872467, max_rel=2361.828125, norm_rel=0.022273238748311996, ref_abs_avg=54.6953010559082, test_abs_avg=54.697750091552734
production_forward grad[10] vs paper_forward: mean_abs=1.1758852005004883, max_abs=7.0, mean_rel=0.13641783595085144, max_rel=711.6552734375, norm_rel=0.021990619599819183, ref_abs_avg=53.764259338378906, test_abs_avg=53.77809143066406
production_forward grad[11] vs paper_forward: mean_abs=0.9382768869400024, max_abs=4.0, mean_rel=0.08357405662536621, max_rel=4.19843864440918, norm_rel=0.02219141460955143, ref_abs_avg=41.35790252685547, test_abs_avg=41.36271286010742
production_forward grad[12] vs paper_forward: mean_abs=1.1151647567749023, max_abs=6.5, mean_rel=0.1708488166332245, max_rel=1497.662109375, norm_rel=0.022246789187192917, ref_abs_avg=50.374488830566406, test_abs_avg=50.3779296875
production_forward grad[13] vs paper_forward: mean_abs=1.085627794265747, max_abs=6.125, mean_rel=0.14777380228042603, max_rel=816.4338989257812, norm_rel=0.02182150073349476, ref_abs_avg=50.010154724121094, test_abs_avg=50.01935958862305
production_forward grad[14] vs paper_forward: mean_abs=0.8129425048828125, max_abs=4.0, mean_rel=0.07127530872821808, max_rel=4.390313625335693, norm_rel=0.020931290462613106, ref_abs_avg=40.15221405029297, test_abs_avg=40.15985870361328
production_forward grad[15] vs paper_forward: mean_abs=1.039726734161377, max_abs=6.375, mean_rel=0.16447864472866058, max_rel=2037.681396484375, norm_rel=0.02184714376926422, ref_abs_avg=47.802940368652344, test_abs_avg=47.80260467529297
production_forward grad[16] vs paper_forward: mean_abs=1.0122654438018799, max_abs=6.75, mean_rel=0.15425479412078857, max_rel=1801.5816650390625, norm_rel=0.02165539562702179, ref_abs_avg=47.055938720703125, test_abs_avg=47.04844665527344
production_forward grad[17] vs paper_forward: mean_abs=0.7734863758087158, max_abs=3.0, mean_rel=0.19031177461147308, max_rel=39.783077239990234, norm_rel=0.02069203183054924, ref_abs_avg=38.54402160644531, test_abs_avg=38.46915817260742
production_forward grad[18] vs paper_forward: mean_abs=0.9807161092758179, max_abs=6.125, mean_rel=0.15634748339653015, max_rel=1533.2381591796875, norm_rel=0.02188156545162201, ref_abs_avg=45.05385971069336, test_abs_avg=45.0577392578125
production_forward grad[19] vs paper_forward: mean_abs=0.9552026391029358, max_abs=6.0, mean_rel=0.15601539611816406, max_rel=2260.592529296875, norm_rel=0.021528204903006554, ref_abs_avg=44.62399673461914, test_abs_avg=44.629486083984375
production_forward grad[20] vs paper_forward: mean_abs=0.7289501428604126, max_abs=3.0625, mean_rel=0.12273401021957397, max_rel=15.360567092895508, norm_rel=0.02093399502336979, ref_abs_avg=34.58073043823242, test_abs_avg=34.624603271484375
production_forward grad[21] vs paper_forward: mean_abs=0.9231290221214294, max_abs=6.0, mean_rel=0.13967567682266235, max_rel=884.0183715820312, norm_rel=0.02172691747546196, ref_abs_avg=42.680755615234375, test_abs_avg=42.68427276611328
production_forward grad[22] vs paper_forward: mean_abs=0.905275821685791, max_abs=5.5, mean_rel=0.15601758658885956, max_rel=2152.798095703125, norm_rel=0.021535130217671394, ref_abs_avg=42.2696533203125, test_abs_avg=42.272552490234375
production_forward grad[23] vs paper_forward: mean_abs=0.7630887031555176, max_abs=3.25, mean_rel=0.1253339648246765, max_rel=11.204935073852539, norm_rel=0.022589348256587982, ref_abs_avg=34.41899490356445, test_abs_avg=34.372467041015625
production_forward grad[24] vs paper_forward: mean_abs=0.887994110584259, max_abs=6.0, mean_rel=0.16236604750156403, max_rel=1584.6729736328125, norm_rel=0.02173931524157524, ref_abs_avg=41.04682922363281, test_abs_avg=41.04987335205078
production_forward grad[25] vs paper_forward: mean_abs=0.859093427658081, max_abs=5.0, mean_rel=0.1639350950717926, max_rel=1764.5167236328125, norm_rel=0.02123817801475525, ref_abs_avg=40.69914245605469, test_abs_avg=40.70713806152344
production_forward grad[26] vs paper_forward: mean_abs=0.8378589153289795, max_abs=4.125, mean_rel=0.320159912109375, max_rel=54.64369201660156, norm_rel=0.023769883438944817, ref_abs_avg=35.25503921508789, test_abs_avg=35.23149871826172
production_forward grad[27] vs paper_forward: mean_abs=1.0186790227890015, max_abs=6.328125, mean_rel=0.15660959482192993, max_rel=1042.8310546875, norm_rel=0.02343626506626606, ref_abs_avg=43.63968276977539, test_abs_avg=43.643287658691406
production_forward grad[28] vs paper_forward: mean_abs=0.9943501949310303, max_abs=6.5, mean_rel=0.166854590177536, max_rel=948.3734741210938, norm_rel=0.02298697829246521, ref_abs_avg=43.41062927246094, test_abs_avg=43.41845703125
production_forward grad[29] vs paper_forward: mean_abs=0.7661604881286621, max_abs=3.25, mean_rel=0.11062121391296387, max_rel=6.979058742523193, norm_rel=0.023043395951390266, ref_abs_avg=33.04347610473633, test_abs_avg=33.031368255615234
production_forward grad[30] vs paper_forward: mean_abs=0.9355359077453613, max_abs=6.0, mean_rel=0.1719340682029724, max_rel=1533.757568359375, norm_rel=0.023585747927427292, ref_abs_avg=39.786014556884766, test_abs_avg=39.790653228759766
production_forward grad[31] vs paper_forward: mean_abs=0.9258636236190796, max_abs=5.5, mean_rel=0.16675205528736115, max_rel=1670.1173095703125, norm_rel=0.023709414526820183, ref_abs_avg=39.27376937866211, test_abs_avg=39.27301788330078
production_forward grad[32] vs paper_forward: mean_abs=0.7094627022743225, max_abs=3.4765625, mean_rel=0.35933157801628113, max_rel=138.84605407714844, norm_rel=0.023666536435484886, ref_abs_avg=30.22931480407715, test_abs_avg=30.29995346069336
production_forward grad[33] vs paper_forward: mean_abs=0.8729274868965149, max_abs=5.5, mean_rel=0.16367526352405548, max_rel=1297.687744140625, norm_rel=0.023641133680939674, ref_abs_avg=37.0399284362793, test_abs_avg=37.041839599609375
production_forward grad[34] vs paper_forward: mean_abs=0.8616928458213806, max_abs=6.375, mean_rel=0.17119000852108002, max_rel=1805.40966796875, norm_rel=0.023551424965262413, ref_abs_avg=36.79002380371094, test_abs_avg=36.79553985595703
production_forward grad[35] vs paper_forward: mean_abs=0.6674442291259766, max_abs=3.0, mean_rel=0.07614672183990479, max_rel=2.651134490966797, norm_rel=0.023581182584166527, ref_abs_avg=27.812904357910156, test_abs_avg=27.83342170715332
production_forward grad[36] vs paper_forward: mean_abs=0.8207122683525085, max_abs=5.25, mean_rel=0.1537366807460785, max_rel=1112.49853515625, norm_rel=0.02339106611907482, ref_abs_avg=35.188636779785156, test_abs_avg=35.19355773925781
production_forward grad[37] vs paper_forward: mean_abs=0.8085281848907471, max_abs=5.75, mean_rel=0.16148623824119568, max_rel=1639.078125, norm_rel=0.023354237899184227, ref_abs_avg=34.730743408203125, test_abs_avg=34.736297607421875
production_forward grad[38] vs paper_forward: mean_abs=0.6988983154296875, max_abs=2.25, mean_rel=0.06855233013629913, max_rel=2.2455499172210693, norm_rel=0.02385886386036873, ref_abs_avg=29.01205062866211, test_abs_avg=28.996715545654297
production_forward grad[39] vs paper_forward: mean_abs=0.7829197645187378, max_abs=4.875, mean_rel=0.15177875757217407, max_rel=1173.724609375, norm_rel=0.02317367121577263, ref_abs_avg=33.9554443359375, test_abs_avg=33.955787658691406
production_forward grad[40] vs paper_forward: mean_abs=0.7698202729225159, max_abs=4.5, mean_rel=0.17292520403862, max_rel=2147.77880859375, norm_rel=0.022953351959586143, ref_abs_avg=33.63521194458008, test_abs_avg=33.632781982421875
production_forward grad[41] vs paper_forward: mean_abs=0.6163043975830078, max_abs=2.75, mean_rel=0.10686349868774414, max_rel=6.4511518478393555, norm_rel=0.022721096873283386, ref_abs_avg=27.6386661529541, test_abs_avg=27.63821029663086
production_forward grad[42] vs paper_forward: mean_abs=0.7511023283004761, max_abs=5.0, mean_rel=0.15317308902740479, max_rel=1031.3687744140625, norm_rel=0.022896818816661835, ref_abs_avg=32.888362884521484, test_abs_avg=32.889892578125
production_forward grad[43] vs paper_forward: mean_abs=0.7336824536323547, max_abs=4.625, mean_rel=0.1538611650466919, max_rel=867.6010131835938, norm_rel=0.02291981503367424, ref_abs_avg=32.07781982421875, test_abs_avg=32.07926940917969
production_forward grad[44] vs paper_forward: mean_abs=0.5945565700531006, max_abs=2.125, mean_rel=0.2309250831604004, max_rel=71.05748748779297, norm_rel=0.02132214605808258, ref_abs_avg=27.48475456237793, test_abs_avg=27.494396209716797
production_forward grad[45] vs paper_forward: mean_abs=0.7104511260986328, max_abs=5.0, mean_rel=0.16047386825084686, max_rel=1095.7548828125, norm_rel=0.022764692083001137, ref_abs_avg=31.292888641357422, test_abs_avg=31.294784545898438
production_forward grad[46] vs paper_forward: mean_abs=0.6972839832305908, max_abs=4.0, mean_rel=0.15337595343589783, max_rel=770.6439819335938, norm_rel=0.02262253127992153, ref_abs_avg=30.923824310302734, test_abs_avg=30.93183708190918
production_forward grad[47] vs paper_forward: mean_abs=0.5847258567810059, max_abs=2.25, mean_rel=0.13342899084091187, max_rel=13.698500633239746, norm_rel=0.02463389001786709, ref_abs_avg=24.089447021484375, test_abs_avg=24.059728622436523
production_forward grad[48] vs paper_forward: mean_abs=0.6785945892333984, max_abs=4.25, mean_rel=0.16116754710674286, max_rel=2035.7750244140625, norm_rel=0.02252906747162342, ref_abs_avg=30.15206527709961, test_abs_avg=30.154123306274414
production_forward grad[49] vs paper_forward: mean_abs=0.6640230417251587, max_abs=4.75, mean_rel=0.15917907655239105, max_rel=989.376953125, norm_rel=0.022652069106698036, ref_abs_avg=29.41305160522461, test_abs_avg=29.423778533935547
production_forward grad[50] vs paper_forward: mean_abs=0.6009855270385742, max_abs=2.75, mean_rel=0.30590808391571045, max_rel=84.74063873291016, norm_rel=0.023372234776616096, ref_abs_avg=26.039302825927734, test_abs_avg=26.05905532836914
production_forward grad[51] vs paper_forward: mean_abs=0.7482664585113525, max_abs=5.0, mean_rel=0.1620100438594818, max_rel=1527.5531005859375, norm_rel=0.02379510924220085, ref_abs_avg=31.54035186767578, test_abs_avg=31.54012680053711
production_forward grad[52] vs paper_forward: mean_abs=0.7442808151245117, max_abs=5.0, mean_rel=0.15877792239189148, max_rel=734.7720947265625, norm_rel=0.0237413477152586, ref_abs_avg=31.46586799621582, test_abs_avg=31.46891212463379
production_forward grad[53] vs paper_forward: mean_abs=0.5714994668960571, max_abs=2.25, mean_rel=0.06634700298309326, max_rel=2.058117389678955, norm_rel=0.023802941665053368, ref_abs_avg=23.830116271972656, test_abs_avg=23.843799591064453
production_forward grad[54] vs paper_forward: mean_abs=0.69866943359375, max_abs=5.0, mean_rel=0.15071186423301697, max_rel=727.7001342773438, norm_rel=0.0232868492603302, ref_abs_avg=29.999217987060547, test_abs_avg=30.000696182250977
production_forward grad[55] vs paper_forward: mean_abs=0.6834267377853394, max_abs=5.0, mean_rel=0.15650349855422974, max_rel=1496.2655029296875, norm_rel=0.023304792121052742, ref_abs_avg=29.40024185180664, test_abs_avg=29.396259307861328
production_forward grad[56] vs paper_forward: mean_abs=0.5277454853057861, max_abs=2.34375, mean_rel=0.16013619303703308, max_rel=21.06977653503418, norm_rel=0.022857816889882088, ref_abs_avg=23.68341064453125, test_abs_avg=23.69965362548828
production_forward grad[57] vs paper_forward: mean_abs=0.6459816098213196, max_abs=5.25, mean_rel=0.14975322782993317, max_rel=594.373779296875, norm_rel=0.02293485216796398, ref_abs_avg=28.114574432373047, test_abs_avg=28.117145538330078
production_forward grad[58] vs paper_forward: mean_abs=0.6322903633117676, max_abs=4.25, mean_rel=0.14899979531764984, max_rel=528.4931030273438, norm_rel=0.02295752801001072, ref_abs_avg=27.606937408447266, test_abs_avg=27.608808517456055
production_forward grad[59] vs paper_forward: mean_abs=0.47565770149230957, max_abs=1.75, mean_rel=0.1842278242111206, max_rel=36.64128875732422, norm_rel=0.02057911641895771, ref_abs_avg=23.08111572265625, test_abs_avg=23.094038009643555
production_forward grad[60] vs paper_forward: mean_abs=0.60732102394104, max_abs=4.5, mean_rel=0.1461600810289383, max_rel=772.4725341796875, norm_rel=0.022588498890399933, ref_abs_avg=26.876949310302734, test_abs_avg=26.877120971679688
production_forward grad[61] vs paper_forward: mean_abs=0.5950050950050354, max_abs=3.75, mean_rel=0.16316920518875122, max_rel=1819.9244384765625, norm_rel=0.022128475829958916, ref_abs_avg=26.91661834716797, test_abs_avg=26.912132263183594
production_forward grad[62] vs paper_forward: mean_abs=0.4952225685119629, max_abs=1.75, mean_rel=0.12030421197414398, max_rel=20.569622039794922, norm_rel=0.023924166336655617, ref_abs_avg=21.13725471496582, test_abs_avg=21.119731903076172
production_forward grad[63] vs paper_forward: mean_abs=0.576920747756958, max_abs=4.28125, mean_rel=0.1405215859413147, max_rel=1269.86865234375, norm_rel=0.022399388253688812, ref_abs_avg=25.728130340576172, test_abs_avg=25.728099822998047
production_forward grad[64] vs paper_forward: mean_abs=0.5624563694000244, max_abs=3.5, mean_rel=0.15233802795410156, max_rel=928.9846801757812, norm_rel=0.022127287462353706, ref_abs_avg=25.387470245361328, test_abs_avg=25.38802146911621
production_forward grad[65] vs paper_forward: mean_abs=0.45304054021835327, max_abs=1.90625, mean_rel=0.2935735285282135, max_rel=86.11231231689453, norm_rel=0.022731788456439972, ref_abs_avg=20.866485595703125, test_abs_avg=20.94306182861328
production_forward grad[66] vs paper_forward: mean_abs=0.5459673404693604, max_abs=3.75, mean_rel=0.14732708036899567, max_rel=1007.887939453125, norm_rel=0.021885540336370468, ref_abs_avg=24.93425750732422, test_abs_avg=24.933868408203125
production_forward grad[67] vs paper_forward: mean_abs=0.5319873094558716, max_abs=4.0, mean_rel=0.15008732676506042, max_rel=702.2708129882812, norm_rel=0.021855995059013367, ref_abs_avg=24.36823844909668, test_abs_avg=24.367023468017578
production_forward grad[68] vs paper_forward: mean_abs=0.4083538055419922, max_abs=1.875, mean_rel=0.10909803211688995, max_rel=26.013416290283203, norm_rel=0.021146124228835106, ref_abs_avg=20.096372604370117, test_abs_avg=20.112394332885742
production_forward grad[69] vs paper_forward: mean_abs=0.5131106376647949, max_abs=4.25, mean_rel=0.13226495683193207, max_rel=680.4134521484375, norm_rel=0.021494556218385696, ref_abs_avg=23.855457305908203, test_abs_avg=23.85646629333496
production_forward grad[70] vs paper_forward: mean_abs=0.5094308853149414, max_abs=4.5, mean_rel=0.1352769285440445, max_rel=505.95654296875, norm_rel=0.021138612180948257, ref_abs_avg=24.050918579101562, test_abs_avg=24.061298370361328
production_forward grad[71] vs paper_forward: mean_abs=0.4283332824707031, max_abs=1.5, mean_rel=0.0649823546409607, max_rel=2.1130709648132324, norm_rel=0.021354082971811295, ref_abs_avg=19.712574005126953, test_abs_avg=19.753952026367188
production_forward grad[72] vs paper_forward: mean_abs=0.495402455329895, max_abs=3.8125, mean_rel=0.13940492272377014, max_rel=861.4495239257812, norm_rel=0.020983915776014328, ref_abs_avg=23.603675842285156, test_abs_avg=23.605018615722656
production_forward grad[73] vs paper_forward: mean_abs=0.4787534177303314, max_abs=4.25, mean_rel=0.1417074203491211, max_rel=902.5529174804688, norm_rel=0.02101687714457512, ref_abs_avg=22.84333038330078, test_abs_avg=22.84943962097168
production_forward grad[74] vs paper_forward: mean_abs=0.457466721534729, max_abs=1.84375, mean_rel=0.12308449298143387, max_rel=7.9842329025268555, norm_rel=0.02182035706937313, ref_abs_avg=20.33637237548828, test_abs_avg=20.342700958251953
production_forward grad[75] vs paper_forward: mean_abs=0.5488137006759644, max_abs=4.125, mean_rel=0.15582533180713654, max_rel=1632.3697509765625, norm_rel=0.022687245160341263, ref_abs_avg=24.189090728759766, test_abs_avg=24.18948745727539
production_forward grad[76] vs paper_forward: mean_abs=0.5400019884109497, max_abs=4.25, mean_rel=0.1376447081565857, max_rel=541.89453125, norm_rel=0.02259908989071846, ref_abs_avg=23.925804138183594, test_abs_avg=23.933374404907227
production_forward grad[77] vs paper_forward: mean_abs=0.44320034980773926, max_abs=2.25, mean_rel=0.06540653854608536, max_rel=2.6532294750213623, norm_rel=0.023005900904536247, ref_abs_avg=19.04680633544922, test_abs_avg=19.062519073486328
production_forward grad[78] vs paper_forward: mean_abs=0.5051989555358887, max_abs=4.5, mean_rel=0.14325714111328125, max_rel=1005.7230834960938, norm_rel=0.022303299978375435, ref_abs_avg=22.66328239440918, test_abs_avg=22.663867950439453
production_forward grad[79] vs paper_forward: mean_abs=0.48750072717666626, max_abs=4.0, mean_rel=0.14723879098892212, max_rel=730.1387939453125, norm_rel=0.022138874977827072, ref_abs_avg=22.04354476928711, test_abs_avg=22.046199798583984
production_forward grad[80] vs paper_forward: mean_abs=0.4233725070953369, max_abs=1.75, mean_rel=0.2895393371582031, max_rel=91.23295593261719, norm_rel=0.024326667189598083, ref_abs_avg=17.375699996948242, test_abs_avg=17.365489959716797
production_forward grad[81] vs paper_forward: mean_abs=0.4655854403972626, max_abs=4.5, mean_rel=0.13545265793800354, max_rel=600.6502075195312, norm_rel=0.02159196324646473, ref_abs_avg=21.569000244140625, test_abs_avg=21.56814193725586
production_forward grad[82] vs paper_forward: mean_abs=0.4565446376800537, max_abs=4.0, mean_rel=0.13362063467502594, max_rel=501.28192138671875, norm_rel=0.021762464195489883, ref_abs_avg=21.109893798828125, test_abs_avg=21.110458374023438
production_forward grad[83] vs paper_forward: mean_abs=0.37268805503845215, max_abs=1.625, mean_rel=0.06513912975788116, max_rel=3.3103151321411133, norm_rel=0.021855561062693596, ref_abs_avg=17.163105010986328, test_abs_avg=17.145309448242188
production_forward grad[84] vs paper_forward: mean_abs=0.4381354749202728, max_abs=3.625, mean_rel=0.13949260115623474, max_rel=1243.6510009765625, norm_rel=0.020915746688842773, ref_abs_avg=20.95899200439453, test_abs_avg=20.959041595458984
production_forward grad[85] vs paper_forward: mean_abs=0.42606061697006226, max_abs=4.0, mean_rel=0.14523820579051971, max_rel=1239.5767822265625, norm_rel=0.02075940929353237, ref_abs_avg=20.612031936645508, test_abs_avg=20.616046905517578
production_forward grad[86] vs paper_forward: mean_abs=0.37322938442230225, max_abs=1.84375, mean_rel=0.24428525567054749, max_rel=57.14579391479492, norm_rel=0.02248804271221161, ref_abs_avg=16.377037048339844, test_abs_avg=16.407629013061523
production_forward grad[87] vs paper_forward: mean_abs=0.42418989539146423, max_abs=4.0, mean_rel=0.13419920206069946, max_rel=924.8565673828125, norm_rel=0.02068844810128212, ref_abs_avg=20.54326057434082, test_abs_avg=20.54324722290039
production_forward grad[88] vs paper_forward: mean_abs=0.4048057794570923, max_abs=4.0, mean_rel=0.12365461885929108, max_rel=550.7666015625, norm_rel=0.020677361637353897, ref_abs_avg=19.708011627197266, test_abs_avg=19.702545166015625
production_forward grad[89] vs paper_forward: mean_abs=0.34996938705444336, max_abs=1.640625, mean_rel=0.10180157423019409, max_rel=11.06590461730957, norm_rel=0.021970832720398903, ref_abs_avg=16.096406936645508, test_abs_avg=16.108661651611328
production_forward grad[90] vs paper_forward: mean_abs=0.3961842656135559, max_abs=4.5, mean_rel=0.12734833359718323, max_rel=606.1466064453125, norm_rel=0.020201411098241806, ref_abs_avg=19.76224136352539, test_abs_avg=19.76144027709961
production_forward grad[91] vs paper_forward: mean_abs=0.3786216974258423, max_abs=3.5, mean_rel=0.13540714979171753, max_rel=841.0860595703125, norm_rel=0.01960371620953083, ref_abs_avg=19.46263885498047, test_abs_avg=19.46312141418457
production_forward grad[92] vs paper_forward: mean_abs=0.31426477432250977, max_abs=1.5, mean_rel=0.09019685536623001, max_rel=6.70442008972168, norm_rel=0.020425528287887573, ref_abs_avg=15.239168167114258, test_abs_avg=15.252007484436035
production_forward grad[93] vs paper_forward: mean_abs=0.37280163168907166, max_abs=4.0, mean_rel=0.13196013867855072, max_rel=835.7862548828125, norm_rel=0.01974308490753174, ref_abs_avg=19.105548858642578, test_abs_avg=19.105056762695312
production_forward grad[94] vs paper_forward: mean_abs=0.36208611726760864, max_abs=3.25, mean_rel=0.12735429406166077, max_rel=639.3565673828125, norm_rel=0.019219113513827324, ref_abs_avg=18.922073364257812, test_abs_avg=18.917926788330078
production_forward grad[95] vs paper_forward: mean_abs=0.29758715629577637, max_abs=1.0625, mean_rel=0.09758293628692627, max_rel=13.974201202392578, norm_rel=0.019364261999726295, ref_abs_avg=15.38458251953125, test_abs_avg=15.399971961975098
production_forward grad[96] vs paper_forward: mean_abs=0.34535640478134155, max_abs=3.75, mean_rel=0.11968280375003815, max_rel=442.0666809082031, norm_rel=0.019334081560373306, ref_abs_avg=18.128787994384766, test_abs_avg=18.127880096435547
production_forward grad[97] vs paper_forward: mean_abs=0.3411421775817871, max_abs=4.5, mean_rel=0.124972403049469, max_rel=512.697021484375, norm_rel=0.01964125595986843, ref_abs_avg=17.700794219970703, test_abs_avg=17.708415985107422
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016780780861154199, max_abs=0.046875
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.00877992995083332, max_abs=0.34375, mean_rel=0.07465247809886932, max_rel=88.14049530029297, norm_rel=0.020229939371347427, ref_abs_avg=0.46651750802993774, test_abs_avg=0.4665219187736511
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.452233791351318, max_abs=65.0, mean_rel=0.1418132334947586, max_rel=211.61416625976562, norm_rel=0.021043715998530388, ref_abs_avg=322.8476257324219, test_abs_avg=322.9100341796875
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.2684340476989746, max_abs=5.5, mean_rel=0.25741931796073914, max_rel=63.976375579833984, norm_rel=0.023610077798366547, ref_abs_avg=52.16645431518555, test_abs_avg=52.19837951660156
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.5409648418426514, max_abs=10.25, mean_rel=0.16985774040222168, max_rel=1704.0263671875, norm_rel=0.023501716554164886, ref_abs_avg=65.89208221435547, test_abs_avg=65.89197540283203
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5129122734069824, max_abs=10.0, mean_rel=0.16236896812915802, max_rel=869.5033569335938, norm_rel=0.02331257052719593, ref_abs_avg=65.16464233398438, test_abs_avg=65.16741180419922
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.0499858856201172, max_abs=4.4375, mean_rel=0.10349386185407639, max_rel=16.81983184814453, norm_rel=0.021459853276610374, ref_abs_avg=49.93217086791992, test_abs_avg=49.934532165527344
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.365875005722046, max_abs=9.15625, mean_rel=0.16339422762393951, max_rel=1175.934326171875, norm_rel=0.023257484659552574, ref_abs_avg=58.9739990234375, test_abs_avg=58.97454071044922
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3469839096069336, max_abs=8.25, mean_rel=0.15879015624523163, max_rel=908.8097534179688, norm_rel=0.023175809532403946, ref_abs_avg=58.462459564208984, test_abs_avg=58.4640007019043
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0913219451904297, max_abs=4.25, mean_rel=0.07900571823120117, max_rel=3.7892558574676514, norm_rel=0.0243380069732666, ref_abs_avg=44.58271026611328, test_abs_avg=44.54222106933594
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2535772323608398, max_abs=8.5, mean_rel=0.16624867916107178, max_rel=2460.458984375, norm_rel=0.023024700582027435, ref_abs_avg=54.6953010559082, test_abs_avg=54.697452545166016
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2167502641677856, max_abs=8.75, mean_rel=0.14399772882461548, max_rel=707.9253540039062, norm_rel=0.022738739848136902, ref_abs_avg=53.764259338378906, test_abs_avg=53.777225494384766
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9723529815673828, max_abs=3.5, mean_rel=0.0926976203918457, max_rel=3.812636137008667, norm_rel=0.023221097886562347, ref_abs_avg=41.35790252685547, test_abs_avg=41.349082946777344
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1502928733825684, max_abs=7.0, mean_rel=0.1656491905450821, max_rel=1811.8936767578125, norm_rel=0.022946732118725777, ref_abs_avg=50.374488830566406, test_abs_avg=50.37736511230469
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1192800998687744, max_abs=6.75, mean_rel=0.14967632293701172, max_rel=706.3880004882812, norm_rel=0.022503895685076714, ref_abs_avg=50.010154724121094, test_abs_avg=50.01987838745117
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8361475467681885, max_abs=3.5, mean_rel=0.06865512579679489, max_rel=2.6404428482055664, norm_rel=0.021615147590637207, ref_abs_avg=40.15221405029297, test_abs_avg=40.122352600097656
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0716540813446045, max_abs=7.25, mean_rel=0.16593892872333527, max_rel=1521.350341796875, norm_rel=0.02250879630446434, ref_abs_avg=47.802940368652344, test_abs_avg=47.80359649658203
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0436246395111084, max_abs=6.625, mean_rel=0.15806809067726135, max_rel=1865.16357421875, norm_rel=0.02230551466345787, ref_abs_avg=47.055938720703125, test_abs_avg=47.050010681152344
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8297169208526611, max_abs=3.5, mean_rel=0.10912599414587021, max_rel=22.56780242919922, norm_rel=0.021980540826916695, ref_abs_avg=38.54402160644531, test_abs_avg=38.46079635620117
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0081608295440674, max_abs=6.0, mean_rel=0.16261133551597595, max_rel=1973.251220703125, norm_rel=0.022482289001345634, ref_abs_avg=45.05385971069336, test_abs_avg=45.056304931640625
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9845084547996521, max_abs=6.75, mean_rel=0.15700724720954895, max_rel=2083.697265625, norm_rel=0.02217879891395569, ref_abs_avg=44.62399673461914, test_abs_avg=44.626277923583984
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7627121210098267, max_abs=2.75, mean_rel=0.22903534770011902, max_rel=57.227840423583984, norm_rel=0.021510036662220955, ref_abs_avg=34.58073043823242, test_abs_avg=34.61970520019531
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9460799694061279, max_abs=6.0, mean_rel=0.14493273198604584, max_rel=1206.504150390625, norm_rel=0.022266918793320656, ref_abs_avg=42.680755615234375, test_abs_avg=42.68270492553711
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9300358891487122, max_abs=6.0, mean_rel=0.15901963412761688, max_rel=1944.4520263671875, norm_rel=0.022114703431725502, ref_abs_avg=42.2696533203125, test_abs_avg=42.271671295166016
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7557909488677979, max_abs=3.0, mean_rel=0.12809859216213226, max_rel=17.524250030517578, norm_rel=0.022125978022813797, ref_abs_avg=34.41899490356445, test_abs_avg=34.333133697509766
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9091477394104004, max_abs=6.0, mean_rel=0.16365119814872742, max_rel=1415.3876953125, norm_rel=0.02225186675786972, ref_abs_avg=41.04682922363281, test_abs_avg=41.050872802734375
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8816295862197876, max_abs=5.25, mean_rel=0.16634798049926758, max_rel=1807.032958984375, norm_rel=0.02178107760846615, ref_abs_avg=40.69914245605469, test_abs_avg=40.70579528808594
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8386512994766235, max_abs=3.5, mean_rel=0.3691246211528778, max_rel=83.96066284179688, norm_rel=0.0238195713609457, ref_abs_avg=35.25503921508789, test_abs_avg=35.240257263183594
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0430762767791748, max_abs=6.5, mean_rel=0.16220912337303162, max_rel=968.9965209960938, norm_rel=0.02400226704776287, ref_abs_avg=43.63968276977539, test_abs_avg=43.64240646362305
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0184547901153564, max_abs=7.0, mean_rel=0.165298193693161, max_rel=731.11572265625, norm_rel=0.023512769490480423, ref_abs_avg=43.41062927246094, test_abs_avg=43.420955657958984
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7937315702438354, max_abs=3.5, mean_rel=0.1356075406074524, max_rel=16.158964157104492, norm_rel=0.023936135694384575, ref_abs_avg=33.04347610473633, test_abs_avg=33.06157684326172
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9565240740776062, max_abs=5.5, mean_rel=0.17326399683952332, max_rel=1735.8223876953125, norm_rel=0.024112049490213394, ref_abs_avg=39.786014556884766, test_abs_avg=39.78929138183594
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9457715153694153, max_abs=5.5, mean_rel=0.1686105579137802, max_rel=2128.276123046875, norm_rel=0.02420344576239586, ref_abs_avg=39.27376937866211, test_abs_avg=39.26724624633789
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7385812401771545, max_abs=3.0, mean_rel=0.48114562034606934, max_rel=197.79443359375, norm_rel=0.024806542322039604, ref_abs_avg=30.22931480407715, test_abs_avg=30.3206787109375
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.8909677267074585, max_abs=5.8984375, mean_rel=0.16701120138168335, max_rel=1438.88525390625, norm_rel=0.024125687777996063, ref_abs_avg=37.0399284362793, test_abs_avg=37.04045486450195
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8799139261245728, max_abs=5.625, mean_rel=0.18128129839897156, max_rel=2231.863037109375, norm_rel=0.02403837814927101, ref_abs_avg=36.79002380371094, test_abs_avg=36.79677200317383
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6950201988220215, max_abs=3.0, mean_rel=0.08741530776023865, max_rel=3.999708414077759, norm_rel=0.024526091292500496, ref_abs_avg=27.812904357910156, test_abs_avg=27.834814071655273
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8366711139678955, max_abs=5.5, mean_rel=0.15768763422966003, max_rel=1260.58984375, norm_rel=0.02383802831172943, ref_abs_avg=35.188636779785156, test_abs_avg=35.191253662109375
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8250025510787964, max_abs=5.5, mean_rel=0.1628946214914322, max_rel=1475.1749267578125, norm_rel=0.02380065806210041, ref_abs_avg=34.730743408203125, test_abs_avg=34.73527526855469
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.7072811126708984, max_abs=2.5, mean_rel=0.0697038471698761, max_rel=2.934913396835327, norm_rel=0.0242186076939106, ref_abs_avg=29.01205062866211, test_abs_avg=28.971574783325195
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.7975740432739258, max_abs=5.25, mean_rel=0.15618166327476501, max_rel=1016.0261840820312, norm_rel=0.023594025522470474, ref_abs_avg=33.9554443359375, test_abs_avg=33.95415496826172
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7847886085510254, max_abs=4.75, mean_rel=0.17028895020484924, max_rel=2230.851318359375, norm_rel=0.023388655856251717, ref_abs_avg=33.63521194458008, test_abs_avg=33.63249206542969
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6231136322021484, max_abs=3.125, mean_rel=0.10810348391532898, max_rel=8.64121150970459, norm_rel=0.02313200570642948, ref_abs_avg=27.6386661529541, test_abs_avg=27.635372161865234
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7641234397888184, max_abs=4.75, mean_rel=0.15458670258522034, max_rel=941.6992797851562, norm_rel=0.0232850294560194, ref_abs_avg=32.888362884521484, test_abs_avg=32.88918685913086
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7446024417877197, max_abs=4.5, mean_rel=0.15999296307563782, max_rel=804.2412719726562, norm_rel=0.023240625858306885, ref_abs_avg=32.07781982421875, test_abs_avg=32.079627990722656
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5982091426849365, max_abs=2.25, mean_rel=0.16809344291687012, max_rel=38.28413009643555, norm_rel=0.021846482530236244, ref_abs_avg=27.48475456237793, test_abs_avg=27.504547119140625
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7209949493408203, max_abs=4.875, mean_rel=0.16319073736667633, max_rel=1091.3648681640625, norm_rel=0.0230901800096035, ref_abs_avg=31.292888641357422, test_abs_avg=31.294200897216797
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7078068256378174, max_abs=4.25, mean_rel=0.15873263776302338, max_rel=821.0376586914062, norm_rel=0.022980937734246254, ref_abs_avg=30.923824310302734, test_abs_avg=30.93286895751953
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5806441307067871, max_abs=2.25, mean_rel=0.1519218236207962, max_rel=22.00667381286621, norm_rel=0.024559609591960907, ref_abs_avg=24.089447021484375, test_abs_avg=24.055204391479492
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6882309913635254, max_abs=4.40625, mean_rel=0.16075731813907623, max_rel=1589.601318359375, norm_rel=0.022837411612272263, ref_abs_avg=30.15206527709961, test_abs_avg=30.153995513916016
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6742218136787415, max_abs=4.5, mean_rel=0.1629122793674469, max_rel=1410.2080078125, norm_rel=0.02297697216272354, ref_abs_avg=29.41305160522461, test_abs_avg=29.421710968017578
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6043688058853149, max_abs=3.0, mean_rel=0.2168901413679123, max_rel=56.31364822387695, norm_rel=0.02369692362844944, ref_abs_avg=26.039302825927734, test_abs_avg=26.049877166748047
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7612863779067993, max_abs=5.0, mean_rel=0.1654927283525467, max_rel=1781.322021484375, norm_rel=0.024196816608309746, ref_abs_avg=31.54035186767578, test_abs_avg=31.538253784179688
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7576199173927307, max_abs=5.0, mean_rel=0.15903836488723755, max_rel=943.1220703125, norm_rel=0.024154983460903168, ref_abs_avg=31.46586799621582, test_abs_avg=31.461891174316406
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5484001636505127, max_abs=2.125, mean_rel=0.06944252550601959, max_rel=2.0294766426086426, norm_rel=0.023146847262978554, ref_abs_avg=23.830116271972656, test_abs_avg=23.82872772216797
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7092604637145996, max_abs=5.0, mean_rel=0.15392285585403442, max_rel=729.0913696289062, norm_rel=0.023630063980817795, ref_abs_avg=29.999217987060547, test_abs_avg=29.99888801574707
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6928954124450684, max_abs=5.0, mean_rel=0.15474370121955872, max_rel=1152.69189453125, norm_rel=0.02362755499780178, ref_abs_avg=29.40024185180664, test_abs_avg=29.392192840576172
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5344392657279968, max_abs=2.09375, mean_rel=0.17862269282341003, max_rel=29.511699676513672, norm_rel=0.02298571914434433, ref_abs_avg=23.68341064453125, test_abs_avg=23.72087860107422
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6555991172790527, max_abs=4.84375, mean_rel=0.15144793689250946, max_rel=1026.83203125, norm_rel=0.02326934225857258, ref_abs_avg=28.114574432373047, test_abs_avg=28.115976333618164
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6450064778327942, max_abs=4.390625, mean_rel=0.14952020347118378, max_rel=356.34259033203125, norm_rel=0.023391341790556908, ref_abs_avg=27.606937408447266, test_abs_avg=27.606616973876953
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.48535656929016113, max_abs=2.25, mean_rel=0.16856199502944946, max_rel=22.36921501159668, norm_rel=0.021261390298604965, ref_abs_avg=23.08111572265625, test_abs_avg=23.073566436767578
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.616461992263794, max_abs=4.875, mean_rel=0.1524580866098404, max_rel=1233.8065185546875, norm_rel=0.022917520254850388, ref_abs_avg=26.876949310302734, test_abs_avg=26.876157760620117
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6010735034942627, max_abs=4.0, mean_rel=0.15573115646839142, max_rel=1539.3572998046875, norm_rel=0.02235490456223488, ref_abs_avg=26.91661834716797, test_abs_avg=26.911924362182617
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.5199074745178223, max_abs=2.0, mean_rel=0.12399578839540482, max_rel=17.917634963989258, norm_rel=0.025365274399518967, ref_abs_avg=21.13725471496582, test_abs_avg=21.115867614746094
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5842752456665039, max_abs=4.75, mean_rel=0.14657674729824066, max_rel=1442.187255859375, norm_rel=0.02265068329870701, ref_abs_avg=25.728130340576172, test_abs_avg=25.728004455566406
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5667513608932495, max_abs=4.5, mean_rel=0.14937034249305725, max_rel=891.8517456054688, norm_rel=0.02230294793844223, ref_abs_avg=25.387470245361328, test_abs_avg=25.386911392211914
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4486081600189209, max_abs=1.9375, mean_rel=0.36269915103912354, max_rel=90.14137268066406, norm_rel=0.02229248359799385, ref_abs_avg=20.866485595703125, test_abs_avg=20.905376434326172
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5517703294754028, max_abs=4.0625, mean_rel=0.1479337513446808, max_rel=836.2437744140625, norm_rel=0.022112855687737465, ref_abs_avg=24.93425750732422, test_abs_avg=24.93344497680664
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.538335919380188, max_abs=4.375, mean_rel=0.1533312201499939, max_rel=735.7191162109375, norm_rel=0.022133702412247658, ref_abs_avg=24.36823844909668, test_abs_avg=24.36750030517578
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.40673792362213135, max_abs=2.0, mean_rel=0.10895776748657227, max_rel=24.728803634643555, norm_rel=0.02081638015806675, ref_abs_avg=20.096372604370117, test_abs_avg=20.09025764465332
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5178429484367371, max_abs=4.75, mean_rel=0.13396534323692322, max_rel=674.0263061523438, norm_rel=0.021680422127246857, ref_abs_avg=23.855457305908203, test_abs_avg=23.85614776611328
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5142806768417358, max_abs=4.5, mean_rel=0.13363519310951233, max_rel=508.4471435546875, norm_rel=0.021336207166314125, ref_abs_avg=24.050918579101562, test_abs_avg=24.06073760986328
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.4180183410644531, max_abs=1.75, mean_rel=0.06422808766365051, max_rel=2.114929437637329, norm_rel=0.021405505016446114, ref_abs_avg=19.712574005126953, test_abs_avg=19.744152069091797
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.49957865476608276, max_abs=3.875, mean_rel=0.13977421820163727, max_rel=871.9111938476562, norm_rel=0.0211450457572937, ref_abs_avg=23.603675842285156, test_abs_avg=23.604408264160156
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4824655055999756, max_abs=3.8125, mean_rel=0.14219625294208527, max_rel=673.671142578125, norm_rel=0.02116270549595356, ref_abs_avg=22.84333038330078, test_abs_avg=22.84661102294922
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.45392727851867676, max_abs=1.8125, mean_rel=0.12489204108715057, max_rel=7.523880481719971, norm_rel=0.021938947960734367, ref_abs_avg=20.33637237548828, test_abs_avg=20.31636619567871
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5562312602996826, max_abs=4.125, mean_rel=0.15671464800834656, max_rel=1050.994140625, norm_rel=0.022977083921432495, ref_abs_avg=24.189090728759766, test_abs_avg=24.188587188720703
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5408885478973389, max_abs=4.25, mean_rel=0.1421942412853241, max_rel=669.1239013671875, norm_rel=0.02264842763543129, ref_abs_avg=23.925804138183594, test_abs_avg=23.93328094482422
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4428997039794922, max_abs=2.25, mean_rel=0.061327189207077026, max_rel=1.1016521453857422, norm_rel=0.023099932819604874, ref_abs_avg=19.04680633544922, test_abs_avg=19.06157875061035
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5105271935462952, max_abs=4.25, mean_rel=0.14389817416667938, max_rel=785.94384765625, norm_rel=0.02252958156168461, ref_abs_avg=22.66328239440918, test_abs_avg=22.66244125366211
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.49284595251083374, max_abs=3.75, mean_rel=0.15037038922309875, max_rel=674.5906982421875, norm_rel=0.022344570606946945, ref_abs_avg=22.04354476928711, test_abs_avg=22.043903350830078
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.42059338092803955, max_abs=1.8125, mean_rel=0.25919222831726074, max_rel=76.00902557373047, norm_rel=0.024359412491321564, ref_abs_avg=17.375699996948242, test_abs_avg=17.366992950439453
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.4706505537033081, max_abs=4.5, mean_rel=0.13861458003520966, max_rel=565.4324951171875, norm_rel=0.02180812694132328, ref_abs_avg=21.569000244140625, test_abs_avg=21.56741714477539
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.46010029315948486, max_abs=4.0, mean_rel=0.1330966353416443, max_rel=599.27001953125, norm_rel=0.02188112586736679, ref_abs_avg=21.109893798828125, test_abs_avg=21.11056900024414
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3837570548057556, max_abs=1.5, mean_rel=0.0883035659790039, max_rel=10.449711799621582, norm_rel=0.022568123415112495, ref_abs_avg=17.163105010986328, test_abs_avg=17.13547134399414
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4420870840549469, max_abs=4.0, mean_rel=0.14048849046230316, max_rel=780.6609497070312, norm_rel=0.021093379706144333, ref_abs_avg=20.95899200439453, test_abs_avg=20.959136962890625
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.43150797486305237, max_abs=4.296875, mean_rel=0.1501089334487915, max_rel=1165.584716796875, norm_rel=0.02101938985288143, ref_abs_avg=20.612031936645508, test_abs_avg=20.618669509887695
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3589237928390503, max_abs=1.5, mean_rel=0.12428046017885208, max_rel=17.95048713684082, norm_rel=0.021964645013213158, ref_abs_avg=16.377037048339844, test_abs_avg=16.396333694458008
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.42762690782546997, max_abs=3.875, mean_rel=0.1348266303539276, max_rel=793.5531005859375, norm_rel=0.020843936130404472, ref_abs_avg=20.54326057434082, test_abs_avg=20.543373107910156
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4074852466583252, max_abs=3.75, mean_rel=0.12473501265048981, max_rel=419.2678527832031, norm_rel=0.020760543644428253, ref_abs_avg=19.708011627197266, test_abs_avg=19.696924209594727
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3599088191986084, max_abs=1.375, mean_rel=0.09149396419525146, max_rel=10.630270004272461, norm_rel=0.022556273266673088, ref_abs_avg=16.096406936645508, test_abs_avg=16.098072052001953
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.39804887771606445, max_abs=3.5, mean_rel=0.13007664680480957, max_rel=462.7375793457031, norm_rel=0.02029607631266117, ref_abs_avg=19.76224136352539, test_abs_avg=19.761585235595703
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.38566216826438904, max_abs=3.5, mean_rel=0.13329020142555237, max_rel=812.5581665039062, norm_rel=0.019965287297964096, ref_abs_avg=19.46263885498047, test_abs_avg=19.461347579956055
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.3320331573486328, max_abs=1.375, mean_rel=0.10027190297842026, max_rel=10.35149097442627, norm_rel=0.021004633978009224, ref_abs_avg=15.239168167114258, test_abs_avg=15.26327133178711
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.373796671628952, max_abs=4.5, mean_rel=0.13128924369812012, max_rel=683.76123046875, norm_rel=0.01979970932006836, ref_abs_avg=19.105548858642578, test_abs_avg=19.105945587158203
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3654809594154358, max_abs=3.5, mean_rel=0.1267908364534378, max_rel=581.2339477539062, norm_rel=0.01942875236272812, ref_abs_avg=18.922073364257812, test_abs_avg=18.916046142578125
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.28026819229125977, max_abs=1.25, mean_rel=0.08582362532615662, max_rel=6.386861324310303, norm_rel=0.01859413832426071, ref_abs_avg=15.38458251953125, test_abs_avg=15.397954940795898
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3456549644470215, max_abs=3.5, mean_rel=0.12262926995754242, max_rel=752.8732299804688, norm_rel=0.019358348101377487, ref_abs_avg=18.128787994384766, test_abs_avg=18.128103256225586
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.33899354934692383, max_abs=4.0, mean_rel=0.12388203293085098, max_rel=330.0997009277344, norm_rel=0.01955028809607029, ref_abs_avg=17.700794219970703, test_abs_avg=17.707454681396484
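The grad-check lines report seven statistics per gradient tensor, but the log never defines them. Below is a minimal sketch of one common set of definitions. It assumes that mean_rel and max_rel divide the elementwise error by |ref| plus a small eps, and that norm_rel is the ratio of L2 norms. The helper name `compare` and the eps value are hypothetical; this is not the benchmarking script's actual code. Under these definitions, a few near-zero reference entries can push max_rel into the hundreds or thousands while norm_rel stays near 0.02. That is the pattern in the lines above. It is also consistent with mean_abs / ref_abs_avg landing close to norm_rel: for grad[97], 0.339 / 17.70 is about 0.019, against a logged norm_rel of about 0.0196.

```python
import math

def compare(test, ref, eps=1e-12):
    """Hypothetical helper: elementwise error statistics between two flat lists.

    The definitions are assumptions, not taken from the original script:
      mean_rel / max_rel use |test - ref| / (|ref| + eps)
      norm_rel is ||test - ref||_2 / ||ref||_2
    """
    if len(test) != len(ref) or not ref:
        raise ValueError("test and ref must be non-empty and the same length")
    n = len(ref)
    diffs = [abs(t - r) for t, r in zip(test, ref)]
    # Relative error per element; tiny |ref| values make this blow up.
    rels = [d / (abs(r) + eps) for d, r in zip(diffs, ref)]
    ref_norm = math.sqrt(sum(r * r for r in ref))
    diff_norm = math.sqrt(sum(d * d for d in diffs))
    return {
        "mean_abs": sum(diffs) / n,
        "max_abs": max(diffs),
        "mean_rel": sum(rels) / n,
        "max_rel": max(rels),
        "norm_rel": diff_norm / (ref_norm + eps),
        "ref_abs_avg": sum(abs(r) for r in ref) / n,
        "test_abs_avg": sum(abs(t) for t in test) / n,
    }

# One large entry and one near-zero entry, each off by a small amount.
stats = compare([101.0, 0.02], [100.0, 0.01])
for key, value in stats.items():
    print(f"{key}={value:.6g}")
```

In this toy input, max_rel comes out near 1.0 because the 0.01 reference entry is off by 100%, while norm_rel stays near 0.01 because the large entry dominates both norms. Reading norm_rel rather than max_rel is usually the more robust way to judge whether two gradient implementations agree.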
identity layers + randn queries
production_forward fwd+bwd:  66.272 ms
production_forward bwd-only: 56.472 ms
production_forward peak allocated: fwd=7.614 GiB, fwd+bwd=15.618 GiB
production_forward peak reserved:  fwd=27.373 GiB, fwd+bwd=27.373 GiB
torch_compile_phases_forward fwd+bwd:  94.949 ms
torch_compile_phases_forward bwd-only: 76.581 ms
torch_compile_phases_forward peak allocated: fwd=18.203 GiB, fwd+bwd=18.831 GiB
torch_compile_phases_forward peak reserved:  fwd=27.398 GiB, fwd+bwd=27.398 GiB
paper_forward fwd+bwd:  221.202 ms
paper_forward bwd-only: 174.006 ms
paper_forward peak allocated: fwd=35.128 GiB, fwd+bwd=37.247 GiB
paper_forward peak reserved:  fwd=36.168 GiB, fwd+bwd=38.668 GiB
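To compare the three implementations in this run directly, the short script below recomputes speed and memory ratios against paper_forward. The numbers are copied from the "identity layers + randn queries" timing lines. The dictionary layout and the `summarize` function are illustrative only and are not part of the original benchmark.

```python
# Values copied from the "identity layers + randn queries" timing block.
runs = {
    "production_forward": {"fwd_bwd_ms": 66.272, "bwd_ms": 56.472, "peak_alloc_gib": 15.618},
    "torch_compile_phases_forward": {"fwd_bwd_ms": 94.949, "bwd_ms": 76.581, "peak_alloc_gib": 18.831},
    "paper_forward": {"fwd_bwd_ms": 221.202, "bwd_ms": 174.006, "peak_alloc_gib": 37.247},
}

def summarize(runs, baseline="paper_forward"):
    """Speedup (baseline time / run time) and fwd+bwd peak-allocated ratio vs the baseline."""
    base = runs[baseline]
    out = {}
    for name, r in runs.items():
        out[name] = {
            "speedup_fwd_bwd": base["fwd_bwd_ms"] / r["fwd_bwd_ms"],
            "speedup_bwd": base["bwd_ms"] / r["bwd_ms"],
            "mem_ratio": r["peak_alloc_gib"] / base["peak_alloc_gib"],
        }
    return out

summary = summarize(runs)
for name, s in summary.items():
    print(
        f"{name:30s} fwd+bwd {s['speedup_fwd_bwd']:.2f}x  "
        f"bwd {s['speedup_bwd']:.2f}x  peak alloc {s['mem_ratio']:.2f} of baseline"
    )
```

For this configuration, production_forward comes out roughly 3.34x faster than paper_forward for fwd+bwd and 3.08x for bwd-only, and uses about 42% of its peak allocated memory. torch_compile_phases_forward lands at roughly 2.33x and 2.27x, with about 51% of the memory.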

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016224713763222098, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.00827695894986391, max_abs=0.333984375, mean_rel=0.07278728485107422, max_rel=98.47589111328125, norm_rel=0.019936250522732735, ref_abs_avg=0.4497406482696533, test_abs_avg=0.44974905252456665
production_forward grad[1] vs paper_forward: mean_abs=7.202285289764404, max_abs=64.0, mean_rel=0.3898948132991791, max_rel=3347.5908203125, norm_rel=0.020192643627524376, ref_abs_avg=313.1722106933594, test_abs_avg=313.10504150390625
production_forward grad[2] vs paper_forward: mean_abs=1.1916487216949463, max_abs=4.5625, mean_rel=0.19599786400794983, max_rel=54.6782341003418, norm_rel=0.022544993087649345, ref_abs_avg=53.334449768066406, test_abs_avg=53.27031707763672
production_forward grad[3] vs paper_forward: mean_abs=1.484786033630371, max_abs=9.6875, mean_rel=0.15628892183303833, max_rel=1184.0538330078125, norm_rel=0.02301466464996338, ref_abs_avg=64.83358001708984, test_abs_avg=64.835693359375
production_forward grad[4] vs paper_forward: mean_abs=1.457876443862915, max_abs=8.75, mean_rel=0.15029096603393555, max_rel=1370.21826171875, norm_rel=0.022869471460580826, ref_abs_avg=64.0855712890625, test_abs_avg=64.09314727783203
production_forward grad[5] vs paper_forward: mean_abs=1.1261405944824219, max_abs=4.25, mean_rel=0.11603762209415436, max_rel=9.751688003540039, norm_rel=0.023655608296394348, ref_abs_avg=47.56481170654297, test_abs_avg=47.52429962158203
production_forward grad[6] vs paper_forward: mean_abs=1.2985591888427734, max_abs=8.0, mean_rel=0.16667436063289642, max_rel=2137.920166015625, norm_rel=0.022733384743332863, ref_abs_avg=57.45222473144531, test_abs_avg=57.454734802246094
production_forward grad[7] vs paper_forward: mean_abs=1.2812000513076782, max_abs=9.0, mean_rel=0.1569652110338211, max_rel=1649.2181396484375, norm_rel=0.02257259003818035, ref_abs_avg=57.01789855957031, test_abs_avg=57.015769958496094
production_forward grad[8] vs paper_forward: mean_abs=0.9306869506835938, max_abs=3.75, mean_rel=0.1203928291797638, max_rel=15.830279350280762, norm_rel=0.020925253629684448, ref_abs_avg=46.17435073852539, test_abs_avg=46.12931442260742
production_forward grad[9] vs paper_forward: mean_abs=1.1813470125198364, max_abs=7.75, mean_rel=0.15841877460479736, max_rel=2017.3170166015625, norm_rel=0.02250356785953045, ref_abs_avg=52.798973083496094, test_abs_avg=52.804840087890625
production_forward grad[10] vs paper_forward: mean_abs=1.1556484699249268, max_abs=7.0, mean_rel=0.1551448553800583, max_rel=2464.28125, norm_rel=0.02239762246608734, ref_abs_avg=51.854881286621094, test_abs_avg=51.86091613769531
production_forward grad[11] vs paper_forward: mean_abs=0.9122276306152344, max_abs=3.625, mean_rel=0.11223051697015762, max_rel=16.212194442749023, norm_rel=0.02327592484652996, ref_abs_avg=39.548519134521484, test_abs_avg=39.606040954589844
production_forward grad[12] vs paper_forward: mean_abs=1.095879316329956, max_abs=7.25, mean_rel=0.15554115176200867, max_rel=937.1692504882812, norm_rel=0.022538375109434128, ref_abs_avg=48.850929260253906, test_abs_avg=48.851531982421875
production_forward grad[13] vs paper_forward: mean_abs=1.0664963722229004, max_abs=6.5, mean_rel=0.15641352534294128, max_rel=1401.6605224609375, norm_rel=0.0222479235380888, ref_abs_avg=48.106849670410156, test_abs_avg=48.10955047607422
production_forward grad[14] vs paper_forward: mean_abs=0.8480415344238281, max_abs=3.5, mean_rel=0.12014049291610718, max_rel=26.50493049621582, norm_rel=0.023049550130963326, ref_abs_avg=36.18433380126953, test_abs_avg=36.18309783935547
production_forward grad[15] vs paper_forward: mean_abs=1.022046446800232, max_abs=6.0, mean_rel=0.15844061970710754, max_rel=1638.268310546875, norm_rel=0.022347526624798775, ref_abs_avg=45.97761535644531, test_abs_avg=45.97702407836914
production_forward grad[16] vs paper_forward: mean_abs=0.9947541356086731, max_abs=6.25, mean_rel=0.15564292669296265, max_rel=1459.522705078125, norm_rel=0.022051263600587845, ref_abs_avg=45.35612487792969, test_abs_avg=45.356361389160156
production_forward grad[17] vs paper_forward: mean_abs=0.8300647735595703, max_abs=3.125, mean_rel=0.06186477094888687, max_rel=2.268611192703247, norm_rel=0.023002732545137405, ref_abs_avg=36.23204803466797, test_abs_avg=36.20323944091797
production_forward grad[18] vs paper_forward: mean_abs=0.9631320238113403, max_abs=5.6875, mean_rel=0.15564483404159546, max_rel=1701.734375, norm_rel=0.022154297679662704, ref_abs_avg=43.65232849121094, test_abs_avg=43.65516662597656
production_forward grad[19] vs paper_forward: mean_abs=0.9418193101882935, max_abs=6.5, mean_rel=0.1612066924571991, max_rel=1458.664794921875, norm_rel=0.021809956058859825, ref_abs_avg=43.420387268066406, test_abs_avg=43.41856002807617
production_forward grad[20] vs paper_forward: mean_abs=0.7405233383178711, max_abs=3.0, mean_rel=0.20319674909114838, max_rel=46.774105072021484, norm_rel=0.021409522742033005, ref_abs_avg=35.78038024902344, test_abs_avg=35.69776153564453
production_forward grad[21] vs paper_forward: mean_abs=0.9127205610275269, max_abs=5.5, mean_rel=0.14413991570472717, max_rel=1104.5697021484375, norm_rel=0.02198379673063755, ref_abs_avg=41.705692291259766, test_abs_avg=41.704952239990234
production_forward grad[22] vs paper_forward: mean_abs=0.8917883634567261, max_abs=5.25, mean_rel=0.15283489227294922, max_rel=1436.5477294921875, norm_rel=0.021686771884560585, ref_abs_avg=41.38194274902344, test_abs_avg=41.38479232788086
production_forward grad[23] vs paper_forward: mean_abs=0.744788408279419, max_abs=2.9375, mean_rel=0.1043906956911087, max_rel=6.413495063781738, norm_rel=0.02283654734492302, ref_abs_avg=33.76665496826172, test_abs_avg=33.815696716308594
production_forward grad[24] vs paper_forward: mean_abs=0.8667140007019043, max_abs=5.625, mean_rel=0.13954366743564606, max_rel=1070.2198486328125, norm_rel=0.021950747817754745, ref_abs_avg=39.67695999145508, test_abs_avg=39.677101135253906
production_forward grad[25] vs paper_forward: mean_abs=0.8518658876419067, max_abs=5.5, mean_rel=0.16684234142303467, max_rel=1833.4644775390625, norm_rel=0.02171671949326992, ref_abs_avg=39.423641204833984, test_abs_avg=39.42488098144531
production_forward grad[26] vs paper_forward: mean_abs=0.8470664024353027, max_abs=3.4375, mean_rel=0.1594257950782776, max_rel=34.70075225830078, norm_rel=0.025331920012831688, ref_abs_avg=33.748287200927734, test_abs_avg=33.856170654296875
production_forward grad[27] vs paper_forward: mean_abs=0.9890762567520142, max_abs=6.0625, mean_rel=0.15776307880878448, max_rel=1924.3199462890625, norm_rel=0.023916395381093025, ref_abs_avg=41.57036590576172, test_abs_avg=41.573448181152344
production_forward grad[28] vs paper_forward: mean_abs=0.9682130813598633, max_abs=6.625, mean_rel=0.1492462009191513, max_rel=633.1659545898438, norm_rel=0.02368852123618126, ref_abs_avg=41.03233337402344, test_abs_avg=41.025482177734375
production_forward grad[29] vs paper_forward: mean_abs=0.7937278747558594, max_abs=3.8125, mean_rel=0.14108610153198242, max_rel=11.146352767944336, norm_rel=0.024193469434976578, ref_abs_avg=32.357364654541016, test_abs_avg=32.38095474243164
production_forward grad[30] vs paper_forward: mean_abs=0.9264739155769348, max_abs=6.375, mean_rel=0.17591041326522827, max_rel=2448.452880859375, norm_rel=0.024253012612462044, ref_abs_avg=38.36176681518555, test_abs_avg=38.36195755004883
production_forward grad[31] vs paper_forward: mean_abs=0.9139488935470581, max_abs=6.0, mean_rel=0.1582256257534027, max_rel=700.3590087890625, norm_rel=0.024318665266036987, ref_abs_avg=37.77812957763672, test_abs_avg=37.771690368652344
production_forward grad[32] vs paper_forward: mean_abs=0.7485275864601135, max_abs=3.0, mean_rel=0.08488219231367111, max_rel=5.4707255363464355, norm_rel=0.024805061519145966, ref_abs_avg=30.069679260253906, test_abs_avg=30.108144760131836
production_forward grad[33] vs paper_forward: mean_abs=0.8674022555351257, max_abs=5.5, mean_rel=0.1708582192659378, max_rel=1458.6435546875, norm_rel=0.024137036874890327, ref_abs_avg=36.07612609863281, test_abs_avg=36.07715606689453
production_forward grad[34] vs paper_forward: mean_abs=0.8571627140045166, max_abs=5.75, mean_rel=0.17803415656089783, max_rel=1504.92822265625, norm_rel=0.024106217548251152, ref_abs_avg=35.7056999206543, test_abs_avg=35.701881408691406
production_forward grad[35] vs paper_forward: mean_abs=0.7025827169418335, max_abs=2.375, mean_rel=0.29966795444488525, max_rel=47.28831100463867, norm_rel=0.02592550963163376, ref_abs_avg=26.286781311035156, test_abs_avg=26.26347541809082
production_forward grad[36] vs paper_forward: mean_abs=0.8146257400512695, max_abs=5.625, mean_rel=0.15966127812862396, max_rel=997.7322998046875, norm_rel=0.023942377418279648, ref_abs_avg=34.119449615478516, test_abs_avg=34.117431640625
production_forward grad[37] vs paper_forward: mean_abs=0.796928346157074, max_abs=5.5, mean_rel=0.17065423727035522, max_rel=1416.714599609375, norm_rel=0.02363811805844307, ref_abs_avg=33.77418899536133, test_abs_avg=33.7714958190918
production_forward grad[38] vs paper_forward: mean_abs=0.6568870544433594, max_abs=2.75, mean_rel=0.07991611212491989, max_rel=5.05257511138916, norm_rel=0.023800760507583618, ref_abs_avg=26.60007095336914, test_abs_avg=26.61526107788086
production_forward grad[39] vs paper_forward: mean_abs=0.7652134299278259, max_abs=4.796875, mean_rel=0.15174174308776855, max_rel=803.7266235351562, norm_rel=0.023685334250330925, ref_abs_avg=32.39186096191406, test_abs_avg=32.393131256103516
production_forward grad[40] vs paper_forward: mean_abs=0.7507883906364441, max_abs=5.375, mean_rel=0.15890084207057953, max_rel=1339.4981689453125, norm_rel=0.023464184254407883, ref_abs_avg=32.10663604736328, test_abs_avg=32.10847854614258
production_forward grad[41] vs paper_forward: mean_abs=0.5924656391143799, max_abs=3.0, mean_rel=0.1463361531496048, max_rel=28.085582733154297, norm_rel=0.023676758632063866, ref_abs_avg=25.47693634033203, test_abs_avg=25.49997901916504
production_forward grad[42] vs paper_forward: mean_abs=0.7334164381027222, max_abs=4.625, mean_rel=0.15534161031246185, max_rel=1098.6378173828125, norm_rel=0.02343853749334812, ref_abs_avg=31.362319946289062, test_abs_avg=31.365066528320312
production_forward grad[43] vs paper_forward: mean_abs=0.7171010971069336, max_abs=4.75, mean_rel=0.17033712565898895, max_rel=1224.1190185546875, norm_rel=0.023403583094477654, ref_abs_avg=30.697986602783203, test_abs_avg=30.699230194091797
production_forward grad[44] vs paper_forward: mean_abs=0.5533590316772461, max_abs=2.421875, mean_rel=0.062480948865413666, max_rel=2.1185426712036133, norm_rel=0.02267293632030487, ref_abs_avg=25.117338180541992, test_abs_avg=25.09511375427246
production_forward grad[45] vs paper_forward: mean_abs=0.6942039728164673, max_abs=4.5, mean_rel=0.1652502566576004, max_rel=1439.8621826171875, norm_rel=0.023190179839730263, ref_abs_avg=30.01164436340332, test_abs_avg=30.01306915283203
production_forward grad[46] vs paper_forward: mean_abs=0.6834663152694702, max_abs=4.0, mean_rel=0.15789246559143066, max_rel=1421.273193359375, norm_rel=0.023205216974020004, ref_abs_avg=29.510732650756836, test_abs_avg=29.516551971435547
production_forward grad[47] vs paper_forward: mean_abs=0.5389233827590942, max_abs=2.125, mean_rel=0.3497873544692993, max_rel=113.26331329345703, norm_rel=0.02379797212779522, ref_abs_avg=22.727840423583984, test_abs_avg=22.697860717773438
production_forward grad[48] vs paper_forward: mean_abs=0.666888415813446, max_abs=4.5, mean_rel=0.15416860580444336, max_rel=901.27587890625, norm_rel=0.022986609488725662, ref_abs_avg=29.0584659576416, test_abs_avg=29.057758331298828
production_forward grad[49] vs paper_forward: mean_abs=0.654656171798706, max_abs=4.25, mean_rel=0.1677280217409134, max_rel=1231.3477783203125, norm_rel=0.022817043587565422, ref_abs_avg=28.73639678955078, test_abs_avg=28.734628677368164
production_forward grad[50] vs paper_forward: mean_abs=0.620458722114563, max_abs=2.5, mean_rel=0.21874968707561493, max_rel=74.01880645751953, norm_rel=0.024068739265203476, ref_abs_avg=25.384845733642578, test_abs_avg=25.430259704589844
production_forward grad[51] vs paper_forward: mean_abs=0.7460213899612427, max_abs=5.9375, mean_rel=0.15355268120765686, max_rel=955.0732421875, norm_rel=0.02443033643066883, ref_abs_avg=30.609264373779297, test_abs_avg=30.61079978942871
production_forward grad[52] vs paper_forward: mean_abs=0.72284996509552, max_abs=4.4375, mean_rel=0.17777536809444427, max_rel=1355.881103515625, norm_rel=0.023938067257404327, ref_abs_avg=30.233123779296875, test_abs_avg=30.233924865722656
production_forward grad[53] vs paper_forward: mean_abs=0.553161084651947, max_abs=2.25, mean_rel=0.818259596824646, max_rel=370.95098876953125, norm_rel=0.023941434919834137, ref_abs_avg=23.633399963378906, test_abs_avg=23.62442398071289
production_forward grad[54] vs paper_forward: mean_abs=0.6830335855484009, max_abs=4.40625, mean_rel=0.16518135368824005, max_rel=1016.0628662109375, norm_rel=0.024108175188302994, ref_abs_avg=28.37314224243164, test_abs_avg=28.37274932861328
production_forward grad[55] vs paper_forward: mean_abs=0.6730419993400574, max_abs=4.25, mean_rel=0.16941958665847778, max_rel=1324.968017578125, norm_rel=0.023960571736097336, ref_abs_avg=28.155527114868164, test_abs_avg=28.162622451782227
production_forward grad[56] vs paper_forward: mean_abs=0.5306639671325684, max_abs=2.0, mean_rel=0.12110695242881775, max_rel=8.061676025390625, norm_rel=0.023929981514811516, ref_abs_avg=21.92255210876465, test_abs_avg=21.976150512695312
production_forward grad[57] vs paper_forward: mean_abs=0.6390827894210815, max_abs=4.5, mean_rel=0.15849977731704712, max_rel=847.62744140625, norm_rel=0.02338695526123047, ref_abs_avg=27.31757354736328, test_abs_avg=27.31913185119629
production_forward grad[58] vs paper_forward: mean_abs=0.6176570653915405, max_abs=3.875, mean_rel=0.14673364162445068, max_rel=1131.498046875, norm_rel=0.02304503135383129, ref_abs_avg=26.783945083618164, test_abs_avg=26.79255485534668
production_forward grad[59] vs paper_forward: mean_abs=0.4770684242248535, max_abs=1.9453125, mean_rel=0.08905036747455597, max_rel=5.940184116363525, norm_rel=0.022054126486182213, ref_abs_avg=21.800844192504883, test_abs_avg=21.781761169433594
production_forward grad[60] vs paper_forward: mean_abs=0.5953240394592285, max_abs=4.0, mean_rel=0.1707075983285904, max_rel=2143.62255859375, norm_rel=0.023055560886859894, ref_abs_avg=25.829853057861328, test_abs_avg=25.82929039001465
production_forward grad[61] vs paper_forward: mean_abs=0.5886530876159668, max_abs=4.25, mean_rel=0.15187019109725952, max_rel=897.214111328125, norm_rel=0.023077677935361862, ref_abs_avg=25.54395294189453, test_abs_avg=25.547466278076172
production_forward grad[62] vs paper_forward: mean_abs=0.45842456817626953, max_abs=2.046875, mean_rel=0.07345971465110779, max_rel=4.6468400955200195, norm_rel=0.022559592500329018, ref_abs_avg=20.350265502929688, test_abs_avg=20.322525024414062
production_forward grad[63] vs paper_forward: mean_abs=0.5621825456619263, max_abs=3.8125, mean_rel=0.15475153923034668, max_rel=866.17138671875, norm_rel=0.022628022357821465, ref_abs_avg=24.85335350036621, test_abs_avg=24.854461669921875
production_forward grad[64] vs paper_forward: mean_abs=0.5514581799507141, max_abs=3.5625, mean_rel=0.14843513071537018, max_rel=1100.719970703125, norm_rel=0.022466488182544708, ref_abs_avg=24.62261390686035, test_abs_avg=24.62493896484375
production_forward grad[65] vs paper_forward: mean_abs=0.4465789794921875, max_abs=2.0, mean_rel=0.08021681755781174, max_rel=2.8983492851257324, norm_rel=0.022701777517795563, ref_abs_avg=20.27994728088379, test_abs_avg=20.216594696044922
production_forward grad[66] vs paper_forward: mean_abs=0.5387387275695801, max_abs=4.0, mean_rel=0.15197551250457764, max_rel=994.7747192382812, norm_rel=0.02227027155458927, ref_abs_avg=24.156585693359375, test_abs_avg=24.158161163330078
production_forward grad[67] vs paper_forward: mean_abs=0.5278603434562683, max_abs=3.5, mean_rel=0.16080152988433838, max_rel=1059.2747802734375, norm_rel=0.022195233032107353, ref_abs_avg=23.84030532836914, test_abs_avg=23.846330642700195
production_forward grad[68] vs paper_forward: mean_abs=0.4301333427429199, max_abs=1.75, mean_rel=0.23147666454315186, max_rel=65.58658599853516, norm_rel=0.02233883924782276, ref_abs_avg=19.44955062866211, test_abs_avg=19.447351455688477
production_forward grad[69] vs paper_forward: mean_abs=0.5088045001029968, max_abs=3.4375, mean_rel=0.15074622631072998, max_rel=1248.3048095703125, norm_rel=0.02203284204006195, ref_abs_avg=23.102794647216797, test_abs_avg=23.104028701782227
production_forward grad[70] vs paper_forward: mean_abs=0.49683958292007446, max_abs=3.5, mean_rel=0.15279987454414368, max_rel=836.7081298828125, norm_rel=0.021771864965558052, ref_abs_avg=22.851688385009766, test_abs_avg=22.846019744873047
production_forward grad[71] vs paper_forward: mean_abs=0.40319347381591797, max_abs=1.75, mean_rel=0.11764957010746002, max_rel=23.09327507019043, norm_rel=0.021372444927692413, ref_abs_avg=18.727821350097656, test_abs_avg=18.72949981689453
production_forward grad[72] vs paper_forward: mean_abs=0.48870885372161865, max_abs=4.125, mean_rel=0.1342831254005432, max_rel=835.4989624023438, norm_rel=0.021509554237127304, ref_abs_avg=22.683677673339844, test_abs_avg=22.68389129638672
production_forward grad[73] vs paper_forward: mean_abs=0.4798646569252014, max_abs=3.5, mean_rel=0.1353294551372528, max_rel=856.3417358398438, norm_rel=0.02148359827697277, ref_abs_avg=22.34290313720703, test_abs_avg=22.34687042236328
production_forward grad[74] vs paper_forward: mean_abs=0.4443083703517914, max_abs=1.8125, mean_rel=0.11386940628290176, max_rel=15.062822341918945, norm_rel=0.022529782727360725, ref_abs_avg=19.703779220581055, test_abs_avg=19.712806701660156
production_forward grad[75] vs paper_forward: mean_abs=0.5424755811691284, max_abs=5.25, mean_rel=0.1598750650882721, max_rel=1128.9942626953125, norm_rel=0.022846486419439316, ref_abs_avg=23.744121551513672, test_abs_avg=23.746620178222656
production_forward grad[76] vs paper_forward: mean_abs=0.5254443883895874, max_abs=3.75, mean_rel=0.14614494144916534, max_rel=873.0006103515625, norm_rel=0.0228293277323246, ref_abs_avg=23.039875030517578, test_abs_avg=23.0413761138916
production_forward grad[77] vs paper_forward: mean_abs=0.39440083503723145, max_abs=1.5, mean_rel=0.09769115597009659, max_rel=6.899075984954834, norm_rel=0.02235451526939869, ref_abs_avg=18.33229637145996, test_abs_avg=18.320457458496094
production_forward grad[78] vs paper_forward: mean_abs=0.48917800188064575, max_abs=4.0, mean_rel=0.15330249071121216, max_rel=1021.2278442382812, norm_rel=0.022556280717253685, ref_abs_avg=21.739906311035156, test_abs_avg=21.74126434326172
production_forward grad[79] vs paper_forward: mean_abs=0.485803484916687, max_abs=3.5, mean_rel=0.1344424933195114, max_rel=522.4865112304688, norm_rel=0.022313367575407028, ref_abs_avg=21.755252838134766, test_abs_avg=21.75775718688965
production_forward grad[80] vs paper_forward: mean_abs=0.387115478515625, max_abs=1.5078125, mean_rel=0.12624797224998474, max_rel=5.091838359832764, norm_rel=0.02367250807583332, ref_abs_avg=16.34885597229004, test_abs_avg=16.36130142211914
production_forward grad[81] vs paper_forward: mean_abs=0.4585990607738495, max_abs=3.75, mean_rel=0.1430385708808899, max_rel=718.7898559570312, norm_rel=0.021698128432035446, ref_abs_avg=21.141489028930664, test_abs_avg=21.141515731811523
production_forward grad[82] vs paper_forward: mean_abs=0.44050168991088867, max_abs=3.75, mean_rel=0.13877469301223755, max_rel=471.5124206542969, norm_rel=0.02128288522362709, ref_abs_avg=20.76384162902832, test_abs_avg=20.759742736816406
production_forward grad[83] vs paper_forward: mean_abs=0.3548860549926758, max_abs=1.5537109375, mean_rel=0.10985688865184784, max_rel=10.803540229797363, norm_rel=0.022061852738261223, ref_abs_avg=16.36861801147461, test_abs_avg=16.408145904541016
production_forward grad[84] vs paper_forward: mean_abs=0.4292898178100586, max_abs=4.25, mean_rel=0.14196598529815674, max_rel=680.549560546875, norm_rel=0.021311981603503227, ref_abs_avg=20.19241714477539, test_abs_avg=20.195281982421875
production_forward grad[85] vs paper_forward: mean_abs=0.41479939222335815, max_abs=3.5, mean_rel=0.130060076713562, max_rel=690.6741943359375, norm_rel=0.020925942808389664, ref_abs_avg=19.90252685546875, test_abs_avg=19.90964126586914
production_forward grad[86] vs paper_forward: mean_abs=0.3578672409057617, max_abs=1.5625, mean_rel=0.08137752115726471, max_rel=3.91375994682312, norm_rel=0.021738750860095024, ref_abs_avg=16.6363582611084, test_abs_avg=16.63054656982422
production_forward grad[87] vs paper_forward: mean_abs=0.4060695469379425, max_abs=4.0, mean_rel=0.13017922639846802, max_rel=810.3233032226562, norm_rel=0.020850030705332756, ref_abs_avg=19.56847381591797, test_abs_avg=19.569530487060547
production_forward grad[88] vs paper_forward: mean_abs=0.40040385723114014, max_abs=3.21484375, mean_rel=0.1300278902053833, max_rel=422.82550048828125, norm_rel=0.020429547876119614, ref_abs_avg=19.67206573486328, test_abs_avg=19.683544158935547
production_forward grad[89] vs paper_forward: mean_abs=0.3070201873779297, max_abs=1.125, mean_rel=0.086423359811306, max_rel=9.659326553344727, norm_rel=0.02061990275979042, ref_abs_avg=15.14585018157959, test_abs_avg=15.153471946716309
production_forward grad[90] vs paper_forward: mean_abs=0.37787532806396484, max_abs=3.5, mean_rel=0.13324816524982452, max_rel=785.388916015625, norm_rel=0.020342329517006874, ref_abs_avg=18.69383430480957, test_abs_avg=18.693708419799805
production_forward grad[91] vs paper_forward: mean_abs=0.3703044056892395, max_abs=3.5, mean_rel=0.12825119495391846, max_rel=848.0089721679688, norm_rel=0.019898733124136925, ref_abs_avg=18.80712890625, test_abs_avg=18.812942504882812
production_forward grad[92] vs paper_forward: mean_abs=0.31717678904533386, max_abs=1.26953125, mean_rel=0.3025047779083252, max_rel=85.93496704101562, norm_rel=0.021244604140520096, ref_abs_avg=14.757104873657227, test_abs_avg=14.757218360900879
production_forward grad[93] vs paper_forward: mean_abs=0.3625940978527069, max_abs=5.0, mean_rel=0.12011249363422394, max_rel=753.5852661132812, norm_rel=0.019996602088212967, ref_abs_avg=18.33589744567871, test_abs_avg=18.3353271484375
production_forward grad[94] vs paper_forward: mean_abs=0.3550753891468048, max_abs=3.5, mean_rel=0.11823439598083496, max_rel=455.53363037109375, norm_rel=0.019887031987309456, ref_abs_avg=18.03278160095215, test_abs_avg=18.030120849609375
production_forward grad[95] vs paper_forward: mean_abs=0.28569531440734863, max_abs=1.375, mean_rel=0.07208991795778275, max_rel=2.511127233505249, norm_rel=0.020179979503154755, ref_abs_avg=14.213163375854492, test_abs_avg=14.194849967956543
production_forward grad[96] vs paper_forward: mean_abs=0.33255618810653687, max_abs=3.5, mean_rel=0.11960267275571823, max_rel=645.1587524414062, norm_rel=0.01904066652059555, ref_abs_avg=17.7431640625, test_abs_avg=17.744136810302734
production_forward grad[97] vs paper_forward: mean_abs=0.3213375210762024, max_abs=3.75, mean_rel=0.12339110672473907, max_rel=606.5164794921875, norm_rel=0.01871752366423607, ref_abs_avg=17.403287887573242, test_abs_avg=17.411176681518555
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016254186630249023, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.00861712172627449, max_abs=0.52734375, mean_rel=0.07545289397239685, max_rel=115.74313354492188, norm_rel=0.02065826579928398, ref_abs_avg=0.4497406482696533, test_abs_avg=0.44973519444465637
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.3215742111206055, max_abs=64.0, mean_rel=0.18617556989192963, max_rel=412.1655578613281, norm_rel=0.0204513818025589, ref_abs_avg=313.1722106933594, test_abs_avg=313.0526123046875
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.2160990238189697, max_abs=4.75, mean_rel=0.10919170081615448, max_rel=8.82568359375, norm_rel=0.022689983248710632, ref_abs_avg=53.334449768066406, test_abs_avg=53.29750061035156
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.5387134552001953, max_abs=10.0, mean_rel=0.16313430666923523, max_rel=1434.9617919921875, norm_rel=0.023838117718696594, ref_abs_avg=64.83358001708984, test_abs_avg=64.83486938476562
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5069127082824707, max_abs=10.0, mean_rel=0.15906649827957153, max_rel=1370.21826171875, norm_rel=0.023632314056158066, ref_abs_avg=64.0855712890625, test_abs_avg=64.09420776367188
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.101613998413086, max_abs=4.5, mean_rel=0.09604524075984955, max_rel=10.274237632751465, norm_rel=0.02349822223186493, ref_abs_avg=47.56481170654297, test_abs_avg=47.58331298828125
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.3422377109527588, max_abs=8.0, mean_rel=0.18038348853588104, max_rel=3166.142333984375, norm_rel=0.02350032702088356, ref_abs_avg=57.45222473144531, test_abs_avg=57.45206832885742
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3251104354858398, max_abs=8.375, mean_rel=0.16237041354179382, max_rel=2490.208251953125, norm_rel=0.023355606943368912, ref_abs_avg=57.01789855957031, test_abs_avg=57.01683807373047
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.9930624961853027, max_abs=4.25, mean_rel=0.1404116153717041, max_rel=15.296125411987305, norm_rel=0.022602491080760956, ref_abs_avg=46.17435073852539, test_abs_avg=46.15818786621094
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2240180969238281, max_abs=8.0, mean_rel=0.1643093228340149, max_rel=2195.31103515625, norm_rel=0.023287363350391388, ref_abs_avg=52.798973083496094, test_abs_avg=52.80385971069336
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.1935582160949707, max_abs=7.5, mean_rel=0.14966699481010437, max_rel=1074.28173828125, norm_rel=0.0231173038482666, ref_abs_avg=51.854881286621094, test_abs_avg=51.8585205078125
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9422507286071777, max_abs=4.3125, mean_rel=0.10219266265630722, max_rel=14.047561645507812, norm_rel=0.024649091064929962, ref_abs_avg=39.548519134521484, test_abs_avg=39.625057220458984
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.128593921661377, max_abs=8.0, mean_rel=0.15979772806167603, max_rel=1449.6358642578125, norm_rel=0.023222539573907852, ref_abs_avg=48.850929260253906, test_abs_avg=48.84943771362305
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1028443574905396, max_abs=7.0, mean_rel=0.16704480350017548, max_rel=1640.1790771484375, norm_rel=0.023004932329058647, ref_abs_avg=48.106849670410156, test_abs_avg=48.10760498046875
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8857959508895874, max_abs=3.5, mean_rel=0.12587086856365204, max_rel=19.694997787475586, norm_rel=0.023690784350037575, ref_abs_avg=36.18433380126953, test_abs_avg=36.15636444091797
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.053455114364624, max_abs=6.25, mean_rel=0.16587063670158386, max_rel=2057.040283203125, norm_rel=0.02302742190659046, ref_abs_avg=45.97761535644531, test_abs_avg=45.97532653808594
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0266814231872559, max_abs=6.25, mean_rel=0.15716081857681274, max_rel=886.3878784179688, norm_rel=0.02274932898581028, ref_abs_avg=45.35612487792969, test_abs_avg=45.354286193847656
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8448224067687988, max_abs=3.125, mean_rel=0.061096057295799255, max_rel=1.1673851013183594, norm_rel=0.02334526926279068, ref_abs_avg=36.23204803466797, test_abs_avg=36.17156219482422
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.992379367351532, max_abs=6.0, mean_rel=0.15616506338119507, max_rel=1722.2386474609375, norm_rel=0.022824564948678017, ref_abs_avg=43.65232849121094, test_abs_avg=43.6539421081543
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9695852994918823, max_abs=6.5, mean_rel=0.16395707428455353, max_rel=1734.2913818359375, norm_rel=0.022441519424319267, ref_abs_avg=43.420387268066406, test_abs_avg=43.41674041748047
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7725892066955566, max_abs=2.625, mean_rel=0.1608654111623764, max_rel=24.118967056274414, norm_rel=0.021905886009335518, ref_abs_avg=35.78038024902344, test_abs_avg=35.708221435546875
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9376500844955444, max_abs=6.25, mean_rel=0.15124531090259552, max_rel=1332.5504150390625, norm_rel=0.022570427507162094, ref_abs_avg=41.705692291259766, test_abs_avg=41.70325469970703
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9151118397712708, max_abs=5.5, mean_rel=0.16210734844207764, max_rel=1774.1175537109375, norm_rel=0.022254357114434242, ref_abs_avg=41.38194274902344, test_abs_avg=41.38337326049805
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7427327632904053, max_abs=3.0, mean_rel=0.1724829077720642, max_rel=44.832454681396484, norm_rel=0.022679386660456657, ref_abs_avg=33.76665496826172, test_abs_avg=33.805763244628906
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.8902516961097717, max_abs=6.4453125, mean_rel=0.14700564742088318, max_rel=1549.180908203125, norm_rel=0.02253481186926365, ref_abs_avg=39.67695999145508, test_abs_avg=39.67601013183594
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8729164004325867, max_abs=5.5, mean_rel=0.17649266123771667, max_rel=2026.4483642578125, norm_rel=0.022248534485697746, ref_abs_avg=39.423641204833984, test_abs_avg=39.42434310913086
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.811558723449707, max_abs=3.46875, mean_rel=0.15148478746414185, max_rel=32.141841888427734, norm_rel=0.024560390040278435, ref_abs_avg=33.748287200927734, test_abs_avg=33.867576599121094
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0162718296051025, max_abs=7.75, mean_rel=0.1608523279428482, max_rel=1689.69482421875, norm_rel=0.02457789145410061, ref_abs_avg=41.57036590576172, test_abs_avg=41.570701599121094
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.9930217266082764, max_abs=7.625, mean_rel=0.15695910155773163, max_rel=723.550048828125, norm_rel=0.024303030222654343, ref_abs_avg=41.03233337402344, test_abs_avg=41.024452209472656
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8338108062744141, max_abs=3.5, mean_rel=0.16684745252132416, max_rel=15.95802116394043, norm_rel=0.025413082912564278, ref_abs_avg=32.357364654541016, test_abs_avg=32.38340759277344
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.947449803352356, max_abs=6.1875, mean_rel=0.18072310090065002, max_rel=2491.851318359375, norm_rel=0.02478930912911892, ref_abs_avg=38.36176681518555, test_abs_avg=38.359920501708984
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9350510835647583, max_abs=5.75, mean_rel=0.15680649876594543, max_rel=740.3263549804688, norm_rel=0.024864213541150093, ref_abs_avg=37.77812957763672, test_abs_avg=37.765174865722656
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7788915634155273, max_abs=3.0, mean_rel=0.10683118551969528, max_rel=6.200155735015869, norm_rel=0.025678418576717377, ref_abs_avg=30.069679260253906, test_abs_avg=30.110185623168945
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.8873430490493774, max_abs=5.9375, mean_rel=0.1787015050649643, max_rel=1846.9534912109375, norm_rel=0.0246738214045763, ref_abs_avg=36.07612609863281, test_abs_avg=36.0756950378418
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8786722421646118, max_abs=5.375, mean_rel=0.17891302704811096, max_rel=1418.212646484375, norm_rel=0.024690696969628334, ref_abs_avg=35.7056999206543, test_abs_avg=35.699398040771484
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7324315309524536, max_abs=2.4375, mean_rel=0.3741738200187683, max_rel=77.40425872802734, norm_rel=0.02727077342569828, ref_abs_avg=26.286781311035156, test_abs_avg=26.232364654541016
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8315093517303467, max_abs=5.0625, mean_rel=0.16551247239112854, max_rel=1445.2718505859375, norm_rel=0.024439172819256783, ref_abs_avg=34.119449615478516, test_abs_avg=34.11504364013672
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8137463927268982, max_abs=5.0, mean_rel=0.17303317785263062, max_rel=2062.842041015625, norm_rel=0.024134211242198944, ref_abs_avg=33.77418899536133, test_abs_avg=33.770263671875
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6299600601196289, max_abs=2.5, mean_rel=0.07638834416866302, max_rel=3.582735300064087, norm_rel=0.023544326424598694, ref_abs_avg=26.60007095336914, test_abs_avg=26.636478424072266
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.77970951795578, max_abs=5.0, mean_rel=0.1545395851135254, max_rel=939.2390747070312, norm_rel=0.024122819304466248, ref_abs_avg=32.39186096191406, test_abs_avg=32.39226531982422
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7653947472572327, max_abs=5.125, mean_rel=0.1596309244632721, max_rel=1123.90234375, norm_rel=0.02390621043741703, ref_abs_avg=32.10663604736328, test_abs_avg=32.107337951660156
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6142280101776123, max_abs=3.3828125, mean_rel=0.18932312726974487, max_rel=53.21247100830078, norm_rel=0.024816956371068954, ref_abs_avg=25.47693634033203, test_abs_avg=25.503068923950195
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7453848123550415, max_abs=5.0, mean_rel=0.15969887375831604, max_rel=868.78564453125, norm_rel=0.02382422238588333, ref_abs_avg=31.362319946289062, test_abs_avg=31.363567352294922
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7296327352523804, max_abs=4.625, mean_rel=0.17529651522636414, max_rel=1404.2890625, norm_rel=0.023781007155776024, ref_abs_avg=30.697986602783203, test_abs_avg=30.700054168701172
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5604348182678223, max_abs=2.421875, mean_rel=0.06758812814950943, max_rel=2.998276472091675, norm_rel=0.022568199783563614, ref_abs_avg=25.117338180541992, test_abs_avg=25.076168060302734
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7060174942016602, max_abs=4.375, mean_rel=0.16836395859718323, max_rel=1651.38720703125, norm_rel=0.023594101890921593, ref_abs_avg=30.01164436340332, test_abs_avg=30.012439727783203
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.6924839019775391, max_abs=4.375, mean_rel=0.15535955131053925, max_rel=857.8515625, norm_rel=0.02351963147521019, ref_abs_avg=29.510732650756836, test_abs_avg=29.514741897583008
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5623433589935303, max_abs=2.5, mean_rel=0.265577495098114, max_rel=75.92195892333984, norm_rel=0.0245183277875185, ref_abs_avg=22.727840423583984, test_abs_avg=22.714866638183594
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6768237352371216, max_abs=5.0, mean_rel=0.15642333030700684, max_rel=744.4818725585938, norm_rel=0.023316772654652596, ref_abs_avg=29.0584659576416, test_abs_avg=29.05640411376953
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6644470691680908, max_abs=4.25, mean_rel=0.1739288568496704, max_rel=1220.435546875, norm_rel=0.02317504957318306, ref_abs_avg=28.73639678955078, test_abs_avg=28.732784271240234
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6091181039810181, max_abs=2.25, mean_rel=0.23265109956264496, max_rel=85.66499328613281, norm_rel=0.023673024028539658, ref_abs_avg=25.384845733642578, test_abs_avg=25.444236755371094
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7596718072891235, max_abs=5.5, mean_rel=0.1551494002342224, max_rel=1173.0806884765625, norm_rel=0.02487366646528244, ref_abs_avg=30.609264373779297, test_abs_avg=30.608936309814453
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7375298142433167, max_abs=4.5, mean_rel=0.18068432807922363, max_rel=1852.5384521484375, norm_rel=0.024413926526904106, ref_abs_avg=30.233123779296875, test_abs_avg=30.231246948242188
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5790760517120361, max_abs=2.375, mean_rel=1.3081083297729492, max_rel=625.8031005859375, norm_rel=0.024735966697335243, ref_abs_avg=23.633399963378906, test_abs_avg=23.619138717651367
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.694804847240448, max_abs=4.375, mean_rel=0.1607818454504013, max_rel=767.4454956054688, norm_rel=0.024512115865945816, ref_abs_avg=28.37314224243164, test_abs_avg=28.37258529663086
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6856439709663391, max_abs=4.6875, mean_rel=0.16857005655765533, max_rel=797.7521362304688, norm_rel=0.02439776621758938, ref_abs_avg=28.155527114868164, test_abs_avg=28.161359786987305
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5468027591705322, max_abs=2.0, mean_rel=0.11206234991550446, max_rel=8.363517761230469, norm_rel=0.024906117469072342, ref_abs_avg=21.92255210876465, test_abs_avg=21.963960647583008
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6496149897575378, max_abs=4.6875, mean_rel=0.1598149538040161, max_rel=890.4597778320312, norm_rel=0.02376239001750946, ref_abs_avg=27.31757354736328, test_abs_avg=27.318260192871094
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6322624087333679, max_abs=4.28125, mean_rel=0.15339013934135437, max_rel=879.7002563476562, norm_rel=0.023598523810505867, ref_abs_avg=26.783945083618164, test_abs_avg=26.79498291015625
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.48569154739379883, max_abs=1.8828125, mean_rel=0.09830496460199356, max_rel=7.870990753173828, norm_rel=0.02273755893111229, ref_abs_avg=21.800844192504883, test_abs_avg=21.801610946655273
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6055975556373596, max_abs=3.5625, mean_rel=0.17420794069766998, max_rel=1808.236328125, norm_rel=0.023423273116350174, ref_abs_avg=25.829853057861328, test_abs_avg=25.829084396362305
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5978000164031982, max_abs=4.5, mean_rel=0.1524735391139984, max_rel=1409.8187255859375, norm_rel=0.02342868410050869, ref_abs_avg=25.54395294189453, test_abs_avg=25.54651641845703
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4732246398925781, max_abs=1.953125, mean_rel=0.07740817219018936, max_rel=5.144715785980225, norm_rel=0.023170916363596916, ref_abs_avg=20.350265502929688, test_abs_avg=20.333873748779297
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5705963969230652, max_abs=4.0625, mean_rel=0.1551119089126587, max_rel=837.96923828125, norm_rel=0.02295781672000885, ref_abs_avg=24.85335350036621, test_abs_avg=24.85501480102539
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5603050589561462, max_abs=3.5, mean_rel=0.1514711230993271, max_rel=885.7539672851562, norm_rel=0.02280651219189167, ref_abs_avg=24.62261390686035, test_abs_avg=24.626073837280273
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4598541259765625, max_abs=2.0, mean_rel=0.07671452313661575, max_rel=1.5151867866516113, norm_rel=0.0232719536870718, ref_abs_avg=20.27994728088379, test_abs_avg=20.206796646118164
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5465967655181885, max_abs=4.125, mean_rel=0.15251551568508148, max_rel=1461.5439453125, norm_rel=0.0225811880081892, ref_abs_avg=24.156585693359375, test_abs_avg=24.15819549560547
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5348694324493408, max_abs=3.75, mean_rel=0.16455665230751038, max_rel=1405.029296875, norm_rel=0.022482527419924736, ref_abs_avg=23.84030532836914, test_abs_avg=23.843849182128906
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4283757209777832, max_abs=1.875, mean_rel=0.18321700394153595, max_rel=45.74617004394531, norm_rel=0.02243034541606903, ref_abs_avg=19.44955062866211, test_abs_avg=19.45028305053711
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5136677026748657, max_abs=3.625, mean_rel=0.15093956887722015, max_rel=1122.23291015625, norm_rel=0.02223832719027996, ref_abs_avg=23.102794647216797, test_abs_avg=23.103551864624023
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5007827281951904, max_abs=3.75, mean_rel=0.15118767321109772, max_rel=711.4859008789062, norm_rel=0.021895816549658775, ref_abs_avg=22.851688385009766, test_abs_avg=22.84529685974121
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.40253591537475586, max_abs=1.75, mean_rel=0.0800231397151947, max_rel=4.911749362945557, norm_rel=0.02158893086016178, ref_abs_avg=18.727821350097656, test_abs_avg=18.72574806213379
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.49320489168167114, max_abs=3.5, mean_rel=0.13863474130630493, max_rel=1202.771484375, norm_rel=0.021707039326429367, ref_abs_avg=22.683677673339844, test_abs_avg=22.683584213256836
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4824814796447754, max_abs=3.7890625, mean_rel=0.13559886813163757, max_rel=888.5284423828125, norm_rel=0.021576328203082085, ref_abs_avg=22.34290313720703, test_abs_avg=22.34612274169922
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4587594270706177, max_abs=1.75, mean_rel=0.10947136580944061, max_rel=13.44469928741455, norm_rel=0.02317601628601551, ref_abs_avg=19.703779220581055, test_abs_avg=19.728666305541992
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5492323040962219, max_abs=4.75, mean_rel=0.16018015146255493, max_rel=1205.7794189453125, norm_rel=0.02312362566590309, ref_abs_avg=23.744121551513672, test_abs_avg=23.746477127075195
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5290504693984985, max_abs=3.75, mean_rel=0.14747489988803864, max_rel=946.9120483398438, norm_rel=0.022960664704442024, ref_abs_avg=23.039875030517578, test_abs_avg=23.042896270751953
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4010477066040039, max_abs=1.5625, mean_rel=0.10585350543260574, max_rel=7.180671215057373, norm_rel=0.022095296531915665, ref_abs_avg=18.33229637145996, test_abs_avg=18.32366943359375
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.49433833360671997, max_abs=4.375, mean_rel=0.1529732048511505, max_rel=816.878173828125, norm_rel=0.022780438885092735, ref_abs_avg=21.739906311035156, test_abs_avg=21.741592407226562
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.4925452470779419, max_abs=3.625, mean_rel=0.1370478868484497, max_rel=575.6647338867188, norm_rel=0.022623300552368164, ref_abs_avg=21.755252838134766, test_abs_avg=21.757055282592773
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3828272819519043, max_abs=1.50390625, mean_rel=0.12183865904808044, max_rel=6.166797161102295, norm_rel=0.023443404585123062, ref_abs_avg=16.34885597229004, test_abs_avg=16.35247039794922
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.46271663904190063, max_abs=3.375, mean_rel=0.14276698231697083, max_rel=817.5732421875, norm_rel=0.02188836969435215, ref_abs_avg=21.141489028930664, test_abs_avg=21.141674041748047
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4448193311691284, max_abs=4.5, mean_rel=0.13760431110858917, max_rel=412.03277587890625, norm_rel=0.02147870697081089, ref_abs_avg=20.76384162902832, test_abs_avg=20.760374069213867
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3710932731628418, max_abs=1.5, mean_rel=0.10632368922233582, max_rel=6.184687614440918, norm_rel=0.022678688168525696, ref_abs_avg=16.36861801147461, test_abs_avg=16.414125442504883
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4329274892807007, max_abs=4.25, mean_rel=0.1435135304927826, max_rel=892.7442016601562, norm_rel=0.021474577486515045, ref_abs_avg=20.19241714477539, test_abs_avg=20.19467544555664
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.41816067695617676, max_abs=3.5, mean_rel=0.13149972259998322, max_rel=781.5864868164062, norm_rel=0.021086877211928368, ref_abs_avg=19.90252685546875, test_abs_avg=19.910303115844727
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.35463619232177734, max_abs=1.53125, mean_rel=0.07320936024188995, max_rel=3.9309256076812744, norm_rel=0.021337280049920082, ref_abs_avg=16.6363582611084, test_abs_avg=16.625015258789062
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4087461829185486, max_abs=4.4150390625, mean_rel=0.13073155283927917, max_rel=840.8822631835938, norm_rel=0.02098916843533516, ref_abs_avg=19.56847381591797, test_abs_avg=19.569440841674805
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.40297555923461914, max_abs=3.0, mean_rel=0.13814778625965118, max_rel=499.7254638671875, norm_rel=0.020548703148961067, ref_abs_avg=19.67206573486328, test_abs_avg=19.681095123291016
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.31523799896240234, max_abs=1.1875, mean_rel=0.08909472823143005, max_rel=11.996260643005371, norm_rel=0.02111188881099224, ref_abs_avg=15.14585018157959, test_abs_avg=15.164239883422852
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.37977856397628784, max_abs=3.5, mean_rel=0.13361456990242004, max_rel=674.888671875, norm_rel=0.02043572999536991, ref_abs_avg=18.69383430480957, test_abs_avg=18.694591522216797
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3747660219669342, max_abs=4.0, mean_rel=0.12869027256965637, max_rel=861.9697265625, norm_rel=0.020100411027669907, ref_abs_avg=18.80712890625, test_abs_avg=18.81881332397461
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.3198183476924896, max_abs=1.25, mean_rel=0.17331962287425995, max_rel=25.24638557434082, norm_rel=0.021674232557415962, ref_abs_avg=14.757104873657227, test_abs_avg=14.759010314941406
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3636814057826996, max_abs=6.0, mean_rel=0.12016014754772186, max_rel=753.5852661132812, norm_rel=0.02005423791706562, ref_abs_avg=18.33589744567871, test_abs_avg=18.335289001464844
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.35542744398117065, max_abs=3.0, mean_rel=0.11697481572628021, max_rel=407.8710021972656, norm_rel=0.01990913785994053, ref_abs_avg=18.03278160095215, test_abs_avg=18.031946182250977
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2785447835922241, max_abs=1.375, mean_rel=0.09098543971776962, max_rel=8.235926628112793, norm_rel=0.019974466413259506, ref_abs_avg=14.213163375854492, test_abs_avg=14.197432518005371
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3329204320907593, max_abs=3.5, mean_rel=0.11952322721481323, max_rel=431.2311096191406, norm_rel=0.019070303067564964, ref_abs_avg=17.7431640625, test_abs_avg=17.744426727294922
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.32348161935806274, max_abs=3.5, mean_rel=0.1243552416563034, max_rel=722.1541748046875, norm_rel=0.018875284120440483, ref_abs_avg=17.403287887573242, test_abs_avg=17.40915298461914
identity layers + randn queries
torch_compile_phases_forward fwd+bwd:  94.945 ms
torch_compile_phases_forward bwd-only: 76.589 ms
torch_compile_phases_forward peak allocated: fwd=18.203 GiB, fwd+bwd=18.831 GiB
torch_compile_phases_forward peak reserved:  fwd=27.398 GiB, fwd+bwd=27.398 GiB
paper_forward fwd+bwd:  221.200 ms
paper_forward bwd-only: 174.019 ms
paper_forward peak allocated: fwd=35.128 GiB, fwd+bwd=37.247 GiB
paper_forward peak reserved:  fwd=36.168 GiB, fwd+bwd=38.668 GiB
production_forward fwd+bwd:  66.279 ms
production_forward bwd-only: 56.486 ms
production_forward peak allocated: fwd=7.614 GiB, fwd+bwd=15.618 GiB
production_forward peak reserved:  fwd=27.375 GiB, fwd+bwd=27.375 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.001633772044442594, max_abs=0.046875
production_forward grad[0] vs paper_forward: mean_abs=0.008222769945859909, max_abs=0.4375, mean_rel=0.07142387330532074, max_rel=120.70884704589844, norm_rel=0.019465001299977303, ref_abs_avg=0.456993043422699, test_abs_avg=0.4570170044898987
production_forward grad[1] vs paper_forward: mean_abs=7.133438587188721, max_abs=56.0, mean_rel=0.1635115146636963, max_rel=562.2504272460938, norm_rel=0.020057205110788345, ref_abs_avg=314.8877258300781, test_abs_avg=314.8201599121094
production_forward grad[2] vs paper_forward: mean_abs=1.2286186218261719, max_abs=6.0, mean_rel=0.06804481148719788, max_rel=3.1454150676727295, norm_rel=0.02153630368411541, ref_abs_avg=59.75926208496094, test_abs_avg=59.73244094848633
production_forward grad[3] vs paper_forward: mean_abs=1.5118545293807983, max_abs=11.0, mean_rel=0.16143110394477844, max_rel=1787.73876953125, norm_rel=0.02271183766424656, ref_abs_avg=66.94349670410156, test_abs_avg=66.94827270507812
production_forward grad[4] vs paper_forward: mean_abs=1.483333945274353, max_abs=10.0, mean_rel=0.16418397426605225, max_rel=2158.28466796875, norm_rel=0.02255348674952984, ref_abs_avg=66.17384338378906, test_abs_avg=66.17110443115234
production_forward grad[5] vs paper_forward: mean_abs=1.0991973876953125, max_abs=4.5, mean_rel=0.09959450364112854, max_rel=6.530983924865723, norm_rel=0.02387760765850544, ref_abs_avg=45.804710388183594, test_abs_avg=45.756446838378906
production_forward grad[6] vs paper_forward: mean_abs=1.300674557685852, max_abs=8.0, mean_rel=0.16049723327159882, max_rel=1912.2939453125, norm_rel=0.02245336025953293, ref_abs_avg=58.21352005004883, test_abs_avg=58.21709060668945
production_forward grad[7] vs paper_forward: mean_abs=1.2623660564422607, max_abs=7.0, mean_rel=0.16284707188606262, max_rel=1157.04833984375, norm_rel=0.021995754912495613, ref_abs_avg=57.71700668334961, test_abs_avg=57.72867965698242
production_forward grad[8] vs paper_forward: mean_abs=1.0050716400146484, max_abs=3.5, mean_rel=0.13043542206287384, max_rel=10.023561477661133, norm_rel=0.02299700863659382, ref_abs_avg=41.866615295410156, test_abs_avg=41.843109130859375
production_forward grad[9] vs paper_forward: mean_abs=1.1668992042541504, max_abs=7.5, mean_rel=0.1608615517616272, max_rel=1653.0311279296875, norm_rel=0.022264985367655754, ref_abs_avg=52.712955474853516, test_abs_avg=52.71723937988281
production_forward grad[10] vs paper_forward: mean_abs=1.1304833889007568, max_abs=7.0, mean_rel=0.15902099013328552, max_rel=1065.8330078125, norm_rel=0.022045454010367393, ref_abs_avg=51.598976135253906, test_abs_avg=51.597869873046875
production_forward grad[11] vs paper_forward: mean_abs=0.9066982269287109, max_abs=3.5, mean_rel=0.10127988457679749, max_rel=7.005436897277832, norm_rel=0.021427391096949577, ref_abs_avg=41.914207458496094, test_abs_avg=41.96288299560547
production_forward grad[12] vs paper_forward: mean_abs=1.075666904449463, max_abs=7.0, mean_rel=0.1529828906059265, max_rel=1253.2823486328125, norm_rel=0.021993888542056084, ref_abs_avg=49.134490966796875, test_abs_avg=49.13432312011719
production_forward grad[13] vs paper_forward: mean_abs=1.0556671619415283, max_abs=6.25, mean_rel=0.14252081513404846, max_rel=741.5474243164062, norm_rel=0.021738534793257713, ref_abs_avg=48.86534118652344, test_abs_avg=48.86371612548828
production_forward grad[14] vs paper_forward: mean_abs=0.8467502593994141, max_abs=3.5, mean_rel=0.10784909129142761, max_rel=10.098093032836914, norm_rel=0.023084936663508415, ref_abs_avg=36.836585998535156, test_abs_avg=36.861846923828125
production_forward grad[15] vs paper_forward: mean_abs=1.0085711479187012, max_abs=6.25, mean_rel=0.14465251564979553, max_rel=1490.2796630859375, norm_rel=0.021952439099550247, ref_abs_avg=46.164344787597656, test_abs_avg=46.16783142089844
production_forward grad[16] vs paper_forward: mean_abs=0.9810796976089478, max_abs=7.0, mean_rel=0.1648152470588684, max_rel=1818.02197265625, norm_rel=0.021633297204971313, ref_abs_avg=45.57128143310547, test_abs_avg=45.5804443359375
production_forward grad[17] vs paper_forward: mean_abs=0.7938833236694336, max_abs=3.25, mean_rel=0.14659224450588226, max_rel=26.737674713134766, norm_rel=0.02193399704992771, ref_abs_avg=35.86116409301758, test_abs_avg=35.8238525390625
production_forward grad[18] vs paper_forward: mean_abs=0.9512010216712952, max_abs=6.25, mean_rel=0.15522798895835876, max_rel=1969.501220703125, norm_rel=0.021724926307797432, ref_abs_avg=44.02367401123047, test_abs_avg=44.02880859375
production_forward grad[19] vs paper_forward: mean_abs=0.9237250089645386, max_abs=5.5, mean_rel=0.1457950919866562, max_rel=998.9542236328125, norm_rel=0.021462585777044296, ref_abs_avg=43.24833679199219, test_abs_avg=43.250450134277344
production_forward grad[20] vs paper_forward: mean_abs=0.7532474994659424, max_abs=2.7265625, mean_rel=0.19263504445552826, max_rel=32.052433013916016, norm_rel=0.021577022969722748, ref_abs_avg=33.66946792602539, test_abs_avg=33.61341857910156
production_forward grad[21] vs paper_forward: mean_abs=0.8963236808776855, max_abs=5.5, mean_rel=0.1499832570552826, max_rel=1076.402099609375, norm_rel=0.02167157083749771, ref_abs_avg=41.56105041503906, test_abs_avg=41.561439514160156
production_forward grad[22] vs paper_forward: mean_abs=0.8720498085021973, max_abs=6.0, mean_rel=0.16724412143230438, max_rel=1495.946044921875, norm_rel=0.021191025152802467, ref_abs_avg=41.28948974609375, test_abs_avg=41.28727340698242
production_forward grad[23] vs paper_forward: mean_abs=0.7135648727416992, max_abs=2.9140625, mean_rel=0.1323336511850357, max_rel=10.789508819580078, norm_rel=0.02180693857371807, ref_abs_avg=32.219512939453125, test_abs_avg=32.215003967285156
production_forward grad[24] vs paper_forward: mean_abs=0.8585727214813232, max_abs=5.0, mean_rel=0.14589190483093262, max_rel=1385.8546142578125, norm_rel=0.02163531444966793, ref_abs_avg=39.834716796875, test_abs_avg=39.83549118041992
production_forward grad[25] vs paper_forward: mean_abs=0.8359777331352234, max_abs=5.0, mean_rel=0.13972291350364685, max_rel=883.4498291015625, norm_rel=0.02135406993329525, ref_abs_avg=39.30659484863281, test_abs_avg=39.31063461303711
production_forward grad[26] vs paper_forward: mean_abs=0.8127443790435791, max_abs=3.625, mean_rel=0.1651381254196167, max_rel=31.746578216552734, norm_rel=0.023065196350216866, ref_abs_avg=35.32591247558594, test_abs_avg=35.3295783996582
production_forward grad[27] vs paper_forward: mean_abs=0.9854165315628052, max_abs=7.0, mean_rel=0.16073603928089142, max_rel=1355.724609375, norm_rel=0.023187432438135147, ref_abs_avg=42.684913635253906, test_abs_avg=42.68674850463867
production_forward grad[28] vs paper_forward: mean_abs=0.960771381855011, max_abs=5.75, mean_rel=0.1488294154405594, max_rel=552.03662109375, norm_rel=0.023015892133116722, ref_abs_avg=41.88469314575195, test_abs_avg=41.89161682128906
production_forward grad[29] vs paper_forward: mean_abs=0.7965731620788574, max_abs=4.0, mean_rel=0.18841248750686646, max_rel=39.67640686035156, norm_rel=0.02574343979358673, ref_abs_avg=31.035839080810547, test_abs_avg=31.0889949798584
production_forward grad[30] vs paper_forward: mean_abs=0.9153101444244385, max_abs=5.5, mean_rel=0.15780392289161682, max_rel=1181.92041015625, norm_rel=0.02357437275350094, ref_abs_avg=38.982383728027344, test_abs_avg=38.98711013793945
production_forward grad[31] vs paper_forward: mean_abs=0.9039275050163269, max_abs=6.0, mean_rel=0.15475665032863617, max_rel=682.1713256835938, norm_rel=0.023476475849747658, ref_abs_avg=38.69256591796875, test_abs_avg=38.693477630615234
production_forward grad[32] vs paper_forward: mean_abs=0.7147185802459717, max_abs=3.25, mean_rel=0.17299923300743103, max_rel=40.64278030395508, norm_rel=0.023124905303120613, ref_abs_avg=31.320598602294922, test_abs_avg=31.381851196289062
production_forward grad[33] vs paper_forward: mean_abs=0.8614431619644165, max_abs=5.6953125, mean_rel=0.16187497973442078, max_rel=1415.15234375, norm_rel=0.023488681763410568, ref_abs_avg=36.759422302246094, test_abs_avg=36.760780334472656
production_forward grad[34] vs paper_forward: mean_abs=0.8438254594802856, max_abs=5.5, mean_rel=0.15167900919914246, max_rel=1348.375, norm_rel=0.0232139453291893, ref_abs_avg=36.404022216796875, test_abs_avg=36.4014778137207
production_forward grad[35] vs paper_forward: mean_abs=0.6647700071334839, max_abs=2.875, mean_rel=0.15077471733093262, max_rel=20.868711471557617, norm_rel=0.023832283914089203, ref_abs_avg=28.551488876342773, test_abs_avg=28.542036056518555
production_forward grad[36] vs paper_forward: mean_abs=0.800329327583313, max_abs=5.125, mean_rel=0.15864276885986328, max_rel=958.73388671875, norm_rel=0.023313911631703377, ref_abs_avg=34.43366622924805, test_abs_avg=34.43401336669922
production_forward grad[37] vs paper_forward: mean_abs=0.7881771922111511, max_abs=5.0, mean_rel=0.16466307640075684, max_rel=1175.770263671875, norm_rel=0.022920239716768265, ref_abs_avg=34.46232986450195, test_abs_avg=34.46515655517578
production_forward grad[38] vs paper_forward: mean_abs=0.6436090469360352, max_abs=2.375, mean_rel=0.08515313267707825, max_rel=5.4715142250061035, norm_rel=0.024044595658779144, ref_abs_avg=27.284547805786133, test_abs_avg=27.317092895507812
production_forward grad[39] vs paper_forward: mean_abs=0.7601000070571899, max_abs=5.0, mean_rel=0.155351921916008, max_rel=1670.90380859375, norm_rel=0.023026878014206886, ref_abs_avg=33.09608840942383, test_abs_avg=33.09889221191406
production_forward grad[40] vs paper_forward: mean_abs=0.7488322257995605, max_abs=4.875, mean_rel=0.15099599957466125, max_rel=773.6817626953125, norm_rel=0.022822534665465355, ref_abs_avg=32.85040283203125, test_abs_avg=32.85169219970703
production_forward grad[41] vs paper_forward: mean_abs=0.5977036952972412, max_abs=2.5, mean_rel=0.17247247695922852, max_rel=23.149206161499023, norm_rel=0.024356771260499954, ref_abs_avg=24.798213958740234, test_abs_avg=24.74692153930664
production_forward grad[42] vs paper_forward: mean_abs=0.7265427708625793, max_abs=5.0, mean_rel=0.15504533052444458, max_rel=1190.0985107421875, norm_rel=0.022819725796580315, ref_abs_avg=31.890911102294922, test_abs_avg=31.894393920898438
production_forward grad[43] vs paper_forward: mean_abs=0.7156637907028198, max_abs=4.875, mean_rel=0.1608196496963501, max_rel=863.654296875, norm_rel=0.022809093818068504, ref_abs_avg=31.42582893371582, test_abs_avg=31.43410873413086
production_forward grad[44] vs paper_forward: mean_abs=0.5639996528625488, max_abs=2.625, mean_rel=0.44260522723197937, max_rel=174.3809814453125, norm_rel=0.022270234301686287, ref_abs_avg=25.725791931152344, test_abs_avg=25.711040496826172
production_forward grad[45] vs paper_forward: mean_abs=0.6877346038818359, max_abs=4.5, mean_rel=0.14452704787254333, max_rel=871.2398681640625, norm_rel=0.02247489243745804, ref_abs_avg=30.64044189453125, test_abs_avg=30.640731811523438
production_forward grad[46] vs paper_forward: mean_abs=0.6799774765968323, max_abs=5.0, mean_rel=0.172819584608078, max_rel=1770.919921875, norm_rel=0.022502820938825607, ref_abs_avg=30.235321044921875, test_abs_avg=30.238096237182617
production_forward grad[47] vs paper_forward: mean_abs=0.5171127319335938, max_abs=2.125, mean_rel=0.18124771118164062, max_rel=32.591217041015625, norm_rel=0.021788477897644043, ref_abs_avg=24.944643020629883, test_abs_avg=24.95264434814453
production_forward grad[48] vs paper_forward: mean_abs=0.6582874059677124, max_abs=4.5, mean_rel=0.15438058972358704, max_rel=1036.32958984375, norm_rel=0.022380350157618523, ref_abs_avg=29.51881980895996, test_abs_avg=29.52052116394043
production_forward grad[49] vs paper_forward: mean_abs=0.6496061086654663, max_abs=4.25, mean_rel=0.14014169573783875, max_rel=636.0601806640625, norm_rel=0.022219326347112656, ref_abs_avg=29.2836971282959, test_abs_avg=29.286693572998047
production_forward grad[50] vs paper_forward: mean_abs=0.6268033981323242, max_abs=2.25, mean_rel=0.11076779663562775, max_rel=9.423103332519531, norm_rel=0.02265867032110691, ref_abs_avg=27.64927101135254, test_abs_avg=27.67476463317871
production_forward grad[51] vs paper_forward: mean_abs=0.7434896230697632, max_abs=4.828125, mean_rel=0.1667633056640625, max_rel=1411.4927978515625, norm_rel=0.023714348673820496, ref_abs_avg=31.420299530029297, test_abs_avg=31.4217472076416
production_forward grad[52] vs paper_forward: mean_abs=0.7382065057754517, max_abs=5.0, mean_rel=0.15094265341758728, max_rel=443.1817932128906, norm_rel=0.024036336690187454, ref_abs_avg=30.76669692993164, test_abs_avg=30.771888732910156
production_forward grad[53] vs paper_forward: mean_abs=0.5494399070739746, max_abs=2.625, mean_rel=0.08718769252300262, max_rel=2.7291581630706787, norm_rel=0.023413021117448807, ref_abs_avg=23.56911849975586, test_abs_avg=23.54537582397461
production_forward grad[54] vs paper_forward: mean_abs=0.6742558479309082, max_abs=5.0, mean_rel=0.16693058609962463, max_rel=934.6318359375, norm_rel=0.02361181192100048, ref_abs_avg=28.576711654663086, test_abs_avg=28.581111907958984
production_forward grad[55] vs paper_forward: mean_abs=0.6671454906463623, max_abs=4.5, mean_rel=0.16054578125476837, max_rel=784.5194091796875, norm_rel=0.023264121264219284, ref_abs_avg=28.705341339111328, test_abs_avg=28.70557403564453
production_forward grad[56] vs paper_forward: mean_abs=0.5353915691375732, max_abs=2.0, mean_rel=0.09346319735050201, max_rel=6.284079551696777, norm_rel=0.022950518876314163, ref_abs_avg=23.17650032043457, test_abs_avg=23.207786560058594
production_forward grad[57] vs paper_forward: mean_abs=0.6330678462982178, max_abs=4.5, mean_rel=0.15308497846126556, max_rel=1414.5902099609375, norm_rel=0.02290353551506996, ref_abs_avg=27.60393524169922, test_abs_avg=27.606447219848633
production_forward grad[58] vs paper_forward: mean_abs=0.6213299036026001, max_abs=5.0, mean_rel=0.15110404789447784, max_rel=861.9208984375, norm_rel=0.023102760314941406, ref_abs_avg=26.958324432373047, test_abs_avg=26.969139099121094
production_forward grad[59] vs paper_forward: mean_abs=0.4789128303527832, max_abs=1.875, mean_rel=0.10012682527303696, max_rel=9.084064483642578, norm_rel=0.021473903208971024, ref_abs_avg=22.451208114624023, test_abs_avg=22.44849395751953
production_forward grad[60] vs paper_forward: mean_abs=0.5947446823120117, max_abs=4.6015625, mean_rel=0.14784464240074158, max_rel=809.255859375, norm_rel=0.02271472103893757, ref_abs_avg=26.182294845581055, test_abs_avg=26.18319320678711
production_forward grad[61] vs paper_forward: mean_abs=0.5848313570022583, max_abs=4.15625, mean_rel=0.1419922411441803, max_rel=857.1818237304688, norm_rel=0.022563299164175987, ref_abs_avg=25.94542121887207, test_abs_avg=25.94832992553711
production_forward grad[62] vs paper_forward: mean_abs=0.43177393078804016, max_abs=1.9375, mean_rel=0.13222357630729675, max_rel=25.01220703125, norm_rel=0.02114569954574108, ref_abs_avg=20.596607208251953, test_abs_avg=20.595626831054688
production_forward grad[63] vs paper_forward: mean_abs=0.5620721578598022, max_abs=3.75, mean_rel=0.14966082572937012, max_rel=1162.0479736328125, norm_rel=0.022193171083927155, ref_abs_avg=25.30797576904297, test_abs_avg=25.311521530151367
production_forward grad[64] vs paper_forward: mean_abs=0.5478853583335876, max_abs=3.625, mean_rel=0.14317180216312408, max_rel=744.4080810546875, norm_rel=0.021830955520272255, ref_abs_avg=25.07516098022461, test_abs_avg=25.083797454833984
production_forward grad[65] vs paper_forward: mean_abs=0.4588918685913086, max_abs=1.75, mean_rel=0.1451718509197235, max_rel=36.67066192626953, norm_rel=0.02237858809530735, ref_abs_avg=21.044422149658203, test_abs_avg=21.010879516601562
production_forward grad[66] vs paper_forward: mean_abs=0.533185601234436, max_abs=4.00390625, mean_rel=0.14463451504707336, max_rel=879.9392700195312, norm_rel=0.021940162405371666, ref_abs_avg=24.267200469970703, test_abs_avg=24.268924713134766
production_forward grad[67] vs paper_forward: mean_abs=0.5247484445571899, max_abs=3.78125, mean_rel=0.1380244791507721, max_rel=1394.847900390625, norm_rel=0.022180043160915375, ref_abs_avg=23.700546264648438, test_abs_avg=23.696441650390625
production_forward grad[68] vs paper_forward: mean_abs=0.4290642738342285, max_abs=1.984375, mean_rel=0.20264121890068054, max_rel=27.6298885345459, norm_rel=0.02357061207294464, ref_abs_avg=18.261444091796875, test_abs_avg=18.281452178955078
production_forward grad[69] vs paper_forward: mean_abs=0.5085040926933289, max_abs=4.0, mean_rel=0.1443568468093872, max_rel=946.9246215820312, norm_rel=0.021502742543816566, ref_abs_avg=23.626577377319336, test_abs_avg=23.62744140625
production_forward grad[70] vs paper_forward: mean_abs=0.49416089057922363, max_abs=4.5, mean_rel=0.1245126724243164, max_rel=393.4530334472656, norm_rel=0.021355703473091125, ref_abs_avg=23.11270523071289, test_abs_avg=23.106033325195312
production_forward grad[71] vs paper_forward: mean_abs=0.3767229914665222, max_abs=1.5, mean_rel=0.10536643862724304, max_rel=15.02038288116455, norm_rel=0.02080671861767769, ref_abs_avg=18.145244598388672, test_abs_avg=18.155685424804688
production_forward grad[72] vs paper_forward: mean_abs=0.47584307193756104, max_abs=4.0, mean_rel=0.13946303725242615, max_rel=1108.808349609375, norm_rel=0.02095106430351734, ref_abs_avg=22.66850471496582, test_abs_avg=22.668598175048828
production_forward grad[73] vs paper_forward: mean_abs=0.4656485617160797, max_abs=4.25, mean_rel=0.13101282715797424, max_rel=531.969970703125, norm_rel=0.020761607214808464, ref_abs_avg=22.434967041015625, test_abs_avg=22.425003051757812
production_forward grad[74] vs paper_forward: mean_abs=0.451066255569458, max_abs=1.6875, mean_rel=0.5344111919403076, max_rel=205.0818634033203, norm_rel=0.024140115827322006, ref_abs_avg=18.353347778320312, test_abs_avg=18.390968322753906
production_forward grad[75] vs paper_forward: mean_abs=0.5274659991264343, max_abs=3.6875, mean_rel=0.15377306938171387, max_rel=991.9544067382812, norm_rel=0.0230641420930624, ref_abs_avg=22.882007598876953, test_abs_avg=22.885005950927734
production_forward grad[76] vs paper_forward: mean_abs=0.5064041614532471, max_abs=4.0, mean_rel=0.1431490033864975, max_rel=454.27606201171875, norm_rel=0.022449353709816933, ref_abs_avg=22.564922332763672, test_abs_avg=22.56316375732422
production_forward grad[77] vs paper_forward: mean_abs=0.3878515660762787, max_abs=1.625, mean_rel=0.16340813040733337, max_rel=42.74515151977539, norm_rel=0.02159438654780388, ref_abs_avg=18.42902374267578, test_abs_avg=18.416732788085938
production_forward grad[78] vs paper_forward: mean_abs=0.47598502039909363, max_abs=4.375, mean_rel=0.15784095227718353, max_rel=1095.6190185546875, norm_rel=0.022357016801834106, ref_abs_avg=21.334182739257812, test_abs_avg=21.33578872680664
production_forward grad[79] vs paper_forward: mean_abs=0.4716716408729553, max_abs=3.5, mean_rel=0.14962728321552277, max_rel=609.8856201171875, norm_rel=0.02276609092950821, ref_abs_avg=20.734169006347656, test_abs_avg=20.73128890991211
production_forward grad[80] vs paper_forward: mean_abs=0.37123021483421326, max_abs=1.5, mean_rel=0.15279150009155273, max_rel=22.788965225219727, norm_rel=0.02249573916196823, ref_abs_avg=16.850479125976562, test_abs_avg=16.84347915649414
production_forward grad[81] vs paper_forward: mean_abs=0.44006526470184326, max_abs=3.985107421875, mean_rel=0.14282023906707764, max_rel=1456.9908447265625, norm_rel=0.021593447774648666, ref_abs_avg=20.39982032775879, test_abs_avg=20.401309967041016
production_forward grad[82] vs paper_forward: mean_abs=0.4301823079586029, max_abs=4.0, mean_rel=0.13341167569160461, max_rel=519.8732299804688, norm_rel=0.021289056167006493, ref_abs_avg=20.280128479003906, test_abs_avg=20.285186767578125
production_forward grad[83] vs paper_forward: mean_abs=0.3479856252670288, max_abs=1.375, mean_rel=0.12238921225070953, max_rel=9.052258491516113, norm_rel=0.020958835259079933, ref_abs_avg=17.092798233032227, test_abs_avg=17.056758880615234
production_forward grad[84] vs paper_forward: mean_abs=0.41844475269317627, max_abs=4.0, mean_rel=0.13337576389312744, max_rel=1008.0407104492188, norm_rel=0.021283892914652824, ref_abs_avg=19.704025268554688, test_abs_avg=19.703998565673828
production_forward grad[85] vs paper_forward: mean_abs=0.41410160064697266, max_abs=3.75, mean_rel=0.1326846033334732, max_rel=517.8328857421875, norm_rel=0.021788086742162704, ref_abs_avg=19.15911102294922, test_abs_avg=19.163265228271484
production_forward grad[86] vs paper_forward: mean_abs=0.3115577697753906, max_abs=1.4375, mean_rel=0.06976693868637085, max_rel=3.1867268085479736, norm_rel=0.020220991224050522, ref_abs_avg=15.353008270263672, test_abs_avg=15.36617374420166
production_forward grad[87] vs paper_forward: mean_abs=0.3855869174003601, max_abs=3.375, mean_rel=0.13265551626682281, max_rel=906.5843505859375, norm_rel=0.020429013296961784, ref_abs_avg=18.96088981628418, test_abs_avg=18.96114730834961
production_forward grad[88] vs paper_forward: mean_abs=0.3701796531677246, max_abs=3.0, mean_rel=0.1313062310218811, max_rel=639.23681640625, norm_rel=0.01983460783958435, ref_abs_avg=18.66637420654297, test_abs_avg=18.659626007080078
production_forward grad[89] vs paper_forward: mean_abs=0.31139636039733887, max_abs=1.46875, mean_rel=0.11022527515888214, max_rel=7.047436237335205, norm_rel=0.021667860448360443, ref_abs_avg=14.16032600402832, test_abs_avg=14.13028335571289
production_forward grad[90] vs paper_forward: mean_abs=0.3611295819282532, max_abs=3.125, mean_rel=0.125728577375412, max_rel=635.7034301757812, norm_rel=0.0199397262185812, ref_abs_avg=18.19207000732422, test_abs_avg=18.191360473632812
production_forward grad[91] vs paper_forward: mean_abs=0.3645060658454895, max_abs=3.625, mean_rel=0.12615081667900085, max_rel=495.1932067871094, norm_rel=0.020559990778565407, ref_abs_avg=17.953399658203125, test_abs_avg=17.941936492919922
production_forward grad[92] vs paper_forward: mean_abs=0.27657580375671387, max_abs=1.25, mean_rel=0.1051197499036789, max_rel=15.984529495239258, norm_rel=0.01879224181175232, ref_abs_avg=14.738068580627441, test_abs_avg=14.741830825805664
production_forward grad[93] vs paper_forward: mean_abs=0.34897661209106445, max_abs=3.75, mean_rel=0.1253182291984558, max_rel=1066.046630859375, norm_rel=0.019801221787929535, ref_abs_avg=17.81549072265625, test_abs_avg=17.81545639038086
production_forward grad[94] vs paper_forward: mean_abs=0.3381864130496979, max_abs=4.0, mean_rel=0.12951579689979553, max_rel=1247.3631591796875, norm_rel=0.01988585852086544, ref_abs_avg=17.28150177001953, test_abs_avg=17.290739059448242
production_forward grad[95] vs paper_forward: mean_abs=0.27795934677124023, max_abs=1.25, mean_rel=0.11068671941757202, max_rel=15.757584571838379, norm_rel=0.019552795216441154, ref_abs_avg=14.577920913696289, test_abs_avg=14.557585716247559
production_forward grad[96] vs paper_forward: mean_abs=0.33358120918273926, max_abs=4.0, mean_rel=0.12060453742742538, max_rel=500.4085388183594, norm_rel=0.01930890418589115, ref_abs_avg=17.56993293762207, test_abs_avg=17.57073211669922
production_forward grad[97] vs paper_forward: mean_abs=0.3206317126750946, max_abs=3.5, mean_rel=0.1169048398733139, max_rel=522.024169921875, norm_rel=0.01888602040708065, ref_abs_avg=17.226057052612305, test_abs_avg=17.227840423583984
torch_compile_phases_forward vs paper_forward output: mean_abs=0.001637764973565936, max_abs=0.046875
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008575942367315292, max_abs=0.5, mean_rel=0.07411276549100876, max_rel=97.37126159667969, norm_rel=0.02020072564482689, ref_abs_avg=0.456993043422699, test_abs_avg=0.456997275352478
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.288017749786377, max_abs=64.0, mean_rel=0.22830979526042938, max_rel=1100.755859375, norm_rel=0.020465968176722527, ref_abs_avg=314.8877258300781, test_abs_avg=314.84527587890625
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.26171875, max_abs=5.0, mean_rel=0.059005290269851685, max_rel=1.3890215158462524, norm_rel=0.021887794137001038, ref_abs_avg=59.75926208496094, test_abs_avg=59.75569152832031
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.572537899017334, max_abs=10.5, mean_rel=0.17095085978507996, max_rel=2467.406982421875, norm_rel=0.023605389520525932, ref_abs_avg=66.94349670410156, test_abs_avg=66.94417572021484
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5417290925979614, max_abs=10.0, mean_rel=0.17850862443447113, max_rel=1657.0567626953125, norm_rel=0.02346160262823105, ref_abs_avg=66.17384338378906, test_abs_avg=66.168701171875
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1627197265625, max_abs=5.5, mean_rel=0.10152488201856613, max_rel=6.733495235443115, norm_rel=0.02545752190053463, ref_abs_avg=45.804710388183594, test_abs_avg=45.78546905517578
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.3472058773040771, max_abs=8.046875, mean_rel=0.16338659822940826, max_rel=1754.44970703125, norm_rel=0.02325701154768467, ref_abs_avg=58.21352005004883, test_abs_avg=58.215213775634766
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3092304468154907, max_abs=8.0, mean_rel=0.16934704780578613, max_rel=1008.2957153320312, norm_rel=0.02280297689139843, ref_abs_avg=57.71700668334961, test_abs_avg=57.72109603881836
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0018160343170166, max_abs=3.5, mean_rel=0.33057165145874023, max_rel=98.4673080444336, norm_rel=0.023640980944037437, ref_abs_avg=41.866615295410156, test_abs_avg=41.850589752197266
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2066690921783447, max_abs=8.0, mean_rel=0.1741824895143509, max_rel=2997.619873046875, norm_rel=0.023013029247522354, ref_abs_avg=52.712955474853516, test_abs_avg=52.715126037597656
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.1721985340118408, max_abs=7.5, mean_rel=0.16015200316905975, max_rel=1099.3572998046875, norm_rel=0.022834062576293945, ref_abs_avg=51.598976135253906, test_abs_avg=51.596168518066406
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9274368286132812, max_abs=4.0, mean_rel=0.12127361446619034, max_rel=13.644918441772461, norm_rel=0.022311486303806305, ref_abs_avg=41.914207458496094, test_abs_avg=41.920753479003906
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1113219261169434, max_abs=7.0, mean_rel=0.1650705635547638, max_rel=1799.8148193359375, norm_rel=0.022723326459527016, ref_abs_avg=49.134490966796875, test_abs_avg=49.13260269165039
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.0914788246154785, max_abs=7.125, mean_rel=0.14352604746818542, max_rel=762.9241333007812, norm_rel=0.022475555539131165, ref_abs_avg=48.86534118652344, test_abs_avg=48.85874557495117
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8338489532470703, max_abs=3.875, mean_rel=0.10556014627218246, max_rel=9.753301620483398, norm_rel=0.02310248650610447, ref_abs_avg=36.836585998535156, test_abs_avg=36.85820770263672
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0412707328796387, max_abs=6.25, mean_rel=0.1515396535396576, max_rel=1320.455810546875, norm_rel=0.022646278142929077, ref_abs_avg=46.164344787597656, test_abs_avg=46.16609191894531
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0140618085861206, max_abs=7.0, mean_rel=0.1802762746810913, max_rel=1954.001220703125, norm_rel=0.022361699491739273, ref_abs_avg=45.57128143310547, test_abs_avg=45.58051300048828
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.7801933288574219, max_abs=3.25, mean_rel=0.15086066722869873, max_rel=22.47224998474121, norm_rel=0.021985068917274475, ref_abs_avg=35.86116409301758, test_abs_avg=35.84547805786133
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.9796977043151855, max_abs=6.25, mean_rel=0.16072283685207367, max_rel=2254.535888671875, norm_rel=0.022361714392900467, ref_abs_avg=44.02367401123047, test_abs_avg=44.026206970214844
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9516762495040894, max_abs=6.0, mean_rel=0.14718399941921234, max_rel=792.7593994140625, norm_rel=0.02210851199924946, ref_abs_avg=43.24833679199219, test_abs_avg=43.24934005737305
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.792365312576294, max_abs=3.375, mean_rel=0.12622924149036407, max_rel=17.79470443725586, norm_rel=0.023492567241191864, ref_abs_avg=33.66946792602539, test_abs_avg=33.6118278503418
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9219762086868286, max_abs=5.5, mean_rel=0.1547042280435562, max_rel=1051.8199462890625, norm_rel=0.022277778014540672, ref_abs_avg=41.56105041503906, test_abs_avg=41.560272216796875
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.8971378803253174, max_abs=5.7451171875, mean_rel=0.1703798919916153, max_rel=1276.1187744140625, norm_rel=0.021787593141198158, ref_abs_avg=41.28948974609375, test_abs_avg=41.28633117675781
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7317295074462891, max_abs=2.75, mean_rel=0.11741461604833603, max_rel=4.899865627288818, norm_rel=0.022467035800218582, ref_abs_avg=32.219512939453125, test_abs_avg=32.205291748046875
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.8806504011154175, max_abs=5.46875, mean_rel=0.14806848764419556, max_rel=1396.1942138671875, norm_rel=0.022186417132616043, ref_abs_avg=39.834716796875, test_abs_avg=39.835487365722656
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8579113483428955, max_abs=5.375, mean_rel=0.14151215553283691, max_rel=697.1195678710938, norm_rel=0.021894969046115875, ref_abs_avg=39.30659484863281, test_abs_avg=39.30967330932617
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8453421592712402, max_abs=3.5, mean_rel=0.12046311050653458, max_rel=15.818480491638184, norm_rel=0.02354404516518116, ref_abs_avg=35.32591247558594, test_abs_avg=35.364463806152344
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0136733055114746, max_abs=7.0, mean_rel=0.16901904344558716, max_rel=1456.8607177734375, norm_rel=0.0238454882055521, ref_abs_avg=42.684913635253906, test_abs_avg=42.68458557128906
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.9884141683578491, max_abs=6.0, mean_rel=0.15564097464084625, max_rel=601.4008178710938, norm_rel=0.023672383278608322, ref_abs_avg=41.88469314575195, test_abs_avg=41.885990142822266
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8272819519042969, max_abs=3.5, mean_rel=0.21245482563972473, max_rel=29.582935333251953, norm_rel=0.02671375684440136, ref_abs_avg=31.035839080810547, test_abs_avg=31.085975646972656
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9393855929374695, max_abs=5.875, mean_rel=0.16321268677711487, max_rel=1285.24267578125, norm_rel=0.024168066680431366, ref_abs_avg=38.982383728027344, test_abs_avg=38.985023498535156
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9264876842498779, max_abs=6.0, mean_rel=0.1630629450082779, max_rel=1192.3101806640625, norm_rel=0.024058721959590912, ref_abs_avg=38.69256591796875, test_abs_avg=38.69196319580078
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.6970393657684326, max_abs=3.25, mean_rel=0.12668943405151367, max_rel=23.469406127929688, norm_rel=0.022671308368444443, ref_abs_avg=31.320598602294922, test_abs_avg=31.375173568725586
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.8813201189041138, max_abs=6.2578125, mean_rel=0.16874709725379944, max_rel=1921.3759765625, norm_rel=0.02404005639255047, ref_abs_avg=36.759422302246094, test_abs_avg=36.760162353515625
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8637678623199463, max_abs=5.25, mean_rel=0.15676730871200562, max_rel=1193.88525390625, norm_rel=0.023773513734340668, ref_abs_avg=36.404022216796875, test_abs_avg=36.40174865722656
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6690549850463867, max_abs=2.625, mean_rel=0.15996354818344116, max_rel=22.079084396362305, norm_rel=0.023890173062682152, ref_abs_avg=28.551488876342773, test_abs_avg=28.51071548461914
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8185821771621704, max_abs=5.03125, mean_rel=0.16405361890792847, max_rel=1319.647216796875, norm_rel=0.023831941187381744, ref_abs_avg=34.43366622924805, test_abs_avg=34.43233108520508
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8068695068359375, max_abs=4.625, mean_rel=0.16564509272575378, max_rel=914.5486450195312, norm_rel=0.023452913388609886, ref_abs_avg=34.46232986450195, test_abs_avg=34.462711334228516
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6272034645080566, max_abs=2.125, mean_rel=0.08076809346675873, max_rel=4.785099506378174, norm_rel=0.023372389376163483, ref_abs_avg=27.284547805786133, test_abs_avg=27.323524475097656
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.7758051753044128, max_abs=5.0, mean_rel=0.16240017116069794, max_rel=1000.0106811523438, norm_rel=0.023504601791501045, ref_abs_avg=33.09608840942383, test_abs_avg=33.09839630126953
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7635319232940674, max_abs=5.0, mean_rel=0.14979596436023712, max_rel=543.9823608398438, norm_rel=0.023264208808541298, ref_abs_avg=32.85040283203125, test_abs_avg=32.85186767578125
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6041104793548584, max_abs=2.53125, mean_rel=0.19267788529396057, max_rel=29.09578514099121, norm_rel=0.024516960605978966, ref_abs_avg=24.798213958740234, test_abs_avg=24.760852813720703
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7409446239471436, max_abs=5.0, mean_rel=0.1606500744819641, max_rel=1907.0748291015625, norm_rel=0.023256437852978706, ref_abs_avg=31.890911102294922, test_abs_avg=31.893173217773438
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7271105051040649, max_abs=4.625, mean_rel=0.16436320543289185, max_rel=1072.9383544921875, norm_rel=0.023181235417723656, ref_abs_avg=31.42582893371582, test_abs_avg=31.433330535888672
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5673424005508423, max_abs=2.0, mean_rel=0.35658738017082214, max_rel=119.9948501586914, norm_rel=0.021962756291031837, ref_abs_avg=25.725791931152344, test_abs_avg=25.712379455566406
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.699942946434021, max_abs=4.75, mean_rel=0.14926087856292725, max_rel=819.2000732421875, norm_rel=0.022877348586916924, ref_abs_avg=30.64044189453125, test_abs_avg=30.640071868896484
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.6936473250389099, max_abs=4.5, mean_rel=0.17681202292442322, max_rel=1928.62451171875, norm_rel=0.022927800193428993, ref_abs_avg=30.235321044921875, test_abs_avg=30.23928451538086
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5641403198242188, max_abs=2.25, mean_rel=0.20480483770370483, max_rel=44.72245407104492, norm_rel=0.02305544912815094, ref_abs_avg=24.944643020629883, test_abs_avg=24.97802734375
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6686649322509766, max_abs=4.5, mean_rel=0.15921995043754578, max_rel=1214.3658447265625, norm_rel=0.022723140195012093, ref_abs_avg=29.51881980895996, test_abs_avg=29.519302368164062
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6591405868530273, max_abs=4.5, mean_rel=0.14700847864151, max_rel=674.1168823242188, norm_rel=0.022557437419891357, ref_abs_avg=29.2836971282959, test_abs_avg=29.286338806152344
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6466622352600098, max_abs=2.75, mean_rel=0.12472587823867798, max_rel=12.274632453918457, norm_rel=0.023021988570690155, ref_abs_avg=27.64927101135254, test_abs_avg=27.633995056152344
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7569087743759155, max_abs=5.0, mean_rel=0.16730746626853943, max_rel=1028.194091796875, norm_rel=0.02412976510822773, ref_abs_avg=31.420299530029297, test_abs_avg=31.420190811157227
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7515445947647095, max_abs=5.0, mean_rel=0.15466970205307007, max_rel=520.7101440429688, norm_rel=0.024464407935738564, ref_abs_avg=30.76669692993164, test_abs_avg=30.76923370361328
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5605068206787109, max_abs=2.75, mean_rel=0.08836276084184647, max_rel=5.8755998611450195, norm_rel=0.024245265871286392, ref_abs_avg=23.56911849975586, test_abs_avg=23.56655502319336
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.6860784292221069, max_abs=4.75, mean_rel=0.16733551025390625, max_rel=859.9068603515625, norm_rel=0.024030858650803566, ref_abs_avg=28.576711654663086, test_abs_avg=28.580047607421875
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6782814264297485, max_abs=4.3125, mean_rel=0.16296550631523132, max_rel=1424.4248046875, norm_rel=0.023642949759960175, ref_abs_avg=28.705341339111328, test_abs_avg=28.705089569091797
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5297021865844727, max_abs=2.0, mean_rel=0.10110679268836975, max_rel=5.924561500549316, norm_rel=0.02304486557841301, ref_abs_avg=23.17650032043457, test_abs_avg=23.194473266601562
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6425589323043823, max_abs=4.5, mean_rel=0.15357279777526855, max_rel=1034.302490234375, norm_rel=0.0232465211302042, ref_abs_avg=27.60393524169922, test_abs_avg=27.604717254638672
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.62950599193573, max_abs=4.5, mean_rel=0.15416742861270905, max_rel=728.1859130859375, norm_rel=0.023428156971931458, ref_abs_avg=26.958324432373047, test_abs_avg=26.969932556152344
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.48375260829925537, max_abs=1.984375, mean_rel=0.11202473938465118, max_rel=15.958491325378418, norm_rel=0.022098146378993988, ref_abs_avg=22.451208114624023, test_abs_avg=22.449542999267578
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6034215688705444, max_abs=4.25, mean_rel=0.15166005492210388, max_rel=1158.485595703125, norm_rel=0.02303430065512657, ref_abs_avg=26.182294845581055, test_abs_avg=26.182723999023438
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.594562292098999, max_abs=4.375, mean_rel=0.1398690938949585, max_rel=485.2445373535156, norm_rel=0.02294023707509041, ref_abs_avg=25.94542121887207, test_abs_avg=25.947446823120117
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4276390075683594, max_abs=1.625, mean_rel=0.12704125046730042, max_rel=22.693294525146484, norm_rel=0.021023692563176155, ref_abs_avg=20.596607208251953, test_abs_avg=20.602821350097656
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5694558024406433, max_abs=4.0, mean_rel=0.14969736337661743, max_rel=1335.5543212890625, norm_rel=0.02246650867164135, ref_abs_avg=25.30797576904297, test_abs_avg=25.309860229492188
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5543574690818787, max_abs=3.75, mean_rel=0.14421527087688446, max_rel=817.3522338867188, norm_rel=0.022090177983045578, ref_abs_avg=25.07516098022461, test_abs_avg=25.083297729492188
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4633145332336426, max_abs=2.0, mean_rel=0.1459149420261383, max_rel=34.2725830078125, norm_rel=0.022731253877282143, ref_abs_avg=21.044422149658203, test_abs_avg=21.001087188720703
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5394216775894165, max_abs=4.0, mean_rel=0.14690840244293213, max_rel=658.4970703125, norm_rel=0.02219918742775917, ref_abs_avg=24.267200469970703, test_abs_avg=24.269031524658203
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5321642160415649, max_abs=4.0, mean_rel=0.1363636553287506, max_rel=1207.4952392578125, norm_rel=0.02252422459423542, ref_abs_avg=23.700546264648438, test_abs_avg=23.698291778564453
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4281422197818756, max_abs=1.8125, mean_rel=0.15474221110343933, max_rel=20.236469268798828, norm_rel=0.023618467152118683, ref_abs_avg=18.261444091796875, test_abs_avg=18.263751983642578
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.51432204246521, max_abs=3.75, mean_rel=0.14652542769908905, max_rel=753.86279296875, norm_rel=0.021725352853536606, ref_abs_avg=23.626577377319336, test_abs_avg=23.627286911010742
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5003998279571533, max_abs=4.0, mean_rel=0.12395575642585754, max_rel=364.61529541015625, norm_rel=0.021637530997395515, ref_abs_avg=23.11270523071289, test_abs_avg=23.104961395263672
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.38049477338790894, max_abs=1.625, mean_rel=0.1042361855506897, max_rel=9.606197357177734, norm_rel=0.020773086696863174, ref_abs_avg=18.145244598388672, test_abs_avg=18.131839752197266
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.480835497379303, max_abs=4.0, mean_rel=0.14006778597831726, max_rel=1010.3328857421875, norm_rel=0.021156784147024155, ref_abs_avg=22.66850471496582, test_abs_avg=22.668405532836914
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4700825810432434, max_abs=3.8125, mean_rel=0.13302144408226013, max_rel=524.0349731445312, norm_rel=0.02095402218401432, ref_abs_avg=22.434967041015625, test_abs_avg=22.42648696899414
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.46324968338012695, max_abs=1.6875, mean_rel=0.8140668272972107, max_rel=352.1859436035156, norm_rel=0.024713784456253052, ref_abs_avg=18.353347778320312, test_abs_avg=18.39710235595703
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5353555083274841, max_abs=3.875, mean_rel=0.1508753001689911, max_rel=847.5274658203125, norm_rel=0.023400014266371727, ref_abs_avg=22.882007598876953, test_abs_avg=22.884063720703125
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5144885778427124, max_abs=3.875, mean_rel=0.1454324722290039, max_rel=505.08795166015625, norm_rel=0.022760290652513504, ref_abs_avg=22.564922332763672, test_abs_avg=22.565839767456055
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.40403053164482117, max_abs=1.6875, mean_rel=0.09256869554519653, max_rel=6.661595821380615, norm_rel=0.021826930344104767, ref_abs_avg=18.42902374267578, test_abs_avg=18.426410675048828
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.48258113861083984, max_abs=4.0, mean_rel=0.156553715467453, max_rel=1689.92919921875, norm_rel=0.022666316479444504, ref_abs_avg=21.334182739257812, test_abs_avg=21.334575653076172
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.47885775566101074, max_abs=4.0, mean_rel=0.15015731751918793, max_rel=612.5272216796875, norm_rel=0.023109041154384613, ref_abs_avg=20.734169006347656, test_abs_avg=20.73349380493164
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.37226057052612305, max_abs=1.5, mean_rel=0.1462664008140564, max_rel=22.19672393798828, norm_rel=0.02242991141974926, ref_abs_avg=16.850479125976562, test_abs_avg=16.84908676147461
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.4458387494087219, max_abs=4.0, mean_rel=0.1494419127702713, max_rel=1870.1878662109375, norm_rel=0.021859729662537575, ref_abs_avg=20.39982032775879, test_abs_avg=20.400394439697266
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.43611660599708557, max_abs=4.5, mean_rel=0.13554413616657257, max_rel=702.926513671875, norm_rel=0.021605202928185463, ref_abs_avg=20.280128479003906, test_abs_avg=20.28789520263672
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3586890697479248, max_abs=1.375, mean_rel=0.13562050461769104, max_rel=11.667316436767578, norm_rel=0.02139747515320778, ref_abs_avg=17.092798233032227, test_abs_avg=17.058177947998047
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4226164221763611, max_abs=4.5, mean_rel=0.13371121883392334, max_rel=908.004638671875, norm_rel=0.02147578075528145, ref_abs_avg=19.704025268554688, test_abs_avg=19.703540802001953
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.41320091485977173, max_abs=3.578125, mean_rel=0.1376672387123108, max_rel=869.576171875, norm_rel=0.021690767258405685, ref_abs_avg=19.15911102294922, test_abs_avg=19.16150665283203
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3027503490447998, max_abs=1.5, mean_rel=0.06943562626838684, max_rel=4.1094255447387695, norm_rel=0.019889507442712784, ref_abs_avg=15.353008270263672, test_abs_avg=15.349002838134766
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.3891471028327942, max_abs=3.5, mean_rel=0.1331654191017151, max_rel=858.2886352539062, norm_rel=0.020612243562936783, ref_abs_avg=18.96088981628418, test_abs_avg=18.960485458374023
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.37208038568496704, max_abs=4.0, mean_rel=0.13292883336544037, max_rel=875.9013061523438, norm_rel=0.019910788163542747, ref_abs_avg=18.66637420654297, test_abs_avg=18.65980339050293
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3204019069671631, max_abs=1.28125, mean_rel=0.12247949838638306, max_rel=6.721522331237793, norm_rel=0.021974291652441025, ref_abs_avg=14.16032600402832, test_abs_avg=14.139232635498047
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.3639909625053406, max_abs=3.25, mean_rel=0.1251617968082428, max_rel=705.9620361328125, norm_rel=0.02007937617599964, ref_abs_avg=18.19207000732422, test_abs_avg=18.1915283203125
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.36414778232574463, max_abs=3.5, mean_rel=0.12824100255966187, max_rel=584.713623046875, norm_rel=0.020531365647912025, ref_abs_avg=17.953399658203125, test_abs_avg=17.943708419799805
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.27501940727233887, max_abs=1.2880859375, mean_rel=0.09177964180707932, max_rel=9.032761573791504, norm_rel=0.019089996814727783, ref_abs_avg=14.738068580627441, test_abs_avg=14.749135971069336
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3506704568862915, max_abs=3.75, mean_rel=0.12500888109207153, max_rel=862.154541015625, norm_rel=0.019882870838046074, ref_abs_avg=17.81549072265625, test_abs_avg=17.814346313476562
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.33737462759017944, max_abs=3.8125, mean_rel=0.13052359223365784, max_rel=1382.6634521484375, norm_rel=0.01987132616341114, ref_abs_avg=17.28150177001953, test_abs_avg=17.287057876586914
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.28413450717926025, max_abs=1.125, mean_rel=0.09393680095672607, max_rel=13.254684448242188, norm_rel=0.019583113491535187, ref_abs_avg=14.577920913696289, test_abs_avg=14.55963134765625
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3349457383155823, max_abs=4.0, mean_rel=0.12223339825868607, max_rel=676.4044799804688, norm_rel=0.019368037581443787, ref_abs_avg=17.56993293762207, test_abs_avg=17.57042694091797
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.32056868076324463, max_abs=3.5, mean_rel=0.11675742268562317, max_rel=560.6551513671875, norm_rel=0.018925124779343605, ref_abs_avg=17.226057052612305, test_abs_avg=17.225570678710938
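The log prints seven metrics per gradient tensor but does not show how they are computed. The sketch below is one plausible reading of those definitions, not the script's actual code. It assumes the elementwise relative error is |test - ref| / (|ref| + eps), that norm_rel is the ratio of L2 norms ||test - ref|| / ||ref||, and that the *_abs_avg fields are mean absolute values. The function name `compare` and the `eps` value are hypothetical, and plain Python lists stand in for tensors so the example runs without PyTorch. Under these assumptions, the pattern above (max_rel in the hundreds or thousands while norm_rel stays near 0.02) is what a few near-zero reference elements would produce.

```python
import math
from typing import Sequence


def compare(ref: Sequence[float], test: Sequence[float], eps: float = 1e-6) -> dict:
    """Hypothetical re-creation of the per-tensor metrics printed in the log.

    Assumes rel = |test - ref| / (|ref| + eps) elementwise and
    norm_rel = ||test - ref||_2 / ||ref||_2; the real script may differ.
    """
    if len(ref) != len(test) or not ref:
        raise ValueError("ref and test must be non-empty and equally sized")
    diff = [abs(t - r) for r, t in zip(ref, test)]
    rel = [d / (abs(r) + eps) for d, r in zip(diff, ref)]
    n = len(ref)
    return {
        "mean_abs": sum(diff) / n,
        "max_abs": max(diff),
        "mean_rel": sum(rel) / n,
        "max_rel": max(rel),
        # Global relative error: one number for the whole tensor.
        "norm_rel": math.sqrt(sum(d * d for d in diff))
        / math.sqrt(sum(r * r for r in ref)),
        "ref_abs_avg": sum(abs(r) for r in ref) / n,
        "test_abs_avg": sum(abs(t) for t in test) / n,
    }


# One near-zero reference element (0.001) inflates max_rel,
# while norm_rel stays small because the large elements dominate the norms.
ref = [40.0, -38.0, 0.001, 41.0]
test = [40.5, -37.5, 1.0, 41.0]
m = compare(ref, test)
print(m)
```

This is why norm_rel is usually the more informative figure for bf16-scale comparisons: the elementwise max_rel is dominated by whichever reference values happen to sit closest to zero.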
identity layers + randn queries
paper_forward fwd+bwd:  221.176 ms
paper_forward bwd-only: 173.973 ms
paper_forward peak allocated: fwd=35.128 GiB, fwd+bwd=37.247 GiB
paper_forward peak reserved:  fwd=36.168 GiB, fwd+bwd=38.668 GiB
torch_compile_phases_forward fwd+bwd:  94.921 ms
torch_compile_phases_forward bwd-only: 76.563 ms
torch_compile_phases_forward peak allocated: fwd=18.203 GiB, fwd+bwd=18.831 GiB
torch_compile_phases_forward peak reserved:  fwd=27.398 GiB, fwd+bwd=27.398 GiB
production_forward fwd+bwd:  66.258 ms
production_forward bwd-only: 56.422 ms
production_forward peak allocated: fwd=7.614 GiB, fwd+bwd=15.618 GiB
production_forward peak reserved:  fwd=27.375 GiB, fwd+bwd=27.375 GiB
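The timing block above reports absolute numbers only. A small worked calculation turns them into relative speedups and memory ratios against `paper_forward`. The figures are copied from the "identity layers + randn queries" run above. The dictionary layout and key names are illustrative and not part of the original script. Only peak *allocated* memory is compared, because peak *reserved* memory also reflects the CUDA caching allocator's pool and is less directly tied to each implementation.

```python
# Timings (ms) and fwd+bwd peak allocated memory (GiB) copied from the
# "identity layers + randn queries" benchmark; key names are illustrative.
results = {
    "paper_forward": {"fwd_bwd_ms": 221.176, "bwd_ms": 173.973, "peak_alloc_gib": 37.247},
    "torch_compile_phases_forward": {"fwd_bwd_ms": 94.921, "bwd_ms": 76.563, "peak_alloc_gib": 18.831},
    "production_forward": {"fwd_bwd_ms": 66.258, "bwd_ms": 56.422, "peak_alloc_gib": 15.618},
}

baseline = results["paper_forward"]
summary = {}
for name, r in results.items():
    summary[name] = {
        # >1.0 means faster than the paper_forward baseline.
        "fwd_bwd_speedup": baseline["fwd_bwd_ms"] / r["fwd_bwd_ms"],
        "bwd_speedup": baseline["bwd_ms"] / r["bwd_ms"],
        # <1.0 means less peak allocated memory than the baseline.
        "mem_ratio": r["peak_alloc_gib"] / baseline["peak_alloc_gib"],
    }

for name, s in summary.items():
    print(
        f"{name:30s} fwd+bwd x{s['fwd_bwd_speedup']:.2f}  "
        f"bwd x{s['bwd_speedup']:.2f}  "
        f"peak alloc {s['mem_ratio']:.0%} of baseline"
    )
```

On these numbers, `production_forward` is roughly 3.3x faster end to end than `paper_forward` and uses about 42% of its peak allocated memory; `torch_compile_phases_forward` lands in between at roughly 2.3x.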

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016579176299273968, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008292656391859055, max_abs=0.34375, mean_rel=0.07241939753293991, max_rel=93.43743133544922, norm_rel=0.019752874970436096, ref_abs_avg=0.45405352115631104, test_abs_avg=0.45407339930534363
production_forward grad[1] vs paper_forward: mean_abs=7.162291049957275, max_abs=56.0, mean_rel=0.24785634875297546, max_rel=1127.971435546875, norm_rel=0.019872276112437248, ref_abs_avg=318.1999816894531, test_abs_avg=318.1485900878906
production_forward grad[2] vs paper_forward: mean_abs=1.232395887374878, max_abs=5.0, mean_rel=0.5761023759841919, max_rel=233.38526916503906, norm_rel=0.024289622902870178, ref_abs_avg=51.07407760620117, test_abs_avg=51.009246826171875
production_forward grad[3] vs paper_forward: mean_abs=1.4873377084732056, max_abs=9.5, mean_rel=0.1786383092403412, max_rel=2845.004150390625, norm_rel=0.02300567738711834, ref_abs_avg=64.96184539794922, test_abs_avg=64.96481323242188
production_forward grad[4] vs paper_forward: mean_abs=1.4550061225891113, max_abs=10.0, mean_rel=0.15599724650382996, max_rel=735.5183715820312, norm_rel=0.022831564769148827, ref_abs_avg=64.13980865478516, test_abs_avg=64.13904571533203
production_forward grad[5] vs paper_forward: mean_abs=1.0402908325195312, max_abs=4.125, mean_rel=0.11708630621433258, max_rel=7.234131813049316, norm_rel=0.022432947531342506, ref_abs_avg=45.53087615966797, test_abs_avg=45.5377197265625
production_forward grad[6] vs paper_forward: mean_abs=1.3023672103881836, max_abs=8.5, mean_rel=0.16487887501716614, max_rel=1145.384521484375, norm_rel=0.02274002507328987, ref_abs_avg=57.588905334472656, test_abs_avg=57.58946990966797
production_forward grad[7] vs paper_forward: mean_abs=1.2661290168762207, max_abs=8.0, mean_rel=0.16527575254440308, max_rel=1542.828369140625, norm_rel=0.022485237568616867, ref_abs_avg=56.718170166015625, test_abs_avg=56.7188835144043
production_forward grad[8] vs paper_forward: mean_abs=0.8989152908325195, max_abs=4.3125, mean_rel=0.089952751994133, max_rel=6.293116569519043, norm_rel=0.020897449925541878, ref_abs_avg=44.03266143798828, test_abs_avg=44.010494232177734
production_forward grad[9] vs paper_forward: mean_abs=1.1821048259735107, max_abs=7.75, mean_rel=0.15619289875030518, max_rel=1233.4822998046875, norm_rel=0.022602584213018417, ref_abs_avg=52.571250915527344, test_abs_avg=52.575775146484375
production_forward grad[10] vs paper_forward: mean_abs=1.1498104333877563, max_abs=7.09375, mean_rel=0.17833366990089417, max_rel=1741.7242431640625, norm_rel=0.02236093208193779, ref_abs_avg=51.689125061035156, test_abs_avg=51.692291259765625
production_forward grad[11] vs paper_forward: mean_abs=0.8982534408569336, max_abs=4.5, mean_rel=0.1332411766052246, max_rel=26.791425704956055, norm_rel=0.022958924993872643, ref_abs_avg=39.56361770629883, test_abs_avg=39.644927978515625
production_forward grad[12] vs paper_forward: mean_abs=1.0807480812072754, max_abs=6.75, mean_rel=0.15982946753501892, max_rel=1388.197265625, norm_rel=0.02234661765396595, ref_abs_avg=48.63988494873047, test_abs_avg=48.64141082763672
production_forward grad[13] vs paper_forward: mean_abs=1.0519638061523438, max_abs=6.0, mean_rel=0.15257805585861206, max_rel=624.0219116210938, norm_rel=0.022095071151852608, ref_abs_avg=47.87885665893555, test_abs_avg=47.8753547668457
production_forward grad[14] vs paper_forward: mean_abs=0.8213768005371094, max_abs=4.0, mean_rel=0.09829491376876831, max_rel=7.289332866668701, norm_rel=0.02379992976784706, ref_abs_avg=35.313270568847656, test_abs_avg=35.3101806640625
production_forward grad[15] vs paper_forward: mean_abs=1.0114439725875854, max_abs=7.0625, mean_rel=0.16145852208137512, max_rel=2121.276611328125, norm_rel=0.022272665053606033, ref_abs_avg=45.58818817138672, test_abs_avg=45.589210510253906
production_forward grad[16] vs paper_forward: mean_abs=0.981478214263916, max_abs=6.0, mean_rel=0.16204425692558289, max_rel=2166.1015625, norm_rel=0.021939855068922043, ref_abs_avg=45.0018196105957, test_abs_avg=45.00605392456055
production_forward grad[17] vs paper_forward: mean_abs=0.7635037302970886, max_abs=3.0, mean_rel=0.09611309319734573, max_rel=3.7979493141174316, norm_rel=0.02268272079527378, ref_abs_avg=33.06625747680664, test_abs_avg=33.052696228027344
production_forward grad[18] vs paper_forward: mean_abs=0.954714298248291, max_abs=6.0, mean_rel=0.15539637207984924, max_rel=1171.550048828125, norm_rel=0.02222341112792492, ref_abs_avg=43.19220733642578, test_abs_avg=43.19563293457031
production_forward grad[19] vs paper_forward: mean_abs=0.930327296257019, max_abs=5.5625, mean_rel=0.14760923385620117, max_rel=919.8585815429688, norm_rel=0.02187463827431202, ref_abs_avg=42.74191665649414, test_abs_avg=42.75080871582031
production_forward grad[20] vs paper_forward: mean_abs=0.7596385478973389, max_abs=3.0, mean_rel=0.08152620494365692, max_rel=2.966245651245117, norm_rel=0.0237332321703434, ref_abs_avg=32.17807388305664, test_abs_avg=32.15974426269531
production_forward grad[21] vs paper_forward: mean_abs=0.9053003787994385, max_abs=6.125, mean_rel=0.15795676410198212, max_rel=1334.3868408203125, norm_rel=0.021952452138066292, ref_abs_avg=41.47300720214844, test_abs_avg=41.47527313232422
production_forward grad[22] vs paper_forward: mean_abs=0.8854542970657349, max_abs=5.6875, mean_rel=0.15358445048332214, max_rel=2015.573486328125, norm_rel=0.02162793278694153, ref_abs_avg=41.12123107910156, test_abs_avg=41.118438720703125
production_forward grad[23] vs paper_forward: mean_abs=0.6982407569885254, max_abs=2.75, mean_rel=0.07159707695245743, max_rel=2.975630283355713, norm_rel=0.020872462540864944, ref_abs_avg=33.14992904663086, test_abs_avg=33.17573547363281
production_forward grad[24] vs paper_forward: mean_abs=0.8625379800796509, max_abs=5.5, mean_rel=0.1486361026763916, max_rel=940.3710327148438, norm_rel=0.021860815584659576, ref_abs_avg=39.675743103027344, test_abs_avg=39.6786994934082
production_forward grad[25] vs paper_forward: mean_abs=0.841609537601471, max_abs=4.9375, mean_rel=0.15079662203788757, max_rel=1382.9498291015625, norm_rel=0.021491320803761482, ref_abs_avg=39.348785400390625, test_abs_avg=39.350669860839844
production_forward grad[26] vs paper_forward: mean_abs=0.8554301261901855, max_abs=3.125, mean_rel=0.08133284747600555, max_rel=4.785343170166016, norm_rel=0.023515554144978523, ref_abs_avg=36.81463623046875, test_abs_avg=36.744869232177734
production_forward grad[27] vs paper_forward: mean_abs=0.9993661642074585, max_abs=6.375, mean_rel=0.1657930314540863, max_rel=1467.517333984375, norm_rel=0.023447521030902863, ref_abs_avg=42.809532165527344, test_abs_avg=42.814788818359375
production_forward grad[28] vs paper_forward: mean_abs=0.9736652374267578, max_abs=6.0, mean_rel=0.15019673109054565, max_rel=577.4152221679688, norm_rel=0.02319309301674366, ref_abs_avg=42.20977020263672, test_abs_avg=42.21045684814453
production_forward grad[29] vs paper_forward: mean_abs=0.78082275390625, max_abs=2.875, mean_rel=0.08651357889175415, max_rel=9.789260864257812, norm_rel=0.023310355842113495, ref_abs_avg=33.19499206542969, test_abs_avg=33.212120056152344
production_forward grad[30] vs paper_forward: mean_abs=0.9310075640678406, max_abs=6.75, mean_rel=0.16646817326545715, max_rel=1186.7322998046875, norm_rel=0.02384503372013569, ref_abs_avg=39.16741943359375, test_abs_avg=39.171226501464844
production_forward grad[31] vs paper_forward: mean_abs=0.9067498445510864, max_abs=5.5, mean_rel=0.16749219596385956, max_rel=1237.8004150390625, norm_rel=0.023765338584780693, ref_abs_avg=38.30652618408203, test_abs_avg=38.299068450927734
production_forward grad[32] vs paper_forward: mean_abs=0.7043137550354004, max_abs=2.5, mean_rel=0.1236896961927414, max_rel=8.320008277893066, norm_rel=0.023983394727110863, ref_abs_avg=29.283679962158203, test_abs_avg=29.341705322265625
production_forward grad[33] vs paper_forward: mean_abs=0.8605364561080933, max_abs=5.75, mean_rel=0.16126272082328796, max_rel=2182.595703125, norm_rel=0.02372160740196705, ref_abs_avg=36.39612579345703, test_abs_avg=36.39680862426758
production_forward grad[34] vs paper_forward: mean_abs=0.8445529937744141, max_abs=5.25, mean_rel=0.17033925652503967, max_rel=1096.2589111328125, norm_rel=0.023637492209672928, ref_abs_avg=35.89017105102539, test_abs_avg=35.892433166503906
production_forward grad[35] vs paper_forward: mean_abs=0.6800722479820251, max_abs=2.75, mean_rel=0.17707319557666779, max_rel=23.18572425842285, norm_rel=0.02310323156416416, ref_abs_avg=28.878196716308594, test_abs_avg=28.877147674560547
production_forward grad[36] vs paper_forward: mean_abs=0.8149206042289734, max_abs=5.0, mean_rel=0.15152880549430847, max_rel=799.0091552734375, norm_rel=0.02348557487130165, ref_abs_avg=34.81859588623047, test_abs_avg=34.821266174316406
production_forward grad[37] vs paper_forward: mean_abs=0.79562908411026, max_abs=5.197265625, mean_rel=0.15699225664138794, max_rel=815.4437866210938, norm_rel=0.023336486890912056, ref_abs_avg=34.192230224609375, test_abs_avg=34.19631576538086
production_forward grad[38] vs paper_forward: mean_abs=0.6638903617858887, max_abs=2.4375, mean_rel=0.1158587709069252, max_rel=10.67872428894043, norm_rel=0.023513449355959892, ref_abs_avg=27.091915130615234, test_abs_avg=27.08501434326172
production_forward grad[39] vs paper_forward: mean_abs=0.7679073214530945, max_abs=4.6875, mean_rel=0.15078604221343994, max_rel=956.1873168945312, norm_rel=0.023282088339328766, ref_abs_avg=33.09113311767578, test_abs_avg=33.09059143066406
production_forward grad[40] vs paper_forward: mean_abs=0.7582908868789673, max_abs=4.625, mean_rel=0.15754520893096924, max_rel=1213.749267578125, norm_rel=0.023180672898888588, ref_abs_avg=32.751827239990234, test_abs_avg=32.755218505859375
production_forward grad[41] vs paper_forward: mean_abs=0.6035351753234863, max_abs=2.125, mean_rel=0.11514412611722946, max_rel=11.282848358154297, norm_rel=0.022732168436050415, ref_abs_avg=25.949604034423828, test_abs_avg=25.913339614868164
production_forward grad[42] vs paper_forward: mean_abs=0.7310125827789307, max_abs=5.5, mean_rel=0.15362243354320526, max_rel=989.2796020507812, norm_rel=0.022970084100961685, ref_abs_avg=31.87745475769043, test_abs_avg=31.877931594848633
production_forward grad[43] vs paper_forward: mean_abs=0.71357262134552, max_abs=4.875, mean_rel=0.13541221618652344, max_rel=581.530517578125, norm_rel=0.02301245555281639, ref_abs_avg=31.05213165283203, test_abs_avg=31.056053161621094
production_forward grad[44] vs paper_forward: mean_abs=0.5828729867935181, max_abs=2.40625, mean_rel=0.15735885500907898, max_rel=35.060543060302734, norm_rel=0.02293744497001171, ref_abs_avg=25.10919189453125, test_abs_avg=25.132755279541016
production_forward grad[45] vs paper_forward: mean_abs=0.6935769319534302, max_abs=5.0, mean_rel=0.1576288342475891, max_rel=805.901123046875, norm_rel=0.02289547771215439, ref_abs_avg=30.30608367919922, test_abs_avg=30.305152893066406
production_forward grad[46] vs paper_forward: mean_abs=0.6843122243881226, max_abs=4.875, mean_rel=0.14719125628471375, max_rel=1067.03857421875, norm_rel=0.022848550230264664, ref_abs_avg=30.038532257080078, test_abs_avg=30.038238525390625
production_forward grad[47] vs paper_forward: mean_abs=0.5395194292068481, max_abs=2.1875, mean_rel=0.10294991731643677, max_rel=12.008896827697754, norm_rel=0.02228705957531929, ref_abs_avg=24.037067413330078, test_abs_avg=24.008777618408203
production_forward grad[48] vs paper_forward: mean_abs=0.6645925045013428, max_abs=4.453125, mean_rel=0.15437272191047668, max_rel=1074.753173828125, norm_rel=0.022596057504415512, ref_abs_avg=29.430255889892578, test_abs_avg=29.43206787109375
production_forward grad[49] vs paper_forward: mean_abs=0.64892578125, max_abs=4.0, mean_rel=0.15985454618930817, max_rel=1179.167724609375, norm_rel=0.0226287841796875, ref_abs_avg=28.779319763183594, test_abs_avg=28.780607223510742
production_forward grad[50] vs paper_forward: mean_abs=0.6420037746429443, max_abs=2.59375, mean_rel=0.1899673193693161, max_rel=20.720731735229492, norm_rel=0.024933120235800743, ref_abs_avg=25.8507137298584, test_abs_avg=25.89689064025879
production_forward grad[51] vs paper_forward: mean_abs=0.7523598670959473, max_abs=5.25, mean_rel=0.15695685148239136, max_rel=817.33154296875, norm_rel=0.023970916867256165, ref_abs_avg=31.466035842895508, test_abs_avg=31.470176696777344
production_forward grad[52] vs paper_forward: mean_abs=0.7345884442329407, max_abs=4.75, mean_rel=0.1711166501045227, max_rel=907.7113037109375, norm_rel=0.023715898394584656, ref_abs_avg=31.051055908203125, test_abs_avg=31.06100082397461
production_forward grad[53] vs paper_forward: mean_abs=0.6062335968017578, max_abs=2.40625, mean_rel=0.09204479306936264, max_rel=6.836872577667236, norm_rel=0.025580763816833496, ref_abs_avg=23.976476669311523, test_abs_avg=23.942428588867188
production_forward grad[54] vs paper_forward: mean_abs=0.6935724020004272, max_abs=4.875, mean_rel=0.1586129069328308, max_rel=1261.3729248046875, norm_rel=0.02376750111579895, ref_abs_avg=29.24993896484375, test_abs_avg=29.25189208984375
production_forward grad[55] vs paper_forward: mean_abs=0.6789667010307312, max_abs=4.5, mean_rel=0.15924638509750366, max_rel=1059.64306640625, norm_rel=0.023485323414206505, ref_abs_avg=29.05459976196289, test_abs_avg=29.056522369384766
production_forward grad[56] vs paper_forward: mean_abs=0.5340900421142578, max_abs=2.375, mean_rel=0.12862083315849304, max_rel=18.097251892089844, norm_rel=0.023274963721632957, ref_abs_avg=23.081878662109375, test_abs_avg=23.092769622802734
production_forward grad[57] vs paper_forward: mean_abs=0.6396002769470215, max_abs=4.5, mean_rel=0.15011456608772278, max_rel=709.1118774414062, norm_rel=0.02321173995733261, ref_abs_avg=27.556415557861328, test_abs_avg=27.557723999023438
production_forward grad[58] vs paper_forward: mean_abs=0.6289541721343994, max_abs=5.0, mean_rel=0.15150675177574158, max_rel=1236.2078857421875, norm_rel=0.02306615561246872, ref_abs_avg=27.356189727783203, test_abs_avg=27.367347717285156
production_forward grad[59] vs paper_forward: mean_abs=0.4938678741455078, max_abs=2.125, mean_rel=0.08215732872486115, max_rel=3.278632164001465, norm_rel=0.021660396829247475, ref_abs_avg=23.173511505126953, test_abs_avg=23.092369079589844
production_forward grad[60] vs paper_forward: mean_abs=0.5997335314750671, max_abs=4.125, mean_rel=0.15913355350494385, max_rel=1819.4443359375, norm_rel=0.02269861102104187, ref_abs_avg=26.45240020751953, test_abs_avg=26.454320907592773
production_forward grad[61] vs paper_forward: mean_abs=0.5883749127388, max_abs=4.0625, mean_rel=0.1397552788257599, max_rel=396.24578857421875, norm_rel=0.022327322512865067, ref_abs_avg=26.35116195678711, test_abs_avg=26.35294532775879
production_forward grad[62] vs paper_forward: mean_abs=0.45442843437194824, max_abs=2.125, mean_rel=0.07958345115184784, max_rel=2.8759729862213135, norm_rel=0.02204032614827156, ref_abs_avg=20.363101959228516, test_abs_avg=20.382413864135742
production_forward grad[63] vs paper_forward: mean_abs=0.5690367221832275, max_abs=4.0, mean_rel=0.15606257319450378, max_rel=1459.4136962890625, norm_rel=0.02247019112110138, ref_abs_avg=25.342960357666016, test_abs_avg=25.343379974365234
production_forward grad[64] vs paper_forward: mean_abs=0.5655851364135742, max_abs=4.0, mean_rel=0.14369943737983704, max_rel=747.1201171875, norm_rel=0.022082658484578133, ref_abs_avg=25.55504608154297, test_abs_avg=25.55727767944336
production_forward grad[65] vs paper_forward: mean_abs=0.4434695243835449, max_abs=1.84375, mean_rel=0.16312605142593384, max_rel=29.532039642333984, norm_rel=0.021041061729192734, ref_abs_avg=21.300487518310547, test_abs_avg=21.30437469482422
production_forward grad[66] vs paper_forward: mean_abs=0.5484657287597656, max_abs=4.5, mean_rel=0.15368413925170898, max_rel=1071.6209716796875, norm_rel=0.02196706458926201, ref_abs_avg=24.914077758789062, test_abs_avg=24.91594696044922
production_forward grad[67] vs paper_forward: mean_abs=0.5318648815155029, max_abs=3.5, mean_rel=0.14773204922676086, max_rel=584.9509887695312, norm_rel=0.02205795794725418, ref_abs_avg=24.119800567626953, test_abs_avg=24.11200714111328
production_forward grad[68] vs paper_forward: mean_abs=0.4376441240310669, max_abs=1.75, mean_rel=0.07442688941955566, max_rel=2.320185661315918, norm_rel=0.021732190623879433, ref_abs_avg=20.062740325927734, test_abs_avg=20.03525161743164
production_forward grad[69] vs paper_forward: mean_abs=0.5137614607810974, max_abs=3.94140625, mean_rel=0.14860394597053528, max_rel=1177.3258056640625, norm_rel=0.021630264818668365, ref_abs_avg=23.751163482666016, test_abs_avg=23.753421783447266
production_forward grad[70] vs paper_forward: mean_abs=0.4977809190750122, max_abs=3.5, mean_rel=0.13953380286693573, max_rel=906.0075073242188, norm_rel=0.021267607808113098, ref_abs_avg=23.361597061157227, test_abs_avg=23.36359214782715
production_forward grad[71] vs paper_forward: mean_abs=0.41892552375793457, max_abs=2.25, mean_rel=0.1049952507019043, max_rel=7.250253677368164, norm_rel=0.020433614030480385, ref_abs_avg=20.926864624023438, test_abs_avg=20.927532196044922
production_forward grad[72] vs paper_forward: mean_abs=0.490253746509552, max_abs=3.75, mean_rel=0.14139337837696075, max_rel=1056.9119873046875, norm_rel=0.021250776946544647, ref_abs_avg=23.03496742248535, test_abs_avg=23.036243438720703
production_forward grad[73] vs paper_forward: mean_abs=0.477897047996521, max_abs=3.625, mean_rel=0.13603432476520538, max_rel=825.4508056640625, norm_rel=0.021133629605174065, ref_abs_avg=22.597213745117188, test_abs_avg=22.601638793945312
production_forward grad[74] vs paper_forward: mean_abs=0.4458894729614258, max_abs=1.53125, mean_rel=0.10527218878269196, max_rel=7.920541286468506, norm_rel=0.023204734548926353, ref_abs_avg=19.857913970947266, test_abs_avg=19.87029457092285
production_forward grad[75] vs paper_forward: mean_abs=0.5441898107528687, max_abs=4.0, mean_rel=0.15115094184875488, max_rel=937.886962890625, norm_rel=0.022734016180038452, ref_abs_avg=23.892282485961914, test_abs_avg=23.890579223632812
production_forward grad[76] vs paper_forward: mean_abs=0.5275356769561768, max_abs=4.5, mean_rel=0.13347846269607544, max_rel=456.49884033203125, norm_rel=0.02233738638460636, ref_abs_avg=23.664505004882812, test_abs_avg=23.65782356262207
production_forward grad[77] vs paper_forward: mean_abs=0.398392915725708, max_abs=1.5859375, mean_rel=0.08171112090349197, max_rel=10.004514694213867, norm_rel=0.021586494520306587, ref_abs_avg=18.727855682373047, test_abs_avg=18.713655471801758
production_forward grad[78] vs paper_forward: mean_abs=0.4882645606994629, max_abs=4.0, mean_rel=0.1452716886997223, max_rel=773.107421875, norm_rel=0.022204305976629257, ref_abs_avg=21.990060806274414, test_abs_avg=21.98965072631836
production_forward grad[79] vs paper_forward: mean_abs=0.48078200221061707, max_abs=3.5, mean_rel=0.14115409553050995, max_rel=770.0765380859375, norm_rel=0.021863892674446106, ref_abs_avg=22.04568862915039, test_abs_avg=22.047351837158203
production_forward grad[80] vs paper_forward: mean_abs=0.3774373531341553, max_abs=1.90625, mean_rel=0.09625702351331711, max_rel=10.67839241027832, norm_rel=0.02237633429467678, ref_abs_avg=17.199726104736328, test_abs_avg=17.14969253540039
production_forward grad[81] vs paper_forward: mean_abs=0.4588032364845276, max_abs=3.796875, mean_rel=0.13863614201545715, max_rel=808.609619140625, norm_rel=0.021868575364351273, ref_abs_avg=20.983619689941406, test_abs_avg=20.981428146362305
production_forward grad[82] vs paper_forward: mean_abs=0.4548463821411133, max_abs=3.5, mean_rel=0.1396861970424652, max_rel=704.2865600585938, norm_rel=0.02151837758719921, ref_abs_avg=21.18218994140625, test_abs_avg=21.18318748474121
production_forward grad[83] vs paper_forward: mean_abs=0.3467979431152344, max_abs=1.875, mean_rel=0.10214969515800476, max_rel=9.106371879577637, norm_rel=0.02252870425581932, ref_abs_avg=15.49543571472168, test_abs_avg=15.528078079223633
production_forward grad[84] vs paper_forward: mean_abs=0.42705050110816956, max_abs=4.5, mean_rel=0.12953001260757446, max_rel=790.753662109375, norm_rel=0.02114664949476719, ref_abs_avg=20.234012603759766, test_abs_avg=20.233158111572266
production_forward grad[85] vs paper_forward: mean_abs=0.42285823822021484, max_abs=3.25, mean_rel=0.13517695665359497, max_rel=351.4343566894531, norm_rel=0.021245071664452553, ref_abs_avg=19.984878540039062, test_abs_avg=19.986295700073242
production_forward grad[86] vs paper_forward: mean_abs=0.3648960590362549, max_abs=1.671875, mean_rel=0.1176082193851471, max_rel=7.602685451507568, norm_rel=0.022036416456103325, ref_abs_avg=16.840824127197266, test_abs_avg=16.850421905517578
production_forward grad[87] vs paper_forward: mean_abs=0.4091130495071411, max_abs=4.0, mean_rel=0.1307191401720047, max_rel=1090.7044677734375, norm_rel=0.020622864365577698, ref_abs_avg=19.916149139404297, test_abs_avg=19.91645050048828
production_forward grad[88] vs paper_forward: mean_abs=0.3953750729560852, max_abs=3.8125, mean_rel=0.12977130711078644, max_rel=338.19525146484375, norm_rel=0.02076437510550022, ref_abs_avg=19.205013275146484, test_abs_avg=19.208171844482422
production_forward grad[89] vs paper_forward: mean_abs=0.31765127182006836, max_abs=1.25, mean_rel=0.05877804756164551, max_rel=1.6416510343551636, norm_rel=0.01985512115061283, ref_abs_avg=15.761298179626465, test_abs_avg=15.726049423217773
production_forward grad[90] vs paper_forward: mean_abs=0.3771781921386719, max_abs=4.0, mean_rel=0.117685467004776, max_rel=548.755615234375, norm_rel=0.020102038979530334, ref_abs_avg=18.889312744140625, test_abs_avg=18.889060974121094
production_forward grad[91] vs paper_forward: mean_abs=0.3721048831939697, max_abs=3.25, mean_rel=0.1270575225353241, max_rel=932.9203491210938, norm_rel=0.019983740523457527, ref_abs_avg=18.838165283203125, test_abs_avg=18.836856842041016
production_forward grad[92] vs paper_forward: mean_abs=0.320098876953125, max_abs=1.2265625, mean_rel=0.06688331067562103, max_rel=4.232787609100342, norm_rel=0.020227955654263496, ref_abs_avg=15.613529205322266, test_abs_avg=15.60006332397461
production_forward grad[93] vs paper_forward: mean_abs=0.3638719320297241, max_abs=4.25, mean_rel=0.12090642750263214, max_rel=572.56494140625, norm_rel=0.01982574723660946, ref_abs_avg=18.530500411987305, test_abs_avg=18.532520294189453
production_forward grad[94] vs paper_forward: mean_abs=0.3512691855430603, max_abs=3.3125, mean_rel=0.12201802432537079, max_rel=421.4920959472656, norm_rel=0.01948489435017109, ref_abs_avg=18.21851921081543, test_abs_avg=18.21973991394043
production_forward grad[95] vs paper_forward: mean_abs=0.2872910499572754, max_abs=1.0, mean_rel=0.05828572064638138, max_rel=2.756908655166626, norm_rel=0.01890357956290245, ref_abs_avg=15.376131057739258, test_abs_avg=15.360245704650879
production_forward grad[96] vs paper_forward: mean_abs=0.33910107612609863, max_abs=3.875, mean_rel=0.11738995462656021, max_rel=1355.049072265625, norm_rel=0.019280176609754562, ref_abs_avg=17.865320205688477, test_abs_avg=17.865535736083984
production_forward grad[97] vs paper_forward: mean_abs=0.3323446214199066, max_abs=3.125, mean_rel=0.1066681295633316, max_rel=280.3134460449219, norm_rel=0.018984947353601456, ref_abs_avg=17.766468048095703, test_abs_avg=17.774620056152344
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016605451237410307, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008640369400382042, max_abs=0.40625, mean_rel=0.07506420463323593, max_rel=143.47726440429688, norm_rel=0.020476413890719414, ref_abs_avg=0.45405352115631104, test_abs_avg=0.45406150817871094
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.348998546600342, max_abs=48.0, mean_rel=0.22750231623649597, max_rel=1188.473876953125, norm_rel=0.020369283854961395, ref_abs_avg=318.1999816894531, test_abs_avg=318.2078552246094
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3086512088775635, max_abs=6.5, mean_rel=0.2917006015777588, max_rel=79.16001892089844, norm_rel=0.025957761332392693, ref_abs_avg=51.07407760620117, test_abs_avg=51.02222442626953
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.5415109395980835, max_abs=10.375, mean_rel=0.1750039905309677, max_rel=3606.382080078125, norm_rel=0.02384301647543907, ref_abs_avg=64.96184539794922, test_abs_avg=64.95926666259766
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5158162117004395, max_abs=10.0, mean_rel=0.16269272565841675, max_rel=1187.5328369140625, norm_rel=0.023759564384818077, ref_abs_avg=64.13980865478516, test_abs_avg=64.13525390625
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1207175254821777, max_abs=4.875, mean_rel=0.1352660059928894, max_rel=9.353877067565918, norm_rel=0.023951886221766472, ref_abs_avg=45.53087615966797, test_abs_avg=45.537330627441406
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.3498848676681519, max_abs=8.0, mean_rel=0.17265823483467102, max_rel=1802.892578125, norm_rel=0.02355162799358368, ref_abs_avg=57.588905334472656, test_abs_avg=57.58413314819336
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3100820779800415, max_abs=7.6875, mean_rel=0.16291919350624084, max_rel=1254.7620849609375, norm_rel=0.023226147517561913, ref_abs_avg=56.718170166015625, test_abs_avg=56.71337127685547
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.9537090063095093, max_abs=4.0, mean_rel=0.08059719204902649, max_rel=2.697049856185913, norm_rel=0.021864868700504303, ref_abs_avg=44.03266143798828, test_abs_avg=44.037330627441406
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2240920066833496, max_abs=7.140625, mean_rel=0.16239431500434875, max_rel=1185.236083984375, norm_rel=0.023386437445878983, ref_abs_avg=52.571250915527344, test_abs_avg=52.57255554199219
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.1864113807678223, max_abs=7.0, mean_rel=0.19147242605686188, max_rel=2784.0888671875, norm_rel=0.02307838760316372, ref_abs_avg=51.689125061035156, test_abs_avg=51.69110107421875
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.8904852867126465, max_abs=4.03125, mean_rel=0.09738779067993164, max_rel=14.422051429748535, norm_rel=0.02274475060403347, ref_abs_avg=39.56361770629883, test_abs_avg=39.61296463012695
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1147973537445068, max_abs=6.75, mean_rel=0.1634277105331421, max_rel=1279.19482421875, norm_rel=0.023060763254761696, ref_abs_avg=48.63988494873047, test_abs_avg=48.64227294921875
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.088549017906189, max_abs=6.25, mean_rel=0.1551700085401535, max_rel=862.004638671875, norm_rel=0.0228589978069067, ref_abs_avg=47.87885665893555, test_abs_avg=47.871246337890625
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8486690521240234, max_abs=4.0, mean_rel=0.09087380766868591, max_rel=3.667465925216675, norm_rel=0.024453071877360344, ref_abs_avg=35.313270568847656, test_abs_avg=35.30641174316406
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0413639545440674, max_abs=6.5, mean_rel=0.16893714666366577, max_rel=2586.96044921875, norm_rel=0.022916045039892197, ref_abs_avg=45.58818817138672, test_abs_avg=45.588130950927734
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0122915506362915, max_abs=6.125, mean_rel=0.16561535000801086, max_rel=2275.094482421875, norm_rel=0.022607238963246346, ref_abs_avg=45.0018196105957, test_abs_avg=45.00464630126953
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8115549087524414, max_abs=3.0, mean_rel=0.14098936319351196, max_rel=21.4429988861084, norm_rel=0.023930056020617485, ref_abs_avg=33.06625747680664, test_abs_avg=33.0983772277832
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.9817490577697754, max_abs=6.5, mean_rel=0.1632421910762787, max_rel=1829.530517578125, norm_rel=0.022844532504677773, ref_abs_avg=43.19220733642578, test_abs_avg=43.19373321533203
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9588474035263062, max_abs=6.5, mean_rel=0.14827106893062592, max_rel=813.4110717773438, norm_rel=0.02256341278553009, ref_abs_avg=42.74191665649414, test_abs_avg=42.74791717529297
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.8168342113494873, max_abs=3.75, mean_rel=0.11675762385129929, max_rel=12.704822540283203, norm_rel=0.025241272523999214, ref_abs_avg=32.17807388305664, test_abs_avg=32.17869186401367
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9320552349090576, max_abs=6.0, mean_rel=0.162584125995636, max_rel=1153.7447509765625, norm_rel=0.022578638046979904, ref_abs_avg=41.47300720214844, test_abs_avg=41.475547790527344
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9073745608329773, max_abs=5.5, mean_rel=0.1528444141149521, max_rel=1303.263916015625, norm_rel=0.02215905487537384, ref_abs_avg=41.12123107910156, test_abs_avg=41.120033264160156
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7406711578369141, max_abs=3.0, mean_rel=0.09573210775852203, max_rel=10.255927085876465, norm_rel=0.021961845457553864, ref_abs_avg=33.14992904663086, test_abs_avg=33.19647216796875
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.8864951133728027, max_abs=6.0, mean_rel=0.1542549580335617, max_rel=994.1309204101562, norm_rel=0.022444620728492737, ref_abs_avg=39.675743103027344, test_abs_avg=39.677894592285156
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8620786070823669, max_abs=5.25, mean_rel=0.15739448368549347, max_rel=2131.83837890625, norm_rel=0.02203608676791191, ref_abs_avg=39.348785400390625, test_abs_avg=39.35540008544922
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8903331756591797, max_abs=4.0, mean_rel=0.10832048952579498, max_rel=11.639452934265137, norm_rel=0.02394135110080242, ref_abs_avg=36.81463623046875, test_abs_avg=36.72972106933594
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0248767137527466, max_abs=6.75, mean_rel=0.1708867847919464, max_rel=1314.71435546875, norm_rel=0.024042630568146706, ref_abs_avg=42.809532165527344, test_abs_avg=42.814414978027344
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0023064613342285, max_abs=6.5, mean_rel=0.1551302671432495, max_rel=579.2517700195312, norm_rel=0.023865027353167534, ref_abs_avg=42.20977020263672, test_abs_avg=42.2083740234375
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8065967559814453, max_abs=3.75, mean_rel=0.08645720779895782, max_rel=7.667964935302734, norm_rel=0.024434972554445267, ref_abs_avg=33.19499206542969, test_abs_avg=33.20309829711914
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9523254632949829, max_abs=6.78125, mean_rel=0.16988220810890198, max_rel=1238.3072509765625, norm_rel=0.02439194917678833, ref_abs_avg=39.16741943359375, test_abs_avg=39.16892623901367
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9316763877868652, max_abs=5.75, mean_rel=0.1757030487060547, max_rel=1628.781005859375, norm_rel=0.024412205442786217, ref_abs_avg=38.30652618408203, test_abs_avg=38.297523498535156
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7536087036132812, max_abs=3.0, mean_rel=0.13281741738319397, max_rel=13.054814338684082, norm_rel=0.025399288162589073, ref_abs_avg=29.283679962158203, test_abs_avg=29.36459732055664
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.8791964054107666, max_abs=5.75, mean_rel=0.16139072179794312, max_rel=2029.206787109375, norm_rel=0.024222321808338165, ref_abs_avg=36.39612579345703, test_abs_avg=36.396156311035156
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8632778525352478, max_abs=5.875, mean_rel=0.16662786900997162, max_rel=1295.5821533203125, norm_rel=0.02416367270052433, ref_abs_avg=35.89017105102539, test_abs_avg=35.891048431396484
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7091519832611084, max_abs=3.0, mean_rel=0.19625715911388397, max_rel=29.511816024780273, norm_rel=0.02463592030107975, ref_abs_avg=28.878196716308594, test_abs_avg=28.893909454345703
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8333935737609863, max_abs=5.5, mean_rel=0.15378421545028687, max_rel=1451.35009765625, norm_rel=0.023997006937861443, ref_abs_avg=34.81859588623047, test_abs_avg=34.82045364379883
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8148396015167236, max_abs=5.25, mean_rel=0.16674678027629852, max_rel=1177.187255859375, norm_rel=0.023862972855567932, ref_abs_avg=34.192230224609375, test_abs_avg=34.195003509521484
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6601057052612305, max_abs=2.5625, mean_rel=0.10949939489364624, max_rel=8.242412567138672, norm_rel=0.023754164576530457, ref_abs_avg=27.091915130615234, test_abs_avg=27.06133270263672
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.7831794023513794, max_abs=5.4375, mean_rel=0.1547922044992447, max_rel=640.8402099609375, norm_rel=0.02372930943965912, ref_abs_avg=33.09113311767578, test_abs_avg=33.091068267822266
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.770987868309021, max_abs=5.25, mean_rel=0.1566353440284729, max_rel=1004.02783203125, norm_rel=0.023579098284244537, ref_abs_avg=32.751827239990234, test_abs_avg=32.75386047363281
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.619696855545044, max_abs=2.53125, mean_rel=0.10873974114656448, max_rel=12.556621551513672, norm_rel=0.023429743945598602, ref_abs_avg=25.949604034423828, test_abs_avg=25.928918838500977
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7449498772621155, max_abs=5.0, mean_rel=0.15453428030014038, max_rel=764.6697998046875, norm_rel=0.023405635729432106, ref_abs_avg=31.87745475769043, test_abs_avg=31.878280639648438
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7259587049484253, max_abs=5.0, mean_rel=0.13648675382137299, max_rel=361.77825927734375, norm_rel=0.023402659222483635, ref_abs_avg=31.05213165283203, test_abs_avg=31.05453872680664
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5939265489578247, max_abs=2.125, mean_rel=0.15780171751976013, max_rel=34.38154220581055, norm_rel=0.023405732586979866, ref_abs_avg=25.10919189453125, test_abs_avg=25.109987258911133
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.705150842666626, max_abs=4.734375, mean_rel=0.16258859634399414, max_rel=1034.8004150390625, norm_rel=0.023278765380382538, ref_abs_avg=30.30608367919922, test_abs_avg=30.303874969482422
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.6961995959281921, max_abs=5.125, mean_rel=0.15039700269699097, max_rel=1274.279541015625, norm_rel=0.023216553032398224, ref_abs_avg=30.038532257080078, test_abs_avg=30.039180755615234
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5605348348617554, max_abs=2.0, mean_rel=0.10450375080108643, max_rel=13.208537101745605, norm_rel=0.022904371842741966, ref_abs_avg=24.037067413330078, test_abs_avg=23.989805221557617
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6751802563667297, max_abs=4.25, mean_rel=0.15872356295585632, max_rel=1374.1529541015625, norm_rel=0.022929228842258453, ref_abs_avg=29.430255889892578, test_abs_avg=29.431743621826172
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.659982442855835, max_abs=4.5, mean_rel=0.16049040853977203, max_rel=781.96533203125, norm_rel=0.023004960268735886, ref_abs_avg=28.779319763183594, test_abs_avg=28.780807495117188
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6500680446624756, max_abs=2.625, mean_rel=0.2154863178730011, max_rel=31.530576705932617, norm_rel=0.02524639293551445, ref_abs_avg=25.8507137298584, test_abs_avg=25.873048782348633
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7657672762870789, max_abs=5.5, mean_rel=0.16076874732971191, max_rel=773.7776489257812, norm_rel=0.024378936737775803, ref_abs_avg=31.466035842895508, test_abs_avg=31.46875
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7478049993515015, max_abs=5.0, mean_rel=0.1721556931734085, max_rel=728.2138671875, norm_rel=0.024151748046278954, ref_abs_avg=31.051055908203125, test_abs_avg=31.060579299926758
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.6002254486083984, max_abs=2.25, mean_rel=0.0960252657532692, max_rel=4.72455358505249, norm_rel=0.02491571754217148, ref_abs_avg=23.976476669311523, test_abs_avg=23.948345184326172
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7053253054618835, max_abs=4.75, mean_rel=0.16173583269119263, max_rel=1972.3406982421875, norm_rel=0.024160368368029594, ref_abs_avg=29.24993896484375, test_abs_avg=29.250743865966797
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6922075748443604, max_abs=4.375, mean_rel=0.16038164496421814, max_rel=775.5889282226562, norm_rel=0.023944111540913582, ref_abs_avg=29.05459976196289, test_abs_avg=29.059337615966797
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5573539733886719, max_abs=2.125, mean_rel=0.11946210265159607, max_rel=11.683280944824219, norm_rel=0.024238666519522667, ref_abs_avg=23.081878662109375, test_abs_avg=23.115947723388672
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6499986052513123, max_abs=4.5, mean_rel=0.1511705368757248, max_rel=996.5972900390625, norm_rel=0.023580528795719147, ref_abs_avg=27.556415557861328, test_abs_avg=27.556110382080078
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6391294598579407, max_abs=5.0, mean_rel=0.15800437331199646, max_rel=1055.404541015625, norm_rel=0.023431336507201195, ref_abs_avg=27.356189727783203, test_abs_avg=27.365406036376953
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5052597522735596, max_abs=2.0, mean_rel=0.07823167741298676, max_rel=4.119747161865234, norm_rel=0.022230971604585648, ref_abs_avg=23.173511505126953, test_abs_avg=23.124588012695312
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6079587936401367, max_abs=4.25, mean_rel=0.162664994597435, max_rel=1677.7628173828125, norm_rel=0.02299647219479084, ref_abs_avg=26.45240020751953, test_abs_avg=26.45358657836914
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6005064845085144, max_abs=3.75, mean_rel=0.141209676861763, max_rel=249.47784423828125, norm_rel=0.022786220535635948, ref_abs_avg=26.35116195678711, test_abs_avg=26.352062225341797
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.48228347301483154, max_abs=1.75, mean_rel=0.09164468944072723, max_rel=4.5101704597473145, norm_rel=0.02286553755402565, ref_abs_avg=20.363101959228516, test_abs_avg=20.37067985534668
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5769134759902954, max_abs=4.5, mean_rel=0.1630432903766632, max_rel=1047.5655517578125, norm_rel=0.022788506001234055, ref_abs_avg=25.342960357666016, test_abs_avg=25.343061447143555
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.572912871837616, max_abs=4.0, mean_rel=0.1466306746006012, max_rel=777.6005249023438, norm_rel=0.022363953292369843, ref_abs_avg=25.55504608154297, test_abs_avg=25.555400848388672
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4586823284626007, max_abs=1.625, mean_rel=0.15315234661102295, max_rel=16.00356674194336, norm_rel=0.021335897967219353, ref_abs_avg=21.300487518310547, test_abs_avg=21.32154655456543
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.555275559425354, max_abs=4.375, mean_rel=0.15445873141288757, max_rel=1172.0687255859375, norm_rel=0.022235147655010223, ref_abs_avg=24.914077758789062, test_abs_avg=24.915098190307617
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5360482335090637, max_abs=3.5, mean_rel=0.15289141237735748, max_rel=871.44140625, norm_rel=0.02222956344485283, ref_abs_avg=24.119800567626953, test_abs_avg=24.112396240234375
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4191157817840576, max_abs=1.75, mean_rel=0.06851182878017426, max_rel=3.1610186100006104, norm_rel=0.02114746905863285, ref_abs_avg=20.062740325927734, test_abs_avg=20.043441772460938
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5191798210144043, max_abs=3.75, mean_rel=0.15250040590763092, max_rel=1110.8997802734375, norm_rel=0.02185470424592495, ref_abs_avg=23.751163482666016, test_abs_avg=23.753326416015625
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5041990876197815, max_abs=4.0, mean_rel=0.14448878169059753, max_rel=1137.8677978515625, norm_rel=0.021517300978302956, ref_abs_avg=23.361597061157227, test_abs_avg=23.363271713256836
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.41885852813720703, max_abs=2.25, mean_rel=0.10645321011543274, max_rel=6.611767292022705, norm_rel=0.02071552537381649, ref_abs_avg=20.926864624023438, test_abs_avg=20.924585342407227
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.4953349530696869, max_abs=3.5, mean_rel=0.14113980531692505, max_rel=844.7758178710938, norm_rel=0.021472228690981865, ref_abs_avg=23.03496742248535, test_abs_avg=23.035675048828125
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.48230576515197754, max_abs=3.4375, mean_rel=0.14082112908363342, max_rel=883.47802734375, norm_rel=0.021306799724698067, ref_abs_avg=22.597213745117188, test_abs_avg=22.59884262084961
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.45821714401245117, max_abs=1.6875, mean_rel=0.1082509458065033, max_rel=8.430546760559082, norm_rel=0.02401508018374443, ref_abs_avg=19.857913970947266, test_abs_avg=19.85384178161621
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5508025884628296, max_abs=4.0, mean_rel=0.1520167887210846, max_rel=731.71630859375, norm_rel=0.023015109822154045, ref_abs_avg=23.892282485961914, test_abs_avg=23.89036750793457
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5343574285507202, max_abs=4.5, mean_rel=0.13385409116744995, max_rel=478.4145202636719, norm_rel=0.02261587232351303, ref_abs_avg=23.664505004882812, test_abs_avg=23.657203674316406
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4158611297607422, max_abs=1.5703125, mean_rel=0.09927334636449814, max_rel=16.693702697753906, norm_rel=0.02295820042490959, ref_abs_avg=18.727855682373047, test_abs_avg=18.72063446044922
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.49457022547721863, max_abs=4.0, mean_rel=0.1449594348669052, max_rel=841.3467407226562, norm_rel=0.022493569180369377, ref_abs_avg=21.990060806274414, test_abs_avg=21.989986419677734
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.4840051531791687, max_abs=3.5, mean_rel=0.14475952088832855, max_rel=591.5744018554688, norm_rel=0.021977942436933517, ref_abs_avg=22.04568862915039, test_abs_avg=22.045604705810547
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3802456855773926, max_abs=1.75, mean_rel=0.11507132649421692, max_rel=10.164459228515625, norm_rel=0.0223315991461277, ref_abs_avg=17.199726104736328, test_abs_avg=17.160598754882812
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.46309638023376465, max_abs=4.25, mean_rel=0.14077967405319214, max_rel=659.5811157226562, norm_rel=0.022078529000282288, ref_abs_avg=20.983619689941406, test_abs_avg=20.981281280517578
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.45981675386428833, max_abs=3.5, mean_rel=0.13947410881519318, max_rel=498.92755126953125, norm_rel=0.021760649979114532, ref_abs_avg=21.18218994140625, test_abs_avg=21.182445526123047
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.359133243560791, max_abs=1.75, mean_rel=0.1180504560470581, max_rel=12.817415237426758, norm_rel=0.02332288958132267, ref_abs_avg=15.49543571472168, test_abs_avg=15.512744903564453
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.43183884024620056, max_abs=4.25, mean_rel=0.13146528601646423, max_rel=644.6476440429688, norm_rel=0.02135518006980419, ref_abs_avg=20.234012603759766, test_abs_avg=20.233795166015625
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.42514997720718384, max_abs=3.5, mean_rel=0.13709944486618042, max_rel=372.4264221191406, norm_rel=0.021359749138355255, ref_abs_avg=19.984878540039062, test_abs_avg=19.980018615722656
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.36089086532592773, max_abs=1.609375, mean_rel=0.1206551343202591, max_rel=8.244470596313477, norm_rel=0.021691910922527313, ref_abs_avg=16.840824127197266, test_abs_avg=16.83531951904297
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4115428030490875, max_abs=4.5, mean_rel=0.1309741735458374, max_rel=594.0381469726562, norm_rel=0.02073594741523266, ref_abs_avg=19.916149139404297, test_abs_avg=19.91616439819336
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.39764508605003357, max_abs=4.3125, mean_rel=0.12766042351722717, max_rel=358.7763671875, norm_rel=0.020910171791911125, ref_abs_avg=19.205013275146484, test_abs_avg=19.208498001098633
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.32312068343162537, max_abs=1.25, mean_rel=0.06485193222761154, max_rel=3.2791876792907715, norm_rel=0.02013552561402321, ref_abs_avg=15.761298179626465, test_abs_avg=15.735383987426758
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.37920230627059937, max_abs=4.0, mean_rel=0.12085451930761337, max_rel=875.1644287109375, norm_rel=0.02019607461988926, ref_abs_avg=18.889312744140625, test_abs_avg=18.889362335205078
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3741839826107025, max_abs=3.0, mean_rel=0.12834098935127258, max_rel=735.1317749023438, norm_rel=0.020072931423783302, ref_abs_avg=18.838165283203125, test_abs_avg=18.840091705322266
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.31896281242370605, max_abs=1.21875, mean_rel=0.0581996850669384, max_rel=2.8270647525787354, norm_rel=0.020636802539229393, ref_abs_avg=15.613529205322266, test_abs_avg=15.598017692565918
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.36558422446250916, max_abs=4.125, mean_rel=0.12375643849372864, max_rel=629.20166015625, norm_rel=0.019920382648706436, ref_abs_avg=18.530500411987305, test_abs_avg=18.53208351135254
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3514863848686218, max_abs=3.5, mean_rel=0.12303051352500916, max_rel=525.663330078125, norm_rel=0.019573984667658806, ref_abs_avg=18.21851921081543, test_abs_avg=18.22063446044922
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2890653610229492, max_abs=1.1875, mean_rel=0.07239298522472382, max_rel=6.755123615264893, norm_rel=0.018935848027467728, ref_abs_avg=15.376131057739258, test_abs_avg=15.347272872924805
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3397558331489563, max_abs=3.5, mean_rel=0.1186373233795166, max_rel=992.755859375, norm_rel=0.019306831061840057, ref_abs_avg=17.865320205688477, test_abs_avg=17.865123748779297
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.33324581384658813, max_abs=3.5, mean_rel=0.1085646003484726, max_rel=270.2839660644531, norm_rel=0.01903490349650383, ref_abs_avg=17.766468048095703, test_abs_avg=17.776248931884766
identity layers + randn queries
production_forward fwd+bwd:  66.271 ms
production_forward bwd-only: 56.460 ms
production_forward peak allocated: fwd=7.614 GiB, fwd+bwd=15.618 GiB
production_forward peak reserved:  fwd=27.377 GiB, fwd+bwd=27.377 GiB
paper_forward fwd+bwd:  221.185 ms
paper_forward bwd-only: 174.012 ms
paper_forward peak allocated: fwd=35.128 GiB, fwd+bwd=37.247 GiB
paper_forward peak reserved:  fwd=36.170 GiB, fwd+bwd=38.670 GiB
torch_compile_phases_forward fwd+bwd:  94.930 ms
torch_compile_phases_forward bwd-only: 76.555 ms
torch_compile_phases_forward peak allocated: fwd=18.203 GiB, fwd+bwd=18.831 GiB
torch_compile_phases_forward peak reserved:  fwd=27.398 GiB, fwd+bwd=27.398 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016243315767496824, max_abs=0.052734375
production_forward grad[0] vs paper_forward: mean_abs=0.008312549442052841, max_abs=0.53125, mean_rel=0.07184763252735138, max_rel=95.44132232666016, norm_rel=0.019722798839211464, ref_abs_avg=0.45900365710258484, test_abs_avg=0.459018349647522
production_forward grad[1] vs paper_forward: mean_abs=7.250375747680664, max_abs=64.0, mean_rel=0.12480230629444122, max_rel=157.69081115722656, norm_rel=0.02049696445465088, ref_abs_avg=318.0146789550781, test_abs_avg=318.10595703125
production_forward grad[2] vs paper_forward: mean_abs=1.265977382659912, max_abs=5.0, mean_rel=0.32159683108329773, max_rel=121.9129409790039, norm_rel=0.024635883048176765, ref_abs_avg=50.5378532409668, test_abs_avg=50.50849151611328
production_forward grad[3] vs paper_forward: mean_abs=1.5130271911621094, max_abs=9.5, mean_rel=0.17200538516044617, max_rel=2116.48828125, norm_rel=0.022806430235505104, ref_abs_avg=66.7059326171875, test_abs_avg=66.71083068847656
production_forward grad[4] vs paper_forward: mean_abs=1.4754513502120972, max_abs=8.75, mean_rel=0.16360554099082947, max_rel=1160.4578857421875, norm_rel=0.02261250838637352, ref_abs_avg=65.58335876464844, test_abs_avg=65.58541870117188
production_forward grad[5] vs paper_forward: mean_abs=1.120647668838501, max_abs=5.0, mean_rel=0.27613112330436707, max_rel=77.56362915039062, norm_rel=0.022082995623350143, ref_abs_avg=52.26374435424805, test_abs_avg=52.24694061279297
production_forward grad[6] vs paper_forward: mean_abs=1.3244507312774658, max_abs=9.375, mean_rel=0.15385091304779053, max_rel=2109.69921875, norm_rel=0.02247997745871544, ref_abs_avg=59.270904541015625, test_abs_avg=59.27217102050781
production_forward grad[7] vs paper_forward: mean_abs=1.2982391119003296, max_abs=8.5, mean_rel=0.16377007961273193, max_rel=2441.294921875, norm_rel=0.02229774184525013, ref_abs_avg=58.55952453613281, test_abs_avg=58.57156753540039
production_forward grad[8] vs paper_forward: mean_abs=0.952552318572998, max_abs=4.0, mean_rel=0.145786315202713, max_rel=25.78120994567871, norm_rel=0.02171470783650875, ref_abs_avg=44.80848693847656, test_abs_avg=44.794952392578125
production_forward grad[9] vs paper_forward: mean_abs=1.193511962890625, max_abs=8.0, mean_rel=0.1540602147579193, max_rel=977.9376220703125, norm_rel=0.02236875705420971, ref_abs_avg=53.6392707824707, test_abs_avg=53.64189910888672
production_forward grad[10] vs paper_forward: mean_abs=1.1700150966644287, max_abs=7.0, mean_rel=0.14274659752845764, max_rel=1064.44482421875, norm_rel=0.022096281871199608, ref_abs_avg=53.31256866455078, test_abs_avg=53.31145477294922
production_forward grad[11] vs paper_forward: mean_abs=0.9482724070549011, max_abs=3.75, mean_rel=0.08960539102554321, max_rel=5.931550025939941, norm_rel=0.023967113345861435, ref_abs_avg=40.805442810058594, test_abs_avg=40.863983154296875
production_forward grad[12] vs paper_forward: mean_abs=1.1082613468170166, max_abs=8.0, mean_rel=0.16165512800216675, max_rel=1522.3646240234375, norm_rel=0.022156979888677597, ref_abs_avg=50.227272033691406, test_abs_avg=50.230438232421875
production_forward grad[13] vs paper_forward: mean_abs=1.0715616941452026, max_abs=6.5, mean_rel=0.15534275770187378, max_rel=2364.01953125, norm_rel=0.021806109696626663, ref_abs_avg=49.3893928527832, test_abs_avg=49.391815185546875
production_forward grad[14] vs paper_forward: mean_abs=0.8498353958129883, max_abs=3.5, mean_rel=0.09447656571865082, max_rel=13.248061180114746, norm_rel=0.023108161985874176, ref_abs_avg=38.528953552246094, test_abs_avg=38.525238037109375
production_forward grad[15] vs paper_forward: mean_abs=1.0231332778930664, max_abs=7.0, mean_rel=0.15819987654685974, max_rel=1440.0948486328125, norm_rel=0.022036191076040268, ref_abs_avg=46.68292999267578, test_abs_avg=46.68593215942383
production_forward grad[16] vs paper_forward: mean_abs=1.0030977725982666, max_abs=6.5, mean_rel=0.1516636312007904, max_rel=1228.659912109375, norm_rel=0.021603604778647423, ref_abs_avg=46.676910400390625, test_abs_avg=46.67105484008789
production_forward grad[17] vs paper_forward: mean_abs=0.7768516540527344, max_abs=3.5, mean_rel=0.07975426316261292, max_rel=4.124048233032227, norm_rel=0.022584039717912674, ref_abs_avg=34.83184051513672, test_abs_avg=34.829490661621094
production_forward grad[18] vs paper_forward: mean_abs=0.9679564237594604, max_abs=6.375, mean_rel=0.15741610527038574, max_rel=2137.565673828125, norm_rel=0.021946702152490616, ref_abs_avg=44.33695983886719, test_abs_avg=44.341278076171875
production_forward grad[19] vs paper_forward: mean_abs=0.9426416158676147, max_abs=5.8125, mean_rel=0.15255597233772278, max_rel=1103.6827392578125, norm_rel=0.02172044664621353, ref_abs_avg=43.645320892333984, test_abs_avg=43.65253448486328
production_forward grad[20] vs paper_forward: mean_abs=0.778107762336731, max_abs=3.25, mean_rel=0.16052231192588806, max_rel=34.0057487487793, norm_rel=0.023124217987060547, ref_abs_avg=33.687049865722656, test_abs_avg=33.60819625854492
production_forward grad[21] vs paper_forward: mean_abs=0.9197556376457214, max_abs=6.0625, mean_rel=0.15680846571922302, max_rel=1521.8878173828125, norm_rel=0.02180413156747818, ref_abs_avg=42.35471725463867, test_abs_avg=42.357574462890625
production_forward grad[22] vs paper_forward: mean_abs=0.899166464805603, max_abs=6.25, mean_rel=0.15273916721343994, max_rel=1362.3687744140625, norm_rel=0.02166210301220417, ref_abs_avg=41.76141357421875, test_abs_avg=41.7681999206543
production_forward grad[23] vs paper_forward: mean_abs=0.7442314624786377, max_abs=3.330078125, mean_rel=0.06733528524637222, max_rel=3.6434361934661865, norm_rel=0.023017624393105507, ref_abs_avg=33.2318115234375, test_abs_avg=33.241172790527344
production_forward grad[24] vs paper_forward: mean_abs=0.8746582269668579, max_abs=5.6875, mean_rel=0.15639518201351166, max_rel=1274.93017578125, norm_rel=0.021638493984937668, ref_abs_avg=40.593971252441406, test_abs_avg=40.59626770019531
production_forward grad[25] vs paper_forward: mean_abs=0.8550758361816406, max_abs=5.0625, mean_rel=0.14515367150306702, max_rel=900.0879516601562, norm_rel=0.021233728155493736, ref_abs_avg=40.462196350097656, test_abs_avg=40.466190338134766
production_forward grad[26] vs paper_forward: mean_abs=0.8048402070999146, max_abs=3.2109375, mean_rel=0.10089252889156342, max_rel=5.362276077270508, norm_rel=0.022130344063043594, ref_abs_avg=36.414100646972656, test_abs_avg=36.459442138671875
production_forward grad[27] vs paper_forward: mean_abs=1.0198514461517334, max_abs=7.0, mean_rel=0.16682207584381104, max_rel=1701.8153076171875, norm_rel=0.023667512461543083, ref_abs_avg=43.21271896362305, test_abs_avg=43.21535110473633
production_forward grad[28] vs paper_forward: mean_abs=0.9848501086235046, max_abs=6.5, mean_rel=0.18940426409244537, max_rel=1918.959228515625, norm_rel=0.023252476006746292, ref_abs_avg=42.55870056152344, test_abs_avg=42.56544494628906
production_forward grad[29] vs paper_forward: mean_abs=0.8084883689880371, max_abs=3.1875, mean_rel=0.11609440296888351, max_rel=9.10708999633789, norm_rel=0.026259135454893112, ref_abs_avg=30.006193161010742, test_abs_avg=29.990798950195312
production_forward grad[30] vs paper_forward: mean_abs=0.9335209131240845, max_abs=6.5, mean_rel=0.15914562344551086, max_rel=841.5496215820312, norm_rel=0.023936927318572998, ref_abs_avg=39.192100524902344, test_abs_avg=39.19363784790039
production_forward grad[31] vs paper_forward: mean_abs=0.9246002435684204, max_abs=5.25, mean_rel=0.1690729260444641, max_rel=994.0604248046875, norm_rel=0.0240170881152153, ref_abs_avg=38.66619873046875, test_abs_avg=38.6649055480957
production_forward grad[32] vs paper_forward: mean_abs=0.6901950836181641, max_abs=2.5, mean_rel=0.11421804875135422, max_rel=7.978723049163818, norm_rel=0.02242213860154152, ref_abs_avg=30.285581588745117, test_abs_avg=30.242015838623047
production_forward grad[33] vs paper_forward: mean_abs=0.8722044229507446, max_abs=5.5, mean_rel=0.1633419543504715, max_rel=1705.2015380859375, norm_rel=0.023808445781469345, ref_abs_avg=36.698577880859375, test_abs_avg=36.698944091796875
production_forward grad[34] vs paper_forward: mean_abs=0.8562722206115723, max_abs=5.25, mean_rel=0.17459462583065033, max_rel=888.6375122070312, norm_rel=0.023754026740789413, ref_abs_avg=36.18952178955078, test_abs_avg=36.19391632080078
production_forward grad[35] vs paper_forward: mean_abs=0.6592252254486084, max_abs=2.5, mean_rel=0.13954229652881622, max_rel=28.47772979736328, norm_rel=0.024127637967467308, ref_abs_avg=27.798349380493164, test_abs_avg=27.883014678955078
production_forward grad[36] vs paper_forward: mean_abs=0.8179728984832764, max_abs=6.0, mean_rel=0.162607803940773, max_rel=1688.5374755859375, norm_rel=0.023585287854075432, ref_abs_avg=34.786766052246094, test_abs_avg=34.7873420715332
production_forward grad[37] vs paper_forward: mean_abs=0.8010750412940979, max_abs=4.75, mean_rel=0.15476275980472565, max_rel=638.3181762695312, norm_rel=0.02327203005552292, ref_abs_avg=34.53207778930664, test_abs_avg=34.52839279174805
production_forward grad[38] vs paper_forward: mean_abs=0.6243810653686523, max_abs=2.5, mean_rel=0.14458292722702026, max_rel=19.422636032104492, norm_rel=0.023272013291716576, ref_abs_avg=26.904415130615234, test_abs_avg=26.93291473388672
production_forward grad[39] vs paper_forward: mean_abs=0.7710553407669067, max_abs=5.0625, mean_rel=0.16658830642700195, max_rel=1558.8140869140625, norm_rel=0.02326444908976555, ref_abs_avg=33.23567199707031, test_abs_avg=33.23810958862305
production_forward grad[40] vs paper_forward: mean_abs=0.7517980337142944, max_abs=5.0, mean_rel=0.14209669828414917, max_rel=886.8693237304688, norm_rel=0.023059478029608727, ref_abs_avg=32.70289611816406, test_abs_avg=32.70353698730469
production_forward grad[41] vs paper_forward: mean_abs=0.5940923690795898, max_abs=2.25, mean_rel=0.13190621137619019, max_rel=28.690187454223633, norm_rel=0.023374982178211212, ref_abs_avg=25.45154571533203, test_abs_avg=25.448959350585938
production_forward grad[42] vs paper_forward: mean_abs=0.7330238819122314, max_abs=4.9140625, mean_rel=0.15798704326152802, max_rel=927.4013671875, norm_rel=0.023120615631341934, ref_abs_avg=31.7033748626709, test_abs_avg=31.706012725830078
production_forward grad[43] vs paper_forward: mean_abs=0.7202659845352173, max_abs=4.875, mean_rel=0.15066485106945038, max_rel=1198.304931640625, norm_rel=0.02298274077475071, ref_abs_avg=31.43498992919922, test_abs_avg=31.44170379638672
production_forward grad[44] vs paper_forward: mean_abs=0.5994758605957031, max_abs=2.1875, mean_rel=0.10522003471851349, max_rel=9.556793212890625, norm_rel=0.02394292689859867, ref_abs_avg=25.020374298095703, test_abs_avg=25.00222396850586
production_forward grad[45] vs paper_forward: mean_abs=0.7018249034881592, max_abs=5.0, mean_rel=0.16090816259384155, max_rel=1031.6702880859375, norm_rel=0.02304493449628353, ref_abs_avg=30.49014663696289, test_abs_avg=30.49094581604004
production_forward grad[46] vs paper_forward: mean_abs=0.689923882484436, max_abs=4.25, mean_rel=0.15837833285331726, max_rel=1011.2694091796875, norm_rel=0.02286975085735321, ref_abs_avg=30.272903442382812, test_abs_avg=30.27068328857422
production_forward grad[47] vs paper_forward: mean_abs=0.5385677218437195, max_abs=2.25, mean_rel=0.12930519878864288, max_rel=31.507537841796875, norm_rel=0.023249492049217224, ref_abs_avg=24.252643585205078, test_abs_avg=24.262983322143555
production_forward grad[48] vs paper_forward: mean_abs=0.671581506729126, max_abs=4.375, mean_rel=0.15522506833076477, max_rel=1382.6033935546875, norm_rel=0.022885112091898918, ref_abs_avg=29.400882720947266, test_abs_avg=29.401071548461914
production_forward grad[49] vs paper_forward: mean_abs=0.659932553768158, max_abs=4.25, mean_rel=0.16864925622940063, max_rel=1395.285400390625, norm_rel=0.022492894902825356, ref_abs_avg=29.442171096801758, test_abs_avg=29.44540023803711
production_forward grad[50] vs paper_forward: mean_abs=0.6201133728027344, max_abs=2.4375, mean_rel=0.21856707334518433, max_rel=47.21553039550781, norm_rel=0.02431366592645645, ref_abs_avg=25.616474151611328, test_abs_avg=25.586654663085938
production_forward grad[51] vs paper_forward: mean_abs=0.7520214319229126, max_abs=5.0, mean_rel=0.16267633438110352, max_rel=1581.79248046875, norm_rel=0.024418655782938004, ref_abs_avg=30.849048614501953, test_abs_avg=30.850543975830078
production_forward grad[52] vs paper_forward: mean_abs=0.7360180616378784, max_abs=5.0, mean_rel=0.17116719484329224, max_rel=1300.2987060546875, norm_rel=0.02440405637025833, ref_abs_avg=30.23198699951172, test_abs_avg=30.228757858276367
production_forward grad[53] vs paper_forward: mean_abs=0.5747187733650208, max_abs=2.375, mean_rel=0.10386218875646591, max_rel=8.07938289642334, norm_rel=0.02372632548213005, ref_abs_avg=23.53427505493164, test_abs_avg=23.52526092529297
production_forward grad[54] vs paper_forward: mean_abs=0.6937652826309204, max_abs=5.375, mean_rel=0.17348258197307587, max_rel=1011.01611328125, norm_rel=0.024116825312376022, ref_abs_avg=28.810100555419922, test_abs_avg=28.81058120727539
production_forward grad[55] vs paper_forward: mean_abs=0.683586061000824, max_abs=4.3828125, mean_rel=0.16090431809425354, max_rel=762.6036987304688, norm_rel=0.023977695032954216, ref_abs_avg=28.537097930908203, test_abs_avg=28.541311264038086
production_forward grad[56] vs paper_forward: mean_abs=0.5661563873291016, max_abs=2.125, mean_rel=0.0882938802242279, max_rel=3.9696784019470215, norm_rel=0.025409452617168427, ref_abs_avg=22.250431060791016, test_abs_avg=22.225688934326172
production_forward grad[57] vs paper_forward: mean_abs=0.6436418294906616, max_abs=4.5, mean_rel=0.15974262356758118, max_rel=922.8728637695312, norm_rel=0.023651493713259697, ref_abs_avg=27.212926864624023, test_abs_avg=27.215179443359375
production_forward grad[58] vs paper_forward: mean_abs=0.6322170495986938, max_abs=4.125, mean_rel=0.15720078349113464, max_rel=576.1076049804688, norm_rel=0.023649152368307114, ref_abs_avg=26.801654815673828, test_abs_avg=26.79607391357422
production_forward grad[59] vs paper_forward: mean_abs=0.49230504035949707, max_abs=1.875, mean_rel=0.09476194530725479, max_rel=6.780533790588379, norm_rel=0.022984270006418228, ref_abs_avg=21.870973587036133, test_abs_avg=21.878131866455078
production_forward grad[60] vs paper_forward: mean_abs=0.5998861789703369, max_abs=4.015625, mean_rel=0.14437253773212433, max_rel=529.2903442382812, norm_rel=0.023032091557979584, ref_abs_avg=26.038732528686523, test_abs_avg=26.03843879699707
production_forward grad[61] vs paper_forward: mean_abs=0.5878557562828064, max_abs=4.0, mean_rel=0.1546320617198944, max_rel=887.6880493164062, norm_rel=0.02300814725458622, ref_abs_avg=25.580764770507812, test_abs_avg=25.58209991455078
production_forward grad[62] vs paper_forward: mean_abs=0.4691673517227173, max_abs=2.09375, mean_rel=0.12265796959400177, max_rel=10.438304901123047, norm_rel=0.022788893431425095, ref_abs_avg=20.00861930847168, test_abs_avg=20.0374698638916
production_forward grad[63] vs paper_forward: mean_abs=0.5634803771972656, max_abs=3.6796875, mean_rel=0.15858931839466095, max_rel=699.886474609375, norm_rel=0.02274515852332115, ref_abs_avg=24.77130126953125, test_abs_avg=24.77294158935547
production_forward grad[64] vs paper_forward: mean_abs=0.5516859889030457, max_abs=4.0, mean_rel=0.15988250076770782, max_rel=918.1895751953125, norm_rel=0.022721342742443085, ref_abs_avg=24.348068237304688, test_abs_avg=24.347591400146484
production_forward grad[65] vs paper_forward: mean_abs=0.447863906621933, max_abs=1.6875, mean_rel=0.36918339133262634, max_rel=133.54867553710938, norm_rel=0.022736823186278343, ref_abs_avg=19.664630889892578, test_abs_avg=19.65656852722168
production_forward grad[66] vs paper_forward: mean_abs=0.5332703590393066, max_abs=4.5, mean_rel=0.15220141410827637, max_rel=789.9896850585938, norm_rel=0.022510964423418045, ref_abs_avg=23.648590087890625, test_abs_avg=23.649930953979492
production_forward grad[67] vs paper_forward: mean_abs=0.5248018503189087, max_abs=3.5, mean_rel=0.13256308436393738, max_rel=379.7287902832031, norm_rel=0.021980585530400276, ref_abs_avg=23.877647399902344, test_abs_avg=23.876060485839844
production_forward grad[68] vs paper_forward: mean_abs=0.4327845573425293, max_abs=1.8125, mean_rel=0.0752091109752655, max_rel=2.739074230194092, norm_rel=0.02210666425526142, ref_abs_avg=19.68343162536621, test_abs_avg=19.692710876464844
production_forward grad[69] vs paper_forward: mean_abs=0.5091625452041626, max_abs=4.0, mean_rel=0.1504591554403305, max_rel=908.1450805664062, norm_rel=0.021705416962504387, ref_abs_avg=23.347707748413086, test_abs_avg=23.349449157714844
production_forward grad[70] vs paper_forward: mean_abs=0.4960731863975525, max_abs=3.46875, mean_rel=0.13534335792064667, max_rel=726.0724487304688, norm_rel=0.02171836979687214, ref_abs_avg=22.857425689697266, test_abs_avg=22.86824607849121
production_forward grad[71] vs paper_forward: mean_abs=0.4252939224243164, max_abs=1.625, mean_rel=0.07057367265224457, max_rel=3.536388874053955, norm_rel=0.020965145900845528, ref_abs_avg=20.226058959960938, test_abs_avg=20.238496780395508
production_forward grad[72] vs paper_forward: mean_abs=0.49151575565338135, max_abs=4.0, mean_rel=0.14526939392089844, max_rel=1054.0843505859375, norm_rel=0.021495014429092407, ref_abs_avg=22.793434143066406, test_abs_avg=22.79418182373047
production_forward grad[73] vs paper_forward: mean_abs=0.48094531893730164, max_abs=3.396484375, mean_rel=0.1430395543575287, max_rel=458.211669921875, norm_rel=0.021770941093564034, ref_abs_avg=22.113439559936523, test_abs_avg=22.10977554321289
production_forward grad[74] vs paper_forward: mean_abs=0.4433675706386566, max_abs=1.796875, mean_rel=1.310410976409912, max_rel=629.1813354492188, norm_rel=0.0246164221316576, ref_abs_avg=18.23542022705078, test_abs_avg=18.250640869140625
production_forward grad[75] vs paper_forward: mean_abs=0.5587825179100037, max_abs=4.25, mean_rel=0.15605276823043823, max_rel=1280.10009765625, norm_rel=0.02338387630879879, ref_abs_avg=23.87489128112793, test_abs_avg=23.875450134277344
production_forward grad[76] vs paper_forward: mean_abs=0.5448854565620422, max_abs=3.75, mean_rel=0.1435476392507553, max_rel=654.0987548828125, norm_rel=0.023189324885606766, ref_abs_avg=23.542142868041992, test_abs_avg=23.547401428222656
production_forward grad[77] vs paper_forward: mean_abs=0.45293712615966797, max_abs=1.75, mean_rel=0.1316843330860138, max_rel=7.30240535736084, norm_rel=0.023309024050831795, ref_abs_avg=19.438207626342773, test_abs_avg=19.421306610107422
production_forward grad[78] vs paper_forward: mean_abs=0.5114079713821411, max_abs=4.0, mean_rel=0.1569032371044159, max_rel=1039.0010986328125, norm_rel=0.023081060498952866, ref_abs_avg=22.151357650756836, test_abs_avg=22.151111602783203
production_forward grad[79] vs paper_forward: mean_abs=0.5015344619750977, max_abs=4.375, mean_rel=0.14182499051094055, max_rel=829.3743896484375, norm_rel=0.023084748536348343, ref_abs_avg=21.868728637695312, test_abs_avg=21.862821578979492
production_forward grad[80] vs paper_forward: mean_abs=0.3670746088027954, max_abs=1.75, mean_rel=0.20154035091400146, max_rel=44.60352325439453, norm_rel=0.020081276074051857, ref_abs_avg=17.983272552490234, test_abs_avg=17.977033615112305
production_forward grad[81] vs paper_forward: mean_abs=0.4685741066932678, max_abs=3.75, mean_rel=0.1494239866733551, max_rel=774.5162353515625, norm_rel=0.02234972082078457, ref_abs_avg=20.966819763183594, test_abs_avg=20.967182159423828
production_forward grad[82] vs paper_forward: mean_abs=0.4576917886734009, max_abs=3.5, mean_rel=0.1536957025527954, max_rel=733.5191650390625, norm_rel=0.021974192932248116, ref_abs_avg=20.879932403564453, test_abs_avg=20.87976837158203
production_forward grad[83] vs paper_forward: mean_abs=0.3572044372558594, max_abs=1.25, mean_rel=0.10753217339515686, max_rel=17.346588134765625, norm_rel=0.021048204973340034, ref_abs_avg=16.753599166870117, test_abs_avg=16.749454498291016
production_forward grad[84] vs paper_forward: mean_abs=0.4336031675338745, max_abs=3.578125, mean_rel=0.13334158062934875, max_rel=878.9577026367188, norm_rel=0.02153753489255905, ref_abs_avg=20.143901824951172, test_abs_avg=20.1450252532959
production_forward grad[85] vs paper_forward: mean_abs=0.42108678817749023, max_abs=3.625, mean_rel=0.1314578652381897, max_rel=323.7043762207031, norm_rel=0.021265294402837753, ref_abs_avg=19.763866424560547, test_abs_avg=19.769256591796875
production_forward grad[86] vs paper_forward: mean_abs=0.32230842113494873, max_abs=1.25, mean_rel=0.1464235484600067, max_rel=33.845184326171875, norm_rel=0.01960928551852703, ref_abs_avg=16.418872833251953, test_abs_avg=16.397537231445312
production_forward grad[87] vs paper_forward: mean_abs=0.40456241369247437, max_abs=4.0, mean_rel=0.13652341067790985, max_rel=679.8616333007812, norm_rel=0.02082633599638939, ref_abs_avg=19.52473258972168, test_abs_avg=19.52605628967285
production_forward grad[88] vs paper_forward: mean_abs=0.3928924798965454, max_abs=3.0, mean_rel=0.13281327486038208, max_rel=1032.6102294921875, norm_rel=0.020706461742520332, ref_abs_avg=19.091655731201172, test_abs_avg=19.090408325195312
production_forward grad[89] vs paper_forward: mean_abs=0.3197695016860962, max_abs=1.4375, mean_rel=0.0829627737402916, max_rel=2.833897113800049, norm_rel=0.021198004484176636, ref_abs_avg=15.448554992675781, test_abs_avg=15.419412612915039
production_forward grad[90] vs paper_forward: mean_abs=0.37141114473342896, max_abs=3.90625, mean_rel=0.12730109691619873, max_rel=1108.239013671875, norm_rel=0.020475519821047783, ref_abs_avg=18.276779174804688, test_abs_avg=18.27645492553711
production_forward grad[91] vs paper_forward: mean_abs=0.363680362701416, max_abs=3.46875, mean_rel=0.12803950905799866, max_rel=660.7545166015625, norm_rel=0.019662657752633095, ref_abs_avg=18.602201461791992, test_abs_avg=18.60840606689453
production_forward grad[92] vs paper_forward: mean_abs=0.3194150924682617, max_abs=1.375, mean_rel=0.08471788465976715, max_rel=7.7846856117248535, norm_rel=0.020283710211515427, ref_abs_avg=16.192577362060547, test_abs_avg=16.198945999145508
production_forward grad[93] vs paper_forward: mean_abs=0.35037410259246826, max_abs=3.625, mean_rel=0.11673387140035629, max_rel=486.7558288574219, norm_rel=0.01954703778028488, ref_abs_avg=18.11627197265625, test_abs_avg=18.117273330688477
production_forward grad[94] vs paper_forward: mean_abs=0.34074172377586365, max_abs=3.0, mean_rel=0.12115390598773956, max_rel=497.25616455078125, norm_rel=0.019083915278315544, ref_abs_avg=18.014453887939453, test_abs_avg=18.01584243774414
production_forward grad[95] vs paper_forward: mean_abs=0.2684306502342224, max_abs=1.1875, mean_rel=0.1283511370420456, max_rel=31.191600799560547, norm_rel=0.018454082310199738, ref_abs_avg=14.57192325592041, test_abs_avg=14.60360050201416
production_forward grad[96] vs paper_forward: mean_abs=0.3294178247451782, max_abs=3.625, mean_rel=0.12198831140995026, max_rel=606.2544555664062, norm_rel=0.019248485565185547, ref_abs_avg=17.365985870361328, test_abs_avg=17.367334365844727
production_forward grad[97] vs paper_forward: mean_abs=0.32637667655944824, max_abs=2.9375, mean_rel=0.12214578688144684, max_rel=505.4992980957031, norm_rel=0.01922650821506977, ref_abs_avg=17.260604858398438, test_abs_avg=17.259634017944336
torch_compile_phases_forward vs paper_forward output: mean_abs=0.001628721714951098, max_abs=0.052734375
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008659516461193562, max_abs=0.4375, mean_rel=0.07449904084205627, max_rel=86.45276641845703, norm_rel=0.020413076505064964, ref_abs_avg=0.45900365710258484, test_abs_avg=0.45900464057922363
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.380950927734375, max_abs=62.0, mean_rel=0.11823297291994095, max_rel=91.314208984375, norm_rel=0.020881690084934235, ref_abs_avg=318.0146789550781, test_abs_avg=318.0987243652344
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3049311637878418, max_abs=5.0, mean_rel=0.2408076971769333, max_rel=80.40624237060547, norm_rel=0.025526555255055428, ref_abs_avg=50.5378532409668, test_abs_avg=50.562374114990234
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.5654791593551636, max_abs=10.0, mean_rel=0.17058390378952026, max_rel=1627.384521484375, norm_rel=0.02358577400445938, ref_abs_avg=66.7059326171875, test_abs_avg=66.70904541015625
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5286736488342285, max_abs=10.75, mean_rel=0.1728208363056183, max_rel=1837.7476806640625, norm_rel=0.023416686803102493, ref_abs_avg=65.58335876464844, test_abs_avg=65.59288024902344
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.202251672744751, max_abs=4.75, mean_rel=0.24289797246456146, max_rel=61.901424407958984, norm_rel=0.02351827174425125, ref_abs_avg=52.26374435424805, test_abs_avg=52.256866455078125
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.3685154914855957, max_abs=8.625, mean_rel=0.15827804803848267, max_rel=1383.8271484375, norm_rel=0.023209571838378906, ref_abs_avg=59.270904541015625, test_abs_avg=59.273719787597656
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3375829458236694, max_abs=9.0, mean_rel=0.17251615226268768, max_rel=2716.745361328125, norm_rel=0.022985244169831276, ref_abs_avg=58.55952453613281, test_abs_avg=58.56230926513672
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.986170768737793, max_abs=4.5, mean_rel=0.15097783505916595, max_rel=31.215709686279297, norm_rel=0.02242872305214405, ref_abs_avg=44.80848693847656, test_abs_avg=44.7840461730957
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.23079252243042, max_abs=7.5, mean_rel=0.16156813502311707, max_rel=2471.422607421875, norm_rel=0.023055875673890114, ref_abs_avg=53.6392707824707, test_abs_avg=53.64133834838867
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.210733413696289, max_abs=7.5, mean_rel=0.144732266664505, max_rel=1071.0611572265625, norm_rel=0.02284327521920204, ref_abs_avg=53.31256866455078, test_abs_avg=53.31010437011719
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=1.0055460929870605, max_abs=4.125, mean_rel=0.08892977237701416, max_rel=7.748342514038086, norm_rel=0.02535640448331833, ref_abs_avg=40.805442810058594, test_abs_avg=40.8709716796875
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1401681900024414, max_abs=7.0, mean_rel=0.16624532639980316, max_rel=1815.683837890625, norm_rel=0.02280009165406227, ref_abs_avg=50.227272033691406, test_abs_avg=50.227386474609375
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1070297956466675, max_abs=6.5, mean_rel=0.17584344744682312, max_rel=2741.22802734375, norm_rel=0.022526169195771217, ref_abs_avg=49.3893928527832, test_abs_avg=49.390037536621094
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8753805160522461, max_abs=4.25, mean_rel=0.08889545500278473, max_rel=9.78515338897705, norm_rel=0.023565422743558884, ref_abs_avg=38.528953552246094, test_abs_avg=38.538543701171875
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0527716875076294, max_abs=7.0, mean_rel=0.1630794107913971, max_rel=1589.0076904296875, norm_rel=0.022647395730018616, ref_abs_avg=46.68292999267578, test_abs_avg=46.68269348144531
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.032213807106018, max_abs=7.0, mean_rel=0.15726299583911896, max_rel=1796.2442626953125, norm_rel=0.022232700139284134, ref_abs_avg=46.676910400390625, test_abs_avg=46.670188903808594
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.7975559234619141, max_abs=3.25, mean_rel=0.08088120073080063, max_rel=3.816426992416382, norm_rel=0.0228215754032135, ref_abs_avg=34.83184051513672, test_abs_avg=34.85816192626953
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.9931741952896118, max_abs=6.0, mean_rel=0.1567816585302353, max_rel=1878.23583984375, norm_rel=0.022516349330544472, ref_abs_avg=44.33695983886719, test_abs_avg=44.340274810791016
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9685423970222473, max_abs=5.75, mean_rel=0.15788018703460693, max_rel=1114.953857421875, norm_rel=0.022304529324173927, ref_abs_avg=43.645320892333984, test_abs_avg=43.651607513427734
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.8222875595092773, max_abs=3.0, mean_rel=0.19450542330741882, max_rel=44.49787902832031, norm_rel=0.024154992774128914, ref_abs_avg=33.687049865722656, test_abs_avg=33.62723922729492
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9438385963439941, max_abs=6.1875, mean_rel=0.15713617205619812, max_rel=1385.5633544921875, norm_rel=0.022365685552358627, ref_abs_avg=42.35471725463867, test_abs_avg=42.35651779174805
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9253944754600525, max_abs=5.9375, mean_rel=0.15796631574630737, max_rel=1336.2916259765625, norm_rel=0.022271055728197098, ref_abs_avg=41.76141357421875, test_abs_avg=41.76494598388672
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7478408813476562, max_abs=3.125, mean_rel=0.06788109242916107, max_rel=2.8672657012939453, norm_rel=0.02249321900308132, ref_abs_avg=33.2318115234375, test_abs_avg=33.20997619628906
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.8958021402359009, max_abs=5.5, mean_rel=0.15580038726329803, max_rel=1041.1861572265625, norm_rel=0.022146975621581078, ref_abs_avg=40.593971252441406, test_abs_avg=40.594852447509766
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8783886432647705, max_abs=5.75, mean_rel=0.14633703231811523, max_rel=754.12158203125, norm_rel=0.021808531135320663, ref_abs_avg=40.462196350097656, test_abs_avg=40.465396881103516
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8477253913879395, max_abs=3.25, mean_rel=0.09611010551452637, max_rel=4.499553680419922, norm_rel=0.02288900502026081, ref_abs_avg=36.414100646972656, test_abs_avg=36.45774841308594
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0445499420166016, max_abs=7.375, mean_rel=0.17114034295082092, max_rel=1635.5611572265625, norm_rel=0.02425394207239151, ref_abs_avg=43.21271896362305, test_abs_avg=43.214046478271484
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.008711338043213, max_abs=7.1875, mean_rel=0.2020302712917328, max_rel=2151.59130859375, norm_rel=0.02378496527671814, ref_abs_avg=42.55870056152344, test_abs_avg=42.561561584472656
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7849071025848389, max_abs=2.984375, mean_rel=0.11748133599758148, max_rel=13.447283744812012, norm_rel=0.025772010907530785, ref_abs_avg=30.006193161010742, test_abs_avg=29.948333740234375
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9559370279312134, max_abs=6.5, mean_rel=0.16342949867248535, max_rel=1024.4700927734375, norm_rel=0.024482158944010735, ref_abs_avg=39.192100524902344, test_abs_avg=39.19268798828125
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9438890218734741, max_abs=5.5, mean_rel=0.1689774990081787, max_rel=989.9894409179688, norm_rel=0.024505769833922386, ref_abs_avg=38.66619873046875, test_abs_avg=38.66447067260742
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7164314985275269, max_abs=3.3125, mean_rel=0.10680884122848511, max_rel=7.78459358215332, norm_rel=0.023509014397859573, ref_abs_avg=30.285581588745117, test_abs_avg=30.24393081665039
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.8904069662094116, max_abs=6.0, mean_rel=0.1708376705646515, max_rel=1642.5101318359375, norm_rel=0.02432367578148842, ref_abs_avg=36.698577880859375, test_abs_avg=36.69788360595703
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8776080012321472, max_abs=5.25, mean_rel=0.17294391989707947, max_rel=843.7738647460938, norm_rel=0.024314571171998978, ref_abs_avg=36.18952178955078, test_abs_avg=36.19185256958008
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6943774223327637, max_abs=2.75, mean_rel=0.12344487011432648, max_rel=17.588560104370117, norm_rel=0.02512107416987419, ref_abs_avg=27.798349380493164, test_abs_avg=27.860929489135742
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8332776427268982, max_abs=5.5, mean_rel=0.16583585739135742, max_rel=2002.714111328125, norm_rel=0.024019066244363785, ref_abs_avg=34.786766052246094, test_abs_avg=34.78669738769531
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8139519691467285, max_abs=5.5, mean_rel=0.1576187163591385, max_rel=1067.56787109375, norm_rel=0.023649759590625763, ref_abs_avg=34.53207778930664, test_abs_avg=34.52948760986328
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6678075790405273, max_abs=2.40625, mean_rel=0.13986797630786896, max_rel=15.216832160949707, norm_rel=0.02422354370355606, ref_abs_avg=26.904415130615234, test_abs_avg=26.917762756347656
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.7868983745574951, max_abs=5.0, mean_rel=0.17118284106254578, max_rel=1762.6285400390625, norm_rel=0.02373393438756466, ref_abs_avg=33.23567199707031, test_abs_avg=33.23662567138672
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7670755982398987, max_abs=5.25, mean_rel=0.14573487639427185, max_rel=1029.33349609375, norm_rel=0.02354907989501953, ref_abs_avg=32.70289611816406, test_abs_avg=32.70173645019531
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.618372917175293, max_abs=2.625, mean_rel=0.1514763981103897, max_rel=31.007415771484375, norm_rel=0.023847460746765137, ref_abs_avg=25.45154571533203, test_abs_avg=25.426151275634766
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7453640699386597, max_abs=4.6875, mean_rel=0.16048109531402588, max_rel=859.5174560546875, norm_rel=0.02350090630352497, ref_abs_avg=31.7033748626709, test_abs_avg=31.704896926879883
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7350059747695923, max_abs=4.5, mean_rel=0.15625014901161194, max_rel=1178.0992431640625, norm_rel=0.023426542058587074, ref_abs_avg=31.43498992919922, test_abs_avg=31.441679000854492
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.594537615776062, max_abs=2.5, mean_rel=0.08319413661956787, max_rel=8.509472846984863, norm_rel=0.02392507717013359, ref_abs_avg=25.020374298095703, test_abs_avg=25.016244888305664
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7128392457962036, max_abs=5.0, mean_rel=0.16401326656341553, max_rel=950.426513671875, norm_rel=0.023406589403748512, ref_abs_avg=30.49014663696289, test_abs_avg=30.489582061767578
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7010232210159302, max_abs=4.25, mean_rel=0.16133232414722443, max_rel=1230.7025146484375, norm_rel=0.023244386538863182, ref_abs_avg=30.272903442382812, test_abs_avg=30.269346237182617
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5583095550537109, max_abs=2.375, mean_rel=0.12653642892837524, max_rel=29.55405616760254, norm_rel=0.02373039536178112, ref_abs_avg=24.252643585205078, test_abs_avg=24.251506805419922
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6813812255859375, max_abs=4.25, mean_rel=0.15846288204193115, max_rel=976.5048828125, norm_rel=0.023218216374516487, ref_abs_avg=29.400882720947266, test_abs_avg=29.399673461914062
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.668361485004425, max_abs=4.375, mean_rel=0.1668034791946411, max_rel=1123.5836181640625, norm_rel=0.022785848006606102, ref_abs_avg=29.442171096801758, test_abs_avg=29.44268798828125
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6534733772277832, max_abs=2.9375, mean_rel=0.18371881544589996, max_rel=34.29252243041992, norm_rel=0.025805659592151642, ref_abs_avg=25.616474151611328, test_abs_avg=25.595325469970703
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7638387680053711, max_abs=5.34375, mean_rel=0.1685163378715515, max_rel=2194.032958984375, norm_rel=0.024791797623038292, ref_abs_avg=30.849048614501953, test_abs_avg=30.84988021850586
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7506737112998962, max_abs=4.546875, mean_rel=0.1740013062953949, max_rel=1300.2987060546875, norm_rel=0.024900691583752632, ref_abs_avg=30.23198699951172, test_abs_avg=30.22882080078125
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.579059362411499, max_abs=2.375, mean_rel=0.12432406842708588, max_rel=13.781036376953125, norm_rel=0.02384325861930847, ref_abs_avg=23.53427505493164, test_abs_avg=23.5336856842041
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7042298316955566, max_abs=4.875, mean_rel=0.17073702812194824, max_rel=1064.9296875, norm_rel=0.024486813694238663, ref_abs_avg=28.810100555419922, test_abs_avg=28.810739517211914
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6946491599082947, max_abs=4.5, mean_rel=0.1637147068977356, max_rel=818.9013671875, norm_rel=0.024351535364985466, ref_abs_avg=28.537097930908203, test_abs_avg=28.54248046875
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5597769021987915, max_abs=2.0, mean_rel=0.08414409309625626, max_rel=3.205519437789917, norm_rel=0.025213684886693954, ref_abs_avg=22.250431060791016, test_abs_avg=22.23694610595703
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6530506610870361, max_abs=4.75, mean_rel=0.16576051712036133, max_rel=1343.247314453125, norm_rel=0.02398981712758541, ref_abs_avg=27.212926864624023, test_abs_avg=27.214153289794922
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6414170265197754, max_abs=4.4375, mean_rel=0.1607012152671814, max_rel=932.7471313476562, norm_rel=0.023987602442502975, ref_abs_avg=26.801654815673828, test_abs_avg=26.79622459411621
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5167007446289062, max_abs=1.8125, mean_rel=0.09727363288402557, max_rel=4.5955352783203125, norm_rel=0.023782912641763687, ref_abs_avg=21.870973587036133, test_abs_avg=21.891399383544922
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6087609529495239, max_abs=4.296875, mean_rel=0.14781707525253296, max_rel=665.3488159179688, norm_rel=0.023358479142189026, ref_abs_avg=26.038732528686523, test_abs_avg=26.03791618347168
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5936946272850037, max_abs=3.625, mean_rel=0.15602821111679077, max_rel=958.078125, norm_rel=0.02324542962014675, ref_abs_avg=25.580764770507812, test_abs_avg=25.579776763916016
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4750809669494629, max_abs=1.78125, mean_rel=0.10632405430078506, max_rel=8.212430953979492, norm_rel=0.023081695660948753, ref_abs_avg=20.00861930847168, test_abs_avg=20.041290283203125
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.571259081363678, max_abs=4.0, mean_rel=0.1639147251844406, max_rel=823.747802734375, norm_rel=0.023053240031003952, ref_abs_avg=24.77130126953125, test_abs_avg=24.77235221862793
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5606810450553894, max_abs=4.0, mean_rel=0.16211627423763275, max_rel=1101.2491455078125, norm_rel=0.023053733631968498, ref_abs_avg=24.348068237304688, test_abs_avg=24.346630096435547
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4422391355037689, max_abs=2.375, mean_rel=0.4014458656311035, max_rel=153.2376708984375, norm_rel=0.022875508293509483, ref_abs_avg=19.664630889892578, test_abs_avg=19.643579483032227
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5393567681312561, max_abs=4.25, mean_rel=0.15326537191867828, max_rel=1085.034423828125, norm_rel=0.022752469405531883, ref_abs_avg=23.648590087890625, test_abs_avg=23.650344848632812
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5273976922035217, max_abs=3.53125, mean_rel=0.13693565130233765, max_rel=429.5353088378906, norm_rel=0.022095395252108574, ref_abs_avg=23.877647399902344, test_abs_avg=23.878190994262695
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.43529605865478516, max_abs=1.875, mean_rel=0.09397705644369125, max_rel=8.44424819946289, norm_rel=0.022399727255105972, ref_abs_avg=19.68343162536621, test_abs_avg=19.68709945678711
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5147488117218018, max_abs=4.3125, mean_rel=0.15264317393302917, max_rel=897.520751953125, norm_rel=0.02193189598619938, ref_abs_avg=23.347707748413086, test_abs_avg=23.348323822021484
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5014115571975708, max_abs=4.0, mean_rel=0.13727021217346191, max_rel=877.0269775390625, norm_rel=0.021939614787697792, ref_abs_avg=22.857425689697266, test_abs_avg=22.87035369873047
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.42098140716552734, max_abs=1.875, mean_rel=0.06885573267936707, max_rel=4.660619735717773, norm_rel=0.021127503365278244, ref_abs_avg=20.226058959960938, test_abs_avg=20.229307174682617
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.4952586889266968, max_abs=3.5, mean_rel=0.1446453034877777, max_rel=749.2494506835938, norm_rel=0.021644597873091698, ref_abs_avg=22.793434143066406, test_abs_avg=22.79479217529297
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.48294681310653687, max_abs=3.0, mean_rel=0.1417168527841568, max_rel=473.1566467285156, norm_rel=0.02186828851699829, ref_abs_avg=22.113439559936523, test_abs_avg=22.11087989807129
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4666038453578949, max_abs=2.25, mean_rel=1.1177095174789429, max_rel=520.3236694335938, norm_rel=0.025564055889844894, ref_abs_avg=18.23542022705078, test_abs_avg=18.262989044189453
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5662635564804077, max_abs=4.6875, mean_rel=0.1605536937713623, max_rel=1189.2877197265625, norm_rel=0.023687992244958878, ref_abs_avg=23.87489128112793, test_abs_avg=23.875808715820312
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5505328178405762, max_abs=5.3125, mean_rel=0.15086494386196136, max_rel=771.3076782226562, norm_rel=0.023418238386511803, ref_abs_avg=23.542142868041992, test_abs_avg=23.547954559326172
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4571535587310791, max_abs=1.75, mean_rel=0.11915554851293564, max_rel=8.599902153015137, norm_rel=0.0236513651907444, ref_abs_avg=19.438207626342773, test_abs_avg=19.4449462890625
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5162903666496277, max_abs=4.0, mean_rel=0.16061344742774963, max_rel=1244.01025390625, norm_rel=0.02328937128186226, ref_abs_avg=22.151357650756836, test_abs_avg=22.151561737060547
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5068451166152954, max_abs=4.0, mean_rel=0.14443540573120117, max_rel=605.2706909179688, norm_rel=0.023317938670516014, ref_abs_avg=21.868728637695312, test_abs_avg=21.865333557128906
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3660726547241211, max_abs=1.625, mean_rel=0.1425323188304901, max_rel=21.862720489501953, norm_rel=0.020256996154785156, ref_abs_avg=17.983272552490234, test_abs_avg=17.978759765625
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.472954124212265, max_abs=3.625, mean_rel=0.15334489941596985, max_rel=732.2998657226562, norm_rel=0.02256583236157894, ref_abs_avg=20.966819763183594, test_abs_avg=20.967626571655273
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4630689024925232, max_abs=3.09375, mean_rel=0.15337708592414856, max_rel=624.9094848632812, norm_rel=0.02220594696700573, ref_abs_avg=20.879932403564453, test_abs_avg=20.879507064819336
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3689260482788086, max_abs=1.3125, mean_rel=0.10807623714208603, max_rel=16.393552780151367, norm_rel=0.021688053384423256, ref_abs_avg=16.753599166870117, test_abs_avg=16.75820541381836
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4368281364440918, max_abs=4.0, mean_rel=0.13272005319595337, max_rel=819.1512451171875, norm_rel=0.021706843748688698, ref_abs_avg=20.143901824951172, test_abs_avg=20.14537239074707
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4259467124938965, max_abs=3.0, mean_rel=0.13749021291732788, max_rel=466.7472229003906, norm_rel=0.02150912582874298, ref_abs_avg=19.763866424560547, test_abs_avg=19.769264221191406
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3394566774368286, max_abs=1.4375, mean_rel=0.14429286122322083, max_rel=28.07882308959961, norm_rel=0.020623529329895973, ref_abs_avg=16.418872833251953, test_abs_avg=16.397546768188477
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.40784355998039246, max_abs=4.25, mean_rel=0.1366041898727417, max_rel=646.6974487304688, norm_rel=0.020983198657631874, ref_abs_avg=19.52473258972168, test_abs_avg=19.52613639831543
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.3923676609992981, max_abs=3.25, mean_rel=0.1323767900466919, max_rel=911.0794067382812, norm_rel=0.020649800077080727, ref_abs_avg=19.091655731201172, test_abs_avg=19.094100952148438
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3130154013633728, max_abs=1.4375, mean_rel=0.09609034657478333, max_rel=5.652196884155273, norm_rel=0.020681606605648994, ref_abs_avg=15.448554992675781, test_abs_avg=15.418102264404297
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.3735887408256531, max_abs=4.0, mean_rel=0.12724918127059937, max_rel=1191.171142578125, norm_rel=0.020603934302926064, ref_abs_avg=18.276779174804688, test_abs_avg=18.27587127685547
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.36883842945098877, max_abs=3.34375, mean_rel=0.12824422121047974, max_rel=637.9689331054688, norm_rel=0.019957320764660835, ref_abs_avg=18.602201461791992, test_abs_avg=18.606449127197266
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.32082104682922363, max_abs=1.53515625, mean_rel=0.08848682790994644, max_rel=7.458713531494141, norm_rel=0.020282646641135216, ref_abs_avg=16.192577362060547, test_abs_avg=16.189617156982422
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3517065644264221, max_abs=3.75, mean_rel=0.11752033233642578, max_rel=427.4998474121094, norm_rel=0.019609030336141586, ref_abs_avg=18.11627197265625, test_abs_avg=18.117652893066406
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3402172923088074, max_abs=2.75, mean_rel=0.12185314297676086, max_rel=505.3010559082031, norm_rel=0.019059423357248306, ref_abs_avg=18.014453887939453, test_abs_avg=18.021968841552734
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2860214114189148, max_abs=1.0625, mean_rel=0.1223585233092308, max_rel=25.85236930847168, norm_rel=0.018924064934253693, ref_abs_avg=14.57192325592041, test_abs_avg=14.595293045043945
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3294438123703003, max_abs=4.125, mean_rel=0.12208080291748047, max_rel=523.5607299804688, norm_rel=0.019266018643975258, ref_abs_avg=17.365985870361328, test_abs_avg=17.367982864379883
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3235698938369751, max_abs=2.8125, mean_rel=0.12195733189582825, max_rel=465.1230163574219, norm_rel=0.019050447270274162, ref_abs_avg=17.260604858398438, test_abs_avg=17.2615909576416

